Ulf Wendel

PoC: HTTP, JSON, JavaScript, Map&Reduce built-in to MySQL

What if MySQL had an HTTP interface and would reply JSON? What if MySQL had server-side JavaScript? What if you could store JSON documents in MySQL? And, what if there was Map and Reduce built-in to MySQL? Make it happen, today. A proof of concept MySQL 5.6 daemon plugin presented at IPC Spring 2012.

Eye-opening community contributions

Mycached (2009, Cybozu) was a ground breaking achievement. A MySQL server deamon plugin was used to add a new client interface (Memcache protocol) to MySQL. Together with the new protocol came new access methods, an additional data model (key – value) and a new output format (JSON). The work was later superseeded by HandlerSocket (Dena/Yoshinori Matsunobu, 2010) which in turn was abandoned in favour of InnoDB Memcached respectively MySQL Cluster/NDB Memcached by Oracle. Stewart Smith presented a HTTP JSON plugin for Drizzle in 2011. The value of MySQL as storage – not only for SQL – was exposed..

Motivation

The steadily growing popularity of JavaScript deserves a proof of concept. Client-side JavaScript has to use proxying to access MySQL, for example, through PHP/Apache or node.js. Client-side JavaScript is limited to HTTP/Websocket but MySQL speaks neither one. A direct wire between MySQL and client-side JavaScript would help.

Client-side JavaScript   Client-side JavaScript
|   |
Apache   |
PHP   |
|   |
MySQL   MySQL

Proxing means deploying an extra set of software (e.g. PHP, Apache). Extra layers add delays. Extra layers require extra resource – less computing power is left for MySQL.

HTTP for MySQL

MySQL Server daemon plugins are libraries loaded into the MySQL process. A deamon plugin can do anything it likes, for example, start a multi-threaded HTTP webserver . Embedding a webserver can easily be done using the proven BSD libraries libevent and libevht. The embedded webserver shall handle GET /?sql=<mysql>, execute <mysql> and return the result as JSON. JSON is a lightweight data format and it is the natural one for JavaScript.

JSON for MySQL

Running a SQL statement from within a deamon plugin requires setting up and tearing down a THD (thread descriptor object). The event scheduler source has pointers how this is to be done. Running the statement is no problem, but where is the result? The result is gone by the wind. MySQL has streamed it into the Protocol class member of THD for sending out during result generation. Create your own Protocol_json class, wrap all calls for sending data to a function that appends the data to a string, set the protocol class of THD to Protocol_json and fetch the JSON buffered into the string after query execution. Return using the embedded webserver. Done.

Client-side JavaScript   Client-side JavaScript
|   |
Apache   |
PHP   |
|   |
MySQL   MySQL
400 Requests/s, Load 34   1606 Requests/s, Load 2,5
All benchmarks done in a VM on a Celeron Duo 1.2 Ghz subnotebook. 32 clients. Peak values at 16 clients are a tad better.

Server side JavaScript

Another BSD software is used for adding server-side JavaScript: Google V8. Using the function developed to run SQL, we can store the source code of server-side JavaScript scripts in a table, fetch it upon request, cache the source code and execute it. He’s the result for Hello World.

mysql> select * from js_applications where name='internetsuperhero'\G
*************************** 1. row ***************************
  name: internetsuperhero
source: function main() { return "Hello world"; } main(); 

Client-side JavaScript   Client-side JavaScript
|   |
Apache   |
PHP   |
|   |
MySQL   MySQL
1107 Requests/s, Load 27   2360 Requests/s, Load 5

Server side JavaScript gets access to MySQL data

It get more interesting as soon as we link the function for running SQL and returning results as JSON strings to the users’ script. We export the function to JavaScript. Server-side JavaScript has now access to all data inside MySQL. Below are the results for SELECT 1. Please recall, this is a proof of concept. No less. No more.

mysql> select * from js_applications where name='select'\G
*************************** 1. row ***************************
  name: select
source: function main() { return ulf("SELECT 1"); } main();
1 row in set (0,00 sec)

Client-side JavaScript   Client-side JavaScript
|   |
Apache   |
PHP   |
|   |
MySQL   MySQL
448 Requests/s, Load high..   1312 Requests/s, Load low…

Server side JavaScript manipulates JSON documents

What if a we store JSON documents in BLOBs? Server-side JavaScript can fetch the contents of the BLOBs. It gets the data as JSON. JSON can be converted into JavaScript objects. The object can then be manipulated, converted back to JSON and send out via HTTP to the user. Of course, it would also be possible to store the results in a table, a "view"

function filter_names() { 
  var s = ulf("SELECT document FROM test_documents"); 
  var docs = JSON.parse(s);
  var res = []; 
  for (i = 0; i < docs.length; i++) { 
    var doc = JSON.parse(docs[i]); 
    if (doc.firstname !== undefined) { 
      res[i] = "Hi " + doc.firstname; 
    } 
  } 
  return JSON.stringify(res); 
}

Client-side JavaScript   Client-side JavaScript
|   |
Apache   |
PHP   |
|   |
MySQL   MySQL
358 Requests/s, Load high, 34   641 Requests/s, Load 9

That’s cool. We have now MySQL speaking HTTP, replying JSON, server-side JavaScript has access to MySQL data and we can keep "documents" inside MySQL, MySQL can do map and reduce. All goals of the proof of concept have been reached. Whoever can do map, can do reduce as well. Reduce is just another iteration in the result generation loop.

Map and Reduce using SQL vs. handler interface

The handler interface is the MySQL internal low level interface to fetch data. One key motivator behind the Memcached protocol additions to MySQL was to bypass the SQL layer for better performance. Maybe we can use the handler interface to make the map function look more like a map function and improve performace? Most often the map function is called for every row.

function map(){ 
   var res; 
   var row = JSON.parse(doc.before); 
   if (row.firstname !== undefined) 
     res = "Hi " + row.firstname; 
   doc.after = JSON.stringify(res); 
} 
map();

Client-side JavaScript   Client-side JavaScript
Using handler interface   Using SQL
|   |
MySQL   MySQL
571 Requests/s, Load high, 9   641 Requests/s, Load 9

Performance decreases. Once cause could be the increased number of C++ to JavaScript context switches.

Proof given!

The original goals of the proof of concept have been achieved. It is possible to expose the value of MySQL as a storage, to follow out motto not only SQL and to give client-side JavaScript a direct wire to MySQL.

Out of curiosity I benchmarked how many rows (documents) can be read per second. A single thread managed to process 12,700 documents per second. Integrating V8 into the read loop but neither compiling nor running a script let the performance drop to 11,500 documents per second. Compiling and running an empty script resulted in 8,700 documents being scanned per second. And, its been 8,300 documents per second that could be mapped.

Will you make it happen, will you create an HTTP interface for MySQL, today?

Happy hacking!

@Ulf_Wendel Follow me on Twitter

2 Comments

  1. The reason for memcpy() on your slide 34 are alignement sensitive CPU architectures.

  2. 10/10 points for Wlad. What other rating would be possible. Glad to have you still around in the MySQL eco system!