Ulf Wendel

MySQL 5.7: SQL functions for JSON

| 9 Comments

For decades MySQL has considered BLOB a sink for data junk that didn’t fit the strictly typed SQL bill. As the outside world began voting for JavaScript, JSON and key-value/key-document stores, TEXT/BLOB was rediscovered as the only way to store JSON inside MySQL. But having no SQL support for the JSON serialization format, JSON remained junk for MySQL. Community developed SQL functions could not cure the issue. That said, MySQL 5.7 introduces SQL functions to work on JSON documents! MySQL slowly takes a different position on JSON.

From the labyrinth: labs.mysql.com

In the recent past, MySQL has developed a talent hiding feature previews at http://labs.mysql.com/. This is where you find MySQL Fabric, and this is where you find the JSON UDFs (user defined functions, pluggable SQL functions). Please keep in mind, all labyrinth treasures are of pre-production quality.

Visit Labs, click to download and install the JSON functions. If there is no binary for you, try a source build. The README guides you through the installation. Basically, it means copying the library that contains the pluggable SQL functions into MySQL’s plugin directory and loading the functions into MySQL.

CREATE FUNCTION json_valid RETURNS integer SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_search RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_extract RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_replace RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_append RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_remove RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_set RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_merge RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_contains_key RETURNS integer SONAME 'libmy_json_udf.so';


(Note: With my source build the resulting .so file got a name – libmy_json_udf.so – different from the one – libmy_json.so – assumed in the README.)

Putting expectations: writing about pre-production materials…

Let’s put expectations straight first. Whoever claims all downloads from labs.mysql.com are of highest, near production quality has probably never worked himself with the materials. Labs is for early feature previews. Quality can be accordingly. I think, this is fair. Any developer pushing work out as early as possible, given the constraints of a big US-based corporate company, deserves a “hurray!”. I can only ask you for providing user feedback as early as possible. Even if a quick test for one function shows the below result. Someone has been brave enough to push out code before the maximum amount of internal QA has been applied.

JSON JSON_VALID() JSLint Validator Comment
"a" Invalid Invalid Invalid. Must be object or array.
[1] Invalid Valid Valid. Array can be top level element.
["a"] Invalid Valid Valid. Follow-up error.
{"a":true} Valid Valid Object with pair.
{"a":false,} Valid Invalid Pair may only be followed by comma, if another pair follows.
{"a":null} Valid Valid Object with pair.
{"a":nuLL} Valid Invalid There is exactly one way to spell null: ‘null’.
{"a":True} Valid Invalid Follow-up error.
{"a":FaLSE} Valid Invalid Follow-up error.
{"a":12345678901234567890} Valid Valid Any number of digits allowed.
{"a":-0.12} Valid Valid Negative number.
{"a":-0.1234E+39} Valid Valid US national finance…
{"a":1.23e0001} Valid Valid
[1,{"a":"bc"}] Invalid Valid Follow-up error.
{"1":"d","2":{"a":"bc"}} Valid Valid Follow-up error.
{"a":"\z"} Valid Invalid Requires special handling.
{"a":"\u01"} Valid Invalid Must be four hexadecimal digits
{"a":[1,]} Invalid Invalid Unlike with objects, the wrong syntax is detected.

If you glance over the results, you may come to the conclusion:

  • This is pre-production with glitches
  • This is pre-production with smelly array handling

Putting expectations: TEXT/BLOB handling

As hinted in the introduction, MySQL’s support for TEXT/BLOB columns is not the best possible, but those are exactly the SQL column types one would use for storing JSON inside MySQL. Whoever says MySQL in 2013 likely speaks InnoDB when it comes to storage. InnoDB, for example, stores only a 768 bytes prefix of a variable-length column in the index. The rest goes off-page causing extra I/O. Please note, I/O could be pretty cheap once the working set has been loaded into memory.

If your JSON document requires more than 768 bytes, storage is not optimal. 768 bytes do not mean 768 characters. Possibly, you will be using CHARSET=utf8, means a character takes 1 to 4 bytes to store. However, you may also go for latin1 and have all unicode encoded upfront as \xxxx. In the worst case, 768 bytes means less than 200 characters before off-page storage happens.

Indexing is limited to the prefix. There is no function based index in MySQL to speed up searches on individual JSON document keys.

The PBXT storage engine from MySQL 5.1 times had pretty fancy BLOB streaming capabilities. If you wanted to do something like MapReduce in MySQL, you would probably love to see streaming supported to avoid materialization of intermediate results and to keep processing buffers small. Another use case for streaming, which is unrelated to JSON, are media files (video, audio).

All this does not mean MySQL becomes useless! Performance could still be pretty solid… – give it a try. I rarely show any figures because I have only one respectively two core notebooks available for testing.

Introduction to the MySQL JSON functions

The JSON UDF download contains SQL functions to search JSON documents and functions to modify JSON document. If a function wants you to define on which subset of the entire document it shall work, it is listed under “by key” in the below table. Otherwise, it is listed as driven “by value” (not very precise, however, I had no better idea).

Search
by value JSON_SEARCH()
by key JSON_CONTAINS_KEY(), JSON_EXTRACT()
Modify
by value JSON_MERGE()
by key JSON_APPEND(), JSON_REMOVE(), JSON_REPLACE(), JSON_SET()

A function that works “by key” on a subset of a JSON document, usually has the prototype: JSON_SOMETHING(string json_document, string key [, string key...], other_parameter). The first argument passed to the functions is always the JSON document to work on. It follows a variable number of arguments that describe on which subpart of the entire document the function shall work. Having this string key [, string key...] arguments in the middle of the function signature is a bit odd. Some programming languages may forbid this for style considerations. However, depending on the function, further arguments may follow, such as the value to search for or to add.

The key syntax is the key to understanding…

Let’s use JSON_CONTAINS_KEY(string document, string key[, string key...]) to explore the syntax at the example of a most basic JSON document. The JSON document consists of an object with one member called “key”.


{
   "key":"value"
}

To check whether the document has a member called “key” you call JSON_CONTAINS_KEY('{"key":"value"}', "key"). Easy! Albeit: not beautiful.

The deeper your JSON document is nested, the longer the variable length string key [, string key...] part in your function call gets. To get down to the nested object from the document below you write:


{
   "key":"value",
   "other_key_level_1":{
      "another_key_level_2":1
   }
}

mysql> select json_contains_key(
  '{"key":"value","other_key_level_1":{"another_key_level_2":1}}', 
    -> "other_key_level_1", "another_key_level_2")\G
*************************** 1. row ***************************
json_contains_key(
'{"key":"value","other_key_level_1":{"another_key_level_2":1}}',
"other_key_level_1", "another_key_level_2"): 1

From my short experience with the functions, nesting of arbitrary depth is supported. Means, the limit should be in the tens of tousands or the available amount of memory.

Keys and arrays

Keys and arrays work as expected. Array keys are zero-based. The first element in an array is accessed through the key “0”, the second element through key “1” and so forth. Note, however, that you have to use strings.


mysql> select json_contains_key('{"key":[1]}', "key", "0")\G
*************************** 1. row ***************************
json_contains_key('{"key":[1]}', "key", "0"): 1

No secret key to candies…

It may sound a bit though but this is a case where you see the difference between a MeeTooSQL and a system that has had the time and resources to add syntactic sugar. With a bit of syntactic sugar, say “SELECT document.member[offset]” this could look much more appealing. On the contrary, I am talking syntaxtic sugar only! Syntactic sugar is really hard to add with todays MySQL. The MySQL parser is not modular, and at the hearth of the entire system, which forbids icing on the cake in the making. Not to speak of ignoring the SQL standard and the option to support such a notation in UDFs at the cost of parsing string arguments (over processing a **va_args list). Still, as a developer, … See also,
Searching document stores in 2013: from 1983 to SQL:2003 in a blink?

Searching for values

Searching JSON documents is most limited to exact match. There is no support for wildcards. Neither are regular expressions supported not fuzzy matching as with SQL LIKE. I am not aware of a straight forward way to do a case insensitive search. Converting the entire JSON document to upper or lower case and comparing with a correspondingly modified search value is no option as the conversion would also affect JSON object member names (your “keys”).

Here’s what the README, the only documentation available apart from the *.c[omment]/*.h[elp] files, says about JSON_SEARCH(): Searches for specified value in the document. Returns key path of the element which contains the value in reverse order or NULL if parsing failed.. I assume that there is no way to search for “keys” with a specific name.

Whatever, here’s my attempt to find “value” in the most basic document…


{
   "key":"value"
}

mysql> select json_search('{"key":"value"}', "value" )\G
*************************** 1. row ***************************
json_search('{"key":"value"}', "value" ): NULL

I don’t quite buy that there is no “value” in my writing. Maybe I got the syntax wrong?


mysql> select json_search('{"a":{"b":"c"}}', "c" )\G
*************************** 1. row ***************************
json_search('{"a":{"b":"c"}}', "c" ): NULL

Unfortunately, the result does not look any better. I must be missing something fundamental not only documentation, because:

Search
Document JSON_SEARCH(what) Comment
{"key":"value"} value Not found
{"a":{"b":"c"}} c Not found
{"a":"b", "c": "d"} d Not found
{"a":"b","c":["2", "3"]} 2 Not found
{"a":[1]} 1 Found
Features
Exact string match Supported
Regular expression Unsupported
Fuzzy wildcard Unsupported
User defined comparsion expression, e.g. for case-insensitive search Unsupported

Fast forward to a successful search. Upon success, the function returns a […] key path of the element which contains the value in reverse order […].

mysql> select json_search('{"key":[1]}', "1" )\G
*************************** 1. row ***************************
json_search('{"key":[1]}', "1" ): 0:key::


Note the key path returned: 0:key::. The interesting bit here is the double double dot ::. JSON_CONTAINS_KEY unveils that no magic key “::” exists. But, if it does not exist, why does JSON_SEARCH report it? There is no single root element in JSON documents, because JSON documents do not form trees, and there seems to be no need for a prefix.

mysql> select json_contains_key('{"a":[1]}', "::" )\G
*************************** 1. row ***************************
json_contains_key('{"a":[1]}', "::" ): 0

mysql> select json_contains_key('{"a":[1]}', ":" )\G
*************************** 1. row ***************************
json_contains_key('{"a":[1]}', ":" ): 0

Modifying data

It may be entertaining to point out bugs and quirks of a pre-production version, but what really matters is the feature set and direction. Search works, albeit limited to most basic exact string match. What about modifying data? Again, there are functions that work on parts of a document and one that does not require a key path.

JSON_MERGE:

Merges two or more documents into one. Returns first document with following documents appended.

But what exactly is a document? A document is anything that JSON_VALID considers valid. As argued above, JSON_VALID should consider both an object and an array as a valid document. That is {"key":"value"} respectively [1].

mysql> select json_merge('{"a":"b"}', '{"c":"d"}' )\G
*************************** 1. row ***************************
json_merge('{"a":"b"}', '{"c":"d"}' ): {"a":"b", "c":"d"}


Simple cases work flawless. But then, there are the more tricky ones such as merging and object with an array. This may be actually undefined and there may be no solution: under which member name should the array appear? Or, what if two objects are merged that have members of the same name at the same level (see table below).

The README continues stating about JSON_MERGE: If one of following documents does not contain an opening curly bracket first documents merged are returned and warning is generated. Reasonable, but…

mysql> select json_merge('{"a":"b"}', '"c":"d"}' )\G
*************************** 1. row ***************************
json_merge('{"a":"b"}', '"c":"d"}' ): {"a":"b"}
1 row in set (0,00 sec)

mysql> show warnings;
Empty set (0,00 sec)


And the last sentence from the README is: [returns] NULL if first document does not contain an opening curly bracket. Which tells me that “document” is a quite fuzzy description for the functions parameters. Assuming the function would work on documents (objects or arrays), a missing opening curly bracket (applies to object only) would be just one of many possible ways of passing invalid “documents” as input values. Whatever, test and fail:

mysql> select json_merge('"a":"b"}', '{"c":"d"}' )\G
*************************** 1. row ***************************
json_merge('"a":"b"}', '{"c":"d"}' ): "a":"b", "c":"d"}

Merging: JSON_MERGE
JSON 1 JSON 2 Result Comment
{"a":"b"} {"c":"d"} {"a":"b","c":"d"} OK
{"a":"b"} {"c":{"d":"e"}} {"a":"b","c":{"d":"e"}} OK
{"a":"b"} {} {"a":"b", } Input: two valid objects, Result: invalid object
{"a":"b"} {"a":"c"} {"a":"b", "a":"c" } OK, but for serialization into native JavaScript variable questionable
{"a":"b"} {"a":"b"} {"a":"b", "a":"b" } OK, but again seems a bit questionable considering serialization into any native programming language object
{"a":"b"} [1] {"a":"b"} Somewhat undefinable result, error/warning missing
[1,2] [3,4,5] [1,2] Questionable, error/warning missing
{"a":"b"} "c":"d"} {"a":"b"} OK, but error/warning missing
{"a":"b"} {true: "c"} {"a":"b", true:"c"} Wrong, returns invalid JSON, error/warning missing
"a":"b"} {"c":"d"} "a":"b","c":"d"} Wrong, README says NULL shall be returned

The remainder

I’ll iterate over the remaining functions only very briefly. The story pretty much continues as it has begun.

It seems that JSON_APPEND can be used inject new data into an existing document: Inserts new element into JSON document. Returns document with appended element or NULL if parsing failed.. The syntax documented is json_append(doc, keypart1, keypart2, ..., new_element). When I first time read this, I looked pretty puzzled when I tried to append an element to the end of an array:

mysql> select json_append('{"a":[1]}', "a", "2" )\G
*************************** 1. row ***************************
json_append('{"a":[1]}', "a", "2" ): {"a":[1]}


The trick is that here new_value should read new_key, new_value. The correct syntax seems to be:

mysql> select json_append('{"a":[1]}', "a", "99", "3" )\G
*************************** 1. row ***************************
json_append('{"a":[1]}', "a", "99", "3" ): {"a":[1, 3]}


Note, that I used offset “99”. Any offset greater or equal to “1” worked for me. Next try: appending pairs to a nested object:

mysql> select json_append('{"a":{"b":"c"}}', "a", "d", "e" )\G
*************************** 1. row ***************************
json_append('{"a":{"b":"c"}}', "a", "d", "e" ): {"a":{"b":"c", "d": e}}


Have you noticed? Invalid JSON returned…

Modifying: JSON_APPEND
JSON key part new_key new_value Result Comment
{"a":"b"} "a" "c" "d" {“a”:”b”} Nothing inserted, no error/warning
{"a":"b"} "a" "c" {“a”:”b”} Nothing inserted, no error/warning, unclear syntax description. Same with arrays.
{"a":[1]} "a" 99 2 {“a”:[1, 2]} OK, nice offset handling
"a":{"b":"c"}} "a" "d" "e" {“a”:{“b”:”c”, “d”: e}} Invalid JSON returned

All in all, it smells as if it was too early to blog about it. Instead of demoing how to use JSON with MySQL, I ended up in a life debug session for PlanetMySQL. It is not really hard to do the testing. JSON has a pretty simple gramma. Go and play with the productions of the gramma, add a bit of “first user, no docs”, and its easy to find yet another “gotcha”. Invalid JSON returned:

mysql> select json_replace('{"a":[1]}', "a", "b")\G
*************************** 1. row ***************************
json_replace('{"a":[1]}', "a", "b"): {"a":b

Taken from the README. For most functions manipulating or generating JSON documents is true: Warning! This version does not check whole document for validity. Hey, it is a labs.mysql.com pre-production release :-).

  • json_append(doc, keypart1, keypart2, ..., new_element)
    Inserts new element into JSON document. Returns document with appended element or NULL if parsing failed.

    Note: json_append(doc, keypart1, keypart2, ..., new_pair_key, new_pair_value) might be a better description, see above.
  • json_contains_key(doc, keypart1, keypart2, ...)
    Checks if documents contains specified key. Returns TRUE if key exists, FALSE if not exists, NULL if parsing failed.
  • json_extract(doc, keypart1, keypart2, ...)
    Extracts value of the specified key. Returns value of the key specified, NULL if parsing failed.
  • json_merge(doc1, doc2, ...)
    Merges two or more documents into one. Returns first document with following documents appended. If one of following documents does not contain an opening curly bracket first documents merged are returned and warning is generated NULL if first document does not contain an opening curly bracket.
  • json_remove(doc, keypart1, keypart2, ...)
    Removes element specified by the key. Returns document without the element or NULL if parsing failed.
  • json_replace(doc, keypart1, keypart2, ..., new_value)
    Updates value of specified key. Returns document with replaced key or original document if no such an element found, NULL if parsing failed.
  • json_search(doc, value)
    Searches for specified value in the document. Returns key path of the element which contains the value in reverse order or NULL if parsing failed.
  • json_set(doc, keypart1, keypart2, ..., new_value)
    Performs kind of INSERT ... ON DUPLICATE KEY UPDATE operation.
    Returns document with updated or inserted element or NULL if parsing failed.
  • json_test_parser(doc)
    Returns text representation of parse tree of the JSON document, empty string if document is invalid. This function is supposed to use for tests only and should not be used in production.
  • json_valid(doc)
    Checks if doc is valid JSON document. Returns TRUE if document is valid, FALSE is document is invalid.

My initial take

The new set of SQL functions to work on JSON is a most welcome addition. BUT there are issues with the very first pre-production version published on MySQL Labs.

The list of functions available covers only most basic search and modification. Check the PHP manual array functions to get an idea of what developers can envision to do with hashes and have readily available as a built-in function! For example, there is no way to recursively search a document/hash using a user-supplied comparison function. This, however, does not bother me much: once there is a place for dumping new functions, it is only a matter of demand until someone adds them.

TEXT/BLOB columns holding JSON documents remain second class citizen to MySQL. Think off page storage, think indexing. Given MySQLs’ current pace, are you pessimistic about the future…?

Happy hacking!

@Ulf_Wendel Follow me on Twitter