Ulf Wendel

Searching data in NotOnlyMySQL databases: two extremes

MySQL and NoSQL go together. In the data center, that’s a known. Inside MySQL, that’s lesser known. MySQL 5.6 is both a puristic key value store and a full-fledged relational database. MySQL could also be developed into a schema free document store supporting MapReduce as a search method. With regards to searching those are the extremes. What search methods should a NotOnlyMySQL offer? The extremes first.

Thanks, MySQL. Losing a key is no nightmare!

Memcached is a high performance key value store for volatile data. Memcached is part of a classic scaling story: first, we buy a bigger machine for MySQL, then we try replication, then we cache database results in Memcached.

| |
MySQL Protocol Memcache Protocol
| |
Complex queries, critical data
Key lookup, volatile data

InnoDB and NDB Memcache API plugins turn MySQL 5.6 into a key value store. Adding Memcache protocol support to MySQL was a logical step not only because of the popularity of MySQL being used together with Memcache but also because the Memcache protocol is perfect match for exposing MySQL’s internal high performance storage APIs to the user.

| |
MySQL Protocol Memcache Protocol
| |
SQL Key lookup

Data can be inserted into MySQL using both MySQL Protocol or Memcache protocol. Data can be searched using either SQL or doing a key lookup. There are two ways to write and read the same data.

Extreme one: darling, I lost the keys for the data treasure!

SQL is a very powerful query language. Key lookup is the very opposite. Key lookup is not powerful enough for the majority of applications. Assume you configured MySQL 5.6 to make a table with employees and their hobbies accessible through the new Memcache interface. How do you get a list of all employees?

Key Value
Ulf_Wendel keep beloved three-wheeler running, photography, sewing, cycling
Johannes_Schlueter cycling, hacking, family
Andrey_Hristov renovate appartements, economics, winter sports

Using SQL the answer is straight forward: SELECT name_key FROM employees_kv_list.

The query cannot be answered using Memcache protocol. This is quite weird: you have created a list of employees, it exists but you cannot access your data. No key for you to your data treasure! A get [key] command is not enough to search a key value store. Lucky you, MySQL allows you to query the data using either SQL or the less powerful Memcache protocol commands.

An advanced key value store supporting data types

Redis is another key value NoSQL database. Search or query capabilities of Redis are not significantly better than the ones of Memcache. At least, you can do a "ls" on your key value file system. The KEY [pattern] command can get you the list of employees – yippie!

Next task: find the cyclists among the employees. Nothing as easy as that using SQL: SELECT name_key FROM employees_kv_list WHERE hobbies_values LIKE '%cycling%'.

Armed with nothing but KEY [pattern] a user of a key value store would be forced to read all values and filter them on the client-side. But then, what about consistent read and traffic?

GET Ulf_Wendel
GET Johannes_Schlueter
GET Andrey_Hristov

Redis is more advanced. Redis supports data types. Values can be strings, hashes, lists, sets, sorted sets and more. Sets could be used to reduce the traffic when trying to find the cyclists among the employees. SISMEMBER can test a set for a member.

SADD Ulf_Wendel "keep beloved three-wheeler running" "photography" "sewing" "cycling"
SISMEMBER Ulf_Wendel "cycling"

I am not after blaming key value stores. Key value work fantastic for a certain, narrow set of applications once you mastered the data model. Their simplicity allows them to offer O(1) time complexity – for key lookups. For nothing else. Ad-hoc queries are no strengh of basic key value stores.

Extreme two: MapReduce – you do the search!

Document oriented key value stores often feature MapReduce. MapReduce offers as powerful search possibilities as SQL.

For MapReduce you develop filter functions that can be run by the data store. Unlike with the certainly extreme examples from above, values are not transferred from the server to the client for client-side filtering. Traffic is lower and thus execution speed is potentially better.

It is powerful. It could be done with MySQL, that’s for sure. But, do you really want to write a script for every query?

Search/query capabilities
poor … -> … powerful
Key lookup MapReduce

What is in the middle?

MySQL speaks key value. MySQL speaks SQL. MySQL could support MapReduce. We have the extremes covered. What is in the middle that may cover the magic 80% of the use cases? To be continued the other day.

Happy hacking!

@Ulf_Wendel Follow me on Twitter