Ulf Wendel

MySQL Fabric’s initial decision on data models and transactions put into perspective

2013/10/17 - by admin - 2 Comments

When you design a distributed database you are faced with a variety of choices, each having its well-known pros and cons. MySQL Fabric made an initial decision for one design. Let me try to put it in the context of NoSQL approaches, to the degree I understand the matter. Understanding the data model is key to understanding Fabric. There are limits: by design.

Fabric?

MySQL Fabric is an administration tool to build large “farms” of MySQL servers. In its most basic form, a farm is a collection of MySQL Replication clusters. In its most advanced form, a farm is a collection of MySQL Replication clusters with sharding on top.

There are plenty of presentations on slideshare and blog postings on PlanetMySQL to get you started. If you don’t know Fabric yet, please, consult them first. I’ll not present a practical view on Fabric but theoretical ramblings on the data model.

Recap: desire vs. reality

Scaling data stores beyond a single machine is a challenge. The hope is that the more servers work together in a distributed database, the higher the throughput. Distributed databases shall always be available, be scalable and offer distribution transparency.

From an application developer’s point of view, distribution transparency is probably the most important one. Developers would love to use synchronous (eager) clusters: all servers shall show all updates immediately. Developers would love to be able to send all queries to any of the servers around without bothering about read or write (update anywhere). Developers would love to see a system where every server stores all the data needed to answer any question (full replication).

Sorry, Fabric will disappoint. Why? … be warned, I take an even more theoretical route to argue than usual. And I usually already get to hear that I need to take a more practical view…

Replication has hard scalability limits. If all servers shall be able to answer all read questions, then all your data must be on every server of the cluster. If – sooner (eager/synchronous) or later (lazy/asynchronous) – all updates shall appear on all servers, then the updates must be applied to all servers. Classically, such clusters follow a ROWA approach: Read One, Write All. A ROWA system can run all reads locally on a single server without interaction with other systems, whereas all write operations require coordination with all the other systems.

(Tilt?! Work through slides 25 and following. NoSQL systems sometimes offer tuning options but no ultimate solution. Not tilt?! Enjoy the precise differentiation between replica and node that Mats applied to the Fabric manual versus my writing.)

Simplified ROWA scalability approximation

Every server (replica or node, not making a differentiation here for simplicity) in a ROWA cluster has to handle:

  • L = its own local work (reads and writes) – productive work
  • R = remote work (because all writes affect all servers) – unproductive work

The processing capacity of every server i in a ROWA cluster is Ci = Li + Ri. The server has to handle productive local work and unproductive remote work. First, simple lesson: a server i of a ROWA cluster will always be “slower” than a standalone server. The standalone server has no extra unproductive work to do.

We can put the servers of a ROWA cluster in relation to a standalone server of capacity C to get an estimate of the scaleout capabilities of the system: scaleout = (∑i=1..n Li) / C. Looks worse than it is: it only sums up the productive local work of all servers in the cluster.

Next, we need an estimate for the remote work. That’s the write load that affects all the servers in the cluster. There are two ways to apply the writes on remote servers:

  • symmetric update processing = all local writes are replayed at the remote servers
  • asymmetric update processing = remote servers apply writesets (the actual changes) instead of the operations

Reminds you remotely of MySQL Replication formats: statement-based vs. row-based vs. mixed? Yes, that’s roughly the real-life counterpart to the theoretical model.

For symmetric update processing, we can assume: Ri = w * (n – 1) * Li. Skipping a few steps and cheating by using a nice book, we end up with: scaleout = n / (1 + w * (n – 1)). For asymmetric processing we have to guess how efficient it is compared to symmetric processing. If, for example, we assume applying writesets takes 1/4 of fully executing the write operations, then wo = 0.25. One ends up with: scaleout = n / (1 + w * wo * (n – 1)).
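Plugging numbers into these formulas is straightforward. A small sketch (Python used purely for illustration; the function names are mine, not from any library):

```python
def scaleout_symmetric(n, w):
    """ROWA scaleout with symmetric update processing:
    n servers, write ratio w (0.0 ... 1.0)."""
    return n / (1 + w * (n - 1))

def scaleout_asymmetric(n, w, wo):
    """Asymmetric update processing: applying a writeset costs
    only the fraction wo of fully executing the writes."""
    return n / (1 + w * wo * (n - 1))

# 10% writes: six servers do the work of four standalone ones ...
print(scaleout_symmetric(6, 0.1))    # 4.0
# ... and doubling to twelve servers does not even reach six
print(scaleout_symmetric(12, 0.1))   # ~5.71
```

The same inputs with wo = 0.25 give roughly 5.3 instead of 4.0 for six servers – better, but still far from linear.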

(You did not believe I would understand this, did you? That’s Mats’s or the MySQL Cluster folks’ skill level, not mine! I need books to cheat.)

The hard earned, easy to grasp graphics

Armed with those formulas you can make a rough estimation of replication scaleout limits. Below is the plot for scaleout = n / (1 + w * (n – 1)). That’s ROWA, symmetric update, write ratios from w=0.0 (0%) to w=1.0 (100%) and 1 to 16 servers. Read loads scale perfectly. With only 10% write load, you reach a scaleout factor of 4 with some six machines: it requires six machines to handle the load four standalone servers could handle. If you double the number of servers to 12, you are not even reaching the processing capacity of six standalone servers.

Scaleout, symmetric update, ROWA

I was too lazy to plot a graph for asymmetric update processing. However, it only shifts the picture a bit, as you can already tell from the formula: scaleout = n / (1 + w * wo * (n – 1)).

Before you ask: the above hints at the theoretical scaleout potential of Galera/Percona Cluster – strong on distribution transparency, not so strong on scaleout.

Partial replication

Sorry, dear developers, nobody can give you the best possible distribution transparency together with the best possible scaleout. However, if users are willing to trade in a bit of the distribution transparency, scaleout capabilities become dramatically better!

The solution is to allow that not every server must have all the data. If a data item has copies on only r <= n servers (nodes), then the remote work does not come from n – 1 but only from r – 1 servers. The formula for partial replication and asymmetric update processing becomes: scaleout = n / (1 + w * wo * (r – 1)). Here’s the resulting graph for w = 0.2 (20% writes) and wo = 0.25 (asymmetric update four times more efficient than symmetric update). Stunning, ain’t it?
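The effect is easy to verify with the same kind of sketch as above (Python only for illustration):

```python
def scaleout_partial(n, w, wo, r):
    """Partial replication with asymmetric updates: each data item
    is stored on only r <= n servers, so the unproductive remote
    work comes from r - 1 servers instead of n - 1."""
    return n / (1 + w * wo * (r - 1))

# w = 0.2 (20% writes), wo = 0.25, r = 3 copies per data item:
# sixteen servers deliver almost fifteen servers' worth of work.
print(scaleout_partial(16, 0.2, 0.25, 3))  # ~14.5
```

With full replication (r = n = 16) the same write load would yield a scaleout of only about 9, so limiting the number of copies is what buys the near-linear curve.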

Scaleout, partial replication, asymmetric update

Welcome to MySQL Fabric (limit: thousands of nodes) and sharding! Or, MySQL Cluster at a smaller scale (limit: tens of nodes but millions of transactions per second). MySQL Cluster also uses sharding. Horizontal partitioning (sharding) is one approach to partial replication. There are two major reasons for using sharding:

  1. a single server cannot handle the amount of data
  2. a single server cannot handle the write load

The price you, as a user, have to pay is on distribution transparency:

  1. you may have to choose a specific server using some rules
  2. some questions cannot be answered by a single server; you have to collect answers from many

The first one is cheap with MySQL Fabric: provide a shard key, compute which server to use, send your queries there. The second one is expensive. Very expensive, depending on your needs. Again, depending on your needs, this (2) is where NoSQL systems may beat Fabric big time. Stay tuned, I am not talking MapReduce here! Data model, queries and scaleout capability go hand in hand.
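The first, cheap step can be sketched in a few lines. Everything below is invented for illustration – the catalog layout and server names are not Fabric’s actual API, just the idea of range-based shard lookup:

```python
# Hypothetical shard catalog for a RANGE-sharded table: shard key
# ranges mapped to servers (ranges and host names are made up).
CATALOG = [
    (0, 1000, "shard1.example.com"),
    (1001, 2000, "shard2.example.com"),
    (2001, 3000, "shard3.example.com"),
]

def server_for(shard_key):
    """Compute the one server that holds the partition for shard_key."""
    for low, high, server in CATALOG:
        if low <= shard_key <= high:
            return server
    raise KeyError("no shard covers key %r" % shard_key)

# Provide a shard key, compute the server, send your queries there:
print(server_for(1500))  # shard2.example.com
```

A real driver caches this catalog and refreshes it from Fabric; the lookup itself stays this cheap.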

Quick recap on MySQL Fabric

As said, if you are new to MySQL Fabric, please read some introduction first. For example, MySQL 5.7 Fabric: Introduction to High Availability and Sharding. Below is a quick recap on the data model and the replication model employed by Fabric. As you can see, it is optimized to scale writes on sharded tables. Reads are only fast if they cover no more than one partition of a sharded table plus any number of global tables.

MySQL Fabric data model

  Global Group: all unsharded tables g1…gn
  Shard Group for t_partition:1: g1…gn plus t_partition:1
  …
  Shard Group for t_partition:n: g1…gn plus t_partition:n

Replication is based on primary copy for High Availability and read scale-out. Write scale-out is added by using horizontal partitioning. Clients query a distributed (future) middleware (Fabric) for catalog information. Clients pick servers depending on user/application provided shard keys and catalog information.

MySQL Fabric replication details

  Global Group: a Primary Copy cluster
  Shard Group 1 … Shard Group n: a Primary Copy cluster each, holding a copy of the global group and the primary of t_partition:1 … t_partition:n respectively

Fabric’s weak spot

In the worst case a Fabric user asks a question that covers all partitions of a sharded table. No single MySQL server in the farm can answer the question; no single server stores all the data. Worse, there is no single server in the farm that can understand the intention behind your question and warn you that it can’t give you a meaningful reply! There’s no way for a server to know whether you intended to query all shards or only the partition on one shard.

SELECT shard_key, other_column FROM t1 ORDER BY other_column
|   |   |
Shard 1   Shard 2   Shard 3
t1(0..1000)   t1(1001..2000)   t1(2001..3000)
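What a client or middleware has to do to answer such a query can be sketched as a scatter-gather merge. This illustrates the problem, not any Fabric or Connector code; the per-shard result sets are invented:

```python
import heapq

# Invented per-shard results for:
#   SELECT shard_key, other_column FROM t1 ORDER BY other_column
# Each shard can sort its own partition (ORDER BY pushed down) ...
shard1 = [(10, "a"), (500, "m")]      # t1(0..1000)
shard2 = [(1500, "b"), (1800, "z")]   # t1(1001..2000)
shard3 = [(2500, "c")]                # t1(2001..3000)

# ... but the client must materialize and merge the sorted streams
# itself to produce the global ORDER BY:
merged = list(heapq.merge(shard1, shard2, shard3, key=lambda row: row[1]))
print([row[1] for row in merged])  # ['a', 'b', 'c', 'm', 'z']
```

The merge itself is cheap; the pain is that every driver has to reimplement it, and aggregate or join clauses are far harder than a plain ORDER BY.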

Database theory knows about the challenge and there are several solutions to the problem. The below list is roughly ordered by distribution transparency. Highest distribution transparency (“user-friendliness”) first, lowest last:

From a user’s perspective it would be best if there was a server that could answer all questions because it would have all data from all partial replication servers in one logical data model. It would then be the server’s task to translate queries that span multiple partial replication servers (here: shards) into appropriate access patterns. This is more or less what MySQL Cluster does behind the scenes (docs, presentation). When using MySQL Cluster, sharding is 100% transparent on the SQL level. The big advantage of transparency is also its biggest drawback: users and sales forget all too easily that a lot of networking is going on that will slow things down.

Full replication query server using remote database links
(no local data materialization)
Logical t0..3000
CREATE TABLE t1all(...) ENGINE=SHARDING PARTITION_CONNECT=user:password@Shard1 PARTITION_CONNECT=user:password@Shard2 PARTITION_CONNECT=user:password@Shard3

(Using UNION something similar might be possible with ENGINE=FEDERATED.)
| | |
Shard1 Shard2 Shard3
t1(0..1000) t1(1001..2000) t1(2001..3000)

If done properly, this is a very nice approach. The server takes care of distributed transactions (MySQL Cluster, Spider: pessimistic, 2PC), the server takes care of all SQL and tries to use “push down conditions” to minimize the amount of data sent around. There are no major negatives – except, maybe, that MySQL lacks it (= time to develop) and the product differentiation with MySQL Cluster.

Please note, this is not the best approach if the majority of your queries access one partition only. For example, Spider adds latency to queries that access individual partitions only. But that’s not the question here. Anyway, if I were a power user considering the use of MySQL Fabric, I’d evaluate this option very thoroughly. Maybe I would even extend Fabric to set up and manage such full replication servers and teach my clients to use them.

Fictional hybrid replication approach on top of Fabric
Global Group Full Replication Server
Queries on unsharded tables Queries spanning multiple partitions
Primary One or more servers
Copy1 Copyn
Queries on global tables and one partition per shard
Shard1 Shardn
Primary
Copy1 Copyn

Another hybrid solution that works out of the box would be using MySQL 5.7 multi-source replication to merge the shards on one server. Although this is easy to set up, it has one obvious killer disadvantage: all data is physically materialized on one server.

Full replication query server
(using local data materialization)
t1(0..1000) t1(1001..2000) t1(2001..3000)
Shard1 Shard2 Shard3
| | |
Multi-Source Replication
Full replication query server
t0..3000

As said above, sharding is applied for two reasons: either the size of an entity or write scalability limits. If sharding is applied because the volume of data exceeds what a single server can handle, how could one build one server that handles all the data… If sharding is applied mostly for write scalability reasons, this is a low-hanging fruit to solve the distributed query problem. Ignoring all questions about the lazy (asynchronous) nature of MySQL Replication and stale data, this is a very sexy approach: no expensive distributed transactions! It even fits the marketing bill: using a slave for OLAP is no new story.

SELECT shard_key, other_column FROM t1 ORDER BY other_column
Middleware or client library handling cross-partition queries
(local data materialization, not SQL feature complete)
|   |   |
Shard 1   Shard 2   Shard 3
t1(0..1000)   t1(1001..2000)   t1(2001..3000)

Finally, one could try to use a middleware or client library for gathering data from multiple partitions. This seems to be the road that Fabric plans to take. This approach scales by client/middleware, but there is no code one could reuse, for example, to process the ORDER BY clause in the example. Furthermore, the problem of materializing results will remain. The problems have to be solved for each and every driver again (read: different programming languages, no way to use one library for all). Most likely, none of the drivers will ever become as intelligent as a server’s optimizer. I have no clue what the strategic goal is here.

Distributed transaction control is likely to be pessimistic, including distributed locks and a potential for deadlocks (a 2PC/XA classic, see slide 30 and following for details).

Are NoSQL data models any better?

There are many reasons why NoSQL solutions became popular. Considering the aspect of cloud and web-scale only, their biggest achievement might be reminding us of, or even inventing, data models that scale virtually indefinitely. Of course, partial replication is used… divide and conquer rules.

The first wave of major NoSQL systems was designed around key-value data models and sharding. The leading systems are BigTable, Dynamo and PNUTS with many open source projects that followed their ideas: HBase, Cassandra, Voldemort, Riak, [Big]CouchDB, MongoDB and many more.

Key-value table variations

  Name | Value structure | Notes
  Blob data model | BLOB | Value is uninterpreted binary
  Relational data model | Fixed set of columns | MySQL/SQL-92: flat (scalar) values only (related: Fabric)
  Column family data model | Multiple sets of columns | Flat values, wide columns no problem (related: MariaDB Dynamic Columns)
  Document data model | No column set restrictions | Values can be nested (related: N1NF/NF2), wide and sparse columns no problem

Key-value systems restrict users to atomic key-based access. Early systems offered no guarantees when a query spanned multiple values. Take a second and compare with Fabric. How weak is Fabric’s weak spot?

What makes the query limitations in key-value systems a little less restricting from an application developer’s point of view is the logical entity a value can hold. Particularly a nested value, as in the document model, can hold multiple logical rows from several flat relational tables. Translated into Fabric speak: document = shard + relevant global tables + x. All MySQL has to offer here are some limited JSON functions. On an aside: this is only one of many ways to argue why MySQL should have strong JSON/BSON support.
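A tiny, invented example of what such a nested value could hold – one document standing in for rows that Fabric would spread over a sharded table, a global table and a child table (all names made up for illustration):

```python
# Hypothetical document combining an order (sharded table), customer
# and country data (global tables) and order items (child table):
order_doc = {
    "order_id": 42,                                  # the shard key
    "customer": {"name": "Jane", "country": "DE"},   # global table data
    "items": [                                       # nested rows (N1NF/NF2)
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}
# One atomic fetch/update per document replaces a multi-table,
# possibly distributed, transaction.
print(order_doc["items"][0]["sku"])  # A-1
```

The price, of course, is denormalization: the customer data is copied into every order document.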

Simplified application’s view on the logical data model (see text)

Fabric:
  Client protocol: binary (MySQL Client Server), Memcache (to some degree)
  (Read) query: SELECT * FROM tn, gn WHERE t.shard_key = <key>
  (Write) query: UPDATE tn SET ... WHERE t.shard_key = <key>, UPDATE gn SET ... (requires use of a distributed transaction to be atomic)
  Logical entity: tn + global tables gn

Document store:
  Client protocol: binary (vendor specific), Memcache (sometimes), HTTP (sometimes)
  (Read) query: fetch(<key>)
  (Write) query: update(<key>, <doc>) (single atomic operation)
  Logical entity: tn + global tables gn + any other table un respectively column

If you are not willing to give up a millimeter on ACID and other RDBMS strengths, stop reading. This is simply not for you. Listing RDBMS strengths for comparison is out of scope for this (already too long) blog post.

Second generation NoSQL/NewSQL: co-location

There is a good number of NoSQL solution users today. In 2012 some market researchers predicted up to 25% market loss for MySQL within five years. I doubt those researchers included the second generation of NoSQL stores from around 2010 in their results; rather, they based their prediction on the then popular open source siblings of the 2007 systems. Thus, the 25% is about the simplistic key-value model.

The second generation of NoSQL stores continues to aim at keeping accesses local to a single node. I am speaking of nodes here, as partial replication continues to be a must. Second generation systems include, but are not limited to, ElasTras, H-Store, Megastore, Spanner and – yes – Cloud SQL Server (Microsoft SQL Azure backend). In general, I believe there is some swing back to stricter schemas and declarative SQL-like query languages; however, let’s look at data models only. Let’s consider only data models that are statically defined but none that adapt dynamically to the actual access pattern. To me, a dynamic data model seems out of reach with regards to Fabric.

  • hierarchical: tree schema
  • hierarchical: entity groups
  • [keyed] table group

There are three kinds of (logical) tables in a tree schema: primary tables, secondary tables and global tables. The primary key of the primary table acts as a partitioning key. Secondary tables reference the primary tables using foreign keys. Global tables are additional read-mostly tables available on all nodes.

Hierarchical: tree schema data model

  Primary table p(kp)
    ├─ Secondary table s1(kp, ks1)
    └─ Secondary table s2(kp, ks2)
  Global table g1(kg1)

A partition on a node stores all matching records of the primary (sharded) table and the corresponding records of all secondary tables that reference the primary through a foreign key constraint. The records of the tables that are frequently joined are stored together on one node. Plus, additionally, the global tables. It may be possible to use Fabric in a similar way; however, it would be a very uncomfortable, possibly complex, manual operation. What’s hot about this approach is that the need for distributed transactions for write queries is likely reduced: it is likely that updates span primary and secondary tables on one node only.

Hierarchical: entity group data model

  Root table p(keg)
    ├─ Child table c1(keg, kc1)
    └─ Child table c2(keg, kc2)

The basic idea of entity groups is similar. There are root tables/entities and child tables that reference the root tables by foreign keys. Records that reference each other belong together and form an entity group. There’s no counterpart to global tables in this model. Again, there is no way to express “those tables belong together” with Fabric.

The table groups data model allows users to define sets of tables that shall be co-located on a node. In their simplest form those sets may consist of arbitrary tables. The tables of a table group may or may not relate to each other through foreign keys.

Keyed table group data model

  Row group
    Table t1(kt1, partition_keyc)   Table t2(kt2, partition_keyc)

Partitioning is applied to keyed table groups. All tables in a keyed table group have a column that acts as a partitioning key. Unlike in the two models above, the partitioning key does not have to be the primary key. All rows in a keyed table group that have the same partition key value form a row group. Partitions contain sets of row groups.

The keyed table group model allows a bit more flexibility, as neither foreign keys nor primary keys have to be taken into account when grouping tables and creating partitions.
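Forming row groups from a partition key can be sketched as a simple grouping step (illustrative data and names, not any product’s API):

```python
from itertools import groupby

# Invented rows from two tables of one keyed table group; the last
# column acts as the partitioning key (and is not the primary key).
t1 = [("pk1", "eu"), ("pk2", "us"), ("pk3", "eu")]
t2 = [("pk9", "us"), ("pk8", "eu")]

rows = [("t1",) + r for r in t1] + [("t2",) + r for r in t2]

def partition_key(row):
    return row[2]

# All rows of the group sharing a partition key value form a row
# group; a partition holds a set of row groups and is placed on one
# node as a whole, keeping their joins and updates local.
row_groups = {
    key: list(group)
    for key, group in groupby(sorted(rows, key=partition_key), partition_key)
}
print(sorted(row_groups))  # ['eu', 'us']
```

Queries touching only one row group – here, everything for "eu" – never leave a single node.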

Pfft rambling… what’s the moral?

Having written all this, what’s the moral? Well, if you didn’t know before, you should know now why partial replication is the key to massive scaleout. Adding partial replication to MySQL means opening Pandora’s box: accesses spanning multiple nodes become extremely expensive. Fabric has no answer yet how this is to be addressed. One proposal is to move parts of the task towards the drivers. I’m working for the Connectors team and my take is “ouch, really!?”. There are alternatives: adding full replication servers to the farm is one.

As ever, when you hit a problem it’s good to look around and ask how the competition solves it. NoSQL and NewSQL systems know about the issues of distributed transactions. To some degree they work around them and try to reduce the cases where distributed transactions are required. If anyhow possible, queries shall run on individual nodes only. The document model is one prime example. MySQL could bend itself towards that model with relatively low investments.

Second generation NoSQL stores suggest hierarchical models. Again, Fabric could learn a lesson or two – long term. Again, it’s about avoiding accesses that span multiple nodes.

Speaking of distributed transactions: this would make a great topic for an endless follow-up posting titled “Fabric transactions and data model…”.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

In all fairness…

When I say anything that sounds like “MySQL designed”, I refer to the guys who made MySQL Fabric happen. But there are countless customers who introduced sharding to the MySQL world long before. It is hard to find adequate wording – the end result is what counts! Replicated databases have been an active area of research since the 1980s. What’s rather new are the data models that NoSQL systems use to attack the challenges of distributed transactions. What’s also rather new are the stunning WAN/VPN and routing options really big players like Google have.

MySQL 5.7: SQL functions for JSON

2013/10/09 - by admin - 9 Comments

For decades MySQL has considered BLOB a sink for data junk that didn’t fit the strictly typed SQL bill. As the outside world began voting for JavaScript, JSON and key-value/key-document stores, TEXT/BLOB was rediscovered as the only way to store JSON inside MySQL. But having no SQL support for the JSON serialization format, JSON remained junk for MySQL. Community-developed SQL functions could not cure the issue. Now, however, MySQL 5.7 introduces SQL functions to work on JSON documents! MySQL is slowly taking a different position on JSON.

From the labyrinth: labs.mysql.com

In the recent past, MySQL has developed a talent for hiding feature previews at http://labs.mysql.com/. This is where you find MySQL Fabric, and this is where you find the JSON UDFs (user defined functions, pluggable SQL functions). Please keep in mind: all labyrinth treasures are of pre-production quality.

Visit Labs, click to download and install the JSON functions. If there is no binary for you, try a source build. The README guides you through the installation. Basically, it means copying the library that contains the pluggable SQL functions into MySQL’s plugin directory and loading the functions into MySQL.

CREATE FUNCTION json_valid RETURNS integer SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_search RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_extract RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_replace RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_append RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_remove RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_set RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_merge RETURNS string SONAME 'libmy_json_udf.so';
CREATE FUNCTION json_contains_key RETURNS integer SONAME 'libmy_json_udf.so';


(Note: With my source build the resulting .so file got a name – libmy_json_udf.so – different from the one – libmy_json.so – assumed in the README.)

Putting expectations: writing about pre-production materials…

Let’s put expectations straight first. Whoever claims all downloads from labs.mysql.com are of highest, near-production quality has probably never worked with the materials himself. Labs is for early feature previews; quality can be accordingly. I think this is fair. Any developer pushing work out as early as possible, given the constraints of a big US-based corporation, deserves a “hurray!”. I can only ask you to provide user feedback as early as possible – even if a quick test of one function shows the below result. Someone has been brave enough to push out code before the maximum amount of internal QA has been applied.

JSON | JSON_VALID() | JSLint Validator | Comment
"a" | Invalid | Invalid | Invalid. Must be object or array.
[1] | Invalid | Valid | Valid. Array can be top level element.
["a"] | Invalid | Valid | Valid. Follow-up error.
{"a":true} | Valid | Valid | Object with pair.
{"a":false,} | Valid | Invalid | Pair may only be followed by comma if another pair follows.
{"a":null} | Valid | Valid | Object with pair.
{"a":nuLL} | Valid | Invalid | There is exactly one way to spell null: ‘null’.
{"a":True} | Valid | Invalid | Follow-up error.
{"a":FaLSE} | Valid | Invalid | Follow-up error.
{"a":12345678901234567890} | Valid | Valid | Any number of digits allowed.
{"a":-0.12} | Valid | Valid | Negative number.
{"a":-0.1234E+39} | Valid | Valid | US national finance…
{"a":1.23e0001} | Valid | Valid |
[1,{"a":"bc"}] | Invalid | Valid | Follow-up error.
{"1":"d","2":{"a":"bc"}} | Valid | Valid | Follow-up error.
{"a":"\z"} | Valid | Invalid | Requires special handling.
{"a":"\u01"} | Valid | Invalid | Must be four hexadecimal digits.
{"a":[1,]} | Invalid | Invalid | Unlike with objects, the wrong syntax is detected.

If you glance over the results, you may come to the conclusion:

  • This is pre-production with glitches
  • This is pre-production with smelly array handling

Putting expectations: TEXT/BLOB handling

As hinted in the introduction, MySQL’s support for TEXT/BLOB columns is not the best possible, but those are exactly the SQL column types one would use for storing JSON inside MySQL. Whoever says MySQL in 2013 likely speaks InnoDB when it comes to storage. InnoDB, for example, stores only a 768-byte prefix of a variable-length column in-page; the rest goes off-page, causing extra I/O. Please note, I/O could be pretty cheap once the working set has been loaded into memory.

If your JSON document requires more than 768 bytes, storage is not optimal. 768 bytes does not mean 768 characters. Possibly you will be using CHARSET=utf8, which means a character takes multiple bytes to store. However, you may also go for latin1 and have all unicode encoded upfront as \uxxxx escapes. In the worst case, 768 bytes means less than 200 characters before off-page storage happens.
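The arithmetic behind the “less than 200 characters” claim, spelled out:

```python
INNODB_PREFIX_BYTES = 768  # in-page prefix of a long variable-length column

# With up to 4 bytes per character (e.g. utf8mb4), the worst case
# before a JSON document goes off-page is:
worst_case_chars = INNODB_PREFIX_BYTES // 4
print(worst_case_chars)  # 192 characters, i.e. "less than 200"
```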

Indexing is limited to the prefix. There is no function-based index in MySQL to speed up searches on individual JSON document keys.

The PBXT storage engine from MySQL 5.1 times had pretty fancy BLOB streaming capabilities. If you wanted to do something like MapReduce in MySQL, you would probably love to see streaming supported to avoid materialization of intermediate results and to keep processing buffers small. Another use case for streaming, which is unrelated to JSON, are media files (video, audio).

All this does not mean MySQL becomes useless! Performance could still be pretty solid – give it a try. I rarely show any figures because I have only one- respectively two-core notebooks available for testing.

Introduction to the MySQL JSON functions

The JSON UDF download contains SQL functions to search JSON documents and functions to modify JSON documents. If a function wants you to define on which subset of the entire document it shall work, it is listed under “by key” in the below table. Otherwise, it is listed as driven “by value” (not very precise; however, I had no better idea).

Search
  by value: JSON_SEARCH()
  by key: JSON_CONTAINS_KEY(), JSON_EXTRACT()
Modify
  by value: JSON_MERGE()
  by key: JSON_APPEND(), JSON_REMOVE(), JSON_REPLACE(), JSON_SET()

A function that works “by key” on a subset of a JSON document usually has the prototype: JSON_SOMETHING(string json_document, string key [, string key...], other_parameter). The first argument passed to the function is always the JSON document to work on. A variable number of arguments follows that describes on which subpart of the entire document the function shall work. Having this string key [, string key...] argument list in the middle of the function signature is a bit odd; some programming languages may forbid this for style considerations. However, depending on the function, further arguments may follow, such as the value to search for or to add.

The key syntax is the key to understanding…

Let’s use JSON_CONTAINS_KEY(string document, string key[, string key...]) to explore the syntax using the example of a most basic JSON document. The JSON document consists of an object with one member called “key”.


{
   "key":"value"
}

To check whether the document has a member called “key” you call JSON_CONTAINS_KEY('{"key":"value"}', "key"). Easy! Albeit: not beautiful.

The deeper your JSON document is nested, the longer the variable-length string key [, string key...] part of your function call gets. To get down to the nested object of the document below you write:


{
   "key":"value",
   "other_key_level_1":{
      "another_key_level_2":1
   }
}

mysql> select json_contains_key(
  '{"key":"value","other_key_level_1":{"another_key_level_2":1}}', 
    -> "other_key_level_1", "another_key_level_2")\G
*************************** 1. row ***************************
json_contains_key(
'{"key":"value","other_key_level_1":{"another_key_level_2":1}}',
"other_key_level_1", "another_key_level_2"): 1

From my short experience with the functions, nesting of arbitrary depth is supported. That means the limit should be in the tens of thousands, or the available amount of memory.

Keys and arrays

Keys and arrays work as expected. Array keys are zero-based. The first element of an array is accessed through the key “0”, the second element through the key “1” and so forth. Note, however, that you have to pass the offsets as strings.


mysql> select json_contains_key('{"key":[1]}', "key", "0")\G
*************************** 1. row ***************************
json_contains_key('{"key":[1]}', "key", "0"): 1
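The semantics observed so far can be mimicked in a few lines. This is a sketch of the behaviour shown above, written by me for illustration – not the UDF’s actual implementation:

```python
import json

def contains_key(document, *keys):
    """Walk a path of string keys, JSON_CONTAINS_KEY style; array
    elements are addressed by zero-based offsets passed as strings."""
    node = json.loads(document)
    for key in keys:
        if isinstance(node, dict) and key in node:
            node = node[key]
        elif isinstance(node, list) and key.isdigit() and int(key) < len(node):
            node = node[int(key)]
        else:
            return 0
    return 1

print(contains_key('{"key":[1]}', "key", "0"))      # 1
print(contains_key('{"key":{"a":1}}', "key", "a"))  # 1
print(contains_key('{"key":[1]}', "key", "1"))      # 0, offset out of range
```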

No secret key to candies…

It may sound a bit tough, but this is a case where you see the difference between a MeeTooSQL and a system that has had the time and resources to add syntactic sugar. With a bit of syntactic sugar, say “SELECT document.member[offset]”, this could look much more appealing. Mind you, I am talking syntactic sugar only! Syntactic sugar is really hard to add with today’s MySQL. The MySQL parser is not modular and sits at the heart of the entire system, which forbids icing on the cake in the making. Not to speak of ignoring the SQL standard and the option to support such a notation in UDFs at the cost of parsing string arguments (over processing a **va_args list). Still, as a developer, … See also:
Searching document stores in 2013: from 1983 to SQL:2003 in a blink?

Searching for values

Searching JSON documents is mostly limited to exact match. There is no support for wildcards. Neither regular expressions nor fuzzy matching as with SQL LIKE are supported. I am not aware of a straightforward way to do a case-insensitive search. Converting the entire JSON document to upper or lower case and comparing with a correspondingly modified search value is no option, as the conversion would also affect JSON object member names (your “keys”).

Here’s what the README, the only documentation available apart from the *.c[omment]/*.h[elp] files, says about JSON_SEARCH(): “Searches for specified value in the document. Returns key path of the element which contains the value in reverse order or NULL if parsing failed.” I assume that there is no way to search for “keys” with a specific name.

Whatever, here’s my attempt to find “value” in the most basic document…


{
   "key":"value"
}

mysql> select json_search('{"key":"value"}', "value" )\G
*************************** 1. row ***************************
json_search('{"key":"value"}', "value" ): NULL

I don’t quite buy that there is no “value” in my writing. Maybe I got the syntax wrong?


mysql> select json_search('{"a":{"b":"c"}}', "c" )\G
*************************** 1. row ***************************
json_search('{"a":{"b":"c"}}', "c" ): NULL

Unfortunately, the result does not look any better. I must be missing something fundamental, not only documentation, because:

Search
Document                 | JSON_SEARCH(what) | Comment
{"key":"value"}          | value             | Not found
{"a":{"b":"c"}}          | c                 | Not found
{"a":"b", "c": "d"}      | d                 | Not found
{"a":"b","c":["2", "3"]} | 2                 | Not found
{"a":[1]}                | 1                 | Found

Features
Exact string match                                                   | Supported
Regular expression                                                   | Unsupported
Fuzzy wildcard                                                       | Unsupported
User defined comparison expression, e.g. for case-insensitive search | Unsupported

Fast forward to a successful search. Upon success, the function returns a […] key path of the element which contains the value in reverse order […].

mysql> select json_search('{"key":[1]}', "1" )\G
*************************** 1. row ***************************
json_search('{"key":[1]}', "1" ): 0:key::


Note the key path returned: 0:key::. The interesting bit here is the trailing double colon ::. JSON_CONTAINS_KEY unveils that no magic key “::” exists. But if it does not exist, why does JSON_SEARCH report it? A JSON document has no named root element, and there seems to be no need for such a prefix.

mysql> select json_contains_key('{"a":[1]}', "::" )\G
*************************** 1. row ***************************
json_contains_key('{"a":[1]}', "::" ): 0

mysql> select json_contains_key('{"a":[1]}', ":" )\G
*************************** 1. row ***************************
json_contains_key('{"a":[1]}', ":" ): 0

Modifying data

It may be entertaining to point out bugs and quirks of a pre-production version, but what really matters is the feature set and direction. Search works, albeit limited to the most basic exact string match. What about modifying data? Again, there are functions that work on parts of a document, and one that does not require a key path.

JSON_MERGE:

Merges two or more documents into one. Returns first document with following documents appended.

But what exactly is a document? A document is anything that JSON_VALID considers valid. As argued above, JSON_VALID should consider both an object and an array a valid document, that is {"key":"value"} respectively [1].

mysql> select json_merge('{"a":"b"}', '{"c":"d"}' )\G
*************************** 1. row ***************************
json_merge('{"a":"b"}', '{"c":"d"}' ): {"a":"b", "c":"d"}


Simple cases work flawlessly. But then there are the more tricky ones, such as merging an object with an array. This may actually be undefined and there may be no solution: under which member name should the array appear? Or, what if two objects are merged that have members of the same name at the same level (see table below)?
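For comparison, here is how a client-side merge could resolve the duplicate-member case by letting the later document win, the way Python dicts and JavaScript object assignment behave. This is a sketch of one possible semantic, not what JSON_MERGE actually does:

```python
import json

def merge_objects(*docs):
    """Merge JSON objects; on duplicate member names the later
    document wins, so the result is always a valid object."""
    result = {}
    for doc in docs:
        obj = json.loads(doc)
        if not isinstance(obj, dict):
            raise ValueError("can only merge JSON objects")
        result.update(obj)
    return json.dumps(result)

print(merge_objects('{"a":"b"}', '{"a":"c"}'))  # {"a": "c"}
```

“Last writer wins” is only one choice; keeping both members, as JSON_MERGE does, pushes the ambiguity onto whoever deserializes the result.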

The README continues about JSON_MERGE: If one of following documents does not contain an opening curly bracket first documents merged are returned and warning is generated. Reasonable, but…

mysql> select json_merge('{"a":"b"}', '"c":"d"}' )\G
*************************** 1. row ***************************
json_merge('{"a":"b"}', '"c":"d"}' ): {"a":"b"}
1 row in set (0,00 sec)

mysql> show warnings;
Empty set (0,00 sec)


And the last sentence from the README is: [returns] NULL if first document does not contain an opening curly bracket. Which tells me that “document” is a quite fuzzy description of the function’s parameters. Assuming the function worked on documents (objects or arrays), a missing opening curly bracket (which applies to objects only) would be just one of many possible ways of passing invalid “documents” as input values. Whatever, test and fail:

mysql> select json_merge('"a":"b"}', '{"c":"d"}' )\G
*************************** 1. row ***************************
json_merge('"a":"b"}', '{"c":"d"}' ): "a":"b", "c":"d"}

Merging: JSON_MERGE
JSON 1    | JSON 2          | Result                  | Comment
{"a":"b"} | {"c":"d"}       | {"a":"b","c":"d"}       | OK
{"a":"b"} | {"c":{"d":"e"}} | {"a":"b","c":{"d":"e"}} | OK
{"a":"b"} | {}              | {"a":"b", }             | Input: two valid objects, result: invalid object
{"a":"b"} | {"a":"c"}       | {"a":"b", "a":"c" }     | OK, but questionable for serialization into a native JavaScript variable
{"a":"b"} | {"a":"b"}       | {"a":"b", "a":"b" }     | OK, but again questionable considering serialization into any native programming language object
{"a":"b"} | [1]             | {"a":"b"}               | Somewhat undefined result, error/warning missing
[1,2]     | [3,4,5]         | [1,2]                   | Questionable, error/warning missing
{"a":"b"} | "c":"d"}        | {"a":"b"}               | OK, but error/warning missing
{"a":"b"} | {true: "c"}     | {"a":"b", true:"c"}     | Wrong, returns invalid JSON, error/warning missing
"a":"b"}  | {"c":"d"}       | "a":"b","c":"d"}        | Wrong, README says NULL shall be returned

The remainder

I’ll iterate over the remaining functions only very briefly. The story pretty much continues as it has begun.

It seems that JSON_APPEND can be used to inject new data into an existing document: Inserts new element into JSON document. Returns document with appended element or NULL if parsing failed. The syntax documented is json_append(doc, keypart1, keypart2, ..., new_element). When I first read this, I was pretty puzzled as I tried to append an element to the end of an array:

mysql> select json_append('{"a":[1]}', "a", "2" )\G
*************************** 1. row ***************************
json_append('{"a":[1]}', "a", "2" ): {"a":[1]}


The trick is that here new_element should read new_key, new_value. The correct syntax seems to be:

mysql> select json_append('{"a":[1]}', "a", "99", "3" )\G
*************************** 1. row ***************************
json_append('{"a":[1]}', "a", "99", "3" ): {"a":[1, 3]}


Note that I used offset “99”. Any offset greater than or equal to “1” worked for me. Next try: appending pairs to a nested object:

mysql> select json_append('{"a":{"b":"c"}}', "a", "d", "e" )\G
*************************** 1. row ***************************
json_append('{"a":{"b":"c"}}', "a", "d", "e" ): {"a":{"b":"c", "d": e}}


Have you noticed? Invalid JSON returned…
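What the README seems to intend can be modelled in a few lines. A Python sketch of the assumed keypath-append semantics (the signature is my reading of the README, and unlike the UDF it always serializes the new value properly):

```python
import json

def json_append(doc, *args):
    """Append a value at the container addressed by the key path.
    Assumed signature: json_append(doc, keypart1, ..., new_key, new_value).
    For arrays, new_key is the (ignored) offset; for objects it is
    the new member name."""
    parsed = json.loads(doc)
    *keypath, new_key, new_value = args
    node = parsed
    for key in keypath:
        node = node[key]
    if isinstance(node, list):
        node.append(new_value)     # offset argument is ignored for arrays
    elif isinstance(node, dict):
        node[new_key] = new_value  # value is serialized, so output stays valid JSON
    return json.dumps(parsed)

print(json_append('{"a":{"b":"c"}}', "a", "d", "e"))  # {"a": {"b": "c", "d": "e"}}
```

Note how re-serializing through a JSON encoder makes the invalid-output case above impossible by construction.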

Modifying: JSON_APPEND
JSON            | key part | new_key | new_value | Result                  | Comment
{"a":"b"}       | "a"      | "c"     | "d"       | {"a":"b"}               | Nothing inserted, no error/warning
{"a":"b"}       | "a"      | "c"     |           | {"a":"b"}               | Nothing inserted, no error/warning, unclear syntax description. Same with arrays.
{"a":[1]}       | "a"      | 99      | 2         | {"a":[1, 2]}            | OK, nice offset handling
{"a":{"b":"c"}} | "a"      | "d"     | "e"       | {"a":{"b":"c", "d": e}} | Invalid JSON returned

All in all, it smells as if it was too early to blog about this. Instead of demoing how to use JSON with MySQL, I ended up in a live debug session for PlanetMySQL. It is not really hard to do the testing. JSON has a pretty simple grammar. Go and play with the productions of the grammar, add a bit of “first user, no docs”, and it’s easy to find yet another “gotcha”. Invalid JSON returned:

mysql> select json_replace('{"a":[1]}', "a", "b")\G
*************************** 1. row ***************************
json_replace('{"a":[1]}', "a", "b"): {"a":b

Taken from the README. For most functions that manipulate or generate JSON documents, the following holds: Warning! This version does not check whole document for validity. Hey, it is a labs.mysql.com pre-production release :-).

  • json_append(doc, keypart1, keypart2, ..., new_element)
    Inserts new element into JSON document. Returns document with appended element or NULL if parsing failed.

    Note: json_append(doc, keypart1, keypart2, ..., new_pair_key, new_pair_value) might be a better description, see above.
  • json_contains_key(doc, keypart1, keypart2, ...)
    Checks if documents contains specified key. Returns TRUE if key exists, FALSE if not exists, NULL if parsing failed.
  • json_extract(doc, keypart1, keypart2, ...)
    Extracts value of the specified key. Returns value of the key specified, NULL if parsing failed.
  • json_merge(doc1, doc2, ...)
    Merges two or more documents into one. Returns first document with following documents appended. If one of following documents does not contain an opening curly bracket, first documents merged are returned and a warning is generated; NULL if first document does not contain an opening curly bracket.
  • json_remove(doc, keypart1, keypart2, ...)
    Removes element specified by the key. Returns document without the element or NULL if parsing failed.
  • json_replace(doc, keypart1, keypart2, ..., new_value)
    Updates value of specified key. Returns document with replaced key or original document if no such an element found, NULL if parsing failed.
  • json_search(doc, value)
    Searches for specified value in the document. Returns key path of the element which contains the value in reverse order or NULL if parsing failed.
  • json_set(doc, keypart1, keypart2, ..., new_value)
    Performs kind of INSERT ... ON DUPLICATE KEY UPDATE operation.
    Returns document with updated or inserted element or NULL if parsing failed.
  • json_test_parser(doc)
    Returns text representation of parse tree of the JSON document, empty string if document is invalid. This function is supposed to use for tests only and should not be used in production.
  • json_valid(doc)
    Checks if doc is valid JSON document. Returns TRUE if document is valid, FALSE is document is invalid.

My initial take

The new set of SQL functions to work on JSON is a most welcome addition. BUT there are issues with the very first pre-production version published on MySQL Labs.

The list of functions available covers only the most basic search and modification operations. Check the PHP manual’s array functions to get an idea of what developers can envision doing with hashes and have readily available as built-in functions! For example, there is no way to recursively search a document/hash using a user-supplied comparison function. This, however, does not bother me much: once there is a place for dumping new functions, it is only a matter of demand until someone adds them.
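Such a recursive search with a user-supplied comparison function is only a few lines in any host language. A hypothetical sketch of what a richer built-in could offer, with the predicate supplied by the caller:

```python
def find_first(doc, matches):
    """Depth-first search over a decoded JSON document; 'matches' is a
    user-supplied predicate that receives each scalar value."""
    stack = [doc]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)
        elif matches(node):
            return node
    return None

# e.g. a case-insensitive match, something the current exact-match
# JSON_SEARCH cannot express
print(find_first({"a": {"b": "NeedLe"}}, lambda v: str(v).lower() == "needle"))
```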

TEXT/BLOB columns holding JSON documents remain second-class citizens to MySQL. Think off-page storage, think indexing. Given MySQL’s current pace, are you pessimistic about the future…?

Happy hacking!

@Ulf_Wendel Follow me on Twitter

Worlds unknown to me: DOAG SIG Development/Tools

2013/09/26 - by admin - 0 comments

At yesterday’s event DB-Programmierung: Oracle 12c für Entwickler of the Deutsche ORACLE-Anwendergruppe (DOAG) SIG Development/Tools, speaker and attendees alike got to look beyond their own horizon. I was invited to talk about the various cluster options for MySQL and to hint at the PHP developer’s point of view.

In the field of distributed databases, MySQL is by now very broadly positioned. A DBA who wants a first overview can roughly classify the various systems by means of two questions (slides 1-16):

  1. Can all transactions be executed on all replicas?
  2. Is conflict resolution between transactions from different replicas performed immediately or deferred?

The answers determine the fundamental advantages and disadvantages of the four resulting combinations. It turns out that MySQL is represented in at least three of the fields:

  • Lazy (when: asynchronous) Primary (where: master) Copy
  • Lazy (when: asynchronous) Update Anywhere (where: “Multi-Master”)
  • Eager (when: synchronous) Update Anywhere (where: “Multi-Master”)

Within this first classification, one further distinguishes between full and partial replication. This widens MySQL’s choice to:

  • Lazy (when: asynchronous) Primary (where: master) Copy
    • Full
    • Partial
  • Lazy (when: asynchronous) Update Anywhere (where: “Multi-Master”)
  • Eager (when: synchronous) Update Anywhere (where: “Multi-Master”)
    • Full
    • Partial

Who would have expected that from "the little web database"?

Depending on the system, there are inherent advantages and disadvantages. Especially with the popular MySQL Replication, new tasks fall to the application (slides 27-30). The PHP MySQL driver tries to abstract away and take over many of these tasks in order to simplify application development.

In the context of a DOAG SIG Development/Tools talk, those details are rather unimportant to an Oracle DBA. Accordingly, the talk focused on the fundamental questions of which systems are available and at which layer a load balancer should be deployed (slides 31-45).

Anyone running MySQL as a cluster will hopefully find, in the presentation’s coverage of the PHP driver (slides 45-85), a checklist of the most important aspects of dealing with any MySQL cluster. The word “PHP” should not put you off: the challenges and solution approaches apply across languages. And, of course, many roads lead to the goal. Whoever has had a working solution for years should keep using it.

85 slides?! Every second one is a commentary on what was visible during the talk. That way the talk comes alive online even without the speaker…

PHP Unconference Hamburg 2013: hungry for more

2013/09/23 - by admin - 0 comments

A day before the 7th edition of the PHP Unconference Hamburg, the caretaker of the venue called the organizers to inform them that their event could not take place. The space would be needed as a polling station for the German federal elections… Eventually, the polling station was moved to the 7th floor and, well, business as usual: 330+ visitors, ~70 session proposals, 32 high quality sessions run, especially my own on MySQL 5.7 with built-in sharding ;-), ~40 Euros entry fee for what costs 400+ Euros elsewhere!

Great, WOW, please more, but…

I am a huge fan of the PHP Unconference. Ever since the first edition, the who-is-who of the national PHP community has gathered at this family-style event. Very few events made it into this league from the start. Having a big name or being a sponsor, however, does not mean you make it onto the session table, as it’s the people that vote for sessions every morning. Only topics that really matter to the audience have a chance to make it. You can find photos of the session table at http://www.php-unconference.de/.


(See the end of the posting for a slideshow. Stefan has a really cool picture showing how crowded it is if you make it to the event on a Saturday morning – *yawn* – at: http://album.yabba.net/v/Stefan/PHPUnconference/2013/)

There are long breaks between the sessions to force people to discuss and exchange ideas instead of hunting from one session to another session in a huge conference hotel. The building of the University of Hamburg, where the unconference takes place, is just about perfect for the event. There’s a “bath tub” where 200+ people can sit surrounded by six lecture auditoriums.

The breaks are not the only opportunity for discussions. When I arrived on Saturday, some looked amazingly tired and spoke about intensive discussions held in pubs at the Reeperbahn the night before. Well, I hope they all were fit enough to join the official, by now traditional, free-drinks PHPProst event that took place on Saturday evening.

Did we all get a bit lazy?

All in all, the event does not need to hide. The unconference continues to compare well with much more expensive business conferences and – due to the low ticket price – is very attractive to students.

But over the years, some bad habits crept in as well. Tickets are rare and very hard to get. The event is usually sold out in days or even hours, because it’s of such high quality. If you know you cannot make it, please give your ticket to someone else. I wish the organizers had a place for controlled ticket trading on the event’s web site.

In the early years of the event, it was common to run a feedback session at the end of the unconference. Everybody was asked to write down positive as well as negative impressions. Probably many of you do retrospectives on a regular basis, as you know their value. This year there was only one lonely “well done” or the like on the board. With all the retrospective experts around, we should be able to do better.

The sessions are hot. This year it’s been a very, very close race among the top proposals. You had to get 30+ votes to make it (each participant has four votes per day). I missed some fun sessions. Sure, with all the experts around to present the cutting-edge topics, and all the presentations proposed on stuff that matters to solve pressing day-to-day problems, it’s hard to get through with fun sessions or exotic topics. Having Thomas talk about Arduino with PHP seemed the maximum possible.

I wish there had been more. Unfortunately, I have nobody but myself to blame. Unconferences are driven by their audience. Keep going there, keep the standards high for next year’s 8th edition.

Maybe I’ll come up with a nice idea for the PHP Unconference Europe 2014 myself. The place where it takes place deserves a special show. #PHPucEU will take place on the island of Majorca – probably in May 2014. Can you think of a better place? Majorca is affordable, sunny, easy to travel to, …

2013 Sponsors & Supporters

No such community event is possible without sponsors and organizers (BOOT – Best Of Open Technologies e.V.). Thanks, everybody!

Thanks, Galileo computing: I won a book: “CouchDB – Das Praxisbuch für Entwickler und Administratoren“. Next time I meet Till in Berlin, I’ll ask him to sign it.

2014 – looking for help

The organizers are looking for helping hands to make the event happen again in 2014. If you like Open Source, you happen to be in Hamburg and you have some spare time, please, drop them an email: orga@php-unconference.de.

Happy gathering! Your Obi-Wan Kenobi*

* Upon arrival, my badge seemed lost. Instead of searching for a long time they gave me one with the name Obi-Wan Kenobi on it. Well, why not. The wrong date on it allows me to claim one day after the event that this is live blogging…

MySQL 5.7 Fabric: any good?

2013/09/23 - by admin - 8 Comments

MySQL 5.7 has sharding built-in through MySQL Fabric (lab release). And, MySQL has scaled the marketing team to handle 500,000 customer requests/s – or was it MySQL 5.7 query throughput? Whatever. MySQL Fabric frees you from implementing sharding yourself. MySQL Fabric simplifies the management of MySQL Replication clusters of any size.

Is it any good? Yes, but MySQL must not stop here. This is not enough. MySQL still lags a little behind the trend setters.

PDF-Download

Technically, MySQL Fabric is little more than a new management tool. At best, one could speak of a middleware based replication design to manage groups of primary copy clusters.

MySQL Fabric
Shard Group Shard Group
  Primary (Master)  
Secondary (Slave, Copy) Secondary (Slave, Copy)

Catching up: simplified node and cluster deployment

Many NoSQL systems born after MySQL put much emphasis on distributed databases. Clustering and replicating databases became a must in 2010+, and their setup and administration is easier than with MySQL Replication. MySQL Replication was introduced in 2000. Back then, replication was not mainstream. Very few deployed tens, hundreds or even thousands of MySQL servers for one web site. If you set up only two servers, who cares how comfortable it is, as long as the database scales… In the years thereafter, users created a rich set of deployment tools.

Then came the new generation of databases: cluster setup made easy. This was a hard way of reminding MySQL of its own goals: it should take no longer than ten minutes from download to a running system. If your running MySQL system shall include a MySQL cluster, this is a hard-to-meet time limit. For a first-time user, something either works, or it is out – the next minute. Agile, fast incremental updates, … MySQL Fabric has the potential to fix this. It is a safe bet to predict that competition from within the MySQL ecosystem will strike back. Choice is always welcome!

MongoDB                                                       | MySQL 5.7 with Fabric
1) Install mongods                                            | Install mysqlds
2) On any node, create replica set                            | Using mysqlfabric, create master group
3) On primary, add secondaries                                | Using mysqlfabric, add nodes, choose primary
4) Built-in (proper) failure detector automatically activated | Choose failure detector, activate. E.g., use built-in (imperfect) failure detector.

Sharding

Partitioning (sharding) is the only way to scale reads and writes in a distributed database. Please see the presentation “DIY – Distributed Database Cluster, or: MySQL Cluster” for an introduction to the theory behind distributed databases and why MySQL Cluster is an auto-partitioning/auto-sharding solution.

table: users
shard_key | first_name | other_columns[…]
        |
        Partitioning rule, for example: RANGE
        |
MySQL server/shard 1: table users, shard_key = 1 .. 100
MySQL server/shard 2: table users, shard_key = 101 .. 200
MySQL server/shard 3: table users, shard_key = 201 .. 300

Rows on shard 1, for example:
shard_key | first_name | other_columns[…]
1         | John       |
          | Elena      |
100       | Zacharias  |

There are many approaches to implement sharding on top of MySQL. Many big MySQL users created their own application-based sharding solutions years ago. Application-based sharding was the use case for the International PHP Conference 2008 presentation on asynchronous queries supported by the PHP mysqlnd library (slides). The idea behind the approach is to have the application route certain queries to certain shards, often based on some static rule.

MySQL Fabric follows the same basic pattern: clients must route requests to appropriate shards, application developers must provide a shard key. Fabric knows which shard must be used for any given shard key and applications or drivers talk to Fabric to learn about it.

This is not the most transparent solution (as opposed to MySQL Cluster), but a perfectly valid design decision. Application developers get maximum control over the data distribution. This is, for example, important when optimizing for latency and data locality. In general, there are other ways to achieve the same, but at the end of the day, as long as you can do what is needed…
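The RANGE rule sketched above boils down to a lookup the application or driver performs before each query. A minimal Python sketch, with shard bounds and server names invented for illustration (a Fabric-aware driver would fetch this map from Fabric instead of hardcoding it):

```python
# Hypothetical RANGE sharding rule: shard key interval -> MySQL server.
SHARD_MAP = [
    (1, 100, "mysql-shard-1"),
    (101, 200, "mysql-shard-2"),
    (201, 300, "mysql-shard-3"),
]

def route(shard_key):
    """Return the server that stores the row for the given shard key,
    mimicking what a Fabric-aware driver asks Fabric for."""
    for low, high, server in SHARD_MAP:
        if low <= shard_key <= high:
            return server
    raise KeyError("no shard covers key %r" % shard_key)

print(route(150))  # mysql-shard-2
```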

Transactions and data model

Distributed databases have an enemy: transactions. As soon as lazy primary copy (classic MySQL Replication) is not sufficient to scale a given workload and data is partitioned, distributed transactions may be required by queries spanning multiple partitions on multiple nodes. With MySQL Fabric, application developers might start using XA transactions. An XA transaction coordinates local transactions from multiple servers (shards). If, for example, you wanted to fetch a consistent view of rows from multiple shards, you might start a distributed transaction covering those shards.

Distributed transactions are slow compared to local transactions. Some may argue it can be done (when implemented properly) sufficiently fast (I’m among them), but the ultimate proof that local transactions are always faster is quite simple. Your network cable measures one meter, your HDD cable measures ten centimeters. Given the constant speed of light, any communication over the HDD cable will always be 10x faster. And, sooner or later we talk flash/RAM instead of spinning disks – even if we did not, a distributed transaction would still have to persist results for the D(urability) in ACID. On an aside: MySQL Cluster is "cheating" here, it flushes to disk periodically and avoids delaying transactions by waiting for the disk.

Hence, distributed databases must support a data model that avoids distributed transactions if you want maximum performance. If anyhow possible, co-location should be used. This can be achieved, for example, using a tree schema (H-Store, ElasTraS), entity groups (Google Megastore) or table groups (Cloud SQL Server – the Microsoft SQL Azure backend). Or, you denormalize (key/value respectively key/document).

How about Fabric? Well, there are global tables… I guess we have a lot to discover and explain to application developers :-). Guess what the topic of the workshop is that I am allowed to give at the PHP Summit (Berlin) in December: data models for distributed databases.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

PoC: Using a Group Communication System (Isis2) to improve MySQL Replication HA

2013/08/08 - by admin - 7 Comments

Modern NoSQL solutions make good, old MySQL Replication appear weak on High Availability (HA). Basically, MySQL users have three choices for MySQL Replication HA: give up on HA, believe that doubling single points of failure means HA, or go for a proper but complex solution. But, as the NoSQL world and the competition prove, solid HA can be dead simple: embed a Group Communication System (GCS) into MySQL! No single point of failure and near-zero client deployment is doable. In parts, the proposal surpasses Pacemaker/Corosync. Read on: story, slides, experimental code.

Free tip of the day: A Single Point of Failure cannot cure a SPOF

If you were on a sailing boat on the wide, wide ocean, and the captain was the only one who knew how to sail, would you feel safe? No? Me neither. The captain is a Single Point of Failure (SPOF). So is the master/primary in a MySQL Replication cluster.

Out-of-the-box MySQL Replication setup
Writes        | Reads
Master (SPOF) | Slave (one of many)
              | Slave
              | Slave

If the boat was monitored by a helicopter that will eventually fail or lose sight of the ship, would you feel significantly safer? No? Me neither. The helicopter is a single point of failure. So is MHA or mysqlfailover (now GA!) when used for monitoring the master. Using a single monitor to trigger MySQL Replication master failover means doubling the number of SPOFs!

MySQL Replication setup aiming at 99.9% HA
                                       | Writes        | Reads
Monitor, e.g. mysqlfailover/MHA (SPOF) | Master (SPOF) | Slave (one of many)
                                       |               | Slave
                                       |               | Slave

Of course, I’m overdoing it. If the probability of one system failing is 1:1000, then the probability of both independent systems failing at the same time is 1000 * 1000 = 1000000, i.e. 1:1000000 (thanks Ben for the hint! Readers: see Baron’s comment below). That’s an improvement. Plus, the big achievement of both mysqlfailover and MHA is the automation of the actual master failover procedure. For this reason alone you should use either one.
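The arithmetic is easy to check: for independent components, failure probabilities multiply. A two-line sketch using exact fractions:

```python
from fractions import Fraction

# Independent failure probabilities multiply.
p_master = Fraction(1, 1000)   # probability the master is down
p_monitor = Fraction(1, 1000)  # probability the monitor is down

# Both down at the same time:
p_both = p_master * p_monitor
print(p_both)  # 1/1000000
```

The catch, of course, is independence: if both processes share a rack or a switch, the real probability of a simultaneous outage is much higher.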

A proper but complex no SPOF setup

A proper cluster solution such as Windows Failover Clustering or Pacemaker/Corosync/DRBD is way more robust (BTW, the whitepapers are worth checking). Pacemaker/Corosync eliminates all single points of failure. Additionally, it prevents transaction loss and speeds up failover by using a Distributed Replicated Block Device to keep a standby MySQL Replication master server up to date.

MySQL Replication 3rd party setup aiming at 99.99% HA
                |                | Writes           |      | Reads
Pacemaker (CRM) | Corosync (CCM) | Master (Standby) | DRBD |
Pacemaker (CRM) | Corosync (CCM) | Master (Active)  | DRBD | Slave (one of many)
Pacemaker (CRM) | Corosync (CCM) |                  |      | Slave
Pacemaker (CRM) | Corosync (CCM) |                  |      | Slave (one of many)

This is a true Unix-style design: many independent, small programs working together. At the core you find a Cluster Resource Manager (CRM) responsible for taking all kinds of actions, for example, database failover. Next to it you find a separate Cluster Communication Manager (CCM) that adds communication channels between all the nodes. A CCM usually uses redundant connections between the nodes. Given three nodes A, B, C connected to each other, with A and B losing their connection A->B, the CCM still allows A and B to communicate through C: A->C->B. The Pacemaker CRM and the Corosync CCM run on all the nodes. No single point of failure.

That is exactly what you want, but it’s way too complicated.

A typical failover procedure ignores the client badly!

MySQL HA solutions are all too often server-centric. Much emphasis is on the actual master failover procedure. There is little, if any, automation of client deployment. To avoid client deployment, the master is often assigned a virtual IP that is transferred as part of a failover.

Failed master New master
Virtual IP ->

But master failover is not the only event clients need to be made aware of; think of slave additions or failures. Telling a DBA to deploy clients manually is 1990s. This is not what keeps your product relevant in 2013. Sometimes a blown-up single monitor is proposed that clients can query to learn about changes to the cluster topology. Seriously: another SPOF, another communication channel besides SQL?

Clients should continuously adapt themselves to cluster changes

The PHP mysqlnd driver’s replication and load balancing plugin (PECL/mysqlnd_ms) proves the potential of a driver-integrated proxy:

  • making complex clusters transparent to use: read-write splitting, load balancing, service levels (e.g. abstracting consistency), semi-automatic server selection for sharding and partitioning, …
  • scale by client – no bottlenecks due to centralized proxies
  • fail by client – failure does not affect many unlike with a central proxy such as MySQL Proxy

If only the HA solution told the driver about the cluster’s state, the driver could hide even more of a cluster’s complexity and improve ease of use. Imagine drivers got real-time server load information and adapted their load balancing dynamically. Drivers could stop bombarding a badly lagging slave with requests, to give it time to breathe, or lower their write rates.
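Dynamic, lag-aware load balancing could be as simple as weighting servers inversely to their reported replication lag. A hypothetical driver-side sketch (server names and lag figures invented):

```python
import random

def pick_slave(slaves, rng=random):
    """Pick a read slave, weighting inversely by replication lag so a
    badly lagging slave receives few requests (lag in seconds)."""
    names = [name for name, _ in slaves]
    weights = [1.0 / (1.0 + lag) for _, lag in slaves]
    return rng.choices(names, weights=weights)[0]

random.seed(42)  # deterministic demo
slaves = [("slave-1", 0), ("slave-2", 30)]
counts = {"slave-1": 0, "slave-2": 0}
for _ in range(1000):
    counts[pick_slave(slaves)] += 1
# slave-1 (no lag) receives the overwhelming majority of the reads
```

The same state feed could drive failover awareness: a slave whose weight drops to zero is simply never picked.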

The solution: Isis2, CRM/CCM moves into a MySQL plugin

Here comes the code for a system that could offer proper HA without much added complexity! The idea is simple: use a Group Communication System (GCS) in a MySQL server plugin to connect all servers of a cluster with each other. Use the communication primitives of the GCS to exchange state information synchronously. State information consists of role (master, slave), status (on, off, standby), possibly current system load, or whatever else may be of interest.

At any time a GCS can give a list of its members. Thus, at any time, a client can ask for a list of all servers in the cluster. The GCS jointly decides on membership changes: it recognizes the addition of servers but also their failure. If a server fails, appropriate action can be taken. For example, a failover script can be run to elect a new master and reconfigure the MySQL Replication cluster.
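The failover reaction to a membership change can be sketched in a few lines. Because a GCS delivers the same membership view to every surviving member, all members can agree on who left and deterministically elect the same new master. The structure and names below are my invention, not the Isis2 API:

```python
def on_view_change(view, state):
    """React to a GCS membership change: 'view' is the agreed list of
    surviving member ids, 'state' maps member id -> role."""
    departed = set(state) - set(view)
    for member in departed:
        role = state.pop(member)
        if role == "master":
            # Deterministic election: every surviving member picks the
            # same new master, e.g. the lowest member id in the view.
            new_master = min(view)
            state[new_master] = "master"
    return state

state = {"A": "master", "B": "slave", "C": "slave"}
print(on_view_change(["B", "C"], state))  # {'B': 'master', 'C': 'slave'}
```

In the PoC, the role change would additionally trigger the actual MySQL Replication reconfiguration (CHANGE MASTER on the remaining slaves).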

Best HA today (no SPOF but complex) | Proposed PoC (no SPOF, easy, best possible client support)
Pacemaker (CRM)                     | MySQL
Corosync (CCM)                      |   + MySQL Isis2 daemon plugin (CRM/CCM)
MySQL                               |   + MySQL I_S cluster state information plugin
DRBD                                |   (GTID logic)

Clients learn about changes in the cluster topology by querying INFORMATION_SCHEMA tables. If a client fails to connect to one server, it can pick any of the remaining ones and ask it for an update on the list of servers. Then, if need be, the client can reconfigure itself. No DBA action needed, zero administration within reach. As state information is exposed through I_S tables, clients do not have to learn it from special, possibly centralized servers that would likely require extra communication protocols. There is no risk of management/monitoring servers getting slashdotted. And updating client programs is as easy as can be due to the use of plain SQL.

Just a few server plugins…

By embedding a GCS into a MySQL plugin, the HA stack is greatly simplified. The job of Pacemaker and Corosync is done by the MySQL plugin. All you need for a MySQL HA setup could be delivered in one download package. All you would need to do to add HA is load some MySQL plugins… well, in theory. You get the point.

On the hunt for a C/C++ GCS

The biggest challenge in implementing such a system is the hunt for a free and Open Source Group Communication System that can be embedded in a C/C++ MySQL server plugin. Corosync has a client/daemon-server design that is not very tempting. Its brother, the Spread Toolkit, is somewhat limited to ~40 nodes. Neither solution has an appealing API. The rest is either old, implemented in the wrong language (Java) or out of the question for me (Galera). I had almost given up searching… if only MySQL had a GCS.

Then came Isis2. Wrong language (C#) but what an API! The Isis2 Cloud Computing Library is the successor of a true classic, the Isis library. Experts will immediately associate the Virtual Synchrony model (see also slide 34 and following) with Isis and recall the success stories. Isis2 combines Virtual Synchrony and Paxos ideas (see video). Paxos-style protocols are commonly used in NoSQL solutions.

Most important is how easy Isis2 makes distributed cloud computing. A simple job like exchanging state information between MySQL servers becomes absolutely trivial!

One downside: C# to C/C++ language barrier

For my purposes, Isis2 has one big downside: it is written in C#/.NET. A MySQL C/C++ server plugin cannot use it directly, so MySQL cannot become a direct member of an Isis2 group. Instead, one has to write an Isis2 client in C# which communicates with MySQL through a network socket. As we now have two independent processes, the PoC adds a heartbeat between MySQL and the Isis2 client. This heartbeat would not be required if everything ran in one process, which would be the case with a pure C/C++ GCS.

Programming language   Component
C/C++                  MySQL
C/C++                  MySQL daemon Isis2 heartbeat plugin
                         |
C#/.NET (Mono)         Isis2 client socket server

I’m happy to pay this price for the wonderful API that Isis2 offers. Expressed in extra lines of code, the price for the PoC is probably less than 20%, or roughly 200 lines. Let the hacking begin…

The Isis2 client socket server

Disclaimer: all the code I show is entirely untested and partially known to be buggy or unstable. I stopped my fun tour through C#/.NET and MySQL plugin development after three days. That was the time it took me to realize “it could be done”, which is all I wanted for a PoC. That said, here is the code for the Isis2 client.


using System;
using System.Text;
using System.Net;
using System.Net.Sockets;
using System.Threading;

using Isis;
using System.Collections.Generic;
using System.Linq;

namespace IsisMySQLConnector
{

[AutoMarshalled]
class MySQLServer {
    public string isis_daemon;
    public string host;
    public string port;
    public string status;
    public string role;
    public int ttl;
    public string last_update;

    public MySQLServer() { }
    public MySQLServer(string daemon, string h, string p, string st, string ro, string up) {
        isis_daemon = daemon;
        host = h;
        port = p;
        status = st;
        role = ro;
        ttl = 30;
        last_update = up;
    }
}

delegate void RemoteRegisterMySQLServer(string daemon, string host, string port, string status, string role, string last_update);
delegate void RemoteUnregisterMySQLServer(string daemon, string host, string port);
delegate void RemoteExpireMySQLServer(string daemon, string host, string port);
delegate void RemoteHeartbeat(string daemon, string host, string port, string status, int ttl, string last_update);
delegate void RemoteViewChange(View v);
delegate void LoadMySQLServersCheckpoint(List<MySQLServer> mysqlServers);

class GroupConnector
{
    protected Isis.Timeout myTO = new Isis.Timeout(10000, Isis.Timeout.TO_FAILURE);
    protected EOLMarker myEOL = new EOLMarker();
    protected int UPDATE_MYSQL_REGISTER = 1;
    protected int UPDATE_MYSQL_UNREGISTER = 2;
    protected int UPDATE_MYSQL_HEARTBEAT = 3;
    protected int UPDATE_MYSQL_EXPIRE = 4;
    protected string isis_group;
    protected bool connected = false;
    protected Group myGroup;

    private Object thisLock = new Object();
    protected List<MySQLServer> mysqlServers = new List<MySQLServer>();

    internal const byte MYSQL_SERVER_TID = 123;

    public GroupConnector(string group) {
        isis_group = group;
        Thread.CurrentThread.Name = "GroupConnector Main " + isis_group;
        IsisSystem.Start();
        Msg.RegisterType(typeof(MySQLServer), MYSQL_SERVER_TID);
    }

    ~GroupConnector() {
        IsisSystem.Shutdown();
    }

    public void Connect() {
        if (connected) {
            throw new Exception("Already connected.");
        }

        myGroup = new Group(isis_group);
        myGroup.Handlers[UPDATE_MYSQL_REGISTER] += (RemoteRegisterMySQLServer)RemoteRegisterMySQLServer;
        myGroup.Handlers[UPDATE_MYSQL_UNREGISTER] += (RemoteUnregisterMySQLServer)RemoteUnregisterMySQLServer;
        myGroup.Handlers[UPDATE_MYSQL_HEARTBEAT] += (RemoteHeartbeat)RemoteHeartbeat;
        myGroup.Handlers[UPDATE_MYSQL_EXPIRE] += (RemoteExpireMySQLServer)RemoteExpireMySQLServer;
        myGroup.ViewHandlers += (Isis.ViewHandler)RemoteViewChange;

        myGroup.MakeChkpt += (Isis.ChkptMaker)delegate(View nv) {
            lock(thisLock) {
                if (mysqlServers.Count > 0) {
                    Console.WriteLine("Transferring state to joining member");
                    myGroup.SendChkpt(mysqlServers);
                }
                myGroup.EndOfChkpt();
            }
        };

        myGroup.LoadChkpt += (LoadMySQLServersCheckpoint)delegate(List<MySQLServer> initialMySQLServers) {
            lock(thisLock) {
                foreach (MySQLServer server in initialMySQLServers) {
                    mysqlServers.Add(server);
                    Console.WriteLine("[GroupConnector][state transfer] received: {0}-{1}-{2}-{3}-{4}-{5}",
                                      server.isis_daemon,
                                      server.host,
                                      server.port,
                                      server.role,
                                      server.status,
                                      server.ttl);
                }
            }
        };
        myGroup.Join();
        connected = true;
    }

    protected void RemoteViewChange(View v) {
        if (v.leavers.Length > 0) {
            Console.WriteLine("[GroupConnector][view change][{0}] Daemon leaves", Thread.CurrentThread.Name);
            Thread t = new Thread(delegate() {
                lock(thisLock) {
                    int idx;
                    foreach (Isis.Address leaver in v.leavers) {
                        while ((idx = mysqlServers.FindIndex(
                                          delegate(MySQLServer server)
                    {
                        return server.isis_daemon == leaver.ToString();
                        })) != -1) {
                            MySQLServer server = mysqlServers[idx];
                            Console.WriteLine("[GroupConnector][view change][{0}] removing server {1}-{2}-{3}-{4}-{5}-{6}", Thread.CurrentThread.Name,
                                              server.isis_daemon,
                                              server.host,
                                              server.port,
                                              server.role,
                                              server.status,
                                              server.ttl
                                             );
                            mysqlServers.RemoveAt(idx);
                        }
                    }

                }
            });
            t.Name = "GroupConnector ViewChange " + isis_group;
            t.Start();
        } else {
            Console.WriteLine("[GroupConnector][view change][{0}] Some view change (e.g. join), no action required", Thread.CurrentThread.Name);
        }
    }

    protected void RemoteRegisterMySQLServer(string daemon, string host, string port, string status, string role, string last_update) {
        Thread t = new Thread(delegate() {
            lock(thisLock) {
                int idx = mysqlServers.FindIndex(
                              delegate(MySQLServer server)
                {
                    return ((server.isis_daemon == daemon) && (server.host == host) && (server.port == port));
                });
                if (idx == -1) {
                    mysqlServers.Add(new MySQLServer(daemon, host, port, status, role, last_update));
                    Console.WriteLine("[GroupConnector][remote mysql register][{0}] Added {1}-{2}-{3}-{4}-{5}-{6}-30 New server count {7}",
                                      Thread.CurrentThread.Name,
                                      daemon,
                                      host,
                                      port,
                                      status,
                                      role,
                                      last_update,
                                      mysqlServers.Count);
                }
            }
        });
        t.Name = "RemoteRegisterMySQLServer " + isis_group;
        myGroup.SetReplyThread(t);
        t.Priority = ThreadPriority.AboveNormal;
        t.Start();
    }

    protected void RemoteUnregisterMySQLServer(string daemon, string host, string port) {
        Thread t = new Thread(delegate() {
            lock(thisLock) {
                int idx = mysqlServers.FindIndex(
                              delegate(MySQLServer server)
                {
                    return ((server.isis_daemon == daemon) && (server.host == host) && (server.port == port));
                });
                if (idx != -1) {
                    MySQLServer server = mysqlServers[idx];
                    Console.WriteLine("[GroupConnector][unregister][{0}] removing server {1}-{2}-{3}-{4}-{5}-{6}-{7}", Thread.CurrentThread.Name,
                                      server.isis_daemon,
                                      server.host,
                                      server.port,
                                      server.role,
                                      server.status,
                                      server.last_update,
                                      server.ttl
                                     );
                    mysqlServers.RemoveAt(idx);
                }
            }
        });
        t.Name = "RemoteUnregisterMySQLServer" + isis_group;
        myGroup.SetReplyThread(t);
        t.Priority = ThreadPriority.AboveNormal;
        t.Start();
    }

    protected void RemoteHeartbeat(string daemon, string host, string port, string status, int ttl, string last_update) {
        if (daemon == IsisSystem.GetMyAddress().ToString()) {
            Console.WriteLine("[GroupConnector][heartbeat][{0}] Ignoring heartbeat message to ourselves to avoid deadlock.",
                              Thread.CurrentThread.Name
                             );
            return;
        }
        Thread t = new Thread(delegate() {
            lock(thisLock) {
                int idx = mysqlServers.FindIndex(
                              delegate(MySQLServer server)
                {
                    return ((server.isis_daemon == daemon) && (server.host == host) && (server.port == port));
                });
                if (idx != -1) {
                    mysqlServers[idx].ttl = ttl;
                    mysqlServers[idx].status = status;
                    mysqlServers[idx].last_update = last_update;
                    MySQLServer server = mysqlServers[idx];
                    Console.WriteLine("[GroupConnector][heartbeat][{0}] heartbeat server {1}-{2}-{3}-{4}-{5}-{6}-{7}", Thread.CurrentThread.Name,
                                      server.isis_daemon,
                                      server.host,
                                      server.port,
                                      server.role,
                                      server.status,
                                      server.ttl,
                                      server.last_update
                                     );
                }
            }
        });
        t.Name = "RemoteHeartbeat" + isis_group;
        myGroup.SetReplyThread(t);
        t.Priority = ThreadPriority.AboveNormal;
        t.Start();
    }

    protected void RemoteExpireMySQLServer(string daemon, string host, string port) {
        Thread t = new Thread(delegate() {
            lock(thisLock) {
                int idx = mysqlServers.FindIndex(
                              delegate(MySQLServer server)
                {
                    return ((server.isis_daemon == daemon) && (server.host == host) && (server.port == port));
                });
                if (idx != -1) {
                    MySQLServer server = mysqlServers[idx];
                    Console.WriteLine("[GroupConnector][remote expire][{0}] removing server {1}-{2}-{3}-{4}-{5}-{6}-{7}", Thread.CurrentThread.Name,
                                      server.isis_daemon,
                                      server.host,
                                      server.port,
                                      server.role,
                                      server.status,
                                      server.last_update,
                                      server.ttl
                                     );
                    mysqlServers.RemoveAt(idx);
                }
            }
        });
        t.Name = "RemoteExpireMySQLServer" + isis_group;
        myGroup.SetReplyThread(t);
        t.Priority = ThreadPriority.AboveNormal;
        t.Start();
    }

    public void RegisterMySQLServer(string host, string port, string status, string role) {
        myGroup.SafeSend(UPDATE_MYSQL_REGISTER, IsisSystem.GetMyAddress().ToString(), host, port, status, role, DateTime.Now.ToString());
    }

    public void UnregisterMySQLServer(string host, string port) {
        myGroup.SafeSend(UPDATE_MYSQL_UNREGISTER, IsisSystem.GetMyAddress().ToString(), host, port);
    }

    public string GetMySQLServers() {
        string ServerList = "";
        lock(thisLock) {
            foreach (MySQLServer server in mysqlServers) {
                ServerList += server.isis_daemon + " ";
                ServerList += server.host + " ";
                ServerList += server.port + " ";
                ServerList += server.status + " ";
                ServerList += server.role + " ";
                ServerList += server.ttl + " ";
                ServerList += server.last_update + Environment.NewLine;
            }
        }
        return ServerList;
    }

    public void HeartbeatMySQLServer(string host, string port, string status) {
        lock(thisLock) {
            int idx = mysqlServers.FindIndex(
                          delegate(MySQLServer server)
            {
                return ((server.isis_daemon == IsisSystem.GetMyAddress().ToString()) && (server.host == host) && (server.port == port));
            });
            if (idx != -1) {
                /* We have created this one. Thus, we know how to interpret the time. Its our clock. */
                string last_update = DateTime.Now.ToString();
                Console.WriteLine("LastUpdate before heartbeat {0}", mysqlServers[idx].last_update);
                myGroup.SafeSend(UPDATE_MYSQL_HEARTBEAT, IsisSystem.GetMyAddress().ToString(), host, port, status, 30, last_update);
                mysqlServers[idx].ttl = 30;
                mysqlServers[idx].status = status;
                mysqlServers[idx].last_update = last_update;
                Console.WriteLine("LastUpdate after heartbeat {0}", mysqlServers[idx].last_update);
            }
        }
    }

    public void ExpireMySQLServer() {
        /* Note: we expire servers one by one, may take multiple calls to expire all our dead servers */
        string remove_host = "";
        string remove_port = "";
        lock(thisLock) {
            int idx = mysqlServers.FindIndex(
                         delegate(MySQLServer server)
            {
                return server.isis_daemon == IsisSystem.GetMyAddress().ToString();
            });
            if (idx != -1) {
                /* This is one of our servers for which we have set the last_update time. Thus, only we can
                 * expire it. The last_update time must be interpreted in relation to our local clock only.
                 * Other servers may have a different local time - clocks may not be synchronized. */
                MySQLServer server = mysqlServers[idx];
                DateTime lastUpdate = DateTime.Parse(server.last_update);
                TimeSpan diff = DateTime.Now - lastUpdate;
                TimeSpan maxTTL = new TimeSpan(0, 0, server.ttl);
                if (diff > maxTTL) {
                    remove_host = server.host;
                    remove_port = server.port;
                }
            }
        }
        if (remove_host != "") {
            myGroup.SafeSend(UPDATE_MYSQL_EXPIRE, IsisSystem.GetMyAddress().ToString(), remove_host, remove_port);
        }
    }

}

class GroupDaemon
{
    protected string[] mysqlStatus = {"on", "off", "starting", "stopping", "standby"};
    protected string[] mysqlRole = {"master", "slave", "standalone"};
    protected GroupConnector isisConnector;

    public GroupDaemon(GroupConnector c) {
        isisConnector = c;
    }

    public void ReceiveCallback(IAsyncResult AsyncCall)
    {
        byte[] questionBuffer = new byte[512];
        System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
        Socket listener = (Socket)AsyncCall.AsyncState;
        Socket client = listener.EndAccept(AsyncCall);
        int byteCount = client.Receive(questionBuffer, SocketFlags.None);
        if (byteCount > 0) {
            /* Decode only the bytes actually received, not the whole 512 byte buffer */
            string question = encoding.GetString(questionBuffer, 0, byteCount);
            string answer = ParseCommands(question);
            
            Console.WriteLine("Q: {0} '{1}'", byteCount, question);
            Console.WriteLine("A: {0} '{1}'", answer.Length, answer);
            
            client.Send(encoding.GetBytes(answer));
        }
        client.Close();

        listener.BeginAccept(new AsyncCallback(ReceiveCallback), listener);
    }

    public string ParseCommands(string command) {

        if (command.IndexOf(Environment.NewLine) > 0) {
            /* Keep only the first line: cut at the line break, not one character before it */
            command = command.Substring(0, command.IndexOf(Environment.NewLine));
        }

        string[] parts = command.Split(new Char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 1) {
            return "1 Invalid command";
        }

        for (int i = 0; i < parts.Length; i++) {
            parts[i] = parts[i].ToLower();
        }


        if (parts[0] == "join") {
            string syntax = "join <host> <port> <mysqlStatus> <mysqlRole>";

            if (parts.Length != 5) {
                return "2 Wrong number of parameters, syntax: " + syntax;
            }
            if (!IsListed(parts[3], mysqlStatus)) {
                return "3 Unknown mysqlStatus '" + parts[3] + "', syntax: " + syntax;
            }
            if (!IsListed(parts[4], mysqlRole)) {
                return "4 Unknown mysqlRole '" + parts[4] + "', syntax: " + syntax;
            }

            isisConnector.RegisterMySQLServer(parts[1], parts[2], parts[3], parts[4]);
            return "0 OK";
        } else if (parts[0] == "leave") {
            string syntax = "leave <host> <port>";

            if (parts.Length != 3) {
                return "2 Wrong number of parameters, syntax: " + syntax;
            }

            isisConnector.UnregisterMySQLServer(parts[1], parts[2]);
            return "0 OK";
        } else if (parts[0] == "heartbeat") {
            string syntax = "heartbeat <host> <port> <mysqlStatus>";

            if (parts.Length != 4) {
                return "2 Wrong number of parameters, syntax: " + syntax;
            }
            if (!IsListed(parts[3], mysqlStatus)) {
                return "3 Unknown mysqlStatus '" + parts[3] + "', syntax: " + syntax;
            }

            isisConnector.HeartbeatMySQLServer(parts[1], parts[2], parts[3]);
            return isisConnector.GetMySQLServers();
        } else if (parts[0] == "serverlist") {
            return "0 " + isisConnector.GetMySQLServers();
        }

        return "1 Invalid command";
    }

    protected bool IsListed(string what, string[] list) {
        foreach (string member in list) {
            if (member == what) {
                return true;
            }
        }
        return false;
    }

    public void StartListener(string host, int port)
    {
        IPAddress localAddress = IPAddress.Parse(host);
        Socket listenSocket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        IPEndPoint ipEndpoint = new IPEndPoint(localAddress, port);
        listenSocket.Bind(ipEndpoint);
        listenSocket.Listen(100 /* queue length */);
        listenSocket.BeginAccept(new AsyncCallback(ReceiveCallback), listenSocket);
        Console.WriteLine("[GroupDaemon] Server is waiting on socket {0}", listenSocket.LocalEndPoint);
    }

    public void Sleep(int duration) {
        Thread.Sleep(duration);
    }
}

class IsisDaemonMain
{

    /* Endpoint to which MySQL shall talk  - IPv4 host to bind to and listen on */
    protected const string listener_ipv4 = "127.0.0.1";
    /* Endpoint to which MySQL shall talk - Port to bind to and listen on */
    protected const int listener_port = 2200;
    /* Default Isis group */
    protected const string isis_group = "mysql";

    public static void Main(string[] args)
    {
        string host = listener_ipv4;
        int port = listener_port;
        string group = isis_group;

        if (args.Length > 0) {
            if ((args[0] == "help") || (args[0] == "--help") || (args[0] == "-h")) {
                Console.WriteLine("Usage: <name.exe> host port group");
                return;
            }
            host = args[0];
        }
        if (args.Length > 1) {
            port = int.Parse(args[1]);
        }
        if (args.Length > 2) {
            group = args[2];
        }

        try
        {
            GroupConnector connector = new GroupConnector(group);
            Console.WriteLine("[IsisDaemonMain] Connecting to ISIS...");
            connector.Connect();

            Console.WriteLine("[IsisDaemonMain] Starting daemon for communication with MySQL.");
            GroupDaemon daemon = new GroupDaemon(connector);
            daemon.StartListener(host, port);
            Console.WriteLine("[IsisDaemonMain] Started. Listening to client requests.");
            while (true) {
                daemon.Sleep(2000);
                connector.ExpireMySQLServer();
            }
        }
        catch (Exception e)
        {
            Console.WriteLine("[IsisDaemonMain] Caught Exception: {0}", e.ToString());
        }
    }
}
}

At the core: a distributed Isis2 group

Let’s build the core of the proposed MySQL HA cluster solution. The core is a distributed Isis2 group. Assume we have three hosts A, B and C in our MySQL cluster. On each host we start an Isis2 client. All clients try to join a distributed group of a certain name. Once they have joined the group, they can communicate with each other. If that all sounds a bit too much like magic, get yourself the book on the subject of Isis2 (Ken Birman, Guide to Reliable Distributed Systems: Building High-Assurance Applications and Cloud-Hosted Services).

Members of a distributed Isis2 group

            Host A
            mono ./isis2_daemon.exe

Host B                          Host C
mono ./isis2_daemon.exe         mono ./isis2_daemon.exe

Isis2 offers a rich variety of Send() commands to communicate within the group. Options range from low level, unreliable, asynchronously delivered messages for gossip protocols to slow but reliable, totally ordered, virtually synchronous messages. The latter either reach all members of a group or none. As neither speed nor group size is a major concern when exchanging nothing but a list of MySQL servers in a PoC, the PoC uses SafeSend() only. To an outside observer, messages appear “synchronously” on all group members.
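The value of totally ordered delivery is easy to illustrate: if every member applies the same messages in the same order, the copies of the server list cannot diverge. The following is not Isis2 code, just a minimal Python sketch of that property, using an idempotent register/unregister rule similar to the handlers in the PoC:

```python
# Illustration only: totally ordered delivery means every group member
# applies the same operations in the same sequence, so their copies of
# the server list cannot diverge.

def apply(op, servers):
    """Apply one register/unregister message to a server list."""
    kind, host, port = op
    key = (host, port)
    if kind == "register" and key not in servers:
        servers[key] = "on"          # idempotent add, as in the PoC handlers
    elif kind == "unregister":
        servers.pop(key, None)
    return servers

# One totally ordered message log, as SafeSend() would deliver it...
log = [("register", "127.0.0.1", "3400"),
       ("register", "192.168.2.1", "3306"),
       ("unregister", "127.0.0.1", "3400")]

# ...applied independently on three group members:
replicas = [dict(), dict(), dict()]
for r in replicas:
    for op in log:
        apply(op, r)

assert replicas[0] == replicas[1] == replicas[2]
```

Without the ordering guarantee, two members could interleave a register and an unregister differently and end up with different lists, which is exactly what the PoC relies on Isis2 to prevent.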

The Isis2 client connects to a virtual group

If you compile and run it (e.g. dmcs isis_deamon.cs Isis.cs ; mono ./isis_deamon.exe ), the Isis2 client tries to join a virtual, distributed group. For this it contacts the Isis2 ORACLE Rendezvous Service. No kidding, that’s what Isis2 calls it.


nixnutz@linux-dstv:~/src/isis_201245> rm isis_deamon.exe ; dmcs isis_deamon.cs Isis.cs ; mono ./isis_deamon.exe 
Isis: Searching for the Isis ORACLE...
[IsisDaemonMain] Connecting to ISIS...

If other clients from other computers are already registered in the group, a checkpoint is taken to perform a state transfer. The joining client learns the state from the existing members. The state consists of a list of all MySQL servers that have registered themselves at their local Isis2 clients to become members.


[state transfer] received: (51971)-127.0.0.1-3400-master-on-30
[state transfer] received: (51971)-192.168.2.1-3306-slave-on-30

Isis2 abstracts away all the gory details: the networking, the group membership, the messaging associated with the state transfer and so forth. These few lines make up the core of the procedure.


 protected List<MySQLServer> mysqlServers = new List<MySQLServer>();
[...]
    public GroupConnector(string group) {
        isis_group = group;
        Thread.CurrentThread.Name = "GroupConnector Main " + isis_group;
        IsisSystem.Start();
        Msg.RegisterType(typeof(MySQLServer), MYSQL_SERVER_TID);
    }
[...]
    public void Connect() {
[...]     
        myGroup = new Group(isis_group);
        myGroup.Handlers[UPDATE_MYSQL_REGISTER] += (RemoteRegisterMySQLServer)RemoteRegisterMySQLServer;
        myGroup.Handlers[UPDATE_MYSQL_UNREGISTER] += (RemoteUnregisterMySQLServer)RemoteUnregisterMySQLServer;
        myGroup.Handlers[UPDATE_MYSQL_HEARTBEAT] += (RemoteHeartbeat)RemoteHeartbeat;
        myGroup.Handlers[UPDATE_MYSQL_EXPIRE] += (RemoteExpireMySQLServer)RemoteExpireMySQLServer;
        myGroup.ViewHandlers += (Isis.ViewHandler)RemoteViewChange;

        myGroup.MakeChkpt += (Isis.ChkptMaker)delegate(View nv) {
            lock(thisLock) {
                if (mysqlServers.Count > 0) {
                    Console.WriteLine("Transferring state to joining member");
                    myGroup.SendChkpt(mysqlServers);
                }
                myGroup.EndOfChkpt();
            }
        };

        myGroup.LoadChkpt += (LoadMySQLServersCheckpoint)delegate(List<MySQLServer> initialMySQLServers) {
            lock(thisLock) {
                foreach (MySQLServer server in initialMySQLServers) {
                    mysqlServers.Add(server);
                    Console.WriteLine("[GroupConnector][state transfer] received: {0}-{1}-{2}-{3}-{4}-{5}",
                                      server.isis_daemon,
                                      server.host,
                                      server.port,
                                      server.role,
                                      server.status,
                                      server.ttl);
                }
            }
        };
        myGroup.Join();
        connected = true;
    }
[...]

Did I mention already that the Isis2 API is nice? Note this detail of the checkpointing/state transfer instructions: the Isis2 library sends a list of MySQLServer objects over the wire. As an Isis2 developer you do not have to give up on your standard OOP style…


protected List<MySQLServer> mysqlServers = new List<MySQLServer>();
[...]
myGroup.SendChkpt(mysqlServers);

The big downside of this beauty is the question marks behind the once discussed pure C++ port of Isis2. Developing a library that offers the same for C++ objects may be hard, if not impossible.

Making an Isis2 client talk to MySQL

As soon as an Isis2 client has joined a distributed group, it starts a socket server to receive commands from the MySQL server associated with it. MySQL can then register itself in the group, announce its state and leave the group when shutdown.

Members of a distributed Isis2 group

            Host A
            MySQL
            MySQL Isis2d daemon plugin
            mono ./isis2_daemon.exe

Host B                            Host C
MySQL                             MySQL
MySQL Isis2d daemon plugin        MySQL Isis2d daemon plugin
mono ./isis2_daemon.exe           mono ./isis2_daemon.exe

As you can guess, it does not require much code in the .NET framework to start a socket server and listen to network requests. Nothing fancy: add code for parsing commands and you are done.


  [...]
  public void StartListener(string host, int port)
    {
        IPAddress localAddress = IPAddress.Parse(host);
        Socket listenSocket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        IPEndPoint ipEndpoint = new IPEndPoint(localAddress, port);
        listenSocket.Bind(ipEndpoint);
        listenSocket.Listen(100 /* queue length */);
        listenSocket.BeginAccept(new AsyncCallback(ReceiveCallback), listenSocket);
        Console.WriteLine("[GroupDaemon] Server is waiting on socket {0}", listenSocket.LocalEndPoint);
    }
  [...]
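For readers who prefer not to wade through the C#, the validation logic of ParseCommands() can be sketched in a few lines of Python. The status codes and syntax strings mirror the C# code above; everything else is illustrative only:

```python
# Hypothetical re-sketch of the PoC's ParseCommands() join validation.
# Status codes ("0 OK", "2 ...", "3 ...", "4 ...") mirror the C# code.
MYSQL_STATUS = {"on", "off", "starting", "stopping", "standby"}
MYSQL_ROLE = {"master", "slave", "standalone"}

def parse_join(command):
    """Validate a 'join <host> <port> <mysqlStatus> <mysqlRole>' command."""
    syntax = "join <host> <port> <mysqlStatus> <mysqlRole>"
    parts = command.lower().split()
    if len(parts) != 5 or parts[0] != "join":
        return "2 Wrong number of parameters, syntax: " + syntax
    if parts[3] not in MYSQL_STATUS:
        return "3 Unknown mysqlStatus '" + parts[3] + "', syntax: " + syntax
    if parts[4] not in MYSQL_ROLE:
        return "4 Unknown mysqlRole '" + parts[4] + "', syntax: " + syntax
    return "0 OK"

assert parse_join("join 127.0.0.1 3305 on master") == "0 OK"
assert parse_join("join").startswith("2 ")
assert parse_join("join 127.0.0.1 3305 bogus master").startswith("3 ")
```

A leading numeric status code followed by a human readable message keeps the protocol trivial to parse for both drivers and humans on a telnet session.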

You can now connect to the Isis2 client using telnet. Below is an example session with the join command that shall be used by MySQL to register itself in the distributed Isis2 group.


nixnutz@linux-dstv:~/src/isis_201245> telnet 127.0.0.1 2200 
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
join
2 Wrong number of parameters, syntax: join <host> <port> <mysqlStatus> <mysqlRole>
Connection closed by foreign host.
nixnutz@linux-dstv:~/src/isis_201245> telnet 127.0.0.1 2200 
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
join 127.0.0.1 3305 on master
0 OK
Connection closed by foreign host.
nixnutz@linux-dstv:~/src/isis_201245> telnet 127.0.0.1 2200 
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
serverlist
0 (62435) 127.0.0.1 3305 on master 30 08.08.2013 17:06:47
Connection closed by foreign host.
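The serverlist reply is a line oriented, space separated text format. A driver-friendly client could parse it roughly as follows; this is a hypothetical Python sketch, with the field order taken from the MySQLServer class above (note that last_update itself contains a space between date and time):

```python
# Hypothetical parser for the "serverlist" reply shown above.
# Per-server line: <daemon> <host> <port> <status> <role> <ttl> <last_update>
# where <last_update> contains a space (date and time).

def parse_serverlist(reply):
    code, _, body = reply.partition(" ")
    if code != "0":
        raise ValueError("daemon reported error: " + reply)
    servers = []
    for line in body.splitlines():
        if not line.strip():
            continue
        # maxsplit=6 keeps the date and time together in the last field
        daemon, host, port, status, role, ttl, last_update = line.split(" ", 6)
        servers.append({"isis_daemon": daemon, "host": host, "port": port,
                        "status": status, "role": role, "ttl": int(ttl),
                        "last_update": last_update})
    return servers

reply = "0 (62435) 127.0.0.1 3305 on master 30 08.08.2013 17:06:47"
servers = parse_serverlist(reply)
assert servers[0]["host"] == "127.0.0.1"
assert servers[0]["ttl"] == 30
```

A real client would of course receive the reply over the socket first, but the string handling is the whole story.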

The Isis2 client that receives the join command forwards it to everybody in the group using SafeSend(). The Isis2 manual discusses the messaging API in great detail.


   public void RegisterMySQLServer(string host, string port, string status, string role) {
        myGroup.SafeSend(UPDATE_MYSQL_REGISTER, IsisSystem.GetMyAddress().ToString(), host, port, status, role, DateTime.Now.ToString());
    }

All group members, including the sender, receive the message and add the MySQL server to their server list. The MySQL server is now known to everybody in the group. On whichever host you issue a serverlist command, the MySQL server is listed.

The MySQL server gets dropped from the list if it fails to send a heartbeat to its Isis2 client or sends a leave message. If the heartbeat stops, MySQL might have crashed and appropriate action could be triggered. If the local Isis2 client dies, well, that’s a weak spot (self-set three days coding limit…). The PoC simplifies this case and assumes that the host crashed. All servers associated with the Isis2 client are dropped from the group’s server list.

Please note that all members of an Isis2 group jointly decide whether a member has crashed. And it takes more than a single missed heartbeat.
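The expiry rule itself boils down to a simple comparison: a server is dropped once the time since its last heartbeat exceeds its TTL. Only the Isis2 client that wrote the last_update timestamp performs the comparison, so only one local clock is ever involved. A Python sketch of the rule implemented in ExpireMySQLServer() above:

```python
from datetime import datetime, timedelta

# Sketch of the PoC's expiry rule (see ExpireMySQLServer): a server
# expires once the time since its last heartbeat exceeds its TTL.
# Only the daemon that wrote last_update compares it against "now",
# so clocks across hosts need not be synchronized.

def is_expired(last_update, ttl_seconds, now):
    return (now - last_update) > timedelta(seconds=ttl_seconds)

now = datetime(2013, 8, 8, 17, 7, 30)
fresh = datetime(2013, 8, 8, 17, 7, 10)   # 20s ago, TTL 30 -> keep
stale = datetime(2013, 8, 8, 17, 6, 40)   # 50s ago, TTL 30 -> expire
assert not is_expired(fresh, 30, now)
assert is_expired(stale, 30, now)
```

Keeping the comparison local is what makes the 30 second TTL safe despite unsynchronized clocks; a remote member judging the timestamp against its own clock could expire a perfectly healthy server.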

The MySQL plugins

It’s nice to see the telnet sessions, but it would be even better to have MySQL server plugins that send the join, leave and heartbeat messages and parse the serverlist reply into something more driver friendly. Nothing as easy as that: one MySQL server plugin for the membership and another one to export an INFORMATION_SCHEMA.ISIS2IS table.

A MySQL cluster that can report its state and take HA actions

            Host A
            MySQL
            MySQL Isis2d daemon plugin
            MySQL Isis2is I_S plugin
            mono ./isis2_daemon.exe

Host B                            Host C
MySQL                             MySQL
MySQL Isis2d daemon plugin        MySQL Isis2d daemon plugin
MySQL Isis2is I_S plugin          MySQL Isis2is I_S plugin
mono ./isis2_daemon.exe           mono ./isis2_daemon.exe

Here you go with the plugin code. Again, it’s incomplete and even crashing code (UNINSTALL PLUGIN will crash). It’s just good enough to make the point. There is not much to say about the plugin code that you cannot find discussed in greater depth in the MySQL manual. It’s no more than a wrapper for the network commands shown before.


#include <my_global.h>
#include <sql_priv.h>
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
#include <mysql_version.h>
#include <mysql/plugin.h>
#include <my_dir.h>
#include "my_pthread.h"                     
#include "my_sys.h"                       
#include "m_string.h"            
#include "sql_plugin.h"
#include "sql_class.h"
#include "table.h"

bool schema_table_store_record(THD *thd, TABLE *table);
/*
  Disable __attribute__() on non-gcc compilers.
 */
#if !defined(__attribute__) && !defined(__GNUC__)
#define __attribute__(A)
#endif


#include <sstream>
#include <string>
#include <list>
#include <ctime>

#include <netdb.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

#include <cstdlib>

#include "isis2.h"

using namespace std;

/* Common/shard utility code, defined at the very end of the file */
static int isis2_common_simple_command(std::string cmd);
static int isis2_common_query(std::string msg, std::string * reply);
static int isis2_common_reply_is_error(std::string * reply);

class Isis2_MySQL_Server
{
public:
  std::string myerrno;
  std::string myerror;
  std::string isis2_member;
  std::string mysql_host;
  std::string mysql_port_or_socket;
  std::string mysql_status;
  std::string mysql_role;
  std::string mysql_heartbeat_ttl;
  std::string mysql_heartbeat_last;

  Isis2_MySQL_Server(std::string p_errno, std::string p_error)
  {
    myerrno = p_errno;
    myerror= p_error;
  }

  Isis2_MySQL_Server(std::string p_isis2_member, std::string p_mysql_host,
                     std::string p_mysql_port_or_socket, std::string p_mysql_status,
                     std::string p_mysql_role, std::string p_mysql_heartbeat_ttl,
                     std::string p_mysql_heartbeat_last)
  {
    myerrno = "";
    myerror = "";
    isis2_member= p_isis2_member;
    mysql_host= p_mysql_host;
    mysql_port_or_socket= p_mysql_port_or_socket;
    mysql_status= p_mysql_status;
    mysql_role= p_mysql_role;
    mysql_heartbeat_ttl= p_mysql_heartbeat_ttl;
    mysql_heartbeat_last= p_mysql_heartbeat_last;
  }
} ;

static std::list<Isis2_MySQL_Server*> cached_serverlist;
static time_t cached_serverlist_update = 0;
static mysql_mutex_t LOCK_cached_serverlist;
static PSI_mutex_key key_LOCK_cached_serverlist;
static int isis2d_plugin_installed = 0;

/* isis2 I_S Information Schema plugin (show daemon state) */

static ST_FIELD_INFO isis2is_table_fields[]={

  {"ERROR", 255, MYSQL_TYPE_STRING, 0, 0, 0, 0},
  {"ERRNO", 6, MYSQL_TYPE_LONG, 0, MY_I_S_UNSIGNED, 0, 0},
  {"ISIS2_MEMBER", 255, MYSQL_TYPE_STRING, 0, 0, 0, 0},
  {"MYSQL_HOST", 255, MYSQL_TYPE_STRING, 0, 0, 0, 0},
  {"MYSQL_PORT_OR_SOCKET", 255, MYSQL_TYPE_STRING, 0, 0, 0, 0},
  {"MYSQL_STATUS", 255, MYSQL_TYPE_STRING, 0, 0, 0, 0},
  {"MYSQL_ROLE", 255, MYSQL_TYPE_STRING, 0, 0, 0, 0},
  {"MYSQL_HEARTBEAT_TTL", 255, MYSQL_TYPE_STRING, 0, 0, 0, 0},
  {"MYSQL_HEARTBEAT_LAST", 255, MYSQL_TYPE_STRING, 0, 0, 0, 0},
  {0, 0, MYSQL_TYPE_NULL, 0, 0, 0, 0}
};

static int isis2is_fill_table(THD *thd, TABLE_LIST *tables, Item *cond)
{
  TABLE *table= tables->table;
  int myerrno = 0;
  char * line;
  std::string reply;
  std::string cmd = "serverlist";
  std::size_t found = std::string::npos;
  std::string line_copy;
  std::string p_isis2_member, p_mysql_host, p_mysql_port_or_socket, p_mysql_status,
          p_mysql_role, p_mysql_heartbeat_ttl, p_mysql_heartbeat_last;
  time_t now;
  std::list<Isis2_MySQL_Server*>::iterator it;

  fprintf(stderr, "Fill called\n");
  if (!isis2d_plugin_installed)
  {
    reply = "You must install the isis2d daemon plugin first.";
    myerrno = 1000;
    table->field[0]->store(reply.c_str(), reply.length(), system_charset_info);
    table->field[1]->store(myerrno);
    if (schema_table_store_record(thd, table))
      return 1;
    return 0;
  }
  time(&now);

  if ((cached_serverlist_update == 0) || (difftime(now, cached_serverlist_update) > 3))
  {
    time(&cached_serverlist_update);
    /* refresh server list every n seconds */
    mysql_mutex_lock(&LOCK_cached_serverlist);
    cached_serverlist.clear();
    if ((0 != isis2_common_query(cmd, &reply)) ||
        ((myerrno = isis2_common_reply_is_error(&reply)) > 0))
    {
      cached_serverlist.push_back(new Isis2_MySQL_Server(reply, "1000"));
    } else
    {
      line = strtok((char*) reply.c_str(), "\n");
      while (line != NULL)
      {
        line_copy = line;
        PARSE_ENTRY(line_copy, found, p_isis2_member);
        PARSE_ENTRY(line_copy, found, p_mysql_host);
        PARSE_ENTRY(line_copy, found, p_mysql_port_or_socket);
        PARSE_ENTRY(line_copy, found, p_mysql_status);
        PARSE_ENTRY(line_copy, found, p_mysql_role);
        PARSE_ENTRY(line_copy, found, p_mysql_heartbeat_ttl);
        cached_serverlist.push_back(new Isis2_MySQL_Server(
                                                           p_isis2_member, p_mysql_host,
                                                           p_mysql_port_or_socket, p_mysql_status,
                                                           p_mysql_role, p_mysql_heartbeat_ttl,
                                                           line_copy));
        line = strtok(NULL, "\n");
      }
    }
    /* unlock in the error branch, too, not only after a successful parse */
    mysql_mutex_unlock(&LOCK_cached_serverlist);
  }

  for (it = cached_serverlist.begin(); it != cached_serverlist.end(); ++it)
  {
    table->field[0]->store((*it)->myerror.c_str(), (*it)->myerror.length(), system_charset_info);
    table->field[1]->store((*it)->myerrno.c_str(), (*it)->myerrno.length(), system_charset_info);
    table->field[2]->store((*it)->isis2_member.c_str(), (*it)->isis2_member.length(), system_charset_info);
    table->field[3]->store((*it)->mysql_host.c_str(), (*it)->mysql_host.length(), system_charset_info);
    table->field[4]->store((*it)->mysql_port_or_socket.c_str(), (*it)->mysql_port_or_socket.length(), system_charset_info);
    table->field[5]->store((*it)->mysql_status.c_str(), (*it)->mysql_status.length(), system_charset_info);
    table->field[6]->store((*it)->mysql_role.c_str(), (*it)->mysql_role.length(), system_charset_info);
    table->field[7]->store((*it)->mysql_heartbeat_ttl.c_str(), (*it)->mysql_heartbeat_ttl.length(), system_charset_info);
    table->field[8]->store((*it)->mysql_heartbeat_last.c_str(), (*it)->mysql_heartbeat_last.length(), system_charset_info);
    if (schema_table_store_record(thd, table))
      return 1;
  }
  return 0;
}

static int isis2is_table_init(void *ptr)
{
  ST_SCHEMA_TABLE *schema_table= (ST_SCHEMA_TABLE*) ptr;

  schema_table->fields_info= isis2is_table_fields;
  schema_table->fill_table= isis2is_fill_table;
  return 0;
}

/* isis2 DAEMON plugin (heartbeat) */

static char * sysvar_isis2d_daemon_host;
static MYSQL_SYSVAR_STR(daemon_host,
                        sysvar_isis2d_daemon_host,
                        PLUGIN_VAR_RQCMDARG | PLUGIN_VAR_READONLY,
                        "Isis2 connect daemon host. Can be set during startup only",
                        NULL,
                        NULL,
                        "127.0.0.1");

static unsigned int sysvar_isis2d_daemon_port;
static MYSQL_SYSVAR_UINT(daemon_port,
                         sysvar_isis2d_daemon_port,
                         PLUGIN_VAR_RQCMDARG | PLUGIN_VAR_READONLY,
                         "Isis2 connect daemon port (1 to 65535). Can be set during startup only.",
                         NULL,
                         NULL,
                         2200,
                         0,
                         65535,
                         0);

static char * sysvar_isis2d_mysql_host;
static MYSQL_SYSVAR_STR(mysql_host,
                        sysvar_isis2d_mysql_host,
                        PLUGIN_VAR_RQCMDARG | PLUGIN_VAR_READONLY,
                        "MySQL server host. Can be set during startup only.",
                        NULL,
                        NULL,
                        "127.0.0.1");

static unsigned int sysvar_isis2d_mysql_port;
static MYSQL_SYSVAR_UINT(mysql_port,
                         sysvar_isis2d_mysql_port,
                         PLUGIN_VAR_RQCMDARG | PLUGIN_VAR_READONLY,
                         "MySQL port (1 to 65535). Can be set during startup only.",
                         NULL,
                         NULL,
                         3400,
                         0,
                         65535,
                         0);

struct st_mysql_sys_var * isis2d_sysvars[] ={
  MYSQL_SYSVAR(daemon_host),
  MYSQL_SYSVAR(daemon_port),
  MYSQL_SYSVAR(mysql_host),
  MYSQL_SYSVAR(mysql_port),
  NULL
};

struct isis2d_heartbeat_context
{
  pthread_t heartbeat_thread;
} ;

pthread_handler_t isis2d_heartbeat(void *p)
{
  DBUG_ENTER("isis2d_heartbeat");
  std::stringstream isis_cmd;
  isis_cmd << "heartbeat " << sysvar_isis2d_mysql_host << " " << sysvar_isis2d_mysql_port << " on";

  while (1)
  {
    sleep(10);
    isis2_common_simple_command(isis_cmd.str());
  }
  DBUG_RETURN(0);
}

static int isis2d_plugin_init(void *p)
{
  DBUG_ENTER("isis2d_plugin_init");
  std::stringstream isis_cmd;
  pthread_attr_t attr;
  struct st_plugin_int *plugin = (struct st_plugin_int *) p;
  struct isis2d_heartbeat_context *con;
  con = (struct isis2d_heartbeat_context *)
          my_malloc(sizeof (struct isis2d_heartbeat_context) , MYF(0));

  /* Register MySQL server */
  isis_cmd << "join " << sysvar_isis2d_mysql_host << " " << sysvar_isis2d_mysql_port << " on master";
  isis2_common_simple_command(isis_cmd.str());

  mysql_mutex_init(key_LOCK_cached_serverlist, &LOCK_cached_serverlist, MY_MUTEX_INIT_FAST);
  isis2d_plugin_installed = 1;

  pthread_attr_init(&attr);
  pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
  if (pthread_create(&con->heartbeat_thread, &attr, isis2d_heartbeat,
                     (void *) con) != 0)
  {
    fprintf(stderr, "Could not create heartbeat thread!\n");
    my_free(con);
    DBUG_RETURN(1); /* fail plugin init instead of killing the server */
  plugin->data = (void *) con;

  DBUG_RETURN(0);
}

static int isis2d_plugin_deinit(void *p)
{
  DBUG_ENTER("isis2d_plugin_deinit");
  struct st_plugin_int *plugin = (struct st_plugin_int *) p;
  struct isis2d_heartbeat_context *con =
          (struct isis2d_heartbeat_context *) plugin->data;
  void *dummy_retval;
  std::stringstream isis_cmd;

  mysql_mutex_destroy(&LOCK_cached_serverlist);

  pthread_cancel(con->heartbeat_thread);
  pthread_join(con->heartbeat_thread, &dummy_retval);

  isis_cmd << "leave " << sysvar_isis2d_mysql_host << " " << sysvar_isis2d_mysql_port;
  isis2_common_simple_command(isis_cmd.str());

  DBUG_RETURN(0);
}

static struct st_mysql_daemon isis2d_plugin ={MYSQL_DAEMON_INTERFACE_VERSION};
static struct st_mysql_information_schema isis2is_plugin ={ MYSQL_INFORMATION_SCHEMA_INTERFACE_VERSION };

mysql_declare_plugin(isis2)
{
  MYSQL_DAEMON_PLUGIN,
          &isis2d_plugin,
          "isis2d",
          "Ulf Wendel",
          "MySQL Isis2 Cloud Library Connector Daemon",
          PLUGIN_LICENSE_GPL,
          isis2d_plugin_init,
          isis2d_plugin_deinit,
          0x0100 ,
          NULL,
          isis2d_sysvars,
          NULL,
          0,
},
{
  MYSQL_INFORMATION_SCHEMA_PLUGIN,
          &isis2is_plugin,
          "isis2is",
          "Ulf Wendel",
          "MySQL Isis2 Cloud Library Connector I_S",
          PLUGIN_LICENSE_GPL,
          isis2is_table_init,
          NULL,
          0x0100,
          NULL,
          NULL,
          NULL,
          0
}

mysql_declare_plugin_end;

/* Common/shared utility functions*/
static int isis2_common_simple_command(std::string cmd)
{
  int myerrno;
  std::string reply;
  if ((0 != isis2_common_query(cmd, &reply)) ||
      ((myerrno = isis2_common_reply_is_error(&reply)) > 0))
  {
    std::stringstream msg;
    msg << cmd << " failed: '" << reply << "' (" << myerrno << ")";
    fprintf(stderr, "%s\n", msg.str().c_str());
    return myerrno;
  } else
  {
    fprintf(stderr, "%s\n", cmd.c_str());
    return 0;
  }
}

static int isis2_common_query(std::string msg, std::string * reply)
{
  struct sockaddr_in address;
  struct in_addr inadr;
  struct hostent *host;
  int sockfd, n;
  char buf[51];

  if (inet_aton(sysvar_isis2d_daemon_host, &inadr))
  {
    host = gethostbyaddr((char*) &inadr, sizeof (inadr), AF_INET);
  } else
  {
    host = gethostbyname(sysvar_isis2d_daemon_host);
  }
  if (host == NULL)
  {
    fprintf(stderr, "Failed to resolve Isis2 daemon host name\n");
    return -1;
  }
  address.sin_family = AF_INET;
  address.sin_port = htons(sysvar_isis2d_daemon_port);
  memcpy(&address.sin_addr, host->h_addr_list[0], sizeof (address.sin_addr));

  if ((sockfd = socket(PF_INET, SOCK_STREAM, 0)) < 0)
  {
    fprintf(stderr, "socket() failed\n");
    return -1;
  }
  if (connect(sockfd, (struct sockaddr*) &address, sizeof (address) ))
  {
    fprintf(stderr, "connect() failed\n");
    return -1;
  }
  msg.append("\r\n");
  if ((unsigned int) write(sockfd, msg.c_str(), msg.length()) != msg.length())
  {
    fprintf(stderr, "write() failed\n");
    return -1;
  }
  while ((n = read(sockfd, buf, sizeof (buf))) > 0)
  {
    for (int i = 0; i < n; i++)
    {
      reply->push_back(buf[i]);
    }
  }
  close(sockfd);
  return 0;
}

static int isis2_common_reply_is_error(std::string * reply)
{
  int isis_errno = -1;
  std::size_t found = reply->find_first_of(" ");
  if (found != std::string::npos)
  {
    isis_errno = atoi(reply->substr(0, found).c_str());
    *reply = reply->substr(found + 1);
  }
  return isis_errno;
}


Loading the Plugins into MySQL

Once you load the plugins into MySQL, the MySQL server will try to register itself with its local Isis2 client using join. Every now and then, it sends a heartbeat. Upon server shutdown or plugin uninstallation, it sends leave.

At any time, on any MySQL server that has the plugins installed you can query the I_S for a list of all MySQL servers in the cluster.


mysql> SELECT * FROM INFORMATION_SCHEMA.ISIS2IS\G
*************************** 1. row ***************************
               ERROR: 
               ERRNO: 0
        ISIS2_MEMBER: (50249)
          MYSQL_HOST: 127.0.0.1
MYSQL_PORT_OR_SOCKET: 3400
        MYSQL_STATUS: on
          MYSQL_ROLE: master
 MYSQL_HEARTBEAT_TTL: 30
MYSQL_HEARTBEAT_LAST: 07.08.2013 13:50:22
1 row in set (0,01 sec)

A fictional user manual for the DBA

Here’s the user manual for the HA solution proposed. For every MySQL server do:

  1. Start Isis2 client: mono ./isis2_daemon.exe
  2. Configure Connector Plugins, e.g. Isis2 client address
  3. Heartbeat to Isis2: INSTALL PLUGIN isis2d SONAME 'libisis2.so'
  4. I_S Plugin: INSTALL PLUGIN isis2is SONAME 'libisis2.so'
  5. Teach your clients to monitor INFORMATION_SCHEMA

Summary

Needless to say, I’m only scratching the surface. Nowhere in my code is a failover script run or a MySQL server reconfigured. I named some possible hooks for such actions. After years and years of extensive work by the MySQL community on how exactly to do the server reconfiguration, that part was of little interest to me. The server side is mostly solved. Just think of the introduction of GTIDs in MySQL 5.6.

Whether you go for an HA setup with or without a single point of failure is a question of demand. For some, a SPOF may be perfectly tolerable. For those requiring five nines – check out MySQL Cluster NDB 7.3. Still, if a SPOF can be avoided, avoid it. Embedding a GCS into plugins cannot cure the fact that a MySQL master is a SPOF, but it can fix the monitoring SPOF. And, if combined with DRBD, a standby master could be used – if, … if you really want it.

Host A:

  • MySQL Master (SPOF)
  • MySQL Isis2d daemon plugin (dumb Isis2 connector)
  • MySQL Isis2is I_S plugin (dumb Isis2 connector)
  • mono ./isis2_daemon.exe (many of them)

Host B and Host C (MySQL Slaves, many of them):

  • MySQL Slave
  • MySQL Isis2d daemon plugin
  • MySQL Isis2is I_S plugin
  • mono ./isis2_daemon.exe

As a Connectors guy, I’m not happy with the current status. All too often I have to reply “manual client deployment” when someone asks me about growing or shrinking a MySQL Replication cluster. Even for failover, virtual IP transfer may not always do the trick. Then again it’s “manual client deployment”.

Deployment is annoying and time consuming. People go for centralized proxies as this means less to deploy. Centralized proxies can become bottlenecks, single points of failure and so forth. If only clients could learn about changes in the cluster topology, then continuously self-configuring client-side load balancers would become possible. However, I have the impression that today’s HA solutions care little about the client side. Weird: a server would be pointless without drivers and clients. If you neglect the clients, you keep ease-of-use low at the end of the day.
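To make the point concrete, here is a sketch of such a client-side load balancer. It assumes topology entries carrying status and role fields like the I_S table exposes; the pickServer name is made up:

```javascript
// Pick a connection target from the current topology: writes go to the
// master, reads are spread over the running slaves (or fall back to the
// master if no slave is available).
function pickServer(servers, forWrite) {
  var alive = servers.filter(function (s) { return s.status === "on"; });
  var master = alive.filter(function (s) { return s.role === "master"; })[0];
  if (forWrite) return master;
  var slaves = alive.filter(function (s) { return s.role === "slave"; });
  if (slaves.length === 0) return master;
  return slaves[Math.floor(Math.random() * slaves.length)];
}

var topology = [
  { host: "hostA", role: "master", status: "on" },
  { host: "hostB", role: "slave",  status: "on" },
  { host: "hostC", role: "slave",  status: "off" }
];
console.log(pickServer(topology, true).host);  // "hostA"
```

Whenever a topology update arrives, the client simply replaces its server list – no redeployment needed.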

Happy hacking!

@Ulf_Wendel Follow me on Twitter

Connecting MySQL and the Dojo Toolkit: MySQL data store

2013/06/24 - by admin - 0 comments

Based on my own limited experience with the Dojo Toolkit, you either love or hate it. At first, I disliked the steep learning curve. After a few days, I began to enjoy it, for example, because of the Dijit interface voodoo: spreadsheet (data grid), auto-completion combobox, effects & animation… Many of these GUI elements can be connected to a dojo store. A store is responsible for accessing and manipulating data.

Spreadsheet

Needless to say how easy it is to populate a sortable, … data grid using a store. Based on the proxy sketched in the previous post, I’ve prototyped two MySQL stores for dojo. One maps to a MySQL table, the other one maps to JSON stored in a BLOB to make MySQL look a bit like a document database.

var store = new WhateverStore(/*...*/)
var ostore = new ObjectStore({ objectStore: store });
var grid = new DataGrid({
  store: ostore,
  query: {},
  queryOptions: {},
  structure: [
    { name: "First Name", field: "first_name", width: "25%" },
    { name: "Last Name", field: "last_name", width: "25%" },
    { name: "Mail", field: "email", width: "50%" }
  ]
  },
  "gridDiv"
);
grid.startup();

Dojo stores

Dojo 1.9 stores are somewhat similar to HTML5/W3C’s IndexedDB object stores. The dojo store API lets you store and fetch objects in a key-value style. Unlike with a plain key-value store, the data model is not limited to a one-dimensional list/array structure. Instead, a parent-child hierarchy can be established between values. Searching is not limited to key lookup: a dojo store can support sorting and complex filtering. There is even the concept of a transaction.
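The API style is easy to see in isolation. The toy below is not dojo code – TinyStore and its shape are made up – but it mimics the get/add/query trio, including the sort option in the form dojo stores pass it:

```javascript
// A toy object store with the dojo/store flavour of get/add/query.
function TinyStore(idProperty) {
  this.idProperty = idProperty || "id";
  this.data = [];
}
// Look up a single object by its identity.
TinyStore.prototype.get = function (id) {
  return this.data.filter(function (o) {
    return o[this.idProperty] === id;
  }, this)[0];
};
// Add an object, return its identity.
TinyStore.prototype.add = function (object) {
  this.data.push(object);
  return object[this.idProperty];
};
// Filter by exact property match, then apply dojo-style sort options.
TinyStore.prototype.query = function (query, options) {
  var result = this.data.filter(function (o) {
    return Object.keys(query || {}).every(function (k) { return o[k] === query[k]; });
  });
  if (options && options.sort) {
    var s = options.sort[0];
    result.sort(function (a, b) {
      var cmp = a[s.attribute] < b[s.attribute] ? -1 : (a[s.attribute] > b[s.attribute] ? 1 : 0);
      return s.descending ? -cmp : cmp;
    });
  }
  return result;
};

var store = new TinyStore("id");
store.add({ id: 1, last_name: "Wendel" });
store.add({ id: 2, last_name: "Appelt" });
console.log(store.get(1).last_name);                                     // "Wendel"
console.log(store.query({}, { sort: [{ attribute: "last_name" }] })[0].id); // 2
```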

The two MySQL store examples implement no more than the most basic functionality. Both ignore transactions and hierarchical storage. Transactions would not be difficult to support. The nested set model could be used for mapping trees to flat relational tables. Associating arbitrary meta data falls into the same category: it could be implemented.

Dojo store API                 JsonpMySQLFields                  JsonpMySQL (BLOB)
get(id)                        yes                               yes
query(query, options)
  • searching/filtering        yes, key only                     yes, key only
  • sorting                    yes                               no
  • offsets (e.g. pagination)  yes                               no
put(object, options)           yes, no hierarchical structure    yes, no hierarchical structure
add(object, options)           yes                               yes
remove(id)                     yes                               yes
getIdentity(object)            yes                               yes
queryEngine(query, options)    yes, default                      yes, default
transaction()                  not implemented                   not implemented
getChildren(object, options)   not implemented                   not implemented
getMetadata(object)            not implemented                   not implemented

JSON object in BLOB: sure, but…

A MySQL dojo store that maps a table (JsonpMySQLFields) could support all of the dojo store API but all objects would have to have the same structure. The object must map to the table schema. When storing JSON serialized objects in a BLOB column (JsonpMySQL), there is no schema to observe. However, searching and sorting stored objects efficiently becomes almost impossible. MySQL lacks SQL functions to query a JSON object/document stored in a BLOB. Serializing an object to XML, for which MySQL has XML functions, was out of scope. It could be worth trying, some hints are here.

JsonpMySQLFields: mapped table approach

The complete source code of the JsonpMySQLFields dojo data store is below. The store prototype assumes the use of the PHP proxy script from the previous post. The PHP proxy script takes arbitrary SQL statements as a GET parameter, executes the SQL and returns the result as a JSON document. As noted in the last post, you should ask yourself twice whether the proxy should really allow execution of arbitrary SQL or whether a proper REST API would be more appropriate. Whatever your take is, the direct SQL approach is nice for prototyping: implementing a dojo store boils down to dynamically generating some SQL statements and executing them through a proxy script. The details have been described in the last posting already.

define("dojo/store/JsonpMySQLFields", ["../number", "../_base/array", "../string", "../request/script", "../when", "../_base/xhr", "../_base/lang", "../json", "../_base/declare", "./util/QueryResults" /*=====, "./api/Store" =====*/
], function(number, array, string, script, when, xhr, lang, JSON, declare, QueryResults /*=====, Store =====*/){

// No base class, but for purposes of documentation, the base class is dojo/store/api/Store
var base = null;
/*===== base = Store; =====*/

/*=====
var __HeaderOptions = {
		// headers: Object?
		//		Additional headers to send along with the request.
	},
	__PutDirectives = declare(Store.PutDirectives, __HeaderOptions),
	__QueryOptions = declare(Store.QueryOptions, __HeaderOptions);
=====*/

return declare("dojo.store.JsonpMySQLFields", base, {
	// summary:
	//		This is a basic store for RESTful communicating with a server through JSON
	//		formatted data. It implements dojo/store/api/Store.

	constructor: function(options){
		// summary:
		//		This is a basic store for RESTful communicating with a server through JSON
		//		formatted data.
		// options: dojo/store/JsonRest
		//		This provides any configuration information that will be mixed into the store
		declare.safeMixin(this, options);
	},

	method: "http",
	host: "127.0.0.1",
	port: 8080,
	interface: "sql",
	basicAuthUser: "root",
	basicAuthPassword: "secret",
	mysqlTable: "dojo_jsonp_fields",
	mysqlFields: [],
	mysqlIdProperty: "dojo_id",

	// idProperty: String
	//		Indicates the property to use as the identity property. The values of this
	//		property should be unique.
	idProperty: "id",

	// sortParam: String
	//		The query parameter to used for holding sort information. If this is omitted, than
	//		the sort information is included in a functional query token to avoid colliding
	//		with the set of name/value pairs.


	get: function(oid){
		// summary:
		//		Retrieves an object by its identity. This will trigger a GET request to the server using
		//		the url `this.target + id`.
		// id: Number
		//		The identity to use to lookup the object
		// returns: Promise
		//		The object in the store that matches the given id.

		// SQL INJECTION
		var sql = "SELECT " + this.mysqlIdProperty + "," + this.mysqlFields.toString();
		sql += " FROM " + this.mysqlTable + " WHERE dojo_id = " + number.parse(oid);
		return when(
				script.get(
					this._getAddress(sql),
					{jsonp: "jsonp"}
				).then(lang.hitch(this, this._extractFirstRow))
			);
	},

	getIdentity: function(object){
		// summary:
		//		Returns an object's identity
		// object: Object
		//		The object to get the identity from
		// returns: Number
		return object[this.idProperty];
	},

	put: function(object, options){
		// summary:
		//		Stores an object. This will trigger a PUT request to the server
		//		if the object has an id, otherwise it will trigger a POST request.
		// object: Object
		//		The object to store.
		// options: __PutDirectives?
		//		Additional metadata for storing the data.  Includes an "id"
		//		property if a specific id is to be used.
		// returns: dojo/_base/Deferred

		// Store.PutDirectives = declare(null, {
		// summary:
		//		Directives passed to put() and add() handlers for guiding the update and
		//		creation of stored objects.
		// id: String|Number?
		//		Indicates the identity of the object if a new object is created
		// before: Object?
		//		If the collection of objects in the store has a natural ordering,
		//		this indicates that the created or updated object should be placed before the
		//		object specified by the value of this property. A value of null indicates that the
		//		object should be last.
		// parent: Object?,
		//		If the store is hierarchical (with single parenting) this property indicates the
		//		new parent of the created or updated object.
		// overwrite: Boolean?
		//		If this is provided as a boolean it indicates that the object should or should not
		//		overwrite an existing object. A value of true indicates that a new object
		//		should not be created, the operation should update an existing object. A
		//		value of false indicates that an existing object should not be updated, a new
		//		object should be created (which is the same as an add() operation). When
		//		this property is not provided, either an update or creation is acceptable.
		options = options || {};

		var sql = "";
		var values = "";
		var id = ("id" in options) ? options.id : this.getIdentity(object);
		var hasId = typeof id != "undefined";

		if (("overwrite" in options) && options["overwrite"]) {
			if (!hasId) {
				throw "You must provide the id of the object to update";
			}

			array.forEach(this.mysqlFields, lang.hitch(this, function (field) {
				if (field in object) {
					values += field + "=";
					values += "'" + this._escapeString(object[field]) + "', ";
				}
			}));
			if (values.length == 0) {
				throw "Object has no known property for SQL column mapping";
			}

			sql = "UPDATE " + this.mysqlTable + " SET " + values + " version = version + 1";
			sql += " WHERE " + this.mysqlIdProperty + "= " + number.parse(id);

		} else {
			var fields = "";

			if (hasId) {
				fields += this.mysqlIdProperty + ", ";
				values += number.parse(id) + ", "
			}

			array.forEach(this.mysqlFields, lang.hitch(this, function (field) {
				if (field in object) {
					fields += field + ", ";
					values += "'" + this._escapeString(object[field]) + "', "
				}
			}));
			if (fields.length == 0) {
				throw "Object has no known property for SQL column mapping";
			}

			sql = "INSERT INTO " + this.mysqlTable + "(";
			sql += fields.substring(0, fields.length - 2);
			sql += ") VALUES (" + values.substring(0, values.length - 2) + ")";
		}
		return when(
			script.get(
					this._getAddress(sql),
					{jsonp: "jsonp" }
				).then(
					function (reply) {
						if (reply && "last_insert_id" in reply)
							return reply.last_insert_id;

						return reply;
					}
				)
		       );
	},

	add: function(object, options){
		// summary:
		//		Adds an object. This will trigger a PUT request to the server
		//		if the object has an id, otherwise it will trigger a POST request.
		// object: Object
		//		The object to store.
		// options: __PutDirectives?
		//		Additional metadata for storing the data.  Includes an "id"
		//		property if a specific id is to be used.
		options = options || {};
		options.overwrite = false;
		return this.put(object, options);
	},

	remove: function(id, options){
		// summary:
		//		Deletes an object by its identity. This will trigger a DELETE request to the server.
		// id: Number
		//		The identity to use to delete the object
		// options: __HeaderOptions?
		//		HTTP headers.
		options = options || {};
		var sql = "DELETE FROM " + this.mysqlTable + " WHERE " + this.mysqlIdProperty + "=" + number.parse(id);
		return when(
			script.get(
					this._getAddress(sql),
					{jsonp: "jsonp" }
				).then(
					function (reply) {
						if (reply && "errno" in reply) {
							return reply;
						}
						return;
					}
				)
		       );
	},

	query: function(query, options){
		// summary:
		//		Queries the store for objects. This will trigger a GET request to the server, with the
		//		query added as a query string.
		// query: Object
		//		The query to use for retrieving objects from the store.
		// options: __QueryOptions?
		//		The optional arguments to apply to the resultset.
		// returns: dojo/store/api/Store.QueryResults
		//		The results of the query, extended with iterative methods.

		options = options || {};
		var sql = "SELECT " + this.mysqlFields.toString() + ", " + this.mysqlIdProperty;
		sql += " FROM " + this.mysqlTable;

		if (options) {
			if (options.sort && options.sort.length) {
				var order_by = "";
				for (var i = 0; i< options.sort.length; i++) {
					if (order_by.length)
						order_by += ", ";
					var sort = options.sort[i];
					order_by += sort.attribute;
					order_by += (sort.descending) ? " DESC" : " ASC";
				}
				if (order_by.length)
					sql += " ORDER BY " + order_by;
			}

			if (options.start >= 0 || options.count >= 0) {
				var limit = "";
				if (options.start)
					limit += options.start;
				if ("count" in options && options.count != Infinity) {
					if (limit.length)
						limit += ", ";
					limit += options.count;
				}
				if (limit.length)
					sql += " LIMIT " + limit;
			}
		}
		var results = when(
				script.get(
					this._getAddress(sql),
					{jsonp: "jsonp" }
				).then(lang.hitch(this, this._extractAllRows))
			);
		return QueryResults(results);
	},

	_getAddress : function(query) {
		return this.method + "://" + this.basicAuthUser + ":" + this.basicAuthPassword + "@" +
			this.host + ":" + this.port + "/" + this.interface + encodeURIComponent(query);
	},

	_extractRows : function(result, limit) {
		var data_only = new Array();
		var result_idx, row_idx;
		var object;

		if (result && "errno" in result) {
			data_only.push(result);
			return data_only;
		}

		for (result_idx = 0; result_idx < result.length; result_idx++) {
			if ("errno" in result[result_idx]) {
				data_only.push(result[result_idx]);
			} else {
				for (row_idx = 0; row_idx < result[result_idx].data.length; row_idx++) {
					if ((limit > 0) && (row_idx >= limit)) {
						return data_only;
					}
					var tmp = {};
					array.forEach(result[result_idx].data[row_idx], function (value, column_idx) {
						tmp[result[result_idx].meta[column_idx].column] = value;
					});
					data_only.push(tmp);
				}
			}
		}
		return data_only;
	},

	_extractAllRows: function (result) {
		return this._extractRows(result, -1);
	},

	_extractFirstRow: function (result) {
		return this._extractRows(result, 1);
	},

	_escapeString: function (sql_value) {
		sql_value = sql_value.toString();
		/* escape double quotes only - see the note below, not production ready */
		return sql_value.replace(/"/g, '\\"');
	}
});

});

The store invokes the proxy script using dojo/request/script. The call contains a dojo feature that has not been discussed before: hitching. The snippet below performs the asynchronous HTTP proxy call and then calls _extractAllRows() to convert the result into the format expected by the dojo store API.

/* ... */
script.get(
  this._getAddress(sql),
  {jsonp: "jsonp" }
).then(lang.hitch(this, this._extractAllRows))
/* ... */
_extractAllRows: function (result) {
  return this._extractRows(result, -1);
},


Due to the asynchronous execution model, _extractAllRows() may execute in a different context than the one in which it was defined. Thus, this may point to a different object and _extractAllRows() may fail, for example, because this._extractRows is undefined in the execution context. hitch() solves the problem and forces the function to retain its original context.
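For this purpose, hitch() behaves much like the standard Function.prototype.bind(). A minimal sketch (the extractor object and its method are made up):

```javascript
// A method loses its object when it is passed around as a plain callback.
var extractor = {
  limit: 1,
  extractRows: function (rows) { return rows.slice(0, this.limit); }
};

var detached = extractor.extractRows;
// detached([1, 2, 3]) would look up this.limit on the wrong object -
// exactly the failure mode hitch() protects against.

// For this purpose, lang.hitch(extractor, extractor.extractRows) behaves
// like the standard bind():
var hitched = extractor.extractRows.bind(extractor);
console.log(hitched([1, 2, 3]));  // [ 1 ]
```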

The remainder of the sample store should bear no surprises. Note the incomplete _escapeString() method – it is not production ready code. It may not even be clean or beautiful code, but it should help you get started.
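If you want to go a step beyond the double-quote-only _escapeString(), a slightly less naive sketch could cover the characters MySQL string literals care about. This is still no substitute for a real escaping API or for prepared statements; escapeSqlString is a made-up helper:

```javascript
// Escape the characters that matter inside a MySQL string literal:
// NUL, newline, carriage return, Ctrl-Z, backslash and both quote styles.
function escapeSqlString(value) {
  return String(value).replace(/[\0\n\r\\'"\x1a]/g, function (ch) {
    switch (ch) {
      case "\0":   return "\\0";
      case "\n":   return "\\n";
      case "\r":   return "\\r";
      case "\x1a": return "\\Z";
      default:     return "\\" + ch;  // backslash and both quote characters
    }
  });
}

console.log(escapeSqlString("O'Reilly"));  // O\'Reilly
```

Even then, identifiers (table and column names) concatenated into the SQL remain unprotected – which is why the store carries its SQL INJECTION warning.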

Putting things together: a data grid demo

As you are digging deeper into dojo, it is handy to have a copy of dojo on your local system instead of using a CDN service. For a data grid HTML demo, use cut&paste to copy the example JsonpMySQLFields store into a file JsonpMySQLFields.js and save it in /path/to/your_dojo_installation/dojo/store. Then, create an HTML document and load the necessary dojo modules using require().

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <link rel="stylesheet" href="/home/nixnutz/Downloads/dojo-release-1.9.0/dijit/themes/claro/claro.css">
  <style type="text/css">
   @import "/home/nixnutz/Downloads/dojo-release-1.9.0/dojox/grid/resources/Grid.css";
   @import "/home/nixnutz/Downloads/dojo-release-1.9.0/dojox/grid/resources/claroGrid.css";

   #gridDiv {
     height: 10em;
   }
  </style>
</head>
<body class="claro">
  <script src="/home/nixnutz/Downloads/dojo-release-1.9.0/dojo/dojo.js"
               data-dojo-config="async: true"></script>
 <script>
  require(
    ["dojo/dom-construct", "dojo/parser", "dijit/Dialog", "dojox/grid/DataGrid", "dojo/data/ObjectStore", "dijit/form/Select", "dojo/store/Memory", "dojo/dom", "dojo/on",
    "dojo/when", "dojo/request/script", "dojo/request/notify", "dojo/json",
    "dojo/store/JsonpMySQLFields", "dojo/domReady!"],
    function(domConstruct, parser, Dialog, DataGrid, 
      ObjectStore, Select, Memory, dom, on, when, 
      script, notify, JSON, mysqlp) {
 [...]


Do not forget to load the CSS required by the Dijit data grid.

Getting all the pieces in place is way more complicated than the actual task of populating a data grid with the contents of a MySQL table. Displaying a MySQL table as a sortable spreadsheet boils down to the following lines of code. First, create an object of the type JsonpMySQLFields with new mysqlp (hint: the require() – see above – is responsible for the name "mysqlp"). Tell the object where to find the proxy script and which SQL table and SQL columns to use for the mapping. Convert the store into an object store and pass it to the DataGrid constructor. Tell the grid which div to render to and call startup().

    var store = new mysqlp({
            method: "http",
            host: "127.0.0.1",
            port: 8080,
            interface: "index.php?sql=",
            basicAuthUser: "root",
            basicAuthPassword: "secret",
            mysqlTable: "dojo_jsonp_fields",
            mysqlFields: ["first_name", "last_name", "email"]
         });
        var ostore = new ObjectStore({ objectStore: store });
        var grid = new DataGrid(
          {
            store: ostore,
            query: {},
            queryOptions: {},
            structure: [
              { name: "First Name", field: "first_name", width: "25%" },
              { name: "Last Name", field: "last_name", width: "25%" },
              { name: "Mail", field: "email", width: "50%" }
            ]
          }, "gridDiv");
        grid.startup();

This is the SQL table used with the above:

CREATE TABLE `dojo_jsonp_fields` (
  `dojo_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `first_name` varchar(255) DEFAULT '',
  `last_name` varchar(255) DEFAULT '',
  `email` varchar(255) DEFAULT '',
  `modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `version` int(10) unsigned DEFAULT '1',
  PRIMARY KEY (`dojo_id`)
)

Add a few bells and whistles (buttons to add, update, remove records) and you get the complete HTML document used for the screenshot on top. Please understand the reason for my brevity with regards to a general dojo introduction: neither am I a dojo expert, nor would such an introduction be much related to MySQL, which is at the core of my interest.

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <link rel="stylesheet" href="/home/nixnutz/Downloads/dojo-release-1.9.0/dijit/themes/claro/claro.css">
  <style type="text/css">
   @import "/home/nixnutz/Downloads/dojo-release-1.9.0/dojox/grid/resources/Grid.css";
   @import "/home/nixnutz/Downloads/dojo-release-1.9.0/dojox/grid/resources/claroGrid.css";

   #gridDiv {
     height: 10em;
   }
  </style>
</head>
<body class="claro">
  <script src="/home/nixnutz/Downloads/dojo-release-1.9.0/dojo/dojo.js"
               data-dojo-config="async: true"></script>
  <script>
  require(
    ["dojo/dom-construct", "dojo/parser", "dijit/Dialog", "dojox/grid/DataGrid", "dojo/data/ObjectStore", "dijit/form/Select", "dojo/store/Memory", "dojo/dom", "dojo/on",
    "dojo/when", "dojo/request/script", "dojo/request/notify", "dojo/json",
    "dojo/store/JsonpMySQLFields", "dojo/domReady!"],
    function(domConstruct, parser, Dialog, DataGrid, ObjectStore, Select, Memory, dom, on, when, script, notify, JSON, mysqlp) {
        /* Dojo newbies - consider this the main() function */

       function debugMsg(action, msg) {
          domConstruct.place(
            "<div>[click " + clicks + "][" + action + "] " + msg + "</div>",
            debugDiv,
            "after"
          );
        }

        var debugDiv = dom.byId("debugDiv");
        var formDiv = dom.byId("formDiv");
        var clicks = 0;
        var addedIds = new Array();

        var store = new mysqlp({
            method: "http",
            host: "127.0.0.1",
            port: 8080,
            interface: "index.php?sql=",
            basicAuthUser: "root",
            basicAuthPassword: "secret",
            mysqlTable: "dojo_jsonp_fields",
            mysqlFields: ["first_name", "last_name", "email"]
         });
        var ostore = new ObjectStore({ objectStore: store });
        var grid = new DataGrid(
          {
            store: ostore,
            query: {},
            queryOptions: {},
            structure: [
              { name: "First Name", field: "first_name", width: "25%" },
              { name: "Last Name", field: "last_name", width: "25%" },
              { name: "Mail", field: "email", width: "50%" }
            ]
          }, "gridDiv");
        grid.startup();


         on(dom.byId('queryButton'), 'click', function (evt) {
           /* Dump some records into debug log */
           clicks++;
           var records = store.query(
              "",
              {
                start: 1,
                count: 5,
                sort: [{attribute:"last_name", descending: false}]
              });
            records.forEach(function (each) {
              debugMsg("dump", JSON.stringify(each));
            });
         });

         on(dom.byId('addButton'), 'click', function (evt) {
           /* Add a record */
           clicks++;
           store.add(
                {"first_name" : "Johannes (" + clicks + ")", "last_name" : "Schlueter"}
              ).then(
                function (reply) {
                  addedIds.push(reply);
                  debugMsg("add", JSON.stringify(reply));
                  grid.render();
                }
              );
         });

         on(dom.byId('removeButton'), 'click', function (evt) {
           /* Remove freshly added button */
           clicks++;
           if (addedIds.length) {
             store.remove(addedIds.pop()).then(
               function (reply) {
                 debugMsg("remove", "OK, " + JSON.stringify(reply));
                 grid.render();
               }
             );
           } else {
             debugMsg("remove", "All added records already removed");
           }
         });

         on(dom.byId('updateButton'), 'click', function (evt) {
           /* Add and update freshly added... */
           clicks++;
           store.put(
              {"first_name" : "Andrey (" + clicks + ")"}
              ).then(
                function (reply) {
                  var insert_id = reply;
                  debugMsg("update, step 1 (add)", JSON.stringify(reply));
                  store.get(insert_id).then(
                    function (reply) {
                      debugMsg("update, step 2 (fetch ... to demo fetch ;-))", JSON.stringify(reply));
                      store.put(
                        {"first_name" : "Andrey (" + clicks + ")", "last_name" : "Hristov"},
                        {"overwrite": true, "id": insert_id}
                      ).then(
                        function (reply) {
                          debugMsg("update, step 3 (the actual update)", JSON.stringify(reply));
                          grid.render();
                        }
                      );
                    }
                  );
                }
              );
         });
    }
  );
  </script>
  <h1>MySQL spreadsheet</h1>
  <form>
     <fieldset>
       <legend>Grid/spreadsheet</legend>
       <div id="gridDiv"></div>
     </fieldset>
     <fieldset>
        <legend>Actions</legend>
        <input type="button" id="queryButton" value="Dump some records into debug log"><br />
        <input type="button" id="addButton" value="Add a record">
        <input type="button" id="removeButton" value="Remove added record">
        <input type="button" id="updateButton" value="Add, read and update a record">
     </fieldset>
     <fieldset>
       <legend>Debug Log</legend>
       <span style="overflow: auto; height: 5em">
         <div id="debugDiv"></div>
       </span>
     </fieldset>
 </form>

</body>
</html>

JsonpMySQL: JSON in BLOB

Finally a prototype of a dojo store for storing objects as JSON in a BLOB column of a MySQL table. Again, it is written for use with the PHP proxy script presented earlier.

define("dojo/store/JsonpMySQL", ["../number", "../string", "../request/script", "../when", "../_base/xhr", "../_base/lang", "../json", "../_base/declare", "./util/QueryResults" /*=====, "./api/Store" =====*/
], function(number, string, script, when, xhr, lang, JSON, declare, QueryResults /*=====, Store =====*/){

// No base class, but for purposes of documentation, the base class is dojo/store/api/Store
var base = null;
/*===== base = Store; =====*/

/*=====
var __HeaderOptions = {
		// headers: Object?
		//		Additional headers to send along with the request.
	},
	__PutDirectives = declare(Store.PutDirectives, __HeaderOptions),
	__QueryOptions = declare(Store.QueryOptions, __HeaderOptions);
=====*/

return declare("dojo.store.JsonpMySQL", base, {
	// summary:
	//		This is a basic store for RESTful communicating with a server through JSON
	//		formatted data. It implements dojo/store/api/Store.

	constructor: function(options){
		// summary:
		//		This is a basic store for RESTful communicating with a server through JSON
		//		formatted data.
	// options: Object?
		//		This provides any configuration information that will be mixed into the store
		declare.safeMixin(this, options);
	},



	method: "http",
	host: "127.0.0.1",
	port: 8080,
	interface: "sql",
	basicAuthUser: "root",
	basicAuthPassword: "secret",
	mysqlTable: "dojo_jsonp",
	mysqlBlob: "dojo_blob",
	mysqlId: "dojo_id",


	// target: String
	//		The target base URL to use for all requests to the server. This string will be
	//		prepended to the id to generate the URL (relative or absolute) for requests
	//		sent to the server
	target: "",

	// idProperty: String
	//		Indicates the property to use as the identity property. The values of this
	//		property should be unique.
	idProperty: "id",

	// sortParam: String
	//		The query parameter to be used for holding sort information. If this is omitted, then
	//		the sort information is included in a functional query token to avoid colliding
	//		with the set of name/value pairs.

	get: function(oid){
		// summary:
		//		Retrieves an object by its identity. This will trigger a GET request to the server using
		//		the url `this.target + id`.
		// id: Number
		//		The identity to use to lookup the object
		// returns: Promise
		//		The object in the store that matches the given id.
		var sql = "SELECT " + this.mysqlId + ", " + this.mysqlBlob + " FROM ";
		sql += this.mysqlTable + " WHERE " + this.mysqlId + "=" + number.parse(oid);

		return when(
				script.get(
					this._getAddress(sql),
					{jsonp: "jsonp"}
				).then(lang.hitch(this, this._extractFirstObject))
			);
	},

	getIdentity: function(object){
		// summary:
		//		Returns an object's identity
		// object: Object
		//		The object to get the identity from
		// returns: Number
		return object[this.idProperty];
	},

	put: function(object, options){
		// summary:
		//		Stores an object. This will trigger a PUT request to the server
		//		if the object has an id, otherwise it will trigger a POST request.
		// object: Object
		//		The object to store.
		// options: __PutDirectives?
		//		Additional metadata for storing the data.  Includes an "id"
		//		property if a specific id is to be used.
		// returns: dojo/_base/Deferred
		/*
		 Store.PutDirectives = declare(null, {
		// summary:
		//		Directives passed to put() and add() handlers for guiding the update and
		//		creation of stored objects.
		// id: String|Number?
		//		Indicates the identity of the object if a new object is created
		// before: Object?
		//		If the collection of objects in the store has a natural ordering,
		//		this indicates that the created or updated object should be placed before the
		//		object specified by the value of this property. A value of null indicates that the
		//		object should be last.
		// parent: Object?,
		//		If the store is hierarchical (with single parenting) this property indicates the
		//		new parent of the created or updated object.
		// overwrite: Boolean?
		//		If this is provided as a boolean it indicates that the object should or should not
		//		overwrite an existing object. A value of true indicates that a new object
		//		should not be created, the operation should update an existing object. A
		//		value of false indicates that an existing object should not be updated, a new
		//		object should be created (which is the same as an add() operation). When
		//		this property is not provided, either an update or creation is acceptable.
		});
		*/
		options = options || {};
		var sql = "";
		var id = ("id" in options) ? options.id : this.getIdentity(object);
		var hasId = typeof id != "undefined";

		if (("overwrite" in options) && options["overwrite"]) {
			if (!hasId) {
				throw "You must provide the id of the object to update";
			}

			sql  = "UPDATE " + this.mysqlTable + " SET " + this.mysqlBlob + "= '";
			sql += this._escapeString(JSON.stringify(object)) + "'";
			sql += " WHERE " + this.mysqlId + "=" + number.parse(id);

		} else {
			sql = "INSERT INTO " + this.mysqlTable + "(" + this.mysqlBlob;
			if (hasId) {
				sql += ", " + this.mysqlId;
			}
			sql += ") VALUES ('" + this._escapeString(JSON.stringify(object)) + "'";
			if (hasId) {
				sql += ", " + number.parse(id);
			}
			sql += ")";
		}

		return when(
			script.get(
					this._getAddress(sql),
					{jsonp: "jsonp" }
				).then(
					function (reply) {
						if (reply && "last_insert_id" in reply)
							return reply.last_insert_id;
						return reply;
					}
				)
		       );
	},

	add: function(object, options){
		// summary:
		//		Adds an object. This will trigger a PUT request to the server
		//		if the object has an id, otherwise it will trigger a POST request.
		// object: Object
		//		The object to store.
		// options: __PutDirectives?
		//		Additional metadata for storing the data.  Includes an "id"
		//		property if a specific id is to be used.
		options = options || {};
		options.overwrite = false;
		return this.put(object, options);
	},

	remove: function(id, options){
		// summary:
		//		Deletes an object by its identity. This will trigger a DELETE request to the server.
		// id: Number
		//		The identity to use to delete the object
		// options: __HeaderOptions?
		//		HTTP headers.
		options = options || {};
		var sql = "DELETE FROM " + this.mysqlTable + " WHERE " + this.mysqlId + "=" + number.parse(id);
		return when(
			script.get(
					this._getAddress(sql),
					{jsonp: "jsonp" }
				).then(
					function (reply) {
						if (reply && "errno" in reply) {
							return reply;
						}
						return;
					}
				)
		       );
	},

	query: function(query, options){
		// summary:
		//		Queries the store for objects. This will trigger a GET request to the server, with the
		//		query added as a query string.
		// query: Object
		//		The query to use for retrieving objects from the store.
		// options: __QueryOptions?
		//		The optional arguments to apply to the resultset.
		// returns: dojo/store/api/Store.QueryResults
		//		The results of the query, extended with iterative methods.
		var sql = "SELECT " + this.mysqlId + ", " + this.mysqlBlob;
		sql += " FROM " + this.mysqlTable;

		var results = when(
				script.get(
					this._getAddress(sql),
					{jsonp: "jsonp" }
				).then(lang.hitch(this, this._extractAllObjects))
			);
		return QueryResults(results);
	},

	_getAddress : function(query) {
		return this.method + "://" + this.basicAuthUser + ":" + this.basicAuthPassword + "@" +
			this.host + ":" + this.port + "/" + this.interface + encodeURIComponent(query);
	},

	_extractObjects : function(result, limit) {
		var data_only = new Array();
		var result_idx, row_idx;

		for (result_idx = 0; result_idx < result.length; result_idx++) {
			for (row_idx = 0; row_idx < result[result_idx].data.length; row_idx++) {
				if ((limit > 0) && (row_idx >= limit)) {
					return data_only;
				}
				data_only.push(JSON.parse(result[result_idx].data[row_idx][1]));
			}
		}
		return data_only;
	},
	_extractAllObjects : function (result) {
		return this._extractObjects(result, -1);
	},
	_extractFirstObject: function (result) {
		return this._extractObjects(result, 1);
	},

	_escapeString: function (sql_value) {
		sql_value = sql_value.toString();
		/* Escape double quotes; note: a regex literal, not a quoted string */
		return sql_value.replace(/"/g, '\\"');
	}
});

});

The name of the SQL table, the name of the BLOB column and the name of the ID column can be set through the store's constructor. Otherwise it is just a variation of the theme "SQL over HTTP", illustrated with a specific JavaScript framework…

var store = new mysqlp({
  method: "http",
  host: "127.0.0.1",
  port: 8080,
  interface: "index.php?sql=",
  basicAuthUser: "root",
  basicAuthPassword: "secret",
  mysqlTable: "name_of_the_table",
  mysqlBlob: "blob_column",
  mysqlId: "id_column"
});

Happy hacking!

@Ulf_Wendel Follow me on Twitter

Connecting MySQL and the Dojo Toolkit (JavaScript): basics

2013/06/21 - by admin - 2 Comments

Over the years JavaScript has become an integral part of every web developer's life. The Dojo Toolkit is one of many JavaScript frameworks offering language goodies and interface voodoo. It may not be the most used framework but there is a certain beauty to it – once you get used to it. The steep learning curve may be a bit in the way of its success. However, to demo how to connect from browser-side JavaScript to MySQL it is about perfect, as all its bells and whistles show what can be done, given enough time…

Feel free to skip all the recap sections, if you are familiar with the very basics of AJAX/JSONP.

Recap: JavaScript in a LAMP application

If your web site is driven by MySQL and a mix of a script language (PHP, Python, Perl, Ruby, …), and you plan to have quite some JavaScript running in the browser, you are confronted with a challenge. JavaScript running as part of an HTML page cannot connect to MySQL. JavaScript is limited to making HTTP requests but MySQL does not speak HTTP. Thus, JavaScript cannot connect to MySQL directly. It takes a proxy to connect them.

JavaScript <- HTTP protocol -> Proxy, e.g. Apache/PHP
                                 ^
                                 | MySQL protocol (binary)
                                 v
                               MySQL Server

There are two tasks for you as a developer: issue an HTTP request from JavaScript, create a proxy to handle the requests.

Recap: A proxy to handle requests

In an ideal world you would sit down and design a service interface for your JavaScript application that is then implemented by the proxy and offered as a RESTful service. To keep the service minimal, you would not offer a single call that is not needed by your application. The smaller the interface, the fewer bugs there can be and the lower the risk of abuse. You would map a SQL SELECT to an HTTP GET request and so forth. Do that for production use!

During development, a simple PHP script accepting a SQL command as a GET parameter, for example like http://127.0.0.1/index.php?sql=SELECT%201, may do the trick. It also does the trick if all you want is to demo how JavaScript and MySQL can be glued together ;-). WARNING: do not use in production, unless you are fully aware of the security implications.
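From the JavaScript side, addressing such a proxy is nothing but URL-encoding the SQL statement into a GET parameter. A plain-JavaScript sketch (the index.php endpoint matches the development proxy described here):

```javascript
// Build the development proxy URL: the SQL statement goes into the
// "sql" GET parameter, URL-encoded.
function proxyUrl(base, sql) {
  return base + "?sql=" + encodeURIComponent(sql);
}
```

For example, proxyUrl("http://127.0.0.1:8080/index.php", "SELECT 1") yields the URL with SELECT%201 as used throughout this post.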

<?php
function reply($msg) {
	$reply = json_encode($msg);
	if (isset($_REQUEST['jsonp']) && $_REQUEST['jsonp']) {
	   $reply = sprintf("%s(%s);", $_REQUEST['jsonp'], $reply);
	}
	header("Content-type: application/json");
	die($reply);
}
if (isset($_REQUEST['sql'])) {
	$mysqli = new mysqli("127.0.0.1", "root", "root", "test", 3307);
	if ($mysqli->connect_errno) {
		reply(array(
		  "errno" => $mysqli->connect_errno,
		  "error" => $mysqli->connect_error,
		  "sqlstate" => "HY000"));
	}


	$stmt = $mysqli->prepare($_REQUEST['sql']);
	if ($mysqli->errno) {
		reply(array(
			"errno" => $mysqli->errno,
			"error" => $mysqli->error,
			"sqlstate" => $mysqli->sqlstate));
	}
	if (!$stmt->execute()) {
		reply(array(
			"errno" => $stmt->errno,
			"error" => $stmt->error,
			"sqlstate" => $stmt->sqlstate));
	}

	$result = array();
	$resultset_idx = 0;
	do {
		if ($res = $stmt->get_result()) {
			/* this could be a stored procedure call returning different result sets */
			$result[$resultset_idx] = array(
				"data" => array(),
				"meta" => array(),
				"status" => array(),
			);
			while ($row = $res->fetch_row()) {
				$result[$resultset_idx]["data"][] = $row;
			}

			$res_meta = $stmt->result_metadata();
			$fields = $res_meta->fetch_fields();
			foreach ($fields as $field) {
				$result[$resultset_idx]["meta"][] = array(
					"type" => $field->type,
					"database" => $field->catalog,
					"table" => $field->table,
					"column" => $field->name
				);
			}

			if ($mysqli->insert_id) {
				$result[$resultset_idx]["status"]["last_insert_id"] = $mysqli->insert_id;
			}
			$result[$resultset_idx]["status"]["warning_count"] = $mysqli->warning_count;
		} else {
			/* Either an error or a statement which has returned no results */
			if ($stmt->errno) {
				reply(array(
					"errno" => $stmt->errno,
					"error" => $stmt->error,
					"sqlstate" => $stmt->sqlstate));
			} else {
				reply(array(
					"warning_count" => $mysqli->warning_count,
					"affected_rows" => $mysqli->affected_rows,
					"last_insert_id" => $mysqli->insert_id,
				));
			}
		}
		$resultset_idx++;
	} while ($stmt->more_results() && $stmt->next_result());

	reply($result);

} else {
  reply(array(
	"errno" => 1065,
	"error" => "Query was empty",
	"sqlstate" => "42000"
  ));
}

Above you can find a sample PHP proxy script which takes an arbitrary SQL command as a GET parameter, executes it and returns the result as a JSON object. The easiest way to test it may be using the PHP built-in web server. Copy the script, save it as index.php and start the PHP built-in web server with the location of the script as the document root.

> php -S 0.0.0.0:8080 -t /home/nixnutz/src/php-src/ >>/tmp/php.log 2>&1 &
> curl "http://127.0.0.1:8080/?sql=SELECT%201"
[{"data":[[1]],"meta":[{"type":8,"database":"def","table":"","column":"1"}],"status":{"warning_count":0}}]

Feel free to change the format of the JSON returned in reply to SELECT 1. I am not aware of any format standards to follow. The example script is just good enough to handle simple queries and calls to stored procedures returning multiple result sets. It should do the trick for prototyping work. However, you probably know your script language but are more curious to see the JavaScript snippets.

[
  {
    "data": [
      [
        1
      ]
    ],
    "meta": [
      {
        "type": 8,
        "database": "def",
        "table": "",
        "column": "1"
      }
    ],
    "status": {
      "warning_count": 0
    }
  }
]
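The reply format keeps row data and column metadata apart. If you prefer rows as objects keyed by column name, the conversion is a few lines of plain JavaScript – a sketch of what a store's row extraction can do with exactly the reply shown above:

```javascript
// Sketch: convert the proxy's reply format (data rows + meta columns)
// into an array of objects keyed by column name.
function rowsToObjects(result) {
  var out = [];
  result.forEach(function (resultset) {
    resultset.data.forEach(function (row) {
      var obj = {};
      row.forEach(function (value, idx) {
        // The column name comes from the meta entry at the same index
        obj[resultset.meta[idx].column] = value;
      });
      out.push(obj);
    });
  });
  return out;
}

// The SELECT 1 reply from above
var reply = [{
  data: [[1]],
  meta: [{ type: 8, database: "def", table: "", column: "1" }],
  status: { warning_count: 0 }
}];
```

rowsToObjects(reply) turns the SELECT 1 reply into [{"1": 1}].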

Recap: JavaScript HTTP request

The easiest way of doing an HTTP request with JavaScript may be using the dedicated XMLHttpRequest object. Together with a proxy, such as the one sketched above, it is the simplest way of connecting from JavaScript to MySQL.

<script language="JavaScript">
  xmlHttp = new XMLHttpRequest();
  xmlHttp.open('GET', 'http://127.0.0.1:8080/index.php?sql=SELECT%20\'Greetings!\'', true);
  xmlHttp.onreadystatechange = function () {
      /* 4 = DONE: the response has arrived completely */
      if (xmlHttp.readyState === 4) {
        document.write("Response: " + xmlHttp.responseText);
      }
  };
  xmlHttp.send(null);
</script>


Let me cite Wikipedia on a major limitation:

XMLHttpRequest is subject to the browser’s same origin policy in that, for security reasons, requests will only succeed if they are made to the same server that served the original web page.

A common solution to the same origin restriction is JSONP: JSON with padding. The idea is simple and takes two steps to understand. First, instead of returning a plain JSON document, the proxy returns a JSON document padded with a function call. Try calling the sample proxy with index.php?sql=SELECT%201&jsonp=callback:

callback(
[
  {
    "data": [
      [
        1
      ]
    ],
    "meta": [
      {
        "type": 8,
        "database": "def",
        "table": "",
        "column": "1"
      }
    ],
    "status": {
      "warning_count": 0
    }
  }
]
);


Second, put a script block in your HTML document which implements a function callback(). Put another script tag in the HTML document which refers to the proxy using the src attribute.

<script language="JavaScript">
function callback(data) {
  document.write("<pre>" + JSON.stringify(data) + "</pre>");
}
</script>
<script language="JavaScript" src="http://127.0.0.1:8080/index.php?sql=SELECT%201&jsonp=callback"></script>


A script tag can load code from an arbitrary URL. What happens is that the second script tag loads the JavaScript script from your proxy, executes it and callback() gets called.
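The padding step itself is trivial. Expressed in JavaScript rather than PHP, the proxy does no more than this (a sketch, not code taken from the proxy script):

```javascript
// Wrap a JSON document into a call to the client-supplied callback name.
function padJson(callbackName, data) {
  return callbackName + "(" + JSON.stringify(data) + ");";
}
```

padJson("callback", [1]) produces the string callback([1]); – valid JavaScript that the browser executes as soon as the script tag has loaded.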

dojo talks to MySQL

As usual, frameworks hide details such as cross-browser compatibility, adding HTML elements to the DOM on the fly, URL encoding and, to some degree, error handling. Below is a complete dojo example utilizing the PHP proxy from above. Upon clicking a button, some SQL queries are run against MySQL to create a table, populate it and fetch results from it. The complete example comes first; a step by step discussion follows below.


<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
</head>
<body>
  <script src="http://ajax.googleapis.com/ajax/libs/dojo/1.8.4/dojo/dojo.js"
               data-dojo-config="async: true"></script>
  <script>
  require(
    ["dojo/dom", "dojo/on", "dojo/request/script", "dojo/request/notify", "dojo/json", "dojo/domReady!"],
    function(dom, on, script, notify, JSON) {
        // Results will be displayed in resultDiv

        function mysql(queries) {
          if (0 == queries.length)
            return;

          var query = queries[0];
          queries.shift();
          alert("Running " + query);

          var url = "http://127.0.0.1:8080/index.php?sql=";
          url += encodeURIComponent(query);
          var promise = script.get(url, {jsonp: "jsonp", query: {jsonp_escape: 1}});
          promise.then(
            function (data) {
              alert("MySQL reply: " + JSON.stringify(data));
              mysql(queries);
            },
            function (err) {            
              alert("(Error) " + JSON.stringify(err));
              mysql(queries);
            }
          );
        }

        // Attach the onclick event handler to the makeRequest button
        on(dom.byId('makeRequest'), "click", function (evt) {
          queries = new Array(
            "DROP TABLE IF EXISTS test",
            "CREATE TABLE test(id INT)",
            "INSERT INTO test(id) VALUES (1), (2), (3)",
            "SELECT id FROM test ORDER BY id DESC"
          );
          mysql(queries);
        });
    }
  );
  </script>
  <form><input type="button" id="makeRequest" value="Click to query MySQL" /></form>
</body>
</html>

As a dojo beginner you will find the usual bits in there to load dojo and to import some modules:


  [...]
  <script src="http://ajax.googleapis.com/ajax/libs/dojo/1.8.4/dojo/dojo.js"
               data-dojo-config="async: true"></script>
  <script>
  [...]
  require(
    ["dojo/dom", "dojo/on", "dojo/request/script", "dojo/request/notify", "dojo/json", "dojo/domReady!"],
    function(dom, on, script, notify, JSON) {

You can either download dojo, put it into your local file system and use a directory path to reference it, or use a CDN. For toying around, a CDN does the trick. The require() function is dojo's current way of loading modules and importing them into whatever function you provide. If this is completely new to you, think of function(dom, on, script, notify, JSON) as your main() function, your program's main body. The parameters you get correspond to the modules that you require.

The last element in the list of required modules is a special one: it delays the start of your "main()" function until the DOM is loaded and you can start manipulating it. For example, you may want to associate a click event handler with a button from the DOM, as is done towards the end of the script.


  [...]
        // Attach the onclick event handler to the makeRequest button
        on(dom.byId('makeRequest'), "click", function (evt) {
          queries = new Array(
            "DROP TABLE IF EXISTS test",
            "CREATE TABLE test(id INT)",
            "INSERT INTO test(id) VALUES (1), (2), (3)",
            "SELECT id FROM test ORDER BY id DESC"
          );
          mysql(queries);
        });
  [...]
 <form>
   <input type="button" id="makeRequest" 
      value="Click to query MySQL" />
  </form>


When the button is clicked, dojo calls our mysql() function which performs a JSONP HTTP request to query MySQL.

Culture clash: this world is asynchronous!

A PHP developer, who is used to synchronous function calls, will have to learn a couple of new language "phrases". Many dojo function calls operate in an asynchronous way. Calling a function does not block execution until the function returns. The main execution flow continues as events arrive asynchronously.

A JSONP HTTP request can be made by means of the dojo script module. It can be used to query the proxy for running a SQL statement. script.get() is a non-blocking, asynchronous call. A call to script.get(proxy_url, [options]) does not block, neither does it return the result of the HTTP request it performs. Instead it returns a dojo promise object.


var promise = script.get(url, {jsonp: "jsonp", query: {jsonp_escape: 1}});  

You can use the promise object to install callback functions that are called when results arrive from the HTTP request or an error occurs.


   promise.then(
     function (reply) { },
     function (error) { }
   );       

A promise is one of those "phrases" that you have to know to master this non-PHP world. And, a promise is at the heart of the mysql() function that makes dojo query MySQL.


  [...]
  var url = "http://127.0.0.1:8080/index.php?sql=";
  url += encodeURIComponent(query);
  var promise = script.get(url, {jsonp: "jsonp", query: {jsonp_escape: 1}});
  promise.then(
    function (data) {
      alert("MySQL reply: " + JSON.stringify(data));
      mysql(queries);
    },
    function (err) {            
      alert("(Error) " + JSON.stringify(err));
      mysql(queries);
    }
 );
 [...]

script.get() hides all the details of the HTTP request from you. You pass in the URL and tell it whether it shall do a JSONP request or something else; it does the rest for you, including the DOM manipulation for inserting the script tags and code shown above in the recap paragraphs.

JSON.stringify serves no other purpose than turning the object you get in reply to your JSONP request into a string that you can easily display, for example, using alert().

Seriously, it is asynchronous…

Upon closer inspection you may notice that the mysql() function uses a recursive approach to run queries. The function takes an array of SQL queries as an input parameter.


 [...]
 queries = new Array(
   "DROP TABLE IF EXISTS test",
   "CREATE TABLE test(id INT)",
   "INSERT INTO test(id) VALUES (1), (2), (3)",
   "SELECT id FROM test ORDER BY id DESC"
 );
 mysql(queries);
 [...]

It shifts the first element off the list, sends a JSONP request through script.get() and registers callbacks for the promise. When the promise executes the callbacks, the result is processed and
mysql() is called with the list of remaining queries.

 function mysql(queries) {
   if (0 == queries.length)
     return;

   var query = queries[0];
   queries.shift();
   [...]
   url += encodeURIComponent(query);
   var promise = script.get(url, {jsonp: "jsonp", query: {jsonp_escape: 1}});
   promise.then(
     function (data) {
       alert("MySQL reply: " + JSON.stringify(data));
       mysql(queries);
     },
   [...]     

Assume you did not wait for the promise to invoke the callbacks but fired off all SQL queries at once. You could do that: script.get() is non-blocking!


 var base = "http://127.0.0.1:8080/index.php?sql=";
 var promise1 = script.get(base + encodeURIComponent("DROP TABLE IF EXISTS test"),
                           {jsonp: "jsonp", query: {jsonp_escape: 1}});
 var promise2 = script.get(base + encodeURIComponent("CREATE TABLE test(id INT)"),
                           {jsonp: "jsonp", query: {jsonp_escape: 1}});

Unfortunately, that is not going to work. The HTTP request for the DROP may arrive at your proxy after the CREATE table one. Thus, the CREATE could fail. The recursive approach makes sure that all SQL statements are processed in the order given.
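The ordering guarantee of the recursive approach can also be expressed by chaining promises. Below is a minimal sketch using plain JavaScript Promises instead of dojo's promise object; runQuery() is a hypothetical stand-in for the script.get() call against the proxy, with a random delay simulating the network:

```javascript
// Hypothetical stand-in for script.get(): resolves after a random delay,
// so concurrently fired "requests" may complete out of order.
function runQuery(sql, log) {
  return new Promise(function (resolve) {
    setTimeout(function () { log.push(sql); resolve(sql); },
               Math.floor(Math.random() * 10));
  });
}

// Chain each statement onto the previous promise: the next request is only
// sent after the previous reply arrived -- the same effect as the
// recursive mysql() function.
function runInOrder(queries, log) {
  return queries.reduce(function (chain, sql) {
    return chain.then(function () { return runQuery(sql, log); });
  }, Promise.resolve());
}

var executed = [];
runInOrder(["DROP TABLE IF EXISTS test",
            "CREATE TABLE test(id INT)",
            "INSERT INTO test(id) VALUES (1), (2), (3)"], executed)
  .then(function () { console.log(executed.join(" | ")); });
```

Despite the random per-request delays, the statements always complete in the order given, because each one is only started from the then() callback of its predecessor.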

Next post: a MySQL dojo store

Armed with this, you should be able to glue the pieces together to get started. Keep in mind that SQL over HTTP bears a risk: the URL of your proxy script is in your HTML document, and your proxy script will accept any SQL – any. For prototyping or in secured environments this may still be very much acceptable. And it should not be difficult to derive a simple REST interface from the cut&paste pieces above.

The next post is on creating a dojo data store for MySQL. Many dojo/dijit widgets, for example the data grid (spreadsheet), accept stores as data sources. A sortable spreadsheet with dojo and MySQL? Doable…

Happy hacking!

@Ulf_Wendel Follow me on Twitter

PHP: PECL/mysqlnd_ms 1.6 – automatic retry loop for transient errors

2013/06/12 - by admin - 0 comments

PECL/mysqlnd_ms is a client-side load balancing driver plugin for PHP MySQL that aims to increase distribution transparency when using any MySQL based cluster: failover, read-write splitting, abstraction on consistency (e.g. read-your-writes), partitioning/sharding support, … it's all there. Until a few minutes ago, we had no special handling of transient errors. Sometimes a database server replies "come back in a bit and retry, no need to fail over yet". And that's what the client shall do before giving up. PECL/mysqlnd_ms 1.6 (development version) is capable of hiding the retry loop, which makes it easier to use any existing PHP MySQL application with a cluster of MySQL servers.

Transient (temporary) errors are rarely observed with MySQL Replication but can be seen with MySQL Cluster. MySQL Cluster is an eager (synchronous) update anywhere (multi-master) cluster: all replicas accept reads and writes, replication is synchronous. See also the slide deck DIY: A distributed database cluster, or: MySQL Cluster for a brief introduction to distributed database theory relevant to MySQL users (presentation from the International PHP Conference 2013 Spring Edition).

Transient errors

MySQL Cluster scales well for write loads because it features transparent sharding (see slides). It automatically partitions data over multiple replicas. Over time, for example when adding replicas to the cluster, data may be redistributed. Rebalancing is an online operation; it does not lock out clients. Thus, you may observe a temporary error such as:

ERROR 1297 (HY000): Got temporary error 1204 'Temporary
failure, distribution changed' from NDBCLUSTER 


There may be other causes for temporary errors as well. In any case, it's safe to ignore a 1297/HY000 and retry the command.

The latest versions of MySQL Cluster feature an implicit retry loop before returning the error to the client, if it is believed that your command is not time critical. That means Cluster resends the command for you a couple of times, with a short sleep period in between, before returning control to the client to tell it about the temporary problem. PECL/mysqlnd_ms 1.6 alpha got a similar loop: very basic and experimental. Here's the idea.

Automatic retry loop

The dream of Andrey when he created PECL/mysqlnd_ms was to make using a cluster transparent. It should be possible to move an application from a single MySQL to a cluster of MySQL servers without code changes. Thus, as a first step, I have opted against offering a callback to decide on errors (like Connector/J does). Instead, it is possible to configure the retry loop in the config file.

The example config snippet instructs the driver plugin to start an implicit command retry loop when there is an error with the error code 1297. It's possible to configure a list of arbitrary error codes. Whenever 1297 happens, the command is retried up to max_retries = 2 times. Between the retry attempts PECL/mysqlnd_ms 1.6 sleeps for usleep_retry = 100 milliseconds. In an ideal world, the temporary error is gone by the end of the wait loop. In the worst case, the error persists and is forwarded to whatever PHP API you use (mysqli, PDO_MySQL), leaving it to your application to deal with it.

{
  "myapp": {
    [...]
    "transient_error": {
      "mysql_error_codes": [
        1297
      ],
      "max_retries": 2,
      "usleep_retry": 100
    }
  }
}
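Conceptually, the loop configured above could be sketched as follows. This is an illustration only, written in JavaScript; the actual logic lives inside the C code of the mysqlnd plugin, and retryTransient(), flakyCommand() and the error shape are made-up names:

```javascript
// Illustrative sketch, not the actual PECL/mysqlnd_ms implementation:
// retry a command when it fails with a configured transient error code.
function retryTransient(execute, transientCodes, maxRetries) {
  for (var attempt = 0; ; attempt++) {
    try {
      return execute();                 // run the command
    } catch (err) {
      var transient = transientCodes.indexOf(err.code) !== -1;
      if (!transient || attempt >= maxRetries) {
        throw err;                      // permanent error, or retries exhausted
      }
      // here the plugin would sleep for usleep_retry before the next attempt
    }
  }
}

// Example: a command that fails once with the transient error 1297.
var calls = 0;
function flakyCommand() {
  calls++;
  if (calls === 1) {
    var e = new Error("Got temporary error 1204 from NDBCLUSTER");
    e.code = 1297;
    throw e;
  }
  return "OK";
}
console.log(retryTransient(flakyCommand, [1297], 2)); // prints "OK"
```

Any error code not on the configured list, or a transient error that survives all retries, propagates to the application unchanged, matching the behaviour described above.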


Please send us your feature requests: this is a "live report" from the hacking and nothing is set in stone.

You can check whether an implicit retry loop has been performed by inspecting the statistics provided by PECL/mysqlnd_ms.

$stats = mysqlnd_ms_get_stats();
printf("Implicit retries to hide transient errors: %d", 
  $stats['transient_error_retries']);

Failover vs. transient error

When talking to a cluster instead of a single machine there are two additional error conditions to handle:

  • Permanent error: replica disappeared, forget about replica – for now: fail over to someone else…
  • Transient error: replica says BRB/BBIAB, retry – replica is synchronizing, data distribution changes, …

PECL/mysqlnd_ms applies failover logic whenever it connects to a replica. This can happen – due to lazy connect – not only when a connect() function of any PHP MySQL API is called but also during query(). At the time of writing, the retry loop is not applied for a connect attempt.

The new 1.6 transient error logic handles error conditions on already established connections. At the time of writing, it only covers query() – it's a safe bet to assume that we will cover all commands before the feature is called stable. Work in progress, comments are welcome.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

PS: The overdue 1.5 stable release is coming soon. We forgot about it, simple as that.

(Slides) DIY – A distributed database cluster, or: MySQL Cluster

2013/06/05 - by admin - 0 comments

Ever wondered how a database cluster ticks? Here's how! During my talk at the International PHP Conference (#ipc13) I tried to build a cluster. I brought a soldering iron and solder, plus pink, orange and brown cables with me. Then, I tried to solder the thick (reliable, high throughput) brown cable to my company notebook (video coming). Eventually, I failed. Probably, I lacked the theoretical background?! Luckily, I had very theoretical slides with me…


No marketing fluff

The talk has 0% marketing fluff. 0% marketing fluff leaves little to say but theory. Thus, the slides explain:

  • The four kinds of database clusters
  • Concurrency control: how conflicts are detected
  • Concurrency control: how conflicts are resolved
  • Why atomic broadcast (Virtual Synchrony) is cool
  • How ROWA compares to Quorums
  • How to split brains
  • Three different architectures for DIY
  • How do MySQL Replication/Cluster solutions differ
  • Why partial replication (partitioning/sharding) is the only way to scale writes
  • Why MySQL Cluster hates PHP apps
  • Why MySQL Cluster is the ultimate NoSQL KVS

All this theory helps to create a mind map of the clustering solutions available for MySQL. I am no support guy, I am no consulting guy, but a curious developer. I cannot tell you how the systems perform in real life, but I may help you understand the basics and ask the real experts nasty questions.

How do MySQL Replication, Tungsten, Galera and Cluster compare?

There are two simple questions to categorize RDBMS replication solutions:

  • Where can any transaction (including updates) be run?
  • When will replicas be synchronized?

This gives you a simple matrix with four squares:

  When? \ Where?   Primary                        Update Anywhere
  Eager            n/a                            MySQL Cluster, Galera
  Lazy             MySQL Replication, Tungsten    MySQL Cluster with Replication

An eager replication system waits for all replicas to – at least – receive the transaction's changes before the transaction can commit. Thus, the commit rate is determined by the network round trip time. In the worst case an eager cluster may expose a lower commit rate than a single database. WAN and eager usually do not go together well.
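A back-of-the-envelope calculation illustrates the point: if a commit cannot complete faster than one network round trip, the round trip time caps the serial, per-connection commit rate. The latency figures below are made up for illustration, not measured:

```javascript
// Upper bound on serial commits per second when each commit costs at
// least one latency unit. Latency numbers are illustrative only.
function maxCommitsPerSec(latencyMs) {
  return Math.floor(1000 / latencyMs);
}

console.log("single node, cached write (0.2 ms):", maxCommitsPerSec(0.2)); // 5000/s
console.log("eager, LAN (0.5 ms RTT):           ", maxCommitsPerSec(0.5)); // 2000/s
console.log("eager, WAN (40 ms RTT):            ", maxCommitsPerSec(40));  // 25/s
```

At WAN latencies the eager cluster's commit rate collapses by two orders of magnitude compared to a single node, which is exactly why WAN and eager usually do not go together well.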

Is synchronous really synchronous…?

Eager or synchronous does not mean zero lag between replicas. How big the lag is depends on the algorithm used. For the average SQL user, Galera is more prone to lagging than MySQL Cluster. Galera executes a transaction on one replica, then sends out the updates and commits. Galera commits before the other replicas have caught up. MySQL Cluster executes transactions in parallel on multiple replicas and distributes pending updates to replicas before it commits.

A SQL user's view on the commit sequence:

  Galera                               MySQL Cluster
  Execute transaction                  Execute transaction (in parallel)
  Distribute updates                   Distribute updates
  Commit                               Wait for update of other replicas
  (Other replicas eventually commit)   Commit

Is Galera cheating by hiding the time it takes other replicas to commit? No, it is all documented! BTW, the “Virtual” in Virtual Synchrony (their algorithm) does not refer to temporal aspects (see slides). MySQL Cluster can easily compete, if you like the Galera style: see the low level NDBAPI documentation.

A lazy replication system commits at the speed of your disks. Often your disks will have lower latency than the round trip time in your network. The downsides are widely known: copies (slaves) may lag behind, eventual consistency, and what happens if the master crashes?

Three different architectures

On a high level there are three different designs for a distributed database cluster: in-core, middleware and hybrid. A cluster can be described using six building blocks. The first three are obvious: clients, load balancer, RDBMS. The next three are: reflector, replicator and GCS. The reflector intercepts transactions in the RDBMS and can control them. It sends all required information about ongoing transactions to the replicator. The (distributed) replicator is responsible for detecting conflicts. The reflector uses a group communication system (GCS) for network communication.

In an in-core design, RDBMS, reflector and replicator go into one process. This is the best possible integration with the lowest overhead. MySQL Replication falls into this group. The same holds for Galera (they have to patch MySQL).

A middleware approach provides a virtual database. It intercepts all statements issued by clients and forwards them to the actual database. Reflector, replicator and GCS may be present but are not part of the database. The challenge is in parsing and handling non-deterministic SQL statements containing functions such as NOW() or TIME(). Also, whenever the database learns a new client protocol, the virtual database has to learn it too. Tungsten is an example of such a solution for MySQL.

A hybrid approach has a reflector plug-in to the RDBMS and a replicator running in its own process. That is pretty much the design of MySQL Cluster. It gets all information it needs from MySQL in an efficient manner.

Handling writes

In the course of the presentation you will learn that there is only one way of scaling writes in a distributed replicated database: partial replication. In a full replication system (e.g. Galera) the formula "more replicas, higher write throughput" is wrong. Every new replica adds load to the entire system, as more update transactions can be started and each of them needs to be certified by all the others.

MySQL Cluster does partial replication: it is using partitioning/sharding. A write does not need to be certified by all other replicas but only by those holding partitions affected by the write. Thus, it can scale writes as well as it can scale reads. Also, multiple replicas can participate in the execution of a transaction and their computing power is combined. The big challenge with this approach is in executing those distributed queries in a way that network traffic is minimized. The slides give more insights.
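The contrast between full and partial replication can be put into a toy capacity model. Assume every replica has a fixed per-second operation budget: under full replication each replica applies the entire write load, while under partial replication each replica only applies the writes hitting its partitions. The functions and numbers below are an illustration of that argument, not a benchmark and not how any of these products computes anything:

```javascript
// Toy model: each replica has a per-second operation budget (perServerOps).
// Full replication (ROWA): every replica applies every write, so write
// capacity is capped by one server; reads get the leftover budget.
function fullReplication(replicas, perServerOps, writeLoad) {
  return { writeCapacity: perServerOps,
           readCapacity:  replicas * (perServerOps - writeLoad) };
}

// Partial replication (partitioning/sharding): writes are spread over the
// replicas, so each replica only applies writeLoad / replicas of them.
function partialReplication(replicas, perServerOps, writeLoad) {
  return { writeCapacity: replicas * perServerOps,
           readCapacity:  replicas * (perServerOps - writeLoad / replicas) };
}

console.log(fullReplication(4, 1000, 400));    // writes stay capped at 1000
console.log(partialReplication(4, 1000, 400)); // writes grow with replicas
```

In the full replication case, adding replicas never raises the write ceiling; it only dilutes each replica's budget with replicated writes. In the partial case both read and write capacity grow with the number of replicas, which is the point the slides make.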

If you have simple key-value style queries, short transactions and few joins, you should have a look at MySQL Cluster. MySQL Cluster is also well suited as a web session store. Thinking that Drupal or WordPress will run a magnitude faster on MySQL Cluster – oh, oh…

Happy hacking!

@Ulf_Wendel Follow me on Twitter