Ulf Wendel

Some sharding support and cache locality optimization support for PHP MySQL driver

It is time for Christmas presents: some sharding support and cache locality optimizations are coming with PECL/mysqlnd_ms 1.5. PECL/mysqlnd_ms is a plugin for the mysqlnd library. The plugin adds replication and load balancing support to any PHP MySQL API (mysql, mysqli, PDO_MySQL) if compiled to use the mysqlnd library.

As a MySQL user you can choose between a wide variety of clustering solutions to scale-out. Your options range from eventual consistent solutions to strong consistent ones, from built-in (MySQL Replication, MySQL Cluster) to third party or home-grown. PECL/mysqlnd_ms is a client side load balancer that aims to serve all.

Background: cluster-wide partitioning and sharding

Divide-and-conquer is a true classic. If you are lucky, your business requires you to spread load and data over many MySQL nodes using some sort of cluster. In many cases read scale-out offered by MySQL Replication is all you need. Any step further means digging deeper into the world of distributed systems.

Writes
MySQL Master
| | |
Slave 1 Slave 2 Slave 3
Reads

Possibly, you need more than the above and look into schema-based partitioning for distributing data over many nodes. MySQL Replication cluster topologies are hardly limited by anything but your imagination. Please mind the writes, because MySQL Replication uses lazy primary copy, but who am I telling this?

Writes for DB1, DB2 go to Master 1; writes for DB3, DB4 go to Master 2.
Reads covering DB1…DB4 can be served by either master.

MySQL Master 1: DB1 (rw), DB2 (rw), DB3 (ro), DB4 (ro), replicates DB1, DB2 –> Master 2
MySQL Master 2: DB1 (ro), DB2 (ro), DB3 (rw), DB4 (rw), replicates DB3, DB4 –> Master 1

Slaves of Master 1: Slave 1 (reads DB1), Slave 2 (reads DB1), Slave 3 (reads DB2)
Slaves of Master 2: Slave 4 (reads DB3), Slave 5 (reads DB4)

Finally, further growth gets you into sharding. Again, you split up a big task into small manageable units. This time the split is not done "vertical" along columns, tables or schemata but horizontally along rows. For example, you distribute all rows of the table T1 from the database DB1 based on their ID column value over multiple nodes.

Writes for DB1.T1 (ID 0…1e9) go to Master 1; writes for DB1.T1 (ID > 1e9) go to Master n; writes for DB1.T2 go to yet another node.
Reads covering … can be served by each group.

MySQL Master 1: DB1.T1 (ID n…m) (rw), replicates … –> MySQL Master n
MySQL Master n: DB1.T1 (ID n…m) (rw), replicates … –> MySQL Master 1

Slaves of Master 1: Slave 1, Slave 2 (reads on DB1.T1)
Slaves of Master n: Slave 4, Slave 5 (reads on DB1.T1)

In the example, the ID column serves as a shard key. Based on the shard key and a deterministic function a cluster node is computed to execute the query on. Search the web for consistent hashing and vBuckets if you want more details.
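As a toy illustration (not part of PECL/mysqlnd_ms; the function and group names are made up), such a deterministic mapping from shard key to node group could look like this:

```php
<?php
/* Hypothetical helper: deterministically map a shard key to a node group.
 * crc32() keeps the mapping stable across requests. A production setup
 * would rather use consistent hashing or vBuckets to limit the amount of
 * data that has to move when nodes are added or removed. */
function get_node_group($id)
{
    $groups = array("A", "B", "C");
    return $groups[crc32((string)$id) % count($groups)];
}
```

The same ID always lands in the same group, which is all the load balancing layer needs to route a query.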

The distribution logic can either be an integral part of the database cluster itself, or it has to be handled by the application. MySQL Cluster has built-in sharding. As a result, the application does not need to worry about how to distribute and access data. It is all transparently handled by the cluster. MySQL Replication, however, does not have sharding built-in.

Let’s recap on sharding/partitioning: data is distributed and grouped according to some rules. Rules are either handled by the cluster itself or by the application.

The idea: grouping nodes, new node_group filter

Time to simplify the illustrations. Let’s take colors out and take a step back: data is distributed and data is grouped.

Cluster of MySQL servers
Group A: Node 1, Node 2, Node 3
Group B: Node 4, Node 5

Further simplification is possible. Groups could consist of no more than one node, for example, one shard.

Cluster of MySQL servers
Group A: Shard A
Group B: Shard B

Some sharding support

More recap: unless the database has built-in sharding, the application has to take care of distributing queries. The application may do this like so:

/* get_shard_server() is a user-supplied function that maps
   a shard key to the host name of the node holding the data */
$server = get_shard_server($id = 1);
$link = new mysqli($server);
$link->query("INSERT INTO t1(id) VALUES(1)");

$server = get_shard_server($id = 2);
$link = new mysqli($server);
$link->query("INSERT INTO t1(id) VALUES(2)");

The above is not very nice, as all the connection handling is left to the user as an exercise. Also, imagine you wanted to add read scale-out through a lazy primary copy system such as MySQL Replication and had to do read-write splitting, failover or load balancing on top. All these tasks could be handled by a load balancer using code like this:

$link = new mysqli("cluster_name");
$link->query(get_shard_hint(1) . "INSERT INTO t1(id) VALUES(1)");
$link->query("/*shard_a*/INSERT INTO t1(id) VALUES(2)");

A SQL hint (comment) at the very beginning of the query tells the load balancer which "shard" to use. The load balancer can then map the shard to one node or a group of nodes that form a MySQL Replication setup consisting of one master to handle the write load and multiple slaves for the reads. That’s one use case of the node_group filter coming with PECL/mysqlnd_ms 1.5.
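For illustration, the get_shard_hint() helper from the snippet above could be as simple as the following. It is a hypothetical function, not part of the plugin, and the group names must match those configured for the node_group filter:

```php
<?php
/* Hypothetical helper: turn a shard key into the SQL hint that
 * PECL/mysqlnd_ms parses at the very beginning of a query. */
function get_shard_hint($id)
{
    $groups = array("shard_a", "shard_b");
    return "/*" . $groups[$id % count($groups)] . "*/";
}

// get_shard_hint(2) returns the hint for group "shard_a",
// matching the hard-coded hint in the second query above.
```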

Static sharding example

It is rather unlikely that you will ever set up a MySQL cluster like the one below, but let's assume you do, to give me a chance to demo all kinds of funny setups. Your cluster consists of three groups ("shards"). Some of the groups use replication (A, B), some don't (C). Based on some application rules you want to distribute queries to A, B and C.

cluster_name
Group A: Master 1, Slave 1
Group B: Master 2, Slave 2, Slave 3
Group C: Master 3

This is an example PECL/mysqlnd_ms 1.5 load balancer configuration to match the above cluster topology. Instead of naming the groups A, B, and C, one could use 0, 1, 2 as well. This may be handy if your application is using a very basic function to compute the "shard" such as ID % 3.

node_group.conf

{
  "cluster_name": {
    "master": {
      "master1": {
        "host": "192.168.2.1",
        "port": 3306
      },
      "master2": {
        "host": "192.168.2.2",
        "port": 3306
      },
      "master3": {
        "host": "192.168.2.3",
        "port": 3306
      }
    },
    "slave": {
      "slave1": {
        "host": "192.168.2.4",
        "port": 3306
      },
      "slave2": {
        "host": "192.168.2.5",
        "port": 3306
      },
      "slave3": {
        "host": "192.168.2.6",
        "port": 3306
      }
    },
    "filters": {
      "node_groups": {
        "A": {
          "master": [
            "master1"
          ],
          "slave": [
            "slave1"
          ]
        },
        "B": {
          "master": [
            "master2"
          ],
          "slave": [
            "slave2",
            "slave3"
          ]
        },
        "C": {
          "master": [
            "master3"
          ],
          "slave": [
            "master3"
          ]
        }
      },
      "roundrobin": []
    }
  }
}

Next, you load PECL/mysqlnd_ms into PHP, enable it, tell it where to find the configuration and instruct it to emit no warning about multiple masters given in the configuration.

php.ini

extension=/path/to/mysqlnd_ms.so
mysqlnd_ms.enable=1
mysqlnd_ms.multi_master=1
mysqlnd_ms.config_file=node_group.conf

You are done with the configuration. If you open a connection to the host "cluster_name" using any PHP MySQL API, the load balancer begins its work. To send queries to the "shard" A, prefix the query with the SQL hint "/*A*/". PECL/mysqlnd_ms will recognize the hint and not use all configured nodes but only those listed in the node group "A". With read-write splitting being enabled by default, the SELECT will go to slave1 and the INSERT will be directed to master1. SELECT statements that begin with the hint "/*B*/" are load balanced in a round robin fashion over slave2 and slave3.

$link = new mysqli("cluster_name", "user", "password", "database");
$link->query("/*A*/SELECT 1 FROM DUAL");              /* goes to slave1 */
$link->query("/*A*/INSERT INTO test(id) VALUES (1)"); /* goes to master1 */
$link->query("/*B*/SELECT 1 FROM DUAL");              /* round robin: slave2 */
$link->query("/*B*/SELECT 1 FROM DUAL");              /* round robin: slave3 */

That’s the idea, and that’s what works with the initial push. Details remain to be implemented.

Cache locality optimization

Optimizing query distribution for better cache usage is another use case of the new node_group filter. Assume all nodes hold all the data, no partitioning or sharding is used, all nodes can answer all queries. Queries are evenly distributed over all nodes (Node 1, 2) and databases (DB1, DB2). Thus, all caches hold roughly the same set of data.

Node 1: ~6MB DB1.* cache, ~4MB DB2.* cache
Node 2: ~5MB DB1.* cache, ~5MB DB2.* cache

Using node groups you could manually influence, for example, SELECT distribution for selected queries in order to keep all large SELECT queries for DB1 on Node 1 and all large ones for DB2 on Node 2. This may result in more efficient usage of scarce cache memory. However, it is a detail optimization for experts.

Node 1: ~9MB DB1.* cache, ~1MB DB2.* cache
Node 2: ~1MB DB1.* cache, ~9MB DB2.* cache
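With the node group configuration from above, such cache-affine routing could be sketched like this. The schema-to-group mapping and the helper name are made up for illustration:

```php
<?php
/* Hypothetical helper: pin heavy SELECTs on a schema to a fixed
 * node group, so each node's caches stay warm for "its" schema.
 * The group names must exist in the node_groups configuration. */
function cache_affinity_hint($schema)
{
    $map = array("DB1" => "A", "DB2" => "B");
    return isset($map[$schema]) ? "/*" . $map[$schema] . "*/" : "";
}

// Usage sketch:
// $link = new mysqli("cluster_name", "user", "password", "database");
// $link->query(cache_affinity_hint("DB1") . "SELECT big_column FROM DB1.t1");
```

Queries on schemata not listed in the map get no hint and are load balanced over all configured nodes as usual.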

Merry Christmas and a Happy New Year!

Christmas is around the corner, PECL/mysqlnd_ms 1.5 is not yet. The node_group filter is brand new (from yesterday) and it is not feature complete. Sharding/partitioning support does not really depend on it: both use cases sketched in this blog post can be implemented using user-provided callback functions. Search the fine manual for the user and user_multi filters.
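For orientation, the user filter delegates node selection to a PHP callback named in the plugin configuration. A minimal fragment might look like the following; the callback name pick_node is a placeholder, and the exact callback signature is documented in the manual:

```json
"filters": {
  "user": {
    "callback": "pick_node"
  }
}
```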

Happy hacking!

@Ulf_Wendel Follow me on Twitter