Ulf Wendel

2013/03/20
by admin
1 Comment

PHP: PECL/mysqlnd_ms 1.5 alpha, mysqlnd_qc 1.1 alpha releases

Just in time for a PHP 5.5 code freeze, we are brushing over the Replication and Load Balancing plugin (PECL/mysqlnd_ms) and the Query Cache plugin (PECL/mysqlnd_qc) for PHP mysqlnd. PECL/mysqlnd_ms 1.5 is a step up from the previous version whereas PECL/mysqlnd_qc 1.1 is nothing but a maintenance release.

Replication and Load Balancing – better transaction awareness

PECL/mysqlnd_ms 1.5 monitors all mysqli function calls that change the transaction status. As of PHP 5.5, the mysqli API gives you the same level of transaction control as SQL does. Thus, if using only mysqli function calls, the plugin can do fully transaction aware load balancing.

The improved transaction awareness is not yet available with PDO_MySQL. PDO_MySQL needs some minor tweaks. I can only hope that we manage to implement them before the PHP 5.5 feature freeze. However, we have not delayed the alpha to wait for PDO_MySQL support. In general, PDO is not the strongest argument if you try to convince someone of PHP. Honestly, all of PDO could be in better shape…

Query Cache plugin – APC update, for the last time?

PECL/mysqlnd_qc has seen most changes around APC. The query cache plugin can use the APC user cache as a (SHM) storage backend. This has always been a bit of a hack, as there is no C API for other extensions to use the APC user cache. Not only does PECL/mysqlnd_qc access – sort of – internal APIs of APC to get the job done, the hack has another side effect that has caused bug reports in the past: to use APC as a storage backend for PECL/mysqlnd_qc, you have to build everything statically. Not quite the way distributions like it…

Depending on how APC develops, we’ll have to decide what to do with the APC support in PECL/mysqlnd_qc in the distant future…

Happy hacking!

@Ulf_Wendel Follow me on Twitter

2013/03/20
by admin
1 Comment

PHP Summit presentation: NoSQL in MySQL

Live from the PHP Summit conference in Munich: NoSQL in MySQL. A couple of hours ago I gave a talk about NoSQL in MySQL, covering: what is NoSQL, what is HandlerSocket, what is the InnoDB Memcached Plugin and how to use it with PHP, and is this NoSQL? Slides are below.

(PDF download)

The talk was called a night session. Night session means it's the last talk of the day. After eight hours packed with workshops, everybody is a bit tired and exhausted. What else could I do but tell them a bedtime story? What else could I do but make it a puppet theater starring Kasperle, Sakila, Innobear and others? According to Wikipedia, the puppet name Kasperle has its roots in Munich.

Also, NoSQL in MySQL must be a fairytale. All Ulfs know that databases are named after the children of the database inventors. Michael "Monty" Widenius has three children: My, Max and Maria. But he does not have a son called No. In fact, the name No does not exist…

There’s some serious stuff in the slides as well. The fairytale is only there to keep people aware. Have a look!

Happy hacking!


2013/02/24
by admin
9 Comments

PECL/mysqlnd_ms 1.5 + PHP 5.5.0 + MySQL 5.6 = better load balancing

PECL/mysqlnd_ms is a client-side load balancer for PHP that supports any MySQL cluster. It does read-write splitting, failover, introduces a quality of service concept, supports partitioning and, of course, load balancing. New mysqli API calls (begin_transaction, savepoint and friends) in PHP 5.5.0 help to improve transaction awareness. New read only transactions in MySQL 5.6 promise major performance gains (think 2x) and offer an option to reduce the load on a MySQL Replication master. Read on to see how these features go together in PECL/mysqlnd_ms 1.5.

Load balancing – transaction aware?

A load balancer must not switch connections in the middle of a transaction: it must send all queries to the server on which the transaction was started until the transaction ends. Unfortunately, it is very hard to develop a transparent load balancer for MySQL. In general there are four approaches:

  • forget about transparency and require applications to hint the load balancer about transaction boundaries (buuuh!)
  • have the MySQL server announce transactions to clients on the wire protocol (buggy 🙁)
  • monitor SQL queries controlling transactions
  • monitor API calls controlling transactions

PECL/mysqlnd_ms supports the basic hinting approach and the API monitoring approach. Using SQL hints to control load balancing during transactions is possible but very uncomfortable.

$mysqli = new mysqli(...);
$mysqli->query("BEGIN");
/* stop load balancing, force use of last server */
$mysqli->query(sprintf("/*%s*/INSERT INTO test(id) VALUES (1)", MYSQLND_MS_LAST_USED_SWITCH));
$mysqli->query(sprintf("/*%s*/COMMIT", MYSQLND_MS_LAST_USED_SWITCH));


API monitoring is a step forward. If transaction stickiness has been configured, PECL/mysqlnd_ms stops load balancing once autocommit is turned off. Given you set trx_stickiness=master, the load balancer will run all transactions on the master.
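The stickiness setting lives in the plugin's JSON configuration file, which is loaded via the mysqlnd_ms.config_file PHP configuration directive. A minimal sketch, assuming a hypothetical section name "myapp" and placeholder host names:

```json
{
    "myapp": {
        "master": {
            "master_0": {
                "host": "master.example.com"
            }
        },
        "slave": {
            "slave_0": {
                "host": "slave1.example.com"
            }
        },
        "trx_stickiness": "master"
    }
}
```

Applications then pass the section name ("myapp") as the host parameter when connecting, and the plugin takes over host selection.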

$mysqli->autocommit(false);
/* autocommit is off, must not switch connections if transaction_stickiness is set */
$mysqli->query("INSERT INTO test(id) VALUES (1)");

$pdo->setAttribute(PDO::ATTR_AUTOCOMMIT, false);
/* if trx_stickiness is set, no connection switch allowed */
$stmt = $pdo->prepare("SELECT @myrole AS _role");
$stmt->execute();
$result = $stmt->fetchAll(PDO::FETCH_ASSOC);


Internally, PECL/mysqlnd_ms hooks the autocommit() C API function of mysqlnd. PDO_MySQL and mysqli call it and thus, PECL/mysqlnd_ms recognizes the change.

          Any PHP MySQL application
                     |
  $pdo->setAttribute(PDO::ATTR_AUTOCOMMIT, false);
          $mysqli->autocommit(false);
                     |
            mysqlnd autocommit()
                     |
      PECL/mysqlnd_ms autocommit():
  transaction stickiness set and in transaction?
           |                   |
      MySQL Master        MySQL Slave

However, remember that if you used SQL to control the autocommit mode, PECL/mysqlnd_ms would not recognize the change and transaction stickiness would not work.
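The difference can be illustrated with a short sketch (connection setup omitted, so this is not a complete script):

```php
/* Invisible to the plugin: autocommit is changed through SQL only.
   mysqlnd's autocommit() C function is never called, so transaction
   stickiness does not kick in. */
$mysqli->query("SET autocommit=0");

/* Visible to the plugin: the call goes through mysqlnd's autocommit()
   C function, which PECL/mysqlnd_ms hooks. */
$mysqli->autocommit(false);
```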

MySQL C API vs. PHP API

As a PHP user, you may be surprised to hear that autocommit() is the only call monitored in PECL/mysqlnd_ms 1.4. That's because it's pretty much all the MySQL C API had to offer and thus all the plugin could hook to detect transaction boundaries. For example, PECL/mysqlnd_ms 1.4 cannot be made aware of a call to PDO::beginTransaction() because PDO::beginTransaction() does not map to any MySQL C API call that the plugin could monitor.

A close look reveals that SQL offers way more options to control transactions than the MySQL C API.

SQL                                              MySQL C API          PHP 5.4 MySQL APIs
SET autocommit                                   mysql_autocommit()   mysqli_autocommit(), PDO::ATTR_AUTOCOMMIT
START TRANSACTION                                n/a                  PDO::beginTransaction()
START TRANSACTION transaction_characteristic     n/a                  n/a
  (e.g. READ ONLY)
COMMIT                                           mysql_commit()       mysqli_commit(), PDO::commit()
COMMIT [WORK] [AND [NO] CHAIN] [[NO] RELEASE]    n/a                  n/a
ROLLBACK                                         mysql_rollback()     mysqli_rollback(), PDO::rollBack()
ROLLBACK [WORK] [AND [NO] CHAIN] [[NO] RELEASE]  n/a                  n/a
SAVEPOINT                                        n/a                  n/a
RELEASE SAVEPOINT                                n/a                  n/a
ROLLBACK [WORK] TO [SAVEPOINT] identifier        n/a                  n/a

The feature gap between the SQL and the PHP (mysqli) API is closed in PHP 5.5. The mysqlnd C library has been extended to offer C calls for all SQL features. Those C calls can be monitored by PECL/mysqlnd_ms 1.5, and they are exported to the mysqli API. The transaction aware load balancing of PECL/mysqlnd_ms 1.5 is no longer limited to autocommit() but covers all of the mysqli_* functions below.

SQL                                              PHP 5.5 MySQL APIs
SET autocommit                                   mysqli_autocommit(), PDO::ATTR_AUTOCOMMIT
START TRANSACTION                                mysqli_begin_transaction(), PDO::beginTransaction()
START TRANSACTION transaction_characteristic     mysqli_begin_transaction([option [, name]])
  (e.g. READ ONLY)
COMMIT                                           mysqli_commit(), PDO::commit()
COMMIT [WORK] [AND [NO] CHAIN] [[NO] RELEASE]    mysqli_commit([option])
ROLLBACK                                         mysqli_rollback(), PDO::rollBack()
ROLLBACK [WORK] [AND [NO] CHAIN] [[NO] RELEASE]  mysqli_rollback([option [, name]])
SAVEPOINT                                        mysqli_savepoint(name)
RELEASE SAVEPOINT                                mysqli_release_savepoint(name)
ROLLBACK [WORK] TO [SAVEPOINT] identifier        mysqli_rollback([option [, name]])

PDO_MySQL has not been modified yet to use the new mysqlnd API calls. Work in progress… even the mysqli API additions have not been documented yet.

mysqli constant                                  Comment
MYSQLI_TRANS_START_WITH_CONSISTENT_SNAPSHOT,     For mysqli_begin_transaction(). See SQL.
MYSQLI_TRANS_START_READ_WRITE,
MYSQLI_TRANS_START_READ_ONLY
MYSQLI_TRANS_COR_AND_CHAIN,                      Use with mysqli_commit(), mysqli_rollback(). See SQL.
MYSQLI_TRANS_COR_AND_NO_CHAIN,
MYSQLI_TRANS_COR_RELEASE,
MYSQLI_TRANS_COR_NO_RELEASE

Better load balancing in PECL/mysqlnd_ms 1.5

MySQL 5.6 introduces read only transactions. If you tell InnoDB in advance that a transaction will only perform read operations, it can be executed faster than a transaction that may perform a write. Early MySQL 5.6 Release Candidate benchmarks hinted that read only transactions could run twice as fast as normal transactions. You can use the SQL statement START TRANSACTION READ ONLY to begin a read only transaction, or you can use the new mysqli API features of PHP 5.5.0.

$mysqli->begin_transaction(MYSQLI_TRANS_START_READ_ONLY);
$mysqli->query(...);
$mysqli->query(...);
$mysqli->commit();

Using the API has the advantage that PECL/mysqlnd_ms 1.5 can do transaction aware load balancing: the plugin picks a server to run the transaction and continues using it until the transaction ends. If MYSQLI_TRANS_START_READ_ONLY is set, the plugin may try to run the transaction on a slave in order to reduce the load on the master. Whether the transaction will end up on a slave depends on a number of additional factors.

trx_stickiness: not set
  mysqli call:  $mysqli->begin_transaction()
  behavior:     Ignored! The load balancer may switch connections at any
                time. Not transaction safe!

trx_stickiness: master
  mysqli call:  $mysqli->autocommit(), PDO::ATTR_AUTOCOMMIT
  requires:     PHP 5.4, PECL/mysqlnd_ms 1.2
  behavior:     If autocommit is turned off, choose a master and use it
                until autocommit is enabled again. Once autocommit is
                enabled, switching servers may happen at any time.

trx_stickiness: master
  mysqli call:  $mysqli->begin_transaction()
  requires:     PHP 5.5, PECL/mysqlnd_ms 1.5
  behavior:     Choose a master: if failover is enabled, search for a master
                based on the failover rules until one is found. Once a master
                has been found, stop load balancing and failover. Use the
                master until the end of the transaction; the C API
                counterparts of mysqli_commit() and mysqli_rollback() are
                monitored to learn about the end of the transaction.

trx_stickiness: on
  mysqli call:  $mysqli->begin_transaction(MYSQLI_TRANS_START_READ_ONLY)
  requires:     PHP 5.5, PECL/mysqlnd_ms 1.5
  behavior:     Try to run the transaction on a slave. A slave is considered
                only if master-on-write and the current quality of service
                setting allow it. For example, if strong consistency has been
                requested, slaves will not be considered. If no slaves are
                configured or all slaves have failed, try using a master
                (see above).

Besides all the improvements, it would be so much easier for clients to do proper transaction aware load balancing if the server announced the begin and end of a transaction on the wire protocol…

Happy hacking!


2013/01/28
by admin
3 Comments

1.5x … 9x faster queries with PHP and MySQL 5.6, really?

It is no secret that MySQL 5.6 GA can be expected to be released soon, very soon. Time to test one of the improvements: MySQL 5.6 speaks both SQL and the Memcache protocol. In your PHP MySQL apps, try using the Memcache protocol to query MySQL. A key-value SELECT ... FROM ... WHERE pk = <key> can become 1.5x-2x faster, an INSERT INTO table(pk, ...) VALUES (<key>, ...) can become 4.5x-9x faster, says the InnoDB team! Read on: background, benchmark, usage, PECL/mysqlnd_memcache, PHP Summit

Background

Since at least 2009, power users have tried to access MySQL bypassing the SQL layer. SQL is a very rich and powerful query language, but parsing and optimizing SQL takes time. So does a wire protocol that supports all SQL features. A less featured query language and protocol can significantly improve performance. See key-value stores such as Memcache or Redis: simple queries, and a simple but fast protocol.

                   Application
         |                          |
  MySQL SQL Client           Memcache Client
         |                          |
  • rich SQL queries         • simple and fast key lookup
  • strong persistence,      • weak persistence, data loss
    safe data store            in case of a crash
  • no stale data            • memcached: often stale data
                               copies from the DBMS
         |                          |
    MySQL Server                Memcached

Thus, for simple MySQL queries there should be a simple and fast protocol. But which? Many MySQL web deployments use Memcache; I would not be surprised to learn that it's the case for 30% of the MySQL web users. The MySQL manual has a chapter dedicated to using Memcached with MySQL. Given this popularity, the choice of a fast protocol for simple key-value style queries was easy: MySQL had to learn the Memcache protocol.

  MySQL SQL Client             Memcache Client
         |                           |
  • rich SQL queries and/or simple and fast key lookup
  • persistence: configurable from strong to weak
  • no caching, no data copy to synchronize
                     |
   MySQL 5.6 with InnoDB Memcache Plugin
                     |
                InnoDB table

In an ideal world one can run an existing Memcached application against MySQL 5.6 without any code modifications. Future user experience will tell whether the MySQL 5.6 InnoDB Memcache plugin is good enough to replace Memcached installations. There are reasons for considering MySQL as a Memcached replacement (related: Couchbase 2.0). However, it is the enormous popularity of and developer familiarity (clients, APIs, usage, …) with Memcached that counts. From an application developer perspective there is little new to learn: change the server IP from Memcached to MySQL, done. No more hassles with cold or stale caches…
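For example, with PECL/memcached the switch can be as small as pointing the client at the MySQL server's Memcache port (host names are placeholders; 11211 is the plugin's default port):

```php
$memc = new Memcached();
/* before: a standalone Memcached server
   $memc->addServer("memcached.example.com", 11211); */
/* after: MySQL 5.6 with the InnoDB Memcache plugin */
$memc->addServer("mysql56.example.com", 11211);
$memc->set("key", "value");
var_dump($memc->get("key"));
```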

Benchmarks… never trust them

Really, can MySQL 5.6 replace a Memcached installation? I do not know. As a poP (plain old PHP developer), I focused on the question whether replacing key-value style SQL queries with Memcache access to MySQL could improve PHP application performance. This may be the first step in an evaluation of the new feature.

Of course, I could not resist running a benchmark of MySQL vs. Memcached. As always: never trust benchmarks, run your own tests! All tests have been run on a plain i3-2120T CPU (2 cores, 4 counting hyper-threading), 8GB RAM, OpenSuSE 12.1 desktop. The disk configuration is as bad as it can get: soft RAID-1. Due to the poor disk setup I tried to avoid disk access whenever possible and keep tests running entirely in main memory. Although iostat -x did not hint at significant bottlenecks, I will not say more on write performance than this: writes using the Memcache protocol have been 2x faster than writes using SQL (single INSERT) on average over some 20 test runs.

Rest of the stack: MySQL 5.6.9-rc (not GA!), memcached 1.4.15, libmemcached 1.0.15, PHP 5.4.11, all compiled using their default settings. Test runs are short, sample variation had to be considered, and the load generator and server run on the same machine. This setup is far, far from ideal but still something a poP may use.

[mysqld]
innodb_buffer_pool_size = 3072M
daemon_memcached_w_batch_size=1000
daemon_memcached_r_batch_size=1
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=0

Please note, the daemon_memcached_w_batch_size setting is not relevant for the following read performance result. The chart shows the reads per second observed with bin/memslap --servers=127.0.0.1 --concurrency=4 --execute-number=20 --test=get --binary --debug --flush --initial-load=100000. memslap does not report reads per second, but you can compute them with the formula: reported_runtime * 10000 / concurrency. Key length is 100 bytes, value length is 400 bytes, the set size is 10,000 rows – memslap defaults…

(Chart: MySQL 5.6.9-rc InnoDB Plugin, memslap, read)

MySQL to replace Memcached?

Memcached performed best when using socket connections. Setting daemon_memcached_option="-s/tmp/mmemcached" made the MySQL 5.6.9-rc1 (non GA) release bind to the given socket, but no client was able to perform any queries. Thus, for the question whether MySQL can replace Memcached, we have to look at the TCP/IP connection figures.

               MySQL 5.6
   Memcached running as a MySQL plugin
     (as part of the MySQL process)
         |                      |
cache_policy=cache_only   cache_policy=innodb_only
         |                      |
  main memory storage         InnoDB

The chart shows two TCP/IP figures for MySQL. One shows the results for the MySQL InnoDB Memcached cache policy cache_only, the other one shows results for cache_policy=innodb_only. The MySQL InnoDB Memcached plugin is basically a Memcached running as part of the MySQL process. Memcached supports custom storage backends. The cache_policy sets whether InnoDB or main memory shall be used. In the latter case, the plugin is no different from a standalone Memcached.

Armed with this knowledge you can start comparing MySQL InnoDB Memcached plugin with standalone Memcached. To compare apples and apples you have to compare cache_policy=cache_only with standalone Memcached. Result: little difference. If you compare cache_policy=innodb_only with standalone Memcached you compare apples and oranges! Result for my particular setup: InnoDB turned out slower than Memcached. But recall that you compare apples and oranges: different system architecture, different persistence, possibly no more caching layer…

SQL SELECT versus Memcache get

Sometimes, comparing apples and oranges is perfectly valid. For example, when asking whether to refactor a key value style SQL access (SELECT ... FROM table WHERE pk = <key>) into a Memcached access to MySQL. These kinds of queries are quite typical for applications that use frameworks.

(Chart: MySQL 5.6.9-rc InnoDB Plugin, SQL vs. Memcached access using PHP)

My own benchmark and the InnoDB team benchmark show similar results: try replacing simple SELECTs with Memcache accesses, it can give 1.5x … 2x faster queries. The PHP script used is given at the end of the blog post. It uses defaults similar to memslap's: key length 100 bytes, value length 400 bytes, 10,000 rows. To minimize the impact of the slow disk system on the test computer, the script sleeps for 30 seconds after populating the test table. The SQL access is done using mysqli (with mysqlnd).

[...]
$res = $mysqli->query($sql = "SELECT c2 FROM demo_test WHERE c1='" . $pairs[$idx][0] . "'");
if (!$res) {
  printf("[%d] %s\n", $mysqli->errno, $mysqli->error);
  break;
}
$row = $res->fetch_row();
$res->free();
assert($pairs[$idx][1] == $row[0]);
[...]

The equivalent Memcached access is done using PECL/Memcached.

[...]
if (false == ($value = $memc->get($pairs[$idx][0]))) {
  printf("[%d] Memc error\n", $memc->getResultCode());
  break;
}
assert($pairs[$idx][1] == $value);
[...]

Please note, the benchmark does not take connect times into account. If you replace only a few SQL SELECT queries with Memcache accesses to MySQL, the connect overhead for the additional Memcache connection to MySQL may outweigh the performance gains. As always: run your own tests.

PECL/mysqlnd_memcache

Last year Johannes published PECL/mysqlnd_memcache. PECL/mysqlnd_memcache monitors all queries run through mysqlnd by any PHP MySQL API compiled to use the mysqlnd library. If a query matches a regular expression for a key-value style SELECT, the plugin transparently performs a Memcache access instead of a SQL access to MySQL. Back then, we found cases in which this gave a slight performance increase of some 20…30%. With my recent benchmark I saw some performance loss using PECL/mysqlnd_memcache compared to a plain SQL access with mysqli. As always: run your own tests. However, this hints that an automatic and transparent runtime replacement may not be fast enough.

Finding SQL SELECT key value style queries

Refactoring existing applications may be a better solution. There are two mysqlnd plugins that can help to identify queries that qualify for refactoring: PECL/mysqlnd_uh (PHP 5.3 only) and PECL/mysqlnd_qc (PHP 5.3 and above). PECL/mysqlnd_uh enables you to rebuild PECL/mysqlnd_memcache using PHP instead of C. The blog post Uh, uh… extending mysqlnd: monitoring and statement redirection gets you started. PHP 5.4+ compatible alternatives are the mysqlnd_qc_get_normalized_query_trace_log() and mysqlnd_qc_get_query_trace_log() functions provided by PECL/mysqlnd_qc. These functions help you find the origin of key-value style SQL SELECT queries by providing traces of all queries executed. Please note, it is not necessary to turn on the actual caching functionality of the client-side cache plugin PECL/mysqlnd_qc to access the query traces.
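A sketch of collecting query traces with PECL/mysqlnd_qc (the ini settings are assumptions on my part, see the plugin documentation; connection details are placeholders):

```php
/* php.ini (must be set at startup):
   mysqlnd_qc.enable_qc=1
   mysqlnd_qc.collect_normalized_query_trace=1
*/
$mysqli = new mysqli("host", "user", "password", "db");
$mysqli->query("SELECT c2 FROM demo_test WHERE c1='abc'");

/* Dump all normalized queries together with call backtraces.
   Search the dump for key-value style SELECTs worth refactoring. */
var_dump(mysqlnd_qc_get_normalized_query_trace_log());
```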

Apples, oranges and other peoples benchmarks…

Although artificial benchmarks hint at a significant performance benefit, real-life gains have to be evaluated on a case by case basis. For example, you may decide not to commit after each read/get and to run InnoDB in read-only mode on a slave server. Thus, you set daemon_memcached_r_batch_size=64 and/or innodb_read_only in your MySQL configuration. Setting either one can double the reads per second, peaking around 21,000 reads/s from MySQL 5.6 on an Intel i3 based desktop. Still slower than Memcached, but the features are different…

(Chart: MySQL 5.6.9-rc InnoDB Memcache Plugin, PHP script, read)

Really, there is something here…

A look at the performance shows that there is something here. Next, one will have to take a look at the feature difference and compare with other approaches in MySQL land. That’s the story to tell in my talk at the PHP Summit.

Finally, stability: good enough to start testing. MySQL 5.6.9-rc was reasonably stable to run benchmarks against, but not perfectly stable. The PECL/mysqlnd_memcache beta needs to be brushed over to reflect the latest changes. We will do that once MySQL 5.6 GA has been published.

Happy hacking!


PHP benchmark script used

<?php
define("MYSQL_HOST", "127.0.0.1");
define("MYSQL_USER", "root");
define("MYSQL_PWD", "");
define("MYSQL_DB", "test");
define("MYSQL_PORT", 3307);
define("MYSQL_MEMC_PORT", 11211);

define("NUM_VALUES", 10000);
define("REPEAT_READS", 10);

/* Wait time e.g. for background commit */
define("REST_TIME_AFTER_LOAD", 30);

/* Make sure the schema matches! */
define("KEY_LEN", 10);
define("VALUE_LEN", 100);

/* match MySQL config to be fair... */
define("WRITE_COMMIT_BATCH_SIZE", 1000);

/* number of parallel fetch worker (processes) */
define("FETCH_WORKER", 2);




function store_fetch_results_in_mysql($run_id, $pid, $results, $num_values = NUM_VALUES, $repeat = REPEAT_READS) {
  $mysqli = new mysqli(MYSQL_HOST, MYSQL_USER, MYSQL_PWD, MYSQL_DB, MYSQL_PORT);
  if ($mysqli->errno) {
    printf("[%d] %s\n", $mysqli->errno, $mysqli->error);
    return false;
  }
  if (!$mysqli->query("CREATE TABLE IF NOT EXISTS php_bench(
       run_id INT, pid INT UNSIGNED,
       label VARCHAR(60),
       runtime DECIMAL(10, 6) UNSIGNED, ops INT UNSIGNED)")) {

     printf("[%d] %s\n", $mysqli->errno, $mysqli->error);
     return false;
  }
  foreach ($results as $label => $time) {

     $sql = sprintf("INSERT INTO php_bench(run_id, pid, label, runtime, ops)
              VALUES (%d, %d, '%s', %10.6f, %d)",
             $run_id,
             $pid,
             $mysqli->real_escape_string($label),
             $time,
             ($time > 0) ? ($num_values * $repeat / $time) : 0);
     if (!$mysqli->query($sql)) {
       printf("[%d] %s\n", $mysqli->errno, $mysqli->error);
       return false;
     }
  }
  return true;
}

function generate_pairs($num = NUM_VALUES, $key_len = KEY_LEN, $value_len = VALUE_LEN) {
  $pairs = array();
  $anum = "0123456789ABCDEFGHIJKLMNOPQRSTWXYZabcdefghijklmnopqrstuvwxyz";
  $anum_len = strlen($anum) - 1;

  for ($i = 0; $i < $num; $i++) {
    $key = "";
    for ($j = 0; $j < $key_len; $j++) {
      $key .= substr($anum, mt_rand(0, $anum_len), 1);
    }
    $value = $key . strrev($key) . $key . strrev($key);
    $pairs[] = array($key, $value);
  }

  return $pairs;
}

function load_pairs_memc($memc, $pairs) {
  $inserted = 0;
  foreach ($pairs as $k => $pair) {
    if (false == $memc->add($pair[0], $pair[1])) {
      printf("[%d] Memc error\n", $memc->getResultCode());
      break;
    }
    $inserted++;
  }
  return $inserted;
}

function load_pairs_sql($mysqli, $pairs) {
  $inserted = 0;
  $mysqli->autocommit(false);
  foreach ($pairs as $k => $pair) {
    if (!$mysqli->query(sprintf("INSERT INTO demo_test(c1, c2) VALUES ('%s', '%s')", $pair[0], $pair[1]))) {
      printf("[%d] %s\n", $mysqli->errno, $mysqli->error);
      break;
    }
    $inserted++;
    if ($inserted % WRITE_COMMIT_BATCH_SIZE == 0) {
      $mysqli->commit();
    }
  }
  $mysqli->commit();
  $mysqli->autocommit(true);
  return $inserted;
}

function timer($label = '') {
  static $times = array();
  if (!$label)
    return $times;

  my_timer($label, $times);
  return $times;
}

function my_timer($label, &$times) {
  if (!$label)
    return;

  if (!isset($times[$label])) {
    $times[$label] = microtime(true);
  } else {
    $times[$label] = microtime(true) - $times[$label];
  }
}


function fetch_sql($mysqli, $pairs, $repeat = REPEAT_READS) {
  $fetched = 0;
  for ($i = 0; $i < $repeat; $i++) {
    $fetched += _fetch_sql($mysqli, $pairs);
  }
  return $fetched;
}
function _fetch_sql($mysqli, $pairs) {
  $fetched = 0;
  $num = count($pairs);
  while (count($pairs)) {
    do {
     $idx = mt_rand(0, $num - 1);
    } while (!isset($pairs[$idx]));

    $res = $mysqli->query($sql = "SELECT c2 FROM demo_test WHERE c1='" . $pairs[$idx][0] . "'");
    if (!$res) {
      printf("[%d] %s\n", $mysqli->errno, $mysqli->error);
      break;
    }
    $row = $res->fetch_row();
    $res->free();
    assert($pairs[$idx][1] == $row[0]);
    $fetched++;
    unset($pairs[$idx]);
  }
  return $fetched;
}

function fetch_memc($memc, $pairs, $repeat = REPEAT_READS) {
  $fetched = 0;
  for ($i = 0; $i < $repeat; $i++) {
    $fetched += _fetch_memc($memc, $pairs);
  }
  return $fetched;
}
function _fetch_memc($memc, $pairs, $repeat = 1) {
  $fetched = 0;
  $num = count($pairs);
  while (count($pairs)) {
    do {
      $idx = mt_rand(0, $num - 1);
    } while (!isset($pairs[$idx]));

    if (false == ($value = $memc->get($pairs[$idx][0]))) {
      printf("[%d] Memc error\n", $memc->getResultCode());
      break;
    }
    assert($pairs[$idx][1] == $value);
    $fetched++;
    unset($pairs[$idx]);
  }
  return $fetched;
}


function generate_and_load_pairs($num = NUM_VALUES, $key_len = KEY_LEN, $value_len = VALUE_LEN) {

  $mysqli = new mysqli(MYSQL_HOST, MYSQL_USER, MYSQL_PWD, MYSQL_DB, MYSQL_PORT);
  if ($mysqli->errno) {
    printf("[%d] %s\n", $mysqli->errno, $mysqli->error);
    return array();
  }
  $memc = new Memcached();
  if (!$memc->addServer(MYSQL_HOST, MYSQL_MEMC_PORT)) {
    printf("[%d] Memc connect error\n",  $memc->getResultCode());
    return array();
  }

  timer("generate pairs");
  printf("\tGenerating pairs...\n");
  $pairs = generate_pairs($num, $key_len, $value_len);
  timer("generate pairs");


  timer("load pairs using SQL");
  printf("\tLoading %d pairs using SQL...\n", load_pairs_sql($mysqli, $pairs));
  timer("load pairs using SQL");

  $mysqli->query("DELETE FROM demo_test");

  /* server think and commit time */
  sleep(REST_TIME_AFTER_LOAD);

  timer("load pairs using Memcache");
  printf("\tLoading %d pairs using Memcache...\n", load_pairs_memc($memc, $pairs));
  timer("load pairs using Memcache");

  sleep(REST_TIME_AFTER_LOAD);

  return $pairs;
}


function fetch_and_bench($pairs, $pid, $indent = 1, $repeat = REPEAT_READS) {
 $times = array();

 $mysqli = new mysqli(MYSQL_HOST, MYSQL_USER, MYSQL_PWD, MYSQL_DB, MYSQL_PORT);
  if ($mysqli->errno) {
    printf("[%d] %s\n", $mysqli->errno, $mysqli->error);
    return $times;
  }
  $memc = new Memcached();
  if (!$memc->addServer(MYSQL_HOST, MYSQL_MEMC_PORT)) {
    printf("[%d] Memc connect error\n",  $memc->getResultCode());
    return $times;
  }
  $prefix = str_repeat("\t", $indent);

  my_timer("fetch using plain SQL", $times);
  printf("%s[pid = %d] Fetched %d pairs using plain SQL...\n", $prefix, $pid, fetch_sql($mysqli, $pairs, $repeat));
  my_timer("fetch using plain SQL", $times);

  mysqlnd_memcache_set($mysqli, $memc);
  my_timer("fetch using Memcache mapped SQL", $times);
  printf("%s[pid = %d] Fetched %d pairs using Memcache mapped SQL...\n", $prefix, $pid, fetch_sql($mysqli, $pairs, $repeat));
  my_timer("fetch using Memcache mapped SQL", $times);

  my_timer("fetch using Memcache", $times);
  printf("%s[pid = %d] Fetched %d pairs using Memcache...\n", $prefix, $pid, fetch_memc($memc, $pairs, $repeat));
  my_timer("fetch using Memcache", $times);

  return $times;
}


$run_id = mt_rand(0, 1000);

$pairs = generate_and_load_pairs(NUM_VALUES, KEY_LEN, VALUE_LEN);
$load_times = timer();

$pids = array();
for ($fetch_worker = 1; $fetch_worker <= FETCH_WORKER; $fetch_worker++) {
   switch ($pid = pcntl_fork()) {
      case -1:
         printf("Fork failed!\n");
         break;

      case 0:
         printf("\t\tFetch worker %d (pid = %d) begins...\n", $fetch_worker, getmypid());
         $times = fetch_and_bench($pairs, getmypid(), 2);
         store_fetch_results_in_mysql($run_id, getmypid(), $times, NUM_VALUES, REPEAT_READS);
         printf("\t\tWorker %d (pid = %d) has recorded its results...\n", $fetch_worker, getmypid());
         exit(0);
         break;

      default:
         printf("\t\tParent has created worker [%d] (pid = %d)\n", $fetch_worker, $pid);
         $pids[] = $pid;
         pcntl_waitpid($pid, $status, WNOHANG);
         break;
   }
}

foreach ($pids as $pid) {
  pcntl_waitpid($pid, $status);
}



printf("\n\n");
printf("Key settings\n");
printf("\t%60s: %d\n", "Number of values", NUM_VALUES);
printf("\t%60s: %d\n", "Key length", KEY_LEN);
printf("\t%60s: %d\n", "Value length", VALUE_LEN);
printf("\t%60s: %d\n", "SQL write commit batch size", WRITE_COMMIT_BATCH_SIZE);
printf("\t%60s: %d\n", "Parallel clients (fetch)", FETCH_WORKER);
printf("\t%60s: %d\n", "Run ID used to record fetch times in MySQL", $run_id);

printf("\n\n");
printf("Load times\n");
foreach ($load_times as $label => $time) {
  printf("\t%60s: %.3fs (%d ops)\n", $label, $time, NUM_VALUES / $time);
}

printf("\n");
printf("Fetch times\n");

$mysqli = new mysqli(MYSQL_HOST, MYSQL_USER, MYSQL_PWD, MYSQL_DB, MYSQL_PORT);
if ($mysqli->errno) {
 die(sprintf("[%d] %s\n", $mysqli->errno, $mysqli->error));
}
$res = $mysqli->query("SELECT DISTINCT label FROM php_bench WHERE run_id = " . $run_id);
if (!$res)
  die(sprintf("[%d] %s\n", $mysqli->errno, $mysqli->error));

while ($row = $res->fetch_assoc()) {
  $sql = sprintf("SELECT AVG(runtime) as _time, AVG(ops) AS _ops FROM php_bench WHERE label = '%s' GROUP BY run_id HAVING run_id = %d",
    $mysqli->real_escape_string($row['label']),
    $run_id);
  if (!($res2 = $mysqli->query($sql)))
    die(sprintf("[%d] %s\n", $mysqli->errno, $mysqli->error));

  $row2 = $res2->fetch_assoc();
  printf("\t%60s: %.3fs (%d ops)\n", $row['label'], $row2['_time'], $row2['_ops']);
}
$mysqli->query("DELETE FROM php_bench");
printf("\n\n");
printf("\t\tTHE END\n");

2013/01/17
by admin
7 Comments

Searching document stores in 2013: from 1983 to SQL:2003 in a blink?

I love the new NoSQL systems: more choices! After years of RDBMS dominance there are hundreds of NoSQL systems offering a wide range of data models, data distribution strategies and interfaces. Polyglot persistence describes the market change. I am most fascinated by document stores: nested data and data distribution go hand in hand. Nested data, finally. And, for those who like it: schemaless or even schemafree. Maybe something to learn for MySQL? But their search capabilities… A word or two on SQL (SELECT … FROM … WHERE – SFW) and nested data.

Learn from NoSQL document stores

The classical relational data model requests all data to be in first normal form (1NF):

A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain.

Simply put: only scalar values are allowed in the columns of a table. Take the popular example of a web application implementing a blog system. Let a blog article consist of an id, a title, the blog content and a list of comments, each consisting of an id, an author and a message. Using the classical model, two tables are needed: blogposts(id, title, content) and comments(id, author, message, fk_blogpost_id). Because only scalar values can be stored in a table, the list of comments has to go into a second table. A foreign key links the two tables.

blogposts
  id   | title                      | content
  1982 | Searching document stores… | I love the new NoSQL systems: more choices! …

comments
  id | author   | message     | fk_blogpost_id
  1  | Andrey   | Hi Ulf, …   | 1982
  2  | Johannes | Ahoi Ulf, … | 1982
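The classical 1NF design sketched above could be created like this (a sketch in MySQL-flavoured SQL; table and column names are taken from the blog example):

```sql
CREATE TABLE blogposts (
  id      INT PRIMARY KEY,
  title   VARCHAR(100),
  content TEXT
);

CREATE TABLE comments (
  id             INT PRIMARY KEY,
  author         VARCHAR(100),
  message        TEXT,
  fk_blogpost_id INT,
  -- the foreign key links each comment back to its blog post
  FOREIGN KEY (fk_blogpost_id) REFERENCES blogposts(id)
);
```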

A NoSQL document store allows using non-scalar values. Thus, comments belonging to a blogpost can be saved in a nested fashion: blogpost(id, title, content, comments(id, author, message)). Here’s a citation from 1986 transferring the idea into the world of relational databases:

…, an NF2 relation is a set of (equally structured) tuples (or records), the fields (attributes) of which contain either atomic values (e.g. numbers, strings) or relations. The latter may be flat or NF2 relations in turn.

As will be shown, the non-first normal form (NF2 or N1NF) data model is not as academic as it may seem at first glance.

blogposts
  id   | title                      | content                                       | comments (id, author, message)
  1982 | Searching document stores… | I love the new NoSQL systems: more choices! … | 1, Andrey, Hi Ulf, …
       |                            |                                               | 2, Johannes, Ahoi Ulf, …

Naturally, the 1980s usage examples are not about blog software. Back then, the examples were about CAD/CAM, forms, scientific and engineering data. The NF2 data model and a modern document store share some advantages:

  • Semantically related data is not split into different tables.
    • On disk: potentially fewer joins and disk seeks, thus quicker search. Related: Akiban.
    • Distribution (e.g. sharding): potentially fewer cross-shard queries as basic queries may not span multiple nodes. Related: MySQL Cluster condition push-down, …
  • Hierarchical structures often found in reports (master-detail view) are matched.
  • 1:n relationships are easily modeled.
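To see the first advantage in action: with the classical 1NF design, reassembling a blog post with its comments requires a join (a sketch; table and column names as in the example above), whereas in the nested model the comments travel with the post and no join is needed.

```sql
-- 1NF: one query, but a join across two tables
SELECT b.title, c.author, c.message
FROM blogposts b
LEFT JOIN comments c ON c.fk_blogpost_id = b.id
WHERE b.id = 1982;
```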

[e]NF2 and JSON store similarities

Today’s NoSQL document stores go beyond yesterday’s NF2 data model. Often, JSON or BSON, respectively, is used as the data serialization format. This is a natural choice for a web database: JSON and JavaScript go perfectly together. JavaScript is becoming more and more popular, for example, for creating portable applications for mobile devices. But other languages support the lightweight data encoding format as well.

JSON supports scalar and non-scalar data types. The scalar types are string, number, bool and null. The non-scalar types are array and object. With the exception of the array, counterparts can be found in classical NF2 models, as sketched above. Extended NF2 proposals exist that cover all JSON types. SQL:1999 introduced ARRAY (multi-dimensional as of SQL:2003) as well as ROW. SQL:2003 added MULTISET. The similarities between a JSON based document store and SQL based [e]NF2 remain obvious, which is important when getting deeper into algebra (not for me 🙂 !), query languages and such.

Data types

  JSON        | SQL
  Scalar / atomic
  string      | SQL:92 – CHARACTER, CHARACTER VARYING; SQL:1999 – CLOB
  number      | SQL:92 – NUMERIC, DECIMAL, INTEGER, SMALLINT, FLOAT, REAL, DOUBLE PRECISION
  true, false | SQL:1999 – BOOLEAN (TRUE, FALSE, UNKNOWN)
  null        | SQL:86(?) – NULL
  Non-scalar / composite / collection
  object      | SQL:1999 – ROW
  array       | SQL:1999 – ARRAY (ordered); SQL:2003 – MULTISET (unordered), see notes below!

Below is a concrete example of the blog application data modelled as a JSON document and as a SQL:2003 style table. Once you look beyond MySQL, you can find relational databases implementing these or similar features. For example, Oracle has a variable-length VARRAY type, Postgres offers variable-length multidimensional ARRAY at least since 7.1, DB/2 uses…, Informix has…

JSON

{
    "id": 1982,
    "title": "Searching document stores...",
    "content": "I love the new NoSQL systems: more choices! ...",
    "show_comments": true,
    "comments": [
        {
            "author": "Andrey",
            "message": "Hi Ulf, ..."
        },
        {
            "author": "Johannes",
            "message": "Ahoi Ulf, ..."
        }
    ]
}

SQL:>=2003

CREATE TABLE blogposts (
  id      INTEGER,
  title   VARCHAR(100),
  content CHARACTER LARGE OBJECT,
  show_comments BOOLEAN,
  comments ROW(
    author VARCHAR(100),
    message CHARACTER LARGE OBJECT 
    ) ARRAY[1000]
)     
   

Please note, I have chosen an example which gives the impression that there is a direct mapping between a SQL:2003 table and arbitrary JSON documents. This is not the case. The SQL:2003 collection data types ARRAY and MULTISET are typed. Imagine some comments had a created TIME field and others had not. Then, two distinct ROW declarations would be required. For the rest of the post, I’ll ignore this schema question.

SQL:1999 "the type monster" versus schemaless

The major difference between a JSON store and a SQL:1999+ relational database lies in the role and importance of a schema. Be it a table or a composite type, everything in a relational database follows a previously defined schema. The further you walk along the SQL:1999, SQL:2003, SQL:2008 road, the more complex and strict the types (and schemata) get: object orientation, XML, … Not so in many NoSQL document stores: each and every document can have a different schema, and schemas are not declared upfront. A simple thing: simple to learn, simple to use, yet powerful.
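To illustrate the schemaless side: in many document stores, the following two documents can live in the same collection without any upfront declaration (a made-up example; the created field appears in one document only):

```json
[
  {"id": 1, "title": "Post A"},
  {"id": 2, "title": "Post B", "created": "2013-01-17T10:00:00Z"}
]
```

In a SQL:1999+ database, adding the created field would require altering the declared schema first.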

Martin Fowler has published a great presentation discussing schemaless data structures last week. It’s as good as his book NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence (strong buy for beginners).

Like the schemaless nature of a JSON document store or not. Fact is, JSON based document stores have gained significant popularity and market share. Fact is, if your queries match the sweet spot of the data model (queries do not span multiple documents), you need sharding, and the system does not suffer too badly from the schemaless disadvantages (e.g. storage requirements), you get a winning combination.

It is so appealing! BUT the query language…

Speaking of JSON, documents and sharding, what could be more logical than doing key lookups (shard key, distributed systems, …) or using JavaScript expressions to search the data store? As a web developer you know JavaScript inside-out anyway. Possibly, the search is enriched with MapReduce (sharding, distributed systems…).

Key lookups are unquestionably not a rich query language. Whether the JavaScript expressions available to you are rich and powerful has to be checked on a case by case basis. For example, I have my doubts (English version, German version) about the MongoDB query language. MapReduce is certainly great for distributed systems, but only for those. Olaf Bachman, who is managing Google’s European ads abuse engineering efforts, describes in his talk NoNoSQL@Google how his team ended up building SQL-like query capabilities for the various NoSQL solutions used.

One does not have to be a computer specialist to understand SFW basics. Read: sales and marketing can do basic analysis themselves without bothering you, the developer. Virtually every developer has learned SQL at some point, thus low training costs for you, the manager.

SQL is a declarative language. You say what data you want, not how to access it. The data store will find an access path, and an optimizer component can speed it up. With MapReduce or, for example, a JavaScript based query language, the developer has to implement the access path himself. Listen to NoNoSQL@Google: are you an expert in developing distributed [MapReduce] programs?

The roots of SQL for [e]NF2

Given the link between a simple JSON centric data model and [e]NF2, and the unquestioned advantages of SQL, it is time for a ride back to the 1980s (but skip 1977). We are on the hunt for SQL to query [e]NF2. Let the joy ride begin with the paper Die NF2-Relationenalgebra zur einheitlichen Manipulation externer, konzeptueller und interner Datenstrukturen by Hans-Jörg Schek and Marc H. Scholl, published in 1983. The discussion of a relational algebra was quickly followed by prototype systems and query language proposals.

After being a hot topic in the mid 1980s, research focus seems to have shifted towards object oriented databases.

Most NF2 algebras propose NEST and UNNEST operations. Most SQL-style language proposals seek to improve database language orthogonality, see also Some principles of good language design: with especial reference to the design of database languages by C.J. Date (1984). Traces of both can be found in SQL:2003.

Nested relations, nested expressions…

The SQL/NF paper notes that a NF2 data structure allows a relation wherever a scalar value can occur in a 1NF data structure (orthogonality) [1]. It continues that SQL has a closure property: the result of any query on one or more relations is itself a relation [2].

  SELECT attribute-list
  FROM relation-list
  WHERE predicate

Thus, the language shall allow using a SFW-expression in the attribute-list (because of [1]) and in the relation-list (because of [2]) of a SFW-expression.

blogposts
  id   | title                      | content                                       | comments_allowed | comments (author, message)
  1982 | Searching document stores… | I love the new NoSQL systems: more choices! … | true             | Andrey, Hi Ulf, …
       |                            |                                               |                  | Johannes, Ahoi Ulf, …
  2013 | Solution for…              | MySQL 5.6 introduces…                         | true             |
  2012 | … and a Happy New Year!    | Picture taken…                                | false            |

Given the above nested blogpost table, the proposal is that we can filter the nested comments using nested SFW:

  SELECT 
    title, 
    (SELECT * FROM comments WHERE author = 'Johannes') 
  FROM 
    blogposts


The above query would return all blogposts, including those without any comments, but show only comments from Johannes, if any. Andrey’s comment on "Searching document stores…" would be excluded. To show only Johannes’ comments and retrieve only those blogposts commented on by Johannes, one adds a predicate:

  SELECT 
    title, 
    (SELECT * FROM comments WHERE author = 'Johannes') 
  FROM 
    blogposts
  WHERE 
    (SELECT * FROM comments WHERE author = 'Johannes')


The result of this query would be:

  title                      | comments (author, message)
  Searching document stores… | Johannes, Ahoi Ulf, …

The user-friendliness of the query can be improved by removing the need to repeat the nested SFW-expression. The various papers from back then use different shortcuts. Let’s do a modern mash-up using SQL:1999 WITH style:

 WITH
   author_search 
 AS
   (SELECT * FROM comments WHERE author = 'Johannes')
 SELECT 
   title, 
   author_search 
  FROM 
    blogposts
  WHERE 
    author_search


BTW, an Oracle colleague (not from MySQL) noted in the current issue of the German computer magazine iX that MySQL lacks WITH.

I am aware of jumping around between old and new syntax elements. However, I am not after proposing a concrete language; I better leave that to the experts. I try to take a user’s view on what could possibly be done. The early proposals appeal to me… One more query to conclude the paragraph, with SQL:2003 style syntax elements: show all blog posts together with the first comment, if any.

  SELECT
    title,
    comments[0]
  FROM 
    blogposts


SQL:2003 allows accessing ARRAY elements using an offset, like in many other programming languages.

Nesting and unnesting

To transform nested into flat relations and vice versa, NEST and UNNEST operations are required. Formulating them explicitly and introducing new operators might contribute to language clarity. (Please note that the example below uses PNF.)

blogposts
  id   | title                      | content
  1982 | Searching document stores… | I love the new NoSQL systems: more choices! …

comments
  id | author   | message     | fk_blogpost_id
  1  | Andrey   | Hi Ulf, …   | 1982
  2  | Johannes | Ahoi Ulf, … | 1982

                v NEST / UNNEST ^

blogposts
  id   | title                      | content                                       | comments (id, author, message, fk_blogpost_id)
  1982 | Searching document stores… | I love the new NoSQL systems: more choices! … | 1, Andrey, Hi Ulf, …, 1982
       |                            |                                               | 2, Johannes, Ahoi Ulf, …, 1982

Unnesting is also useful for using nested relations with functions that expect scalar values. What should SUM((price, description), (price, description)) return? Recall that we can have non-scalar values wherever scalar values used to be.
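A sketch of how the SUM question could be resolved by unnesting first (SQL:2003 style; a hypothetical orders table with a nested items attribute is assumed, not one from the blog example):

```sql
-- assumed: orders(id INTEGER,
--                 items ROW(price DECIMAL(10,2), description VARCHAR(100)) MULTISET)
-- UNNEST flattens the nested items so that SUM sees plain scalar prices
SELECT o.id, SUM(i.price)
FROM orders o, UNNEST(o.items) AS i(price, description)
GROUP BY o.id;
```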

SQL:2003 ?

To be honest, I have not written a single SQL query of higher complexity than SELECT 1 FROM DUAL for years. SELECT 'Hello world' was all I needed for testing PHP stuff. Be gentle with my lame examples, inspired by what I found in my bookshelf. My bookshelf has a copy of SQL 1999 und SQL 2003: Objektrelationales SQL, SQLJ und SQL/XML by Can Türker. (Aside and BTW: Can Türker is an advisor of the ETH Zurich Database Research Group together with H.J. Schek.)

For all SQL:2003 examples, I’ll use a slightly modified table definition compared to the one shown above: instead of ARRAY, which would match JSON syntax, I use MULTISET.

CREATE TABLE blogposts (
  id      INTEGER,
  title   VARCHAR(100),
  content CHARACTER LARGE OBJECT,
  show_comments BOOLEAN,
  comments ROW(
    author VARCHAR(100),
    message CHARACTER LARGE OBJECT 
    ) MULTISET
);


To get started, we are looking for all blogposts commented on by Johannes. For each one found, we want all the comments, including Andrey’s comments.

  SELECT * 
  FROM blogposts
  WHERE EXISTS(
    SELECT * 
    FROM UNNEST(comments) c 
    WHERE c.author = 'Johannes');    


Find all blog post titles together with all comments, but show only comments from Johannes, if any:

  SELECT 
    title,
    MULTISET(
      SELECT c 
      FROM UNNEST(comments) c
      WHERE c.author = 'Johannes'
    )
  FROM blogposts

Compared to the early SQL/NF style examples, the syntax is quite verbose because a type constructor (MULTISET) is used with the inner query. If I’m not mistaken, such explicit type constructors can also be found in HDBL from 1985/1986. But there, instead of having to type the eight characters of MULTISET, the constructor names are shorter, like <> or {}.

SQL:2003 TABLE constructor

… speaking of constructors. A new TABLE constructor has been added to SQL:2003 which will come in handy in a second. The constructor makes it possible to use a function’s return value like a table. For example, there could be a function random_greetings returning (rnd, greeting) tuples. With the help of TABLE it can appear in the relation-list of an SFW-expression:

  SELECT
    rnd
  FROM TABLE(random_greetings(100))

Back to JSON

Time to recap, time to attempt to crack the nut.

More than 30 years ago, the SQL community started looking into breaking out of the limitations of "flat" tables (first normal form). Algebras for "nested" tables (non-first normal form) were developed, followed by SQL language extension proposals. Some 10 years ago, SQL:2003 incorporated all or some of the ideas (please find an expert to elaborate on this).

SQL tables can hold nested data, and JSON documents can hold nested data. SQL types almost match JSON types. The troublemaker here is the JSON array. A JSON array is an ordered list of values of arbitrary types (types can vary within one array). A SQL array is an ordered list of values of one type.

  JSON                              | SQL
  [1, "schemaless", "no", "yes", 3] | n/a
  [1, 2, 3]                         | INT ARRAY[3]

Likely, one would have to introduce a new ordered collection type to overcome this gap. Or, would we rather need a loosely typed scalar? Whatever the answer, it should be possible to use (almost) standard SQL:2003 tables to store JSON documents in a "native" way. JSON would be nothing but a fancy data serialization format. Such a nested table could equally well be used to store and output data as XML.
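For example, the JSON document from above could be mapped onto the MULTISET based table definition using the standard’s value constructors (a sketch of SQL:2003 syntax; actual support varies between products):

```sql
-- map the JSON blog post onto the nested blogposts table:
-- one ROW value per comment, collected into a MULTISET
INSERT INTO blogposts VALUES (
  1982,
  'Searching document stores...',
  'I love the new NoSQL systems: more choices! ...',
  TRUE,
  MULTISET[ROW('Andrey', 'Hi Ulf, ...'),
           ROW('Johannes', 'Ahoi Ulf, ...')]
);
```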

Result: MySQL as a schemafull document store – with a proper query language.

Finally, the schemaless issue

The remaining nut to crack is schemaless data. Like it or not, people use it. Tell them it has drawbacks, they still use it and want it. A question to the experts: is it a non-issue for an SFW-expression?

SFW can deal with any nested table. Any collection of (schemaless) documents can be mapped onto a list of (schemafull) tables; in the worst case, the collection is mapped onto as many different tables as there are documents. However, SFW can deal with any of these tables.

It is not possible to create one table that can hold all possible documents of a collection. But we can create a table and dump all JSON documents into a BLOB. The SQL:2003 TABLE constructor can be used to create tables, on which we can perform queries, based on a function’s return value. Is this possible:

SELECT
  id,
  title
FROM TABLE(json(<blob_column>))

Result, if possible: MySQL as a schemaless document store – with a proper query language. I would definitely prefer that over any UDF based approach such as:

  SELECT
    CAST(JSON_DECODE("id", <blob_column>), INT) AS id,
    JSON_DECODE("title", <blob_column>) AS title
  FROM 
   <table>

The sketched NF2/SQL:2003 road would allow storing nested data in a schemafull and possibly also schemaless way. Query capabilities would be great. Ideally, the BLOB hack would be somewhat hidden from the user.

Happy hacking!


2013/01/10
by admin
1 Comment

Solution for: MySQL 5.6 password expired, PHP can’t connect, application stops

MySQL 5.6 introduces a new feature that must be used with great care: a MySQL user’s password can be marked as expired. This way, a DBA can force a user to set or reset his password. The MySQL user must set a (new) password before he is allowed to do anything else. As a consequence, if a user’s password is expired, all standard PHP MySQL API connect calls will fail. Applications stop working unless the application is changed to include a user dialog for setting a new password. To develop such a dialog for resetting an expired password, one has to use a new connection flag introduced in PHP 5.4.12-dev. Don’t panic: to get in trouble, DBA actions have to be on a level with dropping the MySQL user of a production PHP application…

Relax: IF MySQL 5.6 AND IF …AND IF …

You are being warned in time about a pitfall that DBAs may run into in the future. What’s discussed affects you only:

  • IF using: MySQL 5.6, which is not GA/stable yet at the time of writing
    • AND IF using: ALTER USER user@host PASSWORD EXPIRE
      • AND IF: user@host is used by a PHP application to log in to MySQL
        • AND IF: you have not read about your options discussed below
        • OR IF: user@host is told to reset the password using an unmodified (= today’s) PHP web application, for example, phpMyAdmin.

The pitfall, step-by-step

As of MySQL 5.6, a DBA can force a MySQL user to set or reset his password. Let’s assume I want to give Andrey access to my local MySQL 5.6 server. Thus, I create a MySQL user called andrey. As I do not need to know his password but want him to set one, I use the password expire feature to force him to do so.


nixnutz@linux-fuxh:~/ftp/mysql56bzr/install> bin/mysql -uroot -S/tmp/mysql56bzr.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2510
Server version: 5.6.11-debug Source distribution
[...]
mysql> GRANT ALL ON test.* TO andrey@localhost;
Query OK, 0 rows affected (0,05 sec)

mysql> ALTER USER andrey@localhost PASSWORD EXPIRE;
Query OK, 0 rows affected (0,00 sec)

Before I tell Andrey, I verify the account changes. It turns out that he can log in to MySQL but is not allowed to run any command but SET PASSWORD. As a good DBA, I have scanned the MySQL documentation and nothing seems suspicious.


nixnutz@linux-fuxh:~/ftp/mysql56bzr/install> bin/mysql -uandrey -S/tmp/mysql56bzr.sock test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2511
Server version: 5.6.11-debug
[...]
mysql> SHOW TABLES;
ERROR 1820 (HY000): You must SET PASSWORD before executing this statement

Quickly after asking Andrey to log in, he replies 🙁 haha…. He fails to log in to set a password. Given that he can’t log in, how should he set a password?!


andrey@johannes_box:~/> /usr/local/mysql/bin/mysql -uandrey -S/tmp/mysql56bzr.sock test
ERROR 1820 (HY000): You must SET PASSWORD before executing this statement

Maybe PHP works, is Andrey’s baby mysqlnd affected?!


andrey@johannes_box:~> php -r '
$link = new mysqli("localhost", "andrey", "johannes", 
    "test", NULL, "/tmp/mysql56bzr.sock"); 
var_dump(mysqli_connect_error());'
PHP Warning:  mysqli::mysqli(): (HY000/1820): You must SET PASSWORD before executing this statement in Command line code on line 1
string(53) "You must SET PASSWORD before executing this statement"

andrey@johannes_box:~> php --version
PHP 5.3.11-dev (cli) (built: Mar  6 2012 11:28:19) (DEBUG)
Copyright (c) 1997-2012 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2012 Zend Technologies

I go and check whether I did anything wrong, which seems not to be the case. I manage to set the password. Immediately thereafter, user andrey@localhost can create and populate a table, log in using the new password and query the table.


nixnutz@linux-fuxh:~/ftp/mysql56bzr/install> bin/mysql -uandrey -S/tmp/mysql56bzr.sock test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2513
Server version: 5.6.11-debug
[...]
mysql> SET PASSWORD=PASSWORD('johannes');
Query OK, 0 rows affected (0,00 sec)

mysql> SHOW TABLES;
Empty set (0,08 sec)

mysql> CREATE TABLE test(id INT);
Query OK, 0 rows affected (0,96 sec)

mysql> INSERT INTO test(id) VALUES (1);
Query OK, 1 row affected (0,03 sec)

mysql> QUIT;
Bye
nixnutz@linux-fuxh:~/ftp/mysql56bzr/install> bin/mysql -uandrey -p -S/tmp/mysql56bzr.sock test
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2516
Server version: 5.6.11-debug Source distribution
[...]
mysql> SELECT * FROM test;
+------+
| id   |
+------+
|    1 |
+------+
1 row in set (0,00 sec)

Step 1: check client versions

The solution is in the different versions of the MySQL command-line prompt used in the above examples. The MySQL prompt is an application, one of many PHP (Java, ODBC, C++, C, …) MySQL applications. All these applications use a MySQL client library to connect to MySQL. Old MySQL client libraries cannot handle password expiration.

If using an old client, the connect fails:


andrey@johannes_box> /usr/local/mysql/bin/mysql --version
/usr/local/mysql/bin/mysql  Ver 14.14 Distrib 5.1.45, for suse-linux-gnu (x86_64) using readline 5.1

The latest and greatest clients, on the other hand, can handle password expiration and may allow connecting to MySQL for setting a new password.


nixnutz@linux-fuxh:~/ftp/mysql56bzr/install> bin/mysql --version
bin/mysql  Ver 14.14 Distrib 5.6.11, for Linux (x86_64) using  EditLine wrapper

The PHP mysqlnd library can handle expired passwords since PHP 5.4.12-dev (next 5.4 release). Please, consult the MySQL documentation on version details for other libraries and clients.

Step 2: connect error is default

It is not enough for an application to use a recent MySQL client library to allow for setting an expired password. Today’s standard API connect calls will fail on connect if the password has expired. Here is an example using PHP 5.4.12-dev and a standard connect call:


nixnutz@linux-0v4u:~/git/superhero/php-src> sapi/cli/php -r '
$link = new mysqli("localhost", "andrey", 
             "johannes", "test", NULL, "/tmp/mysql56bzr.sock");
var_dump(mysqli_connect_error());';

Warning: mysqli::mysqli(): (HY000/1820): You must SET PASSWORD before executing this statement in Command line code on line 1
string(53) "You must SET PASSWORD before executing this statement"

nixnutz@linux-0v4u:~/git/superhero/php-src> sapi/cli/php -v
PHP 5.4.12-dev (cli) (built: Jan 10 2013 15:33:00) (DEBUG)
Copyright (c) 1997-2013 The PHP Group
Zend Engine v2.4.0, Copyright (c) 1998-2013 Zend Technologies

All MySQL client libraries should show this default behaviour: fail on connect if the password has expired.

The libraries assume that a standard application has no dialog for setting a password, thus the connect call fails by default. Applications that want to allow resetting an expired password in an interactive way must use new API features. The MySQL command-line prompt is an example of such an interactive application.

Step 3: new PHP API to allow for setting expired password

To write an interactive PHP application which allows setting an expired password, you have to use mysqli_init(), mysqli_options() and mysqli_real_connect(). Set mysqli_options($link, MYSQLI_OPT_CAN_HANDLE_EXPIRED_PASSWORDS, 1); to be allowed to connect to MySQL. Please note, you must pass the old password to mysqli_real_connect(), if any. Once connected, you are allowed to run SET commands such as SET PASSWORD. As soon as the expired password has been (re-)set, the connected MySQL user can run all commands he has been granted privileges for.

$link = mysqli_init();
$link->options(MYSQLI_OPT_CAN_HANDLE_EXPIRED_PASSWORDS, 1);
if (!$link->real_connect("localhost", "andrey", "johannes", "test", NULL, "/tmp/mysql56bzr.sock")) {
  printf("Cannot log in: [%d] %s\n", $link->connect_errno, $link->connect_error);
} else {
  /* do this in a nice dialog... */
  if (!$link->query("SET PASSWORD=PASSWORD('johannes')")) {
   printf("[%d] %s\n", $link->errno, $link->error);
  }
  if ($res = $link->query("SELECT 1")) {
    var_dump($res->fetch_assoc());
  }
}

Now that the expired password has been (re-)set the application can use standard connect calls (mysqli_connect(), new mysqli(...)) without getting error 1820.

The worst case scenario

Think twice before you expire a password!

Imagine you expire the password of a user and advise the user to reset it, for example, using today’s phpMyAdmin or any similar tool. Today’s phpMyAdmin is not prepared to handle the expired password situation. It uses standard PHP MySQL connect calls. It will fail to connect. The user cannot use it to reset the expired password. Ouch…

The solution is above. As a DBA, you must make sure that every MySQL user with an expired password has access to an interactive tool for resetting the password. This can be any PHP 5.4.12-dev+ MySQL application prepared (new flag!) to handle expired passwords or, for example, a MySQL 5.6.10+ command line prompt.

Happy hacking!


PS: PDO_MySQL? If you have a nice idea how to properly integrate the new flag into the existing API, please file feature request at bugs.php.net. I don’t think we are in a hurry: too many if’s, if you mind to look above…

2013/01/04
by admin
0 comments

PECL/mysqlnd_ms and the client failover dilemma – part 1

MySQL Replication has a weak point: the master (primary) is a single point of failure. People frequently ask about client failover. Making a client library fail over a client automatically from one server to another is simple. PECL/mysqlnd_ms can do it. Unfortunately, this is only the tip of the iceberg when using a primary copy (master-slave) database cluster. Among other things, you may have to reconfigure clients. Below is one hint (no solution) for deploying PECL/mysqlnd_ms configuration files using MHA (Master High Availability Manager and tools for MySQL).

What’s the plural of dilemma?

If a server is unavailable, a client may attempt to connect to another server given a list of alternatives. A client library can handle this entirely transparently and automatically if connection state is no issue. Otherwise, the client can receive an error, handle the exception and start over, with the client library automatically picking a failover server for the next attempt. As long as the client library knows of failover candidates, this works.

Possibly, you want the client library to remember failed servers. Dilemma 1: PHP is stateless. PECL/mysqlnd_ms can remember failed servers only for the lifespan of a PHP process. Anything beyond that requires deployment of the configuration file.
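As a sketch, a PECL/mysqlnd_ms configuration file could list failover candidates and ask the plugin to remember failed hosts for the lifespan of the process (host addresses are made up; please check the plugin documentation for the exact failover directives of your version):

```json
{
  "myapp": {
    "master": {
      "master_0": { "host": "192.168.2.27", "port": 3306 }
    },
    "slave": {
      "slave_0": { "host": "192.168.2.28", "port": 3306 },
      "slave_1": { "host": "192.168.2.29", "port": 3306 }
    },
    "failover": {
      "strategy": "loop_before_master",
      "remember_failed": true
    }
  }
}
```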

Dilemma 2: there may be no failover alternatives. A MySQL Replication master is a single point of failure. In the unlikely event of a loss of cabin pressure (read: of a master), the entire database cluster has to be reconfigured. Usually, a slave has to be promoted to master and all other slaves have to be made aware of the change.

Certainly, nobody wants thousands of clients attempting to reconfigure a cluster concurrently. Thus, over the years, MySQL users have developed tools for monitoring MySQL Replication clusters and handling failure. MHA (Master High Availability Manager and tools for MySQL) is one of them. MHA automates the cluster reconfiguration: it claims to work in heterogeneous deployments with MySQL 5.0 or newer, and it may take far less than a minute to replace a failed master.

Dilemma 3: client reconfiguration is left as an exercise, sort of.

Client reconfiguration as part of the failover process

Monitoring tools external to a MySQL Replication cluster often propose the following ways to inform clients of cluster topology changes (e.g. a new master):

  • Update a catalog with cluster information (more in a future blog post)
  • IP switch over (not discussed)
  • Client configuration deployment (see below)

IP switch over is transparent from a client perspective. The IP of the failed master is assigned to the newly elected master. No client changes required – great, issue solved! However, what if virtual IPs are not an option for you?

The MySQL Replication monitoring tool of your choice may allow calling custom scripts during cluster reconfiguration. You can hook in your IP switch over script, or, for example, deploy PECL/mysqlnd_ms configuration files.

MHA updating PECL/mysqlnd_ms config

The MHA master_ip_failover_script configuration parameter can be used to call a script for client configuration deployment during cluster reconfiguration. Below is an example configuration file for a minimalistic cluster consisting of one master and one slave, just enough to test MHA in action.


 [server default]
  # mysql user and password
  user=root
  password=root
  ssh_user=root
  # working directory on the manager
  manager_workdir=/var/log/masterha/app1
  # working directory on MySQL servers
  remote_workdir=/var/log/masterha_remote/app1
  master_binlog_dir=/home/nixnutz/ftp/mysql-5.6.9-rc-ms3-master/data
  master_ip_failover_script=/home/nixnutz/ftp/mha4mysql-manager-0.55/mysqlnd_ms_update.php

  [server1]
  hostname=localhost
  ip=127.0.0.1
  port=3309

  [server2]
  hostname=localhost
  ip=127.0.0.1
  port=3310

As a PHP guy, I am using a PHP script to handle the PECL/mysqlnd_ms configuration file update. The script is invoked by MHA during master failover. MHA passes most valuable information on the dead and the new master to the script. Based on this, the script can remove the slave that became a master from the slave list, remove the failed master and add the new master to the master list. The script shown is far from production ready, but you get the idea. For example, I am not discussing remote configuration files on remote application servers…


#!/usr/local/bin/php
<?php
define("MYSQLND_MS_CONFIG", "/home/nixnutz/ftp/mha4mysql-manager-0.55/mysqlnd_ms_config.json");

/*
    New master activation phase
   $options[0] script name
         1 --command=start
         2 --ssh_user=(new master's ssh username)
         3 --orig_master_host=(dead master's hostname)
         4 --orig_master_ip=(dead master's ip address)
         5 --orig_master_port=(dead master's port number)
         6 --new_master_host=(new master's hostname)
         7 --new_master_ip=(new master's ip address)
         8 --new_master_port(new master's port number)
         9 --new_master_user=(new master's user)
        10 --new_master_password(new master's password)
*/

function server_match($server_details, $host, $ip, $port) {
  if (!isset($server_details['host'])) {
    /* TODO: bark */
    return false;
  }
  if ($port && isset($server_details['port'])) {
     /* TCP/IP ? */
     if ($server_details['port'] != $port) {
       return false;
     }
     /* compare ip with server details host */
     return ($server_details['host'] == $ip);
  }
  /* Should be socket... */
  return ($server_details['host'] == $host);
}

$options = array();
foreach ($argv as $k => $arg) {
  /* strip the "--name=" prefix, if any; $argv[0] (the script name) has no "=" */
  $pos = strpos($arg, "=");
  $tmp = (FALSE === $pos) ? $arg : substr($arg, $pos + 1);
  /* remove surrounding quotes, e.g. --new_master_user='root' */
  if ((substr($tmp, 0, 1) == '"' || substr($tmp, 0, 1) == "'") &&
      (substr($tmp, 0, 1) == substr($tmp, -1, 1))) {
    $tmp = substr($tmp, 1, -1);
  }
  $options[] = $tmp;
}

file_put_contents(MYSQLND_MS_CONFIG . '.options', var_export($options, true));
if (('start' == $options[1]) && (11 == count($options))) {
  /* New master activation phase... */

  $json = file_get_contents(MYSQLND_MS_CONFIG);
  if (!$json) {
    exit(1);
  }
  file_put_contents(MYSQLND_MS_CONFIG . ".old", $json);
  $json = json_decode($json, true);

  foreach ($json as $section_name => $section_details) {
    /* remove old master and add new one... */
    if (isset($section_details['master'])) {
      foreach ($section_details['master'] as $name => $details) {
        if (server_match($details, $options[3], $options[4], $options[5])) {
           printf("Goodbye master...\n");
           var_dump($details);
           unset($json[$section_name]['master'][$name]);
        }
      }
      if (!empty($options[8])) { /* TCP/IP: new master has a port */
        $new_master = array(
           'host' => $options[7],
           'port' => (int)$options[8],
        );
      } else {
        $new_master = array(
           'host' => $options[6],
        );
      }
      /* TODO: user/password */
      $json[$section_name]['master']['master_' . $new_master['host']] = $new_master;
    }
    /* remove slave that has become a master */
    if (isset($section_details['slave'])) {
      foreach ($section_details['slave'] as $name => $details) {
        if (server_match($details, $options[6], $options[7], $options[8])) {
           printf("Goodbye former slave, you are a master now...\n");
           var_dump($details);
           unset($json[$section_name]['slave'][$name]);
        }
      }
    }
    /* TODO: check resulting config, for example, number of slaves/masters! */
  }

  $ok = file_put_contents(MYSQLND_MS_CONFIG . ".new", json_encode($json, JSON_PRETTY_PRINT));
  /* shell convention: exit 0 on success, non-zero on failure */
  exit((FALSE !== $ok && $ok > 0) ? 0 : 1);
}
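The script above writes the new configuration to a ".new" file and leaves activation open. One way to do that final step safely is rename(), which replaces the target atomically on POSIX filesystems within one mount point, so readers never observe a half-written file. A sketch; the helper name is made up:

```php
<?php
/* Sketch: atomically activate a freshly generated configuration file.
   rename() within one mount point is atomic on POSIX systems, so a
   concurrent reader sees either the old or the new file, never a mix. */
function activate_config($config_file) {
  return rename($config_file . ".new", $config_file);
}

/* Usage (paths as in the script above):
   activate_config("/home/nixnutz/ftp/mha4mysql-manager-0.55/mysqlnd_ms_config.json");
*/
```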

If all goes well and all the wheels (there are [too] many!) turn as they should, masterha_manager does its job upon master failure and the script updates the client configuration file.


> masterha_manager --conf=/etc/app1.cnf
[...]
From:
localhost (current master)
 +--localhost

To:
localhost (new master)
Fri Jan  4 18:39:17 2013 - [info]
Fri Jan  4 18:39:17 2013 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Fri Jan  4 18:39:17 2013 - [info]
Fri Jan  4 18:39:17 2013 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Fri Jan  4 18:39:17 2013 - [info]
Fri Jan  4 18:39:17 2013 - [info] * Phase 3.4: Master Log Apply Phase..
Fri Jan  4 18:39:17 2013 - [info]
Fri Jan  4 18:39:17 2013 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Fri Jan  4 18:39:17 2013 - [info] Starting recovery on localhost(127.0.0.1:3310)..
Fri Jan  4 18:39:17 2013 - [info]  This server has all relay logs. Waiting all logs to be applied..
Fri Jan  4 18:39:17 2013 - [info]   done.
Fri Jan  4 18:39:17 2013 - [info]  All relay logs were successfully applied.
Fri Jan  4 18:39:17 2013 - [info] Getting new master's binlog name and position..
Fri Jan  4 18:39:17 2013 - [info]  mysql-bin.000007:231
Fri Jan  4 18:39:17 2013 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='localhost or 127.0.0.1', MASTER_PORT=3310, MASTER_LOG_FILE='mysql-bin.000007', MASTER_LOG_POS=231, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Jan  4 18:39:17 2013 - [info] Executing master IP activate script:
Fri Jan  4 18:39:17 2013 - [info]   /home/nixnutz/ftp/mha4mysql-manager-0.55/mysqlnd_ms_update.php --command=start --ssh_user=root --orig_master_host=localhost --orig_master_ip=127.0.0.1 --orig_master_port=3309 --new_master_host=localhost --new_master_ip=127.0.0.1 --new_master_port=3310 --new_master_user='root' --new_master_password='root'
Goodbye master...
array(2) {
  ["host"]=>
  string(9) "127.0.0.1"
  ["port"]=>
  int(3309)
}
Goodbye former slave, you are a master now...
array(2) {
  ["host"]=>
  string(9) "127.0.0.1"
  ["port"]=>
  int(3310)
}
Fri Jan  4 18:39:17 2013 - [info]  OK.
Fri Jan  4 18:39:17 2013 - [info] ** Finished master recovery successfully.
Fri Jan  4 18:39:17 2013 - [info] * Phase 3: Master Recovery Phase completed.
Fri Jan  4 18:39:17 2013 - [info]
Fri Jan  4 18:39:17 2013 - [info] * Phase 4: Slaves Recovery Phase..
Fri Jan  4 18:39:17 2013 - [info]
Fri Jan  4 18:39:17 2013 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Fri Jan  4 18:39:17 2013 - [info]
Fri Jan  4 18:39:17 2013 - [info] Generating relay diff files from the latest slave succeeded.
Fri Jan  4 18:39:17 2013 - [info]
Fri Jan  4 18:39:17 2013 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Fri Jan  4 18:39:17 2013 - [info]
Fri Jan  4 18:39:17 2013 - [info] All new slave servers recovered successfully.
Fri Jan  4 18:39:17 2013 - [info]
Fri Jan  4 18:39:17 2013 - [info] * Phase 5: New master cleanup phase..
Fri Jan  4 18:39:17 2013 - [info]
Fri Jan  4 18:39:17 2013 - [info] Resetting slave info on the new master..
Fri Jan  4 18:39:17 2013 - [info]  localhost: Resetting slave info succeeded.
Fri Jan  4 18:39:17 2013 - [info] Master failover to localhost(127.0.0.1:3310) completed successfully.
Fri Jan  4 18:39:17 2013 - [info]

----- Failover Report -----

app1: MySQL Master failover localhost to localhost succeeded

Master localhost is down!

Check MHA Manager logs at linux-dstv for details.

Started automated(non-interactive) failover.
Invalidated master IP address on localhost.
The latest slave localhost(127.0.0.1:3310) has all relay logs for recovery.
Selected localhost as a new master.
localhost: OK: Applying all logs succeeded.
localhost: OK: Activated master IP address.
Generating relay diff files from the latest slave succeeded.
localhost: Resetting slave info succeeded.
Master failover to localhost(127.0.0.1:3310) completed successfully.

No solution, but a hint

As said at the beginning, I’m not showing a solution but only giving a hint of what can be done – on the server side.

Instead of MHA, one may also want to evaluate the new mysqlfailover utility designed for MySQL 5.6 and GTIDs. There’s a nice demo. The mysqlfailover utility offers hooks like MHA but does not pass any information on the cluster topology (e.g. the new master) to the scripts. However, it may be too early to judge it given that MySQL 5.6 is not GA yet.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

2012/12/19
by admin
1 Comment

Some sharding support and cache locality optimization support for PHP MySQL driver

It is time for Christmas presents: some sharding support and cache locality optimizations are coming with PECL/mysqlnd_ms 1.5. PECL/mysqlnd_ms is a plugin for the mysqlnd library. The plugin adds replication and load balancing support to any PHP MySQL API (mysql, mysqli, PDO_MySQL) if compiled to use the mysqlnd library.

As a MySQL user you can choose between a wide variety of clustering solutions to scale-out. Your options range from eventual consistent solutions to strong consistent ones, from built-in (MySQL Replication, MySQL Cluster) to third party or home-grown. PECL/mysqlnd_ms is a client side load balancer that aims to serve all.

Background: cluster-wide partitioning and sharding

Divide-and-conquer is a true classic. If you are lucky, your business requires you to spread load and data over many MySQL nodes using some sort of cluster. In many cases read scale-out offered by MySQL Replication is all you need. Any step further means digging deeper into the world of distributed systems.

              Writes
                |
           MySQL Master
           /    |     \
     Slave 1  Slave 2  Slave 3
           \    |     /
              Reads

Possibly, you need more than the above and look into schema-based partitioning for distributing data over many nodes. MySQL Replication cluster topologies are hardly limited by anything but your fantasy. Please mind the writes, though, because MySQL Replication uses lazy primary copy. But who am I telling this?

Writes for DB1, DB2 go to Master 1; writes for DB3, DB4 go to Master 2.
Reads covering DB1…DB4 can be served on either side.

  MySQL Master 1: DB1 (rw), DB2 (rw), DB3 (ro), DB4 (ro)
    -- replicate DB1, DB2 --> Master 2
    <-- replicate DB3, DB4 -- Master 2
  MySQL Master 2: DB1 (ro), DB2 (ro), DB3 (rw), DB4 (rw)
  Slaves: Slave 1 (reads DB1), Slave 2 (reads DB1), Slave 3 (reads DB2),
          Slave 4 (reads DB3), Slave 5 (reads DB4)

Finally, further growth gets you into sharding. Again, you split up a big task into small manageable units. This time the split is not done "vertically" along columns, tables or schemata but horizontally along rows. For example, you distribute all rows of the table T1 from the database DB1 over multiple nodes based on their ID column value.

Writes for DB1.T1 (ID 0…1e9) go to Master 1; writes for DB1.T1 (ID > 1e9)
go to Master n; writes for DB1.T2 go to yet another node.

  MySQL Master 1: DB1.T1 (ID 0…1e9) (rw)
    <-- replicate … --> MySQL Master n: DB1.T1 (ID > 1e9) (rw)
  Slave 1, Slave 2: DB1.T1, serve reads for Master 1's range
  Slave 4, Slave 5: DB1.T1, serve reads for Master n's range

In the example, the ID column serves as the shard key. Based on the shard key and a deterministic function, a cluster node is computed to execute the query on. Search the web for consistent hashing and vBuckets if you want more details.
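As a minimal sketch of such a deterministic function (the node names and the modulo rule are made up for illustration):

```php
<?php
/* Sketch: mapping a shard key to a node. The node list and the
   modulo rule are assumptions for illustration only. */
function pick_shard($id, array $shards) {
  /* Deterministic: the same ID always maps to the same shard.
     Note that adding or removing a shard remaps almost every key,
     which is why consistent hashing or vBuckets are preferred. */
  return $shards[$id % count($shards)];
}

$shards = array("shard_a", "shard_b", "shard_c");
echo pick_shard(1, $shards), "\n"; // shard_b
echo pick_shard(3, $shards), "\n"; // shard_a
```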

The distribution logic can be an integral part of the database cluster itself or have to be handled by the application. MySQL Cluster has built-in sharding. As a result, the application does not need to worry about how to distribute and access data. It is all transparently handled by the cluster. MySQL Replication, however, does not have sharding built-in.

Let’s recap on sharding/partitioning: data is distributed and grouped according to some rules. Rules are either handled by the cluster itself or by the application.

The idea: grouping nodes, new node_group filter

Time to simplify the illustrations. Let’s take colors out and take a step back: data is distributed and data is grouped.

Cluster of MySQL servers
  Group A: Node 1, Node 2, Node 3
  Group B: Node 4, Node 5

Further simplification is possible. Groups could consist of no more than one node, for example, one shard.

Cluster of MySQL servers
  Group A: Shard A
  Group B: Shard B

Some sharding support

More recap: unless the database has built-in sharding, the application has to take care of distributing queries. It may do this like so.

$server = get_shard_server($id = 1);
$link = new mysqli($server);
$link->query("INSERT INTO t1(id) VALUES(1)");

$server = get_shard_server($id = 2);
$link = new mysqli($server);
$link->query("INSERT INTO t1(id) VALUES(2)");

The above is not very nice as all the connection handling is left to the user as an exercise. Also, imagine you wanted to add read scale-out through a lazy primary copy system like MySQL Replication and had to do read-write splitting, failover or load balancing on top. All these tasks could be handled by a load balancer using code like this:

$link = new mysqli("cluster_name");
$link->query(get_shard_hint(1) . "INSERT INTO t1(id) VALUES(1)");
$link->query("/*shard_a*/INSERT INTO t1(id) VALUES(2)");

A SQL hint (comment) at the very beginning of the query tells the load balancer which "shard" to use. The load balancer can then map the shard to one node or a group of nodes that form a MySQL Replication setup consisting of one master to handle the write load and multiple slaves for the reads. That’s one use case of the node_group filter coming with PECL/mysqlnd_ms 1.5.
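The `get_shard_hint()` helper from the snippet above is left to the application. A minimal sketch, assuming node groups named after the later example (A, B, C) and an ID-modulo rule; both are assumptions:

```php
<?php
/* Hypothetical implementation of get_shard_hint(). The group names
   and the ID-modulo mapping are assumptions for illustration. */
function get_shard_hint($id) {
  $groups = array("A", "B", "C"); /* node group names from the plugin config */
  return "/*" . $groups[$id % count($groups)] . "*/";
}

echo get_shard_hint(1), "\n"; // prints the SQL hint for group B
```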

Static sharding example

It is rather unlikely you will ever set up a MySQL cluster like the one below, but let’s assume you do, to give me a chance to demo all kinds of funny setups. Your cluster consists of three groups ("shards"). Some of the groups use replication (A, B), some don’t (C). Based on some application rules you want to distribute queries to A, B and C.

cluster_name
  Group A: Master 1, Slave 1
  Group B: Master 2, Slave 2, Slave 3
  Group C: Master 3

This is an example PECL/mysqlnd_ms 1.5 load balancer configuration to match the above cluster topology. Instead of naming the groups A, B, and C, one could use 0, 1, 2 as well. This may be handy if your application is using a very basic function to compute the "shard" such as ID % 3.

node_group.conf

{
  "cluster_name": {
    "master": {
      "master1": {
        "host": "192.168.2.1",
        "port": 3306
      },
      "master2": {
        "host": "192.168.2.2",
        "port": 3306
      },
      "master3": {
        "host": "192.168.2.3",
        "port": 3306
      }
    },
    "slave": {
      "slave1": {
        "host": "192.168.2.4",
        "port": 3306
      },
      "slave2": {
        "host": "192.168.2.5",
        "port": 3306
      },
      "slave3": {
        "host": "192.168.2.6",
        "port": 3306
      }
    },
    "filters": {
      "node_groups": {
        "A": {
          "master": [
            "master1"
          ],
          "slave": [
            "slave1"
          ]
        },
        "B": {
          "master": [
            "master2"
          ],
          "slave": [
            "slave2",
            "slave3"
          ]
        },
        "C": {
          "master": [
            "master3"
          ],
          "slave": [
            "master3"
          ]
        }
      },
      "roundrobin": []
    }
  }
}
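Since the plugin reads the file with a strict JSON parser, a quick sanity check before deployment can save debugging time: json_decode() rejects trailing commas and other syntax errors that otherwise surface only at runtime. A sketch; the helper name and the "one master per section" rule are assumptions:

```php
<?php
/* Sketch: validate a mysqlnd_ms configuration string before deploying. */
function check_ms_config($raw) {
  $config = json_decode($raw, true);
  if (NULL === $config) {
    return false; /* syntax error, e.g. a trailing comma */
  }
  foreach ($config as $section => $details) {
    if (empty($details['master'])) {
      return false; /* every section needs at least one master */
    }
  }
  return true;
}

var_dump(check_ms_config('{"c":{"master":{"m1":{"host":"h","port":3306}}}}'));  // bool(true)
/* invalid: trailing comma after 3306 */
var_dump(check_ms_config('{"c":{"master":{"m1":{"host":"h","port":3306,}}}}')); // bool(false)
```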

Next, you load PECL/mysqlnd_ms into PHP, enable it, tell it where to find the configuration and instruct it to emit no warning about multiple masters given in the configuration.

php.ini

extension=/path/to/mysqlnd_ms.so
mysqlnd_ms.enable=1
mysqlnd_ms.multi_master=1
mysqlnd_ms.config_file=node_group.conf

You are done with the configuration. If you open a connection to the host "cluster_name" using any PHP MySQL API, the load balancer begins its work. To send queries to the "shard" A, prefix the query with the SQL hint "/*A*/". PECL/mysqlnd_ms will recognize the hint and use only the nodes listed in node group "A" instead of all configured nodes. With read-write splitting being enabled by default, the SELECT will go to slave1 and the INSERT will be directed to master1. SELECT statements that begin with the hint "/*B*/" are load balanced in a round-robin fashion over slave2 and slave3.

$link = new mysqli("cluster_name", "user", "password", "database");
$link->query("/*A*/SELECT 1 FROM DUAL");
$link->query("/*A*/INSERT INTO test(id) VALUES (1)");
$link->query("/*B*/SELECT 1 FROM DUAL");
$link->query("/*B*/SELECT 1 FROM DUAL");

That’s the idea, and that’s what works with the initial push. Details remain to be implemented.

Cache locality optimization

Optimizing query distribution for better cache usage is another use case of the new node_group filter. Assume all nodes hold all the data, no partitioning or sharding is used, all nodes can answer all queries. Queries are evenly distributed over all nodes (Node 1, 2) and databases (DB1, DB2). Thus, all caches hold roughly the same set of data.

Node 1: ~6MB DB1.* cache, ~4MB DB2.* cache
Node 2: ~5MB DB1.* cache, ~5MB DB2.* cache

Using node groups you could manually influence, for example, SELECT distribution for selected queries in order to keep all large SELECT queries for DB1 on Node 1 and all large ones for DB2 on Node 2. This may result in more efficient usage of scarce cache memory. However, it is a detail optimization for experts.

Node 1: ~9MB DB1.* cache, ~1MB DB2.* cache
Node 2: ~1MB DB1.* cache, ~9MB DB2.* cache
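A sketch of how such manual steering could look on the application side, reusing the SQL hint mechanism from above; the schema-to-group mapping and the helper name are made up:

```php
<?php
/* Sketch: prefix large SELECTs with a node group hint chosen by schema,
   so DB1 queries stick to one group and DB2 queries to another. */
function locality_hint($schema) {
  $map = array("DB1" => "A", "DB2" => "B"); /* assumed mapping */
  return isset($map[$schema]) ? "/*" . $map[$schema] . "*/" : "";
}

/* Usage with PECL/mysqlnd_ms (not executed here):
   $link = new mysqli("cluster_name", "user", "password", "database");
   $link->query(locality_hint("DB1") . "SELECT big_report FROM DB1.t1");
*/
echo locality_hint("DB1"), "\n"; // hint for group A
```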

Merry Christmas and a Happy New Year!

Christmas is around the corner, PECL/mysqlnd_ms 1.5 is not yet. The node_group filter is brand new (from yesterday) and it’s not feature complete. Sharding/partitioning support does not really depend on it. Both use cases sketched in this blog post can be implemented using user-provided callback functions. Search the fine manual for the user and user_multi filters.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

2012/12/12
by admin
1 Comment

Vote NO for MySQL – Election 2012: NoSQL

Do I have to look out for a new job? Some analysts say so: NoSQL is getting stronger. Given the prediction, it is time for me to learn what NoSQL is.

http://de.slideshare.net/nixnutz/vote-no-for-mysql (Slides: Vote NO for MySQL)

As a matter of fact there is no one kind of NoSQL but many. Four different kinds of NoSQL stores are competing with MySQL: key-value stores, document databases, graph databases and Big Data/column-oriented stores. Each of them has its specific strengths.

Key-value stores are all about performance. MySQL 5.6 introduces Memcached interfaces for both InnoDB and MySQL Cluster. MySQL Cluster is really fast and scales well, both for reads and writes.

Document stores are schema-free. That must appeal to any web developer. But recalling what normalization means, how useful relations can be and what impact the pure existence of a schema has on the disk footprint may make you think twice. And MySQL 5.6 InnoDB Online DDL (ALTER TABLE) makes schema changes a little less of an issue.

So, things are crystal clear? Not quite… there is some MySQL critique in the slides. However, using some of the core ideas of NoSQL and incorporating them into MySQL setups can’t be wrong if MySQL 5.6 really is a limiting factor for you. For example, use the good old divide-and-conquer approach: identify small sets of related data that you can easily shard. Then, use batch processing for rebuilding aggregates from shards: map and reduce the data to create views that you can query… Possibly, you want to use the new high-performance interfaces for it, if your queries are simple.

Graph databases and Big Data are beyond the scope of the presentation, which I gave today at the PHP user group Hamburg. Please, be gentle with the presentation: there is a difference between the live show and the slides. In case you would like to see me perform live, why not invite me? I’m based in Kiel (Germany). If your company is nearby, ping me…

Happy hacking.

@Ulf_Wendel Follow me on Twitter