Ulf Wendel

PHP: Client side caching for all MySQL extensions

The first public mysqlnd plugin adds client side query result caching to all MySQL extensions of PHP (ext/mysql, ext/mysqli, PDO_MySQL). The cache is written in C. It does not change any of the PHP MySQL APIs and works with any PHP application using MySQL. Query results are stored on the client. Cached data can be stored in main memory, APC, Memcache, SQLite (theoretically – via SQLite wrapperBerkeleyDB). The default invalidation strategy is TTL (Time to live): cache entries are valid for n-seconds. The TTL can be set per query or globally for all queries. User-defined storage handler can implement any invalidation method. Some built-in storage handler offer a special slam defense mode which you may know from Memcache or APC. Statistics help you to identify cache candidates and to measure cache efficiency. The plugin is available in source (PHP License)

Download :

C performance, almost transparent, horizontal scale out

The mysqlnd query result plugin is written in C like every other mysqlnd plugin today. It plugs into the MySQL native driver for PHP (mysqlnd) and adds query caching functionality to it. C gives you the best possible performance – better than any PHP based cache. The plugin is also a PHP extension. Adding a new PHP extension to an existing PHP deployment infrastructure should be easy going technically. Because it is a PHP extension and part of the database library, it gives many users, who do not compile PHP themselves, an "out-of-the box" experience. For those users caching will be just "built-in".

If you are new to mysqlnd and mysqlnd plugins, please note the background information given at the end of the article.

Any PHP MySQL application
|
ext/mysql, ext/mysqli, PDO_MySQL
|
MySQL native driver for PHP (mysqlnd)
mysqlnd query result cache plugin
Hit   Miss
Cache:
main memory, Memcached, APC, SQLite
  MySQL Server

The cache plugin operates on a new level in between the MySQL Server and the existing PHP MySQL APIs. Therefore it does not change any of the existing APIs. It works with every PHP MySQL application including existing ones. This is comparable to a proxy cache.

Client based caches move contents closer to the client. This saves network traffic, makes data accesses faster and lowers the load on the MySQL Server. The MySQL Server and in particular its built-in MySQL Query Cache (don’t confuse this with the plugin!) can easily become a bottle-neck. Central resources are hard to scale. Most of the time you have no other choice but using more powerful hardware. This approach has its limits. It is much easier to scale by client. The mysqlnd query result cache plugin allows you to scale by client. If you hit a load limit, you add a new machine. No worries on overloading the MySQL Query Cache.

Applications control what gets cached

The query cache plugin has two modes of operation. It can either cache every query or only those queries marked for caching with a special SQL comment. The latter is the recommended mode. Applications control what gets cached. SQL comments (SQL hints) control the cache. The SQL hints must be placed at the very beginning of query.

$mysqli = new mysqli("localhost", "user", "password", "database");
$mysql->query("⁄*qc=on*⁄SELECT something FROM slowtable");

$pdo = new PDO("mysql:host=localhost;dbname=database", "user", "password");
$stmt = $pdo->execute("⁄*qc=on*⁄SELECT something FROM slowtable");

$mysql = mysql_connect("localhost", "user", "password", true);
mysql_select_db("database", $mysql);
mysql_query("⁄*qc=on*⁄SELECT something FROM slowtable");

If you run the cache plugin in the unrecommended mode of caching all queries by default you can disable automatic caching for a particular query by preceding it with the SQL hint “/*qc=off*/”. If you use the recommended mode of caching only those queries marked for caching you have to use the SQL hint “/*qc=on*/” to enable caching. Queries will be cached for the number of seconds specified with the PHP configuration setting (php.ini) mysqlnd_qc.ttl. The default TTL can be overwritten on a per query basis using the SQL hint “/*qc_ttl=n*/”.

That is it – you have mastered the foundations of using the plugin in your application. No worries dear experts, the standard TTL invalidation strategy can be replaced by user-defined ones.

Proven standard solutions for storing data

The cache plugin can store cache entries in many different media. For storing data in the main memory of the PHP process the cache uses its own storage handler based on hashes from by the PHP runtime environment. For storing data in shared memory, for using memory-mapped I/O, for using files and for using a server deamon which uses main memory the plugin reuses proven open standard solutions. You may be using some of them already, the cache plugin may fit nicely into your setup.

Handler Medium Location
Default Main memory of the PHP process (Hashes) local
Memcached Deamon using main memory local or remote
APC Shared memory, memory-mapped I/O local
SQLite Main memory of the PHP process or file local
Userdefined No limits – whatever you can envision local or remote

The storage handler differ from each other. Some are known to scale extremly well, others are not. Some can handle high concurrent loads, some can not. Some can share their data between multiple PHP processes or even multiple machines, others may not be able to reuse a cache entry for more than the duration of a web request. On top of the table you see the combinations which are bad at reusing cache entries. At the end of the table are those combinations which give you the best possible reuse.

Handler Deployment Cache entries are reusable/shared…
Default CGI within single web request
SQLite CGI and :memory: within single web request
Default FastCGI/Prefork/Threaded single process, multiple requests
SQLite FastCGI/Prefork/Threaded and :memory: single process, multiple requests
APC Shared memory multiple processes, single machine
APC Memory-mapped I/O multiple processes, single machine (yes, NFS but…)
SQLite file – you don’t want this! multiple processes, single maschine
Memcache local server multiple processes, single maschine
Memcache remote server multiple maschines

You have the choice of picking the combination that fits your PHP MySQL application best.

Scope of a cache entry Handler
single request Default (CGI), SQLite (CGI and :memory:), user-defined
single process (mutiple requests) Default (FastCGI/prefork/threaded), SQLite (FastCGI/prefork/threaded and :memory:), user-defined
single machine (mutiple processes) APC (Shared memory, memory-mapped I/O), Memcached [, SQLite (file)], user-defined
multiple machines Memcached, user-defined

The unlimited offering: user-defined storage handler

Internally the mysqlnd cache plugin consists of a core and the various storage handler. The core has no more logic in it than absolutely necessary. Storage handler are given as much freedom and as many responsibilities as possible. Storage handler do:

  • control which query gets cached
  • control where to store data
  • implement the invalidation strategy
  • maintain cache statistics

PHP developers can write user-defined storage handler in PHP to break out of the limits of the built-in handlers. For example, a user-defined storage handler could maintain a dependency graph between cache entries and automatically invalidate all dependent cache entries if underlying data changes. This may lower the risk of serving stale data, which is possible if using the standard TTL invalidation strategy. You can support any invalidation strategy – even those we have not been able to think of.

User-defined storage handler could also eliminate the need of having to use SQL hints. You could teach your storage handler what queries it shall cache regardless if they start with a SQL hint or not. No need to touch the code of an existing application, no need to change any SQL query. Or say, you want to cache all queries which have a run-time of more than 0.1s – no problem, hack a storage handler.

User defined storage handler can be written by:

  • providing procedural callback functions
  • implementing an interface and registering the object
  • subclassing and specializing the built-in default storage handler

Procedural – checked, object oriented – checked, extending build in storage handler – checked… choose one.

Slam defense

Basic caching for a simple application is super easy: use the plugin, add some SQL hints, let the driver do the work and stop bothering. But caching for high loads is an expert topic. It requires careful planning.

All the major caches, including APC and Memcached, have learned a lesson on overloading and reusing cache entries: what if a very popular cache entry expires?

Client 1   Client 2 Client 3 Client 3 Client .. Client n
|   |
|   Cache
|    
MySQL Server
Load
 
 
             

All of a sudden many clients are no longer served by the cache. They get a cache miss and start quering the data source. A sudden load peak on the data source is the consequence. The MySQL Server, as an example of a data source, will become very busy. Because it start to be overloaded it takes longer and longer to compute answers for all the requests. Over the time more and more cache entries expire because the data source is still busy calculating a reply for the first client that got a cache miss and the cache has not been refreshed yet. The MySQL Server is going to die because you of the cache!

Client 1   Client 2 Client 3 Client 3 Client .. Client n
|   |
|   Cache Miss
|   |
MySQL Server
Load
  *tilt*
     
   

To prevent slamming the MySQL Server the plugin can continue to serve the expired cache entry until they get refreshed. The first client that hits an expired cache entry gets a cache miss. It will query the data source and update the cache entry. But all subsequent clients accessing the same cache entry will get a cache hit until the first client has updated the cache entry. The subsequent clients are given a cache hit although the cache entry is experired.

Client 1   Client 2 Client 3 Client 3 Client .. Client n
|   |
|   | Cache (expired but slam defense)
|   |  
MySQL Server
Load
 
     
     

All the big boys have this feature. We have it too. We call it slam defense because this is how APC, a PHP specific solution, calls it.

It is not possible for the cache plugin to track if the first client which got a cache miss and is supposed to update an expired cache entry has managed to update the entry. We do not know if the client died or not. To prevent an expired cache entry from being in slam defense mode and being served forever we use a TTL for slam defense. If slam defense is turned on an expired cache entry is served for another n-seconds according to the mysqlnd_qc.slam_defense_ttl PHP configuration setting (php.ini) it automatically expired. In the worst case slamming may still happen. But the cache tried its best to prevent it.

A PHP based solution cannot compete

Storage handler are given raw wire data to cache. All existing PHP based query result caching solutions cache the PHP user space result of a query. They cache the PHP array or PHP object which holds the rows of a query result. It is almost impossible to persist the C structures which represent a PHP variable. PHP variables cannot survive the end of PHP process. PHP based caches have to serialize variables before adding them to a cache and have to unserialize them when fetching from the cache. The mysqlnd query cache plugin skips the serialization task.

The query cache plugin hooks the mysqlnd calls for reading and sending raw wire data. It records the wire data send from MySQL to PHP. In case of a cache hit it replays the raw wire data. Upon replay the raw wire data needs to be decoded by mysqlnd. This is a very simple and lean solution: no serialization needed. It is impossible for a PHP based solution to do the same. The PHP based solution cannot manipulate the mysqlnd network functions. This is only possible with the mysqlnd C plugin API.

Limitations

After all the good news a bad one. The cache plugin works only with buffered queries. It cannot cache unbuffered ones. However, the most popular PHP MySQL API calls stand for buffered queries:

  • ext/mysqli:
    • mysqli_query()
    • mysqli_query() + mysqli_store_result()
  • PDO_MySQL:
    • PDO::query(), PDO::exec(), PDO::prepare() if PDO::ATTR_EMULATE_PREPARES = 1 (which is the default setting!)
  • ext/mysql:
    • mysql_query()

The cache plugin cannot cache any of the following unbuffered queries:

  • ext/mysqli:
    • mysqli_real_query() + mysqli_use_result()
    • mysqli_query() + MYSQLI_ASYNC
    • mysqli_stmt_*()
  • PDO_MySQL:
    • PDO::query(), PDO::exec(), PDO::prepare() if PDO::ATTR_EMULATE_PREPARES = 0
  • ext/mysql:
    • mysql_unbuffered_query()

Download and further readings

Please find the source code at the project’s MySQL Forge wiki page at http://forge.mysql.com/wiki/MySQLnd_Query_Cache_Plugin_for_PHP. The cache plugin has been proposed to PECL. If the PHP PECL community wants it, you may be able to download it from PECL in the future. If not, it is probably going to Launchpad. In any case, the download location will change. The wiki page will point you to whatever may the current location be.

We release the software as a prototype to stress out that it is not an official MySQL product for which you can buy commercial support. Technically it is on beta level. It has been successfully build on more than 25 platforms and its tests cover well above 85% of the source code.

Documentation on run time configuration and user space functions of the query cache plugin is available on the wiki page.

Comments are welcome, credits to Andrey

As usual, we are happy to answer comments and questions. The is the first public and downloadable plugin. We are eager to see how it will be received and what other plugins you want from us. And, as usual, credits to Andrey who did most of the query cache coding.

Happy hacking!

Background: mysqlnd plugins

The MySQL native driver for PHP (mysqlnd) is a drop-in replacement for the MySQL Client Library (AKA libmysql, libmysqlclient). Mysqlnd is a C library like the MySQL Client Library. It can be used to connect to MySQL. The mysqlnd library is part of PHP since PHP 5.3. All PHP MySQL extensions (ext/mysql, ext/mysqli, PDO_MySQL) can either make use of mysqlnd or the MySQL Client Library. Which one gets used is determined at compile time.

Drupal, phpMyFAQ, phpMyAdmin, Oxid, …
|
ext/mysql, ext/mysqli, ext/PDO_MYSQL
Mysqlnd
Mysqlnd plugin
Load Balancing Monitoring Performance: Cache
|
MySQL Server

A mysqlnd plugin can add new functionality to mysqlnd. Plugins are written in C. Because plugins and mysqlnd operate "under the hood" of PHP they can be understood as a transparent proxy from a user perspective. A mysqlnd plugin does not change any user space APIs. Therefore it is transparent from a PHP application point of view. Plugins can be often be used with existing applications without having to change anything in the application. That is because plugins operate on a new layer in beneath the PHP application and above the MySQL server.

Plugins such as the mysqlnd query result plugin make use of the mysqlnd plugin API. Because mysqlnd is written in C the plugin API is C based. The foundations of the plugin API have been layed in PHP 5.3.0. Back then the lead developer of mysqlnd, Andrey Hristov, did not plan to create a plugin API. It sort of just happened…. Today’s mysqlnd plugin API exists since PHP 5.3.2. C developers can find a thorough description of how to create a plugin here. Further information is also given in the blog posting mysqlnd plugins: alternative to MySQL Proxy ?! or in the slides show from below.

9 Comments