Dear MySQL users, MaxDB users and friends,
last week we announced a MaxDB series on planetmysql.org with weekly postings every wednesday. This is the first posting with “real” content, it’s the first time we do this for you. We spend a considerable amount of time on making it as good as possible, but we know it won’t be perfect. Please help us to improve in the future and use the MaxDB Forum to tell us more about your expectations on us and how we can improve. Everybody at MySQL AB is eager to learn about your needs.
The first “real” posting has become a little theoretical an less practical. The reason is that we wanted to explain why you should use MaxDB first and what the concepts of the system are. If you don’t know what the MaxDB software components do, you will have difficulties to learn how to use MaxDB.
In this issue
In this first issue you will be presented a first overview of the MaxDB software, its components, todays coolest features and the future of the product. No installation guide? No, not yet. You need to learn some terms and concepts of the MaxDB world first, before a step-by-step installation can be performed in the next issue on next wednesday.
- Today and tomorrow: Quo vadis, MaxDB?
- TOP 5 Features of MaxDB: No Need for Reorganisation
- TOP 5 Features of MaxDB: Backup and Recovery
- TOP 5 Features of MaxDB: Standby Solutions
- TOP 5 Features of MaxDB: Synchronization Manager
- Getting MaxDB
- MaxDB software components
- In the next issue
Today and tomorrow: Quo vadis, MaxDB?
At end of the last year everbody looked back and reported about their archivements in 2005. EnterpriseDB announced that it had 70,000 downloads in six months after the companyâ€™s public launch on May 23rd. Congratulations! People talked a lot about this new offer based on PostgreSQL. MaxDB gets downloaded about 50,000 times per month but it gets much less attention. MaxDB users, do not hesitate to ask questions on the MaxDB mailinglist, blog about your database, participate to the freenode #maxdb IRC Channel and show everybody that MaxDB is alive.
More than 120 employees of the SAP AG work on the MaxDB technology stack (see also MaxDB and SAP liveCache in the SAP Developer Network). MySQL AB actively supports the MaxDB development. Last year we annouced to continue the Precompiler Development and we have more in the pipe.
MaxDB is the world’s first SAP certified open source database used in 3,500 SAP customer installations worldwide. Some figures show what it means to be SAP certified. SAP certified databases have passed several 10.000 quality assurance test cases, are able to run databases with more than 40,000 tables and views and can consists of terrabytes of data.
MaxDB stands for scalability and stability. Once you are SAP certified a tremendous pressure is on your shoulders. You cannot ask your customer to dump a database before an update and reimport it afterwards, because you changed some storage details. That is not acceptable in the SAP business world. This is disappointing for feature-hungry open source users, but relaxing for business users. MaxDB is a secure investment. It has been a secure investment in the past, it is one today and it will be one tomorrow.
The database is distributed under the terms of the MySQL dual-licence princple. You can use it for free if you go for the GPL way, buy a commercial licence for 1290 Euro per CPU and year or for 49 Euro per user. And you can also buy MaxDB for SAP Applications from MySQL – contact us. A full set of Professional Services are available. Support, Training and Consulting are offered by MySQL and by SAP. Compare the prices before you spend too much money, money is always a short resource.
If you want to learn about the Total Cost of Ownership of MaxDB, ask for our TCO papers or read the articles MaxDB reduces total cost of ownership: High Professionalism, Low Operating Costs and MaxDB in hosting environments: Secure Operation for Users and Developers on SAP INFO
TOP 5 Features of MaxDB: No Need for Reorganisation
MaxDB does not need any kind of reorganisation, similar to MySQL’s InnoDB storage engine. There is no
OPTIMIZE TABLE, no
COMPRESS or similar command available in MaxDB, because it is not needed. In other words:
- no scheduled downtimes
- no slowdowns during reorganisation
- no impact on your business
- no DBA resources needed
MaxDB has these advantages ever since. In fact, it has been designed with these goals in mind. No Need for Reorganisation is a very technical topic. Explaining the techniques and tricks of MaxDB requires some theoretical background. If you are not interested in reading about it, check the following the high-altitude list of benefits and skip the paragraph.
- no gaps between data records
- no unused space
- no longer needed space gets freed and reused immediately
To understand why MaxDB does not need any reorganization, you need to know some basics about how databases work internally. You need to know what a “database page”, a “file” and a B-Tree (balanced tree). We will explain this, step by step. Continue to TOP 5: Backup and Recovery if you trust what we say.
Some theoretical background
Internally a database has a different, more abstract view on the data it stores than you have. Databases usually translate a SQL table (a relation in the relational data model) into an internal file. A file is a collection of smaller units, of “database pages”. Records of your SQL tables (tupels in te relational data model) are stored on the database pages. A page is a consequent chunk of storage. In case of MaxDB the size of a database page is 8kB. The size can be changed when recompiling MaxDB. It will be shown later how MaxDB does all required reorganizations of a page on the fly.
Don’t get confused at this point: what a database calls a file internally, does not need to be the same as a file on your harddisk. Most database systems use a “file manager” component to decouple physical storage details from the higher level components of the database.
Databases could store records in arbitrary order on the database pages that belong to the file that represents the table to which the records belong. The result would be kind of an unsorted list of records. Unsorted lists as a data structure do not support search operations much. Searching for a record stored in an unsorted list requires scanning over all list elements in order to compare them to the searched value. Programmers and mathematicans use the term “O” (Big O) to describe the time complexity of an algorithm like a search operation. The search operation for a list has the time complexitiy O(n). O(n) means, that the time is a function of n, which is the number of list elements. In simple words: if it takes 1 second to search a list with 1.000 elements, it will take 10 seconds to search a list with 10.000 elements and 100 seconds to search a list with 100.000 elements. Other data structures exist that can be search much quicker. B-Trees and Hash tables (hash maps) can be searched much faster. The time complexity for B-Tree search operations is O(log n). Is that any better than O(n) of the list? Yes, it is. See how we can search growing data sets with only little more time. O(log 1.000) is 3, let’s assume it’s 3 seconds. Now, put time times as many entries as before in the tree: O(log 10.000) is 4, O(log 100.000) is 5. This means 3 seconds for 1.000 entries and only 5 seconds for 100.000. Hash tables can do even better with O(1) : 1 second to search a set of 1.000 entries, 1 second for 100.000. But unfortunately hash tables have other disadvantages which makes B-Trees the most commonly used data structure for search operations of a database.
MaxDB is using a variation of a B-Tree called B*-Tree. B*-Trees tend to have a little higher fill-level as other variations of B-Trees and their leaf nodes are chained in form of a linked list. But you do not need to worry about these details. For you it is important to remember that B-Trees are a very good data structure when you need fast search operations. B-Trees itself are just trees, but they are sorted and their branches are balanced. All branches have the same depth. We will unveil some more properties of B-Trees when we explain how indexes can speed up some of your queries.
MaxDB does B-tree balancing on the fly
MaxDB performs all B-Tree balancing operations, all maintenance operations whenever they become necessary. This is a must, because a “degenerated” or “unbalanced” B-tree is no longer an efficient data structure for search operations. Balancing operations can become necessary when you add records to a table, alter records of a table or remove records from a table. MaxDB is using on B*-tree for every table. All index nodes and leafes of the B*-trees are made out of database pages. The leaf nodes of the tree hold your records. If you add a new record to your table, MaxDB will insert it into one of the leaf nodes of the B*-tree that belongs to the table. As said, B-Trees are sorted and so goes your new record into a certain leaf node. It might happen that there is not enough space left on the leaf node, on the database page where your record is to be stored. In this case, MaxDB will add a new node to the tree which can make it necessary to rebalance a branch of the tree or the entire tree. Similar operations are performed automatically when you remove or alter records. MaxDB B*-trees are always in an optimal shape for optimal performance.
Update in place, sort by insertion, delete in place
The counterpart to balancing operations on the level of the B-trees are the operations update in place, sort by insertion and delete in place on the level of the database pages. To explain the operations one needs to know how MaxDB stores records on a data page. Each data page has a size of 8kb by default. A page is devided into three areas. Every data page starts with a list of unsorted data entries, followed by free space. At the end of the data page a sorted list of data entry pointers gets stored. The list of data entry pointers, the position list, is sorted in key order. That means the database can perform a fast sequential scan when it fetches records in ascending or descending key order.
When a data entry gets inserted into a page it gets appended to the end of the list of data entries on the page. The pointer to the newly added data entry gets inserted into the pointer list in a way that the pointer list remains sorted. Sorting is performed on the smaller of the two possible sort lists. MaxDB will not sort the data entries itself but it will sort the much smaller pointer list. Of course, sorting the smaller pointer list is faster than sorting the data entries. This is what MaxDB means by “sort by insertion”.
However, if a page overflows or underflows during an insert and a B*-tree balancing operation has to be performed, MaxDB will sort the data entries. Why? Having sorted data entries is of advantage when merging pages. Whenever possible MaxDB tries to handle updates of data entries on their current page. This is called “update in place”. If a data entry gets updated and its overall length and its key value do not change, then MaxDB does not need to do anything but update the data entry itself. No changes to the position list are required because the position list is sorted by the key values and the key values have not changed. A simple update of the data entry in it’s current place is possible, it still fits into the existing slot.
More work needs to be done if the lenght of a data entry changes but the key value remains the same. If the size changes, then the slot that is occupied by the data entry is either too small and needs to be increased or it is too large and should be shrunk in order to save space and avoid gaps. Resizing a slot means shifting forth or back all data entries that follow the resized entry. And, whenever the position of a data entry changes, MaxDB has to update the pointers in the position list.
If a data entry remains the same size but the key value gets changed, then there is little chance to handle this operation on page level. Each page, each leaf node in the tree, stores the records of a certain key value range. If the modified key value no longer matches the key value range of the original page, then the data entry has to be moved to another page which holds the matching key value range.
It is unlikely that the new key value can be kept on the original page. Therefore MaxDB does not try to handle an update with a key value change on a page, but transforms the update operation into a delete followed by an insert. Delete in place means that MaxDB will not juggle data entries on a page if an entry gets deleted. MaxDB will only update the position list, that is all.
Compare it to the rivals
Check with your database expers how your system does it all and if you need to reorganize or optimize your storage system frequently to regain optimal performance. Some systems do need such optimizations, some need extra B-tree optimizations, some need it for the database pages, some for other areas and reasons. For example, EnterpriseDB (and Postgres) needs Vacuuming to remove already deleted records from the database files. As of Postgres 8.1 you can use a separate optional server process called the autovacuum daemon to automate this maintenance task. This is a possible solution, but not the best. The problem with it is that you have to do the house-keeping anyway at some point. You can stall it for a while, but your house will be soon a place where you do not want to live in anymore. Once you have reached this point you can spend a lot of time on the cleaning or you settle over to a new house. Both solutions will block you and require tremendous efforts. MaxDB does the house-keeping all the time. Do you want possible downtimes and peak-loads for cleaning up your database storage. Or do you want to do the work on a day-by-day basis and spend a little bit of time every day? MaxDB and InnoDB do it the day-by-day way. The result: no downtime, no need for reorganisation, always optimal storage, less DBA resources required.
TOP 5 Features of MaxDB: Backup and Recovery
Backup and Recovery is full-fledged and feature-rich. All major backup strategies are supported. Integration into thrid-party tools and large-scale Backup systems is smooth.
- Non-blocking, consistent Hot Backup
- Automatic Log Backup
- Support for consistent copies on the snapshots
- Im-/Export (SQL, CSV, …): Loader tool
- Support for third-party tools
- GUI and CLI interfaces
- Functions to check and verify backups
Non-blocking, consistent Hot Backup
MaxDB can create data backups when the database is running. The backup operation does not block any running transactions. The use of extra server tasks ensures that the impact on other tasks running at the same time on the server is as small as possible. The data backup is transaction-consistent because it is based on a savepoint (checkpoint).
You can create full data backups and incremental data backups. Incremental backups are based on full backups. Once you have created a full backup, you can use incremental backups to backup the difference in the database between the last full or incremental backup. In most cases the delta is much smaller than a full backup.
Automatic Log Backup
MaxDB can perform log backups automatically. Whenever a segment of the MaxDB log area has been filled an automatic log backup is done. The backup works asynchronously and does not block any other log writer activities. The size of the log segments can be reconfigured to enforce more or less frequent backup runs. In general you do not need to be too paranoid on this. If you used hardware-based mirroring or the MaxDB software-based log mirroring as we strongly recommend, then you do not need to worry too much about loosing many log entries in case of an catastrophy (e.g. disk-failure). You do not need, because the log is mirrored and in most cases you still have one good copy left.
Full data backups, incremental data backups and automatic log backups together give you a wide variety of recovery options. In some cases you can even perform a recovery if you have lost an incremental data backup but you still have the log backups. We will go into the details in another issue, when we discuss how to make a backup
Support for consistent copies on the snapshots
During the last years consistent copies of the filesystem have become popular. Specialized hardware and software can create a copy of very large filesystems extremly fast. Sometimes even faster than any other backup method. You can take a consistent copy of your MaxDB database while the database is running. Note that, if you do create a copy while the database is running, the database might perform an automatic recovery if you put the copy back in place and start the database again. There might have been been transactions running when you did the copy. These unfinished transactions will be automatically rolledback when you restart the database.
A way to circumvent this is to create a consistent database snapshot inside the database before you do the copy. Once you have recovered the copy, you can revert the consistent database snapshot.
More on Backup & Recovery
There is a lot more to say about backup and recovery features in MaxDB. The Loader, the MaxDB data import and export tool, has a new wizard mode as of 7.6.00. It makes using the Loader for backup as easy as using mysqldump, if not easier. MaxDB has interfaces to support third-party backup solutions, all administrative commands can be performed using CLI or a GUI and you can check and verify your backups. Don’t worry that this all makes MaxDB overly complicated. It is not and you can start with easy to use wizards, if you want. This is still not all, but we guess you are eager to hear about the other TOP 5 features.
TOP 5 Features of MaxDB: Standby Solutions
MaxDB offers two standby solutions a standby database and a hot standby solution. You do not need any extra software to set up a standby solution. MaxDB comes with all that you need. Both standby variants can be integrated in cluster solutions for automatic IP-switching and client redirection.
- standby database based on log-shipping
- hot standby solution
- integration into cluster setup possible
Settung up a standby database using the GUI wizard is something you get taught on the second day of the MaxDB Administration class offered by MySQL. It takes 10 minutes to show the slides, 15 minutes to make all possible mistakes a beginner can make and another 5 minutes until the trainer comes along and corrects the problems. All in all it takes in the worst case it takes 30 minutes to set up.
The reason it is so easy is that there is no rocket science behind. To setup a standby database you need a full data backup of your original database. This full data backup is used to create a copy of your original database. After recovering the backup on your standby database you ship the log backup from the original database to your standby database to recover them. If the main system fails,
This process can be done manually using the GUI tools. With help of the CLI interfaces it can be automatted. Third-party companies have created solutions that can manage many different standby databases for one production system. As the frequency of the log shipping determines how many minutes the standby database is “behind” the production systems, you can set up standby systems that hold an old copy of the data of the production system. In case of a logical mistake, for example an accidential
DROP, you can have a standby system already in place that holds a copy of your data from before the mistake. You only need to know when the mistake was made to select the standby systems that still has a copy of the original data.
In an Hot Standby instance a MaxDB master and a slave share the log area. Each database runs on its own machine and is using his private data area. But the log area is shared. Immediately after the master has written log entries to the shared log area, the slave starts to replay the log. Therefore the slave is always only seconds behind the master. If the master fails, a cluster solution can perform the IP- and client switch over, while the slave replays the latest log entries before it can become the master.
TOP 5 Features of MaxDB: Synchronization Manager
With version 7.6.00 the Synchronization Manager was introduced. The Synchronization Manager is a replication tool. It replicates tables from a master database to client databases. But the tool can do more: client databases can send messages back to the master even if they got disconnected for some time. This can be exactly what you are looking for. Guess you need to synchronize mobile devices of your field sales personell with a central database. The Synchronization Manager can do this for you if you run MaxDB, MinDB or – in the future – MySQL on the mobile devices. Your field sales can download customer details from the central database and upload changes to customer records or orders.
- bi-directional replication
- conflict resolution
- asynchronous: replication continues after reconnect
- GUI-based admin tool
Try it out, we have written an article on how to install the Synchronization Manager.
TOP 5 Features of MaxDB: Database snapshots
MaxDB can freeze in the current state of the database in a database snapshot. Upon creation of a snapshot the database marks all existing data as read-only. If you modify the data after the snapshot has been created, the modified data cannot overwrite the existing read-only data but gets stored in a different place. Whenever you want you can get back originial state of the database you can revert the snapshot. Taking a snaphot and reverting to it are very fast operations as everything happens inside the database. Unlike with backup and recovery, you do not need to export and re-import data from an external medium.
- database snapshots
- fast way to recover the original state of the database
Snapshots are handy in many situations. For example you can use them for your training classes. You take a snapshot from your demo database before the class starts, allow the trainees to alter everything in it during the class and after the class you can revert the snapshot. As said, this is much faster than backup and recovery. Another example: you are doing an application upgrade. Before you install the new version of the software, you create a database snapshot. If the software upgrade runs fine, you delete the snapshot. If it fails, you reinstall the old version of the software and revert the snapshot. Again, your benefit is the time needed to restore a previous state of the database.
I hope we could finally raise your interest in MaxDB. Try it out.
The latest General Available (GA) version of MaxDB is 7.6.00. The latest Build is 16 and the version string is 7.6.00.16. You can download the latest version of MaxDB 7.6.00 from http://dev.mysql.com/downloads/maxdb/7.6.00.html. MaxDB 7.5.00 is also a General Available (GA) version of MaxDB. If you start with MaxDB, you should go for 7.6.00. MaxDB 7.6.00 has many advantages over 7.5.00. For example SQL schema support, the Synchronization Tool, advanced monitoring and alterting and much more. Previous versions of MaxDB can be found in the MaxDB software archive on http://dev.mysql.com/downloads/maxdb/archives/archive_index.html
For your first steps with MaxDB you need to download one of the “MaxDB Installer” packages for your operating system. MaxDB 7.6.00 can be run on Windows, Linux, HP-UX, IBM AIX and Sun Solaris. If you have a Windows system available you should also Download the GUI administration tools “DBMGUI” and “SQL Studio”. We will provide detailed installation instructions for the software in the next issue of our series.
MaxDB documentation is available on http://dev.mysql.com/doc/maxdb/index.html. Note that the Online Documentation is about 7.6.00. The documentation does not have any notes how things have been done in previous versions. If you are using 7.5.00, get a copy of the 7.5.00 documentation from the download archive. Many more places exist in the internet to learn about MaxDB and to discuss MaxDB questions:
- MaxDB FAQ
- MaxDB mailing list
- MySQL MaxDB Forum focussing mainly on MaxDB outside SAP environments
- SAP Developer Network (SDN) focussing mainly on MaxDB in SAP environments
- MaxDB Wiki, mainly written by MaxDB developers
- SAP Info (the SAP customer magazine), features MaxDB articles from time to time
MaxDB software components
The first time you work with MaxDB you will be confronted with many new terms and software components that you might not know. The hearth of the MaxDB database is the kernel. The kernel does the SQL processing and manages the data and log area. You should never manipulate the data and log area on your own, unless the development support gave you detailed instructions. If the data and log area gets modified by any other software than the Kernel, you risk loosing all your data.
SQL commands can be send to the kernel using the sqlcli, SQL Studio or application clients that use one of the programming interfaces (JDBC, ODBC, Perl, PHP, Phyton, SQLDBC [C/C++]). The sqlcli is a command line interface (CLI) to enter SQL commands. It communicates directly to the kernel. The SQL Studio is a GUI application to run SQL commands. Currently the SQL Studio is only available on Windows, but you can expect new, updated GUI tools in future releases that will be available on more operating systems. .SQL Studio connects to the kernel through ODBC, similar to any other application client using ODBC. The ODBC driver in turn talks to the kernel.
Database Administrators can use the SQL tools (SQL Studio, sqlcli) to create users, tables, indexes and to grant access rights. Application programmers can use the SQL tools to inspect the structure of SQL object, test and optimize statements.
Administrative tasks can be performed with the Database Manager GUI (DBMGUI) application (available only on Windows) and the dbmcli command line tool. Both tools talk to the Database Manager component which in turn talks to the kernel. Administrative tasks are tasks like creating new database instances, creating backups or restoring backups.
All remote connections are handled by the X Server. No matter which tool you use to talk to the kernel (SQL Studio, sqlcli, DBMGUI, dbmcli), if the tool does not run on the same host as the kernel, all communication will be done through the X Server.
The following table gives a brief overview how the tools can be compared to tools of other systems.
|MySQL Query Browser
|Database Manager GUI
So, what do you need if you want to start with MaxDB on your own, before we finally present installation instructions and start with the practical parts in the next issue? You need the MaxDB Server software and – if you are on Windows – the SQL Studio and Database Manager GUI. On all other operating systems you have to use the command line interfaces (sqlcli, dbmcli). If you run the MaxDB server (kernel) on a different host than the GUI tools, then you have to make sure that your X Server is running.
In the next issue
The next issue will give step-by-step instructions how to install MaxDB. We will try to publish the issue on next wednesday, but as you can see from this slightly delayed posting, we cannot promise that it will be available on wednesday.
We appreciate your comments. Use the MaxDB Forum to contact us.
Ulf Wendel for the MySQL MaxDB team