Searching data in NotOnlyMySQL databases: a rich query language?

2012/09/24 by admin

A popular NoSQL store claims to have a rich query language. A powerful, portable query language is most important. If not powerful, you have to send multiple commands to fetch the desired data. In the worst case a program must be written which contraditcs the idea of a query language. If not portable, you are back to the 1960th and vendor lock-in. Developing portable applications is impossible. Stepping up from one database to another means rewriting your application. SQL does not have any of these issues. Plus, it is widely known which means low training costs and high understanding of its properties.

SQL – a query language for the relational model

It may seem odd to compare a query language for the relational model (SQL) with access methods of a document store. However, the NoSQL store compared with explains how to map their concepts to SQL in their manual! In the following, I’ll do the same. I’ll consider tables as collections and rows of a table as documents.

The SQL query language is build around the following operations:

Projection (filter columns)
Selection (filter rows)
Join (join tables)
Union (merge tables)

Some of those operations are not found in the query language of the NoSQL store. There is no join, there is no union. In a way it makes sense. The document stores’ data model can rather be described as a hierarchical than relational one.

Projection (filter columns)
Selection (filter rows)
~~Join (join tables)~~
~~Union (merge tables)~~

The projection

A projection is formulated at the beginning of a SELECT statement.


SELECT
  [DISTINCT]
  attribute |  arithmetric expression |  aggregation function

Let’s see how to query the below given collection of documents – step by step.


[
  {   "street_number" : 64,   "_id" : {   "$oid" : "50520155cc93742e0d0dac65"   },   "firstname" : "Ulf"   },
  {   "street_number" : 64,   "_id" : {   "$oid" : "50520263cc93742e0d0dac7e"   },   "firstname" : "hamster"   }
]

By default SELECT will allow duplicates. DISTINCT is used to filter duplicates. SELECT DISTINCT street_number FROM people maps to db.people.distinct("street_number").


[
{ "street_number" : 64 }
]

Limiting the column list: to fetch all firstnames with SQL one uses SELECT firstname FROM people. The NoSQL store counterpart is db.people.find({}, {firstname:1, _id:0}).


[
  {   "firstname" : "Ulf"   },
  {   "firstname" : "hamster"   }
]

SQL allows you to have an arithmetric expression as part of the projection such as SELECT street_number + 1 FROM people. There is no equivalent in the query language of the NoSQL store I am comparing with. The only option one has is to use MapReduce. MapReduce is not considered a query language in this comparison. A query language is no programming language, it must be more comprehensive . A query language let’s you describe what data you need. How to fetch is up to the database. With MapReduce you have to provide a function (a program) and you have to implement an access path instead of just saying what you need. Thus, MapReduce is beyond the topic of the blog post.

SQL supports the following aggregation functions: COUNT(*), SUM(column), MAX(column), MIN(column), AVG(column). db.people.count() is easy going. But then comes kind of MapReduce: db.people.group({key: {}, initial: {sum: 0}, reduce: function (doc, out) { out.sum += doc.street_number}}). The user has to provide a function.

This leaves me with the following intermediate comparison of SQL and another rich query language.

Projection (filter columns): SELECT [DISTINCT] attribute | arithmetric expression | aggregation function
- DISTINCT
- ~~arithmetric expression~~
- aggregation function
  - COUNT(*)
  - SUM(column) (user to provide function)
  - MAX(column) (user to provide function)
  - MIN(column) (user to provide function)
  - AVG(column) (user to provide function)
Selection (filter rows)
~~Join (join tables)~~
~~Union (merge tables)~~

A basic selection

The SQL WHERE clause describes the conditions of a selection.


 constant: attribute = | != | < | <= | > | >= constant
 attribute: attribute = | != | < | <= | > | >= attribute
 logical operators: ( attribute ... constant | attribute) and|or|not (...)
 uncertainty: attribute LIKE constant
 NULL: attribute IS NULL

Simple constant based selection first:

SQL	Other
`SELECT * FROM people WHERE street_number = 64`	`db.people.find({"street_number": : 64})`
`SELECT * FROM people WHERE street_number > 1`	`db.people.find({"street_number" : {$gt:1}})`

SELECT * FROM people WHERE street_number != first_name is an example of an attribute based selection. The NoSQL counterpart is db.people.find({$where : "this.street_number != this.first_name"}). It works but the manual of the NoSQL store notes that $where based comparisons tend to be slow.

SQL and the NoSQL stores query language are on par for the rest of basic selection. AND, OR, NOT map to $and, $or, $not. LIKE is not available but the counterpart to MySQLs’ RLIKE, which is regular expression based. Regular expressions are a superset of LIKE. Regarding NULL the NoSQL store can distinguish between value given but NULL, value given and not NULL and value/column not given at all. No negative surprises.

Selection: subqueries

Although the NoSQL store does support the predicates ALL, ANY, IN, EXISTS it does not feature what the predicates are actually intended for: subqueries.


WHERE
   ...
   ALL, ANY, IN, EXISTS

SELECT * FROM people WHERE street_number IN (1, 64) maps to db.people.find({"street_number" : { $in : [1, 64] }}). But that is a boring one as it could also be written as ((street_number = 1) OR (street_number = 64)).

A subquery such as SELECT * FROM people WHERE street_number IN (SELECT street_number FROM other_people) cannot be expressed with the query language of the NoSQL store. Remember: MapReduce is not considered a query language here.

Is this a rich query language?

The NoSQL store I picked for this comparison is MongoDB. The story may be a different one when comparing SQL against other NoSQL stores.

Projection (filter columns): SELECT [DISTINCT] attribute | arithmetric expression | aggregation function
- DISTINCT
- ~~arithmetric expression~~
- aggregation function
  - COUNT(*)
  - SUM(column) (user to provide function)
  - MAX(column) (user to provide function)
  - MIN(column) (user to provide function)
  - AVG(column) (user to provide function)
Selection (filter rows): WHERE … (see above)
- constant: attribute = | != | < | <= | > | &gt= constant
- attribute: attribute = | != | < | <= | > | &gt= attribute (see above)
- logical operators: ( attribute … constant | attribute) and|or|not (…)
- uncertainty: attribute LIKE constant (must be emulated with regular expressions)
- NULL: attribute IS NULL
- ALL, ANY, IN, EXISTS (no subquery)
~~Join (join tables)~~
~~Union (merge tables)~~

SQL is very competitive as a query language. It is powerful and lets you describe what data you want. You do not have to write a function or program which is describes how to fetch the data you want. SQL stands for standardized query language. Your applications can be run against a variety of SQL databases. No vendor lock-in.

Happy hacking!

@Ulf_Wendel

Ulf Wendel