Apache Ignite SQL Documentation

The Apache Ignite SQL Developer Hub

Welcome to the Apache Ignite SQL developer hub. You'll find comprehensive guides and documentation to help you start working with Apache Ignite SQL as quickly as possible, as well as support if you get stuck. Let's jump right in!

How Ignite SQL Works

Overview

The Apache Ignite SQL engine is tightly coupled with the H2 Database which, in short, is a fast in-memory and disk-based database written in Java and available under a number of open source licenses.

An embedded H2 instance is always started as part of an Apache Ignite node process whenever the ignite-indexing module is added to the node's classpath. If the node is started from the command line using the ignite.sh{bat} script, the module is added to the classpath automatically since it's located under the {apache_ignite}\libs\ directory.

If you are on Java and use Maven, add the dependency below to your pom.xml file:

<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-indexing</artifactId>
    <version>${ignite.version}</version>
</dependency>

Apache Ignite leverages H2's SQL query parser and optimizer as well as its execution planner. H2 then executes a query locally on a particular node and passes the local result to the distributed Ignite SQL engine for further processing.

However, the data, as well as the indexes, are always stored in Ignite, which executes queries in a distributed and fault-tolerant manner that H2 does not support.
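
For illustration, below is a minimal sketch of how a queryable type and cache could be configured from Java code. The Person class, the PersonCache name, and the field set are assumptions for this example, not names defined by Ignite:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.annotations.QuerySqlField;
import org.apache.ignite.configuration.CacheConfiguration;

public class QueryableCacheExample {
    /** Hypothetical value type; annotated fields become SQL columns, indexed ones are backed by H2 indexes. */
    public static class Person implements java.io.Serializable {
        @QuerySqlField(index = true)
        public Long id;

        @QuerySqlField
        public String firstName;

        @QuerySqlField(index = true)
        public String lastName;
    }

    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            CacheConfiguration<Long, Person> cfg = new CacheConfiguration<>("PersonCache");

            // Register the key and value types so the SQL engine creates the Person table and its indexes.
            cfg.setIndexedTypes(Long.class, Person.class);

            // SQL queries can now be run against the Person table (see the examples below).
            IgniteCache<Long, Person> cache = ignite.getOrCreateCache(cfg);
        }
    }
}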

Ignite is capable of running SQL queries in two ways:

First, if a query is executed against fully REPLICATED data, Ignite will send it to a single cluster node and run it over the local data there.

Second, if a query is executed over PARTITIONED data, the execution flow will be the following:

  • The query will be parsed and split into multiple map queries and a single reduce query.
  • All the map queries are executed on all the nodes where the required data resides.
  • Every node provides its local result set to the query initiator (the reducer), which, in turn, completes the reduce phase by merging the provided result sets.
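
As an example, a distributed query of this kind can be submitted through the cache query API. Here is a minimal sketch that assumes the PersonCache and Person type from the previous example:

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class DistributedQueryExample {
    // Assumes the PersonCache and Person type from the previous sketch.
    public static List<List<?>> findByLastName(Ignite ignite, String lastName) {
        IgniteCache<Long, Object> cache = ignite.cache("PersonCache");

        // The query is parsed and split into map queries that run on the data nodes;
        // this node acts as the reducer and merges the partial result sets.
        SqlFieldsQuery qry = new SqlFieldsQuery(
            "SELECT firstName, lastName FROM Person WHERE lastName = ?").setArgs(lastName);

        return cache.query(qry).getAll();
    }
}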

Joins Execution Flow

The execution flow of queries with joins does not differ from the flow described above for PARTITIONED data.
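
For example, a join between two query types might look like the sketch below. The SalaryCache name, the Salary type and its columns are assumptions for illustration, and the distributed-joins flag is only needed when the joined data is not collocated:

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class JoinQueryExample {
    // Hypothetical caches and columns: Salary is assumed to be another indexed type stored in SalaryCache.
    public static List<List<?>> personsWithSalaries(Ignite ignite) {
        IgniteCache<Long, Object> cache = ignite.cache("PersonCache");

        SqlFieldsQuery qry = new SqlFieldsQuery(
            "SELECT p.firstName, p.lastName, s.amount " +
            "FROM Person p JOIN \"SalaryCache\".Salary s ON p.id = s.personId");

        // Request non-collocated (distributed) joins if Person and Salary entries are not collocated.
        qry.setDistributedJoins(true);

        return cache.query(qry).getAll();
    }
}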

Concurrent Modifications

UPDATE and DELETE statements generate SELECT queries internally in order to get the set of entries that have to be modified. The keys from that set are not locked, and there is a chance that their values will be modified concurrently by other queries. The SQL engine implements a special technique that, first, avoids locking the keys and, second, guarantees that the values are up to date at the time they are updated by a DML statement.

Basically, the engine detects the subset of cache entries that were modified concurrently and re-executes the SELECT statement, limiting its scope to the modified keys only.

Let's say the following UPDATE statement is being executed.

UPDATE Person SET firstName = 'Mike' WHERE lastName = 'Smith';

Before firstName is updated, the SQL engine will generate the SELECT query to get all the entries matching the WHERE clause. The statement can look like the following:

SELECT _key, _val, 'Mike' FROM Person WHERE lastName = 'Smith';

_key and _val

The _key and _val keywords are used to get the key and the value of an object with all its fields at once. The same keywords can be used from application code.
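
For instance, here is a sketch of selecting _key and _val from application code, under the same PersonCache assumption as above:

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class KeyValColumnsExample {
    public static void printEntries(Ignite ignite) {
        IgniteCache<Long, Object> cache = ignite.cache("PersonCache");

        // _key and _val expose the whole cache key and value alongside the regular SQL columns.
        SqlFieldsQuery qry = new SqlFieldsQuery(
            "SELECT _key, _val FROM Person WHERE lastName = ?").setArgs("Smith");

        for (List<?> row : cache.query(qry).getAll())
            System.out.println("key=" + row.get(0) + ", value=" + row.get(1));
    }
}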

Right after the SELECT above is executed, the retrieved entry might be updated concurrently by some other query:

UPDATE Person SET firstName = 'Sarah' WHERE id = 1;

The SQL engine will detect that the entry with id = 1 (i.e., _key = 1) was modified during the update phase of the first UPDATE query. It will then halt the update and re-execute a modified version of the SELECT query in order to get the latest values of the entries:

SELECT _key, _val, 'Mike' FROM Person WHERE lastName = 'Smith'
    AND _key IN (SELECT * FROM TABLE(KEY long = [ 1 ]))

This query will be executed only for the outdated keys. In our example, there is only one such key: 1.

This process is repeated until the DML engine ensures that all the entries about to be updated were not modified concurrently. The maximum number of attempts is 4. Presently, there is no configuration parameter that can change this value.

The DML engine does not re-execute the SELECT query for entries that are deleted concurrently. The query is re-executed only for entries that are still in the database.
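
For completeness, here is a sketch of submitting the same UPDATE from application code, under the same PersonCache assumption as above. For DML statements the cursor returns a single row holding the number of updated entries:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class DmlUpdateExample {
    public static long updateFirstName(Ignite ignite) {
        IgniteCache<Long, Object> cache = ignite.cache("PersonCache");

        SqlFieldsQuery qry = new SqlFieldsQuery(
            "UPDATE Person SET firstName = ? WHERE lastName = ?").setArgs("Mike", "Smith");

        // For DML, the result contains one row with the count of updated entries.
        return (Long)cache.query(qry).getAll().get(0).get(0);
    }
}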

Known Limitations

Subqueries in WHERE clause

SELECT queries used in INSERT and MERGE statements, as well as SELECT queries generated by UPDATE and DELETE operations, will be distributed and executed in either collocated or non-collocated distributed mode, as needed.

However, if there is a subquery that is executed as part of a WHERE clause, it can be executed in the collocated mode only.

For instance, let's take this query:

DELETE FROM Person WHERE id IN
    (SELECT personId FROM Salary s WHERE s.amount > 2000);

The SQL engine will generate the SELECT query in order to get a list of entries to be deleted. The query will be distributed and executed across the cluster and will look like the one below:

SELECT _key, _val FROM Person WHERE id IN
    (SELECT personId FROM Salary s WHERE s.amount > 2000);

However, the subquery from the IN clause (SELECT personId FROM Salary ...) will not be distributed further and will be executed over the local data set available on each node.
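
One way to make such a collocated-only subquery produce complete results is to collocate Salary entries with their Person, for example through an affinity key. Below is a minimal sketch; the SalaryKey class and its fields are assumptions for illustration and rely on the Person cache being keyed by the person's id:

import org.apache.ignite.cache.affinity.AffinityKeyMapped;
import org.apache.ignite.cache.query.annotations.QuerySqlField;

/** Hypothetical composite key for Salary entries. */
public class SalaryKey implements java.io.Serializable {
    @QuerySqlField(index = true)
    private long salaryId;

    /** All Salary entries with the same personId land on the node that owns that Person entry. */
    @AffinityKeyMapped
    @QuerySqlField(index = true)
    private long personId;

    public SalaryKey(long salaryId, long personId) {
        this.salaryId = salaryId;
        this.personId = personId;
    }
}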

EXPLAIN support for DML statements

Presently, EXPLAIN is not supported for DML operations.

One possible approach is to execute EXPLAIN for the SELECT query that is automatically generated by (UPDATE, DELETE) or used in (INSERT, MERGE) DML statements. This will give you insight into the indexes that are used when a DML operation is executed.
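
For example, for the DELETE statement shown earlier you might run EXPLAIN against the SELECT it is based on. A sketch under the same cache-name assumptions as above:

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class ExplainDmlExample {
    public static void printPlan(Ignite ignite) {
        IgniteCache<Long, Object> cache = ignite.cache("PersonCache");

        // EXPLAIN is supported for SELECT, so run it against the query the DELETE would generate.
        SqlFieldsQuery qry = new SqlFieldsQuery(
            "EXPLAIN SELECT _key, _val FROM Person WHERE id IN " +
            "(SELECT personId FROM \"SalaryCache\".Salary s WHERE s.amount > 2000)");

        for (List<?> row : cache.query(qry).getAll())
            System.out.println(row.get(0));
    }
}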
