.... ...........

Contributed by Calpont, InfiniDB Community Edition provides a scale-up analytics database engine for your data warehousing, business intelligence and read-intensive application needs. Enabled via MySQLTM and purpose-built for an analytical workload with column-oriented technology at its core, the multi-threaded capabilities of InfiniDB Community Edition fully encompass query, transactional support and bulk load operations.  So come on in, grab a download and get started.

Introducing InfiniDB from Calpont

E-mail Print PDF

Introducing InfiniDB from Calpont

Let’s be honest: working with big databases is a lot of fun. There’s something cool about dealing with tables that have hundreds of millions or billions of rows in them, loading huge amounts of data, building star and snowflake schemas for data warehouses/marts, optimizing query performance, and all that jazz. Yes, working with big databases is a lot of fun.

On the other hand, let’s be honest: working with big databases is not a lot of fun. There’s a lot of pain in dealing with tables that have hundreds of millions or billions of rows in them, waiting for huge amounts of data to be loaded only to have the load job toss its cookies and fail when it’s 99% done, building special schemas that you wonder whether make any difference at all, and trying to figure out why just a simple two-way join query has been hanging for over an hour. Yes, working with big databases is not a lot of fun.

But here’s the thing: big databases aren’t going away; in fact, there’s more of them than ever before and they just keep multiplying and getting bigger. I could quote a lot of industry statistics and TDWI reports on this fact, but that would just bore you because you likely know all this stuff already. You also know that there are a number of database vendors out there who can take on big data, but they also come with big pricetags.

Lately, the idea of column-oriented databases has been catching new wind in its sails. I say ‘new wind’ because Sybase IQ did the column database thing when columns weren’t that cool. But now there are a number of column-oriented databases that are really taking off, however most still are proprietary and cost quite a bit. But what happens when you marry a column-oriented database that’s modular in design, can scale both up and out, with open source? You get Calpont’s InfiniDB.

 

Why Columns?

To be frank, I’m as skeptical as they come when claims are made that say how this or that different ‘type’ of database is better than a general purpose RDBMS. When object-oriented databases came out, I scoffed. When OLAP/cube databases were hogging all the tech press, I smirked. I believed that neither could really challenge the good ol’ relational model, which isn’t perfect by any means, but hey: it still has enough punch to relegate those two other challengers to the very boutique marketplace. That being the case, why should we entertain a relational database that is vertical vs. horizontally architected?

Because, for querying and processing medium to huge amounts of data, they actually work. And, many times, they work better at finishing the SQL query horse race than a general purpose RDBMS. That’s why.

Before column databases came out, experienced data modelers creating data warehouses, marts, and reporting databases knew that by designing vertically partitioned tables that were subsets of an overall table, they could sometimes buy better physical query performance because the rows were shorter and thus I/O could somewhat be reduced. Using views, they’d then tie the vertically partitioned tables together to form one logically-based table that could be referenced. And sometimes this practice worked. But oftentimes, it didn’t make the grade because (1) even the vertically partitioned tables were still row-based and I/O wasn’t always reduced by much; (2) it was really hard to predict what columns would be consistently queried together; (3) the practice didn’t reduce the need for indexes (and in fact made the matter worse sometimes because you needed indexes to smartly do joins between the vertically partitioned tables in the view definitions); (4) like with the difficulty in predicting what columns would be queried together, it was difficult to do index design because user’s query patterns could change at the drop of a hat and suddenly queries would do full scans instead of index searches. And on and on it went.

Column-oriented databases designed especially for analytics overcome these limitations by storing, managing, and querying data based on columns rather than rows. Because only the necessary columns in a query are accessed rather than entire rows, the column can act as the index and I/O activities as well as overall query response times can oftentimes be reduced. The end result is the ability to interrogate and return query results against either moderate or large amounts of data in a pretty darn efficient manner. On average, a row-based system can do up to 5-10x the physical I/O that a column-based database does to retrieve the same information. This is why column bigot Philip Howard (Research Director for Bloor Research) says: “Columns provide better performance at a lower cost with a smaller footprint: it is difficult to understand why any company seriously interested in query performance would not consider a column-based solution.”

Nice quote, but should you really check out a column database? Can they really help you? Read on…

 

The InfiniDB Proposal

Column databases can be a help for sure in the right situation, but there’s more that’s needed to crunch through large data volumes than just having a vertical data design. In addition to being column-oriented, the open source version of InfiniDB provides the following core feature set, which adds a number of needed items to the mix:

  • Column-oriented architecture: mentioned already...

  • Multi-threaded design: the engine is multi-threaded and uses today’s modern multi-CPU/core architectures to split a single query up across the available processing units.

  • Automatic vertical and horizontal partitioning: Being column-oriented, the engine uses vertical partitioning to store data but it also uses a form of logical horizontal range partitioning that does not require special storage placements or design. Using both vertical and logical horizontal range partitioning allows it to reduce I/O in both directions (column and row). Both vertical and horizontal partitioning are automatically handled so you don’t have to manually do anything.

  • High user concurrency: There are no real limits as far as concurrency is concerned.

  • High-speed data loader: To effectively load lots of data, a high-speed load utility is made available. I’ll show you how this works in just a second.

  • DML support: In addition to supporting the high-speed bulk loading of data, the engine supports full DML (insert, update, delete) operations as well.

  • Transactional support: ACID-compliant transactional support is provided in the engine. Commit, rollback, deadlock detection, all that good stuff.

  • Crash recovery: the engine has full crash recovery capabilities.

  • MVCC design: who doesn’t like ‘snapshot read’? I do, and the engine has it.

  • No need for indexing: Because the engine uses both vertical and logical horizontal partitioning, there is no need for indexing. In essence, the data is the index. In addition, the engine automatically maintains a small, important structure called the Extent Map (explained later), which is used to reduce I/O. Besides no indexes, you also don’t have to worry about dealing with materialized views, summary tables (of course, in MySQL, you don’t’ have these anyway…), or manual data partitioning.

  • Logical Data compression: the engine doesn’t have real physical/file compression yet (coming…) so for now, it uses transparent logical data compression to store data that benefits from being compressed.

  • ALTER TABLE is supported (add, drop).

  • Performance diagnostics: To help troubleshoot those queries from hell, InfiniDB supplies a nice tracing/diagnostic utility that helps figure out exactly what a query is doing above the normal EXPLAIN output.

  • MySQL front end: InfiniDB utilizes MySQL for its front end. MySQL is used mainly for security, SQL parsing, and initial query plan output.

In addition to the above open source product feature set, a commercial/Enterprise version of InfiniDB will be offered that has a pay-for scale out option where multiple machines can be used in a massive parallel processing (MPP) configuration. However, for many data warehousing, data mart, and analytic database tasks, the open source version of InfiniDB provides plenty of power to plow through big databases.

 

InfiniDB Architecture

The architecture of InfiniDB is modular and consists of three basic components:

  • User Module: The User Module is made up of a small MySQL instance and a number of InfiniDB processes that handle concurrency scaling. The User Module is also responsible for breaking down SQL requests and distributing the various parts to one or more Performance Modules that actually retrieve requested data from either memory caches or disk. Finally, the User Module assembles all the query results from the various participating Performance Modules to form the complete query result set that is returned to the user. Note there can be more than one User Module in an InfiniDB Enterprise configuration so you can do concurrency scale out by adding more User Module nodes.

  • Performance Module: The Performance Module is responsible for retrieving and managing data, and passing it back to the User Module(s) to satisfy query requests. The Performance Module selects data from disk and caches it in a shared nothing data cache that is part of the server on which the Performance Module resides. InfiniDB Enterprise accomplishes its MPP benefits by allowing the user to configure as may Performance Modules as they would like; each additional Performance Module adds more cache to the overall database as well as more processing power.

  • Storage: InfiniDB can use either local storage or shared storage (e.g. SAN) to store data. A user can have everything configured and running on one server. In the commercial edition of InfiniDB, they can scale out with multiple servers and configure either a shared disk (currently supported) or shared nothing (coming in future release) architecture.

All the above modular components can exist on one server, or in the Enterprise version, be broken up and run on different machines. If you want to compare the architecture and design of InfiniDB to other proprietary analytic databases in the market that do both scale up (use a box’s available CPU’s) and scale out (MPP), InfiniDB is comparable to Vertica, Sybase IQ, and Paraccel. There are other analytic databases available that scale and do MPP, but all others are row-based (e.g. Greenplum, Aster Data, Teradata, Netezza, etc.) vs. column-oriented.

One interesting feature that allows InfiniDB to handle lots of data is a structure called the Extent Map. It’s the Extent Map that removes the need for you to do any indexing, manual partitioning of data, create summary tables, etc.

An “extent” in InfiniDB is a logical block of space that exists within a physical file, with an extent being anywhere from 8-64MB in size depending on a column’s datatype. Each extent supports the same number of rows, with smaller data types using less space on disk. The Extent Map catalogs all extents and their corresponding blocks (identified with logical block identifiers or LBID’s). The Extent Map also maintains minimum and maximum values for a column’s data within an extent.

The Extent Map provides the ability for InfiniDB to only retrieve the blocks needed to satisfy a query, but it also provides another benefit – that of logical range partitioning. This is accomplished via the minimum and maximum values for each extent that are contained within the Extent Map. Extent elimination is first accomplished in InfiniDB via the column-oriented architecture (only needed columns are scanned), but the process is accelerated because of this logical horizontal partitioning that is implemented in the Extent Map.

This automatic extent elimination behavior is well suited for time-based data where data is loaded frequently and often referenced by time. Near real-time loads with queries against the leading edge of the data can easily show good extent elimination for all of the date/time columns as well as an ascending key value. Any column with clustered values is a good candidate for extent elimination.

To eliminate an extent when a column scan involves a filter, that filter value is compared to the minimum and maximum values stored in each extent for the column:

In the above figure, if a WHERE column filter of “col1 between 220 and 250” is specified, InfiniDB will eliminate extents 1, 2 and 4 from being scanned, saving ¾ of the I/O and many comparison operations. If the extent cannot possibly contain any rows that would match the filter, that extent is ignored entirely. Additionally, since each extent eliminated for a column also eliminates the same extents for all the other columns in a table, impossible filter chains can be quickly identified without any I/O being required. For example, take the following two columns and their Extent Map information:

If a column WHERE filter of “col1 between 220 and 250 and col2 < 10000” is specified, InfiniDB will eliminate extents 1, 2 and 4 from the first column filter, then, looking at just the matching extents for col2 (i.e. just extent 3), it will determine that no extents match and return zero rows without doing any I/O at all.

OK, enough design and architecture talk: Let’s kick the tires of the database now so you can see these things in action.

Loading Data

First, let’s load some data into a basic database to work with. The below load and query tests were run on a small 32-bit Fedora Core 6 machine with 2GB of RAM, and 4 CPU’s (2.40GHz with 512KB cache).

With InfiniDB, you have a high-speed loader to work with called cpimport that does a decent job at taking flat files and pushing their data into database tables. There are two primary steps to using the cpimport utility:

  1. Create a job file that is used to load data from one or more flat files into a database

  2. Run the cpimport utility to perform the data import

Note that bulk loads are an append operation to a table so they allow existing data to be read and remain unaffected during the process. Upon completion of the load operation, a high water mark in each column file is moved in an atomic operation that allows for any subsequent queries to read the newly loaded data.

You can create a load job to load one table at a time or load multiple tables from different flat files. Let’s go ahead and load up an entire TPC-H database:

mysql> show tables;

+-----------------+

| Tables_in_tpch2 |

+-----------------+

| customer |

| lineitem |

| nation |

| orders |

| part |

| partsupp |

| region |

| supplier |

+-----------------+

8 rows in set (0.00 sec).

First, you put all your load files in InfiniDB’s import directory:

[rschumacher@srvsn import]$ pwd

/usr/local/Calpont/data/bulk/data/import

[rschumacher@srvsn import]$ ls -l

total 1076072

-rw-rw-r-- 1 root calpont 24346144 Sep 25 11:39 customer.tbl

-rw-rw-r-- 1 root calpont 759863287 Sep 25 11:39 lineitem.tbl

-rw-rw-r-- 1 root calpont 2224 Sep 25 11:39 nation.tbl

-rw-rw-r-- 1 root calpont 171952161 Sep 25 11:39 orders.tbl

-rw-rw-r-- 1 root calpont 118984616 Sep 25 11:39 partsupp.tbl

-rw-rw-r-- 1 root calpont 24207240 Sep 25 11:39 part.tbl

-rw-rw-r-- 1 root calpont 389 Sep 25 11:39 region.tbl

-rw-rw-r-- 1 root calpont 1409184 Sep 25 11:39 supplier.tbl

Next, you create a job file that’s used by the import operation. The job file is created with the colxml utility. The reason for the job file step is that oftentimes loads are repetitively done (especially in dev/test environments) and a job file can be reused many times to re-load data into the same objects. So, let’s create our job file to import all our data into a database called ‘tpch2’, where the flat files use the default pipe (‘|’) character as the flat file field delimiter, and give our job a number of ‘500’:

[rschumacher@srvsn bin]$ ./colxml tpch2 -j500

Running colxml with the following parameters:

2009-10-07 15:14:20 (9481) INFO :

Schema: tpch2

Tables:

Load Files:

-b 0

-c 1048576

-d |

-e 10

-f CSV

-j 500

-m 50

-n

-p /usr/local/Calpont/data/bulk/job/

-r 5

-s

-u

-w 10485760

-x tbl

File completed for tables:

tpch2.customer

tpch2.lineitem

tpch2.nation

tpch2.orders

tpch2.part

tpch2.partsupp

tpch2.region

tpch2.supplier

Normal exit.

 

Now, let’s perform the actual import for all eight tables using the cpimport utility:

[rschumacher@srvsnp bin]$ ./cpimport -j 500

Bulkload root directory : /usr/local/Calpont/data/bulk

job description file : Job_500.xml

2009-10-07 15:14:59 (9952) INFO : successfully load job file /usr/local/Calpont data/bulk/job/Job_500.xml

2009-10-07 15:14:59 (9952) INFO : PreProcessing check starts

2009-10-07 15:15:04 (9952) INFO : PreProcessing check completed

2009-10-07 15:15:04 (9952) INFO : preProcess completed, total run time : 5 seconds

2009-10-07 15:15:04 (9952) INFO : No of Read Threads Spawned = 1

2009-10-07 15:15:04 (9952) INFO : No of Parse Threads Spawned = 3

2009-10-07 15:15:06 (9952) INFO : For table tpch2.customer: 150000 rows processed and 150000 rows inserted.

2009-10-07 15:16:12 (9952) INFO : For table tpch2.nation: 25 rows processed and 25 rows inserted.

2009-10-07 15:16:12 (9952) INFO : For table tpch2.lineitem: 6001215 rows processed and 6001215 rows inserted.

2009-10-07 15:16:31 (9952) INFO : For table tpch2.orders: 1500000 rows processed and 1500000 rows inserted.

2009-10-07 15:16:33 (9952) INFO : For table tpch2.part: 200000 rows processed and 200000 rows inserted.

2009-10-07 15:16:44 (9952) INFO : For table tpch2.partsupp: 800000 rows processed and 800000 rows inserted.

2009-10-07 15:16:44 (9952) INFO : For table tpch2.region: 5 rows processed and 5 rows inserted.

2009-10-07 15:16:45 (9952) INFO : For table tpch2.supplier: 10000 rows processed and 10000 rows inserted.

2009-10-07 15:16:45 (9952) INFO : Bulk load completed, total run time : 106 seconds

 

The cpimport utility pushed 8.6 million rows into our database in a little over 100 seconds, which isn’t bad (about 82,000 rows/second). Now, let’s run some query tests.

Querying Data in InfiniDB

To run queries against InfiniDB, you’ll be using a mysql client interface named calpontmysql.

As nearly everyone likes to test query speed by issuing a count(*) against a table, let me start off by saying that InfiniDB does not keep a count of a table’s rows immediately on hand, so it will always do an actual count of data:

mysql> select count(*) from lineitem;

+----------+

| count(*) |

+----------+

| 6001215 |

+----------+

1 row in set (0.81 sec)

With that caveat out of the way, let’s now run a few queries and see what we see. Again, the database has no indexes or manual partitioning of data, so it’s just pure column-oriented tables we’re working with. Let’s start with a 3-way table join between customer, orders, and lineitem:

mysql> select c_name, sum(l_quantity)

-> from customer, orders, lineitem

-> where c_custkey = o_custkey and

-> l_orderkey = o_orderkey and

-> l_shipdate = '1992-01-02'

-> group by c_name;

+--------------------+-----------------+

| c_name | sum(l_quantity) |

+--------------------+-----------------+

| Customer#000094199 | 35.00 |

| Customer#000146399 | 21.00 |

| Customer#000104149 | 20.00 |

| Customer#000081157 | 12.00 |

| Customer#000071485 | 27.00 |

.

.

.

17 rows in set (1.12 sec)

InfiniDB brings back the result set in a little more than a second, which isn’t bad. Now, let’s switch to another TPC-H database where I have 20 times the data in the lineitem table, which equates to 120 million rows instead of just 6 million and run the same query:

mysql> select c_name, sum(l_quantity)

-> from customer, orders, lineitem

-> where c_custkey = o_custkey and

-> l_orderkey = o_orderkey and

-> l_shipdate = '1992-01-02'

-> group by c_name;

+--------------------+-----------------+

| c_name | sum(l_quantity) |

+--------------------+-----------------+

| Customer#000094199 | 700.00 |

| Customer#000009263 | 380.00 |

| Customer#000146399 | 420.00 |

| Customer#000072862 | 100.00 |

.

.

.

17 rows in set (10.48 sec)

So adding 20 times the data in the largest table resulted in a 10x increase in response time, which is not horrible. If we added CPU’s to the box or increased the data cache size, we could reduce the response time further, or if we decided to use the pay-for scale-out option and go MPP, we can expect to cut the response time in half for one new node and in half again if we added two more nodes after that.

Now, let’s try a full 8-way join with the smaller TPC-H database:

mysql> select c_name, p_name, ps_availqty, s_name,

-> o_custkey, r_name, n_name, sum(l_quantity)

-> from orders, lineitem, customer, part,

-> partsupp, supplier, nation, region

-> where o_orderkey = l_orderkey and

-> c_custkey = o_custkey and

-> p_partkey = l_partkey and

-> ps_partkey = p_partkey and

-> s_suppkey = ps_suppkey and

-> r_regionkey = n_regionkey and

-> s_nationkey = n_nationkey and

-> o_orderkey = 4161537

-> group by c_name, p_name, ps_availqty, s_name, o_custkey, r_name, n_name;

+--------------------+----------------------------------------+-------------+--------------------+-----------+-------------+---------------+-----------------+

| c_name | p_name | ps_availqty | s_name | o_custkey | r_name | n_name | sum(l_quantity) |

+--------------------+----------------------------------------+-------------+--------------------+-----------+-------------+---------------+-----------------+

| Customer#000122437 | slate light gainsboro dark seashell | 8343 | Supplier#000000597 | 122437 | AMERICA | UNITED STATES | 47.00 |

.

.

28 rows in set (5.68 sec)

And with the bigger TPC-H lineitem table (120 million rows):

mysql> select c_name, p_name, ps_availqty, s_name,

-> o_custkey, r_name, n_name, sum(l_quantity)

-> from orders, lineitem, customer, part,

-> partsupp, supplier, nation, region

-> where o_orderkey = l_orderkey and

-> c_custkey = o_custkey and

-> p_partkey = l_partkey and

-> ps_partkey = p_partkey and

-> s_suppkey = ps_suppkey and

-> r_regionkey = n_regionkey and

-> s_nationkey = n_nationkey and

-> o_orderkey = 4161537

-> group by c_name, p_name, ps_availqty, s_name, o_custkey, r_name, n_name;

+--------------------+----------------------------------------+-------------+--------------------+-----------+-------------+---------------+-----------------+

| c_name | p_name | ps_availqty | s_name | o_custkey | r_name | n_name | sum(l_quantity) |

+--------------------+----------------------------------------+-------------+--------------------+-----------+-------------+---------------+-----------------+

| Customer#000122437 | slate light gainsboro dark seashell | 8343 | Supplier#000000597 | 122437 | AMERICA | UNITED STATES | 47.00 |

.

.

28 rows in set (22.98 sec)

 

With this join, we see pretty good performance in the smaller 8-way join and about a 4x increase in response time for the larger table in the same 8-way operation.

Let’s now test InfiniDB’s Extent Map to see if queries that use a value outside of a column’s maximum and minimum value can be satisfied in the Extent Map alone and not have to read any actual data blocks:

mysql> select min(l_orderkey), max(l_orderkey) from lineitem;

+-----------------+-----------------+

| min(l_orderkey) | max(l_orderkey) |

+-----------------+-----------------+

| 1 | 6000000 |

+-----------------+-----------------+

1 row in set (1.25 sec)

 

mysql> select count(*) from lineitem where l_orderkey < 1;

+----------+

| count(*) |

+----------+

| 0 |

+----------+

1 row in set (0.04 sec)

 

mysql> select count(*) from lineitem where l_orderkey > 6000000;

+----------+

| count(*) |

+----------+

| 0 |

+----------+

1 row in set (0.04 sec)

The Extent Map seems to be doing its job here, which is good. There are few things more frustrating in my book than waiting a long time for a result set that ends up in zero rows being sent back.

I could run through other example queries, but the above should give you a decent first impression of how InfiniDB will handle data loads and queries.

When To and Not To Use InfiniDB

So what are some rules of thumb as to when you should and shouldn’t use InfiniDB? In general, InfiniDB may be a help to you when you have:

  • Read-only databases or read-only parts of an application where response time speeds are important and row subsets (i.e. not all columns in a row) are being requested

  • Data marts/ warehouses/ analytic applications where query patterns change and cannot predicted. In other words, it is difficult to continually know what columns will be requested and will need indexing

  • Very large data volumes that are not served well by a general purpose database’s optimizer

  • Scale out scenarios where you want to do parallel processing of queries across multiple machines (note: today, this is only possible with the commercial version of InfiniDB)

Places where you don’t want to use InfiniDB include:

  • Transactional applications where you want InfiniDB to serve as a transactional back end database. Even though InfiniDB is transactional, the column-oriented nature of it means it will not do singleton/selective inserts and deletes as fast as a general purpose RDBMS will

  • Query systems where all the columns in a table are requested and a primary key lookup is performed

  • Databases that are very small in nature and are static/not growing

Conclusion

Yes, working with big databases is fun and not fun at the same time. But hopefully, InfiniDB will take some of the ‘unfun’ nature of managing large databases away.

You can now download the open source edition of InfiniDB and give it a whirl yourself. InfiniDB will run on a 32 or 64-bit Intel and AMD Linux box and can be downloaded at the InfiniDB.org website at: www.infinidb.org. Also on the site, you’ll find forums, roadmaps, and more concerning the future of InfiniDB.

If you have comments or questions, please don’t hesitate to make your voice heard in the online forums or by shooting us an email. Thanks for your support of InfiniDB and Calpont!