Contributed by Calpont, InfiniDB Community Edition is an open source, scale-up analytics database engine for your data warehousing, business intelligence and read-intensive application needs. Enabled via MySQL™ and purpose-built for an analytical workload with column-oriented technology at its core, the multi-threaded capabilities of InfiniDB Community Edition fully encompass query, transactional support and bulk load operations. So come on in, grab a download and get started.


InfiniDB Team Blog

News and tidbits from your InfiniDB team.

A Behind the Scenes look at InfiniDB: Ease of Use (Part 3 of 3)

This post is the third in a three-part series covering InfiniDB's hallmark characteristics: I/O efficiency, parallelism and ease-of-use.

When we set out to design InfiniDB, we wanted to build a database that was extremely easy to use. Historically, companies have needed DBAs and enterprise IT to be closely involved before analytics could happen within their organizations.

We wanted analysts and data scientists to be able to use InfiniDB without placing a significant burden on precious DBA resources. As such, we designed InfiniDB around the hugely popular MySQL interface, made it index-free, and added features that extend MySQL for analytics. Let’s examine each of these.

Of the many database solutions on the market, InfiniDB is the only MPP columnar database tightly integrated with MySQL. This means that, for tens of thousands of MySQL applications that have large-scale data, InfiniDB is the tool-of-choice for structured Big Data analytics. No need to learn a new language or rewrite code: just install InfiniDB and you can be executing complex commands on massive datasets in minutes.

InfiniDB is designed to minimize performance tuning. Gone are the days when DBAs have to create indexes or materialized views to get blazing-fast performance. Through its unique Extent Map architecture, paired with automatic horizontal and vertical partitioning, InfiniDB scales effortlessly without technical tweaking.

In fact, we’ve even added features that make MySQL easier to use for analytics. For example, InfiniDB fully supports online DDL (i.e. one session can be adding columns to a table while another session is querying that table) using calonlinealter. This is a feature that’s currently not supported in standard MySQL.

With enhanced I/O, record parallelism and unparalleled ease-of-use, InfiniDB is the top choice for helping take control of your data for deep analytics.

Coming soon is InfiniDB 3, which will feature greater storage architecture flexibility, enabling it for massive cloud deployments. 2012 will be a tremendous year for Big Data analytics and InfiniDB!

A Behind the Scenes look at InfiniDB: Parallelism (Part 2 of 3)

In this second post of our three-part series, we'd like to discuss the importance of parallelism and how InfiniDB uses it to achieve breathtaking performance.

InfiniDB is architected for effortless scalability.

The InfiniDB database is composed of two types of modules: User Modules and Performance Modules. User Modules interpret MySQL commands and convert them into parallelized code for execution.

Performance Modules execute the parallelized code and return the result to the User Modules. Performance Modules are architected to be lightweight and to execute in a MapReduce-like fashion, similar to Apache Hadoop. (To be clear, though, Performance Modules do not use Hadoop code. In fact, we started designing this architecture well before Hadoop/HDFS’ adoption.)

Our customers typically have one or a few User Modules, although customers may use any number of User Modules to provide high-availability and workload balancing. In a distributed environment, one or more User Modules may be interacting with any number of Performance Modules.

Due to InfiniDB's MapReduce-like scale-out architecture, our Performance Modules execute requests extremely efficiently. Each thread within the distributed architecture operates independently, avoiding the thread-to-thread or node-to-node communication that can cripple scaling.
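To make the idea of independent workers concrete, here is a minimal Python sketch of a map-then-reduce aggregate. It is only an illustration under our own assumptions: the function names are hypothetical, Python threads stand in for Performance Module threads, and none of this is InfiniDB code.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_and_count(chunk):
    """'Map' step: a worker scans only its own chunk and shares no state."""
    return sum(chunk), len(chunk)


def parallel_average(values, workers=4):
    """Split a column into chunks, aggregate each one independently, then combine."""
    if not values:
        raise ValueError("values must not be empty")
    size = -(-len(values) // workers)  # ceiling division: rows per chunk
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_and_count, chunks))
    # 'Reduce' step: the only place where results from different workers meet.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


ages = [32, 44, 65, 27, 51, 38, 49, 60]  # made-up sample column
print(parallel_average(ages, workers=4))
```

Because each worker returns only a small partial result, adding workers does not add any worker-to-worker chatter, which is the property the paragraph above describes.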

What this means for our customers is that InfiniDB scales linearly. In internal tests, we've scaled InfiniDB on Amazon EC2 to 1024 cores without noticeable loss in performance. In fact, we believe that we can scale to 1 million cores (although our customers have not yet asked us for deployments of this size).

The impact of this is groundbreaking. Large deployments, such as companies delivering analytics over the Cloud, require extremely efficient databases to meet their customers’ needs. Lightweight chips, such as mobile processors and GPUs, also need efficient databases due to hardware constraints. Companies with massive amounts of data (such as petabyte-size data volumes) need highly efficient technology to enable analytics on their data. InfiniDB's architecture remains performant under all of these scenarios.

Stay tuned for the third post in this three-part series, where we discuss the unparalleled ease-of-use with which our customers use InfiniDB.

A Behind the Scenes look at InfiniDB: I/O Efficiency (Part 1 of 3)

Since the launch of InfiniDB last year, we've seen InfiniDB enable tremendous customer successes. Hundreds of customers have used InfiniDB to power their most impactful analytics projects.

We’d like to describe InfiniDB's architecture and explain what makes it so scalable, fast and simple. In a three-part blog series, we'll be covering how InfiniDB differs from other database systems.

Compared to traditional relational databases, InfiniDB has three key benefits: I/O efficiency, parallelism and ease-of-use.

First, let's start with I/O. Traditionally, the bottleneck in database processing has been I/O. When you're moving large data volumes, small differences in I/O start to add up. You can imagine the phrase "being bitten to death by ducks": a growing series of small I/O penalties makes traditional relational database systems unusable for analytics on large datasets (typically data volumes of 500 GB or more, often described as "big data").

Traditionally, database technologies have found ways to alleviate -- but not fix -- the pain. For example, Netezza (an IBM company) allows for scaling the scan rate to overcome the I/O bottleneck. However, such solutions remain costly as they require large investments in proprietary hardware.

We set out to change all of that with InfiniDB.

One of the most impactful ways to alleviate the I/O bottleneck is to align the way that data is stored with the way that it's used. For analytics, this suggests 'columnar databases' or column-stores. Unlike a traditional row-based database, which stores data in rows ("First Name", "Last Name", "Age"), a columnar database stores data in columns (e.g. for the column "Age": '32', '44', '65').

For analytics, specific columns tend to be pulled frequently (e.g. the average of the ages '32', '44' and '65'), which typically makes columnar databases the tool-of-choice. And, since columnar databases like InfiniDB can be deployed on commodity hardware, this solution tends to be much less expensive as well.
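As a rough illustration of the difference, the Python sketch below holds the same three records in a row layout and in a column layout, then computes the average age from each. The names in the sample data are made up, and plain in-memory Python structures cannot show real disk I/O; the point is only which data each approach has to touch.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"first_name": "Ann", "last_name": "Lee", "age": 32},
    {"first_name": "Bob", "last_name": "Ray", "age": 44},
    {"first_name": "Cy", "last_name": "Fox", "age": 65},
]

# Column-oriented layout: each column's values are kept together.
columns = {
    "first_name": ["Ann", "Bob", "Cy"],
    "last_name": ["Lee", "Ray", "Fox"],
    "age": [32, 44, 65],
}


def average_age_row_store(rows):
    # Every record is visited, names included, to pick out a single field.
    return sum(record["age"] for record in rows) / len(rows)


def average_age_column_store(columns):
    # Only the "age" column is read; the name columns are never touched.
    ages = columns["age"]
    return sum(ages) / len(ages)


print(average_age_row_store(rows), average_age_column_store(columns))
```

Both functions return the same answer. On disk, the column layout means a query like this reads only the blocks that hold "age", which is where the I/O saving comes from.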

Once installed, our customers often can't believe the performance that they gain with InfiniDB. They're even more excited when they learn that InfiniDB is priced by the core and doesn't tax their growing data volumes.

What happens when your data volumes increase further and you need to scale out? Read our next post on Parallelism where we describe InfiniDB's industry-leading scale-out functionality.

InfiniDB to 1 Trillion Rows (1,039,909,436,172)

Calpont's InfiniDB has hit a new milestone: our columnar analytics DBMS has loaded over 1 trillion rows (1,039,909,436,172 rows, to be exact). As expected, the load rate was stable over the entire duration, at better than 1.1 million rows per second.


The source files totaled 20,195.94 GB; the size on disk was 5,657.52 GB. Special thanks to Chris Wolf, Auburn '14, for executing this benchmark. Additional details will follow, including query performance and a breakdown of compression.
Let us help you put your data to work. - Jim Tommaney, CTO, Calpont.
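For readers who want to sanity-check the figures in this post, here is a small Python sketch of the arithmetic. It uses only the numbers quoted above; the variable names are ours, and because the load rate is quoted as "better than" 1.1 million rows per second, the duration it derives is an upper bound rather than a measured time.

```python
rows_loaded = 1_039_909_436_172   # rows, as quoted in this post
source_gb = 20195.94              # size of the source files
on_disk_gb = 5657.52              # size on disk after loading
min_rows_per_second = 1.1e6       # quoted as "better than" this rate

# How many times smaller the loaded data is than the source files.
compression_ratio = source_gb / on_disk_gb

# Loading at exactly the quoted minimum rate gives the longest possible duration.
max_load_seconds = rows_loaded / min_rows_per_second
max_load_days = max_load_seconds / 86_400

print(f"on-disk size is about 1/{compression_ratio:.2f} of the source size")
print(f"load took at most about {max_load_days:.1f} days")
```

In other words, the data on disk is roughly 3.57 times smaller than the source files, and a load at the quoted minimum rate would finish in just under 11 days.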

Calpont is pleased to announce high performance sub-query support

Calpont is pleased to announce high performance sub-query support with the latest alpha version (1.1.2) of InfiniDB. We’re especially happy about this release because polls done by MySQL have consistently shown, year after year, that slow subquery performance is one of the top items the MySQL community wants fixed in the server. It’s been complex work for sure, but the Calpont engineering team has come through again and delivered high-speed subqueries in only 5 months. Way to go, guys!

The latest release includes fully parallel sub-queries, covering scalar, correlated, select-clause and from-clause sub-queries. This was a gap in functionality when Vadim Tkachenko from Percona ran some early benchmarks of InfiniDB (November 2009). Vadim (rightly) commented that InfiniDB needed subquery support, and we listened. The good news is that this functionality is now available; the better news is that the performance is consistent with other queries; and the best news is that sub-query performance scales as additional cores become available to the system.
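If the four sub-query forms are unfamiliar, the Python sketch below shows one query of each shape. It is an illustration only: it runs against Python's built-in sqlite3 module as a stand-in so that it is self-contained, the tables and data are hypothetical, and it says nothing about how InfiniDB executes these queries in parallel.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'east'), (2, 'west'), (3, 'east');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 30.0), (3, 2, 20.0), (4, 3, 40.0);
""")

queries = {
    # Scalar sub-query: the inner query returns one value used in a comparison.
    "scalar": """
        SELECT id FROM orders
        WHERE amount > (SELECT AVG(amount) FROM orders)
        ORDER BY id""",
    # Correlated sub-query: the inner query refers to the current outer row.
    "correlated": """
        SELECT o.id FROM orders o
        WHERE o.amount = (SELECT MAX(i.amount) FROM orders i
                          WHERE i.customer_id = o.customer_id)
        ORDER BY o.id""",
    # Select-clause sub-query: computes one output column per outer row.
    "select_clause": """
        SELECT c.id,
               (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id)
        FROM customers c
        ORDER BY c.id""",
    # From-clause sub-query: the inner query acts as a derived table.
    "from_clause": """
        SELECT t.region, t.total
        FROM (SELECT c.region AS region, SUM(o.amount) AS total
              FROM customers c JOIN orders o ON o.customer_id = c.id
              GROUP BY c.region) AS t
        ORDER BY t.region""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```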

Highlights:

  • All of Vadim's sub-query examples are now supported.
  • The sub-queries are fast. The average non-sub-query time with 8 cores was 5.7 seconds (for Vadim's queries); the sub-query statements averaged 3.35 seconds, again with 8 cores.
  • The sub-queries are scalable. Using query0 as an example, the new sub-queries show very nice scalability with 2, 4 and 8 cores:

      • 2 cores - 12.3 seconds
      • 4 cores - 6.15 seconds
      • 8 cores - 3.34 seconds
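To put numbers on that scalability, here is a short Python sketch that turns the query0 timings quoted in this post (12.3, 6.15 and 3.34 seconds on 2, 4 and 8 cores) into speedup and efficiency relative to the 2-core run. The function name and report format are ours, not part of InfiniDB.

```python
# query0 timings from this post: cores -> seconds
timings = {2: 12.3, 4: 6.15, 8: 3.34}


def scaling_report(timings, baseline_cores):
    """Return {cores: (speedup, efficiency)} relative to the baseline run."""
    base_time = timings[baseline_cores]
    report = {}
    for cores, seconds in sorted(timings.items()):
        speedup = base_time / seconds    # how many times faster than baseline
        ideal = cores / baseline_cores   # what perfect linear scaling would give
        report[cores] = (speedup, speedup / ideal)
    return report


report = scaling_report(timings, baseline_cores=2)
for cores, (speedup, efficiency) in report.items():
    print(f"{cores} cores: {speedup:.2f}x speedup, {efficiency:.0%} of linear")
```

Going from 2 to 4 cores halves the time, and going from 2 to 8 cores gives about a 3.68x speedup, or roughly 92% of perfectly linear scaling.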

We invite you to download and try out the latest version of InfiniDB with high-speed subquery support. Please let us know what you think and thanks for your support of InfiniDB!