Contributed by Calpont, InfiniDB Community Edition is an open-source, scale-up analytics database engine for your data warehousing, business intelligence and read-intensive application needs. Enabled via MySQL™ and purpose-built for analytical workloads with column-oriented technology at its core, InfiniDB Community Edition applies its multi-threaded design to queries, transactions and bulk loads. So come on in, grab a download and get started.


InfiniDB Team Blog

News and tidbits from your InfiniDB team.

A Behind the Scenes look at InfiniDB: Parallelism (Part 2 of 3)

In this second post of our three-part series, we'd like to discuss why parallelism matters and how InfiniDB uses it to achieve breathtaking performance.

InfiniDB is architected for effortless scalability.

The InfiniDB database consists of two types of modules: User Modules and Performance Modules. User Modules interpret MySQL commands and convert them into parallelized code for execution.

Performance Modules execute the parallelized code and return the results to the User Modules. Performance Modules are architected to be lightweight and to execute in a MapReduce fashion, similar to Apache Hadoop. (To be clear, though, Performance Modules do not use Hadoop code. In fact, we started designing this architecture well before Hadoop and HDFS were widely adopted.)

Our customers typically have one or a few User Modules, although customers may use any number of User Modules to provide high-availability and workload balancing. In a distributed environment, one or more User Modules may be interacting with any number of Performance Modules.

Thanks to InfiniDB's MapReduce-like scale-out architecture, our Performance Modules execute requests very efficiently. Each thread within the distributed architecture operates independently, avoiding the thread-to-thread and node-to-node communication that can cripple scaling.
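The scatter/gather pattern described above can be illustrated with a minimal Python sketch that rests on a few assumptions. A hypothetical User Module splits a column into partitions. Each hypothetical Performance Module worker then reduces only its own partition to a partial (sum, count), sharing no state with the others. Finally, the User Module merges the partials into an average. The function names are illustrative and are not InfiniDB APIs. Python threads are used only to show the data flow. They do not model InfiniDB's native multi-threading or its distribution across nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(values, n_parts):
    """Hypothetical User Module step: divide a column into independent work units."""
    size = max(1, (len(values) + n_parts - 1) // n_parts)
    return [values[i:i + size] for i in range(0, len(values), size)]


def map_partial_sum(partition):
    """Hypothetical Performance Module step: reduce one partition to (sum, count).
    It reads only its own partition and never talks to other workers."""
    return (sum(partition), len(partition))


def reduce_partials(partials):
    """Hypothetical User Module step: merge partial (sum, count) pairs into an average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


def parallel_avg(values, workers=4):
    """Scatter partitions to workers, then gather and merge their partial results."""
    parts = split_into_partitions(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partial_sum, parts))
    return reduce_partials(partials)


if __name__ == "__main__":
    ages = [32, 44, 65, 21, 58, 37, 49, 27]
    print(parallel_avg(ages, workers=4))
```

Each partial result depends only on its own partition. Adding workers therefore adds capacity without adding coordination, which is the intuition behind the linear scaling described next.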

For our customers, this means that InfiniDB scales linearly. In internal tests, we've scaled InfiniDB on Amazon EC2 to 1,024 cores without a noticeable loss in performance. In fact, we believe we can scale to one million cores (although our customers have not yet asked us for deployments of this size).

The impact of this is groundbreaking. Large deployments, such as companies delivering analytics over the cloud, require extremely efficient databases to meet their customers' needs. Lightweight chips, such as mobile processors and GPUs, also need efficient databases because of their hardware constraints. Companies with massive amounts of data (such as petabyte-scale data volumes) need highly efficient technology to run analytics on their data. InfiniDB's architecture remains performant in all of these scenarios.

Stay tuned for the third post in this three-part series, where we discuss the unparalleled ease of use that InfiniDB offers our customers.

A Behind the Scenes look at InfiniDB: I/O Efficiency (Part 1 of 3)

Since the launch of InfiniDB last year, we've seen InfiniDB enable tremendous customer successes. Hundreds of customers use InfiniDB to power their most impactful analytics projects.

We’d like to describe InfiniDB's architecture and explain what makes it so scalable, fast and simple. In a three-part blog series, we'll be covering how InfiniDB differs from other database systems.

Compared to traditional row-based relational databases, InfiniDB has three key benefits: I/O, parallelism and ease of use.

Let's start with I/O. Traditionally, the bottleneck in database processing has been I/O. When you're moving large data volumes, small differences in I/O start to add up. Think of it as "being bitten to death by ducks": a growing series of small I/O penalties makes traditional relational database systems unusable for analytics on large datasets (typically data volumes of 500 GB or more, often described as "big data").

Existing database technologies have found ways to alleviate, but not fix, the pain. For example, Netezza (an IBM company) scales up the scan rate to overcome the I/O bottleneck. However, such solutions remain costly because they require large investments in proprietary hardware.

We set out to change all of that with InfiniDB.

One of the most effective ways to alleviate the I/O bottleneck is to align the way data is stored with the way it's used. For analytics, this points to columnar databases, or column stores. A traditional row-based database stores data row by row, keeping "First Name", "Last Name" and "Age" for each person together. A columnar database instead stores each column's values together (for example, the "Age" column: '32', '44', '65').

Analytic queries tend to pull specific columns frequently (for example, the average of the ages '32', '44' and '65'), which typically makes columnar databases the tool of choice. And since columnar databases like InfiniDB can be deployed on commodity hardware, this solution tends to be much less expensive as well.
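To make the I/O difference concrete, here is a toy Python sketch. It is not InfiniDB code. It stores the same hypothetical three-row table both ways and computes the average age. The count of values touched stands in for I/O. The row layout reads every field of every row, while the column layout reads only the "age" column. The table contents and function names are illustrative assumptions.

```python
# Hypothetical three-row table; the names are illustrative, the ages match the text.
rows = [
    ("Ann", "Lee", 32),
    ("Bob", "Ng", 44),
    ("Cy", "Ruiz", 65),
]


def to_columns(rows, names):
    """Pivot a row-oriented list of tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


columns = to_columns(rows, ["first_name", "last_name", "age"])


def avg_age_row_store(rows):
    """Row layout: each whole row is fetched just to reach its age field."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # every field of the row is read
        total += row[2]
    return total / len(rows), values_read


def avg_age_column_store(columns):
    """Column layout: only the 'age' column is read."""
    ages = columns["age"]
    return sum(ages) / len(ages), len(ages)


print(avg_age_row_store(rows))        # (47.0, 9)
print(avg_age_column_store(columns))  # (47.0, 3)
```

Both layouts return the same answer, but the column layout touches a third of the values here. On a wide table with dozens of columns, the same query would read a correspondingly smaller fraction of the data.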

Once installed, our customers often can't believe the performance that they gain with InfiniDB. They're even more excited when they learn that InfiniDB is priced by the core and doesn't tax their growing data volumes.

What happens when your data volumes increase further and you need to scale out? Read our next post on Parallelism where we describe InfiniDB's industry-leading scale-out functionality.

InfiniDB 1.5 Final is Now Available!

I am very excited to announce that the FINAL 1.5 version of the InfiniDB Community Edition is now available for use. Thanks to everyone in the community for helping us through the alpha, beta and RC cycles to the 1.5 release of InfiniDB.

We've put a lot of hard work into this release and we've come a long way since 1.0. Here's a reminder of a few of the features that have been added since 1.0:

  • High-speed subqueries.
  • Support for running on Microsoft Windows.
  • Support for updating a column's value with another column's value or a value derived from other columns.
  • Support for functions and expressions in the where clause for DML statements.
  • Support for queries with mixed outer joins.
  • Support for several new distributed functions.
  • Automatic updating of the Extent Map by cpimport.
  • Support for prepared statements.
  • Support for the LOAD DATA INFILE null escape sequence in cpimport.
  • Enhanced Extent Map handling for minimum/maximum column value projections between multiple tables.
  • Enhanced support for UTF-8.
  • Faster concurrent query/active transaction performance.
  • STDIN support for imported data in cpimport (used by certain BI tools such as Pentaho).
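Several items above concern the Extent Map and its minimum/maximum column values. Assuming the Extent Map keeps a minimum and maximum value per extent, as the list suggests, a scan can skip any extent whose range cannot match a predicate. The Python sketch below illustrates that general min/max elimination idea. It is not InfiniDB's implementation, and the `Extent` class and `scan_range` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """Hypothetical extent: a block of column values plus a min/max summary."""
    extent_id: int
    min_value: int
    max_value: int
    values: list


def build_extent(extent_id, values):
    """Record the min/max summary when the extent is written."""
    return Extent(extent_id, min(values), max(values), values)


def scan_range(extents, low, high):
    """Return values in [low, high] and the IDs of extents actually read.
    Extents whose [min, max] cannot overlap [low, high] are skipped."""
    matches, scanned = [], []
    for ext in extents:
        if ext.max_value < low or ext.min_value > high:
            continue  # eliminated using only the summary; no values read
        scanned.append(ext.extent_id)
        matches.extend(v for v in ext.values if low <= v <= high)
    return matches, scanned


extents = [
    build_extent(0, [1, 5, 9]),
    build_extent(1, [10, 14, 19]),
    build_extent(2, [20, 25, 29]),
]
print(scan_range(extents, 12, 21))  # ([14, 19, 20], [1, 2])
```

Extent 0 is never read because its maximum (9) is below the lower bound (12). Only the small per-extent summary is consulted to rule it out.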

I am very proud of how much we've been able to add in a release just five short months after 1.0. We have a great team, and this release is a testament to their hard work and to a solid architectural foundation.

This release also includes a number of bug fixes that you can see at http://bugs.launchpad.net/infinidb.

You can download the latest InfiniDB binaries, source code, and updated documentation at: http://infinidb.org/downloads.

In addition, we made 1.5 available as InfiniDB Enterprise Edition today. If you're interested in the Enterprise side, be sure to check out the redesigned home of InfiniDB Enterprise Edition at http://calpont.com. As always, thanks for your support of InfiniDB, and let us know what you think!