Contributed by Calpont, InfiniDB Community Edition is an open source, scale-up analytics database engine for your data warehousing, business intelligence and read-intensive application needs. Enabled via MySQL™ and purpose-built for an analytical workload with column-oriented technology at its core, the multi-threaded capabilities of InfiniDB Community Edition fully encompass query, transactional support and bulk load operations. So come on in, grab a download and get started.

InfiniDB Team Blog

News and tidbits from your InfiniDB team.

A Behind the Scenes look at InfiniDB: Parallelism (Part 2 of 3)

In this second post of our three-part series, we'd like to discuss the importance of parallelism and how InfiniDB uses it to achieve breathtaking performance.

InfiniDB is architected for effortless scalability.

The InfiniDB database consists of two types of modules: User Modules and Performance Modules. User Modules interpret MySQL commands and convert them into parallelized code for execution.

Performance Modules execute the parallelized code and return the results to the User Modules. Performance Modules are architected to be lightweight and to execute in a MapReduce fashion, similar to Apache Hadoop. (To be clear, though, Performance Modules do not use Hadoop code. In fact, we started designing this architecture well before the adoption of Hadoop/HDFS.)

Our customers typically have one or a few User Modules, although customers may use any number of User Modules to provide high-availability and workload balancing. In a distributed environment, one or more User Modules may be interacting with any number of Performance Modules.

Due to InfiniDB's MapReduce-like scale-out architecture, our Performance Modules execute requests very efficiently. Each thread within the distributed architecture operates independently, avoiding the thread-to-thread and node-to-node communication that can cripple scaling.
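To make the idea concrete, here is a minimal Python sketch of a MapReduce-style aggregate. It is not InfiniDB code: the function names (partial_aggregate, parallel_average) and the use of a thread pool are assumptions chosen purely for illustration. Each worker scans only its own chunk and returns a small partial result, and the workers never talk to each other; only the final combine step sees all the partials.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(chunk):
    """'Map' step (hypothetical): scan one chunk and return (sum, count)."""
    return sum(chunk), len(chunk)


def parallel_average(values, workers=4):
    """'Reduce' step (hypothetical): combine per-chunk partials into an average."""
    if not values:
        raise ValueError("values must not be empty")
    size = -(-len(values) // workers)  # ceiling division: rows per chunk
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    # Each chunk is processed independently; workers share no state.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_aggregate, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


ages = [32, 44, 65, 27, 51, 39, 48, 58]
print(parallel_average(ages))
```

Because every chunk yields just a (sum, count) pair, adding more workers adds more independent scans without adding any cross-worker chatter, which is the property the paragraph above describes.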

What this means for our customers is that InfiniDB scales linearly. In internal tests, we've scaled InfiniDB on Amazon EC2 to 1024 cores without noticeable loss in performance. In fact, we believe that we can scale to 1 million cores (although our customers have not yet asked us for deployments of this size).

The impact of this is groundbreaking. Large deployments, such as companies delivering analytics over the Cloud, require extremely efficient databases to meet their customers’ needs. Lightweight chips, such as mobile processors and GPUs, also need efficient databases due to hardware constraints. Companies with massive amounts of data (such as petabyte-scale data volumes) need highly efficient technology to enable analytics on their data. InfiniDB's architecture remains performant under all of these scenarios.

Stay tuned for the third post in this three-part series, where we discuss the unparalleled ease-of-use with which our customers use InfiniDB.

A Behind the Scenes look at InfiniDB: I/O Efficiency (Part 1 of 3)

Since the launch of InfiniDB last year, we've seen InfiniDB enable tremendous customer successes. Hundreds of customers have used InfiniDB to power their most impactful analytics projects.

We’d like to describe InfiniDB's architecture and explain what makes it so scalable, fast and simple. In a three-part blog series, we'll be covering how InfiniDB differs from other database systems.

Compared to relational databases, InfiniDB has three key benefits: I/O, parallelism and ease-of-use.

First, let's start with I/O. Traditionally, the bottleneck in database processing has been I/O. When you're moving large data volumes, small differences in I/O start to add up. You can imagine the phrase "being bitten to death by ducks": a growing series of small I/O penalties makes traditional relational database systems unusable for analytics on large datasets (typically data volumes of 500 GB or more, often described as "big data").

Traditionally, database technologies have found ways to alleviate -- but not fix -- the pain. For example, Netezza (an IBM company) allows for scaling the scan rate to overcome the I/O bottleneck. However, such solutions remain costly as they require large investments in proprietary hardware.

We set out to change all of that with InfiniDB.

One of the most impactful ways to alleviate the I/O bottleneck is to align the way that data is stored with the way that it's used. For analytics, this suggests 'columnar databases' or column-stores. Unlike a traditional row-based database, which stores data in rows ("First Name", "Last Name", "Age"), columnar databases store data in columns (e.g. for the column "Age": '32', '44', '65').

For analytics, specific columns tend to be pulled frequently (e.g. the average of Ages '32', '44', and '65'), which typically makes columnar databases the tool of choice. And, since columnar databases like InfiniDB can be deployed on commodity hardware, this solution tends to be much less expensive as well.
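As a rough illustration of the difference, the Python sketch below holds the same three records in a row layout and in a column layout and computes the average age from each. The sample names, the field names and the helper functions are all made up for this example; this is a toy model of the two layouts, not a picture of InfiniDB's on-disk format.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"first_name": "Ann", "last_name": "Lee", "age": 32},
    {"first_name": "Bob", "last_name": "Cruz", "age": 44},
    {"first_name": "Eve", "last_name": "Park", "age": 65},
]

# Column layout: each column keeps all of its values together.
columns = {
    "first_name": ["Ann", "Bob", "Eve"],
    "last_name": ["Lee", "Cruz", "Park"],
    "age": [32, 44, 65],
}


def average_age_row_store(records):
    """Walk every full record, even though only 'age' is needed."""
    values_read = sum(len(record) for record in records)
    average = sum(record["age"] for record in records) / len(records)
    return average, values_read


def average_age_column_store(cols):
    """Read only the 'age' column and ignore the rest."""
    ages = cols["age"]
    return sum(ages) / len(ages), len(ages)


print(average_age_row_store(rows))
print(average_age_column_store(columns))
```

Both functions return the same average, but the row version has to pass over nine stored values to get it while the column version reads three. On a wide table with many rows, that gap is where the I/O savings come from.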

Once installed, our customers often can't believe the performance that they gain with InfiniDB. They're even more excited when they learn that InfiniDB is priced by the core and doesn't tax their growing data volumes.

What happens when your data volumes increase further and you need to scale out? Read our next post on Parallelism where we describe InfiniDB's industry-leading scale-out functionality.