Contributed by Calpont, InfiniDB Community Edition is an open source, scale-up analytics database engine for your data warehousing, business intelligence and read-intensive application needs. Enabled via MySQLTM and purpose-built for an analytical workload with column-oriented technology at its core, the multi-threaded capabilities of InfiniDB Community Edition fully encompass query, transactional support and bulk load operations.  So come on in, grab a download and get started.

             | 

E-mail Print PDF

Let's take a quick look at installing and running InfiniDB on EC2. The short list of commands are listed to create a m1.xlarge instance, install InfiniDB, create a 4 disk raid set, create an InfiniDB instance, and connect to the database.  A more detailed description follows that shows a bulk load example, joins, and new subqueries.

 ec2-run-instances ami-86db39ef -k gsg-keypair -g calpont2 -t m1.xlarge -b "sdb=ephemeral0" -b "sdc=ephemeral1" -b "sdd=ephemeral2" -b "sde=ephemeral3"

ssh -i gsg-keypair.pem root@< your public DNS created by Amazon>
 
wget http://www.infinidb.org/downloads/doc_download/188-111-alpha-64-bit-rpms/download

tar -xvf download

rpm -ivh calpont*.rpm

fdisk /dev/sdb < also c,d,e> selecting these options:  n, p, 1, default, default, w

mdadm --create --verbose /dev/md0 --level=0 -c256 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

mke2fs /dev/md0

mount /dev/md0 /usr/local/Calpont/data1/ -t ext2 -o rw,noatime,nodiratime

/usr/local/Calpont/bin/install-infinidb.sh

service infinidb start

. /usr/local/Calpont/bin/calpontAlias

idbmysql



 


Note that  ephemeral disk do not retain data across restarts so would need to be integrated with persistent storage (S3 or EBS) for any production use.    


A more detailed description:

Visit  http://aws.amazon.com/console/  to create your account, or log in if you have one already:

 

Using Amazon command line tools, create an instance via this command:

 

ec2-run-instances ami-86db39ef -k gsg-keypair -g calpont2 -t m1.xlarge -b "sdb=ephemeral0" -b "sdc=ephemeral1" -b "sdd=ephemeral2" -b "sde=ephemeral3"

 

RESERVATION     r-0543ae6e      558601713646    calpont2

INSTANCE        i-63a3ed08      ami-86db39ef                    pending gsg-keypair     0               m1.xlarge       2010-05-06T13:38:20+0000        us-east-1a       aki-54ce2c3d    ari-56ce2c3f            monitoring-disabled                                     ebs

 

 

 

Which uses these components:

 

 

Basic 64-bit Fedora Core 8 (AMI Id: ami-86db39ef)
Fedora Core 8, 64-bit architecture, and Amazon EC2 AMI tools.

 

Instance Type:

Extra Large (m1.xlarge, 15 GB)

 

Revisit  http://aws.amazon.com/console/  to find the public DNS, then use Amazon tools to SSH in. Example here uses ec2-11-11-11-111.compute-1.amazonaws.com for the public DNS.

 

ssh -i gsg-keypair.pem This e-mail address is being protected from spambots. You need JavaScript enabled to view it

 

 

         __|  __|_  )  Fedora 8

         _|  (     /    64-bit

        ___|___|___|

 

 Welcome to an EC2 Public Image

                       :-)

    Base

 

fdisk /dev/sdb 

fdisk /dev/sdc

fdisk /dev/sdd 

fdisk /dev/sde

 

selecting these options: 

              n   (new)

  p   (primary partition)

  1   (partition 1)

return      (default)

return      (default)

  w         (write)

 

Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel

Building a new DOS disklabel with disk identifier 0x55484bc9.

Changes will remain in memory only, until you decide to write them.

After that, of course, the previous content won't be recoverable.

 

 

The number of cylinders for this disk is set to 54823.

There is nothing wrong with that, but this is larger than 1024,

and could in certain setups cause problems with:

1) software that runs at boot time (e.g., old versions of LILO)

2) booting and partitioning software from other OSs

   (e.g., DOS FDISK, OS/2 FDISK)

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

 

Command (m for help): n

Command action

   e   extended

   p   primary partition (1-4)

p

Partition number (1-4): 1

First cylinder (1-54823, default 1):

Using default value 1

Last cylinder or +size or +sizeM or +sizeK (1-54823, default 54823):

Using default value 54823

 

Command (m for help): w

The partition table has been altered!

 

Calling ioctl() to re-read partition table.

Syncing disks.

 

 

mdadm --create --verbose /dev/md0 --level=0 -c256 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

 

 

wget http://www.infinidb.org/downloads/doc_download/188-111-alpha-64-bit-rpms/download

 

--2010-05-06 08:59:22--  http://www.infinidb.org/downloads/doc_download/188-111-alpha-64-bit-rpms/download

Resolving www.infinidb.org... 208.72.157.207

Connecting to www.infinidb.org|208.72.157.207|:80... connected.

HTTP request sent, awaiting response... 200 OK

Length: 13279088 (13M) [application/x-compressed-tar]

Saving to: `download'

 

100%[================================================================================================>] 13,279,088   958K/s   in 63s

 

2010-05-06 09:00:26 (204 KB/s) - `download' saved [13279088/13279088]

 

tar -xvf download

 

calpont-1.1.1-2.x86_64.rpm

calpont-mysql-1.1.1-2.x86_64.rpm

calpont-mysqld-1.1.1-2.x86_64.rpm

 

rpm -ivh calpont*.rpm

 

Preparing...                ########################################### [100%]

   1:calpont-mysqld         ########################################### [ 33%]

Calpont RPM install completed

   2:calpont                ########################################### [ 67%]

Calpont RPM install completed

   3:calpont-mysql          ########################################### [100%]

Calpont RPM install completed

mke2fs /dev/md0

 

mke2fs 1.40.4 (31-Dec-2007)

Filesystem label=

OS type: Linux

Block size=4096 (log=2)

Fragment size=4096 (log=2)

220184576 inodes, 440365568 blocks

22018278 blocks (5.00%) reserved for the super user

First data block=0

Maximum filesystem blocks=4294967296

13439 block groups

32768 blocks per group, 32768 fragments per group

16384 inodes per group

Superblock backups stored on blocks:

        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,

        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,

        102400000, 214990848

 

Writing inode tables: done

Writing superblocks and filesystem accounting information: done

 

This filesystem will be automatically checked every 23 mounts or

180 days, whichever comes first.  Use tune2fs -c or -i to override.

 

 

/usr/local/Calpont/bin/install-infinidb.sh

 

 

DEPENDENCY LIBRARY CHECK

 

All libraries found

 

DISK SPACE CHECK

Make sure there is enough local or mounted disk space for the

InfiniDB System Catalog and the planned test Database.

Minumum disk space is 1.7 GB for the InfiniDB system catalog.

 

Filesystem            Size  Used Avail Use% Mounted on

/dev/sda1              15G  2.2G   12G  16% /

/dev/md0              1.7T   69M  1.6T   1% /usr/local/Calpont/data1

 

MEMORY SPACE CHECK

Check and set InfiniDB Configuration to optimal settings of approximately 25% of total memory.

 

Memory Total in megabytes is  15367

 

Setting Configuration memory setting to  2048M

 

SETUP INFINIDB MYSQL

 

Starting MySQL. SUCCESS!

Shutting down MySQL. SUCCESS!

Starting MySQL. SUCCESS!

Shutting down MySQL. SUCCESS!

Starting MySQL. SUCCESS!

Shutting down MySQL. SUCCESS!

InfiniDB Installation Completed

 

mount /dev/md0 /usr/local/Calpont/data1/ -t ext2 -o rw,noatime,nodiratime

 

service infinidb start

 

Starting Calpont InfiniDB Database Plaform: ....... done.

Starting Calpont InfiniDB Mysqld:

Starting MySQL. SUCCESS!

 

. /usr/local/Calpont/bin/calpontAlias

 

 

idbmysql

 

Welcome to the MySQL monitor.  Commands end with ; or g.

Your MySQL connection id is 8

Server version: 5.1.39 MySQL Community / InfiniDB Community Edition 1.1.1-2 Alpha (GPL)

 

Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.

 

mysql>

 

 

 

 

 

cat cpimport_single.sh

 

echo "Usage:"

echo "cpimport_single  schema  table  delimiter  job  path_and_file"

echo "cpimport_single  test1   test2  |          1000 /tmp/test1.out"

echo ""

cd /usr/local/Calpont/data/bulk/data/import

sed  -i 's/N//g' $5

/usr/local/Calpont/bin/colxml $1 -t $2 -d $3 -j $4

ln -s -f $5 $2.tbl

/usr/local/Calpont/bin/cpimport -j $4

 

 

 

Example SQL and Export / Bulk Load (cpimport):

 

create schema test1;

use test1;

create table test1 (c1 int, c2 varchar(5), c3 date) engine=infinidb;

create table test2 (c1 int, c2 varchar(5), c3 date) engine=infinidb;

                         (4,'four','2010-01-04'),(5,'five','2010-01-05');


select * into outfile '/tmp/test1.out'  from test1;


Create the cpimport_single.sh  file and run as follows: which allows bulk load via a single command.  Note that the script contains a command to replace '/N' with an empty string for bulkloading.  

. cpimport_single.sh test1 test2 " " 1001 /tmp/test1.out

 

Usage:

cpimport_single  schema  table  delimiter  job  path_and_file

cpimport_single  test1   test2  |          1000 /tmp/test1.out

 

Running colxml with the following parameters:

2010-05-06 13:24:35 (9881) INFO :

        Schema: test1

        Tables: test2

        Load Files:

        -b      0

        -c      1048576

        -d     

        -e      10

        -f      CSV

        -j      1001

        -n

        -p      /usr/local/Calpont/data/bulk/job/

        -r      5

        -s

        -u

        -w      10485760

        -x      tbl

 

File completed for tables:

        test1.test2

 

Normal exit.

 

Bulkload root directory : /usr/local/Calpont/data/bulk

job description file : Job_1001.xml

2010-05-06 13:24:35 (9946) INFO : successfully load job file /usr/local/Calpont/data/bulk/job/Job_1001.xml

2010-05-06 13:24:35 (9946) INFO : PreProcessing check starts

2010-05-06 13:24:35 (9946) INFO : PreProcessing check completed

2010-05-06 13:24:35 (9946) INFO : preProcess completed, total run time : 0 seconds

2010-05-06 13:24:35 (9946) INFO : No of Read Threads Spawned = 1

2010-05-06 13:24:35 (9946) INFO : No of Parse Threads Spawned = 3

2010-05-06 13:24:35 (9946) INFO : For table test1.test2: 5 rows processed and 5 rows inserted.

2010-05-06 13:24:35 (9946) INFO : Bulk load completed, total run time : 0 seconds

 

 

Example small Bulk Load

 

time for (( i = 1 ;  i <= 200000;  i++ ));  do cat /tmp/test1.out ; done > /tmp/test1_1m.out

 

real    8m7.965s

user    0m12.566s

sys     1m30.735s

 

. cpimport_single.sh test1 test2 " " 1001 /tmp/test1_1m.out

 

Usage:

cpimport_single  schema  table  delimiter  job  path_and_file

cpimport_single  test1   test2  |          1000 /tmp/test1.out

 

Running colxml with the following parameters:

2010-05-06 13:40:37 (15947) INFO :

        Schema: test1

        Tables: test2

        Load Files:

        -b      0

        -c      1048576

        -d     

        -e      10

        -f      CSV

        -j      1001

        -n

        -p      /usr/local/Calpont/data/bulk/job/

        -r      5

        -s

        -u

        -w      10485760

        -x      tbl

 

File completed for tables:

        test1.test2

 

Normal exit.

 

Bulkload root directory : /usr/local/Calpont/data/bulk

job description file : Job_1001.xml

2010-05-06 13:40:37 (16012) INFO : successfully load job file /usr/local/Calpont/data/bulk/job/Job_1001.xml

2010-05-06 13:40:37 (16012) INFO : PreProcessing check starts

2010-05-06 13:40:37 (16012) INFO : PreProcessing check completed

2010-05-06 13:40:37 (16012) INFO : preProcess completed, total run time : 0 seconds

2010-05-06 13:40:37 (16012) INFO : No of Read Threads Spawned = 1

2010-05-06 13:40:37 (16012) INFO : No of Parse Threads Spawned = 3

2010-05-06 13:40:39 (16012) INFO : For table test1.test2: 1000000 rows processed and 1000000 rows inserted.

2010-05-06 13:40:39 (16012) INFO : Bulk load completed, total run time : 2 seconds

 

 

 

cat quicktest.sql

 

-- join 5, 1m

select test1.c1, min(test2.c3), sum(test1.c1), count(distinct(test2.c2)) from test1 join test2 using (c1) group by 1;

 

-- subquery 5 sub 1m

select count(*) from test1 where c1 in (select max(c1) from test2 where c2 is not null and c1 < 4);

 

-- subquery 1m sub 1m

select c1, count(*) from test2 where c1 in (select max(c1) from test2 where c2 is not null and c1 < 4) group by c1;

 

idbmysql -vvv test1 < quicktest.sql

 

[root@domU-12-31-39-0E-35-C1 import]# idbmysql -vvv test1 < quicktest.sql

--------------

select test1.c1, min(test2.c3), sum(test1.c1), count(distinct(test2.c2)) from test1 join test2 using (c1) group by 1

--------------

 

+------+---------------+---------------+---------------------------+

| c1   | min(test2.c3) | sum(test1.c1) | count(distinct(test2.c2)) |

+------+---------------+---------------+---------------------------+

|    1 | 2010-01-01    |        200000 |                         1 |

|    2 | 2010-02-02    |        400000 |                         1 |

|    3 | NULL          |        600000 |                         0 |

|    4 | 2010-01-04    |        800000 |                         1 |

|    5 | 2010-01-05    |       1000000 |                         1 |

+------+---------------+---------------+---------------------------+

5 rows in set (0.84 sec)

 

--------------

select count(*) from test1 where c1 in (select max(c1) from test2 where c2 is not null and c1 < 4)

--------------

 

+----------+

| count(*) |

+----------+

|        1 |

+----------+

1 row in set (0.20 sec)

 

--------------

select c1, count(*) from test2 where c1 in (select max(c1) from test2 where c2 is not null and c1 < 4) group by c1

--------------

 

+------+----------+

| c1   | count(*) |

+------+----------+

|    2 |   200000 |

+------+----------+

1 row in set (0.32 sec)

 

Bye

 

idbmysql -vvv test_myisam < quicktest.sql


As a quick test, I loaded the same data into MyISAM tables with an index on c1 to show the comparative timing.

[root@domU-12-31-39-0E-35-C1 import]# idbmysql -vvv test_myisam < quicktest.sql

--------------

select test1.c1, min(test2.c3), sum(test1.c1), count(distinct(test2.c2)) from test1 join test2 using (c1) group by 1

--------------

 

+------+---------------+---------------+---------------------------+

| c1   | min(test2.c3) | sum(test1.c1) | count(distinct(test2.c2)) |

+------+---------------+---------------+---------------------------+

|    1 | 2010-01-01    |        200000 |                         1 |

|    2 | 2010-02-02    |        400000 |                         1 |

|    3 | NULL          |        600000 |                         0 |

|    4 | 2010-01-04    |        800000 |                         1 |

|    5 | 2010-01-05    |       1000000 |                         1 |

+------+---------------+---------------+---------------------------+

5 rows in set (3.80 sec)

 

--------------

select count(*) from test1 where c1 in (select max(c1) from test2 where c2 is not null and c1 < 4)

--------------

 

+----------+

| count(*) |

+----------+

|        1 |

+----------+

1 row in set (10.39 sec)

 

--------------

select c1, count(*) from test2 where c1 in (select max(c1) from test2 where c2 is not null and c1 < 4) group by c1

--------------


< dnf >