Let's take a quick look at installing and running InfiniDB on EC2. The short list of commands are listed to create a m1.xlarge instance, install InfiniDB, create a 4 disk raid set, create an InfiniDB instance, and connect to the database. A more detailed description follows that shows a bulk load example, joins, and new subqueries.
ec2-run-instances ami-86db39ef -k gsg-keypair -g calpont2 -t m1.xlarge -b "sdb=ephemeral0" -b "sdc=ephemeral1" -b "sdd=ephemeral2" -b "sde=ephemeral3"
ssh -i gsg-keypair.pem root@< your public DNS created by Amazon>
wget http://www.infinidb.org/downloads/doc_download/188-111-alpha-64-bit-rpms/download
tar -xvf download
rpm -ivh calpont*.rpm
fdisk /dev/sdb < also c,d,e> selecting these options: n, p, 1, default, default, w
mdadm --create --verbose /dev/md0 --level=0 -c256 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
mke2fs /dev/md0
mount /dev/md0 /usr/local/Calpont/data1/ -t ext2 -o rw,noatime,nodiratime
/usr/local/Calpont/bin/install-infinidb.sh
service infinidb start
. /usr/local/Calpont/bin/calpontAlias
idbmysql
Note that ephemeral disk do not retain data across restarts so would need to be integrated with persistent storage (S3 or EBS) for any production use.

A more detailed description:
Visit http://aws.amazon.com/console/ to create your account, or log in if you have one already:
Using Amazon command line tools, create an instance via this command:
ec2-run-instances ami-86db39ef -k gsg-keypair -g calpont2 -t m1.xlarge -b "sdb=ephemeral0" -b "sdc=ephemeral1" -b "sdd=ephemeral2" -b "sde=ephemeral3"
|
RESERVATION r-0543ae6e 558601713646 calpont2 INSTANCE i-63a3ed08 ami-86db39ef pending gsg-keypair 0 m1.xlarge 2010-05-06T13:38:20+0000 us-east-1a aki-54ce2c3d ari-56ce2c3f monitoring-disabled ebs
|
Which uses these components:
|
|
Basic 64-bit Fedora Core 8 (AMI Id: ami-86db39ef) |
|
Instance Type: |
Extra Large (m1.xlarge, 15 GB) |
Revisit http://aws.amazon.com/console/ to find the public DNS, then use Amazon tools to SSH in. Example here uses ec2-11-11-11-111.compute-1.amazonaws.com for the public DNS.
ssh -i gsg-keypair.pem This e-mail address is being protected from spambots. You need JavaScript enabled to view it
|
__| __|_ ) Fedora 8 _| ( / 64-bit ___|___|___|
Welcome to an EC2 Public Image :-) Base |
fdisk /dev/sdb
fdisk /dev/sdc
fdisk /dev/sdd
fdisk /dev/sde
selecting these options:
n (new)
p (primary partition)
1 (partition 1)
return (default)
return (default)
w (write)
|
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel Building a new DOS disklabel with disk identifier 0x55484bc9. Changes will remain in memory only, until you decide to write them. After that, of course, the previous content won't be recoverable.
The number of cylinders for this disk is set to 54823. There is nothing wrong with that, but this is larger than 1024, and could in certain setups cause problems with: 1) software that runs at boot time (e.g., old versions of LILO) 2) booting and partitioning software from other OSs (e.g., DOS FDISK, OS/2 FDISK) Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): n Command action e extended p primary partition (1-4) p Partition number (1-4): 1 First cylinder (1-54823, default 1): Using default value 1 Last cylinder or +size or +sizeM or +sizeK (1-54823, default 54823): Using default value 54823
Command (m for help): w The partition table has been altered!
Calling ioctl() to re-read partition table. Syncing disks. |
mdadm --create --verbose /dev/md0 --level=0 -c256 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
wget http://www.infinidb.org/downloads/doc_download/188-111-alpha-64-bit-rpms/download
|
--2010-05-06 08:59:22-- http://www.infinidb.org/downloads/doc_download/188-111-alpha-64-bit-rpms/download Resolving www.infinidb.org... 208.72.157.207 Connecting to www.infinidb.org|208.72.157.207|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 13279088 (13M) [application/x-compressed-tar] Saving to: `download'
100%[================================================================================================>] 13,279,088 958K/s in 63s
2010-05-06 09:00:26 (204 KB/s) - `download' saved [13279088/13279088] |
tar -xvf download
|
calpont-1.1.1-2.x86_64.rpm calpont-mysql-1.1.1-2.x86_64.rpm calpont-mysqld-1.1.1-2.x86_64.rpm |
rpm -ivh calpont*.rpm
|
Preparing... ########################################### [100%] 1:calpont-mysqld ########################################### [ 33%] Calpont RPM install completed 2:calpont ########################################### [ 67%] Calpont RPM install completed 3:calpont-mysql ########################################### [100%] Calpont RPM install completed |
mke2fs /dev/md0
|
mke2fs 1.40.4 (31-Dec-2007) Filesystem label= OS type: Linux Block size=4096 (log=2) Fragment size=4096 (log=2) 220184576 inodes, 440365568 blocks 22018278 blocks (5.00%) reserved for the super user First data block=0 Maximum filesystem blocks=4294967296 13439 block groups 32768 blocks per group, 32768 fragments per group 16384 inodes per group Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 102400000, 214990848
Writing inode tables: done Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 23 mounts or 180 days, whichever comes first. Use tune2fs -c or -i to override. |
/usr/local/Calpont/bin/install-infinidb.sh
|
DEPENDENCY LIBRARY CHECK
All libraries found
DISK SPACE CHECK Make sure there is enough local or mounted disk space for the InfiniDB System Catalog and the planned test Database. Minumum disk space is 1.7 GB for the InfiniDB system catalog.
Filesystem Size Used Avail Use% Mounted on /dev/sda1 15G 2.2G 12G 16% / /dev/md0 1.7T 69M 1.6T 1% /usr/local/Calpont/data1
MEMORY SPACE CHECK Check and set InfiniDB Configuration to optimal settings of approximately 25% of total memory.
Memory Total in megabytes is 15367
Setting Configuration memory setting to 2048M
SETUP INFINIDB MYSQL
Starting MySQL. SUCCESS! Shutting down MySQL. SUCCESS! Starting MySQL. SUCCESS! Shutting down MySQL. SUCCESS! Starting MySQL. SUCCESS! Shutting down MySQL. SUCCESS! InfiniDB Installation Completed |
mount /dev/md0 /usr/local/Calpont/data1/ -t ext2 -o rw,noatime,nodiratime
service infinidb start
|
Starting Calpont InfiniDB Database Plaform: ....... done. Starting Calpont InfiniDB Mysqld: Starting MySQL. SUCCESS! |
. /usr/local/Calpont/bin/calpontAlias
idbmysql
|
Welcome to the MySQL monitor. Commands end with ; or g. Your MySQL connection id is 8 Server version: 5.1.39 MySQL Community / InfiniDB Community Edition 1.1.1-2 Alpha (GPL)
Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.
mysql> |
cat cpimport_single.sh
|
echo "Usage:" echo "cpimport_single schema table delimiter job path_and_file" echo "cpimport_single test1 test2 | 1000 /tmp/test1.out" echo "" cd /usr/local/Calpont/data/bulk/data/import sed -i 's/N//g' $5 /usr/local/Calpont/bin/colxml $1 -t $2 -d $3 -j $4 ln -s -f $5 $2.tbl /usr/local/Calpont/bin/cpimport -j $4 |
Example SQL and Export / Bulk Load (cpimport):
create schema test1;
use test1;
create table test1 (c1 int, c2 varchar(5), c3 date) engine=infinidb;
create table test2 (c1 int, c2 varchar(5), c3 date) engine=infinidb;
(4,'four','2010-01-04'),(5,'five','2010-01-05');
select * into outfile '/tmp/test1.out' from test1;
Create the cpimport_single.sh file and run as follows: which allows bulk load via a single command. Note that the script contains a command to replace '/N' with an empty string for bulkloading.
. cpimport_single.sh test1 test2 " " 1001 /tmp/test1.out
|
Usage: cpimport_single schema table delimiter job path_and_file cpimport_single test1 test2 | 1000 /tmp/test1.out
Running colxml with the following parameters: 2010-05-06 13:24:35 (9881) INFO : Schema: test1 Tables: test2 Load Files: -b 0 -c 1048576 -d -e 10 -f CSV -j 1001 -n -p /usr/local/Calpont/data/bulk/job/ -r 5 -s -u -w 10485760 -x tbl
File completed for tables: test1.test2
Normal exit.
Bulkload root directory : /usr/local/Calpont/data/bulk job description file : Job_1001.xml 2010-05-06 13:24:35 (9946) INFO : successfully load job file /usr/local/Calpont/data/bulk/job/Job_1001.xml 2010-05-06 13:24:35 (9946) INFO : PreProcessing check starts 2010-05-06 13:24:35 (9946) INFO : PreProcessing check completed 2010-05-06 13:24:35 (9946) INFO : preProcess completed, total run time : 0 seconds 2010-05-06 13:24:35 (9946) INFO : No of Read Threads Spawned = 1 2010-05-06 13:24:35 (9946) INFO : No of Parse Threads Spawned = 3 2010-05-06 13:24:35 (9946) INFO : For table test1.test2: 5 rows processed and 5 rows inserted. 2010-05-06 13:24:35 (9946) INFO : Bulk load completed, total run time : 0 seconds |
Example small Bulk Load
time for (( i = 1 ; i <= 200000; i++ )); do cat /tmp/test1.out ; done > /tmp/test1_1m.out
|
real 8m7.965s user 0m12.566s sys 1m30.735s |
. cpimport_single.sh test1 test2 " " 1001 /tmp/test1_1m.out
|
Usage: cpimport_single schema table delimiter job path_and_file cpimport_single test1 test2 | 1000 /tmp/test1.out
Running colxml with the following parameters: 2010-05-06 13:40:37 (15947) INFO : Schema: test1 Tables: test2 Load Files: -b 0 -c 1048576 -d -e 10 -f CSV -j 1001 -n -p /usr/local/Calpont/data/bulk/job/ -r 5 -s -u -w 10485760 -x tbl
File completed for tables: test1.test2
Normal exit.
Bulkload root directory : /usr/local/Calpont/data/bulk job description file : Job_1001.xml 2010-05-06 13:40:37 (16012) INFO : successfully load job file /usr/local/Calpont/data/bulk/job/Job_1001.xml 2010-05-06 13:40:37 (16012) INFO : PreProcessing check starts 2010-05-06 13:40:37 (16012) INFO : PreProcessing check completed 2010-05-06 13:40:37 (16012) INFO : preProcess completed, total run time : 0 seconds 2010-05-06 13:40:37 (16012) INFO : No of Read Threads Spawned = 1 2010-05-06 13:40:37 (16012) INFO : No of Parse Threads Spawned = 3 2010-05-06 13:40:39 (16012) INFO : For table test1.test2: 1000000 rows processed and 1000000 rows inserted. 2010-05-06 13:40:39 (16012) INFO : Bulk load completed, total run time : 2 seconds
|
cat quicktest.sql
|
-- join 5, 1m select test1.c1, min(test2.c3), sum(test1.c1), count(distinct(test2.c2)) from test1 join test2 using (c1) group by 1;
-- subquery 5 sub 1m select count(*) from test1 where c1 in (select max(c1) from test2 where c2 is not null and c1 < 4);
-- subquery 1m sub 1m select c1, count(*) from test2 where c1 in (select max(c1) from test2 where c2 is not null and c1 < 4) group by c1; |
idbmysql -vvv test1 < quicktest.sql
[root@domU-12-31-39-0E-35-C1 import]# idbmysql -vvv test1 < quicktest.sql
--------------
select test1.c1, min(test2.c3), sum(test1.c1), count(distinct(test2.c2)) from test1 join test2 using (c1) group by 1
--------------
+------+---------------+---------------+---------------------------+
| c1 | min(test2.c3) | sum(test1.c1) | count(distinct(test2.c2)) |
+------+---------------+---------------+---------------------------+
| 1 | 2010-01-01 | 200000 | 1 |
| 2 | 2010-02-02 | 400000 | 1 |
| 3 | NULL | 600000 | 0 |
| 4 | 2010-01-04 | 800000 | 1 |
| 5 | 2010-01-05 | 1000000 | 1 |
+------+---------------+---------------+---------------------------+
5 rows in set (0.84 sec)
--------------
select count(*) from test1 where c1 in (select max(c1) from test2 where c2 is not null and c1 < 4)
--------------
+----------+
| count(*) |
+----------+
| 1 |
+----------+
1 row in set (0.20 sec)
--------------
select c1, count(*) from test2 where c1 in (select max(c1) from test2 where c2 is not null and c1 < 4) group by c1
--------------
+------+----------+
| c1 | count(*) |
+------+----------+
| 2 | 200000 |
+------+----------+
1 row in set (0.32 sec)
Bye
idbmysql -vvv test_myisam < quicktest.sql
As a quick test, I loaded the same data into MyISAM tables with an index on c1 to show the comparative timing.
[root@domU-12-31-39-0E-35-C1 import]# idbmysql -vvv test_myisam < quicktest.sql
--------------
select test1.c1, min(test2.c3), sum(test1.c1), count(distinct(test2.c2)) from test1 join test2 using (c1) group by 1
--------------
+------+---------------+---------------+---------------------------+
| c1 | min(test2.c3) | sum(test1.c1) | count(distinct(test2.c2)) |
+------+---------------+---------------+---------------------------+
| 1 | 2010-01-01 | 200000 | 1 |
| 2 | 2010-02-02 | 400000 | 1 |
| 3 | NULL | 600000 | 0 |
| 4 | 2010-01-04 | 800000 | 1 |
| 5 | 2010-01-05 | 1000000 | 1 |
+------+---------------+---------------+---------------------------+
5 rows in set (3.80 sec)
--------------
select count(*) from test1 where c1 in (select max(c1) from test2 where c2 is not null and c1 < 4)
--------------
+----------+
| count(*) |
+----------+
| 1 |
+----------+
1 row in set (10.39 sec)
--------------
select c1, count(*) from test2 where c1 in (select max(c1) from test2 where c2 is not null and c1 < 4) group by c1
--------------
< dnf >





