Contributed by Calpont, InfiniDB Community Edition is an open source, scale-up analytics database engine for your data warehousing, business intelligence and read-intensive application needs. Enabled via MySQLTM and purpose-built for an analytical workload with column-oriented technology at its core, the multi-threaded capabilities of InfiniDB Community Edition fully encompass query, transactional support and bulk load operations.  So come on in, grab a download and get started.

             | 

How to analyze SQL performance in InfiniDB

E-mail Print PDF

One of the things I found missing when I came to MySQL from other databases was a good SQL tracing utility that helped me understand exactly what a long running SQL statement was doing. The inclusion of the SQL Profiler in post 5.0 versions of MySQL helped, but I always felt more could be done.

With InfiniDB, you have some new SQL diagnostic and tracing tools that you can use to get more performance data from SQL statements that don’t seem to be running well.  Let me give you a quick snapshot of what you have available.

First, the standard EXPLAIN plan is not going to be very helpful as InfiniDB doesn’t use indexes or MySQL much for the heavy lifting portion of SQL processing:

mysql> explain select c_name, sum(l_quantity)
    ->      from customer, orders, lineitem
    ->      where c_custkey = o_custkey and
    ->      l_orderkey = o_orderkey and
    ->      l_shipdate = '1992-01-02'
    ->      group by c_nameG
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2000
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: orders
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2000
        Extra: Using where; Using join buffer
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: lineitem
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2000
        Extra: Using where with pushed condition; Using join buffer
3 rows in set (0.00 sec)

Right now, the EXPLAIN output is, well, pretty bogus.

Next, you can use the SQL Profiler (that was of GA quality starting in MySQL 5.1) with InfiniDB.  Let’s see how this works: first, I’ll flush the InfiniDB cache so we get some physical I/O, set a profiling session, execute a three-way join against two medium-sized tables and one semi-large table (lineitem which has 1.2 billion rows), and then examine the profiling results:

mysql> select calflushcache();                                                                               +-----------------+
| calflushcache() |
+-----------------+
|               0 |
+-----------------+
1 row in set (0.20 sec)
mysql> set profiling=1;
Query OK, 0 rows affected (0.00 sec)
mysql> select c_name, sum(l_quantity)
    ->      from customer, orders, lineitem
    ->      where c_custkey = o_custkey and
    ->      l_orderkey = o_orderkey and
    ->      l_shipdate = '1992-01-02'
    ->      group by c_name;
+--------------------+-----------------+
| c_name             | sum(l_quantity) |
+--------------------+-----------------+
| Customer#000094199 |         7035.00 |
| Customer#000064205 |         7839.00 |
| Customer#000120376 |         2814.00 |
| Customer#000146399 |         4221.00 |
| Customer#000104149 |         4020.00 |
| Customer#000009263 |         3819.00 |
| Customer#000072862 |         1005.00 |
| Customer#000003043 |         8643.00 |
| Customer#000139421 |         9045.00 |
| Customer#000081157 |         2412.00 |
| Customer#000015592 |         4020.00 |
| Customer#000098428 |         8442.00 |
| Customer#000069892 |         1206.00 |
| Customer#000084082 |         2814.00 |
| Customer#000071485 |         5427.00 |
| Customer#000019397 |         5628.00 |
| Customer#000013060 |         4824.00 |
+--------------------+-----------------+
17 rows in set (32.44 sec)

mysql> set profiling=0;
Query OK, 0 rows affected (0.00 sec)
mysql> show profilesG
*************************** 1. row ***************************
Query_ID: 1
Duration: 32.35964100
   Query: select c_name, sum(l_quantity)
     from customer, orders, lineitem
     where c_custkey = o_custkey and
     l_orderkey = o_orderkey and
     l_shipdate = '1992-01-02'
     group by c_name
1 row in set (0.00 sec)
mysql> select min(seq) seq,state,count(*) numb_ops,
    ->           round(sum(duration),5) sum_dur, round(avg(duration),5) avg_dur,
    ->           round(sum(cpu_user),5) sum_cpu, round(avg(cpu_user),5) avg_cpu
    ->           from information_schema.profiling
    ->           where query_id = 1
    ->           group by state
    ->           order by seq;
+------+----------------------+----------+----------+----------+---------+---------+
| seq  | state                | numb_ops | sum_dur  | avg_dur  | sum_cpu | avg_cpu |
+------+----------------------+----------+----------+----------+---------+---------+
|    1 | starting             |        1 |  0.00018 |  0.00018 | 0.00000 | 0.00000 |
|    2 | query end            |        5 |  0.00002 |  0.00000 | 0.00000 | 0.00000 |
|    3 | freeing items        |        5 |  0.00031 |  0.00006 | 0.00000 | 0.00000 |
|    4 | checking permissions |        3 |  0.00001 |  0.00000 | 0.00000 | 0.00000 |
|    5 | Opening tables       |        3 |  0.00015 |  0.00005 | 0.00000 | 0.00000 |
|    6 | System lock          |        4 |  0.00002 |  0.00001 | 0.00000 | 0.00000 |
|    7 | Table lock           |        4 |  0.00003 |  0.00001 | 0.00000 | 0.00000 |
|    8 | init                 |        3 |  0.00008 |  0.00003 | 0.00000 | 0.00000 |
|    9 | creating table       |        2 |  0.00398 |  0.00199 | 0.00000 | 0.00000 |
|   10 | After create         |        2 |  0.00011 |  0.00005 | 0.00000 | 0.00000 |
|   13 | optimizing           |        2 |  0.00002 |  0.00001 | 0.00000 | 0.00000 |
|   14 | statistics           |        2 |  0.00002 |  0.00001 | 0.00000 | 0.00000 |
|   15 | preparing            |        2 |  0.00003 |  0.00001 | 0.00000 | 0.00000 |
|   16 | Creating tmp table   |        1 |  0.00003 |  0.00003 | 0.00000 | 0.00000 |
|   17 | executing            |        2 |  0.02549 |  0.01274 | 0.00200 | 0.00100 |
|   18 | end                  |        4 |  0.00004 |  0.00001 | 0.00000 | 0.00000 |
|   19 | removing tmp table   |        1 |  0.00001 |  0.00001 | 0.00000 | 0.00000 |
|   29 | setup                |        1 |  0.00001 |  0.00001 | 0.00000 | 0.00000 |
|   32 | copy to tmp table    |        1 |  0.00003 |  0.00003 | 0.00000 | 0.00000 |
|   33 | rename result table  |        1 |  0.00014 |  0.00014 | 0.00000 | 0.00000 |
|   45 | Sending data         |        1 | 32.32892 | 32.32892 | 0.00000 | 0.00000 |
|   51 | cleaning up          |        1 |  0.00002 |  0.00002 | 0.00000 | 0.00000 |
+------+----------------------+----------+----------+----------+---------+---------+
22 rows in set (0.02 sec)

Not bad info at all, but some things weren’t picked up from InfiniDB activities, and there are still other areas worth checking out.  Now, let’s use InfiniDB’s two primary SQL utilities, one of which is a SQL diagnostic function call and one that’s a trace.

The calgetstats() call gives you an overall read of various SQL statement performance stats. Let’s reset the cache, reissue the same SQL statement, and then call calgetstats() to see what we can see:

mysql> select calflushcache();

| calflushcache() |
+-----------------+
|               0 |
+-----------------+
1 row in set (0.20 sec)
mysql> select c_name, sum(l_quantity)
    ->      from customer, orders, lineitem
    ->      where c_custkey = o_custkey and
    ->      l_orderkey = o_orderkey and
    ->      l_shipdate = '1992-01-02'
    ->      group by c_name;
.
.
17 rows in set (32.44 sec)
mysql> select calgetstats()G
*************************** 1. row ***************************
calgetstats(): Query Stats: MaxMemPct-0; NumTempFiles-0;
TempFileSpace-0MB; ApproxPhyI/O-1184667; CacheI/O-1179406;
BlocksTouched-1187377; PartitionBlocksEliminated-0; MsgBytesIn-17MB; MsgBytesOut-0MB;
Mode-Distributed| 1257276196 222563
1 row in set (0.00 sec)

The diagnostic fields are:

  • MaxMemPct;  This field shows memory utilization for the User Module (UM) in support of any UM join, group by, aggregation, distinct, or other operation.
  • NumTempFiles; This field shows any temporary file utilization for the User Module (UM) in support of any UM join, group by, aggregation, distinct, or other operation.
  • TempFileSpace; This field shows the size of any temporary file utilization for the User Module (UM) in support of any UM join, group by, aggregation, distinct, or other operation.
  • ApproxPhyI/O; This field shows any reads from storage for the query. Under some circumstances it may vary slightly from actual, typically less than 1/10th of 1 percent.
  • CacheI/O; This field shows block touches required for the query, reduced by the number of discrete Physical I/O reads requested.
  • BlocksTouched; This field shows block touches required for the query.
  • PartitionBlocksEliminated; This field shows blocks that were avoided based on the partition elimination that took place based on the Extent Map min/max values. Note that this does not report I/O reduction from column storage or I/O reduction from deferring column reads until after filters have been applied.
  • MsgBytesIn; A measure of process for data movement in the server.
  • MsgBytesOut; A measure of process for data outside the server.
  • Mode-Distributed; An indicator whether the query join processing was handled within InfiniDB, or by traditional MySQL join functionality.  This should be Distributed for best performance against big data.

Now, let’s look at InfiniDB’s SQL tracing utility.  You basically set the trace, execute the query, and examine the trace output. I’ll flush the cache again to ensure we have some physical I/O to report:

mysql> select calflushcache();                                                                              

| calflushcache() |
+-----------------+
|               0 |
+-----------------+
1 row in set (0.20 sec)

mysql> select calsettrace(1);
+----------------+
| calsettrace(1) |
+----------------+
|              0 |
+----------------+
1 row in set (0.00 sec)
mysql> select c_name, sum(l_quantity)
    ->      from customer, orders, lineitem
    ->      where c_custkey = o_custkey and
    ->      l_orderkey = o_orderkey and
    ->      l_shipdate = '1992-01-02'
    ->      group by c_name;
.
.
17 rows in set (32.44 sec)

mysql> select calgettrace()G
*************************** 1. row ***************************
calgettrace():
Desc Mode Table    TableOID ReferencedOIDs   PIO     LIO     PBE Elapsed Rows
BPS  PM   customer 3099     (3100,3108)      593     603     0   0.447   150000
BPS  PM   lineitem 3113     (3114,3124,3118) 1182600 1177341 0   31.926  3417
BPS  PM   orders   3140     (3142,3141)      1476    1462    0   0.320   17
HJS  PM   orders   3140     -                -       -       -   0.000   -
ADS  UM   -        -        -                -       -       -   0.062   -
1 row in set (0.01 sec)

The output from the calgettrace() shows the steps InfiniDB used to satisfy the query and includes the following stats:

  • Desc – Operation being executed (BPS=batch primitive step, HJS=hash join step, etc.)
  • Mode – Whether executed within the User Module or the Performance Module.
    Table – Table being used.
  • TableOID – ObjectID for the table being scanned.
  • ReferencedOIDs – ObjectIDs for the columns required by the query.
  • PIO – Physical I/O executed for the query.
  • LIO – Logical I/O executed for the query, also known as Blocks Touched.
  • PBE – Partition Blocks Eliminated identifies blocks eliminated by Extent Map min/max.
  • Elapsed – Elapsed time for a give step.
  • Rows – Intermediate rows returned

So both getcalstats and calgettrace offer you some more data when troubleshooting slow running SQL.  What else would you like to have?  Drop us a line at www.infinidb.org and let us know.