Here I have a simple query that hash joins a small table of 19 MB with a larger table of ~150 GB, applying some filtering and a couple of aggregations.
SQL> explain plan for select /*+ monitoring*/ sum(extended_amt), count(extended_amt) from v_demo where sku_item_name like 'Pampers%' and (bsns_unit_key > 2012057 and bsns_unit_key < 2021070)

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 518373028

-----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                          | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                               |     1 |    42 | 69534   (1)| 00:11:36 |       |       |        |      |            |
|   1 |  SORT AGGREGATE                    |                               |     1 |    42 |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR                   |                               |       |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)             | :TQ10001                      |     1 |    42 |            |          |       |       |  Q1,01 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE                 |                               |     1 |    42 |            |          |       |       |  Q1,01 | PCWP |            |
|*  5 |      HASH JOIN                     |                               |  4668 |   191K| 69534   (1)| 00:11:36 |       |       |  Q1,01 | PCWP |            |
|   6 |       PX RECEIVE                   |                               |     1 |    26 |     6   (0)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   7 |        PX SEND BROADCAST           | :TQ10000                      |     1 |    26 |     6   (0)| 00:00:01 |       |       |  Q1,00 | P->P | BROADCAST  |
|   8 |         PX BLOCK ITERATOR          |                               |     1 |    26 |     6   (0)| 00:00:01 |       |       |  Q1,00 | PCWC |            |
|*  9 |          TABLE ACCESS STORAGE FULL | DWR_SKU_ITEM                  |     1 |    26 |     6   (0)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
|  10 |       PX BLOCK ITERATOR            |                               |   207M|  3172M| 69515   (1)| 00:11:36 |     1 |   128 |  Q1,01 | PCWC |            |
|* 11 |        TABLE ACCESS STORAGE FULL   | R_DWB_RTL_SLS_RETRN_LINE_ITEM |   207M|  3172M| 69515   (1)| 00:11:36 |     1 |   128 |  Q1,01 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("A"."SKU_ITEM_KEY"="B"."SKU_ITEM_KEY")
   9 - storage("SKU_ITEM_NAME" LIKE 'Pampers%')
       filter("SKU_ITEM_NAME" LIKE 'Pampers%')
  11 - storage("BSNS_UNIT_KEY"<2021070 AND "BSNS_UNIT_KEY">2012057 AND
              SYS_OP_BLOOM_FILTER(:BF0000,"A"."SKU_ITEM_KEY"))
       filter("BSNS_UNIT_KEY"<2021070 AND "BSNS_UNIT_KEY">2012057 AND
              SYS_OP_BLOOM_FILTER(:BF0000,"A"."SKU_ITEM_KEY"))
First I'll show an execution on Exadata storage software version 11.2.3.2.1.
SQL> @snap "select /*+ monitoring*/ sum(extended_amt), count(extended_amt) from v_demo where sku_item_name like 'Pampers%' and (bsns_unit_key > 2012057 and bsns_unit_key < 2021070)"

SUM(EXTENDED_AMT) COUNT(EXTENDED_AMT)
----------------- -------------------
         47905.43               31471

Elapsed: 00:00:06.27

NAME
---------------------------------------------------------------------------------------------------------------------------------------------------------
-- ExaSnapper v0.7 BETA by Tanel Poder @ Enkitec - The Exadata Experts ( http://www.enkitec.com )
---------------------------------------------------------------------------------------------------------------------------------------------------------
DB_LAYER_IO           DB_PHYSIO_BYTES            |##################################################|  145469 MB  20708 MB/sec
DB_LAYER_IO           DB_PHYSRD_BYTES            |##################################################|  145469 MB  20708 MB/sec
DB_LAYER_IO           DB_PHYSWR_BYTES            |                                                  |       0 MB      0 MB/sec
AVOID_DISK_IO         PHYRD_FLASH_RD_BYTES       |                                                  |       0 MB      0 MB/sec
AVOID_DISK_IO         PHYRD_STORIDX_SAVED_BYTES  |#############################################     |  129787 MB  18476 MB/sec
REAL_DISK_IO          SPIN_DISK_IO_BYTES         |#####                                             |   15682 MB   2232 MB/sec
REAL_DISK_IO          SPIN_DISK_RD_BYTES         |#####                                             |   15682 MB   2232 MB/sec
REAL_DISK_IO          SPIN_DISK_WR_BYTES         |                                                  |       0 MB      0 MB/sec
REDUCE_INTERCONNECT   PRED_OFFLOADABLE_BYTES     |##################################################|  145469 MB  20708 MB/sec
REDUCE_INTERCONNECT   TOTAL_IC_BYTES             |                                                  |     577 MB     82 MB/sec
REDUCE_INTERCONNECT   SMART_SCAN_RET_BYTES       |                                                  |     577 MB     82 MB/sec
REDUCE_INTERCONNECT   NON_SMART_SCAN_BYTES       |                                                  |       0 MB      0 MB/sec
CELL_PROC_DEPTH       CELL_PROC_DATA_BYTES       |#####                                             |   15685 MB   2233 MB/sec
CELL_PROC_DEPTH       CELL_PROC_INDEX_BYTES      |                                                  |       0 MB      0 MB/sec
CLIENT_COMMUNICATION  NET_TO_CLIENT_BYTES        |                                                  |       0 MB      0 MB/sec
CLIENT_COMMUNICATION  NET_FROM_CLIENT_BYTES      |                                                  |       0 MB      0 MB/sec
Here I'm using Tanel Poder's ExaSnapper to get a detailed view of where the I/O is done. In total the database needs to read ~145 GB. About 129 GB of that is saved through storage indexes, and ~15 GB is read from the traditional spinning disks in the Exadata. The last column with MB/sec is fairly accurate here for the HDDs: we are reading at 2232 MB/sec. A quarter rack can achieve higher disk throughput than this; the likely reason it doesn't here is that the storage index skipping makes the I/O less than fully sequential.
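If you don't have ExaSnapper at hand, the raw numbers it aggregates come from the ordinary Exadata session statistics. A minimal sketch, assuming you run it in the same session right after the query (V$MYSTAT values are cumulative for the session, so strictly you'd snapshot them before and after and take the delta):

-- Session-level smart scan / storage index statistics (Exadata only)
select sn.name, round(ms.value / 1024 / 1024) mb
from   v$mystat ms, v$statname sn
where  sn.statistic# = ms.statistic#
and    sn.name in ('physical read total bytes',
                   'cell physical IO bytes eligible for predicate offload',
                   'cell physical IO bytes saved by storage index',
                   'cell physical IO interconnect bytes returned by smart scan');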
Next I'm running the same query on storage software version 12.1.1.1.
SQL> @snap "select /*+ monitoring*/ sum(extended_amt), count(extended_amt) from v_demo where sku_item_name like 'Pampers%' and (bsns_unit_key > 2012057 and bsns_unit_key < 2021070)";

SUM(EXTENDED_AMT) COUNT(EXTENDED_AMT)
----------------- -------------------
         47905.43               31471

Elapsed: 00:00:01.69

NAME
---------------------------------------------------------------------------------------------------------------------------------------------------------
-- ExaSnapper v0.7 BETA by Tanel Poder @ Enkitec - The Exadata Experts ( http://www.enkitec.com )
---------------------------------------------------------------------------------------------------------------------------------------------------------
DB_LAYER_IO           DB_PHYSIO_BYTES            |##################################################|  145469 MB  67812 MB/sec
DB_LAYER_IO           DB_PHYSRD_BYTES            |##################################################|  145469 MB  67812 MB/sec
DB_LAYER_IO           DB_PHYSWR_BYTES            |                                                  |       0 MB      0 MB/sec
AVOID_DISK_IO         PHYRD_FLASH_RD_BYTES       |#####                                             |   15212 MB   7091 MB/sec
AVOID_DISK_IO         PHYRD_STORIDX_SAVED_BYTES  |#############################################     |  129790 MB  60503 MB/sec
REAL_DISK_IO          SPIN_DISK_IO_BYTES         |                                                  |     467 MB    218 MB/sec
REAL_DISK_IO          SPIN_DISK_RD_BYTES         |                                                  |     467 MB    218 MB/sec
REAL_DISK_IO          SPIN_DISK_WR_BYTES         |                                                  |       0 MB      0 MB/sec
REDUCE_INTERCONNECT   PRED_OFFLOADABLE_BYTES     |##################################################|  145469 MB  67812 MB/sec
REDUCE_INTERCONNECT   TOTAL_IC_BYTES             |#                                                 |    1742 MB    812 MB/sec
REDUCE_INTERCONNECT   SMART_SCAN_RET_BYTES       |#                                                 |    1742 MB    812 MB/sec
REDUCE_INTERCONNECT   NON_SMART_SCAN_BYTES       |                                                  |       0 MB      0 MB/sec
CELL_PROC_DEPTH       CELL_PROC_DATA_BYTES       |#####                                             |   14523 MB   6770 MB/sec
CELL_PROC_DEPTH       CELL_PROC_INDEX_BYTES      |                                                  |       0 MB      0 MB/sec
CLIENT_COMMUNICATION  NET_TO_CLIENT_BYTES        |                                                  |       0 MB      0 MB/sec
CLIENT_COMMUNICATION  NET_FROM_CLIENT_BYTES      |                                                  |       0 MB      0 MB/sec
Elapsed time dropped from 6.27 seconds to 1.69 seconds, and nearly all of the physical I/O is now served from flash, as we can see from PHYRD_FLASH_RD_BYTES: 15212 MB read from flash versus only 467 MB from spinning disk.
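Another way to confirm this without ExaSnapper is to look at the flash-related session statistics. A sketch, with the caveat that the availability of these statistic names varies a bit between database versions:

-- 'cell flash cache read hits' counts read requests served from flash;
-- 'physical read total bytes optimized' covers reads satisfied from
-- flash cache or avoided via storage index
select sn.name, ms.value
from   v$mystat ms, v$statname sn
where  sn.statistic# = ms.statistic#
and    sn.name in ('cell flash cache read hits',
                   'physical read total bytes optimized');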
Earlier I had to use ALTER TABLE x STORAGE (CELL_FLASH_CACHE KEEP) to get smart scans to run from flash; now it just works! I have no experience with this in a real production system yet, so if anyone has real-world experience with it, please leave a comment!
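For reference, the old workaround looked roughly like this (a sketch; I'm using the fact table from the plan above as the example, and the CELL_FLASH_CACHE column of DBA_TABLES should reflect the attribute on Exadata-aware versions):

-- Pin the table in Exadata Smart Flash Cache (pre-12.1.1.1 workaround)
alter table r_dwb_rtl_sls_retrn_line_item storage (cell_flash_cache keep);

-- Verify the segment-level flash cache setting
select table_name, cell_flash_cache
from   dba_tables
where  table_name = 'R_DWB_RTL_SLS_RETRN_LINE_ITEM';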