Step-By-Step for Running DBD from CLI in Vertica 7.1

The steps and examples have been borrowed from Vertica Documentation!
I am just walking through learning and sharing my experience… No Rights claimed here.
In this DEMO I’ll create two tables (which can be joined and queried) with FK relationship and feed a sample query to DBD to suggest and deploy the required projections. Let us start.
Create a table to hold data with different data distribution, including cardinality. The example uses all integer data types.

dbadmin=> CREATE TABLE T( 
dbadmin(> x INT, 
dbadmin(> y INT, 
dbadmin(> z INT, 
dbadmin(> u INT, 
dbadmin(> v INT, 
dbadmin(> w INT PRIMARY KEY 
dbadmin(> ); 
CREATE TABLE 
dbadmin=> \dt 
 List of tables 
 Schema | Name | Kind | Owner | Comment 
--------+------+-------+---------+--------- 
 public | T | table | dbadmin | 
(1 row)
dbadmin=>

Now let us load about 100,000 rows. Note the cardinality and data distribution for the columns/data.

dbadmin=> \! perl -e 'for ($i=0; $i<100000; ++$i) {printf("%d, %d, %d, %d, %d, %d\n", $i/10000, $i/100, $i/10, $i/2, $i, $i);}' | vsql -c "COPY T FROM STDIN DELIMITER ',' DIRECT;"
Password:
dbadmin=> select count(1) from t;
count
--------
100000
(1 row)

dbadmin=>

This script “prints” 100000 data records to STDIN and pipes it to COPY command. This is executed from within vsql environment.

Now let us create a second table and load it.

dbadmin=> CREATE TABLE T2(
dbadmin(> x INT,
dbadmin(> y INT,
dbadmin(> z INT,
dbadmin(> u INT,
dbadmin(> v INT,
dbadmin(> w INT PRIMARY KEY
dbadmin(> );
CREATE TABLE
Time: First fetch (0 rows): 11.924 ms. All rows formatted: 11.967 ms
dbadmin=> INSERT /*+DIRECT*/ INTO T2 SELECT * FROM T;
OUTPUT
--------
100000
(1 row)

Time: First fetch (1 row): 77.560 ms. All rows formatted: 77.600 ms
dbadmin=> COMMIT;
COMMIT
Time: First fetch (0 rows): 5.196 ms. All rows formatted: 5.254 ms
dbadmin=> select count(1) from t2;
count
--------
100000
(1 row)

Time: First fetch (1 row): 9.055 ms. All rows formatted: 9.149 ms
dbadmin=>

Some of the queries we would be issuing on these tables are as follow.

SELECT DISTINCT T2.u FROM T JOIN T2 ON T.z=T2.z-1 WHERE T2.u > 0;
SELECT DISTINCT w FROM T;
SELECT count(distinct t.u) , t2.v from t join t2 on t.w=t2.w group by t2.v;

Let us run them with query optimizing projection and time them.
For the test, let us turn the timing on and pipe the output to /dev/null

dbadmin=>
dbadmin=> \timing
Timing is on.
dbadmin=> \o /dev/null
dbadmin=>

Let us run the query for timing.

dbadmin=>
dbadmin=> SELECT DISTINCT T2.u FROM T JOIN T2 ON T.z=T2.z-1 WHERE T2.u > 0;
Time: First fetch (1000 rows): 59.334 ms. All rows formatted: 94.765 ms
dbadmin=> SELECT DISTINCT w FROM T;
Time: First fetch (1000 rows): 6.149 ms. All rows formatted: 58.125 ms
dbadmin=> SELECT count(distinct t.u) , t2.v from t join t2 on t.w=t2.w group by t2.v;
Time: First fetch (1000 rows): 166.229 ms. All rows formatted: 277.410 ms
dbadmin=>

Note the timing for first 1000 rows (default limit) and estimated time for all rows.

Now we’ll run the queries through DBD and create suggested projects for optimization.
Create a named design.

SELECT DESIGNER_CREATE_DESIGN('my_design');

Add the tables to the design

SELECT DESIGNER_ADD_DESIGN_TABLES('my_design', 'public.t');
SELECT DESIGNER_ADD_DESIGN_TABLES('my_design', 'public.t2');

Now we’ll save our queries into text file, which we will provide as input to DBD. In the vi editor I pasted the queries into the file and saved.

dbadmin=> \! vi pwd
dbadmin=> \!pwd
/home/dbadmin
dbadmin=> \! vi dbd_input.txt
dbadmin=>

Add the query files to our design

dbadmin=> SELECT DESIGNER_ADD_DESIGN_QUERIES ('my_design',
dbadmin(> '/home/dbadmin/dbd_input.txt','true');
Time: First fetch (1 row): 543.436 ms. All rows formatted: 543.481 ms
dbadmin=>

DESIGNER_ADD_DESIGN_QUERIES populates DESIGN_QUERIES SYSTEM Table in v_monitor. Let us check the system table.

dbadmin=> \d v_monitor.design_queries;
List of Fields by Tables
Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
-----------+----------------+--------------------------+----------------+-------+---------+----------+-------------+-------------
v_monitor | design_queries | design_id | int | 8 | | f | f |
v_monitor | design_queries | design_name | varchar(128) | 128 | | f | f |
v_monitor | design_queries | design_query_id | int | 8 | | f | f |
v_monitor | design_queries | design_query_id_index | int | 8 | | f | f |
v_monitor | design_queries | design_query_search_path | varchar(65000) | 65000 | | f | f |
v_monitor | design_queries | design_query_signature | int | 8 | | f | f |
v_monitor | design_queries | query_text | varchar(65000) | 65000 | | f | f |
v_monitor | design_queries | weight | float | 8 | | f | f |
(8 rows)

dbadmin=>
Three queries we submitted to the designer are available in DESIGN_QUERIES.
dbadmin=> select design_id, design_name, query_text from v_monitor.design_queries;
design_id | design_name | query_text
-------------------+-------------+------------------------------------------------------------------------------
45035996273711966 | my_design | SELECT DISTINCT T2.u FROM T JOIN T2 ON T.z=T2.z-1 WHERE T2.u > 0;

45035996273711966 | my_design | SELECT DISTINCT w FROM T;

45035996273711966 | my_design | SELECT count(distinct t.u) , t2.v from t join t2 on t.w=t2.w group by t2.v;

(3 rows)

dbadmin=>

At this point we can set the design type and objective and invoke the designer.

dbadmin=> SELECT DESIGNER_SET_DESIGN_TYPE('my_design', 'comprehensive');
DESIGNER_SET_DESIGN_TYPE
--------------------------
0
(1 row)

dbadmin=> SELECT DESIGNER_SET_OPTIMIZATION_OBJECTIVE('my_design', 'query');
DESIGNER_SET_OPTIMIZATION_OBJECTIVE
-------------------------------------
0
(1 row)

dbadmin=>

Run DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY function to create projection scripts. Note that the projections are not deployed automatically.

dbadmin=> SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY
dbadmin-> ('my_design',
dbadmin(> '/home/dbadmin/my_design_projections.sql',
dbadmin(> '/home/dbadmin/my_design_deploy.sql',
dbadmin(> 'True',
dbadmin(> 'False',
dbadmin(> 'False',
dbadmin(> 'False'
dbadmin(> );
DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY
-----------------------------------------
0
(1 row)

dbadmin=> \q
dbadmin@sudhakar-Latitude-E6420:~$ pwd
/home/dbadmin
dbadmin@sudhakar-Latitude-E6420:~$ ll
total 96
drwx------ 7 dbadmin verticadba 4096 Nov 23 09:58 ./
drwxr-xr-x 5 root root 4096 Nov 16 14:06 ../
-rw------- 1 dbadmin verticadba 178 Nov 20 12:30 .bash_history
-rw-r--r-- 1 dbadmin verticadba 220 Mar 30 2013 .bash_logout
-rw-r--r-- 1 dbadmin verticadba 3696 Nov 16 14:06 .bashrc
drwx------ 2 dbadmin verticadba 4096 Nov 16 15:34 .cache/
drwxr-x--- 2 dbadmin verticadba 20480 Nov 17 15:18 DataCollector/
-rw-r--r-- 1 dbadmin verticadba 169 Nov 19 21:15 dbd_input.txt
-rwx------ 1 dbadmin verticadba 0 Nov 17 15:18 ErrorReport.txt*
-rw-r--r-- 1 dbadmin verticadba 8980 Oct 4 2013 examples.desktop
-rw-rw-rw- 1 dbadmin verticadba 1069 Nov 23 09:58 my_design_deploy.sql
-rw-rw-rw- 1 dbadmin verticadba 945 Nov 23 09:58 my_design_projections.sql
-rw-r--r-- 1 dbadmin verticadba 704 Nov 16 15:35 .profile
drwxr-xr-x 3 dbadmin verticadba 4096 Nov 16 16:16 .python-eggs/
drwxr-xr-x 2 dbadmin verticadba 4096 Nov 19 15:36 .rpmdb/
drwx------ 2 dbadmin verticadba 4096 Nov 16 15:51 .ssh/
-rw-r--r-- 1 dbadmin verticadba 0 Nov 17 15:18 startup.log
-rw------- 1 dbadmin verticadba 1192 Nov 19 21:24 .viminfo
-rw-r--r-- 1 dbadmin verticadba 5462 Nov 23 09:58 .vsql_history
dbadmin@sudhakar-Latitude-E6420:~$

This command adds information to the following system tables:
DEPLOYMENT_PROJECTION_STATEMENTS
DEPLOYMENT_PROJECTIONS
OUTPUT_DEPLOYMENT_STATUS

The function created two files that define suggested projections and script for deploying the projection. The projections file is a subset of deploy file.
Let us check the status of projections.

dbadmin=> \x
Expanded display is on.
dbadmin=> select * FROM OUTPUT_DEPLOYMENT_STATUS;
-[ RECORD 1 ]--------------+-----------------------
deployment_id | 45035996273727982
design_name | my_design
deployment_projection_id | 1
deployment_projection_name | T_DBD_1_rep_my_design
deployment_status | pending
error_message | N/A
-[ RECORD 2 ]--------------+-----------------------
deployment_id | 45035996273727982
design_name | my_design
deployment_projection_id | 2
deployment_projection_name | T2_DBD_2_rep_my_design
deployment_status | pending
error_message | N/A
-[ RECORD 3 ]--------------+-----------------------
deployment_id | 45035996273727982
design_name | my_design
deployment_projection_id | 3
deployment_projection_name | T2_DBD_3_rep_my_design
deployment_status | pending
error_message | N/A
-[ RECORD 4 ]--------------+-----------------------
deployment_id | 45035996273727982
design_name | my_design
deployment_projection_id | 4
deployment_projection_name | T_super
deployment_status | pending
error_message | N/A
-[ RECORD 5 ]--------------+-----------------------
deployment_id | 45035996273727982
design_name | my_design
deployment_projection_id | 5
deployment_projection_name | T2_super
deployment_status | pending
error_message | N/A

dbadmin=>

Now we’ll run the deploy script to deploy the projections.

dbadmin=> \i /home/dbadmin/my_design_deploy.sql
vsql:/home/dbadmin/my_design_deploy.sql:21: WARNING 4468: Projection <public.T_DBD_1_rep_my_design_node0001> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
CREATE PROJECTION
vsql:/home/dbadmin/my_design_deploy.sql:41: WARNING 4468: Projection <public.T2_DBD_2_rep_my_design_node0001> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
CREATE PROJECTION
vsql:/home/dbadmin/my_design_deploy.sql:54: WARNING 4468: Projection <public.T2_DBD_3_rep_my_design_node0001> is not available for query processing. Execute the select start_refresh() function to copy data into this projection.
The projection must have a sufficient number of buddy projections and all nodes must be up before starting a refresh
CREATE PROJECTION
-[ RECORD 1 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
refresh | Refresh completed with the following outcomes:
Projection Name: [Anchor Table] [Status] [Refresh Method] [Error Count] [Duration (sec)]
----------------------------------------------------------------------------------------
"public"."T_DBD_1_rep_my_design_node0001": [T] [refreshed] [scratch] [0] [0]
"public"."T2_DBD_3_rep_my_design_node0001": [T2] [refreshed] [scratch] [0] [0]
"public"."T2_DBD_2_rep_my_design_node0001": [T2] [refreshed] [scratch] [0] [0]
-[ RECORD 1 ]+-----------------------------
make_ahm_now | AHM set (New AHM Epoch: 196)

DROP PROJECTION
DROP PROJECTION
dbadmin=>

Now we can explain the queries to see which projections are being used.

QUERY PLAN|explain SELECT count(distinct t.u) , t2.v from t join t2 on t.w=t2.w group by t2.v;
QUERY PLAN|
QUERY PLAN|Access Path:
QUERY PLAN|+-GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 2K, Rows: 100K] (PATH ID: 1)
QUERY PLAN|| Aggregates: count(DISTINCT t.u)
QUERY PLAN|| Group By: t2.v
QUERY PLAN|| +---> GROUPBY HASH (LOCAL RESEGMENT GROUPS) [Cost: 1K, Rows: 100K] (PATH ID: 2)
QUERY PLAN|| | Group By: t2.v, t.u
QUERY PLAN|| | +---> JOIN MERGEJOIN(inputs presorted) [Cost: 1K, Rows: 100K] (PATH ID: 3)
QUERY PLAN|| | | Join Cond: (t.w = t2.w)
QUERY PLAN|| | | Materialize at Output: t.u
QUERY PLAN|| | | +-- Outer -> STORAGE ACCESS for t [Cost: 248, Rows: 100K] (PATH ID: 4)
QUERY PLAN|| | | | Projection: public.T_DBD_1_rep_my_design_node0001
QUERY PLAN|| | | | Materialize: t.w
QUERY PLAN|| | | | Runtime Filter: (SIP1(MergeJoin): t.w)
QUERY PLAN|| | | +-- Inner -> STORAGE ACCESS for t2 [Cost: 496, Rows: 100K] (PATH ID: 5)
QUERY PLAN|| | | | Projection: public.T2_DBD_2_rep_my_design_node0001
QUERY PLAN|| | | | Materialize: t2.w, t2.v

We can see the queries are using the new projections that were created.
HOWEVER… Please be VERY careful in adding new projections. First review, understand and TEST the impact of new projections. They can adversely affect performance of “other queries”.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s