Wednesday, June 18, 2014

Easy Way to Collect QVD Stats

 

Collecting Stats Helps Debugging and Alerts You to Problems

Most organizations use QVD Generators to build out QVD data layers suitable for QlikView's consumption.  There is generally no need to create any visualizations in these QVD Generators, except perhaps some validation tables and some QVD statistics.  Today we will focus on an easy way to gather those statistics.  The statistics I generally gather help me answer questions about the output QVDs:
 
- Which QVDs were written with this application?
- How long did it take to create each QVD?
- How many rows were written?
- How many fields were in each table?
Obviously, being able to easily answer these questions helps us better understand and serve the BI enterprise.  It also provides a helpful tool for debugging logical errors in our QlikView scripts.
And because we write this statistics table to QVD as well, we can use it in later loads as needed to create a data-source lineage for a particular application.  Another benefit of the statistics QVD is that we can drop all of the tables that created the QVDs while still retaining the statistics, which leaves the application light and quick to save.
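As a quick sketch of that lineage idea, a later application could simply read the statistics QVD back in.  The file name below follows the naming convention set up later in this post; adjust it to your own paths:

//   Review lineage for this application's QVDs from a prior run
Lineage:
LOAD TableName,
     ReloadedDate,
     Duration,
     Records,
     Fields
FROM [C:\Users\Aaron\Documents\QlikView Apps\Blog Apps\Statistics Collection\ETL_Statistics_QAP_BAAN.qvd] (qvd);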
Here is an example of the front end to show you what kind of data will be collected and displayed.  Of course, the stats you collect and present are really up to you, the developer.

[Screenshot: example front-end table of the collected QVD statistics]

How to set up the collection

I use blocks of code that can be dropped into basically any script and adjusted as needed.  There are four chunks of code that get deposited into our script; a skeleton showing how they fit together follows this list.
1. Set up any needed global variables and set up the statistics table structure.
2. Set up a subroutine that creates the stats table.
3. Change the start time and table name each time we go to a new QVD.
4. Call the subroutine.
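Before the details, here is the rough shape of a script with all four chunks in place; everything sketched here is expanded in the sections below:

//   1.  Global variables and the statistics table structure
//       (vDir, vDivision, vSource, StatisticsPath, and an empty stats QVD)

//   2.  The Stats subroutine, defined once near the top
sub Stats
    //   ...append one row of statistics and re-store the stats QVD...
end sub

//   3.  At the start of each QVD section
LET vStartTime = NOW();
SET vTable = 'ETL_Sales_Header';
//   ...load the table and STORE it to QVD...

//   4.  After each STORE, while the table is still in memory
call Stats;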
 

Set up any needed global variables and set up the statistics table structure

First, create any variables that you will need in your statistics or in your front end.  For this example, we use variables for the working directory, the environment name, the source, the division, and the path of the statistics QVD.  These two chunks of script should be inserted toward the beginning of the script, before any loading occurs.
//dev
SET vDir = 'C:\Users\Aaron\Documents\QlikView Apps\Blog Apps\Statistics Collection';
SET vEnv = 'Dev';
//prod
//SET vDir = 'D:\QlikView Documents\Production\Sales';

//   Set variables for qvd naming
SET vDivision = 'QAP';
SET vSource   = 'BAAN';

LET StatisticsPath = '$(vDir)\ETL_Statistics_$(vDivision)_$(vSource).qvd';
 
We also look for an existing statistics QVD; if one is not found, we create an empty one.
IF FileSize('$(StatisticsPath)') > 0 THEN
    //  The Statistics QVD already exists; nothing to do.
ELSE
    //  The Statistics QVD needs to be created.
    //  The single blank inline row stores the full field structure to QVD.
    Statistics:
    LOAD * INLINE [
        TableName, Division, Source, ReloadedDate, StartTime, EndTime, Duration, Records, Fields
        , ]
    ;

    STORE Statistics INTO [$(StatisticsPath)] (qvd);
    DROP TABLE Statistics;
END IF
 
 

Set up a subroutine that creates the stats table

Next we create a subroutine that appends to the statistics each time we load new data into a QVD.  This script records the needed data from whatever load has just occurred, adds it to the existing statistics records, and then saves the whole thing back to QVD.  It gets inserted just below the setup code above and will be called at various points later in the script.
sub Stats
    //   Call this routine between the qvd STORE and DROP TABLE commands
    //   to gather statistics for the qvd.
    //   There must be a vStartTime and a vTable variable.
    LET vEndTime = NOW();
    LET vRecords = NoOfRows('$(vTable)');
    LET vFields  = NoOfFields('$(vTable)');

    //   Reload the existing statistics, dropping any prior row for this table
    Statistics:
    LOAD * FROM [$(StatisticsPath)] (qvd) WHERE TableName <> '$(vTable)';

    //   All nine fields are join keys; the WHERE above removed any row for
    //   this table, so the (outer) join simply appends the new row
    JOIN (Statistics)
    LOAD DISTINCT
        '$(vTable)'     AS TableName,
        '$(vDivision)'  AS Division,
        '$(vSource)'    AS Source,
        DATE(NOW())     AS ReloadedDate,
        '$(vStartTime)' AS StartTime,
        '$(vEndTime)'   AS EndTime,
        INTERVAL(NUM('$(vEndTime)') - NUM('$(vStartTime)'), 'hh:mm:ss') AS Duration,
        $(vRecords)     AS Records,
        $(vFields)      AS Fields
    RESIDENT Statistics;

    STORE Statistics INTO [$(StatisticsPath)] (qvd);
    DROP TABLE Statistics;
end sub
 
 

Change the start time and table name each time we go to a new QVD

Now we go about the business of loading data from our data sources and creating QVDs.  The basic idea is to set a couple of variables at the beginning of each load and then run the subroutine at the end of each load.  Using this approach, we can collect statistics for every QVD that gets created.  Start by adding this script to the top of each data-loading section.  Note that the vTable variable can also be used within your load statements to name your table, as shown in the sketch just after this snippet.
//   Append this script to the start of each qvd creation.
LET vStartTime = NOW();
SET vTable = 'ETL_Sales_Header'; 
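For instance, a data-loading section might use vTable as the table label itself, so the load and the Stats routine stay in sync.  The field names and source table here are hypothetical, just to show the pattern, and this assumes an ODBC/OLE DB connection is already open:

LET vStartTime = NOW();
SET vTable = 'ETL_Sales_Header';

//   vTable names the in-memory table that Stats will measure
[$(vTable)]:
LOAD OrderNumber,
     OrderDate,
     CustomerID;
SQL SELECT OrderNumber, OrderDate, CustomerID
FROM SalesHeader;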
 
 

Call the subroutine

Finally, we can call the statistics subroutine, which records our ending variables and appends a row to the statistics QVD.  This simple code runs the routine we created in the second step.
//   call subroutine to gather statistics for the qvd.
call Stats;
Place this code after each data QVD is stored.  One thing to note: wait to drop the data table until after this subroutine runs (Stats needs the table in memory to count its rows and fields), or simply drop all of the tables at the end of your script.
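Continuing the hypothetical ETL_Sales_Header example, the end of each QVD section would then read as follows (the output path convention is an assumption; use whatever your site follows):

STORE [$(vTable)] INTO [$(vDir)\$(vTable).qvd] (qvd);

//   Gather statistics while the table is still in memory
call Stats;

//   Safe to drop only after Stats has run
DROP TABLE [$(vTable)];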
 

Thoughts?

There are many ways to simplify or expand the data we are collecting.  Hopefully this structure is clear and economical enough for you to adjust to your needs.  The example I provided is simply what I use to complete this task, and I tweak the code depending on the situation.
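As one example of a tweak: if the outer-join append in the subroutine feels obscure, the same row can be appended with a CONCATENATE load over AUTOGENERATE instead.  This is just an alternative sketch, not part of the script above:

//   Alternative inside sub Stats: append the new row without a JOIN
Statistics:
LOAD * FROM [$(StatisticsPath)] (qvd) WHERE TableName <> '$(vTable)';

CONCATENATE (Statistics)
LOAD
    '$(vTable)'     AS TableName,
    '$(vDivision)'  AS Division,
    '$(vSource)'    AS Source,
    DATE(NOW())     AS ReloadedDate,
    '$(vStartTime)' AS StartTime,
    '$(vEndTime)'   AS EndTime,
    INTERVAL(NUM('$(vEndTime)') - NUM('$(vStartTime)'), 'hh:mm:ss') AS Duration,
    $(vRecords)     AS Records,
    $(vFields)      AS Fields
AUTOGENERATE 1;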
As always, I hope you find this valuable.  Please let me know what you think.  And if you accomplish statistics collection differently or can improve upon this, please participate in the discussion.

1 comment:

  1. BTW, Richard Pearce also has a great take on this concept over at QlikCentral http://qlikcentral.com/2014/03/25/audit-logging/
