SQL Server Statistics Maintenance Job

Posted at Tuesday, May 20, 2008
I use the following job to augment SQL Server's auto statistics create/update functionality:

create_update_statistics.sql

sp_MSforeachdb @command1='
USE ?;
IF NOT ("?" IN ("model","msdb","master","tempdb"))
BEGIN
EXEC sp_createstats;
DECLARE @tbls TABLE (rowId INT IDENTITY(1,1), tblschema NVARCHAR(128), tblname NVARCHAR(128));
DECLARE @stmt NVARCHAR(2000), @rowId INT, @maxRowId INT, @tblschema NVARCHAR(128), @tblname NVARCHAR(128);
INSERT INTO @tbls (tblschema, tblname)
SELECT table_schema, table_name FROM INFORMATION_SCHEMA.TABLES WHERE table_type="BASE TABLE";
SELECT @rowId = MIN(rowId), @maxRowId = MAX(rowId) FROM @tbls;
WHILE @rowId <= @maxRowId
BEGIN
SELECT @tblschema = tblschema, @tblname = tblname FROM @tbls WHERE rowId = @rowId;
SET @stmt = "UPDATE STATISTICS " + "[" + @tblschema + "].[" + @tblname + "];";
PRINT "?: " + @stmt;
EXECUTE (@stmt);
SET @rowId = @rowId + 1;
END
END
';

You can adjust this to fit your needs and implement as part of your Maintenance Plans and/or run from a batch file, such as:

create_update_statistics.bat

sqlcmd -S PRIVATE-E3A52F5\SS1 -E -i create_update_statistics.sql -o create_update_statistics.out
notepad.exe create_update_statistics.out


Labels: ,

Custom Statistics Gathering Package for 9i, 10g

Posted at Tuesday, November 20, 2007
In a previous post, I mentioned that 11g adds some great functionality which
makes using Oracle's auto statistics gathering job a much more attractive proposition. Specifically, you can set statistics-related parameters at the table-level as well as at the global level.

However, most shops are still on some combination of 9i and 10g.

For the aforementioned versions, I take the following general approach to statistics:

1.) Disable Oracle's default auto stats DBMS_SCHEDULER job, 'GATHER_STATS_JOB'

2.) Enable Monitoring on all tables (in 10g this is automatically done when
STATISTICS_LEVEL is 'TYPICAL' or greater)

3.) Install (and customize as appropriate) a PL/SQL package which is responsible for the periodic gathering of statistics for the instance. I have included the base package below. Depending upon the environment, I will tweak things such as DEGREE and ESTIMATE_PERCENT. Additionally, I may freeze the stats on staging tables and, when warranted, gather additional histograms for certain tables.

4.) Schedule a daily call to dba_stats_maint_pkg.gather_stats, ideally after any batch programs and/or ETL routines; in other words, after data changes. This procedure updates empty statistics and any segments that have changed more than a certain percentage (by default the package is coded to analyze once 5% of the data has changed, although this can be customized).

Description of the base DBA_STATS_MAINT_PKG PL/SQL package is below:


PROGRAM UNITDESCRIPTION
delete_all_statsDeletes all statistics in the database, including fixed objects
disable_auto_stats_jobDisables Oracle's auto stats gathering job
disable_auto_space_jobDisables Oracle's auto segment space advisor job
gather_data_dict_statsGathers stats for the data dictionary. Applies to system users such as SYS, SYSTEM, DBSNMP, and OUTLN
gather_fixed_statsGathers fixed stats. This should be done after upgrades and when the profile of the system changes substantially
gather_statsGathers optimizer stats for non-data dictionary objects. First gathers stats for empty segments and then gathers stats for segments that have had a lot of DML (by default 5%). This is the procedure you will want to modify for your specific environment
start_overA test procedure that deletes all statistics and then gathers all statistics. A good procedure to test your customization end-to-end


Source for my package can be found here: dba_stats_maint_pkg

Or, if you prefer, you can cut and paste the SQL script below:

GRANT CREATE SESSION TO &&user;
GRANT ANALYZE ANY TO &&user;
GRANT ANALYZE ANY DICTIONARY TO &&user;
GRANT SELECT ANY TABLE TO &&user;
GRANT SELECT ANY SEQUENCE TO &&user;
GRANT SELECT ANY DICTIONARY TO &&user;
GRANT EXECUTE ON DBMS_STATS TO &&user;
GRANT SELECT ON DBA_TAB_MODIFICATIONS TO &&user;
GRANT CREATE JOB TO &&user;
GRANT SCHEDULER_ADMIN TO &&user;
GRANT ALL ON SYS.GATHER_STATS_JOB TO &&user;
GRANT ALL ON SYS.AUTO_SPACE_ADVISOR_JOB TO &&user;

CREATE OR REPLACE PACKAGE &&user..dba_stats_maint_pkg
/* Package is intended for use when managing
the optimizer statistics for 9i and 10g
databases. Once on 11g, each tables' statistics
setting can be customized and used by Oracle's automatic
job

Assumptions:

For 10g, STATISTICS_LEVEL is set to a level
above basic, so that DBA_TAB_MODIFICATIONS
is populated (i.e. tables are monitored)

For 9i, relevant tables have their MONITORING
attribute set, so that DBA_TAB_MODIFICATIONS
view is populated

This package is either compiled as a privileged user
and/or it is compiled under someone else and
all applicable grants on system packages and
dynamic performance views are made

Has a dependency to dbms_lock.sleep procedure - this is assumed to be present

Explicit ANALYZE ANY grant has been made to DB_UTIL user
*/
AS
--Globals change factor setting
--This defines what % a segment must change by in order to update its statistics
gn_change_factor number(16,2) := 5.00;

--Removes all Statistic for the Database
PROCEDURE delete_all_stats;

--Procedure that disables the automatic gathering of stats
--since, until 10g, this 'canned' procedure is not adequate
--to gather statistics responsibly
PROCEDURE disable_auto_stats_job;

--Procedure that disables the automatic segment space advisor job
--The running of this job can use up a lot of resources, hence
--why many disable it
PROCEDURE disable_auto_space_job;

--Procedure gathers stats for SYSTEM and SYS schemas
--Basically a wrapped calls to the applicable DBMS_STATS procedure
PROCEDURE gather_data_dict_stats;

--Procedure gathers fixed object stats
--Whereas, data dictionary stats should be gathers constantly
--fixed object stats only need to be gathered after upgrades, or
--if the profile of the database's usage changes significantly

PROCEDURE gather_fixed_stats;

--Procedure that gathers non-fixed, non-locked table, index, partition, and subpartition statistics
--based upon whether there have been significant changes (i.e. DBA_TAB_MODIFICATIONS)
--and or whether stats are empty
--Package level defaults can be modified for your
PROCEDURE gather_stats;

--Procedure makes internal calls to first delete all stats
--and then gather them, including a gathering of system statistics
PROCEDURE start_over;
END dba_stats_maint_pkg;
/

SHOW ERRORS;



CREATE OR REPLACE PACKAGE BODY &&user..dba_stats_maint_pkg
/* Package is intended for use when managing
the optimizer statistics for 9i and 10g
databases. Once on 11g, each tables' statistics
setting can be customized and used by Oracle's automatic
job

Assumptions:

For 10g, STATISTICS_LEVEL is set to a level
above basic, so that DBA_TAB_MODIFICATIONS
is populated (i.e. tables are monitored)

For 9i, relevant tables have their MONITORING
attribute set, so that DBA_TAB_MODIFICATIONS
view is populated

This package is either compiled as a privileged user
and/or it is compiled under someone else and
all applicable grants on system packages and
dynamic performance views are made

*/
AS
--Removes all Statistic for the Database
PROCEDURE delete_all_stats
AS
BEGIN
DBMS_STATS.delete_dictionary_stats;
DBMS_STATS.delete_database_stats;
EXCEPTION
WHEN OTHERS
THEN
RAISE;
END;

--Procedure that disables the automatic gathering of stats
--since, until 10g, this 'canned' procedure is not adequate
--to gather statistics responsibly

PROCEDURE disable_auto_stats_job
AS
BEGIN
dbms_scheduler.disable('SYS.GATHER_STATS_JOB');
END;

--Procedure that disables the automatic segment space advisor job
--The running of this job can use up a lot of resources, hence
--why many disable it

PROCEDURE disable_auto_space_job
AS
BEGIN
dbms_scheduler.disable('SYS.AUTO_SPACE_ADVISOR_JOB');
END;

PROCEDURE gather_data_dict_stats
AS
BEGIN
DBMS_STATS.gather_dictionary_stats;
EXCEPTION
WHEN OTHERS
THEN
RAISE;
END;

PROCEDURE gather_fixed_stats
AS
BEGIN
DBMS_STATS.gather_fixed_objects_stats;
EXCEPTION
WHEN OTHERS
THEN
RAISE;
END;

PROCEDURE gather_stats
AS
/* Cursor with the names of all the non-partitioned, non-system, non-locked, non-iot overflow type segments
which don't currently have statistics */
lv_owner dba_tables.owner%TYPE;
lv_table_name dba_tables.table_name%TYPE;
lv_part_name dba_tab_partitions.partition_name%TYPE;
lv_change_factor NUMBER (16, 2);
lv_partitioned VARCHAR (10);

CURSOR lc_ns_np_nostats
IS
SELECT owner, table_name
FROM dba_tables
WHERE partitioned = 'NO'
AND NOT owner IN ('SYS','SYSTEM','OUTLN')
AND num_rows IS NULL
AND iot_type IS NULL
AND NOT (owner, table_name) IN (SELECT owner, table_name
FROM dba_tab_statistics
WHERE stattype_locked = 'ALL')
AND NOT (owner, table_name) IN (SELECT owner, table_name
FROM dba_external_tables);

CURSOR lc_ns_p_nostats
IS
SELECT dtp.table_owner, dtp.table_name, dtp.partition_name
FROM dba_tab_partitions dtp, dba_tables dt
WHERE dtp.num_rows IS NULL
AND dtp.table_name = dt.table_name
AND dtp.table_owner = dt.owner
AND dt.iot_type IS NULL
AND NOT dtp.table_owner IN ('SYS','SYSTEM','OUTLN')
AND NOT (dtp.table_owner, dtp.table_name) IN (
SELECT owner, table_name
FROM dba_tab_statistics
WHERE stattype_locked =
'ALL')
AND NOT (dtp.table_owner, dtp.table_name) IN (SELECT owner, table_name
FROM dba_external_tables);

CURSOR lc_ss
IS
SELECT dtm.table_owner, dtm.table_name, dtm.partition_name,
ROUND ( (dtm.inserts + dtm.updates + dtm.deletes)
/ dt.num_rows,
2
)
* 100 "CHANGE_FACTOR",
dt.partitioned
FROM sys.dba_tab_modifications dtm, dba_tables dt
WHERE dtm.table_owner = dt.owner
AND dtm.table_name = dt.table_name
AND NOT dtm.table_owner IN ('SYS','SYSTEM','OUTLN')
AND NOT dt.num_rows IS NULL
AND iot_type IS NULL
AND ( (dt.partitioned = 'YES' AND NOT dtm.partition_name IS NULL
)
OR (dt.partitioned = 'NO' AND dtm.partition_name IS NULL)
)
AND NOT (dtm.table_owner, dtm.table_name) IN (
SELECT dts.owner, dts.table_name
FROM dba_tab_statistics dts
WHERE dts.stattype_locked =
'ALL')
AND NOT (dtm.table_owner, dtm.table_name) IN (
SELECT det.owner,
det.table_name
FROM dba_external_tables det);
BEGIN
/**************************BEGIN EMPTYSTATS************************************/

-- First process the cursor above; alternatively, you could
-- do something similar via a call to DBMS_STATS with the 'GATHER EMPTY' option

OPEN lc_ns_np_nostats; -- open the cursor before fetching

LOOP
FETCH lc_ns_np_nostats
INTO lv_owner, lv_table_name; -- fetches 2 columns into variables

EXIT WHEN lc_ns_np_nostats%NOTFOUND;
-- Call stats package
DBMS_OUTPUT.put_line ( 'Gathering Stats for Table and Indexes of '
|| lv_owner
|| '.'
|| lv_table_name
|| ' because they are empty...'
);
-- For these tables we will use most of the oracle defaults
-- We are assuming that partitioned tables are to be treated differently in terms
-- of sample size, parallelism degree and the like
--trap any errors by placing in its own block
BEGIN
DBMS_STATS.gather_table_stats
(ownname => lv_owner,
tabname => lv_table_name,
granularity => 'AUTO',
method_opt => 'for all columns',
CASCADE => DBMS_STATS.auto_cascade,
estimate_percent => DBMS_STATS.auto_sample_size
);
EXCEPTION
WHEN OTHERS THEN
DBMS_OUTPUT.put_line (SQLCODE || ' - ' || SQLERRM);
END;
END LOOP;

CLOSE lc_ns_np_nostats;

-- Next we process table partitions whose statistics are null/empty
-- We will use smaller samples and parallelism when appropriate

OPEN lc_ns_p_nostats; -- open the cursor before fetching

LOOP
FETCH lc_ns_p_nostats
INTO lv_owner, lv_table_name, lv_part_name;

-- fetches 3 columns into variables
EXIT WHEN lc_ns_p_nostats%NOTFOUND;
-- Call stats package
DBMS_OUTPUT.put_line ( 'Gathering Stats for Partition '
|| lv_owner
|| '.'
|| lv_table_name
|| '.'
|| lv_part_name
|| ' because they are empty...'
);

-- For these partitioned tables we will use smaller sample sizes
-- Trap Errors in anonymous block

BEGIN
DBMS_STATS.gather_table_stats (ownname => lv_owner,
tabname => lv_table_name,
partname => lv_part_name,
granularity => 'AUTO',
method_opt => 'for all columns',
CASCADE => DBMS_STATS.auto_cascade,
DEGREE => 1,
estimate_percent => .00001
);
EXCEPTION
WHEN OTHERS THEN
DBMS_OUTPUT.put_line (SQLCODE || ' - ' || SQLERRM);
END;

END LOOP;

CLOSE lc_ns_p_nostats;

--Execute Global Database Stats Gathering with GATHER EMPTY option
--in order to catch any objects that were missed above because of errors, etc.
DBMS_OUTPUT.put_line ('Executing Global Stats Procedure to Catch Any Missed Empty Objects...');

BEGIN
dbms_stats.gather_database_stats(method_opt=>'for all columns',
granularity=>'AUTO',
estimate_percent=>DBMS_STATS.auto_sample_size,
degree=>DBMS_STATS.default_degree,
cascade=>DBMS_STATS.auto_cascade,
options=>'GATHER EMPTY');
EXCEPTION
WHEN OTHERS THEN
DBMS_OUTPUT.put_line (SQLCODE || ' - ' || SQLERRM);
END;
/**************************END EMPTYSTATS************************************/

-- Lastly we flush the database monitoring information
-- and then use the information therein to gather stats
-- on significantly changed objects

/**************************BEGIN CHANGED OBJECTS STATS***********************/

DBMS_STATS.flush_database_monitoring_info ();

OPEN lc_ss; -- open the cursor before fetching

LOOP
FETCH lc_ss
INTO lv_owner, lv_table_name, lv_part_name, lv_change_factor,
lv_partitioned;

EXIT WHEN lc_ss%NOTFOUND;

-- If change factor is greater than change factor defined in the package header then we will analyze the changed object in question
IF lv_change_factor >= gn_change_factor
THEN
-- Now determine whether the Object is a partition
IF lv_partitioned = 'YES'
THEN
DBMS_OUTPUT.put_line
( 'Gathering Stats for Partition '
|| lv_owner
|| '.'
|| lv_table_name
|| '.'
|| lv_part_name
|| ' because its change factor exceeds '||gn_change_factor||'%...'
);
BEGIN
DBMS_STATS.gather_table_stats
(ownname => lv_owner,
tabname => lv_table_name,
partname => lv_part_name,
granularity => 'AUTO',
method_opt => 'for all columns',
CASCADE => DBMS_STATS.auto_cascade,
DEGREE => 1,
estimate_percent => .00001
);
EXCEPTION
WHEN OTHERS THEN
DBMS_OUTPUT.put_line (SQLCODE || ' - ' || SQLERRM);
END;
ELSE
DBMS_OUTPUT.put_line
( 'Gathering Stats for Table and Indexes of '
|| lv_owner
|| '.'
|| lv_table_name
|| ' because its change factor exceeds '||gn_change_factor||'%...'
);
BEGIN
DBMS_STATS.gather_table_stats
(ownname => lv_owner,
tabname => lv_table_name,
granularity => 'AUTO',
method_opt => 'for all columns',
CASCADE => DBMS_STATS.auto_cascade,
estimate_percent => DBMS_STATS.auto_sample_size
);
EXCEPTION
WHEN OTHERS THEN
DBMS_OUTPUT.put_line (SQLCODE || ' - ' || SQLERRM);
END;
END IF;
ELSE
NULL;
END IF;
END LOOP;

CLOSE lc_ss;

/**************************END CHANGED OBJECTS STATS**************************/
END;

--Procedure makes internal calls to first delete all stats
--and then gather them all over again

PROCEDURE start_over
AS
BEGIN
DBMS_OUTPUT.put_line ('Deleting all stats...');
delete_all_stats;
DBMS_OUTPUT.put_line ('Gathering fixed stats...');
gather_fixed_stats;
DBMS_OUTPUT.put_line ('Gathering user stats...');
gather_stats;
DBMS_OUTPUT.put_line ('Gathering dictionary stats...');
gather_data_dict_stats;
EXCEPTION
WHEN OTHERS
THEN
RAISE;
END;
END dba_stats_maint_pkg;
/

SHOW ERRORS;


Labels: ,

11g New Feature: Statistic Gathering and Publishing Decoupled/Statistics Maintenance Changes

Posted at Wednesday, October 31, 2007
In addition to the extended statistics functionality introduced by 11g, you can now gather statistics without having to publish them right away.

This allows one to gather the statistics and test the efficacy thereof before effecting a live production environment; in other words, existing explain plans will not be invalidated/regenerated.

Reference this 'Oracle by Example' article for a great demonstration: Gathering and Publishing Statistics Independently.

Additionally, Oracle has made some significant changes which makes using the out-of-the-box statistics collection program much more feasible. Namely, DBMS_STATS, has been changed so that you can set statistics collection preferences - think arguments passed to DBMS_STATS.GATHER_*_STATS procedures - at both the global and table levels. In turn, the automatic statistics gathering job will respect these preferences when it runs (during the applicable maintenance window).

In the past, I have always disabled the aforementioned 'AutoTask' job - for this very lack of granular control - in favor of a custom statistics gathering job. This was particularly necessary in Data Warehouse environments due to heavy use of partitioning as well as concomitant data volumes. In such environments, sampling only the partitions that have changed is critical to the conservation of system resources.

With the abovementioned scenario in mind, if you want to employ the out-of-the-box 'AutoTask' job then simply set table preferences (as appropriate) using DBMS_STATS. For instance, in order to gather large partitioned tables' statistics properly, you may wish to alter the following: INCREMENTAL, DEGREE, METHOD_OPT, STALE_PERCENT, and ESTIMATE_PERCENT.

For each, verify the current value for the table:

SQL> select dbms_stats.get_prefs('INCREMENTAL','SH','COSTS') from dual;

DBMS_STATS.GET_PREFS('INCREMENTAL','SH','COSTS')
--------------------------------------------------------------------------------
FALSE

The current setting for the table looks like the system default.

Alter the value, just for the table in question:

SQL> exec dbms_stats.set_table_prefs('SH','COSTS','INCREMENTAL' ,'TRUE') ;

PL/SQL procedure successfully completed.

SQL> select dbms_stats.get_prefs('INCREMENTAL','SH','COSTS') from dual;

DBMS_STATS.GET_PREFS('INCREMENTAL','SH','COSTS')
--------------------------------------------------------------------------------
TRUE

Verify that the global setting and table-specific setting co-exist and are, in fact, different:

SQL> select dbms_stats.get_prefs('INCREMENTAL') from dual;

DBMS_STATS.GET_PREFS('INCREMENTAL')
--------------------------------------------------------------------------------
FALSE

Here is another example, where we change METHOD_OPT for a table, analyze it using the DBMS_STATS package and then verify that the number of buckets we specify are created. It is important to note that the DBMS_STATS.SET_TABLE_PREFS doesn't seem to support the full syntax that METHOD_OPT allows:

Set the preference:

SQL> exec dbms_stats.set_table_prefs(ownname=>'SH',tabname=>'SALES',pname=>'METHOD_OPT',pvalue=>'FOR ALL COLUMNS SIZE 10');


Gather stats - this should pick up the tables preference:

SQL> exec dbms_stats.gather_table_stats(ownname=>'SH',tabname=>'SALES');
PL/SQL procedure successfully completed.


Verify that the number of histograms are created per the specificed METHOD_OPT:

SQL> select table_name, column_name, count(*) from user_histograms
2 where table_name='SALES'
3 group by table_name, column_name;

TABLE_NAME COLUMN_NAME COUNT(*)
------------------------------ ------------------------------ ----------
SALES PROD_ID 11
SALES CHANNEL_ID 4
SALES AMOUNT_SOLD 11
SALES CUST_ID 11
SALES PROMO_ID 4
SALES QUANTITY_SOLD 1
SALES TIME_ID 11

11g, also introduces the ability to easily revert to previous statistics' settings. They have extended the DBMS_STATS package as well as provided new *_STATS_HISTORY views.

Building on our prior example, let's say I analyze the aforementioned table again (with different options), but then I want to revert to the previous statistics above:

Analyze the table:

SQL> exec dbms_stats.gather_table_stats(ownname=>'SH',tabname=>'SALES', method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY');

PL/SQL procedure successfully completed.

Verify that the number of buckets, are in fact, different:

SQL> select table_name, column_name, count(*) from user_histograms
2 where table_name='SALES'
3 group by table_name, column_name;

TABLE_NAME COLUMN_NAME COUNT(*)
------------------------------ -------------------- ----------
SALES PROD_ID 72
SALES CHANNEL_ID 4
SALES AMOUNT_SOLD 250
SALES CUST_ID 255
SALES PROMO_ID 4
SALES QUANTITY_SOLD 1
SALES TIME_ID 2

Query the timestamp for the prior collected stats:

SQL>  select count(*), stats_update_time
2 from user_tab_stats_history
3 where table_name='SALES'
4 group by stats_update_time;

COUNT(*) STATS_UPDATE_TIME
---------- ----------------------------------------
29 25-JAN-08 11.18.09.693351 AM -07:00
29 25-JAN-08 11.18.57.432585 AM -07:00

Restore the old stats for the table:

SQL> exec dbms_stats.restore_table_stats(ownname=>'SH',tabname=>'SALES',AS_OF_TIMESTAMP=>'25-JAN-08 11.18.57.432585 AM -07:00');

PL/SQL procedure successfully completed.

Validate that the old stats have been put back:

SQL> select table_name, column_name, count(*) from user_histograms
2 where table_name='SALES'
3 group by table_name, column_name;

TABLE_NAME COLUMN_NAME COUNT(*)
------------------------------ -------------------- ----------
SALES PROD_ID 11
SALES CHANNEL_ID 4
SALES AMOUNT_SOLD 11
SALES CUST_ID 11
SALES PROMO_ID 4
SALES QUANTITY_SOLD 1
SALES TIME_ID 11

Otherwise, Oracle has not changed a lot in terms of Statistics Maintenance. The same instances in which you have had to gather statistics manually still apply: System Statistics (those found in aux_stats$), Fixed Statistics, Volatile Tables, Tables post Direct-load, etc.

Reference the following Oracle docs:

Managing Optimizer Statistics

DBMS_STATS package reference

Labels: , ,

11g New Feature: Extended Optimizer Statistics

Posted at Tuesday, October 30, 2007
In 11g you can now correlate the relationship between 2 skewed columns. Here is a great 'Oracle by Example' article that exhibits how you can use extended statistics to aid the optimizer's explain plans: Using Extended Statistics to Optimize Multi-Column Relationships and Function-Based Statistics

Labels: , ,