Star Schema/Transformation Example

Posted at Friday, November 23, 2007
Here is a home-grown example of a star schema at work. I created a couple of dimension tables and a single fact, that represent baseball statistics. In my example, I have a couple of fictious players that have been batting in every game since the dawn of the last millenium. I then query this to see whether oracle decides to perform star transformations in order to enhance the execution plans.

In theory, the optimizer should consider the bitmap indexes I have created and transform the query in question if considered a more efficient mechanism for identifying the requested rows from the fact table.

Here is the DDL as well as the corresponding log from my test:

create_sample_star_schema.sql

create_sample_star_schema.log

As you can see, the design is fairly simple. I utilize B-TREE indexes for the primary keys and the alternate/unique keys. When the dimensions' primary keys migrate into the fact table (as foreign keys), however, I index them using a BITMAP strategy. I also bitmap all the remaining columns that would ostensibly be queried by users in real life.

Afterwards, I test running an analytical query with STAR_TRANSFORMATION_ENABLED set to 'TRUE' and 'FALSE' to see how having star transformation at the optimizer's disposal effects the results.

Here is the DDL as well as the corresponding log from my queries:

star_transform_perf_test.sql

star_transform_perf_test.log

If STAR_TRANSFORMATION_ENABLED is set to 'TRUE' then the optimizer does choose to transform the query: it returns my query in 680 milliseconds performing 4,087 consistent gets and 603 physical reads at an optimizer cost of 169.

If STAR_TRANSFORMATION_ENABLED is set to 'FALSE' then the optimizer doesn't choose to transform the query: it returns my query in 280 milliseconds performing 1,776 consistent gets and 51 physical reads at an optimizer cost of 102.

Granted, my example doesn't have a lot of data (and I didn't partition, since I could perform my desired tests that notwithstanding), but I find it strange that Oracle chooses a more resource-intensive plan if STAR_TRANSFORMATION_ENABLED is 'TRUE'. Perhaps a better strategy is to use the */+ STAR_TRANSFORMATION */ hint for those queries that have been tested and identified to work better with the transformation. As data volumes increase in this example, I am pretty sure that the star transformation would gain ground performance-wise; therefore, testing is key.

Without question, however, designs that use BITMAP indexes have a much smaller footprint than their B-TREE counterparts and herego are associated with shorter ETL times.

Just to prove this, I ran the exact same previous example, but instead I replaced all BITMAP indexes with BTREE indexes:

create_sample_star_schema_btree.sql

create_sample_star_schema_btree.log

You should notice just how much more space the BTREE indexes consume.

Now I run the aforementioned query and see what the results are:

star_transform_perf_test_btree.sql

star_transform_perf_test_btree.log

If we replace the BITMAP indexes in the previous example with BTREE indexes: it returns my query in 670 milliseconds performing 4,034 consistent gets and 276 physical reads at an optimizer cost of 38. Not much of a performance gain given that it takes almost twice the storage to house the BTREE indexes vs. the BITMAP indexes.

So, in conclusion, using bitmap indexes in a star schema design can save a lot of space and can meet or out-perform the same design with btree indexes. As far as the star transformations, I would test thoroughly in your environment. This can always be set via a hint and or at the session level. It is my understanding that the decision to use star transformation is very environment dependent.

According to Barnali Banerjee, star transformations are best used in the following circumstances, when:

"
• The query patterns are undefined so building and maintaining multiple composite keys would be too difficult, if not impossible.

• The number of joined dimensions is large (over 5).

• Cartesian products would be too costly due to the size of the individual dimension tables.

• The fact table is sparsely populated.

• The cardinality is low between fact foreign key columns and the total fact number of rows making it a good candidate for bitmap indexes.

• Space is an issue. Bitmap indexes do not store the column value in the index, which greatly reduces the space requirements.

• In complex queries where the conditions in the WHERE clause are based on non-foreign key fact table columns. These columns need to have bitmap indexes to benefit from the bitmap transformation.
"

I obtained the following from his presentation Tuning Data Warehouse Query Performance (A Case Study)

Labels: