Change Data Capture: Implementing Distributed Asynchronous Autolog Archive CDC

Posted at Saturday, November 17, 2007
I have talked to a lot of folks lately that are moving towards using Oracle's Change Data Capture functionality for their Data Warehouses' ETL processes.

Change Data Capture (CDC for short), introduced with 9i, is used for propagating changes - specifically, those that are incremental - to non-OLTP environments. In other words, this functionality is largely for the Data Warehouse/Business Intelligence Stakeholder.

CDC can be configured a lot of different ways: synchronous vs. asynchronous; distributed vs. non-distributed; ARCH vs. LGWR transport method - just to name a few.

I believe that the Distributed Asynchronous Autolog configuration holds the most promise namely because it has little performance impact on the source operational (OLTP) environment (an important factor in firms' ETL tool selection decisions). Additionally, the configuration changes necessary on the source database are reasonable and shouldn't have an impact on availability/SLA requirements.

I implemented this on a couple of Dell machines using 10.2.0.3 and Enterprise Linux (2.6.9-55). I followed the following 'cookbook' from Oracle: Asynchronous Autolog Change Data Capture Cookbook (September 2007)

I also referred to the following Oracle Data Warehousing Guide (Reference Chapter 16): Data Warehousing Guide 10g Release 2 (10.2) B14223-02

To make this easier for you to do the same, I have uploaded tar archives from each machine - both the SOURCE and STAGE targets - which include all the scripts and log files I used in my implementation. This should be most of what you will need to successfully create the databases/listeners and test the aforementioned Cookbook's test-case, except for manual steps such as editing /etc/oratab and running orapwd.

Tarball Archive for the STAGING Database (cdc_autolog_arch_stage.tar)

Tarball Archive for the SOURCE Database (cdc_autolog_arch_source.tar)

Labels: ,