RAC: Tuning the Interconnect

Posted at Sunday, February 03, 2008
In order to tune the Interconnect for your RAC cluster, Oracle provides a diagnostic script; you can find this script in the following Metalink Note: 135714.1

In the event that you don't have access to Metalink; here is the script and an example log file from a run against my 2-node, 10.2.0.3 RAC cluster on Enterprise Linux:

racdiag.sql (Rac Diagnostic Script)

Example racdiag.sql log file

In regards to how the Interconnect is performing, the aforementioned script runs a couple of key queries; here are a couple of the more important ones:

-- GLOBAL CACHE CR PERFORMANCE
-- This shows the average latency of a consistent block request.
-- AVG CR BLOCK RECEIVE TIME should typically be about 15 milliseconds depending
-- on your system configuration and volume, is the average latency of a
-- consistent-read request round-trip from the requesting instance to the holding
-- instance and back to the requesting instance. If your CPU has limited idle time
-- and your system typically processes long-running queries, then the latency may
-- be higher. However, it is possible to have an average latency of less than one
-- millisecond with User-mode IPC. Latency can be influenced by a high value for
-- the DB_MULTI_BLOCK_READ_COUNT parameter. This is because a requesting process
-- can issue more than one request for a block depending on the setting of this
-- parameter. Correspondingly, the requesting process may wait longer. Also check
-- interconnect badwidth, OS tcp settings, and OS udp settings if
-- AVG CR BLOCK RECEIVE TIME is high.
--
set numwidth 20
column "AVG CR BLOCK RECEIVE TIME (ms)" format 9999999.9
select b1.inst_id, b2.value "GCS CR BLOCKS RECEIVED",
b1.value "GCS CR BLOCK RECEIVE TIME",
((b1.value / b2.value) * 10) "AVG CR BLOCK RECEIVE TIME (ms)"
from gv$sysstat b1, gv$sysstat b2
where b1.name = 'global cache cr block receive time' and
b2.name = 'global cache cr blocks received' and b1.inst_id = b2.inst_id
or b1.name = 'gc cr block receive time' and
b2.name = 'gc cr blocks received' and b1.inst_id = b2.inst_id ;


-- GLOBAL CACHE LOCK PERFORMANCE
-- This shows the average global enqueue get time.
-- Typically AVG GLOBAL LOCK GET TIME should be 20-30 milliseconds. the elapsed
-- time for a get includes the allocation and initialization of a new global
-- enqueue. If the average global enqueue get (global cache get time) or average
-- global enqueue conversion times are excessive, then your system may be
-- experiencing timeouts. See the 'WAITING SESSIONS', 'GES LOCK BLOCKERS',
-- 'GES LOCK WAITERS', and 'TOP 10 WAIT EVENTS ON SYSTEM' sections if the
-- AVG GLOBAL LOCK GET TIME is high.
--
set numwidth 20
column "AVG GLOBAL LOCK GET TIME (ms)" format 9999999.9
select b1.inst_id, (b1.value + b2.value) "GLOBAL LOCK GETS",
b3.value "GLOBAL LOCK GET TIME",
(b3.value / (b1.value + b2.value) * 10) "AVG GLOBAL LOCK GET TIME (ms)"
from gv$sysstat b1, gv$sysstat b2, gv$sysstat b3
where b1.name = 'global lock sync gets' and
b2.name = 'global lock async gets' and b3.name = 'global lock get time'
and b1.inst_id = b2.inst_id and b2.inst_id = b3.inst_id
or b1.name = 'global enqueue gets sync' and
b2.name = 'global enqueue gets async' and b3.name = 'global enqueue get time'
and b1.inst_id = b2.inst_id and b2.inst_id = b3.inst_id;


-- DLM TRAFFIC INFORMATION
-- This section shows how many tickets are available in the DLM. If the
-- TCKT_WAIT columns says "YES" then we have run out of DLM tickets which could
-- cause a DLM hang. Make sure that you also have enough TCKT_AVAIL.
--
set numwidth 5
select * from gv$dlm_traffic_controller
order by TCKT_AVAIL;


Oracle also maintains a RAC survival guide - which is quite helpful - here: 203226.1

Labels: , ,