Monitoring a Grid Control Installation

Posted at Wednesday, September 24, 2008
This post covers how to monitor your 10g Grid Control installation itself, which is referred to as OOB (Out-of-Band) Notification. I am using version 10.2.0.4 of the OMS (Oracle Management Service), Repository Database, and Agent on Enterprise Linux.

In essence, the agent local to the OMS must be used to monitor your Grid Control installation via the Oracle-provided method (i.e. a series of Perl scripts run by the agent).

This post assumes that you already have a local mail service - in my case sendmail - established and functioning.

Check the Agent and Verify the OMS Target

$ export ORACLE_SID=agent10g
$ . oraenv
$ emctl config agent listtargets | grep oracle_emrep

If the OMS target is missing, add it to the $AGENT_HOME/sysman/emd/targets.xml file:

<target type="oracle_emrep" name="Management Services and Repository" version="1.0">
<property value="(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=rac1.colestock.test)(PORT=1521)))(CONNECT_DATA=(SID=emrep)))" name="ConnectDescriptor"></property>
<property value="sysman" name="UserName" encrypted="FALSE"></property>
<property value="password" name="password" encrypted="FALSE"></property></target>

Adjust the values above (host, port, SID, user name, and password) to suit your environment.

Reload the Agent

$ emctl reload agent
Oracle Enterprise Manager 10g Release 4 Grid Control 10.2.0.4.0.
Copyright (c) 1996, 2007 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD reload completed successfully

The reload encrypts the UserName and password properties in targets.xml.

The agent should now be monitoring the OMS as a target

$ emctl config agent listtargets | grep oracle_emrep
[Management Services and Repository, oracle_emrep]
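
The check above can be wrapped in a small helper so it can be reused in a script. This is only a sketch: the function name has_emrep_target is made up for illustration, and it assumes the listtargets output has the bracketed form shown above.

```shell
#!/bin/sh
# has_emrep_target (hypothetical helper): reads the output of
# "emctl config agent listtargets" on stdin and succeeds only if
# a target of type oracle_emrep is listed.
has_emrep_target() {
    grep -q 'oracle_emrep'
}

# Usage (assumes emctl is on the PATH, e.g. after ". oraenv"):
#   emctl config agent listtargets | has_emrep_target \
#       && echo "OMS target is being monitored" \
#       || echo "OMS target is missing; check targets.xml"
```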

Update the $AGENT_HOME/sysman/config/emd.properties

Configure the email properties for the agent in emd.properties

emd_email_address=oracle@rac1.colestock.test,james@colestock.com
emd_email_gateway=localhost
emd_from_email_address=
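
Before reloading, it can help to read the values back out of the file. The sketch below does this with a pair of made-up helper functions (get_emd_prop and show_email_props are not part of the agent); it assumes simple key=value lines and treats an empty emd_email_address as an error, which is my assumption rather than documented agent behavior.

```shell
#!/bin/sh
# get_emd_prop FILE KEY (hypothetical helper): print the value of KEY
# from a key=value properties file. Matching is case-insensitive and
# the last occurrence wins.
get_emd_prop() {
    grep -i "^$2=" "$1" | tail -n 1 | cut -d= -f2-
}

# show_email_props FILE (hypothetical helper): print the three email
# settings and fail if no recipient address is configured.
show_email_props() {
    props_file=$1
    recipients=$(get_emd_prop "$props_file" emd_email_address)
    if [ -z "$recipients" ]; then
        echo "emd_email_address is not set in $props_file" >&2
        return 1
    fi
    echo "recipients: $recipients"
    echo "gateway:    $(get_emd_prop "$props_file" emd_email_gateway)"
    echo "from:       $(get_emd_prop "$props_file" emd_from_email_address)"
}

# Usage:
#   show_email_props "$AGENT_HOME/sysman/config/emd.properties"
```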

Reload the agent again

$ emctl reload agent
Oracle Enterprise Manager 10g Release 4 Grid Control 10.2.0.4.0.
Copyright (c) 1996, 2007 Oracle Corporation. All rights reserved.
---------------------------------------------------------------
EMD reload completed successfully

Verify $AGENT_HOME/bin/emrepdown.pl

This file may be missing from your agent installation. If so, copy the file from the OMS' $ORACLE_HOME to $AGENT_HOME/bin/emrepdown.pl

Review this file closely. For example, if you specified a value for emd_from_email_address earlier, notifications might fail, because the script then uses the -r ("return address", i.e. sender) option of mailx, and this option is not available on all Linux/UNIX-based distributions. Make whatever changes are necessary for your environment.
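
One way to find out whether your mailx accepts -r is to build the command by hand and try it with a test message. The sketch below only prints the command line; the function name build_mail_cmd is invented for illustration, and the exact form the Oracle script uses when a from address is set is an assumption on my part.

```shell
#!/bin/sh
# build_mail_cmd SUBJECT RECIPIENTS FROM (hypothetical helper):
# print the mailx command line that would be used. The -r option is
# included only when FROM is non-empty (assumed behavior).
build_mail_cmd() {
    subject=$1
    recipients=$2
    from=$3
    if [ -n "$from" ]; then
        printf 'mailx -r %s -s "%s" %s\n' "$from" "$subject" "$recipients"
    else
        printf 'mailx -s "%s" %s\n' "$subject" "$recipients"
    fi
}

# Usage: print the command, then run it manually with a test body, e.g.
#   build_mail_cmd "OOB test" "oracle@rac1.colestock.test" ""
#   echo "test" | mailx -s "OOB test" oracle@rac1.colestock.test
```

If the -r form is rejected by your local mailx, either leave emd_from_email_address empty or adjust the script.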

Test Notification

Shut down the OMS and wait to see whether the notification email fires off.

$ export ORACLE_SID=oms10g
$ . oraenv
$ $ORACLE_HOME/opmn/bin/opmnctl stopall
opmnctl: stopping opmn and all managed processes...

Expect to wait at least a few minutes for the notification to arrive.

If you trace the agent's activity at DEBUG level, you will see output like the following in $AGENT_HOME/sysman/log/emagent_perl.trc:

emrepnotif.pl: Wed Sep 24 18:28:55 2008: DEBUG: emrepnotif: Connectdescriptor (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=rac1.colestock.test)(PORT=1521)))(CONNECT_DATA=(SID=emrep)))
emrepnotif.pl: Wed Sep 24 18:28:55 2008: DEBUG: emrepnotif: emConsoleMode=STANDALONE jobLike=EMD_MAINTENANCE%
emrepnotif.pl: Wed Sep 24 18:28:55 2008: DEBUG: emrepnotif: , /tmp/_emrepnotif
emrepnotif.pl: Wed Sep 24 18:28:55 2008: DEBUG: emrepnotif: sql is: SELECT
(SELECT count(broken) FROM user_jobs
WHERE what LIKE('EMD_MAINTENANCE%')
AND broken = 'Y'),
(SELECT MIN(SYSDATE-next_date) FROM user_jobs
WHERE what LIKE('EMD_MAINTENANCE%')),
NVL((SELECT AVG(value) FROM mgmt_system_performance_log
WHERE job_name like('EMD_MAINTENANCE%')
AND name='Queued Notifications'
AND time>(SYSDATE-(1/24))),0),
(SELECT DECODE(COUNT(a.device_name), 0, -1, COUNT(a.device_name)) - COUNT(b.device_name)
FROM mgmt_notify_devices a, mgmt_notify_devices b
WHERE b.status = 0) FROM DUAL
emrepnotif.pl: Wed Sep 24 18:28:55 2008: DEBUG: emrepnotif: query result is:
em_result=0|0|-1

emrepnotif.pl: Wed Sep 24 18:28:55 2008: DEBUG: emrepnotif: Time in emrepnotif: 61.2359046936035
emrepresp.pl: Wed Sep 24 18:29:15 2008: DEBUG: Connectdescriptor (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=rac1.colestock.test)(PORT=1521)))(CONNECT_DATA=(SID=emrep)))
emrepresp.pl: Wed Sep 24 18:29:15 2008: DEBUG: , /tmp/_emrepresp
emrepresp.pl: Wed Sep 24 18:29:15 2008: DEBUG: sql is: select count(distinct host_url) from mgmt_failover_table where sysdate-last_time_stamp < 300
emrepresp.pl: Wed Sep 24 18:29:15 2008: DEBUG: sql1 is: SELECT
(SELECT count(broken) FROM user_jobs
WHERE what LIKE('EMD_COLLECTION.%')
AND broken = 'Y'),
(SELECT MIN(SYSDATE-next_date) FROM user_jobs
WHERE what LIKE('EMD_COLLECTION.%')) FROM DUAL
emrepresp.pl: Wed Sep 24 18:29:15 2008: DEBUG: No active OMSs
emrepresp.pl: Wed Sep 24 18:29:15 2008: DEBUG: exists=, accesstime=, interval=0.0416666666666667 mailscriptexists=
emrepresp.pl: Wed Sep 24 18:29:15 2008: ERROR: emrepresp: processfailure /u01/app/oracle/product/agent10g/bin/emrepdown.pl, Message:No active Management Services were found, Subject:Severe Enterprise Manager problem
emrepresp.pl: Wed Sep 24 18:29:15 2008: DEBUG: opened /tmp/sysman1234_emrepdown
emrepresp.pl: Wed Sep 24 18:29:15 2008: DEBUG: list command1=cat /u01/app/oracle/product/agent10g/sysman/config/emd.properties | grep -i EMD_EMAIL_ADDRESS= | sed s?EMD_EMAIL_ADDRESS=??i | awk '{print }'
emrepresp.pl: Wed Sep 24 18:29:15 2008: DEBUG: return command1=cat /u01/app/oracle/product/agent10g/sysman/config/emd.properties | grep -i EMD_FROM_EMAIL_ADDRESS= | sed s?EMD_FROM_EMAIL_ADDRESS=??i | awk '{print }'
emrepresp.pl: Wed Sep 24 18:29:15 2008: DEBUG: list=oracle@rac1.colestock.test,james@colestock.com
emrepresp.pl: Wed Sep 24 18:29:15 2008: DEBUG: return=
emrepresp.pl: Wed Sep 24 18:29:15 2008: DEBUG: command1=`mailx -s "Severe Enterprise Manager problem" oracle@rac1.colestock.test,james@colestock.com < /tmp/sysman1234_emrepdown`
emrepresp.pl: Wed Sep 24 18:29:16 2008: DEBUG: out=
emrepresp.pl: Wed Sep 24 18:29:16 2008: DEBUG: emrepresp: Time in emrepresp: 1205.62410354614

The Perl scripts referenced in the trace (emrepnotif.pl, emrepresp.pl, and emrepdown.pl) are the ones that check the status of the Grid Control installation and send the email.
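
The trace is verbose, so it can be easier to pull out only the failure lines. This is a minimal sketch, assuming the line layout shown in the excerpt above (script name, timestamp, level); the function name emrep_errors is made up for illustration.

```shell
#!/bin/sh
# emrep_errors FILE (hypothetical helper): print only the ERROR lines
# written by emrepresp.pl or emrepnotif.pl in the given trace file.
emrep_errors() {
    grep -E '^emrep(resp|notif)\.pl: .*: ERROR: ' "$1"
}

# Usage:
#   emrep_errors "$AGENT_HOME/sysman/log/emagent_perl.trc"
```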

If everything works, you should receive an email similar to the following:

From oracle@colestock.com  Wed Sep 24 18:29:16 2008
Return-Path:
Received: from rac1.colestock.test (localhost.colestock.test [127.0.0.1])
by rac1.colestock.test (8.13.1/8.13.1) with ESMTP id m8P0TFTp006370;
Wed, 24 Sep 2008 18:29:16 -0600
Received: (from oracle@localhost)
by rac1.colestock.test (8.13.1/8.13.1/Submit) id m8P0TFpN006368;
Wed, 24 Sep 2008 18:29:15 -0600
Date: Wed, 24 Sep 2008 18:29:15 -0600
From: Oracle Software Owner
Message-Id: <200809250029.m8P0TFpN006368@rac1.colestock.test>
To: oracle@rac1.colestock.test, james@colestock.com
Subject: Severe Enterprise Manager problem


Wed Sep 24 18:29:15 MDT 2008
Severe Enterprise Manager problem
Error message: No active Management Services were found

For more help, refer to this Metalink article: 429257.1
