The App Team has found a long-running query and is working to
stop it.
The long-running query was stopped and the bridge waited to see if the
errors dropped. The team was still seeing a significant number of errors,
so it looked further into the problem. There appeared to be a second
long-running job that was not seen initially. The team stopped the second
long-running job and the bridge is now waiting for the transactions to flow
through to see if the errors drop.
Stopping the second long-running query brought some temporary relief, but
the numbers have started to increase again. The DBA team is being contacted
to assist. The DBA team is looking further to determine why the long-running
queries continue to show up. Bouncing the DB server is not an immediate
option, as it will not cleanly fail over.
Although McAfee was the top contributor, it was determined that McAfee had been running since server start. It is quite normal for a process to show up in top with a start time from long ago. Also, when I checked with the top command, the process was in the Sleeping state (S flag).
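The same state check can be scripted instead of read off the top screen. Below is a minimal Python sketch, assuming a Linux host where /proc/<pid>/stat is available. The function names and the sample process name in the usage note are hypothetical and not taken from the incident.

```python
from typing import NamedTuple


class ProcState(NamedTuple):
    pid: int
    comm: str
    state: str  # one-letter state code, e.g. "S" (sleeping) or "R" (running)


def parse_proc_stat(text: str) -> ProcState:
    """Parse the first three fields of a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the LAST ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left].strip())
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return ProcState(pid, comm, state)


def is_sleeping(state: str) -> bool:
    """True for interruptible sleep, which top shows as 'S'."""
    return state == "S"


def read_state(pid: int) -> ProcState:
    """Read and parse the live state of a process (Linux only)."""
    with open(f"/proc/{pid}/stat") as fh:
        return parse_proc_stat(fh.read())
```

For example, calling is_sleeping(read_state(pid).state) with the PID of the scanner process tells you whether it is idle at that moment. A single sample only reflects one instant, so it is worth repeating the check a few times before ruling a process out.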
The Server Team has been brought to the bridge to stop McAfee on the
server. It was found that the load on one of the servers was very high, and
it is believed that a process kicked off that triggered McAfee to perform a
whole table scan. Once McAfee was stopped, the load average started to come
down slowly. The bridge is waiting to see if further action will be needed.
McAfee was turned down on the problematic database server, clearing the
problem. The bridge continues to discuss root cause and any steps that
will need to be taken for outage prevention.
extensive_sql
awr_sql_execution_tot or sqlT (inbuilt)
sql_relative_card
sql_check: I did not find any SQL profile or baseline created, so I used the coe script to pin the old plan.
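Pinning the old plan presupposes knowing when the plan switched and which plan was the good one. The sketch below is not one of the scripts listed above (their contents are not shown here). It is a minimal Python illustration, assuming you have already exported per-snapshot rows of (snapshot label, plan hash value, average elapsed seconds per execution) for a single SQL ID. The row layout and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# (snapshot label, plan hash value, avg elapsed seconds per execution)
# This row shape is an assumption for illustration only.
Row = Tuple[str, int, float]


def find_plan_changes(rows: List[Row]) -> List[Tuple[str, int, int]]:
    """Return (snapshot, old_plan, new_plan) wherever the plan hash switches.

    Rows are expected in chronological order.
    """
    changes = []
    prev = None
    for snap, plan, _elapsed in rows:
        if prev is not None and plan != prev:
            changes.append((snap, prev, plan))
        prev = plan
    return changes


def best_plan(rows: List[Row]) -> int:
    """Return the plan hash with the lowest mean elapsed time per execution.

    Raises ValueError if rows is empty.
    """
    totals: Dict[int, Tuple[float, int]] = {}
    for _snap, plan, elapsed in rows:
        total, count = totals.get(plan, (0.0, 0))
        totals[plan] = (total + elapsed, count + 1)
    if not totals:
        raise ValueError("no rows supplied")
    return min(totals, key=lambda p: totals[p][0] / totals[p][1])
```

find_plan_changes points at the snapshot where the plan flipped, and best_plan suggests which plan hash is the candidate to pin. A plain mean is a crude comparison, so treat the result as a starting point to confirm against the actual execution statistics.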
TLDR: McAfee was turned down on the problematic database server in an attempt to clear the problem. However, this did not resolve the outage. It was found that an Oracle plan had changed at 04:00 CT. Oracle support reverted the plan change and pinned the good plan, which improved response time.