Sunday, June 20, 2021

DB file sequential read due to full scans

The App Team found a long running query and worked to stop it. The long running query was stopped and the bridge waited to see if the errors dropped. The team was still seeing a significant number of errors, so it looked further into the problem, and a second long running job that had not been seen initially was found. The team stopped the second long running job, and the bridge is now waiting for the transactions to flow through to see if the errors drop.

Stopping the second long running query brought some temporary relief, but the numbers have started to increase again. The DBA team has been contacted to assist and is looking further into why the long running queries keep showing up. Bouncing the DB server is not an immediate option, as it will not fail over cleanly.
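The post does not show how the long running jobs were spotted; a minimal sketch of the usual check against v$session and v$sql is below. The 10-minute threshold and column list are illustrative only, not the query actually used on the bridge.

-- Sketch only: active user sessions that have been in their current call
-- for more than 10 minutes, with the SQL they are running.
SELECT s.sid,
       s.serial#,
       s.username,
       s.event,
       s.last_call_et                AS seconds_in_call,
       s.sql_id,
       SUBSTR(q.sql_text, 1, 80)     AS sql_text
FROM   v$session s
       JOIN v$sql q
         ON  q.sql_id       = s.sql_id
         AND q.child_number = s.sql_child_number
WHERE  s.status = 'ACTIVE'
AND    s.type   = 'USER'
AND    s.last_call_et > 600
ORDER  BY s.last_call_et DESC;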

pgausage
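pgausage appears to be a local script and its output is not preserved here. A rough stand-in for checking overall and per-process PGA consumption, assuming the standard v$pgastat and v$process views, might look like the following (not the actual script):

-- Sketch only: instance-wide PGA figures and the top PGA consumers.
SELECT name, ROUND(value / 1024 / 1024) AS mb
FROM   v$pgastat
WHERE  name IN ('total PGA allocated', 'total PGA inuse', 'maximum PGA allocated');

SELECT *
FROM  (SELECT p.spid,
              s.sid,
              s.username,
              ROUND(p.pga_used_mem  / 1024 / 1024) AS pga_used_mb,
              ROUND(p.pga_alloc_mem / 1024 / 1024) AS pga_alloc_mb
       FROM   v$process p
              LEFT JOIN v$session s
                     ON s.paddr = p.addr
       ORDER  BY p.pga_used_mem DESC)
WHERE  ROWNUM <= 10;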

 

Though McAfee was the top contributor, it was determined that McAfee had been running since server start, and it is quite normal for a process to show up in top with a start time that far back. Also, when I checked with the top command, the process was found to be sleeping (S state).


The Server Team was brought to the bridge to stop McAfee on the server. It was found that the load on one of the servers was very high, and it is believed that a process kicked off which triggered McAfee to perform a full scan. Once McAfee was stopped, the load average slowly started to come down. The bridge is waiting to see if further action will be needed.
McAfee was shut down on the problematic database server, clearing the immediate problem. The bridge continues to discuss root cause and any steps that will need to be taken for outage prevention.
extensive_sql
awr_sql_execution_tot or sqlT (inbuilt)
sql_relative_card
awrstat_timeseen
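The scripts listed above are local / SQLT-style diagnostics whose output is not reproduced in the post. Broadly the same kind of information (executions, elapsed time per execution and plan hash per AWR snapshot for one statement) can be pulled straight from AWR; a minimal sketch against dba_hist_sqlstat and dba_hist_snapshot, with &sql_id as a placeholder, is:

-- Sketch only: per-snapshot execution statistics for one SQL from AWR.
-- The *_delta columns hold the change within each snapshot interval.
SELECT sn.begin_interval_time,
       st.plan_hash_value,
       st.executions_delta                            AS execs,
       ROUND(st.elapsed_time_delta / 1e6, 1)          AS elapsed_s,
       ROUND(st.elapsed_time_delta /
             NULLIF(st.executions_delta, 0) / 1e6, 3) AS elapsed_per_exec_s,
       st.buffer_gets_delta                           AS buffer_gets
FROM   dba_hist_sqlstat   st
       JOIN dba_hist_snapshot sn
         ON  sn.snap_id         = st.snap_id
         AND sn.dbid            = st.dbid
         AND sn.instance_number = st.instance_number
WHERE  st.sql_id = '&sql_id'
ORDER  BY sn.begin_interval_time, st.plan_hash_value;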

 

 

With sql_check I did not find any SQL profile or baseline created, so I used the coe script to pin the old plan.
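sql_check is another local script; the checks behind it and the pinning step can be sketched roughly as below. The coe_xfr_sql_profile.sql script comes from Oracle Support's SQLT / coe toolkit, and the sql_id, plan hash and SQL text fragment are placeholders.

-- Sketch only: confirm no SQL profile or plan baseline already exists
-- for the statement.
SELECT name, status, created
FROM   dba_sql_profiles
WHERE  sql_text LIKE '%<distinctive text of the statement>%';

SELECT sql_handle, plan_name, enabled, accepted
FROM   dba_sql_plan_baselines
WHERE  sql_text LIKE '%<distinctive text of the statement>%';

-- With nothing in place, the old (good) plan can then be pinned with the
-- Oracle Support coe_xfr_sql_profile.sql script, roughly:
--   SQL> @coe_xfr_sql_profile.sql &sql_id &good_plan_hash_value
--   SQL> @coe_xfr_sql_profile_<sql_id>_<plan_hash_value>.sql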

TLDR: McAfee was shut down on the problematic database server in an attempt to clear the issue; however, this did not resolve the outage. It was found that an Oracle execution plan had changed at 04:00 CT. Oracle Support reverted the plan change and pinned the good plan, which improved response time.
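Not shown in the post is the follow-up verification; once the profile is in place, the cursor in the shared pool should report the good plan hash with the profile attached, along the lines of the sketch below (placeholder sql_id again):

-- Sketch only: confirm the statement now runs with the pinned plan
-- and that the SQL profile is being used.
SELECT sql_id,
       child_number,
       plan_hash_value,
       sql_profile,
       executions,
       ROUND(elapsed_time / NULLIF(executions, 0) / 1e6, 3) AS elapsed_per_exec_s
FROM   v$sql
WHERE  sql_id = '&sql_id';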

 
