Tuesday, May 18, 2021

Role of SCN in Instance Recovery & Media Recovery

Role of SCN in Instance Recovery & Media Recovery

SCN after an Instance Crash

Checkpoint is an event that happens at a particular time(=SCN). When this happens, all datafiles and control files are updated with SCN at that time. A successful checkpoint guarantees that all database changes up to the checkpoint SCN have been recorded in the datafiles. As a result, only those changes made after the checkpoint need to be applied during recovery.

 Oracle stores the SCN associated with the Checkpoint in four places: three of them in the control file( accessed thru v$datafile) and one in the datafile header for each datafile ( thru v$datafile_header)

a.)System checkpoint SCN in the control file.( v$database.current_scn)
b.)Datafile Checkpoint SCN individually in the control file for each datafile (v$datafile.chckpoint_change#)
c.)Checkpoint SCN value in the header of each datafile, which also referred as Start SCN ( v$datafile_header.checkpoint_change#)
d.)The stop SCN is held in the control file for each datafile ( v$datafile.last_cahnge#)


Now let us see how the SCN that is stored in different places by various events are useful during recovery scenarios

 

Under the normal conditions, where database is running without any problem, the System Checkpoint SCN, Datafile Checkpoint SCN and Start SCN have equal values. Whereas, Stop SCN will always be NULL under normal conditions.

During a clean shutdown of the database, since it is an event, a checkpoint occurs and the stop SCN value is set to Start SCN of datafile header. Now, all the 4 SCN values are same.
(Just remember !!! if it is a clean shutdown – all the four SCN’s will be same.)

Now let us see what happens when the database is started –

Upon startup, Start SCN value is checked with Datafile Checkpoint SCN, if they both match then Start SCN value is checked with Stop SCN and if they do match, Oracle opens the database without any problem. Since in our case as mentioned above all the 4 values are same and so we have a clean startup. Once the database is opened, stop SCN of the control file will be resetted to NULL again.

So, as explained above, this is how the SCN i.e stored at various places behaves under normal conditions. Now let us know how SCN’s behaves under abnormal conditions like instance crash or media failure.

When an Instance Crash occurs, since a clean shutdown didn’t happend, Stop SCN value is not initiated with the start SCN value and it remains as NULL. Upon startup, Oracle checks the Start SCN with Stop SCN and found that they are not in match. Oracle determines that instance requires a crash recovery. Instance recovery happens automatically at database startup without database administrator (DBA) intervention.


At instance startup, the datafile checkpoint SCN in the control file and the start SCN in the datafile header are checked for equality. If they don’t match, it is a signal that media recovery is required

 

.

 

 

In this session we are going to demonstrate the behavior of the checkpoints after an instance crash, the following SQL creates a table (which performs an implicit commit) and inserts a row of data into it without a commit:

SQL> create table dummytable (x number) tablespace test;
Table created.

SQL> insert into dummytable values(100);
1 row created.

During subsequent startup after instance crash Oracle will realize before opening database that value for Last_Change# column for datafiles are not showing SCN value , which means END Checkpoint SCN was not done, which would be indication for Oracle that instance recovery is required.

Let’s mimic instance failure and then startup mount to examine the situation

SQL> shu abort;
ORACLE instance shut down.

SQL> startup mount;
ORACLE instance started.
Total System Global Area 939495424 bytes
Fixed Size 2218952 bytes
Variable Size 704644152 bytes
Database Buffers 226492416 bytes
Redo Buffers 6139904 bytes
Database mounted.

SQL> select checkpoint_change# from v$database;
CHECKPOINT_CHANGE#
------------------
2125209

SQL> select name,checkpoint_change#,last_change# from v$datafile where name like '%system%';
NAME                                          CHECKPOINT_CHANGE# LAST_CHANGE#
--------------------------------------------- ------------------ ------------
/u02/app/oracle/oradata/mask11g/system01.dbf  2125209

SQL> select name,checkpoint_change# from v$datafile_header where name like '%system%';
NAME                                          CHECKPOINT_CHANGE#
--------------------------------------------- ------------------
/u02/app/oracle/oradata/mask11g/system01.dbf  2125209


In this case, the stop SCN is not set, which is indicated by the NULL value in the LAST_CHANGE# column. This information enables Oracle, at the time of the next startup, to determine that the instance crashed because the checkpoint on shutdown was not performed. If it had been performed, the LAST_CHANGE# and CHECKPOINT_CHANGE# values would match for each datafile as they did during a clean shutdown. If an instance crashes at shutdown, then instance crash recovery is required the next time the instance starts up.

Recovery from an Instance Crash

Upon the next instance startup that takes place after SHUTDOWN ABORT or a DBMS crash, the Oracle DBMS detects that the stop SCN for datafiles is not set in the control file during startup. Oracle then performs crash recovery. During crash recovery, Oracle applies redo log records from the online redo logs in a process referred to as roll forward to ensure that all transactions committed before the crash are applied to the datafiles. Following roll forward, active transactions that did not commit are identified from the rollback segments and are undone before the blocks involved in the active transactions can be accessed. This process is referred to as roll back. In our example, the following transaction was active but not committed at the time of the SHUTDOWN ABORT, so it needs to be rolled back:

SQL> insert into dummytable values(100);

 

After instance startup, the dummytable table exists, but remains empty. Instance recovery happens automatically at database startup without database administrator (DBA) intervention. It may take a while because of the need to apply large amounts of outstanding redo changes to data blocks for transactions that completed and those that didn’t complete and require roll back.

Recovery from a Media Failure

Up until this point, the checkpoint start SCN in the datafile header has always matched the datafile checkpoint SCN number held in the control file. This is reasonable because during a checkpoint, the datafile checkpoint SCN in the control file and the start SCN in the datafile header are both updated, along with the system checkpoint SCN. The following SQL shows the start SCN from the datafile header and datafile checkpoint SCN from the control file for the same file:

SQL> select 'controlfile' "SCN location",name,checkpoint_change# from v$datafile where name like '%TEST01%'
union
select 'file header',name,checkpoint_change# from v$datafile_header where name like '%TEST01%';

SCN location NAME                                         CHECKPOINT_CHANGE#
------------ -------------------------------------------- ------------------
controlfile /u02/app/oracle/oradata/mask11g/TEST01.dbf    2160246
file header /u02/app/oracle/oradata/mask11g/TEST01.dbf    2160246



Unlike the v$datafile view, there is no stop SCN column in the v$datafile_header view because v$datafile_header is not used at instance startup time to indicate that an instance crash occurred. However, the v$datafile_header does provide the Oracle DBMS with the information it requires to perform media recovery. At instance startup, the datafile checkpoint SCN in the control file and the start SCN in the datafile header are checked for equality. If they don’t match, it is a signal that media recovery is required.

For example, media recovery is required if a media failure has occurred and the original datafile has been replaced with a backup copy. In this case, the start SCN in the backup copy is less than the checkpoint SCN value in the control file, and Oracle requests archived redo logs—generated at the time of previous log switches—in order to reapply the changes required to bring the datafile up to the current point in time.

In order to recover the database from a media failure, we must run the database in ARCHIVELOG mode to ensure that all database changes from the online redo logs are stored permanently in archived redo log files. In order to enable ARCHIVELOG mode, we must run the command ALTERDATABASE ARCHIVELOG when the database is in a mounted state.

So to examine the media recovery our database must be in archivelog mode so we can use tablespace or datafile level failure for our test. Before trying the media recovery scenario let’s take backup of tablespace.

SQL> alter tablespace TEST begin backup;

Tablespace altered.

Check the SCN of datafile at this point, as post begin backup datafile header will be frozen.

SQL> select 'controlfile' "SCN location",name,checkpoint_change# from v$datafile where name like '%TEST01%'
union
select 'file header',name,checkpoint_change# from v$datafile_header where name like '%TEST01%';

SCN location NAME                                         CHECKPOINT_CHANGE#
------------ -------------------------------------------- ------------------
controlfile /u02/app/oracle/oradata/mask11g/TEST01.dbf    2160333
file header /u02/app/oracle/oradata/mask11g/TEST01.dbf    2160333


Let’s take a backup of datafile test01.dbf

SQL> !cp /u02/app/oracle/oradata/mask11g/TEST01.dbf /u02/app/oracle/oradata/mask11g/TEST01.dbf.bkp

SQL> !ls -lrt /u02/app/oracle/oradata/mask11g/TEST01*

-rw-r----- 1 oracle oinstall 52436992 Sep 2 17:27 /u02/app/oracle/oradata/mask11g/TEST01.dbf
-rw-r----- 1 oracle oinstall 52436992 Sep 2 17:27 /u02/app/oracle/oradata/mask11g/TEST01.dbf.bkp


SQL> alter tablespace TEST end backup;

Tablespace altered.

Now create another table into Test tablespace.

SQL> create table dummy tablespace test as select * from dba_extents;

Table created.

Remove datafile and force system to sync OS filesystem cache

# rm /u02/app/oracle/oradata/mask11g/TEST01.dbf

# sync

# sync

Let’s try DBWR to trigger the I/O

SQL> alter system flush buffer_cache;

System altered.

Report this ad

At this point we have checked and no error visible in alert log which means DB still able to access cached copy of datafile. Let’s try to access table, and we are able to see the result what we were expecting.

SQL> select count(*) from dummy;
select count(*) from dummy
*
ERROR at line 1:
ORA-00376: file 17 cannot be read at this time
ORA-01110: data file 17: '/u02/app/oracle/oradata/mask11g/TEST01.dbf'

We can identify files that need recovery by querying v$recover_file.

SQL> select file#,change# from v$recover_file;

FILE#      CHANGE#
---------- ----------
17         0

At this stage datafile is no longer available for query so CHANGE# coloumn showing 0. Once we place the file from backup, we will see latest information from header of restored datafile.

# mv /u02/app/oracle/oradata/mask11g/TEST01.dbf.bkp /u02/app/oracle/oradata/mask11g/TEST01.dbf

Let’s query the status of recovery SCN , which must be the SCN number at which backup of datafile was taken, which is 2160333.

SQL> select file#,change# from v$recover_file;

FILE#      CHANGE#
---------- ----------
17         2160333

Lets compare the required SCN from controlfile and actual SCN from datafile header to compare why Oracle is showing this file as restored from backup.

SQL> select 'controlfile' "SCN location",name,checkpoint_change# from v$datafile where name like '%TEST01%'
union
select 'file header',name,checkpoint_change# from v$datafile_header where name like '%TEST01%';

SCN location NAME                                         CHECKPOINT_CHANGE#
------------ -------------------------------------------- ------------------
controlfile /u02/app/oracle/oradata/mask11g/TEST01.dbf    2166885
file header /u02/app/oracle/oradata/mask11g/TEST01.dbf    2160333


We can clearly see the expected SCN for datafile and available SCN from restored datafile, that’s how Oracle DBMS detects the requirement of media recovery.

SQL> alter database recover tablespace test;

Database altered.

SQL> alter tablespace test online;

Tablespace altered.

Let’s try to access table having extents in restored datafile.

SQL> select count(*) from dummy;

COUNT(*)
----------
9306

 

No comments:

Post a Comment