In a previous blog post we laid out the basics of database recovery models and backups. As part of that blog we briefly discussed Log Sequence Numbers and Log Chains, and how they both apply to, and affect, backups. This blog will explore that topic deeper, looking in detail at backups and the log chain.
Setting up the Environment
Let’s get the environment set up first to show how log sequence numbers and log chains effect and work with backups. To illustrate this, I have created a simple database called SampleDB with a single table called Friends.
The process for this demo is to insert data into the table and then take backups. To illustrate this, I have created a simple C# .NET application to loop and insert a row into the table, waiting 30 seconds between inserts.
Once I kick that off, I need to take some backups. I first want to take a FULL backup of the database, followed by backups of the transaction log. As a recap, a FULL backup backs up the entire database along with part of the transaction log; differential backups contain only the data that has changed since the last full backup, and log backups contain all the transactions since the last FULL or differential backup. For this example, we’ll first do a FULL then Log backups. The following code first does a FULL backup, then loops, backing up the transaction log every minute.
In this example, I started .NET application, waited a few minutes to get data into the table, then ran the SQL script. As you can see, it performed a FULL, then started taking Log backups.
After that has been running for a few minutes, open up a new query window and run the following query against the SampleDB database:
SELECT
s.database_name,
CASE s.[type]
WHEN 'D' THEN 'Full'
WHEN 'I' THEN 'Differential'
WHEN 'L' THEN 'Transaction Log'
END AS BackupType,
s.first_lsn,
s.last_lsn,
s.database_backup_lsn,
s.checkpoint_lsn,
s.recovery_model
FROM msdb.dbo.backupset s
WHERE s.database_name = 'SampleDB'
GO
Here we are querying the backupset table from the msdb database to get the type of backup and the Log Sequence Number (LSN) information. As another refresher from the last blog, an LSN is essentially the primary key for a log record. As SQL Server writes transactions to the log, they are written sequentially as log records. Each record in the log is given, and identified by, a LSN. As records are written to the log, the record is written to the logical end of the log, and the LSN is higher than the record that preceded it.
As each backup of the transaction log is performed, they are added and maintained in a series which forms a chain called the log chain. The log chain defines an unbroken, continuous sequence of transaction log backups. As more log backups are taken, they are added to the chain in order. All log chains begin with a Full backup.
For example, running the above query shows our initial backup followed by a handful of transaction log backups. The important columns to highlight are the following:
- BackupType: From the type column, tells us what the type of backup performed.
- database_backup_lsn: The Log Sequence Number of the most recent full database backup.
- first_lsn: The Log Sequence Number of the first (or oldest) log record in the backup set (logical string of log records).
- last_lsn: The Log Sequence Number of the next log record after the backup set.
Thus, what we see here is an intact log chain. All the transaction logs “belong” to the most recent full backup, and last_lsn of each log backup is the first_lsn of the next transaction log backup in the backups set.
Backups
At this point we have our foundation set, and while both programs still continue to run, let’s start taking FULL backups and see how that effects the log chain. Again, we are still running both apps so that data is still being inserted into the Friends table and log backups are still being taken.
Copy Only
We’ll start with Copy Only backups. The following T-SQL statement takes a Copy-only backup of the SampleDB database:
BACKUP DATABASE [SampleDB] TO DISK = N'D:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\Backup\CopyOnly.bak'
WITH COPY_ONLY, NOFORMAT, NOINIT, NAME = N'SampleDB-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO
The thing to focus on in the statement above is the WITH COPY_ONLY parameter, which is the same as selecting the Copy-only backup option in the Back up Database dialog.
What is a copy-only backup? It is a database backup that is completely independent of the current backup sequence. Thus, continuing the example, while transaction logs are still being taken, we take a copy-only backup of the SampleDB database. What we see is the following;
- All transaction log backups are still chained to the last FULL (non copy-only) backup. In the image below, that would be the FULL backup in row 1.
- The copy-only FULL backup in row 8 has its own first_lsn and last_lsn and as thus is NOT part of the chain.
- We modified the query from earlier and added the is_copy_only column and can see that it is indeed a copy-only backup.
- As subsequent transaction log backups are taken, the LSN chain continues, as you can see in the image below between rows 7 and 9. The last_lsn before the copy-only backup is the first_lsn immediately after the copy-only backup. Thus, the chain is still intact.
You might be asking why the copy-only FULL backup has the same database_backup_lsn as the transaction log backups if it is independent of the current backup sequence. Great question.
The answer is that the database backup is the LSN you need to go forward from the log backups to restore from that FULL. Thus, the database_backup_lsn does not change on the copy_only but does change on the non copy-only.
Append to existing backup set
Next we’ll take normal FULL backups and see how that affects the log chain. While the two apps are still running, we’ll execute the following which appends to the existing media set. This is the default behavior of SQL Server anyway, we’re just not selecting the Copy-only backup option.
BACKUP DATABASE [SampleDB] TO DISK = N'D:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\Backup\AppendToSet.bak'
WITH NOFORMAT, NOINIT, NAME = N'SampleDB-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO
This T-SQL is the same as selecting the Append to the existing backup set option in the Back up Database dialog, which again is the default behavior.
Running the query that selects from the backupset table, we now see that our chain is jacked up, i.e., broken. First, we see that the FULL backup just taken is NOT a copy-only, and more importantly, we see that all subsequent transaction log backups are now part of a new chain starting with this most recent FULL backup.
It is important to note, though, that the last_lsn (row 12) before the last FULL backup (row 13) is the first_lsn (row 14) immediately after the FULL backup. That does not mean that our chain is intact. It is not intact because our chain has started over (new database_backup_lsn).
Overwrite existing backup set
Overwriting existing backup sets overwrites all the sets on the media. The only difference between the following T-SQL statement and the previous statement is the INIT vs. NOINIT parameter.
BACKUP DATABASE [SampleDB] TO DISK = N'D:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\Backup\OverwriteSet.bak'
WITH NOFORMAT, INIT, NAME = N'SampleDB-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO
Regardless, the outcome is the same; the Log Sequence chain is broken and a new chain begins.
Log Sequence Numbers and Log Chains: Consequences
The issue that we have highlighted here is that if you take a backup that is NOT a copy-only, you effectively break up the chain, and there are significant consequences with that. Breaking up the chain makes it impossible to restore any of the DIFF or LOG backups until there is a new FULL. You no longer have a point-in-time-restore, essentially nullifying your RPO (recovery point objective).
And, let’s say you decide to restore that FULL backup and then try to take additional DIFF and LOG backups. You get the following:
In this scenario, you need to take another FULL backup, then let the DIFF and LOG backups continue. It just makes things real messy.
The whole point of doing FULL, DIFF, and LOG backups is to be able to adhere to an established RPO and be able to do any point-in-time data recovery. Taking a non copy-only backup breaks the chain sequence and eliminates PIT recovery.
If you need to take a backup of a database, such as taking a copy of production and restoring it to your test environment, then grab the last FULL backup from your backup store location. But please, if you need to take an ad-hoc backup, use the COPY-ONLY option.