Much has been written about the performance implications of the dreaded RBAR, a term coined by Jeff Moden many years ago. I won’t rehash any of that here other than to say that in almost all cases, set operations are faster and more efficient than RBAR. So we spend a lot of time teaching (or preaching) the evil of RBAR and to do things in sets.
However, the flip side of this is that a write operation (insert/update/delete) is logging many rows in a single transaction, often resulting in bloated transaction logs or even full disks resulting in outages. In fact, in my experience this is the second-most common cause of transaction logs filling up (the first being FULL recovery model without log backups).
Transaction Log 101
Let’s review how the transaction log works. All write operations are transactions. If an explicit transaction is not begun, the operation will be an implicit transaction. The transaction log contains a log record which is created for each row of data being modified.
This can be seen using the undocumented system function fn_dblog(). There is all kinds of great information returned by this function but I am just including a few columns to show the impact on log size. This database is in SIMPLE recovery model which means the log is cleared of completed transactions every time a CHECKPOINT runs. So we will use the CHECKPOINT command to limit the log records to just what we care about.
The following query gets the log records we care about (omitting the checkpoint records) and total log record length for the operation.
--view log records SELECT [CURRENT lsn],[OPERATION],[LOG record length] FROM fn_dblog (null, null) WHERE OPERATION not like '%ckpt%' UNION all SELECT '','LogBytesUsed',SUM([LOG record length]) FROM fn_dblog(null,null) WHERE OPERATION not like '%ckpt%' go
Let’s create a simple table and insert five rows.
CREATE TABLE foo (c1 INT, c2 INT) GO CHECKPOINT GO INSERT foo VALUES(1,2) GO 5
Running our log query, what we see is a transaction for each insert which consists of three log entries. The start of the transaction, the insert, and the end of the transaction. This pattern repeats five times, once for each insert. The total log used is 1580 bytes which amounts to 316 bytes per transaction.
Now let’s do a set-based update.
CHECKPOINT UPDATE foo SET c1= 0
This time we only have one transaction, with a log record for each row. The log used is 728 bytes.
Finally a delete, which looks very similar to update, for a total of 780 bytes.
So what do the numbers tell us? Using this extremely small data set we can see the following:
While RBAR used over twice as much log as the set based operations, the log used per transaction was less than half. Translate that to a recordset with millions of rows and you can really see the impact. The customer situation that prompted this post involved a delete statement that removed more than 400 million rows from a table for a total of almost 160GB of transaction log. This with a log drive of 100GB and log size of 50GB. You can guess what happened.
So we are back to RBAR?
No. There is a middle ground. In situations like this, the thing to do is to do break the operation into batches. We still get the benefits of large sets, but of a reasonable size. In the example above, the 464 million rows used 200GB of log. So we can estimate that 10 million rows would use roughly 4.5GB of log. Running the delete 10 million rows at a time in a loop would have completely avoided the problem by giving the CHECKPOINT process and/or log backups to clear the log of the completed transactions.
There are many ways to accomplish but the algorithm is the same no matter how you do it. The below is just one simple way to accomplish the delete referred to above in batches. We use SET ROWCOUNT to say only operate on 1 million rows maximum, then loop and do it again until there is nothing left to do. Note that SET ROWCOUNT has some caveats so do your research before using this in other ways.
WHILE 1=1 BEGIN SET rowcount 1000000 DELETE FROM dbo.bigtable WHERE MyDate <= '1/1/2019' IF @@ROWCOUNT = 0 BREAK END
In conclusion, while doing things in set operations is better than one row at time (RBAR), it is certainly possible to have too much of a good thing. Always keep the transaction log in mind when working with large datasets to avoid those angry calls from your friendly neighborhood sysadmin.