Rethinking RBAR

Much has been written about the performance implications of the dreaded RBAR ("Row By Agonizing Row"), a term coined by Jeff Moden many years ago. I won't rehash any of that here other than to say that in almost all cases, set operations are faster and more efficient than RBAR. So we spend a lot of time teaching (or preaching) the evils of RBAR and urging people to do things in sets.

However, the flip side is that a set-based write operation (insert/update/delete) logs every row it touches in a single transaction, often resulting in a bloated transaction log or even a full disk and an outage. In fact, in my experience this is the second-most common cause of transaction logs filling up (the first being the FULL recovery model without log backups).

Transaction Log 101

Let's review how the transaction log works. All write operations are transactions; if an explicit transaction is not started, the statement runs as its own autocommit transaction. A log record is written to the transaction log for each row of data being modified.
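
For example (using the foo table created a little later in this post), the first insert below runs as its own autocommit transaction, while the explicit transaction that follows groups two inserts under a single begin/commit pair in the log:

--standalone statement: runs as its own autocommit transaction
INSERT foo VALUES(1,2)
GO
--explicit transaction: both inserts share one begin/commit pair in the log
BEGIN TRANSACTION
INSERT foo VALUES(1,2)
INSERT foo VALUES(3,4)
COMMIT TRANSACTION
GO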

We can see this using the undocumented system function fn_dblog(). This function returns all kinds of great information, but I am only including a few columns to show the impact on log size. This database is in the SIMPLE recovery model, which means the log is cleared of completed transactions every time a CHECKPOINT runs, so we will use the CHECKPOINT command to limit the log records to just the ones we care about.

The following query returns the log records we care about (omitting the checkpoint records themselves) along with the total log record length for the operation.

--view log records
SELECT [Current LSN], [Operation], [Log Record Length]
FROM fn_dblog(NULL, NULL)
WHERE [Operation] NOT LIKE '%ckpt%'
UNION ALL
SELECT '', 'LogBytesUsed', SUM([Log Record Length])
FROM fn_dblog(NULL, NULL)
WHERE [Operation] NOT LIKE '%ckpt%'
GO


Let’s create a simple table and insert five rows.

CREATE TABLE foo (c1 INT, c2 INT)
GO
CHECKPOINT 
GO
INSERT foo VALUES(1,2)
GO 5


Running our log query, we see a separate transaction for each insert, each consisting of three log records: the start of the transaction, the insert itself, and the end of the transaction. This pattern repeats five times, once for each insert. The total log used is 1,580 bytes, which amounts to 316 bytes per transaction.
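
Roughly, the output has this shape (LSNs and individual record lengths omitted here; LOP_BEGIN_XACT, LOP_INSERT_ROWS, and LOP_COMMIT_XACT are the operation names fn_dblog reports for these steps):

--Current LSN   Operation         Log Record Length
--...           LOP_BEGIN_XACT    ...
--...           LOP_INSERT_ROWS   ...
--...           LOP_COMMIT_XACT   ...
--(the three records above repeat five times, once per insert)
--              LogBytesUsed      1580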


Now let’s do a set-based update.

CHECKPOINT
GO
UPDATE foo SET c1 = 0
GO


This time there is only one transaction, with a log record for each of the five rows. The total log used is 728 bytes.


Finally, a delete, which looks very similar to the update, uses a total of 780 bytes of log.
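
The delete statement itself isn't shown above; a minimal sketch, mirroring the update example, would be:

CHECKPOINT
GO
DELETE FROM foo
GO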

So what do the numbers tell us? Using this extremely small data set we can see the following:

- Five RBAR inserts: 1,580 bytes of log in total, 316 bytes per transaction
- Set-based update: 728 bytes of log in one transaction
- Set-based delete: 780 bytes of log in one transaction

While RBAR used over twice as much log as the set-based operations, the log used per transaction was less than half. Translate that to a result set with millions of rows and you can really see the impact. The customer situation that prompted this post involved a delete statement that removed more than 400 million rows from a table, generating almost 160GB of transaction log. This was with a 100GB log drive and a 50GB log file. You can guess what happened.

So are we back to RBAR?

No. There is a middle ground. In situations like this, the thing to do is to break the operation into batches. We still get the benefit of working in sets, just sets of a reasonable size. In the example above, roughly 464 million rows generated about 200GB of log, so we can estimate that 10 million rows would use roughly 4.5GB of log. Running the delete 10 million rows at a time in a loop would have completely avoided the problem by giving the CHECKPOINT process and/or log backups a chance to clear the completed transactions out of the log.

Batching

There are many ways to accomplish this, but the algorithm is the same no matter how you do it. Below is just one simple way to run the delete referred to above in batches. We use SET ROWCOUNT to limit each delete to a maximum of 1 million rows, then loop and do it again until there is nothing left to delete. Note that SET ROWCOUNT has some caveats (its effect on INSERT/UPDATE/DELETE is deprecated), so do your research before using this in other ways; an alternative using DELETE TOP is sketched after the loop.

WHILE 1 = 1
BEGIN
	--limit the next delete to at most 1 million rows
	SET ROWCOUNT 1000000
	DELETE FROM dbo.bigtable WHERE MyDate <= '1/1/2019'
	--stop once a pass deletes nothing
	IF @@ROWCOUNT = 0
		BREAK
END
--reset ROWCOUNT so later statements in this session are not limited
SET ROWCOUNT 0
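
As one alternative that avoids SET ROWCOUNT altogether, here is a minimal sketch of the same batching pattern using DELETE TOP against the same dbo.bigtable:

WHILE 1 = 1
BEGIN
	--remove at most 1 million qualifying rows per pass
	DELETE TOP (1000000) FROM dbo.bigtable WHERE MyDate <= '1/1/2019'
	IF @@ROWCOUNT = 0
		BREAK
END

Either way, each batch commits as its own transaction, giving CHECKPOINT (or log backups) the chance to clear the log between batches.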


In conclusion, while doing things in sets is better than doing them one row at a time (RBAR), it is certainly possible to have too much of a good thing. Always keep the transaction log in mind when working with large data sets to avoid those angry calls from your friendly neighborhood sysadmin.


This Post Has One Comment

  1. Jeff Moden

    Hi, Randy,

    Do you remember how many rows the source table had prior to doing the deletions and either what the size of the Clustered Index was or what the average row length was? I’d like to do a little testing with an alternate method. It might not amount to a thing but I won’t know until I test it.

    Thanks, and thanks for the article. The “middle ground” (as you described it) is a tried and true method of doing such large deletes.

