March 3, 2021

T-SQL Anti-Patterns: SQL User Defined Functions (UDFs) that turn your set operation into a cursor

Randy Knight

One of the things we see as consultants are consistent patterns of behavior by T-SQL developers that cause big performance problems. I call these anti-patterns. Today I want to specifically address SQL User Defined Functions (UDFs).

SQL Server 2000 introduced UDFs. While they can be useful in certain situations, an all too prevalent anti-pattern has emerged.

SQL User Defined Functions: Identifying The Problem

To illustrate this, let’s take a simple example from the AdventureWorks sample database. If you don’t have this database, you can download and install it. This is a good thing to have on your personal development or lab SQL Server instance, as many blogs, books, and articles use it for examples.

In our example, we are going to generate a simple report for the AdventureWorks Human Resources department using the following tables.

Our report will consist of the employee’s name, the department they work in, their job title, and start date. We have a mock-up of the report that the end user would like to see:

Taking a look at the schema above, we realize that the EmployeeDepartmentHistory table has most of this information. So we start there.

SELECT BusinessEntityID AS EmployeeID,
       DepartmentID,
       StartDate
FROM HumanResources.EmployeeDepartmentHistory

This gets us the EmployeeID and DepartmentID, but not the actual names. As developers, we are big into encapsulating code as much as possible to avoid repetitive code and the manageability issues that come with it. We know that UDFs in SQL Server can help with this so we create the following function to get the department name:

CREATE FUNCTION dbo.GetDepartment
(
    @DepartmentID INT
)
RETURNS VARCHAR(40)
AS
BEGIN
    DECLARE @ret VARCHAR(40)
    SET @ret =
    (
        SELECT [Name]
        FROM HumanResources.Department
        WHERE DepartmentID = @DepartmentID
    )
    RETURN @ret
END

Now we can execute the same query but call the new function to return the department name.

SELECT
     BusinessEntityID AS EmployeeID,
     dbo. GetDepartment(DepartmentID) AS DepartmentName,
     StartDate
 FROM HumanResources. EmployeeDepartmentHistory

That worked great so we do the same thing for EmployeeName and JobTitle.

CREATE FUNCTION dbo. GetEmployeeName(@EmployeeID int)
RETURNS VARCHAR(40)
AS
BEGIN
  DECLARE @ret VARCHAR(40), @ContactID INT 

  --get contactid
  SELECT @ContactID = BusinessEntityID FROM HumanResources. Employee WHERE BusinessEntityID = @EmployeeID

  --get name
  SET @ret = (	SELECT coalesce(FirstName,'') + ' ' + coalesce(MiddleName,'') + ' ' + coalesce(LastName,'')
				FROM Person. Person
				WHERE BusinessEntityID = @ContactID
			  )

  RETURN @ret
END

CREATE FUNCTION dbo.GetJobTitle
(
    @EmployeeID INT
)
RETURNS VARCHAR(40)
AS
BEGIN
    DECLARE @ret VARCHAR(40)
    SET @ret =
    (
        SELECT [JobTitle]
        FROM HumanResources.Employee
        WHERE BusinessEntityID = @EmployeeID
    )
    RETURN @ret
END

Executing the query again using the three functions we’ve created, we get the data we are looking for. We have encapsulated the code into function for re-use in other places, and we avoided those pesky joins. The resulting query is clean and simple. Best of all, this is a single query so we’re doing a set operation like our DBA has told us we should do.

SELECT
	dbo. GetEmployeeName(BusinessEntityID) as EmployeeName,
	dbo. GetDepartment(DepartmentID) AS DepartmentName,
	dbo. GetJobTitle(BusinessEntityID),
	StartDate
FROM HumanResources. EmployeeDepartmentHistory

So what’s the problem?

The problem is that we have unwittingly turned our single query into a loop by introducing these functions. Because the functions are non-deterministic, they have to be called for each row in our result set.

To see this, let’s fire up our old friend SQL Server Profiler, which shows us the actual executing statements and the accompanying resources used. We’ll use the following trace definition to keep things as simple and clean as possible:

Let’s execute our query and see what we get.

SELECT
	dbo. GetEmployeeName(BusinessEntityID) as EmployeeName,
	dbo. GetDepartment(DepartmentID) AS DepartmentName,
	dbo. GetJobTitle(BusinessEntityID),
	StartDate
FROM HumanResources. EmployeeDepartmentHistory

Wait a minute. Our single-statement query just generated 2076 individual SQL Statements, generating a total of 2,672 logical reads. If we look through the profiler results carefully, we can see that each of the three functions was called once for each of the 296 rows our query returned. So what looks like a simple set operation is, in fact, a loop.

We go to our DBA for help and she explains that we need to get rid of the functions and use joins instead to make it into a true set operation. So we refactor the query as follows:

SELECT
	COALESCE(c. FirstName,'') + ' ' + coalesce(c. MiddleName,'') + ' ' + coalesce(c. LastName,'') AS EmployeeName,
	d. name AS DepartmentName,
	e. JobTitle,
	StartDate
FROM HumanResources. EmployeeDepartmentHistory eh
	INNER JOIN HumanResources. Department d ON eh. DepartmentID = d. DepartmentID
	INNER JOIN HumanResources. Employee e ON e. BusinessEntityID = eh. BusinessEntityID
	INNER JOIN Person. Person c ON c. BusinessEntityID = e. BusinessEntityID

Now if we execute the query and look at the profiler trace, we get just a single statement for a total of 1032 logical reads. Our query has become much more complicated and we don’t have the benefit of code encapsulation, but we’ve got the logical reads to less than half. And that was for only 296 rows.

Conclusion

This is a simple example, but hopefully you can see the impact in a busy system with lots more data. I have seen this in production environments where something like GetCustomerName() is being called tens of millions of times daily. This is a great example of an anti-pattern, where SQL user defined functions look like a great idea but the performance impact can be devastating.

Please share this

Randy Knight

Randy Knight, a Microsoft Certified Master, founded SQL Solutions Group in October 2010, after more than 20 years in the technology industry. Randy has a passion for solving what can seem to others to be insurmountable problems. His prowess with SQL Server earned him the label “database whisperer” from one satisfied customer. Randy is a Microsoft Certified Solutions Master in SQL Server 2008. In 2020 he completed certification for Azure Fundamentals. In addition, he has numerous Novell, Microsoft, and Cisco legacy certifications. Prior to founding SSG, he spent many years as a Database Administrator in a variety of industries, including financial services, telecommunications, and with well-known dot com properties. As a data professional, Randy says he has done lots of fun stuff, encountering the most interesting data during his time as database architect for match.com (and no, he didn’t meet his wife while working there). In addition to getting elbow-deep in data, Randy enjoys taking part in and presenting at SQL Server events nationwide, including SQLSaturday, User Groups, SQL PASS Rally and SQL PASS Summit. In Randy’s words, “We've assembled a great team of SQL Server Consultants at SQL Solutions Group, and we have the talent to tackle anything."

T-SQL Anti-Patterns: SQL User Defined Functions (UDFs) that turn your set operation into a cursor

SQL User Defined Functions: Identifying The Problem

So what’s the problem?

Conclusion

Please share this

Tags

Randy Knight

Leave a Reply Cancel reply

Related Articles

Wrestling with Full-Text Search in SQL Server

An introduction to Memory-Optimized Tables in SQL Server

Understanding Wait Stats in SQL Server: A Practical Guide

Get Started

Subscribe to get the latest news from us