T-SQL Anti-Patterns: SQL User Defined Functions (UDFs) that turn your set operation into a cursor

One of the things we see as consultants are consistent patterns of behavior by T-SQL developers that cause big performance problems. I call these anti-patterns. Today I want to specifically address SQL User Defined Functions (UDFs).

SQL Server 2000 introduced UDFs. While they can be useful in certain situations, an all too prevalent anti-pattern has emerged.

SQL User Defined Functions: Identifying The Problem

To illustrate this, let’s take a simple example from the AdventureWorks  sample database. If you don’t have this database, you can download and install it. This is a good thing to have on your personal development or lab SQL Server instance, as many blogs, books, and articles use it for examples.

In our example, we are going to generate a simple report for the AdventureWorks Human Resources department using the following tables.

SQL user defined functions

Our report will consist of the employee’s name, the department they work in, their job title, and start date. We have a mock-up of the report that the end user would like to see:

image

Taking a look at the schema above, we realize that the EmployeeDepartmentHistory table has most of this information. So we start there.

SELECT BusinessEntityID AS EmployeeID,
       DepartmentID,
       StartDate
FROM HumanResources.EmployeeDepartmentHistory
image

This gets us the EmployeeID and DepartmentID, but not the actual names. As developers, we are big into encapsulating code as much as possible to avoid repetitive code and the manageability issues that come with it. We know that UDFs in SQL Server can help with this so we create the following function to get the department name:

CREATE FUNCTION dbo.GetDepartment
(
    @DepartmentID INT
)
RETURNS VARCHAR(40)
AS
BEGIN
    DECLARE @ret VARCHAR(40)
    SET @ret =
    (
        SELECT [Name]
        FROM HumanResources.Department
        WHERE DepartmentID = @DepartmentID
    )
    RETURN @ret
END

Now we can execute the same query but call the new function to return the department name.

SELECT
     BusinessEntityID AS EmployeeID,
     dbo. GetDepartment(DepartmentID) AS DepartmentName,
     StartDate
 FROM HumanResources. EmployeeDepartmentHistory
SQL user defined functions

That worked great so we do the same thing for EmployeeName and JobTitle.

CREATE FUNCTION dbo. GetEmployeeName(@EmployeeID int)
RETURNS VARCHAR(40)
AS
BEGIN
  DECLARE @ret VARCHAR(40), @ContactID INT 

  --get contactid
  SELECT @ContactID = BusinessEntityID FROM HumanResources. Employee WHERE BusinessEntityID = @EmployeeID

  --get name
  SET @ret = (	SELECT coalesce(FirstName,'') + ' ' + coalesce(MiddleName,'') + ' ' + coalesce(LastName,'')
				FROM Person. Person
				WHERE BusinessEntityID = @ContactID
			  )

  RETURN @ret
END
CREATE FUNCTION dbo.GetJobTitle
(
    @EmployeeID INT
)
RETURNS VARCHAR(40)
AS
BEGIN
    DECLARE @ret VARCHAR(40)
    SET @ret =
    (
        SELECT [JobTitle]
        FROM HumanResources.Employee
        WHERE BusinessEntityID = @EmployeeID
    )
    RETURN @ret
END

Executing the query again using the three functions we’ve created, we get the data we are looking for. We have encapsulated the code into function for re-use in other places, and we avoided those pesky joins. The resulting query is clean and simple. Best of all, this is a single query so we’re doing a set operation like our DBA has told us we should do.

SELECT
	dbo. GetEmployeeName(BusinessEntityID) as EmployeeName,
	dbo. GetDepartment(DepartmentID) AS DepartmentName,
	dbo. GetJobTitle(BusinessEntityID),
	StartDate
FROM HumanResources. EmployeeDepartmentHistory
SQL user defined functions

So what’s the problem?

The problem is that we have unwittingly turned our single query into a loop by introducing these functions. Because the functions are non-deterministic, they have to be called for each row in our result set.

To see this, let’s fire up our old friend SQL Server Profiler, which shows us the actual executing statements and the accompanying resources used. We’ll use the following trace definition to keep things as simple and clean as possible:

image

Let’s execute our query and see what we get.

SELECT
	dbo. GetEmployeeName(BusinessEntityID) as EmployeeName,
	dbo. GetDepartment(DepartmentID) AS DepartmentName,
	dbo. GetJobTitle(BusinessEntityID),
	StartDate
FROM HumanResources. EmployeeDepartmentHistory
image

Wait a minute. Our single-statement query just generated 2076 individual SQL Statements, generating a total of 2,672 logical reads. If we look through the profiler results carefully, we can see that each of the three functions was called once for each of the 296 rows our query returned. So what looks like a simple set operation is, in fact, a loop.

We go to our DBA for help and she explains that we need to get rid of the functions and use joins instead to make it into a true set operation. So we refactor the query as follows:

SELECT
	COALESCE(c. FirstName,'') + ' ' + coalesce(c. MiddleName,'') + ' ' + coalesce(c. LastName,'') AS EmployeeName,
	d. name AS DepartmentName,
	e. JobTitle,
	StartDate
FROM HumanResources. EmployeeDepartmentHistory eh
	INNER JOIN HumanResources. Department d ON eh. DepartmentID = d. DepartmentID
	INNER JOIN HumanResources. Employee e ON e. BusinessEntityID = eh. BusinessEntityID
	INNER JOIN Person. Person c ON c. BusinessEntityID = e. BusinessEntityID
image

Now if we execute the query and look at the profiler trace, we get just a single statement for a total of 1032 logical reads. Our query has become much more complicated and we don’t have the benefit of code encapsulation, but we’ve got the logical reads to less than half. And that was for only 296 rows.

SQL user defined functions

Conclusion

This is a simple example, but hopefully you can see the impact in a busy system with lots more data. I have seen this in production environments where something like GetCustomerName() is being called tens of millions of times daily.  This is a great example of an anti-pattern, where SQL user defined functions look like a great idea but the performance impact can be devastating.

Please share this

Leave a Reply

Related Articles