Our Health Check is a great starting point with our clients. It tells us a lot about their instances, such as glaring problems we have to fix immediately. It also reveals issues that may not be a problem now, but they could blow up soon. We always point out these ticking timebombs to our customers because they have the potential to create problems down the road. These tend to be issues that are not affecting performance or security, etc.
In this post, I’ll walk you through a real-world situation tied to SQL Server object ownership where a timebomb went off. The two best practices involved here are:
- Don’t put user objects in system databases.
- Ensure the default built-in logins sa or dbo. own all objects, databases, jobs, etc.
Patching Time Bomb
Recently while installing a SQL Server service pack on one of our customer’s servers (fortunately a non-production instance), the SQL Server service would not start after reboot. I checked the usual suspects (permissions, service account issues) and checked if the Windows Event Log could give me a clue. There was nothing except the generic error in the System log that the service could not start.
So it was time to dig into the SQL Server Error Log. Even though the service would not start, the log is just a text file and does not need to be accessed through the server. The default location of the log is C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\Log, replacing the version and instance folders as needed.
Note that there is a file for each error log as they are cycled and retained. So the log from the most recent restart (or restart attempt, in this case) is named ERRORLOG, with the archived logs named ERRORLOG.# for each of the archived logs.
Looking at the log in this particular case, I noted the following errors.
Anytime you see something that says “restore the master database from backup” it’s not a good thing. However, looking a bit further back we see that the initial error indicates there is already an object named DatabaseMailUserRole in the database. That is followed by a CREATE SCHEMA failure, all of which took place during the msdb upgrade script. So the next step is to dig into the msdb database and find out what’s going on.
But if the service won’t start, how do I get into the database and figure out what’s going on with those objects?
Enter Trace Flag 902
We use Trace Flag 902 for this exact purpose. When an upgrade goes wrong, every time you start the service it will attempt to run the upgrade script. Doing so generates the same failure. This is because the instance is in “Script Upgrade Mode.” Trace Flag 902 tells SQL Server to startup and recover databases without running any pending upgrade scripts. That way you can get in, find and fix the problem, then restart the server again without the trace flag.
The easiest way to start the service with a temporary trace flag is from the command line.
After executing that command, the service started and all of the databases came up normally.
Looking into the actual issue, I found that the script deleted DatabaseMailUserRole as it was not there. Digging a little deeper, it turns out there was a second database role owned by that role, which in turned owned the schema in question. So when the upgrade script tried to upgrade that schema, it failed because it was effectively orphaned. Modifying the ownership of the schema to dbo (as it should be) and restarting the instance without the trace flag fixed the problem and the upgrade completed. I ran the Service Pack installer a second time to be sure it got everything. Sure enough it said “Incomplete” for the database engine. This time the install completed and rebooted with no issues.
So we see an issue with two Best Practices here:
- Creation of user objects (in this case a role) in the msdb database.
- Changing the SQL Server object ownership of a schema to use that role.
I don’t know how long it had been this way nor how it got that way. But it’s a great example of something that is just sitting there, seemingly benign, until the ticking stops and the bomb goes off.