SQL Black Box Recorder

trace blackboxAcross many professions and industries there is often a need to have some sort of device that “audits” everything that happens with a device or process. We are probably all quite familiar with the infamous black box recorders used by the passenger airline and train industries. It is also quite possibly fairly common knowledge that many pace makers perform the same sort of functionality – on a smaller scale. Various metrics and data points are registered and recorded somewhere. Whether to a local storage device or whether the device phones home to some other remote location, it is recorded.

We often pontificate about the what-ifs for a black box recorder within SQL Server, right? We wish and dream and go about creating our own little recorder to suit the requirements we deem necessary for successful monitoring of the server for the just-in-case scenario. Well, the truth of the matter is that we really don’t need to go to such great lengths to create a “black box recorder” because Microsoft has already done all of that for us.

Wait, what? Yes, that’s right! Truth be told this shouldn’t be much news for most Senior Level data professionals working with SQL Server for the past few years. But if you are new to the product, this might be good news for you. Then again, it might just be bad news depending on your take.

Very much like what you may find with the airline industry, the black box recorder in SQL Server is not just a single method (device) implemented to capture all of the desired data points. On a passenger jet, you may find that there are three or more such devices that contribute to the capture and recording of the in-flight data. In SQL Server, there are three major processes that help capture our in-flight data. Over the next few articles I will discuss each of these processes. These processes include:

  • Default Trace
  • system_health Extended Event Session
  • sp_server_diagnostics procedure

This multi-tiered approach does provide a sort of fail-safe system. If one should be disabled, there still remain up to two more processes that may be running. That said, each of these can be disabled but it does take a bit of an overt effort. And since it does require somebody to put out effort to try and disable each of the black box components, you could potentially capture the culprit via audits from the other components or via an explicit audit that you create for these types of purposes.

Since this will be something to discuss over a few articles, I will break out each process into an individual article. For today, I will discuss the default trace.

Default Trace

The default trace by itself is something that can be turned off via configuration option. There may be good reason to disable the default trace. Before disabling the default trace, please consider the following that can be captured via the default trace. I will use a query to demonstrate the events and categories that are configured for capture in the default trace.

/* events */
SELECT te.name AS EventName
		,tca.name AS CategoryName
		, CASE tca.type WHEN '0' THEN 'Normal'
				WHEN '1' THEN 'Connection'
				WHEN '2' THEN 'ERROR' END AS CategoryType
		, t.PATH AS TracePath
		, oa.logical_operator,oa.comparison_operator, oa.VALUE AS FilteredValue
	FROM sys.traces t
		CROSS APPLY (SELECT DISTINCT gei.eventid FROM sys.FN_TRACE_GETEVENTINFO(t.id) gei) ca
		INNER JOIN sys.trace_events te
			ON te.trace_event_id = ca.eventid
		INNER JOIN sys.trace_categories tca
			ON te.category_id = tca.category_id
		OUTER APPLY (SELECT gfi.columnid,gfi.logical_operator,gfi.comparison_operator,gfi.VALUE FROM sys.FN_TRACE_GETFILTERINFO(t.id) gfi) oa
	WHERE t.is_default = 1

In this query, I have requested a few more data points than necessary to illustrate the point. That is to help illustrate an additional point that the default trace isn’t filtering out any data for these events. If the event fires (in this trace), it is recorded. Let’s divert right back to the events and categories for now. The results of that query will produce the following sample list of events for me on SQL Server 2014:

That is quite a range of events covered by this particular recording device. From changes to objects down to security related events and even errors and warnings. If somebody drops an object, the default trace can catch it. If one of the various DBCC statements is executed, it will be trapped in this trace.

There is one thing that is captured by this trace that is not overly obvious. In fact, it is rather annoying in how it is recorded in my opinion. Server configuration settings such as “cost threshold of parallelism” or the lesser utilized “user options” are not registered as an “Object:Alter” event but rather as an ErrorLog event. I wouldn’t call this type of “change” an error and don’t necessarily like seeing these registered with other legitimate errors. That said, it would be nice to see these logged differently (they are currently logged the same way with Extended Events). So, if somebody is dorking around with server configurations, it becomes a bit more convoluted to figure it out, but we can get there. Let’s see how that works:

-- server configs Audited via Def Trace
SELECT 	T.StartTime
		, T.SPID
		, T.LoginName
		, T.HostName
		, T.ApplicationName
		, T.DatabaseName
		--, ObjectName,sv.number AS ObjTypeVal, sv.name [ObjectType]
		--, T.TextData
		, ConfigOption = SUBSTRING(T.TextData,CHARINDEX('''',T.TextData)+1,CHARINDEX(' changed from ',T.TextData)-CHARINDEX('''',T.TextData)-2)
		, PrevValue = SUBSTRING(T.TextData,CHARINDEX('from ',T.TextData)+5,CHARINDEX('to ',T.TextData)-CHARINDEX('from ',T.TextData)-5)
		, NewValue = SUBSTRING(T.TextData,CHARINDEX('to ',T.TextData)+3,CHARINDEX('. Run',T.TextData)-CHARINDEX('to ',T.TextData)-3)
		, EventName = te.name
		, T.EventClass
	FROM sys.FN_TRACE_GETTABLE(CONVERT(VARCHAR(150), 
			( SELECT REVERSE(SUBSTRING(REVERSE(PATH),
					CHARINDEX('\',REVERSE(path)),256)) + 'LOG.trc'
				FROM    sys.traces
				WHERE   is_default = 1)), DEFAULT) AS T  
		INNER JOIN sys.trace_events AS te
			ON T.EventClass = te.trace_event_id
	WHERE T.EventClass = 22
		AND T.TextData LIKE '%config%'
	ORDER BY T.StartTime DESC;

Looking at my system, I get the following sample results:

You can see there are plenty of settings that I have been playing with captured by the default trace. If you happen to have a junior DBA or maybe a cowboy DBA that loves to shoot from the hip and make changes, this will help you find all the necessary details to bring to that DBA for discussion.

The default trace comprises one third of the black box recorder. I have shown a quick use for the default trace and have also shown some of the various events that are captured from this trace. I recommend getting to know your default trace just a little bit better. You never know when you may need to resort to the flight recorder data held in the default trace. Being familiar with the default trace before you need to use it will help improve your comfort level when under the stress of trying to figure out what happened just before the server went belly up!

Please share this

Leave a Reply

Related Articles