A few months back I began a series, about the SQL Black Box Recorder that you can read here, with a promise of at least two more posts. The first article in the series laid the groundwork about the black box recorder in SQL Server. Furthermore, there was a brief introduction into the three components of the black box recorder. The three components as laid out in that article are:
- Default Trace
- system_health Extended Event Session
- sp_server_diagnostics procedure
The first article went into deeper detail about the first leg of this three-legged black box recorder – the default trace. In this article, the focus will be directed to the next leg of the black box recorder – or the system_health Extended Event Session. If by some measure you are in the dark about what Extended Events is, then I recommend you read my personal blog series on the subject. There are numerous articles stepping through the subject in easy-to-digest fashion. You can find many of the articles in the following table of contents – here.
The focus of this article will not be to introduce the topic of Extended Events. Rather, it will be to introduce the system_health session and dive into it a fair bit.
SQL Black Box Recorder
What is the system_health session?
Beyond being a component of the black box for SQL Server, what exactly is this event session? The system_health is much as the name implies – it is a “trace” that attempts to gather information about various events that may affect the overall health of the SQL Server instance.
The event session will trap various events related to deadlocks, waits, clr, memory, schedulers, and reported errors. To get a better grasp of this, let’s take a look at the event session makeup based on the available metadata in the dmvs and catalog views.
DECLARE @SessionName VARCHAR(256) = 'system_health';
/* note: this only works if the session is running and the source is "server" not internal or something else.
*/
SELECT xs.name AS SessionName
, xs.session_source
, xse.event_name
, xo.description AS EventDesc
, xsea.action_name
, xoa.description AS ActionDesc
, xse.event_predicate
, sese.predicate, sese.predicate_xml
FROM sys.dm_xe_sessions xs
INNER JOIN sys.dm_xe_session_events xse
ON xs.address = xse.event_session_address
LEFT OUTER JOIN sys.dm_xe_session_event_actions xsea
ON xse.event_session_address = xsea.event_session_address
AND xse.event_name = xsea.event_name
INNER JOIN sys.dm_xe_objects xo
ON xse.event_name = xo.name
AND xse.event_package_guid = xo.package_guid
AND xo.object_type = 'event'
LEFT OUTER JOIN sys.dm_xe_objects xoa
ON xsea.action_name = xoa.name
AND xsea.action_package_guid = xoa.package_guid
AND xoa.object_type = 'action'
INNER JOIN sys.server_event_sessions ses
ON xs.name = ses.name
LEFT OUTER JOIN sys.server_event_session_events sese
ON ses.event_session_id = sese.event_session_id
AND xse.event_name = sese.name
WHERE xs.name = @SessionName
ORDER BY xs.name,xse.event_name, xsea.action_name;
In addition to the types of events I mentioned, there are also a few more interesting things to discover from the results of this particular query. For instance, the actions that are being employed help to gather even more information such as the callstack, tsql_callstack, database id, and sql text of the query that was executing at the moment of the event that is trapped.
One very peculiar event that is trapped is the sp_server_diagnostics_component_result event. Remember from earlier in this article that I noted the third leg of the black box record happens to be a very similarly named session – “sp_server_diagnostics procedure”. When I cover this third leg, it will make a little more sense. Understand, for now, that your suspicion of it being a bit more integral to the black box (than merely 1/3 of the overall black box) is confirmed.
This is a good set of information that can be trapped. Furthermore, this is a good set of information which can give a fairly good diagnosis of several different potential problems within your SQL Server Instance! (Yes it may seem repetitive, but I wanted to underscore the importance of the data that can be trapped.) That said, compare the types of events to those of the default trace. Take note of how many events actually overlap between the two components of the black box recorder.
For ease of comparison, let’s simplify that previous query to get a set of events to read without the duplication due to the actions on each event.
DECLARE @SessionName VARCHAR(256) = 'system_health';
SELECT xs.name AS SessionName, xse.event_name
FROM sys.dm_xe_sessions xs
INNER JOIN sys.dm_xe_session_events xse
ON xs.address = xse.event_session_address
WHERE xs.name = @SessionName
ORDER BY xse.event_name;
And since I am feeling particularly generous, I have lumped the events from the two traces together in the following query:
DECLARE @SessionName VARCHAR(256) = 'system_health';
SELECT xs.name AS SessionName
, xse.event_name AS EventName
, mv.map_value AS CategoryName
FROM sys.dm_xe_sessions xs
INNER JOIN sys.dm_xe_session_events xse
ON xs.address = xse.event_session_address
INNER JOIN sys.dm_xe_objects xo
ON xse.event_package_guid = xo.package_guid
AND xse.event_name = xo.name
AND xo.object_type = 'event'
INNER JOIN sys.dm_xe_object_columns oc
ON oc.OBJECT_NAME = xo.name
AND oc.object_package_guid = xo.package_guid
AND oc.name = 'KEYWORD'
INNER JOIN sys.dm_xe_map_values mv
ON oc.type_name = mv.name
AND oc.column_value = mv.map_key
AND oc.object_package_guid = mv.object_package_guid
WHERE xs.name = @SessionName
UNION ALL
/* events */
SELECT 'Default Trace' AS SessionName
, te.name AS EventName
, tca.name AS CategoryName
FROM sys.traces t
CROSS APPLY ( SELECT DISTINCT gei.eventid
FROM sys.FN_TRACE_GETEVENTINFO(t.id) gei
) ca
INNER JOIN sys.trace_events te
ON te.trace_event_id = ca.eventid
INNER JOIN sys.trace_categories tca
ON te.category_id = tca.category_id
WHERE t.is_default = 1
ORDER BY EventName;
That should help to compare the 17 events from the system_health session and the 34 from the default trace with a little more ease and speed. The events from the two sessions are rather complimentary to each as they help to construct the SQL Server Black Box recorder.
How to use this session
With the events trapped by this session, there are a few ways to use the trapped data that should stand out pretty easily. To me, the quick application of the data from this session can come from the review of deadlocks, reported errors, or even waits analysis. Using the deadlock as probably the most common problem people will want to troubleshoot, this information is suddenly much more valuable and easy to access (no more need for that trace flag). Since the system_health is now trapping the deadlock by default, one could quickly access the deadlock graph from the event data and proceed to troubleshooting. More information on getting that deadlock graph can be found here and here.
Learn to use the data made available through this member of the black box recorder. Becoming familiar with this tool (and additionally with Extended Events) will make you a rock-star in the office. The information needed to troubleshoot most problems is right at your fingertips through the use of the different pieces of the black box recorder. If it is not available there, then more than likely it can be made available through a bit of tinkering with Extended Events. Get to know the tools and be that rock-star DBA.