sp_server_diagnostics – Black Box Recorder

A few months back I began a series about the SQL Black Box Recorder, that you can read here, with a promise of at least two more posts. sp_server_diagnosticsThe first article in the series laid the groundwork about the black box recorder in SQL Server. Furthermore, there was a brief introduction into the three components of the black box recorder. The three components as laid out in that article are: Default Trace, system_health Extended Event Session, and the sp_server_diagnostics procedure.

The first article went into deeper detail about the first leg of this three-legged black box recorder—the default trace. In the second article, the focus was directed to the next leg of the black box recorder—or the system_health Extended Event Session. If by some measure you are in the dark about what Extended Events is, then I recommend you read my personal blog series on the subject. There are numerous articles stepping through the subject in easy-to-digest fashion. You can find many of the articles in the following table of contents—here.

The focus of this article will take on the topic of the third piece of the black box recorder, sp_server_diagnostics. I will show how this procedure is critical in gathering pertinent information that is crucial to gathering an overall picture of the system—from a default black box recorder perspective.

SQL Black Box Recorder

What is the sp_server_diagnostics process?

Beyond being a component of the black box for SQL Server, what exactly is this diagnostics process? The sp_server_diagnostics is much as the name implies—it is a “diagnostics” service that attempts to gather information about various events that may affect the overall health of the SQL Server instance.

The diagnostics process will trap various server related health (diagnostics) information related to the SQL Server instance in an effort to try and detect potential failures and errors. This diagnostics session/process traps information for five different categories by default. There is a sixth category of information for those special servers that happen to be running an Availability Group.

The Six

  • system: overall health of the server including information about spinlocks, severe processing conditions, non-yielding tasks, page faults, and CPU usage.
  • resource: health of resources, specifically including physical and virtual memory, buffer pools, pages, cache and other memory objects
  • query_processing: captures query processing data about worker threads, tasks, wait types, CPU intensive sessions, and blocking tasks.
  • io_subsystem: health related data to IO.
  • events: errors and events of interest recorded by the server, including details about ring buffer exceptions, ring buffer events about memory broker, out of memory, scheduler monitor, buffer pool, spinlocks, security, and connectivity.
  • AvailabilityGroup: basic information about the AvailabilityGroup if there is an AvailabilityGroup enabled on the server.

This is fantastic introductory information. As luck would have it, you can also read more about it from my blog – here.

How to use this process?

What good is any level of information if we don’t quite know how to use or consume that data? Do we consume it via trace? Do we execute the stored procedure? Do we capture the information from some other means? The answer is yes! Wait, yes to what? Yes to all of it! Let’s explore how to consume this data.

Option 1

The first step to learning how to consume this data is to start at probably the most basic of levels. As you have already noted, this is a stored procedure. It was probably the sp_ in the name that gave it away, right? Since it comes in the form of a stored procedure, it makes it pretty easy to figure out the first means to get the data from it—just execute it. Actually, let me clarify that just a bit. sp_server_diagnostics is a CLR stored procedure. Nonetheless, all that needs be done is execute it at this level to get to the data.

EXECUTE sp_server_diagnostics;

Look at that! Pretty slick! I execute the stored procedure and I can see a point in time result of the “health check” from this component of the black box recorder. Notice here that I only have five of the six categories that I had mentioned. This is due to the lack of a configured availability group (AlwaysOn) from this particular instance. Another point of interest is that the data column is not in a very friendly format for DBA consumption. This is flat text that just goes on and on. It is painful to consume in this manner.

Tidy that Code

CREATE TABLE #ServerDiagnosticsResult
	(
		create_time DATETIME
		, component_type sysname
		, component_name sysname
		, STATE INT
		, state_desc sysname
		, DATA XML
	);  
INSERT INTO #ServerDiagnosticsResult
		EXECUTE sp_server_diagnostics;  
 
SELECT *
	FROM #ServerDiagnosticsResult;
 
DROP TABLE #ServerDiagnosticsResult;
CREATE TABLE #ServerDiagnosticsResult
	(
		create_time DATETIME
		, component_type sysname
		, component_name sysname
		, STATE INT
		, state_desc sysname
		, DATA XML
	);  
INSERT INTO #ServerDiagnosticsResult
		EXECUTE sp_server_diagnostics;  
 
SELECT *
	FROM #ServerDiagnosticsResult;
 
DROP TABLE #ServerDiagnosticsResult;

Looking at the code, you can see quickly the difference. I have changed the data column to be XML-based to better match the data being output from the stored procedure. Let’s see that in action:

That is much better. Now I can simply click one of the data results and see a nicely formatted easy to read XML output of that component. Here is an example from the query_processing component:

Option 2

Cool! I have just shown one means of exploring this facet of the black box recorder. Now, let’s take a look at other means to get at this particular data, beyond just executing the stored procedure. Did you know that this procedure is very closely related to Extended Events? (I am sure that you picked up on that nugget due to reading the previous article in the series or possibly the other article I mentioned here). With such a close relationship to Extended Events, it makes sense that sp_server_diagnostics also has a couple of events and is consumed by multiple sessions.

SQL Server comes with various default Extended Event sessions, of those sp_server_diagnostics is consumed by the system_health session as well as the hidden session for SQLDiag that is used by Windows Clustering and Availability Groups to determine server health. You already heard about the system_health (from the previous article), but few have heard about the hidden session. I will take a look at the events as well as the sessions that consume the sp_server_diagnostics data.

Events

First, I will share the details for each of the events related to the sp_server_diagnostics process.

As I have shown here, you can see that these events are a part of the debug channel. In the next couple of images, I want to show some of the payload attached to these events.

These events show what I have shown previously with regards to output of the stored procedure. There is a consistency in the output of the diagnostics process. What I have yet to show in this data is one more piece of information that could be helpful down the road and that pertains specifically to the extended events data. That piece of information I need to share is the keyword associated with each of these events. When looking for related events, I need you to understand that the sp_server_diagnostics events are a part of the scheduling classification. There, I have said it. I just gave you a big clue to help you in your troubleshooting endeavors and the use of the diagnostics processes/events.

Which Sessions then?

Now that I have shown which events are related to diagnostics, I can finally show which sessions are dependent upon the diagnostics data. To show that kind of information, I have this next query to help:

/* session that is running */
SELECT xs.name AS SessionName
		, xse.event_name
	FROM sys.dm_xe_session_events xse
		INNER JOIN sys.dm_xe_sessions xs
			ON xse.event_session_address = xs.address
	WHERE xse.event_name IN ( 'sp_server_diagnostics_result_set',
								'sp_server_diagnostics_component_result' );
 
/* sessions on the server but may be stopped */
SELECT ses.name AS SessionName
		, sese.name AS event_name
	FROM sys.server_event_session_events sese
		INNER JOIN sys.server_event_sessions ses
			ON ses.event_session_id = sese.event_session_id
	WHERE sese.name IN ( 'sp_server_diagnostics_result_set',
							'sp_server_diagnostics_component_result' );

When I execute the statements in the preceding query, I will receive a single result—system_health. That said, do not believe everything I just showed you with that query. SQL Server does a decent job at hiding some of the details. When I run the next query, I will see a little bit more information.

SELECT *
	FROM sys.dm_xe_sessions;

When I run the preceding query, I will see the following two sessions (given that I have not change the default settings):

Take note of the names of those two sessions. I showed that the system_health session is reliant upon the diagnostics events. Now, I am showing that there is another event session called “sp_server_diagnostics session“. Wait, there is more! SQL Server continues to hide stuff from us.

Hidden Sessions

When you have AlwaysOn configured, there is an additional hidden session that is not visible within the sessions lists inside of SQL Server. To view this hidden session, you must follow the directions shown in this article on msdn. Once you have navigated to the folder containing the hidden XEL files, you can open them in Management Studio and view the data. It is while viewing the data that you will be able to see the relationship between sp_server_diagnostics and this newly discovered trace file.

The payload from this image demonstrates the relationship between sp_server_diagnostics and the hidden trace file. I challenge you to take some time to explore this hidden trace file on one of your servers where AlwaysOn is configured. I know you will be thankful for the investment at some point down the road.

Conclusion

I have just shown a ton of info about the third component of the black box recorder—sp_server_diagnostics. This procedure may be lesser known than the first two components, but it is highly integral into the overall monitoring and health of your SQL Server instance. In this article, I have not only shown how to use sp_server_diagnostics (albeit quickly), but I have also shown that the diagnostics are heavily relied upon by other components of the black box recorder. In addition, I have shown how this component is integrated into other hidden traces that are essential in the monitoring and troubleshooting of your instance.

The black box recorder is real. You need to invest time in getting to know the black box recorder components and how they can be of help to you in becoming a superhero DBA.

Please share this

Leave a Reply

Related Articles

A blue and white background with white text Description automatically generated

Exploring SSMS 21

Microsoft recently announced SQL Server Management Studio (SSMS 21). Here’s a quick look at what’s included in this latest iteration.

Read More »