Introduction

Data Governance plays an important role in building a successful data-driven company because it helps business users trust the data. A typical Data Analytics project starts with identifying all relevant transactional sources of information, building a Data Warehouse according to business needs, and feeding (and converting) the data as often as possible. One can argue, however, that without operational reports alerting on a lag between the two systems, your Data Analytics function is doomed. No matter how great your Data Warehouse is, if users constantly complain about bad data, your Data Warehouse project's days are numbered.

I am assuming that you followed best practices and isolated your company's analytical server (OLAP) from your transactional server (OLTP), and that you set up ongoing replication in near real-time. As much as business users are happy to analyze historical data (great for trend analysis, which requires a large volume of data), they would also love to see real-time data (great for an e-commerce company launching new products). But how do you ensure your dependent tables are in sync?

Problem

How do you compare data between source and destination tables using a repeatable process? One of the first needs in a Data Governance report is comparing record counts to identify any missing records. Having code that can get a record count from any table anywhere (local or remote server) makes this easy.
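For context, the most direct (but expensive) way to compare counts is a pair of COUNT(*) queries. A minimal sketch, assuming a linked server named [REMOTESRV] and hypothetical database/table names:

```sql
-- Naive comparison: full COUNT(*) on both sides (can be slow on large tables).
-- [REMOTESRV], SourceDB, WarehouseDB and dbo.Orders are placeholder names.
SELECT
    (SELECT COUNT(*) FROM [REMOTESRV].SourceDB.dbo.Orders) AS SourceCount,
    (SELECT COUNT(*) FROM WarehouseDB.dbo.Orders)          AS DestinationCount;
```

The stored procedure built in the Solution section replaces these full scans with metadata lookups.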

Solution

We are going to build a reusable stored procedure for a Data Governance report that captures record counts from source and destination tables, so we can compare them and track trends over time.

While the first challenge is building in the flexibility to query a table or view record count on any server, the second challenge is avoiding a resource-hungry COUNT(*) that could lead to performance issues.

The following stored procedure does just that: it captures a record count virtually anywhere using sp_executesql with dynamic SQL, and it avoids performance problems by reading the statistical row count from sys.dm_db_partition_stats instead of scanning the table. (These counts are maintained by SQL Server as metadata and may be slightly off for tables with in-flight transactions, which is usually acceptable for a governance report.)

 

USE [yourdatabasename]
GO
/****** Object:  StoredProcedure [dbo].[usp_GET_RecordCountfromAnyTable]    Script Date: 4/23/2019 3:41:40 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author:      Steve Rezhener
-- Create date: 04/23/2019
-- Description: Gets a record count from any table anywhere (local/remote server)
-- Sample call: EXEC dbo.usp_GET_RecordCountfromAnyTable @TagName = 'AWS EC2', @ServerName = NULL, @DatabaseName = 'yourdatabase', @SchemaAndTableName = 'yourtablename'
-- =============================================
 
CREATE PROCEDURE [dbo].[usp_GET_RecordCountfromAnyTable]
@TagName AS VARCHAR(20)
,@ServerName AS VARCHAR(100)
,@DatabaseName AS VARCHAR(50)
,@SchemaAndTableName AS VARCHAR(100)
 
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
 
DECLARE @SQLString1 AS NVARCHAR(MAX)
DECLARE @SQLString2 AS NVARCHAR(MAX)
DECLARE @ObjectID AS BIGINT
DECLARE @ParmDefinition NVARCHAR(500);
DECLARE @SchemaName AS VARCHAR(50) = PARSENAME(@SchemaAndTableName,2)
DECLARE @TableName AS VARCHAR(50) = PARSENAME(@SchemaAndTableName,1)
 
 
	IF ISNULL(@ServerName,@@SERVERNAME) = @@SERVERNAME -- local
	BEGIN
 
		IF @ServerName IS NULL SET @ServerName = @@SERVERNAME
 
		SET @SQLString1 = 'SELECT @ObjectIDOUT = tabs.OBJECT_ID
		FROM ' + @DatabaseName + '.sys.tables AS tabs
		INNER JOIN ' + @DatabaseName + '.sys.schemas AS mas ON tabs.schema_id = mas.schema_id
		WHERE 1=1
		AND tabs.name = ''' + @TableName + '''
		AND mas.name = ''' + @SchemaName + '''
		'
 
		PRINT '@SQLString1: ' + @SQLString1
		SET @ParmDefinition = N'@ObjectIDOUT BIGINT OUTPUT';
		EXECUTE sp_executesql @SQLString1, @ParmDefinition, @ObjectIDOUT = @ObjectID OUTPUT
		PRINT '@ObjectID: ' + CONVERT(VARCHAR,@ObjectID)
 
		SET @SQLString2 = N'SELECT ''' + @TagName + ''' AS TagName, ''' + @ServerName + ''' AS ServerName, ''' + @DatabaseName + ''' AS DatabaseName, ''' + @SchemaAndTableName + ''' AS SchemaAndTableName, SUM(row_count) AS RecCount
		FROM ' + @DatabaseName + '.sys.dm_db_partition_stats
		WHERE object_id = ' + CONVERT(VARCHAR, @ObjectID) + '
		AND index_id IN (0, 1)' -- 0 = heap, 1 = clustered index; SUM covers partitioned tables
 
 
		PRINT '@SQLString2: ' + @SQLString2
		EXECUTE sp_executesql @SQLString2
 
	END
	ELSE -- remote server via linked server
	BEGIN
 
		SET @SQLString1 = 'SELECT @ObjectIDOUT = tabs.OBJECT_ID
		FROM [' + @ServerName + '].' + @DatabaseName + '.sys.tables AS tabs
		INNER JOIN [' + @ServerName + '].' + @DatabaseName + '.sys.schemas AS mas ON tabs.schema_id = mas.schema_id
		WHERE 1=1
		AND tabs.name = ''' + @TableName + '''
		AND mas.name = ''' + @SchemaName + '''
		'
 
		PRINT '@SQLString1: ' + @SQLString1
		SET @ParmDefinition = N'@ObjectIDOUT BIGINT OUTPUT';
		EXECUTE sp_executesql @SQLString1, @ParmDefinition, @ObjectIDOUT = @ObjectID OUTPUT
		PRINT '@ObjectID: ' + CONVERT(VARCHAR,@ObjectID)
 
 
		SET @SQLString2 = N'SELECT ''' + @TagName + ''' AS TagName, ''' + @ServerName + ''' AS ServerName, ''' + @DatabaseName + ''' AS DatabaseName, ''' + @SchemaAndTableName + ''' AS SchemaAndTableName, SUM(row_count) AS RecCount
		FROM [' + @ServerName + '].' + @DatabaseName + '.sys.dm_db_partition_stats
		WHERE object_id = ' + CONVERT(VARCHAR, @ObjectID) + '
		AND index_id IN (0, 1)' -- 0 = heap, 1 = clustered index; SUM covers partitioned tables
 
 
		PRINT '@SQLString2: ' + @SQLString2
		EXECUTE sp_executesql @SQLString2
 
	END
 
END
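To compare a source table and a destination table, call the procedure once per side. A minimal sketch, assuming a linked server named REMOTESRV and placeholder database/table names:

```sql
-- Local (destination) table: @ServerName = NULL means the current server.
EXEC dbo.usp_GET_RecordCountfromAnyTable
     @TagName = 'Orders-DW', @ServerName = NULL,
     @DatabaseName = 'WarehouseDB', @SchemaAndTableName = 'dbo.Orders';

-- Remote (source) table, reached via the linked server.
EXEC dbo.usp_GET_RecordCountfromAnyTable
     @TagName = 'Orders-OLTP', @ServerName = 'REMOTESRV',
     @DatabaseName = 'SourceDB', @SchemaAndTableName = 'dbo.Orders';
```

Each call returns one row (TagName, ServerName, DatabaseName, SchemaAndTableName, RecCount), so the two results can be compared side by side.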

Equipped with this stored procedure, it is just a matter of time before you can build a sharp, shiny operational report to share with your business users, as you can see in this image:

get a record count
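To keep track of trending, the procedure's result set can also be captured into a logging table using INSERT ... EXEC. A minimal sketch with a hypothetical table name:

```sql
-- Hypothetical logging table for trending record counts over time.
CREATE TABLE dbo.RecordCountLog (
    LogDate            DATETIME     NOT NULL DEFAULT GETDATE(),
    TagName            VARCHAR(20)  NOT NULL,
    ServerName         VARCHAR(100) NOT NULL,
    DatabaseName       VARCHAR(50)  NOT NULL,
    SchemaAndTableName VARCHAR(100) NOT NULL,
    RecCount           BIGINT       NOT NULL
);

-- INSERT ... EXEC captures the single-row result set of the procedure.
INSERT INTO dbo.RecordCountLog (TagName, ServerName, DatabaseName, SchemaAndTableName, RecCount)
EXEC dbo.usp_GET_RecordCountfromAnyTable
     @TagName = 'Orders-DW', @ServerName = NULL,
     @DatabaseName = 'WarehouseDB', @SchemaAndTableName = 'dbo.Orders';
```

Scheduling this insert (for example, as a SQL Server Agent job) builds up the history that the trending report draws from.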