Introduction
Data Governance plays an important role in building a successful data-driven company because it helps business users trust the data. A typical Data Analytics project starts with identifying all relevant transactional sources, building a Data Warehouse according to business needs, and feeding (and converting) the data as often as possible. Yet without operational reports that alert on a lag between the two systems, your Data Analytics function is doomed: no matter how great your Data Warehouse is, if users constantly complain about bad data, your Data Warehouse project's days are numbered.
I am assuming that you followed best practices and isolated your company's analytical server (OLAP) from your transactional server (OLTP), and that you set up ongoing replication in near real-time. As much as business users are happy to analyze historical data (great for trend analysis, which requires a large volume of data), they also want to see real-time data (great for an e-commerce company launching new products). But how do you ensure the dependent tables are in sync?
Problem
How do you compare data between source and destination tables using a repeatable process? One of the first needs in a Data Governance report is comparing record counts to identify missing records. A piece of code that can return a record count from any table anywhere (on a local or remote server) makes this easy.
Solution
We are going to build a reusable stored procedure for a Data Governance report that captures record counts from source and destination tables so they can be compared and tracked over time.
The first challenge is building in the flexibility to query a table/view record count on any server; the second is avoiding a resource-hungry COUNT(*) that could lead to performance issues.
The following stored procedure does just that: it captures a record count virtually anywhere by running dynamic SQL through sp_executesql, and it avoids performance problems by reading the statistical record count from sys.dm_db_partition_stats instead of scanning the table.
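To see the core idea in isolation before wrapping it in dynamic SQL, compare a full-scan count with the metadata lookup the procedure relies on (dbo.yourtablename is a placeholder table name):

```sql
-- Full scan: accurate, but can be slow on large tables.
SELECT COUNT(*) AS RecCount FROM dbo.yourtablename;

-- Metadata-based count: near-instant, maintained by the storage engine.
-- index_id 0 = heap, 1 = clustered index; SUM handles partitioned tables.
SELECT SUM(ps.row_count) AS RecCount
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id = OBJECT_ID('dbo.yourtablename')
  AND ps.index_id IN (0, 1);
```

Note that sys.dm_db_partition_stats reports an approximate count for tables under heavy concurrent modification, which is an acceptable trade-off for an operational lag report.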
USE [yourdatabasename]
GO
/****** Object: StoredProcedure [dbo].[usp_GET_RecordCountfromAnyTable] Script Date: 4/23/2019 3:41:40 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author: Steve Rezhener
-- Create date: 04/23/2019
-- Description: Getting record count from any table anywhere (local/remote server)
-- EXEC dbo.usp_GET_RecordCountfromAnyTable @TagName = 'AWS EC2', @ServerName= NULL, @DatabaseName='yourdatabase', @SchemaAndTableName='yourtablename'
-- =============================================
CREATE PROCEDURE [dbo].[usp_GET_RecordCountfromAnyTable]
@TagName AS VARCHAR(20)
,@ServerName AS VARCHAR(100)
,@DatabaseName AS VARCHAR(50)
,@SchemaAndTableName AS VARCHAR(100)
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
DECLARE @SQLString1 AS NVARCHAR(MAX)
DECLARE @SQLString2 AS NVARCHAR(MAX)
DECLARE @ObjectID AS BIGINT
DECLARE @ParmDefinition NVARCHAR(500);
DECLARE @SchemaName AS VARCHAR(50) = PARSENAME(@SchemaAndTableName,2)
DECLARE @TableName AS VARCHAR(50) = PARSENAME(@SchemaAndTableName,1)
IF ISNULL(@ServerName,@@SERVERNAME) = @@SERVERNAME -- local
BEGIN
IF @ServerName IS NULL SET @ServerName = @@SERVERNAME
SET @SQLString1 = 'SELECT @ObjectIDOUT = tabs.OBJECT_ID
FROM ' + @DatabaseName + '.sys.tables AS tabs
INNER JOIN ' + @DatabaseName + '.sys.schemas AS sch ON tabs.schema_id = sch.schema_id
WHERE 1=1
AND tabs.name = ''' + @TableName + '''
AND sch.name = ''' + @SchemaName + '''
'
PRINT '@SQLString1: ' + @SQLString1
SET @ParmDefinition = N'@ObjectIDOUT BIGINT OUTPUT';
EXECUTE sp_executesql @SQLString1, @ParmDefinition, @ObjectIDOUT = @ObjectID OUTPUT
PRINT '@ObjectID: ' + CONVERT(VARCHAR,@ObjectID)
SET @SQLString2 = N'SELECT ''' + @TagName + ''' AS TagName, ''' + @ServerName + ''' AS ServerName, ''' + @DatabaseName + ''' AS DatabaseName, ''' + @SchemaAndTableName + ''' AS SchemaAndTableName, SUM(row_count) AS RecCount
FROM ' + @DatabaseName + '.sys.dm_db_partition_stats
WHERE 1=1 AND object_id = ' + CONVERT(VARCHAR(20),@ObjectID) + ' AND index_id IN (0,1)' -- 0 = heap, 1 = clustered index
PRINT '@SQLString2: ' + @SQLString2
EXECUTE sp_executesql @SQLString2
END
ELSE -- remote server via linked server
BEGIN
SET @SQLString1 = 'SELECT @ObjectIDOUT = tabs.OBJECT_ID
FROM [' + @ServerName + '].' + @DatabaseName + '.sys.tables AS tabs
INNER JOIN [' + @ServerName + '].' + @DatabaseName + '.sys.schemas AS sch ON tabs.schema_id = sch.schema_id
WHERE 1=1
AND tabs.name = ''' + @TableName + '''
AND sch.name = ''' + @SchemaName + '''
'
PRINT '@SQLString1: ' + @SQLString1
SET @ParmDefinition = N'@ObjectIDOUT BIGINT OUTPUT';
EXECUTE sp_executesql @SQLString1, @ParmDefinition, @ObjectIDOUT = @ObjectID OUTPUT
PRINT '@ObjectID: ' + CONVERT(VARCHAR,@ObjectID)
SET @SQLString2 = N'SELECT ''' + @TagName + ''' AS TagName, ''' + @ServerName + ''' AS ServerName, ''' + @DatabaseName + ''' AS DatabaseName, ''' + @SchemaAndTableName + ''' AS SchemaAndTableName, SUM(row_count) AS RecCount
FROM [' + @ServerName + '].' + @DatabaseName + '.sys.dm_db_partition_stats
WHERE 1=1 AND object_id = ' + CONVERT(VARCHAR(20),@ObjectID) + ' AND index_id IN (0,1)' -- 0 = heap, 1 = clustered index
PRINT '@SQLString2: ' + @SQLString2
EXECUTE sp_executesql @SQLString2
END
END
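To put the procedure to work, you might call it along these lines. The server, database, and table names below are placeholders, and dbo.RecordCountLog is a hypothetical logging table (assumed to have columns matching the procedure's result set, plus a default-populated timestamp column for trending):

```sql
-- Local server: pass NULL for @ServerName.
EXEC dbo.usp_GET_RecordCountfromAnyTable
     @TagName = 'DW Sales', @ServerName = NULL,
     @DatabaseName = 'DataWarehouse', @SchemaAndTableName = 'dbo.FactSales';

-- Remote source via a linked server named DWH-SQL01.
EXEC dbo.usp_GET_RecordCountfromAnyTable
     @TagName = 'OLTP Sales', @ServerName = 'DWH-SQL01',
     @DatabaseName = 'SalesDB', @SchemaAndTableName = 'dbo.Orders';

-- Capture the result for trending with INSERT ... EXEC.
INSERT INTO dbo.RecordCountLog (TagName, ServerName, DatabaseName, SchemaAndTableName, RecCount)
EXEC dbo.usp_GET_RecordCountfromAnyTable
     @TagName = 'OLTP Sales', @ServerName = 'DWH-SQL01',
     @DatabaseName = 'SalesDB', @SchemaAndTableName = 'dbo.Orders';
```

Scheduling the INSERT ... EXEC calls in a SQL Server Agent job for each source/destination pair gives you the count history the Data Governance report compares.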
Equipped with this stored procedure, it is only a matter of time before you can build a sharp and shiny operational report to share with your business users, as you can see in this image: