When building ETL processes, one of the things we always like to do is to account for every row. But setting up error data flows can be very time consuming since they are schema dependent. The following solutions is a generic way to log bad rows that can be re-used without modification. It is also set-based so we are only writing one error row no matter how many errors we had.
The basis of the solution is that we store the data as XML. This way we maintain the structure and schema and can shred it as needed later on but we don’t have to have schema-specific tables.
To start with, we need a table to store the results.
CREATE TABLE [dbo].[ErrorRows_XML]( [ErrorID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED, [ErrorDate] [datetime] NOT NULL, [ErrorSource] [varchar](500) NULL, [ErrorCount] [int] NULL, [RowData] [xml] NULL, ) GO
The next thing we need to do is create an SSIS variable to store the results. We are going to use a Recordset destination so we need a variable of type Object to store the Recordset in.
For the variable name, I use the following naming convention:
Error_<PackageName>_<CFStepName>_<DFStepName>
As you can see from the screenshot below, the DataType is Object and the Value is set for you as System.Object.
In this example, I am using the No Match output of a Lookup transform. If we can’t find the key, we don’t want the row to go to the fact table via the Match output, but we don’t want to lose the data either so we can troubleshoot the key issue.
The destination is a Recordset Destination, which uses the Advanced Editor. On Component Properties, the only thing we need to specify is the VariableName, which we created in the previous step.
On the Input Columns tab, we specify which columns from the data flow we want to store as XML. I generally limit this as much as possible to just the data I need to figure out why this particular operation didn’t work. In this case that would be the primary key of the input for the data flow and the columns used in the lookup.
At this point, when the Data Flow runs, the rows to be logged will be stored in the Recordset variables. The last step (typically at the end of the control flow) is a Script Task to write the data to the ErrorLog table created above.
The variable we stored the data in needs to be specified in the Script Task (ReadOnly is fine).
The script uses this function to do the work.
public void WriteError(string variable) { //serialize to xml OleDbDataAdapter da = new OleDbDataAdapter(); DataTable dt = new DataTable(); DataSet ds = new DataSet(); da.Fill(dt, Dts.Variables[variable].Value); ds.Tables.Add(dt); string xml = ds.GetXml(); if (xml != "<NewDataSet />") { //write to db string ErrSource = Dts.Variables[variable].Name; SqlConnection cn = new SqlConnection(); cn = (SqlConnection)(Dts.Connections["DW01_ADO"].AcquireConnection(Dts.Transaction) as SqlConnection); SqlCommand cmd = new SqlCommand("insert dbo.EFC_ErrorRows_XML(ErrorSource,RowData) values(@ErrorSource, @XML)", cn); cmd.Parameters.Add("@ErrorSource", SqlDbType.VarChar).Value = ErrSource; cmd.Parameters.Add("@XML", SqlDbType.Xml).Value = xml; cmd.ExecuteNonQuery(); } }
The body of the script just calls the function for each variable that contains errors.
public void Main() { WriteError("Error_FactEFCTaps_StageTaps_TripKey"); WriteError("Error_FactEFCTaps_StageTaps_ActionKey"); WriteError("Error_FactEFCTaps_StageTaps_ServiceKey"); WriteError("Error_FactEFCTaps_StageTaps_ZoneKey"); WriteError("Error_FactEFCTaps_StageTaps_YardKey"); WriteError("Error_FactEFCTaps_StageTaps_MTTapDateKey"); WriteError("Error_FactEFCTaps_StageTaps_MTTapTimeKey"); WriteError("Error_FactEFCTaps_StageTaps_StopKeyEndDateNull1"); WriteError("Error_FactEFCTaps_StageTaps_NoPlatformKey2"); WriteError("Error_FactEFCTaps_StageTaps_RouteID0"); WriteError("Error_FactEFCTaps_StageTaps_PlatformKey1"); WriteError("Error_FactEFCTaps_StageTaps_NoPlatformKey3"); WriteError("Error_FactEFCTaps_StageTaps_NoPlatformKey1"); WriteError("Error_FactEFCTaps_StageTaps_AllStopKey"); WriteError("Error_FactEFCTaps_StageTaps_NoPlatformKey"); WriteError("Error_FactEFCTaps_StageTaps_InstitutionKey"); WriteError("Error_FactEFCTaps_StageTaps_ProductKey"); WriteError("Error_FactEFCTaps_StageTaps_CardKey"); WriteError("Error_FactEFCTaps_StageTaps_VehicleKeyTraxFR"); Dts.TaskResult = (int)ScriptResults.Success; }
We now have the data in the Error Log table as XML
XQuery can be used to shred the XML to get at specific data. For common errors, I create views for the specific error type.
SELECT x.ErrorID ,x.ErrorDate ,tbl.col.value(N'(id_sess)[1]',N'bigint') AS id_sess ,tbl.col.value(N'(cardid)[1]',N'nvarchar(255)') AS CardID ,tbl.col.value(N'(CC_CompanyName)[1]',N'nvarchar(255)') AS CC_CompanyName ,tbl.col.value(N'(CDListMembership)[1]',N'nvarchar(255)') AS CDListMembership ,tbl.col.value(N'(FaceNumber)[1]',N'nvarchar(255)') AS FaceNumber ,tbl.col.value(N'(id_payee)[1]',N'bigint') AS id_payee ,tbl.col.value(N'(LastCDListChangeDate)[1]',N'datetime') AS LastCDListChangeDate ,tbl.col.value(N'(ProductKey)[1]',N'int') AS ProductKey ,tbl.col.value(N'(ShortName)[1]',N'nvarchar(255)') AS ShortName ,tbl.col.value(N'(SubscriptionEndDate)[1]',N'datetime') AS SubscriptionEndDate ,tbl.col.value(N'(SubscriptionStartDate)[1]',N'datetime') AS SubscriptionStartDate ,tbl.col.value(N'(InstitutionKey)[1]',N'int') AS InstitutionKey from dbo.EFC_ErrorRows_XML x cross apply RowData.nodes(N'//Table1') as tbl(col) WHERE ErrorSource = 'Error_FactEFCTaps_StageTaps_CardKey'