Sateesh Yele | Microsoft SQL Server Support
I would like to share information about an interesting Replication issue I recently worked. Customer was getting unusual error messages during synchronization.
Merge agent is failing with the following error message when validating large tables with checksum.
The merge process was unable to perform data validation on article ‘ tablename’’. Check for SQL Server errors in the Windows application event log or retry at a later time. (Source: MSSQL_REPL, Error number: MSSQL_REPL-2147200953)
Get help: http://help/MSSQL_REPL-2147200953
Error converting data type varchar to numeric. (Source: MSSQLServer, Error number: 8114)
Get help: http://help/8114
When validating articles in Merge replication, the Merge agent will execute sp_table_validation on the subscriber. The logic inside the proc is running the below command.
select count_big(*), sum (convert(numeric, binary_checksum(*) ) ) from [dbo].[tablename] WITH (TABLOCK HOLDLOCK)
By default, the query time out for Merge agent is 600 seconds (5 min) and the Merge agent. For large tables, the above command would take a lot of time as we are calculating the checksum of each row in the table and may fail with the above error. If we look at the centralized replmerge.log, we would see the below error message.
Validating article ‘tablename’
The merge process failed to execute a query because the query timed out. If this failure continues, increase the query timeout for the process. When troubleshooting, restart the synchronization with verbose history logging and specify an output file to which to write.
The merge process was unable to perform data validation on article ‘tablename’. Check for SQL Server errors in the Windows application event log or retry at a later time.
The Merge agent will not roll up the timeout error message and would only display “Error converting data type varchar to numeric” message in the replication monitor.
Create a new Merge Agent Profile and increase the “–QueryTimeOut” or add the parameter to the Merge Agent job properties to increase the default time out period then restart the Merge agent.