by Holger Linke, Microsoft SQL Escalation Engineer
The merge agent may get stuck without apparent progress, or even fail with a timeout error.
If several merge agents are trying to synchronize at the same time, they would block each other, until the agent which is running sp_msmakegeneration finishes the call.
If the execution of sp_msmakegeneration is taking longer than the configured query timeout, no further synchronization may be possible, because each synchronization is either blocked or running into the timeout, never being able to complete.
A typical error reported by the merge agent might be:
2013-07-29 13:03:35.88 Command Text: exec sp_MSmakegeneration
2013-07-29 13:03:35.88 Parameters:
2013-07-29 13:33:41.05 [0%] The replication agent had encountered an exception.
2013-07-29 13:33:41.05 Source: Replication
2013-07-29 13:33:41.05 Exception Type: Microsoft.SqlServer.Replication.SqlCommandTimeoutException
2013-07-29 13:33:41.05 Exception Message: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
The system procedure sp_msmakegeneration is usually called by the merge agent during the start phase of the agent. The procedure has several purposes:
- It closes all open metadata generations to prepare the metadata for the synchronization.
- It splits generations with a lot of data changes into smaller generations; this is called generation leveling and usually improves synchronization performance over unreliable and slow network connections.
The procedure gets called inside the subscriber database before the upload phase of the synchronization, and again inside the publisher database before the download phase. The issue usually occurs inside the publisher database.
If sp_msmakegeneration is unable to finish execution, it might get stuck on either operation:
- Closing all open generations might get blocked by other activity in the database, e.g. long user transactions or blocking caused by user transaction.
- Generation leveling is a very expensive process. Its performance depends on several factors:
- How many changes are associated with a specific generation (e.g. if you usually have transactions with several thousands or millions of data changes to a single article)
- How much metadata exists inside the affected database (e.g. if the database is very active, or if the retention period is very long)
- How complex any dynamic filters and join filters are (e.g. join filters several levels deep, with complex join criteria or inefficient indexing of join criteria)
There are several possible steps for a solution. The main differentiator is if this a one-time, exceptional occurrence, or a permanent condition.
One-time, exceptional occurrence:
This usually occurs after a large transaction was executed, like inserting 100.000 rows into a published table within the same transaction.
You have two options to resolve the issue, and may follow either of those options:
- Call “sp_msmakegeneration” directly from a Query Window in SQL Server Management Studio. Connect to the publisher database and/or the affected subscriber database, and call the procedure without any parameters. Let the procedure finish its execution after increasing or disabling the query timeout.
- Temporarily disable generation leveling
At the publisher, inside the publisher database, execute the following procedure call:
exec sp_changemergepublication @publication='<publication name>’, @property= ‘generation_leveling_threshold’, @value= ‘0’
NOTE: replace <publication name> with the actual name of your publication.
Then start the merge agent and let it complete its synchronization. After successful synchronization, reset the generation leveling threshold back to the default value:
exec sp_changemergepublication @publication='<publication name>’, @property= ‘generation_leveling_threshold’, @value= ‘1000’
Permanent, continuous condition:
If you usually have large transactions with several thousand or millions of data changes:
- Set the leveling threshold to a higher value.
To increase the value from the default of 1000 to e.g. 10000 or 20000: At the publisher, inside the publisher database, execute the following procedure call:
- Set the leveling threshold to a higher value.
exec sp_changemergepublication @publication='<publication name>’, @property= ‘generation_leveling_threshold’, @value= ‘10000’
Monitor the following synchronizations and adapt the value to your needs. The potential negative effect is: if a synchronization is interrupted e.g. due to a network failure, it will take a much longer time to clear and retransmit the interrupted changes during the next synchronization.
If changing the generation leveling threshold does not help:
Then the next step would require further analysis of the replication metadata, and capturing the execution of sp_msmakegeneration with performance tools like SQL Profiler or the PSSDIAG tool.
Possible actions are:
- Check the size of the replication metadata
On the affected database, run the following queries:
select count(*) from msmerge_contents
select count(*) from msmerge_tombstone
Select count(*) from msmerge_genhistory
- Provide a script of the affected publication for further analysis of likely causes
- Collect the replication metadata from the publisher and subscriber database
- Capture the execution of sp_msmakegeneration with a detailed Profiler trace or a PSSDIAG on both publisher and subscriber
Examine data to determine why large number of data changes are moving between publisher and subscriber. Is this expected? Then perhaps Transactional Replication might be a better solution.
Recommended First Step:
- Follow the steps listed in paragraph “One-time, exceptional occurrence” above
- Report back the results to discuss further steps if needed.