Follow-up Customer Questions on Peer-2-Peer Replication
I meet with a customer today planning for Peer-2-Peer Replication in conjunction with Microsoft Amalga, our Hospital Information System solution. I thought others would benefit from some of the Q&A that were raised in this meeting. I hope you too find these helpful in your projects.
1. How to monitor replication?
>>>Tracer Tokens can be automatically inserted via TSQL Agent Job at regular intervals, then queries can be run against the Distribution Agent Tracer Token tables to identify periods of high latency, or latency with specific subscriber(s). Timestamp values “null” indicate token is not yet delivered and are usually cause for further investigation.
>>>System Monitor. You can use MS System Center or other Windows Performance Monitor tracking/alert tools to notify you when the Agent Latency exceeds expected values. What are “expected”, well that depends on many variables. Recommend watching Perfmon counters under both typical and atypical data loads. This is similar technique to monitoring for high CPU or slow IO conditions.
>>For the hard-core developers, you can use RMO to simulate Replication Monitor.
2. How to handle large volume of data?
>>>Adjust the retention period to keep the Distribution database as small as possible. Consider how long it would take to “catch up”. For example, at 3million transactions per day, if subscriber is offline for 2 days the Distribution Agent has 6 million transactions to apply just to “catch up”. If this takes, say, 6 hours, it may be faster to reinitialize the Subscriber. Therefore, why keep 2 or 3 days of transactions queued, if we’ll reinitialize anyway, instead only keep 1 day queued.
>>>Configure Replication with the Distribution DB/log on its own physical disk (or LUN for SAN storage). This spreads IO and helps prevents disk bottleneck when the Log Reader is retrieving commands from the Published database and writing them to the Distribution database
3. How to maintain / clean up Distribution database (Replication Maintenance Jobs)?
>>>We normally only stop the History Cleanup while troubleshooting Replication. This give us more historical data to analyze. All the cleanup jobs should be running at scheduled intervals.
Removed 0 replicated transactions consisting of 0 statements in 10 seconds
>>>If the Distribution cleanup Job is not deleting data, this may indicate a problem with the SQL internal Replication metadata tracking tables. Conservative solution is to drop all publications, disable publishing which deletes the Distribution database, then reconfigure Publishing/Distribution which create a new distribution database. Then monitor the msrepl_transaction timestamp values to ensure cleanup is occurring. If still not, then Profile Trace is needed to determine why DELETE is not finding rows to be removed.
4. What happens if the server is rebooted (e.g. Windows update)?
>>>Upon startup SQL Agent is restarted which automatically restart Replication Agents. Log Reader will compute last replicated “committed” LSN written to the Distribution database and start pulling new data from the Transaction Log. Distirbution Agent will query subscriber’s last replicated transaction and too start moving transactions to the subscriber.
5. How to re-create publication / subscription using the generated script?
>>>The sp_addsubscription generated Replication script contains “generic” settings. For example, @sync_type = N’replication support only’.
These may need to modified to include options such as @sync_type = N’initialize with backup’, @backupdevicetype= N’disk’, @backupdevicename= N’c:\backupfiles\published.bak’. I would recommend testing the customize settings on a small publication.
6. How to delete publication / subscription?
>>>Depending on your configuration, you may need to execute two drop subscribers commands.