Impact of large batch on Transactional Replication topology
Ever write an update statement generating 50+ million commands in a single transaction? Ever on a table being published in Transactional Replication? How about to a Subscriber halfway around the world, or on a satellite-connected network?
While nothing in the Replication design prevents this, there are challenges that should be considered when making large single transactions in a Microsoft SQL Server Transactional Replication environment. The example below is meant to illustrate the impact of large transactions, not exactly how the LogReader “internals” process the log.
“All or Nothing”
By default, Replication is designed to keep transactional consistency from Publisher to Distributor. To accomplish this, transactions are written to the Distributor and then the Subscriber “all or nothing”. For large transactions, a high number of locks are placed on the distribution and subscriber databases. When resources are low due to other activity, SQL Server will report the following warning for large transactions:
The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions. (Source: MSSQLServer, Error number: 1007)
“I think it is hung”
SQL Replication Agents report status once transactions are committed. If the hardware/network topology takes hours to move a “BatchCommitSize” of commands, DBAs may incorrectly believe the Replication Agents are “hung” and restart them, rolling back the current work and forcing the LogReader Agent to restart at the beginning of the last uncommitted transaction.
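Before restarting an agent that looks “hung”, it is worth checking whether the LogReader is still making progress. A minimal check, using the documented sp_replcounters procedure (run on the Publisher):

```sql
-- Run on the Publisher: returns per-database replication statistics,
-- including replicated transactions/sec and replication latency.
-- If these counters are still moving, the LogReader is working, not hung.
EXEC sp_replcounters;
```

Pending command counts per subscription can also be checked on the Distributor with sp_replmonitorsubscriptionpendingcmds before deciding an agent restart is really needed.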
“1 Large vs. Many Small”
What is different in LogReader performance for 1 large batch transaction vs. many smaller batches when the application is modifying the same number of rows? This is a very good question. To demonstrate, we’ll push through a large transaction that grows the transaction log (TLOG) by 100 GB.
BEGIN TRAN … (100 GB of UPDATEs) … COMMIT TRAN
The LogReader Agent starts and sends a READ request to the SQL Server Publisher. SQL Server starts reading the published database’s transaction log, looking for a BEGIN_TRAN; it finds one, records the location (LSN), then starts searching for either a ROLLBACK or a COMMIT. Scanning continues, reading 100 GB of data, until the COMMIT (LSN) is found. Now we’ve identified a valid transaction. Time to go back to the BEGIN_TRAN, re-read the 100 GB of transaction log, and start sending the commands to LogRead.exe.
During the second pass, the 100 GB of commands is transferred to LogRead.exe and then streamed to the distribution database in a single transaction. Again, any interruption of the LogReader Agent while writing to the distribution database (blocking, a dropped connection, etc.) and the entire transaction is rolled back, including the work from the 200 GB (100 GB x 2) of transaction log READs.
As you can see, SQL Server just made two passes through the transaction log, reading 200 GB (100 GB x 2) of on-disk data. It’s a bit more complicated than that: tracking rolled-back transactions, skipping non-replicated data such as changes to index records, and so on. Again, any interruption of the LogReader while writing this into the distribution database and we start all over. On some SQL Servers, the non-replicated log records may exceed the replicated log records processed as part of the log scan activity.
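The read-volume cost described above can be sketched with a back-of-the-envelope model. This is illustrative arithmetic only, not actual LogReader internals: it assumes each batch is scanned once to find the COMMIT and once to harvest the commands, and that an interruption repeats only the in-flight batch.

```python
TOTAL_CHANGE_GB = 100  # replicated changes sitting in the transaction log

def log_read_volume(batch_size_gb, interruptions=0):
    """GB of log scanned: each batch is read twice (find the COMMIT,
    then harvest commands); an interruption re-reads the current batch."""
    batches = TOTAL_CHANGE_GB // batch_size_gb
    per_batch = 2 * batch_size_gb
    return batches * per_batch + interruptions * per_batch

# One 100 GB transaction: 200 GB scanned; a single interruption
# costs another full 200 GB pass.
print(log_read_volume(100))                   # 200
print(log_read_volume(100, interruptions=1))  # 400

# One hundred 1 GB transactions: still 200 GB total, but an
# interruption only repeats the last 1 GB batch (2 GB re-read).
print(log_read_volume(1))                     # 200
print(log_read_volume(1, interruptions=1))    # 202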
“Advantages of Smaller Batches”
Advantages of smaller batches are:
- Any interruption means the Agents only repeat the last small batch; all prior batches are marked as delivered and are not reprocessed.
- Feedback to the DBA is more frequent, so the Agent is less likely to be inadvertently restarted (i.e., a DBA restarting the Agent when it was already working).
- The “delivered” watermark moves forward with each batch; the 50 GB of non-replicated data is read only once, and there is no need to go back and re-read it.
- SQL Server engine overhead is lower for smaller transactions.
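One common way to get those smaller batches is to break a single large UPDATE into a loop of smaller committed transactions. A hypothetical sketch (table, column, and batch size are placeholders; the predicate must exclude rows already updated):

```sql
-- Break one 50-million-row UPDATE into 10,000-row committed batches so
-- each batch replicates, and can fail/retry, independently.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    BEGIN TRAN;
    UPDATE TOP (10000) dbo.MyPublishedTable
       SET StatusCode = 2
     WHERE StatusCode = 1;     -- excludes rows already updated
    SET @rows = @@ROWCOUNT;
    COMMIT TRAN;               -- LogReader watermark can advance per batch
END;
```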
To reduce the volume of replicated commands, explore using Stored Procedure Replication. This feature replicates the “execution of”, not the “results of”, a stored procedure. For our 50+ million command transaction, Replication may only need to send down a single “sp_run_my_update” execution instead of 50+ million update statements.
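Stored Procedure Replication is configured per article with the documented @type = 'proc exec' option of sp_addarticle. The publication and procedure names below are placeholders carried over from the example above:

```sql
-- Publish the *execution* of the procedure rather than the
-- row changes it produces.
EXEC sp_addarticle
     @publication   = N'MyPublication',
     @article       = N'sp_run_my_update',
     @source_object = N'sp_run_my_update',
     @type          = N'proc exec';
```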
“Large Batch Impact on Distribution Agent”
The Distribution Agent will also process the data changes as a single transaction. It too will attempt an all-or-nothing read from the distribution database and write to the Subscriber. Any interruption and it will restart the entire transaction.
“MaxCmdsInTran changes LogReader behavior to improve the Distribution Agent”
If the application team is unable to break up large transactions, we can help the Distribution Agent by asking the LogReader Agent to break the 1 large transaction into smaller transactions while writing commands into the distribution database, by adding the “–MaxCmdsInTran” parameter to the LogReader SQL Agent job properties. SQL Server and the LogReader still see 1 large transaction; the LogReader just breaks it into smaller transactions while writing into the distribution database. The Distribution Agent will then see, and deliver to Subscribers, the batch of smaller transactions.
The –MaxCmdsInTran parameter specifies the maximum number of statements grouped into a transaction as the Log Reader writes commands to the distribution database.
Specifying this parameter can reduce contention at the Distributor and reduce latency between the Publisher and Subscriber. Because the original transaction is applied in smaller units, the Subscriber can access rows of a large logical Publisher transaction prior to the end of the original transaction, breaking strict transactional atomicity.
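In practice the parameter is appended to the Log Reader Agent job step’s run command. A hypothetical example (server and database names are placeholders; -MaxCmdsInTran is the documented parameter):

```
-Publisher [MYPUBLISHER] -PublisherDB [MyPublishedDB]
-Distributor [MYDISTRIBUTOR] -DistributorSecurityMode 1
-MaxCmdsInTran 10000
```

With this setting, a single 50-million-command publisher transaction would arrive at the Subscriber as roughly 5,000 transactions of 10,000 commands each.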
Additional information on LogReader parameters is available in the Replication Log Reader Agent documentation.
“Minimize Data Replicating”
If you’re seeing Replication latency, take a minute to ensure you’re only moving data through Transactional Replication that actually needs near real-time data flow. For tables updated less often, perhaps another option such as Snapshot Replication, Change Tracking, or an SSIS package is the better tool to move the data. Maybe the best solution is AlwaysOn with a read-only secondary SQL Server!
I hope you find this information helpful.
Microsoft SQL Server Escalation Services
One thought on “Impact of large batch on Transactional Replication topology”
Hi Chris, could you maybe make a post about how loopback detection works in bi-directional transactional replication? For example, I read somewhere that when server A replicates a transaction to server B, server B’s log reader will still pick up that transaction and send it to its distributor (i.e., it gets written to the distribution database). Only when the subscriber that replicates it to server A sees it in the distribution database will it not replicate it, as it knows that server A was the originator of the transaction. If that is true, this would mean that filtering data for the publication, if your data is partitioned such that one type of data only needs to replicate in one direction, would be better, as it would mean less data going to the distributor.