WHY CENTRAL MERGE SUBSCRIBERS ARE NOT SUPPORTED
Holger Linke
Microsoft SQL Server Escalation Services
Abstract
Provides details and explanations of why SQL Server Merge Replication does not support central subscriber topologies. This fact was originally stated in SQL Server Books Online and is further documented in KB article 2750005, but it still causes some misconceptions and confusion.
This blog article will provide additional thoughts and background information about why configuring central subscribers with Merge Replication is not a good idea.
The Central Subscriber model
In a central subscriber model, a single subscriber database synchronizes with two or more publishing sources. The subscriber's publishing sources consist of one of the following combinations:
- Two or more publications inside the same publisher database at the same publishing server instance.
- Two or more publications in two or more publisher databases at the same publishing server instance.
- Two or more publications in two or more publisher databases at different publishing server instances.
Practical examples for these topologies are:
- The publisher database contains more published tables than would fit into a single publication. Or you may group published tables into several publications for administrative reasons, e.g. to be able to reinitialize only a subset of the published tables.
- You want to consolidate the data from several databases on the same SQL Server instance into a single subscriber database.
- You want to consolidate the data from several SQL Server instances into a single subscriber database (illustrated in the sketch after this list).
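For illustration, the third combination might be configured with a minimal T-SQL sketch like the following. All server, database, and object names are placeholders, and the prerequisites (distributor setup, sp_replicationdboption, snapshot agents) are omitted:

-- In publisher database DB1 on instance SRV1:
EXEC sp_addmergepublication
     @publication = N'PUB1';
EXEC sp_addmergearticle
     @publication   = N'PUB1',
     @article       = N'Customers',
     @source_object = N'Customers';
EXEC sp_addmergesubscription
     @publication       = N'PUB1',
     @subscriber        = N'SUBSRV',
     @subscriber_db     = N'CentralSub',   -- the shared subscriber database
     @subscription_type = N'push';

-- In publisher database DB2 on instance SRV2:
EXEC sp_addmergepublication
     @publication = N'PUB2';
EXEC sp_addmergearticle
     @publication   = N'PUB2',
     @article       = N'Orders',
     @source_object = N'Orders';
EXEC sp_addmergesubscription
     @publication       = N'PUB2',
     @subscriber        = N'SUBSRV',
     @subscriber_db     = N'CentralSub',   -- same database: a central subscriber
     @subscription_type = N'push';

Both merge agents now read from and write to the same metadata tables in CentralSub, which is the root of the issues described below.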
SQL Server will allow you to configure such a topology, because there is no reliable way to check for these conditions when the subscriptions are created. In most cases it will work immediately after setup, and you might in fact never encounter a problem later on.
The inherent problems with this kind of topology usually start to surface when you want to change some aspect of it, such as removing and re-adding articles, subscriptions, or publications. This might become necessary after a system failure or accident, or when you need to change the publications or subscriptions to accommodate application design improvements.
Merge Replication Design Considerations
Merge Replication was designed to deliver the following feature set:
- A resilient environment: you can restore a publisher or subscriber database from an older backup and still experience only minimal or no data loss.
- Allow for subscribers that remain disconnected for some time.
- Allow for subscribers on different database types, like SQL Server, SQL Server Compact, or even Microsoft Access (Jet) databases.
- Provide detailed conflict resolution strategies, based on subscriber priorities down to the column level of the published tables.
- Create complex scale-out topologies, e.g. republishing servers spread across distributed regions in order to bring the data closer to the users.
The typical scenario to be addressed by Merge Replication was the disconnected salesperson who synchronizes on demand, without a specific schedule. The central subscriber feature was never on the design list and therefore was not reflected in the initial implementation. It was also never properly tested.
The first version of SQL Server Merge Replication was released with SQL Server 7.0 back in 1998. It received major feature upgrades in SQL Server 2000 and 2005, and it has further evolved up to SQL Server 2012. The core metadata tables, however, have remained practically unchanged throughout these versions.
Synchronization of metadata
The Merge Replication metadata of each replica database is stored in a single set of metadata tables. If several subscriptions are created in the same database, the metadata of all these subscriptions is stored in the same set of system tables. The replication metadata is therefore shared by multiple merge agents, all reading from and writing to the same system tables.
Each time a merge agent synchronizes the publisher with the subscriber database, it also synchronizes the publication and subscription information between the two synchronization partners. For example, rows from the subscriber's sysmergesubscriptions system table are uploaded to the publisher, and the publisher's rows are downloaded to the subscriber. This is necessary for conflict handling in republishing topologies: the main publisher and each republisher need to know about every node in the topology in order to resolve conflicts according to the replica priorities you have configured. If a new republisher or subscriber is added on one branch of the topology, its metadata needs to be uploaded to the main publisher first, before it can be downloaded into the other branches.
And this has to be done on each synchronization: if, for example, you have to restore a node from an older backup, the node will still receive current information from its synchronization partners.
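You can observe this shared metadata directly. Run in the central subscriber database, the following query (a sketch using the documented merge system tables sysmergesubscriptions and sysmergepublications) lists every subscription recorded there, regardless of which publisher it belongs to:

SELECT p.name    AS publication,
       p.pubid   AS publication_guid,   -- the internal publication ID
       s.subscriber_server,
       s.db_name AS subscriber_db
FROM   dbo.sysmergesubscriptions AS s
JOIN   dbo.sysmergepublications  AS p
       ON p.pubid = s.pubid
ORDER BY p.name;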
So what’s the problem then?
If you have configured a central subscriber topology, you will have two or more merge agents synchronizing the subscriber metadata with its publisher partners. The potential for disaster comes from the fact that each merge agent exchanges not only the metadata for its own publication and subscription, but also the metadata for the unrelated other publications and subscriptions.
If, for example, you drop a subscription, its metadata is removed immediately from the publisher and the subscriber, but not from the other replicas in the topology. In a central subscriber scenario, when the other merge agents at that subscriber synchronize with their partners, they will re-introduce the outdated subscription information into the replicas, and they may remove the information about the deleted subscription at their partners. Because of the disconnected nature of the subscribers, the incorrect metadata might start to oscillate through the topology.
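For example, dropping a push subscription with the documented procedure (names are placeholders) removes the metadata rows only at the two nodes named in the call:

-- Run at the publisher, in the publication database:
EXEC sp_dropmergesubscription
     @publication   = N'PUB1',
     @subscriber    = N'SUBSRV',
     @subscriber_db = N'CentralSub';
-- Every other replica still holds rows for this subscription in its
-- sysmergesubscriptions table until its own merge agent synchronizes.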
Another aspect relates to running several merge agents in parallel, servicing the same subscriber database at the same time. The potential issues usually start to show once the topology reaches a certain size and complexity. Contributing factors include the number of publishers and subscribers, the number of data partitions for dynamic filters, the amount of metadata stored in the change tracking system tables, and the complexity of the dynamic filter design.
We have seen support cases with specific symptoms where the only common denominator was the central subscriber. These included scalability problems caused by blocking, or by the way the change evaluation is handled, or unexpected conflicts and retries. These issues are typically impossible to reproduce in a test environment, because in most cases you cannot duplicate the complexity and timing of your production environment. So you usually cannot scientifically prove that the central subscriber is the root cause. But you may still see that tweaking the agent schedules, or removing the central subscribers from the topology, lessens or avoids those issues.
The resulting issues can be unpredictable, intermittent, and unexpected, and they may appear much later than the step that actually caused them.
Two Examples
The following examples are the simplest and most obvious ways to demonstrate the metadata issues resulting from a central subscriber topology. The first also causes direct errors, as documented in KB article 2750005. Variations of these steps may lead to other symptoms or subtle differences.
Example 1: Publications in separate publication databases are synchronizing into the same subscriber database
Consider the following configuration and steps:
- The initial topology consists of Publication PUB1, internally identified by the publication ID GUID1, and Publication PUB2, identified by the publication ID GUID2. PUB1 and PUB2 exist in two separate publisher databases, either on the same or different publication servers. This topology usually works well immediately after configuration.
- Both publications replicate into the same subscriber database.
- Because of some accident or a design consideration, you need to drop PUB1 and its subscription. You re-create the publication with the same name PUB1; it receives a new internal identifier, GUID3. You also create a new subscription into the existing subscriber database.
- At this stage, the combination PUB1/GUID3 is known at publisher database 1 and at the subscriber. At publisher database 2, the old combination PUB1/GUID1 is still known, as long as its merge agent hasn't synchronized yet.
- The merge agent for PUB2 synchronizes. It uploads the information about PUB1/GUID3 to publisher database 2, and downloads the information about PUB1/GUID1 into the subscriber database.
- If PUB2 has further subscriptions, it will distribute the PUB1/GUID3 information into its sub-topology, and it will receive PUB1/GUID1 back during each of these synchronizations.
- The merge agent for PUB1 synchronizes. It detects the mismatch between PUB1/GUID3 at the publisher and PUB1/GUID1 at the subscriber, and it reports an error indicating invalid subscription information, or that the publication has been replaced by a new one.
After reaching this point, your topology is compromised. The incorrect PUB1/GUID1 combination will continue to oscillate through the system, and there is no practical way to bring this subscriber back into the topology.
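In T-SQL, the failure sequence sketches out roughly as follows; all names are placeholders, and the article setup, snapshot generation, and agent runs are omitted:

-- In publisher database 1: drop and re-create PUB1 under the same name.
EXEC sp_dropmergesubscription
     @publication   = N'PUB1',
     @subscriber    = N'SUBSRV',
     @subscriber_db = N'CentralSub';
EXEC sp_dropmergepublication @publication = N'PUB1';
EXEC sp_addmergepublication  @publication = N'PUB1';
-- ... re-add the articles and the subscription into CentralSub ...

-- The re-created publication has received a new internal ID ("GUID3"):
SELECT name, pubid FROM dbo.sysmergepublications WHERE name = N'PUB1';

-- After the merge agent for PUB2 has synchronized, the same query in the
-- subscriber database shows the stale "GUID1" row re-imported next to
-- the new "GUID3" row.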
Example 2: Two or more publications in the same publisher database are synchronizing into the same subscriber database
Consider the following configuration and steps:
- The initial topology consists of Publication PUB1, identified by the publication ID GUID1, and Publication PUB2, identified by the publication ID GUID2. Both coexist in the same publication database on the same publisher server. This usually works well immediately after configuration.
- Both publications replicate into the same subscriber database SUB1, and into other subscriber databases SUB2…SUBn.
- Because of some accident or design consideration, you need to drop PUB1 and its subscription. You re-create the publication with the same name PUB1; it receives a new internal identifier, GUID3.
- You create a new subscription into the existing subscriber database SUB1.
- At this stage, the combination PUB1/GUID3 is known at the publisher and in the subscriber database SUB1. At the subscriber databases SUB2…SUBn, the old combination PUB1/GUID1 is still known, as long as their merge agents haven't synchronized yet.
- The merge agent for PUB2 synchronizes with SUB2. It uploads the information about PUB1/GUID1 to the publisher database, and downloads the information about PUB1/GUID3 into the subscriber database SUB2.
After reaching this point, your topology maintains information about both PUB1/GUID1 and PUB1/GUID3. This does not present an immediate issue, as the orphaned old metadata and the current new metadata appear to be able to coexist. But if these steps are repeated in large topologies, the metadata tables become unnecessarily bloated. The effects on republishing topologies have not been fully explored, because of the increasing complexity of the agent interactions.
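A quick way to see the leftover metadata after these steps is to query sysmergepublications at any node; the expected result is sketched in the comment, with the GUIDs abbreviated:

SELECT name, pubid, publisher, publisher_db
FROM   dbo.sysmergepublications
WHERE  name = N'PUB1';
-- name   pubid   publisher   publisher_db
-- PUB1   GUID1   PUBSRV      PubDB        <- orphaned old metadata
-- PUB1   GUID3   PUBSRV      PubDB        <- the current publication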
Conclusion
A central subscriber might work for you immediately after you configure the topology. It might continue to work as long as you can avoid changes to your topology, such as removing and re-adding articles, subscriptions, or publications.
But if you run into issues that are related to the central subscriber topology, nothing will help you except dropping the entire topology and starting over from scratch. This usually poses a major administrative challenge, especially if you need to recover from a failure in an emergency situation.
Regarding enhancing Merge Replication to make central subscribers a supported scenario: this would require a major redesign of the core of Merge Replication and of the way the metadata is stored and handled. It would effectively mean developing a completely new type of replication. If you take into account that Merge Replication has evolved over several versions of SQL Server, starting with SQL Server 7.0 back in 1998, you may see that this just wouldn't be a simple step in the next Cumulative Update or Service Pack.
So the strong recommendation is to avoid central subscriber topologies in Merge Replication and to choose a different configuration instead. An alternative solution could be a single central merge publication that provides data partitions based on a parameterized row filter for the individual subscriber databases. Or use transactional replication instead of merge replication, if you really have to rely on this type of topology.
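As a sketch of the first alternative (all object and column names are placeholders), a single central publication can partition its data per subscriber through a parameterized row filter; @subset_filterclause and HOST_NAME() are the documented mechanism for this:

EXEC sp_addmergepublication
     @publication = N'PUB_CENTRAL';
EXEC sp_addmergearticle
     @publication         = N'PUB_CENTRAL',
     @article             = N'SalesData',
     @source_object       = N'SalesData',
     -- each subscriber receives only the rows whose RegionCode matches
     -- the value its merge agent passes via the -HostName parameter:
     @subset_filterclause = N'[RegionCode] = HOST_NAME()';

With this design, each subscriber database has exactly one synchronization partner, so the shared metadata is never serviced by competing merge agents.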
References:
KB article 2750005 in the Microsoft Knowledge Base:
Merge replication does not support centralized subscriber topologies
Books Online for SQL Server 2012:
Publish Data and Database Objects
http://msdn.microsoft.com/en-us/library/ms152559.aspx
(Note: Refer to the topic “Considerations for Publishing” – “Publishing Tables in More Than One Publication” in this article.)
Books Online for SQL Server 2008 R2:
Integrating Data from Multiple Sites (Client)
http://msdn.microsoft.com/en-us/library/ms151790(v=sql.105).aspx
(Note: Refer to the topic “The type of replication to use for this scenario” in this article.)