Chris Skorlinski – Microsoft SQL Server Escalation Services
While visiting a customer site, we discussed consequences of publishing common article\tables into multiple Publications. For this customer, each Publication contains same set of core or common tables used by all subscribers, but then some subscribers had tables unique to just that subscriber. For example, all Publications contain Customers table, while 1 Publication\Subscription adds Sales information while a 2nd Publication\Subscription contains Product Feedback data. The Customer\Sales Publication goes to 1 subscriber while the Customer\Feedback Publication goes to a different subscriber.
Using 2 Publications to 2 different subscribers is a technique to scale-out reporting workload as each subscriber can have tables and non-clustered indexes unique to support its specific workload. Reducing number of non-clustered indexes at a subscriber can have dramatic improvements in Distribution Agent performance by lower IO required for data update as fewer indexes are updated and maintained.
Okay, so what’s the catch?
Catch is the LogReader will create an entry in the Distribution Database for the same table for each Publication to which it belongs. In example below I published AdventureWorks Customer table in 2 different Publications then changed 1 row. As you can see, there are 2 rows in the Distribution database, one for each Publication.
Now take that Customer table and push through 100 million changes per day x 2 publication and you have 200 million commands to move from LogReader to Distribution database and 200 million commands need cleanup once delivered. Multiply this by other “common” tables and more publication and quickly have over 800 million commands moving through the Replication environment bloating the Distribution database.
Alternative Designs. – Common Tables Publication
There are 2 different solutions to reduce the Distribution database bloat. First, only publish tables once by moving all the “common” tables into 1 publication for all subscribers, then create separate publications for those tables unique to that subscriber. In the example above we’d have 3 publications. At the individual subscribers create non-clustered indexes for that subscriber once the initial snapshot has been delivered.
Sales subscriber (CommonCustomer and Sales publications)
Feedback subscriber (CommonCustomer and Feedback publications)
Alternative Designs. – One Publication
Another solution is to create 1 publication that contains ALL tables and push this to ALL subscribers. Yes, the Sales subscriber will now contain Feedback table and vice-versa, however, if the extra tables used by the other subscriber is small and the number of data changes in minimal, then over head of the other subscriber’s table would be not as great as distribution database bloat problem. This also has the added benefit of providing data redundancy at the subscribing allow either application to use either subscriber should the need arise.
Alternative Designs. – One Publication – Selective Subscriptions
Sounds good, but say I still have a large “sales” table not needed by Feedback subscriber? For variation on the 1 publication approach configure the subscriber to select to subset of articles within the common publication. By default, the subscriber gets @article =
‘all’, instead via scripts execute sp_addsubscription selecting which tables goes to which subscribers.
— Sales Subscriber
‘SalesAndFeedback’, @subscriber = ‘Sales’,@article = ‘Customers’
‘SalesAndFeedback’, @subscriber = ‘Sales’,@article = ‘Sales’
— Feedback Subscriber
‘SalesAndFeedback’, @subscriber = ‘Feedback’,@article = ‘Customers’
‘SalesAndFeedback’, @subscriber = ‘Feedback’,@article = ‘Feedback’
If publishing using default “concurrent snapshot” you’ll get error below when subscriber to selective articles.
Msg 14100, Level 16, State 1, Procedure sp_MSrepl_addsubscription, Line 572
Specify all articles when subscribing to a publication using concurrent snapshot processing.
This “selective subscriptions” capability does introduce special consideration when executing sp_addpublication that may not fit all situations. For example, @sync_method supports “database snapshot” or “native” and @immediate_sync feature must turned off.
… @allow_anonymous =
I hope these options help you explore Transactional Replication design options to reduce Distribution database bloat.