Service Bus Topics and Synchronizing Data

I am probably going to compare apples and oranges in this post, with a slight bias towards oranges. Yes, I like fruit. I also like SQL Azure Data Sync and how it makes data synchronization tasks a breeze. You install your agent, register your database, go to the portal, set up your sync groups and conflict resolution, and you are ready to go. Eventually (if you set things up right) the data you need will be synchronized from cloud to cloud, cloud to on-premises… Awesome.

As it turns out, I also like Service Bus, and the powerful features it provides for building distributed applications, supporting transactional behavior, and so on.

In our scenario we have a company that runs an online store in all six Windows Azure data centers (how romantic), and, through Traffic Manager, the users can get routed anywhere. How do we deal with the challenge of data consistency? Well, we can use Data Sync, for sure, but what if we are in a more restrictive scenario? What if we are not allowed to make changes to the database? Well, the Data Sync option quickly fades away, doesn’t it? Part of Data Sync’s magic consists of installing a Windows service and adding a number of tracking tables to your database to keep track of changes. Busted.

So, how do we go about solving this problem? I will go down the path of Service Bus topics and subscriptions to comply with our restriction. I choose Service Bus because we are already working with Windows Azure, I’ve seen it in action, I’ve been working with it for the last couple of months, and I know I can find a relatively simple way to manage this. It would be worthwhile to discuss other options as well.

This is a high-level diagram of the idea:

Our beloved customer places an order through any of the data centers.  Because the user can see his own orders, and because he might get routed by Traffic Manager to another data center, we need to have that information available everywhere, eventually.

We create a topic (let’s call it SyncTopic) in our favorite data center. Just one (let’s add a budget restriction while we are at it, and a little bit of KISS). We create one subscription per data center, and we’ll have our SyncListener component running in each cloud read from its own subscription on that topic. The SyncListener will be responsible for writing the orders to the database. Insert only; it makes the story easier (that would depend on how we designed our data model, of course).
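
A minimal sketch of that setup with the classic .NET Service Bus SDK (NamespaceManager); the connection string and data center names are placeholders of mine, and the filter that prevents sync loops is added further down:

```csharp
using Microsoft.ServiceBus;

class SyncTopologySetup
{
    // Creates SyncTopic once, plus one subscription per data center.
    // Connection string and data center names are illustrative only.
    public static void EnsureTopology(string connectionString, string[] dataCenters)
    {
        var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);

        if (!namespaceManager.TopicExists("SyncTopic"))
            namespaceManager.CreateTopic("SyncTopic");

        foreach (var dataCenter in dataCenters)
        {
            if (!namespaceManager.SubscriptionExists("SyncTopic", dataCenter))
                namespaceManager.CreateSubscription("SyncTopic", dataCenter);
        }
    }
}
```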

Every time a customer places an order, the SyncSender component (a separate worker role or a new task running in a worker role – you choose) posts a message to the topic, a copy of the order.  Each SyncListener will receive a copy of the message and perform the corresponding write operation to the db.  That is the big picture.
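
A sketch of what the SyncSender could post, again with the classic SDK. The SenderDataCenter property name, the order-as-JSON payload and the connection string are assumptions of mine, not something the design prescribes:

```csharp
using Microsoft.ServiceBus.Messaging;

class SyncSender
{
    // Publishes a copy of an order to SyncTopic so every other data center can pick it up.
    public static void PublishOrder(string connectionString, string orderId, string orderJson, string localDataCenter)
    {
        var topicClient = TopicClient.CreateFromConnectionString(connectionString, "SyncTopic");

        var message = new BrokeredMessage(orderJson)   // serialized copy of the order
        {
            MessageId = orderId                        // comes in handy for duplicate detection
        };
        message.Properties["SenderDataCenter"] = localDataCenter; // stamps where the order originated

        topicClient.Send(message);
        topicClient.Close();
    }
}
```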

Implementation considerations

This is all very nice if we draw it on paper, but we need to be sure that we are tackling other concerns, for example, and in no particular order of importance…

Sync loops: because the data center where the order originated also contains a SyncListener, it could potentially receive a copy of the order it already has. We can avoid this by adding a filter to the subscription and setting the corresponding property on the BrokeredMessage. The filter (a SqlFilter, why not) can be something like SenderDataCenter != ListeningDataCenter.
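
Building on the setup sketch above, the subscription creation would simply take that filter; the SenderDataCenter property matches what the SyncSender sketch stamps on each message:

```csharp
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

class FilteredSubscriptions
{
    // Each data center's subscription drops messages that it originated itself.
    public static void CreateSubscriptionFor(string connectionString, string dataCenter)
    {
        var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);

        if (!namespaceManager.SubscriptionExists("SyncTopic", dataCenter))
        {
            namespaceManager.CreateSubscription(
                "SyncTopic",
                dataCenter,
                new SqlFilter(string.Format("SenderDataCenter != '{0}'", dataCenter)));
        }
    }
}
```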

Conflict Resolution: we need to be covered here as well.  The path we choose will depend on our given context and might even take us back to square one.

Transactional Behavior: we would have to use the PeekLock mode to receive messages, for starters, and we would need a supervising mechanism to deal with transient failures; Topaz (the Transient Fault Handling Application Block) can help there. Consider using the dead letter queue as well if something goes wrong while processing the message.
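
A minimal PeekLock receive loop for one data center’s listener; the Topaz/retry wiring is left out, and WriteOrderToDatabase is a hypothetical helper standing in for the insert-only write:

```csharp
using System;
using Microsoft.ServiceBus.Messaging;

class SyncListener
{
    // Reads from this data center's subscription in PeekLock mode and writes orders to the local db.
    public static void Run(string connectionString, string dataCenter)
    {
        var client = SubscriptionClient.CreateFromConnectionString(
            connectionString, "SyncTopic", dataCenter, ReceiveMode.PeekLock);

        while (true)
        {
            var message = client.Receive(TimeSpan.FromSeconds(30));
            if (message == null) continue;   // nothing arrived within the wait time

            try
            {
                WriteOrderToDatabase(message.GetBody<string>()); // the insert into the local database
                message.Complete();                              // done, remove it from the subscription
            }
            catch (Exception)
            {
                // Transient failure: abandon so the lock expires and the message is retried.
                // A poison message could instead be sent to the dead letter queue via message.DeadLetter().
                message.Abandon();
            }
        }
    }

    static void WriteOrderToDatabase(string orderJson) { /* insert-only write, per the scenario */ }
}
```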

Message size: what if our order exceeds the maximum BrokeredMessage size? We need to consider another mechanism to get the required data through. Maybe storing it in blob storage, and then passing a reference in the message for the listener to pick up. The story gets complicated, but it could happen.
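
A sketch of that idea; the threshold and property name are my own, and UploadOrderToBlob is a hypothetical stand-in for the actual storage client call:

```csharp
using System.Text;
using Microsoft.ServiceBus.Messaging;

class LargeOrderSender
{
    // If the serialized order is too big to travel in the message, park the payload
    // in blob storage and send only a reference for the listener to fetch.
    public static void Publish(TopicClient topicClient, string orderId, string orderJson, string localDataCenter)
    {
        BrokeredMessage message;

        if (Encoding.UTF8.GetByteCount(orderJson) > 200 * 1024) // leave headroom for headers and properties
        {
            var blobUri = UploadOrderToBlob(orderId, orderJson); // hypothetical blob storage upload
            message = new BrokeredMessage();
            message.Properties["OrderBlobUri"] = blobUri;        // listener downloads the payload itself
        }
        else
        {
            message = new BrokeredMessage(orderJson);
        }

        message.MessageId = orderId;
        message.Properties["SenderDataCenter"] = localDataCenter;
        topicClient.Send(message);
    }

    static string UploadOrderToBlob(string orderId, string orderJson)
    {
        // Placeholder: a real implementation would use the storage client library here.
        return "https://yourstorageaccount.blob.core.windows.net/orders/" + orderId;
    }
}
```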

Security: Although the topic and subscriptions are used internally, it is always a good idea to secure them.  With ACS we can easily secure the edges.
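
For instance, with the classic SDK the clients can be created over an ACS token provider; the namespace, issuer name and key below are placeholders:

```csharp
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

class SecuredClients
{
    // Builds a TopicClient that authenticates against ACS rather than using a plain connection string.
    public static TopicClient CreateTopicClient()
    {
        var tokenProvider = TokenProvider.CreateSharedSecretTokenProvider("owner", "yourIssuerKey");
        var serviceUri = ServiceBusEnvironment.CreateServiceUri("sb", "yournamespace", string.Empty);

        var factory = MessagingFactory.Create(serviceUri, tokenProvider);
        return factory.CreateTopicClient("SyncTopic");
    }
}
```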

Hybridness: now we need all the orders data on premises (wait, don’t we have that already? oops). By setting up a new subscription and a SyncListener on premises, we can get a copy of the data there as well.

and probably others…

Conclusion

We can go about this many ways. Data Sync is simple and quick to set up. We don’t have 100% control over it, but it gets the job done. As our context changes, so does our need to consider other options that give us greater control. I am aware that this idea can trigger LOTS of other considerations I haven’t mentioned. Consider this post not as a definitive solution, but as a starting point for a fun discussion.

Happy Learning!


