Enterprise Replication (ER)

Enterprise Replication (ER) technology, which is both flexible and powerful, allows businesses to replicate data from branch offices to corporate office, replicate data from corporate office to branch offices, replicate data closer to customers.

ER works by you first defining the servers among which you would like data to be replicated. This creates a network topology — of root, leaf, and non-root, non-leaf nodes — that will transfer the data. Each ER node may be a single server or a cluster of servers, which this article further discusses below. All the interconnected nodes together are called an ER domain. The domain does not define what data will be transferred, just the paths or routes along which the data will flow.

Next, you decide what data you would like to be replicated. This is defined by an SQL query involving a server, database, table, and columns. The query acts as a filter whose output decides the data to replicate and proves to be a very flexible tool. If you'd like an entire table to be replicated, your query would be SELECT * FROM .... If you'd like just the first three columns to be replicated, your query would now be SELECT column1, column2, column3 FROM .... Or, if you'd like only certain rows to be replicated, you just use a WHERE clause in your query. ER can replicate data with very broad or very fine filters. To help with replication ER requires that the table have a primary key defined.

After the query is written, determine what nodes should participate in replicating the data. Let's say you want nodes A, B, and C to all have the same data in their Employee table. So A, B, and C are your participants. ER can be configured so that if data changes are made on any server, the other participants become updated too. This is called an update anywhere scenario. What happens if you only want updates to flow from A, into both B and C, but back never to A? ER gives you the flexibility to do these primary-target scenarios as well. In situations where data changes occur on more than one participant, it may happen that a row changed on server B conflicts with a change occurring on server C. ER allows you to define rules for automatically handling such conflicts. These include comparing timestamps, running a stored procedure, and ignoring conflicts.

Now that you know the query, the participants, and the scenario, take all this information and use it to create something called a replicate. Think of a replicate as a faucet that controls the flow of data from the query. The faucet can be turned on or off, and you can have as many faucets as you want. Replicates can be grouped into sets, which make it easier for users to control multiple replicates. Also, you can use templates to help you quickly create replicates for multiple tables. All this work is done through the server utility cdr.

After a replicate has been created and started, how does the data actually get moved? Well, first of all, the replication happens asynchronously. This means there may be a delay between data being committed on one node and it appears on another node. And only committed data, of course, is replicated. ER works by reading the logical logs, testing if a log record needs to be replicated, and finally sending that information to the appropriate participants.