Data Replication with SQL Remote
PART 2. Replication Design for SQL Remote
CHAPTER 6. Principles of SQL Remote Design
SQL Remote replication is based on the transaction log, enabling it to replicate only changes to data, rather than all data, in each update. When we say that SQL Remote replicates data, we really mean that SQL Remote replicates SQL statements that modify data.
SQL Remote replicates only statements in committed transactions, to ensure proper transaction atomicity throughout the replication setup and maintain a consistency among the databases involved in the replication, albeit with some time lag while the data is replicated.
When an UPDATE or a DELETE is replicated, SQL Remote uses the primary key columns to uniquely identify the row being updated or deleted. All tables being replicated should have a declared primary key or uniqueness constraint. A unique index is not sufficient. The columns of the primary key are used in the WHERE clause of replicated updates and deletes. If a table has no primary key, the WHERE clause refers to all columns in the table.
When a simple INSERT statement is entered at one database, it is sent to other databases in the SQL Remote setup as an INSERT statement. However, not all statements are replicated exactly as they are entered by the client application. This section describes how SQL Remote replicates SQL statements. It is important to understand this material if you are to design a robust SQL Remote installation.
The Message Agent is the component that carries out the replication of statements.
INSERT and DELETE statements are the simplest replication case. SQL Remote takes each INSERT or DELETE operation from the transaction log, and sends it to all sites that subscribe to the row being inserted or deleted.
If only a subset of the columns in the table is subscribed to, the INSERT statements sent to subscribers contains only those columns.
The Message Agent ensures that statements are not replicated to the user that initially entered them.
UPDATE statements are not replicated exactly as the client application enters them. This section describes two ways in which the replicated UPDATE statement may differ from the entered UPDATE statement.
If an UPDATE statement has the effect of removing a row from a given remote user's subscription, it is sent to that user as a DELETE statement. If an UPDATE statement has the effect of adding a row to a given remote user's subscription, it is sent to that user as an INSERT statement.
The figure illustrates a publication, where each subscriber subscribes by their name:
An UPDATE that changes the Rep value of a row from Marc to Ann is replicated to Marc as a DELETE statement, and to Ann as an INSERT statement.
This reassignment of rows among subscribers is sometimes called territory realignment, because it is a common feature of sales force automation applications, where customers are periodically reassigned among representatives.
An UPDATE statement changes the value of one or more rows from some existing value to a new value. The rows altered depend on the WHERE clause of the UPDATE statement.
When SQL Remote replicates an UPDATE statement, it does so as a set of single-row updates. These single-row statements can fail for one of the following reasons:
The row to be updated does not exist Each row is identified by its primary key values, and if a primary key has been altered by some other user, the row to be updated is not found.
In this case, the UPDATE does not update anything.
The row to be updated differs in one or more of its columns If one of the values expected to be present has been changed by some other user, an update conflict occurs.
At remote databases, the update takes place regardless of the values in the row.
At the consolidated database, SQL Remote allows conflict resolution operations to take place. Conflict resolution operations are held in a trigger or stored procedure, and run automatically when a conflict is detected.
In Adaptive Server Anywhere, the conflict resolution trigger runs before the update, and the update .proceeds when the trigger is finished. In Adaptive Server Enterprise, the conflict resolution procedure runs after the update has been applied.
A table without a primary key or uniqueness constraint refers to all columns in the WHERE clause of replicated updates When two users update the same row, replicated updates will not update anything and databases will become inconsistent. All replicated tables should have a primary key or uniqueness constraint and the columns in the constraint should never be updated.
Any replication system is faced with a choice between two options when replicating a stored procedure call:
Replicate the procedure call A corresponding procedure is executed at the replicate site, or
Replicate the procedure actions The individual actions (INSERTs, UPDATEs, DELETEs and so on) of the procedure are replicated.
SQL Remote replicates procedures by replicating the actions of a procedure. The procedure call is not replicated.
Trigger replication in SQL Remote is different for the Adaptive Server Enterprise Message Agent and the Adaptive Server Anywhere Message Agent.
From Adaptive Server Enterprise, trigger actions are replicated. For this reason, care must be taken in the remote Adaptive Server Anywhere databases to be sure that triggers are not fired when operations are being applied by the Message Agent, or they are written so that the replicated trigger actions from the Adaptive Server Enterprise server do not cause a problem.
The FIRE_TRIGGERS Adaptive Server Anywhere database option prevents triggers from being fired. You can set this option for the user ID used by the Message Agent, but be careful to not use this user ID for other purposes. Alternatively, you can use CURRENT REMOTE USER in your triggers make some trigger code not execute when it is NULL when operations are being applied by the Message Agent.
By default, the Message Agent for Adaptive Server Anywhere does not replicate actions performed by triggers; it is assumed that the trigger is defined remotely. This avoids permissions issues and the possibility of each action occurring twice. There are some exceptions to this rule:
Conflict resolution trigger actions The actions carried out by conflict resolution, or RESOLVE UPDATE, triggers are replicated from a consolidated database to all remote databases, including the one that sent the message causing the conflict.
Replication of BEFORE triggers Some BEFORE triggers can produce undesirable results when using SQL Remote, and so BEFORE trigger actions that modify the row being updated are replicated, before UPDATE actions.
You must be aware of this behavior when designing your installation. For example, a BEFORE UPDATE that bumps a counter column in the row to keep track of the number of times a row is updated would double count if replicated, as the BEFORE UPDATE trigger will fire when the UPDATE is replicated. To prevent this problem, you must ensure that, at the subscriber database, the trigger is not present or does not carry out the replicated action. Also, a BEFORE UPDATE that sets a column to the time of the last update will get the time the UPDATE is replicated as well.
The Adaptive Server Anywhere Message Agent has a command-line switch that causes it to replicate all trigger actions when sending messages. This is the dbremote -t switch.
If you use this switch, you must ensure that the trigger actions are not carried out twice at remote databases, once by the trigger being fired at the remote site, and once by the explicit application of the replicated actions from the consolidated database.
To ensure that trigger actions are not carried out twice, you can wrap an IF CURRENT REMOTE USER IS NULL ... END IF statement around the body of the triggers or you can set the Adaptive Server Anywhere Fire_triggers option to OFF for the Message Agent user ID.
Data definition statements (CREATE, ALTER, DROP, and others that modify database objects) are not replicated by SQL Remote unless they are entered while in passthrough mode.
For information about passthrough mode for Adaptive Server Anywhere, see Using passthrough mode.
Blobs are LONG VARCHAR, LONG BINARY, TEXT, and IMAGE data types: values that are longer than 256 characters.
SQL Remote includes a special method for replicating blobs between Adaptive Server Anywhere databases.
The Message Agent uses a variable in place of the value in the INSERT or UPDATE statement that is being replicated. The value of the variable is built up by a sequence of statements of the form
SET vble = vble || 'more_stuff'
This makes the size of the SQL statements involving long values smaller, so that they fit within a single message. The SET statements are separate SQL statements, so that the blob is effectively split over several SQL Remote messages.
Some blobs can be replicated in SQL Remote installations including an Adaptive Server Enterprise, but there are limitations on the size of object that can be replicated. The objects being replicated must fit into half the maximum size of a single message.
Ensure that all Message Agents in the system (both dbremote and ssremote) are running with a maximum message size greater than twice the size of the maximum blob size. You can configure the maximum message size using the -l command-line option.
If the maximum blob size is 100 Kb, run the Message Agents with -l 220k.
For information on Message Agent command lines, see The Message Agent.
Set the BLOB_THRESHOLD database option to a value larger the largest blob.
For example, with a maximum blob size of 100Kb, you could set BLOB_THRESHOLD to 110k. If you have SQL Anywhere 5.5.04 or earlier in your system, it will complain about BLOB_THRESHOLD being an unknown option: you can ignore this error.
For information about setting options, see SQL Remote options.
Sybase Open Client CTLIB applications that manipulate the CS_IODESC structure must not set the log_on_update member to FALSE.
The Message Agent for Adaptive Server Anywhere may be slow when applying the messages with large blobs.
The Verify_threshold database option can prevent long values from being verified (in the VERIFY clause of a replicated UPDATE). The default value for the option is 1000. If the data type of a column is longer than the threshold, old values for the column are not verified when an UPDATE is replicated. This keeps the size of SQL Remote messages down, but has the disadvantage that conflicting updates of long values are not detected.
There is a technique allowing detection of conflicts when Verify_threshold is being used to reduce the size of messages. Whenever a "blob" is updated, a last_modified column in the same table should also be updated. Conflicts can then be detected because the old value of the last_modified column is verified.
Repeated updates to a blob should be done in a "work" table, and the final version should be assigned to the replicated table. For example, if a document in progress is updated 20 times throughout the day and the Message Agent is run once at the end of the day, all 20 updates are replicated. If the document is 200Kb in length, this causes 4Mb of messages to be sent.
The better solution is to have a document_in_progress table. When the user is done revising a document, the application moves it from the document_in_progress table to the replicated table. The results in a single update (200Kb of messages).
The Adaptive Server Anywhere BLOB_THRESHOLD option allows further control over the replication of long values. Any value longer than the BLOB_THRESHOLD option is replicated as a blob. That is, it is broken into pieces and replicated in chunks, before being reconstituted by using a SQL variable and concatenating the pieces at the recipient site.
By setting BLOB_THRESHOLD to a high value in remote Adaptive Server Anywhere databases, blobs are not broken into pieces, and operations can be applied to Adaptive Server Enterprise by the Message Agent. Each SQL statement must fit within a message, so this only allows replication of small blobs.