Ensuring unique primary keys

Data Replication with SQL Remote
PART 2. Replication Design for SQL Remote
CHAPTER 8. SQL Remote Design for Adaptive Server Enterprise

Ensuring unique primary keys

Users at physically distinct sites can each INSERT new rows to a table, so there is an obvious problem ensuring that primary key values are kept unique.

If two users INSERT a row using the same primary key values, the second INSERT to reach a given database in the replication system will fail. As SQL Remote is a replication system for occasionally-connected users, there can be no locking mechanism across all databases in the installation. It is necessary to design your SQL Remote installation so that primary key errors do not occur.

For primary key errors to be designed out of SQL Remote installations; the primary keys of tables that may be modified at more than one site must be guaranteed unique. There are several ways of achieving this goal. This chapter describes a general, economical and reliable method that uses a pool of primary key values for each site in the installation.

Overview of primary key pools

The primary key pool is a table that holds a set of primary key values for each database in the SQL Remote installation. Each remote user receives their own set of primary key values. When a remote user inserts a new row into a table, they use a stored procedure to select a valid primary key from the pool. The pool is maintained by periodically running a procedure at the consolidated database that replenishes the supply.

The method is described using a simple example database consisting of sales representatives and their customers. The tables are much simpler than you would use in a real database; this allows us to focus just on those issues important for replication.

The primary key pool

The pool of primary keys is held in a separate table. The following CREATE TABLE statement creates a primary key pool table:

CREATE TABLE KeyPool (
   table_name VARCHAR(40) NOT NULL,
   value INTEGER NOT NULL,
   location VARCHAR(6) NOT NULL,
   PRIMARY KEY (table_name, value),
)
go

The columns of this table have the following meanings:

Column

Description

table_name

Holds the names of tables for which primary key pools must be maintained. In our simple example, if new sales representatives were to be added only at the consolidated database, only the Customer table needs a primary key pool and this column is redundant. It is included to show a general solution.

value

Holds a list of primary key values. Each value is unique for each table listed in table_name.

location

In some setups, this could be the same as the rep_key value of the SalesRep table. In other setups, there will be users other than sales representatives and the two identifiers should be distinct.

Column	Description
table_name	Holds the names of tables for which primary key pools must be maintained. In our simple example, if new sales representatives were to be added only at the consolidated database, only the Customer table needs a primary key pool and this column is redundant. It is included to show a general solution.
value	Holds a list of primary key values. Each value is unique for each table listed in table_name.
location	In some setups, this could be the same as the rep_key value of the SalesRep table. In other setups, there will be users other than sales representatives and the two identifiers should be distinct.

For performance reasons, you may wish to create an index on the table:

CREATE INDEX KeyPoolLocation 
ON KeyPool (table_name, location, value)
go

Replicating the primary key pool

You can either incorporate the key pool into an existing publication, or share it as a separate publication. In this example, we create a separate publication for the primary key pool.

To replicate the primary key pool:

Create a publication for the primary key pool data.

sp_create_publication 'KeyPoolData'
go
sp_add_remote_table 'KeyPool'
go
sp_add_article 'KeyPoolData', 'KeyPool', 
   NULL, 'location'
go

Create subscriptions for each remote database to the KeyPoolData publication.
```
sp_subscription 'create', 
   KeyPoolData,
   field_user, 
   rep1
go
```
The subscription argument is the location identifier.

In some circumstances it makes sense to add the KeyPool table to an existing publication and use the same argument to subscribe to each publication. Here we keep the location and rep_key values distinct to provide a more general solution.

Filling and replenishing the key pool

Every time a user adds a new customer, their pool of available primary keys is depleted by one. The primary key pool table needs to be periodically replenished at the consolidated database using a procedure such as the following:

CREATE PROCEDURE ReplenishPool AS
BEGIN
   DECLARE @CurrTable  VARCHAR(40)
   DECLARE @MaxValue   INTEGER
   DECLARE EachTable   CURSOR FOR 
            SELECT table_name, max(value) 
            FROM KeyPool
            GROUP BY table_name
   DECLARE @CurrLoc    VARCHAR(6)
   DECLARE @NumValues  INTEGER
   DECLARE EachLoc       CURSOR FOR 
            SELECT location, count(*)
            FROM KeyPool 
            WHERE table_name = @CurrTable
            GROUP BY location
   OPEN EachTable
   WHILE 1=1 BEGIN
      FETCH EachTable INTO @CurrTable, @MaxValue
      IF @@sqlstatus != 0 BREAK
      OPEN EachLoc
      WHILE 1=1 BEGIN
         FETCH EachLoc INTO @CurrLoc, @NumValues
         IF @@sqlstatus != 0 BREAK
         -- make sure there are 10 values
         WHILE @NumValues < 10 BEGIN
            SELECT @MaxValue = @MaxValue + 1
            SELECT @NumValues = @NumValues + 1
            INSERT INTO KeyPool 
                   (table_name, location, value)
            VALUES (@CurrTable, @CurrLoc, @MaxValue)
         END
      END
      CLOSE EachLoc
   END
   CLOSE EachTable
END
go

This procedure fills the pool for each user up to ten values. You may wish to use a larger value in a production environment. The value you need depends on how often users are inserting rows into the tables in the database.

The ReplenishPool procedure must be run periodically at the consolidated database to refill the pool of primary key values in the KeyPool table.

The ReplenishPool procedure requires at least one primary key value to exist for each subscriber, so that it can find the maximum value and add one to generate the next set. To initially fill the pool you can insert a single value for each user, and then call ReplenishPool to fill up the rest. The following example illustrates this for three remote users and a single consolidated user named Office:

INSERT INTO KeyPool VALUES( 'Customer', 40, 'rep1' )
INSERT INTO KeyPool VALUES( 'Customer', 41, 'rep2' )
INSERT INTO KeyPool VALUES( 'Customer', 42, 'rep3' ) 
INSERT INTO KeyPool VALUES( 'Customer', 43, 'Office')
EXEC ReplenishPool
go

Cannot use a trigger to replenish the key pool
You cannot use a trigger to replenish the key pool, as no actions are replicated to the remote database performing the original operation, including trigger actions.

Adding new customers

When a sales representative wants to add a new customer to the Customer table, the primary key value to be inserted is obtained using a stored procedure. This example shows a stored procedure to supply the primary key value, and also illustrates a stored procedure to carry out the INSERT.

The procedures takes advantage of the fact that the Sales Rep identifier is the CURRENT PUBLISHER of the remote database.

NewKey procedure The NewKey procedure supplies an integer value from the key pool and deletes the value from the pool.

CREATE PROCEDURE NewKey 
         @TableName VARCHAR(40),
         @Location VARCHAR(6),
         @Value INTEGER OUTPUT AS  
   BEGIN
      DECLARE @NumValues INTEGER
      SELECT @NumValues = count(*), 
             @Value = min(value)
      FROM KeyPool 
      WHERE table_name = @TableName 
      AND location = @Location
      IF @NumValues > 1 
         DELETE FROM KeyPool 
         WHERE table_name = @TableName 
         AND value = @Value
      ELSE
         -- Never take the last value, 
         -- because RestorePool will not work.
         -- The key pool should be kept large 
         -- enough so this never happens.
         SELECT @Value = NULL
   END

NewCustomer procedure The NewCustomer procedure inserts a new customer into the table, using the value obtained by NewKey to construct the primary key.

CREATE PROCEDURE NewCustomer @name VARCHAR(40), 
         @loc VARCHAR(6) AS
BEGIN
   DECLARE @cust  INTEGER
   DECLARE @cust_key  VARCHAR(12)
   
   EXEC NewKey 'Customer', @loc, @cust output
   SELECT @cust_key = 'cust' + 
      convert( VARCHAR(12), @cust )
   INSERT INTO Customer (cust_key, name, rep_key ) 
   VALUES ( @cust_key, @name, @loc )
END

You may want to enhance this procedure by testing the @cust value obtained from NewKey to check that it is not NULL, and preventing the insert if it is NULL.

Testing the key pool

To test the primary key pool:

Re-extract a remote database using the field_user user ID.
Try this sample INSERT at the remote and consolidated sites:
```
EXEC NewCustomer 'Great White North', rep1
```

Primary key pool summary

The primary key pool technique requires the following components:

Key pool table A table to hold valid primary key values for each database in the installation.
Replenishment procedure A stored procedure keeps the key pool table filled.
Sharing of key pools Each database in the installation must subscribe to its own set of valid values from the key pool table.
Data entry procedures New rows are entered using a stored procedure that picks the next valid primary key value from the pool and delete that value from the key pool.