Azure SQL DW Identity Column Bugs

Azure SQL DW Identity Column Bugs

Those of you who are seasoned at implementing SQL Server have probably used the identity column feature. It is a great tool for generating surrogate keys in your tables without needing to include logic in your code. With any new feature, comes the good and the bad. The good news if you looking at Azure SQL Data Warehouse (SQL DW) for your next implementation is that Microsoft introduced the identity column feature last year. Even better news is that you can use it the same way as its on-premise SQL Server counterpart. The bad news is that you may encounter some nasty data bugs if you are not careful. This blog addresses two separate (but related) bugs that exist with identity columns and ways to work around them.

Issue 1: Identity Value Reuse After Copying Table

The first area to be careful with is copying a table with an identity column.  In most cases, the preferred method for copying a table (whether it is to rename / add columns in the middle of a table or to redistribute a table) is to write a query such as CREATE TABLE dbo.copy_table AS SELECT * FROM dbo.original_table, as suggested by Microsoft.  However, the following example demonstrates how this can result in duplicate identity value generation.

Consider the case where we create a source table with an identity column and >200,000 records:

CREATE TABLE dbo.test_identity
(
	dim_key int identity(1,1) NOT NULL,
	int_value int NOT NULL,
	string_value varchar(50) NOT NULL
)
WITH
(
	DISTRIBUTION = ROUND_ROBIN,
	HEAP
);

INSERT INTO dbo.test_identity (int_value, string_value)
SELECT 1, 'xxx'
  FROM sys.all_objects ao1
 CROSS JOIN sys.all_objects ao2;

INSERT INTO dbo.test_identity (int_value, string_value)
SELECT int_value, string_value FROM dbo.test_identity;

INSERT INTO dbo.test_identity (int_value, string_value)
SELECT int_value, string_value FROM dbo.test_identity;

We will then verify that this table has no duplicate identity column values:

-- This returns 0 records
SELECT dim_key, COUNT(1) FROM dbo.test_identity GROUP BY dim_key HAVING COUNT(1) > 1 ORDER BY 1 ASC;

Next, we will create a copy of this table:

CREATE TABLE dbo.test_identity_2
WITH
(
	DISTRIBUTION = REPLICATE,
	HEAP
)
AS
SELECT *
  FROM dbo.test_identity;

We will now verify that there are no duplicate values in the new table (as expected):

-- This returns 0 records
SELECT dim_key, COUNT(1) FROM dbo.test_identity_2 GROUP BY dim_key HAVING COUNT(1) > 1 ORDER BY 1 ASC;

Next, we will perform an insert to double the amount of records in the new table:

INSERT INTO dbo.test_identity_2 (int_value, string_value)
SELECT int_value, string_value FROM dbo.test_identity;

As we check for duplicates again, we will now see duplicate keys assigned:

-- This brings back many records
SELECT dim_key, COUNT(1) FROM dbo.test_identity_2 GROUP BY dim_key HAVING COUNT(1) > 1 ORDER BY 1 ASC;

The issue happening here is that SQL DW may seed the identity column incorrectly after copying a table or performing IDENTITY_INSERT.  You may think adding a primary key constraint may help detect when this happens. However, Azure SQL DW does not support any form of unique constraints in its current form (see the current list of unsupported features here). To work around this problem, make sure you reset the identity seed on your new table. Returning to our example, we will create one more copy of the original table and reseed the copied table immediately after.

CREATE TABLE dbo.test_identity_3
WITH
(
	DISTRIBUTION = REPLICATE,
	HEAP
)
AS
SELECT *
  FROM dbo.test_identity;

DECLARE @maxValue int;
SELECT @maxValue = MAX(dim_key) + 1 FROM dbo.test_identity_3;
DBCC CHECKIDENT('dbo.test_identity_3', RESEED, @maxValue);

We will again insert records to double the size of this table. This time, we will see no duplicate keys in our final table:

INSERT INTO dbo.test_identity_3 (int_value, string_value)
SELECT int_value, string_value FROM dbo.test_identity;

-- This brings back 0 records
SELECT dim_key, COUNT(1) FROM dbo.test_identity_3 GROUP BY dim_key HAVING COUNT(1) > 1 ORDER BY 1 ASC;

Issue 2: Copying Replicated Table with Identity Column Changes Generated Values

The second issue only occurs when copying a replicated table with an identity column to another replicated table. This one is a bit more esoteric in that the following conditions need to be true for this bug to occur:

  1. The source table is using DISTRIBUTION = REPLICATE.
  2. The source table had its identity column reseeded at some point.
  3. The destination table is using DISTRIBUTION = REPLICATE.

In this scenario, using the normal approach of CREATE TABLE dbo.copy_table AS SELECT * FROM dbo.original_table can cause existing identity values to be reassigned in the target table. The following script demonstrates this issue:

CREATE TABLE dbo.test_identity
(
	dim_key int identity(1,1) NOT NULL,
	int_value int NOT NULL,
	string_value varchar(50) NOT NULL
)
WITH
(
	DISTRIBUTION = REPLICATE,
	HEAP
);


INSERT INTO dbo.test_identity (int_value, string_value)
SELECT 3, 'aaa'
UNION ALL
SELECT 5, 'bbb'
UNION ALL
SELECT 7, 'ccc'
UNION ALL
SELECT 9, 'ddd'
UNION ALL
SELECT 11, 'eee';

SELECT * FROM dbo.test_identity ORDER BY 1 ASC;
GO


DECLARE @maxValue int;
SELECT @maxValue = MAX(dim_key) + 1 FROM dbo.test_identity;
DBCC CHECKIDENT('dbo.test_identity', RESEED, @maxValue);
GO


CREATE TABLE dbo.test_identity_2
WITH
(
	DISTRIBUTION = REPLICATE,
	HEAP
)
AS
SELECT *
  FROM dbo.test_identity;
GO


SELECT * FROM dbo.test_identity_2 ORDER BY 1 ASC;
GO

When you run this script, you will see that the original table and the new table have completely different identity values assigned. To see how esoteric this issue is, if you comment out the reseeding part of the script, you will see that this copy works as expected (values are the same). To work around this problem, create an empty table first, then perform identity inserts on that table (while also running a reseed just to be safe):

CREATE TABLE dbo.test_identity_3
(
	dim_key int identity(1,1) NOT NULL,
	int_value int NOT NULL,
	string_value varchar(50) NOT NULL
)
WITH
(
	DISTRIBUTION = REPLICATE,
	HEAP
);
GO


SET IDENTITY_INSERT dbo.test_identity_3 ON;

INSERT INTO dbo.test_identity_3 (dim_key, int_value, string_value)
SELECT dim_key, int_value, string_value FROM dbo.test_identity;

SET IDENTITY_INSERT dbo.test_identity_3 OFF;

DECLARE @maxValue int;
SELECT @maxValue = MAX(dim_key) + 1 FROM dbo.test_identity_3;
DBCC CHECKIDENT('dbo.test_identity_3', RESEED, @maxValue);
GO


SELECT * FROM dbo.test_identity_3 ORDER BY 1 ASC;
GO

You will now see that the values in the new table line up with the values in the original table.

The Bottom Line:

Identity columns can do some very unexpected things in Azure SQL DW when copying tables. You can get duplicate key values or have all your identity values unexpectedly reassigned in the new table. Errors like this can possibly be very difficult to recover from after the fact.

To proactively avoid issues, ensure you do the following when copying a table with an identity column using CTAS (CREATE TABLE AS) syntax:

  • Always reseed the new table before inserting any new records. This will prevent duplicate identity values from being assigned when inserting new records.  Reseeding identity columns is as simple as running the following script on your new table. All you need to do is substitute in the correct column and table name.
DECLARE @maxValue int;
SELECT @maxValue = MAX(dim_key) + 1 FROM dbo.new_table;
DBCC CHECKIDENT('dbo.new_table', RESEED, @maxValue);
  • If copying a replicated table to another replicated table, create the new table first, then perform identity inserts.  This is a slower operation, but is much safer.  After doing this, refer to the previous bullet point to reseed the new table.

If you follow these steps, you can be comfortable that you have worked around the identity column bugs.  (It should also go without saying – verify this works on your Development environment first!)

Phone: 312-602-4000
Email: marketing@westmonroepartners.com
222 W. Adams
Chicago, IL 60606
Show Buttons
Share On Facebook
Share On Twitter
Share on LinkedIn
Hide Buttons