Friday, May 29, 2009

SQL_Latin1_General_Cp850_BIN

While working on Auto-Moderator for PostJobFree.com I encountered the following problem: my C# code considered ‘3’ and ‘3’ as different words, but SQL Server considered them the same.
That expressed itself in the following error:
Cannot insert duplicate key row in object 'dbo.Word' with unique index 'IX_Word'.

That was a little bit surprising, considering that I defined Word as Unicode column (nvarchar).

While searching for the solution, my first thought was to make 'IX_Word' index not unique. That worked, but would have introduced other problems with spam filtering business logic down the road.

The solution should have been about making SQL Server to compare strings exactly the same way C# code does.

I started to look into SQL Server Collations, and finally found the solution: use SQL_Latin1_General_Cp850_BIN collation.

Basically the solution is about to declaring 'Word' column with SQL_Latin1_General_Cp850_BIN collation:
Create Table Word(
WordId bigint identity(1,1) not null,
Word nvarchar(50) COLLATE SQL_Latin1_General_Cp850_BIN not null,
JobPostCount int not null DEFAULT 0,
JobLogSpamCount int not null DEFAULT 0,
CreateDate datetime not null,
UpdateDate datetime not null,
CONSTRAINT PK_Word Primary Key Clustered
(
WordId ASC
)
)
GO

Create Unique Index IX_Word ON Word
(
Word
)
GO

Possible drawbacks of the solution: using SQL_Latin1_General_Cp850_BIN collation may cause weird sorting in SQL queries, but sorting collation can be easily redefined like this:
select * from Word
order by Word COLLATE SQL_Latin1_General_CP1_CI_AS


Moreover, the sorting it provided by binary collation (SQL_Latin1_General_Cp850_BIN) looks quite reasonable.

You may also use SQL_Latin1_General_Cp850_BIN2 collation for better sorting.


Here’s SQL sample to you to play with:
--drop table t;
select N'3' as Word
into t;

insert into t
select '3' as Word;

select * from t
where Word = N'3';

select * from t
where Word = N'3'

select * from t
where Word = N'3' collate SQL_Latin1_General_Cp850_BIN;

select * from t
where Word = N'3' collate SQL_Latin1_General_Cp850_BIN

Enjoy!

Followers

About Me

My photo
Email me: blog@postjobfree.com