
Signature
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab
> To the best of my knowledge NVARCHAR in SQL Server is UCS-2 (double byte
> Unicode). In DB2 that would match GRAPHIC in a Unicode database.
> If you have a lot of NVARCHAR flying around you may want to consider
> just using a Unicode database. Your VARCHAR columns will then be UTF-8 and
> GRAPHIC UCS-2.
That is interesting. So, if the database's default character set is
Unicode (UTF-8), then the SQL Server NVARCHAR would just map to a
VARCHAR in DB2. (I take it the same is true for the other Nxyz data types
too.) That makes sense and simplifies things a lot.
Thanks a lot!
Serge Rielau - 10 Apr 2005 14:28 GMT
>>To the best of my knowledge NVARCHAR in SQL Server is UCS-2 (double
>
[quoted text clipped - 11 lines]
>
> Thanks a lot!
Yes and no. It is correct that UTF-8 and UCS-2 have the same expressive
power w.r.t. codepoints.
Things get interesting when you do SUBSTR() or LENGTH().
In UCS-2 things are easy (I simplify a tiny bit here by not considering
"combining characters") since 2 bytes match 1 character - always. DB2
knows that and SUBSTR(graphiccol, 3, 5) will truly give you the 5
characters starting with the third.
In UTF-8 things get messy. Both SUBSTR() and LENGTH() (as well as other
string operations) use bytes as their unit for CHAR. So SUBSTR(utf8col,
3, 5) returns 5 bytes, which can be anywhere from 2 to 5 characters.
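To make the difference concrete, here is a rough Python sketch that imitates the two counting rules outside of DB2. It is only an illustration under assumptions: the helper names substr_utf8_bytes and substr_ucs2 are made up for this example, and "utf-16-be" stands in for UCS-2, which is equivalent only as long as the text has no characters outside the Basic Multilingual Plane.

```python
# Sketch only: imitates byte-based SUBSTR on a UTF-8 column versus
# double-byte-based SUBSTR on a UCS-2 (GRAPHIC) column.
# Helper names are hypothetical; this is not DB2 code.

def substr_utf8_bytes(text: str, start: int, length: int) -> bytes:
    """1-based SUBSTR that counts in bytes over the UTF-8 encoding."""
    raw = text.encode("utf-8")
    return raw[start - 1 : start - 1 + length]

def substr_ucs2(text: str, start: int, length: int) -> str:
    """1-based SUBSTR that counts in 2-byte units (assumes BMP-only text)."""
    raw = text.encode("utf-16-be")
    chunk = raw[2 * (start - 1) : 2 * (start - 1 + length)]
    return chunk.decode("utf-16-be")

# "naive cafe" with i-diaeresis and e-acute: 10 characters, 12 UTF-8 bytes.
sample = "na\u00efve caf\u00e9"

ucs2_result = substr_ucs2(sample, 3, 5)        # 5 characters
utf8_result = substr_utf8_bytes(sample, 3, 5)  # 5 bytes
utf8_as_text = utf8_result.decode("utf-8")     # only 4 characters

print(len(ucs2_result), len(utf8_as_text))
```

With this sample the 2-byte version returns five characters, while the byte-based version returns five bytes that hold only four characters. A different start position could also land in the middle of a multi-byte character, in which case the returned bytes would not decode cleanly at all.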
So if you don't do much in the way of string manipulation (other than
concat, which is harmless) then UTF-8 will be good (space efficient). If
you do string manipulation I recommend GRAPHIC (at the cost of space).
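As a rough feel for the space trade-off, the following Python sketch compares encoded sizes. It assumes "utf-16-be" as a stand-in for UCS-2 (valid for BMP-only text), the helper name storage_bytes is hypothetical, and it ignores any per-column overhead the database adds.

```python
# Sketch only: compares raw encoded sizes, not actual DB2 storage.

def storage_bytes(text: str) -> dict:
    """Return the encoded size of text in UTF-8 and in 2-byte units."""
    return {
        "utf8": len(text.encode("utf-8")),
        "ucs2": len(text.encode("utf-16-be")),  # assumes BMP-only text
    }

ascii_sizes = storage_bytes("DB2 SQL Compiler")   # 16 ASCII characters
cjk_sizes = storage_bytes("\u65e5\u672c\u8a9e")   # 3 CJK characters

print(ascii_sizes, cjk_sizes)
```

For mostly-ASCII data UTF-8 needs half the bytes of the 2-byte encoding. For the CJK sample it is the other way around (3 bytes per character versus 2), so how much space UTF-8 saves depends on the data.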
Hope that helps.
Cheers
Serge
PS: In a future version of DB2 character based string manipulation will
be provided. But this is the lay of the land as it is right now.
