Database Forum / DB2 Topics / May 2006

Storing some Japanese data.

tony.pahl@gmail.com - 09 May 2006 21:29 GMT
We are converting a data warehouse to a Unicode database to get ready
for multilingual support.  If 95% of our data stays in English, as it
is now, and less than 5% is in other languages, including Japanese, it
appears we would be best off using code page 1208 (UTF-8).  We are
thinking we would need to expand our 'char' and 'varchar' datatypes by
four times to accommodate the Japanese data.  Using 'varchar' should
minimize the database size required to store our data, right?  We would
be minimizing the storage required for single-byte English characters,
which are the majority of our data.  Can anyone validate or shed some
further light on this?
Thanks, Tony
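
As a rough check on the sizing question above, the sketch below counts how many bytes UTF-8 needs per character. The sample strings and the helper names (utf8_size, worst_case_bytes) are made up for illustration and have nothing to do with DB2 itself. It only shows the encoding arithmetic: ASCII takes 1 byte per character, common Japanese characters take 3, and 4 is the UTF-8 maximum, reached only by characters outside the Basic Multilingual Plane.

```python
def utf8_size(text):
    """Number of bytes needed to store text in UTF-8."""
    return len(text.encode("utf-8"))


def worst_case_bytes(n_chars):
    """Hypothetical sizing rule: 4 bytes per character is the UTF-8 maximum."""
    return 4 * n_chars


english = "warehouse"        # ASCII only: 1 byte per character
japanese = "日本語データ"      # kanji and katakana: 3 bytes per character
rare = "\U00020BB7"          # a supplementary-plane kanji: 4 bytes

for sample in (english, japanese, rare):
    print(len(sample), "characters ->", utf8_size(sample), "bytes")

# Worst-case byte length for a column that must hold 10 characters.
print(worst_case_bytes(10))
```

If the column lengths are counted in bytes, multiplying by four covers the worst case, and a factor of three is enough for most everyday Japanese text. Because 'varchar' only stores the bytes actually used, the English rows would not grow.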
Rhino - 11 May 2006 23:14 GMT
> We are converting a data warehouse to a Unicode database to get ready
> for multilingual support.  If we will have 95% of our data in English
[quoted text clipped - 7 lines]
> our data.  Can anyone validate or shed some further light on this?
> Thanks, Tony

I've never really had anything to do with storing foreign character sets but
I've always understood that Japanese and other ideographic languages are
supposed to use the "graphic" datatypes, namely GRAPHIC, VARGRAPHIC, and
LONG VARGRAPHIC. These are designed for DBCS (Double Byte Character Set)
data and only use twice as much space as English characters, not four times
as much.
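
To put numbers on the "twice versus four times" comparison, here is a small sketch that encodes the same strings as UTF-8 and as UTF-16. It assumes the graphic types in a Unicode database hold 2-byte UCS-2/UTF-16 code units while the character types hold UTF-8; that is my understanding, but please confirm it in the Information Center for your DB2 version. The sample rows and the function name encoded_sizes are invented for the example.

```python
def encoded_sizes(text):
    """Byte counts for text in UTF-8 and in UTF-16 (big-endian, no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),
    }


# Made-up sample with the 95% English / 5% Japanese split from the question.
rows = ["customer"] * 95 + ["顧客"] * 5

totals = {"utf-8": 0, "utf-16": 0}
for row in rows:
    for encoding, size in encoded_sizes(row).items():
        totals[encoding] += size

print(encoded_sizes("customer"))  # English: UTF-16 doubles the size
print(encoded_sizes("顧客"))       # Japanese: UTF-16 is smaller than UTF-8
print(totals)
```

For this sample, the Japanese rows are smaller in UTF-16, but the English rows double, so for mostly English data the UTF-8 total comes out well ahead.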

As I said, I've never really worked with Japanese, Korean, Thai or other
non-Latin languages and I'm not very current on the preferred ways of
handling them. It's quite possible that Unicode is the better way to handle
that sort of data today.

I think you should be able to find more information on the best ways of
handling foreign character sets, like Japanese, in the Information Center
for your version of DB2. The information center for DB2 Version 8 for
Unix/Linux/Windows can be found at
http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp. I searched on
DBCS and came up with lots of hits that discussed DBCS, UTF-8, and other
approaches, including this chart:

Table 50. Japan, territory identifier: JP

Code page  Group  Code set   Territory code  Locale       Operating system
---------  -----  ---------  --------------  -----------  ----------------
932        D-1    IBM-932    81              Ja_JP        AIX
943        D-1    IBM-943    81              Ja_JP        AIX (see note 2)
954        D-1    IBM-eucJP  81              ja_JP        AIX
1208       N-1    UTF-8      81              JA_JP        AIX
930        D-1    IBM-930    81              -            Host
939        D-1    IBM-939    81              -            Host
5026       D-1    IBM-5026   81              -            Host
5035       D-1    IBM-5035   81              -            Host
1390       D-1               81              -            Host
1399       D-1               81              -            Host
954        D-1    eucJP      81              ja_JP.eucJP  HP-UX
5039       D-1    SJIS       81              ja_JP.SJIS   HP-UX
954        D-1    EUC-JP     81              ja_JP        Linux
932        D-1    IBM-932    81              -            OS/2
942        D-1    IBM-942    81              -            OS/2
943        D-1    IBM-943    81              -            OS/2
954        D-1    eucJP      81              ja           SCO
954        D-1    eucJP      81              ja_JP        SCO
954        D-1    eucJP      81              ja_JP.EUC    SCO
954        D-1    eucJP      81              ja_JP.eucJP  SCO
943        D-1    IBM-943    81              ja_JP.PCK    Solaris
954        D-1    eucJP      81              ja           Solaris
954        D-1    eucJP      81              japanese     Solaris
1208       N-1    UTF-8      81              ja_JP.UTF-8  Solaris
943        D-1    IBM-943    81              -            Windows
1394       D-1               81              -            (see note 3)

These are the relevant notes for the table:
 1.
 2. On AIX 4.3 or later the code page is 943. If you are using AIX 4.2 or
    earlier, the code page is 932.
 3. Code page 1394 (Shift JIS X0213) can only be used with the load or
    import utilities to move data from code page 1394 to a DB2 UDB Unicode
    database, or to export from a DB2 UDB Unicode database to code page 1394.
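
To see why the code page in the table matters, the sketch below encodes one Japanese string in three of the encodings it lists. This is only an illustration: Python's "shift_jis" and "euc_jp" codecs are close relatives of the IBM code pages (943/932 and 954), not exact equivalents, and the labels, the CODECS mapping, and the encode_all helper are my own names for the example.

```python
# Labels pair each Python codec with the code page it roughly corresponds to.
CODECS = {
    "Shift_JIS (approximately 943/932)": "shift_jis",
    "EUC-JP (approximately 954)": "euc_jp",
    "UTF-8 (1208)": "utf-8",
}


def encode_all(text):
    """Encode text with every codec in CODECS and return the raw bytes."""
    return {label: text.encode(codec) for label, codec in CODECS.items()}


sample = "日本語"
for label, raw in encode_all(sample).items():
    # Same three characters, different bytes and lengths per encoding.
    print(label, len(raw), "bytes:", raw.hex())

# Reading the bytes back with the wrong encoding does not return the text.
sjis_bytes = sample.encode("shift_jis")
print(sjis_bytes.decode("shift_jis") == sample)
print(sjis_bytes.decode("euc_jp", errors="replace") == sample)
```

The same three characters take 6 bytes in the two double-byte encodings and 9 bytes in UTF-8, and the bytes are not interchangeable, which is why the client code page has to be declared correctly when data is moved into a Unicode database.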

If you search on terms like "DBCS", "Japanese", "Unicode", "UCS-2", and so
forth, you should find the best ways to store Japanese data.

--
Rhino
 