NAME Encode::JIS2K - JIS X 0212 (aka JIS 2000) Encodings INSTALLATION To install this module type the following: perl Makefile.PL make make test make install SYNOPSIS use Encode::JIS2K; use Encode qw/encode decode/; $euc_2k = encode("euc-jisx0213", $utf8); $utf8 = decode("euc-jisx0213", $euc_jp); ABSTRACT This module implements encodings that covers JIS X 0213 charset (AKA JIS 2000, hence the module name). Encodings supported are as follows. Canonical Alias Description -------------------------------------------------------------------- euc-jisx0213 qr/\beuc.*jp[ \-]?(?:2000|2k)$/i EUC-JISX0213 qr/\bjp.*euc[ \-]?(2000|2k)$/i qr/\bujis[ \-]?(?:2000|2k)$/i shiftjisx0123 qr/\bshift.*jis(?:2000|2k)$/i Shift_JISX0213 qr/\bsjisp \-]?(?:2000|2k)$/i iso-2022-jp-3 jis0213-1-raw JIS X 0213 plane 1, raw format jis0213-2-raw JIS X 0213 plane 2, raw format -------------------------------------------------------------------- DESCRIPTION To find out how to use this module in detail, see the Encode manpage. what is JIS X 0213 anyway? Simply put, JIS X 0213 is a rework and reorganization of JIS X 0208 and JIS X 0212. They consist of two 94x94 planes which roughly corrensponds as follows; JIS X 0213 Plane 1 = JIS X 0208 + extension JIS X 0213 Plane 2 = JIS X 0212 reorganized + extension And here is the character repertoire there of at a glance. # of codepoints Kuten Ku (rows) used -------------------------------------------------------- JIS X 0208 6,879 1..8,16..83 JIS X 0213-1 8,762 1..94 (all!) JIS X 0212 6,067 2,6..7,9..11,16..77 JIS X 0213-2 2,436 1,3..5,8,12..15,78..94 ------------------------------------------------------- (JIS X0213 Total) 11,197 JIS X 0213 was designed to extend JIS X 0208 and JIS X 0212 without being imcompatible to (classic) EUC-JP and Shift_JIS. The following characteristics are as a result thereof. o JIS X plane 1 is (almost) a superset of JIS X 0208. However, with Unicode 3.2.0 the mappings differ in 3 codepoints. Kuten JIS X 0208 -> Unicode JIS X 0213 -> Unicode -------------------------------------------------------------- 1-1-17 # FULLWIDTH MACRON # OVERLINE 1-1-29 # EM DASH # HORIZONTAL BAR 1-1-79 # FULLWIDTH YEN SIGN # YEN SIGN o By the same token, JIS X 0213 plane 2 contains JIS Dai-4 Suijun Kanji (JIS Kanji Repertoire Level 4). This allows EUC-JP's G3 to contain both JIS X 0212 and JIS 0213 plane 2. However, JIS X 0212:1990 already contains many of Dai-4 Suijun Kanji so EUC's G3 is subject to containing dupli- cate mappings. o Because of Halfwidth Katakana, Shift_JIS mapping has been tricky and it is even trickier. Here is a regex that matches Shift_JISX0213 sequence (note: you have to "use bytes" to make it work!) $re_valid_shifjisx0213 = qr/^(?: [\x00-\x7f] | # ASCII or [\xa1-\xdf] | # JIS X 0201 KANA or [\x81-\x9f\xe0-\xfc][\x40-\x7e\x80-\xfc] # JIS X 0213 )+$/xo; Note on EUC-JISX0213 (vs. EUC-JP) As of Encode-1.64, 'euc-jp' does support euc-jisx0213 for decoding. However, 'euc-jp' in Encode and 'euc-jisx0213' differ as follows; euc-jp euc-jisx0213 -------------------------------------------------------------- Decodes.... (0201-K|0208|0212|0213) ditto Round-Trip (|0) (020-K|0208|0212) JIS X (0201-K|0213) Decode Only (|3) those only found in 0213 those only found in 0212 -------------------------------------------------------------- AUTHORS Dan Kogai COPYRIGHT Copyright 2002 by Dan Kogai . This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See http://www.perl.com/perl/misc/Artistic.html SEE ALSO the Encode manpage, the Encode::JP manpage Japanese Graphic Character Set for Information Interchange -- Plane 1 http://www.itscj.ipsj.or.jp/ISO-IR/228.pdf Japanese Graphic Character Set for Information Interchange -- Plane 2 http://www.itscj.ipsj.or.jp/ISO-IR/229.pdf