convert_cyr_string()
(PHP 4, PHP 5, PHP 7)
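Convert from one Cyrillic character set to another
Description
convert_cyr_string(string $str, string $from, string $to) : string
Converts the given string from one Cyrillic character set to another and returns the converted string.
Parameters
- $str
The string to be converted.
- $from
A single character representing the source Cyrillic character set.
- $to
A single character representing the target Cyrillic character set.
Supported types are: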
- k - koi8-r
- w - windows-1251
- i - iso8859-5
- a - x-cp866
- d - x-cp866
- m - x-mac-cyrillic
Return Values
Returns the converted string.
Notes
Note: This function is binary-safe.
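convert_cyr_string() is deprecated as of PHP 7.4 and removed in PHP 8.0, so a replacement may be needed on newer versions. The following is a minimal sketch, not an official equivalent: the helper name cyr_convert_sketch() is hypothetical, the charset names passed to iconv() are assumptions that depend on the installed iconv implementation, and the 'm' (x-mac-cyrillic) code is left out because its iconv name varies between implementations.

```php
<?php
// Hypothetical helper (not part of PHP): maps the one-letter codes used by
// convert_cyr_string() to iconv charset names and delegates to iconv().
function cyr_convert_sketch(string $str, string $from, string $to): string
{
    // Assumed iconv names; availability depends on the iconv implementation.
    $charsets = [
        'k' => 'KOI8-R',
        'w' => 'CP1251',
        'i' => 'ISO-8859-5',
        'a' => 'CP866',
        'd' => 'CP866',
    ];
    if (!isset($charsets[$from], $charsets[$to])) {
        throw new InvalidArgumentException('Unsupported charset code');
    }
    $out = iconv($charsets[$from], $charsets[$to], $str);
    if ($out === false) {
        throw new RuntimeException('iconv() could not convert the input');
    }
    return $out;
}

// KOI8-R bytes for "Яблоко", converted to windows-1251 and shown as hex.
echo bin2hex(cyr_convert_sketch("\xF1\xC2\xCC\xCF\xCB\xCF", 'k', 'w')), "\n";
```

The result is printed as hex because the converted bytes are not valid UTF-8 and would not display correctly in a UTF-8 terminal.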
Only this code works for me for converting win-1251 to UTF-8 with Macedonian letters!
// Modified by tapin13
// Corrected by Timuretis
// Corrected by Sote for Macedonian Cyrillic
// Convert win-1251 to utf-8
// (basic Cyrillic letters are emitted as HTML numeric entities)
function unicode_mk_cyr($str) {
    $encode = "";
    for ($ii = 0; $ii < strlen($str); $ii++) {
        $xchr = substr($str, $ii, 1);
        echo "<p>" . ord($xchr) . "</p>\n"; // leftover debug output
        if (ord($xchr) > 191) {
            $xchr = ord($xchr) + 848;
            $xchr = "&#" . $xchr . ";";
        }
        if (ord($xchr) == 129) { $xchr = "Ѓ"; }
        if (ord($xchr) == 163) { $xchr = "Ј"; }
        if (ord($xchr) == 138) { $xchr = "Љ"; }
        if (ord($xchr) == 140) { $xchr = "Њ"; }
        if (ord($xchr) == 143) { $xchr = "Џ"; }
        if (ord($xchr) == 141) { $xchr = "Ќ"; }
        if (ord($xchr) == 189) { $xchr = "Ѕ"; }
        if (ord($xchr) == 188) { $xchr = "ј"; }
        if (ord($xchr) == 131) { $xchr = "ѓ"; }
        if (ord($xchr) == 190) { $xchr = "ѕ"; }
        if (ord($xchr) == 154) { $xchr = "љ"; }
        if (ord($xchr) == 156) { $xchr = "њ"; }
        if (ord($xchr) == 159) { $xchr = "џ"; }
        if (ord($xchr) == 157) { $xchr = "ќ"; }
        $encode = $encode . $xchr;
    }
    return $encode;
}
Here is an improved function to decode win-1251 to UTF-8:
<?php
function win2utf($s) {
    $t = ''; // initialize the output buffer
    for ($i = 0; $i < strlen($s); $i++) {
        $c = ord($s[$i]);
        if ($c >= 192 && $c <= 239) {
            $t .= chr(208) . chr($c - 48);  // А..п
        } elseif ($c > 239) {
            $t .= chr(209) . chr($c - 112); // р..я
        } elseif ($c == 184) {
            $t .= chr(209) . chr(145);      // ё (U+0451)
        } elseif ($c == 168) {
            $t .= chr(208) . chr(129);      // Ё (U+0401)
        } else {
            $t .= $s[$i];                   // everything else is copied as-is
        }
    }
    return $t;
}
?>
To: mihailsbo at lycos dot ru
Transliteration can be done more easily:
<?php
function transliterate($cyrstr) {
    // The Cyrillic literals must be in the same encoding as $cyrstr.
    $ru = array('А', 'а', 'Б', 'б', 'В', 'в', 'Г', 'г', 'Д', 'д', 'Е', 'е',
                'Ё', 'ё', 'Ж', 'ж', 'З', 'з', 'И', 'и', 'Й', 'й', 'К', 'к',
                'Л', 'л', 'М', 'м', 'Н', 'н', 'О', 'о', 'П', 'п', 'Р', 'р',
                'С', 'с', 'Т', 'т', 'У', 'у', 'Ф', 'ф', 'Х', 'х', 'Ц', 'ц',
                'Ч', 'ч', 'Ш', 'ш', 'Щ', 'щ', 'Ъ', 'ъ', 'Ы', 'ы', 'Ь', 'ь',
                'Э', 'э', 'Ю', 'ю', 'Я', 'я');
    $en = array('A', 'a', 'B', 'b', 'V', 'v', 'G', 'g', 'D', 'd', 'E', 'e',
                'E', 'e', 'Zh', 'zh', 'Z', 'z', 'I', 'i', 'J', 'j', 'K', 'k',
                'L', 'l', 'M', 'm', 'N', 'n', 'O', 'o', 'P', 'p', 'R', 'r',
                'S', 's', 'T', 't', 'U', 'u', 'F', 'f', 'H', 'h', 'C', 'c',
                'Ch', 'ch', 'Sh', 'sh', 'Sch', 'sch', '\'', '\'', 'Y', 'y', '\'', '\'',
                'E', 'e', 'Ju', 'ju', 'Ja', 'ja');
    return str_replace($ru, $en, $cyrstr);
}
?>
cathody at mail dot ru (27-Jul-2005 06:41)
Your function doesn't work on my PC. This one works:
function Encode2($str, $type) {
    $conv = array();
    for ($x = 192; $x <= 239; $x++) $conv['u'][chr($x)] = chr(208) . chr($x - 48);
    for ($x = 240; $x <= 255; $x++) $conv['u'][chr($x)] = chr(209) . chr($x - 112);
    $conv['u'][chr(168)] = chr(208) . chr(129); // Ё
    $conv['u'][chr(184)] = chr(209) . chr(145); // ё
    // Note: array_reverse() only reverses the order and does not swap keys
    // and values, so the 'w' direction needs array_flip() instead.
    $conv['w'] = array_reverse($conv['u']);
    if ($type == 'w' || $type == 'u') return strtr($str, $conv[$type]);
    else return $str;
}
Check threed's win3utf() function (see the next note): it is excellent for converting win-1251 to UTF-8 and needs just one fix!
if ($c==184) { $t.=chr(209).chr(145); continue; };
Nothing more is necessary. Thanks to threed [at] koralsoft.com (28-Jul-2003 03:37).
I tried all the functions here to convert from cp1251 to Unicode, but they don't work. I think this one works:
<?php
function win3utf($s) {
    $t = ''; // initialize the output buffer
    for ($i = 0, $m = strlen($s); $i < $m; $i++) {
        $c = ord($s[$i]);
        if ($c <= 127) { $t .= chr($c); continue; }
        if ($c >= 192 && $c <= 207) { $t .= chr(208) . chr($c - 48); continue; }
        if ($c >= 208 && $c <= 239) { $t .= chr(208) . chr($c - 48); continue; }
        if ($c >= 240 && $c <= 255) { $t .= chr(209) . chr($c - 112); continue; }
        // Small io (ё): the follow-up notes point out that the second byte
        // should be 145, not 209.
        if ($c == 184) { $t .= chr(209) . chr(209); continue; }
        if ($c == 168) { $t .= chr(208) . chr(129); continue; }
    }
    return $t;
}
?>
A better function to convert a cp1251 string to UTF-8. Works with Russian and Ukrainian text.
function unicod($str) {
    $conv = array();
    for ($x = 128; $x <= 143; $x++) $conv[$x + 112] = chr(209) . chr($x);
    for ($x = 144; $x <= 191; $x++) $conv[$x + 48] = chr(208) . chr($x);
    $conv[184] = chr(209) . chr(145); # ё
    $conv[168] = chr(208) . chr(129); # Ё
    $conv[179] = chr(209) . chr(150); # і
    $conv[178] = chr(208) . chr(134); # І
    $conv[191] = chr(209) . chr(151); # ї
    $conv[175] = chr(208) . chr(135); # Ї
    $conv[186] = chr(209) . chr(148); # є
    $conv[170] = chr(208) . chr(132); # Є
    $conv[180] = chr(210) . chr(145); # ґ
    $conv[165] = chr(210) . chr(144); # Ґ
    $nstr = ''; // initialize the output buffer
    $ar = str_split($str);
    foreach ($ar as $b) {
        if (isset($conv[ord($b)])) $nstr .= $conv[ord($b)];
        else $nstr .= $b;
    }
    return $nstr;
}
// I've also built a Hebrew-to-UTF-8 converter the same way
function heb2utf($s) {
    $t = '';
    for ($i = 0, $m = strlen($s); $i < $m; $i++) {
        $c = ord($s[$i]);
        if ($c <= 127) { $t .= chr($c); continue; }
        if ($c >= 224) { $t .= chr(215) . chr($c - 80); continue; }
    }
    return $t;
}

// Simple unicoder and decoder for Hebrew and Russian
// (the encoders produce HTML numeric entities):
function unicode_hebrew($str) {
    $encode = '';
    for ($ii = 0; $ii < strlen($str); $ii++) {
        $xchr = substr($str, $ii, 1);
        if (ord($xchr) > 223) {
            $xchr = ord($xchr) + 1264;
            $xchr = "&#" . $xchr . ";";
        }
        $encode = $encode . $xchr;
    }
    return $encode;
}

function unicode_russian($str) {
    $encode = '';
    for ($ii = 0; $ii < strlen($str); $ii++) {
        $xchr = substr($str, $ii, 1);
        if (ord($xchr) > 191) {
            $xchr = ord($xchr) + 848;
            $xchr = "&#" . $xchr . ";";
        }
        $encode = $encode . $xchr;
    }
    return $encode;
}

function decode_unicoded_hebrew($str) {
    $decode = "";
    $ar = explode("&#", $str); // split() was removed in PHP 7
    foreach ($ar as $value) {
        $in1 = strpos($value, ";"); // end of code
        if ($in1 > 0) { // unicode
            $code = substr($value, 0, $in1);
            if ($code >= 1456 and $code <= 1514) { // hebrew
                $code = $code - 1264;
                $xchr = chr($code);
            } else { // other unicode
                $xchr = "&#" . $code . ";";
            }
            $xchr = $xchr . substr($value, $in1 + 1);
        } else { // not unicode
            $xchr = $value;
        }
        $decode = $decode . $xchr;
    }
    return $decode;
}

function decode_unicoded_russian($str) {
    $decode = "";
    $ar = explode("&#", $str); // split() was removed in PHP 7
    foreach ($ar as $value) {
        $in1 = strpos($value, ";"); // end of code
        if ($in1 > 0) { // unicode
            $code = substr($value, 0, $in1);
            if ($code >= 1040 and $code <= 1103) {
                $code = $code - 848;
                $xchr = chr($code);
            } else {
                $xchr = "&#" . $code . ";";
            }
            $xchr = $xchr . substr($value, $in1 + 1);
        } else {
            $xchr = $value;
        }
        $decode = $decode . $xchr;
    }
    return $decode;
}
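The entity-based functions above rely on the fact that win-1251 bytes 192..255 correspond to Unicode code points 1040..1103, an offset of 848. As a hedged illustration, the sketch below builds the entities the same way and then turns them into real UTF-8 with the core function html_entity_decode(). The helper name win1251_to_entities() is hypothetical, and only the basic Russian letters are handled.

```php
<?php
// Hypothetical helper: encodes the basic Russian letters of a win-1251
// string as HTML numeric entities; all other bytes are copied unchanged.
function win1251_to_entities(string $s): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $c = ord($s[$i]);
        // Bytes 0xC0..0xFF are А..я; adding 848 gives the Unicode code point.
        $out .= $c >= 0xC0 ? '&#' . ($c + 848) . ';' : $s[$i];
    }
    return $out;
}

// "Привет" as win-1251 bytes.
$entities = win1251_to_entities("\xCF\xF0\xE8\xE2\xE5\xF2");
echo $entities, "\n";

// html_entity_decode() resolves numeric entities into UTF-8 characters.
echo html_entity_decode($entities, ENT_QUOTES | ENT_HTML5, 'UTF-8'), "\n";
```

Going through entities is only useful when the output is HTML; for plain text, writing the UTF-8 bytes directly avoids the extra decoding step.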
Not all Cyrillic characters are supported by this function. Cyrillic characters from the Macedonian alphabet, such as Sh, kj, dz', and nj, are not supported.
:) What about the numero sign (№)?
function Utf8Win($str, $type = "w") {
    static $conv = '';
    if (!is_array($conv)) {
        $conv = array();
        for ($x = 128; $x <= 143; $x++) {
            $conv['u'][] = chr(209) . chr($x);
            $conv['w'][] = chr($x + 112);
        }
        for ($x = 144; $x <= 191; $x++) {
            $conv['u'][] = chr(208) . chr($x);
            $conv['w'][] = chr($x + 48);
        }
        $conv['u'][] = chr(208) . chr(129); $conv['w'][] = chr(168); // Ё
        $conv['u'][] = chr(209) . chr(145); $conv['w'][] = chr(184); // ё
        $conv['u'][] = chr(208) . chr(135); $conv['w'][] = chr(175); // Ї
        $conv['u'][] = chr(209) . chr(151); $conv['w'][] = chr(191); // ї
        $conv['u'][] = chr(208) . chr(134); $conv['w'][] = chr(178); // І
        $conv['u'][] = chr(209) . chr(150); $conv['w'][] = chr(179); // і
        $conv['u'][] = chr(210) . chr(144); $conv['w'][] = chr(165); // Ґ
        $conv['u'][] = chr(210) . chr(145); $conv['w'][] = chr(180); // ґ
        $conv['u'][] = chr(208) . chr(132); $conv['w'][] = chr(170); // Є
        $conv['u'][] = chr(209) . chr(148); $conv['w'][] = chr(186); // є
        $conv['u'][] = chr(226) . chr(132) . chr(150); $conv['w'][] = chr(185); // №
    }
    if ($type == 'w') {
        return str_replace($conv['u'], $conv['w'], $str);
    } elseif ($type == 'u') {
        // strtr() instead of str_replace(): str_replace() applies the pairs in
        // sequence and would re-replace the 208/209 lead bytes it just inserted.
        return strtr($str, array_combine($conv['w'], $conv['u']));
    } else {
        return $str;
    }
}
Sorry for my previous post. The function to use is array_flip, NOT array_reverse. Corrected function:
function Encode($str, $type = 'u') {
    $conv = array();
    for ($x = 192; $x <= 239; $x++) $conv['u'][chr($x)] = chr(208) . chr($x - 48);
    for ($x = 240; $x <= 255; $x++) $conv['u'][chr($x)] = chr(209) . chr($x - 112);
    $conv['u'][chr(168)] = chr(208) . chr(129); // Ё
    $conv['u'][chr(184)] = chr(209) . chr(145); // ё
    $conv['w'] = array_flip($conv['u']);
    if ($type == 'w' || $type == 'u') return strtr($str, $conv[$type]);
    else return $str;
}
Sorry for my English ;)
threed's function works great, but the replacement for the letter small io (ё) needs to be changed from
<?php if ($c==184) { $t.=chr(209).chr(209); continue; }; ?>
to
<?php if ($c==184) { $t.=chr(209).chr(145); continue; }; ?>
so the final working result should look like this:
<?php
function win3utf($s) {
    $t = ''; // initialize the output buffer
    for ($i = 0, $m = strlen($s); $i < $m; $i++) {
        $c = ord($s[$i]);
        if ($c <= 127) { $t .= chr($c); continue; }
        if ($c >= 192 && $c <= 207) { $t .= chr(208) . chr($c - 48); continue; }
        if ($c >= 208 && $c <= 239) { $t .= chr(208) . chr($c - 48); continue; }
        if ($c >= 240 && $c <= 255) { $t .= chr(209) . chr($c - 112); continue; }
        if ($c == 184) { $t .= chr(209) . chr(145); continue; } // ё
        if ($c == 168) { $t .= chr(208) . chr(129); continue; } // Ё
    }
    return $t;
}
?>
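The constants 48, 112, 208 and 209 in these functions are not arbitrary: win-1251 bytes 0xC0..0xFF map to code points U+0410..U+044F (byte + 0x350), and a two-byte UTF-8 sequence is formed from the upper and lower bits of the code point. The following sketch derives the bytes from the code point instead of hard-coding them. Both function names are hypothetical, and only the basic Russian letters plus Ё/ё are converted.

```php
<?php
// Encode a code point below U+0800 as UTF-8 (one or two bytes).
function utf8_from_codepoint(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    // Lead byte carries the upper bits, continuation byte the lower six.
    return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: converts only the basic Russian letters plus Ё/ё;
// all other bytes are passed through unchanged.
function win1251_to_utf8_sketch(string $s): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $c = ord($s[$i]);
        if ($c >= 0xC0) {
            $out .= utf8_from_codepoint($c + 0x350); // А..я: U+0410..U+044F
        } elseif ($c === 0xA8) {
            $out .= utf8_from_codepoint(0x0401);     // Ё
        } elseif ($c === 0xB8) {
            $out .= utf8_from_codepoint(0x0451);     // ё
        } else {
            $out .= $s[$i];
        }
    }
    return $out;
}

// "А", "я" and "ё" as win-1251 bytes.
echo bin2hex(win1251_to_utf8_sketch("\xC0\xFF\xB8")), "\n"; // d090d18fd191
```

The last pair, d1 91, is chr(209).chr(145), which is why the small io correction above uses 145 as the second byte.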
I made a mistake. Remove this test line:
echo "<p>".ord($xchr)."</p>\n";
The code should look like this:
// Modified by tapin13
// Corrected by Timuretis
// Corrected by Sote for Macedonian Cyrillic
// Convert win-1251 to utf-8
// (basic Cyrillic letters are emitted as HTML numeric entities)
function unicode_mk_cyr($str) {
    $encode = "";
    for ($ii = 0; $ii < strlen($str); $ii++) {
        $xchr = substr($str, $ii, 1);
        if (ord($xchr) > 191) {
            $xchr = ord($xchr) + 848;
            $xchr = "&#" . $xchr . ";";
        }
        if (ord($xchr) == 129) { $xchr = "Ѓ"; }
        if (ord($xchr) == 163) { $xchr = "Ј"; }
        if (ord($xchr) == 138) { $xchr = "Љ"; }
        if (ord($xchr) == 140) { $xchr = "Њ"; }
        if (ord($xchr) == 143) { $xchr = "Џ"; }
        if (ord($xchr) == 141) { $xchr = "Ќ"; }
        if (ord($xchr) == 189) { $xchr = "Ѕ"; }
        if (ord($xchr) == 188) { $xchr = "ј"; }
        if (ord($xchr) == 131) { $xchr = "ѓ"; }
        if (ord($xchr) == 190) { $xchr = "ѕ"; }
        if (ord($xchr) == 154) { $xchr = "љ"; }
        if (ord($xchr) == 156) { $xchr = "њ"; }
        if (ord($xchr) == 159) { $xchr = "џ"; }
        if (ord($xchr) == 157) { $xchr = "ќ"; }
        $encode = $encode . $xchr;
    }
    return $encode;
}
The previous bit of code (grmaxim's win_to_utf8 function) didn't work for me, so I wrote my own function to convert from win-1251 to UTF-8:
<?php
function win2utf($s) {
    $t = ''; // initialize the output buffer
    for ($i = 0, $m = strlen($s); $i < $m; $i++) {
        $c = ord($s[$i]);
        if ($c == 184) {
            $t .= chr(209) . chr(145);      // small io (ё)
        } elseif ($c == 168) {
            $t .= chr(208) . chr(129);      // capital io (Ё)
        } elseif ($c > 239) {
            $t .= chr(209) . chr($c - 112); // р..я
        } elseif ($c >= 192) {
            $t .= chr(208) . chr($c - 48);  // А..п
        } else {
            $t .= $s[$i];                   // everything else is copied as-is
        }
    }
    return $t;
}
?>
Hope this helps
Unfortunately, the input must be a string. But that can be changed! ;) To convert a multi-dimensional array I use this recursive function:
<?php
function convert_cyr_array($array, $from, $to) {
    $result = array(); // so an empty input returns an empty array
    foreach ($array as $key => $value) {
        if (is_array($value)) {
            $result[$key] = convert_cyr_array($value, $from, $to);
            continue;
        }
        $result[$key] = convert_cyr_string($value, $from, $to);
    }
    return $result;
}
?>
An example:
<?php
// KOI8-R encoded input
$array[0] = "сВМПЛП";
$array[1] = "зТХЫБ";
$array[2] = array("пЗХТЕГ", "рПНЙДПТ");
$array[3] = array(
    array("бРЕМШУЙО", "нБОДБТЙО"),
    array("бВТЙЛПУ", "рЕТУЙЛ")
);
$result = convert_cyr_array($array, "k", "w");
/* Returns:
Array
(
    [0] => Яблоко
    [1] => Груша
    [2] => Array
        (
            [0] => Огурец
            [1] => Помидор
        )
    [3] => Array
        (
            [0] => Array
                (
                    [0] => Апельсин
                    [1] => Мандарин
                )
            [1] => Array
                (
                    [0] => Абрикос
                    [1] => Персик
                )
        )
)
*/
?>
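The same recursion can be written without tying it to convert_cyr_string(), which may be handy on PHP versions where that function is no longer available. This is only a sketch: convert_array_recursive() is a hypothetical name, and strtoupper() stands in for whatever charset converter is actually used.

```php
<?php
// Hypothetical helper: applies $convert to every string in a nested array
// and leaves keys and non-string values untouched.
function convert_array_recursive(array $data, callable $convert): array
{
    $result = [];
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            $result[$key] = convert_array_recursive($value, $convert);
        } elseif (is_string($value)) {
            $result[$key] = $convert($value);
        } else {
            $result[$key] = $value; // ints, nulls, etc. are copied as-is
        }
    }
    return $result;
}

// strtoupper() is a stand-in for a real charset conversion callback.
print_r(convert_array_recursive(['a', ['b', 3]], 'strtoupper'));
```

Passing the converter as a callable keeps the traversal independent of any particular conversion function.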
While praising other people for their efforts to write convenient UTF-8 to Win-1251 functions, may I mention that, since str_replace accepts arrays as parameters, the function may be rewritten in a slightly more efficient way (moreover, the generated array may be cached for better performance):
<?php
function Encode($str, $type) {
    // $type:
    // 'w' - encodes from UTF to win
    // 'u' - encodes from win to UTF
    static $conv = '';
    if (!is_array($conv)) {
        $conv = array();
        for ($x = 128; $x <= 143; $x++) {
            $conv['utf'][] = chr(209) . chr($x);
            $conv['win'][] = chr($x + 112);
        }
        for ($x = 144; $x <= 191; $x++) {
            $conv['utf'][] = chr(208) . chr($x);
            $conv['win'][] = chr($x + 48);
        }
        $conv['utf'][] = chr(208) . chr(129); $conv['win'][] = chr(168); // Ё
        $conv['utf'][] = chr(209) . chr(145); $conv['win'][] = chr(184); // ё
    }
    if ($type == 'w') {
        return str_replace($conv['utf'], $conv['win'], $str);
    } elseif ($type == 'u') {
        // strtr() instead of str_replace(): str_replace() applies the pairs in
        // sequence and would re-replace the 208/209 lead bytes it just inserted.
        return strtr($str, array_combine($conv['win'], $conv['utf']));
    } else {
        return $str;
    }
}
?>
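One caveat with array arguments to str_replace() in the win to UTF-8 direction: the pairs are applied one after another, so a byte written by an earlier pair can be matched again by a later one. The reduced two-entry table below is an illustration made up for this note, not taken from any of the functions above; it shows the effect and how strtr() with an array avoids it.

```php
<?php
// win-1251 bytes for "А" (0xC0) and "Р" (0xD0) and their UTF-8 encodings.
// The UTF-8 form of "А" itself contains the byte 0xD0.
$win = ["\xC0", "\xD0"];
$utf = ["\xD0\x90", "\xD0\xA0"];

$input = "\xC0"; // "А" in win-1251

// str_replace() applies the pairs in order, so the 0xD0 byte produced by
// the first pair is rewritten again by the second pair.
$viaStrReplace = str_replace($win, $utf, $input);
echo bin2hex($viaStrReplace), "\n"; // d0a090 (not valid UTF-8)

// strtr() with an array scans the input once and never rescans its output.
$viaStrtr = strtr($input, array_combine($win, $utf));
echo bin2hex($viaStrtr), "\n"; // d090
```

The UTF-8 to win direction is less exposed to this, because each replacement there is a single byte that cannot start one of the two-byte search sequences.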