mb_split()
(PHP 4 >= 4.2.0, PHP 5, PHP 7)
使用正则表达式分割多字节字符串
说明
mb_split(string $pattern,string $string[,int $limit= -1]): array
使用正则表达式$pattern分割多字节$string并返回结果array。
参数
- $pattern
正则表达式。
- $string
待分割的string。
- $limit
- 如果指定了可选参数$limit,将最多分割为$limit个元素。
返回值
array的结果。
注释
Note:Thecharacter encoding specified by mb_regex_encoding()will be used as the character encoding for this function by default.
参见
mb_regex_encoding()
Set/Get character encoding for multibyte regexmb_ereg()
Regular expression match with multibyte support
a (simpler) way to extract all characters from a UTF-8 string to array with a single call to a built-in function: <?php $str = 'Ма- руся'; print_r(preg_split('//u', $str, null, PREG_SPLIT_NO_EMPTY)); ?> Output: Array ( [0] => М [1] => а [2] => - [3] => [4] => р [5] => у [6] => с [7] => я )
The $pattern argument doesn't use /pattern/ delimiters, unlike other regex functions such as preg_match. <?php # Works. No slashes around the /pattern/ print_r( mb_split("\s", "hello world") ); Array ( [0] => hello [1] => world ) # Doesn't work: print_r( mb_split("/\s/", "hello world") ); Array ( [0] => hello world ) ?>
I figure most people will want a simple way to break-up a multibyte string into its individual characters. Here's a function I'm using to do that. Change UTF-8 to your chosen encoding method. <?php function mbStringToArray ($string) { $strlen = mb_strlen($string); while ($strlen) { $array[] = mb_substr($string,0,1,"UTF-8"); $string = mb_substr($string,1,$strlen,"UTF-8"); $strlen = mb_strlen($string); } return $array; } ?>
To split an string like this: "日、に、本、ほん、語、ご" using the "、" delimiter i used: $v = mb_split('、',"日、に、本、ほん、語、ご"); but didn't work. The solution was to set this before: mb_regex_encoding('UTF-8'); mb_internal_encoding("UTF-8"); $v = mb_split('、',"日、に、本、ほん、語、ご"); and now it's working: Array ( [0] => 日 [1] => に [2] => 本 [3] => ほん [4] => 語 [5] => ご )
In addition to Sezer Yalcin's tip. This function splits a multibyte string into an array of characters. Comparable to str_split(). <?php function mb_str_split( $string ) { # Split at all position not after the start: ^ # and not before the end: $ return preg_split('/(?<!^)(?!$)/u', $string ); } $string = '火车票'; $charlist = mb_str_split( $string ); print_r( $charlist ); ?> # Prints: Array ( [0] => 火 [1] => 车 [2] => 票 )
I agree that some people might want a mb_explode('', $string); this is my solution for it: <?php $string = 'Hallöle'; $array = array_map(function ($i) use ($string) { return mb_substr($string, $i, 1); }, range(0, mb_strlen($string) -1)); expect($array)->toEqual(['H', 'a', 'l', 'l', 'ö', 'l', 'e']); ?>
an other way to str_split multibyte string: <?php $s='әӘөүҗңһ'; //$temp_s=iconv('UTF-8','UTF-16',$s); $temp_s=mb_convert_encoding($s,'UTF-16','UTF-8'); $temp_a=str_split($temp_s,4); $temp_a_len=count($temp_a); for($i=0;$i<$temp_a_len;$i++){ //$temp_a[$i]=iconv('UTF-16','UTF-8',$temp_a[$i]); $temp_a[$i]=mb_convert_encoding($temp_a[$i],'UTF-8','UTF-16'); } echo('<pre>'); print_r($temp_a); echo('</pre>'); //also possible to directly use UTF-16: define('SLS',mb_convert_encoding('/','UTF-16')); $temp_s=mb_convert_encoding($s,'UTF-16','UTF-8'); $temp_a=str_split($temp_s,4); $temp_s=implode(SLS,$temp_a); $temp_s=mb_convert_encoding($temp_s,'UTF-8','UTF-16'); echo($temp_s); ?>
We are talking about Multi Byte ( e.g. UTF-8) strings here, so preg_split will fail for the following string: 'Weiße Rosen sind nicht grün!' And because I didn't find a regex to simulate a str_split I optimized the first solution from adjwilli a bit: <?php $string = 'Weiße Rosen sind nicht grün!' $stop = mb_strlen( $string); $result = array(); for( $idx = 0; $idx < $stop; $idx++) { $result[] = mb_substr( $string, $idx, 1); } ?> Here is an example with adjwilli's function: <?php mb_internal_encoding( 'UTF-8'); mb_regex_encoding( 'UTF-8'); function mbStringToArray ( $string ) { $stop = mb_strlen( $string); $result = array(); for( $idx = 0; $idx < $stop; $idx++) { $result[] = mb_substr( $string, $idx, 1); } return $result; } echo '<pre>', PHP_EOL, print_r( mbStringToArray( 'Weiße Rosen sind nicht grün!', true)), PHP_EOL, '</pre>'; ?> Let me know [by personal email], if someone found a regex to simulate a str_split with mb_split.