实验:
a) 系统环境:
$echo $LANG en_US.UTF-8 |
CREATE TABLE `t1` ( `id` int(11) NOT NULL AUTO_INCREMENT, `str` varchar(128) CHARACTER SET utf8 DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB |
b) set names latin1;
mysql> set names latin1; Query OK, 0 rows affected (0.00 sec) mysql> insert into t1(str) values('春眠'); Query OK, 1 row affected (0.00 sec) mysql> insert into t1(str) values(x'E698A5E79CA0'); Query OK, 1 row affected (0.00 sec) mysql> select * from t1; +----+--------+ | id | str | +----+--------+ | 1 | 春眠 | | 2 | ?? | +----+--------+ 2 rows in set (0.00 sec) mysql> select id,hex(str) from t1; +----+--------------------------+ | id | hex(str) | +----+--------------------------+ | 1 | C3A6CB9CC2A5C3A7C593C2A0 | | 2 | E698A5E79CA0 | +----+--------------------------+ 2 rows in set (0.00 sec) |
c) set names utf8;
mysql> set names utf8; Query OK, 0 rows affected (0.00 sec) mysql> select * from t1; +----+--------------+ | id | str | +----+--------------+ | 1 | æ˜¥çœ | | 2 | 春眠 | +----+--------------+ 2 rows in set (0.00 sec) mysql> select id,hex(str) from t1; +----+--------------------------+ | id | hex(str) | +----+--------------------------+ | 1 | C3A6CB9CC2A5C3A7C593C2A0 | | 2 | E698A5E79CA0 | +----+--------------------------+ 2 rows in set (0.00 sec) mysql> insert into t1(str) values('春眠'); Query OK, 1 row affected (0.00 sec) mysql> insert into t1(str) values(x'E698A5E79CA0'); Query OK, 1 row affected (0.00 sec) mysql> select * from t1; +----+--------------+ | id | str | +----+--------------+ | 1 | æ˜¥çœ | | 2 | 春眠 | | 3 | 春眠 | | 4 | 春眠 | +----+--------------+ 4 rows in set (0.00 sec) mysql> select id,hex(str) from t1; +----+--------------------------+ | id | hex(str) | +----+--------------------------+ | 1 | C3A6CB9CC2A5C3A7C593C2A0 | | 2 | E698A5E79CA0 | | 3 | E698A5E79CA0 | | 4 | E698A5E79CA0 | +----+--------------------------+ 4 rows in set (0.00 sec) |
分析:
E698A5E79CA0是中文‘春晓’的UTF8编码
用vim以utf8编辑中文’春眠’保存/tmp/utf8
$hexdump -C /tmp/utf8 00000000 e6 98 a5 e7 9c a0 |.......| 00000007 |
将utf8文件转码成latin1后查看:
$iconv -f latin1 -t utf8 /tmp/utf8 |hexdump -C 00000000 c3 a6 c2 98 c2 a5 c3 a7 c2 9c c2 a0 |.............| 0000000d |
即,set names latin1下,’春晓’的utf8编码E698A5E79CA0被视为latin1编码,将其转成utf8编码后变成:C3A6CB9CC2A5C3A7C593C2A0,而读取时也只能在set names latin1下才能正确
结论:
1、以x’hexstring’方式发送的SQL将不会进行转码(忽略set names),但是会检查16进制编码是否满足校验集COLLATION
mysql> insert into t1(str) values(x'B4BAC3DF'); Query OK, 1 row affected, 1 warning (0.00 sec) mysql> show warnings; +---------+------+----------------------------------------------------------------------+ | Level | Code | Message | +---------+------+----------------------------------------------------------------------+ | Warning | 1366 | Incorrect string value: '\xB4\xBA\xC3\xDF' for column 'str' at row 1 | +---------+------+----------------------------------------------------------------------+ 1 row in set (0.00 sec) |
PS:x’B4BAC3DF’是’春眠’的GBK编码
2、即使写入的字符串编码方式(latin1)与表定义(存储)编码方式(utf8)不同,只要读和写使用的是同一种编码方式就能正确读取数据,代价是mysql会额外转码
一个应用场景
a) 场景描述
在上篇文章MySQL binlog字符编码方式中提到了使用mysqlbinlog的解析结果回滚历史修改,淘宝智胜的想法很直接:逆序执行‘反转binlog’(INSERT–>DELETE, DELETE–>INSERT, UPDATE–>swap WHERE and SET)
mysqldump -v /path/to/mysql-bin.000001|grep ^### |
### INSERT INTO test.t1 ### SET ### @1=13 ### @2='春眠不觉晓' ### UPDATE test.t1 ### WHERE ### @1=13 ### @2='春眠不觉晓' ### SET ### @1=13 ### @2='春眠' ### DELETE FROM test.t1 ### WHERE ### @1=13 ### @2='春眠' |
然而遇到了字符集问题,原因在MySQL binlog字符编码方式分析过了,binlog中@i=’xxx’中的‘xxx’的字符集实际上是表中列的字符集(存储字符集),如果一个表中不同列采用的字符集各异,逆向执行‘反向binlog’就可能会出现问题(上述例子中都是采用UTF8),原因是SQL执行过程中涉及到的字符集转换:
character_set_client –> character_connection_client –> column character set
然而执行SQL过程中没有办法为每一列指定一个字符集,set names yyy指定的话,所有char/varchar字段被视为同一种字符集
b) 解决方案:
修改mysqlbinlog客户端程序,对于char/varchar类型,输出其16进制,这样在写入时就避免了字符集转换,效果:
mysqldump -v --hexstring /path/to/mysql-bin.000001|grep ^### |
### INSERT INTO test.t1 ### SET ### @1=13 ### @2=x'e698a5e79ca0e4b88de8a789e69993' ### UPDATE test.t1 ### WHERE ### @1=13 ### @2=x'e698a5e79ca0e4b88de8a789e69993' ### SET ### @1=13 ### @2=x'e698a5e79ca0' ### DELETE FROM test.t1 ### WHERE ### @1=13 ### @2=x'e698a5e79ca0' |
We’re a group of volunteers and starting a new scheme in our community. Your site offered us with valuable information to work on. You’ve done an impressive job and our whole community will be grateful to you.