MySQL 16进制字符串与编码

By | 2013 年 7 月 29 日

实验:

a) 系统环境:

$echo $LANG
en_US.UTF-8
CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `str` varchar(128) CHARACTER SET utf8 DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB

b) set names latin1;

mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)
 
mysql> insert into t1(str) values('春眠');
Query OK, 1 row affected (0.00 sec)
 
mysql> insert into t1(str) values(x'E698A5E79CA0');
Query OK, 1 row affected (0.00 sec)
 
mysql> select * from t1;
+----+--------+
| id | str    |
+----+--------+
|  1 | 春眠 |
|  2 | ??     |
+----+--------+
2 rows in set (0.00 sec)
 
mysql> select id,hex(str) from t1;
+----+--------------------------+
| id | hex(str)                 |
+----+--------------------------+
|  1 | C3A6CB9CC2A5C3A7C593C2A0 |
|  2 | E698A5E79CA0             |
+----+--------------------------+
2 rows in set (0.00 sec)

c) set names utf8;

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)
 
mysql> select * from t1;
+----+--------------+
| id | str          |
+----+--------------+
|  1 | æ˜¥çœ  |
|  2 | 春眠       |
+----+--------------+
2 rows in set (0.00 sec)
 
mysql> select id,hex(str) from t1;
+----+--------------------------+
| id | hex(str)                 |
+----+--------------------------+
|  1 | C3A6CB9CC2A5C3A7C593C2A0 |
|  2 | E698A5E79CA0             |
+----+--------------------------+
2 rows in set (0.00 sec)
 
mysql> insert into t1(str) values('春眠');
Query OK, 1 row affected (0.00 sec)
 
mysql> insert into t1(str) values(x'E698A5E79CA0');
Query OK, 1 row affected (0.00 sec)
 
mysql> select * from t1;
+----+--------------+
| id | str          |
+----+--------------+
|  1 | æ˜¥çœ  |
|  2 | 春眠       |
|  3 | 春眠       |
|  4 | 春眠       |
+----+--------------+
4 rows in set (0.00 sec)
 
mysql> select id,hex(str) from t1;
+----+--------------------------+
| id | hex(str)                 |
+----+--------------------------+
|  1 | C3A6CB9CC2A5C3A7C593C2A0 |
|  2 | E698A5E79CA0             |
|  3 | E698A5E79CA0             |
|  4 | E698A5E79CA0             |
+----+--------------------------+
4 rows in set (0.00 sec)

分析:

E698A5E79CA0是中文‘春晓’的UTF8编码

用vim以utf8编辑中文’春眠’保存/tmp/utf8

$hexdump -C /tmp/utf8
00000000  e6 98 a5 e7 9c a0                              |.......|
00000007

将utf8文件转码成latin1后查看:

$iconv -f latin1 -t utf8 /tmp/utf8 |hexdump -C
00000000  c3 a6 c2 98 c2 a5 c3 a7  c2 9c c2 a0           |.............|
0000000d

即,set names latin1下,’春晓’的utf8编码E698A5E79CA0被视为latin1编码,将其转成utf8编码后变成:C3A6CB9CC2A5C3A7C593C2A0,而读取时也只能在set names latin1下才能正确

结论:

1、以x’hexstring’方式发送的SQL将不会进行转码(忽略set names),但是会检查16进制编码是否满足校验集COLLATION

mysql> insert into t1(str) values(x'B4BAC3DF');
Query OK, 1 row affected, 1 warning (0.00 sec)
 
mysql> show warnings;
+---------+------+----------------------------------------------------------------------+
| Level   | Code | Message                                                              |
+---------+------+----------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xB4\xBA\xC3\xDF' for column 'str' at row 1 |
+---------+------+----------------------------------------------------------------------+
1 row in set (0.00 sec)

PS:x’B4BAC3DF’是’春眠’的GBK编码

2、即使写入的字符串编码方式(latin1)与表定义(存储)编码方式(utf8)不同,只要读和写使用的是同一种编码方式就能正确读取数据,代价是mysql会额外转码

一个应用场景

a) 场景描述

在上篇文章MySQL binlog字符编码方式中提到了使用mysqlbinlog的解析结果回滚历史修改,淘宝智胜的想法很直接:逆序执行‘反转binlog’(INSERT–>DELETE, DELETE–>INSERT, UPDATE–>swap WHERE and SET)

mysqldump -v /path/to/mysql-bin.000001|grep ^###
### INSERT INTO test.t1
### SET
###   @1=13
###   @2='春眠不觉晓'
### UPDATE test.t1
### WHERE
###   @1=13
###   @2='春眠不觉晓'
### SET
###   @1=13
###   @2='春眠'
### DELETE FROM test.t1
### WHERE
###   @1=13
###   @2='春眠'

然而遇到了字符集问题,原因在MySQL binlog字符编码方式分析过了,binlog中@i=’xxx’中的‘xxx’的字符集实际上是表中列的字符集(存储字符集),如果一个表中不同列采用的字符集各异,逆向执行‘反向binlog’就可能会出现问题(上述例子中都是采用UTF8),原因是SQL执行过程中涉及到的字符集转换:
character_set_client –> character_connection_client –> column character set
然而执行SQL过程中没有办法为每一列指定一个字符集,set names yyy指定的话,所有char/varchar字段被视为同一种字符集

b) 解决方案:

修改mysqlbinlog客户端程序,对于char/varchar类型,输出其16进制,这样在写入时就避免了字符集转换,效果:

mysqldump -v --hexstring /path/to/mysql-bin.000001|grep ^###
### INSERT INTO test.t1
### SET
###   @1=13
###   @2=x'e698a5e79ca0e4b88de8a789e69993'
### UPDATE test.t1
### WHERE
###   @1=13
###   @2=x'e698a5e79ca0e4b88de8a789e69993'
### SET
###   @1=13
###   @2=x'e698a5e79ca0'
### DELETE FROM test.t1
### WHERE
###   @1=13
###   @2=x'e698a5e79ca0'

c) patch下载

mysqlbinlog_hexstring.patch

One thought on “MySQL 16进制字符串与编码

  1. Jerseys Wholesale From China

    We’re a group of volunteers and starting a new scheme in our community. Your site offered us with valuable information to work on. You’ve done an impressive job and our whole community will be grateful to you.

    Reply

发表评论

电子邮件地址不会被公开。 必填项已用*标注