MySQL 16进制字符串与编码

实验:

a) 系统环境:

$echo $LANG
en_US.UTF-8
CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `str` varchar(128) CHARACTER SET utf8 DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB

b) set names latin1;

mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)
 
mysql> insert into t1(str) values('春眠');
Query OK, 1 row affected (0.00 sec)
 
mysql> insert into t1(str) values(x'E698A5E79CA0');
Query OK, 1 row affected (0.00 sec)
 
mysql> select * from t1;
+----+--------+
| id | str    |
+----+--------+
|  1 | 春眠 |
|  2 | ??     |
+----+--------+
2 rows in set (0.00 sec)
 
mysql> select id,hex(str) from t1;
+----+--------------------------+
| id | hex(str)                 |
+----+--------------------------+
|  1 | C3A6CB9CC2A5C3A7C593C2A0 |
|  2 | E698A5E79CA0             |
+----+--------------------------+
2 rows in set (0.00 sec)

c) set names utf8;

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)
 
mysql> select * from t1;
+----+--------------+
| id | str          |
+----+--------------+
|  1 | æ˜¥çœ  |
|  2 | 春眠       |
+----+--------------+
2 rows in set (0.00 sec)
 
mysql> select id,hex(str) from t1;
+----+--------------------------+
| id | hex(str)                 |
+----+--------------------------+
|  1 | C3A6CB9CC2A5C3A7C593C2A0 |
|  2 | E698A5E79CA0             |
+----+--------------------------+
2 rows in set (0.00 sec)
 
mysql> insert into t1(str) values('春眠');
Query OK, 1 row affected (0.00 sec)
 
mysql> insert into t1(str) values(x'E698A5E79CA0');
Query OK, 1 row affected (0.00 sec)
 
mysql> select * from t1;
+----+--------------+
| id | str          |
+----+--------------+
|  1 | æ˜¥çœ  |
|  2 | 春眠       |
|  3 | 春眠       |
|  4 | 春眠       |
+----+--------------+
4 rows in set (0.00 sec)
 
mysql> select id,hex(str) from t1;
+----+--------------------------+
| id | hex(str)                 |
+----+--------------------------+
|  1 | C3A6CB9CC2A5C3A7C593C2A0 |
|  2 | E698A5E79CA0             |
|  3 | E698A5E79CA0             |
|  4 | E698A5E79CA0             |
+----+--------------------------+
4 rows in set (0.00 sec)

分析:

E698A5E79CA0是中文‘春晓’的UTF8编码

用vim以utf8编辑中文’春眠’保存/tmp/utf8

$hexdump -C /tmp/utf8
00000000  e6 98 a5 e7 9c a0                              |.......|
00000007

将utf8文件转码成latin1后查看:

$iconv -f latin1 -t utf8 /tmp/utf8 |hexdump -C
00000000  c3 a6 c2 98 c2 a5 c3 a7  c2 9c c2 a0           |.............|
0000000d

即,set names latin1下,’春晓’的utf8编码E698A5E79CA0被视为latin1编码,将其转成utf8编码后变成:C3A6CB9CC2A5C3A7C593C2A0,而读取时也只能在set names latin1下才能正确

结论:

1、以x’hexstring’方式发送的SQL将不会进行转码(忽略set names),但是会检查16进制编码是否满足校验集COLLATION

mysql> insert into t1(str) values(x'B4BAC3DF');
Query OK, 1 row affected, 1 warning (0.00 sec)
 
mysql> show warnings;
+---------+------+----------------------------------------------------------------------+
| Level   | Code | Message                                                              |
+---------+------+----------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xB4\xBA\xC3\xDF' for column 'str' at row 1 |
+---------+------+----------------------------------------------------------------------+
1 row in set (0.00 sec)

PS:x’B4BAC3DF’是’春眠’的GBK编码

2、即使写入的字符串编码方式(latin1)与表定义(存储)编码方式(utf8)不同,只要读和写使用的是同一种编码方式就能正确读取数据,代价是mysql会额外转码

一个应用场景

a) 场景描述

在上篇文章MySQL binlog字符编码方式中提到了使用mysqlbinlog的解析结果回滚历史修改,淘宝智胜的想法很直接:逆序执行‘反转binlog’(INSERT–>DELETE, DELETE–>INSERT, UPDATE–>swap WHERE and SET)

mysqldump -v /path/to/mysql-bin.000001|grep ^###
### INSERT INTO test.t1
### SET
###   @1=13
###   @2='春眠不觉晓'
### UPDATE test.t1
### WHERE
###   @1=13
###   @2='春眠不觉晓'
### SET
###   @1=13
###   @2='春眠'
### DELETE FROM test.t1
### WHERE
###   @1=13
###   @2='春眠'

然而遇到了字符集问题,原因在MySQL binlog字符编码方式分析过了,binlog中@i=’xxx’中的‘xxx’的字符集实际上是表中列的字符集(存储字符集),如果一个表中不同列采用的字符集各异,逆向执行‘反向binlog’就可能会出现问题(上述例子中都是采用UTF8),原因是SQL执行过程中涉及到的字符集转换:
character_set_client –> character_connection_client –> column character set
然而执行SQL过程中没有办法为每一列指定一个字符集,set names yyy指定的话,所有char/varchar字段被视为同一种字符集

b) 解决方案:

修改mysqlbinlog客户端程序,对于char/varchar类型,输出其16进制,这样在写入时就避免了字符集转换,效果:

mysqldump -v --hexstring /path/to/mysql-bin.000001|grep ^###
### INSERT INTO test.t1
### SET
###   @1=13
###   @2=x'e698a5e79ca0e4b88de8a789e69993'
### UPDATE test.t1
### WHERE
###   @1=13
###   @2=x'e698a5e79ca0e4b88de8a789e69993'
### SET
###   @1=13
###   @2=x'e698a5e79ca0'
### DELETE FROM test.t1
### WHERE
###   @1=13
###   @2=x'e698a5e79ca0'

c) patch下载

mysqlbinlog_hexstring.patch

《MySQL 16进制字符串与编码》上有1条评论

发表回复

您的电子邮箱地址不会被公开。 必填项已用 * 标注