问题背景
线上相同结构的两个MyISAM表,数据量差不多,执行的SQL查询计划基本一致,但是执行时间却相差很大
mysql> select * from auction_info_d__201402 where unit_id=7016 and day='2014-03-27'; Empty set (0.31 sec) mysql> select * from auction_info_d__201403 where unit_id=7016 and day='2013-03-27'; Empty set (19.81 sec) |
MySQL版本:
mysql> select @@version; +------------+ | @@version | +------------+ | 5.1.48-log | +------------+ 1 row in set (0.00 sec) |
查询计划基本一样
mysql> explain select * from auction_info_d__201402 where unit_id=7016 and day='2014-03-27'; +----+-------------+-----------------------------+------+---------------+---------+---------+-------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------------------------+------+---------------+---------+---------+-------+--------+-------------+ | 1 | SIMPLE | auction_info_d__201402 | ref | PRIMARY | PRIMARY | 4 | const | 326584 | Using where | +----+-------------+-----------------------------+------+---------------+---------+---------+-------+--------+-------------+ 1 row in set (0.00 sec) mysql> explain select * from auction_info_d__201403 where unit_id=7016 and day='2013-03-27'; +----+-------------+-----------------------------+------+---------------+---------+---------+-------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------------------------+------+---------------+---------+---------+-------+--------+-------------+ | 1 | SIMPLE | auction_info_d__201403 | ref | PRIMARY | PRIMARY | 4 | const | 224574 | Using where | +----+-------------+-----------------------------+------+---------------+---------+---------+-------+--------+-------------+ 1 row in set (0.00 sec) |
排查过程
这两个查询都是使用primary key(只有一个索引:PRIMARY KEY (`unit_id`,`aucid`,`day`))中的unit_id查出数据,然后采用day进行过滤,因为auction_info_d__201402保存的是2月份的数据,而auction_info_d__201403是3月份的数据,因此两个select命中的数据集都应该为空
mysql> select min(day) as mind, max(day) as maxd from auction_info_d__201402 where unit_id=7016; +------------+------------+ | mind | maxd | +------------+------------+ | 2014-02-01 | 2014-02-28 | +------------+------------+ 1 row in set (0.14 sec) mysql> select min(day) as mind, max(day) as maxd from auction_info_d__201403 where unit_id=7016; +------------+------------+ | mind | maxd | +------------+------------+ | 2014-03-01 | 2014-03-31 | +------------+------------+ 1 row in set (0.11 sec) |
从explain中可以看出两个select根据unit_id条件过滤数据量相差不大,应该不至于性能相差这么大,首先查询数据量大小和MyISAM相关配置:
$ ll auction_info_d__201402.* -rw-rw---- 1 mysql dba 10379 Feb 2 09:47 auction_info_d__201402.frm -rw-rw---- 1 mysql dba 3845544846 Mar 1 06:01 auction_info_d__201402.MYD -rw-rw---- 1 mysql dba 691238912 Mar 1 06:13 auction_info_d__201402.MYI $ ll auction_info_d__201403.* -rw-rw---- 1 mysql dba 10379 Mar 2 05:50 auction_info_d__201403.frm -rw-rw---- 1 mysql dba 4325708816 Apr 1 06:06 auction_info_d__201403.MYD -rw-rw---- 1 mysql dba 859683840 Apr 1 10:00 auction_info_d__201403.MYI |
MyISAM key_buffer_size:
mysql> show global variables like '%key_buffer%'; +-----------------+-----------+ | Variable_name | Value | +-----------------+-----------+ | key_buffer_size | 104857600 | +-----------------+-----------+ 1 row in set (0.00 sec) |
可以看出key_buffer_size约100M的样子,auction_info_d__201402和auction_info_d__201403的索引文件都大于600M,key_buffer_size设置得太小了,调大应该可以增加性能,但是这也无法解释为什么auction_info_d__201402上的查询远远快于auction_info_d__201403上的查询,因此尝试使用profile寻找原因:
mysql> show profile all for query 1; +--------------------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+------------------+---------------+-------------+ | Status | Duration | CPU_user | CPU_system | Context_voluntary | Context_involuntary | Block_ops_in | Block_ops_out | Messages_sent | Messages_received | Page_faults_major | Page_faults_minor | Swaps | Source_function | Source_file | Source_line | +--------------------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+------------------+---------------+-------------+ | starting | 0.000071 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NULL | NULL | NULL | | checking query cache for query | 0.000048 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | sql_cache.cc | 1446 | | Opening tables | 0.000517 | 0.000000 | 0.000000 | 1 | 0 | 136 | 0 | 0 | 0 | 2 | 1 | 0 | unknown function | sql_base.cc | 4517 | | System lock | 0.000034 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | lock.cc | 258 | | Table lock | 0.000035 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | lock.cc | 269 | | init | 0.000082 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | unknown function | sql_select.cc | 2484 | | optimizing | 0.000027 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | sql_select.cc | 820 | | statistics | 0.000079 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | unknown function | sql_select.cc | 1011 | | preparing | 0.000031 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | sql_select.cc | 1033 | | executing | 0.000020 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | sql_select.cc | 1755 | | Sending data | 0.312744 | 0.000000 | 0.000000 | 237 | 3 | 2112 | 0 | 0 | 0 | 115 | 347 | 0 | unknown function | sql_select.cc | 2309 | | end | 0.000072 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | sql_select.cc | 2530 | | query end | 0.000022 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | sql_parse.cc | 4986 | | freeing items | 0.000060 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | sql_parse.cc | 6012 | | logging slow query | 0.000021 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | sql_parse.cc | 1687 | | cleaning up | 0.000023 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | sql_parse.cc | 1655 | +--------------------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+------------------+---------------+-------------+ 16 rows in set (0.00 sec) mysql> show profile all for query 2; +--------------------------------+-----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+------------------+---------------+-------------+ | Status | Duration | CPU_user | CPU_system | Context_voluntary | Context_involuntary | Block_ops_in | Block_ops_out | Messages_sent | Messages_received | Page_faults_major | Page_faults_minor | Swaps | Source_function | Source_file | Source_line | +--------------------------------+-----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+------------------+---------------+-------------+ | starting | 0.000070 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NULL | NULL | NULL | | checking query cache for query | 0.000041 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | sql_cache.cc | 1446 | | Opening tables | 0.000070 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | unknown function | sql_base.cc | 4517 | | System lock | 0.000022 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | lock.cc | 258 | | Table lock | 0.000028 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | lock.cc | 269 | | init | 0.000046 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | sql_select.cc | 2484 | | optimizing | 0.000027 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | sql_select.cc | 820 | | statistics | 0.000072 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | unknown function | sql_select.cc | 1011 | | preparing | 0.000030 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | sql_select.cc | 1033 | | executing | 0.000021 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | sql_select.cc | 1755 | | Sending data | 19.815528 | 0.000000 | 0.000000 | 74035 | 371 | 601616 | 736 | 0 | 0 | 221 | 6291 | 0 | unknown function | sql_select.cc | 2309 | | end | 0.000062 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | sql_select.cc | 2530 | | query end | 0.000022 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | sql_parse.cc | 4986 | | freeing items | 0.000048 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | sql_parse.cc | 6012 | | logging slow query | 0.000021 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | sql_parse.cc | 1687 | | logging slow query | 0.000551 | 0.000000 | 0.000000 | 2 | 0 | 80 | 8 | 0 | 0 | 1 | 0 | 0 | unknown function | sql_parse.cc | 1697 | | cleaning up | 0.000026 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | unknown function | sql_parse.cc | 1655 | +--------------------------------+-----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+------------------+---------------+-------------+ 17 rows in set (0.00 sec) |
有了一些发现:慢查询对应的profile中Sending result阶段context switch和page fault比较异常,问题原因逐渐明显:慢查询产生了更多的page置换,page置换消耗了大量时间,导致cpu context switch次数变多,这个主要是因为key_buffer_size设置得太小引起的,但是这个也同样无法解释为什么另外一个查询没有产生同样的问题
因此询问应用开发人员业务场景,被告知auction_info_d__201402是历史数据,不会更新,auction_info_d__201403目前每天凌晨都会load一次当天数据进去,线上其它时间只有select操作,猜想load导致了索引文件碎片化,从而使得随机IO变多:
mysql> show status like 'key_read%'; +-------------------+----------+ | Variable_name | Value | +-------------------+----------+ | Key_read_requests | 19653701 | | Key_reads | 337870 | +-------------------+----------+ 2 rows in set (0.00 sec) mysql> select * from auction_info_d__201402 where unit_id=7016 and day='2014-03-27'; Empty set (0.32 sec) mysql> show status like 'key_read%'; +-------------------+----------+ | Variable_name | Value | +-------------------+----------+ | Key_read_requests | 19705651 | | Key_reads | 338072 | +-------------------+----------+ 2 rows in set (0.00 sec) mysql> select * from auction_info_d__201403 where unit_id=7016 and day='2013-03-27'; Empty set (19.81 sec) mysql> show status like 'key_read%'; +-------------------+----------+ | Variable_name | Value | +-------------------+----------+ | Key_read_requests | 19831609 | | Key_reads | 340363 | +-------------------+----------+ 2 rows in set (0.00 sec) |
对比发现:
1. auction_info_d__201402(简称02)上的查询key_reads只增加了2,说明都在内存中命中了,而auction_info_d__201403(简称03)上查询这个增量是12k,cache出现大量miss需要从磁盘中加载数据,这解释了profile的结果
2. 02上查询Key_read_requests的增长为5w多,而03上查询这个增量是12w,说明03上查询需要访问的page数量比较多,因此需要执行Analyze table
MySQL Manual上建议key_buffer_size的设置应该使得:
Key_reads/Key_read_requests < 0.01 |
而从线上信息计算得到的比值为:(340363/19831609) = 0.017,因此建议增大key_buffer_size
解决方案
1. 增大key_buffer_size,缓冲更多的索引信息,减小page fault
2. load数据后执行一次ANALYZE LOCAL TABLE(使用LOCAL是为了避免影响到备库)更新索引及统计信息