Purge死锁
环境准备
场景:业务批量删除,然后插入操作
select @@tx_isolation;
+----------------+
| @@tx_isolation |
+----------------+
| READ-COMMITTED |
+----------------+
1 row in set, 1 warning (0.00 sec)
set global innodb_status_output_locks=1;
启用后,InnoDB锁定监视器会打印有关SHOW ENGINE INNODB STATUS输出锁定的附加信息
为了演示繁忙的场景,使用debug版本启动
/home/mysql/mysql/bin/mysqld-debug --basedir=/home/mysql/mysql --datadir=/home/mysql/data
set global innodb_purge_stop_now=1;
show这个变量,结果还是off,这个不用管,purge线程已停止了
create table test_purge(a int auto_increment primary key, b int , unique key(b));
insert into test_purge(b) values(10),(20),(30),(40),(50),(60);
select * from test_purge;
+---+------+
| a | b |
+---+------+
| 1 | 10 |
| 2 | 20 |
| 3 | 30 |
| 4 | 40 |
| 5 | 50 |
| 6 | 60 |
+---+------+
6 rows in set (0.00 sec)
加锁过程
终端会话1
delete from test_purge where b=20;
终端会话2
select * from test_purge;
beign;
insert into test_purge(b) values(20);
终端会话3
pager less
show engine innodb status\G
---TRANSACTION 3342, ACTIVE 22 sec
3 lock struct(s), heap size 1160, 3 row lock(s), undo log entries 1
MySQL thread id 4, OS thread handle 139843750467328, query id 11 localhost root
TABLE LOCK table `guo_test`.`test_purge` trx id 3342 lock mode IX
RECORD LOCKS space id 28 page no 4 n bits 72 index b of table `guo_test`.`test_purge` trx id 3342 lock mode S
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
0: len 4; hex 80000014; asc ;;
1: len 4; hex 80000002; asc ;;
-- heap no=3表示是第二个插入的记录20
-- 且info bits为32,表示记录被标记删除了
0: len 4; hex 80000014; asc ;; -- 记录为20
1: len 4; hex 80000002; asc ;; -- 对应的主键为2
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 8000001e; asc ;;
1: len 4; hex 80000003; asc ;;
-- heap no=4表示的是20的下一个记录30
-- 且该记录上也有S lock
RECORD LOCKS space id 28 page no 4 n bits 72 index b of table `guo_test`.`test_purge` trx id 3342 lock mode S locks gap before rec
Record lock, heap no 6 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 #heap no=6为新插入的记录20,从隐式锁提升为显示锁
0: len 4; hex 80000014; asc ;;
1: len 4; hex 80000005; asc ;;
- 因为是唯一索引,需要做唯一性检查,从老的记录 20 开始检查(第一个小于等于自己的值),则此时 20 上要加上一把 S lock ,然后往下检查到第一个不相等的记录,即 记录30 ,然后退出,但是这个 记录30 也要 加上S lock
- 在插入新的记录20的时候,发现下一条记录30上有锁,则自己插入的时的隐式锁提升为显示锁(见插入步骤)
- 目前锁住的范围是 (10,20], (20,30]
- 新插入的记录20本身是一把 S-Gap Lock (前面20*的有S lock了,由于是唯一索引,本身其实就不需要有记录锁了,有GAP就够了)
所以25,15都无法插入,都在锁定区间
会话3
root@guo_test 01:48:14>insert into test_purge(b) values(25);
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
root@guo_test 01:48:23>insert into test_purge(b) values(15);
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
Purge死锁演示
删除50,20前面已经删除
delete from test_purge where b=50;
T1 | T2 |
---|---|
begin; | begin; |
insert into test_purge(b) values(25); | insert into test_purge(b) values(55); |
insert into test_purge(b) values(50); | |
insert into test_purge(b) values(20); | |
Deadlock found.. | |
insert into test_purge(b) values(50);Query OK, 1 row affected (2.32 sec) |
show engine innodb status\G
LATEST DETECTED DEADLOCK
------------------------
2018-09-18 12:25:41 0x7ff339aca700
*** (1) TRANSACTION:
TRANSACTION 6692, ACTIVE 53 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1160, 4 row lock(s), undo log entries 2
MySQL thread id 4, OS thread handle 140682621667072, query id 27 localhost root update
insert into test_purge(b) values(50)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 35 page no 4 n bits 80 index b of table `guo_test`.`test_purge` trx id 6692 lock mode S waiting
Record lock, heap no 9 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000037; asc 7;; #等待55的S锁
1: len 4; hex 80000008; asc ;;
*** (2) TRANSACTION:
TRANSACTION 6693, ACTIVE 28 sec inserting, thread declared inside InnoDB 5000
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1160, 4 row lock(s), undo log entries 2
MySQL thread id 5, OS thread handle 140682621396736, query id 28 localhost root update
insert into test_purge(b) values(20)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 35 page no 4 n bits 80 index b of table `guo_test`.`test_purge` trx id 6693 lock_mode X locks rec but not gap
Record lock, heap no 9 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000037; asc 7;; #持有55的S锁
1: len 4; hex 80000008; asc ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 35 page no 4 n bits 80 index b of table `guo_test`.`test_purge` trx id 6693 lock mode S waiting
Record lock, heap no 8 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000019; asc ;; #等待25的S锁
1: len 4; hex 80000007; asc ;;
该问题的场景
- 使用了唯一索引(主要原因)
- 大量delete数据后,又立即插入了数据
- 插入的数据和部分删除的数据的唯一索引一样,且purge还没有来得及回收删除的数据
解决办法
- delete后等待较长时间后(增大purge线程数,等Purge回收),再插入新数据 (不推荐)
- 使用普通索引
- 仍然使用唯一索引,但是插入前要对唯一索引进行分线程排序 (只会有等待,不会有死锁)