环境准备

场景：业务批量删除，然后插入操作

select @@tx_isolation;
+----------------+
| @@tx_isolation |
+----------------+
| READ-COMMITTED |
+----------------+
1 row in set, 1 warning (0.00 sec)

set global innodb_status_output_locks=1;

启用后，InnoDB锁定监视器会打印有关SHOW ENGINE INNODB STATUS输出锁定的附加信息

为了演示繁忙的场景，使用debug版本启动

/home/mysql/mysql/bin/mysqld-debug --basedir=/home/mysql/mysql --datadir=/home/mysql/data

set global innodb_purge_stop_now=1;

show这个变量􏰀,结果还是off,这个不用管,purge线程已􏰁停止了

create table test_purge(a int auto_increment primary key, b int , unique key(b));

insert into test_purge(b) values(10),(20),(30),(40),(50),(60);

select * from test_purge;
+---+------+
| a | b    |
+---+------+
| 1 |   10 |
| 2 |   20 |
| 3 |   30 |
| 4 |   40 |
| 5 |   50 |
| 6 |   60 |
+---+------+
6 rows in set (0.00 sec)

加锁过程

终端会话1

delete from test_purge where b=20;

终端会话2

select * from test_purge;
beign;
insert into test_purge(b) values(20);

终端会话3

pager less


show engine innodb status\G

---TRANSACTION 3342, ACTIVE 22 sec
3 lock struct(s), heap size 1160, 3 row lock(s), undo log entries 1
MySQL thread id 4, OS thread handle 139843750467328, query id 11 localhost root
TABLE LOCK table `guo_test`.`test_purge` trx id 3342 lock mode IX
RECORD LOCKS space id 28 page no 4 n bits 72 index b of table `guo_test`.`test_purge` trx id 3342 lock mode S
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 80000014; asc     ;;
 1: len 4; hex 80000002; asc     ;;

-- heap no=3表示是第二个插入的记录20
-- 且info bits为32,表示记录被标记删除了
0: len 4; hex 80000014; asc ;; -- 记录为20
1: len 4; hex 80000002; asc ;; -- 对应的主键为2


Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 8000001e; asc     ;;
 1: len 4; hex 80000003; asc     ;;
-- heap no=4表示的是20的下一个记录30
-- 且该记录上也有S lock
RECORD LOCKS space id 28 page no 4 n bits 72 index b of table `guo_test`.`test_purge` trx id 3342 lock mode S locks gap before rec
Record lock, heap no 6 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 #heap no=6为新插入的记录20,从隐式锁提升为显示锁
 0: len 4; hex 80000014; asc     ;;
 1: len 4; hex 80000005; asc     ;;

因为是唯一索引,需要做唯一性检查,从老的记录 20 开始检查(第一个小于等于自己的值),则此时 20 上要加上一把 S lock ,然后往下检查到第一个不相等的记录,即记录30 ,然后退出,但是这个记录30 也要加上S lock
在插入新的记录20的时候,发现下一条记录30上有锁,则自己插入的时的隐式锁提升为显示锁(见插入步骤)
目前锁住的范围是 (10,20], (20,30]
新插入的记录20本身是一把 S-Gap Lock (前面20*的有S lock了,由于是唯一索引,本身其实就不需要有记录锁了,有GAP就够了)

所以25，15都无法插入，都在锁定区间
会话3

root@guo_test 01:48:14>insert into test_purge(b) values(25);
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
root@guo_test 01:48:23>insert into test_purge(b) values(15);
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

Purge死锁演示

删除50，20前面已经删除

delete from test_purge where b=50;

T1	T2
begin;	begin;
insert into test_purge(b) values(25);	insert into test_purge(b) values(55);
insert into test_purge(b) values(50);
	insert into test_purge(b) values(20);
	Deadlock found..
insert into test_purge(b) values(50);Query OK, 1 row affected (2.32 sec)

show engine innodb status\G

LATEST DETECTED DEADLOCK
------------------------
2018-09-18 12:25:41 0x7ff339aca700
*** (1) TRANSACTION:
TRANSACTION 6692, ACTIVE 53 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1160, 4 row lock(s), undo log entries 2
MySQL thread id 4, OS thread handle 140682621667072, query id 27 localhost root update
insert into test_purge(b) values(50)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 35 page no 4 n bits 80 index b of table `guo_test`.`test_purge` trx id 6692 lock mode S waiting
Record lock, heap no 9 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 80000037; asc    7;;     #等待55的S锁
 1: len 4; hex 80000008; asc     ;;

*** (2) TRANSACTION:
TRANSACTION 6693, ACTIVE 28 sec inserting, thread declared inside InnoDB 5000
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1160, 4 row lock(s), undo log entries 2
MySQL thread id 5, OS thread handle 140682621396736, query id 28 localhost root update
insert into test_purge(b) values(20)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 35 page no 4 n bits 80 index b of table `guo_test`.`test_purge` trx id 6693 lock_mode X locks rec but not gap
Record lock, heap no 9 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 80000037; asc    7;;      #持有55的S锁
 1: len 4; hex 80000008; asc     ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 35 page no 4 n bits 80 index b of table `guo_test`.`test_purge` trx id 6693 lock mode S waiting
Record lock, heap no 8 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 80000019; asc     ;;      #等待25的S锁
 1: len 4; hex 80000007; asc     ;;

该问题的场景

使用了唯一索引(主要原因)
大量delete数据后,又立即插入了数据
插入的数据和部分删除的数据的唯一索引一样,且purge还没有来得及回收删除的数据

解决办法

delete后等待较长时间后(增大purge线程数,等Purge回收),再插入新数据 (不推荐)
使用普通索引
仍然使用唯一索引,但是插入前要对唯一索引进行分线程排序 (只会有等待,不会有死锁)