Contents
  1. 1. 表准备
  2. 2. 只更改character_set_client为utf8
  3. 3. 只设置character_set_connection为utf8
  4. 4. 只设置character_set_results为utf8
  5. 5. 更改三者为utf8

我们知道set names utf8;改变的character_set_client、character_set_connection、character_set_result三个值,MySQL字符集的内部转换到底是哪个阶段呢,下面通过控制变量法来验证。

表准备

建立gbk字符集的表

show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `data` varchar(100) DEFAULT NULL,
  `comment` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_data` (`data`)
) ENGINE=InnoDB  DEFAULT CHARSET=gbk
1 row in set (0.00 sec)

character_set_client、character_set_connection、character_set_result为gbk,SecureCRT为utf-8

show variables like 'character%';
+--------------------------+-----------------------------------+
| Variable_name            | Value                             |
+--------------------------+-----------------------------------+
| character_set_client     | gbk                               |
| character_set_connection | gbk                               |
| character_set_database   | gbk                               |
| character_set_filesystem | binary                            |
| character_set_results    | gbk                               |
| character_set_server     | utf8mb4                           |
| character_set_system     | utf8                              |
| character_sets_dir       | /home/mysql/mysql/share/charsets/ |
+--------------------------+-----------------------------------+
8 rows in set (0.00 sec)

insert into T values(null,'小明','3 gbk');

SELECT LENGTH(data), CHAR_LENGTH(data),data,hex(data),hex('小明'),comment  from t;
+--------------+-------------------+--------+--------------+---------------+---------+
| LENGTH(data) | CHAR_LENGTH(data) | data   | hex(data)    | hex('小明') | comment |
+--------------+-------------------+--------+--------------+---------------+---------+
|            6 |                 3 | 小明   | E5B08FE6988E | E5B08FE6988E  | 3 gbk   |
+--------------+-------------------+--------+--------------+---------------+---------+
1 row in set (0.00 sec)

这里在SecureCRT明文输入hex(‘小明’)
可以看到查询出的二进制流为E5B08FE6988E
由于character_set_client、character_set_connection、character_set_result和表字符集均为GBK,不涉及MySQL内部编码转换。因此,表字符集虽然为GBK,但“小明”的编码并非为GBK编码的二进制流,而是UTF8的二进制流,两个汉字占用了6个字节,而读取是一个逆向的过程,不涉及到编码转换,查询依然能正确返回“小明”。

只更改character_set_client为utf8

set character_set_client=utf8;

Query OK, 0 rows affected (0.00 sec)

show variables like 'character%';
+--------------------------+-----------------------------------+
| Variable_name            | Value                             |
+--------------------------+-----------------------------------+
| character_set_client     | utf8                              |
| character_set_connection | gbk                               |
| character_set_database   | gbk                               |
| character_set_filesystem | binary                            |
| character_set_results    | gbk                               |
| character_set_server     | utf8mb4                           |
| character_set_system     | utf8                              |
| character_sets_dir       | /home/mysql/mysql/share/charsets/ |
+--------------------------+-----------------------------------+
8 rows in set (0.00 sec)

insert into T values(null,'小明','character_set_client utf8');  

Query OK, 1 row affected (0.01 sec)

SELECT id,LENGTH(data), CHAR_LENGTH(data),data,hex(data),hex('小明'),comment  from t order by id ;
+----+--------------+-------------------+--------+--------------+-------------+---------------------------+
| id | LENGTH(data) | CHAR_LENGTH(data) | data   | hex(data)    | hex('С) | comment                   |
+----+--------------+-------------------+--------+--------------+-------------+---------------------------+
|  1 |            6 |                 3 | 小明    | E5B08FE6988E | D0A1C3F7    | 3 gbk                     |
|  2 |            4 |                 2 | С      | D0A1C3F7     | D0A1C3F7    | character_set_client utf8 |
+----+--------------+-------------------+--------+--------------+-------------+---------------------------+
2 rows in set (0.00 sec)

可以看到,数据库里存储的字符为utf8的二进制流E5B08FE6988E
MySQL把SecureCRT明文输入的“小明”转换成了gbk的二进制流D0A1C3F7
把SecureCRT字符集设置成gbk用unhex反向查看确认D0A1C3F7是不是gbk的二进制流

select unhex('D0A1C3F7');
+-------------------+
| unhex('D0A1C3F7') |
+-------------------+
| 小明              |
+-------------------+
1 row in set (0.00 sec)

所以D0A1C3F7为小明gbk的二进制流
而E5B08FE6988E呢

select unhex('E5B08FE6988E');
+-----------------------+
| unhex('E5B08FE6988E') |
+-----------------------+
| 灏忔槑                |
+-----------------------+
1 row in set (0.00 sec)

由于SecureCRT是gbk字符集,把utf-8二进制流E5B08FE6988E按gbk解码成了gbk对应的汉字“灏忔槑”,就是日常所说的乱码。把SecureCRT设置成utf-8后,即恢复正常

select unhex('E5B08FE6988E');
+-----------------------+
| unhex('E5B08FE6988E') |
+-----------------------+
| 小明                |
+-----------------------+
1 row in set (0.00 sec)

所以E5B08FE6988E为小明utf8的二进制流

SecureCRT(utf8 E5B08FE6988E)–> character_set_client(utf8 E5B08FE6988E)–>character_set_connection(gbk D0A1C3F7)–>database(gbk D0A1C3F7)–>character_set_results(gbk D0A1C3F7)–>SecureCRT(utf8 D0A1C3F7)

client(utf8)–>connection(gbk)作了转化

connection把utf8的二进制流转换为gbk的二进制流,由于gbk不包含utf8,需要先用原utf8解码再转码为gbk,用Python演示过程为

a = u"小明"
a_utf_8 = a.encode('utf-8')
print(a_utf_8)
a_unicode = a_utf_8.decode('utf-8')
assert (a_unicode == a)
a_gb2312 = a_unicode.encode('gb2312')
print(a_gb2312)

结果为

b'\xe5\xb0\x8f\xe6\x98\x8e'
b'\xd0\xa1\xc3\xf7'

只设置character_set_connection为utf8

set names gbk;
Query OK, 0 rows affected (0.00 sec)

set character_set_connection=utf8;
Query OK, 0 rows affected (0.00 sec)

show variables like 'character%';
+--------------------------+-----------------------------------+
| Variable_name            | Value                             |
+--------------------------+-----------------------------------+
| character_set_client     | gbk                               |
| character_set_connection | utf8                              |
| character_set_database   | gbk                               |
| character_set_filesystem | binary                            |
| character_set_results    | gbk                               |
| character_set_server     | utf8mb4                           |
| character_set_system     | utf8                              |
| character_sets_dir       | /home/mysql/mysql/share/charsets/ |
+--------------------------+-----------------------------------+
8 rows in set (0.00 sec)

 insert into T values(null,'小明','character_set_connection utf8');
Query OK, 1 row affected (0.01 sec)

guo@ha 12:25:58>SELECT id,LENGTH(data), CHAR_LENGTH(data),data,hex(data),hex('小明'),comment  from t order by id ;
+----+--------------+-------------------+--------+--------------+--------------------+-------------------------------+
| id | LENGTH(data) | CHAR_LENGTH(data) | data   | hex(data)    | hex('小明')      | comment                       |
+----+--------------+-------------------+--------+--------------+--------------------+-------------------------------+
|  1 |            6 |                 3 | 小明    | E5B08FE6988E | E7818FE5BF94E6A791 | 3 gbk                         |
|  2 |            4 |                 2 | С      | D0A1C3F7     | E7818FE5BF94E6A791 | character_set_client utf8     |
|  3 |            6 |                 3 | 小明    | E5B08FE6988E | E7818FE5BF94E6A791 | character_set_connection utf8 |
+----+--------------+-------------------+--------+--------------+--------------------+-------------------------------+
3 rows in set (0.00 sec)

SecureCRT(utf8 E5B08FE6988E)–> character_set_client(gbk E5B08FE6988E)–>character_set_connection(utf8 E5B08FE6988E)–>database(gbk E5B08FE6988E)–>character_set_results(gbk E5B08FE6988E)–>SecureCRT(utf8 E5B08FE6988E)

数据库存储的二进制流E5B08FE6988E不变
SecureCRT输入的utf-8字符集‘小明’在这里变成了E7818FE5BF94E6A791
client(gbk)–>connection(utf8)未作转化

只设置character_set_results为utf8

set names gbk;
Query OK, 0 rows affected (0.00 sec)

set character_set_results=utf8;
Query OK, 0 rows affected (0.00 sec)

show variables like 'character%';
+--------------------------+-----------------------------------+
| Variable_name            | Value                             |
+--------------------------+-----------------------------------+
| character_set_client     | gbk                               |
| character_set_connection | gbk                               |
| character_set_database   | gbk                               |
| character_set_filesystem | binary                            |
| character_set_results    | utf8                              |
| character_set_server     | utf8mb4                           |
| character_set_system     | utf8                              |
| character_sets_dir       | /home/mysql/mysql/share/charsets/ |
+--------------------------+-----------------------------------+
8 rows in set (0.00 sec)

insert into T values(null,'小明','character_set_results utf8');          
Query OK, 1 row affected (0.00 sec)

guo@ha 12:27:06>SELECT id,LENGTH(data), CHAR_LENGTH(data),data,hex(data),hex('小明'),comment  from t order by id ;
+----+--------------+-------------------+-----------+--------------+------------------+-------------------------------+
| id | LENGTH(data) | CHAR_LENGTH(data) | data      | hex(data)    | hex('灏忔槑') | comment                       |
+----+--------------+-------------------+-----------+--------------+------------------+-------------------------------+
|  1 |            6 |                 3 | 灏忔槑 | E5B08FE6988E | E5B08FE6988E     | 3 gbk                         |
|  2 |            4 |                 2 | 小明    | D0A1C3F7     | E5B08FE6988E     | character_set_client utf8     |
|  3 |            6 |                 3 | 灏忔槑 | E5B08FE6988E | E5B08FE6988E     | character_set_connection utf8 |
|  4 |            6 |                 3 | 灏忔槑 | E5B08FE6988E | E5B08FE6988E     | character_set_results utf8    |
+----+--------------+-------------------+-----------+--------------+------------------+-------------------------------+
4 rows in set (0.00 sec)

SecureCRT(utf8 E5B08FE6988E)–> character_set_client(gbk E5B08FE6988E)–>character_set_connection(gbk E5B08FE6988E)–>database(gbk E5B08FE6988E)–>character_set_results(utf8 E7818FE5BF94E6A791 灏忔槑)–>SecureCRT(utf8 E7818FE5BF94E6A791 灏忔槑)
client(gbk)–>connection(gbk)未作转化

这里只是显示出了问题,数据库存储的二进制流E5B08FE6988E不变,gbk存储的“小明”也能正常显示
SecureCRT输入的utf-8字符集‘小明’在这里也是E5B08FE6988E,但是“小明”显示成了“灏忔槑”对应二进制流E7818FE5BF94E6A791,results并没有转换而是直接把E5B08FE6988E当作gbk的字符处理成E7818FE5BF94E6A791

select unhex('E7818FE5BF94E6A791');
+-----------------------------+
| unhex('E7818FE5BF94E6A791') |
+-----------------------------+
| 灏忔槑                   |
+-----------------------------+
1 row in set (0.00 sec)

更改三者为utf8

改变character_set_client、character_set_connection、character_set_results的设置为utf8,查询插入的值。

set names utf8;
Query OK, 0 rows affected (0.00 sec)

show variables like 'character%';
+--------------------------+-----------------------------------+
| Variable_name            | Value                             |
+--------------------------+-----------------------------------+
| character_set_client     | utf8                              |
| character_set_connection | utf8                              |
| character_set_database   | gbk                               |
| character_set_filesystem | binary                            |
| character_set_results    | utf8                              |
| character_set_server     | utf8mb4                           |
| character_set_system     | utf8                              |
| character_sets_dir       | /home/mysql/mysql/share/charsets/ |
+--------------------------+-----------------------------------+
8 rows in set (0.00 sec)

 insert into T values(null,'小明','3 utf8');                     
Query OK, 1 row affected (0.00 sec)

guo@ha 12:28:10>SELECT id,LENGTH(data), CHAR_LENGTH(data),data,hex(data),hex('小明'),comment  from t order by id ;
+----+--------------+-------------------+-----------+--------------+---------------+-------------------------------+
| id | LENGTH(data) | CHAR_LENGTH(data) | data      | hex(data)    | hex('小明') | comment                       |
+----+--------------+-------------------+-----------+--------------+---------------+-------------------------------+
|  1 |            6 |                 3 | 灏忔槑 | E5B08FE6988E | E5B08FE6988E  | 3 gbk                         |
|  2 |            4 |                 2 | 小明    | D0A1C3F7     | E5B08FE6988E  | character_set_client utf8     |
|  3 |            6 |                 3 | 灏忔槑 | E5B08FE6988E | E5B08FE6988E  | character_set_connection utf8 |
|  4 |            6 |                 3 | 灏忔槑 | E5B08FE6988E | E5B08FE6988E  | character_set_results utf8    |
|  5 |            4 |                 2 | 小明    | D0A1C3F7     | E5B08FE6988E  | 3 utf8                        |
+----+--------------+-------------------+-----------+--------------+---------------+-------------------------------+
5 rows in set (0.00 sec)

SecureCRT(utf8 E5B08FE6988E)–> character_set_client(utf8 E5B08FE6988E)–>character_set_connection(utf8 E5B08FE6988E)–>database(gbk D0A1C3F7)–>character_set_results(utf8 E5B08FE6988E)–>SecureCRT(utf8 E5B08FE6988E)

character_set_connection(utf8)–>database(gbk)作了转化

综上,MySQL内部二进制流重编码转化条件

  • 只在connection和database阶段进行
  • connection、database的前一字符集是当前字符集的超集

设置SecureCRT的字符集为GBK,看看SecureCRT字符集设置对结果影响

set names gbk;

Query OK, 0 rows affected (0.00 sec)

show variables like '%char%';

+--------------------------+-----------------------------------+
| Variable_name            | Value                             |
+--------------------------+-----------------------------------+
| character_set_client     | gbk                               |
| character_set_connection | gbk                               |
| character_set_database   | gbk                               |
| character_set_filesystem | binary                            |
| character_set_results    | gbk                               |
| character_set_server     | utf8mb4                           |
| character_set_system     | utf8                              |
| character_sets_dir       | /home/mysql/mysql/share/charsets/ |
+--------------------------+-----------------------------------+
8 rows in set (0.00 sec)

SELECT id,LENGTH(data), CHAR_LENGTH(data),data,hex(data),comment  from t order by id ;
+----+--------------+-------------------+--------+----------------+-------------------------------+
| id | LENGTH(data) | CHAR_LENGTH(data) | data   | hex(data)      | comment                       |
+----+--------------+-------------------+--------+----------------+-------------------------------+
|  1 |            6 |                 3 | 灏忔槑 | E5B08FE6988E    | 3 gbk                         |
|  2 |            4 |                 2 | 小明   | D0A1C3F7       | character_set_client utf8     |
|  3 |            6 |                 3 | 灏忔槑 | E5B08FE6988E   | character_set_connection utf8 |
|  4 |            6 |                 3 | 灏忔槑 | E5B08FE6988E   | character_set_results utf8    |
|  5 |            4 |                 2 | 小明   | D0A1C3F7       | 3 utf8                        |
+----+--------------+-------------------+--------+----------------+-------------------------------+
5 rows in set (0.00 sec)

set  names utf8;

Query OK, 0 rows affected (0.00 sec)

show variables like '%char%';
+--------------------------+-----------------------------------+
| Variable_name            | Value                             |
+--------------------------+-----------------------------------+
| character_set_client     | utf8                              |
| character_set_connection | utf8                              |
| character_set_database   | gbk                               |
| character_set_filesystem | binary                            |
| character_set_results    | utf8                              |
| character_set_server     | utf8mb4                           |
| character_set_system     | utf8                              |
| character_sets_dir       | /home/mysql/mysql/share/charsets/ |
+--------------------------+-----------------------------------+
8 rows in set (0.00 sec)

SELECT id,LENGTH(data), CHAR_LENGTH(data),data,hex(data),comment  from t order by id ;
+----+--------------+-------------------+-----------+--------------+-------------------------------+
| id | LENGTH(data) | CHAR_LENGTH(data) | data      | hex(data)    | comment                       |
+----+--------------+-------------------+-----------+--------------+-------------------------------+
|  1 |            6 |                 3 | 鐏忓繑妲    B08FE6988E    | 3 gbk                         |
|  2 |            4 |                 2 | 灏忔槑     |D0A1C3F7     | character_set_client utf8     |
|  3 |            6 |                 3 | 鐏忓繑妲    B08FE6988E    | character_set_connection utf8 |
|  4 |            6 |                 3 | 鐏忓繑妲    B08FE6988E    | character_set_results utf8    |
|  5 |            4 |                 2 | 灏忔槑     | D0A1C3F7     | 3 utf8                        |
+----+--------------+-------------------+-----------+--------------+-------------------------------+
5 rows in set (0.00 sec)

相关阅读
https://www.cnblogs.com/cchust/p/4327019.html

Contents
  1. 1. 表准备
  2. 2. 只更改character_set_client为utf8
  3. 3. 只设置character_set_connection为utf8
  4. 4. 只设置character_set_results为utf8
  5. 5. 更改三者为utf8