面试官：请说一下Mysql中count(1)、count(*)以及count(列)的区别？

2024-01-31 09:42 由码农Academy 发表于 #数据库

近期在Review项目代码时，发现同事们在查询MySQL行数时存在多样的方式，有的使用COUNT(1), 有的用COUNT(id), 还有人选择了COUNT(*)。这混杂的选择引发了我的思考。当然这三种count的方式也是众说纷纭，其中最大的分歧点就是COUNT(*)和COUNT(1)查询性能上，有人觉得COUNT(*)需要转换为COUNT(1)，所以COUNT(1)得速度更快。究竟这三种计数方式之间有何区别，它们的背后原理是怎样的呢？

COUNT()含义

在《高性能Mysql》一书第236页中是这么解释COUNT的作用的：

COUNT()是一个特殊的函数，有两种非常不同的作用:它可以统计某个列值的数量，也可以统计行数。在统计列值时要求列值是非空的(不统计NULL)。如果在COUNT()的括号中指定了列或者列的表达式，则统计的就是这个表达式有值的结果数”。因为很多人对 NULL理解有问题，所以这里很容易产生误解。
COUNT()的另一个作用是统计结果集的行数。当MySOL确认括号内的表达式值不可能为空时，实际上就是在统计行数。最简单的就是当我们使用COUNT()的时候，这种情况下通配符并不会像我们猜想的那样扩展成所有的列，实际上，它会忽略所有的列而直接统计所有的行数。
我们发现一个最常见的错误就是，在括号内指定了一个列却希望统计结果集的行数。如果希望知道的是结果集的行数，最好使用COUNT(*)，这样写意义清晰，性能也会很好

由此我们也可以大概总结COUNT函数的种方式如下：

COUNT(1): 此查询返回的是结果集中的行数，不关心具体的列内容，因此使用常数1。
在很多数据库系统中，这种方式被优化为与 SELECT COUNT(*) 相同的性能水平，因为数据库引擎通常忽略括号内的内容。
COUNT(*)：统计整个表的行数，不考虑是否有NULL值。
通常优于 COUNT(id)，因为它不需要关心具体的列，且现代数据库引擎会对其进行特殊优化。
COUNT(列) ：统计指定列非空值的数量。需要考虑是否有NULL值
此种方式取决于列是否有索引。如果列有索引，数据库引擎可能会利用索引进行快速计数。如果没有索引，或者有大量NULL值，性能可能较差，因为需要扫描整个表。

区别

1、Mysql5.7

在MySql 5.7官方文档中是这么介绍COUNT(expr)函数的

COUNT(expr)
Returns a count of the number of non-NULL values of expr in the rows retrieved by a SELECT statement. The result is a BIGINT value.
If there are no matching rows, COUNT() returns 0.
mysql> SELECT student.student_name,COUNT(*)
FROM student,course
WHERE student.student_id=course.student_id
GROUP BY student_name;
COUNT(*) is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain NULL values.

For transactional storage engines such as InnoDB, storing an exact row count is problematic. Multiple transactions may be occurring at the same time, each of which may affect the count.

InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.

Prior to MySQL 5.7.18, InnoDB processes SELECT COUNT(*) statements by scanning the clustered index. As of MySQL 5.7.18, InnoDB processes SELECT COUNT(*) statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, the clustered index is scanned.

Processing SELECT COUNT(*) statements takes some time if index records are not entirely in the buffer pool. For a faster count, create a counter table and let your application update it according to the inserts and deletes it does. However, this method may not scale well in situations where thousands of concurrent transactions are initiating updates to the same counter table. If an approximate row count is sufficient, use SHOW TABLE STATUS.

InnoDB handles SELECT COUNT(*) and SELECT COUNT(1) operations in the same way. There is no performance difference.

For MyISAM tables, COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause. For example:
mysql> SELECT COUNT(*) FROM student;
This optimization only applies to MyISAM tables, because an exact row count is stored for this storage engine and can be accessed very quickly. COUNT(1) is only subject to the same optimization if the first column is defined as NOT NULL.

从官方文档中我们可以看出mysql官方对COUNT函数的解释：

COUNT(expr) 返回由 SELECT 语句检索的行中 expr 的非 NULL 值的数量，结果为 BIGINT 值。如果没有匹配的行，COUNT() 返回 0。
COUNT(*) 有所不同，它返回所检索的行数的计数，无论它们是否包含 NULL 值。
对于事务性存储引擎（如 InnoDB），存储准确的行数是有问题的。因为多个事务可能同时影响计数，InnoDB 不会保留表中行的内部计数。SELECT COUNT(*) 只会计算当前事务可见的行。
在 MySQL 5.7.18 之前，InnoDB 通过扫描聚集索引处理 SELECT COUNT(*) 语句。从 MySQL 5.7.18 开始，除非索引或优化器提示指示使用其他索引，InnoDB 会通过遍历最小的可用二级索引来处理 SELECT COUNT(*) 语句。如果没有二级索引，则将扫描聚集索引。
处理 SELECT COUNT(*) 语句可能花费一些时间，如果索引记录没有完全在缓冲池中。为了更快的计数，可以创建一个计数器表，并根据插入和删除操作进行更新。然而，在成千上万的并发事务更新同一计数器表的情况下，该方法可能无法很好地扩展。如果粗略的行数足够，可以使用SHOW TABLE STATUS。
InnoDB 处理 SELECT COUNT(*) 和 SELECT COUNT(1) 操作的方式相同，没有性能差异。
对于 MyISAM 表，COUNT(\*) 在从一个表中检索、没有检索其他列、没有 WHERE 子句的情况下可以快速返回，因为 MyISAM 存储了准确的行数。COUNT(1) 只有在第一列被定义为 NOT NULL 时才能进行相同的优化。

2、Mysql 8.0

在Mysql8.0的文档中对COUNT(expr)的解释是这样

COUNT(expr) [over_clause]

Returns a count of the number of non-NULL values of expr in the rows retrieved by a SELECT statement. The result is a BIGINT value.

If there are no matching rows, COUNT() returns 0.COUNT(NULL)returns 0.

This function executes as a window function if over_clause is present. over_clause is as described in Section 12.20.2, “Window Function Concepts and Syntax”.

mysql> SELECT student.student_name,COUNT(*) FROM student,course WHERE student.student_id=course.student_id GROUP BY student_name;
COUNT(*) is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain NULL values.

For transactional storage engines such as InnoDB, storing an exact row count is problematic. Multiple transactions may be occurring at the same time, each of which may affect the count.

InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.

As of MySQL 8.0.13, SELECT COUNT(*) FROM tbl_name query performance for InnoDB tables is optimized for single-threaded workloads if there are no extra clauses such as WHERE or GROUP BY.

InnoDB processes SELECT COUNT(*) statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, InnoDB processes SELECT COUNT(*) statements by scanning the clustered index.

Processing SELECT COUNT(*) statements takes some time if index records are not entirely in the buffer pool. For a faster count, create a counter table and let your application update it according to the inserts and deletes it does. However, this method may not scale well in situations where thousands of concurrent transactions are initiating updates to the same counter table. If an approximate row count is sufficient, use SHOW TABLE STATUS.

InnoDB handles SELECT COUNT(*) and SELECT COUNT(1) operations in the same way. There is no performance difference.

For MyISAM tables, COUNT(*)is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause. For example:

mysql> SELECT COUNT(*) FROM student;
This optimization only applies to MyISAM tables, because an exact row count is stored for this storage engine and can be accessed very quickly. COUNT(1) is only subject to the same optimization if the first column is defined as NOT NULL.

从mysql8.0的文档中我们可以看出mysql8.0对COUNT的解释

COUNT(expr) 返回在由 SELECT 语句检索的行中 _expr_ 的非 NULL 值的数量，结果为 BIGINT 值。如果没有匹配的行，COUNT() 返回 0。COUNT(NULL) 也返回 0。
COUNT(*) 有所不同，它返回所检索的行数的计数，无论它们是否包含 NULL 值。
对于事务性存储引擎（如 InnoDB），存储准确的行数是有问题的，因为多个事务可能同时影响计数。InnoDB 不会保留表中行的内部计数。SELECT COUNT(*) 只计算当前事务可见的行。
在 MySQL 8.0.13 及以后版本，对于 InnoDB 表，执行 SELECT COUNT(*) FROM tbl_name 查询性能在没有额外子句（如 WHERE 或 GROUP BY）的情况下进行了优化，特别适用于单线程工作负载。
InnoDB 处理 SELECT COUNT(*) 语句的方式：
- 通过遍历最小可用二级索引，除非指示使用其他索引。
- 如果没有二级索引，InnoDB 通过扫描聚集索引来处理 SELECT COUNT(*) 语句。
处理 SELECT COUNT(*) 语句可能花费一些时间，如果索引记录没有完全在缓冲池中。为了更快的计数，可以创建一个计数器表，让应用程序根据插入和删除操作进行更新。但是，这种方法在数千个并发事务同时对同一计数器表进行更新的情况下可能不会很好地扩展。如果粗略的行数足够，可以使用 SHOW TABLE STATUS。
InnoDB 对待 SELECT COUNT(*) 和 SELECT COUNT(1) 操作的方式相同，没有性能差异。
对于 MyISAM 表，COUNT(*) 在从一个表中检索、没有检索其他列、没有 WHERE 子句的情况下可以快速返回，因为 MyISAM 存储了准确的行数。COUNT(1) 只有在第一列被定义为 NOT NULL 时才能进行相同的优化。

结合Mysql5.7与Mysql8.0的文档我们可以看出两个版本对COUNT的支持的差异：

MySQL 8.0 优化了 InnoDB 表的 SELECT COUNT(*) 查询性能： 在 MySQL 8.0.13 及以后版本，对于 InnoDB 表，执行 SELECT COUNT(*) 查询的性能进行了优化，特别适用于单线程工作负载。这是 MySQL 5.7 文档中未包含的新特性。
MyISAM 表的优化说明：
MySQL 8.0 文档中强调了 MyISAM 表在执行 COUNT(*) 时的优化情况，即在从一个表中检索、没有检索其他列、没有 WHERE 子句的情况下可以快速返回。MySQL 5.7 文档中也提到了 MyISAM 表的优化，但对于 COUNT(*) 的具体优化情况没有细节。

MySQL 8.0 在性能优化方面对于 InnoDB 表的 SELECT COUNT(*) 查询进行了特别的关注，而且在 MyISAM 表的优化方面进行了详细的说明。其他方面，两个版本在COUNT()函数的解释和使用上基本保持一致。

基于此我们明白，其实COUNT(*)和COUNT(1)其实是一样的，在性能上并没有差异。

那这两种方式与COUNT(列)的差异呢？从以上《高性能Mysql》以及Mysql官方文档中我们知道，COUNT(列)是统计非空列的行数，它也会遍历整张表，然后会对列对应的值做非空判断，非空的字段进行个数累加。当然这是列为主键索引时的操作。如果列不为主键索引时，那么查询时还需要进行回表操作，再根据主键获取数据，此时无疑是增加了一次IO，在性能上其实是不如COUNT(*)和COUNT(1)的。那么我们就可以知道，按照效率来看，count(*) = count(1) > count(主键) > count(非主键列)。

使用建议

如果你想知道一张表的大概行数，我们可以直接使用show table status命令或者咱们使用的一些mysql客户端Navicat或者datagrip都可以办到。
如果你想获取一张表的确切行数时，我可以是优先使用使用 COUNT(*) 获取行数，这样写法清晰，性能较好，尤其对于 InnoDB 表的优化更为明显。我们要避免使用 COUNT(列) 统计行数，除非你真的需要统计该列非空值的数量，否则容易产生误解。

篇幅有限，深入验证将在后续文章中介绍。

本文已收录于我的个人博客：码农Academy的博客，专注分享Java技术干货，包括Java基础、Spring Boot、Spring Cloud、Mysql、Redis、Elasticsearch、中间件、架构设计、面试题、程序员攻略等

热门相关：深渊巨兽扫黑行动之黄金动物界没有爱过的女人影子告白