Matt Mazur

2018-04-26T11:23:08-04:00

My solution involves the use of dependent subqueries.

select user_id,
(select count(*) from posts where posts.user_id=users.user_id) as post_count,
(select count(*) from pages where pages.user_id=users.user_id) as page_count
from users;

To test performance differences, I loaded the tables with 16,000 posts and nearly 25,000 pages. In limited testing, this query performed nearly identically to your query that left-joins to the aggregating subqueries. Your updated, simpler method took over 2,000 times as long (nearly 3 minutes compared to 0.02 seconds) to process the same data.

Running EXPLAIN on each of the queries shows that both of your approaches involve a filesort, which my query avoids. Adding a key on user_id to the posts and pages tables avoids the filesort and sped up the slow query to 18 seconds. That is still significantly slower than the other two queries.
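As a rough sketch of the indexing step described above, the snippet below adds an index on user_id to the posts and pages tables and prints the resulting query plan. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the output is SQLite's EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN, and SQLite does not report a filesort as such. The index names are made up for illustration, and the query is assumed to be the join-based COUNT(DISTINCT ...) version from the post.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE pages (page_id INTEGER PRIMARY KEY, user_id INTEGER);
    -- Hypothetical index names; the comment only says a key was added on user_id.
    CREATE INDEX idx_posts_user_id ON posts (user_id);
    CREATE INDEX idx_pages_user_id ON pages (user_id);
""")

join_query = """
    SELECT users.user_id,
           COUNT(DISTINCT post_id) AS post_count,
           COUNT(DISTINCT page_id) AS page_count
    FROM users
    LEFT JOIN posts ON posts.user_id = users.user_id
    LEFT JOIN pages ON pages.user_id = users.user_id
    GROUP BY 1
"""

# The fourth column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan_details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + join_query)]
for detail in plan_details:
    print(detail)
```

If the indexes are picked up, their names appear in the printed plan lines. How much this helps in MySQL depends on the data and the optimizer, so timings still need to be measured there.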

I do believe my approach is a bit easier to follow. I ran across this while trying to perform a similar task with a query containing about a dozen columns. With the join-based approaches, each additional column also had to be added to the GROUP BY portion of the query.
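The dependent-subquery approach from this comment can be checked against the post's small sample data set. The sketch below does that with Python's built-in sqlite3 module, which is an assumption made to keep the example self-contained; the original was run on MySQL, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE pages (page_id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users (name) VALUES ('Matt'), ('Simon'), ('Jen');
    INSERT INTO posts (user_id) VALUES (1), (1), (1), (2), (2);
    INSERT INTO pages (user_id) VALUES (2), (2), (3), (3), (3), (3), (3);
""")

# Each count is computed by its own correlated subquery, so the posts and
# pages rows are never joined to each other and no GROUP BY is needed.
subquery_counts = conn.execute("""
    SELECT user_id,
           (SELECT COUNT(*) FROM posts WHERE posts.user_id = users.user_id) AS post_count,
           (SELECT COUNT(*) FROM pages WHERE pages.user_id = users.user_id) AS page_count
    FROM users
    ORDER BY user_id
""").fetchall()

print(subquery_counts)  # [(1, 3, 0), (2, 2, 2), (3, 0, 5)]
```

Adding another count means adding another subquery line, which is where the readability argument comes from.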

	DROP TABLE IF EXISTS users;
	CREATE TABLE users (user_id INT PRIMARY KEY AUTO_INCREMENT, name VARCHAR(20));
	INSERT INTO users (name) VALUES ('Matt');
	INSERT INTO users (name) VALUES ('Simon');
	INSERT INTO users (name) VALUES ('Jen');

	DROP TABLE IF EXISTS posts;
	CREATE TABLE posts (post_id INT PRIMARY KEY AUTO_INCREMENT, user_id INT);
	INSERT INTO posts (user_id) VALUES (1);
	INSERT INTO posts (user_id) VALUES (1);
	INSERT INTO posts (user_id) VALUES (1);
	INSERT INTO posts (user_id) VALUES (2);
	INSERT INTO posts (user_id) VALUES (2);

	DROP TABLE IF EXISTS pages;
	CREATE TABLE pages (page_id INT PRIMARY KEY AUTO_INCREMENT, user_id INT);
	INSERT INTO pages (user_id) VALUES (2);
	INSERT INTO pages (user_id) VALUES (2);
	INSERT INTO pages (user_id) VALUES (3);
	INSERT INTO pages (user_id) VALUES (3);
	INSERT INTO pages (user_id) VALUES (3);
	INSERT INTO pages (user_id) VALUES (3);
	INSERT INTO pages (user_id) VALUES (3);

	+---------+-------+
	| user_id | name  |
	+---------+-------+
	|       1 | Matt  |
	|       2 | Simon |
	|       3 | Jen   |
	+---------+-------+

	+---------+---------+
	| post_id | user_id |
	+---------+---------+
	|       1 |       1 |
	|       2 |       1 |
	|       3 |       1 |
	|       4 |       2 |
	|       5 |       2 |
	+---------+---------+

	+---------+---------+
	| page_id | user_id |
	+---------+---------+
	|       1 |       2 |
	|       2 |       2 |
	|       3 |       3 |
	|       4 |       3 |
	|       5 |       3 |
	|       6 |       3 |
	|       7 |       3 |
	+---------+---------+

	+---------+------------+------------+
	| user_id | post_count | page_count |
	+---------+------------+------------+
	|       1 |          3 |          0 |
	|       2 |          2 |          2 |
	|       3 |          0 |          5 |
	+---------+------------+------------+

	SELECT users.user_id, COUNT(*) AS post_count
	FROM users
	JOIN posts ON posts.user_id = users.user_id
	GROUP BY 1

	+---------+------------+
	| user_id | post_count |
	+---------+------------+
	|       1 |          3 |
	|       2 |          2 |
	+---------+------------+

	SELECT users.user_id, COUNT(*) AS post_count
	FROM users
	LEFT JOIN posts ON posts.user_id = users.user_id
	GROUP BY 1

	+---------+------------+
	| user_id | post_count |
	+---------+------------+
	|       1 |          3 |
	|       2 |          2 |
	|       3 |          1 |
	+---------+------------+

	SELECT *
	FROM users
	LEFT JOIN posts ON posts.user_id = users.user_id

	+---------+-------+---------+---------+
	| user_id | name  | post_id | user_id |
	+---------+-------+---------+---------+
	|       1 | Matt  |       1 |       1 |
	|       1 | Matt  |       2 |       1 |
	|       1 | Matt  |       3 |       1 |
	|       2 | Simon |       4 |       2 |
	|       2 | Simon |       5 |       2 |
	|       3 | Jen   |    NULL |    NULL |
	+---------+-------+---------+---------+

	SELECT users.user_id, COUNT(post_id) AS post_count
	FROM users
	LEFT JOIN posts ON posts.user_id = users.user_id
	GROUP BY 1

	+---------+------------+
	| user_id | post_count |
	+---------+------------+
	|       1 |          3 |
	|       2 |          2 |
	|       3 |          0 |
	+---------+------------+

	SELECT users.user_id, SUM(IF(post_id IS NULL, 0, 1)) AS post_count
	FROM users
	LEFT JOIN posts ON posts.user_id = users.user_id
	GROUP BY 1

	+---------+------------+
	| user_id | post_count |
	+---------+------------+
	|       1 |          3 |
	|       2 |          2 |
	|       3 |          0 |
	+---------+------------+

	SELECT
	  users.user_id,
	  SUM(IF(post_id IS NULL, 0, 1)) AS post_count,
	  SUM(IF(page_id IS NULL, 0, 1)) AS page_count
	FROM users
	LEFT JOIN posts ON posts.user_id = users.user_id
	LEFT JOIN pages ON pages.user_id = users.user_id
	GROUP BY 1;

	# or

	SELECT
	  users.user_id,
	  COUNT(post_id) AS post_count,
	  COUNT(page_id) AS page_count
	FROM users
	LEFT JOIN posts ON posts.user_id = users.user_id
	LEFT JOIN pages ON pages.user_id = users.user_id
	GROUP BY 1;

	+---------+------------+------------+
	| user_id | post_count | page_count |
	+---------+------------+------------+
	|       1 |          3 |          0 |
	|       2 |          4 |          4 |
	|       3 |          0 |          5 |
	+---------+------------+------------+
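The inflated counts for user 2 come from the two joins multiplying rows: 2 posts times 2 pages gives 4 joined rows, and each non-NULL post_id and page_id is then counted 4 times. The sketch below makes that visible by adding a joined_rows column (a name introduced here for illustration). It runs on Python's built-in sqlite3 module rather than MySQL, which is an assumption made so the example is self-contained.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE pages (page_id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users (name) VALUES ('Matt'), ('Simon'), ('Jen');
    INSERT INTO posts (user_id) VALUES (1), (1), (1), (2), (2);
    INSERT INTO pages (user_id) VALUES (2), (2), (3), (3), (3), (3), (3);
""")

# joined_rows counts every row produced by the two LEFT JOINs for a user;
# the other two columns count only the rows where the id is not NULL.
fanout_rows = conn.execute("""
    SELECT users.user_id,
           COUNT(*)       AS joined_rows,
           COUNT(post_id) AS post_count,
           COUNT(page_id) AS page_count
    FROM users
    LEFT JOIN posts ON posts.user_id = users.user_id
    LEFT JOIN pages ON pages.user_id = users.user_id
    GROUP BY users.user_id
    ORDER BY users.user_id
""").fetchall()

print(fanout_rows)  # [(1, 3, 3, 0), (2, 4, 4, 4), (3, 5, 0, 5)]
```

Users 1 and 3 only look correct because one side of the join is empty for them, so a single NULL row stands in for it and nothing is multiplied.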

	SELECT
	  users.user_id,
	  COALESCE(post_count, 0) AS post_count,
	  COALESCE(page_count, 0) AS page_count
	FROM users
	LEFT JOIN (
	  SELECT user_id, COUNT(*) AS post_count
	  FROM posts
	  GROUP BY user_id
	) post_counts ON post_counts.user_id = users.user_id
	LEFT JOIN (
	  SELECT user_id, COUNT(*) AS page_count
	  FROM pages
	  GROUP BY user_id
	) page_counts ON page_counts.user_id = users.user_id

	+---------+------------+------------+
	| user_id | post_count | page_count |
	+---------+------------+------------+
	|       1 |          3 |          0 |
	|       2 |          2 |          2 |
	|       3 |          0 |          5 |
	+---------+------------+------------+

	SELECT COALESCE(NULL, 0)

	+-------------------+
	| COALESCE(NULL, 0) |
	+-------------------+
	|                 0 |
	+-------------------+

	SELECT
	users.user_id,
	COUNT(DISTINCT post_id) AS post_count,
	COUNT(DISTINCT page_id) AS page_count
	FROM users
	LEFT JOIN posts ON posts.user_id = users.user_id
	LEFT JOIN pages ON pages.user_id = users.user_id
	GROUP BY 1

	+---------+------------+------------+
	| user_id | post_count | page_count |
	+---------+------------+------------+
	|       1 |          3 |          0 |
	|       2 |          2 |          2 |
	|       3 |          0 |          5 |
	+---------+------------+------------+

Attempt 1: COUNT with JOIN

Attempt 2: COUNT with LEFT JOIN

Attempt 3: SUM/IF, and LEFT JOIN

The solution: Subqueries and COALESCE


18 thoughts on “Counting in MySQL When Joins are Involved”
