While trying to wrangle a MySQL query to rank a set of results, I got sucked into a few Stack Overflow threads outlining various approaches.
In order to understand the pros and cons of each technique, I created some test data and reimplemented the solutions, all of which I’ve shared below.
If there are any other methods worth including here, please drop a note in the comments.
We’ll be working with this table:
DROP TABLE IF EXISTS users; CREATE TABLE users (user_id INT PRIMARY KEY AUTO_INCREMENT, name VARCHAR(20), start_date DATE, team_id INT); INSERT INTO users (name, start_date, team_id) VALUES ('Matt', '2017-01-01', 1); INSERT INTO users (name, start_date, team_id) VALUES ('John', '2017-01-02', 2); INSERT INTO users (name, start_date, team_id) VALUES ('Sara', '2017-01-02', 2); INSERT INTO users (name, start_date, team_id) VALUES ('Tim', '2017-01-02', 3); INSERT INTO users (name, start_date, team_id) VALUES ('Bob', '2017-01-03', 3); INSERT INTO users (name, start_date, team_id) VALUES ('Bill', '2017-01-04', 3); INSERT INTO users (name, start_date, team_id) VALUES ('Kathy', '2017-01-04', 3); INSERT INTO users (name, start_date, team_id) VALUES ('Anne', '2017-01-05', 3);
+---------+-------+------------+---------+ | user_id | name | start_date | team_id | +---------+-------+------------+---------+ | 1 | Matt | 2017-01-01 | 1 | | 2 | John | 2017-01-02 | 2 | | 3 | Sara | 2017-01-02 | 2 | | 4 | Tim | 2017-01-02 | 3 | | 5 | Bob | 2017-01-03 | 3 | | 6 | Bill | 2017-01-04 | 3 | | 7 | Kathy | 2017-01-04 | 3 | | 8 | Anne | 2017-01-05 | 3 | +---------+-------+------------+---------+
Ranked by start date
SELECT * FROM users ORDER by start_date ASC
Ranked by start date with ties broken by user id
SELECT * FROM users ORDER by start_date ASC, user_id ASC
First employee by start date with ties broken by user id
SELECT * FROM users ORDER by start_date ASC, user_id ASC LIMIT 1
First employee by start date with ties
SELECT * FROM users WHERE start_date = (SELECT MIN(start_date) FROM users);
Second employee by start date with ties broken by user id
SELECT * FROM users ORDER by start_date ASC, user_id ASC LIMIT 1 OFFSET 1
Second employee by start date with ties
SELECT * FROM users WHERE start_date = ( SELECT DISTINCT start_date FROM users ORDER BY start_date ASC LIMIT 1 OFFSET 1 )
Ranked by start date using variable
SET @rank := 0; SELECT *, @rank := @rank + 1 AS rank FROM users ORDER BY start_date ASC
+---------+-------+------------+---------+------+ | user_id | name | start_date | team_id | rank | +---------+-------+------------+---------+------+ | 1 | Matt | 2017-01-01 | 1 | 1 | | 2 | John | 2017-01-02 | 2 | 2 | | 3 | Sara | 2017-01-02 | 2 | 3 | | 4 | Tim | 2017-01-02 | 3 | 4 | | 5 | Bob | 2017-01-03 | 3 | 5 | | 6 | Bill | 2017-01-04 | 3 | 6 | | 7 | Kathy | 2017-01-04 | 3 | 7 | | 8 | Anne | 2017-01-05 | 3 | 8 | +---------+-------+------------+---------+------+
Ranked by start date using a variable
Based on this Stack Overflow comment:
SELECT *, @rank := @rank + 1 AS rank FROM users, (SELECT @rank := 0) r ORDER BY start_date ASC
+---------+-------+------------+---------+------------+------+ | user_id | name | start_date | team_id | @rank := 0 | rank | +---------+-------+------------+---------+------------+------+ | 1 | Matt | 2017-01-01 | 1 | 0 | 1 | | 2 | John | 2017-01-02 | 2 | 0 | 2 | | 3 | Sara | 2017-01-02 | 2 | 0 | 3 | | 4 | Tim | 2017-01-02 | 3 | 0 | 4 | | 5 | Bob | 2017-01-03 | 3 | 0 | 5 | | 6 | Bill | 2017-01-04 | 3 | 0 | 6 | | 7 | Kathy | 2017-01-04 | 3 | 0 | 7 | | 8 | Anne | 2017-01-05 | 3 | 0 | 8 | +---------+-------+------------+---------+------------+------+
First employee by start date using by setting a variable
SET @rank := 0; SELECT * FROM ( SELECT *, @rank := @rank + 1 AS rank FROM users ORDER BY start_date ASC ) ranked WHERE rank = 1
Ranked by start date with ties
Based on this Stack Overflow comment:
SET @prev_start_date = NULL; SET @rank := 0; SELECT *, CASE WHEN @prev_start_date = start_date THEN @rank -- Note that the assignment here will always be true WHEN @prev_start_date := start_date THEN @rank := @rank + 1 END AS rank FROM users ORDER BY start_date ASC
+---------+-------+------------+---------+------+ | user_id | name | start_date | team_id | rank | +---------+-------+------------+---------+------+ | 1 | Matt | 2017-01-01 | 1 | 1 | | 2 | John | 2017-01-02 | 2 | 2 | | 3 | Sara | 2017-01-02 | 2 | 2 | | 4 | Tim | 2017-01-02 | 3 | 2 | | 5 | Bob | 2017-01-03 | 3 | 3 | | 6 | Bill | 2017-01-04 | 3 | 4 | | 7 | Kathy | 2017-01-04 | 3 | 4 | | 8 | Anne | 2017-01-05 | 3 | 5 | +---------+-------+------------+---------+------+
Ranked by user id within each team
SELECT a.*, COUNT(*) AS rank FROM users a INNER JOIN users b ON a.team_id = b.team_id AND a.user_id >= b.user_id GROUP BY a.team_id, a.user_id
or, based on this Stack Overflow comment:
SELECT a.*, ( SELECT COUNT(*) FROM users b WHERE a.team_id = b.team_id AND a.user_id >= b.user_id ) AS ranked FROM users a
+---------+-------+------------+---------+--------+ | user_id | name | start_date | team_id | ranked | +---------+-------+------------+---------+--------+ | 1 | Matt | 2017-01-01 | 1 | 1 | | 2 | John | 2017-01-02 | 2 | 1 | | 3 | Sara | 2017-01-02 | 2 | 2 | | 4 | Tim | 2017-01-02 | 3 | 1 | | 5 | Bob | 2017-01-03 | 3 | 2 | | 6 | Bill | 2017-01-04 | 3 | 3 | | 7 | Kathy | 2017-01-04 | 3 | 4 | | 8 | Anne | 2017-01-05 | 3 | 5 | +---------+-------+------------+---------+--------+
Note that both of these techniques require that there be a column without duplicates that we can rank on within the partition. For example, we can’t use start_date
due to the duplicates within team 2 (2017-01-02
) and team 3 (2017-01-14
):
SELECT a.*, COUNT(*) AS rank FROM users a INNER JOIN users b ON a.team_id = b.team_id AND a.start_date >= b.start_date GROUP BY a.team_id, a.user_id
+---------+-------+------------+---------+------+ | user_id | name | start_date | team_id | rank | +---------+-------+------------+---------+------+ | 1 | Matt | 2017-01-01 | 1 | 1 | | 2 | John | 2017-01-02 | 2 | 2 | | 3 | Sara | 2017-01-02 | 2 | 2 | | 4 | Tim | 2017-01-02 | 3 | 1 | | 5 | Bob | 2017-01-03 | 3 | 2 | | 6 | Bill | 2017-01-04 | 3 | 4 | | 7 | Kathy | 2017-01-04 | 3 | 4 | | 8 | Anne | 2017-01-05 | 3 | 5 | +---------+-------+------------+---------+------+
Return the last person to join within each team based on user id
Based on this Stack Overflow comment:
SELECT a.* FROM users a LEFT JOIN users b ON a.team_id = b.team_id AND a.user_id < b.user_id WHERE b.team_id IS NULL
or
SELECT a.* FROM users a WHERE user_id IN ( SELECT MAX(user_id) FROM users GROUP BY team_id )
+---------+------+------------+---------+ | user_id | name | start_date | team_id | +---------+------+------------+---------+ | 1 | Matt | 2017-01-01 | 1 | | 3 | Sara | 2017-01-02 | 2 | | 8 | Anne | 2017-01-05 | 3 | +---------+------+------------+---------+
Return the last people to join within each team based on start date
SELECT a.* FROM users a LEFT JOIN users b ON a.team_id = b.team_id AND a.start_date < b.start_date WHERE b.team_id IS NULL
or, based on this groupwise max post:
SELECT a.* FROM users a INNER JOIN ( SELECT team_id, MAX(start_date) AS max_start_date FROM users b GROUP BY team_id ) max_start_dates ON a.team_id = max_start_dates.team_id AND a.start_date = max_start_dates.max_start_date
+---------+------+------------+---------+ | user_id | name | start_date | team_id | +---------+------+------------+---------+ | 1 | Matt | 2017-01-01 | 1 | | 2 | John | 2017-01-02 | 2 | | 3 | Sara | 2017-01-02 | 2 | | 8 | Anne | 2017-01-05 | 3 | +---------+------+------------+---------+
Ranked with gaps
Based on this Stack Overflow comment:
SELECT user_id, name, start_date, team_id, rank FROM ( SELECT *, IF(start_date = @_last_start_date, @cur_rank := @cur_rank, @cur_rank := @_sequence) AS rank, @_sequence := @_sequence + 1, @_last_start_date := start_date FROM users, (SELECT @cur_rank := 1, @_sequence := 1, @_last_start_date := NULL) r ORDER BY start_date ) ranked
Notice that after the three tied for second earliest start date, the next one jumps to 5 (not 3):
+---------+-------+------------+---------+------+ | user_id | name | start_date | team_id | rank | +---------+-------+------------+---------+------+ | 1 | Matt | 2017-01-01 | 1 | 1 | | 2 | John | 2017-01-02 | 2 | 2 | | 3 | Sara | 2017-01-02 | 2 | 2 | | 4 | Tim | 2017-01-02 | 3 | 2 | | 5 | Bob | 2017-01-03 | 3 | 5 | | 6 | Bill | 2017-01-04 | 3 | 6 | | 7 | Kathy | 2017-01-04 | 3 | 6 | | 8 | Anne | 2017-01-05 | 3 | 8 | +---------+-------+------------+---------+------+
Technically this is known as the rank whereas the other examples, which didn’t include gaps, is the dense rank.
Happy ranking!