The Art of Software

SQL Query to Generate Data for Histogram/Frequency Plot


Say we have two MySQL relations, User and Checkin, where each User can have zero or more Checkins. Furthermore, the User table has two attributes: id and name, while the Checkin table has three attributes: id, user_id, and date.

We would like to generate a plot of the number of users versus the number of checkins. That is, how many users have checked in one time, two times, etc?

We arrive at our answer by first calculating the number of checkins per user, which can be generated using the following query:

 SELECT user_id, count(user_id) AS cicount FROM Checkin GROUP BY user_id 

This gives us a table with two attributes: user_id, and cicount, where cicount means “checkin count”. Let’s call this table “PerUserCount”, which we can query to get the counts we really want.

 SELECT cicount, count(user_id) AS nusers FROM PerUserCount GROUP BY cicount 

Of course we can’t write the query above as is. We have to use a derived table. The complete query is:

SELECT cicount, count(user_id) as nusers FROM (SELECT user_id, count(user_id) AS cicount FROM Checkin GROUP BY user_id ) as PerUserCount GROUP BY cicount

This Stack Overflow post was helpful in deriving this solution.