Back in September 2009 I launched a small web app called HNTrends.com, a tool for visualizing the movement of stories on HackerNews’s front page over time.
I haven’t worked on the site much since then, but the script that logs the data has been diligently recording the front page submissions every 15 minutes since it started.
It occurred to me that a detailed analysis of the data might yield some interesting results, such as how the site has grown, when the best time to post a new submission is, how actively users participate, or some insight that changes the way we see the site. I offer it to you today so that you may analyze it to your heart’s content.
You can download it here (CSV, 13.4 MB zipped, 169 MB unzipped).
In total, the database contains 514,478 records spanning from August 31, 2009 to March 7, 2010.
A single line looks like this:
"1","http://paulgraham.com/kate.html","What Kate saw in Silicon Valley","129","albertcardona","2009-08-31 20:15:15","63","1","2009-08-31 23:15:15","796573","HackerNews","c18577"
Removing the quotes and splitting by comma, here is what each item represents:
1 – Primary key
http://paulgraham.com/kate.html – Destination URL
What Kate saw in Silicon Valley – Title
129 – Points
albertcardona – Submitter
2009-08-31 20:15:15 – Approximate UTC submission time, calculated by subtracting the submission’s age from the time the record was created
63 – Comments
1 – Rank
2009-08-31 23:15:15 – UTC time the record was created
796573 – HackerNews ID
HackerNews – Always “HackerNews”
c18577 – Color for display purposes
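If you want to load the records in code, here is a minimal Python sketch that maps each row onto the twelve fields above. It uses the standard `csv` module rather than a plain split on commas, because titles can themselves contain commas. The `Record` field names and the `parse_row`/`iter_records` helpers are hypothetical labels of my own, and the sketch assumes the file has no header row and that every numeric field is populated. For the real file you would open it with `newline=""` and pass the file object to `iter_records`.

```python
import csv
import io
from datetime import datetime
from typing import Iterator, NamedTuple

TIME_FMT = "%Y-%m-%d %H:%M:%S"

class Record(NamedTuple):
    """One front-page observation; the field names are hypothetical labels."""
    pk: int
    url: str
    title: str
    points: int
    submitter: str
    submitted_at: datetime  # approximate: created_at minus the story's age
    comments: int
    rank: int
    created_at: datetime    # when the logger captured this row (UTC)
    hn_id: int
    source: str             # always "HackerNews"
    color: str              # display-only

def parse_row(row: list) -> Record:
    """Convert the 12 string fields of one CSV row into a typed Record."""
    if len(row) != 12:
        raise ValueError(f"expected 12 fields, got {len(row)}")
    return Record(
        pk=int(row[0]),
        url=row[1],
        title=row[2],
        points=int(row[3]),
        submitter=row[4],
        submitted_at=datetime.strptime(row[5], TIME_FMT),
        comments=int(row[6]),
        rank=int(row[7]),
        created_at=datetime.strptime(row[8], TIME_FMT),
        hn_id=int(row[9]),
        source=row[10],
        color=row[11],
    )

def iter_records(f) -> Iterator[Record]:
    """Stream Records from an open text file; csv.reader handles quoted commas."""
    for row in csv.reader(f):
        yield parse_row(row)

# Usage with the sample line shown earlier (StringIO stands in for the real file).
sample = ('"1","http://paulgraham.com/kate.html","What Kate saw in Silicon Valley",'
          '"129","albertcardona","2009-08-31 20:15:15","63","1",'
          '"2009-08-31 23:15:15","796573","HackerNews","c18577"\n')
rec = next(iter_records(io.StringIO(sample)))
print(rec.title, rec.points)              # What Kate saw in Silicon Valley 129
print(rec.created_at - rec.submitted_at)  # 3:00:00
```

The three-hour difference in the last line is the submission’s age at the moment the record was logged, which is how the approximate submission time was derived.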
One final note: this database covers roughly 99% of the time period since logging started. For a while the script broke whenever an article didn’t contain a comment link, and every so often it goes down for miscellaneous reasons.
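If you need to know exactly where those holes are, one approach is to bucket the record-creation times into 15-minute slots and look for missing slots. The Python sketch below does that. It assumes each scrape’s rows land in the same quarter-hour slot, as the sample’s 23:15:15 timestamp suggests; if the logger drifted across slot boundaries, the numbers would be slightly off. The `to_slot` and `coverage_report` names are hypothetical.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=15)  # the logger's scrape interval

def to_slot(t: datetime) -> datetime:
    """Floor a timestamp to the start of its 15-minute logging slot."""
    return t.replace(minute=t.minute - t.minute % 15, second=0, microsecond=0)

def coverage_report(created_times):
    """Return (fraction of slots logged, list of (last_seen, next_seen) gaps)."""
    slots = sorted({to_slot(t) for t in created_times})
    if not slots:
        return 0.0, []
    # Number of slots between the first and last observed slot, inclusive.
    expected = (slots[-1] - slots[0]) // INTERVAL + 1
    gaps = [(a, b) for a, b in zip(slots, slots[1:]) if b - a > INTERVAL]
    return len(slots) / expected, gaps

# Usage: four rows from three scrapes, with 23:45, 00:00 and 00:15 missing.
times = [
    datetime(2009, 8, 31, 23, 15, 15),
    datetime(2009, 8, 31, 23, 15, 16),  # same scrape, one second later
    datetime(2009, 8, 31, 23, 30, 14),
    datetime(2009, 9, 1, 0, 30, 15),
]
frac, gaps = coverage_report(times)
print(frac)       # 0.5
print(len(gaps))  # 1
```

Fed the creation times of every record, the fraction should come out near the 99% figure above, and the gap list pinpoints the outages.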