Back in September 2009 I launched a small web app called HNTrends.com, a tool for visualizing the movement of stories on HackerNews’s front page over time.
I haven’t worked on the site much since then, but the script that logs the data has been diligently recording the front page submissions every 15 minutes since it started.
It occurred to me that a detailed analysis of the data might yield some interesting results, such as how the site has grown since then, the best time to post a new submission, user participation rates, or some insight that changes the way we see the site. I offer it to you today so that you may analyze it to your heart’s content.
You can download it here (CSV, 13.4 MB zipped, 169 MB unzipped).
In total, the database contains 514,478 records spanning from August 31, 2009 to March 7, 2010.
A single line looks like this:
"1","http://paulgraham.com/kate.html","What Kate saw in Silicon Valley","129","albertcardona","2009-08-31 20:15:15","63","1","2009-08-31 23:15:15","796573","HackerNews","c18577"
Removing the quotes and splitting by comma, here is what each item represents:
1 – Primary key
http://paulgraham.com/kate.html – Destination URL
What Kate saw in Silicon Valley – Title
129 – Points
albertcardona – Submitter
2009-08-31 20:15:15 – Approximate UTC submission time, calculated based on the time minus the age of the submission
63 – Comments
1 – Rank
2009-08-31 23:15:15 – UTC time record was created
796573 – HackerNews ID
HackerNews – Always “HackerNews”
c18577 – Color for display purposes
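To make the layout concrete, here is a minimal Python sketch that parses the sample line above into named fields. The field names are labels picked for this example, not part of the file, and the sketch assumes standard CSV quoting and no header row. It uses the csv module rather than a plain split on commas, since a title may itself contain a comma.

```python
import csv
from datetime import datetime

# Labels chosen for this sketch; the file itself is assumed to have no header row.
FIELDS = ["id", "url", "title", "points", "submitter", "submitted_at",
          "comments", "rank", "recorded_at", "hn_id", "source", "color"]

SAMPLE = ('"1","http://paulgraham.com/kate.html","What Kate saw in Silicon Valley",'
          '"129","albertcardona","2009-08-31 20:15:15","63","1",'
          '"2009-08-31 23:15:15","796573","HackerNews","c18577"')


def parse_line(line):
    """Turn one quoted CSV line into a dict with typed values."""
    values = next(csv.reader([line]))
    record = dict(zip(FIELDS, values))
    # Numeric columns, per the field list above.
    for key in ("id", "points", "comments", "rank", "hn_id"):
        record[key] = int(record[key])
    # Both timestamps are UTC in "YYYY-MM-DD HH:MM:SS" form.
    for key in ("submitted_at", "recorded_at"):
        record[key] = datetime.strptime(record[key], "%Y-%m-%d %H:%M:%S")
    return record


record = parse_line(SAMPLE)
# Age of the story when this snapshot was taken.
age = record["recorded_at"] - record["submitted_at"]
print(record["title"], record["points"], record["rank"], age)
```

Because the submission time is derived from the story's displayed age, the computed age is only as precise as that display.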
One final note: this database covers roughly 99% of the time period since the script started. For a while the script broke whenever an article didn’t contain a comment link, and every so often it goes down for miscellaneous reasons.
thank you, was offline for a couple of months, allows me to catch back up.
Hi, the link doesn’t seem to work right now. Can you fix it please?
Link is fixed now (sorry for the delay)
For others who might be thinking about working with this dataset, notice that there are a lot of duplicates:
$ perl -ne 'print if m/#1 Rule of Programming/' hntrends2009-2010.csv
"153196","item?id=901710","#1 Rule of Programming Is..","5","cyman","2009-10-25 16:30:03","2","4","2009-10-25 16:45:03","901710","HackerNews","7fdc19"
"153225","item?id=901710","#1 Rule of Programming Is..","6","cyman","2009-10-25 16:30:04","6","3","2009-10-25 17:00:04","901710","HackerNews","787b0d"
"153255","item?id=901710","#1 Rule of Programming Is..","11","cyman","2009-10-25 16:30:03","18","3","2009-10-25 17:15:03","901710","HackerNews","351cae"
"153285","item?id=901710","#1 Rule of Programming Is..","18","cyman","2009-10-25 16:30:02","24","3","2009-10-25 17:30:02","901710","HackerNews","105278"
"153315","item?id=901710","#1 Rule of Programming Is..","23","cyman","2009-10-25 16:45:02","33","3","2009-10-25 17:45:02","901710","HackerNews","c5102b"
"153345","item?id=901710","#1 Rule of Programming Is..","23","cyman","2009-10-25 17:00:04","35","3","2009-10-25 18:00:04","901710","HackerNews","327606"
"153375","item?id=901710","#1 Rule of Programming Is..","27","cyman","2009-10-25 17:15:03","41","3","2009-10-25 18:15:03","901710","HackerNews","683dcd"
"153405","item?id=901710","#1 Rule of Programming Is..","30","cyman","2009-10-25 16:30:10","45","3","2009-10-25 18:30:10","901710","HackerNews","6386a7"
"153436","item?id=901710","#1 Rule of Programming Is..","35","cyman","2009-10-25 16:45:03","47","4","2009-10-25 18:45:03","901710","HackerNews","d3c601"
"153466","item?id=901710","#1 Rule of Programming Is..","36","cyman","2009-10-25 17:00:06","52","4","2009-10-25 19:00:06","901710","HackerNews","79ae81"
"153496","item?id=901710","#1 Rule of Programming Is..","37","cyman","2009-10-25 17:15:06","54","4","2009-10-25 19:15:06","901710","HackerNews","5e9b60"
"153527","item?id=901710","#1 Rule of Programming Is..","39","cyman","2009-10-25 16:30:03","57","5","2009-10-25 19:30:03","901710","HackerNews","cf4e18"
"153557","item?id=901710","#1 Rule of Programming Is..","40","cyman","2009-10-25 16:45:03","59","5","2009-10-25 19:45:03","901710","HackerNews","1d48be"
Looking at the record timestamps, we can see that the ranks were grabbed every fifteen minutes, so the same story shows up once per snapshot, and we can watch this story’s descent.
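For anyone who wants one row per story instead of one per snapshot, here is a rough Python sketch that groups records by HackerNews ID and keeps a small summary for each. It runs on three of the rows above; the column positions follow the field list in the post, and the summary keys are names made up for this example.

```python
import csv

# Three of the snapshots shown above, copied verbatim.
ROWS = '''"153196","item?id=901710","#1 Rule of Programming Is..","5","cyman","2009-10-25 16:30:03","2","4","2009-10-25 16:45:03","901710","HackerNews","7fdc19"
"153225","item?id=901710","#1 Rule of Programming Is..","6","cyman","2009-10-25 16:30:04","6","3","2009-10-25 17:00:04","901710","HackerNews","787b0d"
"153557","item?id=901710","#1 Rule of Programming Is..","40","cyman","2009-10-25 16:45:03","59","5","2009-10-25 19:45:03","901710","HackerNews","1d48be"
'''


def summarize(lines):
    """Collapse per-snapshot rows into one summary dict per HackerNews ID."""
    stories = {}
    for row in csv.reader(lines):
        title, points, rank = row[2], int(row[3]), int(row[7])
        recorded_at, hn_id = row[8], row[9]
        story = stories.setdefault(hn_id, {
            "title": title,
            "snapshots": 0,
            "best_rank": rank,
            "max_points": points,
            "first_seen": recorded_at,
            "last_seen": recorded_at,
        })
        story["snapshots"] += 1
        story["best_rank"] = min(story["best_rank"], rank)  # lower rank is better
        story["max_points"] = max(story["max_points"], points)
        # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text.
        story["first_seen"] = min(story["first_seen"], recorded_at)
        story["last_seen"] = max(story["last_seen"], recorded_at)
    return stories


summary = summarize(ROWS.splitlines())
print(summary["901710"])
```

To run it over the whole file, pass an open file object to summarize instead of the sample rows. I have not checked the file's encoding, so that part may need adjusting.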
And Matt, thank you very much for compiling and sharing this with us.