Six Months of HackerNews Front Page Data

Back in September 2009 I launched a small web app called HNTrends.com, a tool for visualizing the movement of stories on HackerNews’s front page over time.

I haven’t worked on the site much since then, but the script that logs the data has been diligently recording the front page submissions every 15 minutes since it started.

It occurred to me that a detailed analysis of the data might yield some interesting results, such as how the site has grown since then, the best time to post a new submission, user participation rates, or some insight that changes the way we see the site. I offer it to you today so that you may analyze it to your heart’s content.

You can download it here (CSV, 13.4 MB zipped, 169 MB unzipped).

In total, the database contains 514,478 records spanning from August 31, 2009 to March 7, 2010.

A single line looks like this:

"1","http://paulgraham.com/kate.html","What Kate saw in Silicon Valley","129","albertcardona","2009-08-31 20:15:15","63","1","2009-08-31 23:15:15","796573","HackerNews","c18577"

After removing the quotes and splitting on commas, each item represents the following:

  • 1 – Primary key
  • http://paulgraham.com/kate.html – Destination URL
  • What Kate saw in Silicon Valley – Title
  • 129 – Points
  • albertcardona – Submitter
  • 2009-08-31 20:15:15 – Approximate UTC submission time, calculated as the time the record was created minus the age of the submission
  • 63 – Comments
  • 1 – Rank
  • 2009-08-31 23:15:15 – UTC time the record was created
  • 796573 – HackerNews ID
  • HackerNews – Always “HackerNews”
  • c18577 – Color for display purposes
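For anyone who wants to load the file programmatically, here is a minimal Python sketch of the field layout above. The field names (`pk`, `hn_id`, and so on) are my own labels, not names from the database, and I am assuming the file can be read by Python's standard `csv` module; if the export escapes embedded quotes differently, the reader options would need adjusting. Using a CSV parser rather than a plain split on commas matters because a title may itself contain a comma.

```python
import csv
import io
from datetime import datetime
from typing import NamedTuple

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


class Snapshot(NamedTuple):
    """One logged front-page record. Field names are illustrative labels."""
    pk: int
    url: str
    title: str
    points: int
    submitter: str
    submitted_at: datetime  # approximate, derived from the submission's age
    comments: int
    rank: int
    recorded_at: datetime
    hn_id: int
    source: str
    color: str


def parse_row(row):
    """Convert a list of 12 strings into a typed Snapshot."""
    return Snapshot(
        pk=int(row[0]),
        url=row[1],
        title=row[2],
        points=int(row[3]),
        submitter=row[4],
        submitted_at=datetime.strptime(row[5], TIME_FORMAT),
        comments=int(row[6]),
        rank=int(row[7]),
        recorded_at=datetime.strptime(row[8], TIME_FORMAT),
        hn_id=int(row[9]),
        source=row[10],
        color=row[11],
    )


def parse_lines(text_stream):
    """Yield a Snapshot for each CSV line in an open text stream."""
    for row in csv.reader(text_stream):
        yield parse_row(row)


# The sample line from the post, used here in place of the real file.
SAMPLE = (
    '"1","http://paulgraham.com/kate.html","What Kate saw in Silicon Valley",'
    '"129","albertcardona","2009-08-31 20:15:15","63","1",'
    '"2009-08-31 23:15:15","796573","HackerNews","c18577"\n'
)

snapshot = next(parse_lines(io.StringIO(SAMPLE)))
age = snapshot.recorded_at - snapshot.submitted_at
print(snapshot.title, snapshot.points, snapshot.rank, age)
```

To run it against the download, pass an open file to `parse_lines` instead of the `io.StringIO` wrapper; because it is a generator, the 169 MB file does not need to fit in memory at once.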

One final note: this database covers roughly 99% of the time period since the script started. For a while the script broke whenever an article didn’t contain a comment link, and every so often it goes down for miscellaneous reasons.

10 thoughts on “Six Months of HackerNews Front Page Data”

  3. For others who might be thinking about working with this dataset, notice that there are a lot of duplicates:


    $ perl -ne 'print if m/#1 Rule of Programming/' hntrends2009-2010.csv
    "153196","item?id=901710","#1 Rule of Programming Is..","5","cyman","2009-10-25 16:30:03","2","4","2009-10-25 16:45:03","901710","HackerNews","7fdc19"
    "153225","item?id=901710","#1 Rule of Programming Is..","6","cyman","2009-10-25 16:30:04","6","3","2009-10-25 17:00:04","901710","HackerNews","787b0d"
    "153255","item?id=901710","#1 Rule of Programming Is..","11","cyman","2009-10-25 16:30:03","18","3","2009-10-25 17:15:03","901710","HackerNews","351cae"
    "153285","item?id=901710","#1 Rule of Programming Is..","18","cyman","2009-10-25 16:30:02","24","3","2009-10-25 17:30:02","901710","HackerNews","105278"
    "153315","item?id=901710","#1 Rule of Programming Is..","23","cyman","2009-10-25 16:45:02","33","3","2009-10-25 17:45:02","901710","HackerNews","c5102b"
    "153345","item?id=901710","#1 Rule of Programming Is..","23","cyman","2009-10-25 17:00:04","35","3","2009-10-25 18:00:04","901710","HackerNews","327606"
    "153375","item?id=901710","#1 Rule of Programming Is..","27","cyman","2009-10-25 17:15:03","41","3","2009-10-25 18:15:03","901710","HackerNews","683dcd"
    "153405","item?id=901710","#1 Rule of Programming Is..","30","cyman","2009-10-25 16:30:10","45","3","2009-10-25 18:30:10","901710","HackerNews","6386a7"
    "153436","item?id=901710","#1 Rule of Programming Is..","35","cyman","2009-10-25 16:45:03","47","4","2009-10-25 18:45:03","901710","HackerNews","d3c601"
    "153466","item?id=901710","#1 Rule of Programming Is..","36","cyman","2009-10-25 17:00:06","52","4","2009-10-25 19:00:06","901710","HackerNews","79ae81"
    "153496","item?id=901710","#1 Rule of Programming Is..","37","cyman","2009-10-25 17:15:06","54","4","2009-10-25 19:15:06","901710","HackerNews","5e9b60"
    "153527","item?id=901710","#1 Rule of Programming Is..","39","cyman","2009-10-25 16:30:03","57","5","2009-10-25 19:30:03","901710","HackerNews","cf4e18"
    "153557","item?id=901710","#1 Rule of Programming Is..","40","cyman","2009-10-25 16:45:03","59","5","2009-10-25 19:45:03","901710","HackerNews","1d48be"

    Looking at the timestamps, we can see that the ranks were grabbed every fifteen minutes, so we can watch this story’s descent.
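Since each story appears once per fifteen-minute snapshot, an analysis that wants one row per story needs to collapse the repeats. The following is a rough Python sketch of one way to do that, keeping only the most recent record for each HackerNews ID. The function name and column constants are made up for illustration, and the sample rows are copied from the output above plus the example line from the post.

```python
import csv
import io

# 0-based column positions, following the field list in the post.
POINTS, RANK, RECORDED_AT, HN_ID = 3, 7, 8, 9


def latest_snapshot_per_story(rows):
    """Return a dict mapping HackerNews ID to its most recently recorded row."""
    latest = {}
    for row in rows:
        story_id = row[HN_ID]
        # Timestamps look like "YYYY-MM-DD HH:MM:SS", so comparing them
        # as strings orders them chronologically.
        if story_id not in latest or row[RECORDED_AT] > latest[story_id][RECORDED_AT]:
            latest[story_id] = row
    return latest


SAMPLE = "\n".join([
    '"1","http://paulgraham.com/kate.html","What Kate saw in Silicon Valley","129","albertcardona","2009-08-31 20:15:15","63","1","2009-08-31 23:15:15","796573","HackerNews","c18577"',
    '"153196","item?id=901710","#1 Rule of Programming Is..","5","cyman","2009-10-25 16:30:03","2","4","2009-10-25 16:45:03","901710","HackerNews","7fdc19"',
    '"153225","item?id=901710","#1 Rule of Programming Is..","6","cyman","2009-10-25 16:30:04","6","3","2009-10-25 17:00:04","901710","HackerNews","787b0d"',
    '"153557","item?id=901710","#1 Rule of Programming Is..","40","cyman","2009-10-25 16:45:03","59","5","2009-10-25 19:45:03","901710","HackerNews","1d48be"',
])

latest = latest_snapshot_per_story(csv.reader(io.StringIO(SAMPLE)))
for story_id, row in latest.items():
    print(story_id, row[POINTS], row[RANK])
```

Keeping the latest record is only one choice; replacing the timestamp comparison with a comparison on points or rank would instead keep each story's peak.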
