So, I’m sitting at my kitchen table the other night thinking about startup-type things when an idea pops into my head: create an index for Hacker News.
Now, this isn’t the first time this has occurred to me. A few weeks ago I emailed Paul Graham asking whether I could create a searchable database of Hacker News. He said he’d rather I didn’t, plus I found out later about searchyc.com, which does exactly that.
But an index… that would have a different purpose. You could do all sorts of interesting analysis on it… top posts, top contributors, posting frequency, etc. No, I wouldn’t save the content, just the relevant information for the submissions only (no comments): title, URL, points, number of comments, and date.
The software wasn’t hard to write. Items on the site are numbered sequentially from 1 to about 270K, and it’s easy to tell submissions from comments by searching the HTML. After about an hour of work and a little testing, I set my small VB program off to crawl the site.
This was Tuesday night. I went to sleep, eager to analyze the results the next day.
Wednesday morning I woke up and checked its status. 30% or something low like that. I couldn’t do any analysis then anyway, so off to work. I got home that evening and it was still chugging along. 55%. Getting there…
That night, around 9, I checked the status. 73.64%. Stupid slow connection. I came back half an hour later. 73.65%. Man, my connection is really terrible, I thought to myself. I loaded up Amazon to see if it would load. No problem. I restarted my computer, thinking it was some connection problem. When it rebooted, I checked YC again… it took about 20 seconds and finally loaded. Hmm. Then it hit me. Wait a minute. Oh no. No no no no. What if the indexing caused Hacker News to go down?
This is not good. Not good at all.
So I shut the program down and went to bed. The next morning, Thursday, I checked my email before heading out, half expecting to see some sort of complaint. Nothing. Phew. YC was still somewhat slow at that point, but was improving.
I checked Hacker News throughout the day at work. It seemed to be just about better. Sometime in the afternoon I checked Gmail. I had an email from Paul Graham titled “please stop”. It said:
Would you please not do that to the server again?
“Shit,” I said. My coworker shot a puzzled look at me. “Nothing,” I told him. “It’s a long story.”
I wrote a response, apologizing profusely. Unfortunately, I realized later that night that the response hadn’t gone through… only a blank email. So I rewrote the email and sent it off.
I’d like to take this opportunity again to say sorry to Paul and any other member of the Hacker News community who was affected by this. I didn’t think through what effect the indexing would have, and I would never have done it had I realized it would amount to an unintentional denial-of-service attack on my favorite news site. I don’t know how long it took to fix, and I apologize for any time YC lost correcting it.
If you’re considering doing something like this, you should rethink your plans. It’s not exactly the best way to make an impression.
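And if you go ahead with something like this anyway, the one thing my program was missing was any kind of brake. Here is a minimal sketch of what a gentler crawl loop might look like. It is in Python rather than the VB I actually used, and everything in it is an assumption for illustration: the name `polite_crawl`, the `fetch` callback (whatever downloads one item by its number), and the delay and threshold values, which are made-up defaults rather than anything the site publishes. The idea is simply to pause between requests and to back off when responses start getting slow.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def polite_crawl(
    item_ids: Iterable[int],
    fetch: Callable[[int], str],          # hypothetical: downloads one item page
    base_delay: float = 1.0,              # assumed pause between requests (seconds)
    slow_threshold: float = 5.0,          # assumed "the server is struggling" cutoff
    max_delay: float = 60.0,              # never wait longer than this between requests
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[Tuple[int, str]]:
    """Fetch items one at a time, slowing down when the server slows down."""
    delay = base_delay
    for item_id in item_ids:
        started = clock()
        page = fetch(item_id)
        elapsed = clock() - started

        if elapsed > slow_threshold:
            # Slow response: double the pause, up to max_delay.
            delay = min(delay * 2, max_delay)
        else:
            # Healthy response: ease back toward the base pause.
            delay = max(base_delay, delay / 2)

        yield item_id, page
        # Always wait before the next request, even when things look fine.
        sleep(delay)
```

You would pass in your own `fetch` function and loop over the results, for example `for item_id, page in polite_crawl(range(1, 1000), fetch=my_fetch): ...`, where `my_fetch` is whatever you write to download a page. Even a loop like this is no substitute for asking the site owner first and respecting the answer.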