The Wrong Way to Get Noticed by YC

So, I’m sitting at my kitchen table the other night thinking about startup type things when an idea pops into my head: Create an index for Hacker News.

Now, this isn’t the first time this occurred to me. A few weeks ago I emailed Paul Graham asking whether I could create a searchable database of Hacker News. He said he’d rather I didn’t, and I found out later about searchyc.com, which does exactly that.

But an index… that would have a different purpose. You could do all sorts of interesting analysis on it… top posts, top contributors, posting frequency, and so on. No, I wouldn’t save the content, just the relevant information for the submissions only (no comments): title, URL, points, number of comments, and date.

The software wasn’t hard to write. The submissions are sequentially numbered from 1 to about 270K and it’s easy to differentiate between submissions and comments by searching the HTML. After about an hour of work and a little testing, I set off my small VB program to crawl the site.

This was Tuesday night. I went to sleep, eager to analyze the results the next day.

Wednesday morning I woke up and checked its status. 30% or something low like that. I couldn’t do any analysis then anyway — so off to work. I got home that evening and it was still chugging along. 55%. Getting there…

That night, around 9, I checked the status. 73.64%. Stupid slow connection. I came back half an hour later. 73.65%. Man, my connection is really terrible, I thought to myself. I loaded up Amazon to see if it would load. No problem. I restarted my computer, thinking it was some connection problem. When it rebooted, I checked YC again … it took about 20 seconds and finally loaded. Hmm. Then it hit me. Wait a minute. Oh no. No no no no. What if the indexing caused Hacker News to go down?

This is not good. Not good at all.

So I shut the program down and went to bed. Next morning, Thursday morning, I checked my email before heading out, half expecting to see some sort of email. Nothing. Phew. YC was still somewhat slow at that point, but was improving.

I checked Hacker News throughout the day at work. It seemed to be just about better. Sometime in the afternoon I checked Gmail. I had an email from Paul Graham titled “please stop”. It said:

Would you please not do that to the server again?

“Shit,” I said. My coworker shot a puzzled look at me. “Nothing,” I told him. “It’s a long story.”

I wrote a response, apologizing profusely. Unfortunately, I realized later that night that the response didn’t go through… only a blank email. So I rewrote the email and sent it off.

I’d like to take this opportunity again to say sorry to Paul and any other member of the Hacker News community who was affected by this. I didn’t think through what effect the indexing would have, and I would never have done it if I had realized it would amount to an unintentional denial of service attack on my favorite news site. I don’t know how much time it took to fix, and I apologize for any time YC lost correcting it.

If you’re considering doing something like this, you should rethink your plans. It’s not exactly the best way to make an impression.

Ajax Lab: Draggable w/ Memory

Added position saving to yesterday’s Prototype example: you can see it here. When you move the box around it will save the position where you drop it, then when you reload the page it’ll start out in that position. Pretty nifty, eh?

I ran into some more syntax issues where I kept mixing up PHP and JavaScript syntax. That’s mostly an experience issue which should fade with time.

One important concern is that anyone can POST data to the page’s accompanying PHP file. I tested this by posting a position of 1000000px,1000000px and sure enough, when I loaded the page the green box jumped to that spot. Firefox was not too fond of this. To remedy the problem, I added a conditional to the JavaScript which adjusts negative and high values, but in retrospect I should check them in the PHP file before they are saved in the database.

I asked John whether there was a way to check that a POST was coming from my site and he said yeah, with $_SERVER['HTTP_REFERER']. At first this seemed like it would solve the problem, but then I realized that you can use Firebug to modify the source code of any page. That means someone can visit any page on this site, edit the HTML to POST to the PHP file, and the server would treat it the same as code I had written.

I think the best solution is to validate the data before inputting it into the database. That way if someone does try to set the position to 1000000px, 1000000px it won’t cause any problems.
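Here’s a rough sketch of what that validation could look like. It’s written in JavaScript to match the rest of the lab; the real check belongs in the PHP file, where the same logic would apply. The bounds (MAX_X, MAX_Y) and the function names are made up for illustration, since the actual page dimensions and code aren’t shown here.

```javascript
// Hypothetical bounds: placeholders, not the lab page's real dimensions.
const MAX_X = 2000;
const MAX_Y = 2000;

// Parse a value such as "150px" or 150 into an integer clamped to [0, max].
// Returns null when the input is not a number at all, so the caller can
// reject the request instead of saving garbage.
function sanitizeCoordinate(raw, max) {
  const n = parseInt(raw, 10);
  if (!Number.isFinite(n)) {
    return null;
  }
  return Math.min(Math.max(n, 0), max);
}

// Validate a whole position; returns null if either coordinate is unusable.
function sanitizePosition(rawX, rawY) {
  const x = sanitizeCoordinate(rawX, MAX_X);
  const y = sanitizeCoordinate(rawY, MAX_Y);
  if (x === null || y === null) {
    return null;
  }
  return { x, y };
}
```

With something like this in place, a POST of 1000000px, 1000000px would be pulled back to the edge of the allowed area, and a non-numeric value would be rejected before it ever reached the database.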

Two more helpful sites:

JavaScript – Converting Strings to Numbers

JavaScript – Conditional Statement Syntax

On an unrelated note, I added a “Recommended Books” section to the sidebar, which is something I wish other tech writers did more often.

Ajax Prototype Lab: Draggable Objects

Today’s Ajax Prototype experiment involves draggable objects: Check it out.

A few issues came up during its development.

I couldn’t find a list of observable events (i.e. “mousedown”, “mouseup”, “mousemove”, etc.). That took a little trial and error to figure out.

In Internet Explorer when I click in the box to drag it around sometimes the text is highlighted (like you’d see if you were highlighting text to copy it). Not sure how to stop that, other than turning the entire thing into an image. No problems with this on Firefox 3.

Firefox 3 has a feature where, when you click down on an object, you can drag a semi-transparent copy of it around the page. I’m not sure what the purpose of it is, though it looks cool, I guess. That was causing problems until I threw in a line to stop the event propagation: Event.stop(event);. In that respect, Internet Explorer came out on top.
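For anyone curious how the three mouse events fit together, here’s a minimal sketch of the drag logic with the DOM stripped out. This is not the lab’s actual code; the function and property names are hypothetical, and it only models the position math.

```javascript
// Drag state kept separate from the DOM so the logic is easy to follow.
// All names here are hypothetical; this is not the lab's actual code.
function createDragState(boxX, boxY) {
  return { x: boxX, y: boxY, dragging: false, offsetX: 0, offsetY: 0 };
}

// "mousedown": remember where inside the box the pointer grabbed it.
function onMouseDown(state, pointerX, pointerY) {
  return {
    ...state,
    dragging: true,
    offsetX: pointerX - state.x,
    offsetY: pointerY - state.y,
  };
}

// "mousemove": while dragging, keep the grab point under the pointer.
function onMouseMove(state, pointerX, pointerY) {
  if (!state.dragging) {
    return state;
  }
  return { ...state, x: pointerX - state.offsetX, y: pointerY - state.offsetY };
}

// "mouseup": drop the box where it is.
function onMouseUp(state) {
  return { ...state, dragging: false };
}
```

On the real page, each function would presumably be called from a handler registered with Prototype’s Event.observe, with the mousedown handler also calling Event.stop(event) to suppress the browser behaviors described above.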

Tomorrow I’ll add the ability to save the box’s last position so that when someone new loads the page it is in the same spot that the last person dropped it at.

Two sites that were helpful today:
JavaScript Kit – DOM Element properties
CSS Cursors

Ajax!

Today, the Ajax adventure continues: Arrows with a Counter! (it saves the state in a database)

The logic isn’t terribly complicated, but I stumbled over the syntax writing it up. PHP and JavaScript are syntactically similar, which makes it easy to mix them up, and debugging is harder still since JavaScript fails silently. I resorted to lots of alerts to help me narrow down the failure points. Exciting… I know.

…GMail, here I come.