The Kenya Quick Answer Goes Viral, Again

On Thursday evening Chris Ingraham, a journalist with 100k followers on Twitter, shared a screenshot of the now-famous “african country that starts with k” Google Quick Answer, which quickly went viral, garnering over 82k likes and 3 million views as of the time of this writing on Monday morning:

Preceden’s designer, Milan, saw it on his feed and shared it with me on Friday morning. I first saw it when I loaded Twitter on my phone at the gym.

I was like alright, here we go again. I’ll reply and explain what’s going on so anyone seeing the screenshot has some context:

And then my notifications started exploding with… shall we say… mixed reactions.

Some screenshots for posterity… 🤣

And the tweet that inspired the header image on this post:

I’m sitting there at the gym and just like… what is going on

BoingBoing wound up writing an article about the incident:

I’ll note that Emergent Mind is not abandoned, for what it’s worth. Just on a long pause 😀.

In retrospect, in my original response to Chris I should have given more context, and not just assumed people would click through and read the blog post. Or maybe I should not have responded at all, though I’m glad I was able to provide context, even if it resulted in some mean tweets against me, hah.

I also stand by my decision to keep that Emergent Mind page online. It’s a harmless and obviously incorrect answer that has now become the canonical example of how Google quick answers can get things wrong. Identifying and eliminating incorrect Quick Answers is no doubt hard to do at scale, but I hope this snippet and others like it contribute to Google addressing accuracy issues in time.

Something tells me this won’t be the last time this goes viral…

Dealing with Preceden’s Spam Problem

Image courtesy of DALLE-3

Around a year ago, I started noticing some spammy timelines being created on Preceden, my SaaS timeline maker tool.

I’m honestly surprised it took spammers so long: Preceden is a freemium product (meaning people can sign up and try it for free), the product makes it very easy to create link-filled user generated content, I had no automated spam prevention mechanisms in place, and I was too busy with other things to do anything about it.

Here’s a simple example of what a spammy timeline looks like:

Most of the spammy timelines were like this:

  • The spammer signed up for a free account
  • They created 1 timeline and made the timeline public
  • They added 1 event to the timeline and in the event’s notes, added backlinks to other sites
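For the curious, that pattern is simple enough to express as a check. This is only a rough sketch in Python, not Preceden’s actual code: the `Account` and `Timeline` classes, their fields, and `looks_like_spam` are all made up for illustration, and a match should mean "worth reviewing", not "definitely spam".

```python
import re
from dataclasses import dataclass, field

# Loose pattern for spotting a link in free-text notes (an assumption, not Preceden's rule).
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


@dataclass
class Timeline:
    """Hypothetical stand-in for a timeline: visibility plus one notes string per event."""
    is_public: bool
    event_notes: list[str] = field(default_factory=list)


@dataclass
class Account:
    """Hypothetical stand-in for a user account."""
    is_paying: bool
    timelines: list[Timeline] = field(default_factory=list)


def looks_like_spam(account: Account) -> bool:
    """Return True when an account matches the three-bullet pattern above."""
    # Free account with exactly one timeline.
    if account.is_paying or len(account.timelines) != 1:
        return False
    timeline = account.timelines[0]
    # That timeline is public and has exactly one event.
    if not timeline.is_public or len(timeline.event_notes) != 1:
        return False
    # The single event's notes contain at least one link.
    return bool(URL_PATTERN.search(timeline.event_notes[0]))
```

Plenty of legitimate users also start with one timeline and one event, so a check like this is better suited to building a review queue than to banning anyone outright.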

Along with the spammy Preceden timeline, the spammer usually also created pages on Facebook, Tumblr, and a host of other sites:

Best I can figure, the spammers were hoping some of the pages would rank well on search engines for key search terms, which would then drive traffic to the destination site (an online gambling site in this case) on behalf of some client (who I’m guessing paid someone to do this type of work). And the purpose of interlinking the pages was to try to make them more discoverable to search engines. I think most are overseas in Southeast Asia, and they have some checklist of sites they work through, which Preceden got added to at some point.

Backlink building is a possible purpose too, but all of the links in Preceden’s user generated content are set to nofollow, so they don’t pass any authority to the spammy site, and surely the spammers must have realized this. That makes the spray-and-pray theory of creating pages and hoping they would somehow drive traffic to the destination site the most likely explanation to me.
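If you haven’t run into nofollow before, it’s just a `rel` attribute on the link. Here’s a minimal sketch in Python of turning plain-text notes into HTML with nofollow links. The `linkify_nofollow` function and the URL pattern are my own illustration and assume the notes are stored as plain text; it is not how Preceden actually renders notes.

```python
import html
import re

# Deliberately simple URL matcher (an assumption); real linkifiers handle many more cases.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def linkify_nofollow(notes: str) -> str:
    """Escape user-supplied notes and wrap each URL in an anchor with rel="nofollow"."""
    parts = []
    last = 0
    for match in URL_PATTERN.finditer(notes):
        # Escape the text between links so user input can't inject markup.
        parts.append(html.escape(notes[last:match.start()]))
        url = html.escape(match.group(0), quote=True)
        parts.append(f'<a href="{url}" rel="nofollow">{url}</a>')
        last = match.end()
    parts.append(html.escape(notes[last:]))
    return "".join(parts)
```

With `rel="nofollow"` on every user-submitted link, search engines are told not to treat the link as an endorsement of the destination site.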

Dealing with it

At first, like I mentioned, I didn’t do anything. If I happened to stumble across a spammy timeline in the course of support work or data analysis or whatever, I’d delete the account and the timeline with it.

This obviously wasn’t scalable though, and the spammers must have realized I wasn’t doing anything to stop this, and so they flooded in, creating hundreds of spammy timelines.

Most were simple like the example above, but sometimes the spammers would wind up creating complex timelines with multiple events:

Eventually I built some initial tooling to look at event notes and through a combination of automated and manual work, ban spammy accounts.

Quick aside: banning users took some consideration. Should I delete the account and the timeline? What if I incorrectly identify something as spam and delete a legitimate user’s data? How do I prevent them from signing up for a new account? I wound up implementing it such that it made their timelines private and locked them out of their account until they emailed support, and also flagged their browser to prevent them from signing up again. There was one day where I banned 500 old accounts from prolific Vietnamese spammers. To my surprise, someone wrote into support asking what happened to their accounts. I asked what the purpose of the timelines was and never heard back 🤣.
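Roughly, the "soft ban" idea looks like the sketch below. It’s Python for illustration only: the `Account` and `Timeline` classes, `soft_ban`, `can_sign_up`, and the idea of a per-browser token (say, a long-lived cookie) are all assumptions on my part for this example, not a description of Preceden’s internals.

```python
from dataclasses import dataclass, field


@dataclass
class Timeline:
    """Hypothetical timeline record."""
    title: str
    is_public: bool = True


@dataclass
class Account:
    """Hypothetical account record."""
    email: str
    timelines: list[Timeline] = field(default_factory=list)
    is_locked: bool = False


def soft_ban(account: Account, browser_token: str, banned_tokens: set[str]) -> None:
    """Hide the account's content and lock it, without deleting any data."""
    for timeline in account.timelines:
        timeline.is_public = False  # Spam is no longer visible, but nothing is lost.
    account.is_locked = True  # Stays locked until support reviews it and unlocks.
    banned_tokens.add(browser_token)  # Remember the browser to block new signups.


def can_sign_up(browser_token: str, banned_tokens: set[str]) -> bool:
    """Refuse signups from a browser that was previously flagged."""
    return browser_token not in banned_tokens
```

The point of doing it this way is that a false positive is recoverable: if a legitimate user gets caught, unlocking the account restores everything. A browser flag is easy for a determined spammer to get around, so treat it as a speed bump rather than a wall.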

It’s a bit of an arms race though. I originally was only looking for outbound links in event notes, but I think some of the spammers realized that, so they started adding outbound links to the overall timeline notes as well as layer notes (layers are used to group similar events together on timelines), which my tool didn’t pick up.

This took me some time to realize (only this week tbh) and required some further adjustments to the spam tools to identify.
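The fix amounts to checking every notes field, not just the event ones. Here’s a hedged sketch in Python of what that could look like. The dictionary shape (`notes`, `layers`, `events`), the `outbound_domains` function, and the `own_domain` default are assumptions made for the example, not Preceden’s real data model.

```python
import re
from urllib.parse import urlparse

# Simple URL matcher (an assumption); stops at whitespace, quotes, and brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)


def outbound_domains(timeline: dict, own_domain: str = "preceden.com") -> set[str]:
    """Collect external hostnames linked from timeline, layer, and event notes."""
    # Gather every notes field, since links can hide in any of them.
    texts = [timeline.get("notes", "")]
    texts += [layer.get("notes", "") for layer in timeline.get("layers", [])]
    texts += [event.get("notes", "") for event in timeline.get("events", [])]

    domains = set()
    for text in texts:
        for url in URL_PATTERN.findall(text or ""):
            host = (urlparse(url).hostname or "").lower()
            # Skip links that point back at the site itself.
            is_own = host == own_domain or host.endswith("." + own_domain)
            if host and not is_own:
                domains.add(host)
    return domains
```

Returning the set of domains rather than a yes/no answer makes manual review easier: a quick scan of the domains on a timeline usually says a lot about whether it’s legitimate.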

To date, I’ve banned 1,248 accounts with 1,573 timelines. And I now have solid tooling set up to automatically ban suspicious accounts and give me ways to manually review outbound links periodically.

The impact of these timelines on Preceden isn’t clear, but there’s no way it’s helping things. Google has been rolling out lots of updates to fight low quality, spammy content, and maybe Preceden has gotten penalized for the presence of these spammy timelines.

Maybe that’s part of why Preceden’s rankings have been fluctuating so much in recent months:

Hard to know, but I’m glad it’s mostly cleaned up now.

One of the million things you gotta deal with running your own SaaS, or at least a freemium one with public user generated content 😅.