Reflecting on My First Year as a Full-Time Indie Founder

At the beginning of 2023 I went full time on Preceden, my SaaS timeline maker business, after 13 years of working on it on the side. A year has passed, so I wanted to share an update on how things are going and some lessons learned.

Preceden

[Screenshot: Preceden today]

My main focus in 2023 was building AI capabilities into Preceden to make it easier for users to create timelines. For some context: historically people would have to sign up for an account and then manually build their timeline, adding events to it one at a time. For some types of timelines where the events are unique and only known to the user (like a timeline about a legal case or a project plan), that’s still necessary. But for many other use cases (like historical timelines), Preceden can now generate comprehensive timelines for users in less than a minute, for free, directly from the homepage.

It took a good chunk of the year to get that tool to where it is today. In February I launched a tool for logged-in users to generate suggested events for their existing timelines, which laid the groundwork for the logged-out homepage timeline generator that launched in May. The v1 of that tool was slow and buggy, had design issues, and I still hadn’t figured out how to integrate it into Preceden’s pricing model, but a few more months of work ironed out most of those issues.

Since the launch of that tool in late May, people have used it to generate more than 80k timelines, and around a third of new users now sign up to edit an AI-generated timeline rather than create one from scratch. I’m quite happy with how it turned out, and it’s miles ahead of the competition.

Marketing-wise, I didn’t do enough (as usual), but I did spend a few weeks creating a directory of high-quality AI-generated timelines about historical topics, some of which are starting to rank well. I also threw a few thousand dollars at advertising on Reddit, though there weren’t enough conversions to justify keeping it up.

I also executed a pricing increase for about 400 legacy customers, the results of which I’ll see this year. More on that, and the controversy around it, in a future blog post.

Business-wise, Preceden makes money in two ways: premium SaaS plans and ads. In 2023, revenue from the SaaS side of the business grew 23% YoY and revenue from the ad side grew 33% YoY. The ad revenue is highly volatile, though, due to some swingy Google rankings, and will likely mostly disappear in 2024. Still, SaaS revenue is the main business, and I’ll take 23% YoY growth for a 14-year-old business, especially in a year when many SaaS companies struggled to grow.

Emergent Mind

Where to begin? :)

Shortly after ChatGPT launched in late 2022, I launched LearnGPT, a site for sharing ChatGPT examples. The site gained some traction and was even featured in a GPT tutorial on YouTube by Andrej Karpathy. But a hundred competitors quickly popped up, my interest in continuing to build a ChatGPT examples site waned, and I decided to shut it down. Then I got some interest from people wanting to buy it, so I put it up for sale, received a $7k offer, and turned it down. Instead, I rebranded the site to Emergent Mind and switched the focus to AI news. A few months into that iteration, I lost interest again (AI news competition is also fierce, and I didn’t think Emergent Mind was competitive, despite some people really liking it), so I tried selling it again. I didn’t get any offers high enough, so I decided to shut it down, but then decided to keep it, even though I didn’t know what I’d do with it.

And guess what: in November I had an idea for another iteration of the site, this time pivoting away from AI news and into a resource for staying informed about AI/ML research. I worked on that for a good chunk of November/December, and am currently mostly focused on it 😅.

I’m cautiously optimistic about this direction though: the handful of people that I’ve shared it with have been very enthusiastic about it and provided lots of great feedback that I’ve been working through.

Unlike my previous product launches, I’m saving an HN/Reddit/X launch announcement for later, after I’ve gotten the product into really good shape. There are still lots of issues and areas for improvement, and I now believe it’s better to soft launch and iterate quietly based on 1:1 feedback before drawing too much attention to an unpolished product. Hat tip to Hiten Shah for influencing how I think about MVPs.

I’ll add too that this “surfacing trending AI/ML research” direction is the first step in a larger vision I have for the site. I think it could evolve into something really neat – maybe even a business – though time will tell.

2024

Preceden is in a good/interesting spot: it’s a fairly feature-complete product that requires very little support and maintenance. I don’t have any employees, and I could go months without working on it and it would likely still grow and continue to work fine.

When I look ahead, the most popular feature requests seem like they won’t be heavily used and would wind up bloating the product and codebase. That doesn’t mean there’s no room for improvement – there always is – just that I’m not sure it makes sense anymore for me to be so heads-down in VS Code working on it. It’s the first time, maybe ever, that I’ve thought that. I’d probably see more business impact by spending my time on marketing, but that’s not what I want to spend a lot of my time doing, and I can’t afford the kind of talent I’d need to market it effectively (marketing a horizontal B2C SaaS isn’t fun).

So, my current thinking is that I’ll keep improving and lightly marketing Preceden, but with less intensity than I have in years past. Instead, I’ll devote more of my time to building other products: Emergent Mind and maybe others in the future. Maybe one of those will turn into a second income stream but maybe not. I enjoy the 0 to 1 aspect of creating new products, and the income from Preceden supports me in pursuing that for now. And if Preceden starts declining, I can always start focusing on it again, or go back to contracting or a full time position somewhere, which isn’t a bad outcome either.

Also, one thing I regret about 2023 is not spending more time wandering. It’s easy for me to get super focused on a project and not leave any time in my day for exploring what else is out there. Only toward the end of the year did I start experimenting with new AI tech like Mixtral. Going forward, I want to spend some time each week learning about, experimenting with, and blogging about new AI tech. I’m still very much in the “AI will change the world in the coming years” camp, and I have the freedom and interest to spend some of my time learning and tinkering, so I’m going to try to do that.

As always, I welcome any feedback on how I’m thinking about things.

Happy new year everyone and thanks for reading 👋.

Running Mistral 7B Instruct on a MacBook

Similar to yesterday’s post on running Mistral’s 8x7B Mixture of Experts (MoE) model, I wanted to document the steps I took to run Mistral’s 7B-Instruct-v0.2 model on a Mac for anyone else interested in playing around with it.

Unlike yesterday’s post, though, this 7B Instruct model’s inference speed is about 20 tokens/second on my M2 MacBook with its 24GB of RAM, making it a lot more practical to play around with than the 10-tokens/hour MoE model.

These instructions are once again inspired by Einar Vollset’s post where he shared his steps, though updated to account for a few changes in recent days.

Update Dec 19: A far easier way to run this model is to use Ollama. Simply install it on your Mac, open it, then run ollama run mistral from the command line. However, if you want to go the more complex route, here are the steps:
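For reference, the whole Ollama route looks roughly like this (this assumes you’ve installed the Ollama app from ollama.ai; the second prompt is just an example):

# Pulls the default Mistral 7B model on first run, then opens an interactive chat
ollama run mistral

# Or pass a prompt directly and print the completion
ollama run mistral "Explain the difference between Mistral 7B and Mixtral 8x7B in one paragraph"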

1) Download HuggingFace’s model downloader

bash <(curl -sSL https://g.bodaay.io/hfd) -h

2) Download the Mistral 7B Instruct model

./hfdownloader -m mistralai/Mistral-7B-Instruct-v0.2

I ran both of the commands above in my ~/code directory, and the downloader placed the model in ~/code/Storage/mistralai_Mistral-7B-Instruct-v0.2.

3) Clone llama.cpp and install the necessary packages

Using the GitHub CLI:

gh repo clone ggerganov/llama.cpp

And after you have it cloned, move into the repo and install the necessary Python packages:

cd llama.cpp
python3 -m pip install -r requirements.txt
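
Note that the quantize and main binaries used in steps 6 and 7 (and the server binary used at the end) need to be compiled before you can run them. If you haven’t built llama.cpp before, a plain Makefile build from the repo root should take care of it:

# Build llama.cpp's binaries (main, quantize, server, etc.)
make -j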

4) Move the 7B model folder into llama.cpp/models
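
Assuming the download location from step 2 and that llama.cpp was also cloned under ~/code, this is a single move; adjust the paths if your layout differs:

mv ~/code/Storage/mistralai_Mistral-7B-Instruct-v0.2 ~/code/llama.cpp/models/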

5) Convert to F16

python3 convert.py models/mistralai_Mistral-7B-Instruct-v0.2 --outfile models/mistralai_Mistral-7B-Instruct-v0.2/ggml-model-f16.gguf --outtype f16

6) Quantize it

./quantize models/mistralai_Mistral-7B-Instruct-v0.2/ggml-model-f16.gguf models/mistralai_Mistral-7B-Instruct-v0.2/ggml-model-q4_0.gguf q4_0

7) Run it

./main -m ./models/mistralai_Mistral-7B-Instruct-v0.2/ggml-model-q4_0.gguf -p "I believe the meaning of life is" -ngl 999 -s 1 -n 128 -t 8
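
For anyone unfamiliar with those flags, here’s a rough translation (these are the standard llama.cpp main options; run ./main --help to confirm for your build):

# -m    path to the quantized model file
# -p    the prompt to complete
# -ngl  number of layers to offload to the GPU (999 effectively offloads everything to Metal)
# -s    random seed
# -n    number of tokens to generate
# -t    number of CPU threads to use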

Alternatively, run the built-in web server:

make -j && ./server -m models/mistralai_Mistral-7B-Instruct-v0.2/ggml-model-q4_0.gguf -c 4096
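
Once the server is running, it serves a simple chat UI in your browser (port 8080 by default at the time of writing) and also exposes a /completion endpoint. A quick sanity check, assuming those defaults still hold for your build:

curl --request POST \
  --url http://localhost:8080/completion \
  --header "Content-Type: application/json" \
  --data '{"prompt": "I believe the meaning of life is", "n_predict": 64}'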

Unless you have a very powerful MacBook, definitely experiment with this model instead of the MoE model 🤣.

Running Mistral 8x7B Mixture of Experts on a MacBook

Below are the steps I used to get Mistral’s 8x7B Mixture of Experts (MoE) model running locally on my MacBook (with its Apple M2 chip and 24 GB of memory). Here’s a great overview of the model for anyone interested in learning more. Short version:

The Mistral “Mixtral” 8x7B 32k model, developed by Mistral AI, is a Mixture of Experts (MoE) model designed to enhance machine understanding and generation of text. Similar to GPT-4, Mixtral-8x7b uses a Mixture of Experts (MoE) architecture with 8 experts, each having 7 billion parameters.

Mixtral 8x7b has a total of 56 billion parameters, supports a 32k context window, and displaces both Meta Llama 2 and OpenAI GPT-3.5 in 4 out of 7 leading LLM benchmarks.

These steps were inspired by those shared by Einar Vollset on X, but are specific to the 8x7B MoE model rather than the Mistral-7B-Instruct-v0.2 model, and take into account recent changes to the llama.cpp repo to support this model.

Note that Einar’s 16GB MacBook generated 10 tokens/second with the Instruct model, but my 24GB MacBook absolutely crawled running this MoE model, generating more like 10 tokens/hour and becoming unusable in the process. Here’s my command-line output if anyone can help me figure out why it’s so slow, though it’s likely that the model is just too much for this hardware. Unless you have a very powerful MacBook, I’d recommend running the Mistral 7B Instruct model instead of this 8x7B MoE model.

1) Clone llama.cpp and install the necessary packages

gh repo clone ggerganov/llama.cpp

This uses the GitHub CLI, though it isn’t completely necessary.
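
If you don’t have the GitHub CLI installed, a plain git clone works just as well:

git clone https://github.com/ggerganov/llama.cpp.git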

After you have it cloned, move into the repo and install the necessary Python packages:

cd llama.cpp
python3 -m pip install -r requirements.txt

2) Download the model torrent

I use µTorrent, though any other torrent app will do.

Here’s a direct link to the torrent.

3) Move the model directory into llama.cpp/models
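
For example, if the torrent landed in ~/Downloads and you’re working from inside the llama.cpp directory (adjust the paths to match your setup):

mv ~/Downloads/mixtral-8x7b-32kseqlen ./models/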

4) Convert the model to F16

python3 convert.py ./models/mixtral-8x7b-32kseqlen --outfile ./models/mixtral-8x7b-32kseqlen/ggml-model-f16.gguf --outtype f16

This converts the model to a 16-bit floating-point representation to reduce the model’s size and computational requirements.

5) Quantize it

./quantize ./models/mixtral-8x7b-32kseqlen/ggml-model-f16.gguf ./models/mixtral-8x7b-32kseqlen/ggml-model-q4_0.gguf q4_0

Quantizing the model reduces the precision of the numbers used in the model, which can lead to smaller model sizes and faster inference times at the cost of some accuracy. The “q4_0” means that the model is being quantized to a 4-bit representation for each weight.

6) Use it

Either via the command line:

./main -m ./models/mixtral-8x7b-32kseqlen/ggml-model-q4_0.gguf -p "I believe the meaning of life is" -ngl 999 -s 1 -n 128 -t 8

Or the built-in web app:

make -j && ./server -m models/mixtral-8x7b-32kseqlen/ggml-model-q4_0.gguf -c 4096

Enjoy!

Emergent Mind in The Atlantic

One of the many people who saw the incorrect Google Quick Answer about Kenya was an editor at The Atlantic, who asked Caroline Mimbs Nyce, one of their reporters, to look into it. Caroline interviewed me for the article they just published, which focuses on the challenges Google is facing in the age of AI-generated content.

From the article:

Given how nonsensical this response is, you might not be surprised to hear that the snippet was originally written by ChatGPT. But you may be surprised by how it became a featured answer on the internet’s preeminent knowledge base. The search engine is pulling this blurb from a user post on Hacker News, an online message board about technology, which is itself quoting from a website called Emergent Mind, which exists to teach people about AI—including its flaws. At some point, Google’s crawlers scraped the text, and now its algorithm automatically presents the chatbot’s nonsense answer as fact, with a link to the Hacker News discussion. The Kenya error, however unlikely a user is to stumble upon it, isn’t a one-off: I first came across the response in a viral tweet from the journalist Christopher Ingraham last month, and it was reported by Futurism as far back as August. (When Ingraham and Futurism saw it, Google was citing that initial Emergent Mind post, rather than Hacker News.)

One thing I learned from the article is why Google hasn’t removed the Kenya quick answer even though it’s obviously incorrect and has existed since at least August. It doesn’t violate their Terms of Service, and they’re more focused on addressing the larger accuracy issue than on dealing with one-off instances of incorrect answers:

The Kenya result still pops up on Google, despite viral posts about it. This is a strategic choice, not an error. If a snippet violates Google policy (for example, if it includes hate speech) the company manually intervenes and suppresses it, Nayak said. However, if the snippet is untrue but doesn’t violate any policy or cause harm, the company will not intervene. Instead, Nayak said the team focuses on the bigger underlying problem, and whether its algorithm can be trained to address it.

The Atlantic article was published before I was alerted earlier this week by Full Fact, a UK fact-checking organization, to a more egregious case: Google misinterpreted a creative writing example on Emergent Mind about the health benefits of eating glass and was showing it as a Quick Answer.

You can read Full Fact’s article about this glass-eating snippet here: Google snippets falsely claimed eating glass has health benefits. As I noted on X, I quickly removed the page from Emergent Mind on the off chance that someone would misinterpret it as health advice.

Something tells me this won’t be the last we’ll hear about Google misinterpreting ChatGPT examples on Emergent Mind. Until then…