Generating High Quality Available .com Domain Names for a Specific Industry

In my last post I detailed how to extract all of the available .com domain names from the .com zone file. In this post I’m going to show you how to do something very useful with the result: finding a great available domain name for a business in a specific industry.

For example, we’re going to find great business names that can fill in the blanks for the industry of your choosing:

  • ____________Marketing.com
  • ____________Consulting.com
  • ____________SEO.com
  • ____________Data.com
  • ____________Media.com
  • ____________Systems.com
  • ____________Law.com

The big idea: Check for keywords that are registered for other industries, but not registered for yours

Consider this: what if we looked at all of the registered domains that end with advertising.com, figured out the keyword in each, and then checked whether the corresponding marketing.com domain is available? For example, imagine we see that the domain HightowerAdvertising.com is registered (we’ll refer to Hightower as the keyword here). We can then check whether HightowerMarketing.com is registered. Because someone already registered the keyword for the advertising industry, there’s a good chance that the keyword is meaningful and worth checking for the marketing industry as well.

We can take this a step further by checking for common keywords in multiple industries. For example, we check all the domains that end in advertising.com, all that end in media.com, see which keywords they have in common, then check which of those are not registered for marketing.com domains.

The fewer industries we check for common keywords, the more results we’ll have, but the lower the quality. The more industries we check, the fewer the results, but the higher the quality.

Getting your command line on

If you went through my last post, you should have wound up with a domains.txt file that has about 108M registered .com domain names:

$ wc -l domains.txt 
 108894538 domains.txt

With a little bit of command line magic, we can extract all of the domains that end in ADVERTISING (like HIGHTOWERADVERTISING), then remove the trailing ADVERTISING word to get just HIGHTOWER, then sort those results and save them to a list:

$ LC_ALL=C grep ADVERTISING$ domains.txt | sed 's/.\{11\}$//' | sort -u > tmp/advertising.txt

This writes a sorted list of unique keywords, one per line, to tmp/advertising.txt.

Then we do the same for MARKETING domains:

$ LC_ALL=C grep MARKETING$ domains.txt | sed 's/.\{9\}$//' | sort -u > tmp/marketing.txt
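The two commands above differ only in the industry word, so they can be wrapped in one helper. What follows is a minimal sketch, not part of the original workflow: the function name keywords_for and the sample domains are made up for illustration. It strips the suffix by name rather than by character count, requires at least one character before the suffix so the bare industry word does not become an empty keyword, and sorts in the C locale, so any later comm call on its output should run under LC_ALL=C as well.

```shell
# keywords_for SUFFIX FILE
# Print the sorted, unique keywords of every line in FILE that ends with
# SUFFIX, with SUFFIX itself removed. (Hypothetical helper, not from the post.)
keywords_for() {
  suffix=$1
  file=$2
  # The leading "." demands at least one character before the suffix.
  LC_ALL=C grep ".${suffix}\$" "$file" |
    LC_ALL=C sed "s/${suffix}\$//" |
    LC_ALL=C sort -u
}

# Made-up sample standing in for domains.txt.
sample=$(mktemp)
printf '%s\n' HIGHTOWERADVERTISING BLUEHERONADVERTISING HIGHTOWERMARKETING ADVERTISING > "$sample"
keywords_for ADVERTISING "$sample"   # prints BLUEHERON and HIGHTOWER, one per line
rm -f "$sample"
```

Against the real list, this would be called as keywords_for ADVERTISING domains.txt > tmp/advertising.txt.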

And finally, we figure out which domains are in the advertising list but not in the marketing list:

$ comm -23 tmp/advertising.txt tmp/marketing.txt > results/marketing.txt
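For anyone who hasn’t used comm before: it compares two sorted files and prints three columns (lines only in the first file, lines only in the second, and lines in both). The flags name the columns to hide, so -23 leaves only the lines unique to the first file. Here is a small sketch with made-up sample lists; the helper name only_in_first and the keyword ZENITH are illustrative and not from the real data.

```shell
# only_in_first A B
# Print the lines that appear in sorted file A but not in sorted file B.
# Both files are assumed to be sorted in the C locale.
only_in_first() {
  # -2 hides "only in B", -3 hides "in both", leaving "only in A".
  LC_ALL=C comm -23 "$1" "$2"
}

dir=$(mktemp -d)
printf '%s\n' ADSPACE BLUEHERON HIGHTOWER > "$dir/advertising.txt"
printf '%s\n' HIGHTOWER ZENITH > "$dir/marketing.txt"
only_in_first "$dir/advertising.txt" "$dir/marketing.txt"   # prints ADSPACE and BLUEHERON
rm -rf "$dir"
```

Whichever locale you pick, use the same one for sort and comm; comm expects its inputs in the order it would sort them itself.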

If we want to find common keywords registered in multiple industries, we need to add an extra step to generate that list of common keywords before figuring out which ones are available in ours:

$ comm -12 tmp/advertising.txt tmp/media.txt | comm -12 - tmp/design.txt | sort -u > tmp/common.txt
$ comm -23 tmp/common.txt tmp/marketing.txt > results/marketing.txt
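Chaining comm -12 by hand gets unwieldy as you add industries. One possible way to generalize it is sketched below, under the assumption that every list is already sorted in the C locale. The function name common_keywords and the sample data are hypothetical.

```shell
# common_keywords FILE...
# Print the lines that are present in every one of the sorted FILEs.
common_keywords() {
  acc=$(mktemp)
  next=$(mktemp)
  cat "$1" > "$acc"          # start with the first list
  shift
  for f in "$@"; do
    # -12 hides the "only in" columns, keeping lines found in both files.
    LC_ALL=C comm -12 "$acc" "$f" > "$next"
    mv "$next" "$acc"
  done
  cat "$acc"
  rm -f "$acc" "$next"
}

dir=$(mktemp -d)
printf '%s\n' ADSPACE BLUEHERON HIGHTOWER > "$dir/advertising.txt"
printf '%s\n' ADSPACE HIGHTOWER NORTHSTAR > "$dir/media.txt"
printf '%s\n' ADSPACE NORTHSTAR > "$dir/design.txt"
common_keywords "$dir/advertising.txt" "$dir/media.txt" "$dir/design.txt"   # prints ADSPACE
rm -rf "$dir"
```

Each extra file can only shrink the result, which is the quality versus quantity trade-off described earlier.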

The resulting marketing.txt list will have the common keywords in the other industries that are likely not registered in yours.

The way to interpret this is that for a keyword like Adspace, those domains are registered in the other industries (AdspaceAdvertising.com, AdspaceMedia.com), but not registered for ours (AdspaceMarketing.com). Again, the more similar industries you check for common keywords, the higher the quality of results you’ll have. We could add three or four more industries to get a short, very high quality list.

By the way, the reason I say likely not registered is that once a domain loses its name servers (for example, if it’s way past its expiration date) it drops out of the zone file even though the name isn’t available to register yet. Therefore some of the results might actually be registered, but a quick WHOIS check will confirm whether a given one is:

$ whois blueheronmarketing.com

No match for domain "BLUEHERONMARKETING.COM".
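If you have more than a handful of candidates, the check can be looped. This is only a rough sketch: it assumes a whois command is installed, it treats the "No match for" line shown above as the signal that a .com name is unregistered (other WHOIS servers may word this differently), and the function names looks_available and check_keywords are made up. The one-second pause is a guess at being polite, since WHOIS servers may rate-limit repeated lookups.

```shell
# looks_available
# Read WHOIS output on stdin; succeed if it contains the "No match for" line.
looks_available() {
  grep -q 'No match for'
}

# check_keywords INDUSTRY
# Read keywords from stdin (one per line) and print KEYWORD+INDUSTRY.com
# for each one that WHOIS reports as unregistered.
check_keywords() {
  industry=$1
  while IFS= read -r keyword; do
    domain=$(printf '%s%s.com' "$keyword" "$industry" | tr 'A-Z' 'a-z')
    # </dev/null keeps whois from consuming the keyword list on stdin.
    if whois "$domain" < /dev/null | looks_available; then
      printf '%s\n' "$domain"
    fi
    sleep 1   # assumed courtesy delay between lookups
  done
}
```

With the results file from earlier, usage would look like check_keywords marketing < results/marketing.txt.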

Or you could just use this Ruby script

Because it’s a pain to run all of these commands while searching for available domains in an industry, I put together this small Ruby script to help:

https://github.com/mattm/industry-domain-name-generator

There are instructions in the README explaining how to set the industry and similar industries in the script. If all goes well, it will run all of the necessary commands to generate the list of results:

$ ruby generator.rb 
Finding available domains for marketing...
Generating industry name lists...
Searching for domains that end with 'advertising'...
  LC_ALL=C grep ADVERTISING$ domains.txt | sed 's/.\{11\}$//' | sort -u > tmp/advertising.txt
Searching for domains that end with 'media'...
  LC_ALL=C grep MEDIA$ domains.txt | sed 's/.\{5\}$//' | sort -u > tmp/media.txt
Searching for domains that end with 'design'...
  LC_ALL=C grep DESIGN$ domains.txt | sed 's/.\{6\}$//' | sort -u > tmp/design.txt
Searching for domains that end with 'marketing'...
  LC_ALL=C grep MARKETING$ domains.txt | sed 's/.\{9\}$//' | sort -u > tmp/marketing.txt
Finding common names in industries...
  comm -12 tmp/advertising.txt tmp/media.txt | comm -12 - tmp/design.txt | sort -u > tmp/common.txt
Finding names not registered for marketing...
  comm -23 tmp/common.txt tmp/marketing.txt > results/marketing.txt
Done, results available in results/marketing.txt

And with a little luck, you’ll find a great domain in the list to use for your new business.

Extracting a List of All Registered .com Domains from the Verisign Zone File

Back in the day when I worked on Lean Domain Search I got a lot of experience working with Verisign’s .com zone file because that’s what Lean Domain Search uses behind the scenes to check whether a given domain is available to register or not.

I still get a lot of emails asking for details about how it worked so over a series of posts, I’m going to walk through how to work with the zone file and eventually explain exactly how Lean Domain Search works.

What’s a zone file?

A zone file lists all registered domains for a given Top Level Domain (like .com, .net, etc) and the name servers associated with the domain. For example, because this blog is hosted on WordPress.com, the zone file lists the WordPress.com name servers for it:

MATTMAZUR NS NS1.WORDPRESS
MATTMAZUR NS NS2.WORDPRESS
MATTMAZUR NS NS3.WORDPRESS

How do I get access to the zone file?

Anyone can fill out a form, apply, and get access. There are details on this page. I detailed in this old post on Lean Domain Search how I filled out the form, though it has changed since then so you’ll need to make some adjustments.

What happens after I apply for access?

Verisign will provide you with the details to log in to the FTP server and download the zone file:

Screen Shot 2018-05-18 at 1.07.14 PM.png

The zone file is the 2.91 GB com.zone.gz, which currently unzips to 11.47 GB.

What’s in the zone file?

It begins with some administrative details, then lists domains and their associated name servers. Note that registered domains without a name server (such as ones that are close to expiring) are not included in this list.

How can I extract a list of just the domains?

Glad you asked! It takes a little bit of command line fu.

If you’d like to follow along, here are the first 1,000 lines of the zone file. You can download this and use the terminal commands below just like you would if you were working with the entire 317,338,073 line zone file.

1) First, we’ll grab a list of just the domains:

$ awk '{print $1}' com.zone > domains-only.txt

For a line like this:

KITCHENEROKTOBERFEST NS NS1.UNIREGISTRYMARKET.LINK.

This command will return just KITCHENEROKTOBERFEST.

This will also return some non-domains from the administrative section at the top of the zone file, but we’ll filter those out later.

Here’s what domains-only.txt should look like.

2) Next, we’ll sort the results and remove duplicates:

$ sort -u domains-only.txt --output domains-unique.txt

This is necessary because most domains will have multiple name servers, but we don’t want the domain to appear multiple times in our final list of domains.

Here’s what domains-unique.txt should look like.

3) Last but not least, we’ll ensure the results include only domains:

$ LC_ALL=C grep '^[A-Z0-9\-]*$' domains-unique.txt > domains.txt

There are a few things to note here.

First, make sure to use GNU grep, which is not the default on Macs. GNU grep is fast.

Setting LC_ALL=C forces grep to use the C locale, which tells grep to treat this as an ASCII file, not a UTF-8 file. More details here. While not important for this 1,000-line file, it significantly reduces how much time grep takes on the full 300M+ line zone file.

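The ^[A-Z0-9\-]*$ regular expression here looks for lines that are made up entirely of letters, numbers, and dashes. The reason we use a * (0 or more characters) vs + (1 or more characters) is that grep’s default basic regular expression syntax treats an unescaped + as a literal plus sign. With grep -E (extended syntax) you could use + instead, which would also skip empty lines.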

Technically this regex will match strings that are longer than domains can actually be (the max is 63 characters) as well as strings that start or end with a dash (which isn’t valid for a domain), but there aren’t any of those in the zone file so it’s not a big deal and grep will run faster this way. If you really wanted to get fancy, you could match proper domains, but it will take longer to run and, because it uses extended syntax, it needs grep -E: ^[A-Z0-9]([A-Z0-9\-]{0,61}[A-Z0-9])?$
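To see what the stricter pattern accepts and rejects, here is a small sketch. It assumes a grep that supports -E (extended regular expressions), which the {0,61} and (...)? syntax needs. The function name strict_labels and the test strings other than MATTMAZUR are made up.

```shell
# strict_labels
# Keep only stdin lines that are 1 to 63 characters of A-Z, 0-9 and dashes,
# and that neither start nor end with a dash.
strict_labels() {
  # The dash sits last in the bracket expression so it is taken literally.
  LC_ALL=C grep -E '^[A-Z0-9]([A-Z0-9-]{0,61}[A-Z0-9])?$'
}

printf '%s\n' MATTMAZUR A-B -BAD BAD- | strict_labels   # prints MATTMAZUR and A-B
```

The first and last character classes leave out the dash, and the middle group allows at most 61 characters, which is where the 63-character ceiling comes from.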

Here’s what domains.txt should look like.

Note that this does include some domain-like strings from the administrative section, like 1526140941, which aren’t actually domains. Depending on what you’re using the zone file for you could remove these lines, but it’s never been a big deal for my use case. Because Lean Domain Search is limited to letters-only domains, it actually just uses ^[A-Z]* for the regex.
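Putting steps 1 through 3 together, the whole extraction can be sketched as a single pipeline. This is an illustration rather than the code Lean Domain Search runs: the function name extract_domains is made up, the filter runs before the sort so that sort has fewer lines to process, and grep -E with + is used so that empty lines are dropped too. The first two sample lines are invented stand-ins for the administrative section.

```shell
# extract_domains ZONEFILE
# Print the sorted, unique first fields of ZONEFILE that consist only of
# A-Z, 0-9 and dashes.
extract_domains() {
  awk '{print $1}' "$1" |
    LC_ALL=C grep -E '^[A-Z0-9-]+$' |
    LC_ALL=C sort -u
}

zone=$(mktemp)
printf '%s\n' \
  '$TTL 900' \
  '; sample comment line' \
  'MATTMAZUR NS NS1.WORDPRESS' \
  'MATTMAZUR NS NS2.WORDPRESS' \
  'KITCHENEROKTOBERFEST NS NS1.UNIREGISTRYMARKET.LINK.' > "$zone"
extract_domains "$zone"   # prints KITCHENEROKTOBERFEST and MATTMAZUR
rm -f "$zone"
```

On the real file the call would be extract_domains com.zone > domains.txt, skipping the two intermediate files.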

Here’s some actual code from Lean Domain Search with these steps above:

Screen Shot 2018-05-18 at 1.43.18 PM.png

If you run into any trouble or have suggestions on how to improve any of these commands, don’t hesitate to reach out. Cheers!

Building a Startup in 45 Minutes per day While Deployed to Iraq

You may one day find yourself in a position where you’re eager to work on a startup but limited by the amount of time you can put into it due to a day job, family or other obligations. In this post I would like to share with you all the story behind Lean Domain Search, a domain name generator that I built in about 45 minutes per day during a 5-month deployment to the Middle East. If you’re struggling to find time to put into your startup, I hope this convinces you that you can accomplish a lot over time by putting a small amount of work into it each day.

Background

In the summer of 2011 I was a 26-year-old freshly pinned-on captain in the Air Force serving as a project manager at Hanscom Air Force Base in Massachusetts. I was 4 years into my 5-year service Academy commitment which meant that I had to serve one more year to pay back the Air Force for my education and training.

At the time I also had two moderately successful side projects that I had built on nights and weekends in the years prior: Preceden, a web based timeline maker, and Lean Designs, a drag and drop web design tool.

Everything was going smoothly until my Unit Deployment Manager called me into his office one day and informed me that I had been selected to go on a six-month deployment in August.

This presented quite a predicament. As a solo founder, I didn’t have anyone I could turn my two projects over to maintain while I was away. I also had no idea what the internet situation would be like wherever I was headed, but more importantly I didn’t want to be distracted by these projects while I was out there.

I was contacted by the officer whose position I was going to take over when I arrived. He filled me in on some of the details and I eventually learned that there was limited internet access where I was going to live, but it was slow, had a firewall, and I’d probably be moving bases several weeks after I arrived anyway. I asked him if he could check to see if he had access to sites like Heroku (where my sites were hosted) and Github and he confirmed he did, but that still didn’t guarantee I’d have access to make changes to my sites, time to work on them, or even internet access for the entire deployment.

I decided to keep the sites running, but to stop working on them several weeks prior to the deployment. That would provide time for any bugs to surface which would allow me to head out on the deployment knowing that the sites were in good shape. I also decided not to work on them at all during the deployment so that they wouldn’t distract me from my job.

A small deployment side project

During the pre-deployment training one of our instructors suggested we pick up a hobby or something else to work on during downtime. For example, some officers use downtime during their deployments to take online classes towards a master’s degree. I wasn’t interested in that, but decided that I would try to work on a small software project when I had time.

Back in 2009 I had another domain search tool called Domain Pigeon. I was just getting started with web development so I couldn’t figure out at the time how to do what I really wanted to do which was to allow users to enter a keyword and pair the keyword with lots of other terms to generate and quickly check the availability of quality domains. Instead, I built Domain Pigeon, a service that simply listed interesting available .com domains:

Domain Pigeon, Lean Domain Search’s predecessor, in November 2010

I eventually shut Domain Pigeon down to focus on other projects, but the original idea stuck in the back of my head. By the time my deployment came around, I had a pretty good idea of how to implement it so I decided that would be what I would work on.

My daily schedule

I wound up getting assigned to lead a team that oversaw communications (network, radio, satellite, etc) for the aviation unit that supported special operations forces in Iraq.

We worked 12-hour days every day for the entire deployment including weekends. I need roughly 8-9 hours of sleep to function at full capacity which left me with about 3-4 hours at the end of each day (typically around 6am) to have a meal, exercise, shower, chat with my wife, hang out with my coworkers, unwind and maybe work on my side project. In practice, that usually was about 45 minutes per day. Sometimes more, but often not at all.

Fortunately, there were never any major issues with my other projects during the deployment. A few small bugs surfaced, but nothing that impacted many users. I still had access to my email so I could respond to support requests when I had time. And because I was working on the new domain name generator locally on my laptop, I could work on it without worrying that there would be issues in production.

Piggy-backing on the popular lean startup movement as well as the name for my existing Lean Designs tool, I decided to call the new domain name generator Lean Domain Search.

Due to the drawdown of US forces in Iraq at the end of 2011, I wound up coming home after five months instead of six, in January 2012 instead of February 2012 as originally planned. I had two weeks of R&R after I got back. I spent the first on vacation in Maine with my wife, and during the second I launched the first version of Lean Domain Search.

Lean Domain Search when it launched in January 2012

I continued working as a project manager at Hanscom Air Force Base until my commitment ended in September 2012. My wife and I then moved back to Florida to be closer to family and I decided to work on Lean Domain Search full time.

Acquired

Remember I mentioned Domain Pigeon, my original domain name generator? When I launched it in early 2009, Matt Mullenweg, co-founder of WordPress and now CEO of Automattic, saw its launch on HackerNews and shot me an email saying he thought Domain Pigeon was neat and that if it didn’t become a full time job there were a lot of opportunities to work on domains at Automattic.

We chatted briefly on Skype, but I was a second lieutenant at the time and still had over three years left on my Air Force commitment so it didn’t go anywhere.

In early 2013 after I had been working on Lean Domain Search full time for several months, I remembered Matt’s old email about Domain Pigeon. I checked out Automattic and WordPress.com and decided to reach back out to Matt to see if there was still an opportunity. I found his original email and responded to it again, this time 4 years after he sent it. I reminded him who I was, explained that I was working on a new domain name generator, and that I saw an opportunity for it to be put to use on WordPress.com to help users find better domain names. He encouraged me to apply for a developer position which I did and in the end Automattic wound up hiring me and acquiring Lean Domain Search.

Lean Domain Search today

That period from August 2011 when I deployed to June 2013 when I started at Automattic was probably the most intense period of my life. I am extremely grateful that things worked out the way they did. In the end I wound up with a small acquisition, an amazing job at Automattic, a deployment that I’m really proud of, and experiences that will stay with me for the rest of my life.

If you’re considering working on a startup but can’t make the leap to do it full time for whatever reason, remember that even a few hours per week can have a huge impact in the long run.

Stick with it. Amazing things can happen.

Discussion: HackerNews, /r/startups, /r/entrepreneur

Lean Domain Search at 3½

Lean Domain Search, despite almost no work since its acquisition by Automattic two years ago, has continued to thrive, now handling more than 160,000 searches per month:

lean-domain-search-3.5-black

Its monthly growth rate works out to be about 6.5%. Not huge, but not bad for maintenance mode. :)

I think its growth is still driven almost entirely by word of mouth so if you’ve ever shared it with anyone (I’m looking at you, Jay Neely), thanks!

One Year After its Acquisition, Lean Domain Search’s Monthly Search Volume is Up 200%

One year ago today, Automattic, the company behind WordPress.com, acquired my small startup, Lean Domain Search.

I’m happy to report that Lean Domain Search’s monthly search volume has exploded over the last year going from 31,000 searches in May 2013 to more than 95,000 searches in May 2014.

To put that in perspective, here is a chart showing the number of searches per month for its entire history:

leandomainsearch-traffic

Despite the fact that we haven’t done much work on it in the last year (we’ve been focused heavily on improving domain search and registration on WordPress.com), Lean Domain Search’s traffic is 3x what it was a year ago. Not bad, right?

I think its growth can be attributed to four main factors:

First, before we announced the acquisition I made Lean Domain Search completely free to use. Prior to that you could perform a search but you would only be shown a limited number of results unless you paid for a premium plan: 150 search results for free, $79 for two months of full access (5,000 search results) or $199 for full access year-round. With no restrictions in place, Lean Domain Search became a lot more useful for non-paying users, which made folks more likely to perform multiple searches.

Second, with the help of Ashish and Barry on Automattic’s Systems Team, we moved Lean Domain Search’s search server over to Automattic’s infrastructure, giving it a nice performance boost. Today it generates all 5,000 search results in about 1.25 seconds on average.

Third, becoming an Automattic product definitely didn’t hurt things. While we don’t go out of our way to advertise it, the new ownership does add a certain amount of legitimacy to it that I think has helped it spread.

That brings me to the last and most important factor in its growth over the last year: people sharing it with each other. When folks use Lean Domain Search to name their website, there is a good chance they will wind up sharing it with others. Some of those new folks will head over to Lean Domain Search to check it out and wind up using it to name their website and then sharing it with their friends and so on. I think this virality factor is a big part of its fast yet steady growth over the last year so if you’ve ever shared Lean Domain Search with a friend, thanks :)

Will Lean Domain Search’s growth continue at the same pace over the next year? Maybe its growth will accelerate even more? We will see :)

#onward

PS: Interested in making the web a better place? We’re hiring.