Generating High Quality Available .com Domain Names for a Specific Industry

In my last post I detailed how to extract all of the available .com domain names from the .com zone file. In this post I’m going to show you how to do something very useful with the result: finding a great available domain name for a business in a specific industry.

For example, we’re going to find great business names that can fill in the blanks for the industry of your choosing:

  • ____________Marketing.com
  • ____________Consulting.com
  • ____________SEO.com
  • ____________Data.com
  • ____________Media.com
  • ____________Systems.com
  • ____________Law.com

The big idea: Check for keywords that are registered for other industries, but not registered for yours

Consider this: what if we looked at all of the registered domains that end with advertising.com, figured out the keyword in each one, and then checked whether the corresponding marketing.com domain is available? For example, imagine we see that the domain HightowerAdvertising.com is registered (we’ll refer to Hightower as the keyword here). We can then check whether HightowerMarketing.com is registered. Because someone already registered the keyword for the advertising industry, there’s a good chance that the keyword is meaningful and worth checking for the marketing industry as well.
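To make the single-keyword check concrete, here is a minimal sketch. The function names swap_industry and is_registered are my own, not part of the post's tooling, and it assumes domains.txt holds one uppercase name per line without the .COM suffix (as in HIGHTOWERADVERTISING).

```shell
#!/usr/bin/env bash
# swap_industry DOMAIN FROM TO (hypothetical helper)
# Replace the trailing industry word FROM with TO, e.g.
# HIGHTOWERADVERTISING -> HIGHTOWERMARKETING. Fails if DOMAIN does not
# end in FROM or has no keyword in front of it.
swap_industry() {
  local domain=$1 from=$2 to=$3
  case $domain in
    ?*"$from") printf '%s%s\n' "${domain%"$from"}" "$to" ;;
    *) return 1 ;;
  esac
}

# is_registered DOMAIN FILE (hypothetical helper)
# Succeed if DOMAIN appears as a complete line in FILE.
# -F: fixed string, -x: whole line, -q: no output, exit status only.
is_registered() {
  LC_ALL=C grep -Fxq -- "$1" "$2"
}

# Example usage (assumes domains.txt exists in the current directory):
#   candidate=$(swap_industry HIGHTOWERADVERTISING ADVERTISING MARKETING)
#   if is_registered "$candidate" domains.txt; then
#     echo "$candidate is in the zone file"
#   else
#     echo "$candidate is worth a closer look"
#   fi
```

Scanning a 108M-line file once per keyword is slow, so this is only meant to illustrate the idea for a single name; the bulk approach works on whole lists at once.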

We can take this a step further by checking for common keywords in multiple industries. For example, we check all the domains that end in advertising.com, all that end in media.com, see which keywords they have in common, then check which of those are not registered for marketing.com domains.

The fewer industries we check for common keywords, the more results we’ll have, but the lower the quality. The more industries we check, the fewer the results, but the higher the quality.

Getting your command line on

If you went through my last post, you should have wound up with a domains.txt file that has about 108M registered .com domain names:

$ wc -l domains.txt 
 108894538 domains.txt

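With a little bit of command line magic, we can extract all of the domains that end in ADVERTISING (like HIGHTOWERADVERTISING), then remove the trailing ADVERTISING word to get just HIGHTOWER, then sort those results and save them to a list. The sed expression strips the last 11 characters, which is the length of ADVERTISING, and the tmp directory needs to exist before you run the command: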

$ LC_ALL=C grep ADVERTISING$ domains.txt | sed 's/.\{11\}$//' | sort -u > tmp/advertising.txt

This generates a sorted list of keywords, one per line, such as HIGHTOWER.

Then we do the same for MARKETING domains:

$ LC_ALL=C grep MARKETING$ domains.txt | sed 's/.\{9\}$//' | sort -u > tmp/marketing.txt
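Counting characters by hand (11 for ADVERTISING, 9 for MARKETING) is easy to get wrong, so here is a sketch of a helper that derives the length from the suffix itself. The name keywords_for is hypothetical, and the suffix is assumed to be plain uppercase letters with no regex metacharacters. Unlike the commands above, it requires at least one character before the suffix, so a bare ADVERTISING entry does not produce an empty keyword.

```shell
#!/usr/bin/env bash
# keywords_for SUFFIX [FILE] (hypothetical helper)
# Print the sorted, de-duplicated keywords that precede SUFFIX in FILE
# (default: domains.txt).
keywords_for() {
  local suffix=$1 file=${2:-domains.txt}
  # ${#suffix} expands to the suffix length, so sed always strips
  # exactly as many trailing characters as the suffix has.
  LC_ALL=C grep -- ".${suffix}\$" "$file" \
    | sed "s/.\{${#suffix}\}\$//" \
    | sort -u
}

# Example usage (assumes domains.txt and a tmp directory exist):
#   keywords_for ADVERTISING > tmp/advertising.txt
#   keywords_for MARKETING   > tmp/marketing.txt
```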

And finally, we figure out which domains are in the advertising list but not in the marketing list:

$ comm -23 tmp/advertising.txt tmp/marketing.txt > results/marketing.txt
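If comm is new to you, this toy example shows what the flags do. The keyword lists are made up for illustration. comm expects both inputs to be sorted, which the sort -u step in the earlier commands takes care of.

```shell
#!/usr/bin/env bash
# Toy data: these keyword lists are invented, not taken from the zone file.
work=$(mktemp -d)
printf '%s\n' ACME BLUEHERON HIGHTOWER > "$work/advertising.txt"
printf '%s\n' ACME ZENITH              > "$work/marketing.txt"

# comm prints three columns: lines only in file 1, lines only in file 2,
# and lines in both. Each of -1, -2 and -3 hides the matching column.
only_advertising=$(comm -23 "$work/advertising.txt" "$work/marketing.txt")  # BLUEHERON, HIGHTOWER
only_marketing=$(comm -13 "$work/advertising.txt" "$work/marketing.txt")    # ZENITH
in_both=$(comm -12 "$work/advertising.txt" "$work/marketing.txt")           # ACME

printf 'Candidates for marketing:\n%s\n' "$only_advertising"
rm -rf "$work"
```

So -23 keeps the keywords that exist for advertising but not for marketing, which is exactly the list we are after.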

If we want to find common keywords registered in multiple industries, we need to add an extra step to generate that list of common keywords before figuring out which ones are available in ours:

$ comm -12 tmp/advertising.txt tmp/media.txt | comm -12 - tmp/design.txt | sort -u > tmp/common.txt
$ comm -23 tmp/common.txt tmp/marketing.txt > results/marketing.txt

The resulting results/marketing.txt list will contain the keywords that are common to the other industries and likely not registered in yours.

The way to interpret a keyword in this list, such as Adspace, is that it is registered in the other industries (AdspaceAdvertising.com, AdspaceMedia.com) but not in ours (AdspaceMarketing.com). Again, the more similar industries you check for common keywords, the higher the quality of the results. We could add three or four more industries to get a short, very high quality list.
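Chaining comm by hand gets tedious once you add a fourth or fifth industry. Here is a sketch that intersects any number of keyword lists in a loop. The function names common_keywords and available_keywords are hypothetical, and every input file is assumed to be already sorted and de-duplicated the way sort -u leaves it.

```shell
#!/usr/bin/env bash
# common_keywords FILE... (hypothetical helper)
# Print the keywords that appear in every FILE.
common_keywords() {
  local acc next f
  acc=$(mktemp) || return 1
  next=$(mktemp) || return 1
  cat -- "$1" > "$acc"
  shift
  for f in "$@"; do
    # Keep only the lines shared by the running result and the next list.
    comm -12 "$acc" "$f" > "$next"
    mv "$next" "$acc"
  done
  cat "$acc"
  rm -f "$acc" "$next"
}

# available_keywords TARGET OTHER... (hypothetical helper)
# Print keywords common to all OTHER lists that are missing from TARGET.
available_keywords() {
  local target=$1
  shift
  common_keywords "$@" | comm -23 - "$target"
}

# Example usage (assumes the tmp/*.txt lists were generated beforehand):
#   available_keywords tmp/marketing.txt \
#     tmp/advertising.txt tmp/media.txt tmp/design.txt > results/marketing.txt
```

Each extra file passed after the target can only shrink the output, which is the quantity versus quality trade-off in practice.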

By the way, the reason I say likely not registered is that once a domain loses its name servers (for example, if it’s way past its expiration date) it will drop out of the zone file even though the name isn’t available to register yet. Therefore some of the results might actually be registered, but a quick WHOIS check will confirm whether a given name is taken:

$ whois blueheronmarketing.com

No match for domain "BLUEHERONMARKETING.COM".
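Checking names one at a time gets old quickly, so here is a rough sketch for running the same check over a whole results file. The function name confirm_available and the WHOIS_DELAY variable are hypothetical. It assumes your whois client prints the "No match for" line shown above for unregistered .com names; other clients or servers may word this differently, and WHOIS servers often rate-limit, hence the pause between lookups.

```shell
#!/usr/bin/env bash
# confirm_available FILE INDUSTRY (hypothetical helper)
# For each keyword in FILE, look up <keyword><industry>.com and print the
# domain only if the WHOIS response contains "No match for".
confirm_available() {
  local file=$1 industry=$2 keyword domain
  while IFS= read -r keyword; do
    [ -n "$keyword" ] || continue
    domain=$(printf '%s%s.com' "$keyword" "$industry" | tr '[:upper:]' '[:lower:]')
    # </dev/null keeps whois from consuming the keyword list on stdin.
    if whois "$domain" </dev/null 2>/dev/null | grep -q 'No match for'; then
      printf '%s\n' "$domain"
    fi
    # Pause between lookups; WHOIS_DELAY is an assumed knob, default 1 second.
    sleep "${WHOIS_DELAY:-1}"
  done < "$file"
}

# Example usage:
#   confirm_available results/marketing.txt marketing > results/confirmed.txt
```

It makes sense to run this only on a shortlist you actually like rather than on the full results file.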

Or you could just use this Ruby script

Because it’s a pain to run all of these commands while searching for available domains in an industry, I put together this small Ruby script to help:

https://github.com/mattm/industry-domain-name-generator

There are instructions in the README explaining how to set the industry and similar industries in the script. If all goes well, it will run all of the necessary commands to generate the list of results:

$ ruby generator.rb 
Finding available domains for marketing...
Generating industry name lists...
Searching for domains that end with 'advertising'...
  LC_ALL=C grep ADVERTISING$ domains.txt | sed 's/.\{11\}$//' | sort -u > tmp/advertising.txt
Searching for domains that end with 'media'...
  LC_ALL=C grep MEDIA$ domains.txt | sed 's/.\{5\}$//' | sort -u > tmp/media.txt
Searching for domains that end with 'design'...
  LC_ALL=C grep DESIGN$ domains.txt | sed 's/.\{6\}$//' | sort -u > tmp/design.txt
Searching for domains that end with 'marketing'...
  LC_ALL=C grep MARKETING$ domains.txt | sed 's/.\{9\}$//' | sort -u > tmp/marketing.txt
Finding common names in industries...
  comm -12 tmp/advertising.txt tmp/media.txt | comm -12 - tmp/design.txt | sort -u > tmp/common.txt
Finding names not registered for marketing...
  comm -23 tmp/common.txt tmp/marketing.txt > results/marketing.txt
Done, results available in results/marketing.txt

And with a little luck, you’ll find a great domain in the list to use for your new business.
