In my last post I detailed how to extract all of the available .com domain names from the .com zone file. In this post I’m going to show you how to do something very useful with the result: finding a great available domain name for a business in a specific industry.
For example, we’re going to find great business names that can fill in the blanks for the industry of your choosing:
- ____________Marketing.com
- ____________Consulting.com
- ____________SEO.com
- ____________Data.com
- ____________Media.com
- ____________Systems.com
- ____________Law.com
The big idea: Check for keywords that are registered for other industries, but not registered for yours
Consider this: what if we looked at all of the registered domains that end with advertising.com, figured out the keyword in each one, and then checked whether the corresponding marketing.com domain is available? For example, imagine we check and see that the domain HightowerAdvertising.com is registered (we’ll refer to Hightower as the keyword here). We can then check to see if HightowerMarketing.com is registered. Because someone already registered the keyword for the advertising industry, there’s a good chance that the keyword is meaningful and worth checking for the marketing industry as well.
We can take this a step further by checking for common keywords in multiple industries. For example, we check all the domains that end in advertising.com, all that end in media.com, see which keywords they have in common, then check which of those are not registered for marketing.com domains.
The fewer industries we check for common keywords, the more results we’ll have, but the lower the quality. The more industries we check, the fewer the results, but the higher the quality.
Getting your command line on
If you went through my last post, you should have wound up with a domains.txt file that has about 108M registered .com domain names:
$ wc -l domains.txt
108894538 domains.txt
With a little bit of command line magic, we can extract all of the domains that end in ADVERTISING (like HIGHTOWERADVERTISING), then remove the trailing ADVERTISING word to get just HIGHTOWER, then sort those results and save them to a list. The commands in this post write to tmp and results directories, so create those first (mkdir -p tmp results):
$ LC_ALL=C grep ADVERTISING$ domains.txt | sed 's/.\{11\}$//' | sort -u > tmp/advertising.txt
Which will generate a list such as:
$ head tmp/advertising.txt
A
AA
AAA
AAB
AAC
AADIT
AADS
AAGNEYA
AAHAA
Then we do the same for MARKETING domains:
$ LC_ALL=C grep MARKETING$ domains.txt | sed 's/.\{9\}$//' | sort -u > tmp/marketing.txt
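Counting the characters in each industry word by hand (11 for ADVERTISING, 9 for MARKETING) is easy to get wrong. As a rough sketch, the two commands above could be wrapped in a small shell function that strips the suffix by name instead. The name extract_keywords is made up for illustration, and the sketch assumes the suffix contains only letters and digits and that the input file holds uppercase names without the .COM ending, as in the examples above.

```shell
# Hypothetical helper (not part of the original commands):
# print the unique keywords that come before an industry suffix.
#   $1 = industry word, any case (e.g. advertising)
#   $2 = file of domain names, one per line, uppercase, no .COM
extract_keywords() {
  local suffix file
  # Uppercase the industry word so it matches the zone file format.
  suffix=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  file=$2
  # The leading "." requires at least one character before the suffix,
  # so the bare domain (e.g. ADVERTISING itself) is skipped rather than
  # producing an empty keyword.
  LC_ALL=C grep ".${suffix}\$" "$file" \
    | sed "s/${suffix}\$//" \
    | sort -u
}
```

It would be called as extract_keywords advertising domains.txt > tmp/advertising.txt, and the same call with marketing produces the second list.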
And finally, we figure out which domains are in the advertising list but not in the marketing list:
$ comm -23 tmp/advertising.txt tmp/marketing.txt > results/marketing.txt
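One thing to keep in mind is that comm compares its two inputs line by line and expects both to be sorted already, which is why each list above goes through sort -u. The sketch below wraps the comparison in a hypothetical helper, unregistered_keywords, that checks the sort order before comparing. The helper name and the check are additions for illustration, not something the original commands do.

```shell
# Hypothetical helper: print keywords that appear in the first list
# but not in the second.
#   $1 = sorted list of candidate keywords (e.g. tmp/advertising.txt)
#   $2 = sorted list of keywords already taken (e.g. tmp/marketing.txt)
unregistered_keywords() {
  local candidates taken
  candidates=$1
  taken=$2
  # sort -c exits non-zero if a file is not in sorted order.
  sort -c "$candidates" && sort -c "$taken" || return 1
  # -2 hides lines unique to the second file, -3 hides lines in both,
  # leaving only the lines unique to the first file.
  comm -23 "$candidates" "$taken"
}
```

With toy lists, a candidates file containing ACME, ADSPACE and HIGHTOWER and a taken file containing ACME would leave ADSPACE and HIGHTOWER.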
If we want to find common keywords registered in multiple industries, we need to add an extra step to generate that list of common keywords before figuring out which ones are available in ours:
$ comm -12 tmp/advertising.txt tmp/media.txt | comm -12 - tmp/design.txt | sort -u > tmp/common.txt
$ comm -23 tmp/common.txt tmp/marketing.txt > results/marketing.txt
The resulting marketing.txt list will have the common keywords in the other industries that are likely not registered in yours:
AANDG
AAS
ABILENE
ABRASIVE
ACCOMPLICE
ACENTO
ACTIONSPORTS
ADAGE
ADAIR
ADAY
ADCOM
ADDO
ADITHYA
ADJACENT
ADJECTIVE
ADLIB
ADOBE
ADONAI
ADONE
ADSPACE
…
The way to interpret this is that for a keyword like Adspace, the domains are registered in the other industries (AdspaceAdvertising.com, AdspaceMedia.com), but not registered for ours (AdspaceMarketing.com). Again, the more similar industries you check for common keywords, the higher the quality of results you’ll have. We could add three or four more industries to get a short, very high quality list.
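Chaining comm -12 by hand gets unwieldy once more than two or three industries are involved. A minimal sketch of one way to intersect any number of keyword lists follows. The function name common_keywords is hypothetical, and it assumes every input file is already sorted and de-duplicated the way the tmp/*.txt lists above are.

```shell
# Hypothetical helper: print the keywords present in every list given.
#   $@ = one or more sorted keyword files (e.g. tmp/advertising.txt ...)
common_keywords() {
  local acc next list
  acc=$(mktemp) || return 1
  next=$(mktemp) || return 1
  # Start with the first list, then narrow it down one list at a time.
  cat "$1" > "$acc"
  shift
  for list in "$@"; do
    # -12 keeps only the lines that appear in both inputs.
    comm -12 "$acc" "$list" > "$next"
    mv "$next" "$acc"
  done
  cat "$acc"
  rm -f "$acc" "$next"
}
```

For example, common_keywords tmp/advertising.txt tmp/media.txt tmp/design.txt > tmp/common.txt would stand in for the chained command, and adding an industry is a matter of appending another file name.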
By the way, the reason I say likely not registered is that once a domain loses its name servers (for example, if it’s way past its expiration date) it drops out of the zone file even though the name isn’t available to register yet. Therefore some of the results might actually be registered, but a quick WHOIS check will confirm whether a given name is taken:
$ whois blueheronmarketing.com
No match for domain "BLUEHERONMARKETING.COM".
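Running that check by hand for every keyword gets tedious, so here is a rough sketch of how it might be looped over a results file. The function names, the WHOIS_CMD and WHOIS_DELAY variables, and the one-second pause are all assumptions made for illustration. The sketch also assumes the "No match for domain" wording shown above, which other WHOIS servers or clients may phrase differently, and WHOIS servers may limit how many queries they will answer.

```shell
# Succeeds when the WHOIS output on stdin contains the "No match" line
# shown above. Other servers may word this differently (assumption).
is_unregistered() {
  grep -qi 'No match for domain'
}

# Hypothetical helper: print the domains from a results file that WHOIS
# reports as unregistered.
#   $1 = industry word (e.g. marketing)
#   $2 = file of keywords, one per line (e.g. results/marketing.txt)
# WHOIS_CMD and WHOIS_DELAY are made-up overrides, mainly for testing.
check_results() {
  local industry results keyword domain
  industry=$1
  results=$2
  while IFS= read -r keyword; do
    [ -n "$keyword" ] || continue
    # Build the full name in lowercase, e.g. adspacemarketing.com.
    domain=$(printf '%s%s.com' "$keyword" "$industry" | tr '[:upper:]' '[:lower:]')
    # Redirect stdin so the lookup cannot consume the keyword list.
    if ${WHOIS_CMD:-whois} "$domain" </dev/null 2>/dev/null | is_unregistered; then
      printf '%s\n' "$domain"
    fi
    # Pause between lookups to stay polite to the WHOIS server.
    sleep "${WHOIS_DELAY:-1}"
  done < "$results"
}
```

A call such as check_results marketing results/marketing.txt would then print only the names that pass the check.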
Or you could just use this Ruby script
Because it’s a pain to run all of these commands while searching for available domains in an industry, I put together this small Ruby script to help:
https://github.com/mattm/industry-domain-name-generator
There are instructions in the README explaining how to set the industry and similar industries in the script. If all goes well, it will run all of the necessary commands to generate the list of results:
$ ruby generator.rb
Finding available domains for marketing...
Generating industry name lists...
Searching for domains that end with 'advertising'...
LC_ALL=C grep ADVERTISING$ domains.txt | sed 's/.\{11\}$//' | sort -u > tmp/advertising.txt
Searching for domains that end with 'media'...
LC_ALL=C grep MEDIA$ domains.txt | sed 's/.\{5\}$//' | sort -u > tmp/media.txt
Searching for domains that end with 'design'...
LC_ALL=C grep DESIGN$ domains.txt | sed 's/.\{6\}$//' | sort -u > tmp/design.txt
Searching for domains that end with 'marketing'...
LC_ALL=C grep MARKETING$ domains.txt | sed 's/.\{9\}$//' | sort -u > tmp/marketing.txt
Finding common names in industries...
comm -12 tmp/advertising.txt tmp/media.txt | comm -12 - tmp/design.txt | sort -u > tmp/common.txt
Finding names not registered for marketing...
comm -23 tmp/common.txt tmp/marketing.txt > results/marketing.txt
Done, results available in results/marketing.txt
And with a little luck, you’ll find a great domain in the list to use for your new business.