This is part 5 in a series examining the geography of new domain endings (nTLDs). We began by identifying the most active registrant nations, asking why some countries are over/under-represented. Next, we focused on GEO suffixes like .LONDON and .IRISH, measuring just how local they are (in terms of ownership) and how popular compared to non-GEOs. Now it’s time to cross borders, leave behind place names, and consider a cultural aspect of geography that follows us around the planet: Language.
We can’t understand nTLD distribution without reference to language. .IMMOBILIEN is at home in a cluster of German-speaking nations. Spanish long ago outgrew the confines of Spain; so an extension like .VIAJES travels too, following Columbus. Chinese is spoken in China, of course – but also in Hong Kong and Taiwan. Therefore we’d expect similar patterns of nTLD adoption among those Chinese-speaking countries. Meanwhile, a multilingual Switzerland will presumably be a blend of French, German, and Italian nTLDs. As for the lingua franca, English keyword suffixes could show up anywhere; but we’d expect to see them most in the UK and former British colonies – the USA, Canada, South Africa, Australia, New Zealand, and India.
Really, the nTLD program is all about language. Its aim: To carve up the world into fragmentary name spaces based not only on topic but on vocabulary. Since most nTLDs are meaningful keywords or abbreviations, they’re constrained by or biased toward some language’s geography. .EARTH and .GLOBAL, despite the grand planetary canvas they describe, disclose an English perspective – just like .PLUMBING or .DOCTOR. And that fact does imply that their center of gravity will reside in English-speaking countries.
Anybody who has scanned a list of nTLDs will have noticed the prevalence of English terms – reflecting the lopsided priorities of registry applicants, their own national origins, or perhaps the scale of pent-up market demand in the USA where .COM crowding is felt most keenly. Yet, among nTLDs, more than 40 languages are represented.
So how many nTLDs are English or French or German or whatever? At first glance, that seems a straightforward question. But enumerating suffixes based on language is, in fact, a very tricky problem. Try assigning 1 language to each nTLD, and the process quickly falls apart. In the previous example, .GLOBAL and .DOCTOR don’t exclusively belong to any 1 language. .PLUMBING is uniquely English, and .MOI is surely solely French. But read the list .INTERNATIONAL, .SCIENCE, .CONSTRUCTION, .BOUTIQUE, .EXPERT, .RESTAURANT, .TENNIS, .BIBLE, .DIRECT, .PROTECTION, and .POKER, and your first impression will be English or French depending on context and your mother tongue.
Languages share words, thanks to common origin or regular commerce between them. But inconsistent spellings can severely limit nTLD market size. A French or Italian audience might accept .POKER, but Spanish or Portuguese speakers would expect .POQUER. There goes South America! .FOOTBALL fails to account for the Spanish .FUTBOL, which itself won’t cover “futebol” in Brazil.
Some nTLDs can be labeled with 6 or more languages. .HOTEL is English, yes … and French … and Spanish … and Italian … and Portuguese … and German … and so forth. The word is shared internationally, but it isn’t truly global because plenty of languages do rely on their own native term for hotel. The Arabic word is utterly unrelated, for instance.
Certainly some suffixes can be used globally, irrespective of the audience’s language. Legacy gTLDs such as .COM and .NET and .INFO are treated in that way. .XYZ and .OOO aspire to the same kind of “language-less-ness”. Whatever their actual adoption, in theory such non-keyword TLDs are quite versatile. Arguably, some more meaningful suffixes – e.g. .CLUB, .VIP, and .TOP – achieve a similar global stature because they’re widely recognized loan words. Good for them! Bad for us, though, if we’re trying to categorize nTLDs based on language!
Some of you will remember the first nTLD ever to be released, .شبكة. It’s Arabic. Yet, in another sense, so too are .ARAB, .HALAL, and .ISLAM – all proposed nTLDs. Those transliterations reach beyond Arabic-speaking countries, anywhere Muslims and/or Arabs live – i.e. everywhere. We could as easily call these terms English as Arabic. Not to mention other languages. And does it make sense, really, to classify .شبكة and .HALAL both as “Arabic” in the same sense? Or, for that matter, to treat both .WANG and .在线 as Chinese?
As another example, take .RED. Did the registry intend an English color (as with .PINK) or the Spanish word for “network” (rather like .NET)? Does their assumption or goal even matter anyway? After all, we consumers may repurpose their product, as we have done with ccTLDs such as .ME, .IO, and .TV. Sí, .SOY may have been meant as Spanish “I am”; but one man’s existential echo of Descartes is another man’s bean paste. Ultimately, meaning is in the eye of the beholder. If you stare hard enough, almost any TLD begins to look Chinese.
Abbreviations pose similar challenges. .NGO is English; but .ONG, which has the very same meaning, is Spanish / Italian / French / Portuguese / Romanian. .GMBH is a German abbreviation. .LTDA (like English’s .LTD for “limited”) is Spanish and Portuguese … but not Italian. And .IMMO manages to function in German / Italian / French … but not in Spanish, whose spelling deviates by introducing an “N”: “inmobiliario” as opposed to “immobilien” / “immobili” / “immobilier”.
Getting a headache yet? Good luck labeling .DESI, a term used throughout the multifarious societies of India, Pakistan, and Bangladesh. India alone recognizes 22 official languages! Place names are no less problematic. .PARIS spells Paris in French, English, and other languages besides. But .MOSCOW isn’t Russian; it’s English. The Cyrillic IDN .МОСКВА has a different pronunciation (“Moskva”). And the city would be called Mosca, Moscou, or Moscú in Italian / French / Spanish. There is no consistency at all regarding which words languages do or don’t share. For instance, .BERLIN requires no translation moving from German to English, whereas .WIEN must be rewritten as Vienna.
It’s crucial to acknowledge and discuss all these challenges up front, because I’m about to count nTLDs based on language, and readers must take the following data with a grain of salt. Here goes:
Setting aside Dot Brands such as .LOREAL, .MARRIOTT, and .AARP, there are 761 nTLDs to be looked at. Of those, I’ve marked 2 as “language-less”: .XYZ and .OOO. To all the others, I’ve assigned 1 or more languages. Admittedly, this process reflects my own ignorance and laziness. Apart from English, I only know Spanish and Arabic. Beyond those 3 languages, I made some attempt to identify keywords that might fit French, Italian, Portuguese, and German. Since those were the ONLY languages I inspected when deciding whether a given nTLD is meaningful in more than 1 language, it’s quite likely that most other languages – e.g. Chinese, Dutch, Swedish, Turkish, etc. – have been under-counted.
Consequently, I’m confident that at least 145 nTLDs are meaningful in multiple languages. The remaining 614 suffixes seem – at this juncture – to be unique to a single language. But, of course, if I were to continue looking for instances of these terms in Greek or Bengali or Norwegian dictionaries, some would turn out to span more than 1 language after all. So the 614 number would shrink, and the 145 count would grow. Here’s the best way to interpret these numbers: (1) more than 145 multilingual nTLDs; (2) fewer than 614 monolingual nTLDs.
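To see why 145 is a floor and 614 is a ceiling, here is a minimal Python sketch. It is not the procedure actually used for this article: the `TRUE_LABELS` sample and the `split_counts` helper are hypothetical, built only from a few suffixes mentioned above. The idea is that a suffix looks monolingual whenever only 1 of its languages happens to be among those inspected, so widening the inspection can only move suffixes from the monolingual pile to the multilingual one.

```python
# Hypothetical sample: the fuller set of languages in which a few of the
# suffixes discussed above are meaningful. Not the article's real dataset.
TRUE_LABELS = {
    "plumbing": {"English"},
    "gmbh": {"German"},
    "immo": {"German", "Italian", "French"},
    "hotel": {"English", "French", "Spanish", "Italian", "Portuguese", "German"},
}


def split_counts(true_labels, inspected):
    """Return (monolingual, multilingual) counts as they would appear
    if only the languages in `inspected` had been checked."""
    mono = multi = 0
    for langs in true_labels.values():
        seen = langs & inspected  # languages the labeler would actually spot
        if len(seen) > 1:
            multi += 1
        elif len(seen) == 1:
            mono += 1
    return mono, multi


# Sanity check on the article's totals: language-less + multi + mono.
assert 2 + 145 + 614 == 761

if __name__ == "__main__":
    narrow = {"English", "German"}
    wide = {"English", "German", "French", "Italian"}
    print("narrow inspection:", split_counts(TRUE_LABELS, narrow))
    print("wide inspection:  ", split_counts(TRUE_LABELS, wide))
```

In this toy sample, .IMMO counts as monolingual under the narrow inspection and becomes multilingual once French and Italian are checked too, which is the same effect that would shrink the 614 and grow the 145.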
You’ll notice a column marked “TLDs (Unique)”. This gives a count of nTLDs that appear to be uniquely 1 language. So, by way of example, there are 530 English nTLDs. No more than 407 of these are strictly English. Another 123 English nTLDs (or more) are meaningful in at least 1 other language. Meanwhile, at most 12 nTLDs are uniquely French, though as many as 79 (if not more) can be interpreted as French.
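For anyone who wants to reproduce this kind of tally, the 2 counts per language can be sketched roughly as follows. This is an illustration under assumptions, not the spreadsheet behind the figures above: `LABELS` is a hypothetical handful of suffixes labeled as described earlier in this article, and `tally` and `partition` are made-up helper names.

```python
from collections import Counter

# Hypothetical sample of labels, taken from examples in this article.
# An empty set marks a "language-less" suffix such as .XYZ.
LABELS = {
    "plumbing": {"English"},
    "moi": {"French"},
    "gmbh": {"German"},
    "hotel": {"English", "French", "Spanish", "Italian", "Portuguese", "German"},
    "ong": {"Spanish", "Italian", "French", "Portuguese", "Romanian"},
    "immo": {"German", "Italian", "French"},
    "xyz": set(),
}


def tally(labels):
    """Per language: how many suffixes carry it at all ("TLDs") and
    how many carry it as their only language ("TLDs (Unique)")."""
    total = Counter()
    unique = Counter()
    for langs in labels.values():
        total.update(langs)       # every language on the suffix gets credit
        if len(langs) == 1:
            unique.update(langs)  # only single-language suffixes count here
    return total, unique


def partition(labels):
    """Return (language-less, monolingual, multilingual) suffix counts."""
    sizes = [len(langs) for langs in labels.values()]
    return (
        sum(1 for n in sizes if n == 0),
        sum(1 for n in sizes if n == 1),
        sum(1 for n in sizes if n > 1),
    )


if __name__ == "__main__":
    total, unique = tally(LABELS)
    for lang in sorted(total):
        print(f"{lang:<12} TLDs={total[lang]}  Unique={unique[lang]}")
    print("language-less / mono / multi:", partition(LABELS))
```

Because a multilingual suffix is credited to every language it fits, the per-language totals add up to more than the number of suffixes; only the "Unique" column avoids that double counting.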
The rightmost 2 columns limit attention to nTLDs in which at least 100 domains have been registered, in effect excluding suffixes that have yet to be publicly released. As you can see, only 3 out of 36 Arabic nTLDs have made it this far, whereas for German the fraction is 32 / 39. Even allowing for multilingual nTLDs and stalled rollouts, the preponderance of English keywords is clear as day. Depending on which column we look at, English outnumbers the 2nd most common nTLD language by a factor of 5.9 – 13.3.
Imperfect though my labeling is, this provides a rough idea of the role played by language within the nTLD program. In my next article, I’ll be delving into statistics based on this framework. So if you notice any languages that I’ve under-counted – i.e. nTLDs I’ve missed – please say so. This is a work in progress.