Welcome to article #6 in a series examining the geography of new domain endings (nTLDs). Let’s pick up where we left off: TLD language.
Last time, we attempted to count suffixes based on language. As it turns out, categorizing any keyword or abbreviation can be problematic. Some nTLDs (fewer than 614) belong clearly to just 1 language. For instance, .VIAJES is Spanish and (as far as I know) only Spanish. .PLUMBING works only in English. Many extensions (at least 145), however, are shared by several languages. For example, .SCIENCE and .POKER can be read either as English or as French. Further complicating matters, .XYZ and .OOO don’t quite belong to any language at all.
My language labeling is imperfect, admittedly biased toward the languages I happen to know or whose dictionaries I inspected: English, Spanish, Arabic, French, German, Portuguese, Italian. Other languages – ranging from Chinese and Russian to Norwegian and Basque – may be somewhat under-counted, since I didn’t have time to determine whether each of nearly 1000 nTLDs might mean something in any of the planet’s estimated 6,500 spoken languages. Freely admitting these shortcomings and flaws, I’m going to make what inferences I can.
Language | TLDs | TLDs (Unique) | TLDs > 100 | TLDs (Unique) > 100 |
---|---|---|---|---|
English | 69.6% | 53.5% | 80.9% | 60.8% |
Chinese | 9.2% | 9.1% | 5.8% | 4.6% |
German | 5.1% | 2.6% | 6.6% | 3.7% |
French | 10.4% | 1.6% | 13.3% | 2.1% |
Spanish | 8.7% | 1.2% | 10.4% | 1.5% |
Russian | 2.0% | 2.0% | 1.2% | 1.2% |
Japanese | 2.1% | 1.3% | 2.9% | 0.8% |
Arabic | 4.7% | 3.7% | 0.6% | 0.6% |
Portuguese | 6.0% | 0.4% | 6.8% | 0.4% |
Korean | 0.7% | 0.5% | 0.6% | 0.4% |
Dutch | 0.5% | 0.4% | 0.6% | 0.4% |
Nepali | 0.5% | 0.5% | 0.2% | 0.2% |
Hindi | 0.3% | 0.1% | 0.4% | 0.2% |
Kurdish | 0.1% | 0.1% | 0.2% | 0.2% |
Tatar | 0.1% | 0.1% | 0.2% | 0.2% |
Basque | 0.1% | 0.1% | 0.2% | 0.2% |
Breton | 0.1% | 0.1% | 0.2% | 0.2% |
Welsh | 0.1% | 0.1% | 0.2% | 0.2% |
Frisian | 0.1% | 0.1% | 0.2% | 0.2% |
Italian | 5.3% | 0.1% | 6.0% | 0.0% |
Farsi | 0.4% | 0.4% | 0.0% | 0.0% |
Tamil | 0.4% | 0.4% | 0.0% | 0.0% |
Turkish | 0.3% | 0.0% | 0.4% | 0.0% |
Thai | 0.3% | 0.3% | 0.0% | 0.0% |
Bengali | 0.3% | 0.3% | 0.0% | 0.0% |
Urdu | 0.3% | 0.3% | 0.0% | 0.0% |
Afrikaans | 0.1% | 0.0% | 0.2% | 0.0% |
Romanian | 0.1% | 0.0% | 0.2% | 0.0% |
Hebrew | 0.1% | 0.1% | 0.0% | 0.0% |
Armenian | 0.1% | 0.1% | 0.0% | 0.0% |
Bulgarian | 0.1% | 0.1% | 0.0% | 0.0% |
Georgian | 0.1% | 0.1% | 0.0% | 0.0% |
Greek | 0.1% | 0.1% | 0.0% | 0.0% |
Gujarati | 0.1% | 0.1% | 0.0% | 0.0% |
Kazakh | 0.1% | 0.1% | 0.0% | 0.0% |
Punjabi | 0.1% | 0.1% | 0.0% | 0.0% |
Sinhala | 0.1% | 0.1% | 0.0% | 0.0% |
Telugu | 0.1% | 0.1% | 0.0% | 0.0% |
Finnish | 0.1% | 0.0% | 0.0% | 0.0% |
Hungarian | 0.1% | 0.0% | 0.0% | 0.0% |
Swedish | 0.1% | 0.0% | 0.0% | 0.0% |
This table is identical to one I published previously, except that TLD counts are here given in percentage terms. Leaving aside Dot Brands, there were 791 nTLDs requiring language categorization. Some of the new domain endings have yet to be released. Consequently, only 482 of those extensions show more than 100 domains registered. The rightmost 2 columns focus on those. If you didn’t read the last article, then you’re probably wondering what “TLDs (Unique)” means. This refers to nTLDs that belong ONLY to 1 language. For instance, 10.4% of proposed nTLDs can be read as French, though only 1.6% are exclusively French.
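To get a feel for what these percentages mean in absolute terms, they can be turned back into rough counts. The sketch below is only an approximation: it assumes the left 2 columns are shares of the 791 categorized nTLDs and the right 2 columns are shares of the 482 with more than 100 registrations. The helper name `approx_count` is made up for illustration, and because the table is rounded to 1 decimal place, the results are estimates rather than the exact underlying counts.

```python
# Totals quoted in the article.
TOTAL_CATEGORIZED = 791   # nTLDs requiring language categorization
TOTAL_ACTIVE = 482        # nTLDs with more than 100 domains registered


def approx_count(share_pct: float, total: int) -> int:
    """Estimate a count from a rounded percentage (hypothetical helper)."""
    return round(share_pct / 100 * total)


# French: 10.4% of proposed nTLDs are readable as French, 1.6% exclusively so.
french_all = approx_count(10.4, TOTAL_CATEGORIZED)
french_unique = approx_count(1.6, TOTAL_CATEGORIZED)

# Arabic: 0.6% of nTLDs with 100+ registrations.
arabic_active = approx_count(0.6, TOTAL_ACTIVE)

print(f"French (any reading): ~{french_all}")
print(f"French (exclusive):   ~{french_unique}")
print(f"Arabic (100+ regs):   ~{arabic_active}")
```

Under these assumptions, roughly 82 suffixes can be read as French but only about 13 are exclusively French, and Arabic has about 3 nTLDs past the 100-registration mark.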
Going by the number of nTLDs proposed, and focusing on cases where the suffix is only meaningful in 1 language, we can get a fairly clear idea regarding which languages were most sought after in the nTLD program. In other words, look at the column for “TLDs (Unique)”. After English, the most targeted languages were Chinese (9.1%), Arabic (3.7%), German (2.6%), Russian (2.0%), French (1.6%), Japanese (1.3%), and Spanish (1.2%).
Once we include ambiguous cases, overlapping languages cause that ranking to change a bit. Some registries definitely did intend to appeal to a multilingual audience – .IMMO and .CLUB, for instance. Other linguistic overlaps were accidental and may or may not bear fruit in terms of consumer adoption – perhaps .RED (Spanish for “network”) and certainly .GIFT (German for “poison”). The biggest discrepancies can be the most misleading because we are sometimes counting linguistic overlaps, which, though technically valid, will have negligible bearing on the domain market. In particular, be careful when interpreting these: Portuguese goes from 0.4% to 6.0%; Spanish from 1.2% to 8.7%; French from 1.6% to 10.4%; German from 2.6% to 5.1%.
Looking at nTLDs that have actually been released, it seems a few languages have been left behind during the rollout process. Though 4.7% of proposed nTLDs are Arabic – and as many as 3.7% of them exclusively Arabic – Arabic comprises only 0.6% of nTLDs having a footprint of 100+ registrations. Chinese is in the same boat. Comparing active nTLDs to proposed nTLDs, the share allotted to Chinese is approximately cut in half. On the flip side, German, English, and French nTLDs have been brought to market much more reliably. So their share has gone up.
No matter how you slice it, the preponderance of English is undeniable. This one language outnumbers any other by a factor of 5.9 – 13.3. This invites questioning. Why? Arguably, the nTLD program itself was spearheaded primarily by American applicants and marketed to American and European companies more than to the developing world. Nationality will play a role, naturally enough, for entrepreneurs hoping to operate registries will think first of their home country. Beyond that, the U.S. domain market seemed (prior to the Chinese surge of 2015) to be the largest and most lucrative area for registries to explore – especially given the crowded .COM name space, which Americans (forgetful of their own .US ccTLD) disproportionately rely on. If more domain options were needed anywhere at all, then the USA would feel the pinch first.
Alright, English keywords dominate the nTLD program – disproportionately so, it seems. But what exactly would a “fair share” for each language be? How do we quantify that? And, having estimated how many suffixes OUGHT to be awarded to Spanish or Hindi or Arabic or Chinese, just how far off is the nTLD program, how biased toward the various languages?
In my earliest articles in this series, we compared the number of nTLD domains registered in each country with the number of online citizens. This gave us a ratio by which to assess whether a given country was over/under-represented. We might try the same thing with language. Unfortunately, statistics regarding the number of language speakers online are to hand only for the top 10 languages:
Language | Internet Users | TLD Ratio | TLD Ratio (Unique) | TLD Ratio > 100 | TLD Ratio (Unique) > 100 |
---|---|---|---|---|---|
English | 26.3% | 2.6 | 2.0 | 3.1 | 2.3 |
Chinese | 20.8% | 0.4 | 0.4 | 0.3 | 0.2 |
Spanish | 7.7% | 1.1 | 0.2 | 1.4 | 0.2 |
Arabic | 4.7% | 1.0 | 0.8 | 0.1 | 0.1 |
Portuguese | 4.3% | 1.4 | 0.1 | 1.6 | 0.1 |
Japanese | 3.2% | 0.7 | 0.4 | 0.9 | 0.3 |
Russian | 2.9% | 0.7 | 0.7 | 0.4 | 0.4 |
French | 2.8% | 3.7 | 0.6 | 4.8 | 0.8 |
German | 2.3% | 2.2 | 1.1 | 2.9 | 1.6 |
Here we’ve simply taken the TLD percentages from the first table and divided by the percentage of internet users who speak the language in question. A ratio of 1.0 indicates they match perfectly. Greater than 1.0, and the language is getting more than its fair share of nTLDs. Less than 1.0, and the language is underrepresented.
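The division described above is easy to reproduce. Here is a minimal Python sketch for 4 of the languages, using the rounded percentages from the tables as inputs. The names `TLD_SHARE`, `INTERNET_USERS`, and `representation_ratios` are illustrative choices, not anything from the original spreadsheet, and since the inputs are already rounded, a result that lands exactly on a rounding boundary could differ from the published table by 0.1.

```python
# Percentages copied from the two tables above (rounded to 1 decimal).
COLUMNS = ["TLDs", "TLDs (Unique)", "TLDs > 100", "TLDs (Unique) > 100"]

TLD_SHARE = {
    "English": [69.6, 53.5, 80.9, 60.8],
    "Chinese": [9.2, 9.1, 5.8, 4.6],
    "Arabic": [4.7, 3.7, 0.6, 0.6],
    "German": [5.1, 2.6, 6.6, 3.7],
}

# Share of internet users speaking each language (%).
INTERNET_USERS = {"English": 26.3, "Chinese": 20.8, "Arabic": 4.7, "German": 2.3}


def representation_ratios(language: str) -> list[float]:
    """Divide each TLD share by the language's share of internet users."""
    benchmark = INTERNET_USERS[language]
    return [round(share / benchmark, 1) for share in TLD_SHARE[language]]


for language in TLD_SHARE:
    ratios = dict(zip(COLUMNS, representation_ratios(language)))
    print(language, ratios)
```

Swapping `INTERNET_USERS` for a different benchmark, such as each language's share of websites, changes only the denominator; the rest of the calculation stays the same.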
By this measure, English receives 2-3 times its due. Meanwhile, the number of nTLDs proposed for Chinese speakers is only 40% of what they’d be entitled to based on their population size; and the number of nTLDs actually released for Chinese speakers is even less. For Arabic, the number of nTLDs proposed looks to be spot on; but those so far released are only about 1/10 of what Arabic speakers might justly claim. Two major European languages, French and German, are about where they ought to be, proportionally speaking, at least when we count only the suffixes exclusive to each language (ratios of 0.6 to 1.6). Not coincidentally, those are also the 2 most active ccTLD markets in continental Europe.
Arguably, a more relevant metric would be the percentage of websites – not internet users – using each language. After all, we’re talking about domain names; and domain market demand probably correlates more with the number of developed sites than with the number of eyeballs looking at those sites. Some populations go online often enough but, so far, build fewer sites for themselves. Why? Plausible reasons include reliance on mobile devices, dependence on foreign platforms, and cultural habits regarding website ownership. As a side benefit, this metric gives us data for additional languages:
Language | % Sites | TLD Ratio | TLD Ratio (Unique) | TLD Ratio > 100 | TLD Ratio (Unique) > 100 |
---|---|---|---|---|---|
English | 52.3% | 1.3 | 1.0 | 1.5 | 1.2 |
Russian | 6.4% | 0.3 | 0.3 | 0.2 | 0.2 |
Japanese | 5.7% | 0.4 | 0.2 | 0.5 | 0.1 |
German | 5.4% | 0.9 | 0.5 | 1.2 | 0.7 |
Spanish | 5.0% | 1.7 | 0.2 | 2.1 | 0.3 |
French | 4.0% | 2.6 | 0.4 | 3.3 | 0.5 |
Portuguese | 2.6% | 2.3 | 0.2 | 2.6 | 0.2 |
Italian | 2.3% | 2.3 | 0.0 | 2.6 | 0.0 |
Chinese | 2.0% | 4.6 | 4.6 | 2.9 | 2.3 |
Turkish | 1.6% | 0.2 | 0.0 | 0.3 | 0.0 |
Farsi | 1.5% | 0.3 | 0.3 | 0.0 | 0.0 |
Dutch | 1.4% | 0.4 | 0.3 | 0.4 | 0.3 |
Korean | 0.9% | 0.8 | 0.6 | 0.7 | 0.4 |
Arabic | 0.8% | 5.9 | 4.6 | 0.8 | 0.8 |
Greek | 0.5% | 0.2 | 0.2 | 0.0 | 0.0 |
Swedish | 0.5% | 0.2 | 0.0 | 0.0 | 0.0 |
Romanian | 0.4% | 0.3 | 0.0 | 0.5 | 0.0 |
Hungarian | 0.4% | 0.3 | 0.0 | 0.0 | 0.0 |
Thai | 0.3% | 1.0 | 1.0 | 0.0 | 0.0 |
Finnish | 0.3% | 0.3 | 0.0 | 0.0 | 0.0 |
Hebrew | 0.2% | 0.5 | 0.5 | 0.0 | 0.0 |
Bulgarian | 0.2% | 0.5 | 0.5 | 0.0 | 0.0 |
Hindi | 0.1% | 3.0 | 1.0 | 4.0 | 2.0 |
You may be surprised to learn that English ISN’T quite as overrepresented within the nTLD program as it first appeared. On the contrary, the world needed 5.9 – 13.3 times more English nTLDs than it needed meaningful suffixes in any other language … simply to keep pace with the number of English websites being built. Using this metric, English is getting 100% – 150% of the nTLDs it deserves.
Equally revealing are the stats for Arabic. Instead of being underrepresented, as we had surmised based on the number of Arabic-speaking web users, Arabic is, in fact, seeing 80% of its due, given just 3 nTLDs with 100+ registrations each. One could argue that, in proposing still more Arabic nTLDs, registries had overestimated the need for Arabic-language websites by a factor of 4.6 – 5.9.
Most languages seem to be underrepresented, based on the table above. But there’s symmetry if we consider some of the missing languages: Nepali, Kurdish, Tatar, Basque, Breton, Welsh, Frisian, Farsi, Tamil, Bengali, Urdu, Armenian, Georgian, Gujarati, Kazakh, Punjabi, Sinhala, Telugu, and Afrikaans can each lay claim to at least 1 nTLD. Yet all of those languages put together constitute less than 0.1% of the world’s websites currently. By that measure, they’re hugely overrepresented.
Of course, a few of those languages – like Urdu, Gujarati, and Punjabi – are spoken by vast populations who may eventually build numerous websites in their mother tongue. But Welsh, Breton, and Frisian are spoken only by small ethnic minorities. As deeply felt as the symbolism may be, and as successful as these nTLDs may prove to be within the domain market, any extension for such a minority language will always be “overrepresented” in the strict sense of proportionality.
A mere 39 of the world’s 6,500 spoken languages account for more than 99.9% of all websites. More than 1/3 of those 39 languages seem not to be represented by any nTLD (other than a Dot Brand, possibly). If my labeling is accurate and there are no nTLDs associated with Polish, Czech, Vietnamese, Indonesian, Danish, Slovak, Lithuanian, Norwegian, Ukrainian, Croatian, Serbian, Catalan, Slovenian, Latvian, or Estonian, then all of those languages are underrepresented. Does this imply those audiences want their own nTLDs? Not necessarily. We’re not talking about market demand – only assessing the supply. What can be said is this: In terms of participation in the nTLD program, these languages seem to be sitting on the sidelines.
Shockingly, it now seems Chinese may be overrepresented by a factor of 2.3 – 4.6. Going by the number of Chinese internet users, we had concluded that Chinese was underrepresented by this very same factor! Apparently China is building nowhere near as many websites, relative to its online population, as we might expect. Alright then, are Chinese nTLDs underrepresented by a factor of 2.5 – 5.0? Or are they overrepresented by a factor of 2.3 – 4.6? Depending on whether we reference internet users or developed websites, we can argue the case either way. Domain investors and registry operators ought to chew this fact thoroughly before swallowing. Given the scale of the Chinese domain market, this puzzling revelation is non-trivial.
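The two competing readings for Chinese can be checked side by side. The sketch below is a reconstruction from the rounded table percentages, not the original calculation, and the variable names are illustrative. It suggests that the 2.5 – 5.0 range is what you get by inverting the rounded ratios 0.4 and 0.2, whereas dividing the percentages directly gives an underrepresentation range much closer to the overrepresentation range.

```python
# Chinese shares of nTLDs (%), in table column order:
# TLDs, TLDs (Unique), TLDs > 100, TLDs (Unique) > 100.
CHINESE_TLD_SHARE = [9.2, 9.1, 5.8, 4.6]

CHINESE_INTERNET_USERS = 20.8  # % of internet users
CHINESE_WEBSITES = 2.0         # % of websites

# Benchmark 1: internet users. Factor by which Chinese is UNDER-represented.
under_by_users = [CHINESE_INTERNET_USERS / share for share in CHINESE_TLD_SHARE]

# Benchmark 2: websites. Factor by which Chinese is OVER-represented.
over_by_sites = [share / CHINESE_WEBSITES for share in CHINESE_TLD_SHARE]

# Inverting the already-rounded ratios (0.4 and 0.2) from the users table.
under_from_rounded = [1 / 0.4, 1 / 0.2]

print(f"Under (users, raw %):      {min(under_by_users):.1f} - {max(under_by_users):.1f}")
print(f"Under (users, rounded):    {min(under_from_rounded):.1f} - {max(under_from_rounded):.1f}")
print(f"Over (websites):           {min(over_by_sites):.1f} - {max(over_by_sites):.1f}")
```

Either way, the direction of the verdict flips depending on which benchmark goes in the denominator, which is the point of the paragraph above.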
gpmgroup says
Very Interesting analysis – a lot of work!, thanks for posting.
One of the reasons .info was selected in the first round was because .info works in so many languages.
Joseph Peterson says
@gpmgroup,
Good point regarding .INFO.
Adolfo Grego says
That’s another reason why I love .bar !!!