Cyrillic URL

Historically, using local alphabets on predominantly American-made hardware and software has always been a struggle. Long before the age of the Internet, users had to battle the fact that the “27th” letter was not supposed to be there and that it had no system support in DOS, Windows, Unix… Even when matters started to improve at the turn of the century, the change was more cosmetic than real: there were dozens of different standards, printers would not know what to make of the strange codes (with unpredictable results), and so on.

Users, of course, found their workarounds, which in turn created more chaos: dozens of standards, poor-quality fonts and sometimes impossible document exchange between platforms.

Fortunately, things are getting better.

How to make a good Cyrillic site

Gradually: that is the only word that explains it. Back when the IDN domain format was introduced, browsers treated it differently, each at its own discretion. For example, Mozilla Firefox required special approval (i.e. hardcoding) of the allowed domain names, and .срб was not among those included.

The next dilemma was the character set. International symbols are covered by several standards, and one of the largest and most persevering proponents of that chaos was none other than Microsoft, with its idea of “different and better” solutions to already answered questions.

Obey Thou God

The only way forward is, really, to follow the standards of the Internet. In the case of international scripts, such as Serbian Cyrillic, that has turned out to be UTF-8.

First of all, you have to inform the browser which charset you are using in the document:

<meta charset="utf-8" />

Secondly, you will need to use the same charset in the URLs themselves:

https://www.rnids.rs/национални-домени/регистрација-националних-домена

That is what the standard provides. Browsers do not comply? That is their problem, and it will not go away easily. Even today, international letters in URLs are treated differently from English ones, but again, it is their problem. Persistence is the key, not the workaround.

Thirdly, it is not a bad practice to let the “renegade browsers” know what you are trying to accomplish:

<!--[if lt IE 7]><html class="lt-ie9 lt-ie8 lt-ie7" lang="sr" dir="ltr"><![endif]-->
<!--[if IE 7]><html class="lt-ie9 lt-ie8" lang="sr" dir="ltr"><![endif]-->
<!--[if IE 8]><html class="lt-ie9" lang="sr" dir="ltr"><![endif]-->

That about sums up the “know-how”, i.e. the solutions that go beyond the standard.

Let the search engines know

Often, non-Latin content is duplicated by Latin-alphabet content. Let the search engines know that this is the case. This feature is relatively recent but very helpful, and we warmly recommend using it:

<link href="/%D0%BD%D0%B0%D1%86%D0%B8%D0%BE%D0%BD%D0%B0%D0%BB%D0%BD%D0%B8-%D0%B4%D0%BE%D0%BC%D0%B5%D0%BD%D0%B8/%D1%80%D0%B5%D0%B3%D0%B8%D1%81%D1%82%D1%80%D0%B0%D1%86%D0%B8%D1%98%D0%B0-%D0%BD%D0%B0%D1%86%D0%B8%D0%BE%D0%BD%D0%B0%D0%BB%D0%BD%D0%B8%D1%85-%D0%B4%D0%BE%D0%BC%D0%B5%D0%BD%D0%B0" rel="alternate" hreflang="sr" />

<link href="/lat/nacionalni-domeni/registracija-nacionalnih-domena" rel="alternate" hreflang="sr-Latn" />

<link href="/en/national-domains/registering-national-domains" rel="alternate" hreflang="en" />

We are citing the same URL as above, but Google Chrome is still escaping the Cyrillic. Pay no attention to it; they will come to their senses sooner or later. In reality, all three URLs are spelled out in the same UTF-8 encoding.

Making the search engines aware that it is the same content will actually help. It will not favour English or any other variant; the results will follow the user’s system settings. If you are from Serbia and use Cyrillic, Windows (or another OS) needs to know about it, and you must stop feeling “awkward” about your own language and script. No one else will stand up for you.
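As a rough illustration of how such link tags could be generated, here is a minimal Python sketch. The helper name alternate_link is hypothetical, and the paths are simply the ones from the example above; it assumes the standard library's urllib.parse.quote, which percent-encodes non-ASCII characters as UTF-8 bytes.

```python
from html import escape
from urllib.parse import quote


def alternate_link(path: str, hreflang: str) -> str:
    """Return a <link rel="alternate"> tag with the path percent-encoded as UTF-8."""
    # quote() turns each non-ASCII character into its UTF-8 bytes (%XX%XX)
    # and leaves "/" and "-" untouched.
    href = quote(path, safe="/")
    return f'<link href="{escape(href)}" rel="alternate" hreflang="{escape(hreflang)}" />'


# Paths copied from the example above; alternate_link is a hypothetical helper.
variants = {
    "sr": "/национални-домени/регистрација-националних-домена",
    "sr-Latn": "/lat/nacionalni-domeni/registracija-nacionalnih-domena",
    "en": "/en/national-domains/registering-national-domains",
}

for lang, path in variants.items():
    print(alternate_link(path, lang))
```

The Latin and English paths pass through unchanged, while the Cyrillic one comes out as the long percent-encoded string shown in the first link tag.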

Be patient

The Internet will surely come of age, supporting all the alphabets in the world and more. One-off workarounds will not help; as a matter of fact, they may only slow down adoption. If you have a standard, go with it.

Follow the unfolding

As browsers and other software progress, the quality of support for non-English alphabets increases. That is beyond question; we have seen it happen over the previous decade. You might be tempted to use hackish, one-off solutions to display your letters immediately, but do not. Systematic support is coming. A much better idea is to ask for it: Google, Microsoft, Mozilla, etc. are not deaf to such requests.

Use your script throughout

There is no reason for the text to be in Serbian Cyrillic and the URL in Latin. There might have been in the past, but not anymore. Do not support what is backward; support what is advancing.

Small secrets

There are a few less-than-obvious things when working with international URLs.

The Apache web server, the predominant one on the Internet, has difficulty interpreting Cyrillic typed directly into its config files. You need to resort to the IDN (Punycode) equivalents when setting the server name, doing redirects and such. For example:

ServerName xn--d1aholi.xn--90a3ac # actually, рнидс.срб
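The ASCII form used in the ServerName line can be derived rather than looked up. A minimal Python sketch follows, assuming the built-in "idna" codec (which implements the older IDNA 2003 rules, sufficient for this name); the helper names to_ascii and to_unicode are hypothetical.

```python
# Convert a domain name between its Unicode form and its ASCII (Punycode) form.
# Assumption: Python's built-in "idna" codec (IDNA 2003) is good enough here.


def to_ascii(domain: str) -> str:
    """Unicode domain -> ASCII form suitable for server config files."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """ASCII (xn--) domain -> human-readable Unicode form."""
    return domain.encode("ascii").decode("idna")


print(to_ascii("рнидс.срб"))
print(to_unicode("xn--d1aholi.xn--90a3ac"))
```

Each label of the domain is converted separately, which is why both the name and the .срб suffix get their own xn-- prefix.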

Amusingly enough, the configuration file is called рнидс.срб.conf. That is because Linux has no issues with the Cyrillic alphabet and, on the face of it, neither does Apache – except….

If you try a simple copy-paste of the Cyrillic URL, you will end up with something like:

https://www.rnids.rs/%D1%9B%D0%B8%D1%80%D0%B8%D0%BB%D0%B8%D1%86%D0%B0-%D0%BD%D0%B0-%D0%B8%D0%BD%D1%82%D0%B5%D1%80%D0%BD%D0%B5%D1%82%D1%83/idn-%D0%B5%D0%BD%D0%BA%D0%BE%D0%B4%D0%B5%D1%80

It is actually simple: the URL reads https://www.rnids.rs/ћирилица-на-интернету/idn-енкодер. However, most browsers will “assist” you by transcoding it. The URL is perfectly valid and works, but it is understandable only to the machine and completely useless for humans.

Google Chrome only recently removed this behaviour, and now you can safely copy-paste Cyrillic URLs from its address bar to other browsers. That emphasizes the point of the “Be patient” section above. The other browsers, however, have not followed yet. But there is a workaround: omit just one letter when selecting, for example the leading “h” in the protocol name, and the browser will indeed copy the URL correctly. Here is the result from the latest Firefox:
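If you are handed such a transcoded URL, turning it back into readable form is straightforward. Here is a minimal Python sketch, assuming the path was percent-encoded as UTF-8; the function name humanize is hypothetical, and only the path is decoded, so it is not meant for URLs whose paths contain escaped reserved characters such as %2F.

```python
from urllib.parse import unquote, urlsplit, urlunsplit


def humanize(url: str) -> str:
    """Decode the percent-escapes in the path so a human can read the URL."""
    parts = urlsplit(url)
    # unquote() interprets the escaped bytes as UTF-8 by default.
    # The host, query string and fragment are left exactly as they were.
    return urlunsplit(parts._replace(path=unquote(parts.path)))


encoded = (
    "https://www.rnids.rs/%D1%9B%D0%B8%D1%80%D0%B8%D0%BB%D0%B8%D1%86%D0%B0"
    "-%D0%BD%D0%B0-%D0%B8%D0%BD%D1%82%D0%B5%D1%80%D0%BD%D0%B5%D1%82%D1%83"
    "/idn-%D0%B5%D0%BD%D0%BA%D0%BE%D0%B4%D0%B5%D1%80"
)
print(humanize(encoded))
```

Both forms name the same resource; the decoded one is simply the form meant for people.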

ttps://www.rnids.rs/ћирилица-на-интернету/idn-енкодер

A battle indeed, but one destined to be won.

Vladimir Marić, Mrežni Sistemi d.o.o.

www.mreznisistemi.rs