Category: "Internationalization"

17. April 2004

Charset conversions (i18n)

Yesterday, I came across this interesting table which tells me what conversions I need to do when I paste text from Word into a textarea and then want to use this text on the web...

To be accurate, this table is useful for conversion from the default Windows charset (windows-1252 aka CP1252) to the default web charset (ISO-8859-1 aka Latin-1). Nevertheless, this allowed me to check the conversion in my b2evolution software and I noticed that it was missing one conversion (out of a total of 27).
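As a rough illustration (in Python, and not how b2evolution does it), here is one way to list the characters that windows-1252 defines where ISO-8859-1 only has control codes, and to fall back to numeric character references when encoding for a Latin-1 page. The function names are made up for this sketch.

```python
# Bytes 0x80-0x9F are control codes in ISO-8859-1, but windows-1252
# uses most of them for printable characters (curly quotes, dashes, ...).

def cp1252_extras():
    """Return the characters windows-1252 defines in the 0x80-0x9F range."""
    extras = []
    for byte in range(0x80, 0xA0):
        try:
            extras.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            # A few positions in this range are undefined in windows-1252.
            pass
    return extras


def to_latin1_html(text):
    """Encode text as ISO-8859-1; anything outside Latin-1 becomes &#NNNN;."""
    return text.encode("latin-1", errors="xmlcharrefreplace")


print(len(cp1252_extras()))
print(to_latin1_html("\u201cna\u00efve\u201d \u2026"))
```

The count printed by the first line is the number of conversions a table like this one has to cover.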

Anyway, the world actually extends way beyond cp1252 and Latin-1, so how would one deal with other languages? :?:

For example, how do I convert Latvian from Windows-1257 to ISO-8859-13 (close match)? Or Russian from KOI8-R to ISO-8859-5 (funky match)? Check out this awesome character set database provided by the Institute of the Estonian Language. (Wouldn't it make sense if unicode.org provided this? :crazy:)
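For what it's worth, a mapping table is not the only route: a language with built-in codecs can go through Unicode as a pivot. Below is a minimal Python sketch under that assumption; `transcode` is a hypothetical helper, and characters with no equivalent in the target charset are replaced with "?" rather than converted.

```python
def transcode(data, source, target):
    """Re-encode bytes from one legacy charset to another via Unicode."""
    text = data.decode(source)
    # errors="replace" substitutes "?" for characters the target lacks.
    return text.encode(target, errors="replace")


# Russian: KOI8-R to ISO-8859-5 (same letters, very different byte order).
russian = "Привет".encode("koi8_r")
print(transcode(russian, "koi8_r", "iso8859_5").decode("iso8859_5"))

# Latvian: Windows-1257 to ISO-8859-13.
latvian = "Rīga".encode("cp1257")
print(transcode(latvian, "cp1257", "iso8859_13").decode("iso8859_13"))
```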

By the way, how do I know what charsets are to be used for a particular language? Here's a page by the W3C, but it's a little sparse... Here's another one.

20. August 2003

Internationalizing web applications using gettext in PHP

As I have said before, gettext is a very interesting framework for i18n and l10n.

Now the question is, how do I apply this to web applications? Actually, I'm going to restrict my discussion here to PHP since this is what I'm working with right now... but you should expect similar behaviour when using other web development tools that integrate gettext.

First of all, the good news: PHP has fully supported gettext since version 3.0.7. So it has been in use for a long time and you can even find tutorials on the net.

18. August 2003

Introducing gettext and .PO files

As I said recently, i18n and l10n are best carried out using the right tools...

I've looked around somewhat and it turns out there seems to be an absolute reference in the area: the GNU gettext framework.

This framework actually comprises several things:

A set of conventions about how programs should be written to support i18n;
A directory and file naming organization for the translated strings;
A runtime library to display localized text;
A set of utilities to handle the l10n process;
A special mode for Emacs which helps preparing the sources for i18n.

15. August 2003

Introducing i18n and l10n

When you develop a piece of software or a website up to a certain point, there comes a time when you try to reach an international audience.

No doubt your first move will be to provide an English version of your software or website.

However, you will soon realize this is not enough. Of course, many people do understand English to some extent; but you have to realize how painful it can be for them. Maybe you don't even realize how easily you understand English compared to the average person. Of course, if you are yourself a native English speaker, you need to try and imagine that every piece of software you use comes in French or German by default! How would you feel about that? :P

Furthermore, you may have spent some time on making your software or website accessible. Users can now change the font size and enhance contrast if they have trouble reading those lines of funky rendered text... That's fine... but what's the use if their problem is not with the formatting but with the language!?

7. August 2003

GNU gettext

gettext is a package that includes everything you need to internationalize a piece of software and then let translators localize it on the run without worrying too much about it.

It's not perfect (for example, it's poor at using multiple languages simultaneously, i.e. in the same output), but it's still very useful... and the best I found! ;)
