Charset conversions (i18n)

Yesterday, I came across this interesting table, which tells me what conversions I need to do when I paste text from Word into a textarea and then want to use that text on the web...

To be accurate, this table is useful for conversion from the default Windows charset (windows-1252 aka CP1252) to the default web charset (ISO-8859-1 aka Latin-1). Nevertheless, it allowed me to check the conversion in my b2evolution software, and I noticed that it was missing one conversion (out of a total of 27).
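To make that count concrete, here is a minimal Python sketch (not the b2evolution code, which is not shown here). It lists the CP1252 characters in the 0x80-0x9F range that Latin-1 cannot encode, relying on Python's built-in "cp1252" and "latin-1" codecs. The FALLBACKS table and the to_latin1 helper are hypothetical names made up for illustration, and the table is deliberately incomplete.

```python
def cp1252_only_characters():
    """Map each CP1252 byte in 0x80-0x9F to its character
    when that character cannot be encoded as Latin-1."""
    found = {}
    for byte in range(0x80, 0xA0):
        try:
            char = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # byte is unassigned in Python's cp1252 codec
        try:
            char.encode("latin-1")
        except UnicodeEncodeError:
            found[byte] = char  # exists in CP1252, missing from Latin-1
    return found


# Hypothetical, deliberately incomplete fallback table (an assumption,
# not the table from the page mentioned above).
FALLBACKS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
    "\u2026": "...",                # ellipsis
}


def to_latin1(text, fallbacks=FALLBACKS):
    """Replace known characters with ASCII look-alikes, then encode as
    Latin-1; anything still unencodable becomes a numeric entity."""
    table = str.maketrans(fallbacks)
    return text.translate(table).encode("latin-1", errors="xmlcharrefreplace")


if __name__ == "__main__":
    extras = cp1252_only_characters()
    print(len(extras), "characters need a conversion")
    print(to_latin1("\u201cHi\u201d \u20ac"))
```

Falling back to numeric character references suits text that ends up in HTML; for plain-text output a different error handler would make more sense.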

Anyway, the world actually extends way beyond cp1252 and Latin-1, so how would one deal with other languages? :?:

For example, how do I convert Latvian from Windows-1257 to ISO-8859-13 (close match)? Or Russian from KOI8-R to ISO-8859-5 (funky match)? Check out this awesome character set database provided by the Institute of the Estonian Language. (Wouldn't it make sense if provided this? :crazy:)
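One way to experiment with such conversions is to go through Unicode: decode the bytes with the source charset, then re-encode with the target one. The sketch below assumes Python's standard codecs ("cp1257", "iso8859_13", "koi8_r", "iso8859_5"); the transcode function is a hypothetical helper, not something taken from the database mentioned above.

```python
def transcode(data, source, target):
    """Convert bytes from one charset to another via Unicode.

    Returns the converted bytes plus the list of characters that the
    target charset cannot represent (those are replaced with '?').
    """
    text = data.decode(source)
    unmappable = []
    for char in text:
        try:
            char.encode(target)
        except UnicodeEncodeError:
            if char not in unmappable:
                unmappable.append(char)
    return text.encode(target, errors="replace"), unmappable


if __name__ == "__main__":
    # Latvian: Windows-1257 to ISO-8859-13
    latvian = "R\u012bga".encode("cp1257")
    converted, lost = transcode(latvian, "cp1257", "iso8859_13")
    print(converted.decode("iso8859_13"), lost)

    # Russian: KOI8-R to ISO-8859-5
    russian = "\u041f\u0440\u0438\u0432\u0435\u0442".encode("koi8_r")
    converted, lost = transcode(russian, "koi8_r", "iso8859_5")
    print(converted.decode("iso8859_5"), lost)
```

Returning the unmappable characters instead of raising makes it easy to see how "funky" a given match is: letters usually survive, while extras such as the KOI8-R box-drawing characters do not.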

By the way, how do I know what charsets are to be used for a particular language? Here's a page by the W3C, but it's a little sparse... Another one.

Introducing i18n and l10n

When you develop a piece of software or a website up to a certain point, there comes a time when you try to reach an international audience.

No doubt your first move will be to provide an English version of your software or website.

However, you will soon realize this is not enough. Of course, many people do understand English to some extent, but you have to realize how painful it can be for them. Maybe you don't even realize how easily you understand English compared to the average person. And if you are a native English speaker yourself, try to imagine that every piece of software you use comes in French or German by default! How would you feel about that? :P

Furthermore, you may have spent some time making your software or website accessible. Users can now change the font size and enhance the contrast if they have trouble reading those lines of funkily rendered text... That's fine... but what's the use if their problem is not with the formatting but with the language!?

