Yesterday, I came across this interesting table, which tells me what conversions I need to do when I paste text from Word into a textarea and then want to use that text on the web…

To be accurate, this table is useful for conversion from the default Windows charset (windows-1252, aka CP1252) to the default web charset (ISO-8859-1, aka Latin-1). Nevertheless, it allowed me to check the conversion in my b2evolution software, and I noticed that it was missing one conversion (out of a total of 27).
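To make that count of 27 concrete, here is a rough Python sketch (not the b2evolution code, which is not shown here). It assumes Python's built-in `cp1252` and `latin-1` codecs, and the helper names `cp1252_extras` and `word_to_latin1` are made up for illustration. It lists the CP1252 characters in the 0x80-0x9F range that Latin-1 cannot hold and replaces them with numeric character references, which is one possible way to do the conversion.

```python
# Sketch: find the CP1252 characters that Latin-1 cannot represent and
# replace them with numeric character references.
# The function names below are hypothetical, chosen for this example.

def cp1252_extras():
    """Map each defined CP1252 byte in 0x80-0x9F to its Unicode character."""
    extras = {}
    for byte in range(0x80, 0xA0):
        try:
            extras[byte] = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D
            # undefined, so those five bytes are skipped.
            continue
    return extras


def word_to_latin1(raw):
    """Take CP1252 bytes (e.g. pasted from Word) and return Latin-1 bytes."""
    text = raw.decode("cp1252")
    # Characters outside Latin-1 become references such as &#8220;
    return text.encode("latin-1", errors="xmlcharrefreplace")


if __name__ == "__main__":
    for byte, ch in cp1252_extras().items():
        print(f"0x{byte:02X} -> U+{ord(ch):04X} -> &#{ord(ch)};")
```

Running it should print 27 lines, one per conversion, which makes it easy to compare against whatever list your own software uses.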

Anyway, the world actually extends way beyond cp1252 and Latin-1, so how would one deal with other languages? :?:

For example, how do I convert Latvian from Windows-1257 to ISO-8859-13 (close match)? Or Russian from KOI8-R to ISO-8859-5 (funky match)? Check out this awesome character set database provided by the Institute of the Estonian Language. (Wouldn’t it make sense if unicode.org provided this? :crazy:)
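If you just want to experiment with such conversions, one option is to go through Unicode as a pivot. The sketch below assumes Python's built-in codecs (`cp1257`, `iso8859_13`, `koi8_r`, `iso8859_5`); the `transcode` function and its `_fits` helper are hypothetical names for this example. It converts the bytes and also reports which characters have no slot in the target charset, so you can see how close the match really is.

```python
# Sketch: convert bytes from one legacy charset to another via Unicode
# and report the characters the target charset cannot hold.
# transcode() and _fits() are hypothetical names for this example.

def _fits(ch, charset):
    """Return True if the single character ch can be encoded in charset."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def transcode(data, src, dst):
    """Return (converted bytes, sorted list of characters missing in dst)."""
    text = data.decode(src)
    missing = sorted({ch for ch in text if not _fits(ch, dst)})
    # Unmappable characters are replaced with "?" in the output bytes.
    return text.encode(dst, errors="replace"), missing


if __name__ == "__main__":
    latvian = "Rīga".encode("cp1257")
    print(transcode(latvian, "cp1257", "iso8859_13"))

    russian = "Привет".encode("koi8_r")
    print(transcode(russian, "koi8_r", "iso8859_5"))

    # KOI8-R byte 0x80 is a box-drawing character with no ISO-8859-5 slot.
    print(transcode(b"\x80", "koi8_r", "iso8859_5"))
```

Letters generally survive both conversions; it is the extras, such as the KOI8-R box-drawing characters, that end up in the missing list.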

By the way, how do I know which charsets to use for a particular language? Here’s a page by the W3C, but it’s a little sparse… Here’s another one.


Comments from long ago:

Comment from: J.o.sue

How do I convert Hebrew?

2006-03-03 18-24