An article from Loria Wiki.
When character encodings go awry (note: this page is UTF-8 encoded). The point here is not to list every possible permutation, only the ones we actually encounter in real life.
|l\351g\350re||légère||ISO 8859-1 text being marked as ASCII||convert text from ISO 8859-1 to UTF-8 (if your environment uses UTF-8); get the other party to fix their software|
|l?g?re||légère||ISO-8859-1 text being read as UTF-8||convert text to UTF-8|
|lÃ©gÃ¨re||légère||UTF-8 latin text being read as ISO 8859-1||switch your software to UTF-8|
|lÃÂ©gÃÂ¨re||légère||UTF-8 Latin text mistakenly converted backwards (from Latin-1 to UTF-8) TWICE; crazy stuff!||convert text to ISO 8859-1 (yes, I know it's weird); you should get a proper UTF-8 file back|
|l√©g√®re||légère||UTF-8 latin text being read as MacRoman||switch your software to UTF-8|
|lÈgËre||légère||ISO 8859-1 text being read as MacRoman||convert text to UTF-8; switch your software to UTF-8|
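|ë¸ãêèé||лёгкий||cp1251 (Cyrillic encoding) text being interpreted as ISO 8859-1||if the file still holds the original cp1251 bytes: iconv -f cp1251 -t utf-8 <file>
or, if the garbled text has already been saved as UTF-8: iconv -f utf-8 -t iso8859-1 <file> | iconv -f cp1251 -t utf-8|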
|МЈЗЛЙК||лёгкий||KOI8-R (Cyrillic encoding) text being interpreted as cp1251 (another Cyrillic encoding)||copy the text to a file in UTF-8 encoding and run the command
iconv -f utf-8 -t cp1251 <file> | iconv -f koi8-r -t utf-8|
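Most rows in the table follow the same pattern: bytes written in one encoding are decoded with another. The sketch below reproduces that pattern in Python and reverses it, mirroring what the iconv pipelines do. The helper names `garble` and `repair` and the `CASES` list are hypothetical, introduced only for illustration; the only real APIs used are `str.encode` and `bytes.decode` with Python's built-in codec names.

```python
def garble(text: str, actual: str, assumed: str) -> str:
    """Encode text as `actual`, then misread the bytes as `assumed`."""
    return text.encode(actual).decode(assumed)


def repair(garbled: str, assumed: str, actual: str) -> str:
    """Undo garble(): recover the original bytes, then decode them correctly."""
    return garbled.encode(assumed).decode(actual)


# (correct text, real encoding, encoding the reader wrongly assumed)
CASES = [
    ("légère", "utf-8", "latin-1"),      # shows up as lÃ©gÃ¨re
    ("légère", "utf-8", "mac_roman"),    # shows up as l√©g√®re
    ("légère", "latin-1", "mac_roman"),  # shows up as lÈgËre
    ("лёгкий", "cp1251", "latin-1"),     # shows up as ë¸ãêèé
    ("лёгкий", "koi8-r", "cp1251"),      # shows up as МЈЗЛЙК
]

for text, actual, assumed in CASES:
    symptom = garble(text, actual, assumed)
    # Reversing the wrong decode step brings the original text back.
    assert repair(symptom, assumed, actual) == text

# Text garbled twice needs the repair applied twice. The intermediate
# string contains invisible C1 control characters (U+0083).
twice = garble(garble("légère", "utf-8", "latin-1"), "utf-8", "latin-1")
once = repair(twice, "latin-1", "utf-8")
assert repair(once, "latin-1", "utf-8") == "légère"
```

This only works when the wrong decode step lost no information. ISO 8859-1 bytes read as UTF-8 are a different case: a strict `decode("utf-8")` raises `UnicodeDecodeError`, and a lenient reader substitutes a placeholder, so the original bytes have to come from the source file.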
Note that I tend to prefer switching things to UTF-8 whenever possible. This encoding is ASCII-compatible and can represent any Unicode text. So, for example, if you want to mix Arabic with Chinese, or Russian with French, UTF-8 is the way to go.
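As a rough illustration of both claims, the Python sketch below checks that pure ASCII text has identical bytes in ASCII and UTF-8, and that a mixed-script string fits in UTF-8 but not in the single-byte legacy encodings. The helper `fits` and the sample strings are assumptions made for this example.

```python
def fits(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# ASCII-compatible: pure ASCII text is byte-for-byte identical in UTF-8.
plain = "plain ASCII text"
assert plain.encode("utf-8") == plain.encode("ascii")

# Sample words in French, Russian, Chinese and Arabic (illustrative only).
mixed = "légère лёгкий 轻 خفيف"

assert fits(mixed, "utf-8")        # UTF-8 holds the whole mix
assert not fits(mixed, "latin-1")  # no Cyrillic, Chinese or Arabic
assert not fits(mixed, "cp1251")   # no accented Latin, Chinese or Arabic
assert not fits(mixed, "koi8-r")

# Encoding and decoding with UTF-8 is lossless.
assert mixed.encode("utf-8").decode("utf-8") == mixed
```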