Encoding gibberish
An article from Loria Wiki.
What you see when character encodings go awry (note: this page is UTF-8 encoded). The point here isn't to list every possible permutation, only the ones that we actually encounter in real life.
| symptom | original | cause | solution |
|---|---|---|---|
| l\351g\350re | légère | ISO 8859-1 text being marked as ASCII | convert text from ISO 8859-1 to UTF-8 (if your environment is UTF-8); get the other party to fix their software |
| le'ge`re | légère | Netscape mail? | |
| l?g?re | légère | ISO-8859-1 text being read as UTF-8 | convert text to UTF-8 |
| lÃ©gÃ¨re | légère | UTF-8 Latin text being read as ISO 8859-1 | switch your software to UTF-8 |
| lÃƒÂ©gÃƒÂ¨re | légère | UTF-8 Latin text mistakenly converted backwards (from Latin-1 to UTF-8) TWICE; crazy stuff! | convert text to ISO 8859-1 (yes, I know it's weird); you should get a proper UTF-8 file back |
| l√©g√®re | légère | UTF-8 latin text being read as MacRoman | switch your software to UTF-8 |
| lÈgËre | légère | ISO 8859-1 text being read as MacRoman | convert text to UTF-8; switch your software to UTF-8 |
| ë¸ãêèé | лёгкий | cp1251 (Cyrillic encoding) text being interpreted as ISO 8859-1 | if the file still holds the original bytes: `iconv -f cp1251 -t utf-8 <file>`; if the gibberish was saved as UTF-8: `iconv -f utf-8 -t iso8859-1 <file> \| iconv -f cp1251 -t utf-8` |
| МЈЗЛЙК | лёгкий | koi8-r (Cyrillic encoding) text being interpreted as cp1251 (another Cyrillic encoding) | copy the text to a file in UTF-8 encoding and run `iconv -f utf-8 -t cp1251 <file> \| iconv -f koi8-r -t utf-8` |
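
To see where these symptoms come from, here is a minimal Python sketch (Python is my choice for illustration; the page itself only uses iconv). It produces the gibberish by decoding bytes with the wrong codec, then undoes the mistake in the same two steps as the iconv pipelines: recover the raw bytes, then decode them with the right codec. The helper names `misread` and `repair` are hypothetical, and `repair` assumes the wrong decode lost no information, which does not hold for the `l?g?re` case.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one codec, then decode the bytes with another (the mistake)."""
    # errors="replace" substitutes U+FFFD for bytes that are invalid in read_as.
    return text.encode(written_as).decode(read_as, errors="replace")


def repair(garbled: str, read_as: str, really: str) -> str:
    """Undo a wrong decode: get the raw bytes back, then decode them correctly."""
    return garbled.encode(read_as).decode(really)


# Latin text
utf8_as_latin1 = misread("légère", "utf-8", "iso-8859-1")        # lÃ©gÃ¨re
utf8_as_macroman = misread("légère", "utf-8", "mac_roman")       # l√©g√®re
latin1_as_macroman = misread("légère", "iso-8859-1", "mac_roman")  # lÈgËre

# ISO 8859-1 bytes are not valid UTF-8, so the accents are lost for good:
# each one becomes U+FFFD, which many programs display as "?".
latin1_as_utf8 = misread("légère", "iso-8859-1", "utf-8")

# Cyrillic text
cp1251_as_latin1 = misread("лёгкий", "cp1251", "iso-8859-1")     # ë¸ãêèé
koi8r_as_cp1251 = misread("лёгкий", "koi8-r", "cp1251")          # МЈЗЛЙК

# Same idea as: iconv -f utf-8 -t iso8859-1 <file> | iconv -f cp1251 -t utf-8
fixed_1 = repair(cp1251_as_latin1, "iso-8859-1", "cp1251")       # лёгкий
# Same idea as: iconv -f utf-8 -t cp1251 <file> | iconv -f koi8-r -t utf-8
fixed_2 = repair(koi8r_as_cp1251, "cp1251", "koi8-r")            # лёгкий
```

If you do not know which pair of encodings was involved, calling `misread` on a word you expect to find and comparing the result with what is on screen is a quick way to test a guess.
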
Note that I tend to prefer switching things to UTF-8 whenever possible. This encoding is ASCII-friendly (plain ASCII text is already valid UTF-8) and it can represent any Unicode text. So, for example, if you want to mix Arabic with Chinese, or Russian with French, UTF-8 is the way to go.
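
A short Python sketch of both claims, under the assumption that the sample strings below (my own, chosen only for illustration) are representative. It checks that ASCII text has identical bytes in ASCII and UTF-8, and that a mixed-script string survives UTF-8 but cannot be encoded in the single-byte encodings from the table. The helper `can_encode` is hypothetical.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text is representable in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# ASCII-friendly: pure ASCII text is byte-for-byte the same in UTF-8.
ascii_only = "light"
same_bytes = ascii_only.encode("utf-8") == ascii_only.encode("ascii")  # True

# French, Russian, Chinese and Arabic in one string.
mixed = "légère лёгкий 轻 خفيف"
round_trip = mixed.encode("utf-8").decode("utf-8") == mixed  # True

# Which encodings can hold the whole mixture? Only UTF-8 in this list.
support = {
    codec: can_encode(mixed, codec)
    for codec in ("utf-8", "iso-8859-1", "cp1251", "koi8-r")
}
```
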
