# é is not é: The same glyph can have different Unicode representations.

posted 2012-Jan-25

Did you know that é is not the same as é? No, seriously.

The first is a single Unicode character, “Latin small letter e with acute” (U+00E9), encoded as 0xC3 0xA9 in UTF-8.

The second is a “Latin small letter e” character (U+0065, 0x65 in both ASCII and UTF-8) followed by a “Combining Acute Accent” character (U+0301, 0xCC 0x81 in UTF-8). The combining accent is zero-width and is drawn on top of the letter that precedes it.
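To see the difference for yourself, here is a minimal Python sketch that spells out both forms with explicit escapes (so nothing depends on how this page or your editor encodes the letter). The variable names and the `describe` helper are mine, purely for illustration.

```python
import unicodedata

# Both strings render as the same glyph but contain different code points.
precomposed = "\u00e9"   # one character: e with acute
decomposed = "e\u0301"   # two characters: plain e, then a combining accent


def describe(text):
    """Return a (code point, Unicode name) pair for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


precomposed_bytes = precomposed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

print(describe(precomposed), precomposed_bytes)
print(describe(decomposed), decomposed_bytes)
print("equal as strings?", precomposed == decomposed)
```

The two strings have different lengths (1 versus 2 characters) and different UTF-8 bytes, so a plain comparison treats them as different text.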

• If you type Alt+0233 (on the numeric keypad) in Windows, you get the single-character form.
• If you type option-e, then e under Mac OS X (and presumably all previous Mac OS versions), you get the two-character form. (This is true in the Terminal, Finder, and TextEdit, though some programs, such as Sublime, re-encode the typed character as the single-character form when saving.)

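Why does this matter? Well, if you use OS X to name a file and upload it to your web server, and then later try to navigate to the file by typing its address in Windows, the request will fail: the address you typed contains the single-character form, while the file name on the server contains the two-character form.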
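A rough sketch of what goes wrong, using Python's standard `urllib.parse.quote` to percent-encode the path the way a browser would. The file name `café.html` is a made-up example, not the actual file on my server.

```python
from urllib.parse import quote

# Hypothetical file name, written with explicit escapes for each form.
typed_on_windows = "caf\u00e9.html"   # single-character form
named_on_os_x = "cafe\u0301.html"     # two-character form

# quote() percent-encodes the UTF-8 bytes of any non-ASCII character.
windows_url_path = quote(typed_on_windows)
os_x_url_path = quote(named_on_os_x)

print(windows_url_path)
print(os_x_url_path)
print("same request?", windows_url_path == os_x_url_path)
```

The two paths differ byte for byte, so a server that matches file names by exact bytes has no reason to treat them as the same file.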

Making matters worse, when you then browse the web server's directory listing and click the link, you get an address that looks exactly like the one you typed, but that works (unlike the one you typed). [Edit: I’ve actually put files with both names in the directory.]

Unicode is hard.

 joemppe 10:32AM ET 2012-Mar-01 Unicode does provide a solution to this, but it looks like Windows doesn’t implement Unicode canonical equivalence correctly. Might also be nice if OS X stored file names in Unicode normalized form.
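
The canonical equivalence mentioned in the comment above can be checked with Python's standard `unicodedata.normalize`. This is only a sketch of the idea: the `canonically_equal` helper is a name made up for illustration, and it says nothing about what any particular operating system or web server actually does.

```python
import unicodedata

precomposed = "\u00e9"   # single-character form
decomposed = "e\u0301"   # two-character form


def canonically_equal(a, b):
    """Compare two strings after normalizing both to NFC (composed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print("raw comparison:       ", precomposed == decomposed)
print("normalized comparison:", canonically_equal(precomposed, decomposed))

# NFC composes to the single character; NFD decomposes to the pair.
composed_again = unicodedata.normalize("NFC", decomposed)
decomposed_again = unicodedata.normalize("NFD", precomposed)
```

Normalizing both sides to the same form (NFC here, though NFD would work equally well for the comparison) before comparing is what makes the two spellings match.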