A clever use for U+202E

There's been some discussion of the security implications of Unicode characters. In particular, some people worry that hackers could use Unicode characters to create strings that look just like other strings but behave very differently.

The Latin letter U+006F ("o") looks a lot like the Greek letter omicron, U+03BF ("ο"), for example, so it's hard for people to tell the difference between "Google" and "Goοgle," even though they're actually different strings.
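
As a rough illustration, a few lines of Python show that the two spellings are different strings even though they render almost identically. The variable names `latin` and `spoofed` are made up for this example; only the standard `unicodedata` module is used.

```python
import unicodedata

latin = "Google"            # every letter is ASCII
spoofed = "Go\u03bfgle"     # third letter is U+03BF, not U+006F

# The strings look alike on screen but compare as different.
print(latin == spoofed)     # False

# Listing the code points makes the substitution visible.
for ch in spoofed:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

The third line of the listing reads GREEK SMALL LETTER OMICRON, which is the only hint that the string is not what it appears to be.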

But there's a way to make Unicode even trickier, and that's by using the character U+202E, the "right-to-left override," which forces the text that follows it to be displayed from right to left.

Here's the alphabet

ABCDEFGHIJKLMNOPQRSTUVWXYZ

and here's the alphabet with a single U+202E inserted in the middle of it

ABCDEFGHIJKLM‮NOPQRSTUVWXYZ

Note how the entire second half of the alphabet is displayed backwards when this single additional character is added. (If you can't see this, then your browser probably doesn't handle Unicode bidirectional text correctly. I tested it in IE 8 and Chrome 10 and it worked with both of them.)
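
To see what is actually stored, here is a small Python sketch that builds the same string. It assumes nothing beyond the standard library; `RLO`, `plain`, and `tricky` are names chosen for this example.

```python
import unicodedata

RLO = "\u202e"  # the override character discussed above

plain = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
tricky = plain[:13] + RLO + plain[13:]   # insert after "M"

print(unicodedata.name(RLO))      # RIGHT-TO-LEFT OVERRIDE
print(len(plain), len(tricky))    # 26 27

# The stored (logical) order is untouched; only the display changes.
print(tricky.replace(RLO, "") == plain)   # True

# In UTF-8 the extra character takes three bytes.
print(RLO.encode("utf-8").hex())  # e280ae
```

In other words, the letters are never rearranged in memory. The reversal happens only when the text is laid out for display.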

And if you select the last three displayed characters of the alphabet with the embedded U+202E, the "PON," and then copy and paste them, you get "NOP" back, because the U+202E isn't in the part that you copied. You may think that you're selecting "PON," but you're really not.
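
The copy-and-paste surprise can be mimicked in Python. The sketch below is a deliberate simplification: `display_order` is a hypothetical helper that just reverses everything after a single U+202E, which matches this example but is not the full Unicode bidirectional algorithm.

```python
RLO = "\u202e"

def display_order(text):
    """Return the stored indices of the visible characters, left to right.

    Simplified model: one U+202E, one line, plain Latin letters only.
    """
    pos = text.index(RLO)
    before = list(range(pos))
    after = list(range(pos + 1, len(text)))
    return before + after[::-1]

tricky = "ABCDEFGHIJKLM" + RLO + "NOPQRSTUVWXYZ"
order = display_order(tricky)

# What the reader sees on screen.
shown = "".join(tricky[i] for i in order)
print(shown)          # ABCDEFGHIJKLMZYXWVUTSRQPON

# Selecting the last three visible characters picks these stored positions;
# in this model the clipboard gets them in stored order, without the U+202E.
picked = sorted(order[-3:])
copied = "".join(tricky[i] for i in picked)
print(shown[-3:], "->", copied)   # PON -> NOP
```

Real browsers differ in what they place on the clipboard, so treat this as a model of the behavior described here, not as a statement of how any particular browser works.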

Now imagine how a clever hacker could take advantage of this.
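
One defensive counterpart, sketched in Python: scan a string for directional formatting characters before trusting how it looks. The function name `find_bidi_controls` and the sample file name are invented for this example; the check relies on the bidirectional class reported by the standard `unicodedata` module.

```python
import unicodedata

# Bidirectional classes of the embedding, override, and isolate controls.
BIDI_CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF",
                        "LRI", "RLI", "FSI", "PDI"}

def find_bidi_controls(text):
    """Return (index, code point, name) for each directional control found."""
    hits = []
    for i, ch in enumerate(text):
        if unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES:
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return hits

# Hypothetical file name: stored as "report" + U+202E + "fdp.exe",
# which a bidi-aware display is expected to show as "reportexe.pdf".
name = "report\u202efdp.exe"

print(find_bidi_controls(name))
# [(6, 'U+202E', 'RIGHT-TO-LEFT OVERRIDE')]

print(find_bidi_controls("ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
# []
```

A check like this only reports that a control character is present. Whether to reject, strip, or merely flag such input is a policy choice that depends on the application.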

  • Tim

    @Dan I came from there too! Although I’m not sure exactly what this means/does, it looks pretty cool still.

  • Gab

    Lol, I came here too guys, still don’t know how this works lol

  • Simon

    Interestingly, in Chrome 23, when I copy and paste the ‘PON’ at the end, I get ‘PON’, so I guess they’ve changed how they handle the RTL marker. In IE 9 I still get ‘NOP’ though.

  • Andy

    xkcd brought me here

  • Dice

    If you copy and paste it into Word (even from Chrome), it’s NOP. Chrome keeps the U+202E while still in the application.
    P.S. If you’re from XKCD, the cartoon reads “They didn’t even…” “What the hell?…” “How did you…” “A**hole…”

  • Istar

    Even looking at the source in Chrome showed everything backwards on that line after the U+202E. I had to save the html and load it into vi to see it.
    ABCDEFGHIJKLMNOPQRSTUVWXYZ
    So apparently this only reverses the order on the line in which it appears and does not reverse it on subsequent lines. Is this the standard behavior it is supposed to have?

  • Istar

    Formatting here is weird. I cannot get less than or greater than signs to show, and html escape characters don’t work. In any event, my previous post was supposed to show that vi displays the 202e character and does not flip what comes after. Which makes sense since it is for editing text files including random odd characters.

  • Tim H

    Yes, also just been on XKCD…it’s a small web…

  • Luther Martin

    But it was here a year and a half ago, wasn’t it?

  • Phil

    Oddly enough in Safari and Chrome (but NOT in Firefox) under OSX any selection which includes the N produces WYSIWYG but selections which do not include the N produce the result as described.
    Firefox under OSX produces result as described always.

  • Carl

    Opera’s Dragonfly program displays the source as having what looks like a closing paragraph tag, except the slash is after the p instead of in front? (<p/> instead of </p>)
    When I copy/paste the second alphabet out of Opera, it pastes in the “real” order into Notepad with a block character in the middle.
    But when I paste it here, it appears the same as the tricked text in the blog. It also exhibits weird backward highlighting behavior:
    ABCDEFGHIJKLM‮NOPQRSTUVWXYZ

  • Tobi

    @Martin: I guess what they are saying is that the sudden rise in interest in this blog is due to the recent XKCD comic… which – not coincidentally – is also the reason for my post. (It is among the top hits for “U +202e” in Google)

  • Aaron

    One interesting thing is that when highlighting from before the U+202E character, you cannot highlight ‘Z’ until you get to the very end of the string. This would be noticeable, as it leaves a large segment unhighlighted for a time.
    Also, if you use Chrome and copy a segment from somewhere before the U+202E character until somewhere after and paste that into the address bar of Chrome, you can see Google Search flipped around and in the middle of your paste in the “suggested search term” list.

  • drumrobot

    I’ll throw in another honorable mention for XKCD; it’s the reason I’m here.

  • t3nshi

    ‮xkcd brought me here

  • Fritz

    How funny. I wonder if XKCD has ever unintentionally fireballed a site for being in the top google results for some obscure term.

  • TheMany

    If you highlight the “Z” everything past it will be backwards, but highlighting Y->N and pasting will put them back into order. However… I viewed the source code and could not see where the U+202e code was inserted. Maybe I’m just a newbie but could someone explain basic Unicode control implementation?

  • Sronds

    Xkcd brought me here
