Be Genius

Fork Me
Mugshot

Hi, I'm Bodaniel Jeanes.

I'm a Ruby developer from Brisbane, Australia

I work at Mocra where I hack on awesome code. Follow me, recommend me, and link with me.

Unescaping UTF-8 Strings in Ruby 1.9

Today Radar and I encountered a little issue with String encodings in Ruby 1.9. In this project, some Merb UTF-8 params needed to be unescaped. We spent some trying to force strings into UTF-8 encoding but for some reason while the encoding was UTF-8, the actual contents of the strings were getting massacred.

Long story short:

1
2
3
4
5
6
7
# In Ruby 1.9 # Broken puts URI::unescape("Baden-W%C3%BCrttemberg") # => "Baden-Württemberg" # Working puts CGI::unescape("Baden-W%C3%BCrttemberg") # => "Baden-Württemberg"

Another lesson we learnt on our adventures today is the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
# In Ruby 1.9 m1 = "Munich" puts = m1.encoding # let's suppose it is something other than UTF-8, such as ASCII-8Bit m2 = "München" puts = m2.encoding # let's suppose we have a UTF-8 encoded string # The following will raise an exception about mismatched encodings m3 = m1 << m2 # The easiest way to get around this is to do the concatenation like so: m3 = m3 = m1 << m2.force_encoding("UTF-8")