Saturday, January 3, 2009

URL Encoding in Tomcat

I'm working on a new feature at blueleftistconstructor (BLC) that is sorta like del.icio.us. The gist is that this feature lets a member create 'bookmarks' in their account on the site. Bookmarks can then be accessed from anywhere one can access BLC.

I created a bookmarklet that, when invoked from any webpage, will redirect to BLC and save the page as a bookmark. This is pretty clutch as it makes bookmarking a lot easier. I coded up this bookmarklet a while ago but found a bug just the other day. Any page whose title contained non-ASCII Unicode characters would cause the title to freak out. In the save bookmark form, the characters were getting butchered.

Well, it took an hour or so to troubleshoot the problem. It turns out that Tomcat (the Java webserver I host the site app in) uses ISO-Latin-1 (ISO-8859-1) by default when decoding URLs. That explains why the Unicode characters were lost: the bookmarklet sends the title as UTF-8 bytes in the URL, and Tomcat was reading each of those bytes as a separate Latin-1 character. As luck would have it, I kinda guessed this was the issue and found this great post. If you are experiencing such issues, just follow the directions in the post and you'll be fine.
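
For anyone curious what the butchering looks like, here's a minimal sketch in plain Java. It doesn't touch Tomcat at all; it just uses the JDK's URLEncoder and URLDecoder to percent-encode a title as UTF-8 and then decode it both ways. The class name, the helper methods, and the sample title "Café" are all made up for illustration and aren't from the BLC code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of BLC or Tomcat.
class UrlDecodingDemo {

    // Percent-encode a string as UTF-8 bytes, roughly what a browser
    // does with a page title placed in a query string.
    static String encodeUtf8(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Decode a percent-encoded string using the named charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String title = "Caf\u00e9"; // "Café"
        String encoded = encodeUtf8(title); // "Caf%C3%A9"

        String asLatin1 = decode(encoded, "ISO-8859-1"); // two wrong chars for é
        String asUtf8 = decode(encoded, "UTF-8");        // the original title

        System.out.println("encoded:            " + encoded);
        System.out.println("Latin-1 round trip: " + title.equals(asLatin1));
        System.out.println("UTF-8 round trip:   " + title.equals(asUtf8));
    }
}
```

The é becomes the two bytes C3 A9 in UTF-8, and a Latin-1 decoder turns each byte into its own character, which is exactly the kind of garbage I was seeing in the form. In Tomcat itself, the usual remedy (which may or may not be what the linked post describes) is setting URIEncoding="UTF-8" on the Connector element in server.xml.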
