Wednesday, October 6, 2010

Unicode and HTML

I have been struggling with developing a sample application in Chinese for the past few days. It's a pretty simple app: add, search and edit employees. Well, not quite!

After building the sample pages (a search-and-list page and an add/edit page), I tried converting them into Chinese. Here's what I have learnt so far:

  1. Don't forget to set the pageEncoding attribute of the page directive in your JSPs. I had to set it to UTF-8, together with the meta charset tag.
  2. JavaScript understands Unicode characters as hex escapes (\uXXXX), while HTML expects numeric character references (&#NNNN;). So hex escapes work inside JavaScript string literals in the page, but the same escapes in an AJAX response are not interpreted: the browser shows them literally as \uXXXX. They need to be converted into decimal references.
  3. understands HTML encoded Unicode too (the decimal strings).
  4. IE has trouble recognizing the charset of an HTTP POST request. Despite setting the charset and encodings, it keeps treating the data as a Windows-specific encoding or ISO-8859-1. The hack? Add a hidden field to your form whose value is a character outside the standard encoding (any Cyrillic character will do!), and IE starts behaving. Bizarre!
  5. Data posted by the form will almost always be HTML-encoded. Even if it is stored in the DB that way, the browser renders it correctly.
  6. For resource bundles, use the Java native2ascii tool. Save your files with the native characters first, then run it as "native2ascii -encoding UTF-16 [input file] [output file]" (the -encoding option names the encoding the input file was saved in) and you're done!
  7. One form refused to post its data in HTML-encoded form. The hack? I forcibly converted the values to HTML-encoded form (see the conversions below).
  8. Yet to fix the NCHAR / NVARCHAR storage :(
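
Items 2, 5 and 7 all come down to turning characters into decimal references of the form &#NNNN; and back. Here is a minimal Java sketch of that conversion, e.g. for forcing form values into HTML-encoded form on the server. The class and method names (HtmlNumericEncoder, encode, decode) are hypothetical, and it assumes only non-ASCII characters need encoding: markup characters such as < and & are not escaped, so this is not a substitute for proper HTML escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Minimal sketch (hypothetical names): convert text to and from decimal
 * HTML numeric character references (&#NNNN;). Only non-ASCII code points
 * are encoded; markup characters such as < and & are NOT escaped here.
 */
final class HtmlNumericEncoder {

    // A decimal numeric character reference, e.g. &#20013;
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    /** Replaces every non-ASCII code point with its decimal reference. */
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);                 // ASCII passes through
            } else {
                out.append("&#").append(cp).append(';'); // U+4E2D -> &#20013;
            }
        });
        return out.toString();
    }

    /** Turns decimal references back into the characters they denote. */
    static String decode(String text) {
        Matcher m = DECIMAL_REF.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());           // text before the match
            int cp = Integer.parseInt(m.group(1));
            if (Character.isValidCodePoint(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(m.group());                   // leave invalid refs as-is
            }
            last = m.end();
        }
        out.append(text, last, text.length());         // remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        String name = "\u5f20\u4e09";                    // two Chinese characters
        String encoded = encode(name);
        System.out.println(encoded);                     // &#24352;&#19977;
        System.out.println(decode(encoded).equals(name)); // true
    }
}
```

Because encode walks code points rather than chars, a character outside the Basic Multilingual Plane comes out as one reference instead of two surrogate halves.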

So how do you convert hex to decimal and back?
  • JavaScript: remember that decimal references are prefixed with &# and suffixed with a semicolon, while hex escapes are prefixed with \u.
  1. Try the built-in charCodeAt() method: take a string, iterate over it and call charCodeAt() for each character to get its decimal code.
  2. Hex to decimal using parseInt([str], 16)
  3. Decimal to hex using [num].toString(16)
  • Java: similar conversions, Integer.parseInt(str, 16) for hex to decimal and Integer.toHexString(int) for decimal to hex.
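
The Java conversions can be combined to fix the AJAX problem from item 2: rewrite any \uXXXX escape that arrives as plain text into a decimal reference the browser will render, and back again. A minimal sketch, assuming four-digit escapes only (characters up to U+FFFF); the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Minimal sketch (hypothetical names): rewrite JavaScript-style hex escapes
 * (backslash, 'u', four hex digits) into decimal HTML references, and back.
 * Assumes BMP characters only (code points up to 0xFFFF).
 */
final class EscapeConverter {

    // Backslash + 'u' + exactly four hex digits, captured without the prefix.
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");
    // Decimal reference such as &#20013;
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,5});");

    /** Hex to decimal: "4e2d" is parsed with radix 16, giving 20013. */
    static String hexToDecimal(String text) {
        Matcher m = HEX_ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int code = Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, "&#" + code + ";");
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Decimal to hex: 20013 becomes "4e2d", padded to four digits. */
    static String decimalToHex(String text) {
        Matcher m = DECIMAL_REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int code = Integer.parseInt(m.group(1));
            String replacement;
            if (code <= 0xFFFF) {
                String hex = Integer.toHexString(code);
                replacement = "\\u" + "0000".substring(hex.length()) + hex;
            } else {
                replacement = m.group(); // outside the BMP: leave untouched
            }
            // quoteReplacement stops the backslash being read as an escape
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String fromAjax = "name: \\u4e2d\\u6587"; // escapes arriving as plain text
        String html = hexToDecimal(fromAjax);
        System.out.println(html);                 // name: &#20013;&#25991;
        System.out.println(decimalToHex(html));   // prints the original escapes again
    }
}
```

Matcher.appendReplacement treats backslashes and dollar signs in the replacement string specially, which is why the decimal-to-hex direction passes its result through Matcher.quoteReplacement.
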
Now I need to figure out the NCHAR / NVARCHAR storage and I'm done!
