Wednesday, July 15, 2009

The Risk of Performing JavaScript Operations Around User-Generated HTML

Last year I wrote a ColdFusion application that included a page where an editor could take a list of short news abstracts and manipulate them (reorder them, place them in categories, add divider lines, create anchor links to highlighted stories, etc.) using a number of jQuery-powered JavaScript functions. It gives them a lot of control over the final layout, and they're pretty happy about it.

However, there have been occasions when this layout tool doesn't work at all. Why? Unclosed HTML tags in the abstract data.

The page they use to enter the news abstracts is comprised of several plain HTML text fields, a set of category checkboxes and two WYSIWYG textarea boxes (powered by JavaScript) for the abstract itself. I gave them the WYSIWYG boxes because I knew they would want to be able to do some basic HTML formatting on the abstract text (boldface, italics, etc.), and the WYSIWYG editors do produce solid, clean HTML.

What they do that gets them in trouble is they enter HTML tags (usually for italics) into the title form field, which is a normal HTML text field. Every once in awhile, they forget the closing tag or maybe just an angle bracket...the abstract itself saves correctly but when they get to the layout tool, that unclosed HTML tag messes up the entire DOM structure the layout tool is programmed to manipulate, rendering it non-functional.

They've now learned that when the header text of the various layout tools appears in italics, that usually means they've got an unclosed italics tag that needs fixing. It's easy enough to fix once they figure out which news item is responsible.

So a word of warning: if your JavaScript functions are designed to act upon HTML code that you yourself don't have complete control over (or the ability to make sure the HTML is clean and proper), consider the possibility that your JavaScript may end up breaking due to the content.