
Microsoft WordML, or a fork in your eye

Had to do some work with Word 2003’s XML format, WordML. It is definitely a good step that Microsoft allows us to read XML out of Word, but the format itself is borderline silly. Case in point – bulleted and numbered lists.

Word does not group bullets together. Each bullet sits in its own paragraph, and the only list information that paragraph carries is the physical offset of the tab from the edge of the page. As a result, you cannot tell where a list begins or ends, and if you try to tweak the XSL stylesheet Microsoft provides for transforming the XML file into HTML, converting this bullet-per-paragraph scheme into a normal ul/li structure is impossible. Instead, the stylesheet emits convoluted spans whose style attributes use margins to push the bullets into position.
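
For what it's worth, the regrouping itself is easier to picture outside of XSL. Below is a minimal Python sketch, not anything Microsoft ships. It assumes you have already reduced the document to `(is_bullet, text)` pairs in document order; that flag, the tuple shape and the `paragraphs_to_html` name are all made up for illustration, and getting a reliable flag out of WordML is exactly the hard part.

```python
from html import escape
from itertools import groupby


def paragraphs_to_html(paragraphs):
    """Wrap each run of consecutive bulleted paragraphs in one <ul>.

    `paragraphs` is an iterable of (is_bullet, text) tuples in document
    order. This shape is a simplifying assumption, not the WordML format.
    """
    parts = []
    # groupby yields one group per run of adjacent items with the same flag
    for is_bullet, run in groupby(paragraphs, key=lambda p: p[0]):
        texts = [escape(text) for _, text in run]
        if is_bullet:
            items = "".join(f"<li>{t}</li>" for t in texts)
            parts.append(f"<ul>{items}</ul>")
        else:
            parts.extend(f"<p>{t}</p>" for t in texts)
    return "".join(parts)


doc = [(False, "Intro"), (True, "First"), (True, "Second"), (False, "Outro")]
html_out = paragraphs_to_html(doc)
```

This only handles flat lists; nested bullets would need a level per paragraph as well, which the sketch does not attempt.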
Another problem: since WordML does not use ul/li elements for bullets, it has to output some character to denote the bullet. WordML, being focused on visual representation in the application itself, uses a dingbat character, which is carried all the way through to the HTML the transform produces. Dingbat fonts are not reliably available on the web, so you end up with weird bullets, such as the character ‘n’.
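
One workaround is to swap the dingbat letter for a plain Unicode bullet before the HTML goes out. The Python sketch below is only an illustration: the `web_safe_bullet` helper is hypothetical, and the two-entry table rests on my assumption that the bullet font is Wingdings, where ‘l’ draws a filled circle and ‘n’ a filled square. Check it against the fonts your own documents use.

```python
# Assumed mapping for the Wingdings font only; extend or correct as needed.
WINGDINGS_TO_UNICODE = {
    "l": "\u25CF",  # filled circle
    "n": "\u25A0",  # filled square
}

DEFAULT_BULLET = "\u2022"  # ordinary round bullet


def web_safe_bullet(char, font):
    """Return a bullet character that renders without the dingbat font."""
    if font.lower() == "wingdings":
        # Unknown Wingdings glyphs fall back to a generic bullet
        return WINGDINGS_TO_UNICODE.get(char, DEFAULT_BULLET)
    # Characters from ordinary fonts are passed through unchanged
    return char
```

Of course, with real ul/li markup the browser would draw the bullet itself and none of this would be needed.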

In short, I really hope Word 2007's XML output is much better, especially now that Office Open XML is an ECMA standard. I wonder whether OpenOffice does any better…
