Wouter van Vugt

This blog is no longer maintained and has moved

Extracting data from an XML-mapped document

I have just returned from an MSDN InTrack session here at the Microsoft Technology Center about Office Open XML, and I wanted to share one of the demos I created for the event. The demo app extracts XML data from an XML-mapped WordprocessingML document. Note that this is not the Custom XML technology introduced in Open XML, but the earlier model already present in Office 2003.

Now the old model provides one great benefit: if you add new rows to the order-item table, the new rows are automatically mapped according to the XML schema that is attached to the document. One of the downsides is that the data is embedded in the rest of the main document markup, which makes it harder to extract than the separately stored Custom XML.

Of course, with a little XSLT trickery you can still obtain a data-only copy (or use the Word UI for it, by saving as a 'Word 2003 XML document', which is a bit unfriendly to the users).

If you run the following XSLT against the main document body you can extract the data. Note that there are no data-type checks, and reviewing options in the document will mess things up (it's a demo…). Line breaks and other special non-printing characters such as tabs are not copied from the document either. Feel free to improve it and I'll update the XSLT here for the rest of the community.

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
  exclude-result-prefixes="w">

  <!-- Walk paragraphs, runs and table parts, following only children that can lead to mapped elements -->
  <xsl:template match="w:p | w:r | w:tbl | w:tr | w:tc">
    <xsl:apply-templates select="w:customXml | w:p | w:r | w:tbl | w:tr | w:tc"/>
  </xsl:template>

  <!-- Turn each w:customXml wrapper into the mapped element it represents -->
  <xsl:template match="w:customXml">
    <xsl:element name="{@w:element}" namespace="{@w:uri}">
      <xsl:choose>
        <!-- Leaf element: emit its text (XSLT 1.0 value-of uses only the first node of the set) -->
        <xsl:when test="not(descendant::w:customXml)">
          <xsl:value-of select="descendant::text()" />
        </xsl:when>
        <!-- Container element: recurse into the nested mapped elements -->
        <xsl:otherwise>
          <xsl:apply-templates select="child::node()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
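
For readers who would rather see the idea outside XSLT, here is a rough Python sketch of the same extraction using only the standard library. It is not the demo app and not a line-by-line port of the stylesheet: it walks every element instead of only the paragraph, run and table parts, and it joins the text of all w:t elements inside a leaf instead of taking the first text node. The sample markup, the urn:example:orders namespace and the element names (order, customer, item, name, quantity) are hypothetical and heavily simplified compared to what Word actually writes.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
CUSTOM_XML = f"{{{W}}}customXml"

# Hypothetical, hand-written sample of a main document body with mapped elements.
SAMPLE = f"""\
<w:document xmlns:w="{W}">
  <w:body>
    <w:customXml w:uri="urn:example:orders" w:element="order">
      <w:p>
        <w:r><w:t>Customer: </w:t></w:r>
        <w:customXml w:uri="urn:example:orders" w:element="customer">
          <w:r><w:t>Contoso</w:t></w:r>
        </w:customXml>
      </w:p>
      <w:tbl>
        <w:customXml w:uri="urn:example:orders" w:element="item">
          <w:tr>
            <w:tc>
              <w:customXml w:uri="urn:example:orders" w:element="name">
                <w:p><w:r><w:t>Widget</w:t></w:r></w:p>
              </w:customXml>
            </w:tc>
            <w:tc>
              <w:customXml w:uri="urn:example:orders" w:element="quantity">
                <w:p><w:r><w:t>3</w:t></w:r></w:p>
              </w:customXml>
            </w:tc>
          </w:tr>
        </w:customXml>
      </w:tbl>
    </w:customXml>
  </w:body>
</w:document>
"""


def extract(node, out_parent):
    """Copy the mapped elements found below `node` into `out_parent`."""
    for child in node:
        if child.tag != CUSTOM_XML:
            # Ordinary markup: keep looking for mapped elements further down.
            extract(child, out_parent)
            continue
        name = child.get(f"{{{W}}}element")
        uri = child.get(f"{{{W}}}uri")
        mapped = ET.SubElement(out_parent, f"{{{uri}}}{name}" if uri else name)
        if child.find(f".//{CUSTOM_XML}") is None:
            # Leaf element: join the text of every w:t inside it.
            mapped.text = "".join(t.text or "" for t in child.iter(f"{{{W}}}t"))
        else:
            # Container element: recurse into the nested mapped elements.
            extract(child, mapped)


def extract_data(document_xml):
    """Return the top-level mapped elements of a main document part."""
    holder = ET.Element("holder")  # temporary parent, discarded afterwards
    extract(ET.fromstring(document_xml), holder)
    return list(holder)


for element in extract_data(SAMPLE):
    print(ET.tostring(element, encoding="unicode"))
```

As in the stylesheet, text that sits outside a mapped element (the "Customer: " label in the sample) is dropped, and there are still no data-type checks.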

I'll post the demo-app later together with the rest of the demos. 

Hope it helps!

Published Thursday, October 11, 2007 8:14 PM by wouterv

Comments

 

Mike Glaser said:

Wouter, You got some great post here. In the near future more and more companies need to be able to regenerate documents (e.g. Sarbanes-Oxley, Government decisions). Since users are not able to modify XML or other data themselves, this is an excellent solution. User can modify the data directly in Word and other apps can retrieve the updated data through stylesheets. I can’t wait till you’ll post the demos. By the way, we met in the lobby of the MIC. You invited me to come to your session, but I was unable due to some other meetings. Thanks anyway, Mike
October 12, 2007 9:53 PM
 

wouterv said:

Hey Mike,

you found your way to my blog, cool! You are absolutely right, this can be a very useful thing to be able to do. I'll try and get the rest online soon.

Wouter

October 13, 2007 8:56 AM