
In part one we talked about using XSLT to create simple XML templates and fill them using an identity transform. We can use this approach for every XML document thinkable, but for now I will focus on Word documents, as those are fairly common in our line of work. The first thing to ask ourselves is: how do we get our own custom XML placeholders into Word? Maybe we want different kinds of placeholders, like ones where a value will be put, or ones that denote a conditional block of Word content.
The answer is simple: Word allows us to create our own XSD and hook it into a Word document. This opens up all kinds of possibilities for custom functionality in our Word templates. So first, let's create an XSD. All the files shown will be attached in a .zip file for everyone to download, so if you want a better look, don't worry. Our XSD looks like this:
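A minimal sketch of such a schema, written from the description in the next two paragraphs rather than taken from the download. The names WordTemplate, templateType, valueHolder, query and the tns: prefix come from the text; the namespace URI, the valueHolderType name and the mixed="true" settings are my assumptions.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch only: urn:example:word-template is a placeholder namespace,
     not necessarily the one used in the downloadable solution. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:tns="urn:example:word-template"
           targetNamespace="urn:example:word-template"
           elementFormDefault="qualified">

  <!-- Document node that is applied to the entire Word document -->
  <xs:element name="WordTemplate" type="tns:templateType"/>

  <!-- mixed="true" is an assumption so that ordinary text may sit between placeholders -->
  <xs:complexType name="templateType" mixed="true">
    <!-- A repeating choice: any number of placeholders, in any order -->
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="valueHolder" type="tns:valueHolderType"/>
    </xs:choice>
  </xs:complexType>

  <!-- valueHolderType is a hypothetical type name -->
  <xs:complexType name="valueHolderType" mixed="true">
    <!-- The XPath query that selects the value to insert -->
    <xs:attribute name="query" type="xs:string" use="required"/>
  </xs:complexType>
</xs:schema>
```

Adding another kind of placeholder later would only mean adding one more xs:element inside the xs:choice.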
This is a fairly simple XSD. ALWAYS provide a target namespace for your schema so your own elements can be identified as your own. This namespace is also declared in our XSD with the prefix tns:, so we can use it to refer to our global types; global types and elements belong to the targetNamespace by default.
Let's look at the elements and types. First I have an element that will be applied to the entire Word document; this will be our document node, and it is called WordTemplate. Its type is templateType, which is defined below it. This type specifies an xs:choice which contains a valueHolder element. Now why would we use xs:choice and not, for example, xs:sequence? Because I don't want to force the different kinds of placeholders in our Word document into a particular order. Right now we have only one kind, but later on I may want to provide additional placeholders. The practical way to specify an unordered set of different kinds of optional, repeatable elements in XSD is an xs:choice that can occur multiple times (xs:all also ignores order, but in XSD 1.0 each of its elements can appear at most once). The valueHolder element has one attribute named 'query'. In our Word template this will be filled with an XPath query which defines which property value from .NET will be placed here.
Now how do we couple this XSD to our own Word template? When you open Word 2010 you need to enable the Developer tab. Go to File –> Options –> Customize Ribbon and check the Developer tab. On the Developer tab you will see an option for schema. With this option you can add your own schema. Afterwards you can click the structure button and your Word document will look like this:
To the right you can see our own schema elements. The only option we have at first is to apply our document node, so let's do that. All you need to do is click on our element. Inside our document node we can apply our valueHolder elements. I will apply them in the places where I want values from .NET inserted. Afterwards our document will look like this:
You can see all our elements. When you press Ctrl+Shift+X, Word will toggle between this view and the view without the tags. To the right you can see yellow signs before the valueHolder elements; this means those tags are invalid according to our schema. Word is right! Recall that those elements have a required attribute called query. So we right-click each of those elements and fill in the attributes like this:
Here I right-clicked the valueHolder element next to Name: and filled in an XPath query. In our template we need to know the structure of the .NET object we are going to work with. Specifying this in the template allows us to later add extra properties of the object without modifying code, only by replacing the template. I will fill in the rest and save the document. You can save the document as a standard Word 2010 file; it will be a .docx, which is actually a zip file with other files in it. This approach will still work, but in .NET we would have to use the packaging API to get the word/document.xml part and then transform it using XSLT. For simplicity I will save the file as a Word 2010 XML file so all the info is contained in one file, with no need for extracting.
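For orientation, this is roughly how a tagged table cell ends up in the saved WordprocessingML. It is a hand-written sketch, not a copy of the real template: the namespace in w:uri is a placeholder, and the query /PersonInfo/Name assumes a PersonInfo class with a Name property. The stylesheet further down matches on exactly this w:customXml/w:tc/w:p shape.

```xml
<!-- Hand-written sketch of one table row; real Word output carries more properties -->
<w:tr xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:tc>
    <w:p><w:r><w:t>Name:</w:t></w:r></w:p>
  </w:tc>
  <!-- Our valueHolder tag wraps the table cell it was applied to -->
  <w:customXml w:uri="urn:example:word-template" w:element="valueHolder">
    <w:customXmlPr>
      <!-- The query attribute we filled in through the attributes dialog -->
      <w:attr w:name="query" w:val="/PersonInfo/Name"/>
    </w:customXmlPr>
    <w:tc>
      <!-- Empty paragraph: this is where the generated text will go -->
      <w:p/>
    </w:tc>
  </w:customXml>
</w:tr>
```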
Now where is the Identity transform I talked about? Well, right below:
1: <?xml version="1.0" encoding="utf-8"?>
2: <xsl:stylesheet version="1.0"
3:     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
4:     xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"
5:     xmlns:wordgen="urn:Chris:demo:word"
6:     xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
7:
8:   <xsl:output method="xml" indent="yes"/>
9:
10:   <xsl:template match="@* | node()">
11:     <xsl:copy>
12:       <xsl:apply-templates select="@* | node()"/>
13:     </xsl:copy>
14:   </xsl:template>
15:
16:   <xsl:template
17:       match="w:customXml[@w:element='valueHolder']/w:tc/w:p" >
18:     <xsl:variable name="query"
19:         select="../../w:customXmlPr/w:attr[@w:name='query']/@w:val">
20:     </xsl:variable>
21:     <xsl:copy>
22:       <xsl:apply-templates select="@* | node()"/>
23:       <w:r>
24:         <w:t>
25:           <xsl:value-of
26:               select="wordgen:GetMessageValueByXpath($query)"/>
27:         </w:t>
28:       </w:r>
29:     </xsl:copy>
30:   </xsl:template>
31: </xsl:stylesheet>
Above is an XSLT stylesheet created in VS2010. What stands out is that the identity transform I used in part one was defined as match='attribute() | node()'. Here it is not. This is because .NET 4 only supports XSLT and XPath 1.0! So this is the way to do it in 1.0, because 1.0 does not support the extra node tests.
The first template is the copy action; the second template is more interesting. Recall that we put our own XML tags in our Word template. Office 2010 saves those in its own WordML as <customXml> tags with an element attribute describing the tag name. The XPath query in the match translates to the following: only match w:p nodes which are children of w:tc nodes which are children of customXml nodes whose name is valueHolder. I match the w:p nodes because that is where Word wants the text. If you put text in our Word document within a custom XML tag, Word puts your text within the w:p node, which is a descendant of the customXml node representing your tag. When we have a match on the w:p node we perform the following steps:
- Extract the value of our query attribute. Word places it in a rather strange spot: in a w:attr element with a name and a val attribute, below the customXml tag.
- We put this in a variable so we can use it later on in our template.
- We recursively copy all the content of the w:p node to our output document so it is completely intact when we add our own text.
- We add our own value, extracted from a .NET object, within a couple of Word markup tags. Line 26 is especially interesting. The function that gets called there gets our XPath expression as a parameter. Our XPath expression was defined as an attribute of the valueHolder in our Word template. So how does this XSLT function use this expression on a .NET object??
Because actually, it's a .NET function being called. And this is super cool: using so-called XSLT extensions we can call functions on .NET objects from XSLT! How cool is this?
using System.IO;
using System.Xml.Serialization;
using System.Xml.XPath;
using Chris.Demo.WordGenerator.Util; // assumed home of the Guard helper

namespace Chris.Demo.WordGenerator
{
    /// <summary>
    /// This class will be the heart of our custom Word functionality
    /// </summary>
    class XsltExtension<TMessage>
    {
        XPathNavigator messageDoc;

        public XsltExtension(TMessage message)
        {
            //pre
            Guard.ArgumentsNotNull(message);

            // Serialize the message to XML so it can be queried with XPath
            XmlSerializer xs = new XmlSerializer(typeof(TMessage));
            MemoryStream ms = new MemoryStream();
            xs.Serialize(ms, message);
            ms.Seek(0, SeekOrigin.Begin);
            messageDoc = new XPathDocument(ms).CreateNavigator();
        }

        /// <summary>
        /// This method will be called from XSLT
        /// </summary>
        /// <param name="xpath">XPath query taken from the query attribute in the template</param>
        /// <returns>Value of the first matching node, or an empty string</returns>
        public string GetMessageValueByXpath(string xpath)
        {
            if (string.IsNullOrWhiteSpace(xpath))
            {
                return string.Empty;
            }

            XPathNodeIterator ni = messageDoc.Select(xpath);
            // MoveNext returns false when nothing matched; only read Current after a successful move
            return ni.MoveNext() ? ni.Current.Value : string.Empty;
        }
    }
}
This class functions as the base for all of our extra XSLT functions; for now there is only one: GetMessageValueByXpath. This function is called from XSLT to extract a value from a .NET object. The .NET object is supplied in the constructor of this extension, where it is serialized to XML and loaded into an XPathDocument. Once we have the navigator of the XPathDocument we can run XPath 1.0 queries from our Word template against it. Note that Select only accepts expressions that return a node-set; to use XPath functions that return a string or a number (count(), concat() and the like) the method would have to call Evaluate instead, and the same class is the place to add number formatting, date formatting or our own functions later. For this example the object being passed into the XSLT extension will be a PersonInfo object whose properties correspond with the queries specified in our Word template.
Let's look at the code that creates this extension and passes it to the XSLT transform:
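To make the queries concrete, this is roughly the XML that XmlSerializer produces for a simple object and that the navigator works on. It is an illustration only: Name and City are assumed property names, and the values are made up.

```xml
<?xml version="1.0"?>
<!-- Illustrative only: Name and City are assumed properties of PersonInfo -->
<PersonInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Name>Chris</Name>
  <City>Amsterdam</City>
</PersonInfo>
```

Against this document a query attribute of /PersonInfo/Name selects the Name element, so its text is what would end up in the Word document. By default the root element takes the class name and each public property becomes a child element, which is why the template author needs to know the shape of the object.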
1: using System;
2: using System.Collections.Generic;
3: using System.Linq;
4: using System.Text;
5: using Chris.Demo.WordGenerator.Contract;
6: using System.Xml.Xsl;
7: using System.Reflection;
8: using System.Xml;
9: using Chris.Demo.WordGenerator.Util;
10: using System.IO;
11:
12: namespace Chris.Demo.WordGenerator
13: {
14:     /// <summary>
15:     /// A transformer that transforms the objects using XSLT
16:     /// </summary>
17:     class XslTransformer : IObjectDocumentTransformer
18:     {
19:         private XslCompiledTransform _xsltransform;
20:         public XslTransformer()
21:         {
22:             _xsltransform = new XslCompiledTransform(false);
23:             Stream stylesheet = Assembly.GetExecutingAssembly().GetManifestResourceStream(Constants.XSLTRESOURCE);
24:             _xsltransform.Load(XmlReader.Create(stylesheet));
25:         }
26:         public System.IO.MemoryStream TransForm<TMessage>(TMessage message, System.IO.Stream template)
27:         {
28:             XmlReader templatedoc = XmlReader.Create(template);
29:             MemoryStream returnStream = new MemoryStream();
30:             XsltArgumentList arguments = new XsltArgumentList();
31:             arguments.AddExtensionObject(Constants.DEMONAMESPACE, new XsltExtension<TMessage>(message));
32:             _xsltransform.Transform(templatedoc, arguments, returnStream);
33:             returnStream.Seek(0, SeekOrigin.Begin);
34:             return returnStream;
35:         }
36:     }
37: }
Line 31 creates the XsltExtension, passes it the message object (PersonInfo in our demo app that will follow) and registers it in the argument list; line 32 then transforms the passed template with our XSLT stylesheet and our extension into a valid Word document. The XSLT extension needs to be in a separate XML namespace. The DEMONAMESPACE constant I am using here maps to urn:Chris:demo:word. In our XSLT stylesheet you can see I bound this same namespace to the prefix wordgen. That's why in the stylesheet the function is called like wordgen:GetMessageValueByXpath($query).
This XslTransformer will be used by the WordGenerator class. The WordGenerator class is the class client applications will interact with. Its code is shown below:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Chris.Demo.WordGenerator.Contract;
using System.Xml.Xsl;
using System.Xml;
using Chris.Demo.WordGenerator.Util;
using System.IO;
using System.Reflection;
using Conditions = System.Diagnostics.Contracts;

namespace Chris.Demo.WordGenerator
{
    /// <summary>
    /// Our generator
    /// </summary>
    public class WordGenerator : BaseGenerator
    {
        public WordGenerator() : this(new XslTransformer())
        {
        }

        public WordGenerator(IObjectDocumentTransformer transformer) : base(transformer)
        {
        }

        /// <summary>
        /// Later there will be more functionality here, for now a very basic implementation
        /// </summary>
        /// <typeparam name="TMessageObject"></typeparam>
        /// <param name="message"></param>
        /// <param name="template"></param>
        /// <returns></returns>
        public override System.IO.MemoryStream GenerateLetter<TMessageObject>(TMessageObject message, Stream template)
        {
            // Precondition check; code contracts were not used because of the extra tooling they require
            Guard.ArgumentsNotNull(message, template);
            return Transformer.TransForm(message, template);
        }
    }
}
It inherits functionality from its base class, such as the Transformer property and a GenerateAndSaveLetter method which uses the method overridden here. For now it only adds a guard and delegates to the transformer object. By default it will use our XSLT transformer, but the constructor also accepts another transformer as long as it implements the IObjectDocumentTransformer interface. From this code it is clear that client applications using this dll must supply a .NET object containing the data you want shown in the Word letter, and must also supply a Word template. Each client app, or even different parts of one application, can generate a lot of different Word letters using this approach; all they have to do is define a template, pass it and the data to the generator, and we have a Word letter!
I created a demo app which uses the Word template we defined earlier. It is a very basic Windows Form that asks for the info we want to show in the Word letter, and when we push the button it generates the letter. Let's have a look:
We fill in the data, we press generate and the screen below will show:
It asks us where to save the document. I choose my Desktop and presto!
Let's see how this works. Here is our Windows Form class; it's very simple:
1: using System;
2: using System.Collections.Generic;
3: using System.ComponentModel;
4: using System.Data;
5: using System.Drawing;
6: using System.Linq;
7: using System.Text;
8: using System.Windows.Forms;
9: using Chris.Demo.WordGenerator;
10: using System.IO;
11: using FEWordGenDemo.Properties;
12: using Chris.Demo.WordGenerator.Contract;
13:
14: namespace FEWordGenDemo
15: {
16:     public partial class Form1 : Form
17:     {
18:         private PersonInfo pi = new PersonInfo();
19:         private ILetterGenerator _generator;
20:         public Form1(ILetterGenerator generator)
21:         {
22:             InitializeComponent();
23:             personInfoBindingSource.DataSource = pi;
24:             _generator = generator;
25:         }
26:
27:         private void btnGenerate_Click(object sender, EventArgs e)
28:         {
29:             if (svfWordDoc.ShowDialog(this) == DialogResult.OK)
30:             {
31:                 Stream templateStream = new MemoryStream(Resources.WordTemplate);
32:                 _generator.GenerateAndSaveLetter(pi, templateStream, svfWordDoc.FileName, true);
33:             }
34:         }
35:     }
36: }
The most exciting stuff happens from line 27 downward. Here you will see:
- The SaveFile dialog.
- Getting the Word template. In my app it is embedded as a resource, but imagine a scenario where you can download all the templates from a central location. Whenever the templates need updating, you place them on the server and every client will download them automatically. That scenario is very similar to a project I have worked on.
- Asking our Word generator to generate a Word document, save it and open it in the default application!
You can see I use the constructor to inject the generator; this way I can easily switch implementations. I pass the generator our template as a stream, and the PersonInfo object, which should contain all the data thanks to databinding.
Conclusion
XSLT, XPath and XSDs in combination with the programming language of our choice enable us to very easily template and generate all kinds of XML documents. Here we have looked at Word generation. Some of you might wonder: why this way, why not just grab the Open XML SDK to generate Word 2010 documents? This is a valid point, but generating our Word documents this way allows us to easily add extra functionality. Think about conditional placeholders, all the XPath-based formatting that can be driven from our templates, iterative placeholders that generate rows based on a .NET array, stuff like that!
XSLT extensions in .NET are very powerful. You can even pass nodes from XSLT to .NET and vice versa, allowing us to do some very cool modifications on our message object before sending it back to XSLT and injecting it into the Word document. This also allows for a very standard way of generating documents, as it can be applied to all kinds of XML documents.
All the files I used and created are in one Visual Studio solution. Here is the download link:
Pfew! This was a big post with lots of info. I hope you liked reading it. Feel free to comment!
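As a rough idea of what a conditional placeholder could look like, here is a sketch that is not part of the download. It assumes a hypothetical conditionalBlock element (with a hypothetical test attribute) has been added to the schema, and that GetMessageValueByXpath returns an empty string when the query selects nothing.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:wordgen="urn:Chris:demo:word"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    exclude-result-prefixes="wordgen">

  <!-- Identity transform: copy everything we do not handle explicitly -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical conditionalBlock tag with a hypothetical 'test' attribute -->
  <xsl:template match="w:customXml[@w:element='conditionalBlock']">
    <xsl:variable name="test"
        select="w:customXmlPr/w:attr[@w:name='test']/@w:val"/>
    <!-- Keep the wrapped content only when the query yields a non-empty value -->
    <xsl:if test="wordgen:GetMessageValueByXpath(string($test)) != ''">
      <!-- Output the children without the customXml wrapper and its properties -->
      <xsl:apply-templates select="*[not(self::w:customXmlPr)]"/>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

When the value is empty the whole block is dropped from the output; otherwise its content is processed by the other templates, so valueHolder tags nested inside it would still be filled if their template is present in the same stylesheet.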
Greetings
Chris
9 comments
I don’t think your solution will work in Office copies sold in the USA. Microsoft had to remove the custom XML feature because of a patent lawsuit. See this entry from Microsoft support:
http://support.microsoft.com/kb/2445060
Alex van Beek
Ah, the solution does still work. You can still use the custom schema and your custom XML. Word will save it all to your document and you can use the template just fine in your software. I ran into this problem on the job as well, and the thing that does not work in some copies is reopening your template in Word: you will lose your custom markup. You can try to specify the markup in functional designs; that way you know how your template was marked up if you accidentally lose the markup. For Word 2007 and below this shouldn't be a problem though.
If you do go with the 2007 and 2010 content controls approach, xslt is still a very flexible approach, especially because content controls allow xpath natively :). And in the end it is all XML..
Chris van Beek
I know, but contentcontrols require some extra work, it’s not like your solution works out of the box with contentcontrols. This is important for your american readers to know 🙂
Alex
Thanks for pointing it out. I did not know Microsoft had a support article about it! The solution is not intended to work with content controls out of the box, but with custom XML tags. This will work fine in Office 2007 and 2003. As for 2010, migrating the solution from custom XML to content controls is not much work at all: all you have to do is replace the custom XML tags with content controls, and adjust the template which extracts values to match on the content control element and use its XPath, instead of matching the custom XML element and its query. All the rest can stay the same.
In the future I will also post on generating with content controls by using their xml databinding functionality without any xslt at all so you can compare the two solutions :).
Chris van Beek
That post about contentcontrols would be interesting 🙂
Alex van Beek
I created an open source project (fleXdoc) that does exactly this, but supports richer constructs. Check it out here: http://flexdoc.codeplex.com
Alex’s comments are correct. Choosing custom XML tags is not a very future proof direction. Content controls are and with some work, you can achieve similar functionality.
By the way: Microsoft is not yet finished with the patents-case: http://www.zdnet.com/blog/microsoft/supreme-court-agrees-to-hear-microsoft-patent-infringement-appeal/8074?alertspromo=&tag=nl.rSINGLE
Robert te Kaat
Wow, cool. I did not have the include or the image functionality; the rest is almost exactly the same. I think it would be quite a bit of work to do the same with content controls. The value-of functionality would be easy, as it is provided for us in Word with XML databinding, but the if constructs, the loops etc. just scream XSLT and XPath again. Plus, how will you distinguish functionality between different content controls?
But your codeplex project, cool stuff!:).
Chris van Beek
The same thing (pretty much) has already been done using content controls by a colleague. Although it was just a study assignment, he managed to implement similar functionality. The meta information that needs to come with the content controls (like the type (ValueOf, If, Image, etc.), its properties (like path="xslt…"), etc.) needs to be stored inside a separate XML part within the same document. To make editing the template possible, a Word add-in was created. It takes care of inserting the different types of controls and storing their properties inside the XML part. Although not a production-ready solution, it could serve as a starting point for converting fleXdoc to content controls. However, as long as we still have Word 2007 copies lying around, why bother!
Robert te Kaat