
One of the cool new features of Microsoft Word 2007 is the ability to inspect documents for unwanted content. You have probably all sent a document to a prospect with the reviewing data and comments still inside it, available for viewing with the flick of a button. The Document Inspector is there to alleviate that problem. It is also extensible, so you can specify what content to remove from the document upon inspection. By default you can remove items such as reviewing data, personal information or custom XML data.
One difficulty is doing this in an automated fashion: people tend to forget to push the 'Inspect Document' button, just as they forgot to remove the comments by hand in the first place. There are various solutions you can implement inside Word to automate the inspection process. In this article, however, I will use the power of SharePoint 2007, Windows Workflow Foundation, InfoPath and Office Open XML to automate document inspection and cleansing on the server. Read on to learn how to develop a custom workflow, populate it with data using InfoPath and Forms Services, and run it on document libraries containing Office documents. Naturally, the solution is deployable using the new feature and solution framework.
When you need a workflow in MOSS, you can use one of the built-in workflows that ship with MOSS, or develop a custom workflow using either SharePoint Designer or Visual Studio. SharePoint Designer is easier to use, but does not allow the same degree of customization as workflows developed in Visual Studio. Since document inspection works directly on the Open XML markup using the Packaging API, Visual Studio is the way to go.
There is also a requirement for users to specify which cleaning actions they want performed on the document. One of the cool things about the workflow architecture in MOSS is that it allows you to use custom ASP.NET or InfoPath forms for gathering data. Since ASP.NET makes me write way too much code, I chose to deploy a custom InfoPath form.
Here are the steps required to build the custom workflow. You can also download the source and WSP package.
Step 1: Create a Visual Studio project
Start by creating a new Visual Studio solution, choosing 'SharePoint Sequential Workflow Library' as the template. Delete all files in the project except Workflow1.cs and AssemblyInfo.cs. The deployment files are empty anyway and will be created manually later. Workflow1.cs is kept because it already contains correctly set-up fields for retrieving the MOSS workflow properties; you could build these by hand as well, but this approach is a bit easier. Also make sure to remove the post-build actions by going to the project properties and then the 'Build' tab. While in the project properties dialog, also visit the Signing tab and sign the assembly using a new key file. This is required for deploying the workflow assembly to the GAC.
Step 2: Designing the InfoPath form for data gathering
Just like in the Word Document Inspector, the user is presented with a UI for choosing what data to remove from the document. As a bonus, you can also specify the name to set as the document's author. The author field usually contains the username of the person who created the document, which is not so nice if the document leaves the company. The InfoPath form itself is extremely basic. First, open InfoPath and design a blank form. Add a few checkboxes that let the user specify the document cleanup actions. You can use some conditional formatting and data validation to enrich the form a little.
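Inside the package, the author name lives in the core properties part (docProps/core.xml): dc:creator holds the author, and cp:lastModifiedBy usually holds a username too. The ChangeAuthor method called by the workflow is not listed in this article, so the C# below is only a minimal sketch of what it could do to that part's XML. It assumes the part is already loaded into an XDocument (reading and writing it through the Packaging API is left out), and the class and method names are hypothetical.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper (not part of the original project): overwrites the
// author fields in the XML of an Open XML core properties part (docProps/core.xml).
public static class CorePropertiesCleaner
{
    // Namespaces used by the core properties part
    private static readonly XNamespace Cp =
        "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";
    private static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";

    public static void SetAuthor(XDocument coreProperties, string author)
    {
        XElement root = coreProperties.Root;
        if (root == null || root.Name != Cp + "coreProperties")
        {
            throw new ArgumentException("Not a core properties part.", "coreProperties");
        }

        // dc:creator holds the name of the original author
        SetOrAdd(root, Dc + "creator", author);

        // cp:lastModifiedBy holds the name of the last person who saved the file
        SetOrAdd(root, Cp + "lastModifiedBy", author);
    }

    // Updates the element if it exists, otherwise appends it to the root
    private static void SetOrAdd(XElement root, XName name, string value)
    {
        XElement element = root.Element(name);
        if (element == null)
        {
            root.Add(new XElement(name, value));
        }
        else
        {
            element.Value = value;
        }
    }
}
```

Overwriting cp:lastModifiedBy as well is a design choice of this sketch; drop that line if you only want to replace the original author.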

There are various things you need to change before the form can be used inside the MOSS workflow. The form is displayed inline in the MOSS workflow pages, so it needs to be browser-enabled. The form also needs to tell SharePoint that it has been filled out successfully. A new type of data connection exists for exactly this purpose: the form submits its data to this connection, which makes the data available to the workflow code later on.

- Make sure the form is set up for web deployment. Open the Tools->Form Options dialog and then the Compatibility tab, and make sure the form template can be opened in a browser.
- Change the security settings to domain trust. Open the Tools->Form Options dialog and open the Security tab. Clear the checkbox that lets InfoPath determine the security level automatically, and override it with domain trust.
- Add a data connection to submit the form. Open Tools->Submit Options. Make sure the 'Allow users to submit this form' option is checked. Select 'Hosting environment' as the destination, and add the data connection using the 'Add' button. Uncheck the 'Show success and failure messages' option and set the after-submit option to 'Close the form'.
This prepares the form for use within MOSS; what remains is publishing it properly. First save the form template to disk somewhere, in the My Documents folder for instance. Next create a subfolder named 'Forms' inside the folder where the Visual Studio project is stored, and publish the form template to it. This folder holds all InfoPath form templates belonging to the workflow.
There are two important values we need to retrieve from the InfoPath form template for later use within the workflow. The form's unique ID is necessary for attaching the form to the workflow. The form's XML namespace is needed to extract data from the filled-out form after a user starts the workflow. You can find the form ID by opening the form in the InfoPath designer and choosing File->Properties. The XML namespace can be found by choosing File->Save as Source Files and opening the SampleData.xml file saved to disk; the namespace bound to the 'my' prefix is usually the one of interest. Make sure to write both strings down somewhere.
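If you'd rather not copy the namespace by hand, a few lines of System.Xml can read it from SampleData.xml. This is a minimal sketch, assuming the root element of the file (normally my:myFields) declares the prefix; the class name InfoPathNamespaceFinder is made up for this example.

```csharp
using System.Xml;

// Hypothetical helper: returns the namespace URI bound to a prefix (normally "my")
// on the root element of an InfoPath SampleData.xml file, or null if it is not declared.
public static class InfoPathNamespaceFinder
{
    public static string FindNamespace(string sampleDataXml, string prefix)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(sampleDataXml);

        // The root element (my:myFields by default) carries the namespace declarations
        string uri = doc.DocumentElement.GetNamespaceOfPrefix(prefix);

        // GetNamespaceOfPrefix returns an empty string for undeclared prefixes
        return uri.Length == 0 ? null : uri;
    }
}
```

Calling FindNamespace(File.ReadAllText(pathToSampleData), "my") gives you the string to assign to the _xmlnsInitiationForm field used in the workflow code.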
TIP: Run stsadm -o verifyformtemplate -filename <pathToXsn> to verify whether your InfoPath form has been developed and published correctly for use within MOSS.
Step 3: Implementing the workflow
Open the Visual Studio project again and include the InfoPath form in it (not strictly necessary, but handy).
TIP: If the Forms folder does not appear in the Solution Explorer, click the 'Show All Files' button on the Solution Explorer toolbar.
Open the designer for Workflow1.cs and select the WorkflowActivated shape. In the Properties window, locate the Invoked property, enter a value such as 'WorkflowActivated_Invoked' and press Enter to jump to the new method in the code file. This method does two things: it retrieves the workflow ID, and it reads the data entered in the InfoPath form. Both are available from the workflowProperties field. The XML namespace written down in the previous step is needed here to query the InfoPath data using System.Xml.
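private void WorkflowActivated_Invoked(object sender, ExternalDataEventArgs e)
{
    // Remember the ID of this workflow instance
    workflowId = workflowProperties.WorkflowId;
    try
    {
        // InitiationData holds the XML submitted by the InfoPath form
        XmlDocument initiationData = new XmlDocument();
        initiationData.LoadXml(workflowProperties.InitiationData);

        // Map the "my" prefix to the form's XML namespace written down earlier
        XmlNamespaceManager mgr = new XmlNamespaceManager(initiationData.NameTable);
        mgr.AddNamespace("my", _xmlnsInitiationForm);

        // Read the "remove comments" checkbox; a missing node leaves the field unchanged
        XmlNode removeCommentsNode = initiationData.SelectSingleNode(
            "/my:myFields/my:cleanupActions/my:removeComments", mgr);
        if (removeCommentsNode != null)
        {
            Boolean.TryParse(removeCommentsNode.InnerText, out _removeComments);
        }
        ...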
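The fragment above stops after the first checkbox. The remaining values follow the same pattern, so it can help to pull the parsing into a small helper that can be tested outside SharePoint. This is a sketch under assumptions: only the removeComments field comes from the form used here, while the removeRevisions, changeAuthor and authorName field names, and the CleanupSettings and InitiationDataParser classes, are hypothetical. Missing or unparsable fields are treated as false rather than throwing.

```csharp
using System.Xml;

// Hypothetical settings holder mirroring the workflow's
// _removeComments, _removeRevisions and _changeAuthor fields.
public sealed class CleanupSettings
{
    public bool RemoveComments;
    public bool RemoveRevisions;
    public bool ChangeAuthor;
    public string AuthorName;
}

public static class InitiationDataParser
{
    public static CleanupSettings Parse(string initiationData, string formNamespace)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(initiationData);

        XmlNamespaceManager mgr = new XmlNamespaceManager(doc.NameTable);
        mgr.AddNamespace("my", formNamespace);

        CleanupSettings settings = new CleanupSettings();
        // removeComments matches the form in this article; the other
        // field names are assumptions about how the form was designed.
        settings.RemoveComments = ReadBool(doc, mgr, "/my:myFields/my:cleanupActions/my:removeComments");
        settings.RemoveRevisions = ReadBool(doc, mgr, "/my:myFields/my:cleanupActions/my:removeRevisions");
        settings.ChangeAuthor = ReadBool(doc, mgr, "/my:myFields/my:cleanupActions/my:changeAuthor");
        settings.AuthorName = ReadString(doc, mgr, "/my:myFields/my:authorName");
        return settings;
    }

    // Missing or unparsable nodes count as false instead of throwing
    private static bool ReadBool(XmlDocument doc, XmlNamespaceManager mgr, string xpath)
    {
        bool value;
        XmlNode node = doc.SelectSingleNode(xpath, mgr);
        return node != null && bool.TryParse(node.InnerText, out value) && value;
    }

    // Returns the trimmed text of the node, or null when it is missing
    private static string ReadString(XmlDocument doc, XmlNamespaceManager mgr, string xpath)
    {
        XmlNode node = doc.SelectSingleNode(xpath, mgr);
        return node == null ? null : node.InnerText.Trim();
    }
}
```

Inside WorkflowActivated_Invoked you would then call InitiationDataParser.Parse(workflowProperties.InitiationData, _xmlnsInitiationForm) and copy the values into the workflow's fields.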
Open the designer for Workflow1.cs again and add a new Code shape just below the WorkflowActivated shape. Select the Code shape and open the Properties window. Locate the ExecuteCode property, enter something like 'CleanDocument_Invoked' and press Enter to implement this method as well. Inside this method, the list item on which the workflow is running is retrieved from SharePoint, and the document is read from that list item. Next, the Packaging API is used to open the file and perform the clean-up operations selected by the user of the workflow.
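private void CleanDocument_Invoked(object sender, EventArgs e)
{
    // The list item (document) on which the workflow is running
    SPListItem item = workflowProperties.Item;
    if (item.File != null)
    {
        // Copy the document into an expandable in-memory stream
        // so the Packaging API can read and rewrite it
        byte[] content = item.File.OpenBinary();
        using (MemoryStream packageStream = new MemoryStream())
        {
            packageStream.Write(content, 0, content.Length);
            packageStream.Position = 0;

            using (Package package = Package.Open(packageStream, FileMode.Open, FileAccess.ReadWrite))
            {
                if (_changeAuthor) { ChangeAuthor(package); }
                if (_removeComments) { RemoveComments(package); }
                if (_removeRevisions) { RemoveRevisions(package); }
                package.Flush();
            }

            // ToArray() returns the whole buffer regardless of the stream position
            item.File.SaveBinary(packageStream.ToArray());
        }
        item.Update();
    }
}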
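The RemoveComments and RemoveRevisions methods called above are not listed in the article. As a rough idea of what they have to do to the markup of the main document part (word/document.xml), here is a minimal LINQ to XML sketch. It assumes the part is already loaded into an XDocument and only touches the body markup: deleting the comments part (word/comments.xml) and its relationship, and saving the XML back into the package, are not shown. Accepting revisions is simplified: deleted and moved-away content is dropped, inserted and moved-to content is unwrapped, and recorded formatting changes are discarded; deleted paragraph marks are not merged into the next paragraph. The class name is hypothetical.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helpers for the XML of the main document part (word/document.xml).
// Loading/saving the part and deleting word/comments.xml are not handled here.
public static class WordprocessingMLCleaner
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Elements that anchor comments in the document body
    private static readonly string[] CommentMarkers =
        { "commentRangeStart", "commentRangeEnd", "commentReference" };

    // Deleted or moved-away content, move range markers and recorded formatting changes
    private static readonly string[] DroppedRevisionElements =
        { "del", "moveFrom", "moveFromRangeStart", "moveFromRangeEnd",
          "moveToRangeStart", "moveToRangeEnd",
          "rPrChange", "pPrChange", "sectPrChange", "tblPrChange", "trPrChange", "tcPrChange" };

    public static void RemoveCommentMarkup(XDocument document)
    {
        RemoveAll(document, CommentMarkers);
    }

    public static void AcceptRevisions(XDocument document)
    {
        // Drop deleted content and recorded formatting changes
        RemoveAll(document, DroppedRevisionElements);

        // Keep inserted content but remove its w:ins / w:moveTo wrapper.
        // Each pass unwraps the first remaining wrapper in document order;
        // nested wrappers are copied along and handled by later passes.
        XElement wrapper;
        while ((wrapper = document.Descendants().FirstOrDefault(IsInsertionWrapper)) != null)
        {
            wrapper.ReplaceWith(wrapper.Nodes());
        }
    }

    private static bool IsInsertionWrapper(XElement e)
    {
        return e.Name == W + "ins" || e.Name == W + "moveTo";
    }

    // Removes every w:* element whose local name is in the list
    private static void RemoveAll(XDocument document, string[] localNames)
    {
        document.Descendants()
            .Where(e => e.Name.Namespace == W && localNames.Contains(e.Name.LocalName))
            .Remove();
    }
}
```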
The implementation currently lacks any check on the document type, and the document might not contain WordprocessingML at all. One option is to create a custom WSS content type, based on an empty Word document, that has this workflow attached by default. I'll expand the sample to include this in a later blog post.
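Until such a SharePoint content type is in place, a cheap safeguard is to check the package before cleaning it. The sketch below is one way to do that under some assumptions: it uses System.IO.Compression.ZipArchive (available from .NET 4.5, so not part of the framework MOSS 2007 runs on) rather than the Packaging API, and it only recognizes the main document content types of .docx and .docm files. In the workflow itself you would more likely look up the main part with the Packaging API you already use and compare its content type. The class name DocumentTypeCheck is hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical pre-check: decides whether a package contains WordprocessingML by
// looking for a main document content type in [Content_Types].xml.
public static class DocumentTypeCheck
{
    private static readonly XNamespace Ct =
        "http://schemas.openxmlformats.org/package/2006/content-types";

    // Main document part content types for .docx and .docm files (not exhaustive)
    private static readonly string[] MainDocumentContentTypes =
    {
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
        "application/vnd.ms-word.document.macroEnabled.main+xml"
    };

    public static bool IsWordprocessingML(Stream packageStream)
    {
        try
        {
            // leaveOpen: true so the caller can still use the stream afterwards
            using (ZipArchive zip = new ZipArchive(packageStream, ZipArchiveMode.Read, true))
            {
                ZipArchiveEntry entry = zip.GetEntry("[Content_Types].xml");
                if (entry == null)
                {
                    return false;
                }

                using (Stream contentTypesStream = entry.Open())
                {
                    XDocument contentTypes = XDocument.Load(contentTypesStream);
                    return contentTypes.Root.Elements(Ct + "Override")
                        .Select(o => (string)o.Attribute("ContentType"))
                        .Any(t => MainDocumentContentTypes.Contains(t, StringComparer.OrdinalIgnoreCase));
                }
            }
        }
        catch (InvalidDataException)
        {
            // Not a ZIP package at all, e.g. a binary .doc file
            return false;
        }
    }
}
```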
Step 4: Deploying the workflow
At this point a custom workflow and an InfoPath form have been developed. Both need to be deployed into SharePoint, and the obvious tools for that are the feature framework and solution packages. Start by creating the feature.
The first file to create is an XML file defining the workflow. Add a new XML file named 'workflow.xml' to the Visual Studio project. This file defines the workflow in SharePoint terms: you need to specify items such as the ID, name and title, but also the pages for viewing the workflow status and initiating the workflow.
TIP: To get IntelliSense, open the Properties window while the XML file is open, click the Schemas property and select the wss.xsd file in the Template\XML folder inside the WSS directory.
Workflow.xml
<Elements xmlns="http://schemas.microsoft.com/sharepoint/">
  <Workflow Id="{BA25BE39-5EB9-4ed8-ABD7-A2F11511DED1}"
            Name="WordprocessingML Document Cleanup"
            Title="WordprocessingML Document Cleanup"
            Description="Cleans unwanted markup from documents."
            CodeBesideAssembly="DocumentCleanupWorkflow, Version=1.0.0.0, Culture=neutral, PublicKeyToken=ce0755a8781f4aa9"
            CodeBesideClass="DocumentCleanupWorkflow.Workflow1"
            AssociationUrl="_layouts/CstWrkflIP.aspx"
            InstantiationUrl="_layouts/IniWrkflIP.aspx"
            ModificationUrl="_layouts/ModWrkflIP.aspx">
    <Categories />
    <MetaData>
      <!-- The InfoPath form ID written down earlier -->
      <Association_FormURN>urn:schemas-microsoft-com:office:infopath:Initiation:-myXSD-2007-04-11T07-30-22</Association_FormURN>
      <Instantiation_FormURN>urn:schemas-microsoft-com:office:infopath:Initiation:-myXSD-2007-04-11T07-30-22</Instantiation_FormURN>
    </MetaData>
  </Workflow>
</Elements>
The Association, Instantiation and Modification URLs in the Workflow element point to existing pages used for displaying InfoPath forms; use these whenever you gather data with InfoPath forms. The MetaData element points to the InfoPath form ID written down in an earlier step. The same form is specified twice, for association and instantiation. The association form is used by the administrator when associating the workflow with a document library, and the data entered there is used as default values for the instantiation form. The instantiation form is shown when a user starts the workflow on an item in the document library. When the workflow is started automatically because a new item is added to the list, only the association data is used.
Next up is the feature.xml file. Add another blank XML file to the Visual Studio project and attach the WSS XML schema to it as well. The feature lists its element manifests, which in this case is only the workflow.xml file created in the previous step. The information specified here is displayed on the pages where features can be activated (site settings and global admin pages).
Feature.xml
<Feature xmlns="http://schemas.microsoft.com/sharepoint/"
         Id="{D487ABD5-8C4E-4174-B209-AF5E6C75F265}"
         SolutionId="{DBE750ED-8892-4d53-BE15-6C1409FD38CB}"
         Hidden="false"
         Description="Cleans documents"
         Title="Document Cleanup"
         ReceiverAssembly="Microsoft.Office.Workflow.Feature, Version=12.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c"
         ReceiverClass="Microsoft.Office.Workflow.Feature.WorkflowFeatureReceiver"
         Scope="Site"
         Version="1.0.0.0">
  <ElementManifests>
    <ElementManifest Location="workflow.xml" />
  </ElementManifests>
  <Properties>
    <Property Key="GloballyAvailable" Value="true" />
    <!-- Tells the feature receiver which InfoPath forms to register -->
    <Property Key="RegisterForms" Value="Forms\*.xsn" />
  </Properties>
</Feature>
A few items are worth mentioning here. First of all, the ReceiverAssembly and ReceiverClass attributes are required. They are used to register the InfoPath forms and the workflow with SharePoint, and must point to the built-in receiver class that was developed for this purpose. The RegisterForms property tells this receiver which form templates to register, in this case every .xsn file in the Forms folder. Finally, a workflow is always deployed at site collection scope, hence Scope="Site".

Finally, create a solution package for deploying the workflow feature and the InfoPath form. This takes two files: a manifest.xml defining the solution elements, and a DDF file that packages all feature files into a single CAB file. Both files go into the root folder of your Visual Studio solution. To place them there from within Visual Studio, first add a new solution folder and then add the XML and DDF files to it. To create a solution folder, right-click the solution node in the Solution Explorer and choose Add->New Solution Folder.
TIP: If Visual Studio doesn't show the solution node in the Solution Explorer, which it doesn't by default for single-project solutions, open the Tools->Options dialog, select 'Projects and Solutions' and check the 'Always show solution' checkbox.
Manifest.xml
<Solution xmlns="http://schemas.microsoft.com/sharepoint/"
          SolutionId="{DBE750ED-8892-4d53-BE15-6C1409FD38CB}">
  <FeatureManifests>
    <FeatureManifest Location="DocumentCleanup\feature.xml" />
  </FeatureManifests>
  <Assemblies>
    <!-- The signed workflow assembly is installed in the GAC -->
    <Assembly Location="DocumentCleanup\DocumentCleanupWorkflow.dll"
              DeploymentTarget="GlobalAssemblyCache" />
  </Assemblies>
</Solution>
Solution.ddf
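; Cabinet settings: build a single .wsp file in the WmlDocumentCleanup folder
.OPTION EXPLICIT
.Set CabinetNameTemplate=WmlDocumentCleanup.wsp
.Set DiskDirectoryTemplate=CDROM
.Set CompressionType=MSZIP
.Set Cabinet=on
.Set DiskDirectory1=WmlDocumentCleanup

; The solution manifest goes in the root of the package
manifest.xml

; Feature files and the workflow assembly
.Set DestinationDir=DocumentCleanup
DocumentCleanupWorkflow\Feature.xml
DocumentCleanupWorkflow\workflow.xml
DocumentCleanupWorkflow\bin\debug\DocumentCleanupWorkflow.dll

; InfoPath form templates registered by the feature receiver
.Set DestinationDir=DocumentCleanup\Forms
DocumentCleanupWorkflow\Forms\Initiation.xsn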
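The cool thing is that the solution is automatically pushed to all front-end web servers, and SharePoint takes care of things such as deploying assemblies to the GAC or creating SafeControl entries. To build the WSP package, run makecab /f solution.ddf; the package is written to the WmlDocumentCleanup folder. Next, use stsadm to install the solution (stsadm -o addsolution) and deploy it (stsadm -o deploysolution, with the -allowgacdeployment flag since the assembly targets the GAC).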
That's about it for the development. Go ahead and give it a try. You can download the source project on the downloads page.
Hope it helps!