Recently there has been a small war going on against Office Open XML. For those of you who are unaware, let me lay it out. Right now there are two major new file formats for storing office documents: Office Open XML and the Open Document Format. Both try to achieve the same goal: to provide an open, standardized file format for storing office documents. Each was initiated by a big company.
The reason for these file formats can be explained in a variety of ways. My view is that all of us, governments, businesses and developers alike, want assurance that our documents can be read and used many years from now, with many different products. The ability to integrate document-based operations, such as creating or manipulating documents, into applications is becoming more important as people get accustomed to more feature-rich applications. Now if I were steering a big company that owns an office suite, I might be tempted to create such a nifty file format to provide the requested service to my customers. The end result is that Sun is doing ODF and Microsoft started OOX. ODF is an ISO standard; OOX is standardized by Ecma.
The Office Open XML standard is going through the ISO fast-track process right now, which is also the reason for the current mud-slinging match against Open XML. People can raise objections against OOX to stop the fast track and force the regular procedure, which might take a few years. In my opinion this is why some strong open source / ODF supporters are blogging about ‘what is wrong with Open XML…’. They are trying to raise objections against OOX. If they succeed, OOX is knocked off the fast track, which is pretty important for ODF. Right now governments are calling for the use of open (source) software in their departments. If OOX takes a few years to become a real standard, ODF stands a good chance of becoming the default format. Sun and IBM sell more software and support, and make more money. Pretty darn simple, in my opinion.
Now the sad thing about all of this is that those of us who develop applications in the real world will be none the better for it. The Microsoft Office suite is used worldwide in huge numbers. It is no accident that the ‘doc’ format is, or has been, the standard to use. We can expect to see a lot of Open XML in the near future as people move to the new, and in my opinion better, technology that OOX offers. It would be a great thing for me as a developer if Open XML were in the hands of ISO, as this would give me more security for the future, interoperability, and the ability to build the apps my customers demand.
I am all for a bit of competition, but some of the things I have been reading are below the belt or just plain untrue. Even funnier to me is that my friend Doug is the latest target. He is not, in my view, an evil man working for an evil company (if you can see MS as one entity, as most people seem to do). So let me give you my view on some of the claims being sent out on the web at the moment.
Open XML contains MS-specific markup to maintain backwards compatibility
This one is mentioned on various blogs. If you read the spec carefully, or search for ‘Word’ or ‘Excel’, you can find elements such as ‘footnoteLayoutLikeWW8’, which tells an application to emulate the footnote layout algorithm of an earlier Word version. This is of course a big slap in the face for the open source community. How can anyone be expected to implement a standard that refers to closed source behavior? Personally I do not see this as a major issue for interoperability. The OOX spec says you are not required to implement all of these behaviors and can fall back to something you do understand. These features exist to support conversion of documents from the current dominant binary formats to OOX; new documents will not use them. I expect very few documents to rely on these features, since they are there purely as a backwards compatibility mode.
For those folks who are big ODF supporters and think this is an area where ODF is better than Open XML, I suggest you dig a little deeper. ODF avoided these kinds of complaints by simply sweeping them under the rug. It does not document a single application-level setting, even though applications that support ODF use a large number of them. ODF has a simple extensibility mechanism for storing document settings, but this makes interoperability impossible unless you go beyond the ODF spec and read up on what each application is doing separately. Let me give you an example, because I personally believe OOX does a better job here.
For example, ODF documents written out by OpenOffice will have the following element:
<config:config-item config:name="DoNotJustifyLinesWithManualBreak" config:type="boolean">false</config:config-item>
The ODF spec does mention the existence of this “config-item” tag, with the all-important “name” attribute, and don’t forget your “type” attribute. But nowhere in the generous 737 pages of the ODF spec does it mention that the “name” attribute can contain something called “DoNotJustifyLinesWithManualBreak”, which seems like pretty important layout information to me. Funny that people say OOX is overspecified; at least it tells me what to expect. This means that if I implement ODF according to the spec, I will not be able to read ODF files saved from OpenOffice properly. I will get justification errors and end up with a document that is not truly the same. Now I have to go and find out what all of this means by examining each ODF application I want to interoperate with. Bye bye interoperability!
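To see how much of this lives outside the spec, you can simply dump every config-item an application wrote. Here is a minimal Python sketch, assuming a standard ODF zip package whose settings live in settings.xml and using the ODF config namespace URI; the function names are my own, and SAMPLE is a trimmed, illustrative stand-in for a real OpenOffice settings.xml rather than a complete file.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace URI bound to the ODF "config:" prefix.
CONFIG_NS = "urn:oasis:names:tc:opendocument:xmlns:config:1.0"


def config_items_from_xml(xml_text):
    """Return (name, type, value) for every config:config-item in a settings.xml."""
    root = ET.fromstring(xml_text)
    return [
        (el.get(f"{{{CONFIG_NS}}}name"), el.get(f"{{{CONFIG_NS}}}type"), el.text)
        for el in root.iter(f"{{{CONFIG_NS}}}config-item")
    ]


def config_items_from_package(path):
    """Read settings.xml straight out of an .odt/.ods/.odp zip package."""
    with zipfile.ZipFile(path) as package:
        return config_items_from_xml(package.read("settings.xml"))


# Trimmed, illustrative settings.xml (hypothetical sample, not a full OpenOffice file).
SAMPLE = """<office:document-settings
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0">
  <office:settings>
    <config:config-item-set config:name="ooo:configuration-settings">
      <config:config-item config:name="DoNotJustifyLinesWithManualBreak"
          config:type="boolean">false</config:config-item>
    </config:config-item-set>
  </office:settings>
</office:document-settings>"""

if __name__ == "__main__":
    for name, ctype, value in config_items_from_xml(SAMPLE):
        # The spec defines the element and its attributes, not what 'name' means.
        print(f"{name} ({ctype}) = {value}")
```

Pointing config_items_from_package at a real document saved by OpenOffice lists every such setting by name and type, but, as argued above, you still have to work out what each name means from the application itself.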
Take a look yourself. There are tons of these tags that aren’t documented in the spec. So which approach is better?
Don’t document anything: create all these abstract elements and leave it up to each individual developer to figure out what each application is doing. You can never use the standard by itself; you will always need to look elsewhere as well.
Document everything that you know is in use today. Maybe you don’t provide the exact layout algorithm behind each setting, but at least each one is named and classified. So if someone wants to specify that footnotes should behave like they did in a previously available application, they can do it. Everyone else who doesn’t care can just ignore it.
One thing I think they could improve is that the spec should clearly state that these features exist for compatibility with prior versions of Microsoft Office. I don’t think that would degrade the usability of the standard in any way, as MS Office has been the most widespread office suite for a long time. It reminds me of the story of the American railroads, which had to standardize on one of two distances between the rails. They settled on the one used most.
With Open XML, the year 1900 is a leap year
Back in the day, Lotus 1-2-3 contained a bug that made the year 1900 a leap year. This is of course very unfortunate, as date calculations that span the nonexistent 29 February 1900 come out a day off if you do not take this into account. It is even more unfortunate that Excel adopted the same wrong behavior for 1900 to be interoperable (LOL). Now Open XML does the same. I think this one is a bit stupid. Microsoft could have implemented the correct scheme in Open XML and put a converter in its office suite. The difficulty here is again the previous format and its widespread use. It would also have meant directly modifying each and every formula in existing worksheets. It’s hard to predict what people are going to do with spreadsheet formulas, and the Excel team actually tried this with the first version of SpreadsheetML (in Excel 2002). From what I heard, it didn’t work all too well, and oftentimes users’ formulas would start returning wrong values. So interoperability with the world’s biggest format won out here. All in all it does suck a bit for new applications.
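The off-by-one is easiest to see in a worked conversion. Below is a minimal Python sketch, assuming the 1900 date system in which serial 1 is 1 January 1900 and serial 60 is the phantom 29 February 1900; it handles whole-day serials only and ignores the fractional time-of-day part. The function names are hypothetical.

```python
from datetime import date, timedelta

# Serial 60 is the phantom 29 February 1900 inherited from Lotus 1-2-3.
PHANTOM_LEAP_DAY = 60


def serial_to_date(serial: int) -> date:
    """Convert a whole-day serial (1900 date system) to a real calendar date."""
    if serial < 1:
        raise ValueError("serials start at 1 (1900-01-01)")
    if serial == PHANTOM_LEAP_DAY:
        raise ValueError("serial 60 is 1900-02-29, which never existed")
    if serial < PHANTOM_LEAP_DAY:
        # Before the phantom day, serials are an honest day count.
        return date(1899, 12, 31) + timedelta(days=serial)
    # From 1 March 1900 on, every serial is one higher than the real day count.
    return date(1899, 12, 30) + timedelta(days=serial)


def date_to_serial(d: date) -> int:
    """Inverse of serial_to_date for dates from 1900-01-01 onwards."""
    days = (d - date(1899, 12, 31)).days
    if days < 1:
        raise ValueError("dates before 1900-01-01 have no serial")
    # Skip over the phantom leap day for anything on or after 1 March 1900.
    return days if d < date(1900, 3, 1) else days + 1


if __name__ == "__main__":
    for s in (59, 61, 25569):
        print(s, serial_to_date(s))
```

The two base dates are the whole story: before the phantom day serials count from 31 December 1899, and from 1 March 1900 onward they effectively count from 30 December 1899. A common shortcut is to always use 30 December 1899 and accept that January and February 1900 come out one day off.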
OOX does not follow ISO 639 "Codes for the Representation of Names of Languages."
Er, yes it does! The simple type everyone is referencing is never used directly in the spec. Instead it is always used in a union with a string simple type. And guess what the spec says the value of that string should be… A language code from the ISO standard, of course! The other simple type is just there as an alternative; the primary method is to use the ISO standard.
Similarly, sections 6.2.3.17 "Embedded Object Alternate Image Requests Types" (page 5679) and 6.4.3.1 "Clipboard Format Types" (page 5738) refer back to Windows Metafiles or Enhanced Metafiles
Either someone misread the OOX spec here, or they are just trying to throw in another ‘contradiction’ to foil the evil empire. OpenOffice uses metafiles for embedded objects, so I doubt anyone out there would really think it’s a crime to do so. The Open XML spec does not contain any requirement for EMF or WMF. The section people are talking about (6.2.3.17) can use any format. The allowed values for clipboard format types are "Bitmap", "Pict", "PictOld", "PictPrint", and "PictScreen". The spec then suggests some potential formats for those values, saying that "Pict" and "PictOld" could be mapped to the WMF and EMF formats. The spec could be more explicit about this, though.
Doug is evil
Well, this one is actually true. Doug is evil. He sits there with his big eyes in his lair, or should I say cave. He lives there and guards his precious Open XML. Recently Doug and I have been discussing our evilness and our intentions of hurting ODF people worldwide in defense of our precious. A lot of untrue things are being said about Open XML, especially now with the ISO process. One place containing such fallacies is Wikipedia. Recently Doug asked Rick Jelliffe to update the contents of Wikipedia. Not necessarily in Microsoft’s favor, but as Rick himself saw fit. He even mentioned making this public so people wouldn’t be fooled. Guess what? He was quickly struck by the open source claw. Microsoft is evil, Doug is evil, and all kinds of MS bashing followed. Funny, as Doug is the most unequivocally non-evil guy I know. I did enjoy creating the picture of him, though.
So all in all, I think the slander should stop about now. I see a sign of weakness in the string of posts people are putting out, and none of it poses any real threat to Office Open XML if people just dig a little deeper.