blog community


Wouter van Vugt

This blog is no longer maintained and has moved

Questions on Open XML

For those of you reading not only my blog but also the other content provided by the Info Support blog community, you have probably come across the great SharePoint content being created by my colleague Bart Gunneman. He is very knowledgeable on this and other Microsoft server technologies. While we focus on different things, we both seem to share a similar interest. Two days ago I received an email with some questions regarding Open XML. Bart is just stepping into this new document format and had some rather sharp questions based on what he was reading on the web. Because I imagine many of you out there have similar questions, I decided to translate his email and answer these questions online, available to all who are interested in Open XML and the war for 'openness' being waged by IBM.

You will find the questions in italics.

Wouter,

help me out here :)

I was reading up on the feedback received on the Open XML ISO process, and besides the fact that I keep running into that Package Explorer of yours on the official wiki page, the feedback that I see leaves me with some questions...

I am reading the following factsheet: http://www.odfalliance.org/resources/OfficeOpenXMLFactSheet.pdf

1. They talk about 6000 pages being too much information to handle during a Fast Track process: it would not be feasible to form an opinion within the 9 to 12 month period for a spec this size. I can see the point, but do you know whether 6000 pages is large compared to other specs that have gone through the Fast Track? And isn't that argument very belittling? If size were really a problem, shouldn't the ISO board have put a limit on the size of specs that go through a Fast Track?

[Wouter]I do not know whether this is large compared to other specs, but I do feel that if this had been a problem, ISO should have signaled it earlier.

The remark that I have real difficulty with is the statement that it would not be possible to review the spec in this time. Given the amount of feedback received from specific people, I think you can easily refute this. The spec has been given a thorough review and is getting better because of it. In my opinion the Fast Track is going great and should result in a better Open XML. In part, Open XML has the community to thank for the amount of time it has put into reviewing the spec. Many issues have been identified and made known to ISO, which allows the spec to move forward in interoperability and adoption.

One thing I am very much against is the notion that these issues are a breaking point for the Fast Track. On a spec this feature-rich, a thousand issues is not a big deal. Many can be fixed by a quick edit, repairing a typo or an unclear explanation. Some more in-depth technical issues are there as well, but these are far less common, and they too can be fixed during ballot resolution and later maintenance. Brian Jones has a nice explanation of the process, and I quote:

  • Purpose of Fast-Track – created by ISO as a way of allowing standards organizations to submit one of their existing standards for approval as an ISO standard. This can only happen with something that has already been reviewed and approved as a standard by that organization. When the organization decides to submit the standard, it starts in ISO/IEC at the DIS stage (Draft International Standard).  Ecma's Office Open XML formats are now officially DIS 29500.

There will always be issues to fix or features to add to make it a better standard (just as ODF is now adding formula support to its spreadsheets, moving ODF in the direction of enterprise readiness).

2. "One reason for the length and complexity of OOXML is its failure to reuse existing standards."

and then some jabbering about the perceived fact that other standards already solve these problems, and that having another one would make things inherently more difficult for developers. Isn't this nonsense? Since when is 'developer adoption' an argument for receiving ISO accreditation?

[Wouter]Yep, I agree, total BS. I hope I can speak for the generic 'developer', since I like to think I belong to that crowd. I have no issue with any of this. You buy a library that implements Open XML, ODF, PDF, HTML, or whatever. My 'generic developer' company is not going to lose focus on its core business. Personally I do not care about the internals of any of this, and I would use libraries to shield me from all of it. What developer goes to work thinking 'Great, today I get to find out how to make a paragraph bold in my document'? These are the things that make you lose focus on your core business, so instead of working it out yourself, you use a library and are done with it. Of course, developers do like to find out about this stuff, and do not mind reading a well-constructed spec.

Another note is that many of the 'standards' being mentioned are not standards in the sense you might think. They are W3C 'recommendations', which many other technologies do not reference either. A common example is PDF/A, which I believe defines its own language for vector graphics rather than using SVG. In the same way, DrawingML is in Open XML to provide more than SVG does, SVG being in effect a subset of it. Why would one standard be allowed to use its own format, and another not?

You will find another great example in the use of ISO dates. Many people are mad that Open XML uses a few date formats that are perceived to be incompatible with ISO 8601. Most of the people who feel this way probably do not have a clue what an ISO 8601 date is all about. Rick had a nice post on it a while ago; read that and be amazed at the level of the 'objections' to Open XML. Also often mentioned is the well-known date bug, where a date is off by one for a small subset of the dates that can be expressed in the format (note that this bug is turned off by default). Funny that this is there to avoid breaking the conversion of existing Microsoft Office documents, which had the same bug to avoid breaking the conversion of spreadsheets created by IBM's Lotus product, the same IBM that now objects to it (perhaps they should've created a better product back then). Imagine what this would do to documents containing formulas if the WEEKDAY function started returning different values than before.
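To make the date bug concrete: in the 1900 date system, serial number 1 is 1 January 1900, and serial 60 stands for 29 February 1900, a day that never existed. The Python sketch below is only an illustration under stated assumptions (the function names are hypothetical, not from the spec or any library, and WEEKDAY is modeled as 1 = Sunday through 7 = Saturday computed from the serial alone). It shows that keeping the phantom day only skews weekdays for January and February 1900, whereas dropping it and reinterpreting stored serials would move every date from 1 March 1900 onward by one day.

```python
from datetime import date, timedelta

# Hypothetical helpers for illustration only; not taken from the Open XML spec.
# 1900 date system: serial 1 = 1900-01-01, and serial 60 is treated as
# 1900-02-29, a day that does not exist (the Lotus-compatible "bug").
DAY_ZERO = date(1899, 12, 31)


def serial_to_date(serial: int) -> date:
    """Map a 1900-system serial number to a real calendar date."""
    if serial < 1:
        raise ValueError("the 1900 date system starts at serial 1")
    if serial == 60:
        raise ValueError("serial 60 is the phantom 1900-02-29")
    if serial > 60:
        serial -= 1  # step over the phantom leap day
    return DAY_ZERO + timedelta(days=serial)


def spreadsheet_weekday(serial: int) -> int:
    """Assumed WEEKDAY-style result (1 = Sunday ... 7 = Saturday) computed
    from the serial alone, as if 1900-01-01 had been a Sunday."""
    return (serial - 1) % 7 + 1


def true_weekday(serial: int) -> int:
    """Actual weekday of the date, in the same 1 = Sunday numbering."""
    return serial_to_date(serial).isoweekday() % 7 + 1


def reinterpreted_date(serial: int) -> date:
    """What a stored serial would mean if the phantom day were simply dropped."""
    return DAY_ZERO + timedelta(days=serial)


if __name__ == "__main__":
    for s in (1, 59, 61):
        print(s, serial_to_date(s), spreadsheet_weekday(s), true_weekday(s),
              reinterpreted_date(s))
```

For serial 1 the serial-based weekday says Sunday while 1 January 1900 was actually a Monday; from serial 61 on the two agree, but `reinterpreted_date` is one day later than `serial_to_date` for every such serial, which is the compatibility break the bug is there to avoid.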

If you were to think that Open XML does not use existing standards you would be mistaken. There are many in use, but those unsuited were left out of the game. One funny note is that ODF has a built-in dependency on Java.

"http://fsfeurope.org/documents/msooxml-questions.en.html"

3. Aren't all these questions answered by now? They were not unfamiliar to me, but that page was last changed on the 26th of June, only three weeks ago.

[Wouter]Yes, these questions are answered.

  1. Application Independence: Open XML is independent of any Microsoft product. Without a reference to a specific part of the spec where this would not be the case, it is hard to discuss this point. The 'wordspacinglikeitwasinawordversionfromyearsago' example is not a good point. It is not normative; it is there to support backward compatibility. Take a look at what OpenOffice and KOffice do with the generic 'config-item' container to support similar notions. That leaves the question of which is better: leaving it in, or removing it. Given the goal of Open XML to clearly support backward compatibility, as a developer I'd rather read about these settings in a spec than have to dig into documents created by other office suites to find them.
  2. Support other standards: Just discussed this; it is untrue and unnecessary. The notion of it being hard to implement is also nonsense. Who cares? Let's not do it because it is hard? Sounds like someone wants to receive the gold without putting in the work.
  3. Backwards compatibility for all vendors: The question asks whether anyone can build a converter tool from the old Microsoft binaries to Open XML. Well, duh! Of course not, and this is way, way, way outside the scope of Open XML. The point is that the spec supports the features found in the predominant office productivity suite in the world, which has a bazillion documents out there that it wants to be able to maintain and convert, because it feels its customers want and need this (and demand it as well). PS, this is also the very same office productivity suite that forms the basis of the Open XML markup specification, just as there is an office productivity suite that has its own markup language called ODF. Both are based on APPLICATION FEATURES!
  4. Proprietary extensions: The question asks whether Open XML allows proprietary extensions. Well, yeah, just like ODF. Is the Microsoft implementation faithful? Well, I hope so, for their business. If this is what governments and businesses demand, do you think Microsoft would simply deviate from the spec? Not good for business, I tell you! Are there safeguards against MS extending Open XML? Well, of course not; this is not a matter for the spec. Should we safeguard against IBM as well?
    And PS, who cares what Microsoft does. Microsoft brings an implementation of Open XML, the standard itself is an entirely different thing.
  5. Dual standards: There we go again. Open XML serves a different need, that of the enterprise. Can you imagine leaving support for accessibility or spreadsheet formulas out of the specification? That would serve an entirely different goal, which is not 'serving the enterprise'. That ODF does not aim to be an enterprise markup language is clear from its stance on including spreadsheet formulas in the spec. I can look up the email discussion which details this. Or perhaps Brian / Doug can find it for me; it's on one of their blogs somewhere.
  6. Legally safe: Done, discussed, and found to be in the same league with claims on ODF.

So there you have it, questions answered.

"http://www.robweir.com/blog/2007/07/ooxml-fails-to-gain-approval-in-us.html" (yep, there it is again :) )

4. Two remarks:

  • OK, Rob is pretty clearly against Open XML; his whole blog radiates it. I found it pretty funny to see that in his little graph 'NO' is depicted as the positive value, and 'YES' as the negative one. Had you noticed?
  • His statement that it is weird for sixteen committee members to join over the last month, all of whom will vote 'YES with comments', triggers my annoyance nerve. He has just about the entire ODF and open source community on his side... If he already saw this coming a month ago, couldn't he easily have added some 'ODF' people as well? Sounds like he has been watching a cow sitting on a railroad track for hours and is now mad that it got flattened :)

[Wouter]On the first one, no, I hadn't; funny thing! And yes! Rob does sound like he has been watching a cow sit on the railroad track. I think his little stories are especially belittling to the community and its technical level.

So, a reaction to the above please :)

[Wouter]Here you go. I hope it helps!

Published Sunday, July 22, 2007 1:50 PM by wouterv

Comments

 

Bart Gunneman said:

thanks for your time, Wouter :)
July 22, 2007 3:54 PM
 

Doug Mahugh said:

I think you've answered Bart's questions very well, but here are answers to a few more he might wonder about ...

One question, which Bart didn't ask but is sort of implied in his thoughts, is this one: how does the size of the Open XML spec compare to the size of the ODF spec? Both standards were created by other organizations (Ecma, OASIS) and then submitted to ISO/IEC for an accelerated standards process there (6 months in each case, Fast-track for Open XML and PAS for ODF). And many of the people who are criticizing the 6000-page spec like to point out that ODF has a 700-page spec, so one might wonder how they compare.

I think the apples-to-apples comparison between these two specs requires that you include all of the documentation required to actually handle a typical document, which may include charts, other graphics, equations, and so on. So the comparison of 700 pages and 6000 pages is meaningless, because those 700 pages make reference to other standards such as SVG, MathML, and so on, while the 6000 pages include those topics (2D vector graphics, equations) within them. The 6000 pages also include things like spreadsheet formulas, which aren't discussed in the 700 pages.

So if you compare the full size of everything you need in each case, I think you're at a few thousand pages or so, regardless of which format you're looking at.

Another question that has come up quite a bit lately is the difference between "Approval with comments" and "Disapproval with comments," the two positions many countries are considering for the upcoming ISO/IEC vote on September 2. Some of the anti-Open XML lobbyists are trying to convince people that "Approval with comments" is not appropriate unless the comments are trivial typos and grammatical errors. Otherwise, they claim, countries must vote "Disapproval with comments."

This claim is not supported by the facts, or the history of the ISO standards process. The JTC 1 directives clearly state that "Approval with comments" can include "editorial or other comments," but some folks try to claim that that means "editorial comments ONLY," which is an interesting interpretation to say the least. And the ODF spec, ISO/IEC 26300, was approved with significant technical comments by many countries, including the UK, so there's clear precedent for "Approval with comments" including a variety of comments.

So why are these people lobbying for "Disapproval with comments"? Because that gives them the option, later in the BRM (ballot resolution meeting), of arguing that a single comment wasn't addressed thoroughly, and therefore the countries that voted "Disapproval with comments" should not approve the standard. And when you consider that these are the same people who have written thousands of words about things like how dates are handled in January/February of 1900 (an "issue" that applies to no other dates before or after those two months), I think you can see why they'd like to have that option!

One final thought ... people like Wouter spend their time helping developers deliver results, and demonstrating how to actually do something constructive. They post code samples, contribute utilities and libraries to the community, and answer specific how-to questions from developers. Many others involved in this debate, however, spend all of their time and energy complaining, criticizing, and tearing others down, without ever contributing anything positive or results-oriented to the discussion. That's a distinction worth noting as you look around at who's saying what in the Open XML debate.

July 22, 2007 7:29 PM
 

Bart Gunneman said:

Thanks for your time as well, Doug :) The size of the ODF spec was on my mind when I wrote my question to Wouter, but I was also wondering about it in a more general sense. I really appreciate the time you both took in answering these questions. Oh, and I _TOTALLY_ agree with your last point; that's why I turned to Wouter in the first place :)
July 22, 2007 10:43 PM
 

Ben Langhinrichs said:

I am a software developer working with both IBM and Microsoft technologies and customers. I have been doing a lot of work with both ODF and OOXML, and while I tend to favor ODF for a number of reasons, I am very interested in OOXML being as usable as possible... because I am going to have to use it.

Anyway, I have a few comments on Wouter's answers, some of which seem reasonable and some of which seem quite disingenuous.

1) This answer seems weak to me. A spec which is much longer than most specs (perhaps all, but I am not an expert) approved by ISO generates lots of feedback, so that proves the whole thing has been carefully reviewed? That doesn't make any sense. The amount of feedback is hardly evidence of a complete review, just evidence that issues have been found and raised, which could happen on the first 100 pages of a spec.
2) This seems wrong on two counts. The first is that the 'generic developer' is going to have trouble finding a library which supports OOXML to any degree. I know, because I am one of those trying to develop libraries supporting both ODF and OOXML, and the difference is extreme. The second is that the reuse of standards allows the library developer to build on the work of previous library developers rather than reinvent everything from scratch. The non-use of other standards is a huge problem, and it should not be glossed over.

Incidentally, where is the dependence of ODF on Java? I can't find it anywhere, and I have implemented a very successful library for ODF without any use of Java at all (it is all C/C++ based).

And although it is a minor point, Wouter must know perfectly well that Lotus 1-2-3 had the date bug a long, long time before IBM bought Lotus, so it is hardly fair to complain that IBM should have done it right back then. That is just IBM bashing of the worst kind.

3) There is a mix of good and bad. For example, I think reasonable people could disagree on Application Independence, so he makes that point well. I have already touched on my feelings about reliance on other standards. Wouter is quite correct about backwards compatibility. He is technically correct about proprietary extensions, but glosses over how easily Microsoft could trump its own standard due to the sheer dominance of its market position. Wouter is partially right about the Dual Standards part, but only partially. The two standards are clearly designed for the same market and purpose, and ISO tradition is that you don't allow that, even if one happens to implement something slightly better than the other. I don't know enough about the patent issue to discuss it.

4) Rob is pro-ODF and anti-OOXML, without any doubt. So what? Brian Jones is pro-OOXML and anti-ODF, and I still read his blog too. Both can make some sensible arguments and both can get carried away, but both are valuable advocates for their positions.

Regardless, the second statement is completely disingenuous. Since the pro-ODF people chose not to try to stack a committee, they were stupid rather than ethical??? I'm sorry, but do you really want to be in on a race to the moral bottom here?

July 23, 2007 4:17 AM
 

Doug Mahugh said:

We're having TechReady here in Seattle this week, which is a great chance to catch up with colleagues
July 26, 2007 5:14 PM
 

Andre said:

a) "I do not know whether this is large compared to other specs and I do feel that if this would have been a problem than ISO should have signaled this earlier." As far as I can see, that is very large for an ISO Fast Track. Many standards bodies were provoked by the size of the documents.
b) Patent contamination: why doesn't Microsoft sign the same patent agreement as Sun?
c) Double standard: when you have a good international standard, why add a second one for the very same purpose?
August 8, 2007 3:28 PM
 

wouterv said:

Andre,

If you add up the W3C recommendations that ODF uses, you get to a similar size; it is all still there to review if someone wants to review the 'spec' for their document format. Just because it is spread across other documents doesn't decrease the total size.

Microsoft didn't sign the same patent agreement, but a very similar one. I guess they like to have their legal matters defined by themselves, which I can understand.

And on the last note: I do not feel that Open XML and ODF are the same merely because they are both document formats. They address different goals. Open XML targets the enterprise, while ODF in effect doesn't, given the glaring omissions from the specification.

Wouter

August 8, 2007 3:45 PM
 

Mitch 74 said:

May I speak up a bit?

- size: it was said that 6000 pages isn't too much since Microsoft redefined all the formats in the spec. As said, a new maths language, VML, and some other stuff. As also said, VML is DEPRECATED, and SVG can do the same, better. Converting VML to SVG is, nowadays, trivial. This is worse than reinventing the wheel; it's reinventing the inner tube in the age of steel-belted radials.

- redundancy: why reimplement VML? Same thing for maths: why use a new format when there is already MathML, which can handle all of Microsoft's stuff? And the excuse 'because some details may be missed in conversion' doesn't hold water: it's easier to update existing standards with fixes than to create a new standard from scratch. I think ODF proved it.

- redundancy (2): why use completely different formats for word processing, presentation and spreadsheets, when it is visibly possible to merge these into a single format? Why use different XML tag names that do the same thing across all formats? Consistency would require similar tags to have similar names (and not single-letter ones, either).

- enterprise readiness: spreadsheet formulas. Well, no two spreadsheets use the same language, and the leading one (Excel) had glaring inconsistencies and mathematical errors (in operator precedence, no less!). It required a standard definition, too. ODF made one which corrected most of these errors; Microsoft didn't (negative ceilings are incorrectly implemented in OOXML).

- enterprise readiness (2): macros. OOXML includes parts of Visual Basic for Applications, but it's incomplete. Moreover, the situation is unclear: why does Office 2008 for Mac not support VBA, then (it will use AppleScript)? It must mean that Office 2008 doesn't support OOXML. However, MS claims it does. If this is true, then OOXML doesn't define a macro language to be used with it. As such, ODF's stance of not defining a macro language finds an echo at Microsoft. Mixed messages?

- compatibility: why would one want to store 150 fixed styles of borders instead of describing borders, allowing not only these 150 borders to be retrieved and stored, but also user-defined borders to be set up?

- compatibility (2): how come attempts at retrieving archived Word 97 documents worked better when translated into ODF files than with Office 2007? ODF is supposed to not be compatible with legacy MS documents while OOXML should be, and yet the only known implementation of OOXML hangs on them, while one of the implementations of ODF handles them quite well?

- compatibility (3): why define hard-set tag prototypes (LineWrapLikeOffice97) instead of describing the behavior, so as to be able to store older settings in newer measures?

- compatibility (4): why use 4 different date formatting conventions in the same format? Now, if OOXML can fix those points before becoming an ISO standard, it would be gladly accepted. But then, what relevance would it retain versus ODF, which is already quite able to store most (if not all) past Word and WordPerfect, Lotus and dBase oddities?

August 22, 2007 4:51 PM
 

wouterv said:

Hi Mitch,

First of all, yes you may. I've even formatted your comment, since Community Server removes carriage returns. Here are some of my thoughts.

- Size: VML is not a replacement for SVG; it is a part of history. You should compare SVG and DrawingML. DrawingML is a superset of SVG as far as I am aware. VML is part of the spec not because of Microsoft, but because the businesses relying on legacy graphics thought it would be a good idea to have it in there with the rest of 'the stuff'. I agree that it might be better to move it to a non-normative annex (it isn't normative now either, btw).

- Redundancy: Dunno

- Redundancy (2): This is because of their different goals. For instance, text in a presentation should allow cool effects such as reflection, while this is rather useless for a text document. SpreadsheetML doesn't have a first-class notion of text; it uses cells. Hence the different models. Next, SpreadsheetML is optimized for loading speed, hence the short tag names without a namespace prefix.

- Enterprise readiness: I believe this will be fixed

- Enterprise readiness (2): Nope, Open XML does not include any VBA. It is not part of the spec in any way, shape or form. So the rest of your points are left dangling.

- Compatibility: Yes, it sucks. We addressed this in the Netherlands as well. This is part of the backwards-compat for the way Office works. It should include some extensibility mechanism. Will be addressed.

- Compatibility (2): Any factual backing? What I think is funny is that all the ODF apps use extensions for the ODF parts that are left undefined. Only Microsoft Office + converter will do pure ISO ODF. (And no, I have the same amount of factual backing at hand for this claim as you have for this comment :) )

- Compatibility (3): These have been added not by Microsoft but by Ecma to better support migrating existing documents. I believe they should be moved to a 'legacy' container in the markup that can also be used by other apps for their own legacy support issues. ODF saved from OpenOffice has an enormous number of these settings as well (their names are not even defined; Open XML wins here).

- Compatibility (4): I know of two formats for SpreadsheetML, one for legacy errors and the other the default. What sucks is that dates before 1900 are impossible. Now that is something that should be fixed.
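On the earlier point about SpreadsheetML using short tag names without a namespace prefix: the sketch below parses a small hand-written worksheet fragment with Python's standard ElementTree to show what that markup looks like. The fragment and the `cell_values` helper are hypothetical, it assumes every cell holds a plain number (no shared-string `t="s"` cells), and it does not measure loading speed; it only illustrates the terse, unprefixed element names (`row`, `c`, `v`) declared under a default namespace, whereas WordprocessingML as written by Word 2007 typically uses a `w:` prefix.

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace as used by Office 2007 files.
NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# Hypothetical fragment of a worksheet part. The namespace is the default,
# so elements carry no prefix, and names are terse: <c> = cell, <v> = value.
SHEET_XML = f"""<worksheet xmlns="{NS}">
  <sheetData>
    <row r="1"><c r="A1"><v>40</v></c><c r="B1"><v>2</v></c></row>
  </sheetData>
</worksheet>"""


def cell_values(xml_text: str) -> dict[str, float]:
    """Return {cell reference: numeric value} for every <c> with a <v> child.

    Assumes all cells are numeric; shared-string cells would need the
    sharedStrings part to resolve, which this sketch ignores.
    """
    root = ET.fromstring(xml_text)
    values = {}
    for cell in root.iter(f"{{{NS}}}c"):  # ElementTree's {namespace}name form
        v = cell.find(f"{{{NS}}}v")
        if v is not None:
            values[cell.get("r")] = float(v.text)
    return values


print(cell_values(SHEET_XML))
```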

And PS, ODF is not able to store a spreadsheet in any way: it knows nothing of spreadsheet formulas, and lacks support in other areas as well. There is government legislation requiring conversion to be 100%, hence ODF is useless as a document format in its current ISO form. One could even argue that it is quite useless overall, since all applications are making a mess of it. Furthermore, I feel that Open XML is far better structured, has a better package (ZIP) structure, allows greater extensibility, defines more, and in my opinion is better overall than ODF (which might be why everyone is criticising Open XML).

Thanks for your comments Mitch and feel free to reply to my statements. As long as we keep it technical and not emotional / political (I am prone to blurt out things), I am more than willing to post your comments.

August 22, 2007 7:22 PM