Custom XML and the demise of Office 2007 as we know it
Posted on December 23rd, 2009 at 12:37
There’s a lot of misinformation about this in the press, so let’s start with the basics.
You know about markup languages, yes? In its most basic form, a markup language lets you turn plain text into fancy text. For example, if you want the word Mxyzptlk to appear in bold italic, a markup language might understand something like:
< bold > < italic > Mxyzptlk < /italic > < /bold >
and display Mxyzptlk the way you want. Those thingies inside the < brackets > are called tags.
(If you’re an old WordPerfect user, you might remember a feature called “Reveal Codes.” In many ways, WordPerfect’s reveal codes are just a particular kind of markup language. When Microsoft introduced Word 1.0, it determined that Reveal Codes were harmful and hateful and fattening, and banished them from Word. Much wailing and gnashing of teeth emanated from the WordPerfect camp. But the worm has turned.)
XML, the eXtensible Markup Language, goes one step further and lets you define your own tags. So for example, you could create a formulation like this:
< bit > blah < /bit > == < bold > < italic > blah < /italic > < /bold >
and the new tag < bit > suddenly takes on meaning.
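To make that concrete, here's a rough sketch in Python of a program that gives < bit > its meaning. The EXPANSIONS table and the expand function are made up for illustration; XML itself only lets you invent the tag, and it's up to some program to decide what the tag does.

```python
import xml.etree.ElementTree as ET

# Hypothetical definition table: <bit> stands for bold + italic.
EXPANSIONS = {"bit": ["bold", "italic"]}


def expand(elem: ET.Element) -> ET.Element:
    """Return a copy of elem with custom tags replaced by their nested definition.

    Attributes are ignored to keep the sketch short.
    """
    chain = EXPANSIONS.get(elem.tag, [elem.tag])
    outer = ET.Element(chain[0])
    inner = outer
    # Nest the remaining tags of the definition inside the first one.
    for tag in chain[1:]:
        inner = ET.SubElement(inner, tag)
    inner.text = elem.text
    for child in elem:
        inner.append(expand(child))
    outer.tail = elem.tail
    return outer


def expand_text(xml_text: str) -> str:
    """Parse a snippet, expand the custom tags, and serialize it again."""
    return ET.tostring(expand(ET.fromstring(xml_text)), encoding="unicode")
```

Feed expand_text the string "<bit>blah</bit>" and you get the bold-italic nesting back; tags that aren't in the table pass through untouched.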
In Office 2007, Microsoft introduced a new set of file formats based on XML. The .docx, .docm, .xlsx, .pptx and other formats you’ve probably sworn at, embody Microsoft’s attempt to move from a document file format that absolutely nobody could understand, to one that’s at least somewhat less inscrutable. If you crack open an Office XML file, you find that – to a first approximation anyway, and with a few if’s and but’s – it consists of a bunch of zipped text files, and a little bit of glue that holds the zipped text together. If you save a PowerPoint presentation in .pptx format, for example, each slide becomes its own zipped text file inside the pptx file.
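If you'd like to see the "zipped text files" idea in action without touching a real document, here's a minimal Python sketch. It builds a toy package in memory, laid out roughly like a .docx, and then lists and reads the pieces. The part names mirror ones you'd typically find in a real file, but the contents are placeholders and the result is not a valid Word document.

```python
import io
import zipfile


def build_toy_package() -> bytes:
    """Build an in-memory zip laid out roughly like a .docx (toy content only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("_rels/.rels", "<Relationships/>")
        zf.writestr("word/document.xml", "<document>Mxyzptlk</document>")
    return buf.getvalue()


def list_parts(package: bytes) -> list:
    """Return the names of the text files zipped inside the package."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.namelist()


def read_part(package: bytes, name: str) -> str:
    """Unzip one part and return it as text."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read(name).decode("utf-8")
```

To poke at a real file instead, read its bytes from disk and pass them to list_parts in place of the toy package.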
With me so far?
Now for Custom XML. You can create your own, custom XML tags and stick them inside one of the new Office 2007 files. Not many people have the insane desire to write custom XML, but programmers (who may or may not have insane desires) use them on occasion. One example that Microsoft gives is for PowerPoint: if your company has a gazillion PowerPoint slides, you could write a program that scans the slides and sticks data inside custom XML tags that describes the slides. The data would be stored in the .pptx file, so it travels wherever the slides go. You could then write another program that asks a lowly human for his or her preferences, then scans all the slides in a particular slide dump, and assembles a new presentation based on whatever criteria the human had the temerity to give. The Really Neat Thing about PowerPoint Custom XML tags is that the data can be associated with a specific slide: the Custom XML contents get stored in a zipped file inside the pptx file, but the glue that holds the presentation together creates links between the Custom XML zipped file and the zipped file that holds the individual slide. Thus, the programmer can reach into the presentation and gather slides like daisies in May and – this part is important – the program never has to use PowerPoint itself. The bloat and overhead that come with dealing with PowerPoint never rear their ugly heads.
So now you understand why Custom XML can be important, especially in big companies, and why mere mortals rarely use it. You can probably also see that there has to be a way for the glue inside the pptx file to bring together the file itself and the Custom XML data.
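Here's a hedged sketch of how a program might follow that glue from a slide to its Custom XML without ever starting PowerPoint. It works on a toy package built in memory. The layout (a _rels file sitting next to the slide, with a Relationship entry pointing at customXml/item1.xml) is modeled on how Office Open XML packages are generally organized, but the slideInfo and topic tags and all of the contents are invented for illustration.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the relationship ("glue") files in the package.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
CUSTOM_XML_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml"
)


def build_toy_deck() -> bytes:
    """Build a toy, in-memory stand-in for a .pptx (not a valid presentation)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("ppt/slides/slide1.xml", "<slide>Quarterly results</slide>")
        zf.writestr(
            "ppt/slides/_rels/slide1.xml.rels",
            '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
            f'<Relationship Id="rId1" Type="{CUSTOM_XML_TYPE}" '
            'Target="../../customXml/item1.xml"/>'
            "</Relationships>",
        )
        # Hypothetical custom tags describing the slide.
        zf.writestr("customXml/item1.xml", "<slideInfo><topic>finance</topic></slideInfo>")
    return buf.getvalue()


def custom_xml_for_slide(package: bytes, slide: str) -> list:
    """Follow the slide's relationship file to the custom XML parts linked to it."""
    folder, name = posixpath.split(slide)
    rels_path = posixpath.join(folder, "_rels", name + ".rels")
    found = []
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        rels = ET.fromstring(zf.read(rels_path))
        for rel in rels.iter(REL_NS + "Relationship"):
            if rel.get("Type") == CUSTOM_XML_TYPE:
                # Targets are relative to the folder that holds the slide.
                target = posixpath.normpath(posixpath.join(folder, rel.get("Target")))
                found.append(zf.read(target).decode("utf-8"))
    return found


def topics_for_slide(package: bytes, slide: str) -> list:
    """Pull the (hypothetical) <topic> values out of the slide's custom XML."""
    return [
        ET.fromstring(xml_text).findtext("topic")
        for xml_text in custom_xml_for_slide(package, slide)
    ]
```

A slide-picking program could run topics_for_slide over every slide in a pile of decks and keep the ones that match whatever the human asked for, using nothing but a zip reader and an XML parser.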
Back to the headlines. Back in June 1994 (!), a little company in Toronto, Ontario (in Canada, eh?) applied for a US patent on a specific method for making the glue that binds parts of the documents and add-on files. Ends up that the method they invented is very close to the method Microsoft uses to bind pieces of Office 2007 documents and their embedded Custom XML zipped files. On May 20, 2009, a federal jury in Tyler, Texas, found Microsoft guilty of violating the i4i patent, and ordered Microsoft to pay i4i $200 million. Microsoft appealed. On August 11, Judge Leonard (no relation) Davis, citing Microsoft’s lawyers’ hijinx, slapped another $40 million onto the judgment for willful infringement, and added $37 million in pre-judgment interest. Microsoft appealed, and lost its appeal yesterday.
Microsoft’s press release gives a very succinct and (far as I can tell) accurate assessment of the situation:
This injunction applies only to copies of Microsoft Word 2007 and Microsoft Office 2007 sold in the U.S. on or after the injunction date of January 11, 2010. Copies of these products sold before this date are not affected.
I’ve been searching up and down, and can’t find out why the injunction specifically applies to Word 2007, without also bringing down the wrath of the Court on Excel 2007 and PowerPoint 2007. My conjecture – and it’s only a conjecture – is that the case was so difficult, technically, that the i4i attorneys didn’t try to cloud the issue with the other products.
Microsoft’s been preparing for this eventuality for a long time. For example, companies that put together PCs with Office 2007 pre-installed have been installing versions of Office 2007 without Custom XML since October, per this advisory. (Thanks, Susan!) As of a couple of minutes ago, I can’t get through to that page. It’s possible that Microsoft took it down. If you can’t get to it either, here’s what it says:
Microsoft has released a supplement for Office 2007 (October 2009). The following patch is *required* for the United States. /The patch will work with all Office 2007 languages/.
After this patch is installed, Word will no longer read the Custom XML elements contained within DOCX, DOCM, or XML files. These files will continue to open, but any Custom XML elements will be removed. The ability to handle custom XML markup is typically used in association with automated server based processing of Word documents. Custom XML is not typically used by most end users of Word.
Note that this patch is only for OEMs – the companies that put together new PCs. It doesn’t affect any customers, like you and me.
Several of you have asked what I think will happen next. Obviously, Microsoft’s attorneys are burning the midnight oil, trying to reverse the Federal Circuit Court of Appeals decision – but at this point their options are very limited: getting the Fed Court of Appeals to re-hear the case seems very unlikely, and the Supreme Court looks to me like an even longer shot.
Will i4i go back and try to get damages for copies of Word and Office sold prior to January 11? Hell, if I was in their shoes, I would try. Apparently the Custom XML Schema technology in Word 2003 may infringe on the patent, as well. And if Word 2007’s a dirty patent-buster, Excel 2007 and PowerPoint 2007 must be in the same pigpen.
I think it’s highly unlikely that Microsoft will cut a deal with i4i – which they obviously should’ve done from the get-go. I also don’t think that the Redmondians will have a sudden change of heart, decide that they shouldn’t have violated the patent in the first place, apologize, and compensate i4i. Naw. Never happen. Too many Microsoft lawyers making too much money off this one.
Funny. Sometimes the American legal system actually works.
7 responses to “Custom XML and the demise of Office 2007 as we know it”
-
rc primak December 23rd, 2009 at 15:57
Woody —
1) Is this the same case in which an injunction was issued, forbidding all sales of Word 2007 going forward (since vacated by the Courts), or was that part of a different patent case? Seems Microsoft should have read the handwriting on the wall as of that moment, eh?
2) I presume that OpenOffice.org does not use the type of XML meta-data which Microsoft uses. So are we who use OpenOffice in the clear on this point?
3) Chalk up another Microsoft boondoggle, making their products so bloated (er, feature-rich) and Byzantine as to invite this kind of patent-sniping. While I don’t usually advocate for businesses to develop their own in-house solutions, it appears that this type of programming might have best been left to third-party or in-house programmers, thus saving Microsoft from all of this litigation in the first place. But that is not the Microsoft way now, is it? Never has been, in my twenty or more years of using Microsoft products, going back to Office 95 in the Windows 95 OS.
-
RC -
I think it’s part of a different case. At least as of five minutes ago, the injunction was not vacated.
From what I’ve read, OpenOffice.org DOES, indeed, use similar technology, as do several other companies. i4i has fired a shot across the bow of several of them. Whether i4i has the grace and common sense to grant OO a free, permanent license remains to be seen. I consider it something of a litmus test.
-
Mark C December 23rd, 2009 at 22:17
woody
Here are links to Aug. 8, 2009 original injunction which affects word 2003 and word 2007. Only two pages and to the point. In Google search of “Microsoft injunction” you get
Microsoft Injunction
Aug 11, 2009 … Case 6:07-cv-00113-LED Document 413 Filed 08/11/2009 Page 1 of 2 IN THE UNITED STATES DISTRICT COURT FOR THE EASTERN DISTRICT OF TEXAS TYLER …
http://www.docstoc.com/docs/9698189/Microsoft-Injunction
Which leads you to:
-
Mark C December 23rd, 2009 at 22:30
Woody
Link to Dec. 22, 2009 Court of Appeals full decision (pdf 49 pages and loads very fast)
-
So how does this impact the Word 2007 Content Control binding feature?
You can store a custom XML file inside of the DOCX package, then insert a Content Control into the Word doc that points at an element inside of the custom XML file.
Is this impacted by the patent? Or is it only when custom XML is stored along with the OOXML (like in your example above)?
-
Brian -
I don’t know. More to the point, I don’t know if MS is going to do away with Content Controls in Word 2007. There’s precious little info about the whole thing online…
-
Our business analysts are dead set on using MS Word to define the layouts of data collection UIs that are much like survey forms. They are simple from a functionality standpoint, but very large and might include lots of text formatting. We were hoping to program something that would churn through these Word documents and do some sort of code generation for the layout, and then we could wire up the textboxes/radio buttons/etc.
My hope was that I could actually embed custom tags into the Word doc itself to designate data entry controls so that if we later have to scrape the Word document again due to some changes in text, then the data entry controls can also be generated from the custom tags. So I don’t need Word to support the tags, but if it’s going to remove the tags, then it’s a big problem and defeats the purpose. So if I understand correctly, what I want to do is no longer possible because my business analysts (the people editing the Word doc that defines the layout of the form) might have the “crippled” version of Word?