Office Open XML v. Acrobat PDF and Mars
New file formats are appearing in your most frequently used programs. Learn what has changed in Microsoft Office, Acrobat and Mars, including the benefits, the implications and how best to use the new formats.
Much has been said about the new interface for Microsoft Office 2007, but less well known is the fact that it also introduces completely renovated file formats for Word, Excel and PowerPoint called Office Open XML.
Adobe has also released version 8 of Acrobat Professional, Standard and Reader, and launched a completely new PDF format in the process. It also published a beta version of an XML format dubbed "Mars" which will be supported for Acrobat 8 and beyond. Currently, Mars does not support all of the features of an Acrobat 8 PDF file but it is expected to do so over time.
In the past, Microsoft's Word, Excel and PowerPoint formats have been the standard for text, spreadsheets and presentations, respectively, while Adobe's PDF has become the standard for graphical representations of documents.
Approvals from Standards Organizations
Both companies have submitted their formats to international standards organizations. Microsoft's Office Open XML has been approved by ECMA (European Computer Manufacturers Association). Microsoft has also given a "Covenant Not to Sue" to reassure third parties that they can build software applications around the Open XML format. Adobe is seeking approval from the better-known ISO (International Organization for Standardization) for its latest formats, and its PDF/A format has already been approved as an ISO standard for electronic document archival.
New File Extensions for Microsoft Documents but not for Adobe
Microsoft's new Word, Excel and PowerPoint files have new file extensions that identify files stored in the new format. The extension also differs depending on whether the file is macro-enabled, which makes files easier to manage under a firm's security policies.
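New Microsoft Office Open XML Extensions
Application    Standard file    Macro-enabled file
Word           .docx            .docm
Excel          .xlsx            .xlsm
PowerPoint     .pptx            .pptm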
Office 2007 will open and save files in both the old and new formats. For Office 2000, XP and 2003 you can download a free compatibility pack from Microsoft that allows you to open files in the new formats and save to them (see "Action Steps for Law Firms" sidebar).
Acrobat files still use the same .PDF extension for standard files, although the underlying format has changed. The XML-friendly Mars format uses the extension .mars.
What's in the Package?
The new file formats from both companies introduce the concept of a file as a "package." So let's look under the hood:
An Open XML file, such as a docx, is actually a package containing multiple files wrapped up in a ZIP file. If you change the docx extension of a Word document to .zip, for example, you can unzip the file and examine its "parts." Even a simple document has at least two folders that contain mostly XML files. Files for different parts of a document such as headers, body and footers are stored as files within separate folders and graphics or images are stored separately from the XML text (XML tags link the images to the correct location in an XML text file). This is a paradigm shift from Microsoft's previous proprietary binary format. Microsoft has published a document more than 1,000 pages long that details the full set of XML tags used for all markups of word processing, spreadsheet, presentation and chart files.
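To make the "package" idea concrete, here is a minimal Python sketch that lists the parts inside a ZIP-based package without renaming or unzipping anything by hand. The helper names are hypothetical, and the sample package built in memory is only a stand-in with a few typical part names; it is not a complete Word document.

```python
import io
import zipfile


def build_sample_package():
    """Build a tiny stand-in package in memory (not a complete Word document)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("_rels/.rels", "<Relationships/>")
        package.writestr("word/document.xml", "<document/>")
        package.writestr("word/media/image1.png", b"\x89PNG")
    return buffer.getvalue()


def list_parts(package_bytes):
    """Return the sorted names of the parts stored inside a package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return sorted(package.namelist())


parts = list_parts(build_sample_package())
for name in parts:
    print(name)
```

To try this on a real document, read the bytes of a .docx file from disk and pass them to list_parts in place of the sample package.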
The use of the ZIP container also means that files may be dramatically smaller than documents saved in the previous formats. Because an Open XML file is already compressed, adding it to another ZIP file or compressed folder will save little or no additional storage space.
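The effect of compressing data that is already compressed can be seen with a small experiment. This sketch uses Python's zlib module (the same DEFLATE method commonly used inside ZIP files) on made-up XML-like text; the word list and sizes are illustrative assumptions, not measurements of real Office documents.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
words = [
    "agreement", "party", "clause", "notice", "term", "lease", "shall", "hereby",
    "tenant", "landlord", "payment", "section", "date", "written", "consent", "breach",
]

# Made-up paragraphs wrapped in Word-style tags stand in for a document part.
paragraphs = []
for _ in range(400):
    sentence = " ".join(rng.choice(words) for _ in range(25))
    paragraphs.append("<w:p><w:r><w:t>" + sentence + "</w:t></w:r></w:p>")
xml_text = "".join(paragraphs).encode("utf-8")

compressed_once = zlib.compress(xml_text)
compressed_twice = zlib.compress(compressed_once)

print("original bytes:      ", len(xml_text))
print("compressed once:     ", len(compressed_once))
print("compressed a second time:", len(compressed_twice))
```

The first pass should shrink the text substantially, while the second pass should leave the size roughly unchanged, which is why zipping a folder of Open XML files saves little space.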
Adobe has dramatically revamped the structure of a PDF file. It has now become a "package" that can contain multiple documents, including Word, Excel, Visio, Publisher, Access, AutoCAD, Internet Explorer web pages and email from Outlook or Lotus Notes. Files can also be attached to a PDF in their native file format. These can be combined with scanned documents or documents printed from any Windows application to the PDF Distiller printer driver. PDF files can now be saved to Word format.
You will need Acrobat 8 Standard or Professional to create PDF packages. When you create a package, you can select groups of folders or files and they will automatically be converted into the PDF file. There is an option to compress the files included in a package, which, according to Adobe, means they will take up about half the space of the original documents. If you use compression, allow more time for the package to be created.
Interestingly, if you add a Microsoft Office Open XML file to an Adobe PDF file, you are inserting one package into another.
Moving to XML
As the name Open XML suggests, Microsoft's new file formats use XML (eXtensible Markup Language). The openness will give greater comfort to organizations, including law firms, which adopt the standard knowing that it will be adhered to by Microsoft and other vendors.
XML is essentially plain text that is marked up with tags that specify formatting or other information about the document. Thus headers, footers, bookmarks, fields, and all formatting attributes are implemented through tags.
The following lines show what the Open XML would look like for a heading in a Word document with the text "This is my first heading." and the style "Heading1":
<w:p>
  <w:pPr>
    <w:pStyle w:val="Heading1"/>
  </w:pPr>
  <w:r>
    <w:t>This is my first heading.</w:t>
  </w:r>
</w:p>
Obviously, the XML coding can get much messier, but this gives you an idea of what the XML tags look like. Notice that all text of the document is visible in the raw document format, which was not true of previous Office formats.
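Because the markup is plain text, a few lines of code can read it. The sketch below uses Python's standard XML parser to pull the style name and the text out of the heading paragraph shown above. One assumption is added: the snippet needs a declaration for the "w" prefix to be parsed on its own, so the WordprocessingML namespace is declared on the paragraph element here, whereas in a real document it is declared once on the root element.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "w" prefix in WordprocessingML.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

paragraph_xml = f"""<w:p xmlns:w="{W_NS}">
  <w:pPr>
    <w:pStyle w:val="Heading1"/>
  </w:pPr>
  <w:r>
    <w:t>This is my first heading.</w:t>
  </w:r>
</w:p>"""

paragraph = ET.fromstring(paragraph_xml)

# The style is an attribute on w:pStyle; attributes carry the namespace too.
style_element = paragraph.find("w:pPr/w:pStyle", NS)
style_name = style_element.get(f"{{{W_NS}}}val")

# A paragraph may contain several runs, so join every w:t element.
text = "".join(t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t"))

print(style_name, "->", text)
```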
Adobe Acrobat 8's standard PDF format is not XML-based, but the optional Mars format is. Both formats have been published and are open to use by third parties. You can download a free plug-in for Adobe Reader 8 that supports the Mars XML format.
This is the XML for a Mars PDF text field that prompts for the Country of Citizenship and defaults the answer to " USA":
<Text Widget="4122307792-13" UIName="tt text" Justification="Centered"
Name="Country of Citizenship" Flags="Required RadiosInUnison">
<DefaultValue> USA</DefaultValue>
</Text>
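The same approach works for the Mars field. This sketch parses only the fragment shown above and reads the field name, its flags and its default value; it assumes nothing about the rest of the Mars format, and a real Mars file would need to be located and opened first.

```python
import xml.etree.ElementTree as ET

field_xml = """<Text Widget="4122307792-13" UIName="tt text" Justification="Centered"
Name="Country of Citizenship" Flags="Required RadiosInUnison">
<DefaultValue> USA</DefaultValue>
</Text>"""

field = ET.fromstring(field_xml)

name = field.get("Name")
flags = field.get("Flags", "").split()  # space-separated in the sample
default_value = (field.findtext("DefaultValue") or "").strip()

print(name, flags, default_value)
```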
Metadata and e-Discovery Implications
Although the file formats have changed, both formats can hold metadata (e.g. comments, annotations, bookmarks). The new Open XML format allows for more efficient and transparent access to the metadata since all metadata will be stored in the file surrounded by XML tags rather than in a binary format. But if you are using metadata removal tools, you will need to make sure that you obtain the latest versions that work with Open XML or the latest PDF formats. You should also ensure that e-Discovery tools you use have been updated to be able to search or index information within Open XML files.
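As an illustration of how transparent the metadata becomes, the following sketch looks inside a package for two parts that commonly carry metadata in Word's Open XML files: docProps/core.xml (author and last-editor properties) and word/comments.xml (reviewer comments). The part names are the conventional ones, the sample package and the function name report_metadata are made up for the example, and a real metadata tool would need to check many more locations.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
}

CORE_XML = (
    f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
    "<dc:creator>A. Author</dc:creator>"
    "<cp:lastModifiedBy>B. Reviewer</cp:lastModifiedBy>"
    "</cp:coreProperties>"
)
COMMENTS_XML = (
    f'<w:comments xmlns:w="{NS["w"]}">'
    '<w:comment w:id="0" w:author="B. Reviewer">'
    "<w:p><w:r><w:t>Check this clause.</w:t></w:r></w:p>"
    "</w:comment></w:comments>"
)


def build_sample_package():
    """Build a stand-in package holding only the two metadata parts."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("docProps/core.xml", CORE_XML)
        package.writestr("word/comments.xml", COMMENTS_XML)
    return buffer.getvalue()


def report_metadata(package_bytes):
    """Collect a few metadata items from the conventional part locations."""
    findings = {}
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        names = set(package.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(package.read("docProps/core.xml"))
            findings["creator"] = core.findtext("dc:creator", default="", namespaces=NS)
            findings["last_modified_by"] = core.findtext(
                "cp:lastModifiedBy", default="", namespaces=NS
            )
        if "word/comments.xml" in names:
            comments = ET.fromstring(package.read("word/comments.xml"))
            findings["comment_count"] = len(comments.findall("w:comment", NS))
    return findings


findings = report_metadata(build_sample_package())
print(findings)
```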
Acrobat has been a good solution for minimizing metadata because printing a Word document to PDF would send just the printable image of the document while stripping any metadata. Now that the new PDF format can optionally include files in their native format, metadata remains an issue. Metadata removal tools should be used to clean files before they are added to a PDF package. Adobe has incorporated metadata removal and redaction capabilities in Acrobat 8 Professional for the PDF portions of the file.
Choosing Which Formats to Use and When
The choice of file format depends on what type of file you are using and where it is in its life cycle. While you are drafting and continually revising a Word document, for example, you may wish to keep it in docx format. When you are ready to send it to a client or opposing counsel, you may wish to save it to .doc format if you are not sure whether they can open docx files. Alternatively, you could put it into a PDF package (a nice feature of Acrobat 8 is that the PDF package can contain multiple documents for a transaction). This will preserve the appearance of your document, while allowing users (with the free Adobe Reader 8) to view and annotate documents and return the PDF package to you. On the other hand, if you want others to make further edits or changes to a Word document, perhaps using Track Changes, then sending them the Open XML version of the document may be preferable if you know they have an Open XML-compliant version of Word.
Once a document is executed, you may wish to scan or print a version of it to a PDF file. You can also use Acrobat to "sign" documents electronically.
If it is your practice to maintain folders of files for each client or transaction, once the file is closed, you may wish to archive all files for that client into a single Adobe PDF file. This would allow you to delete a multiplicity of files on your hard drive while also reducing storage space. You can later print any pages from any document or save back to Word format. Archiving to another location or off to DVD is also made easier by this technique and it is easier to restore or retrieve files later. You can think of an Acrobat 8 PDF file as an electronic file "brad." As documents are finalized and either sent, executed or filed, they can be added to the PDF package in chronological order.
If you are using document management software you will also need to save the document in formats that are recognized by your DMS.
Document Assembly Possibilities
Microsoft has previously left sophisticated "document assembly" to third party software like HotDocs while providing merge and macro features within Word. Microsoft now promotes document assembly as one of the potential benefits of the new XML file format. Programmers and third party vendors, equipped with information about the XML structure of a Word document, will be able to manipulate the text far more easily than was possible with the previous binary format. Server-based assembly of documents will be feasible without the need to have a copy of Word running on the server. The potential to perform automated assembly of spreadsheets and PowerPoint files has also increased dramatically because the underlying XML components of the documents are now in plain text. Rather than using an Excel macro to manipulate a spreadsheet, a third party program could assemble chunks of XML and save them as a spreadsheet file that can be opened in Excel.
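Here is a rough sketch of what assembly without Word could look like. It writes the three parts generally regarded as the minimum for a word processing package (a content types list, a root relationships file and the main document part) into a ZIP container using only Python's standard library. The function name assemble_docx and the sample text are invented for the example, the result has not been checked against every version of Word, and a production tool would add styles, headers and much more.

```python
import io
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)


def assemble_docx(paragraphs):
    """Return the bytes of a package with one plain paragraph per input string."""
    body = "".join(
        '<w:p><w:r><w:t xml:space="preserve">' + escape(text) + "</w:t></w:r></w:p>"
        for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        package.writestr("word/document.xml", document)
    return buffer.getvalue()


docx_bytes = assemble_docx(["Engagement Letter", "Dear Smith & Jones:"])
print(len(docx_bytes), "bytes assembled")
```

Writing docx_bytes to a file with a .docx extension gives you something to open and inspect. Note that the ampersand is escaped before it goes into the XML; forgetting to escape text is a common way to produce a file that will not open.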
If you have an existing document assembly program you'll need to upgrade to the latest version in order to use it with Word 2007 Open XML formats.
Adobe Acrobat also has markedly improved document filling features. With Acrobat 8 Professional, you can import or scan forms into PDF format and let Acrobat create fill-able fields automatically. You can then distribute the form, let users fill it in and send it back. Then you can extract the information in XML, which can be stored to a database, re-used in other Acrobat forms or used to fill other non-Adobe documents. Adobe forms can be used as web-based interviews to obtain information from clients.
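The step from extracted XML to a database can be sketched in a few lines. The XML layout below is purely hypothetical (it is not the format Acrobat actually exports), as are the table and field names; the point is only that tagged form data maps naturally onto database rows.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical export layout; a real Acrobat export will look different.
form_xml = """<form>
  <field name="ClientName">Jane Doe</field>
  <field name="CountryOfCitizenship">USA</field>
</form>"""


def load_form_fields(xml_text):
    """Map each field name to its trimmed value."""
    root = ET.fromstring(xml_text)
    return {f.get("name"): (f.text or "").strip() for f in root.findall("field")}


fields = load_form_fields(form_xml)

# An in-memory database keeps the example self-contained.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE intake (field_name TEXT PRIMARY KEY, field_value TEXT)"
)
connection.executemany("INSERT INTO intake VALUES (?, ?)", fields.items())
connection.commit()

rows = connection.execute(
    "SELECT field_name, field_value FROM intake ORDER BY field_name"
).fetchall()
for row in rows:
    print(row)
```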
In the past, it has been possible to fill fields in a PDF form, but it has been very difficult to conditionally include or exclude pages or portions of pages of a PDF file. One solution had been to save multiple versions of a PDF form with the varying page configurations and then choose which one to fill. Now it will be possible to manipulate the XML text of a Mars PDF file so as to conditionally include any section of a PDF page and create resulting PDF packages that contain only the pages or content that you want. It will be interesting to see how soon third party vendors pick up on this capability.
Improved Integration with Case Management Software
The adoption of XML by both Microsoft and Adobe offers the potential of better integration with case management software because vendors will be able to more easily mine information from XML tags within documents and populate fields in databases. Similarly, information from databases can be supplied via XML to fill information into documents. Since XML was developed as a means to store database information within documents and Microsoft and Adobe have now adopted XML standards for storage of their documents, the distinction between case management databases and documents will become blurred. Documents can be used to gather information to populate the case management database and database information will flow more easily into documents and reports.
Better Document Management Software Integration
The new file formats also offer significant benefits for integration with document management software. A DMS will be able to harvest document profile information more easily from the XML tags in the files. Full-text search features can be better optimized because they are searching plain text files with XML tags that help categorize and sort information.
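To show why tagged plain text helps full-text search, this sketch extracts the text of each paragraph from a small piece of Word-style XML and builds a simple word index. The sample markup and the function names are made up for illustration; in a real file this XML would be read from the main document part inside the package, and a real search engine does far more than this.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

document_xml = f"""<w:document xmlns:w="{W_NS}"><w:body>
<w:p><w:r><w:t>Lease </w:t></w:r><w:r><w:t>agreement</w:t></w:r></w:p>
<w:p><w:r><w:t>Notice of termination of lease</w:t></w:r></w:p>
</w:body></w:document>"""


def paragraph_texts(xml_text):
    """Return one string per paragraph, joining the text of all its runs."""
    root = ET.fromstring(xml_text)
    texts = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        texts.append("".join(t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t")))
    return texts


def build_index(texts):
    """Map each lowercase word to the set of paragraph numbers containing it."""
    index = defaultdict(set)
    for number, text in enumerate(texts):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(number)
    return index


texts = paragraph_texts(document_xml)
index = build_index(texts)
print(texts)
print(sorted(index["lease"]))
```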
Action Steps for Law Firms
- Learn about the new formats and determine which ones should be implemented at your firm.
- If you have already installed Office 2007, you have a choice to use the new Office Open XML format, or save to previous Office formats. Remember that your clients or others with whom you exchange documents might not have Office 2007 or the capability to open the new document formats, so you may need to save to the old formats before sending them to others.
- Decide whether to convert old Office documents to Open XML format. Microsoft has released a bulk conversion tool for converting your old documents as part of their 2007 Microsoft Office System Migration Guidance.
- If you have not upgraded to Office 2007, you may receive documents in the new format from clients or other firms, so download the free software for Office 2000, XP or 2003 that lets you open or save to Office Open XML format.
- Upgrade to Acrobat 8 Standard, Professional or Reader if you want to work with the new PDF file formats. Lawyers should seriously consider Acrobat 8 Professional because of the features for metadata removal, redaction and Bates numbering.
- If you want to use the optional XML-friendly Mars format, download the free Mars plug-in for Adobe Reader 8.
- If you are going to use or implement Office Open XML or Acrobat 8 PDF documents, ensure that your document management, full-text search, document comparison, case management and document assembly software can all work with the new file formats. Since the formats are so dramatically different, you will likely have to obtain upgrades from your vendors.
- Consider the benefits of standardizing your other documents and processes on XML, since XML will become more pervasive in software applications.
- Consider using Acrobat 8 Professional to create fill-able forms to gather information from clients or users within the firm.
- Develop strategies for archival of documents. Consider the possibility of using Acrobat 8 PDF files like a file brad to store all documents relevant to a file, thereby saving storage space and reducing the number of files.
About the Author
Doug Simpson is president of BackDraft Systems and consults to law firms on document assembly and legal technology.












