The output of the Wd2XML macro can be saved to an XML file and styled with the perspective XSLT stylesheets.
“PERSPECTIVE” is a macro and stylesheet distribution that allows you to publish structured Word documents to the web and to print. Wd2XML, the embedded Microsoft Word macro, produces XML conforming to the Perspective Document Type Definition (DTD). The stylesheets in this distribution transform that XML to HTML, to PDF (via XSLT to XSL Formatting Objects) and to DocBook, a widely used XML standard. This page and the accompanying PDF were produced using this software.
This package is still necessary in an era of XML file formats: currently there is no other way to generate semantic markup from a styled document easily and automatically. Semantic markup matters because it gives a clear separation between style and content, so the same content can be used to generate layouts in many media. Other solutions, based on wiki-style markup of plain text, do not properly take document meta information into account, and they certainly do not handle bibliographic data. This package handles both, and generates compact, semantically meaningful XML that can be transformed to any other schema. For example, it produces XML to the DocBook schema, and DocBook can in turn be transformed into present and future formats such as EPUB (using the most recent versions of the DocBook XSL stylesheets), the format used by the Sony Reader and Adobe Digital Editions.
The perspective-schema DTD allows the creation of articles with hyperlinks, footnote entries, itemised lists, tables, image inclusions and citations, as well as certain inline elements such as superscripts, bold, underline and italic. It also handles proper quotation marks and a useful subset of foreign characters and currency symbols. The XSLT stylesheet will insert a drop cap at the start of the text body, process footnotes to provide forward and backward links, and create author initials in bibliographic entries.
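The author-initials step can be pictured with a short Python function. This is a sketch of the idea only, not a port of the distribution's XSLT; the "Surname, I. M." entry format and the helper names are assumptions chosen for illustration.

```python
def author_initials(forenames: str) -> str:
    """Reduce forenames to initials, e.g. 'Jane Mary' -> 'J. M.'.

    Assumed style: first letter of each forename, upper-cased, followed by a
    full stop, separated by single spaces.
    """
    return " ".join(name[0].upper() + "." for name in forenames.split())


def citation_name(forenames: str, surname: str) -> str:
    # Hypothetical bibliography format "Surname, I. M." (illustration only).
    return f"{surname}, {author_initials(forenames)}"


print(citation_name("Jane Mary", "Doe"))  # Doe, J. M.
```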
Also bundled in the source folder is hrefUtilities, a group of routines that manipulate Word hyperlinks. Word's hyperlink dialogs can be cumbersome, so this bundle lets you type hyperlinks in the wiki format [[URL][link text]] and then converts the text to embedded hyperlinks automatically. A help file in the same folder documents this more fully.
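The pattern matching behind this conversion can be approximated in Python. hrefUtilities itself works on hyperlinks inside Word, so this sketch (with a hypothetical `wiki_links_to_html` function) only illustrates recognising the [[URL][link text]] form, emitting an HTML anchor as a stand-in for an embedded Word hyperlink.

```python
import re
from html import escape

# Wiki-style link [[URL][link text]]; neither part may contain "]".
WIKI_LINK = re.compile(r"\[\[([^\]]+)\]\[([^\]]+)\]\]")


def wiki_links_to_html(text: str) -> str:
    """Replace each [[URL][link text]] with an HTML anchor element."""
    def to_anchor(match):
        url, label = match.group(1), match.group(2)
        # Escape &, <, > (and quotes in the URL) so the anchor stays well-formed.
        return f'<a href="{escape(url, quote=True)}">{escape(label, quote=False)}</a>'
    return WIKI_LINK.sub(to_anchor, text)


print(wiki_links_to_html("See [[http://example.com/][the example]] for details."))
# See <a href="http://example.com/">the example</a> for details.
```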
Perspective provides a DTD for a reduced markup language:
System identifier (URL): http://www.e-conomist.fsnet.co.uk/xml/dtd/perspective-schema.dtd
Public identifier: PUBLIC "-//CTIPPER//DTD perspective XML V2.0//EN"
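A document that declares the Perspective DTD would begin with a DOCTYPE combining the public identifier with the DTD URL as its system identifier. The Python sketch below builds such a prolog and parses it with the standard library's minidom to read the identifiers back. The root element name "article" is an assumption, so check perspective-schema.dtd for the real document element.

```python
from xml.dom import minidom

PUBLIC_ID = "-//CTIPPER//DTD perspective XML V2.0//EN"
SYSTEM_ID = "http://www.e-conomist.fsnet.co.uk/xml/dtd/perspective-schema.dtd"


def perspective_prolog(root: str = "article") -> str:
    """Return an XML declaration plus a DOCTYPE naming the Perspective DTD.

    The default root element "article" is an assumption; check the DTD.
    """
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE {root} PUBLIC "{PUBLIC_ID}"\n'
        f'  "{SYSTEM_ID}">\n'
    )


# By default minidom's expat parser records the identifiers but does not
# fetch the external DTD.
doc = minidom.parseString(perspective_prolog() + "<article/>")
print(doc.doctype.publicId)  # -//CTIPPER//DTD perspective XML V2.0//EN
```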
Download: perspective-2.1.9.zip (265 KB)
Wd2XML is a Word macro package that generates a subset of my ‘perspective’ Document Type Definition. Wd2XML assumes that images, tables and bibliographies will be marked up manually, though the markup for these elements is easy to learn. This is a pragmatic decision based on the costs and benefits of implementing these features.
A typical workflow is as follows:
1. Create a Word document based on the perspective.dot template.
2. Open the Tools | Macro | Macros dialog, select Wd2XML.main and click ‘Run’.
3. Save the output to the xml subdirectory as a text file, adding the .xml extension manually.
4. Output XHTML using stylesheets/html-perspective.xsl.
5. Output PDF using stylesheets/fo-perspective.xsl and an FO renderer.
6. Output DocBook XML using stylesheets/dbk-perspective.xsl.
NOTE: The Word template does not handle the img, table, bibliography or revhistory elements. To use these features of the stylesheets, it helps to study perspective-schema.dtd, which is in the xml/dtd folder.
Notes:
It helps to have perspective.dot in your Office Templates folder, but the template in the working directory always takes precedence. There are therefore two ways to start a document:
use File | New and create a new document from the template,
or
double-click perspective.dot in the working directory.
You must apply document styles so that the package has enough hints to generate the markup.
Document meta information is pulled from the File | Properties menu, so it helps to be scrupulous about filling in keywords, title and subject. ‘Comments’, which are mapped to the <introduction> element, are optional.
An article must contain a Title and a Subtitle, each with the style of the same name applied, for the transformation to succeed.
Only the Word style Heading2 is active; it maps to the <sub-heading> element. Using the styles Heading1 or Heading3 will cause problems with rendering and validation.
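Because a missing Title or Subtitle, or a stray Heading1 or Heading3, breaks the transformation, a pre-flight check can catch these problems before the export. This Python sketch assumes the paragraphs have already been extracted from the document as (style name, text) pairs by whatever means you prefer; the function name and messages are hypothetical, and the style names come from these notes rather than from perspective.dot itself.

```python
from typing import List, Tuple

# Style names as given in the notes; the template's actual names may differ.
REQUIRED_STYLES = ("Title", "Subtitle")
UNSUPPORTED_STYLES = ("Heading1", "Heading3")


def preflight(paragraphs: List[Tuple[str, str]]) -> List[str]:
    """Return problems found in (style, text) pairs; an empty list means OK."""
    used = {style for style, _text in paragraphs}
    problems = [
        f"missing a paragraph in style {s!r}"
        for s in REQUIRED_STYLES if s not in used
    ]
    problems += [
        f"style {s!r} is not supported; use Heading2"
        for s in UNSUPPORTED_STYLES if s in used
    ]
    return problems


doc = [("Title", "Perspective"), ("Subtitle", "A guide"), ("Heading1", "Intro")]
print(preflight(doc))  # ["style 'Heading1' is not supported; use Heading2"]
```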
Make sure to uncheck “Replace straight quotes with smart quotes” in the Tools | AutoCorrect | AutoFormat As You Type dialog. Smart quotes interfere with any XML embedded in the document. You can still apply AutoFormat manually to sections of the document that should contain “curly quotes.”
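Smart quotes cause trouble because XML attribute values must be delimited by straight quotes (" or '). The Python sketch below flags embedded tags that contain typographic quotes and shows that the standard library's XML parser rejects such a tag. The regular expression is a rough heuristic, not part of Wd2XML, and the function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Rough heuristic: a tag (from "<" to ">") containing a typographic quote.
CURLY_IN_TAG = re.compile(r"<[^<>]*[\u201c\u201d\u2018\u2019][^<>]*>")


def tags_with_curly_quotes(text: str) -> list:
    """Return embedded tags whose quotes have been turned into smart quotes."""
    return CURLY_IN_TAG.findall(text)


sample = "See <img src=\u201cchart.png\u201d/> for \u201cgrowth\u201d."
print(tags_with_curly_quotes(sample))  # flags only the <img .../> tag

# The XML parser rejects the same tag: attribute values need " or '.
try:
    ET.fromstring("<img src=\u201cchart.png\u201d/>")
except ET.ParseError as err:
    print("not well-formed:", err)
```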
This macro has been certified for use in Word XP, Word 2000 and Word 2007. The whole package has also been shown to work on the Macintosh with Office 2004; on the Mac, the template MUST be in the user’s “My Templates” folder, or the user will be unable to create a document from it.
You may like to save the output as a plain-text file with a .xml extension in the xml subdirectory. IE6 and Safari will load this XML file and apply the linked stylesheet for instant presentation. For professional use, you will need to go on to step 4 of the workflow and transform the file with stylesheets/html-perspective.xsl.
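For a browser to apply a stylesheet, the XML file needs an xml-stylesheet processing instruction after the XML declaration. If a saved file lacks one, a small Python helper can add it. The helper name and the relative path ../stylesheets/html-perspective.xsl (which assumes the xml and stylesheets folders are siblings) are assumptions, not part of the distribution.

```python
import re


def link_stylesheet(xml_text: str, href: str) -> str:
    """Add an xml-stylesheet processing instruction if none is present.

    The PI is placed after the XML declaration, which must stay first.
    """
    if "<?xml-stylesheet" in xml_text:
        return xml_text
    pi = f'<?xml-stylesheet type="text/xsl" href="{href}"?>\n'
    decl = re.match(r"<\?xml\s[^?]*\?>\s*", xml_text)
    if decl:
        return xml_text[: decl.end()] + pi + xml_text[decl.end():]
    return pi + xml_text


# Assumed layout: xml/ and stylesheets/ are sibling folders.
print(link_stylesheet('<?xml version="1.0"?>\n<article/>',
                      "../stylesheets/html-perspective.xsl"))
```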
Tx.tcl greatly speeds up transforming the output to XHTML and print.
The supplied Tcl scripts use xsltproc, which comes with GNOME libxml (the libxml2 and libxslt libraries). Win32 versions are available from http://www.zlatkovic.com/libxml.en.html; please follow the installation instructions at that site. You will need to download four packages and add their locations to your PATH environment variable (see your operating system documentation). Libxml is already installed on Mac OS X.
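Tx.tcl's own options are not documented here, so the following Python sketch only illustrates the kind of batch call such a script makes: one xsltproc run per output, using xsltproc's -o option to name the result. The output file extensions and the function names are assumptions. The "fo" target produces XSL-FO, which still has to be rendered to PDF by an FO renderer.

```python
import shutil
import subprocess
from pathlib import Path

# Stylesheet per output target, as listed in the workflow steps.
STYLESHEETS = {
    "xhtml": "stylesheets/html-perspective.xsl",
    "fo": "stylesheets/fo-perspective.xsl",  # XSL-FO, rendered to PDF later
    "docbook": "stylesheets/dbk-perspective.xsl",
}
# Output extensions are assumptions, not taken from Tx.tcl.
EXTENSIONS = {"xhtml": ".html", "fo": ".fo", "docbook": ".dbk"}


def xsltproc_command(source: str, target: str) -> list:
    """Build: xsltproc -o OUTPUT STYLESHEET SOURCE (options before stylesheet)."""
    output = str(Path(source).with_suffix(EXTENSIONS[target]))
    return ["xsltproc", "-o", output, STYLESHEETS[target], source]


def transform(source: str, target: str) -> None:
    """Run xsltproc; relative paths resolve from the distribution root."""
    if shutil.which("xsltproc") is None:
        raise RuntimeError("xsltproc not found on PATH")
    subprocess.run(xsltproc_command(source, target), check=True)
```

For example, run from the distribution folder, `transform("xml/article.xml", "xhtml")` would write xml/article.html, provided xsltproc is on your PATH.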
To produce PDF you will need an FO renderer, such as XEP, which is available in a trial edition from http://www.renderx.com/
The license for this product (“LGPL.txt”) is fairly standard. Modification of the software is at the user’s own risk. All derivative products must give proper attribution, and incorporation into commercial products must be under the terms of the GNU Lesser General Public License.