MSWordView

A Word 8 converter for Unix

For clarity: MSWordView is free software, but it is not a GNU package. However, the GNU web site has long had this page about it, so it will remain for reference. Also, the project appears to be moribund; if you're interested in resurrecting it, please try contacting the author or searching around. We (GNU) have no information about it beyond what is posted here.

What is it

MSWordView is a program that understands the Microsoft Word 8 binary file format (Office 97). It currently converts Word documents into HTML, which can then be read with a browser.

MSWordView is being actively worked on and will be pretty bleeding edge for the next few weeks, so bear with me.

Current Features include

Currently Unsupported Features include
I will be working on the unsupported features, but as it's already fairly useful, I'm releasing it. Also, it only handles Word 8, not Word 6 or Word 7. I will be adding Word 6 capabilities as well, and, if I get lucky, Word 7.
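Telling Word 6, 7 and 8 files apart is possible because the WordDocument stream begins with a File Information Block (FIB) whose 16-bit little-endian nFib field, at offset 2, records the format version. The sketch below is illustrative only and is not MSWordView's code. The function names are hypothetical, and it assumes the nFib values commonly documented for these versions: 101 for Word 6, 104 for Word 7 (Word 95) and 193 for Word 8 (Word 97).

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian 16-bit read; Word's binary structures are little-endian. */
static uint16_t fib_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: guess the Word version from the nFib field at
   offset 2 of the FIB that starts the WordDocument stream.
   Returns 6, 7 or 8, or 0 if the buffer is too short or the value is
   not one of the assumed ones (101, 104, 193). */
static int word_version(const unsigned char *fib, size_t len)
{
    if (len < 4)
        return 0;
    switch (fib_le16(fib + 2)) {
    case 101: return 6;   /* assumed: Word 6 */
    case 104: return 7;   /* assumed: Word 7 (Word 95) */
    case 193: return 8;   /* assumed: Word 8 (Word 97) */
    default:  return 0;
    }
}
```

A converter that, like this one, only handles Word 8 could reject anything for which `word_version` does not return 8 before attempting to parse the rest of the FIB.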

This should be considered early beta software, as there's loads to be done and many bits and bobs to be fixed and supported.

What do you need

Just the source

Web Gateway

Demo mswordview here. Don't use this to convert information you wouldn't want me to see: if the conversion doesn't work, I'll be using the file you converted to try to extend what mswordview can support, which will require me to read it. This script is broken for non-ASCII languages. mswordview itself supports them, but the UTF-8 is getting stripped somewhere in its web interface.

More Info

MSWordView used to use laola to break the Word file up into its OLE streams, but it now uses custom C code that is included in the distribution. After that, the Word specification that Microsoft has made available is followed to extract the text and paragraph properties (e.g. whether we are in a table or not).
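The first step, splitting the file into OLE streams, starts by reading the 512-byte compound document header. The following is a minimal sketch of that step in C, assuming the OLE2 header layout documented in Microsoft's Compound File Binary format specification. It is not the code shipped with MSWordView, and the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Signature that opens every OLE2 compound document. */
static const unsigned char OLE_MAGIC[8] = {
    0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1
};

/* Hypothetical subset of header fields needed to start walking streams. */
struct ole_header {
    uint16_t sector_shift;      /* sector size = 1 << sector_shift, usually 9 (512 bytes) */
    uint16_t mini_sector_shift; /* usually 6 (64-byte mini sectors) */
    uint32_t num_fat_sectors;   /* sectors holding the allocation table */
    uint32_t first_dir_sector;  /* start of the directory naming each stream */
    uint32_t mini_cutoff;       /* streams smaller than this live in the mini stream */
};

static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse the header; returns 0 on success, -1 if the buffer is not OLE2. */
static int ole_parse_header(const unsigned char *buf, size_t len,
                            struct ole_header *h)
{
    if (len < 512 || memcmp(buf, OLE_MAGIC, sizeof OLE_MAGIC) != 0)
        return -1;
    if (le16(buf + 0x1C) != 0xFFFE)   /* byte-order mark: little-endian */
        return -1;
    h->sector_shift      = le16(buf + 0x1E);
    h->mini_sector_shift = le16(buf + 0x20);
    h->num_fat_sectors   = le32(buf + 0x2C);
    h->first_dir_sector  = le32(buf + 0x30);
    h->mini_cutoff       = le32(buf + 0x38);
    return 0;
}

/* Byte offset of sector n: the header occupies the first sector-sized slot. */
static size_t ole_sector_offset(const struct ole_header *h, uint32_t n)
{
    return ((size_t)n + 1) << h->sector_shift;
}
```

From here, a reader would load the allocation table, follow the directory chain from `first_dir_sector` to find the entry named `WordDocument`, and hand that stream to the Word-specific parsing.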

How to Obtain Microsoft Office File Formats

The MS Office file formats (Word, Excel, PowerPoint, Office Binder and Office Drawing) are all freely available from the MS web site, provided you are a member of the MS Developer Network (MSDN). Joining MSDN to gain access to these specifications is free.

Simply go to the following address:
http://msdn.microsoft.com
From the list on the left of the screen, select MSDN Library Online.
If you are not a member of the MS Developer Network you will need to join - it's free.
Once you have subscribed to the MSDN, you can obtain online copies of the file formats. To do this, follow these steps:
1. On the MSDN World Wide Web site, click MSDN Library Online.
2. Under Member Area, click the Library Online tab.
3. Double-click Microsoft Office Development.
4. Double-click Office.
5. Double-click Microsoft Office 97 Binary File Formats.
6. Select the format you are interested in (Word, Excel, PowerPoint, etc.).

There is a definite need for converters for the other MS Office products. For this converter in particular, a decoder for the MS Office Drawing format is needed, so go out there and work on it.

Other Decoders and related projects

There already exist a few attempts at Word converters:
laola (originally used by mswordview) includes one called elser, which doesn't handle Word 8 but can do Word 6 and 7
word2x, which is for Word 6 and doesn't do fast saves
catdoc, also for Word 6, which doesn't do fast saves or tables

All these converters are almost magical in how far they managed to go without access to the Microsoft format specification, and their code was terribly useful in figuring out some things.

Sun has something which displays Word files on screen, though it doesn't print.
Corel's word processor for Linux has a very good converter for Word 6/7/8 built in. It has made a few mistakes in conversion, but unlike the current mswordview it retains formatting very, very well.
Use Wine and the MS 16-bit Word viewer; here's a howto.
The filters project.
A word macro investigation tool

Download MSWordView

Warning: mswordview no longer outputs to standard output by default.

Remember, this is a work in progress: it's not finished yet and may show bugs.

Known Bugs

WMF files aren't yet converted to any format that can be displayed in the HTML output.
Here's my CHANGELOG; keep track of it for news and updates on what I'm working on.

Other Resources

mswordview outputs UTF-8 encoded HTML for the most part. Netscape has built-in support for this, but you might like to install a Unicode font yourself; see this page for more info.
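Word 8 can store document text as 16-bit Unicode characters, so producing UTF-8 HTML comes down to the standard UTF-8 encoding of each character. Below is a minimal C sketch of that encoding step. The function name is hypothetical, and it assumes surrogate pairs have already been combined into a single code point before this is called.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Writes 1-4 bytes to out and returns the count, or returns 0 for
   lone surrogates and values above U+10FFFF, which are not characters. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* UTF-16 surrogate range */
        return 0;
    if (cp < 0x10000) {                    /* three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, U+00E9 (é) becomes the two bytes C3 A9 and U+20AC (€) becomes E2 82 AC. Characters outside ASCII turn into multi-byte sequences, which is why a web front end that strips or mangles high bytes breaks non-ASCII text.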

Mailing List

An incredibly low-volume mailing list for announcements has been set up for mswordview (Aug 24th 1998).
To subscribe, send email to mswordview-subscribe@makelist.com
To unsubscribe, send email to mswordview-unsubscribe@makelist.com
The address of the list itself is mswordview@makelist.com
The list archive is at http://www.findmail.com/list/mswordview/
A mailing list hosted by FindMail

What would be nice to get