RE: [EP-tech] Questions on use

From: Ed Sponsler <eds AT library.caltech.edu>
Date: Tue, 15 May 2001 10:45:13 -0700




[If you have a PostScript printer driver installed on your Windoz box,
converting MS Word to PostScript is as easy as printing to file.]

Although storing document content in XML is the ultimate solution, there
are significant hurdles to overcome, most of them not really technical.

First, it is vital that the XML be valid against a DTD (Document Type
Definition). Without a DTD you may still have well-formed XML, but nothing
guarantees its structure, so (as far as I'm aware) you can forget about
reliably controlling the final output format using XSL or CSS.
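
To make the distinction concrete, here is a minimal Python sketch. The
report layout (a <report> holding <title> then <body>) and the function
names are assumptions made up for illustration. The standard-library parser
only checks well-formedness, so the layout check below is a crude stand-in
for real DTD validation, not a replacement for it.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: <report> must contain <title> followed by <body>.
REQUIRED_CHILDREN = ["title", "body"]


def is_well_formed(text):
    """Return True if the text parses as XML at all."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def follows_layout(text):
    """Crude stand-in for DTD validation: check the root and its child order."""
    root = ET.fromstring(text)
    children = [child.tag for child in root]
    return root.tag == "report" and children == REQUIRED_CHILDREN


good = "<report><title>Results</title><body>Text</body></report>"
odd = "<report><body>Text</body><heading>Results</heading></report>"
broken = "<report><title>Results</body></report>"
```

Here `odd` parses without complaint yet does not follow the agreed layout,
which is exactly the case a stylesheet cannot handle predictably.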

The DTD must be agreed upon by all authors submitting reports to your
XML-based archive. For basic reports containing only text and images, this
shouldn't be too tough. Mathematical formulas may be converted to images to
sidestep an otherwise tricky hurdle.

Once there is a DTD, scripts and style sheets (XSL, CSS, etc.) can take
advantage of a well-defined structure and thus convert XML to HTML, for
example.
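
As a rough illustration of that conversion step, the sketch below turns a
report into HTML using only the Python standard library instead of XSL. The
element names (<report>, <title>, <body>, <para>) and the function name are
hypothetical; a real archive would use whatever its DTD defines.

```python
import xml.etree.ElementTree as ET
from html import escape


def report_to_html(xml_text):
    """Convert a report in the assumed layout into a simple HTML page."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title", default="")
    parts = [
        "<html><head><title>%s</title></head><body>" % escape(title),
        "<h1>%s</h1>" % escape(title),
    ]
    # Because the structure is predictable, each <para> maps directly to <p>.
    for para in root.findall("body/para"):
        parts.append("<p>%s</p>" % escape(para.text or ""))
    parts.append("</body></html>")
    return "\n".join(parts)
```

The conversion is only this short because the code can assume where the
title and paragraphs live, which is what an agreed DTD buys you.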

Now comes the hard part. How do you get the authors to construct the valid
XML? The best way would be to provide them with an XML editor that enforces
conformance to your DTD. I don't know of too many tools that do this. One
such tool (expensive) is the Epic Editor from Arbortext
(http://www.arbortext.com/). With an editor like this, you can't violate the
DTD.

Another option is to shift the responsibility of preparing the valid XML to
the archive maintainer, such as a librarian or (ugh) sys admin. The author
should at least help out by using a well-defined style sheet (MS Word) or
class file (.cls in LaTeX). The document preparer then has the wonderful
privilege of tagging the author's source file into proper XML. (Yack). There
are tools available to help. For example, James Clark has written a tool to
convert RTF output from Word into well-formed XML (I'll hunt it down if
anyone is interested, but you can try http://jclark.com); however, you still
have to make such output valid against a DTD. I don't believe this option is
very practical, except for very small projects.

The two main barriers to using XML for the body of reports are formally
defining a document structure that all authors will agree on (generating a
DTD) and actually generating the valid XML.

The advantages of storing the document content in XML are numerous, as I'm
sure most are aware, and shouldn't be blown off just because of the
previously mentioned challenges. The burden of these challenges may be
reduced dramatically if the DTD were extremely simple. 

It is fortunate that OAI-compliant archives (such as EPrints) spit out
metadata in XML. Thus, an important chunk of the document (the front matter)
is effectively already stored in XML. Now, if the bibliography were also
tagged in valid XML, then only the body of the document would require
definition.
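
For a sense of what XML front matter looks like, here is a small Python
sketch that builds a metadata record from Dublin Core elements. The wrapper
element name <metadata>, the function name and the choice of fields are
assumptions for illustration; this is not the exact record that EPrints or
the OAI protocol emits.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace; the "dc" prefix is just for readable output.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def front_matter(title, creators, date):
    """Build a hypothetical <metadata> record holding Dublin Core fields."""
    record = ET.Element("metadata")
    ET.SubElement(record, "{%s}title" % DC_NS).text = title
    for name in creators:
        ET.SubElement(record, "{%s}creator" % DC_NS).text = name
    ET.SubElement(record, "{%s}date" % DC_NS).text = date
    return record
```

Calling `ET.tostring(front_matter(...), encoding="unicode")` gives the
serialized record, which could sit alongside a PDF or PostScript body.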

[Using XML for storing bibliographic info opens up some interesting
possibilities, such as context-sensitive reference linking:
http://www.sfxit.com. Utilities are available for converting the output of
common bibliographic tools, such as EndNote (Word) and BibTeX (LaTeX):
http://www.ctan.org/tex-archive/biblio/bibtex/utils/bibtools/.]
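
The kind of conversion such utilities perform can be sketched in a few lines
of Python. This is not how any of the tools above work internally; the
output element names (<reference> and one child per field) and the function
name are invented for illustration, and the regular expressions only handle
simple entries without nested braces.

```python
import re
import xml.etree.ElementTree as ET

# Matches "@type{key, ...}" and captures type, key and the field text.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}\s*$", re.S)
# Matches 'name = {value}' or 'name = "value"' (no nested braces).
FIELD_RE = re.compile(r"(\w+)\s*=\s*[{\"](.*?)[}\"]\s*(?:,|$)", re.S)


def bibtex_to_xml(entry):
    """Convert one simple BibTeX entry into a hypothetical <reference>."""
    match = ENTRY_RE.match(entry.strip())
    if match is None:
        raise ValueError("not a simple BibTeX entry")
    kind, key, body = match.groups()
    ref = ET.Element("reference", {"type": kind.lower(), "key": key})
    for name, value in FIELD_RE.findall(body):
        # Collapse line breaks and repeated spaces inside field values.
        ET.SubElement(ref, name.lower()).text = " ".join(value.split())
    return ref
```

Once references are in this form, the same style-sheet machinery used for
the body can format or link them.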

Does anyone have any thoughts on all this? Do you think authors will ever
give up their beloved Word, WordPerfect or LaTeX editors in favor of an XML
one? Do you think plug-ins will be developed for Word to enforce DTD
compliance? Does anyone know of a LaTeX/.cls to XML/DTD converter? Is it
really that important to store the document body in XML at all, or is it
good enough to have XML front matter (already here), XML bibliographies
(just around the corner) and an open-standard-formatted body like PDF or
PostScript?

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Ed Sponsler, Sr. Computing Analyst
Caltech Library System


-----Original Message-----
From: Christopher Gutteridge [mailto:cjg AT ecs.soton.ac.uk]
Sent: Monday, May 14, 2001 8:56 AM
To: EPrints.org Technical List
Subject: Re: [EP-tech] Questions on use


Oh, hang on, I misunderstood. Brain not work good.

You mean converting the actual upload, rather than the metadata, to XML.

Hmmm. This is, in theory, the best solution. But I don't know of an XML
format with tools that will convert between it and PostScript, PDF,
MS Word and HTML without loss of information.

The easiest solution is to convert everything to PostScript as a base
format, but that is still hard for MS Word (there are tools, but not
free ones, to my knowledge).

Any advice and suggestions welcome.

On Thu, May 10, 2001 at 07:33:26AM -0500, Faine, Mark wrote:
>     Ok, I've noticed there are several different upload types and an
> arbitrary flag. If you were building a new system from the ground up,
> wouldn't it be best to provide an interface that allowed new reports to
> be entered into the system directly and converted to XML, to be stored on
> the archive server as XML? Then when the user searched for a record it
> could be displayed in any number of formats using XSL or CSS. Before I
> started digging into the functionality of eprints, I thought that was how
> it worked.
>  
> -Mark
> 
> _____________________________________ 
> 
> Mark Faine 
> Computer Programmer III/MTRS Administration 
> Lesco/MSFC 
> 256-961-1295 
> 
>  

-- 
Christopher Gutteridge -- cjg AT ecs.soton.ac.uk -- +44 (0)23 8059 4833
Life is like a box of chocolates: It's given to you by relatives on your 
birthday; it's not really what you were expecting but it's better than a 
pair of socks.
