eprints_tech messages
[EP-tech] Questions on use
From: "Faine, Mark" <Mark.Faine AT msfc.nasa.gov>
Date: Thu, 10 May 2001 07:33:26 -0500
| Threading: | • This Message → Re: [EP-tech] Questions on use from cjg AT ecs.soton.ac.uk → Re: [EP-tech] Questions on use from cjg AT ecs.soton.ac.uk |
Ok, I've noticed there are several different upload types and an
arbitrary flag. If you were building a new system from the ground up,
wouldn't it be best to provide an interface that allowed new reports to be
entered into the system directly, converted to XML, and stored on the
archive server as XML? Then, when a user searched for a record, it could be
displayed in any number of formats using XSL or CSS.
Before I started digging into the functionality of eprints, I thought that
was how it worked.
-Mark
_____________________________________
Mark Faine
Computer Programmer III/MTRS Administration
Lesco/MSFC
256-961-1295
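The idea in the message above, one stored XML record displayed in several formats, can be sketched roughly as follows. This is only an illustration: the record, its element names (`report`, `title`, `author`, `abstract`) and the two rendering functions are hypothetical, and plain Python functions stand in for the XSL stylesheets the message mentions. It is not how EPrints works.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the element names are illustrative, not an EPrints schema.
RECORD_XML = """\
<report>
  <title>Sample technical report</title>
  <author>A. Author</author>
  <abstract>Store reports as XML and render them on demand.</abstract>
</report>
"""

def to_html(record):
    """Render the record as a small HTML fragment."""
    div = ET.Element("div", {"class": "report"})
    ET.SubElement(div, "h1").text = record.findtext("title")
    ET.SubElement(div, "p", {"class": "author"}).text = record.findtext("author")
    ET.SubElement(div, "p", {"class": "abstract"}).text = record.findtext("abstract")
    return ET.tostring(div, encoding="unicode")

def to_text(record):
    """Render the same record as plain text."""
    return "\n".join([
        record.findtext("title"),
        "by " + record.findtext("author"),
        "",
        record.findtext("abstract"),
    ])

# One stored record, two display formats.
record = ET.fromstring(RECORD_XML)
html_view = to_html(record)
text_view = to_text(record)
print(html_view)
print(text_view)
```

The stored XML is parsed once, and each output format is just another function over the same tree, which is the property the message is asking for.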
Re: [EP-tech] Questions on use
From: Christopher Gutteridge <cjg AT ecs.soton.ac.uk>
Date: Mon, 14 May 2001 15:34:26 +0100
| Threading: | ↑ [EP-tech] Questions on use from Mark.Faine AT msfc.nasa.gov • This Message → RE: [EP-tech] Questions on use from eds AT library.caltech.edu |
Storing the metadata as XML is impractical: searching becomes very clumsy. The definitive copy of the data will remain in the database. However, the new version may have multiple pages for one record, e.g. one with subjects and links in French and one in English.

I may try to look into making XML versions of the abstract pages and views pages, which would get regenerated at the same time. That would mean they could be harvested/processed without any extra load on the server.... Hmmm. Don't hold your breath, though.

--
Christopher Gutteridge -- cjg AT ecs.soton.ac.uk -- +44 (0)23 8059 4833
Trust and Obey, for there's no other way to be happy in Jesus than to Trust and Obey.
Re: [EP-tech] Questions on use
From: Christopher Gutteridge <cjg AT ecs.soton.ac.uk>
Date: Mon, 14 May 2001 16:55:45 +0100
| Threading: | ↑ [EP-tech] Questions on use from Mark.Faine AT msfc.nasa.gov • This Message |
Oh, hang on, I misunderstood. Brain not work good.

You mean converting the actual upload, rather than the metadata, to XML.

Hmmm. This is, in theory, the best solution. But I don't know of an XML format with tools that will convert to and from PostScript, PDF, MS Word and HTML without loss of information.

The easiest solution is to convert everything to PostScript as a base format, but that is still hard for MS Word (there are tools, but not free ones, to my knowledge).

Any advice and suggestions welcome.

--
Christopher Gutteridge -- cjg AT ecs.soton.ac.uk -- +44 (0)23 8059 4833
Life is like a box of chocolates: It's given to you by relatives on your birthday; it's not really what you were expecting but it's better than a pair of socks.
RE: [EP-tech] Questions on use
From: Ed Sponsler <eds AT library.caltech.edu>
Date: Tue, 15 May 2001 10:45:13 -0700
| Threading: | ↑ Re: [EP-tech] Questions on use from cjg AT ecs.soton.ac.uk • This Message → RE: [EP-tech] Questions on use from tdb198 AT ecs.soton.ac.uk → RE: [EP-tech] Questions on use from Mark.Faine AT msfc.nasa.gov → Re: [EP-tech] Questions on use from support AT eprints.org |
[If you have a PostScript printer driver installed on your Windoz box, converting MS Word to PostScript is as easy as printing to file.]

Although using XML for storing document content is the ultimate solution, there are significant hurdles to overcome, most not really technical.

First, it is vital that the XML be valid against a DTD (Document Type Definition). Without a DTD, you may still have well-formed XML; however, you can forget about controlling the final output format using XSL or CSS (as far as I'm aware).

The DTD must be agreed upon by all authors submitting reports to your XML-based archive. For basic reports containing only text and images, this shouldn't be too tough. Mathematical formulas may be converted to images, to simplify an otherwise tricky hurdle.

Once there is a DTD, scripts (XSL, CSS, etc.) are able to take advantage of a well-defined structure and thus convert XML to HTML, for example.

Now comes the hard part. How do you get the authors to construct the valid XML? The best way would be to provide them with an XML editor that enforces conformance to your DTD. I don't know of too many tools that do this. One such tool (expensive) is the Epic Editor, available from Arbortext (http://www.arbortext.com/). With an editor like this, you can't violate the DTD.

Another option is to shift the responsibility of preparing the valid XML to the archive maintainer, such as a librarian or (ugh) sys admin. The author should at least help out by using a well-defined style sheet (MS Word) or class file (.cls in LaTeX). The document preparer then has the wonderful privilege of tagging the author's source file into proper XML. (Yack). There are tools available to help. For example, James Clark has written a tool to convert RTF output from Word into well-formed XML (I'll hunt it down if anyone is interested, but you can try http://jclark.com); however, you still have to make such output valid against a DTD. I don't believe this option is very practical, except for very small projects.

The two main barriers to using XML for the body of reports are formally defining a document structure that all authors will agree on (generating a DTD) and actually generating the valid XML.

The advantages of storing the document content in XML are numerous, as I'm sure most are aware, and shouldn't be blown off just because of the previously mentioned challenges. The burden of these challenges may be reduced dramatically if the DTD were extremely simple.

It is fortunate that OAI-compliant archives (such as EPrints) spit out metadata in XML. Thus, an important chunk of the document (the front matter) is effectively already stored in XML. Now, if the bibliography were also tagged in valid XML, then only the body of the document would require definition.

[Using XML for storing bibliographic info opens up some interesting possibilities, such as context-sensitive reference linking: http://www.sfxit.com. Utilities are available for converting the output of common bibliographic tools, such as Endnote (Word) and BibTeX (LaTeX): http://www.ctan.org/tex-archive/biblio/bibtex/utils/bibtools/.]

Does anyone have any thoughts on all this? Do you think authors will ever give up their beloved Word, WordPerfect or LaTeX editors in favor of an XML one? Do you think plug-ins will be developed for Word to enforce DTD compliance? Does anyone know of a LaTeX/.cls to XML/DTD converter? Is it really that important to store the document body in XML at all, or is it good enough to have XML front matter (already here), XML bibliographies (just around the corner) and an open-standard-formatted body like PDF or PostScript?

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Ed Sponsler, Sr. Computing Analyst
Caltech Library System
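The distinction drawn above between well-formed and valid XML can be sketched as follows. The Python standard library has no DTD validator, so this sketch substitutes a small hand-written content model (`ALLOWED_CHILDREN`, a hypothetical name) for a real DTD; the element names are likewise made up for illustration. A production setup would more likely use a validating parser. Note also that XSL itself only requires well-formed input; what an agreed DTD adds is a predictable structure for the stylesheets to rely on.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model standing in for a DTD: each element name maps to
# the child element names it may contain. This is NOT a real DTD validator.
ALLOWED_CHILDREN = {
    "report": {"title", "body"},
    "title": set(),
    "body": {"para", "image"},
    "para": set(),
    "image": set(),
}

def is_well_formed(xml_text):
    """True if the text parses as XML at all (tags balanced, properly nested)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def violations(xml_text):
    """List the places where the document breaks the content model."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag not in ALLOWED_CHILDREN:
        problems.append("unknown root element <%s>" % root.tag)
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                problems.append(
                    "<%s> not allowed inside <%s>" % (child.tag, parent.tag)
                )
    return problems

good = "<report><title>T</title><body><para>Text</para></body></report>"
odd = "<report><title>T</title><footer>x</footer></report>"
broken = "<report><title>T</report>"

for name, doc in [("good", good), ("odd", odd), ("broken", broken)]:
    if not is_well_formed(doc):
        print(name, "is not well-formed")
    else:
        print(name, "is well-formed; violations:", violations(doc))
```

Here `odd` parses without error, so it is well-formed, but it uses an element the agreed structure does not allow; `broken` does not parse at all.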
RE: [EP-tech] Questions on use
From: Tim Brody <tdb198 AT ecs.soton.ac.uk>
Date: Tue, 15 May 2001 19:24:09 +0100 (BST)
| Threading: | ↑ RE: [EP-tech] Questions on use from eds AT library.caltech.edu • This Message |
On Tue, 15 May 2001, Ed Sponsler wrote:

> It is fortunate that OAI compliant archives (such as EPrints) spit out
> metadata in XML. Thus, an important chunk of the document (the front matter)
> is effectively already stored in XML. Now, if the bibliography were also
> tagged in valid XML, then only the body of the document would require
> definition.

We, in OpCit (http://opcit.eprints.org), are working towards this. For example see:
http://cite-base.ecs.soton.ac.uk/cgi-bin/oai/OAI-script?verb=GetRecord&metadataPrefix=opcit_dc&identifier=oai:arXiv:hep-th/0001001
(liable to change, breakage etc.)

That data is based on the automatic conversion of LaTeX bibliographic records.

> [Using XML for storing bibliographic info opens up some interesting
> possibilities, such as context-sensitive reference linking:
> http://www.sfxit.com. Utilities are available for converting the output of
> common bibliographic tools, such as Endnote (Word) and BibTeX (LaTeX):
> http://www.ctan.org/tex-archive/biblio/bibtex/utils/bibtools/.]

(You may also be interested in http://www.pubmedcentral.nih.gov/ which, I believe, stores its documents in XML. I would expect medical reports don't suffer as much from XML's inability to express formulas. Journals provide the properly formatted XML using a PubMed DTD.)

> Does anyone have any thoughts on all this? Do you think authors will ever
> give up their beloved Word, WordPerfect or LaTeX editors in favor of an XML
> one? Do you think plug-ins will be developed for Word to enforce DTD
> compliance? Does anyone know of a LaTeX/.cls to XML/DTD converter? Is it
> really that important to store the document body in XML at all, or is it
> good enough to have XML front matter (already here), XML bibliographies
> (just around the corner) and an open-standard-formatted body like PDF or
> postscript?

Only to say that archives will always store the formats that authors give them, otherwise the authors won't give the archives any documents!
(and while authors use Word, Perfect, LaTeX, dodgy HTML ...)

All the best,
Tim Brody
ECS, Southampton
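The OAI request quoted in the message above is an ordinary HTTP GET with three query parameters, and building one can be sketched as below. The base URL, identifier and metadata prefix are copied from the message; the function name `get_record_url` is made up for illustration, and the sketch deliberately makes no network call, since the 2001 endpoint was already described as liable to change.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL taken from the example in the message above; it may no longer resolve.
BASE_URL = "http://cite-base.ecs.soton.ac.uk/cgi-bin/oai/OAI-script"

def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI GetRecord request URL (no network access is performed)."""
    query = urlencode({
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    })
    return base_url + "?" + query

url = get_record_url(BASE_URL, "oai:arXiv:hep-th/0001001", "opcit_dc")

# Parse the query string back to confirm the parameters survive the round trip.
params = parse_qs(urlsplit(url).query)
print(url)
print(params)
```

`urlencode` percent-encodes the colons and slash in the identifier, so the printed URL differs textually from the one in the message, but it decodes to the same three parameters.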
RE: [EP-tech] Questions on use
From: "Faine, Mark" <Mark.Faine AT msfc.nasa.gov>
Date: Tue, 15 May 2001 14:55:54 -0500
| Threading: | ↑ RE: [EP-tech] Questions on use from eds AT library.caltech.edu • This Message |
What we need for the authors is an editor with an easy-to-use GUI that hides all the XML underneath and politely enforces a DTD based on a pre-defined template. All the ease of an application like Word with something like XMetaL underneath. That would be the killer app that would make storing document content as XML feasible for everyone.

-Mark
Re: [EP-tech] Questions on use
From: ePrints Support <support AT eprints.org>
Date: Sat, 19 May 2001 18:46:43 +0100
| Threading: | ↑ RE: [EP-tech] Questions on use from eds AT library.caltech.edu • This Message |
Ramble on the next gen of doc formats:

I expect that in the long term someone will produce a system which allows documents to be expressed in a nice generic format like XML with all the style being separate. Stylesheets and XHTML are a forerunner, but I don't think we're there yet. I believe there is a MathML for formulas...

The problem is that joe user (or joe professor) wants to use a WYSIWYG editor, and these usually combine formatting and style with semantic structuring (headers, paragraphs, cross-refs). Even LaTeX is a hodgepodge of content and layout.

Ideally this magical mystery format would be easy for MS Word (and similar) to export. Unless it's easy to edit and produces nice output both on screen and on paper, no one will use it.

My discussions with users have shown me that they don't mind using something which provides more useful output, but they aren't going to go out of their way to help either. I think that's why OAI has a chance to succeed: it can be relatively painlessly bolted on top of an already existing archive. Day-to-day users don't have to go through any more hassle to submit records, so they don't care.

We need the same thing from a generic document format: something that can be added to existing software, initially.

--
Christopher Gutteridge
support AT eprints.org
ePrints Technical Support
+44 23 8059 4833