EPrints Technical Mailing List Archive

Message: #08313


[EP-tech] General purpose CSV import


Dear all,

We frequently have to process lists of data (mostly CSV) from various sources to update our repositories. The requirements are usually as follows:

- there are one or more matching criteria (eprintid, DOI, ISSN, whatever)
- there are some fields (columns), in the format and under the names used by the source
- there are the fields in the EPrints repo where the data must be filled in

So the task is always the same: you write a script (or a plugin) that finds the eprint to update by searching on the criteria, then updates the data or adds a new record, and writes a report (or log). The next time, you write a similar script all over again because some of the fields or criteria have changed. In addition, there is the overhead of exchanging files between the repo software developer and the admins who want the data updated, which we usually handle via an issue tracking system.

Why not have a general-purpose import plugin that allows the end user (repo admin, OA monitoring expert, journal manager, you name it) to update data directly:
- choose the match columns and associate with the match fields of the repository
- choose the data columns and associate with update fields of the repository
- choose the action options (update, create, create upon mismatch, ...)
- carry out the action (probably as a detached process)
- inform the user about the status of the process (running, terminated, failed)
- obtain or download a report for quality control
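
To make the idea concrete, here is a minimal sketch in Python of the mapping-driven core. It is not EPrints code: the repository is stood in for by a plain list of dicts, and all names (ImportSpec, find_matches, run_import, the action strings) are hypothetical and chosen only for illustration. A real plugin would go through the EPrints search and dataset API and run as a detached process.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ImportSpec:
    """Hypothetical job description an end user would fill in."""
    match: dict   # CSV column -> repository field used for matching
    update: dict  # CSV column -> repository field to overwrite
    action: str = "update"  # "update", "create" or "create_on_mismatch"


def find_matches(repo, criteria):
    """Return all records whose fields equal every matching criterion."""
    return [r for r in repo if all(r.get(f) == v for f, v in criteria.items())]


def run_import(csv_text, repo, spec):
    """Apply the spec row by row and return a report for quality control."""
    report = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        criteria = {f: row[c] for c, f in spec.match.items()}
        values = {f: row[c] for c, f in spec.update.items()}
        # "create" never searches; the other actions match first.
        hits = [] if spec.action == "create" else find_matches(repo, criteria)
        if len(hits) == 1:
            hits[0].update(values)
            status = "updated"
        elif len(hits) > 1:
            status = "skipped: ambiguous match"
        elif spec.action in ("create", "create_on_mismatch"):
            repo.append({**criteria, **values})
            status = "created"
        else:
            status = "skipped: no match"
        report.append((line_no, status))
    return report


# Stand-in repository and source file (made-up sample data).
repo = [
    {"eprintid": "1", "doi": "10.1000/a", "issn": ""},
    {"eprintid": "2", "doi": "10.1000/b", "issn": ""},
]
csv_text = "DOI,Journal ISSN\n10.1000/a,1234-5678\n10.1000/x,8765-4321\n"
spec = ImportSpec(
    match={"DOI": "doi"},
    update={"Journal ISSN": "issn"},
    action="create_on_mismatch",
)
print(run_import(csv_text, repo, spec))
# [(2, 'updated'), (3, 'created')]
```

The point of the sketch is that only the ImportSpec changes from one job to the next, while the matching, updating and reporting code stays the same. Rows with an ambiguous match are skipped and reported rather than guessed at, which is one possible policy among several.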

Has anybody already created something similar? Is there any interest?

Kind regards,

Martin

--
Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Stampfenbachstr. 73
CH-8006 Zürich