
[EP-tech] digital preservation - indexing errors



We just completed an upgrade of our repository, which included re-indexing all of the content.
It was a good opportunity to note the errors that come up during indexing.

Here is a list of the common errors that occurred during indexing:

1. Error: Illegal entry in bfrange block in ToUnicode CMap
2. Error: Invalid Font Weight
3. Error (##): Illegal character <##> in hex string
4. Error: Can't create transform
5. Error: Couldn't link the profiles

There are also some of these:

6.
Use of uninitialized value $data in substr at /opt/eprints3/tools/../perl_lib/Text/Extract/Word.pm line 68.
Use of uninitialized value $magic in numeric eq (==) at /opt/eprints3/tools/../perl_lib/Text/Extract/Word.pm line 69.
Use of uninitialized value $magic in sprintf at /opt/eprints3/tools/../perl_lib/Text/Extract/Word.pm line 69.
This does not seem to be a Word document, but it is pretending to be one: 0 at /opt/eprints3/tools/doc2txt line 68
Error 255 from doc2txt command: [...]


Errors #1 and #3 appear to be the most common.
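
For anyone who wants to compare counts on their own installation, here is a minimal sketch of how the messages listed above could be tallied from a captured indexer log. It is not part of EPrints: the function name tally_errors and the category labels are made up for illustration, and the patterns only cover the message texts quoted in this post.

```python
import re
from collections import Counter

# Patterns copied from the error messages quoted in this post.
# The labels on the left are hypothetical names chosen for this sketch.
PATTERNS = [
    ("tounicode_bfrange", re.compile(r"Illegal entry in bfrange block in ToUnicode CMap")),
    ("invalid_font_weight", re.compile(r"Invalid Font Weight")),
    ("illegal_hex_char", re.compile(r"Illegal character <[^>]*> in hex string")),
    ("cant_create_transform", re.compile(r"Can't create transform")),
    ("couldnt_link_profiles", re.compile(r"Couldn't link the profiles")),
    ("doc2txt_failure", re.compile(r"Error \d+ from doc2txt command")),
]


def tally_errors(lines):
    """Count how many lines match each known error pattern.

    Each line is counted at most once, under the first pattern it matches.
    Lines that match nothing are ignored.
    """
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS:
            if pattern.search(line):
                counts[label] += 1
                break
    return counts
```

Passing an open log file (any iterable of lines) to tally_errors and printing the result of .most_common() would give a ranked list, which is enough to check whether #1 and #3 dominate elsewhere too.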

Have you encountered these types of indexing errors?
How serious are they in terms of digital preservation?
Do you use any specific strategies/workflows for dealing with these?
Do the EPrints preservation plugins (http://files.eprints.org/696/) help with identifying or solving these issues?

Thanks for any comments/suggestions about this.

Tomasz


