EPrints Technical Mailing List Archive

Message: #02617



[EP-tech] Indexing based on case sensitive file extension check?


Hello,

Bernard from IOE noticed that if he uploaded a PDF with an uppercase
extension (i.e. .PDF), it was never indexed. If he replaced it with the
same file with a lowercase extension, it got indexed.

I managed to find the cause in
perl_lib/EPrints/Plugin/Convert/PlainText.pm

In the *can_convert* (ln 70) and *export* (ln 118) subs, there are
regexes that check the file extension before continuing. These expect
lowercase file extensions, so no indexcodes are extracted from .PDFs,
.DOCs, .HTMLs etc.

Easy to fix once found, but it took me ages.
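For anyone hitting the same thing, here is a minimal sketch of the problem and the usual fix, written in Python purely for illustration. It does not reproduce the actual regexes in PlainText.pm; the extension list and the `can_convert` helper below are hypothetical stand-ins. In Perl the equivalent change would be adding the `/i` (case-insensitive) modifier to the match.

```python
import re

# Hypothetical extension list for illustration only; the real list in
# PlainText.pm is not reproduced here.
SUPPORTED = ("pdf", "doc", "html")

# Case-sensitive check: only matches lowercase extensions (the reported bug).
CASE_SENSITIVE = re.compile(r"\.(" + "|".join(SUPPORTED) + r")$")

# Case-insensitive check: .pdf, .PDF, .Pdf etc. all match (the fix).
CASE_INSENSITIVE = re.compile(r"\.(" + "|".join(SUPPORTED) + r")$", re.IGNORECASE)


def can_convert(filename: str, pattern: re.Pattern) -> bool:
    """Hypothetical helper: True if the filename's extension matches pattern."""
    return pattern.search(filename) is not None


if __name__ == "__main__":
    for name in ("paper.pdf", "paper.PDF", "Notes.Html"):
        print(name, can_convert(name, CASE_SENSITIVE), can_convert(name, CASE_INSENSITIVE))
```

An alternative with the same effect is to lowercase the extension before comparing it; the case-insensitive flag simply keeps the check to a single expression.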

Looking in GitHub, I can't see where any regression might have occurred,
so I'm wondering if it was ever thus?

Cheers,

Rory

-- 
Rory McNicholl
Lead developer, Research Repositories Team
Academic Research Technologies
University of London Computer Centre
Senate House
Malet Street
London
WC1E 7HU

t: +44 (0)20 7863 1344
e: r.mcnicholl@ulcc.ac.uk
w: http://www.ulcc.ac.uk/
b: http://dablog.ulcc.ac.uk/


To ensure you receive the full benefits of the repositories service
please remember to cc repositories@ulcc.ac.uk

The University of London is an exempt charity in England and Wales and a
charity registered in Scotland (reg. no. SC041194)