
[EP-tech] Indexing based on case sensitive file extension check?


Bernard from IOE noticed that if he uploaded a PDF with an uppercase
extension (i.e. .PDF) it was never indexed. If he replaced it with the
same file with a lowercase extension, it got indexed.

I managed to find the cause in

Where, in the *can_convert* (ln 70) and *export* (ln 118) subs, there are
regexes that check the file extension before continuing. These expect
lowercase file extensions, so no indexcodes are extracted from .PDFs
or .DOCs or .HTMLs etc.

Easy to fix, once found, but took me ages.
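
To illustrate the difference, here is a minimal sketch in Python rather
than the Perl of the actual subs. The extension list and function names
are made up for the example and are not taken from the EPrints source.
It only shows how a case-sensitive extension regex skips .PDF while the
same pattern with a case-insensitive flag (the analogue of Perl's /i
modifier) accepts it.

```python
import re

# Hypothetical extension list, for illustration only.
EXTENSIONS = ("pdf", "doc", "html")

# Case-sensitive check: behaves like the regexes described above,
# which only match lowercase extensions.
STRICT = re.compile(r"\.(?:%s)$" % "|".join(EXTENSIONS))

# Same pattern, compiled with IGNORECASE (comparable to Perl's /i).
RELAXED = re.compile(STRICT.pattern, re.IGNORECASE)


def matches_strict(filename):
    """Return True only if the extension is in the list and lowercase."""
    return STRICT.search(filename) is not None


def matches_relaxed(filename):
    """Return True if the extension is in the list, in any letter case."""
    return RELAXED.search(filename) is not None


if __name__ == "__main__":
    for name in ("thesis.pdf", "thesis.PDF", "notes.Doc"):
        print(name, matches_strict(name), matches_relaxed(name))
```

In this sketch only thesis.pdf passes the strict check, while all three
names pass the relaxed one.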

Looking in GitHub I can't see where any regression might have occurred,
so I'm wondering if it was ever thus?



Rory McNicholl
Lead developer, Research Repositories Team
Academic Research Technologies
University of London Computer Centre
Senate House
Malet Street

t: +44 (0)20 7863 1344
e: r.mcnicholl at ulcc.ac.uk
w: http://www.ulcc.ac.uk/
b: http://dablog.ulcc.ac.uk/
