
[EP-tech] Poor performance due to cachemap, non-SQL joins


In EPrints 3.0.5 (old, I know) I see very poor performance when a user 
ticks the checkbox to view their eprints in the live archive. Apparently 
what happens is that the IDs of all eprints in the archive are first 
inserted into one of the dynamically created cache tables (this means 
tens of thousands of individual INSERTs at a time, which seems like a 
great waste; the INSERTs are not even batched). Afterwards, only the 
user's own eprints are displayed (let's say one or two of them).
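
To illustrate the difference, here is a minimal sketch in Python with an in-memory SQLite database. It is not EPrints code: the table names (eprint, cache_rowwise, cache_filtered), the column names and the helper functions are all made up for the example. It contrasts filling a cache table with one INSERT per ID against a single INSERT ... SELECT that applies the user filter in the database first.

```python
import sqlite3


def make_demo_db(n_eprints=1000, n_users=100):
    """Build a toy database; the schema is hypothetical, not the EPrints one."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE eprint ("
        "eprintid INTEGER PRIMARY KEY, userid INTEGER, eprint_status TEXT)"
    )
    rows = [(i, i % n_users, "archive") for i in range(1, n_eprints + 1)]
    conn.executemany("INSERT INTO eprint VALUES (?, ?, ?)", rows)
    return conn


def build_cache_rowwise(conn, ids):
    """Fill a cache table with one INSERT statement per eprint ID.

    Returns the number of INSERT statements issued.
    """
    conn.execute("DROP TABLE IF EXISTS cache_rowwise")
    conn.execute("CREATE TABLE cache_rowwise (eprintid INTEGER PRIMARY KEY)")
    statements = 0
    for eprintid in ids:
        conn.execute(
            "INSERT INTO cache_rowwise (eprintid) VALUES (?)", (eprintid,)
        )
        statements += 1
    return statements


def build_cache_filtered(conn, userid):
    """Fill a cache table with a single INSERT ... SELECT.

    The user filter runs inside the database, so only the matching IDs
    are ever written. Returns the number of INSERT statements issued.
    """
    conn.execute("DROP TABLE IF EXISTS cache_filtered")
    conn.execute("CREATE TABLE cache_filtered (eprintid INTEGER PRIMARY KEY)")
    conn.execute(
        "INSERT INTO cache_filtered (eprintid) "
        "SELECT eprintid FROM eprint "
        "WHERE eprint_status = 'archive' AND userid = ?",
        (userid,),
    )
    return 1
```

With 70000 eprints the first variant would issue 70000 statements to end up showing one or two rows, while the second issues one statement and stores only the rows that are needed.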

I also noticed that joins (as in "database joins") are performed on huge 
arrays in Perl code, which are scanned sequentially, rather than at the 
SQL level. This contributes greatly to the sluggishness of 
generate_views (2-3 days in an installation with 70000 eprints).
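
As a rough sketch of what I mean (again Python with SQLite, and again with invented table and function names rather than anything taken from the EPrints source), the two functions below compute the same result. The first joins two row lists in application code with a sequential scan, the second lets the database do the join.

```python
import sqlite3


def make_join_demo_db():
    """Toy data only; 'eprint' and 'users' are hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE eprint (eprintid INTEGER PRIMARY KEY, userid INTEGER)"
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "alice"), (2, "bob"), (3, "carol")],
    )
    conn.executemany(
        "INSERT INTO eprint VALUES (?, ?)",
        [(10, 2), (11, 1), (12, 2), (13, 3)],
    )
    return conn


def join_in_app(eprints, users):
    """Join in application code: for every eprint, scan the user list.

    The cost grows with len(eprints) * len(users).
    """
    result = []
    for eprintid, userid in eprints:
        for uid, name in users:  # sequential scan for each eprint
            if uid == userid:
                result.append((eprintid, name))
                break
    return result


def join_in_sql(conn):
    """Join at the SQL level; the database can use the primary key index."""
    return conn.execute(
        "SELECT e.eprintid, u.name "
        "FROM eprint e JOIN users u ON u.userid = e.userid "
        "ORDER BY e.eprintid"
    ).fetchall()
```

On four rows the difference is invisible, but the nested scan is the pattern that I would expect to hurt with tens of thousands of records.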

I suppose that these issues are known. But I searched 
trac.eprints.org and haven't found any conclusive answer as to whether 
they still exist in the current version. I'm trying to make a stronger 
case for an upgrade...

Jan Ploski