Tech List

[index] [prev] [next] [options] [help]
See the Mailing Lists Page for how to subscribe and unsubscribe.

eprints_tech messages

Please note: this page shows emails that have been sent to the eprints_tech mailing list. Some of these may be spam emails we have failed to filter.

Re: [EP-tech] security hole and expensive indexing

From: "Roman Chyla" <roman.chyla AT gmail.com>
Date: Mon, 19 May 2008 17:38:40 +0200


Threading: [EP-tech] security hole and expensive indexing from roman.chyla AT gmail.com
      • This Message

*** 
http://www.eprints.org/tech.php/id/%3Cea0115e90805190838l99d250cs77fac5d279883758%40mail.gmail.com%3E
*** EPrints community wiki - http://wiki.eprints.org/

Hi Tim,

>
> On Mon, 2008-05-19 at 14:47 +0200, Roman Chyla wrote:
>> *** 
http://www.eprints.org/tech.php/id/%3Cea0115e90805190547o4d84bdb2yec2625bbaf255415%40mail.gmail.com%3E
>> *** EPrints community wiki - http://wiki.eprints.org/
>>
>> Hi,
>> i am in a process of fixing EPrints for us. I am hoping to give you
>> some feedback over time. This one is a severe security hole:
>> generally a very bad idea to do:
>>
>> $sql = "INSERT INTO $indextable (fieldword,pos,ids ) VALUES
>> ('$fieldword',$n,':$objectid:')"
>> $session->get_database->do( $sql )
>>
>> should be done as:
>>
>> $sql = "INSERT INTO $indextable (fieldword,pos,ids ) VALUES 
(?,?,?)"
>> $session->get_database->do( $sql, $fieldword, $n, $objectid )
>>
>> even if EPrints *sometimes* does the Database::prep_value($value) -
>> taht function does not care for anything but backslashes
>
> $fieldword goes through prep_value, which escapes quotes and
> backslashes:

I think it is not the case here:
"UPDATE $indextable SET ids='$ids$objectid:' WHERE
fieldword='$fieldword' AND pos=$n"

There is one or two more in Index.pm

>
> $value =~ s/["\\']/\\$&/g;

I am not sure if this is enough - there might be more to escape than
quotes and back slashes. And escaping is database/driver specific, so
in my opinion it is better to use the standard DBI quote function
$dbh->quote("Don't")

>
> You can't pass a value through prep_value (in MySQL) that will be
> executed.
>
> The only non-escaped values passed through should be integer i.e. will
> have come from an id field.
>
>> prone to this error: everything in EPrints, especially fulltext 
indexing
>>
>> ----
>> and also, the indexing is very CPU intensive (for many reasons) one of
>> them is this:
>>
>> $session->get_database->do( $query)
>>
>> it would be much much better to do:
>> my $q = $dbh->prepare("statement")
>>
>> and then only in the loop
>> $q->execute()
>
> MySQL's driver only executes strings - the place holder substitution
> happens in Perl for every execution. prepare/execute and do() should be
> about the same complexity.

considering that the add() is called for every field for every eprint,
it could make a difference to prepare statements once, cache them, and
use for every field. but yes, in theory and in some cases it is
necessary ->do(), because with DBI one cannot have placeholders for
table/column names, but


>
> I've rationalised a lot of the DB functionality in 3.1, including making
> use of DBI's place holder substitution, because that does make a
> difference to other DBs.
>
> I think you'll find the slowest parts of indexing are due to spawning
> external processes (for text extraction) and text parsing.

of course, that counts, but i think some of these count too:
- combining field value with the field name in eprint__index (not in
eprint_rindex though)
that means when searching, you are probably searching for
_fulltext_:something instead of "select * from index where
col1=something and col2=fulltext_id"
nor can you spare all the rows for index that should match every
value, it is not possible to ask: "select * from index where
col1=something"

i believe it is expensive to have that text field name in the value,
yet again, i have done no profiling


- and instead of billion queries, you are issuing 2 billion queries
just updating the index
SELECT max(pos) FROM $indextable where fieldword='$fieldword'
SELECT ids FROM $indextable WHERE fieldword='$fieldword' AND pos=$n

while it could be:
SELECT ids FROM $indextable WHERE fieldword='$fieldword' order by pos
desc limit 1;

Best,

roman

>
> All the best,
> Tim.
>
>


[index] [prev] [next] [options] [help]