See the Mailing Lists Page for how to subscribe and unsubscribe.
eprints_tech messages
Please note: this page shows emails that have been sent to the eprints_tech mailing list. Some of these may be spam emails we have failed to filter.
Re: [EP-tech] security hole and expensive indexing
From: "Roman Chyla" <roman.chyla AT gmail.com>
Date: Mon, 19 May 2008 17:38:40 +0200
| Threading: | ↑ [EP-tech] security hole and expensive indexing from roman.chyla AT gmail.com • This Message |
*** ↵ http://www.eprints.org/tech.php/id/%3Cea0115e90805190838l99d250cs77fac5d279883758%40mail.gmail.com%3E *** EPrints community wiki - http://wiki.eprints.org/ Hi Tim, > > On Mon, 2008-05-19 at 14:47 +0200, Roman Chyla wrote: >> *** ↵ http://www.eprints.org/tech.php/id/%3Cea0115e90805190547o4d84bdb2yec2625bbaf255415%40mail.gmail.com%3E >> *** EPrints community wiki - http://wiki.eprints.org/ >> >> Hi, >> i am in a process of fixing EPrints for us. I am hoping to give you >> some feedback over time. This one is a severe security hole: >> generally a very bad idea to do: >> >> $sql = "INSERT INTO $indextable (fieldword,pos,ids ) VALUES >> ('$fieldword',$n,':$objectid:')" >> $session->get_database->do( $sql ) >> >> should be done as: >> >> $sql = "INSERT INTO $indextable (fieldword,pos,ids ) VALUES ↵ (?,?,?)" >> $session->get_database->do( $sql, $fieldword, $n, $objectid ) >> >> even if EPrints *sometimes* does the Database::prep_value($value) - >> taht function does not care for anything but backslashes > > $fieldword goes through prep_value, which escapes quotes and > backslashes: I think it is not the case here: "UPDATE $indextable SET ids='$ids$objectid:' WHERE fieldword='$fieldword' AND pos=$n" There is one or two more in Index.pm > > $value =~ s/["\\']/\\$&/g; I am not sure if this is enough - there might be more to escape than quotes and back slashes. And escaping is database/driver specific, so in my opinion it is better to use the standard DBI quote function $dbh->quote("Don't") > > You can't pass a value through prep_value (in MySQL) that will be > executed. > > The only non-escaped values passed through should be integer i.e. will > have come from an id field. > >> prone to this error: everything in EPrints, especially fulltext ↵ indexing >> >> ---- >> and also, the indexing is very CPU intensive (for many reasons) one of >> them is this: >> >> $session->get_database->do( $query) >> >> it would be much much better to do: >> my $q = $dbh->prepare("statement") >> >> and then only in the loop >> $q->execute() > > MySQL's driver only executes strings - the place holder substitution > happens in Perl for every execution. prepare/execute and do() should be > about the same complexity. considering that the add() is called for every field for every eprint, it could make a difference to prepare statements once, cache them, and use for every field. but yes, in theory and in some cases it is necessary ->do(), because with DBI one cannot have placeholders for table/column names, but > > I've rationalised a lot of the DB functionality in 3.1, including making > use of DBI's place holder substitution, because that does make a > difference to other DBs. > > I think you'll find the slowest parts of indexing are due to spawning > external processes (for text extraction) and text parsing. of course, that counts, but i think some of these count too: - combining field value with the field name in eprint__index (not in eprint_rindex though) that means when searching, you are probably searching for _fulltext_:something instead of "select * from index where col1=something and col2=fulltext_id" nor can you spare all the rows for index that should match every value, it is not possible to ask: "select * from index where col1=something" i believe it is expensive to have that text field name in the value, yet again, i have done no profiling - and instead of billion queries, you are issuing 2 billion queries just updating the index SELECT max(pos) FROM $indextable where fieldword='$fieldword' SELECT ids FROM $indextable WHERE fieldword='$fieldword' AND pos=$n while it could be: SELECT ids FROM $indextable WHERE fieldword='$fieldword' order by pos desc limit 1; Best, roman > > All the best, > Tim. > >
[index] [prev] [next] [options] [help]




