EPrints Technical Mailing List Archive

Message: #04445



[EP-tech] Searching through multilang fields


Hi all,

First of all, I have started a new thread, since the question has drifted from its original subject (it no longer has anything to do with finding the selected language from the configuration) and the new question is fairly self-contained.

Secondly, I did some research before asking for further help, and here are my findings (I am using the *title* field in these examples):

1) Each multilang EPrints field generates three additional fields in every eprint__ordervalues_xx table of your database (where xx may stand for en, el, etc.), namely: ml_title_text, ml_title_lang, ml_title_text (all of them are of type longtext).
2) The peculiar thing is that the texts for all languages are stored in the ml_title_text field of *every* eprint__ordervalues_xx table. What I mean is that each eprint__ordervalues_xx language table (e.g. en, el) contains exactly the same information.
3) When performing an API search with the following code, to see how searching the ml_title field works (it just searches for the "title" keyword within our multilang ml_title field):

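use strict;
use warnings;
use EPrints;

# Connect to the repository and select the live archive dataset
my $ep = EPrints->new();
my $repo = $ep->repository( "testpamak1" );
my $ds = $repo->dataset("archive");
my $an_eprint = $repo->eprint( 23 ); # not used by the search below

# Search for the word "title" in the multilang ml_title field
my $list = $ds->search(search_fields => [{
    meta_fields => [qw( ml_title )], value => "title",
  }]);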


the generated SQL was:

SELECT `eprint`.`eprintid` FROM `eprint`, `eprint__rindex` AS `eprint__rindex` WHERE `eprint`.`eprintid`=`eprint__rindex`.`eprintid` AND (`eprint`.`eprint_status` = 'archive' AND `eprint__rindex`.`field`='ml_title_text' AND `eprint__rindex`.`word`='title') GROUP BY `eprint`.`eprintid`

This means that the real search was performed on the eprint__rindex table. This table has a 'field' column and a 'word' column; in our case the 'field' column is 'ml_title_text' (as the SQL above shows) and the 'word' column equals our search keyword, which is 'title'.
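To make the lookup above concrete, here is a toy model of it in plain Perl. It is only a sketch of the (field, word) matching that the generated SQL performs, not EPrints' own code: the sample rows, the 'ml_abstract_text' field name, and the rindex_lookup helper are all made up for illustration, and the eprint_status filter from the SQL is left out.

```perl
use strict;
use warnings;

# Toy stand-in for eprint__rindex: one row per (eprintid, field, word).
# These rows are invented sample data, not taken from a real repository.
my @rindex = (
    { eprintid => 23, field => 'ml_title_text',    word => 'title'   },
    { eprintid => 23, field => 'ml_title_text',    word => 'english' },
    { eprintid => 24, field => 'ml_title_text',    word => 'thesis'  },
    { eprintid => 25, field => 'ml_abstract_text', word => 'title'   },
);

# Hypothetical helper mirroring the WHERE clause of the generated SQL:
# keep rows whose field and word both match, then collapse duplicate
# ids (the role played by GROUP BY eprintid in the SQL).
sub rindex_lookup {
    my ( $rows, $field, $word ) = @_;
    my %seen;
    return sort { $a <=> $b }
        grep { !$seen{$_}++ }
        map  { $_->{eprintid} }
        grep { $_->{field} eq $field && $_->{word} eq $word } @$rows;
}

my @ids = rindex_lookup( \@rindex, 'ml_title_text', 'title' );
print "matching eprints: @ids\n";    # prints "matching eprints: 23"
```

Eprint 25 is not returned even though it contains the word 'title', because its row is indexed under a different field. That is the behaviour the 'field' condition in the SQL implies.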

That said, it seems that searching (at least as far as the API is concerned) will work just fine, provided we somehow manage to use the multilang fields in our searches (e.g. ml_title, ml_abstract, ml_creator, etc.).
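As a rough sketch of what "using the multilang fields in our searches" could look like through the API, the helper below builds a search_fields structure of the same shape as in the snippet above, but covering several multilang fields at once. The multilang_search_spec name is hypothetical, and it is an assumption (worth verifying against your EPrints version) that several names in one meta_fields list are treated as alternatives.

```perl
use strict;
use warnings;

# Hypothetical helper: build a single search_fields entry that covers
# several multilang fields. The field names come from this message;
# whether each one exists depends on the repository's configuration.
sub multilang_search_spec {
    my ( $value, @fields ) = @_;
    return [ { meta_fields => [@fields], value => $value } ];
}

my $spec = multilang_search_spec( 'title', qw( ml_title ml_abstract ) );

# With a live repository, the structure would be passed on as before:
#   my $list = $ds->search( search_fields => $spec );
print join( ', ', @{ $spec->[0]{meta_fields} } ), "\n";
```

Building the structure separately keeps the list of multilang fields in one place, so it can be extended without touching the search call itself.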

Now, my questions are as follows:

- Where should I start in order to create a custom search form? Is there any documentation or should I try to understand how it works by reading eprint_search_advanced.pl?
- Is there another, maybe easier way to achieve searching within the multilang fields or their associated virtual fields?

Thanks all in advance,

George.

On 02/07/2015 02:17 PM, Adam Field wrote:
I don't believe virtual fields will get searched because they aren't stored in the database.

You should add the multilang field to the search form, but I'm actually not sure how it will behave.

--
Adam Field
Business Relationship Manager and Community Lead
EPrints Services

On 2 Jul 2015, at 12:10, George Mamalakis <mamalos@eng.auth.gr> wrote:

Excellent observation skills!! :):)...and sorry for not having seen that.

One more question. With this setup (for title and abstract with two fields each), will searches work as expected? Meaning that, based on the selected language, EPrints will search the (dynamically generated) appropriate language title field for example? Because I'm trying to test it, but I'm facing some difficulties using the extended search menu, and I'm not sure if this is the problem (putting the title ). Simple search works just fine, but I assume that simple search searches all fields?

Thanks again!

George.


-- 
George Mamalakis

IT and Security Officer, 
Electrical and Computer Engineer (Aristotle Univ. of Thessaloniki),
PhD (Aristotle Univ. of Thessaloniki),
MSc (Imperial College of London)

School of Electrical and Computer Engineering
Aristotle University of Thessaloniki

phone number : +30 (2310) 994379