EPrints Technical Mailing List Archive

Message: #05267



[EP-tech] Re: Xapian indexing


Hello David,
Many thanks for your solution.
I added "text_index => 1" to the fields of type "namedset". All the fields are now indexed automatically.

/opt/www/eprints-3.3.12/archives/agritrop/cfg/cfg.d/eprint_field.pl
Example:
# Indexing status
{
    name => 'statut_indexation',
    type => 'namedset',
    set_name => 'statut_indexation',
    input_style => 'medium',
    text_index => 1,
},

Thank you for your help.

Happy Holidays

Josée

On 09/12/2015 19:34, David R Newman wrote:
Hi Josée,

Turns out to be a really simple answer to this question but a rather
long way round to discovering it.

By default namedset fields have text_index set to 0.  Therefore, if only
namedset fields are changed, the EPrint will not be queued for
re-indexing, even though the field in question will be re-indexed if you
change a non-namedset field at the same time.  The solution is to add:

text_index => 1

to the namedset field you want to be indexed.
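
The queuing behaviour described above can be modelled roughly as follows. This is only a simplified sketch and not EPrints source code: the %fields hash and the queued_for_reindex function are hypothetical names invented for illustration, and the rule it encodes (an EPrint is queued only when at least one changed field has text_index enabled) is an assumption based on the behaviour observed in this thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the field configuration; NOT the real EPrints API.
my %fields = (
    title             => { type => 'text',     text_index => 1 },
    statut_indexation => { type => 'namedset', text_index => 0 },  # namedset default per this thread
);

# Assumed rule: the EPrint is queued for re-indexing only if at least one
# of the changed fields has text_index enabled.
sub queued_for_reindex {
    my ($fields, @changed) = @_;
    return scalar grep { $fields->{$_} && $fields->{$_}{text_index} } @changed;
}

# Only the namedset field changed: nothing triggers the indexer.
print queued_for_reindex(\%fields, 'statut_indexation') ? "queued\n" : "not queued\n";

# A text field changed at the same time: the EPrint is queued.
print queued_for_reindex(\%fields, 'statut_indexation', 'title') ? "queued\n" : "not queued\n";

# After adding text_index => 1 to the namedset field, a change to it alone is enough.
$fields{statut_indexation}{text_index} = 1;
print queued_for_reindex(\%fields, 'statut_indexation') ? "queued\n" : "not queued\n";
```

Under these assumptions the three calls print "not queued", "queued" and "queued", which matches what was observed before and after setting text_index on the namedset field.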

I suspect the reason that namedset is not indexed by default is that it
is not the value you see in the select box that will be added to the
index but the underlying value in the namedset file, which is often not
the same.  Also, a search on such a short term is likely to return quite
a few results where this value matches but on another indexed field.
Therefore, I think text_index is turned off by default because a free
text search on a namedset value is unlikely to return the set of results
you are expecting.  In some cases it may be appropriate, at which point
you should set text_index to 1 for this field.

Regards

David Newman 

On Wed, 2015-12-09 at 17:12 +0000, David R Newman wrote:
Hi Josée,

I am currently looking into this issue as well, as I have identified a
situation where a small percentage of EPrints cannot be found when you
search for them individually by title.  I have a script for automating
this test on multiple EPrints at once, which I can make available.

On the specific issue you describe, I can replicate the same issue on a
3.3.14 version of EPrints.  I have yet to dig down into why the EPrint
is not being put in the indexer queue, but I do not think it will be too
difficult to figure out.  I found that if I subsequently change another
non-namedset field, both that field and the namedset field I had
previously changed are scheduled for re-indexing.

I am not certain whether your issue relates to the problem I mentioned
initially, as I think the problem is not Xapian-dependent: it is not
until the indexing task is run later by the indexer that it is known
whether the record will be indexed using Xapian or just in the database.

Regards

David Newman


On Wed, 2015-12-02 at 07:55 +0100, Lessard Josée wrote:
Hello,
we use Xapian for our simple search.



The Xapian indexing is correct when a record is validated into the
archive (eprint_status: buffer => archive).

However, if a correction is made to a "namedset" field, indexing of the
document is not triggered!
If the modification is made to a "text" field, indexing is triggered.
Has this problem been reported before?  How can we make sure that
re-indexing is triggered when a field of any type is modified?

Sorry for my English.

Sincerely
Josée Lessard


eprint_search_simple.pl



$c->{search}->{simple} = 
{
    search_fields => [
        {
            id => 'q',
            meta_fields => [
                'documents',
                'eprintid',
                'title',
                'abstract',
                'date',
                'type',
                'statut_indexation',
                'indexeur',
...
            ]
        },
    ],
    preamble_phrase => 'cgi/search:preamble',
    title_phrase => 'cgi/search:simple_search',
    citation => 'result',
    page_size => 20,
    order_methods => {
        'byyear'       => '-date/creators_name/title',
        'byyearoldest' => 'date/creators_name/title',
        'byname'       => 'creators_name/-date/title',
        'bytitle'      => 'title/creators_name/-date',
        'bytype'       => 'type/-date/title',
        'byti'         => '-full_text_status/-date/title',
    },
    default_order => 'byyear',
    show_zero_results => 1,
};




/opt/www/eprints-3.3.12/archives/agritrop/cfg/namedsets/statut_indexation



a_classer
a_indexer
a_indexer_indexeur
en_cours_d_indexation
a_indexer_electronique
a_indexer_papier
document_a_numeriser
notice_indexee




__________________________________

EPrint after correction

Result:

title: "Publications et travaux du SAR 1996"
eprint_status: "archive"
statut_indexation: "en_cours_d_indexation"

Xapian index:

      * title:1996
      * title:du
      * title:et
      * title:publications
      * title:sar
      * title:travaux
      * statut_indexation:notice_indexee
      * lastmod:20150909


-- 
Josée Lessard

Documentaliste

Cirad-Dgdrs-Délégation à l'information scientifique et technique

TA 183/05 - Avenue Agropolis - 34398 Montpellier Cedex 5 (Tél: +33 4 67 61 57 37)


*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/
