
eprints_tech messages


[EP-tech] Re: EPrints and العربية

From: Tim Brody <tdb2 AT ecs.soton.ac.uk>
Date: Wed, 10 Mar 2010 17:05:03 +0000


Threading: [EP-tech] EPrints and العربية from r.davis AT ulcc.ac.uk

On Wed, 2010-03-10 at 16:45 +0000, Richard M. Davis wrote:
> Dear all
> 
> Does anyone know if EPrints can index, search and generally grok Arabic 
> items - metadata and full-texts?

Take a look at (EPrints 3.2):
http://roar.eprints.org/1557/

The words are chunked on whitespace and indexed, so you can find that
record by searching for a term. Languages written without whitespace
between words (e.g. Chinese) aren't going to work.
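
As a rough illustration of why whitespace chunking works for Arabic but
not for Chinese, here is a minimal sketch in Python. It is not the
EPrints indexer (which is Perl, configured in cfg.d/indexing.pl); the
function name chunk_on_whitespace and the sample strings are made up
for the example.

```python
def chunk_on_whitespace(text):
    """Split text into index terms on whitespace (hypothetical helper,
    not an EPrints function)."""
    return [term.lower() for term in text.split()]


# Sample strings chosen for illustration only.
arabic = "مستودع البحوث العربية"   # three words separated by spaces
chinese = "中文研究资料库"            # several words, no spaces at all

arabic_terms = chunk_on_whitespace(arabic)
chinese_terms = chunk_on_whitespace(chinese)

# Each Arabic word becomes its own searchable term.
print(len(arabic_terms), arabic_terms)

# The whole Chinese phrase collapses into one term, so a search for any
# single word inside it would not match.
print(len(chinese_terms), chinese_terms)
```

A script without word spacing would need a proper word segmenter in
place of the plain split.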

You may need to refine the methods in cfg.d/indexing.pl to correctly
chunk non-Roman text.

Beware of UTF-8 gotchas, e.g. the database columns not being big enough
to store multi-byte characters.
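
To make the column-size point concrete: an Arabic letter takes two
bytes in UTF-8, so a value can be twice as long in bytes as it is in
characters. The sketch below assumes a column whose limit is counted in
bytes; whether your limit is in bytes or characters depends on the
database and its character set settings. The helper fits_in_column and
the 10-byte limit are hypothetical.

```python
def fits_in_column(value, max_bytes):
    """Return True if value's UTF-8 encoding fits in max_bytes.
    Hypothetical check; assumes the column limit is counted in bytes."""
    return len(value.encode("utf-8")) <= max_bytes


word = "العربية"

char_count = len(word)                  # number of characters
byte_count = len(word.encode("utf-8"))  # number of bytes once encoded

print(char_count, byte_count)

# A limit that looks generous in characters can still be too small in bytes.
print(fits_in_column(word, 10))
print(fits_in_column("Arabic", 10))
```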

> I did happen on
> 
> http://wiki.eprints.org/w/Language_Issues_in_EPrints_2
> 
> but it's obviously a bit out-of-date. Also slightly concerned that the 
> page is categorised under "Languages" and "Rubbish".

One of our oompa-loompas has been going through the Wiki flagging
incorrect/out of date content (I pointed out "rubbish" wasn't a good
term ...).

/Tim.

*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/