From: Tim Brody <tdb2 AT ecs.soton.ac.uk>
Date: Wed, 10 Mar 2010 17:05:03 +0000
In reply to: [EP-tech] EPrints and العربية (from r.davis AT ulcc.ac.uk)
On Wed, 2010-03-10 at 16:45 +0000, Richard M. Davis wrote:
> Dear all
>
> Does anyone know if EPrints can index, search and generally grok Arabic
> items - metadata and full-texts?

Take a look at this record (EPrints 3.2):

http://roar.eprints.org/1557/

The words are chunked on whitespace and indexed, so you can find that record by searching for a term. Languages written without whitespace between words (e.g. Chinese) aren't going to work. You may need to refine the methods in cfg.d/indexing.pl to chunk non-Roman text correctly.

Beware of UTF-8 gotchas, e.g. database columns not being big enough to store multi-byte characters.

> I did happen on
>
> http://wiki.eprints.org/w/Language_Issues_in_EPrints_2
>
> but it's obviously a bit out-of-date. Also slightly concerned that the
> page is categorised under "Languages" and "Rubbish".

One of our oompa-loompas has been going through the Wiki flagging incorrect/out of date content (I pointed out "rubbish" wasn't a good term ...).

/Tim.
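
To illustrate the two points in the reply (whitespace chunking and UTF-8 storage size), here is a minimal sketch in Python. It is not the EPrints code, which is Perl and lives in cfg.d/indexing.pl; the function names and the sample strings are hypothetical and chosen only for illustration.

```python
def chunk_on_whitespace(text):
    """Split text into candidate index terms on whitespace.

    Illustrative only: a real indexer would also strip punctuation,
    drop stop words, and so on.
    """
    return [term.casefold() for term in text.split()]


def utf8_length(text):
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Hypothetical sample titles, not taken from any real record.
arabic = "المكتبة الرقمية المفتوحة"   # words separated by spaces
chinese = "开放数字图书馆"              # no spaces between words

arabic_terms = chunk_on_whitespace(arabic)    # one term per word
chinese_terms = chunk_on_whitespace(chinese)  # the whole string as one term

# Arabic letters take 2 bytes each in UTF-8 and these Chinese characters
# take 3, so a column sized in bytes for ASCII text can overflow.
for label, text in (("arabic", arabic_terms[0]), ("chinese", chinese)):
    print(label, len(text), "characters,", utf8_length(text), "bytes")
```

The Arabic title splits into searchable words, while the Chinese title comes back as a single undivided term, which is why a whitespace-based chunker needs refining for such scripts.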