EPrints Technical Mailing List Archive

Message: #03088



[EP-tech] Re: IRStats2 - Family names of creators not displaying with appropriate capital letters


Hi,

Actually, I'm getting different results on my machine (using the name-parse module), but they're even worse than what you showed me.


So I've extracted some code from the "lingua-name-case" module and this seems to work well on your examples.

The patch is here: https://github.com/eprints/irstats2/commit/ab76be0f2c72d753c76f5f9f772ec4612c1c7937

The complete file (Stats/Sets.pm) is here: https://raw.githubusercontent.com/eprints/irstats2/master/cfg/plugins/EPrints/Plugin/Stats/Sets.pm

Could you give this a try on your data and let me know if it fixed your issues?

Thanks,
Seb.




On 28/05/14 15:30, Centro de Documentación wrote:
Hi Seb,

Thanks for your help.

I've tested graingert's suggestion...

It works well in general, but can't deal with this situation:

Before (without any patch) => after (correct form in parentheses):

Alvaro de lazaro, Martín => Alvaro De Lazaro, Martín (Alvaro de Lazaro, Martín)
Gennero de romero, Andrea inés => Gennero De Romero, Andrea Inés (Gennero de Romero, Andrea Inés)
De la rosa, Julia maría => De La Rosa, Julia María (de la Rosa, Julia María)
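
The "correct form" in these three names can be sketched as follows. This is only an illustration in Python, not the Perl code used by IRStats2 or by any of the modules mentioned in this thread. The function name `name_case` and the `PARTICLES` set are hypothetical, and the set holds only the two particles that appear in the examples ("de" and "la").

```python
import re

# Hypothetical particle list, taken only from the three example names.
# A real list would need more entries (and is an assumption here).
PARTICLES = {"de", "la"}

def name_case(name: str) -> str:
    """Capitalise each word of a name, keeping known particles lowercase."""
    def fix(match: re.Match) -> str:
        lower = match.group(0).lower()
        if lower in PARTICLES:
            return lower
        # Uppercase only the first letter; accented letters stay inside the word.
        return lower[0].upper() + lower[1:]

    # [^\W\d_]+ matches a run of Unicode letters, so "Martín" is one word.
    return re.sub(r"[^\W\d_]+", fix, name)

print(name_case("Alvaro de lazaro, Martín"))        # Alvaro de Lazaro, Martín
print(name_case("Gennero de romero, Andrea inés"))  # Gennero de Romero, Andrea Inés
print(name_case("De la rosa, Julia maría"))         # de la Rosa, Julia María
```

Note that this sketch lowercases everything after the first letter of a word, so a name with an internal capital (for example "McDonald") would not survive it unchanged.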

Well, I'll see what I can do.

Thanks again,


On Tue, May 27, 2014 at 10:21 AM, Sebastien Francois
<sf2@ecs.soton.ac.uk> wrote:
I found a reference to this:
http://stackoverflow.com/questions/19396804/capitalizing-strings-which-contain-accented-characters

But I don't have much time to look into this at the moment... It seems to
be about a regex internal to the module above, which fails to detect
proper word boundaries.
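
The symptom can be reproduced with a minimal sketch, assuming the cause is a word-boundary test that only recognises ASCII letters. This is written in Python for illustration and is not the module's actual Perl regex; `capitalize_words` and its `ascii_only` parameter are hypothetical names. With ASCII-only matching, an accented letter counts as a non-word character, so the letter following it is treated as the start of a new word.

```python
import re

def capitalize_words(text: str, ascii_only: bool) -> str:
    """Uppercase the first character found after each word boundary."""
    # re.ASCII restricts \w and \b to [a-zA-Z0-9_]; without it they are Unicode-aware.
    flags = re.ASCII if ascii_only else 0
    return re.sub(r"\b(\w)", lambda m: m.group(1).upper(), text, flags=flags)

name = "González carella, María inés"

# ASCII-only boundaries: "á", "í" and "é" split the words.
print(capitalize_words(name, ascii_only=True))   # GonzáLez Carella, MaríA InéS

# Unicode-aware boundaries: accented letters are part of the word.
print(capitalize_words(name, ascii_only=False))  # González Carella, María Inés
```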

graingert (on GitHub) suggested using another Perl module - perhaps
worth giving that one a try?

Seb.

On 23/05/14 18:16, Centro de Documentación wrote:
Seb,

It works, but it is not always accurate. I have a problem with capital
letters after accent marks in family and given names.

Before applying the patch "lingua"
Alvarez cema, Juan alberto

After (Ok)
Alvarez Cema, Juan Alberto


Before applying the patch
González carella, María inés

After (Wrong)
GonzáLez Carella, MaríA InéS

Should be
González Carella, María Inés

Any suggestion?

Regards,


*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/