EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #10238



Re: [EP-tech] Alphabetically sort names with special characters


CAUTION: This e-mail originated outside the University of Southampton.

You should simply be able to use the sort_values method when the list
is presented.

So I'm assuming off the top of my head, it's gonna be a layout issue,
in one of the xml documents, and that's it.

Whether that is ACTUALLY the case or not, will require me to see the
problem you're seeing on my own EPrints, fix it on my own EPrints, and
then share what the fix was.

So let's first look at an example of where the problem is occurring.
You said it was in the second level of browse views.
Fantastic!
I've found an example here:
https://arcomabstracts.com/view/year/2025.html

Okay.
I will replicate the same issue on my EPrints, and then look for a
fix, and write again shortly.

I suspect the layout xml is just cycling through the eprints and
displaying them, without first sorting the eprints by author using
sort_values
( https://wiki.eprints.org/w/API:EPrints/MetaField#sort_values )...and
we'll soon see if that's the case.

That said, I have perhaps not read your email fully enough.
Are you saying, you think it's a matter of what you have, or have not,
copied over into your custom browse view? Do you have a copy of your
custom browse view to share?

I've just read up about gists here:
https://docs.github.com/en/get-started/writing-on-github/editing-and-sharing-content-with-gists/creating-gists
...and these seem a good way to share snippets of code as needed for
our discussions.

Yours,
Andrew.





Quoting Will Hughes <w.p.hughes@reading.ac.uk>:

Andrew

Oops, typo in the email: "bewildered by you..." should have been
"bewildered, but you...". Apologies for that!

OK, I am beginning to understand but I still struggle to really see
through this. In the default views.pl, which provides the sorting I
would expect, I see this:
{
        id => "creators",
        allow_null => 0,
        hideempty => 1,
        menus => [
            {
                fields => [ "creators_name" ],
                new_column_at => [1, 1],
                mode => "sections",
                open_first_section => 1,
                group_range_function => "EPrints::Update::Views::cluster_ranges_30",
                grouping_function => "group_by_a_to_z_hideempty",
            },
        ],
        order => "-date/title",
        variations => [
            "type",
            "DEFAULT",
        ],
    },

But, in my custom browsing, by document type, I have this:

{
    id => "doctype", # Browse by type of document
    menus => [
        {
            fields => [ "type" ],
        },
    ],
    order => "creators_name/date",
    variations => [
        "creators_name;first_letter",
        "type",
        "DEFAULT",
    ],
};

So, the problem comes when my results are at a second level, rather
than at a primary level of browse results. I struggle to figure out
which part of the code I should be copying into my custom browse view.

Best wishes

Will

-----Original Message-----
From: Will Hughes
Sent: 10 September 2025 08:34
To: eprints-tech@unitedgames.co.uk
Subject: RE: [EP-tech] Alphabetically sort names with special characters

Andrew

Thank you so much for digging around and exploring this. I have been
bewildered by you are helping me to make sense of it. Your question
about where I come across the problem made me think. It is
interesting as it mostly happens in Browse views, but now I see that
sometimes the sort is as I desire, and sometimes not - so maybe it
is merely a question of how the browse view is configured:

* Browse by author gives me pages (and the lists below them)
"correctly" sorted, like this:
A | Á-Å | B | C | Ç | D | E | F | G | H | I | İ | J | K | L | M | N
| O | Ó-Ø | P | Q | R | S | Š-Ş | T | U | Ü | V | W | X | Y | Z

* Browse by year gives me pages (and lists below them) sorted like this:
A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q |
R | S | T | U | V | W | X | Y | Z | Ç | Ö | Ş

* I also have a custom Browse by Document Type, which sorts like this:
A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q |
R | S | T | U | V | W | X | Y | Z | Ç | Ó | Ö | Š

So, I am going to dig around in the customised views.pl files and
compare them to vanilla versions - it may simply be a question of
how the order is defined.
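
The two orderings above can be reproduced outside EPrints. The following is only a minimal sketch, assuming that the "wrong" lists come from a plain code point sort and the "right" one from something like the Unicode Collation Algorithm; it says nothing about which EPrints code path produces either. The letter list is made up for illustration.

```perl
use strict;
use warnings;
use utf8;                       # string literals below contain non-ASCII letters
use Unicode::Collate;           # core module implementing the Unicode Collation Algorithm

binmode STDOUT, ':encoding(UTF-8)';

# Illustrative first letters only, not taken from a real repository.
my @letters = qw( Z Ş O Ç A Ö S C );

# Plain sort compares code points, so every accented capital
# (U+00C7 and above here) lands after the unaccented Z (U+005A).
my @by_codepoint = sort @letters;

# Collation treats the accent as a secondary difference, so each
# accented letter sorts directly after its base letter.
my @by_collation = Unicode::Collate->new->sort(@letters);

print join( ' | ', @by_codepoint ), "\n";   # A | C | O | S | Z | Ç | Ö | Ş
print join( ' | ', @by_collation ), "\n";   # A | C | Ç | O | Ö | S | Ş | Z
```

The first line matches the shape of the Browse by year and Browse by Document Type lists, the second the Browse by author list.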

Best wishes

Will

-----Original Message-----
From: eprints-tech-request@ecs.soton.ac.uk
<eprints-tech-request@ecs.soton.ac.uk> On Behalf Of Andrew M
Sent: 10 September 2025 08:06
To: eprints-tech@ecs.soton.ac.uk
Subject: Re: [EP-tech] Alphabetically sort names with special characters

Yes. It seems there is support for it already in MetaFields via the
sort_values method.
https://wiki.eprints.org/w/API:EPrints/MetaField#sort_values

=======

=pod

=item $out_list = $field->sort_values( $in_list, $langid )

Sorts the in_list into order, based on the "order values" of the
values in the in_list. Assumes that the values are not a list of
multiple values. [ [], [], [] ], but rather a list of single values.

=cut

=======
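
To illustrate the idea of sorting a list by "order values", here is a standalone sketch in plain Perl. It is not how EPrints implements sort_values, and it does not call EPrints at all: sort_by_order_value is a hypothetical helper, and using a Unicode::Collate sort key as the order value is an assumption made purely for the example.

```perl
use strict;
use warnings;
use utf8;                       # the name literals contain non-ASCII letters
use Unicode::Collate;

# Hypothetical helper, not part of EPrints: derive an "order value" for
# each single value, then sort the values by comparing those order values.
sub sort_by_order_value {
    my ($in_list) = @_;
    my $collator = Unicode::Collate->new( level => 1 );

    # getSortKey returns a binary key; comparing two keys with cmp gives
    # the same result as comparing the original strings with the collator.
    my %order_value = map { $_ => $collator->getSortKey($_) } @{$in_list};

    return [ sort { $order_value{$a} cmp $order_value{$b} } @{$in_list} ];
}

binmode STDOUT, ':encoding(UTF-8)';

my $out_list = sort_by_order_value(
    [ 'Lee, K', 'Çınar, D', 'Church, B', 'Ågren, R' ] );

print "$_\n" for @{$out_list};
# Ågren, R
# Church, B
# Çınar, D
# Lee, K
```

Precomputing one key per value keeps the comparison itself a cheap byte comparison, which is the usual reason for separating a display value from its order value.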

Yours,
Andrew.


Quoting Andrew M <eprints-tech@unitedgames.co.uk>:

Quoting Andrew M <eprints-tech@unitedgames.co.uk>:

Since the script was getting butchered in email form, I've thrown it
online here:
https://www.andrewjamesmehta.com/files/eprints/UnicodeSortExample.pm

However, the main part was:

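use English qw( -no_match_vars );   # provides @ARG as an alias for @_
use Unicode::Collate;

sub unicode_sort {
    my  $self   =   shift;
    # Level 1 compares base letters only, so case and diacritics are ignored.
    my  @configuration_to_ignore_case_and_diacritics    =   (level => 1);

    return Unicode::Collate->new(@configuration_to_ignore_case_and_diacritics)->sort(@ARG);
}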

As written about in the Perl Unicode cookbook:
https://perldoc.perl.org/perlunicook#%E2%84%9E-36:-Case-and-accent-insensitive-Unicode-sort

This is Perl, and not EPrints of course, so the next stage is to
figure out where such improved sorts need to be used in EPrints, or if
there is already an option in EPrints for them.
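
As a rough standalone illustration of what level => 1 changes, the sketch below compares a level-1 collator with a default one and then sorts the four names from the original question. The variable names are made up for the example, and this is plain Perl rather than anything taken from the EPrints codebase.

```perl
use strict;
use warnings;
use utf8;                       # the name literals contain non-ASCII letters
use Unicode::Collate;

binmode STDOUT, ':encoding(UTF-8)';

my $level1  = Unicode::Collate->new( level => 1 );  # base letters only
my $default = Unicode::Collate->new;                # also weighs accents and case

# At level 1 an accented, capitalised name compares equal to its plain
# lowercase form; the default collator still tells them apart.
my $equal_at_level1  = $level1->eq( 'Ågren', 'agren' )  ? 1 : 0;
my $equal_by_default = $default->eq( 'Ågren', 'agren' ) ? 1 : 0;

my @names  = ( 'Church, B', 'Lee, K', 'Ågren, R', 'Çınar, D' );
my @sorted = $level1->sort(@names);

print "equal at level 1: $equal_at_level1, by default: $equal_by_default\n";
# equal at level 1: 1, by default: 0

print "$_\n" for @sorted;
# Ågren, R
# Church, B
# Çınar, D
# Lee, K
```

Because level 1 treats "Ågren" and "agren" as equal, names that differ only by accent or case end up in an unspecified relative order; a default collator would be the choice if that distinction should act as a tie-break.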




There was no need for the "our" before $a and $b in that code example.
Apologies. Was messing around with different things and left that in.


Quoting Andrew M <eprints-tech@unitedgames.co.uk>:

Was intrigued by this, and had a moment of spare time, so wrote a
short script, that attempts three different sorts:

Default sort,

Default unicode case folding case-insensitive sort,

...and since the second made no difference, I hit the online cookbook...
https://perldoc.perl.org/perlunicook#%E2%84%9E-36:-Case-and-accent-insensitive-Unicode-sort
and learned about the default unicode
case-and-accent-insensitive sort.
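
The observation that case folding alone made no difference can be reproduced with a small sketch. This is an illustration written for this purpose, not the script referred to in this message, and it assumes the four names from the original question as test data.

```perl
use strict;
use warnings;
use utf8;                       # the name literals contain non-ASCII letters
use feature 'fc';               # Unicode case folding, Perl 5.16 or later

binmode STDOUT, ':encoding(UTF-8)';

my @names = ( 'Lee, K', 'Çınar, D', 'Church, B', 'Ågren, R' );

# Default sort: compares code points.
my @default_sorted = sort @names;

# Case-insensitive sort: folds case first, but still compares code points,
# so the folded "å" and "ç" remain above every unaccented letter.
my @folded_sorted = sort { fc($a) cmp fc($b) } @names;

print "default: ", join( ' / ', @default_sorted ), "\n";
print "folded:  ", join( ' / ', @folded_sorted ),  "\n";
# Both lines list: Church, B / Lee, K / Ågren, R / Çınar, D
```

Folding only removes the upper/lower case distinction; placing accented letters next to their base letters needs collation weights, which is what Unicode::Collate supplies.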

So now we know how to do the correct kind of sort in Perl....next
we'd need to know where in the EPrints codebase to apply the fix.

Where are you seeing the wrong order appearing? In what context do
you wish for the order to be changed?

Of course there may also be a simple EPrints option that switches to
more correct ordering, so I probably should have checked the EPrints
wiki before looking up the Perl solution.

Attempting to copy and paste the short experimental script I just
wrote - hope it doesn't get butchered in email form:

====================



Quoting Will Hughes <w.p.hughes@reading.ac.uk>:

Hi

Hopefully a quick question with an easy answer:

How do we get alphabetic sorting to list accented characters at an
appropriate point in an alphabetic list? The default behaviour
seems to use Unicode values or something, as accented characters
appear at the end of the alphabet.

For example, when I see this kind of sequence from EPrints:


*   Church, B
*   Lee, K
*   Ågren, R
*   Çınar, D

I feel that it should (probably) be:


*   Ågren, R
*   Church, B
*   Çınar, D
*   Lee, K

Is there a simple setting to implement sorting in a way that
respects accented characters? (and will these characters reproduce
accurately after emailing! Image attached just in case)

Best wishes

Will

Will Hughes
Emeritus Professor of Construction Management and Economics School
of the Built Environment University of Reading, PO Box 219,
Whiteknights Reading, RG6 6DF, UK