
[EP-tech] Reply: Re: {Disarmed} International characters in advanced search fail for XML-Export



Dear Adam,

Because Jens is out of the office for a few days, I am jumping in.

I took one of the records that is found by the saved search and that has
a German umlaut in the creator's name (Krüger, G):

bin/export zora eprint XMLforCMS2 95663

exports just fine, and we obtain the expected XML:

<?xml version="1.0" encoding="utf-8" ?>
<eprints xmlns="http://eprints.org/ep2/data/2.0">

    <eprint id="http://www.zora.uzh.ch/id/eprint/95663">
      <eprintid>95663</eprintid>
      <title>Krabben, Würmer, Schwein und Hund. Wie machen Tiere
Geschichte?</title>
      <date>2014-04</date>
      <year_from_date>2014</year_from_date>
      <creators__editors_if_edited_scientific_work>Krüger,
Gesine</creators__editors_if_edited_scientific_work>
      <first_creator__or__first_editor_if_edited_scientific_work>Krüger,
Gesine</first_creator__or__first_editor_if_edited_scientific_work>
      <type_in_text>Book Section</type_in_text>
      <citation>Krüger, Gesine (2014). &lt;a
href="http://www.zora.uzh.ch/95663" target="_blank" class="uzh"
title="zoracitationlink 95663"&gt;Krabben, Würmer, Schwein und Hund. Wie
machen Tiere Geschichte?&lt;/a&gt; In: Grumblies, Florian; Weise, Anton.
Unterdrückung und Emanzipation in der Weltgeschichte. Zum Ringen um
Freiheit, Kaffee und Deutungshoheit. Hannover, 26-41. ISBN
978-3-944342-47-4.</citation>

<coins>url_ver=Z39.88-2004&amp;rft_id=http%3A%2F%2Fwww.zora.uzh.ch%2F95663&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&amp;rft.au=Kr%C3%BCger%2C
+Gesine&amp;rft.aulast=Kr%C3%BCger&amp;rft.aufirst=Gesine&amp;rft.date=April
+2014&amp;rft.isbn=978-3-944342-47-4&amp;rft.title=Krabben%2C
+W%C3%BCrmer%2C+Schwein+und+Hund.+Wie+machen+Tiere
+Geschichte%3F&amp;rft.btitle=Unterdr%C3%BCckung+und+Emanzipation+in+der
+Weltgeschichte.+Zum+Ringen+um+Freiheit%2C+Kaffee+und
+Deutungshoheit&amp;rft.genre=bookitem&amp;rft.place=Hannover</coins>
    </eprint>

</eprints>
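
For anyone who wants to tell a populated export from an empty one without reading the XML by eye, here is a minimal sketch in Python (not part of EPrints, which is written in Perl). It only assumes the namespace shown in the export above; the function name and the two abbreviated sample documents are hypothetical stand-ins, not actual server responses.

```python
import xml.etree.ElementTree as ET

# Namespace as it appears on the <eprints> root element of the export.
EP_NS = {"ep": "http://eprints.org/ep2/data/2.0"}


def count_eprints(xml_text: str) -> int:
    """Return the number of <eprint> records directly under <eprints>."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return len(root.findall("ep:eprint", EP_NS))


# Abbreviated stand-ins (assumptions) for a populated and an empty export.
POPULATED_EXPORT = """<?xml version="1.0" encoding="utf-8" ?>
<eprints xmlns="http://eprints.org/ep2/data/2.0">
    <eprint id="http://www.zora.uzh.ch/id/eprint/95663">
      <eprintid>95663</eprintid>
    </eprint>
</eprints>"""

EMPTY_EXPORT = """<?xml version="1.0" encoding="utf-8" ?>
<eprints xmlns="http://eprints.org/ep2/data/2.0">
</eprints>"""

print(count_eprints(POPULATED_EXPORT))  # one record
print(count_eprints(EMPTY_EXPORT))      # no records
```

A count of zero for a search that returns hits in the browser would reproduce the symptom that was reported.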


If we open the saved search with the "offending" umlaut (by clicking the
link in the "Name of search" column), the search is executed and yields a
result list.
You can then export the results by choosing an export plugin from the
drop-down menu. All export plugins (including XMLforCMS2) work this way.


In the last column of the saved search table there is a special button that
calls cgi/saved_search, passing savedsearch_id as a parameter.
This button and the saved_search CGI script (seem to) have been extended by
EPrints Services for us.

Jens has opened a support case with Justin to check this script. We assume
that the problem originates somewhere in the line

print $saved_search->make_searchexp->perform_search->export( $format );

when a "virtual" dataset is passed to the export plugin and there is an
umlaut in the originating query.

This problem is not limited to the XMLforCMS2 export: it happens with
any export format that is passed to the extended saved_search CGI script.
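
As a side note, the search term stored in the saved search spec can be checked independently of EPrints. The following minimal Python sketch (function names are hypothetical) decodes the percent-encoded term from the spec and also shows what the same bytes look like if some layer reads them as Latin-1 instead of UTF-8. That kind of mix-up is only one possible way an exact-match search could end up with no hits; the sketch does not show that this is what happens inside the saved_search script.

```python
from urllib.parse import unquote_plus, unquote_to_bytes

# Search term exactly as it appears, percent-encoded, in the saved search spec.
ENCODED_TERM = "Kr%C3%BCger%2C+G"


def decode_term(encoded: str) -> str:
    """Decode a form-encoded value as UTF-8; raises if the bytes are invalid."""
    return unquote_plus(encoded, encoding="utf-8", errors="strict")


def misdecode_as_latin1(encoded: str) -> str:
    """Illustrate the mojibake produced when UTF-8 bytes are read as Latin-1."""
    raw = unquote_to_bytes(encoded.replace("+", " "))
    return raw.decode("latin-1")


correct = decode_term(ENCODED_TERM)
broken = misdecode_as_latin1(ENCODED_TERM)

print(correct)            # the intended name with a real u-umlaut
print(broken)             # two characters where the umlaut should be
print(correct == broken)  # False: an exact match on the name would miss
```

Since the strict UTF-8 decode succeeds, the stored term itself is well-formed, which agrees with Jens's observation about the database entry.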

Best regards,

Martin


--
Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Winterthurerstr. 190
CH-8057 Zürich

mail: martin.braendle at id.uzh.ch
phone: +41 44 63 56705
fax: +41 44 63 54505
http://www.id.uzh.ch



From:	"Field A.N." <af05v at ecs.soton.ac.uk>
To:	eprints-tech at ecs.soton.ac.uk
Date:	22/01/2015 15:56
Subject:	[EP-tech] Re: {Disarmed} International characters in advanced
            search fail for XML-Export
Sent by:	eprints-tech-bounces at ecs.soton.ac.uk



What happens if you export the record on the command line?


--
Adam Field
Business Relationship Manager and Community Lead
EPrints Services




On 20 Jan 2015, at 16:13, jens.vieler at id.uzh.ch wrote:

> Hi all,
>
> (using EPrints v3.3.12)
>
> We found some strange behaviour in the combination of Advanced Search /
> Saved Search / XML export in the context of international characters: if
> we use a saved search on an author/creator with German umlauts
> (international encoding), the XML export plugin returns an empty XML
> dataset. The database entry savedsearch|spec looks like valid UTF-8 to us
> (see the bottom of this message).
>
> Does anybody know this behaviour, or better still, know how to fix it? :)
>
> Cheers
>  Jens
>
>
> In detail:
>
> 1.) Creating an advanced search for an author/creator WITHOUT German
> umlauts (e.g. "Vieler")
>
> - Database shows spec:
>
> ?plugin=Internal&searchid=advanced&dataset=archive&exp=0%7C1%7C-date%2Fcreators_name%2Ftitle%7Carchive%7C-%7Ccreators_name%3Acreators_name%3AALL%3AEQ%3AVieler%7C-%7Ceprint_status%3Aeprint_status%3AANY%3AEQ%3Aarchive%7Cmetadata_visibility%3Ametadata_visibility%3AANY%3AEQ%3Ashow

>
> - Screen-View:
>
> http://www.<eprint-server>.ch/id/saved_search/<savedsearch_id>
>
> will be redirected to
>
> http://www.<eprint-server>.ch/cgi/search/archive/advanced?_action_search=1&dataset=archive&exp=0|1|-date%2Fcreators_name%2Ftitle|archive|-|creators_name%3Acreators_name%3AALL%3AEQ%3AVieler|-|eprint_status%3Aeprint_status%3AANY%3AEQ%3Aarchive&order=-date%2Fcreators_name%2Ftitle

>
> and works!
>
> - XML-Export for our CMS:
>
> https://www.<eprint-server>.ch/cgi/users/home?screen=Workflow%3A%3AExportSavedSearchResults&dataset=saved_search&dataobj=<savedsearch_id>

>
> will be redirected to
>
> https://www.<eprint-server>.ch/cgi/saved_search/export_zora_XMLforCMS2.xml?savedsearchid=<savedsearch_id>&_action_export=1&_output=XMLforCMS2

>
> and works!
>
>
> 2.) Creating an advanced search for an author/creator WITH German
> umlauts (e.g. "Krüger,G")
>
> - Database shows spec:
>
> ?plugin=Internal&searchid=advanced&dataset=archive&exp=0%7C1%7C-date%2Fcreators_name%2Ftitle%7Carchive%7C-%7Ccreators_name%2Feditors_name%3Acreators_name%2Feditors_name%3AALL%3AEQ%3AKr%C3%BCger%2C+G%7C-%7Ceprint_status%3Aeprint_status%3AANY%3AEQ%3Aarchive%7Cmetadata_visibility%3Ametadata_visibility%3AANY%3AEQ%3Ashow

>
> (so "Kr%C3%BCger" looks like good old utf8 stuff to me)
>
> - Screen-View:
>
> http://www.<eprint-server>.ch/id/saved_search/<savedsearch_id>
>
> will be redirected to
>
> http://www.<eprint-server>.ch/cgi/search/archive/advanced?_action_search=1&dataset=archive&exp=0|1|-date%2Fcreators_name%2Ftitle|archive|-|creators_name%2Feditors_name%3Acreators_name%2Feditors_name%3AALL%3AEQ%3AKr%C3%BCger%2C+G|-|eprint_status%3Aeprint_status%3AANY%3AEQ%3Aarchive&order=-date%2Fcreators_name%2Ftitle

>
> and works!
>
> - XML-Export for our CMS:
>
> https://www.<eprint-server>.ch/cgi/users/home?screen=Workflow%3A%3AExportSavedSearchResults&dataset=saved_search&dataobj=<savedsearch_id>

>
> will be redirected to
>
> https://www.<eprint-server>.ch/cgi/saved_search/export_zora_XMLforCMS2.xml?savedsearchid=<savedsearch_id>&_action_export=1&_output=XMLforCMS2

>
> and fails, or rather, the result is empty:
>
> <?xml version="1.0" encoding="utf-8" ?>
> <eprints xmlns="http://eprints.org/ep2/data/2.0">
> </eprints>
>
> --
> Jens Vieler
> Informatikdienste
> Universität Zürich
> Winterthurerstr. 190
> CH-8057 Zürich
>
> mail:  jens.vieler at id.uzh.ch
> phone: +41 44 63 56777
> http://www.id.uzh.ch
>
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
> *** EPrints developers Forum: http://forum.eprints.org/

