[EP-tech] apostrophe in file names of uploaded/deposited file
Hi Tomasz,
There are two ways to work round this issue. One has been in EPrints
for quite a while, another I introduced in 3.4.3 to help deal
retrospectively with this issue.
1. https://wiki.eprints.org/w/Optional_filename_sanitise.pl allows you
to set characters that should be removed before a filename is recorded
in the database or saved to disk. I have to admit I did not know about
this until fairly recently, so I have not tested how well it will work
or whether it will solve your problem. If you look at
/opt/eprints3/lib/cfg.d/optional_filename_sanitise.pl there is a
function that can be added under $c->{optional_filename_sanitise}. The
default (albeit commented out) function will convert white space,
brackets and @ signs into underscores. You could add a line like the one
below to deal with apostrophes.
$filepath =~ s!\x27!_!g;
2. The new functionality I added for 3.4.3 is to allow files on disk to
be found under the filename <fileid>.bin. This allows you to fix this
sort of issue by renaming the file on disk to <fileid>.bin. Also, you
can enable it so that future files are automatically saved in the format
<fileid>.bin by setting:
$c->{generic_filenames} = 1;
I would probably advise against doing this on a live repository,
especially if you have unusual uploads like uploading multiple files at
once through "Upload from URL". If you want to test this on a
development repo, then please do, as any real-world-ish feedback on this
feature would be useful.
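For option 1, here is a minimal standalone sketch of what such a
sanitising function might look like once the apostrophe line is added.
It is not the shipped default: the ( $repo, $filepath ) argument list,
the exact set of brackets replaced and the $sanitise variable name are
assumptions for illustration, so check the commented-out function in
your own optional_filename_sanitise.pl before copying anything into
$c->{optional_filename_sanitise}.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the function assigned to
# $c->{optional_filename_sanitise}. The argument list is an assumption.
my $sanitise = sub {
    my ( $repo, $filepath ) = @_;

    $filepath =~ s!\s!_!g;      # white space -> underscore
    $filepath =~ s![()]!_!g;    # round brackets -> underscore (assumed set)
    $filepath =~ s!\x40!_!g;    # @ sign -> underscore
    $filepath =~ s!\x27!_!g;    # apostrophe -> underscore (the added line)

    return $filepath;
};

# Try it on the filename from the original report; no repository object
# is needed for this standalone check, so undef is passed in its place.
print $sanitise->( undef, "Services_techniques_a_l'Universite_Concordia.pdf" ), "\n";
```

Run on its own, this prints the filename with the apostrophe replaced by
an underscore, which is a quick way to check a pattern before putting it
into the repository configuration.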
Regards
David Newman
On 20/02/2022 20:32, Tomasz Neugebauer via Eprints-tech wrote:
> Good afternoon!
>
> I'm trying to troubleshoot an issue with exporting out a deposited
> file that has an apostrophe in the filename.
>
> This is the issue:
> https://github.com/eprintsug/EPrintsArchivematica/issues/40
>
> Does EPrints replace apostrophes in filenames on disk with =0027?
>
> Where in the code does that happen?
>
> The URL of the file has the apostrophe, for example:
>
> https://spectrum.library.concordia.ca/id/eprint/7066/1/Services_techniques_a_l'Universite_Concordia.pdf
>
> But unlike other Unicode characters, the apostrophe doesn't make it
> into the file name on disk, and is substituted with =0027.
>
> I'm looking for confirmation that this is how it is "supposed" to
> work, and for an understanding where this happens in the code, so that
> I might ultimately know how many OTHER characters are replaced in this
> way in the filename?
>
> Tomasz
>
>
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/