
[EP-tech] apostrophe in file names of uploaded/deposited file



Hi Tomasz,

There are two ways to work round this issue. One has been in EPrints 
for quite a while; the other I introduced in 3.4.3 to help deal 
retrospectively with this issue.

1. https://wiki.eprints.org/w/Optional_filename_sanitise.pl allows you 
to set characters that should be removed before a filename is recorded 
in the database or saved to disk. I have to admit I did not know about 
this until fairly recently, so I have not tested how well it works 
or whether it solves your problem. If you look at 
/opt/eprints3/lib/cfg.d/optional_filename_sanitise.pl there is a 
function that can be added under $c->{optional_filename_sanitise}. The 
default (albeit commented out) function will convert white space, 
brackets and @ signs into underscores. You could add a line like the 
one below to deal with apostrophes.

$filepath =~ s!\x27!_!g;
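To try the substitutions outside a repository, here is a minimal stand-alone sketch. The sub name sanitise_filepath is hypothetical and is not part of EPrints; it only mirrors the replacements described above, and it assumes "brackets" means round brackets. It does not show how EPrints calls $c->{optional_filename_sanitise} or which arguments it passes, so check the shipped optional_filename_sanitise.pl for the exact shape of the function.

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper, not EPrints code: it only mirrors the
# substitutions described above so they can be tried on sample names.
sub sanitise_filepath {
    my ($filepath) = @_;

    $filepath =~ s!\s!_!g;      # white space
    $filepath =~ s![()]!_!g;    # round brackets (assumed meaning of "brackets")
    $filepath =~ s!\@!_!g;      # @ signs
    $filepath =~ s!\x27!_!g;    # apostrophes (U+0027), the line suggested above

    return $filepath;
}

# Sample filename modelled on the one in the original question.
print sanitise_filepath("Services techniques a l'Universite.pdf"), "\n";
# prints: Services_techniques_a_l_Universite.pdf
```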

2. The new functionality I added for 3.4.3 allows files on disk to 
be found under the filename <fileid>.bin. This allows you to fix this 
sort of issue by renaming the file on disk to <fileid>.bin. Also, you 
can enable it so that future files are automatically saved in the format 
<fileid>.bin by setting:

$c->{generic_filenames} = 1;

I would probably advise against doing this on a live repository, 
especially if you have unusual uploads like uploading multiple files at 
once through "Upload from URL". If you want to test this on a 
development repo, then please do, as any real-world-ish feedback on this 
feature would be useful.
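For the retrospective fix of renaming an existing file on disk to <fileid>.bin, a rough sketch of the rename step follows. The helper name rename_to_generic, the file ID 12345 and the temporary directory are all made up for illustration; where a given document's files actually live on disk is not shown here and has to be looked up in your own repository. Try it on a development copy first.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: rename one stored file to "<fileid>.bin" in the same
# directory, refusing to overwrite anything that is already there.
sub rename_to_generic {
    my ( $dir, $current_name, $fileid ) = @_;

    my $old = File::Spec->catfile( $dir, $current_name );
    my $new = File::Spec->catfile( $dir, "$fileid.bin" );

    die "source file not found: $old\n" unless -e $old;
    die "refusing to overwrite existing file: $new\n" if -e $new;

    rename( $old, $new ) or die "rename failed: $!\n";
    return $new;
}

# Demonstration in a throwaway directory, with a made-up file ID.
my $dir  = tempdir( CLEANUP => 1 );
my $name = "Services_techniques_a_l=0027Universite_Concordia.pdf";
open( my $fh, '>', File::Spec->catfile( $dir, $name ) ) or die "cannot create: $!\n";
close($fh);

print rename_to_generic( $dir, $name, 12345 ), "\n";
```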

Regards

David Newman

On 20/02/2022 20:32, Tomasz Neugebauer via Eprints-tech wrote:
> Good afternoon!
>
> I'm trying to troubleshoot an issue with exporting out a deposited 
> file that has an apostrophe in the filename.
>
> This is the issue: 
> https://github.com/eprintsug/EPrintsArchivematica/issues/40
>
> Does EPrints replace apostrophes in filenames on disk with =0027?
>
> Where in the code does that happen?
>
> The URL of the file has the apostrophe, for example:
>
> https://spectrum.library.concordia.ca/id/eprint/7066/1/Services_techniques_a_l%27Universite_Concordia.pdf
>
> But unlike other Unicode characters, the apostrophe doesn't make it 
> into the file name on disk, and is substituted with =0027.
>
> I'm looking for confirmation that this is how it is "supposed" to 
> work, and for an understanding of where this happens in the code, so that 
> I might ultimately know how many OTHER characters are replaced in this 
> way in the filename.
>
> Tomasz