
[EP-tech] {SPAM?} Re: Word Documents won't download



Hi John,

I have just tested your config change but it does not seem to work on the 
abstract page of the repository I have been testing on. My 
recommendation would be to set the following config options at the end of 
10_core.pl to make the URLs protocol-relative:

$c->{http_url} = '//' . $c->{host} . '/';

$c->{http_cgiurl} = '//' . $c->{host} . '/cgi/';

This sets a protocol-relative URL rather than an http one. You could 
alternatively set it to 'https://' . $c->{host} . '/'; to just make all 
URLs https. If you only set the protocol-relative option then there 
is a minor issue that the EPrint::View page for live items will 
display the URL as //HOSTNAME/12345/ rather than https://HOSTNAME/12345/, 
which may be confusing to some users as they would expect it to start 
with http or https. Also, default abstract/summary pages will display the URI 
as protocol-relative at the end of the summary table. These are issues 
I have been trying to address while adding robust protocol-relative URL 
support to EPrints 3.4.3.
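
As a rough sketch only, the https-only alternative mentioned above might look like the following at the end of 10_core.pl. Only the http_url form is spelled out in the text; the http_cgiurl line is an assumption made by analogy with the protocol-relative settings, so test it before relying on it.

```perl
# Sketch: force https for all generated URLs instead of protocol-relative ones.
# Assumes $c->{host} has already been set earlier in 10_core.pl.
$c->{http_url} = 'https://' . $c->{host} . '/';

# Assumption: the CGI URL is changed the same way as in the
# protocol-relative example; this form is not confirmed in the thread.
$c->{http_cgiurl} = 'https://' . $c->{host} . '/cgi/';
```

With this variant the EPrint::View page and summary table would show full https:// URLs, at the cost of changing the URIs of existing eprint items.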

The motivation to switch to protocol-relative URLs is that it avoids a 
wholesale switch from http to https URLs with redirects, which I have 
noted often causes a dip in Google indexing and download stats for up to 
a month or so. An explanation of why this happens can be found at:

https://wiki.eprints.org/w/Simplified_HTTPS_Configuration#Issues_and_Troubleshooting


I don't think using https in the http_url configuration option will 
affect the download stats that much, as it won't lead to an http to 
https redirect, which is the predominant factor in lowering download 
stats. It will, however, change the URIs for eprint items, which may 
have an effect on Google indexing. However, this is very much dependent 
on how Google seeks out the URLs to be indexed, which is multifarious.

Regards

David Newman

On 04/01/2021 18:02, John Salter wrote:
> *CAUTION:* This e-mail originated outside the University of Southampton.
>
> I've just re-checked my config files.
>
> For 3.3.x, if you include (in e.g. 10_core.pl):
>
> $c->{http_root} = undef;
>
> It will make thumbnails/download links relative rather than absolute.
>
> I think there was more to it than that though - if you're creating 
> downloadable content (e.g. coversheets), you want it to render the 
> full links (using https by default).
>
> Cheers,
>
> John
>
> *From:*eprints-tech-bounces at ecs.soton.ac.uk 
> [mailto:eprints-tech-bounces at ecs.soton.ac.uk] *On Behalf Of *David R 
> Newman via Eprints-tech
> *Sent:* 04 January 2021 17:44
> *To:* John Salter <J.Salter at leeds.ac.uk>; 
> eprints-tech at ecs.soton.ac.uk; James Kerwin <jkerwin2101 at gmail.com>
> *Subject:* Re: [EP-tech] Word Documents won't download
>
> Hi all,
>
> So, the problem is the URL generated by EPrints compiled XML's 
> cite:linkhere, which uses http rather than https. The suggestion 
> John makes about 
> https://wiki.eprints.org/w/Simplified_HTTPS_Configuration#HTTPS_Only 
> will only work if you are running EPrints 3.4.1+, which I can see that 
> you are not.
>
> One of the features I have been working on for 3.4.3 is protocol-relative 
> URLs, which should help deal with these issues. If you are 
> still running 3.3.x, fixing these sorts of problems will be tricky. I 
> think you need to look at the various document citations and possibly 
> eprint_render.pl and replace the http URLs with https URLs. In some 
> cases the http URL will come from the <cite:linkhere>, which you will 
> probably need to hack, replacing it with something like:
>
> <a href="{$config{https_url}}/{eprintid}/{pos}/{main}">
>
> and </cite:linkhere> with </a>
>
> Hope this helps
>
> David Newman
>
> On 04/01/2021 17:20, John Salter wrote:
>
>     > I was just about to chime in that the document URL is rendered
>     with http - but you're redirecting to https - so some part of
>     Chrome's 302 handling is possibly confusing things?
>
>     To flesh that out a bit more:
>
>     White Rose is currently still available over http and https.
>     Document links are relative - so match the protocol you're
>     visiting the site from.
>
>     For LivRepo, it looks like you're using an HSTS setup so requests
>     to http:// are redirected to https:// (via a 307 response).
>
>     If you update the download URL to use https (via Chrome Console /
>     Inspect), it downloads fine.
>
>     To fix this in EPrints, see
>     https://wiki.eprints.org/w/Simplified_HTTPS_Configuration#HTTPS_Only
>     - setting 'host' to undef is the key - although test this
>     thoroughly first - can't remember if there's any related 'fun'
>     with the Symplectic connector if you do this (I don't think there is?).
>
>     Cheers,
>
>     John
>
>     *From:*eprints-tech-bounces at ecs.soton.ac.uk
>     <mailto:eprints-tech-bounces at ecs.soton.ac.uk>
>     [mailto:eprints-tech-bounces at ecs.soton.ac.uk
>     <mailto:eprints-tech-bounces at ecs.soton.ac.uk>] *On Behalf Of
>     *David R Newman via Eprints-tech
>     *Sent:* 04 January 2021 17:02
>     *To:* James Kerwin <jkerwin2101 at gmail.com>
>     <mailto:jkerwin2101 at gmail.com>
>     *Cc:* eprints-tech at ecs.soton.ac.uk
>     <mailto:eprints-tech at ecs.soton.ac.uk>
>     *Subject:* Re: [EP-tech] Word Documents won't download
>
>     Hi James,
>
>     You should note that the whiterose URL is http rather than https.
>     I have tested (on Chrome) the same URL I was testing before but
>     with http rather than https and this worked just fine. This is
>     starting to suggest to me some security feature (albeit maybe a
>     bit broken) within Chrome. From the depths of my brain I
>     vaguely recall some issue to do with content length mismatches
>     that exhibited similar symptoms.
>
>     Regards
>
>     David Newman
>
>     On 04/01/2021 16:53, James Kerwin wrote:
>
>         Hi David,
>
>         Thanks for your response. It's good to know it's not just me
>         (although I did ask my family to also attempt to download it
>         and they all struggled).
>
>         To add to the confusion, this item on the White Rose
>         repository downloads fine. Unless Mr Salter has some different
>         setup, I'm afraid it only adds to my quiet terror:
>
>         http://eprints.whiterose.ac.uk/160018/
>
>         What I have noticed is that the filenames of his Word
>         documents have no spaces. I'm currently mooching through our
>         EPrints database for a doc(x) file that also avoids spaces.
>         This isn't the most scientific way to work it out, but I'm
>         hoping it yields some results...
>
>         This sort of problem landing at my feet on the first day back
>         at work should be considered some sort of abuse of my human
>         rights!
>
>         If by some miracle I find a cause or solution I will share it.
>
>         Thanks,
>
>         James
>
>         On Mon, Jan 4, 2021 at 4:45 PM David R Newman
>         <drn at ecs.soton.ac.uk <mailto:drn at ecs.soton.ac.uk>> wrote:
>
>             Hi James,
>
>             I see the same behaviour on your repository for the Word
>             document on the URL you provided. Similarly, it works fine
>             on Firefox but has problems on Chrome when you click on
>             the link and don't try to download it in another tab. Oddly,
>             if I try a second time I get a popup asking me if I want
>             to allow downloads of multiple files. I have tested on a
>             different repository and I see the same issue with both
>             .doc and .docx files. I suspect there may be issues with
>             all application/... mime type files. My best guess is
>             this is a new security feature from Chrome. It may be
>             something that requires tweaking Apache's configuration or
>             possibly even something within EPrints.
>
>             I have also tested on Edge and Opera (all browsers running
>             on Windows 10) and I do not see any issues there either. The
>             Chrome version I am running is 87.0.4280.88, which looks to
>             have been released at the beginning of December. I do not
>             know when my browser upgraded but there are currently no
>             new updates available for Chrome according to my
>             browser's "About Google Chrome" page. I will continue to
>             investigate and get back to the list if I find out
>             anything more.
>
>             Regards
>
>             David Newman
>
>             On 04/01/2021 15:54, James Kerwin via Eprints-tech wrote:
>
>                 Hi All,
>
>                 Happy new year etc. Hope everyone is well.
>
>                 I have a problem that has appeared today and it was
>                 fine before 18/12/2020 (as in it worked as expected).
>
>                 Word documents are not downloading on the repository
>                 when using Chrome. If I right-click the download link and
>                 open it in a new tab it works and the file is
>                 downloaded. PDFs are behaving fine. If I use Firefox
>                 and click the download link the file will download,
>                 but it does prompt me whether I want to save or open
>                 (this is fine, I don't use FF much so I won't click
>                 the "don't ask again" option).
>
>                 In Elements on both Chrome and FF the file will not
>                 download. PDFs are downloading through Elements fine.
>
>                 Is there a likely cause for this? Some sort of update
>                 to some obscure working of the internet?
>
>                 Example record:
>
>                 https://livrepository.liverpool.ac.uk/3033942/
>
>                 Any help will be gratefully received. I'm totally
>                 confused by it and don't know where to start.
>
>                 Thanks,
>
>                 James

