EPrints Technical Mailing List Archive

Message: #08429



Re: [EP-tech] Word Documents won't download


CAUTION: This e-mail originated outside the University of Southampton.
Morning all,

Don, John and David, thank you so much for setting me on the right path. I had zero chance of sorting this out without your help, especially since I'd forgotten that the console existed in Chrome...

I read this blog post that explains my repo-woes:


I found this "solution" that appears to work on the repository side of things:


So far I've opted to add the following meta tag to ../cfg/templates/default.xml:

<meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests" />

I appreciate this isn't the BEST solution, but it's the most immediate one, and it gets some people off my back until I can implement a proper one. As I expected, it does not solve the issue of accessing the file via Elements.
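A rough way to see why the meta tag fixes the repository pages but not Elements: the directive only changes links on pages that carry it. The Python sketch below is a simplified model, not browser code. It assumes that a page with `upgrade-insecure-requests` rewrites http:// links pointing at its own host to https:// before following them, and that an Elements page, which is never rendered from default.xml, has no such policy. The hostnames are hypothetical, and real browsers apply further rules (ports, embedded resources, cross-site navigations) that this ignores.

```python
from urllib.parse import urlsplit


def effective_url(page_url: str, link_url: str, upgrade_insecure: bool) -> str:
    """Simplified model of the URL a browser requests when a link is followed.

    Assumption: the page's upgrade-insecure-requests policy only upgrades
    http:// links that point at the same host as the page itself.
    """
    page = urlsplit(page_url)
    link = urlsplit(link_url)
    if upgrade_insecure and link.scheme == "http" and link.hostname == page.hostname:
        # Swap the scheme only; host, path and query are kept as-is.
        return link._replace(scheme="https").geturl()
    return link_url


# Hypothetical document URL, rendered with http by the repository.
DOC = "http://repo.example.ac.uk/12345/1/paper.docx"

# Repository abstract page, built from default.xml with the meta tag.
print(effective_url("https://repo.example.ac.uk/12345/", DOC, upgrade_insecure=True))
# Elements page linking to the same document; it has no such policy.
print(effective_url("https://elements.example.ac.uk/item/678", DOC, upgrade_insecure=False))
```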

It was one of my predecessors or CSD who set up HTTPS on our repository, and it hasn't been done in the standard "EPrints way". I thought it worked well by redirecting people to HTTPS. I had previously tried replicating this set-up in our data repository, but now I'm glad I didn't. The data repo doesn't switch to HTTPS until the user logs in, and it often generates emails from users along the lines of "your website isn't secure!".

Anyway, thank you once more for the help and advice. I will update you all on how I get on.

Thanks,
James

On Mon, Jan 4, 2021 at 10:25 PM David R Newman <drn@ecs.soton.ac.uk> wrote:

Hi John,

I have just tested your config change, but it does not seem to work on the abstract page of the repository I have been testing on.  My recommendation would be to set the following config options at the end of 10_core.pl to make the URLs protocol-relative:

$c->{http_url} = '//' . $c->{host} . '/';

$c->{http_cgiurl} = '//' . $c->{host} . '/cgi/';

This sets a protocol-relative URL rather than an http one.  You could alternatively set http_url to 'https://' . $c->{host} . '/' to just make all URLs https.  If you only set the protocol-relative option, there is a minor issue: the EPrint::View page for live items will display the URL as //HOSTNAME/12345/ rather than https://HOSTNAME/12345/, which may confuse some users, as they would expect it to start with http or https.  Also, the default abstract/summary pages will display the URI in protocol-relative form at the end of the summary table.  These are issues I have been trying to address as part of adding robust protocol-relative URL support to EPrints 3.4.3.
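To see what a protocol-relative value does in practice, here is a small Python sketch (not EPrints code) that builds an item link the same way as the http_url setting above and then resolves it the way a browser would. The hostname is hypothetical. The same link becomes http or https depending on the page it is clicked from, while the unresolved string is the //HOSTNAME/12345/ form that users would see printed.

```python
from urllib.parse import urljoin

HOST = "repo.example.ac.uk"  # hypothetical value standing in for $c->{host}

# Mirrors the Perl setting: $c->{http_url} = '//' . $c->{host} . '/';
http_url = "//" + HOST + "/"


def item_link(eprintid: int) -> str:
    """Protocol-relative link to an eprint, as it would appear in the HTML."""
    return f"{http_url}{eprintid}/"


def resolve(page_url: str, href: str) -> str:
    """Resolve a link against the current page, as a browser does."""
    return urljoin(page_url, href)


href = item_link(12345)
print(href)  # the raw, scheme-less value a user would see if it is displayed as text
print(resolve("http://repo.example.ac.uk/view/", href))   # follows the page: http
print(resolve("https://repo.example.ac.uk/view/", href))  # follows the page: https
```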

The motivation for switching to protocol-relative URLs is that it avoids a wholesale switch from http to https URLs with redirects, which I have noticed often causes a dip in Google indexing and download stats for up to a month or so.  An explanation of why this happens can be found at:

https://wiki.eprints.org/w/Simplified_HTTPS_Configuration#Issues_and_Troubleshooting 

I don't think using https in the http_url configuration option will affect the download stats that much, as it won't lead to an http-to-https redirect, which is the predominant factor in lowering download stats.  It will, however, change the URIs for eprint items, which may have an effect on Google indexing.  That said, this very much depends on how Google seeks out the URLs to be indexed, which varies in many ways.

Regards

David Newman

On 04/01/2021 18:02, John Salter wrote:

I've just re-checked my config files.

For 3.3.x, if you include (in e.g. 10_core.pl):

    $c->{http_root} = undef;

It will make thumbnails/download links relative rather than absolute.

I think there was more to it than that, though: if you're creating downloadable content (e.g. coversheets), you want it to render the full links (using https by default).
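John's distinction between web pages and generated files can be illustrated with a short Python sketch. It assumes the document link is root-relative (as with http_root set to undef) and that a generated file such as a coversheet PDF has no page URL for the link to be resolved against. The path, hostname and function names are hypothetical; this is not how EPrints builds links internally.

```python
from urllib.parse import urljoin

# Hypothetical root-relative document link, as rendered on the abstract page.
DOC_PATH = "/12345/1/paper.docx"


def link_on_web_page(page_url: str) -> str:
    """A browser resolves a root-relative link against the page being viewed,
    so the link inherits that page's http or https scheme."""
    return urljoin(page_url, DOC_PATH)


def link_in_coversheet(base_url: str = "https://repo.example.ac.uk") -> str:
    """A generated file has no page URL to resolve against, so it needs a full
    absolute link built from a configured base (https here, as John suggests)."""
    return base_url.rstrip("/") + DOC_PATH


print(link_on_web_page("http://repo.example.ac.uk/12345/"))
print(link_on_web_page("https://repo.example.ac.uk/12345/"))
print(link_in_coversheet())
```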

 

Cheers,

John

 

From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of David R Newman via Eprints-tech
Sent: 04 January 2021 17:44
To: John Salter <J.Salter@leeds.ac.uk>; eprints-tech@ecs.soton.ac.uk; James Kerwin <jkerwin2101@gmail.com>
Subject: Re: [EP-tech] Word Documents won't download

 

Hi all,

So, the problem is that the URL EPrints generates for cite:linkhere in the compiled citation XML uses http rather than https.  The suggestion John makes about https://wiki.eprints.org/w/Simplified_HTTPS_Configuration#HTTPS_Only will only work if you are running EPrints 3.4.1+, which I can see you are not.

One of the features I have been working on for 3.4.3 is protocol-relative URLs, which should help deal with these issues.  If you are still running 3.3.x, fixing these sorts of problems will be tricky.  I think you need to look at the various document citations, and possibly eprint_render.pl, and replace the http URLs with https URLs.  In some cases the http URL will come from <cite:linkhere>, which you will probably need to hack with a fix like replacing <cite:linkhere> with:

<a href=""

and </cite:linkhere> with </a>
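Once the citations have been patched, one way to check whether any http:// links are still being rendered is to scan a saved copy of an abstract page. A minimal Python sketch using the standard library's html.parser; the sample HTML and hostname are hypothetical. Protocol-relative and root-relative links are deliberately not flagged, since they follow the scheme of the page they appear on.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class InsecureLinkFinder(HTMLParser):
    """Collect href/src attribute values that use plain http://."""

    def __init__(self) -> None:
        super().__init__()
        self.insecure: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value and urlsplit(value).scheme == "http":
                self.insecure.append(value)


def find_insecure_links(html: str) -> list[str]:
    finder = InsecureLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.insecure


# Hypothetical fragment of a rendered abstract page.
sample = """
<a href="http://repo.example.ac.uk/12345/1/paper.docx">paper.docx</a>
<a href="https://repo.example.ac.uk/12345/1/paper.pdf">paper.pdf</a>
<img src="//repo.example.ac.uk/style/images/fileicons/text.png">
"""
print(find_insecure_links(sample))
```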

Hope this helps

David Newman

On 04/01/2021 17:20, John Salter wrote:


> I was just about to chime in that the document URL is rendered with http - but you're redirecting to https - so some part of Chrome's 302 handling is possibly confusing things…

 

To flesh that out a bit more:

White Rose is currently still available over http and https. Document links are relative - so match the protocol you're visiting the site from.

 

For LivRepo, it looks like you're using an HSTS setup so requests to http:// are redirected to https:// (via a 307 response).
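For reference, the HSTS behaviour described above can be modelled in a few lines. Once a browser has received a Strict-Transport-Security header from a host, it rewrites later http:// requests to that host as https:// itself, without first contacting the server over http (Chrome's developer tools typically report this as a 307 Internal Redirect). A simplified Python sketch, assuming a single remembered policy and a hypothetical hostname; it ignores ports and the browser's preload list.

```python
from urllib.parse import urlsplit


def parse_hsts(header: str) -> dict:
    """Parse a Strict-Transport-Security header value into its directives."""
    policy = {"max_age": None, "include_subdomains": False}
    for part in header.split(";"):
        name, _, value = part.strip().partition("=")
        name = name.strip().lower()
        if name == "max-age":
            policy["max_age"] = int(value.strip().strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
    return policy


def apply_hsts(url: str, hsts_host: str, policy: dict) -> str:
    """Upgrade an http:// URL if its host is covered by the remembered policy.

    A max-age of 0 (or a missing max-age) means no policy is in force.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    covered = host == hsts_host or (
        policy["include_subdomains"] and host.endswith("." + hsts_host)
    )
    if parts.scheme == "http" and covered and policy["max_age"]:
        return parts._replace(scheme="https").geturl()
    return url


policy = parse_hsts("max-age=31536000; includeSubDomains")
print(policy)
print(apply_hsts("http://repo.example.ac.uk/12345/1/paper.docx", "repo.example.ac.uk", policy))
```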

 

If you update the download URL to use https (via Chrome Console / Inspect), it downloads fine.

 

To fix this in EPrints, see https://wiki.eprints.org/w/Simplified_HTTPS_Configuration#HTTPS_Only - setting 'host' to undef is the key, although test this thoroughly first. I can't remember if there's any related 'fun' with the Symplectic connector if you do this (I don't think there is…).

 

Cheers,

John

 

 

From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of David R Newman via Eprints-tech
Sent: 04 January 2021 17:02
To: James Kerwin <jkerwin2101@gmail.com>
Cc: eprints-tech@ecs.soton.ac.uk
Subject: Re: [EP-tech] Word Documents won't download

 

Hi James,

You should note that the White Rose URL is http rather than https.  I have tested (in Chrome) the same URL I was testing before, but with http rather than https, and it worked just fine.  This is starting to suggest to me some security feature (albeit maybe a slightly broken one) within Chrome.  From the depths of my brain, I vaguely recall an issue to do with content-length mismatches that exhibited similar symptoms.

Regards

David Newman

On 04/01/2021 16:53, James Kerwin wrote:


Hi David,

 

Thanks for your response. It's good to know it's not just me (although I did ask my family to try downloading it too, and they all struggled).

 

To add to the confusion, this item on the White Rose repository downloads fine. Unless Mr Salter has some different setup, I'm afraid it only adds to my quiet terror:

http://eprints.whiterose.ac.uk/160018/

 

What I have noticed is that the filenames of his Word documents have no spaces. I'm currently mooching through our EPrints database for a doc(x) file that also avoids spaces. This isn't the most scientific way to work it out, but I'm hoping it yields some results...
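The spaces-in-filenames hypothesis can be narrowed down with a quick filter. A minimal Python sketch, assuming you already have a list of filenames (for example exported from the repository); the names below are made up, and this does not query the EPrints database directly. It also shows how a space is percent-encoded as %20 when the name appears in a download URL.

```python
from urllib.parse import quote

# Hypothetical filenames of the kind found on document records.
filenames = ["Final Report.docx", "chapter_1.doc", "thesis.pdf", "Notes v2.DOC"]


def is_word_file(name: str) -> bool:
    """True for .doc/.docx files, ignoring the case of the extension."""
    return name.lower().endswith((".doc", ".docx"))


# Word files whose names contain no spaces: candidates for a comparison test.
no_spaces = [f for f in filenames if is_word_file(f) and " " not in f]
print(no_spaces)

# How a name with spaces appears once percent-encoded in a download URL.
print(quote("Final Report.docx"))
```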

 

This sort of problem landing at my feet on the first day back at work should be considered some sort of abuse of my human rights!

 

If by some miracle I find a cause or solution I will share it.

 

Thanks,

James

 

On Mon, Jan 4, 2021 at 4:45 PM David R Newman <drn@ecs.soton.ac.uk> wrote:

Hi James,

I see the same behaviour on your repository for the Word document at the URL you provided.  Similarly, it works fine in Firefox but has problems in Chrome when you click on the link rather than opening it in another tab.  Oddly, if I try a second time I get a popup asking whether I want to allow downloads of multiple files.  I have tested on a different repository and I see the same issue with both .doc and .docx files.  I suspect there may be issues with all application/... MIME type files.  My best guess is that this is a new security feature in Chrome.  It may be something that requires tweaking Apache's configuration, or possibly even something within EPrints.

I have also tested in Edge and Opera (all browsers running on Windows 10) and I do not have any issues with either.  The Chrome version I am running is 87.0.4280.88, which looks to have been released at the beginning of December.   I do not know when my browser upgraded, but there are currently no new updates available for Chrome according to my browser's "About Google Chrome" page.  I will continue to investigate and get back to the list if I find out anything more.

Regards

David Newman

On 04/01/2021 15:54, James Kerwin via Eprints-tech wrote:


Hi All,

 

Happy new year etc. Hope everyone is well.

 

I have a problem that has appeared today; it was fine before 18/12/2020 (as in, it worked as expected).

 

Word documents are not downloading from the repository when using Chrome. If I right-click the download link and open it in a new tab, it works and the file is downloaded. PDFs are behaving fine. If I use Firefox and click the download link, the file will download, but it does prompt me to choose whether to save or open it (this is fine; I don't use FF much, so I won't click the "don't ask again" option).

 

In Elements on both Chrome and FF the file will not download. PDFs are downloading through Elements fine.

 

Is there a likely cause for this? Some sort of update to some obscure working of the internet?

 

Example record:

 

 

Any help will be gratefully received. I'm totally confused by it and don't know where to start.

 

Thanks,

James




*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/

 
