
[EP-tech] Re: repos with a mix of HTTP and HTTPS



> From: eprints-tech-bounces at ecs.soton.ac.uk
> [mailto:eprints-tech-bounces at ecs.soton.ac.uk] On Behalf Of Yuri
> Sent: Monday, 23 November 2015 18:02
> To: eprints-tech at ecs.soton.ac.uk
>
> On 23/11/2015 03:34, Matthew Kerwin wrote:
>>
>> Hi EPrintsers, I have a query about serving a repository with a mix of 
>> HTTP and HTTPS.
>>
>> Currently our two repositories have a pretty standard setup: the bulk 
>> of the site is served over plaintext HTTP, including untrusted session 
>> cookies. Secure/administrative functions are served over HTTPS.
>>
>
> Interesting, can you post how you did it? It could be useful.
>

Sure, essentially we set up this sort of core config:

  $c->{host} = 'example.org';
  $c->{port} = 80;
  $c->{securehost} = 'example.org';
  $c->{secureport} = 443;
  $c->{securepath} = undef;  # in theory both secure and insecure paths are '/'
  $c->{http_cgi_root}  = '/cgi';
  $c->{https_root}     = '/secure';
  $c->{https_cgi_root} = '/secure/cgi';

We also did a bit of work setting up basic httpd VHost rules, with pointers
to our certificates, and a simple rewrite rule to make things run smoothly
[see below]. I think EPrints itself (via EPrints::Apache::Rewrite) generates
most of the bounces between HTTP and HTTPS, as well as generating appropriate
relative URLs.

This is what I want to unpick.

>
>> We want to reconfigure the server to use HTTPS for the entire site 
>> (for various reasons, Google search rankings high amongst them.) 
>> However we want to retain the option of plaintext HTTP access so that 
>> some less modern external indexers and crawlers can continue to do 
>> their thing.
>>
>
> What is the problem with spiders using https instead of http? I would
> switch entirely to https.
>

Sure, if I owned the spiders. But our repositories are accessed by other
robots within the university (and I don't have the political clout to force
them to rewrite/upgrade to HTTPS), and by external robots, including some
from the government (and I have no say at all in how those work.)

I want to go entirely HTTPS, but I need to allow some of those robots access
over plaintext.
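
As a rough illustration only (this is not our working config), the plaintext
side of that could look something like the following at the httpd level. The
address range 192.0.2.0/24 and the user agent string "LegacyHarvester" are
made-up placeholders for whichever robots need to stay on HTTP, and the
sketch assumes mod_rewrite is loaded.

```apache
# Sketch only: plaintext vhost that upgrades everyone except allow-listed robots.
# 192.0.2.0/24 and "LegacyHarvester" are hypothetical placeholders.
<VirtualHost 1.2.3.4:80>
  ServerName example.org
  RewriteEngine On
  # Skip the redirect for requests from the allow-listed address range
  RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.
  # Skip the redirect for requests from the allow-listed user agent
  RewriteCond %{HTTP_USER_AGENT} !LegacyHarvester
  # Both conditions hold (not allow-listed): send the client to HTTPS
  RewriteRule ^ https://%{SERVER_NAME}%{REQUEST_URI} [L,R=301]
</VirtualHost>
```

The two RewriteCond lines are ANDed, so a request is redirected only when it
matches neither exception. Any rule on the :443 side that bounces requests
back to HTTP would have to be removed first, otherwise the two redirects would
loop. This also does nothing about the redirects EPrints generates itself,
which is the part I still need to unpick.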


* footnote: here's the basic gist of the httpd config we use:

-----%<-----
  Include /opt/eprints3/cfg/apache/repo.conf
  <VirtualHost 1.2.3.4:443>
    ServerName example.org
    SSLEngine On
    SSLCertificateFile    /path/to/repo.crt
    SSLCertificateKeyFile /path/to/repo.key
    PerlTransHandler +EPrints::Apache::Rewrite
    # standard VHost config
    Include /opt/eprints3/archives/repo/cfg/apachevhost_ssl.conf
    # auto-generated; defines <Location /secure>
    Include /opt/eprints3/apache_ssl/repo.conf
    RewriteEngine On
    RewriteCond %{REQUEST_URI} !^/secure/
    RewriteRule .* http://%{SERVER_NAME}%{REQUEST_URI} [L,R=301]
  </VirtualHost>
----->%-----

To be honest I haven't paid that much attention to this config for a while,
so some of it might be reconfigurable.

Cheers
-- 
Matthew Kerwin  |  QUT Library eServices  |  matthew.kerwin at qut.edu.au