EPrints Technical Mailing List Archive

Message: #05159


[EP-tech] Re: repos with a mix of HTTP and HTTPS

> From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Yuri
> Sent: Monday, 23 November 2015 18:02
> To: eprints-tech@ecs.soton.ac.uk
> Il 23/11/2015 03:34, Matthew Kerwin ha scritto:
>> Hi EPrintsers, I have a query about serving a repository with a mix of 
>> HTTP and HTTPS.
>> Currently our two repositories have a pretty standard setup: the bulk 
>> of the site is served over plaintext HTTP, including untrusted session 
>> cookies. Secure/administrative functions are served over HTTPS.
> Interesting, can you post how you did it? It could be useful.

Sure, essentially we set up this sort of core config:

  $c->{host} = 'example.org';
  $c->{port} = 80;
  $c->{securehost} = 'example.org';
  $c->{secureport} = 443;
  $c->{securepath} = undef;  # in theory both secure and insecure paths are the same
  $c->{http_cgi_root}  = '/cgi';
  $c->{https_root}     = '/secure';
  $c->{https_cgi_root} = '/secure/cgi';

We also did a bit of work setting up basic httpd VHost rules, with pointers
to our certificates, and a simple rewrite rule to make things run smoothly
[see below]. I think EPrints itself (using Apache::Rewrite) generates most
of the bounces between HTTP and HTTPS, as well as generating appropriate
relative URLs.
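To illustrate what I mean, here's a simplified model of the kind of decision that bouncing logic makes. This is not EPrints' actual Rewrite code, just a sketch: the host and path prefixes come from the config above, and requests for secure paths arriving on plain HTTP get redirected to HTTPS while everything else is served where it arrived.

```python
# Simplified sketch of the HTTP/HTTPS bouncing described above.
# NOT EPrints' actual Apache::Rewrite logic -- just an illustration.

SECUREHOST = "example.org"       # $c->{securehost}
SECURE_PREFIXES = ("/secure",)   # covers https_root and https_cgi_root

def bounce(scheme, path):
    """Return a redirect URL, or None if the request is served as-is."""
    is_secure_path = any(path == p or path.startswith(p + "/")
                         for p in SECURE_PREFIXES)
    if scheme == "http" and is_secure_path:
        # plaintext request for a secure area: bounce up to HTTPS
        return "https://%s%s" % (SECUREHOST, path)
    return None

print(bounce("http", "/secure/login"))  # -> https://example.org/secure/login
print(bounce("http", "/cgi/search"))    # -> None (stays on plain HTTP)
```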

This is what I want to unpick.

>> We want to reconfigure the server to use HTTPS for the entire site 
>> (for various reasons, Google search rankings high amongst them.) 
>> However we want to retain the option of plaintext HTTP access so that 
>> some less modern external indexers and crawlers can continue to do 
>> their thing.
> What is the problem with the spiders using https instead of http? I would
> switch entirely to https.

Sure, if I owned the spiders. But our repositories are accessed by other
robots within the university (and I don't have the political clout to force
them to rewrite/upgrade to HTTPS), and by external robots, including some
from the government (and I have no say at all in how those work.)

I want to go entirely HTTPS, but I need to allow some of those robots access
over plaintext.

* footnote: here's the basic gist of the httpd config we use:

  Include /opt/eprints3/cfg/apache/repo.conf
    ServerName example.com
    SSLEngine On
    SSLCertificateFile    /path/to/repo.crt
    SSLCertificateKeyFile /path/to/repo.key
    PerlTransHandler +EPrints::Apache::Rewrite
    Include /opt/eprints3/archives/repo/cfg/apachevhost_ssl.conf # standard VHost config
    Include /opt/eprints3/apache_ssl/repo.conf # auto-generated; defines <Location /secure>
    RewriteEngine On
    RewriteCond %{REQUEST_URI} !^/secure/
    RewriteRule .* http://%{SERVER_NAME}%{REQUEST_URI} [L,R=301]
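In case it helps to read those last two lines: inside the SSL vhost, any request whose URI is not under /secure/ gets 301'd back to plain HTTP. A sketch of that behaviour (the server name is hypothetical, matching the example config above):

```python
import re

# What the RewriteCond/RewriteRule pair above does inside the SSL vhost:
# any URI NOT under /secure/ is redirected (301) back to plain HTTP.
# SERVER_NAME is a hypothetical stand-in for %{SERVER_NAME}.

SERVER_NAME = "example.com"

def ssl_vhost_redirect(request_uri):
    """Return the 301 target, or None if the request stays on HTTPS."""
    if re.match(r"^/secure/", request_uri):
        # RewriteCond %{REQUEST_URI} !^/secure/ fails -> no redirect
        return None
    return "http://%s%s" % (SERVER_NAME, request_uri)

print(ssl_vhost_redirect("/view/year/"))    # -> http://example.com/view/year/
print(ssl_vhost_redirect("/secure/login"))  # -> None (served over HTTPS)
```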

To be honest I haven't paid that much attention to this config for a while,
so some of it might be reconfigurable.

Matthew Kerwin  |  QUT Library eServices  |  matthew.kerwin@qut.edu.au
