EPrints Technical Mailing List Archive

Message: #06961


Re: [EP-tech] Experimental Schema.org support for EPrints


Yeah, that's something I considered, but I figured it would be too much of a learning curve for people to manage on top of all their other work. Schema files are easier; making valid JSON-LD requires a fair bit of progress up the learning curve.


On 21/11/2017 16:56, Lizz Jennings wrote:
I did implement JSON-LD on the Bath Research Data Archive, though it won't necessarily translate to publications repositories:

https://github.com/eprintsug/json-ld

Lizz

--
Lizz Jennings MSc MCLIP (Revalidated 2017)
Developer
Wessex House 4.16, University of Bath, Bath, BA2 7AY UK
E.Jennings@bath.ac.uk

-----Original Message-----
From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Christopher Gutteridge
Sent: 21 November 2017 16:46
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Experimental Schema.org support for EPrints

Hi, EPrints-tech, long time no-see.

I've recently rejoined the EPrints.soton.ac.uk support team, and was asked about trying out schema.org support (which Google and Bing like).
I'm not a huge fan, as I prefer peer-to-peer data rather than data routed via the big search engines, but I gave it a go anyway.

I have been working on a way to add schema.org support to EPrints. It uses an invisible <div>, which may not be everyone's preferred approach, but it has the advantage of working well with the citation files.

Other options would be to design the entire abstract page around this feature (possible, but extra work for existing sites) or to use JSON-LD, which is what I would do if I were doing it just for me. However, writing a configuration file to generate JSON-LD would be more work for me and more of a learning curve for the EPrints admin.

I've added it as a pilot to https://eprints.soton.ac.uk/ (subject to removal or change at any time)

See the data extracted from a page here:
https://search.google.com/structured-data/testing-tool#url=https%3A%2F%2Feprints.soton.ac.uk%2F50995%2F

There's lots more work to do to polish this, but it's worth showing off now.

I've used three citation files for this: an outer one to handle the different types (a bit ugly, but it was the solution I came up with), a second to process the fields that come with a standard install of EPrints, and a third for the fields eprints.soton has customised heavily.

In the main summary_page.xml I added:

    <epc:print expr="$item.citation('schema_org')" />

Which links to schema_org.xml:

<?xml version="1.0" ?>
<!DOCTYPE html SYSTEM "entities.dtd" >

<!--
      Schema.org microdata for the "abstract page" (or splash page or summary
      page, depending on your jargon) of an eprint.
-->

<cite:citation xmlns="http://www.w3.org/1999/xhtml"
    xmlns:epc="http://eprints.org/ep3/control"
    xmlns:cite="http://eprints.org/ep3/citation">

<div style='display:none'>
    <epc:choose>
      <epc:when test="type = 'article'">
        <div itemscope="itemscope" itemtype="http://schema.org/ScholarlyArticle">
          <epc:print expr="$item.citation('schema_org_main')" />
        </div>
      </epc:when>
      <epc:when test="type = 'book'">
        <div itemscope="itemscope" itemtype="http://schema.org/Book">
          <epc:print expr="$item.citation('schema_org_main')" />
        </div>
      </epc:when>
      <!-- book_section -->
      <epc:when test="type = 'conference_item'">
        <div itemscope="itemscope" itemtype="http://schema.org/ScholarlyArticle">
          <epc:print expr="$item.citation('schema_org_main')" />
        </div>
      </epc:when>
      <epc:when test="type = 'monograph'">
        <div itemscope="itemscope" itemtype="http://schema.org/ScholarlyArticle">
          <epc:print expr="$item.citation('schema_org_main')" />
        </div>
      </epc:when>
      <!-- patent -->
      <epc:when test="type = 'thesis'">
        <div itemscope="itemscope" itemtype="http://schema.org/Thesis">
          <epc:print expr="$item.citation('schema_org_main')" />
        </div>
      </epc:when>
      <epc:when test="type = 'dataset'">
        <div itemscope="itemscope" itemtype="http://schema.org/Dataset">
          <epc:print expr="$item.citation('schema_org_main')" />
        </div>
      </epc:when>
      <!-- ad_item // art design item //  -->
      <epc:when test="type = 'mu_item'">
        <div itemscope="itemscope" itemtype="http://schema.org/MusicComposition">
          <epc:print expr="$item.citation('schema_org_main')" />
        </div>
      </epc:when>
      <!-- letter -->
      <!-- editorial -->
      <epc:when test="type = 'review'">
        <div itemscope="itemscope" itemtype="http://schema.org/Review">
          <epc:print expr="$item.citation('schema_org_main')" />
        </div>
      </epc:when>
      <!-- special_issue -->
      <!-- meeting_abstract -->
      <!-- software // SoftwareApplication/ SoftwareSourceCode ?? -->
      <epc:when test="type = 'website'">
        <!-- the schema.org type name is WebSite, with a capital S -->
        <div itemscope="itemscope" itemtype="http://schema.org/WebSite">
          <epc:print expr="$item.citation('schema_org_main')" />
        </div>
      </epc:when>

      <epc:otherwise>
        <div itemscope="itemscope" itemtype="http://schema.org/CreativeWork">
          <epc:print expr="$item.citation('schema_org_main')" />
        </div>
      </epc:otherwise>
    </epc:choose>
</div>

</cite:citation>
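Two of the types left as comments in schema_org.xml have fairly close schema.org equivalents: Chapter for a book section and SoftwareSourceCode for software. A sketch of the extra branches, assuming book_section is the standard type id and software is a local one; they would sit before the <epc:otherwise> branch. This is untested, so it is worth running the output through the structured data testing tool:

```xml
<!-- Sketch: extra branches for schema_org.xml, placed before <epc:otherwise>. -->
<epc:when test="type = 'book_section'">
  <div itemscope="itemscope" itemtype="http://schema.org/Chapter">
    <epc:print expr="$item.citation('schema_org_main')" />
  </div>
</epc:when>
<!-- 'software' is assumed to be a locally defined type id. -->
<epc:when test="type = 'software'">
  <div itemscope="itemscope" itemtype="http://schema.org/SoftwareSourceCode">
    <epc:print expr="$item.citation('schema_org_main')" />
  </div>
</epc:when>
```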

Each of these branches in turn calls the main file, schema_org_main.xml, which uses the default EPrints fields:

<?xml version="1.0" ?>
<!DOCTYPE html SYSTEM "entities.dtd" >

<cite:citation xmlns="http://www.w3.org/1999/xhtml"
    xmlns:epc="http://eprints.org/ep3/control"
    xmlns:cite="http://eprints.org/ep3/citation">

<div itemprop="name"><epc:print expr="title" /></div>
<div itemprop="headline"><epc:print expr="title" /></div>
<img itemprop="image"
    src="http://www.eprints.org/uk/wp-content/uploads/EprintsServices2015icon.jpg" />
<epc:if test="abstract">
    <div itemprop="description"><epc:print expr="abstract" /></div>
</epc:if>
<epc:if test="keywords">
    <div itemprop="keywords"><epc:print expr="keywords" /></div>
</epc:if>
<epc:if test="isbn">
    <div itemprop="isbn"><epc:print expr="isbn" /></div>
</epc:if>
<epc:if test="id_number">
    <div itemprop="identifier"><epc:print expr="id_number" /></div>
</epc:if>

<epc:if test="issn or series">
    <div itemprop="isPartOf" itemscope="itemscope" itemtype="http://schema.org/Periodical">
      <epc:if test="issn"><div itemprop="issn"><epc:print expr="issn" /></div></epc:if>
      <epc:if test="series"><div itemprop="name"><epc:print expr="series" /></div></epc:if>
    </div>
</epc:if>

<epc:comment>
    <!-- pageStart and pageEnd could go here but are more bother to extract. -->
</epc:comment>

<epc:if test="pagerange">
    <div itemprop="pagination"><epc:print expr="as_string(pagerange)" /></div>
</epc:if>
<epc:if test="publisher">
    <div itemprop="publisher" itemscope="itemscope" itemtype="http://schema.org/Organization">
      <div itemprop="name"><epc:print expr="publisher" /></div>
    </div>
</epc:if>
<epc:if test="official_url">
    <div itemprop="url"><epc:print expr="official_url" /></div>
</epc:if>

<epc:if test="creators">
    <epc:foreach expr="creators" iterator="person">
      <div itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Person">
        <div itemprop="name"><epc:print expr="$person.subproperty('name')" /></div>
        <epc:if test="$person.subproperty('id')">
          <div itemprop="identifier"><epc:print expr="$person.subproperty('id')" /></div>
        </epc:if>
      </div>
    </epc:foreach>
</epc:if>
<epc:if test="editors">
    <epc:foreach expr="editors" iterator="person">
      <div itemprop="editor" itemscope="itemscope" itemtype="http://schema.org/Person">
        <div itemprop="name"><epc:print expr="$person.subproperty('name')" /></div>
        <epc:if test="$person.subproperty('id')">
          <div itemprop="identifier"><epc:print expr="$person.subproperty('id')" /></div>
        </epc:if>
      </div>
    </epc:foreach>
</epc:if>

<epc:if test="corp_creators">
    <epc:foreach expr="corp_creators" iterator="org">
      <div itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Organization">
        <!-- in a standard install corp_creators is a plain multi-value text field,
             so print the iterator itself rather than a subproperty -->
        <div itemprop="name"><epc:print expr="$org" /></div>
      </div>
    </epc:foreach>
</epc:if>


<epc:comment>
    ADD IN LOCAL EXTENSIONS USING THIS FILE
</epc:comment>
<epc:print expr="$item.citation('schema_org_local')" />

</cite:citation>
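For journal articles, schema.org can model the journal as a chain: the article isPartOf a PublicationIssue, which isPartOf a PublicationVolume, which isPartOf a Periodical. A sketch using the standard publication (journal title), volume, number and issn fields, assuming it is used in place of the simpler issn/series block for article records (otherwise the ISSN would appear twice). It has not been tried against eprints.soton:

```xml
<!-- Sketch: nested issue / volume / journal for articles in schema_org_main.xml.
     Assumes the standard publication, volume, number and issn fields. -->
<epc:if test="type = 'article' and publication">
  <div itemprop="isPartOf" itemscope="itemscope" itemtype="http://schema.org/PublicationIssue">
    <epc:if test="number"><div itemprop="issueNumber"><epc:print expr="number" /></div></epc:if>
    <div itemprop="isPartOf" itemscope="itemscope" itemtype="http://schema.org/PublicationVolume">
      <epc:if test="volume"><div itemprop="volumeNumber"><epc:print expr="volume" /></div></epc:if>
      <div itemprop="isPartOf" itemscope="itemscope" itemtype="http://schema.org/Periodical">
        <div itemprop="name"><epc:print expr="publication" /></div>
        <epc:if test="issn"><div itemprop="issn"><epc:print expr="issn" /></div></epc:if>
      </div>
    </div>
  </div>
</epc:if>
```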

Finally, I created schema_org_local.xml for fields such as dates and contributors, which we've customised heavily.

<?xml version="1.0" ?>
<!DOCTYPE html SYSTEM "entities.dtd" >

<!--
      Local extra content for schema.org info on the summary page.

      This file can be used to add fields that are not standard in EPrints.
-->

<cite:citation xmlns="http://www.w3.org/1999/xhtml"
    xmlns:epc="http://eprints.org/ep3/control"
    xmlns:cite="http://eprints.org/ep3/citation">

<epc:if test="dates">
    <epc:foreach expr="dates" iterator="date">
      <epc:if test="$date.subproperty('date_type') = 'published'">
        <div itemprop="datePublished"><epc:print expr="$date.subproperty('date')" /></div>
      </epc:if>
      <!-- schema.org has no dateCompleted property; dateCreated is the nearest match -->
      <epc:if test="$date.subproperty('date_type') = 'completed'">
        <div itemprop="dateCreated"><epc:print expr="$date.subproperty('date')" /></div>
      </epc:if>
    </epc:foreach>
</epc:if>

<epc:if test="contributors">
    <epc:foreach expr="contributors" iterator="person">
      <div itemprop="contributor" itemscope="itemscope" itemtype="http://schema.org/Person">
        <div itemprop="name"><epc:print expr="$person.subproperty('name')" /></div>
        <epc:if test="$person.subproperty('id')">
          <div itemprop="identifier"><epc:print expr="$person.subproperty('id')" /></div>
        </epc:if>
      </div>
    </epc:foreach>
</epc:if>

</cite:citation>


I'm not sure how useful all this is, but I figured I'd throw it out there.
It uses a default image because, for some reason, the Google checker insisted
on one. It doesn't link to files or mention subjects, doesn't include URIs
properly, and doesn't link to ORCID etc. (which is data we have in
eprints.soton).
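On the URI and ORCID gaps: schema.org uses sameAs for identity links, and microdata takes URL-valued properties from the href of a <link> or <a> element rather than from element text. A sketch, assuming a hypothetical orcid subfield on creators (a standard install has only name and id) and that EPrints expands curly-brace expressions in attribute values inside citations, as it does in its page templates:

```xml
<!-- Goes inside <epc:foreach expr="creators" iterator="person">, within the Person div.
     'orcid' is a HYPOTHETICAL subfield holding a bare iD such as 0000-0002-1825-0097. -->
<epc:if test="$person.subproperty('orcid')">
  <link itemprop="sameAs" href="https://orcid.org/{$person.subproperty('orcid')}" />
</epc:if>

<!-- Possible alternative to <div itemprop="url">: expose official_url as a real link. -->
<epc:if test="official_url">
  <link itemprop="url" href="{official_url}" />
</epc:if>
```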




--
Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg

University of Southampton Open Data Service: http://data.southampton.ac.uk/
You should read our Web & Data Innovation blog: http://blogs.ecs.soton.ac.uk/webteam/