EPrints Technical Mailing List Archive

Message: #06956



[EP-tech] Experimental Schema.org support for EPrints


Hi, EPrints-tech, long time no see.

I've recently rejoined the EPrints.soton.ac.uk support team, and was asked about trying out schema.org support (which Google and Bing like). I'm not a huge fan, as I prefer peer-to-peer data to data routed via the big search engines, but I gave it a go anyway.

I have been working on a way to add schema.org support to EPrints. It uses an invisible <div>, which may not be everyone's preferred way of doing it but has the advantage of working well with the citation files.

Other options would be to design the entire abstract page around this feature (possible, but extra work to add to existing sites) or to use JSON-LD. JSON-LD is what I would do if I were doing it just for me, but making a configuration file to generate JSON-LD would be more work for me and more of a learning curve for the EPrints admin.
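
For comparison, the JSON-LD route would mean emitting something like the following in the page instead of the hidden <div>. This is only a sketch: the title, name and date are placeholder values, and nothing in the files below generates it.

```xml
<!-- Hypothetical JSON-LD equivalent of the microdata; all values are placeholders. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "name": "An Example Title",
  "headline": "An Example Title",
  "creator": [
    { "@type": "Person", "name": "Example, A." }
  ],
  "datePublished": "2008-01-01"
}
</script>
```

The content is the same as the microdata version; the difference is that it has to be built as one JSON document rather than assembled from small citation fragments.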

I've added it as a pilot to https://eprints.soton.ac.uk/ (subject to removal or change at any time)

See the data extracted from a page here: https://search.google.com/structured-data/testing-tool#url=https%3A%2F%2Feprints.soton.ac.uk%2F50995%2F

There's lots more work to polish this, but it's worth showing off now.

I've used 3 citation files for this: an outer one to handle the different types (a bit ugly, but it was the solution I came up with), a second one to process fields that come in a standard install of EPrints, and a third for the fields eprints.soton has customised heavily.

In the main summary_page.xml I added:

  <epc:print expr="$item.citation('schema_org')" />

Which links to schema_org.xml:

<?xml version="1.0" ?>
<!DOCTYPE html SYSTEM "entities.dtd" >

<!--
    Full "abstract page" (or splash page or summary page, depending on your jargon) for an eprint.
-->

<cite:citation xmlns="http://www.w3.org/1999/xhtml" xmlns:epc="http://eprints.org/ep3/control" xmlns:cite="http://eprints.org/ep3/citation">

<div style='display:none'>
  <epc:choose>
    <epc:when test="type = 'article'">
      <div itemscope="itemscope" itemtype="http://schema.org/ScholarlyArticle">
        <epc:print expr="$item.citation('schema_org_main')" />
      </div>
    </epc:when>
    <epc:when test="type = 'book'">
      <div itemscope="itemscope" itemtype="http://schema.org/Book">
        <epc:print expr="$item.citation('schema_org_main')" />
      </div>
    </epc:when>
    <!-- book_section -->
    <epc:when test="type = 'conference_item'">
      <div itemscope="itemscope" itemtype="http://schema.org/ScholarlyArticle">
        <epc:print expr="$item.citation('schema_org_main')" />
      </div>
    </epc:when>
    <epc:when test="type = 'monograph'">
      <div itemscope="itemscope" itemtype="http://schema.org/ScholarlyArticle">
        <epc:print expr="$item.citation('schema_org_main')" />
      </div>
    </epc:when>
    <!-- patent -->
    <epc:when test="type = 'thesis'">
      <div itemscope="itemscope" itemtype="http://schema.org/Thesis">
        <epc:print expr="$item.citation('schema_org_main')" />
      </div>
    </epc:when>
    <epc:when test="type = 'dataset'">
      <div itemscope="itemscope" itemtype="http://schema.org/Dataset">
        <epc:print expr="$item.citation('schema_org_main')" />
      </div>
    </epc:when>
    <!-- ad_item // art design item //  -->
    <epc:when test="type = 'mu_item'">
      <div itemscope="itemscope" itemtype="http://schema.org/MusicComposition">
        <epc:print expr="$item.citation('schema_org_main')" />
      </div>
    </epc:when>
    <!-- letter -->
    <!-- editorial -->
    <epc:when test="type = 'review'">
      <div itemscope="itemscope" itemtype="http://schema.org/Review">
        <epc:print expr="$item.citation('schema_org_main')" />
      </div>
    </epc:when>
    <!-- special_issue -->
    <!-- meeting_abstract -->
    <!-- software // SoftwareApplication/ SoftwareSourceCode ?? -->
    <epc:when test="type = 'website'">
      <div itemscope="itemscope" itemtype="http://schema.org/WebSite">
        <epc:print expr="$item.citation('schema_org_main')" />
      </div>
    </epc:when>

    <epc:otherwise>
      <div itemscope="itemscope" itemtype="http://schema.org/CreativeWork">
        <epc:print expr="$item.citation('schema_org_main')" />
      </div>
    </epc:otherwise>
  </epc:choose>
</div>

</cite:citation>

Each of these options in turn links to the main one, schema_org_main.xml, which uses default EPrints fields:

<?xml version="1.0" ?>
<!DOCTYPE html SYSTEM "entities.dtd" >

<cite:citation xmlns="http://www.w3.org/1999/xhtml" xmlns:epc="http://eprints.org/ep3/control" xmlns:cite="http://eprints.org/ep3/citation">

<div itemprop="name"><epc:print expr="title" /></div>
<div itemprop="headline"><epc:print expr="title" /></div>
<img itemprop="image" src="http://www.eprints.org/uk/wp-content/uploads/EprintsServices2015icon.jpg" />
<epc:if test="abstract">
  <div itemprop="description"><epc:print expr="abstract" /></div>
</epc:if>
<epc:if test="keywords">
  <div itemprop="keywords"><epc:print expr="keywords" /></div>
</epc:if>
<epc:if test="isbn">
  <div itemprop="isbn"><epc:print expr="isbn" /></div>
</epc:if>
<epc:if test="id_number">
  <div itemprop="identifier"><epc:print expr="id_number" /></div>
</epc:if>

<epc:if test="issn or series">
  <div itemprop="isPartOf" itemscope="itemscope" itemtype="http://schema.org/Periodical">
    <epc:if test="issn"><div itemprop="issn"><epc:print expr="issn" /></div></epc:if>
    <epc:if test="series"><div itemprop="name"><epc:print expr="series" /></div></epc:if>
  </div>
</epc:if>

<epc:comment>
  <!-- pageEnd and pageStart could go here but are more bother to extract. -->
</epc:comment>

<epc:if test="pagerange">
  <div itemprop="pagination"><epc:print expr="as_string(pagerange)" /></div>
</epc:if>
<epc:if test="publisher">
  <div itemprop="publisher" itemscope="itemscope" itemtype="http://schema.org/Organization">
    <div itemprop="name"><epc:print expr="publisher" /></div>
  </div>
</epc:if>
<epc:if test="official_url">
  <div itemprop="url"><epc:print expr="official_url" /></div>
</epc:if>

<epc:if test="creators">
  <epc:foreach expr="creators" iterator="person">
    <div itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Person">
      <div itemprop="name"><epc:print expr="$person.subproperty('name')" /></div>
      <epc:if test="$person.subproperty('id')">
        <div itemprop="identifier"><epc:print expr="$person.subproperty('id')" /></div>
      </epc:if>
    </div>
  </epc:foreach>
</epc:if>
<epc:if test="editors">
  <epc:foreach expr="editors" iterator="person">
    <div itemprop="editor" itemscope="itemscope" itemtype="http://schema.org/Person">
      <div itemprop="name"><epc:print expr="$person.subproperty('name')" /></div>
      <epc:if test="$person.subproperty('id')">
        <div itemprop="identifier"><epc:print expr="$person.subproperty('id')" /></div>
      </epc:if>
    </div>
  </epc:foreach>
</epc:if>

<epc:if test="corp_creators">
  <epc:foreach expr="corp_creators" iterator="org">
    <div itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Organization">
      <div itemprop="name"><epc:print expr="$org" /></div>
    </div>
  </epc:foreach>
</epc:if>


<epc:comment>
  ADD IN LOCAL EXTENSIONS USING THIS FILE
</epc:comment>
<epc:print expr="$item.citation('schema_org_local')" />

</cite:citation>
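
To give a feel for what schema_org.xml and schema_org_main.xml produce together, an article with a title, a publisher and one creator should come out roughly like this. The values are made up, and any wrapper markup EPrints puts around printed values (for example around names) is left out for readability:

```xml
<!-- Approximate rendered output for a hypothetical article; values are placeholders. -->
<div style="display:none">
  <div itemscope="itemscope" itemtype="http://schema.org/ScholarlyArticle">
    <div itemprop="name">An Example Title</div>
    <div itemprop="headline">An Example Title</div>
    <img itemprop="image" src="http://www.eprints.org/uk/wp-content/uploads/EprintsServices2015icon.jpg" />
    <div itemprop="publisher" itemscope="itemscope" itemtype="http://schema.org/Organization">
      <div itemprop="name">Example Press</div>
    </div>
    <div itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Person">
      <div itemprop="name">Example, A.</div>
    </div>
  </div>
</div>
```

The nested itemscope divs are what let the checker see the publisher and creator as separate Organization and Person items rather than plain strings.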

Finally I created schema_org_local.xml for the fields like date and creators which we've heavily messed around with.

<?xml version="1.0" ?>
<!DOCTYPE html SYSTEM "entities.dtd" >

<!--
    Local extra content for schema.org info on summary page.

    This file can be used to add new fields that are not standard for EPrints.
-->

<cite:citation xmlns="http://www.w3.org/1999/xhtml" xmlns:epc="http://eprints.org/ep3/control" xmlns:cite="http://eprints.org/ep3/citation">

<epc:if test="dates">
  <epc:foreach expr="dates" iterator="date">
    <epc:if test="$date.subproperty('date_type') = 'published'">
      <div itemprop="datePublished"><epc:print expr="$date.subproperty('date')" /></div>
    </epc:if>
    <epc:if test="$date.subproperty('date_type') = 'completed'">
      <div itemprop="dateCompleted"><epc:print expr="$date.subproperty('date')" /></div>
    </epc:if>
  </epc:foreach>
</epc:if>

<epc:if test="contributors">
  <epc:foreach expr="contributors" iterator="person">
    <div itemprop="contributor" itemscope="itemscope" itemtype="http://schema.org/Person">
      <div itemprop="name"><epc:print expr="$person.subproperty('name')" /></div>
      <epc:if test="$person.subproperty('id')">
        <div itemprop="identifier"><epc:print expr="$person.subproperty('id')" /></div>
      </epc:if>
    </div>
  </epc:foreach>
</epc:if>

</cite:citation>
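
The "dates" field above is a local eprints.soton customisation. On a stock install, which normally has a single "date" field with an accompanying "date_type", something like the following might do the same job inside schema_org_local.xml. This is an untested sketch, and the printed date may need reformatting before the checker accepts it:

```xml
<epc:comment>
  Sketch for a stock install: assumes the default single-valued "date"
  and "date_type" fields rather than the local "dates" compound field.
</epc:comment>
<epc:if test="date">
  <epc:if test="date_type = 'published'">
    <div itemprop="datePublished"><epc:print expr="date" /></div>
  </epc:if>
</epc:if>
```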


I'm not sure how useful all this is but figured I'd throw it out there. It uses a default image as for some reason the Google checker insisted. It doesn't link to files or mention subjects, doesn't include URIs properly and doesn't link to ORCID etc. (which is data we have in eprints.soton).
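
On the URI point, one possible improvement would be to emit the official URL as a link element rather than as text inside a div, since microdata takes URL values from href attributes. A minimal sketch, assuming the `{...}` attribute-expression syntax that the stock summary_page.xml uses for hrefs, and not yet tried against the Google checker:

```xml
<epc:comment>
  Hypothetical replacement for the official_url block in schema_org_main.xml.
</epc:comment>
<epc:if test="official_url">
  <link itemprop="url" href="{official_url}" />
</epc:if>
```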



--
Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg

University of Southampton Open Data Service: http://data.southampton.ac.uk/
You should read our Web & Data Innovation blog: http://blogs.ecs.soton.ac.uk/webteam/