
[EP-tech] subject dataset - removing subjectid from eprint



Hi Monica

I'm not saying it would be quick, but I'd be surprised if it really took an infeasible amount of time, even on a large repository.  Loading records is fairly lightweight and trivial -- it's writing that takes time, and that would only happen for records that were changed by the script.

As you've identified, EPrints is trying to be 'clever' with the subject by searching for items at that level or below.  Now that the subject in question has been removed from the tree, this may be what's causing the problem.  Three solutions I would consider:

* Do a record by record iterative search over the repository.
* Reinstate the subject id using the subject editor, run the script, then remove it from the tree.
* Identify the eprintids of items that have that subject set using a mysql query, write them to a file, then write a script to load and modify each of those eprints.
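
The third option might look something like the sketch below. It is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for the repository's MySQL database, the rows are made up, and the helper names find_eprintids and write_ids are hypothetical. The eprint_collections table and its eprintid and collections columns are the ones that appear in the debug output quoted later in this thread.

```python
import io
import sqlite3


def find_eprintids(conn, subjectid):
    """Return the ids of eprints whose 'collections' field still holds subjectid."""
    rows = conn.execute(
        "SELECT DISTINCT eprintid FROM eprint_collections "
        "WHERE collections = ? ORDER BY eprintid",
        (subjectid,),
    )
    return [row[0] for row in rows]


def write_ids(ids, handle):
    """Write one eprintid per line to an open text handle."""
    for eprintid in ids:
        handle.write(f"{eprintid}\n")


# Toy stand-in for the real table (assumed shape: one row per eprint/value pair).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eprint_collections (eprintid INTEGER, collections TEXT)")
conn.executemany(
    "INSERT INTO eprint_collections VALUES (?, ?)",
    [(1, "theses"), (2, "articles"), (3, "articles"), (3, "theses")],
)

ids = find_eprintids(conn, "theses")
buffer = io.StringIO()  # in practice this would be a real file on disk
write_ids(ids, buffer)
print(buffer.getvalue(), end="")
```

Because this lookup reads eprint_collections directly, it does not depend on the subject still being present in the tree. The resulting ids could then be handed to a script that loads each eprint, removes the value and commits the record.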



[Jisc]<http://www.jisc.ac.uk/>

Adam Field
SHERPA services analyst developer


From: <eprints-tech-bounces at ecs.soton.ac.uk> on behalf of Monica Wood <monica.wood at utas.edu.au>
Reply-To: "eprints-tech at ecs.soton.ac.uk" <eprints-tech at ecs.soton.ac.uk>
Date: Thursday, 10 March 2016 23:12
To: "eprints-tech at ecs.soton.ac.uk" <eprints-tech at ecs.soton.ac.uk>
Subject: Re: [EP-tech] subject dataset - removing subjectid from eprint

Hi Adam,

I believe changing the search would return all the eprint items in the repository?
We have a massive repository, so this wouldn't be a good option.

I have now done a bulk change and set the collections metafield as empty across all thesis item types.

However to help with debugging the script, I ran it with the args:  FIELDNAME = collections and SUBJECTID = theses .  If either of these were incorrect the script would have returned an error.
I only did the dry-run to see what it would output, but it never got to the bit of the script where it printed anything out, which is why I'm assuming the search returned no results and $list is therefore empty.

As I said in my previous email, I put the noise level up to 3 so I could find out exactly what was happening. This was the output:

Starting EPrints Repository.
Connecting to DB ... Database execute debug: SET NAMES 'utf8'
done.
Database execute debug:
SELECT `eprint`.`eprintid`
FROM `eprint`, `eprint_collections` AS `eprint_collections`, `subject_ancestors` AS `127395456subject_ancestors`
WHERE `eprint`.`eprintid`=`eprint_collections`.`eprintid`
AND `eprint_collections`.`collections`=`127395456subject_ancestors`.`subjectid`
AND `127395456subject_ancestors`.`ancestors` = 'theses'
GROUP BY `eprint`.`eprintid`

Ending EPrints Repository.


As you can see, it only returns eprints whose eprint_collections.collections value matches a subject_ancestors.subjectid.  As I had removed the node 'theses' from the subject tree, this query gives back no results.
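
The effect can be reproduced on a toy database. The following is a minimal sketch, not the real schema: Python's sqlite3 module is used in place of MySQL, the rows are invented, and it assumes that subject_ancestors lists each subject as one of its own ancestors. The table and column names follow the debug output above; the numeric alias prefix and backticks are dropped for readability.

```python
import sqlite3

# Same shape as the query in the debug output.
JOIN_QUERY = """
SELECT eprint.eprintid
FROM eprint, eprint_collections, subject_ancestors AS sa
WHERE eprint.eprintid = eprint_collections.eprintid
  AND eprint_collections.collections = sa.subjectid
  AND sa.ancestors = ?
GROUP BY eprint.eprintid
"""

# Lookup that ignores the subject tree entirely.
DIRECT_QUERY = "SELECT DISTINCT eprintid FROM eprint_collections WHERE collections = ?"


def ids_for(conn, query, subjectid):
    """Run a one-parameter query and return the eprintids in sorted order."""
    return sorted(row[0] for row in conn.execute(query, (subjectid,)))


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE eprint (eprintid INTEGER PRIMARY KEY);
CREATE TABLE eprint_collections (eprintid INTEGER, collections TEXT);
CREATE TABLE subject_ancestors (subjectid TEXT, ancestors TEXT);
INSERT INTO eprint VALUES (1), (2), (3);
INSERT INTO eprint_collections VALUES (1, 'theses'), (2, 'articles'), (3, 'theses');
INSERT INTO subject_ancestors VALUES
    ('theses', 'theses'), ('theses', 'collections'),
    ('articles', 'articles'), ('articles', 'collections');
""")

before = ids_for(conn, JOIN_QUERY, "theses")

# Simulate unlinking the node: its rows vanish from subject_ancestors only.
conn.execute("DELETE FROM subject_ancestors WHERE subjectid = 'theses'")

after = ids_for(conn, JOIN_QUERY, "theses")
direct = ids_for(conn, DIRECT_QUERY, "theses")
print(before, after, direct)
```

In this toy setup the join finds eprints 1 and 3 while the subject exists and nothing once it is gone, even though eprint_collections still holds both rows.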

I'm wondering if something should be added to the UNLINK function in the Subject Tree, so that when you remove a node for good from the subject tree, any matching metafield values are also removed from the records?



Monica Wood
Library Systems Officer
Library | Division of Students & Education
University of Tasmania
Locked Bag 25
Hobart 7001
T +61 3 6226 1849
http://www.utas.edu.au/library

From: <eprints-tech-bounces at ecs.soton.ac.uk> on behalf of Adam Field <Adam.Field at jisc.ac.uk>
Reply-To: "eprints-tech at ecs.soton.ac.uk" <eprints-tech at ecs.soton.ac.uk>
Date: Friday, 11 March 2016 at 12:21 AM
To: "eprints-tech at ecs.soton.ac.uk" <eprints-tech at ecs.soton.ac.uk>
Subject: Re: [EP-tech] subject dataset - removing subjectid from eprint

I would suggest running the script over the whole repository.

Looking at John's script, change this:


my $list = $session->get_repository->dataset( 'eprint' )->search( filters => [
        { meta_fields => [ $fieldname ],
          value => $subjectid }] );

To this:

my $list = $session->dataset('eprint')->search();

...and see what happens.

(though I agree with John that this shouldn't really make a difference).  If it doesn't work, please post exactly what you typed on the command-line to invoke the script.
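
A whole-repository pass of this kind boils down to "load every record, write only the ones that change". The sketch below models that in plain Python rather than with the EPrints API: dictionaries stand in for eprints, and the save callback stands in for committing a record. The function name, its parameters and the sample data are all hypothetical.

```python
def remove_subject(records, fieldname, subjectid, save, dry_run=False):
    """Strip subjectid from a multi-valued field; return the ids that changed."""
    changed = []
    for record in records:
        values = record.get(fieldname) or []
        kept = [value for value in values if value != subjectid]
        if len(kept) == len(values):
            continue  # value not present, so the record is never written
        changed.append(record["eprintid"])
        if not dry_run:
            record[fieldname] = kept
            save(record)
    return changed


records = [
    {"eprintid": 1, "collections": ["theses"]},
    {"eprintid": 2, "collections": ["articles"]},
    {"eprintid": 3, "collections": ["articles", "theses"]},
    {"eprintid": 4},  # field not set at all
]

written = []  # records the ids that were "committed"
changed = remove_subject(
    records, "collections", "theses", save=lambda r: written.append(r["eprintid"])
)
print(changed, written)
```

Every record is read, but only the two that actually held the value are written, which is why the cost should scale with the number of affected items rather than with the size of the repository.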




Adam Field
SHERPA services analyst developer


From: <eprints-tech-bounces at ecs.soton.ac.uk> on behalf of John Salter <J.Salter at leeds.ac.uk>
Reply-To: "eprints-tech at ecs.soton.ac.uk" <eprints-tech at ecs.soton.ac.uk>
Date: Wednesday, 9 March 2016 06:36
To: "eprints-tech at ecs.soton.ac.uk" <eprints-tech at ecs.soton.ac.uk>
Subject: Re: [EP-tech] subject dataset - removing subjectid from eprint

Interesting...
You could try adding the subject back into the tree temporarily to see if it works that way?

Using this script should cause any affected EPrints' summary pages to be regenerated - if you alter the database directly, you'd have to do this by running bin/generate_abstracts.

Cheers,
John
________________________________
From: eprints-tech-bounces at ecs.soton.ac.uk <eprints-tech-bounces at ecs.soton.ac.uk> on behalf of Monica Wood <monica.wood at utas.edu.au>
Sent: 09 March 2016 04:33:34
To: 'eprints-tech at ecs.soton.ac.uk'
Subject: Re: [EP-tech] subject dataset - removing subjectid from eprint

Hi John,

Thanks for linking me to this script.
I've had a look through it and tried it out, but it's not working. I believe this is because I've already removed the node from the subject tree (unlinked it from the tree).

Putting the noise level on the script up to 3 gives me some feedback on a query it's doing at, I believe, this line:

my $list = $session->get_repository->dataset( 'eprint' )->search( filters => [
        { meta_fields => [ $fieldname ],
          value => $subjectid }] );

This query is (with fieldname set to collections and subjectid set to theses):

Database execute debug: SELECT `eprint`.`eprintid` FROM `eprint`, `eprint_collections` AS `eprint_collections`, `subject_ancestors` AS `127395456subject_ancestors` WHERE `eprint`.`eprintid`=`eprint_collections`.`eprintid` AND `eprint_collections`.`collections`=`127395456subject_ancestors`.`subjectid` AND `127395456subject_ancestors`.`ancestors` = 'theses' GROUP BY `eprint`.`eprintid`

This is returning an empty list, as the theses subjectid no longer exists in subject_ancestors, but it does still exist in eprint_collections.

I'll have a go at bulk changing the records from the GUI. If that doesn't work out, I'll do a bulk change directly in the database by removing the entries in eprint_collections that point to the theses subjectid.

Cheers,

Monica Wood


From: <eprints-tech-bounces at ecs.soton.ac.uk> on behalf of John Salter <J.Salter at leeds.ac.uk>
Reply-To: "'eprints-tech at ecs.soton.ac.uk'" <eprints-tech at ecs.soton.ac.uk>
Date: Tuesday, 8 March 2016 at 10:06 PM
To: "'eprints-tech at ecs.soton.ac.uk'" <eprints-tech at ecs.soton.ac.uk>
Subject: Re: [EP-tech] subject dataset - remove_field

Hi Monica,
I think your suggestion will remove the field itself, rather than a specific value stored in that field.

I've done something similar - just added it to the wiki for you:
https://wiki.eprints.org/w/Remove_subjectid_script

Let me know if it doesn't work for you.

Cheers,
John


From: eprints-tech-bounces at ecs.soton.ac.uk On Behalf Of Monica Wood
Sent: 08 March 2016 06:17
To: eprints-tech at ecs.soton.ac.uk
Subject: [EP-tech] subject dataset - remove_field

Hi there,

In our repository we have a root subject called 'Collections'. Under this I have unlinked (deleted) a child of Collections.
I now have the issue that all items that were connected to this collection still have the metadata saying so, and our summary page displays the collection an item belongs to.
So it's now showing ???collectionName??? as a link, and that link is now dead.

Is there a way to delete these connections without needing to do it directly through the database?
I was wondering if the epadmin remove_field might do the job on the subject dataset?
Something like:
~/bin/epadmin remove_field repoid subject collectionid ??

Thanks in advance

Monica Wood
Available Times
Tues: 9am - 5pm
Wed: 1pm - 5pm
Fri: 9am - 5pm


