EPrints Technical Mailing List Archive

Message: #09513



Re: [EP-tech] Want to know total id item on document folder

  • To: Agung Prasetyo W. <prazetyo@gmail.com>
  • Subject: Re: [EP-tech] Want to know total id item on document folder
  • From: David R Newman <drn@ecs.soton.ac.uk>
  • Date: Tue, 12 Dec 2023 09:26:06 +0000

Hi Agung,

I have never tested what happens when you reach item 100,000,000. However, it certainly won't start using disk1. I don't think I have ever seen a repository reach 1 million records. Typically, the largest repositories I have seen are in the 100,000 to 200,000 range (i.e. 0.1% to 0.2% of the maximum possible number of items).

I have just tried this in a test repository. After setting the counter for eprintid to 100,000,000 and then creating a new record, EPrints creates the following directory (for eprint 100,000,001):

disk0/10/00/00/001

This directory is successfully created, and I can see that its revisions sub-directory has been created. I can also upload a document and see it on the filesystem. Obviously, the sub-directories that will be created will make the directory structure for documents quite messy, but it does seem to at least still be functional. However, I cannot be sure that there are not certain aspects of EPrints (or plugins for EPrints) that rely on the directory structure being diskN/NN/NN/NN/NN.
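To illustrate why the path comes out as 10/00/00/001, here is a minimal Python sketch of one mapping that is consistent with the directory observed above. It is not taken from the EPrints source; the function name eprint_id_to_path and the rule it uses (zero-pad the ID to 8 digits, then put a slash before each complete pair of digits) are assumptions made for illustration.

```python
import re


def eprint_id_to_path(eprint_id: int) -> str:
    """Hypothetical helper: map an eprint ID to a NN/NN/NN/NN style path.

    Assumption: the ID is zero-padded to 8 digits and a slash is placed
    before each complete pair of digits. This is an illustrative rule,
    not code copied from EPrints.
    """
    padded = f"{eprint_id:08d}"
    # A 9-digit ID has one digit left over after four pairs; that digit
    # stays attached to the final pair, giving a 3-digit last component.
    return re.sub(r"(..)", r"/\1", padded).lstrip("/")


print("disk0/" + eprint_id_to_path(1))
print("disk0/" + eprint_id_to_path(99_999_999))
print("disk0/" + eprint_id_to_path(100_000_001))
```

Under this assumed rule, IDs up to 99,999,999 fill exactly four two-digit levels, while a ninth digit has no pair of its own and is appended to the last level, which would explain the messy but still functional layout.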

I can imagine that in the future people might want to upload much larger numbers of documents to some eprints. However, the prospect of an organisation adding many orders of magnitude more research publication records a year than they (or anyone else) currently do lacks a use case as far as I can see. Certainly, if such a case did exist, it would be an extreme edge case and at that size would likely have other issues beyond this 100 million records limitation.

Regards

David Newman


On 12/12/2023 8:42 am, Agung Prasetyo W. wrote:
CAUTION: This e-mail originated outside the University of Southampton.
Hi,
As for the hard disk capacity, perhaps it could be increased. But I'm curious: once the item ID has reached 99,999,999, will item ID 100,000,000 be placed on disk1?
For example, the folder format changes to:
disk1
    |-- 10
          |-- 00
                |-- 00
                      |-- 01
                      |-- ...
                      |-- 99

thank you

regards,
Agung PW

On Thu, 7 Dec 2023 at 02:09, David R Newman <drn@ecs.soton.ac.uk> wrote:
Hi Agung,

The maximum number of EPrints records is 100 million minus 1 (99,999,999). At the time this was decided, it was deemed to be well in excess of the number of research publications that would ever need to be stored. Considerably more documents can be added, as each research publication (eprint) can have a near unrestricted number of documents associated with it.

The reason that there is a directory called disk0 is that this will be created on the same disk partition as the EPrints installation, and this may run out of space. Being able to create disk1 as a symlink to a different disk partition means you can deal with running out of disk space. Nowadays, it is normally much easier to enlarge a disk partition, as people typically run VMs and you just edit the VM's configuration to increase the disk space. If you were to manually create disk1 under the documents directory of your archive, newly created eprint records would start using this rather than disk0. However, the eprint IDs would still continue to increase from whatever was the ID of the last eprint created.
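The behaviour described above can be pictured with a small Python sketch. It assumes, purely for illustration, that new records go to the diskN entry with the highest N under the documents directory; the function name newest_disk_dir is hypothetical, and the real selection logic in EPrints may differ (for example, it may also take free space into account).

```python
import re
from pathlib import Path
from typing import Optional


def newest_disk_dir(documents_dir: str) -> Optional[Path]:
    """Hypothetical helper: return the diskN entry with the highest N.

    Assumption: the highest-numbered diskN directory is the one used for
    newly created records. Returns None if no diskN entry exists.
    """
    candidates = []
    for entry in Path(documents_dir).iterdir():
        match = re.fullmatch(r"disk(\d+)", entry.name)
        # is_dir() follows symlinks, so a disk1 symlink that points at a
        # directory on another partition is counted as well.
        if match and entry.is_dir():
            candidates.append((int(match.group(1)), entry))
    if not candidates:
        return None
    # Compare numerically so that disk10 ranks above disk2.
    return max(candidates, key=lambda pair: pair[0])[1]
```

Called on a documents directory that contains only disk0, the sketch returns disk0; once disk1 exists (as a directory or as a symlink to one), it returns disk1. The eprint IDs themselves are unaffected by which disk directory is chosen.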

Regards

David Newman

On 01/12/2023 2:47 am, Agung Prasetyo W. wrote:
CAUTION: This e-mail originated outside the University of Southampton.
Hi,
I see that the document storage structure in EPrints looks like this:
disk0
   |-- 00
         |-- 00
               |-- 00
                     |-- 01
                     |-- ...
                     |-- 99
What is the maximum number of item IDs before disk0 changes to disk1?

Thank You

Regards,
Agung PW

*** Options: https://wiki.eprints.org/w/Eprints-tech_Mailing_List
*** Archive: https://www.eprints.org/tech.php/
*** EPrints community wiki: https://wiki.eprints.org/