[EP-tech] SQL Problem at EPrints
- Subject: [EP-tech] SQL Problem at EPrints
- From: drn at ecs.soton.ac.uk (David R Newman)
- Date: Mon, 17 Apr 2023 08:47:50 +0100
- In-reply-to: <CACOEPmOoJ=CpUvVEEGo2-g1k_j28D6cGevNJc1j7065HJ+WCvg@mail.gmail.com>
- References: <CACOEPmOoJ=CpUvVEEGo2-g1k_j28D6cGevNJc1j7065HJ+WCvg@mail.gmail.com> <CACOEPmOoJ=CpUvVEEGo2-g1k_j28D6cGevNJc1j7065HJ+WCvg@mail.gmail.com> <ba55681e-b4a5-62ed-b066-d53aea95db27@ecs.soton.ac.uk>
Hi Agung PW,
I think this may be similar to the issue that Mario reported recently.
The database cannot index certain words that are in the indexcodes files
generated so that the full text of documents can be indexed.
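
To see what kind of character is involved, the byte sequence from the first
error in the quoted message below ('\xF0\x9D\x91\x9F13') can be decoded. This
is only an illustrative Python sketch; the helper name describe() is
hypothetical and not part of EPrints. MySQL's legacy "utf8" character set
(utf8mb3) stores at most three bytes per character, and I am assuming here,
without having checked this repository, that the "word" column uses it.

```python
import unicodedata

# Byte sequence copied from the first quoted MySQL error message.
raw = b"\xF0\x9D\x91\x9F13"


def describe(raw_bytes):
    """Return (code point, UTF-8 byte length, Unicode name) per character.

    Hypothetical helper for illustration only; not part of EPrints.
    """
    text = raw_bytes.decode("utf-8")
    return [
        (f"U+{ord(ch):04X}", len(ch.encode("utf-8")), unicodedata.name(ch, "?"))
        for ch in text
    ]


for code_point, n_bytes, name in describe(raw):
    # Characters that need 4 bytes in UTF-8 lie outside the Basic
    # Multilingual Plane and cannot be stored in a utf8mb3 column.
    print(code_point, n_bytes, name)
```

The first character decodes to U+1D45F (MATHEMATICAL ITALIC SMALL R), which
takes four bytes in UTF-8, followed by the ordinary digits 1 and 3.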
Previously, I proposed two solutions. Option 1 below is a stopgap to fix
the issue whilst you are on the current version of EPrints, but it will
mean certain words will not be indexed. Option 2 is my implemented
solution for future versions of EPrints, which avoids certain words not
being indexed:
1. Add the following to your archive's cfg/cfg.d/indexing.pl (if this
does not exist, copy into place from lib/cfg.d/indexing.pl).
    if( $word =~ m/[^\x20-\xEF]/ )
    {
        $ok=0;
    }
Add this after the block of code:
    if( $word =~ m/^[A-Z][A-Z0-9]+$/ )
    {
        $ok=1;
    }
The words that this will stop being indexed are unlikely to be words
that would be searched for, as this code should only affect extended
characters. The work I did on Mario's issue found these words were
mostly Latin or Greek characters using a particular font, as they were
part of mathematical equations. One example is: ???.
2. Look at https://github.com/eprints/eprints3.4/issues/320 and merge
the commit it contains. This should add mappings for the indexer, so
these words can now be indexed. However, for full text indexing, this
only happens when the indexcodes files are regenerated. epadmin has a
command to regenerate all of these and reindex, but that could take a
very long time with a large repository. Therefore, I have improved the
indexer so that the --force flag on "epadmin reindex" will force the
indexcodes files to be regenerated and make use of these new mappings
(see https://github.com/eprints/eprints3.4/issues/321). If you do not
want to use the new version of epadmin, using the "Reindex Item" button
in the web interface should achieve the same thing. If you see my
earlier emails to Mario on the EPrints Tech list, you will see I was a
little baffled why indexcodes files were only regenerated this way and
not, currently, when using epadmin.
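
To get a feel for which words the stopgap filter in option 1 would drop,
here is a rough Python sketch of the same character-class test. It is not
EPrints code, and the function names are hypothetical. I do not know for
certain whether $word holds decoded characters or raw UTF-8 bytes at that
point in indexing.pl, so the sketch shows both interpretations.

```python
import re

# Same character class as the Perl test m/[^\x20-\xEF]/, applied two ways.
OUT_OF_RANGE_CHARS = re.compile(r"[^\x20-\xEF]")   # word treated as characters
OUT_OF_RANGE_BYTES = re.compile(rb"[^\x20-\xEF]")  # word treated as UTF-8 bytes


def passes_as_chars(word):
    """True if every character lies in U+0020..U+00EF (assumed semantics)."""
    return OUT_OF_RANGE_CHARS.search(word) is None


def passes_as_bytes(word):
    """True if every UTF-8 byte lies in 0x20..0xEF (alternative assumption)."""
    return OUT_OF_RANGE_BYTES.search(word.encode("utf-8")) is None


samples = [
    "repository",     # plain ASCII
    "caf\u00e9",      # e-acute, U+00E9
    "\u03b1",         # Greek small alpha, U+03B1
    "\U0001D45F13",   # mathematical italic r, as in the quoted error
]

for word in samples:
    print(ascii(word), passes_as_chars(word), passes_as_bytes(word))
```

Under either interpretation the 4-byte mathematical character is rejected
and plain ASCII words are kept. The two differ for characters such as
ordinary Greek letters, which are rejected when the word is treated as
characters but kept when it is treated as bytes.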
Anyway, with either solution, make sure that the indexer is restarted to
apply the changes made. (If you intend to use the "Reindex Item" button,
I would also reload the webserver just to be sure.) Not restarting will
not affect your initial use of "epadmin reindex" for specific eprints you
want to test/fix, but it will prevent the changes being applied to future
indexing tasks carried out by the indexer.
Regards
David Newman
On 17/04/2023 12:51 am, Agung Prasetyo W. via Eprints-tech wrote:
> Hi,
>
> When I run the command: epadmin reindex *repository_id* *dataset_id*
> [*eprint_id*]
>
> I got an error like this :
> Indexed item: eprint/7039
> *DBD::mysql::st execute failed: Incorrect string value:
> '\xF0\x9D\x91\x9F13' for column 'word' at row 1 at
> /usr/share/eprints3/bin/../perl_lib/EPrints/Database.pm line 1287.*
> Indexed item: eprint/7040
> Indexed item: eprint/7041
> *DBD::mysql::st execute failed: Incorrect string value:
> '\xF0\x9D\x91\xA6\xF0\x9D...' for column 'word' at row 1 at
> /usr/share/eprints3/bin/../perl_lib/EPrints/Database.pm line 1287.*
> Indexed item: eprint/7042
> Indexed item: eprint/7043
> *DBD::mysql::st execute failed: Incorrect string value:
> '\xF0\x9D\x90\xBF\xF0\x9D...' for column 'word' at row 1 at
> /usr/share/eprints3/bin/../perl_lib/EPrints/Database.pm line 1287.*
> Indexed item: eprint/7044
> *DBD::mysql::st execute failed: Incorrect string value:
> '\xF0\x9D\x91\xA1\xF0\x9D...' for column 'word' at row 1 at
> /usr/share/eprints3/bin/../perl_lib/EPrints/Database.pm line 1287.*
> Indexed item: eprint/7045
> Indexed item: eprint/7046
> *DBD::mysql::st execute failed: Incorrect string value:
> '\xF0\x9D\x91\x9D\xF0\x9D...' for column 'word' at row 1 at
> /usr/share/eprints3/bin/../perl_lib/EPrints/Database.pm line 1287*.
> Indexed item: eprint/7047
>
> Is there any solution for this problem?
>
> Thank you.
>
> Regards,
> Agung PW
>