
[EP-tech] Server timeouts - MySQL Problems

Thanks David!

I've used top and htop, but never atop. I've just installed it and will
start investigating now; it sounds like it could be really useful. It
certainly improves on my previous plan of staying up until the suspected
failure time to see which processes were running.

I'll get working on point two now.

I may hold off on point three until I'm really stuck. That said, I have just
noticed that the Elements "get_records" script appears to run for longer
than an hour, so, for example, the 1pm run is still going when the 2pm run
starts.
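
One way to stop hourly runs from piling up is to have cron call a small wrapper that takes a lock first and skips the run if the previous one still holds it. The following is only a sketch under assumptions: the function name `run_exclusively`, the lock path and the command shown in the usage comment are hypothetical, and it relies on `fcntl.flock`, so it is Unix-only.

```python
import fcntl
import os
import subprocess
import sys


def run_exclusively(command, lock_path):
    """Run command unless another process already holds lock_path.

    Returns the command's exit code, or None if the run was skipped.
    """
    lock_fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # Non-blocking exclusive lock: fails at once if already held.
            fcntl.flock(lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            print(f"{lock_path} is held; skipping this run", file=sys.stderr)
            return None
        return subprocess.run(command).returncode
    finally:
        # Closing the descriptor also releases the lock.
        os.close(lock_fd)


# Hypothetical usage from a cron wrapper (paths are placeholders):
# run_exclusively(["/path/to/get_records"], "/tmp/get_records.lock")
```

Skipped runs print a message to stderr, so if cron mails output it would also show how often the script overruns its hour.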

I'm trying to decide if there's any great harm in doing frequent curl calls
to the homepage from another server to see at which point it fails so I can
pin down a more precise time for the problem.
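
A request every minute or so should add very little load compared with normal traffic. A minimal sketch of such a probe, run from the other server, might look like the following. The function names, the one-minute interval, the 10-second timeout and the example URL are all assumptions, and it uses Python's `urllib` rather than curl itself.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone


def probe(url, timeout=10.0, opener=urllib.request.urlopen):
    """Fetch url once; return (ok, detail, seconds_elapsed)."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            response.read(1)  # make sure the server actually sends a body
            ok, detail = True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 503).
        ok, detail = False, f"HTTP {exc.code}"
    except OSError as exc:
        # Timeouts, refused connections and DNS failures all land here
        # (URLError and socket timeouts are OSError subclasses).
        ok, detail = False, f"{type(exc).__name__}: {exc}"
    return ok, detail, time.monotonic() - start


def log_line(url, result, now=None):
    """Format one probe result as a timestamped log line (UTC)."""
    ok, detail, elapsed = result
    now = now or datetime.now(timezone.utc)
    status = "OK" if ok else "FAIL"
    stamp = now.isoformat(timespec="seconds")
    return f"{stamp} {status} {url} {detail} {elapsed:.2f}s"


def watch(url, interval=60.0, timeout=10.0):
    """Probe url every `interval` seconds until interrupted."""
    while True:
        print(log_line(url, probe(url, timeout)), flush=True)
        time.sleep(interval)


# Hypothetical usage (placeholder URL, replace with the repository homepage):
# watch("https://repository.example.org/")
```

Redirecting the output to a file gives a timeline of OK and FAIL lines that can be lined up against the atop logs and the cron schedule.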

So much to investigate!

Thanks again for your advice. It's greatly appreciated.


On Thu, Mar 5, 2020 at 1:37 PM Newman D.R. <drn at ecs.soton.ac.uk> wrote:

> Hi James,
> Several suggestions:
> 1. Try installing atop [1]. It creates log files similar to what you get
> from running the top command, which will allow you to look back later to
> see what was going on at the times when the server was not responding. By
> default it takes a snapshot every 10 minutes. It might be worth changing
> this to every minute or couple of minutes.
> 2. Edit MySQL's configuration to introduce a log file for slow-running
> queries [2].
> 3. I use something called pt-kill to kill very long-running queries that
> may be blocking other queries [3].
> Regards
> David Newman
> [1] https://linux.die.net/man/1/atop
> [2] https://dev.mysql.com/doc/refman/5.7/en/slow-query-log.html
> [3] https://www.percona.com/doc/percona-toolkit/LATEST/pt-kill.html
> On 05/03/2020 12:46, James Kerwin via Eprints-tech wrote:
> Hi All,
> This isn't necessarily directly EPrints related, but it's about a server
> running EPrints.
> I've noticed a problem this week with the repository. In the early hours
> of the morning the number of users drops to zero for several hours between
> 2am and 6am (according to Google Analytics). Due to having a cold I've
> been up between these times and can confirm that the repository website
> times out when I try to connect from home.
> I don't get any memory or CPU warnings from our monitoring software. My
> gut instinct is that it's an issue with MySQL connections not closing in a
> timely manner. We do have cron jobs that run at 1:30, 2:30 and 3:30 which
> I'm aware fall right within the problem zone, but these have been running
> at the same time for years and have never caused an issue.
> Has anybody experienced anything similar to this or have suggestions as to
> how I could chase it down?
> It's an Ubuntu server with MySQL running EPrints 3.3.14. I don't think
> it's an EPrints issue, but there is nothing in the log files to suggest
> what's happening. The Apache error log is blank for the hours that the
> server won't connect.
> Thanks,
> James
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/