
[EP-tech] Server timeouts - MySQL Problems

Hi James,

Technically, there is also mtop (m for MySQL), but I don't think it is in the standard package repositories, and it is only useful for watching something as it happens, unless you can hack a way to make it log like atop.


David Newman

On 05/03/2020 14:37, James Kerwin wrote:
Thanks David!

I've used top and htop, but never atop. I've just installed it, so I'll start investigating it now. It sounds like it could be really useful, and it improves on my previous plan of staying up until the suspected failure time to see which processes were running.

I'll get working on point two now.

I may delay point three until I'm really stuck. That said, I have just noticed that the Elements "get_records" script appears to run for longer than an hour, so, for example, the 1pm run is still going when the 2pm run starts.
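If the overlapping runs turn out to matter, one common fix is to wrap the cron job in a lock so that a new run exits when the previous one is still going. From a crontab this is usually done with util-linux's `flock -n <lockfile> <command>`; the sketch below does the same thing in Python with `fcntl.flock`. The lock file path and the `run_exclusively` name are hypothetical, and it is an open assumption that the overlap is related to the outage at all.

```python
import fcntl
import subprocess
import sys

# Hypothetical lock file; any path writable by the cron user will do.
LOCK_PATH = "/tmp/get_records.lock"


def run_exclusively(cmd, lock_path=LOCK_PATH):
    """Run cmd (a list of arguments) unless a previous run still holds the lock.

    Returns the command's exit code, or None if the run was skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock is held while the command runs and is released
        # automatically when the file is closed at the end of the block.
        return subprocess.run(cmd).returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    code = run_exclusively(sys.argv[1:])
    if code is None:
        print("previous run still in progress; skipping", file=sys.stderr)
        sys.exit(0)
    sys.exit(code)
```

Saved as, say, `run_exclusively.py` (hypothetical name) and called from cron as `python3 run_exclusively.py /path/to/get_records ...` (path hypothetical), an hourly run that finds the previous one still active simply exits instead of starting a second copy.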

I'm trying to decide whether there's any real harm in making frequent curl requests to the homepage from another server to see exactly when it fails, so I can pin down the time of the problem more precisely.
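A minimal sketch of that kind of probe, using Python's standard library instead of curl: it requests the homepage at a fixed interval with a timeout and prints one timestamped OK/FAIL line per check, so redirecting the output to a file records exactly when the site stopped answering. The URL, the 60-second interval and the 10-second timeout are placeholder assumptions; at one request per minute the extra load should be negligible next to normal traffic.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime

# Hypothetical URL; substitute the repository homepage.
URL = "https://repository.example.ac.uk/"


def check_once(url, timeout=10.0):
    """Fetch url once; return (ok, detail), where detail is the HTTP status
    and response time, or the error that occurred (timeout, refused, ...)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read(1024)  # read a little so a stalled body is also caught
            return True, f"HTTP {resp.status} in {time.monotonic() - start:.1f}s"
    except urllib.error.HTTPError as exc:  # server answered with an error code
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:  # no usable answer at all
        return False, f"{type(exc).__name__}: {exc}"


def poll(url, interval=60.0, timeout=10.0):
    """Check url every `interval` seconds, printing one timestamped line per check."""
    while True:
        ok, detail = check_once(url, timeout)
        stamp = datetime.now().isoformat(timespec="seconds")
        print(f"{stamp} {'OK  ' if ok else 'FAIL'} {detail}", flush=True)
        time.sleep(interval)
```

Running `poll(URL)` on another machine with its output sent to a file gives one line per minute; the first FAIL line marks when the outage began, and the error text distinguishes a timeout from a refused connection.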

So much to investigate!

Thanks again for your advice. It's greatly appreciated.


On Thu, Mar 5, 2020 at 1:37 PM Newman D.R. <drn at ecs.soton.ac.uk> wrote:

Hi James,

Several suggestions:

1. Try installing atop [1]. It creates log files similar to what you see when running the top command, so you can look back later at what was going on while the server was not responding. By default it takes a snapshot every 10 minutes; it might be worth changing this to every minute or every couple of minutes.

2. Edit MySQL's configuration to enable a log file for slow-running queries [2].

3. I use something called pt-kill to kill very long running queries that may be blocking other queries [3].


David Newman

[1] https://linux.die.net/man/1/atop

[2] https://dev.mysql.com/doc/refman/5.7/en/slow-query-log.html

[3] https://www.percona.com/doc/percona-toolkit/LATEST/pt-kill.html

On 05/03/2020 12:46, James Kerwin via Eprints-tech wrote:
Hi All,

This isn't necessarily directly EPrints-related, but it's about a server running EPrints.

I've noticed a problem with the repository this week. In the early hours of the morning, the number of users drops to zero for several hours between 2am and 6am (according to Google Analytics). Because I have a cold, I've been awake during these hours and can confirm that the repository website times out when I try to connect from home.

I don't get any memory or CPU warnings from our monitoring software. My gut instinct is that it's an issue with MySQL connections not closing in a timely manner. We do have cron jobs that run at 1:30, 2:30 and 3:30 which I'm aware fall right within the problem zone, but these have been running at the same time for years and have never caused an issue.

Has anybody experienced anything similar to this or have suggestions as to how I could chase it down?

It's an Ubuntu server with MySQL, running EPrints 3.3.14. I don't think it's an EPrints issue, but there is nothing in the log files to suggest what's happening: the Apache error log is blank for the hours when the server won't accept connections.


*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
