My Server Load is High!
Is your server running so slow it takes 10 minutes for a site to come up?
Does logging into ssh take so long that your hair noticeably grows in the process?
Have you received warning messages from cPanel or other monitoring scripts saying there is a high load?
Don't panic! There is a way to True Enlightenment.
Why, Oh Why is the Load So High?
Okay - you need data. Useful data. Then you can figure out what is going on.
While there is no cookie-cutter template of actions that will solve all high load issues, here is a general approach to take.
First - you need to collect information. This will allow you to zero in on the root cause of the issue.
Work from the top down, narrowing in as you go. You will find the reasons, or at least get very close.
If you are persistent you will find the reason. Take it as an opportunity to learn more about your server and how it works. This will come in very handy in the future!
Research Steps
You might want to take the following actions:
1) Log into your Sago Customer portal site: http://www.portal.sagonet.com Examine the bandwidth graph for the affected server. Is there extremely high inbound or outbound traffic? Very high inbound traffic in particular can indicate you are under a DoS attack. A large amount of outbound traffic sometimes indicates a server being used for malicious purposes (spam, DoS, or brute force attacks). If this is the case you can narrow your search in this direction.
2) If bandwidth usage looks okay but load is still extremely high, log in, run top, and look for the processes consuming the most CPU. Are Apache, MySQL or PHP processes listed as using all the resources? If so, you can narrow the search in that direction.
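If you would rather have a one-shot listing than the interactive top screen, something like the following sketch can help. The function name top_cpu is made up for this example, and it assumes a Linux procps-style ps. Note that the %CPU figure from ps is averaged over each process's lifetime, so it will not match the instantaneous numbers in top exactly.

```shell
# top_cpu N: read "pid user pcpu command" lines on stdin and print the
# N lines with the highest CPU percentage (third column), busiest first.
# top_cpu is a hypothetical helper name, not a standard command.
top_cpu() {
    sort -k3,3nr | head -n "${1:-10}"
}

# Typical use on a live server (the trailing "=" suppresses ps headers):
#   ps -e -o pid= -o user= -o pcpu= -o comm= | top_cpu 5
```

If httpd, mysqld or php show up at the top of that list, you have your first lead.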
There are all manner of shell commands to assist you but the most basic are:
archimedes@Anduril:~$ w
 14:35:47 up 3 days, 4:52, 1 user, load average: 0.05, 0.15, 0.16
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
This shows the 1, 5 and 15 minute load averages. You can monitor them with:
archimedes@Anduril:~$ watch -n 1 w
Run this in a separate shell session or in a screen session. These figures are your indicators of whether anything you do is having an effect.
Also you can use the ps and netstat shell commands - they are great!
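As one example of what netstat can tell you, here is a rough sketch that counts open connections per remote address, which is handy if the bandwidth graph made you suspect a DoS. The function name count_remote_ips is invented for this example, and it assumes the usual Linux netstat column layout where the fifth column is the foreign address.

```shell
# count_remote_ips: read `netstat -ntu` output on stdin and print the
# number of connections per remote address, busiest first.
# count_remote_ips is a hypothetical helper name.
count_remote_ips() {
    awk '$1 ~ /^(tcp|udp)/ {
             addr = $5
             sub(/:[0-9*]+$/, "", addr)   # drop the trailing :port
             print addr
         }' | sort | uniq -c | sort -rn
}

# Typical use on a live server:
#   netstat -ntu | count_remote_ips | head
```

A single address holding hundreds of connections is worth a closer look. A busy but healthy site will usually show many addresses with a handful each.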
Load average note
What the heck are these numbers? Well, here is the uber simple answer. You want the number to be equal to or less than 1.0 for EACH processor core. So a load of 4 on a dual P4D system is utterly fine, because it has 4 CPU cores. A single-core server would (ideally) have a load average of about 1 or less. However, all a load of 1 per core means is that there are enough active processes to keep the CPU essentially busy. A server can still run fine with a load of 5. But if you see load averages of 10, 15, 35, etc., you have a problem. Read on.
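To put the per-core rule of thumb into numbers, here is a small sketch that divides a load average by the core count. The name load_per_core is made up for this example. The live-server usage in the comment assumes a Linux box with /proc/loadavg and the nproc command.

```shell
# load_per_core LOAD CORES: print the load average divided by the
# number of CPU cores, to two decimal places.
# load_per_core is a hypothetical helper name.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# On a live Linux server:
#   load_per_core "$(cut -d ' ' -f1 /proc/loadavg)" "$(nproc)"
```

For the dual P4D example, load_per_core 4 4 prints 1.00, which is right at the comfortable limit. A load of 10 on the same box gives 2.50, meaning there is roughly two and a half times more work queued than the cores can handle.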
3) If you know or suspect Apache, MySQL or other services to be the culprit stop them individually and wait (at least a few minutes) to see if this affects the load average - if it goes down, then that was it! Congratulations! Now, if you just never start them again you will be fine. Just kidding. If you isolate the offending process you can then isolate the users / actions running it and then modify permissions, block that IP, etc. to address the issue.
You stop them with commands like:
[root@tardis ~]# service httpd stop
Stopping httpd: [ OK ]
[root@tardis ~]# service mysql stop
mysql: unrecognized service
[root@tardis ~]# service mysqld stop
Stopping MySQL: [ OK ]
You restart them, oddly, by specifying start rather than stop, and the option status will - well . . you get it.
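While you wait to see whether stopping a service made a difference, it helps to record the load at regular intervals instead of eyeballing it. The sketch below does that. The name sample_load is made up for this example, and it assumes Linux, since it reads /proc/loadavg. Keep in mind that the 1-minute figure is a moving average, so give it a minute or more to react.

```shell
# sample_load COUNT INTERVAL: print a timestamp and the 1-minute load
# average COUNT times, sleeping INTERVAL seconds between readings.
# sample_load is a hypothetical helper name; Linux only (/proc/loadavg).
sample_load() {
    count="$1"
    interval="$2"
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s\n' "$(date +%H:%M:%S)" "$(cut -d ' ' -f1 /proc/loadavg)"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example: stop Apache, then take 10 readings, 30 seconds apart:
#   service httpd stop
#   sample_load 10 30
```

If the readings drop steadily after the stop, that service was at least part of the problem.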
MySQL note:
Sometimes load is caused by tons of MySQL activity. One way to investigate this is to run the mysql client in the shell:
Callandor:~ # mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.0.45 SUSE MySQL RPM
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> show processlist;
+----+------+-----------+------+---------+------+-------+------------------+
| Id | User | Host      | db   | Command | Time | State | Info             |
+----+------+-----------+------+---------+------+-------+------------------+
|  2 | root | localhost | NULL | Query   |    0 | NULL  | show processlist |
+----+------+-----------+------+---------+------+-------+------------------+
1 row in set (0.00 sec)
mysql> quit
Bye
That shows MySQL is so bored and lonely it is practically pining away. If you get 63 pages of results, well - you have perhaps struck an auspicious line of research. To help ferret out MySQL activity you can also try MyTop, which is like the top shell command for MySQL - very nice:
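If the process list really is 63 pages long, you can let the shell do the reading. This sketch filters the list down to statements that have been running for a while. The name slow_queries is invented for this example, and it assumes the column order shown above (Id, User, Host, db, Command, Time, State, Info), with the mysql client run in batch mode so the output is tab-separated.

```shell
# slow_queries SECONDS: read tab-separated processlist rows on stdin and
# print only the rows whose Time column (6th field) is >= SECONDS.
# slow_queries is a hypothetical helper name.
slow_queries() {
    awk -F '\t' -v min="$1" '$6 + 0 >= min + 0'
}

# Typical use (-B = batch/tab-separated output, -N = no column headers):
#   mysql -B -N -e 'SHOW FULL PROCESSLIST' | slow_queries 30
```

The db and Info columns of whatever is left tell you which database and which query to go after.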
http://jeremy.zawodny.com/mysql/mytop/
Also see Mytop - SQL Monitor
This article walks you through exactly how to install this and has an install script too.
PHP / PHP SuExec
PHP processes in top or ps will show as being run by the webserver user (apache, nobody, etc.) rather than the account that actually owns the script. This makes it hard to tell which user is running what. The answer is SuEXEC, and if you have cPanel you can simply recompile Apache and PHP to include this option. Then you can identify the offending user account and zero in from there.
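Once processes run under the real account names, a per-user total makes the heavy hitter obvious. Here is a minimal sketch. The name cpu_by_user is made up for this example, and it assumes a ps that accepts the -o user= and -o pcpu= output options, as the Linux one does.

```shell
# cpu_by_user: read "user pcpu" pairs on stdin and print the summed CPU
# percentage per user, highest first.
# cpu_by_user is a hypothetical helper name.
cpu_by_user() {
    awk '{ cpu[$1] += $2 }
         END { for (u in cpu) printf "%.1f %s\n", cpu[u], u }' | sort -rn
}

# Typical use on a live server:
#   ps -e -o user= -o pcpu= | cpu_by_user | head
```

Without SuEXEC this will simply lump everything under the webserver user, which is exactly the problem described above.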
4) Look at your logs. Yes, everyone always tells you that, right? Well, remember that on a Linux server just about everything is logged. From unauthorized access attempts to your server feeling cranky - it all gets logged. The problem, of course, is where!
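As a small taste of what the logs hold, here is a sketch that counts failed SSH password attempts per source address. The name failed_logins_by_ip is invented for this example. It assumes the standard OpenSSH "Failed password for ... from ADDRESS port ..." message, and the file locations in the comment are the usual defaults (an assumption about your distribution, so check where your syslog actually writes).

```shell
# failed_logins_by_ip: read an sshd auth log on stdin and print the
# number of failed password attempts per source address, worst first.
# failed_logins_by_ip is a hypothetical helper name.
failed_logins_by_ip() {
    sed -n '/Failed password/ s/.* from \([^ ]*\) port .*/\1/p' |
        sort | uniq -c | sort -rn
}

# Typical use (path assumed: /var/log/secure on Red Hat style systems,
# /var/log/auth.log on Debian style systems):
#   failed_logins_by_ip < /var/log/secure | head
```

Thousands of attempts from one address is a brute force attack, and a good candidate for a firewall block.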
I feel your pain, I do. But wait, hope is not yet lost.
There is this amazingly nifty tool that lets you basically google your logs. Yep. Tight. It's called Splunk and you can read all about it here: Splunk Log Analysis
It will give you a time line view of all your primary logs. You can zoom in and out on the time line and search for data in all the logs you set up to monitor. Installing it is actually very easy, and it will be a great tool to have at your disposal.
5) Okay - you have seen that Splunk shows when you had the most log entries going on and gives you a lot - Oh My God a friggen LOT - of data. But I thought we were talking about high server loads? Man, you are sharp. Thought I could be a bit dodgy but I guess not. How do you relate when server loads are high to this log activity?
Time for another tool: you need system performance and reporting information. How about a beautiful graph that shows you CPU usage, memory and swap usage, and all kinds of yummy performance info displayed in aesthetically pleasing graphs? This is easy: see the article on System Reporting - sar & kSar
It's easy - I mean seriously easy - to use sar and the graphical Java tool kSar, which will create these types of server performance graphs.
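Before you fire up the graphs, you can also ask sar for the raw numbers straight from the shell. The sysstat package keeps one data file per day of the month, and this sketch just builds the path to one of them. The name sa_file and the SA_DIR variable are made up for this example. The default directory /var/log/sa is what Red Hat style systems normally use, and Debian style systems normally use /var/log/sysstat, so treat both as assumptions to verify on your server.

```shell
# sa_file DAY: print the path of the sysstat data file for a two-digit
# day of the month (defaults to today).
# sa_file and SA_DIR are hypothetical names; the default directory is
# an assumption and varies by distribution.
sa_file() {
    day="${1:-$(date +%d)}"
    printf '%s/sa%s\n' "${SA_DIR:-/var/log/sa}" "$day"
}

# Load averages recorded on the 15th (sar -q reports queue length and load):
#   sar -q -f "$(sa_file 15)"
# CPU usage for the same day:
#   sar -u -f "$(sa_file 15)"
```

The time stamps in that output are what you carry over to the log time line to see what was going on when the load spiked.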
Server Warrior
Now you are armed.
Now you have information and powerful weapons to wield.
Now you know the Tao of server load analysis.
Now just put it all together:
Using shell commands like top and ps you can examine currently running processes. By shutting down services when load is actively high you can see what service is responsible for the load. By analyzing log files with Splunk you can see exactly what was going on at that time. If the load is intermittent and you keep missing it by the time you get on to troubleshoot, you can pull a report with kSar, look at the graph for any day to see when the load was high, and then zero in on the log time line view in Splunk to see what was occurring.







