IOWait definition & properties
IOWait (usually labeled %wa in top) is a sub-category of idle (%idle is usually reported as all idle time except the defined sub-categories), meaning the CPU is not doing anything. Therefore, as long as there is another process that the CPU could be running, it will run it, and that time is not counted as iowait. Additionally, idle, user, system, iowait, etc. are all measurements taken from the CPU’s point of view. In other words, you can think of iowait as the idle time caused by waiting for IO.
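More precisely, iowait is the time the CPU spent idle while at least one IO request was still outstanding, expressed as a percentage of processor ticks. Time spent handling hardware and software interrupts is accounted for separately and is usually labeled %hi and %si.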
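To make the "percentage of processor ticks" idea concrete, here is a minimal sketch of how a tool could derive these percentages on Linux. It assumes the counters come from the aggregate "cpu" line of /proc/stat, sampled twice. The sample numbers and the function names are made up for illustration and are not taken from top or any other tool.

```python
# Field order of the aggregate "cpu" line in Linux /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Return a dict of cumulative tick counters from a 'cpu ...' line."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(p) for p in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def percentages(before, after):
    """Percentage of ticks spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Two hypothetical samples (invented numbers, not from a real machine).
sample_1 = "cpu  1000 10 500 8000 200 5 15 0 0 0"
sample_2 = "cpu  1230 20 600 8630 225 5 20 0 0 0"

result = percentages(parse_cpu_line(sample_1), parse_cpu_line(sample_2))
for name in FIELDS:
    print(f"{name:8s} {result[name]:6.2f}%")
```

On a real system you would read the first line of /proc/stat, sleep for a second or so, read it again, and pass both lines through the same functions. The counters are cumulative since boot, which is why only the difference between two samples is meaningful.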
Importance & Potential misconception
IOWait is important because it is often a key metric for knowing whether you’re bottlenecked on IO. But the absence of iowait does not necessarily mean your application is not bottlenecked on IO. Consider two applications running on a system. If program 1 is heavily IO bottlenecked and program 2 is a heavy CPU user, the %user + %system of the CPU may still be something like ~100%, and correspondingly iowait would show 0. But that’s just because program 2 keeps the CPU busy. The numbers say nothing about program 1, because all of this is measured from the CPU’s point of view.
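The two-program scenario can be sketched with a toy model. This is not how a real scheduler works: it assumes a single CPU, a fixed IO latency measured in ticks, and that the IO-bound program always gets the CPU the moment its request completes. The function name and parameters are hypothetical.

```python
def simulate(ticks, io_wait_ticks, with_cpu_hog):
    """Toy single-CPU model that counts busy and iowait ticks.

    The IO-bound program computes for one tick, then blocks for
    `io_wait_ticks` ticks on a simulated disk request, and repeats.
    """
    busy = 0
    iowait = 0
    blocked_for = 0      # remaining ticks of the outstanding IO request
    requests_done = 0    # progress made by the IO-bound program

    for _ in range(ticks):
        if blocked_for > 0:
            # The IO-bound program cannot run; the disk is still working.
            blocked_for -= 1
            if blocked_for == 0:
                requests_done += 1
            if with_cpu_hog:
                busy += 1    # the CPU-bound program fills the gap
            else:
                iowait += 1  # nothing runnable while IO is outstanding
        else:
            # The IO-bound program uses this tick, then issues a new request.
            busy += 1
            blocked_for = io_wait_ticks

    return {
        "busy": 100.0 * busy / ticks,
        "iowait": 100.0 * iowait / ticks,
        "requests_done": requests_done,
    }


alone = simulate(ticks=1000, io_wait_ticks=9, with_cpu_hog=False)
shared = simulate(ticks=1000, io_wait_ticks=9, with_cpu_hog=True)
print("program 1 alone:        ", alone)
print("program 1 + CPU-bound 2:", shared)
```

In both runs the IO-bound program completes the same number of requests, so it is equally bottlenecked on IO. Only the reported iowait changes, because in the second run the CPU always has something else to do.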
Tools to Detect IOWait
These are very basic tools found on most (if not all) Linux systems. Some may require additional repos.
- top – This is the simplest way on a Linux system. Just type “top”.
- iostat – More detailed with regard to IO. Personally, I really think the -x flag (extended) is quite essential. Part of the sysstat package.
- iotop – like top but just for io.
- sar – This will display stats over history.
Reducing IOWait
- Make sure that you have enough physical memory. If you run out of RAM and your OS starts swapping to disk, you will have a bad time.
- Defrag or keep sufficient free space on the drive. Any drive that is over roughly 90% full will have difficulty preventing fragmentation.
- Optimize your software. Use memory-based caching techniques to reduce the number of requests to your drive.
- If you have software that makes very frequent disk calls but the files are temporary and do not need to be kept over time, try using a ramdisk.
- Also, as we are now almost entering 2013, in addition to the above, simply awesome IO storage devices are now affordable, namely SSDs. SSDs are awesome!!!
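As one illustration of the memory-based caching suggestion above, here is a minimal sketch using Python’s functools.lru_cache. The load_config function and the demo file are hypothetical, and the sketch assumes the file does not change while the program runs (otherwise the cached copy goes stale).

```python
import functools
import os
import tempfile

disk_reads = 0  # counts how often the file is actually opened


@functools.lru_cache(maxsize=128)
def load_config(path):
    """Read a file from disk; repeated calls for the same path hit the cache."""
    global disk_reads
    disk_reads += 1
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


# Demo with a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("max_connections=100\n")
    demo_path = tmp.name

try:
    for _ in range(1000):
        content = load_config(demo_path)
finally:
    os.remove(demo_path)

print(f"calls: 1000, actual file opens: {disk_reads}")
```

The operating system’s own page cache already absorbs many repeated reads, so the gain from application-level caching depends on the workload. It tends to matter most when each read also involves parsing or other work on top of the IO itself.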
What happens if iowait reaches 20? It is always 20 on our db servers.
Then it’s most likely that you are hitting an IO-based bottleneck. I suggest you increase your IO throughput, either by upgrading hardware or by optimizing your software.
Can you please explain the same with an example? Maybe a sar output like:
%user %nice %system %iowait %steal %idle
23.52 1.34 9.68 2.39 0.00 63.07
Thank you!
%user of 23.52 means about 23.5% of the CPU time was spent running user-space code at normal priority. Nice is user-space time for processes whose priority has been lowered so that they don’t hog the CPU. System is time spent in the kernel, usually doing work on behalf of your programs. So usually, user + system can be seen as one. User + nice + system is all active CPU usage. Your IOWait of 2.39 means that for 2.39% of the time the CPU was idle while some processes were waiting for IO to respond and couldn’t proceed otherwise. Steal is a bit complicated to explain in a few words, but you don’t seem to have any, and it only applies to VM systems. Your idle is 63, which you can interpret as the CPU not doing anything 63% of the time.
What counts as a good IOWait value depends on what you’re running. A download server, for example, will still be stable even at around 50%. On a system that needs frequent, high-speed responses, like a database server, even 1% of IOWait can be felt quite severely.
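To check the arithmetic on the sar line from the question, here is a small sketch that groups the columns the way described above. The grouping into "active" and "not running" is just this reply’s interpretation, not something sar itself reports.

```python
header = "%user %nice %system %iowait %steal %idle"
values = "23.52 1.34 9.68 2.39 0.00 63.07"

# Pair each column name with its numeric value.
sample = dict(zip(header.split(), (float(v) for v in values.split())))

# Time the CPU was actually executing something.
active = sample["%user"] + sample["%nice"] + sample["%system"]
# Time the CPU was not running anything for this system.
not_running = sample["%iowait"] + sample["%steal"] + sample["%idle"]

print(f"active CPU:  {active:.2f}%")
print(f"not running: {not_running:.2f}%")
print(f"total:       {active + not_running:.2f}%")
```

For this sample the active share works out to 34.54%, and all six columns add up to 100%.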
Defrag? In Linux?
Could you elaborate? (Yes, I realize this is an old post.)
Thanks! I’m genuinely curious.
Most Linux distros come with newer journaling file systems that don’t require much, if any, defragmenting. But if your drive is too full (e.g. 90%+) and actively used, it will still run into fragmentation problems. There are tools like e4defrag to defrag ext4, xfs_fsr for xfs, etc.
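If you want to keep an eye on the roughly 90% mark mentioned above, a minimal sketch using Python’s shutil.disk_usage could look like this. The threshold is the rule of thumb from this post, not a hard limit, and the helper names are made up. The percentage may also differ slightly from what df shows, since df accounts for blocks reserved for root.

```python
import shutil

FULL_THRESHOLD_PERCENT = 90.0  # rule of thumb from the post, not a hard limit


def fill_percent(total_bytes, used_bytes):
    """Return how full a file system is, as a percentage of its total size."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def is_nearly_full(total_bytes, used_bytes, threshold=FULL_THRESHOLD_PERCENT):
    """True when usage is at or above the threshold."""
    return fill_percent(total_bytes, used_bytes) >= threshold


# Check the file system that holds "/" (adjust the path for other mounts).
usage = shutil.disk_usage("/")
percent = fill_percent(usage.total, usage.used)
status = "nearly full" if is_nearly_full(usage.total, usage.used) else "ok"
print(f"/ is {percent:.1f}% full ({status})")
```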