CPU Observations
RAID-5 and CPU Utilization, in Theory
With parity RAID, hardware RAID offers a significant advantage over software RAID in how much effort the operating system, and therefore the CPU, has to spend on storage management.
The hardware controller handles the extra I/O required by RAID-5. Consider what has to happen when an application writes, say, 64 KiB of aligned data to a random location on one of the virtual disks used in these tests. On an array with a 64-KiB stripe block size, that 64 KiB block goes to exactly one of the physical disks; call that disk A. Before the controller can complete the write, it has to update the parity for that stripe, and to recompute the parity it needs the rest of the stripe's data, so it reads a block each from two other disks, B and C, assuming none of that data is already in the controller's cache. Once it has the block the application wrote plus the two blocks it just read, it XORs the three blocks together to produce the new parity block. It then writes the application's block to disk A and the new parity block to disk D.

That is two 64 KiB reads plus two 64 KiB writes between the controller and the disks, while the operating system sent only 64 KiB to the controller. On a four-disk RAID-5 array, that is a 75% reduction in the I/O the operating system has to issue.
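To make that bookkeeping concrete, here is a minimal sketch of the partial-stripe write from the controller's point of view. The four-disk layout, the in-memory "disks", and the helper names (xor_blocks, write_block) are illustrative assumptions for this article, not anything a real controller exposes; a real controller also caches aggressively, which this ignores.

```python
BLOCK = 64 * 1024  # 64 KiB stripe block size, as in the tests

# Four disks holding one stripe: disks 0-2 hold data, disk 3 holds parity.
disks = [bytearray(BLOCK) for _ in range(4)]

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR together any number of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def write_block(new_data: bytes, data_disk: int, parity_disk: int = 3) -> int:
    """Write one 64 KiB block the way the controller does; return disk I/Os."""
    # Read the other data blocks of the stripe (disks B and C in the text).
    others = [bytes(disks[d]) for d in range(3) if d != data_disk]
    reads = len(others)                      # two 64 KiB reads
    parity = xor_blocks(new_data, *others)   # recompute the stripe's parity
    disks[data_disk][:] = new_data           # write the new data block (A)
    disks[parity_disk][:] = parity           # write the new parity block (D)
    writes = 2                               # two 64 KiB writes
    return reads + writes

ios = write_block(bytes([0xAB]) * BLOCK, data_disk=0)
print(f"OS sent one 64 KiB block; the controller performed {ios} block I/Os")
```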
I have seen 24-disk RAID-6 arrays used for backup servers. There, a single-block write can expand into reads of the 21 other data blocks in the stripe plus writes of the new data block and two parity blocks: 24 block I/Os at the disks for the one block the OS sent, a reduction in OS-side I/O of more than 95%.
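The arithmetic behind those percentages is easy to check. The sketch below assumes the controller rebuilds parity by reading all of the other data blocks in the stripe, as in the walkthrough above, and that nothing is cached; a controller taking a read-modify-write shortcut would see a smaller expansion on the wide array.

```python
def io_reduction(total_disks: int, parity_disks: int) -> float:
    """Fraction of block I/O the OS is spared for a single-block random write."""
    data_disks = total_disks - parity_disks
    reads = data_disks - 1            # read the rest of the stripe's data
    writes = 1 + parity_disks         # write the new data plus the parity block(s)
    controller_io = reads + writes    # block I/Os between controller and disks
    return 1 - 1 / controller_io      # the OS itself sent only one block

print(f"4-disk RAID-5:  {io_reduction(4, 1):.0%}")    # 75%
print(f"24-disk RAID-6: {io_reduction(24, 2):.0%}")   # 96%
```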
The other advantage often touted is that the hardware handles the XOR operations on the blocks written to the disks, so the CPU does not have to. This claim is a little disingenuous. Any modern CPU can probably XOR data far faster than a RAID controller; the trouble is that making the CPU do it would mean the controller passing the data back to the OS, which would defeat the purpose of hardware RAID. Proclaiming XOR offloading as an advantage of hardware RAID is like saying that your 3D GPU card handles your display and, by the way, also handles texture mapping so your CPU doesn't have to. Well, obviously. It is the I/O bandwidth that is being offloaded to the controller, and the parity calculation is an integral part of that.
When not using hardware RAID, the OS has to do all of that I/O and the XOR itself, which means higher CPU utilization. The OS is responsible for reading the blocks needed to calculate the parity and for issuing the additional parity write, and all of that extra I/O consumes bus bandwidth and CPU time.
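As a rough sketch of what lands on the OS instead, here is the read-modify-write path that software RAID implementations commonly take for a small write: read the old data block and the old parity, XOR the old data out of the parity and the new data in, then write both back. For a four-disk array that is the same four block I/Os as before, but now every read, write, and XOR is performed by the host CPU and crosses the host's buses. As above, the in-memory "disks" are a stand-in for real devices.

```python
BLOCK = 64 * 1024
disks = [bytearray(BLOCK) for _ in range(4)]  # disks 0-2 data, disk 3 parity

def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equally sized blocks (done by the host CPU in software RAID)."""
    return bytes(x ^ y for x, y in zip(a, b))

def os_write_block(new_data: bytes, data_disk: int, parity_disk: int = 3) -> int:
    """Host-side RAID-5 read-modify-write; returns the block I/Os the OS issues."""
    old_data = bytes(disks[data_disk])          # read old data      (1 read)
    old_parity = bytes(disks[parity_disk])      # read old parity    (1 read)
    # new parity = old parity XOR old data XOR new data
    new_parity = xor(xor(old_parity, old_data), new_data)
    disks[data_disk][:] = new_data              # write new data     (1 write)
    disks[parity_disk][:] = new_parity          # write new parity   (1 write)
    return 4

ios = os_write_block(bytes([0xCD]) * BLOCK, data_disk=0)
print(f"software RAID: {ios} host-side block I/Os, plus the XORs, per 64 KiB write")
```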
Let us look at the CPU utilization during the test intervals.