Eliminating Backup System Bottlenecks

I suspect that we all have a love/hate relationship with backups. Having backups done is a great feeling, but getting them to the point that you can absolutely count on them can be an exercise in frustration. Not only are they incredibly important, but the more you need them, the harder they are to do right.

In Jacob Farmer’s training, “Eliminating Backup System Bottlenecks”, I’ve been learning more about how to ensure that even large backup jobs complete in time. In time for what? Well, before the next backup job starts would be nice!

As it turns out, there are several possible bottlenecks for backup operations, and the severity of each depends on the size of your network and the sorts of data you are backing up. That’s not to mention the actual methods and devices that you use, each of which can introduce issues to the process. In small networks, bottlenecks are relatively obvious because there are only a few parts to the whole environment. In larger environments, the backup infrastructure gets considerably more complex.

Jacob divides the backup infrastructure into three parts, each of which has its own potential for slowing down the process: the front end, the central data movers, and the back end.

The first place to start troubleshooting slow backups is the front end. Front end bottlenecks are caused by large volumes, files that never change but get copied anyway, and slow hosts. These issues can be combated by using intelligent backup solutions rather than copying everything each time. Many backup solutions copy only the daily deltas, or only the pieces of large files that have changed.
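To make the delta idea concrete, here is a minimal sketch (in Python; this is my own illustration, not a specific product's algorithm, and the block size is an arbitrary choice) of block-level change detection: hash fixed-size blocks of each version of a file, and ship only the blocks whose hashes differ.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB blocks; granularity chosen for illustration


def block_hashes(data: bytes) -> list[str]:
    """Split a byte stream into fixed-size blocks and hash each one."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def changed_blocks(old: bytes, new: bytes) -> list[int]:
    """Return indices of blocks that differ between two versions of a file.

    Only these blocks need to be shipped to the backup target; unchanged
    blocks can be referenced from the previous backup.
    """
    old_h, new_h = block_hashes(old), block_hashes(new)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or h != old_h[i]]
```

With a scheme like this, rewriting one byte in the middle of a 10 GB file costs one block of transfer rather than the whole file, which is exactly how the front end load gets cut down.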

Central bottlenecks should only be attacked after the front end issues have been resolved. Issues in the central area are primarily related to insufficient processing capacity on the backup servers or network data movers. In a properly designed backup solution, the actual data writers are running at full capacity, and this can only happen if the devices and network feeding them data are working optimally. If data movers are being overloaded, adding more can help, as can adding a slave server if necessary.

Back end bottlenecks are complex. If backups are still slow after improving the front end clients and the central data movers, the problem lies with the media and drives used to write and store the data. Whether you use disks, tapes, or VTLs (virtual tape libraries), each has its own particular quirks.

When using tape, ensure that you can feed the drive data at its maximum write speed. Feeding data any slower leads to "shoe shining": the drive writes at full speed, exhausts its buffer, and the tape coasts past the end of the written data. To write the next chunk, the drive has to stop, rewind, and reposition, then it overshoots again, and rewinds...repeat ad nauseam. This dramatically decreases the effective write speed (and the lifetime) of the drive.
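As a rough illustration (the numbers below are hypothetical, and modern LTO drives can step their speed down to match a slower feed, which softens but doesn't eliminate the problem), you can sanity-check whether your feed rate will keep a drive streaming, and how many drives the current aggregate feed could actually saturate:

```python
def tape_is_streaming(feed_mb_s: float, drive_min_mb_s: float) -> bool:
    """A drive shoe-shines when the incoming feed rate falls below the
    slowest speed at which the drive can keep the tape moving."""
    return feed_mb_s >= drive_min_mb_s


def drives_worth_adding(total_feed_mb_s: float, drive_native_mb_s: float) -> int:
    """Rough upper bound on how many drives the current feed can saturate.

    Adding drives beyond this number spreads the same feed thinner and
    risks pushing every drive below its streaming speed.
    """
    return max(1, int(total_feed_mb_s // drive_native_mb_s))
```

For example, if your data movers can push 700 MB/s in aggregate and each drive writes natively at 300 MB/s, a third drive buys you nothing until the feed improves.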

Because of this throughput requirement, adding tape drives will only speed up the backup if your current tape drive(s) are running at maximum speed, and if your backup server(s) are capable of increasing the current output rate. Adding slave servers to write to additional tape drives may be a more reliable way of guaranteeing output.

When dealing with backups to disk, the difficulty swings in the other direction: disks have no minimum streaming speed, but your target may not be fast enough. With multiple backup jobs writing concurrently, a single array can have its I/O maxed out relatively easily.

The ultimate solution is to test and document a baseline benchmark for each individual piece of your backup solution, and critically evaluate each against the others. A careful, planned approach, rather than an ad hoc purchasing spree, will lead to faster backups, quicker recoveries, and a more efficient process overall.
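That baseline idea can be sketched simply (assuming you can exercise each stage in isolation; the stage names below are placeholders for your own measurements): time each piece of the path, compute its throughput, and the slowest stage is your next bottleneck.

```python
import time


def throughput_mb_s(stage_fn, payload_mb: float) -> float:
    """Time one stage of the backup path and return its throughput in MB/s."""
    start = time.perf_counter()
    stage_fn()
    elapsed = time.perf_counter() - start
    return payload_mb / elapsed


def bottleneck(stage_rates: dict[str, float]) -> str:
    """The end-to-end backup can go no faster than its slowest stage."""
    return min(stage_rates, key=stage_rates.get)
```

Run something like this against client reads, the network hop, and the media writes, record the numbers, and you know exactly where the next dollar or hour of effort should go.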


By Matt Simmons, author of the Standalone Sysadmin blog
