Monday, August 31, 2009

Notes on Migrating to RAID

This’ll be fun.  Over the weekend one of my colleagues and I were tasked with converting our software repository to RAID.  If you’re not familiar with RAID and you’d like to read up on it, hold onto your geek badge and go look it up.  It stands for Redundant Array of Independent Disks.

Lesson One: You can’t go from no RAID to RAID in place.  My first thought was to install the second drive and tell the RAID controller to convert it (along with the first drive) to RAID 1.  I naively assumed the controller would simply mirror the old hard drive onto the new one and, voilà, I’d be done.  No such luck.  I don’t know if there is a hardware RAID controller out there capable of such magic, but you’d think there would be.  The controller we were using insisted on erasing both drives in order to create the set.
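For what it’s worth, Linux software RAID can do something close to that magic with a degraded array: build the mirror on the new drive alone, copy the data over, then add the original drive to the set.  Here’s a rough sketch using mdadm; the device names and mount points (/dev/sda1, /dev/sdb1, /mnt/olddisk, /mnt/newraid) are made up for illustration, and our hardware controller offered nothing like this:

    import subprocess

    def run(cmd):
        """Echo a command, then run it, stopping on any failure."""
        print("+ " + " ".join(cmd))
        subprocess.run(cmd, check=True)

    # Build a degraded RAID 1 array on the NEW drive only; the literal
    # word "missing" reserves the second slot for a drive added later.
    run(["mdadm", "--create", "/dev/md0", "--level=1",
         "--raid-devices=2", "/dev/sdb1", "missing"])

    # Put a filesystem on the array and copy the old data across
    # (rsync: archive mode, hard links, sparse files, one filesystem).
    run(["mkfs.ext3", "/dev/md0"])
    run(["mount", "/dev/md0", "/mnt/newraid"])
    run(["rsync", "-aHSx", "/mnt/olddisk/", "/mnt/newraid/"])

    # Once the copy is verified, hand the ORIGINAL drive to the array;
    # mdadm rebuilds the mirror onto it in the background.
    run(["mdadm", "/dev/md0", "--add", "/dev/sda1"])

The mirror then rebuilds once, in the background, and nothing gets erased.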

Lesson Two: A RAID set holds slightly less than the drive alone.  If you have an 80 gig hard drive and you format it, you get about 75 gigs of usable space.  I assumed that if I imaged this drive, I’d be able to restore the image directly onto a newly created RAID set: create an image, add the new 80 gig hard drive, tell the RAID controller to make a RAID 1 set (it removes all data when it does this; see Lesson One), then restore the image.

When we tried to restore the image (after spending an hour creating it), the imaging software told us the destination was too small.  Apparently the RAID controller reserves some overhead for itself, so the usable space of a two-drive 80 gig RAID 1 set is only about 74 gigs, a little less than a bare formatted drive.
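The arithmetic is mundane once you spell it out: the 80 “gigs” on the box are decimal gigabytes, the 75 gigs you see after formatting are binary ones, and the controller skims a bit more off the top for its own metadata.  A quick back-of-the-envelope check in Python, with the metadata figure just a made-up placeholder (it varies by controller):

    GIB = 1024 ** 3  # a "binary" gig, which is what most tools report

    def usable_gib(marketing_gb, metadata_bytes=0):
        """Turn marketing gigabytes (10**9 bytes) into reported GiB,
        minus whatever the controller reserves for its own metadata."""
        return (marketing_gb * 10**9 - metadata_bytes) / GIB

    single = usable_gib(80)                     # bare formatted 80 gig drive
    array = usable_gib(80, metadata_bytes=GIB)  # guess: 1 GiB of RAID metadata

    print("single drive: %.1f GiB" % single)    # ~74.5
    print("RAID 1 set:   %.1f GiB" % array)     # ~73.5
    if single > array:
        print("a full-drive image will NOT fit on the array")

The exact overhead doesn’t matter; the point is the array always comes out at least a little smaller than the drive it’s built from, so a full-drive image never quite fits.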

We had to restore the image back to a single drive, use a partitioning program to shrink the partition from 75 gig to 70 gig, then create a new image of the (now 70 gig) partition.
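On Linux with an ext filesystem, the same shrink-then-reimage dance would look something like this (we actually used a GUI partitioning tool, and the device and path names here are made up):

    import subprocess

    def run(cmd):
        print("+ " + " ".join(cmd))
        subprocess.run(cmd, check=True)

    # resize2fs refuses to shrink a filesystem until a forced fsck has run
    run(["e2fsck", "-f", "/dev/sda1"])

    # shrink the filesystem to 70G; resize2fs packs all data into the
    # first 70G of the partition, leaving headroom for the RAID overhead
    run(["resize2fs", "/dev/sda1", "70G"])

    # everything the filesystem owns now lives in the first 70 GiB,
    # so the new image only needs to copy that much
    run(["dd", "if=/dev/sda1", "of=/mnt/backup/sda1.img",
         "bs=1M", "count=%d" % (70 * 1024)])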

Lesson Three: Some, and perhaps all, RAID controllers disable write caching.  RAID 1, in particular, is built for fault tolerance: if one drive fails, you can keep working.  But if write caching is turned on and the power fails, a cached write can land on one drive and not its mirror, leaving the set inconsistent or corrupted.  Disabling write caching makes this much less likely.
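On Linux you can at least inspect, and flip, the drive-level write cache with hdparm; the device names below are placeholders, and turning the cache back on behind a RAID controller’s back trades safety for speed:

    import subprocess

    # "hdparm -W <device>" with no 0/1 argument just reports the
    # drive's current write-caching state
    for dev in ("/dev/sda", "/dev/sdb"):
        subprocess.run(["hdparm", "-W", dev], check=True)

    # To disable the cache yourself (what the controller did for us):
    #   hdparm -W0 /dev/sda
    # To re-enable it, accepting the corruption risk on power loss:
    #   hdparm -W1 /dev/sda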

When we had the new partition image and kicked off the restore, the software estimated it would take about 5 hours.  We went home and returned the next day.

I’m not sure how to overcome the lengthy image restore.  If there were a way to turn write caching back on at the controller level, that might have helped.  It’s also possible to set up the RAID set and then disable one of the drives so that synchronization isn’t being performed during the restore, but I’m not sure how safe or fast that would be.
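With Linux software RAID, the disable-one-drive idea is at least expressible: fail and remove one mirror, restore the image to the degraded array, then re-add the drive so the rebuild happens once at the end.  A sketch, reusing the made-up names from earlier; whether our hardware controller could do the equivalent, I don’t know:

    import subprocess

    def run(cmd):
        print("+ " + " ".join(cmd))
        subprocess.run(cmd, check=True)

    # Drop one mirror out of the array so nothing is syncing while
    # the image is being restored.
    run(["mdadm", "/dev/md0", "--fail", "/dev/sdb1"])
    run(["mdadm", "/dev/md0", "--remove", "/dev/sdb1"])

    # Restore the image onto the now-degraded array at single-disk speed.
    run(["dd", "if=/mnt/backup/sda1.img", "of=/dev/md0", "bs=1M"])

    # Re-add the second drive; the mirror rebuilds once, at the end.
    run(["mdadm", "/dev/md0", "--add", "/dev/sdb1"])

    # The rebuild's progress shows up in /proc/mdstat.
    print(open("/proc/mdstat").read())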
