Corrupted filesystem recovery dry-run with LVM snapshots

I have a corrupt Reiser filesystem that needs a tree rebuild, which can be a scary thing to do (and is only advised when you *really* need it which, unfortunately, I do).

Now, this filesystem largely works; there is just a small part of it that causes problems when accessed. A rebuild could make things a lot worse, or it might just solve my problem. (Note: my problem appears NOT to be due to hardware failure. Rebuilding the tree of a Reiser filesystem on hardware that has bad sectors or similar faults is VERY likely to make things worse. Don’t do it.)

So, I’m currently using the filesystem and avoiding the broken bit. I need to know two things: one, how long a rebuild is going to take, so I can plan the downtime; and two, whether it will complete successfully or the world will fall on my head.

LVM snapshots can help here, and my filesystem is on an LVM logical volume. The idea is to take a snapshot of the filesystem and run the rebuild on the snapshot. Then you can decide whether you want to take the live filesystem down to rebuild that, or maybe you decide to update your backups as best you can and start a new filesystem from scratch.

  1. create the snapshot: lvcreate -s -L 5G -n home-080605 /dev/myvg/home
  2. run the rebuild: time reiserfsck --rebuild-tree -y /dev/myvg/home-080605
  3. note the time it took
  4. mount the snapshot somewhere: mount /dev/myvg/home-080605 /mnt/home-rebuilt
  5. poke around a bit, make sure things aren’t worse off (in my case, I took a file listing of the live and snapshot filesystems and ran them through diff; nothing guaranteed, but it provides some clues :)
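
The listing comparison in step 5 could be sketched roughly like this. The function names are made up for illustration, and /home as the live mount point is an assumption; it only compares path names, not file contents, so it gives clues rather than proof.

```shell
# Print a sorted list of paths under a directory, relative to that directory,
# so listings taken from two different mount points line up for diff.
list_tree() {
    (cd "$1" && find . -xdev | LC_ALL=C sort)
}

# Compare two trees by path name only; file contents are not checked.
# Prints the diff of the two listings and returns 0 when they are identical.
compare_trees() {
    live_list=$(mktemp) || return 2
    snap_list=$(mktemp) || { rm -f "$live_list"; return 2; }
    list_tree "$1" > "$live_list"
    list_tree "$2" > "$snap_list"
    diff "$live_list" "$snap_list"
    status=$?
    rm -f "$live_list" "$snap_list"
    return "$status"
}

# Example (the live mount point /home is an assumption):
#   compare_trees /home /mnt/home-rebuilt
```

Lines starting with "<" exist only on the live filesystem, lines starting with ">" only on the rebuilt snapshot. Files changed on the live filesystem since the snapshot was taken will show up too, so expect a little noise.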

The 5G is how much data can change from the time of the snapshot before LVM drops it. How much you need depends on how many changes the rebuild is going to make and how many changes are made to the original filesystem whilst you’re working (assuming it’s still in use, as in this case).
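
To avoid finding out the hard way that 5G was too small, you could keep an eye on how full the snapshot is while the rebuild runs. This is a minimal sketch, assuming a reasonably recent LVM2 where lvs offers the data_percent field (older releases called it snap_percent); the function names and the 80% threshold are hypothetical.

```shell
# Return 0 (true) when a snapshot's usage has reached the threshold.
# $1 = usage as printed by lvs (for example "  42.17"), $2 = integer threshold.
snapshot_nearly_full() {
    used=$(printf '%s' "$1" | tr -d ' ')   # lvs pads its output with spaces
    used=${used%%[.,]*}                    # keep the integer part only
    [ "${used:-0}" -ge "$2" ]
}

# Ask LVM how full the snapshot's copy-on-write area is.
# Needs root and a real snapshot, so it is not called automatically here.
snapshot_usage() {
    lvs --noheadings -o data_percent "$1"
}

# Example: warn once the snapshot is 80% full.
#   if snapshot_nearly_full "$(snapshot_usage /dev/myvg/home-080605)" 80; then
#       echo "snapshot nearly full" >&2
#   fi
```

If the warning fires and the volume group has free space, growing the snapshot with something like lvextend -L +2G /dev/myvg/home-080605 should buy more room before LVM drops it.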

In my case, the output of the rebuild command showed that just the known corrupted files had been affected, which was nice.  So now you can arrange the downtime and do the real rebuild. Feel free to take another snapshot before running the real rebuild, then if something does happen to go terribly wrong, you can recover to at least where you started.
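
The real rebuild with a safety snapshot might look something like the sketch below. It prints the commands by default so you can read the plan before committing to it. The run and real_rebuild helpers and the snapshot name home-pre-rebuild are made up for illustration, and the 5G size is simply carried over from the dry run; the snapshot has to absorb everything the rebuild writes to the origin, so size it with that in mind.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set,
# so the sequence can be reviewed before anything touches the volume.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = origin logical volume, $2 = name for the safety snapshot (hypothetical).
# Stops at the first step that fails.
real_rebuild() {
    run umount "$1" &&
    run lvcreate -s -L 5G -n "$2" "$1" &&
    run reiserfsck --rebuild-tree -y "$1"
}

# Example:
#   real_rebuild /dev/myvg/home home-pre-rebuild             # prints the plan
#   DRY_RUN=0 real_rebuild /dev/myvg/home home-pre-rebuild   # actually runs it
```

If the rebuild does go terribly wrong, LVM versions that support snapshot merging can roll the origin back with lvconvert --merge on the safety snapshot; check the lvconvert man page on your system before relying on that.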

Remember to remove the snapshots when you’re done: lvremove /dev/myvg/home-080605

Comments

TooMeeK says:

Yep. I can confirm this.
Rebuild tree + reiserfs + bad sector = complete data loss.
One day I issued this command to recover deleted files. Result was a disaster..
There was no output after the command completed. No tree change. Only file contents.. were.. all.. corrupted! And that was single, non-RAIDed drive.
