Add drive to RAID5 on Ubuntu

Some time ago I migrated from a RAID1 setup to RAID5 on the minimum of 3 drives. At some point this summer I spotted a good deal on a 1TB drive matching the ones in my array and bought it. My purchase sat in my desk drawer for a month (or two) before I finally got around to installing it in the server. At least another couple of months went by before I got to adding it to my array. It turns out to be really simple, and I’m kicking myself for dragging my feet.

With any hardware upgrade (specifically drives) it’s a good idea to capture what the system thinks things look like before you make any changes. For the most part Ubuntu talks about UUIDs for drives, but a couple of places (at least in my install) use the /dev/sd*# names and can trip you up when you shuffle hardware around. Capturing the drive assignments is simply a matter of:

$ sudo fdisk -l | grep ^/dev

After installing the hardware I was surprised at how much the /dev/sd*# names had shuffled around. I was glad I had a before and after capture of the data; it also let me identify the new drive pretty easily.
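Because the /dev/sd*# names can move around, a before and after comparison is easier with names that stay put. Here is a minimal sketch, assuming udev populates /dev/disk/by-id with model and serial based names. That directory is not part of the procedure above, and before.txt and after.txt are hypothetical file names.

```shell
# List the entries that appear in the "after" capture but not in the "before" capture.
# $1 and $2 are files holding one device id per line (hypothetical file names).
new_disk_ids() {
    before_sorted=$(mktemp)
    after_sorted=$(mktemp)
    sort "$1" > "$before_sorted"
    sort "$2" > "$after_sorted"
    # comm -13 hides lines unique to the first file and lines common to both,
    # leaving only the lines unique to the second file
    comm -13 "$before_sorted" "$after_sorted"
    rm -f "$before_sorted" "$after_sorted"
}

# Usage (first line before installing the drive, the rest afterwards):
#   ls /dev/disk/by-id > before.txt
#   ls /dev/disk/by-id > after.txt
#   new_disk_ids before.txt after.txt
```

Anything it prints belongs to hardware that was not there for the first capture, and `ls -l /dev/disk/by-id` shows which /dev/sd* name each id currently points at.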

Early in my notes I have “could it be this simple?” and a link to the kernel.org wiki on RAID. It turns out that yes, it really is that simple, but you do need to follow the steps carefully. I also found an Ubuntu Forum post that was a good read for background.

I had temporarily used the new drive on an OS X system to do some recovery work, so it had a GUID partition table (GPT) and fdisk wasn’t very happy about working with it. It turns out parted was happy to work with the drive and even let me change it back into something fdisk could work with.

I puzzled over the fact that this new drive wanted to start at sector 2048 instead of 63. I was initially under the incorrect assumption that this had something to do with the GPT setup that I hadn’t been able to fix. Consider two basically identical volumes (old followed by new):

$ sudo fdisk -l /dev/sdb

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Device Boot Start End Blocks Id System
/dev/sdb1 63 1953520064 976760001 83 Linux

$ sudo fdisk -l /dev/sdc

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Device Boot Start End Blocks Id System
/dev/sdc1 2048 1953525134 976761543+ 83 Linux

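The key differences are in the sector size and I/O size lines: the new drive has a physical sector size of 4096 bytes vs. 512, and that is the reason for the different start position. One 4096-byte physical sector holds 8 logical sectors, so a partition starting at sector 63 would straddle physical sectors, while one starting at 2048 lines up cleanly. OK, diversion over. Let’s actually follow the wiki and get this drive added to the RAID array.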
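The alignment arithmetic can be checked directly. This is a small sketch using the numbers from the two fdisk listings above. The function name is made up for illustration.

```shell
# Succeeds (exit status 0) when a partition's first byte lands on a physical sector boundary.
# $1 = start sector, $2 = logical sector size in bytes, $3 = physical sector size in bytes
is_aligned() {
    [ $(( ($1 * $2) % $3 )) -eq 0 ]
}

# Old-style start of 63 on a drive with 4096-byte physical sectors
if is_aligned 63 512 4096; then echo "63: aligned"; else echo "63: misaligned"; fi

# Start of 2048 (2048 * 512 bytes = 1 MiB) on the same drive
if is_aligned 2048 512 4096; then echo "2048: aligned"; else echo "2048: misaligned"; fi
```

This prints "63: misaligned" and "2048: aligned". On the older drive, where both sector sizes are 512 bytes, any start sector is aligned, which is why 63 was never a problem there.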

Start by looking at what we have:

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md_d3 : active raid5 sdf1[1] sdd1[0] sdb1[2]
1953519872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

So, my RAID5 array is /dev/md_d3, and I know my new drive is /dev/sdc1 after my parted/fdisk adventure above.

$ sudo mdadm --add /dev/md_d3 /dev/sdc1

Now we look at mdstat again and it shows we have a spare. This is honestly the least I should have done with the drive immediately after installing it: having a spare lets the RAID array fail over to the spare drive with no administrator intervention.

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md_d3 : active raid5 sdc1[3](S) sdf1[1] sdd1[0] sdb1[2]
1953519872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

Next we grow the array across the new device

$ sudo mdadm --grow --raid-devices=4 /dev/md_d3

You can peek at /proc/mdstat from time to time (or use the watch command) to monitor progress. This may take a while.
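For monitoring, `watch -n 60 cat /proc/mdstat` redraws the whole file once a minute. If you only want the progress figure, here is a minimal sketch that assumes the usual mdstat progress line of the form "reshape = 12.3% (...)". The function name is made up for illustration.

```shell
# Print the progress field (for example "reshape = 12.3%") from mdstat-formatted text on stdin.
# Prints nothing when no reshape, resync or recovery is in progress.
md_progress() {
    grep -Eo '(reshape|resync|recovery) *= *[0-9.]+%' | tr -s ' '
}

# Usage:
#   md_progress < /proc/mdstat
```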

Once this is done, don’t forget to modify /etc/mdadm/mdadm.conf as per the wiki: “To make mdadm find your array edit /etc/mdadm.conf and correct the num-devices information of your Array”
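The edit itself is a one-field change. Here is a sketch, assuming the ARRAY line in your config carries a num-devices= field and that the file describes a single array (every match is rewritten). The function prints the updated config instead of changing the file, so you can review it first.

```shell
# Print config file $1 with each num-devices value replaced by $2.
# The file itself is left untouched.
set_num_devices() {
    sed -E "s/num-devices=[0-9]+/num-devices=$2/" "$1"
}

# Usage:
#   set_num_devices /etc/mdadm/mdadm.conf 4
# To compare against what the running array reports:
#   sudo mdadm --detail --scan
```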

At this point we have our data spread across more drives, but we don’t have a larger volume. We need to resize the filesystem to take advantage of the new space. It’s recommended you do the resize with the RAID5 volume unmounted (offline). I set about doing this and hit problems unmounting the volume. This turned out to be samba holding on to the volume, and turning that service off fixed things.

Then I hit a show stopper: the resize2fs command failed.

$ sudo resize2fs -p /dev/md_d3
resize2fs 1.42 (29-Nov-2011)
resize2fs: Device or resource busy while trying to open /dev/md_d3
Couldn't find valid filesystem superblock.

Huh? This is something I’ll one day sort out I suppose, but it really beats me what is going on here. You can also resize the filesystem while it’s online (mounted). It’s slower and a bit scarier, but it works.

$ sudo resize2fs /dev/md_d3
resize2fs 1.42 (29-Nov-2011)
Filesystem at /dev/md_d3 is mounted on /stuff; on-line resizing required
old_desc_blocks = 117, new_desc_blocks = 175
Performing an on-line resize of /dev/md_d3 to 732569952 (4k) blocks.

This was followed by a few moments of terror as I realized that I was doing this over an SSH connection. What if the connection is lost? Next time I’ll use screen, or nohup the process.

It was neat to watch the free space on the drive creep upwards. It was running at about 1GB every 2 seconds. Once this finishes, you’re done. My RAID volume went from 1.9T to 2.8T with the new drive.
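The block count resize2fs reported can be checked against the mdstat numbers. RAID5 gives up one device’s worth of space to parity, so usable space is (devices - 1) times the per-device size. A small sketch using the figures from the output above; the function name is made up for illustration.

```shell
# Usable RAID5 capacity: one device's worth of space goes to parity.
# $1 = number of devices, $2 = per-device size in KiB
raid5_usable_kib() {
    echo $(( ($1 - 1) * $2 ))
}

# mdstat reported 1953519872 1K blocks across 3 devices, so each device contributes:
per_device_kib=$(( 1953519872 / (3 - 1) ))

# Capacity after growing to 4 devices, in KiB and in the 4k blocks resize2fs uses
new_kib=$(raid5_usable_kib 4 "$per_device_kib")
echo "usable KiB with 4 devices: $new_kib"
echo "as 4k blocks: $(( new_kib / 4 ))"
```

The second figure comes out to 732569952, the same number of 4k blocks resize2fs said it was resizing to.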

How to: Dell Inspiron 1525 repair

As the “computer guy” in the family, from time to time a repair job will land on my doorstep. This was an older Dell Inspiron 1525 laptop that wasn’t booting anymore. This time it wasn’t some horrible virus that had eaten a system file, but the hard drive starting to fail (something I verified by booting a Live USB version of Ubuntu).

The very first thing you need to do at any sign of hard drive failure is run a complete backup. If you really care about some of the data, then I suggest you start with the bits you really care about (photos) and work your way outwards to a full backup. Don’t be shy about “wasting” backup drive space, because this might be the last hour (or minutes) of a functioning drive. I make this comment from experience: I once watched a full backup stop due to total drive failure part way through the photo directories, after it had successfully backed up a bunch of system files.

You can (as I have) try gddrescue or similar recovery tools once you’ve done what you can with the drive in terms of backup. This might get you a little bit more data, but these tools are in my opinion last ditch efforts to salvage failing media. You can try the freezer trick, or even warming the drive up, but don’t count on them working. It’s worth repeating: nothing beats regular backups, unless it’s automatic nightly incremental backups with off-site replication.

Replacing the failed drive is simply a matter of finding the manual and buying a matching drive. If this were my personal system I’d be tempted to upgrade to an SSD, but there is still a big price difference. I booted from the Live USB Ubuntu for the first boot after installing the drive, which let me check the hardware was good to go and peek at the SMART data. It was interesting to see that the SMART data says power cycles=4. I guess they really do test the drives at the factory.
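One way to read that value from a live session is smartctl from the smartmontools package. This is an assumption, since the post does not record which tool was used, and /dev/sda below is a hypothetical device name. The sketch pulls the raw power cycle count out of the attribute table that `smartctl -A` prints.

```shell
# Print the raw Power_Cycle_Count value from "smartctl -A" output on stdin.
# In the attribute table, field 2 is the attribute name and field 10 the raw value.
power_cycles() {
    awk '$2 == "Power_Cycle_Count" { print $10 }'
}

# Usage (device name is hypothetical):
#   sudo smartctl -A /dev/sda | power_cycles
```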

Installing using the Dell Windows install disks went smoothly. The driver installation steps were annoying and kludgey feeling, but not too horrible. Time then crawled to a standstill as I worked my way through the updates from Microsoft, one patch set at a time from 2008 (era of the install media) to present day. Many, many, many reboots later I had a clean install that was fully patched and ready to go back. If you find yourself having to do this more than once in a long while, consider using a more advanced technique.

Well, with the exception of the problem of a missing key. Amazingly, the lack of the key cap didn’t prevent you from using the key: the ‘T’ worked just fine, it just felt very wrong. A quick trip to eBay and I was able to locate a suitable replacement. It turns out I needed a K26 type key, and that the same model had at least four variations. This website had a handy guide for selecting the type of key you needed.

Pictured above is what arrived in the mail: a rubber plunger, a plastic hinge and the key cap. First I needed to pop the hinge off of the key, and the blade from my Swiss Army knife did the job well. Then I had to puzzle a bit over which way the hinge was supposed to go on the keyboard.

I hope the picture above helps others understand how it is supposed to fit on the keyboard. It was the image I had hoped to find on the internet when I was trying to figure it out. Some of the YouTube videos show using needle nose pliers to do the installation, but I found that my fingernails did the job. Once the hinge is on, simply plop the rubber plunger in the middle, large base down as pictured in the arrived-in-the-mail shot. The key cap will just snap on when placed on top and pressed down: work the top first, then the bottom.

Replacing the missing key was very satisfying, not very expensive and the improvement was both cosmetic and functional.

Installing a custom ROM on the SGH-I727R

I’m a fan of running customized ROMs on my phone. There are three reasons: a) I like to tinker b) It provides added capability and longer currency for my phone c) I can get source for most of the code running on my phone. In this post I’ll talk about installing CyanogenMod 10, but a good part of this will be applicable to any after market ROM.

First I like to gather data about the state of the phone as it came to me. These details can all be found in the “About Phone” screen.

SGH-I727R
Android 2.3.5
baseband I727RUXKJ7
Kernel 2.6.35.11
Gingerbread.RUXKJ7
IMEI XXXXXXXXXXXXXXX
IMEI SV XX

I’ve omitted my actual IMEI, but you’ll want to record that as it is possible to accidentally wipe it out on some phones. Fortunately as this phone came to me in the actual retail box, the sticker on the box matched the details here too.

The next step is to spend some time reading up on how to modify the firmware (ROM) and how to restore to stock. I’ve said this before, but it’s worth saying again: there is a lot of misinformation out there about how to go about this. Primarily this is because people don’t really understand what they are doing and simply provide instructions that seemed to work for them, voodoo magic included. An example is this YouTube video I came across. It does give you confidence that it can be done, but there is no need to root your phone before installing a custom recovery.

If you’ve done any searching at all, you’ll have come across the XDA Forums. I do recommend signing up and reading through the relevant forums. Learn to search for answers, and share what you do know with folks who don’t. The newb starting guide is a good place to start. Also, since my primary target is CyanogenMod, usually a good place to start is with their wiki. However, in this case it wasn’t.

The first step is to get a custom recovery image installed. ClockworkMod (CWM) is the preferred solution for CyanogenMod and I’m familiar with it. To install it we need a tool that will talk to the download mode of our phone, and we need to get our phone into download mode. The tool I prefer to use is Heimdall; it worked well with my i9000 and it’s also friendly to Linux. The other option is Odin (download link), a Windows-only tool.

Entering download mode on the Rogers version is slightly different from the AT&T version: only volume down needs to be held with power (not volume up & volume down).

If you are successful you should be greeted by a screen as per above. Now assuming the USB cable is attached we can start the tool to send down the custom recovery image.

Sadly this is where I went off the rails a little. It turns out the version of Heimdall I had (1.3.1) didn’t quite support the protocol being used by this phone. Upgrading to a newer version did fix the connection problem, but then it failed in another way that I can only assume is also related to the protocol.

$ sudo ../Heimdall/heimdall/heimdall flash --recovery recovery.img
Heimdall v1.4 RC1

Copyright (c) 2010-2012, Benjamin Dobell, Glass Echidna
http://www.glassechidna.com.au/

This software is provided free of charge. Copying and redistribution is
encouraged.

If you appreciate this software and you would like to support future
development please consider donating:
http://www.glassechidna.com.au/donate/

Initialising connection...
Detecting device...
Claiming interface...
Setting up interface...

Checking if protocol is initialised...
Protocol is initialised.

Beginning session...
Session begun.

In certain situations this device may take up to 2 minutes to respond.
Please be patient!

Releasing device interface...

And that was it, my phone was busted.

I did try several times to recover using heimdall and failed. So it was off to Windows to use the Odin tool to fix things.

I initially tried to simply install a version of CWM and then proceed from there, but I made a few mistakes. 1) I didn’t have the right version of CWM. I can’t explain this, but I do admit I was thrashing a little here. 2) I did have some partial successes, which probably left things in a somewhat dubious state and caused me grief later when I was doing the right things (see log below).

E:Can't open /cache/recovery/log
E:Can't open /cache/recovery/log
E:Can't open /cache/recovery/last_log
E:Can't open /cache/recovery/last_log

The solution is to return to stock and start fresh. Thankfully Odin was quite happy to flash the stock version I got via XDA.

I did locate the correct version of CWM via the CM10 thread on XDA, and I found the TeamChopsticks install guide quite helpful. Once I had the right mojo, things went smoothly.

Starting from stock:
1) Install recovery via Odin, boot into recovery
2) Wipe & factory reset
3) Format /system
4) Flash CM10 nightly zip
5) Flash google apps zip (optional)
6) Reboot

I like to have SSHD running on my phone along with rsync to allow for nightly backups to happen. Unfortunately CM10 isn’t yet shipping with dropbear pre-installed, and the CM7 version doesn’t seem to be happy anymore. I’ve switched to using the DropBear SSH Server app, the one downside is that it doesn’t auto-start on boot. I’ve been in touch with the author and this is on his future feature list.

A few notes on setting up DropBear SSH Server. The very first run will ask you to grant it super user privileges. It needs these, so you need to say yes. Once the first screen is all green, you can test the server; the default root password is 42. Once you’ve verified it’s working, change the password under settings. I use keyed logins, and the app does support importing keys from files, but only one key per file. Once you’ve set up some keys, you can disable password logins entirely.
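If your keys live together in one authorized_keys style file, the one key per file limit means splitting it up before importing. Here is a minimal sketch, assuming the app accepts ordinary OpenSSH public key lines (I have not confirmed the exact format it expects). The key-N.pub names are made up for illustration.

```shell
# Write each key line of file $1 to its own file $2/key-N.pub and print how many were written.
# Blank lines and comment lines are skipped.
split_keys() {
    mkdir -p "$2"
    n=0
    # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;
        esac
        n=$((n + 1))
        printf '%s\n' "$line" > "$2/key-$n.pub"
    done < "$1"
    echo "$n"
}

# Usage:
#   split_keys ~/.ssh/authorized_keys ./keys-for-phone
```

Copy the resulting files to the phone and import them one at a time.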

In CM10 you’ll probably want to enable USB Storage under Settings->Storage, then press menu to bring up ‘USB computer connection’ where you can opt in for USB storage (it is off by default).

Somewhere during this set of events I managed to end up in a state where the Radio version reported in About Phone was ‘unknown’. Phone calls worked fine, and I ran a couple of days without noticing this. I did later reboot into recovery and install a more up to date radio/modem firmware (I727RUXLF3). While I was in recovery I initiated a backup which I can return to if things get really messed up. This is handy as it is stored on the external SD card and is available even if I’m not somewhere with a computer and need to fix the phone.

CM10 has a new over the air (OTA) update system, and I used it for the first time tonight to move to the latest nightly. Very slick, but there didn’t seem to be an option to back up my existing state.

So aside from a few heart-stopping moments where my brand new phone was totally fubar’d, this was overall a pretty typical experience with a new device. Plenty of little details to figure out, a few new tools to install/configure/learn and the excitement of new hardware (and software). I’m really pleased with the i727; the screen still feels really big (but not too big). It’s fast and the battery life is very good. Google Now also recently added the ability to enter calendar events, resolving one of the few things I found it couldn’t do. And yes, Google Now is pretty darn cool.