Sunday, February 17, 2008

New hardware for my lil server

I recently ordered new hardware for my little file-serving, downloading and IRCing box. As the nature of this 'server' is to be always on and not demanding powerful hardware, I went for the following components:

  • Intel Celeron 420 (1.6GHz, Single-Core, 35W power-consumption)
  • ASUS P5B-VM Motherboard (5 SATA-ports, GBit ethernet, on-board VGA)
  • Crucial 1GB DDR2-667
The hardware that was being replaced was:
  • Tyan Tiger motherboard
  • 2x Pentium III 1GHz
  • 2x 512MB SD-RAM
  • 2x Adaptec 1210SA
  • 1x Intel 1GBit Ethernet (eepro1000)
  • ATI Radeon 8500
(Note that I currently still have the Intel Ethernet adapter plugged in, since the on-board network didn't work out of the box and I haven't found the time to fiddle with it yet.)

As you can see, this box is optimized for low cost, low power consumption and replacing as many of the old parts as possible. At a total of 136 EUR (the CPU was only 35 EUR!) including shipping, I'm really satisfied. One additional Adaptec controller would've cost 50 EUR and would've added only 2 SATA ports, so this was probably the best solution. I still have 2 SATA ports unused on the mainboard and have 2 SATA-controllers lying around, ready to be plugged in should even those get used up. Finally I have the potential to grow my RAID even bigger. :-)

Fixing raid-5 failures, the adventurous approach

You might remember the trouble I had with my raid5 before. Well, it's still not 100% sorted out, but I know the cause now. It really was a faulty drive! I noticed that after I replaced the motherboard, CPU and RAM with new components. After I had installed them and booted into the system (which worked flawlessly on the first try, by the way, although the hardware is completely unrelated to the previous one) I noticed a clicking sound from one of the harddisks. I immediately realized that I bought new hardware for nuthin. But at least I was now sure which component was causing the failure, plus I got 5 free SATA ports to upgrade the RAID. Previously I had no unused ports, leaving no potential for a possible upgrade.
But somehow the raid got messed up in the process. I wasn't able to assemble it with the remaining 3 discs because one disc was always added as a spare. So I had 2 functional devices and one spare, which is obviously not enough to run the raid. This is due to a corrupted superblock, but luckily the superblock is just metadata which can be recreated. If I had known the correct devices and the slots they corresponded to before all this happened, I could've recreated the array with mdadm --create and the correct params.
Unfortunately, I did not know the exact params so I had to take a more... adventurous approach. There's a perl-script on the linux-raid wiki which permutes over each possible combination of devices (including one missing device) and tries to mount the created array. It does everything in read-only mode so no actual data is being touched, only metadata. If it could mount the raid, it prints the mdadm --create command used to build it, stops the array and goes on. You can then execute the creation-commands yourself and see if everything's right. In my case, luckily it was and I got all my data back.
Note that I had to connect the failed drive for this to work because it always replaces one given device with 'missing' ('missing' tells mdadm that this device is, well, missing) instead of adding 'missing' to the devices-list. This is because it's not supposed to recreate a partial, but only a complete array. So you need to provide ALL raid-members to the command-line, otherwise it won't work. It should be fairly easy to hack the script to work for partial arrays, too, but it was easier for me to add the drive again than to hack perl-code.
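To give a rough idea of the search space such a script has to walk through, here is a minimal C++ sketch. It is not the perl-script from the wiki; the function name candidateCommands, the array name /dev/md0 and the fixed --level=5 are my own assumptions. It only builds the command strings and never runs them. It tries every ordering of the members and, for each ordering, replaces each slot in turn with 'missing'. A real run would also have to match the original chunk size and metadata version, which this sketch leaves at mdadm's defaults.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Build every candidate "mdadm --create" command line for the given members:
// each ordering of the devices, with each slot in turn replaced by the
// keyword "missing". Nothing is executed; the commands are only returned.
// Assumption: array name /dev/md0 and raid level 5 are hard-coded here.
std::vector<std::string> candidateCommands(std::vector<std::string> devices)
{
    assert(!devices.empty());
    std::vector<std::string> commands;

    // std::next_permutation enumerates all orderings only from a sorted start.
    std::sort(devices.begin(), devices.end());
    do {
        for (std::size_t gap = 0; gap < devices.size(); ++gap) {
            std::string cmd =
                "mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices="
                + std::to_string(devices.size());
            for (std::size_t i = 0; i < devices.size(); ++i)
                cmd += " " + (i == gap ? std::string("missing") : devices[i]);
            commands.push_back(cmd);
        }
    } while (std::next_permutation(devices.begin(), devices.end()));

    return commands;
}
```

For four members this yields 4! orderings times 4 possible gaps, which is why you want a script doing the trying for you instead of typing the commands by hand.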


After this the raid was up and I needed to mark the drive as faulty and remove it so it can't cause problems anymore. It's always a bit problematic to map the device-names (/dev/sdx) to the real harddrives and you might pull out the wrong one, possibly leading to more problems. I found out a reliable way to identify the drives:

hdparm -I /dev/sdx | grep 'Serial Number'
This will print the serial number, which usually is visible on the actual discs, too. Somehow the -I option to hdparm never occurred to me before. The serial number matched one of my disks, and so I was able to locate and remove the faulty drive.

Yay!

Next step is to contact the reseller for a replacement. I hope the next bad drive will be less problematic.

Thursday, February 14, 2008

Qt debugging with Visual Studio 2005

zbenjamin, one of the fine folks from the Qxt-project just gave me his additions to the AutoExp.dat-file that lets you debug native Qt-types (e.g. QString) far more easily. Here's the before/after-comparison:

Before



After



How it's done


And here's what you have to do to use it yourself:
First, open up the file

C:\Program Files\Microsoft Visual Studio 8\Common7\Packages\Debugger\autoexp.dat

Important: Under Windows Vista, you need to open the file as Administrator, because it is not writeable by the user and the program-files-virtualisation will get in your way.

Then, add the following lines under the [AutoExpand]-mark:
QObject =classname=<staticMetaObject.d.stringdata,s> superclassname=<staticMetaObject.d.superdata->d.stringdata,s>
QList<*>=size=<d->end,i>
QLinkedList<*>=size=<d->end,i>
QString=<d->data,su> size=<d->size,u>
QByteArray=<d->data,s> size=<d->size,u>
QUrl =<d->encodedOriginal.d->data,s>
QUrlInfo =<d->name.d->data,su>
QPoint =x=<xp> y=<yp>
QPointF =x=<xp> y=<yp>
QRect =x1=<x1> y1=<y1> x2=<x2> y2=<y2>
QRectF =x=<xp> y=<yp> w=<w> h=<h>
QSize =width=<wd> height=<ht>
QSizeF =width=<wd> height=<ht>
QMap<*> =size=<d->size>
QVector<*> =size=<d->size>
QHash<*> =size=<d->size>
QVarLengthArray<*> =size=<s> data=<ptr>
QFont =family=<d->request.family.d->data,su> size=<d->request.pointSize, f>
QDomNode =name=<impl->name.d->data,su> value=<impl->value.d->data,su>

Now restart Visual Studio and you should be good to go.

Monday, February 11, 2008

More slime goodness

Remember the Slimy Lisp Video I posted earlier this year? Well, another blogger, Peter Christensen, has put more effort into it and has written a reference/annotation for the video, including a timeline, a transcript of important parts and introductory explanations on how to set up SLIME to follow along with the video. This will give SLIME-beginners an even better kickstart.
Thanks very much for your effort, Peter!

Thursday, February 7, 2008

Fixing my software raid-5 with mdadm

On my server-box I have a software raid-5 /dev/md0 consisting of four 500GB SATA-harddisks, namely sda1, sdb1, sdc1, sdd1. While I was working on my other box, where I was pulling off some dd-stunts on an lvm-volume, my raid suddenly died on the server. There was some output in dmesg saying that both sda and sdb were somewhat corrupt and had been removed from the raid, leaving it non-functional (a raid-5 of n disks needs at least n-1 of them to stay operational). I was very shocked by this. I restarted the PC and tried to re-assemble the raid with

mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
to no avail. mdadm said 'bad superblock on device /dev/sda1' (or similar), but leaving out sda1 worked and I had the array assembled with 3 out of 4 disks. That's of course not satisfactory. I stopped the raid and ran S.M.A.R.T.-checks on each of the 4 disks with
smartctl -t long /dev/sdx1
This took over an hour so I went to sleep and checked the results the next day -- 100% error-free, according to smart! That's really strange. Assembling the array still did not work because of sda1. I opened up sda in cfdisk and saw the exact same partition-size as on sdb and the others, but I knew that something was corrupt. So I wrote the partition-table to a file to back it up, removed the partition and re-created it. Then I used
mdadm --add /dev/md0 /dev/sda1
to re-add the partition that was formerly part of the array anyway... mdadm did its job and recovered the raid. You can watch the progress by doing
cat /proc/mdstat
It took around 7 hours or so to complete, and now the raid5 is fully functional again.
What a horror-trip! I'm still wondering what was going on and why sda1 has been kicked out of the array.
A small addition: After I fixed the raid it was ok for a day or two, but then one day when I came home I noticed it had broken again. I remembered that I had stepped onto the USB-keyboard attached to the server right after I came home, and I found an unhandled IRQ-oops in the kernel-log from exactly that time. So my suspicion is that the USB-handler somehow messed something up, which in turn killed the RAID again. But I'm still investigating the issue; for now, rebooting and forcing the assembly worked fine. I hope I'll not have any more problems with it...
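As an aside on why a raid-5 survives the loss of exactly one disk: each stripe stores the XOR of its data blocks as parity, so any single missing block can be rebuilt by XOR-ing all the remaining ones. The following is a minimal C++ sketch of that idea only. The names Block and xorBlocks are made up for illustration, and a real md array additionally rotates the parity across the disks, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Block = std::vector<std::uint8_t>;

// XOR all given blocks together, byte by byte.
// Used for both directions: data blocks in, parity out; or
// surviving blocks plus parity in, the lost block out.
Block xorBlocks(const std::vector<Block>& blocks)
{
    assert(!blocks.empty());
    Block result(blocks.front().size(), 0);
    for (const Block& b : blocks) {
        assert(b.size() == result.size()); // all blocks of a stripe have equal size
        for (std::size_t i = 0; i < b.size(); ++i)
            result[i] ^= b[i];
    }
    return result;
}
```

With three data blocks d0, d1 and d2, the parity is xorBlocks({d0, d1, d2}), and a lost d1 comes back as xorBlocks({d0, d2, parity}). Lose two blocks of the same stripe and there is nothing left to XOR against, which is why two kicked-out disks stop the whole array.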

Converting lvm to a normal partition

I've recently set up a new gentoo-box and at first decided to use lvm2 on my root. Well, I ran into some issues with the kernel and initrd, which I could figure out and fix. But then I noticed that, because of the lvm, I wouldn't be able to access the disk from Windows with the free ext3-drivers that are available. Without the lvm, Linux will even boot faster because I'll have no need for the initrd anymore. That's when I decided to get rid of the lvm. And that's actually easier than you'd think: If you have a spare partition or harddisk around that is at least the size of the logical volume that you'd like to convert to a partition, you can easily do this with dd. Imagine that /dev/vg/volume is a logical volume that consists of only one partition, /dev/sday:

sh# dd if=/dev/vg/volume of=/dev/sdbx bs=8M
sh# dd if=/dev/sdbx of=/dev/sday bs=8M

That's it. This will back up the logical, continuous data that's hosted on the lvm to a partition. After the first dd you'll be able to mount /dev/sdbx and see how the content of /dev/vg/volume has been copied. The mounted partition's usable size will be exactly the same as the volume's size, even if the partition itself is much bigger. That's because the filesystem on it will still be the same size it was before. You could (but it wouldn't make much sense because we want to move the data to the other partition anyways) fix this with resize2fs (if you use ext2 or ext3, that is).
The second dd copies the data back to the partition that it was formerly stored on, but without the additional lvm-abstraction. The lvm will be overwritten by the 'flat' filesystem-data. If sdbx happens to be bigger than sday, an error will be printed that dd reached the end of the partition. This is nothing to worry about since the data left on sdbx is not interesting to us anyways.
You can then grow the filesystem to the actual partition size with resize2fs. Since the lvm metadata no longer takes up any space on the partition, the filesystem can end up slightly larger than before.

Friday, February 1, 2008

Accessing MS SQL UID-fields with Qt

When working with a database that relies heavily on uniqueidentifiers, I experienced problems with handling those fields with Qt's built-in SQL-classes.
First, I connected to the database via the QODBC-driver. Then I fetched the results of table 'a' and tried to fetch the corresponding results in table 'b', which are referenced by foreign keys. Here's a code-snippet:

QSqlQuery a(db);
a.exec("select id from a");
a.next();
QSqlQuery b(db);
b.prepare("select id from b where b.id_a=:id");
while(a.isValid())
{
    b.bindValue(":id",a.value(0));
    b.exec();
    // ERROR: Operand type clash: image is incompatible with uniqueidentifier
    a.next();
}

So Qt converts the binary data it received from the uniqueidentifier to a binary blob of type image, it seems.
There's a simple way to convert the GUID that is stored in a.value(0) to a formatted UID-string, which in turn can be used to bind the value of the second query.
QString uuidToString(const QVariant &v)
{
    // get pointer to raw data
    QByteArray arr(v.toByteArray());
    std::string result(arr.constData(),arr.size());
    assert(result.size() == 16);
    const char *ptr = result.data();
    // extract the GUID-parts from the data
    uint data1 = *reinterpret_cast<const uint*>(ptr);
    ushort data2 = *reinterpret_cast<const ushort*>(ptr+=sizeof(uint));
    ushort data3 = *reinterpret_cast<const ushort*>(ptr+=sizeof(ushort));
    uchar data4[8] =
    {
        *reinterpret_cast<const uchar*>(ptr+=sizeof(ushort)),
        *reinterpret_cast<const uchar*>(++ptr),
        *reinterpret_cast<const uchar*>(++ptr),
        *reinterpret_cast<const uchar*>(++ptr),
        *reinterpret_cast<const uchar*>(++ptr),
        *reinterpret_cast<const uchar*>(++ptr),
        *reinterpret_cast<const uchar*>(++ptr),
        *reinterpret_cast<const uchar*>(++ptr)
    };
    // create a uuid from the extracted parts
    QUuid uuid(
        data1,
        data2,
        data3,
        data4[0],
        data4[1],
        data4[2],
        data4[3],
        data4[4],
        data4[5],
        data4[6],
        data4[7]);
    // finally return the uuid as a QString
    return uuid.toString();
}

Using this function, you can easily bind the values to the second query:
b.bindValue(":id",uuidToString(a.value(0)));
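For those who want to see the byte layout that uuidToString relies on without pulling in Qt, here is a small sketch in plain C++. It assumes, like the function above does on an x86 box, that the driver hands back the 16 bytes in the in-memory GUID layout, i.e. the first three fields stored little-endian and the last 8 bytes in order. The name guidBytesToString is made up for this example, and the output format (lowercase hex in braces) mimics what QUuid::toString() returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Format 16 raw bytes in the little-endian in-memory GUID layout as
// "{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}".
// Data1 (4 bytes), Data2 (2 bytes) and Data3 (2 bytes) are byte-swapped,
// Data4 (8 bytes) is printed in storage order.
std::string guidBytesToString(const unsigned char* bytes, std::size_t size)
{
    assert(size == 16);
    char buf[39]; // 38 characters plus the terminating NUL
    std::snprintf(buf, sizeof buf,
        "{%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x}",
        bytes[3], bytes[2], bytes[1], bytes[0],   // Data1, little-endian
        bytes[5], bytes[4],                       // Data2, little-endian
        bytes[7], bytes[6],                       // Data3, little-endian
        bytes[8], bytes[9],                       // Data4[0..1]
        bytes[10], bytes[11], bytes[12],
        bytes[13], bytes[14], bytes[15]);         // Data4[2..7]
    return std::string(buf);
}
```

The explicit indexing avoids the unaligned reinterpret_cast reads and does not depend on the byte order of the machine running the code, at the price of hard-coding the layout assumption in one place.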

Edit: Starting from Qt 4.4.0 (I used the latest snapshot) QVariant supports GUIDs and hence this function fails AND is unnecessary.