Happy Friday all,
A nice, pure linuxy question for a Friday lunchtime (and my apologies,
it's a bit of a long one). The university has just got a bunch of shiny
new compute nodes with shiny 10G "Chelsio" Ethernet cards in them. Sadly
in order to get the 10G NICs to work you need to install the Chelsio
drivers, which is a problem when it comes to PXE booting the machines:
PXE kicks off and nabs what it needs, the installer starts up, then
doesn't know how to talk to the network cards (it can't even see them).
Which is a bit of a show stopper.
To make things a bit more complicated, installing the drivers tweaks
things in the kernel: after installing the drivers the NICs only become
visible after a reboot.
So the obvious choice is to hack the initial ramdisk to enable it to
speak Chelsio NIC. I've tried two approaches to this, and both fail, no
doubt largely due to me fumbling through the process, pleading to the
machine spirits for guidance and hearing nothing but despair (often on
Ubuntu forums). A lot of Google hours have been put in, with even the
occasional (fruitless) foray onto the second page of results. Lots of
clues but no answers were found. Here's what I tried:
Method 1: Hack the SL initrd.img! This involves unpacking the initrd.img
(using unlzma and cpio) and manually inserting the drivers into the
right places, whilst also editing modules.alias and the other modules.*
files to contain the extra Chelsio (cxgb4) information, which was
pulled from a manually installed machine, then packing it all up
again. Very messy! Surprisingly the initrd.img still worked after I
tainted it, but it still failed to see the 10G NICs.
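Since the hand-editing in Method 1 is easy to get subtly wrong, here is a rough sketch of a sanity check to run against the unpacked tree before packing it back up. It only confirms that the driver file is present and that modules.alias and modules.dep each mention it. The function name check_module_tree is made up for illustration, and the module directory is passed in as an argument because I'm not assuming where the installer image keeps it.

```shell
#!/bin/sh
# Sanity-check an unpacked initrd's module directory for one driver.
# Usage: check_module_tree MODDIR MODULE
#   MODDIR  directory holding modules.alias and modules.dep
#           (its location inside the image is an assumption; pass it in)
#   MODULE  driver name, e.g. cxgb4
# Prints one line per problem found; returns 0 only if nothing is missing.
check_module_tree() {
    moddir=$1
    mod=$2
    status=0

    # 1. The driver object itself must exist somewhere under MODDIR.
    if ! find "$moddir" -type f -name "${mod}.ko*" | grep -q .; then
        echo "missing: ${mod}.ko"
        status=1
    fi

    # 2. modules.alias needs at least one "alias <pattern> <module>" line,
    #    otherwise nothing maps the card's PCI ID to the driver.
    if ! grep -Eq "^alias[[:space:]]+[^[:space:]]+[[:space:]]+${mod}[[:space:]]*\$" \
            "$moddir/modules.alias" 2>/dev/null; then
        echo "missing: alias for ${mod} in modules.alias"
        status=1
    fi

    # 3. modules.dep needs an entry for the driver so modprobe can find it.
    if ! grep -Eq "(^|/)${mod}\.ko[^:]*:" "$moddir/modules.dep" 2>/dev/null; then
        echo "missing: ${mod} entry in modules.dep"
        status=1
    fi

    return $status
}
```

Called as, say, check_module_tree /path/to/unpacked/modules-dir cxgb4, it stays silent when all three pieces are there. A clean result doesn't prove the installer will load the driver, only that the obvious bits weren't forgotten.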
Method 2: Build our own initrd.img! After our previous failures we tried
to build our own boot files. On one of our manually installed machines
we used dracut to build the ram disk image (making sure dracut-network
was installed, otherwise it doesn't work at all). We then copied this
image and the machine's vmlinuz file to the usual PXE places.
As you can guess this didn't work either, although this time it failed
during startup with the errors:
VFS: Cannot open root device "(null)" or unknown-block(8,1)
Please append a correct "root:" boot option: here are the available
partitions:
Kernel Panic - not syncing: VFS:Unable to mount root fs on
unknown-block(8,1)
Pid: 1, comm: swapper Not tainted 2.6.32-358.el6.x86-64 #1
(then a bunch of trace information)
It looks to me like it's trying to boot in the "normal" fashion and
getting confused (block device 8,1 would be /dev/sda1, i.e. it wants a
root filesystem on the local disk rather than the installer). But I
can't figure out how to build an install-bootable initrd.img and vmlinuz
from an existing machine with the drivers I need on it. Or, if the
problem is on the PXE server side, what the correct "append" option is
(in that case it feels like if I found the right append root= option
we'd be sitting pretty).
To give the final bits of information, the PXE booting is handled by
cobbler, using HTTP to serve the kickstart and install RPMs.
Thanks for bearing with me. We've got a support line open with Viglen
and Chelsio, but I thought I'd ask you, my trusted peers, for help and
guidance.
Thanks in advance, and have a good weekend all,
Matt