Netboot from Lion Server - "Error loading kernelcache"

dlondon
Valued Contributor

Hi,

I'm having difficulty with Netboot from an Xserve running Lion Server.

The clients are on a different subnet from the server, but the firewall that acts as the gateway between the two subnets has a helper address pointing at the server.

When I try netbooting one of our client Macs by holding down the N key
at startup, I get:
Big globe, Little Globe, Circle with Cross through it
Big globe, Little Globe, Circle with Cross through it
Big globe, Little Globe, Folder with Cross through it

Then it gives up and boots normally.

I had a port mirrored on the network switch, plugged in various test
machines and captured the network traffic while the above was happening.
What I see is the TFTP download of the booter file, but when that ends I
don't see the NFS traffic start up.

I can manually download the booter file over TFTP.
I can also manually mount the NFS share and traverse all the way into the
NetBoot image and access the DMG.

I set up another server to match the first one:
- OS X Lion Server
- Same location for the NetBoot image
- Same NetBoot image
- Same permissions
- Replaced the entry in the helper address with the second server's IP

For the second server all the manual tests work and Netboot actually works.

I've tried removing the NetBoot service on my main Xserve and recreating it, but that doesn't
change anything.

The second server works brilliantly, but as it's an older machine I wanted to try the production server again.

One final piece of info I found today is that if I Netboot using the problem server by holding down the N key and then press Command-V (verbose mode) I see the following on the client machine:

efiboot loaded from device: Acpi (PNP0A03,0)/Pci(1C15)/Pci(010)/Mac(001B63AB38C6)

boot file path:
Loading kernel cache path `x86_64kernelcache`...
Error loading kernelcache (0x7)
efiboot loaded from device: Acpi (PNP0A03,0)/Pci(1C15)/Pci(010)/Mac(001B63AB38C6)

boot file path:
Loading kernel cache path `x86_64kernelcache`...
Error loading kernelcache (0x7)
efiboot loaded from device: Acpi (PNP0A03,0)/Pci(1C15)/Pci(010)/Mac(001B63AB38C6)

boot file path:
Loading kernel cache path `x86_64kernelcache`...
Error loading kernelcache (0x7)
efiboot loaded from device: Acpi (PNP0A03,0)/Pci(1C15)/Pci(010)/Mac(001B63AB38C6)

boot file path:
Loading kernel cache path `x86_64kernelcache`...
Error loading kernelcache (0x7)
efiboot loaded from device: Acpi (PNP0A03,0)/Pci(1C15)/Pci(010)/Mac(001B63AB38C6)

boot file path:
Loading kernel cache path `x86_64kernelcache`...
Error loading kernelcache (0x7)
efiboot loaded from device: Acpi (PNP0A03,0)/Pci(1C15)/Pci(010)/Mac(001B63AB38C6)

boot file path:
Loading kernel cache path `x86_64kernelcache`...
Error loading kernelcache (0x7)

The machine then boots up into the existing installation on the hard drive.

The kernelcache file is the same size, date and permissions on both servers.

Any suggestions on what may be the problem or what to try?

Regards,

David

1 ACCEPTED SOLUTION

dlondon
Valued Contributor

Well, I changed from NFS to HTTP on the non-working NetBoot server, but with no joy. I figured this now meant we definitely had an issue with the image itself.

Then, as I was scowling at my screen, my friend James suggested I run a checksum such as MD5 against the image file on the working server and the non-working server. The so-called identical images had different MD5 checksums ...
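For anyone who wants to script that check rather than run it by hand, here is a minimal Python sketch. It is only an illustration: the function names `md5_of_file` and `same_content` are made up for this example, and it assumes both copies of the file are reachable from one machine (for example over a mounted share).

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that a large image does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True only if both files hash to the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

Calling `same_content(copy_from_working_server, copy_from_problem_server)` with your own two paths gives a yes/no answer. Matching size, date and permissions say nothing about the bytes inside, which is why the checksum is the useful test here.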

Digging into the images I found that the kernelcache file was the culprit. Here's the MD5 and file info:

Non-working:
ls -la
total 58544
drwxr-xr-x 3 admin staff 102 18 Mar 16:01 .
drwxr-xr-x 5 root admin 170 18 Mar 16:01 ..
-rw-r--r-- 1 admin staff 29972164 18 Mar 13:55 kernelcache

md5 kernelcache
MD5 (kernelcache) = debb1783a5ba6aab8420ffb7c1f1fbac
-------------------------------------------------------------------------------

Working:
ls -la
total 58544
drwxr-xr-x 3 admin staff 102 18 Mar 16:01 .
drwxr-xr-x 5 admin staff 170 18 Mar 16:01 ..
-rw-r--r-- 1 admin staff 29972164 18 Mar 13:55 kernelcache

md5 kernelcache
MD5 (kernelcache) = 104cb1f48371e87261079112b63a0542
-------------------------------------------------------------------------------

Both files have the same size but different content. When I replace the kernelcache in the non-working image with the one from the working image, it all comes good. Very strange, as they both came from the same TAR file.
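If you need to find which file inside two supposedly identical image folders is the odd one out, something like the following Python sketch could do the digging. It is an assumption-laden illustration, not what I actually ran: `file_md5` and `differing_files` are hypothetical names, and it assumes both folder trees are readable from the same machine.

```python
import hashlib
import os


def file_md5(path, chunk_size=1024 * 1024):
    """Hex MD5 digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_files(root_a, root_b):
    """Return sorted relative paths of files that exist in both trees
    but whose MD5 digests differ. Files missing from root_b are skipped."""
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(root_a):
        for name in filenames:
            path_a = os.path.join(dirpath, name)
            relative = os.path.relpath(path_a, root_a)
            path_b = os.path.join(root_b, relative)
            if not os.path.isfile(path_b):
                continue
            if file_md5(path_a) != file_md5(path_b):
                mismatches.append(relative)
    return sorted(mismatches)
```

Pointing `differing_files` at the working and non-working image folders would list any file, such as kernelcache in my case, whose size matches but whose content does not.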

Anyway problem solved.

I also see a performance improvement using HTTP ... I wonder what the downside of using HTTP is.


4 REPLIES

daworley
Contributor II

One option to try troubleshooting NetBoot from the client side is to do a Verbose NetBoot.

Option 1 - Use a Casper policy (or the bless command, or System Preferences) to set the NetBoot startup disk, and when the machine reboots, force it into verbose mode the standard way with Command-V.

Option 2 - Hold down Option to bring up Startup Manager, select the NetBoot image, and then hold down Command-V as you click the "go arrow". This will be quick, so get your fingers ready.

Hopefully with this information you can see where the client is getting stuck.

daworley
Contributor II

Oh, I misread the context around the logs you posted.

That sounds like the NFS path is not coming through. Have you tried putting the NetBoot service onto HTTP?

dlondon
Valued Contributor

Thanks Douglas,

Excellent suggestion. I'll give it a go and report back.

Regards,

David
