Searching For Gateway Controller

Update 31/10/2014

This article contains old information. You can find a new technique here.


Context

At the office we have a NetBoot server running the Apple AST service to run diagnostics on faulty Macs. It lets us NetBoot a machine and run an app that shows which parts may be misbehaving. It’s a cool piece of technology that’s under a lot of NDAs, but sadly it’s not the most stable of services.

One of the errors we see a lot is the Searching for Gateway Controller error when launching the Gateway Manager application.
This happens when we update the machine, change the hostname or move the device to a new subnet (or sometimes just randomly).
When a server is configured to provide the NetBoot image over multiple LANs, the service often flips and crashes, resulting in the same error.

While migrating our existing server to a new building today, which runs on a new subnet, I once again saw the familiar Searching for Gateway Controller message show up when launching the Gateway Manager.
Only known fix: reinstall the whole server from scratch or manually delete all references to the AST service and reinstall.
Since I had some time today, I decided to run a verbose log of the AST-installers while configuring the server to see which files are placed or changed on the server.

I monitored these newly created files and checked what changed while installing the service, and one specific file caught my attention: /Library/Preferences/com.apple.servermgr_info.plist
It contained a few entries containing IP addresses that appeared to resolve to the server I installed the AST tool on.

    <key>applicationConfigurations</key>
    <dict>
        <key>com.apple.Server.v2</key>
        <dict>
            <key>123456-ABCD-1234-ABCD</key>
            <dict>
                <key>identifier</key>
                <string>123456-ABCD-1234-ABCD</string>
                <key>ldapServer</key>
                <dict>
                    <key>addresses</key>
                    <array>
                        <string>server.domain.com</string>
                        <string>192.168.1.250</string>
                    </array>
                    <key>userName</key>
                    <string>admin</string>
                </dict>
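
To see every place an IP address is stored in a plist like this, a small script can walk the structure and print the key path of each match. This is only a minimal sketch: the function names are my own, and it assumes nothing about the file beyond it being a plist that Python's plistlib can read.

```python
import ipaddress
import plistlib


def find_ip_strings(node, path=()):
    """Yield (key_path, value) for every string in a plist tree that parses as an IP address."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_ip_strings(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_ip_strings(value, path + (index,))
    elif isinstance(node, str):
        try:
            ipaddress.ip_address(node)
        except ValueError:
            return  # hostnames, user names and other strings are skipped
        yield path, node


def find_ip_strings_in_file(plist_path):
    """Load a plist (XML or binary) and list every IP-looking string with its key path."""
    with open(plist_path, "rb") as handle:
        return list(find_ip_strings(plistlib.load(handle)))
```

Calling find_ip_strings_in_file("/Library/Preferences/com.apple.servermgr_info.plist") should then list each stored address, so none of them is overlooked when editing by hand. Hostnames are not reported; only strings that are valid IP addresses.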

Solution

The AST Service is a service that runs on top of the NetBoot service of OS X Server and needs to resolve to the server.

Normally, when network-based services are installed on a server, good practice is to have the server refer to itself using either localhost or 127.0.0.1.
But apparently, when installing the AST Service, someone at Apple decided it was a good idea to use the manually configured LAN IP of the server to resolve the service. And only one IP at that: the one of the primary interface, en0.

So after moving the server to a different subnet, changing its hostname or changing the order of network interfaces, the Gateway Manager will still launch and look at the IP written down in the plist. In those cases the IP no longer resolves to the server’s primary interface, resulting in a Manager that can’t find its service.
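
A quick way to check whether an IP found in the plist still belongs to the server is to try binding a socket to it: the bind only succeeds for addresses assigned to a local interface. This is a hedged sketch with a helper name of my own, limited to IPv4, and not something the AST tools provide.

```python
import socket


def is_local_address(ip):
    """Return True if the IPv4 address `ip` is assigned to one of this machine's interfaces."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        try:
            # Port 0 lets the OS pick any free port; only the address matters here.
            probe.bind((ip, 0))
        except OSError:
            # The OS refuses to bind to an address that no local interface owns.
            return False
    return True
```

If is_local_address returns False for the address stored in the plist (192.168.1.250 in the excerpt above), the Gateway Manager is most likely looking for its service at an address the server no longer has.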

Long story short: I fixed our Searching for Gateway Controller issue on the affected machine by changing the IP and hostname in the plist to 127.0.0.1 and server.local.
I rebooted the server and voilà, a working AST server.

Note that the com.apple.servermgr_info.plist file contains two arrays referencing the server IP, so you need to make the change twice.
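
Because the address appears in more than one place, the edit can also be scripted so no occurrence is missed. The following is a minimal sketch rather than a tested procedure for AST servers: the function names are my own, it replaces every string that exactly matches an old value wherever it appears in the file, it keeps a .bak copy first, and it writes the result back as an XML plist even if the original was binary.

```python
import plistlib
import shutil


def replace_strings(node, replacements):
    """Return a copy of a plist tree with exact-match strings swapped per `replacements`."""
    if isinstance(node, dict):
        return {key: replace_strings(value, replacements) for key, value in node.items()}
    if isinstance(node, list):
        return [replace_strings(value, replacements) for value in node]
    if isinstance(node, str):
        return replacements.get(node, node)
    return node  # numbers, booleans, dates and data are left untouched


def rewrite_plist(plist_path, replacements):
    """Back up the plist next to itself, then rewrite it with the replacements applied."""
    shutil.copy2(plist_path, plist_path + ".bak")
    with open(plist_path, "rb") as handle:
        data = plistlib.load(handle)
    with open(plist_path, "wb") as handle:
        plistlib.dump(replace_strings(data, replacements), handle, sort_keys=False)


# Example call (run as root, values taken from the excerpt above):
# rewrite_plist(
#     "/Library/Preferences/com.apple.servermgr_info.plist",
#     {"192.168.1.250": "127.0.0.1", "server.domain.com": "server.local"},
# )
```

As with the manual edit, reboot the server afterwards so the services pick up the new values.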

Remarks

  • If you have more than one AST server, one server is your master and the others are slaves to that master. So only one server should resolve to itself; all the others need to resolve to the master server. Change the IPs accordingly when the master server changes its IP.
    When installing a slave machine, it searches the network and writes the IP of the first AST server it finds to the plist. So check that this IP corresponds to your master.
  • I have a theory that if you add multiple keys with hostnames and IPs to the plist, you can use the server across multiple subnets. But I haven’t been able to test that one yet.
  • I hope a future version of the AST tools contains a fix for this, either by using 127.0.0.1 by default or by offering an interface to manually change the server IP when the system it runs on changes.
  • As always, this hack worked for me in my setup. I’m not responsible for issues that occur when this hack does or does not work for your server.
