Watchdog

From AwkwardTV

Revision as of 16:21, 29 September 2010 by Seppler (Talk | contribs)

What is Watchdog

Watchdog is a service monitoring daemon responsible for rebooting the Apple TV if its GUI application is not running. Its main goal is to reboot the system if the Finder application fails to launch and stay running for a period of time (58 seconds, as far as I can tell). It keeps track of the number of times it has rebooted the system because of a hung or crashed Finder, and when this happens a certain number of times in a row (five, I believe) it performs some action, presumably restoring to factory defaults or to the last known good state.

There's a kext called AppleTCOWatchdog.kext in the /System/Library/Extensions/ folder.

Brief Watchdog Background

Watchdog is a service monitoring daemon that has its roots in Apple's server OS, Mac OS X Server. Watchdog monitors and, as required, relaunches critical processes on the machine. When suitable hardware such as Apple's Xserve is present, watchdog can even reboot the machine if the power management hardware fails to respond.

This should not be confused with the "watchdog" program from OS X 10.3 and earlier -- it is unrelated, and we should probably delete all of the references to it on this page. :/

The Ripstop Daemon

Ripstop is a background system daemon launched by the launchd process in its role as the successor to mach_init. Ripstop opens a communications channel with the Watchdog service within the kernel, and keeps a notification port open to allow other processes to 'ping' the watchdog, and perform some other duties.

Ripstop Details

Ripstop responds to six notifications, all sent via the low-level notify_post() function:

  • com.apple.riptide.heartbeat
  • com.apple.riptide.start
  • com.apple.riptide.stop
  • com.apple.ripstop.query
  • com.apple.ripstop.terminate
  • com.apple.ripstop.debug

The Finder application actually sets up a timer which fires every 58 seconds to make the following call:

notify_post( "com.apple.riptide.heartbeat" );

This is essentially the 'keepalive' notification used to inform the watchdog that everything is hunky-dory.

Upon launch, ripstop switches to the Frontrow user & group ID, then opens a COM interface to the watchdog service, and creates a notification port with a CFMachPortRef wrapper which it then runs via a CFRunLoop. In the main function, it makes the following calls into the watchdog service:

(*service)->tcoWdSetTimer( service, 500 );
(*service)->tcoWdLoadTimer( service );

These are the actions performed for each notification:

com.apple.riptide.heartbeat:

(*service)->tcoWdLoadTimer( service );

com.apple.riptide.start:

(*service)->tcoWdLoadTimer( service );
(*service)->tcoWdEnableTimer( service );

com.apple.riptide.stop:

(*service)->tcoWdDisableTimer( service );

com.apple.ripstop.query:

(*service)->tcoWdGetCtl( service, &info );    // reports, more or less, whether the timer is running
if ( info.counterIsRunning )
    notify_post( "com.apple.tcowd.ison" );
else
    notify_post( "com.apple.tcowd.isoff" );

com.apple.ripstop.terminate:

(closes the watchdog service interface, presumably quits ripstop process)

com.apple.ripstop.debug:

(enables a debug flag, will now syslog() details when it receives notifications)
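
The per-notification behaviour listed above can be modelled with a small dispatch function. This is only a sketch under stated assumptions: MockWatchdog and handle_notification are hypothetical names, the struct stands in for the real kernel watchdog service, and stderr stands in for syslog(). None of this is ripstop's actual code.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the watchdog service that ripstop talks to. */
typedef struct {
    bool enabled;      /* counter running (enable/disable timer calls) */
    int  reload_count; /* number of tcoWdLoadTimer-style reloads seen */
    bool debug;        /* set by com.apple.ripstop.debug */
    bool open;         /* cleared by com.apple.ripstop.terminate */
} MockWatchdog;

/* Apply one notification to the mock. Returns the name of the reply that
   would be posted (query only), or NULL when no reply is posted. */
static const char *handle_notification(MockWatchdog *wd, const char *name)
{
    if (wd->debug)
        fprintf(stderr, "ripstop sketch: received %s\n", name);

    if (strcmp(name, "com.apple.riptide.heartbeat") == 0) {
        wd->reload_count++;              /* reload the countdown */
    } else if (strcmp(name, "com.apple.riptide.start") == 0) {
        wd->reload_count++;              /* reload, then enable */
        wd->enabled = true;
    } else if (strcmp(name, "com.apple.riptide.stop") == 0) {
        wd->enabled = false;
    } else if (strcmp(name, "com.apple.ripstop.query") == 0) {
        return wd->enabled ? "com.apple.tcowd.ison"
                           : "com.apple.tcowd.isoff";
    } else if (strcmp(name, "com.apple.ripstop.terminate") == 0) {
        wd->open = false;                /* service interface closed */
    } else if (strcmp(name, "com.apple.ripstop.debug") == 0) {
        wd->debug = true;
    }
    return NULL;
}
```

Starting from a mock with the timer disabled, a start notification followed by a query would yield com.apple.tcowd.ison, and a stop followed by a query would yield com.apple.tcowd.isoff.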

[BRSettingsHelper tellWatchdogWeAreUpAndRunning]

This function sends the 'dogy' command to the SettingsHelper tool inside the BackRow framework bundle. It essentially resets the Watchdog's failure counters in NVRAM, since the Watchdog is only interested in failures to launch the interface. Once the interface launches successfully, this function is called and the data is cleared.

The SettingsHelper performs the following tasks at this point:

  • Sets the boot count to zero.
  • Sets the maximum boot count to five.
  • Clears the system reset reason.
  • Clears the remote state.
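
The counter handling described on this page can be sketched as follows. The names WatchdogState, tell_watchdog_up_and_running and record_failed_boot are hypothetical, the zero values for "cleared" fields are placeholders, and the real data reportedly lives in NVRAM rather than in a C struct. The threshold behaviour is only the page's guess at what happens after repeated failed boots.

```c
#include <stdbool.h>

/* Hypothetical in-memory model of the state SettingsHelper resets. */
typedef struct {
    int boot_count;      /* consecutive boots without a working UI */
    int max_boot_count;  /* threshold before some recovery action */
    int reset_reason;    /* 0 = cleared (placeholder encoding) */
    int remote_state;    /* 0 = cleared (placeholder encoding) */
} WatchdogState;

/* Mirrors the four tasks listed above for the 'dogy' command. */
static void tell_watchdog_up_and_running(WatchdogState *s)
{
    s->boot_count = 0;
    s->max_boot_count = 5;
    s->reset_reason = 0;
    s->remote_state = 0;
}

/* Record one watchdog-triggered reboot. Returns true once the count
   reaches the maximum, i.e. the point where recovery would presumably
   be attempted. */
static bool record_failed_boot(WatchdogState *s)
{
    s->boot_count++;
    return s->boot_count >= s->max_boot_count;
}
```

With this model, four failed boots after a reset stay below the threshold and the fifth reaches it, which matches the "five times in a row" guess.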

How to Disable Watchdog

AppleTV 3.0.2 Method

Create a process to feed watchdog bones. Source code to do this is here: http://epplersoft.com/atv/AppleTVWatchdogFeeder.zip

Method One: The Sedative Patch

Phoem has published source code for sedative, a patch for the AppleTCOWatchdog kext. Be careful: it looks like this will just corrupt the kext on the latest ATV release.

Method Two: Manually Disable Ripstop and Watchdog

First, disable Ripstop. Disabling it and unloading the Watchdog kext has been confirmed to work:

mkdir /etc/mach_init.disabled
mv /etc/mach_init.d/ripstop.plist /etc/mach_init.disabled

Reboot using the following command:

shutdown -r now

Ripstop is now disabled. To disable Watchdog, unload AppleTCOWatchdog.kext:

kextunload -b com.apple.driver.AppleTCOWatchdog

To unload AppleTCOWatchdog.kext automatically at startup, add the command to your local startup script. This is recommended, as it is best to have Ripstop and Watchdog either both enabled or both disabled. In some cases the rc.local file doesn't exist. A way around this is to use su. (su is a rather dangerous command, as it effectively allows you to log in as root, so use it wisely!)

Copy /usr/bin/su from your local OS X installation to the same location on the AppleTV.

Then issue these commands:

sudo su -
touch /etc/rc.local

Then, after you have verified that rc.local exists:

echo "/sbin/kextunload -b com.apple.driver.AppleTCOWatchdog" >> /etc/rc.local

Apple TV 1.1 method: Use Turbo's USB and watchdog hack

You do not need to disable ripstop or unload the AppleTCOWatchdog kext; just patch your /mach_kernel.prelink file as described on: Turbo's AppleTV Hacks @ 0xfeedbeef.com

Untested Methods and Other Information

You can attempt to quit the watchdog process. Watchdog is (or at least used to be) "quit" in a special way. Locate the watchdog process and send it a SIGTERM signal.

From a 'man watchdog' on a Mac OS X Server:

SIGTERM
	   watchdog forces a complete shutdown when it receives the terminate
	   signal.  The automatic reboot timer will be disabled and all
	   executing children will be terminated, forcibly (with SIGKILL) if
	   necessary.  After all children have terminated, watchdog itself exits.
	   watchdog should always be terminated with this signal instead of
	   the kill signal (SIGKILL) to properly disable the automatic reboot
(The full output of 'man watchdog', retrieved from Google cache, is linked at the bottom of the page.)

Give that a try and report the results here?


Yet another failed attempt:

Auto restart automatically reboots the machine after a power failure and is also present on the Mac. You can turn it off using: pmset autorestart 0
You can check the result with pmset -g:

System-wide power settings:
SleepDisabled          1
Active Profiles:
AC Power                -1*
Currently in use:
 disksleep      0 
 hibernatemode  0
 displaysleep   0
 powerbutton    0
 sleep          0
 autorestart    0
 hibernatefile  /var/vm/sleepimage

Apparently BackRow.framework tells Watchdog that "we are up and running":

strings /System/Library/PrivateFrameworks/BackRow.framework/Versions/A/BackRow
(...)
BRSettingsHelper tellWatchdogWeAreUpAndRunning

Here is the code needed to call BackRow functions. Note that you must add the BackRow framework to your project.

//BRSettingsHelper.h
#import <Cocoa/Cocoa.h>


@interface BRSettingsHelper : NSObject {

}

- (void) tellWatchdogWeAreUpAndRunning;  //This function seems to reset the boot count but does not prevent the machine from rebooting.
- (void) reboot;

@end

And now to test it:

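#import <Cocoa/Cocoa.h>
#import "BRSettingsHelper.h"

int main(int argc, char *argv[])
{
    // BRSettingsHelper is implemented in BackRow.framework, so link against it.
    BRSettingsHelper *test = [[BRSettingsHelper alloc] init];

    // Warning: this asks the helper to reboot the machine.
    [test reboot];

    return NSApplicationMain(argc, (const char **) argv);
}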

Note that the app had to be run as root for this to work.

There is also the key:
_kRUIAutoRestartIntervalKey

which is probably used in one of:

-[RUIPreferences boolForKey:]
-[RUIPreferences boolForKey:withValueForMissingPrefs:]
-[RUIPreferences canSetPreferencesForKey:]
-[RUIPreferences descriptionForKey:]
-[RUIPreferences floatForKey:]
-[RUIPreferences integerForKey:]
-[RUIPreferences objectForKey:]
-[RUIPreferences setBool:forKey:]
-[RUIPreferences setFloat:forKey:]
-[RUIPreferences setInteger:forKey:]
-[RUIPreferences setObject:forKey:]
-[RUIPreferences stringForKey:]

This seems to indicate that the auto reboot interval is stored as a key somewhere. I don't know if this is simply used for the Finder.app or if there is something else on the system that will use this.


SettingsHelper has a reference to /sbin/shutdown -r now

Symlinking /sbin/shutdown to /usr/bin/true makes the shutdown command do nothing.

There's also the /usr/sbin/recovery_reboot shell script, which seems to tell OS X to reboot from the recovery partition (for the next boot only). Perhaps (though this is pretty doubtful) it is called when the watchdog detects a problem. The easiest way to check would be to add a line like this to the script:

echo 'oh no, I am being recovery_rebooted' > /blah

and then see whether /blah exists after your ATV has been rebooted by the watchdog.

How to Implement Watchdog Keepalive

The Finder contains a very simple class (called MEWatchdog) that installs a timer firing every 58 seconds. The timer calls a function which does literally the following, in its entirety:

notify_post( "com.apple.riptide.heartbeat" );

This function just posts a system-wide notification, so it can be posted by anything. Unless the Watchdog inspects the interval between notifications, it's likely that any application can post the same heartbeat alongside the Finder, for example if started by something like launchd. This way, the watchdog would not reboot the machine even if the Finder quits, and there are no potential side effects from stopping a system process.
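
A minimal sketch of such a standalone heartbeat feeder is shown below. It assumes that posting the heartbeat from any process is enough to satisfy the watchdog, which this page suspects but has not confirmed. feed_watchdog, HEARTBEAT_NAME and HEARTBEAT_INTERVAL are hypothetical names; only notify_post() and the notification name come from the text above. Off-device, a stub replaces notify_post() so the sketch still compiles.

```c
#include <stdio.h>
#include <unistd.h>

#ifdef __APPLE__
#include <notify.h>
#else
/* Off-device stand-in so the sketch still compiles; it only logs the name. */
static unsigned int notify_post(const char *name)
{
    printf("notify_post(\"%s\")\n", name);
    return 0; /* same value as NOTIFY_STATUS_OK */
}
#endif

#define HEARTBEAT_NAME     "com.apple.riptide.heartbeat"
#define HEARTBEAT_INTERVAL 58 /* seconds, the Finder's interval per this page */

/* Post `count` heartbeats (a negative count means run forever), sleeping
   `interval` seconds after each one. Returns how many posts succeeded. */
static unsigned long feed_watchdog(long count, unsigned int interval)
{
    unsigned long posted = 0;
    long remaining = count;

    while (remaining != 0) {
        if (notify_post(HEARTBEAT_NAME) == 0)
            posted++;
        if (remaining > 0)
            remaining--;
        if (interval > 0)
            sleep(interval);
    }
    return posted;
}
```

A long-running feeder would call feed_watchdog(-1, HEARTBEAT_INTERVAL) from its main function and be kept alive by something like launchd. A short run such as feed_watchdog(3, 0) is handy for checking that the posts succeed.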

As an aside, 'Riptide' appears to be a codename of sorts for the AppleTV software system. Various elements within BackRow print out debug statements including file/line information, which begin with /SourceCache/Riptide-xxx/.

Background information

This Apple document about Watchdog may shed some light:

http://docs.info.apple.com/article.html?artnum=106588&coll=cp (However, contrary to what this document says, there is no /etc/watchdog.conf on the Apple TV.)

Watchdog is not used in Mac OS X 10.4: look at launchd. This is referring to a software watchdog and is probably unrelated to the rebooting problem.

Link to a Google cache (possibly old) output of 'man watchdog' with possibly useful information in it:

http://72.14.253.104/search?q=cache:J8bC2hERg9YJ:www.hmug.org/man/8/watchdog.html+Mac+OS+X+Server+Watchdog&hl=en&ct=clnk&cd=16&gl=us
