Watchdog monitors the health of vendor services and the VHAL service, and
terminates any unhealthy process. When it terminates an unhealthy process, Watchdog
dumps the process status to /data/anr, as with other Application Not Responding
(ANR) dumps. Doing so facilitates debugging.
Vendor service health monitoring
Vendor services are monitored on both the native and Java sides. For a vendor service to be monitored, the service must register a health checking process with Watchdog by specifying a predefined timeout. Watchdog monitors the health of a registered health checking process by pinging it at an interval relative to the timeout specified during registration. When a pinged process doesn't respond within the timeout, the process is considered unhealthy.
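The ping-and-timeout cycle described above can be pictured with a small, self-contained simulation. This is only a sketch of the idea, not the Watchdog daemon's implementation: HealthMonitor, registerClient, ping, tellAlive, and findUnhealthy are hypothetical names chosen for illustration, and the current time is passed in explicitly to keep the logic easy to follow.

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Millis = std::chrono::milliseconds;

// Hypothetical model of a monitor that pings registered clients.
class HealthMonitor {
public:
    // A client registers with the timeout it promises to respond within.
    void registerClient(const std::string& name, Millis timeout) {
        mClients[name] = Client{timeout, Millis{0}, 0, false};
    }

    // Sends a ping at time `now` and returns the session ID the client must echo back.
    int32_t ping(const std::string& name, Millis now) {
        Client& client = mClients.at(name);
        client.pingedAt = now;
        client.sessionId = ++mLastSessionId;
        client.awaitingReply = true;
        return client.sessionId;
    }

    // Records a reply; replies carrying a stale session ID are ignored.
    void tellAlive(const std::string& name, int32_t sessionId) {
        Client& client = mClients.at(name);
        if (client.awaitingReply && client.sessionId == sessionId) {
            client.awaitingReply = false;
        }
    }

    // Returns the clients whose ping has gone unanswered for longer than their timeout.
    std::vector<std::string> findUnhealthy(Millis now) const {
        std::vector<std::string> unhealthy;
        for (const auto& entry : mClients) {
            const Client& client = entry.second;
            if (client.awaitingReply && now - client.pingedAt > client.timeout) {
                unhealthy.push_back(entry.first);
            }
        }
        return unhealthy;
    }

private:
    struct Client {
        Millis timeout;
        Millis pingedAt;
        int32_t sessionId;
        bool awaitingReply;
    };
    std::map<std::string, Client> mClients;
    int32_t mLastSessionId = 0;
};
```

In this model, a client that answers a ping with the matching session ID before its timeout elapses is never reported by findUnhealthy; a client that stays silent past the timeout is.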
Native service health monitoring
Specify the Watchdog AIDL makefile
- Include carwatchdog_aidl_interface-ndk_platform in shared_libs.
Android.bp
    cc_binary {
        name: "sample_native_client",
        srcs: [
            "src/*.cpp"
        ],
        shared_libs: [
            "carwatchdog_aidl_interface-ndk_platform",
            "libbinder_ndk",
        ],
        vendor: true,
    }
Add an SELinux policy
- To add an SELinux policy, allow the vendor service domain to use binder (binder_use macro) and add the vendor service domain to the carwatchdog client domain (carwatchdog_client_domain macro). See the code below for sample_client.te and file_contexts:
sample_client.te
    type sample_client, domain;
    type sample_client_exec, exec_type, file_type, vendor_file_type;
    carwatchdog_client_domain(sample_client)
    init_daemon_domain(sample_client)
    binder_use(sample_client)
file_contexts
    /vendor/bin/sample_native_client  u:object_r:sample_client_exec:s0
Implement a client class by inheriting BnCarWatchdogClient
- In checkIfAlive, perform a health check. One option is to post to the thread loop handler. If healthy, call ICarWatchdog::tellClientAlive. See the code below for SampleNativeClient.h and SampleNativeClient.cpp:
SampleNativeClient.h
    class SampleNativeClient : public BnCarWatchdogClient {
    public:
        ndk::ScopedAStatus checkIfAlive(int32_t sessionId, TimeoutLength timeout) override;
        ndk::ScopedAStatus prepareProcessTermination() override;
        void initialize();

    private:
        void respondToDaemon();

    private:
        ::android::sp<::android::Looper> mHandlerLooper;
        std::shared_ptr<ICarWatchdog> mWatchdogServer;
        std::shared_ptr<ICarWatchdogClient> mClient;
        int32_t mSessionId;
    };
SampleNativeClient.cpp
    ndk::ScopedAStatus SampleNativeClient::checkIfAlive(int32_t sessionId, TimeoutLength timeout) {
        // Drop any pending check so that only the latest session is answered.
        mHandlerLooper->removeMessages(mMessageHandler, WHAT_CHECK_ALIVE);
        mSessionId = sessionId;
        mHandlerLooper->sendMessage(mMessageHandler, Message(WHAT_CHECK_ALIVE));
        return ndk::ScopedAStatus::ok();
    }

    // WHAT_CHECK_ALIVE triggers respondToDaemon from thread handler
    void SampleNativeClient::respondToDaemon() {
        // your health checking method here
        ndk::ScopedAStatus status = mWatchdogServer->tellClientAlive(mClient, mSessionId);
    }
Start a binder thread and register the client
The car watchdog daemon interface name is android.automotive.watchdog.ICarWatchdog/default.
- Search for the daemon by this name and call ICarWatchdog::registerClient. See the code below for main.cpp and SampleNativeClient.cpp:
main.cpp
    int main(int argc, char** argv) {
        sp<Looper> looper(Looper::prepare(/*opts=*/0));

        ABinderProcess_setThreadPoolMaxThreadCount(1);
        ABinderProcess_startThreadPool();
        std::shared_ptr<SampleNativeClient> client =
                ndk::SharedRefBase::make<SampleNativeClient>(looper);

        // The client is registered in initialize()
        client->initialize();
        ...
    }
SampleNativeClient.cpp
    void SampleNativeClient::initialize() {
        // Look up the car watchdog daemon by its interface name.
        ndk::SpAIBinder binder(AServiceManager_getService(
                "android.automotive.watchdog.ICarWatchdog/default"));
        std::shared_ptr<ICarWatchdog> server = ICarWatchdog::fromBinder(binder);
        mWatchdogServer = server;
        // Obtain this object's own binder to register it as the client.
        ndk::SpAIBinder clientBinder = this->asBinder();
        std::shared_ptr<ICarWatchdogClient> client = ICarWatchdogClient::fromBinder(clientBinder);
        mClient = client;
        server->registerClient(client, TimeoutLength::TIMEOUT_NORMAL);
    }
Java service health monitoring
Implement a client by inheriting CarWatchdogClientCallback
- Edit the new file as follows:
    private final CarWatchdogClientCallback mClientCallback = new CarWatchdogClientCallback() {
        @Override
        public boolean onCheckHealthStatus(int sessionId, int timeout) {
            // Your health check logic here
            // Returning true implies the client is healthy
            // If false is returned, the client should call
            // CarWatchdogManager.tellClientAlive after health check is
            // completed
            return true;
        }

        @Override
        public void onPrepareProcessTermination() {}
    };
Register the client
- Call CarWatchdogManager.registerClient():
    private void startClient() {
        CarWatchdogManager manager =
                (CarWatchdogManager) car.getCarManager(Car.CAR_WATCHDOG_SERVICE);
        // Choose a proper executor according to your health check method
        ExecutorService executor = Executors.newFixedThreadPool(1);
        manager.registerClient(executor, mClientCallback,
                CarWatchdogManager.TIMEOUT_NORMAL);
    }
Unregister the client
- Call CarWatchdogManager.unregisterClient() when the service is finished:
    private void finishClient() {
        CarWatchdogManager manager =
                (CarWatchdogManager) car.getCarManager(Car.CAR_WATCHDOG_SERVICE);
        manager.unregisterClient(mClientCallback);
    }
VHAL health monitoring
Unlike vendor service health monitoring, Watchdog monitors the VHAL service
health by subscribing to the VHAL_HEARTBEAT
vehicle property.
Watchdog expects the value of this property to be updated once every N seconds.
When the heartbeat is not updated within this timeout, Watchdog terminates the VHAL
service.
Note: Watchdog monitors the VHAL service health only when
the VHAL_HEARTBEAT
vehicle property is supported by the VHAL service.
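The heartbeat rule can be sketched as a staleness check. This is an illustrative model only, not Watchdog's actual code: HeartbeatTracker, onHeartbeat, and isStale are hypothetical names, and the sketch assumes that "not updated within this timeout" means strictly more than one interval has passed since the last update. The real service may apply additional grace periods.

```cpp
#include <chrono>

// Hypothetical tracker for the most recent VHAL_HEARTBEAT update.
class HeartbeatTracker {
public:
    // `interval` plays the role of N in "once every N seconds".
    explicit HeartbeatTracker(std::chrono::seconds interval) : mInterval(interval) {}

    // Called whenever a new heartbeat value arrives.
    void onHeartbeat(std::chrono::milliseconds now) { mLastUpdate = now; }

    // True when more than one interval has elapsed since the last heartbeat
    // (assumption: the comparison is strict).
    bool isStale(std::chrono::milliseconds now) const {
        return now - mLastUpdate > mInterval;
    }

private:
    std::chrono::seconds mInterval;
    std::chrono::milliseconds mLastUpdate{0};
};
```

A monitor built on this idea would call onHeartbeat from its property-change callback and poll isStale periodically, treating a stale heartbeat as the signal that the VHAL service is unhealthy.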
VHAL internal implementation can vary by vendor. Use the following code samples as references.
- Register the VHAL_HEARTBEAT vehicle property.
When starting the VHAL service, register the VHAL_HEARTBEAT vehicle property. In the example below, an unordered_map, which maps property ID to config, is used to hold all supported configs. The config for VHAL_HEARTBEAT is added to the map so that when VHAL_HEARTBEAT is queried, the corresponding config is returned.
    void registerVhalHeartbeatProperty() {
        const VehiclePropConfig config = {
                .prop = toInt(VehicleProperty::VHAL_HEARTBEAT),
                .access = VehiclePropertyAccess::READ,
                .changeMode = VehiclePropertyChangeMode::ON_CHANGE,
        };
        // mConfigsById is declared as std::unordered_map<int32_t, VehiclePropConfig>.
        mConfigsById[config.prop] = config;
    }
- Update the VHAL_HEARTBEAT vehicle property.
Based on the VHAL health check frequency (explained in "Define the frequency of VHAL health check"), update the VHAL_HEARTBEAT vehicle property once every N seconds. One way to do this is to use RecurrentTimer to call the action that checks the VHAL health and updates the VHAL_HEARTBEAT vehicle property within the timeout.
Shown below is a sample implementation using RecurrentTimer:
    int main(int argc, char** argv) {
        RecurrentTimer recurrentTimer(updateVhalHeartbeat);
        recurrentTimer.registerRecurrentEvent(kHeartBeatIntervalNs,
                static_cast<int32_t>(VehicleProperty::VHAL_HEARTBEAT));
        // … Run service …
        recurrentTimer.unregisterRecurrentEvent(
                static_cast<int32_t>(VehicleProperty::VHAL_HEARTBEAT));
    }

    void updateVhalHeartbeat(const std::vector<int32_t>& cookies) {
        for (int32_t property : cookies) {
            if (property != static_cast<int32_t>(VehicleProperty::VHAL_HEARTBEAT)) {
                continue;
            }

            // Perform internal health checking such as retrieving a vehicle property to ensure
            // the service is responsive.
            doHealthCheck();

            // Construct the VHAL_HEARTBEAT property with system uptime.
            VehiclePropValuePool valuePool;
            VehicleHal::VehiclePropValuePtr propValuePtr = valuePool.obtainInt64(uptimeMillis());
            propValuePtr->prop = static_cast<int32_t>(VehicleProperty::VHAL_HEARTBEAT);
            propValuePtr->areaId = 0;
            propValuePtr->status = VehiclePropertyStatus::AVAILABLE;
            propValuePtr->timestamp = elapsedRealtimeNano();

            // Propagate the HAL event.
            onHalEvent(std::move(propValuePtr));
        }
    }
- (Optional) Define the frequency of VHAL health check.
Watchdog's ro.carwatchdog.vhal_healthcheck.interval read-only product property defines the VHAL health check frequency. The default health check frequency (when this property is not defined) is three seconds. If three seconds isn't sufficient for the VHAL service to update the VHAL_HEARTBEAT vehicle property, define the VHAL health check frequency according to the service's responsiveness.
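The fallback to three seconds can be sketched as follows. This is a hedged illustration rather than Watchdog's code: resolveHealthCheckInterval is a hypothetical helper, its string argument stands in for the raw value of ro.carwatchdog.vhal_healthcheck.interval (an empty string meaning "not defined"), and it assumes the property value is a whole number of seconds. Falling back to the default for malformed or non-positive values is also an assumption.

```cpp
#include <chrono>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>

// Default stated in the text: three seconds when the property is not defined.
constexpr std::chrono::seconds kDefaultInterval{3};

// Hypothetical helper. `rawValue` stands in for the property's raw value;
// an empty string means the property is not defined.
std::chrono::seconds resolveHealthCheckInterval(const std::string& rawValue) {
    if (rawValue.empty()) {
        return kDefaultInterval;
    }
    try {
        std::size_t consumed = 0;
        const int seconds = std::stoi(rawValue, &consumed);
        // Assumption: malformed or non-positive values fall back to the default.
        if (consumed != rawValue.size() || seconds <= 0) {
            return kDefaultInterval;
        }
        return std::chrono::seconds(seconds);
    } catch (const std::exception&) {
        // std::stoi throws on non-numeric or out-of-range input.
        return kDefaultInterval;
    }
}
```

Whatever interval is chosen, the heartbeat update period used by the VHAL service has to fit within it, otherwise the heartbeat is considered late.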
Debug unhealthy processes terminated by the Watchdog
Watchdog dumps the process state and terminates unhealthy processes. When terminating
an unhealthy process, Watchdog logs the text carwatchdog killed
<process name> (pid: <process id>)
to logcat. This log line
identifies the terminated process by its process name and process
ID.
- To search logcat for this text, run:
    $ adb logcat -s CarServiceHelper | fgrep "carwatchdog killed"
For example, when the KitchenSink app is a registered Watchdog client and becomes unresponsive to Watchdog pings, Watchdog logs a line such as the following when terminating the registered KitchenSink process:
    05-01 09:50:19.683 578 5777 W CarServiceHelper: carwatchdog killed com.google.android.car.kitchensink (pid: 5574)
- To identify the root cause of the unresponsiveness, use the process dump stored at /data/anr just as you would for activity ANR cases. To retrieve the dump file for the terminated process, use the following commands:
    $ adb root
    $ adb shell grep -Hn "pid process_pid" /data/anr/*
The following sample output is specific to the KitchenSink app:
    $ adb shell su root grep -Hn "pid 5574" /data/anr/*
    /data/anr/anr_2020-05-01-09-50-18-290:3:----- pid 5574 at 2020-05-01 09:50:18 -----
    /data/anr/anr_2020-05-01-09-50-18-290:285:----- Waiting Channels: pid 5574 at 2020-05-01 09:50:18 -----
The dump file for the terminated KitchenSink process is located at /data/anr/anr_2020-05-01-09-50-18-290. Start your analysis using the terminated process's ANR dump file.
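When many terminations need to be triaged, the two manual steps above (reading the PID from the log line, then searching /data/anr for it) can be scripted. The sketch below is one possible approach, assuming the log line has the same shape as the KitchenSink sample; KilledProcess, parseKilledLine, and anrSearchString are hypothetical names, and the pattern may need adjusting if the log format differs on your build.

```cpp
#include <regex>
#include <string>

// Hypothetical holder for the fields carried by the log line.
struct KilledProcess {
    std::string name;
    int pid = -1;
};

// Extracts the process name and PID from a "carwatchdog killed" log line.
// Returns false when the line does not match the assumed shape.
bool parseKilledLine(const std::string& line, KilledProcess* out) {
    static const std::regex kPattern(R"(carwatchdog killed (\S+) \(pid: ?(\d+)\))");
    std::smatch match;
    if (!std::regex_search(line, match, kPattern)) {
        return false;
    }
    out->name = match[1].str();
    out->pid = std::stoi(match[2].str());
    return true;
}

// Builds the string to pass to grep when searching the dumps in /data/anr.
std::string anrSearchString(int pid) {
    return "pid " + std::to_string(pid);
}
```

For the KitchenSink sample line, the parsed PID is 5574, and the resulting search string matches the one used in the grep command shown earlier.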