Fault tolerance, TimeKeeper, and PTP Proposed Enterprise Profile

One of the most interesting things we saw in the proposed IEEE 1588 enterprise profile was a bold suggestion on fault tolerance that looked familiar. Here’s FSMLabs press release from September 2011

TimeKeeper 5.0 offers the ability to monitor multiple time distribution channels, even those operating on different time distribution standards or of different quality due to distance or network issues. As an example, a TimeKeeper client may monitor two different Precision Time Protocol (PTP) “master clocks” and three different Network Time Protocol (NTP) servers. In addition, if the time quality of TimeKeeper’s primary sources becomes questionable, TimeKeeper can now switch from tracking one time source to another, according to a fail-over list provided at configuration time.

This press release described products that were already in the field in production. I remember that although customers liked this capability, talks at timing conferences often provoked complaints from engineers who insisted that the PTP “Best Master Clock” protocol already solved the problem. Anyways, it was gratifying to see that by February 2015 a similar, scaled down, capability was being proposed for the PTP Enterprise Profile.

Clocks SHOULD include support for multiple domains. The purpose is to support multiple simultaneous masters for redundancy. Leaf devices (non-forwarding devices) can use timing information from multiple masters by combining information from multiple instantiations of a PTP stack, each operating in a different domain. Redundant sources of timing can be ensembled, and/or compared to check for faulty master clocks. The use of multiple simultaneous masters will help mitigate faulty masters reporting as healthy, network delay asymmetry, and security problems. Security problems include man-in-the-middle attacks such as delay attacks, packet interception / manipulation attacks. Assuming the path to each master is different, failures malicious or otherwise would have to happen at more than one path simultaneously. Whenever feasible, the underlying network transport technology SHOULD be configured so that timing messages in different domains traverse different network paths.

Note that there are three things missing from this proposal that were in TimeKeeper 5.0 back in 2011: the ability to use NTP sources as well as PTP, the ability to use multiple PTP sources in the same domain, and working software. Stating “SHOULD” in a standard is a long way from “works in the field” but recognition of the problem is a good step.

TimeKeeper is now at version 7.