The Best Master Clock (BMC) algorithm is both a key part and a key weakness of the PTP standard. The proposed Enterprise Profile for PTP calls it “the so-called ‘best master clock’” algorithm because it doesn’t actually pick the best master clock in an enterprise or telco environment. The BMC requires each clock to advertise its level of accuracy and has the clients pick the best one (hence the name). In enterprise and telco environments, however, the accuracy of the delivered time usually depends much more on packet delays than on the accuracy of the clock itself, so the BMC uses the wrong metric.
Suppose you have two grand masters: one in the same data center as the clients, connected to them over 10G networks, and one connected through a dial-up modem on a phone line at 3800 baud. If the first advertises itself as being 100 nanoseconds from GPS time while the second advertises itself as 50 nanoseconds from GPS time, the BMC says the client is stuck with the second one. The original PTP standard has no smart way of improving this situation because it was designed for dumb clients on one-wire networks, not for smart clients on complex enterprise networks.
Even worse, the original PTP standard has no way to deal with bad grand master clocks that send out bad time but claim to be accurate. When the Euronext system failed two years ago, a broken grand master clock went 35 seconds off time but kept claiming to be accurate to the nanosecond. According to the original PTP standard, the clients would be forced to keep using that bad time even if they could detect the problem. In fact, if you interpret the basic PTP standard in the way that makes dumb clients easiest to write, the grand masters themselves are supposed to turn off when they see a “better” time source. So in the situation described in the preceding paragraph, the clock on the 10G network is supposed to go silent and let the packets coming over the dial-up modem control the clients. That’s why the proposed Enterprise PTP Profile has that snide line about the “so-called best master clock” algorithm: the BMC often chooses a clock that is far from the best.
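To make the mismatch concrete, here is a toy Python model of the two-grand-master scenario. It is a simplified sketch, not the IEEE 1588 BMC algorithm: the real dataset comparison also looks at fields such as priority1, clockClass, clockAccuracy and offsetScaledLogVariance, and here a single advertised-error number stands in for all of them. The helper names and the path-uncertainty figures (about 1 µs for the 10G LAN, 50 ms for the modem) are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class GrandMaster:
    name: str
    advertised_error_ns: int  # accuracy the clock claims relative to GPS time
    path_uncertainty_ns: int  # hypothetical packet-delay error on the path to the client


def bmc_style_choice(masters):
    # Simplified BMC-like rule: trust only what each clock advertises about itself.
    return min(masters, key=lambda m: m.advertised_error_ns)


def delay_aware_choice(masters):
    # Alternative rule: estimate the error the client actually sees,
    # i.e. the advertised clock error plus the uncertainty added by the network path.
    return min(masters, key=lambda m: m.advertised_error_ns + m.path_uncertainty_ns)


local = GrandMaster("local-10G", advertised_error_ns=100, path_uncertainty_ns=1_000)
modem = GrandMaster("dial-up", advertised_error_ns=50, path_uncertainty_ns=50_000_000)
masters = [local, modem]

print(bmc_style_choice(masters).name)    # dial-up
print(delay_aware_choice(masters).name)  # local-10G
```

The only difference between the two functions is the metric passed to `min`, which is the point of the criticism: the selection machinery is fine, but ranking by advertised clock accuracy ignores the network path that dominates the error in enterprise and telco networks.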
TimeKeeper is based on a smart-client approach. Our clients will switch away from a clock that goes wrong to a better PTP or NTP source, because the primary goal is to track time accurately even when the standard is inadequate. TimeKeeper stays compatible with the standards but makes decisions in a wider context: we run multiple client contexts for both PTP and NTP at the same time and use real-time data analysis to decide which one gets to control the clock. This system is used in some of the most demanding transaction systems in the financial trading markets.
PTP-2008, the latest version of the standard, takes some halting steps toward fixing the BMC problem through complex (untested and unimplemented) ideas such as Grand Master Clustering. PTP-2008 also introduces “profiles” that can replace or amend the BMC, at the cost of making the standard less standard. Both the Telecom Profile and the proposed Enterprise Profile explicitly develop ways to defeat the BMC. Here’s a requirement from the Enterprise Profile proposal:
Slave clocks MUST be able to operate properly in network which contains multiple Masters in multiple domains. Slaves SHOULD make use of information from all the Masters in their clock control subsystems.
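Using information from all masters is what lets a client catch a Euronext-style failure, where one grand master is 35 seconds off but still advertises nanosecond accuracy. Below is a minimal sketch, assuming the client can measure an offset and an error estimate against each source: compare each source's offset with the median of all of them (a simple majority vote), discard the sources that disagree, and let the least uncertain survivor steer the clock. This is not TimeKeeper's algorithm, which is not described here; the 1 ms threshold, the `SourceSample` fields and the sample values are hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class SourceSample:
    name: str
    offset_ns: int       # measured offset of the local clock relative to this source
    uncertainty_ns: int  # hypothetical error bound for that measurement


def select_source(samples, max_disagreement_ns=1_000_000):
    """Reject sources that disagree with the majority, then pick the least uncertain survivor."""
    if len(samples) < 3:
        raise ValueError("need at least three sources to outvote a bad one")
    consensus = median(s.offset_ns for s in samples)
    trusted = [s for s in samples if abs(s.offset_ns - consensus) <= max_disagreement_ns]
    rejected = [s.name for s in samples if abs(s.offset_ns - consensus) > max_disagreement_ns]
    if not trusted:
        # No majority agrees (possible with an even number of sources): refuse to choose.
        return None, rejected
    best = min(trusted, key=lambda s: s.uncertainty_ns)
    return best, rejected


samples = [
    # Broken grand master: 35 seconds off, yet it claims the smallest error.
    SourceSample("ptp-gm-a", offset_ns=-35_000_000_000, uncertainty_ns=50),
    SourceSample("ptp-gm-b", offset_ns=150, uncertainty_ns=200),
    SourceSample("ntp-1", offset_ns=-2_000, uncertainty_ns=500_000),
]

best, rejected = select_source(samples)
print(best.name)  # ptp-gm-b
print(rejected)   # ['ptp-gm-a']
```

A rule that trusts advertised accuracy alone would pick `ptp-gm-a` here because it claims 50 ns; cross-checking the sources against each other exposes it. The median vote needs at least three independent sources to work, which is one reason to run several PTP and NTP contexts at once.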
We think that’s a great idea, which is why our client software has been managing multiple PTP and NTP sources for years. I used to give talks at timing conferences on our fault-tolerance methods and would be met with either incomprehension or lectures about how the BMC had already solved the problem. It is gratifying to see wider appreciation of the limitations of the BMC, even if our solutions have not yet been appreciated (except by our customers). Below is a screenshot of a TimeKeeper instance monitoring three time sources against each other.