Comparing NTP and PTP (2015)

Choosing between PTP and NTP

Many of our customers ask about the trade-offs between using PTP (IEEE 1588) and NTP for time synchronization. Since TimeKeeper supports both, I thought it would be a good idea to highlight a few of the differences between the two and why you might choose one over the other. (Another post provides a more business-oriented analysis.)

The executive summary from this article is: if you’re using a high quality implementation of NTP or PTP (such as TimeKeeper) they’re both equivalent and both can give accuracy better than 1 microsecond, often into the low hundreds of nanoseconds.  If you use low quality implementations, quality and accuracy can vary greatly.

First off: there is no accuracy difference between the two protocols, although PTP vendors tend to preach otherwise because of the false belief that PTP requires a large investment in new hardware. People who say that PTP is better are often selling PTP. Unfortunately, it has become common wisdom that “NTP cannot deliver accuracy better than 1 ms” or “PTP accuracy is far superior to NTP”, but neither of those is correct. Where PTP significantly outperforms NTP, it’s due to a poor NTP implementation, not a problem with the protocol. There is no secret protocol magic in either. Both protocols do the same thing and carry the same information. They do differ in how they operate, and depending on your environment one may be a better choice than the other. The difference in performance that you’re going to get on your network depends on the implementation of the NTP/PTP hardware and software and not on the protocol itself.

You can run both NTP and PTP together - on the same network, at the same time.  With TimeKeeper you can serve both from the same device - giving out the same time.  You can also track both on the same client at the same time with TimeKeeper.  That allows you to compare the performance of the two directly, so there’s no guessing which is a better fit.

Here’s an example of TimeKeeper tracking a remote NTP and PTP source, demonstrating an offset in the PTP signal:

[Figure: TimeKeeper tracking a remote NTP source and a remote PTP source]

Now let’s look at some of the significant differences in how the protocols operate.

Current (legacy) generation network time appliances

Most current network appliances are best considered ‘legacy’, since they lag behind current network and timing requirements. They tend to have 100 Mbit ports and may provide 1G ports as an upgrade option. (This is unchanged from when this topic was first covered in 2012.) Appliances may make claims of PTP accuracy, but where NTP is supported there are few if any accuracy claims made for it. Vendors just take a copy of the reference implementation of NTP (ntpd, from the University of Delaware) rather than creating a more accurate version. The accuracy and performance of ntpd are well known to be poor, and this lack of accuracy is the root of many of the misconceptions mentioned above. Downloading and putting this software on a time server appliance, as most vendors do, delivers the same poor accuracy as ntpd delivers anywhere else.

With these older devices you’re going to see poor NTP performance but generally good PTP performance.  That’s just a result of the quality of the PTP and NTP implementations on those boxes - not due to differences in the protocols.

TimeKeeper is a next-generation, top-notch timing server and client with high quality implementations of NTP and PTP. With our implementation, PTP and NTP show the same accuracy - in the low hundreds of nanoseconds. TimeKeeper was designed from the ground up to distribute and track time sources in the world of high frequency trading, where accuracy and performance are paramount. Using a TimeKeeper-based appliance to serve or track time means you’re going to see next-generation performance from it.

Now, onto some specific protocol differences.

Network traffic - which is more scalable?

NTP exchanges are initiated by the client - it sends a request to the server and the server responds, allowing the client to get the current time and calculate the one-way delay. So, for every time update of a client there are 2 messages on the network. PTP updates are initiated by the server, with two messages broadcast from the grandmaster. Slaves then send a message to the master and receive a response in order to compute the one-way delay. There are other types of packets that are exchanged periodically with PTP also. Generally, as the number of clients scales up you’ll see slightly more traffic using PTP compared to NTP. Only in very large installations is the additional traffic a significant concern. In many cases the difference in traffic won’t be noticed at all.
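To make the NTP exchange concrete, here is a minimal Python sketch of the standard four-timestamp calculation a client can do from one request/response pair. The function name and the example timestamps are made up for illustration, and the one-way delay is taken as half the round trip, which assumes a symmetric path.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation, all times in seconds.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    """
    # How far the client clock is behind (+) or ahead of (-) the server.
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    # Time spent on the wire, excluding server processing time.
    round_trip = (t4 - t1) - (t3 - t2)
    return offset, round_trip


# Illustrative numbers: client is 5 ms behind the server, the path takes
# 1 ms each way, and the server spends 0.2 ms handling the request.
offset, delay = ntp_offset_and_delay(t1=10.0000, t2=10.0060,
                                     t3=10.0062, t4=10.0022)
one_way_delay = delay / 2.0  # assumes the path is symmetric
```

Both values come out of a single exchange, which is the property the later sections rely on.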

Both NTP and PTP are optimally accurate at one update per second in most deployments. NTP is a unicast protocol in most cases. That’s a point-to-point UDP transaction. PTP defaults to a multicast protocol - so broadcast messages are sent across the network for every packet. PTP can have a much heavier impact on the network since broadcast messages are delivered everywhere, including messages that really only have to go from the client to the grandmaster. There are different profiles available that cause PTP to be less multicast dependent, and behave more like NTP.

With N clients using NTP, expect to see roughly 2N packets per time update. With N clients using a common hybrid unicast/multicast PTP (more details below), expect to see roughly 2 + 2N packets per time update.
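As a quick back-of-the-envelope check, the two estimates can be written out in Python. The function names are ours, and the PTP figure assumes every client sends one delay request per update, which is what the 2 + 2N estimate implies.

```python
def ntp_packets_per_update(n_clients):
    # One request and one response per client.
    return 2 * n_clients


def hybrid_ptp_packets_per_update(n_clients):
    # Two multicast messages from the grandmaster, plus one unicast
    # delay request and one response per client.
    return 2 + 2 * n_clients


for n in (10, 1000):
    print(n, ntp_packets_per_update(n), hybrid_ptp_packets_per_update(n))
```

The constant difference of two packets is why the gap is rarely noticeable in hybrid mode; fully multicast deployments are a different story, since every client also sees every other client's traffic.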

Multicast headaches, network policy and ports

As mentioned above, PTP is usually distributed as multicast. That can be a hassle to administer because it can generate extra traffic and requires special rules to forward between network segments - especially between distant sites. NTP is unicast, requires no special routing rules, and only the sender and intended recipient have to see or handle those packets.

The UDP port used for NTP (port 123) is often already open and network administrators know what it is, as it’s been a standard on the internet for decades.  PTP uses 2 UDP ports (it also has a raw ethernet mode, but we’re not concerned with that here) - port 319 and port 320.  They’re broadcast ports so they require rules for forwarding and multiple-hop connections. That is another set of ports to open up that may be blocked on your network.

PTP supports several profiles, allowing for different amounts of multicast data. Deployments may be fully multicast - every client sees every update from the grandmaster and every request from every other client to the grandmaster. A hybrid approach is also an option, where time updates are broadcast from grandmaster to clients, which then make unicast delay requests back to the server. Taken further, PTP can be fully unicast like NTP, where the server sends unicast time updates to each client, and each client makes unicast requests back to the server to calculate network delays. TimeKeeper supports the whole spectrum, allowing users to find the right fit for their environment.

With PTP, switch capabilities also are worth considering. Switches that act as boundary clocks or transparent clocks have specific PTP knowledge in order to manipulate PTP traffic that is flowing by. Some switches handle these tasks well, some can degrade the PTP accuracy or break the protocol entirely. Many times in diagnosing a PTP distribution issue, we find a buggy switch is at fault.

Hardware timestamping NICs

Commodity network cards that can assist in time synchronization are nearly everywhere now. In many cases they’re built into the motherboards of computers as standard NICs. These cards have built-in oscillators that are used to timestamp packets when they come off the wire. Timestamping at the NIC improves accuracy greatly compared to software timestamping, as the operating system may be variably delayed in creating a timestamp depending on load. Some cards handle PTP as a special case and only timestamp PTP packets. Others timestamp any packet, whether it is PTP, NTP, or anything else.

TimeKeeper supports hardware timestamping on just about any card that supports it, including cards from Mellanox, Solarflare, Intel, Broadcom, etc. Our tests across the different hardware timestamping NICs available show about 250 nanosecond accuracy with both NTP and PTP, provided the card hardware timestamps the packets of the protocol in question.

Types of networks

Ideally, a quiet network offers the best accuracy for both PTP and NTP. Both protocols need to understand the delay involved in getting a time update from the server to the client, and on a quiet network that delay will be nearly constant. Traffic variability will introduce some noise in that delay time, but a good timing client like TimeKeeper will filter that noise out.

With PTP, a slave must explicitly request round-trip timing from the master in a special message and wait for the response. With NTP the client is able to calculate the round-trip delay on every message, since each exchange provides all of the information. This means that NTP is able to use that round-trip delay to correctly compute the current time when a message is received. It can also determine if an NTP query has run into a delay and might be inaccurate. Having access to this information as part of the protocol means NTP can discard bad values very easily. PTP clients must be careful to avoid relying on possibly old and inaccurate estimates of the transit time of the master’s timing packets. For that reason PTP might not be the best choice where the client can’t always send a delay request to correspond with every update from the grandmaster.
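Here is a rough Python sketch of what "discarding bad values" can look like when every sample carries its own round-trip delay. This is a simplified illustration, not TimeKeeper's or ntpd's actual filter; the function name and the 0.5 ms tolerance are assumptions picked for the example.

```python
def filter_by_delay(samples, tolerance=0.0005):
    """Keep samples whose round-trip delay is close to the best one seen.

    samples: list of (offset, round_trip_delay) tuples, in seconds.
    tolerance: hypothetical cut-off above the minimum delay, in seconds.
    """
    if not samples:
        return []
    best = min(delay for _, delay in samples)
    # A sample that took much longer than the best round trip probably
    # queued somewhere, so its offset is less trustworthy.
    return [(o, d) for o, d in samples if d - best <= tolerance]


# Made-up samples: the third one hit congestion and reports a bad offset.
samples = [(0.000010, 0.0020), (0.000012, 0.0021),
           (0.000900, 0.0040), (0.000011, 0.0020)]
kept = filter_by_delay(samples)
```

The point is only that the delay arrives with every NTP sample, so a check like this needs no extra messages.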

A particular implementation of PTP can choose to request a response from the server on every single time update and emulate NTP’s behavior.  That makes detecting and correcting for errors more feasible but it is not required by the protocol itself.  TimeKeeper operates in this way but not all implementations do.
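The effect of a stale delay estimate is easy to show with the standard PTP delay request-response arithmetic. The sketch below is in Python with made-up timestamps; the function names are ours, and the slave is assumed to be perfectly in sync so any non-zero offset is pure error.

```python
def ptp_mean_path_delay(t1, t2, t3, t4):
    """t1: master sends Sync, t2: slave receives Sync,
    t3: slave sends Delay_Req, t4: master receives Delay_Req (seconds)."""
    return ((t2 - t1) + (t4 - t3)) / 2.0


def ptp_offset(t1, t2, mean_path_delay):
    """Offset of the slave clock from the master for one Sync message."""
    return (t2 - t1) - mean_path_delay


# First exchange: the path takes 1 ms each way.
old_delay = ptp_mean_path_delay(t1=0.0, t2=0.001, t3=0.002, t4=0.003)

# Later the path slows to 1.5 ms each way. Reusing the old delay estimate
# makes a perfectly synchronized slave look 0.5 ms off.
stale_offset = ptp_offset(t1=1.0, t2=1.0015, mean_path_delay=old_delay)

# Measuring the delay again alongside this Sync removes the error.
new_delay = ptp_mean_path_delay(t1=1.0, t2=1.0015, t3=1.002, t4=1.0035)
fresh_offset = ptp_offset(t1=1.0, t2=1.0015, mean_path_delay=new_delay)
```

Pairing a delay request with every update, as described above, is what keeps the estimate fresh.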

Failover/redundancy

PTP has a built-in mechanism for handling failure of a server and using another.  How that’s done is defined in the PTP “Best Master Clock” algorithm (BMC). This algorithm allows multiple servers to broadcast time on the same network, but clients will select and use the same server based on their local execution of the algorithm.  If the client implementation obeys the PTP protocol fully, when one server fails or self-reported accuracy is reduced the client will begin using another. Note that this also assumes the server can correctly identify its own accuracy, which is not always the case.
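For a feel of how clients end up agreeing on one server, here is a simplified Python sketch of the attribute comparison at the heart of the BMC. It only covers the ordering of announced grandmaster attributes (lower wins at each step) and leaves out the topology tie-breaking in the full IEEE 1588 algorithm. The class and function names are ours and the attribute values are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnouncedClock:
    priority1: int
    clock_class: int
    clock_accuracy: int
    offset_scaled_log_variance: int
    priority2: int
    clock_identity: bytes

    def rank(self):
        # Compared field by field, in this order; lower is better.
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.offset_scaled_log_variance, self.priority2,
                self.clock_identity)


def best_master(clocks):
    return min(clocks, key=AnnouncedClock.rank)


# Two grandmasters with equal priority; gm_a has the lower clock identity.
gm_a = AnnouncedClock(128, 6, 0x21, 0x4E5D, 128, bytes.fromhex("0000000000000001"))
gm_b = AnnouncedClock(128, 6, 0x21, 0x4E5D, 128, bytes.fromhex("0000000000000002"))
winner = best_master([gm_a, gm_b])

# If gm_a reports a worse clock class (e.g. it lost its reference),
# every client running the same comparison switches to gm_b.
gm_a_degraded = AnnouncedClock(128, 7, 0x21, 0x4E5D, 128, gm_a.clock_identity)
winner_after = best_master([gm_a_degraded, gm_b])
```

Note that the whole decision rests on what each server announces about itself, which is exactly the weakness mentioned above.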

While the BMC is helpful, it’s insufficient in practice. A specific failover scenario may be desired, but the BMC may behave differently based on the state of any existing grandmasters at the time of failure, and different clients can make different decisions. Failover between PTP domains, between specific Ethernet interfaces, and to another protocol are also outside the domain of the algorithm.

NTP has no built-in method for failover or selecting a new source based on accuracy. The University of Delaware version allows you to specify multiple servers that all contribute to determining the time. When one fails the remaining ones are used, but that’s often not what’s needed in environments that require a highly accurate sync. TimeKeeper allows for failing over from a PTP source based on accuracy and the health of the time source. In this case the implementation makes up for some of the limits of the protocol - and even allows failing over between multiple NTP and PTP sources. It will also use the BMC to decide when to fail over where applicable. Here’s an example of how to handle failover with TimeKeeper:

SOURCE0() { PTPDOMAIN=3; }
SOURCE1() { PTPDOMAIN=2; IFACE=eth7; }
SOURCE2() { NTPSERVER=internalhost1; }

Here TimeKeeper will track PTP on domain 3 via the default route, and use the BMC algorithm to identify the best grandmaster there. If that fails, due to the grandmasters going offline or an Ethernet failure, it will fail over to SOURCE1, which is PTP on a specific Ethernet interface on a separate domain. Should that fail, the same failover occurs to the NTP server in SOURCE2. TimeKeeper can also cross check these sources if needed, so that if the primary PTP grandmaster is functioning but acting erratically, another source is selected until the grandmaster agrees with the other sources. Using just the PTP specification, an erratic grandmaster would still be used, even when it’s wrong, because the scope of the spec is limited. If it’s easier to picture, this is what the above configuration looks like inside the TimeKeeper GUI:

[Figure: failover configuration in the TimeKeeper GUI]
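The priority-plus-cross-check idea described above can be sketched in a few lines of Python. To be clear, this is not TimeKeeper's actual selection logic; the function, the dictionary fields and the 50 microsecond threshold are all hypothetical, chosen only to illustrate the behaviour.

```python
def select_source(sources, max_disagreement=0.000050):
    """Pick a source by priority, skipping ones the others disagree with.

    sources: list in priority order; each is a dict with
             'name', 'healthy' (bool) and 'offset' (seconds).
    """
    healthy = [s for s in sources if s["healthy"]]
    for candidate in healthy:
        others = [s for s in healthy if s is not candidate]
        if not others:
            return candidate["name"]  # nothing to cross check against
        agreeing = sum(
            1 for o in others
            if abs(o["offset"] - candidate["offset"]) <= max_disagreement
        )
        # Accept the candidate if at least half of the others agree.
        if agreeing * 2 >= len(others):
            return candidate["name"]
    # No source has support from the others: fall back to priority order.
    return healthy[0]["name"] if healthy else None


# SOURCE0 is up but erratic, so SOURCE1 is chosen until it recovers.
erratic = [
    {"name": "SOURCE0", "healthy": True, "offset": 0.000500},
    {"name": "SOURCE1", "healthy": True, "offset": 0.000001},
    {"name": "SOURCE2", "healthy": True, "offset": 0.000003},
]
chosen = select_source(erratic)
```

A plain BMC-only client has no equivalent of the cross check, because it never looks at a second source.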

Comparing with real numbers

No matter the implementation - you need real performance numbers to be sure which is better for your configuration.  It’s always best to run NTP and PTP alongside one another at the same time, crosschecked against a PPS input, under the same network load, even through the same network interface, to compare the performance of each.  Any implementation that doesn’t allow you to test like that is deficient and prevents you from making an informed decision regarding a very important part of your network. Similar to the above example, a validation of PTP and NTP at the same time can be set up easily, with a PPS to cross check both network feeds:

SOURCE0() { PPSDEV=/dev/ttyS1; }
SOURCE1() { PTPDOMAIN=0; IFACE=eth7; }
SOURCE2() { NTPSERVER=internalhost1; }

This will cause TimeKeeper to track both sources and provide accuracy information about how they both behave, at the same time, on the same Ethernet link, under the same load. The PPS will be used as a stable input against which the NTP and PTP offsets are compared. Analysis of the resulting sync data makes the decision easy.
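Once you have offsets for each feed relative to the PPS, the analysis can be as simple as a few summary statistics. The Python sketch below assumes you have already exported the offsets into plain lists of nanosecond values; it does not read any TimeKeeper file format, and the numbers shown are made up purely to demonstrate the function.

```python
from statistics import mean, pstdev


def summarize(offsets_ns):
    """Summarize a list of offsets (nanoseconds) measured against the PPS."""
    return {
        "mean": mean(offsets_ns),
        "stdev": pstdev(offsets_ns),
        "max_abs": max(abs(x) for x in offsets_ns),
    }


# Made-up values for illustration only, not measured results.
ptp_offsets_ns = [120, -80, 95, -110, 60]
ntp_offsets_ns = [150, -140, 130, -90, 100]

for name, data in (("PTP", ptp_offsets_ns), ("NTP", ntp_offsets_ns)):
    print(name, summarize(data))
```

Comparing the spread and worst case of the two feeds, collected over the same period, is usually more telling than comparing the means.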

If you like, you can graph the differences in the offsets using the TimeKeeper web GUI, like in this screenshot:

[Figure: offset comparison graph in the TimeKeeper web GUI]

If you’ve got questions, or need more details on selecting the best fit for your environment, contact us at sales@fsmlabs.com.