Blog

Virtualization and Cloud time synchronization

05/19/2013

Although it is widely believed that precise time sync is impossible for virtual machines, TimeKeeper solves the problem.

TimeKeeper time synchronization client and server software has been managing distribution and quality of time on physical networks for years. It is now also capable of providing microsecond-level time accuracy for virtual servers and networks.

• PTP, NTP, PPS, GPS – low hundreds of nanoseconds accuracy
• Technique allows TimeKeeper to erase effects on time from guest OS, host OS, hypervisor, virtual networking
• Takes advantage of, but does not require, assist from Solarflare, Mellanox, Intel, Broadcom hardware
• Acts as NTP server, PTP GrandMaster, boundary clock and client at the same time
• Requires no application changes – standard Linux time API
• Reading time is faster – actually improving performance
• Time Intelligence Platform gathers statistics from clients and peers, detects problems
• Auto-discovers your time network topology and displays it for single-pane-of-glass enterprise management
• Multiple simultaneous time sources for redundancy, security
• Detect, alert, self-heal from network failure/spoof attack
• Scalable operation, including support for remote servers outside the local network and high performance implementation of both PTP and NTP
• Advanced time network administration via web interface
• Integrates with existing NTP/PTP infrastructure
• Includes testing tools to validate sync quality

Read the data sheet: TimeKeeper_Virtual_Data_Sheet

Using TimeKeeper’s web interface to explore the time distribution network

04/04/2013

TimeKeeper’s web interface, with its powerful data analysis and graphical presentation, offers a way to make sense of time distribution networks and to get a solid understanding of how time is being distributed, where the bottlenecks and points of failure can be found, and even the causes of possible time errors. The interface also makes it easy to configure the components in a time network and to make sure that the system will be reliable. Take a look:
Big_Maps_for_Big_Enterprise_final.pdf

Trading technology is not baking Twinkies

12/05/2012

A short discussion of “bakeoffs” and critical paths.

Sometimes we go into trading technology shops, especially at larger banks, and find ourselves participating in this strange thing called a “bakeoff”. The idea is to address a business technology requirement by gathering together a wide assortment of different technical ingredients, then running experiments, and finally producing a report evaluating the choices. The good thing for us is that our TimeKeeper product does really well in those cooking competitions and the recipe books become valuable sales tools for us to leverage. But in many instances, the “bakeoff” seems to be more of an obsolete institutional reflex than a sensible business practice.

Consider this common situation: several critical trading platforms handling large volumes of trades have known vulnerabilities to massive time failures. The software and hardware combination running those platforms has been shown to be inadequate. There are known solutions that could be introduced at once, solutions that have been shown to work in similar institutions and that have costs that are invisible given the scale of the trading system. Furthermore, technology staff is already overcommitted to work that is necessary to keep the business functioning. Yet, the institutional habit is to task the technology team with a bakeoff and to make that a blocking issue. Business managers are then prevented from implementing any fix, possibly for many months, while engineers try to find spare time to build out test systems and collect data. In fact, it is not unusual for the bakeoffs to never conclude as participants keep getting pulled out for urgent tasks. In the meantime, critical systems remain at risk.

Since Wall Street is such a revolving door, we have run into engineers participating in their third or fourth bakeoff, each time for a different company and each time with the same results. As financial institutions try to switch from a craft model to a more industrial model and as they emphasize agility and innovation, they might want to consider less baking and more serving up a game-changing solution.

It makes sense, particularly in larger institutions, to continually run tests of technology so that as market offerings and business unit requirements evolve, supporting technology teams can deliver best-in-class technology quickly. Business units benefit from knowing about new technologies, alternatives, and tradeoffs. But that type of evaluation shouldn’t be a critical path activity or used to justify forcing business units to wait for new technology that is already known to work. Even for baking companies with products that have an infinite shelf life, delays in innovation can be very costly.

Less baking, more eating.

VY

Risk mitigation for financial trading time tracking

11/02/2012

TimeKeeper brings to bear a significant technological advantage in reducing the risks to financial trading systems from fragility of time distribution and synchronization.

TimeKeeper:

Financial trading firms are learning from bitter experience that unmanaged technology risk issues can be costly. Time management and synchronization must be part of any effort to manage technology risk. When trades are moving at sub-microsecond speed, equivalent precision in timing networks is a non-negotiable necessity.
Time management risks are magnified both by “time sync client” software that is not designed to any rigorous level of engineering and by weakness or failure in the clocks or communication network. 

Timing errors have been covered up by synchronization software that is notorious for: optimistic self-evaluation, weak quality checking and lack of error reporting. And timing problems, especially at the microsecond or lower level, are invisible to standard network management tools. TimeKeeper® changes all that. As users have discovered, it doesn’t just synchronize and distribute time to sub-microsecond accuracy. It also polices the network in the course of monitoring and logging time quality, and providing failover resilience to failing time sources.

Here are examples:

  • Case 1: Errors being sent to applications. In a pilot set-up, TimeKeeper identified significant fluctuation from a time source in the data center. The prospective customer discovered that the old synchronization software was adjusting network time without any checks or warnings and providing incorrect time to applications.
  • Case 2: Flawed network performance. TimeKeeper alerted IT staff to an unreliable time signal over a leased WAN. Trace logs found the WAN was not performing to its SLA.
  • Case 3: Hardware breakdown. TimeKeeper generated alerts of an error condition in the network, leading IT staff to a bad network port in a switch in a data center.
  • Case 4: Heat stress. TimeKeeper alerted IT staff to a wildly inaccurate time feed from the data center. Investigation found the cooling turned off in the server room.
  • Case 5: Time source failover. TimeKeeper alerted IT staff and switched to a secondary time source when the primary GPS clock, provided as an (expensive) service from the data center, produced a low-quality time feed. Investigation found that critical parts of the time protocol had been silently blocked by data center IT staff policies.

    Foundations of Risk Reduction

    TimeKeeper reduces the trading algorithm errors produced by bad time data. It is designed to be a key component in risk reduction for financial trading applications by exposing latency, locking in to precise time, and gracefully handling network and time errors.

    A printable version of this post is here

    Observing the leap second

    07/02/2012

    A leap second was inserted on June 30, 2012, and we wanted to record how it was handled by the various protocols, including PTP, NTP, and GPS, in both off-the-shelf software installs and the latest network appliances. We also wanted to capture the network traffic over the period for later dissection.

    With TimeKeeper, this was easy. We tracked a group of sources with this configuration:

    # Track pulse from serial port, get time of day from SOURCE8
    SOURCE0() { PPSDEV=/dev/ttyS1; MAJORTIME=SOURCE8; }
    # Track internal NTP server appliance at 4 queries/second
    SOURCE1() { NTPSERVER=10.0.0.90; NTPSYNCRATE=4; }
    # Track additional internal NTP server appliance at 4 queries/second
    SOURCE2() { NTPSERVER=10.0.0.91; NTPSYNCRATE=4; }
    # Track PTP grandmaster on domain 80 (It gets its time from GPS)
    SOURCE3() { PTPCLIENTVERSION=2; PTPDOMAIN=80; }
    # Track internal off the shelf Linux ntpd server at 4 samples/second
    SOURCE4() { NTPSERVER=10.0.0.110; NTPSYNCRATE=4; }
    # Track public NTP server at the default query rate
    SOURCE5() { NTPSERVER=tick.uh.edu; }
    # Follow a NIST server at the default query rate (US east coast)
    SOURCE6() { NTPSERVER=nist1.aol-va.symmetricom.com; }
    # Follow a NIST server at the default query rate (US west coast)
    SOURCE7() { NTPSERVER=nist1-sj.ustiming.org; } 
    # Track system time and provide time of day for SOURCE0
    SOURCE8() { PPSDEV=self; }
    # On startup, set the time directly to avoid an initial clock slew
    SET_TIME_ON_STARTUP=1
    # Capture relevant timing protocol data automatically for later analysis
    VERBOSE_TCPDUMP=1

    And that’s it! Multiple NTP servers tracked at different query rates, along with a PTP grandmaster, a PPS input directly from GPS, and both off the shelf ntpd behavior and network appliance NTP are represented.  All sources were modeled and logged throughout the transition.  For timing folks, it was a pretty exciting Saturday night.

    Choosing between PTP and NTP

    06/28/2012

    Many people ask about the trade-offs between using the PTP (IEEE 1588) and the NTP protocols for time synchronization.  Since TimeKeeper supports both I thought it a good idea to highlight a few of the differences between the two and why to choose one over the other.

    First off, there is no intrinsic performance difference between the two protocols.  That is, one is not necessarily going to provide significantly better time or have better accuracy than the other when implemented well.  Unfortunately, it has become common wisdom that “NTP cannot deliver accuracy better than 1 ms” or “PTP accuracy is far superior to NTP”, but neither of those is correct.  There is no secret protocol magic in either.  Both protocols do the same thing and carry the same information.  They do differ in how they operate, and depending on your environment one may be a better choice than the other.  The difference in performance that you’re going to get on your network depends on the implementation of the NTP/PTP hardware and software and not the protocol itself.

    The executive summary of this article is that if you’re using a high quality implementation of NTP or PTP (such as TimeKeeper), the two are equivalent and both can give you single digit microsecond accuracy or better.  If you use low quality implementations, quality and accuracy can vary greatly.

    You can run both NTP and PTP together - on the same network.  With TimeKeeper you can serve both from the same device - giving out the same time.  You can also track both on the same client at the same time with TimeKeeper.  That allows you to compare the performance of the two directly, so there’s no guessing which is better.

    Now let’s look at some of the significant differences in how the protocols operate.


    Current (legacy) generation network time appliances
    ==============
    Many current network appliances qualify as being ‘legacy’ because nearly all use a very poor implementation of NTP.  Most also use a far more modern and careful implementation of PTP.  The reason is that it’s much easier to get a copy of the reference implementation of NTP than it is to create a more accurate version.  It has been available for years from the University of Delaware.  The accuracy and performance of the University of Delaware ntpd is well known, and taking a free piece of software and putting it on a time server appliance gives you that same level of accuracy.

    With these older devices you’re going to see poor NTP performance but generally good PTP performance.  That’s just a result of the quality of the PTP and NTP implementations on those boxes.

    TimeKeeper is a next-generation, top-notch implementation of NTP and PTP, so server performance and accuracy show the same quality with either protocol: single digit microsecond or better.  TimeKeeper was designed from the ground up for the job of distributing time and acting as a time client in the world of high frequency trading, where accuracy is paramount.  Using TimeKeeper as a software appliance to serve time, or using a time appliance built using the TimeKeeper software, means you’re going to see next-generation performance from it.


    Network traffic - scalability
    ==============
    NTP exchanges are initiated by the client - it sends a request to the server and the server responds. For every time update of a client there are 2 messages on the network.  PTP time updates are done with two messages normally and are sent out by the master as broadcast messages.  Slaves occasionally (sometimes every update) send a message to the master and receive a response in order to compute the one-way delay between the client and master.  Generally, as the number of clients scales up you’ll see less traffic on the network using PTP compared to NTP.

    Only in very large installations is the additive traffic a significant concern, and even then, it’s rarely an issue. In most cases the difference in traffic won’t be noticed at all given the update frequency.  Both NTP and PTP tend to peak at one update per second in most deployments.

    NTP is also a unicast protocol in most cases.  That’s a point-to-point UDP transaction.  PTP defaults to a multicast protocol - so broadcast messages are sent across the network for every exchange.  PTP can have a much heavier footprint since broadcast messages are going to be delivered everywhere.

    With N clients on the network using NTP you can expect to see:

        2*N packets per time update.

    With N clients on the network using PTP you can expect to see:

        2 + 2*N packets per time update.
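
The two counts above can be written down as a small Python sketch. The `delay_req_fraction` parameter is not from the formulas above; it is a hypothetical knob to illustrate the earlier point that slaves may measure path delay only occasionally, which is where PTP's traffic advantage comes from. With the default of 1.0 the function reproduces the 2 + 2*N figure.

```python
def ntp_packets_per_update(n_clients: int) -> int:
    """Each NTP client sends one request and receives one reply."""
    return 2 * n_clients


def ptp_packets_per_update(n_clients: int, delay_req_fraction: float = 1.0) -> float:
    """Two messages from the master, plus a delay request and response
    for each slave that measures path delay on this update.

    delay_req_fraction is an assumed parameter: the share of updates on
    which a slave performs a delay measurement (1.0 = every update).
    """
    return 2 + 2 * n_clients * delay_req_fraction
```

For example, with 100 clients `ntp_packets_per_update(100)` gives 200, and `ptp_packets_per_update(100)` gives 202. If slaves measure delay on only one update in ten, the PTP average is far lower than the NTP count.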


    Multicast headaches, network policy and ports
    ==============
    As mentioned above, PTP is multicast.  That can be a hassle to administer because it can generate extra traffic and requires special rules to forward between network segments - especially between distant sites.  NTP is unicast, requires no special routing rules, and only the sender and intended recipient have to see or handle those packets.

    The UDP port used for NTP (port 123) is often already open and network administrators know what it is, as it’s been a standard on the internet for decades.  PTP uses 2 UDP ports (it also has a raw ethernet mode, but we’re not concerned with that here) - port 319 and port 320.  They’re broadcast ports so they require rules for forwarding and multiple-hop connections. It’s another set of ports to open up that may be blocked on your network. PTP does have a unicast mode that can make it more NTP-like in behavior, though.


    Hardware assist
    ==============
    Commodity network cards that can assist in time synchronization have started appearing on the market.  In fact - they’re sometimes built into the motherboards of computers as standard NICs.

    Some cards know what PTP is and can only assist with PTP packets.  Others can assist in timestamping any packet, whether it is PTP, NTP, or any packet you like.

    Some current generation 1G Intel cards (82580 chips) can timestamp any packet - and TimeKeeper takes advantage of that.  That means you can get hardware assistance with both NTP and PTP - so there’s no tradeoff.  The 10G Intel cards support hardware timestamps in the silicon, but as of this writing no Linux driver exists to take advantage of that.

    The latest (as of this writing) Solarflare and Mellanox cards include hardware timestamps for PTP packets only.

    The assistance from NICs can increase the synchronization accuracy a great deal - an order of magnitude sometimes depending on the software implementation.  It can bring accuracy down to well below 1 microsecond.  Our tests across the hardware timestamping NICs available (both NTP and PTP) show about 250 nanosecond to 500 nanosecond accuracy.


    Quiet network vs. busy network
    ==============
    PTP performs best when run on a quiet network - perhaps even a subnet dedicated to time synchronization only.  The reason is that PTP clients may query the round-trip time of a packet on the network only occasionally and assume that this value does not change very much or very often.

    With the PTP protocol, a slave must explicitly request round-trip timing from the master in a special message and wait for the response.  In the NTP protocol the client is able to calculate the round-trip delay on every message since each exchange reliably provides all of the information.  This means that NTP is able to use that round-trip delay to correctly compute the current time when a message is received.  It can also determine if an NTP query has run into a delay and might be inaccurate.  Having access to this information as part of the protocol means NTP can discard bad values very easily.  PTP clients, on the other hand, may rely on possibly old and inaccurate estimates of the travel time of the master’s sync message to compute the current time.  For that reason PTP might not be the best choice on a shared network with a great deal of traffic that could delay time synchronization messages.
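
To make the NTP side concrete, here is a minimal Python sketch of the standard NTP on-wire calculation from the four timestamps of one exchange. The `is_suspect` filter and its `tolerance` value are illustrative assumptions, not a description of how any particular client (TimeKeeper included) screens samples.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP calculation for one request/response exchange.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    Returns (offset, delay) in the same units as the inputs.
    """
    delay = (t4 - t1) - (t3 - t2)          # round trip minus server hold time
    offset = ((t2 - t1) + (t3 - t4)) / 2   # server clock minus client clock
    return offset, delay


def is_suspect(delay, recent_delays, tolerance=2.0):
    """Hypothetical filter: flag a sample whose round-trip delay is far
    above the smallest delay seen recently."""
    return delay > tolerance * min(recent_delays)
```

With timestamps in microseconds, a server 10 ms ahead of the client and a 500 us path each way gives `ntp_offset_and_delay(0, 10_500, 10_600, 1_100)`, which works out to an offset of 10000 and a delay of 1000. Because every exchange yields a fresh delay, a sample that hit a queue somewhere can be spotted and dropped on the spot.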

    A particular implementation of PTP can choose to request a response from the server on every single time update and emulate NTP’s behavior.  That makes detecting and correcting for errors more feasible but it is not required by the protocol itself.  TimeKeeper does operate in this way but not all implementations do.
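
The effect of a stale delay estimate can be sketched in Python using the usual PTP delay request-response arithmetic (correction fields ignored). The numbers in the usage note are made up for illustration, and the function names are not from any PTP library.

```python
def ptp_mean_path_delay(t1, t2, t3, t4):
    """One-way delay estimate from a Sync plus a Delay_Req exchange.

    t1: master sends Sync        (master clock)
    t2: slave receives Sync      (slave clock)
    t3: slave sends Delay_Req    (slave clock)
    t4: master receives Delay_Req (master clock)
    """
    return ((t2 - t1) + (t4 - t3)) / 2


def ptp_offset_from_master(t1, t2, mean_path_delay):
    """Slave clock minus master clock, given a path delay estimate."""
    return (t2 - t1) - mean_path_delay
```

Suppose the slave is perfectly synchronized and the path is 500 us each way: `ptp_mean_path_delay(0, 500, 1000, 1500)` is 500. If a later Sync sits in a switch queue for an extra 300 us and the slave reuses the old estimate, `ptp_offset_from_master(1_000_000, 1_000_800, 500)` reports 300 us of offset that does not exist. Measuring the delay on every update, as described above, narrows the window in which this can happen.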


    Failover/redundancy
    ==============
    PTP has a built-in mechanism for handling failure of a server and using another.  How that’s done is defined in the PTP “Best Master Clock Algorithm” (BMC or BMCA). This algorithm allows multiple servers to broadcast time on the same network, but all clients will eventually select and use the same server.  If the client implementation obeys the PTP protocol fully, when one server fails or self-reported accuracy is reduced the clients will begin using another. This also assumes the server can correctly identify its own accuracy.

    While the BMCA is helpful, it can be insufficient in practice. A specific failover scenario may be desired, but the BMCA may behave differently based on the state of any existing grandmasters at the time of failure.  Failover between PTP domains and specific Ethernet channels is also outside the domain of the algorithm.

    NTP has no built-in (protocol) method for failover or selecting a new source based on accuracy. The University of Delaware version allows you to specify multiple servers that all contribute to determining the time.  When one fails the remaining ones are used, but that’s often not what’s needed in environments that require a highly accurate sync.

    TimeKeeper allows for failing over from an NTP source based on accuracy and the health of the time source.  In this case the implementation makes up for some of the limits of the protocol - and even allows failing over between multiple NTP and PTP sources. It will also use the BMCA when it can.

    Here’s an example of how to handle failover with TimeKeeper:

    SOURCE0() { PTPDOMAIN=30; }
    SOURCE1() { PTPDOMAIN=20; IFACE=eth7; }
    SOURCE2() { NTPSERVER=internalhost1; }
    SOURCE3() { NTPSERVER=internalhost2; IFACE=eth7; }

    Here TimeKeeper will track PTP on domain 30 via the default route, and use the BMC algorithm to identify the best grandmaster.  If that fails, due to the grandmasters going offline or an Ethernet failure, it will fail over to SOURCE1, which is PTP on a specific Ethernet interface on a separate domain. Should that fail, there are two backup NTP servers.  (TimeKeeper can also cross check these sources and reject even the primary PTP source if it appears to be behaving strangely, even if the BMCA has identified a specific server.)
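
The cascade idea itself, take the first source in configured order that is both alive and good enough, can be sketched generically in Python. This is not TimeKeeper's algorithm: `SourceStatus`, `est_error_us`, and the 50 us budget are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SourceStatus:
    name: str            # label such as "SOURCE0"
    reachable: bool      # has the source answered recently?
    est_error_us: float  # assumed health metric: estimated error in microseconds


def pick_source(sources: Sequence[SourceStatus],
                max_error_us: float = 50.0) -> Optional[SourceStatus]:
    """Return the first source, in priority order, that is reachable and
    within the error budget; None if every source has failed."""
    for source in sources:
        if source.reachable and source.est_error_us <= max_error_us:
            return source
    return None
```

Given a list where the first source is unreachable and the second is reachable but reporting 500 us of error, `pick_source` skips both and returns the third. The point of checking accuracy as well as reachability is that a source can keep answering while delivering bad time.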


    Comparing with real numbers
    ==============
    No matter the implementation - you need real performance numbers to be sure which is better for your configuration.  It’s always best to run NTP and PTP alongside one another at the same time, under the same network load, even through the same network interface, to compare the performance of each.  Any implementation that doesn’t allow you to test like that is deficient and hampers you quite a bit in making an informed decision.

    Similar to the above example, a validation of PTP and NTP at the same time can be set up easily:

    SOURCE0() { PTPDOMAIN=0; IFACE=eth7; }
    SOURCE1() { NTPSERVER=internalhost1; IFACE=eth7; }

    This will cause TimeKeeper to track both sources and provide accuracy information about how they both behave, at the same time, on the same Ethernet link, under the same load.  Analysis of the resulting sync data makes the decision easy.
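
Once offsets from both sources have been collected, a side-by-side summary is straightforward. The Python sketch below assumes the offsets have already been extracted into lists of microsecond values; how they are exported from the logs is not covered here, and the function name is illustrative.

```python
import statistics


def summarize_offsets(offsets_us):
    """Summarize a series of measured clock offsets (microseconds)."""
    return {
        "mean": statistics.mean(offsets_us),
        "stdev": statistics.pstdev(offsets_us),        # population std deviation
        "peak_abs": max(abs(x) for x in offsets_us),   # worst-case excursion
    }
```

Running `summarize_offsets` on the PTP series and again on the NTP series, captured over the same interval, gives mean, spread, and peak figures that can be compared directly. Peak offset is often the number that matters most, since a single bad excursion can mis-order events.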

    If you’ve still got questions, or need more details on selecting the best fit for your environment, contact us.

    White paper: Resilient Networks with TimeKeeper

    04/09/2012

    Time distribution is inherently fragile. Failed or slow network links, router and switch problems, traffic spikes, failures in time clocks, transient failures, GPS interruptions or spoofing, and even load on the local server can all introduce delays, asymmetries, or service interruptions that interfere with the delivery of reference time. An automated trading system or other transaction system that depends on precise time is exceptionally vulnerable to such failures - especially because some widely used client software often fails silently without even recording problems that could cause time slippage.

    TimeKeeper addresses time source fragility on both the client and server sides.
    Client
    On the client side, time problems
    • can trigger failover,
    • are detected and noted in the error log,
    • can be set to generate emails to operators and/or SNMP messages to monitoring programs.
    Any number of sources can be used. The TimeKeeper configuration file to the right sets up at least three time reference sources (the PTP 1588 source may allow multiple master clocks to take over from each other) in a cascade.
    [ See more in the full white paper]

    GPS vulnerability

    03/18/2012

    An article in The Economist describes some of the vulnerabilities of the GPS system - one reason FSMLabs has invested so much effort in developing fault detection and recovery methods for TimeKeeper.

    In November America’s National Space-Based PNT Advisory Board said deliberate disruption of GPS was becoming more common, and that the systems in place to find and stop jammers were insufficient. It called for the rapid development of new ways to shut down sources of interference, new laws to punish offenders more harshly and for alternative, non-GPS-based backup systems to be deployed.
    The Economist

    Customer complaints

    02/14/2012

    “I saw a 1.2us peak offset today on the server running timekeeper”. We’re working on it. But this is on a heavily used network. There appears to be a problem with the network card. It’s a pessimistic estimate of the error. And this is a peak offset.

    Virtual Machines and Time Synchronization

    11/02/2011

    TimeKeeper’s advanced algorithms can compensate for virtual machine difficulties in keeping accurate time.

    The problem with a virtual machine as far as time synchronization goes is that the VM will suspend for seconds at a time while the underlying operating systems work on something else. As it becomes more common to migrate virtual machines within a cluster or even a larger data center in order to load balance, this problem gets worse. A VM waking up after, say, 5 seconds have passed will detect a huge gap between its current time and the time provided by the clock. After a day or two, it’s not unusual to see a VM that is seconds out of sync. But a VM running TimeKeeper will recover its footing rapidly and smoothly, as the algorithms TK uses to compensate for network packet delays and oscillator variation produce a quick convergence with a time source. You can hook a VM running TimeKeeper up to an NTP source and expect pretty good time tracking. This is not something we were initially targeting - for financial trading software the random delays of VMs are not acceptable - but it’s an interesting side effect. If VMs are being used for large scale database processing or map-reduce, the stability of a TK validated timestamp may improve data reliability and also reduce locking overhead.

    email FSMLabs       512-263-5530