Thursday, May 21, 2015
If you don’t have a second to spare, you soon will! On June 30, 2015 at precisely 23:59:60 UTC, the world will experience its 26th recorded leap second. It will be the third one experienced by Google. If you use Google Compute Engine, you need to be aware of how leap seconds can affect you.
What is a leap second?
It's sort of like a very small leap year. Generally, the Earth's rotation slows down over time, thus lengthening the day. In leap years, we add an extra day in February to sync the calendar year back up with the astronomical year. Similarly, an extra second is occasionally added to bring coordinated universal time in line with mean solar time. Leap seconds in Unix time are commonly implemented by repeating the last second of the day.
When do leap seconds happen?
By convention, leap seconds happen at the end of either June or December. However, unlike leap years, leap seconds do not happen at regular intervals, because the Earth's rotation speed varies irregularly in response to climatic and geological events. For example, the 2011 earthquake in Japan shortened the day by 1.8 microseconds by speeding up the Earth's rotation.
How does Google handle this event?
We have a clever way of handling leap seconds that we posted about back in 2011. Instead of repeating a second, we “smear” away the extra second. During a 20-hour “smear window” centered on the leap second, we slightly slow all our servers’ system clocks (by approximately 14 parts per million). At the end of the smear window, the entire leap second has been added, and we are back in sync with civil time. (This method is a little simpler than the leap second handling we posted back in 2011. The outcome is the same: no time discontinuities.) Twenty hours later, the entire leap second has been added and we are back in sync with non-smeared time.
Why do we smear the extra second?
Any system that depends on careful sequencing of events could experience problems if it sees a repeated second. This problem is accentuated for multi-node distributed systems, because a one second jump dramatically magnifies time sync discrepancies between multiple nodes. Imagine two events going into a database under the same timestamp (or even worse, the later one being recorded under an earlier timestamp), when in reality one follows another. How would you know later what the real sequence was? Most software isn't written to explicitly handle leap seconds, including most of ours. During the 2005 leap second, we noticed various problems like this with our internal systems. To avoid changing all time-using software to handle leaps correctly, we instead attempt to make leaps invisible by adding a little bit of the extra second to our servers' clocks over the course of a day, rather than all at once.
What services does this apply to on Google Cloud Platform?
Only Virtual Machines running on Google Compute Engine are affected by the time smear as they are the only entities that can manually sync time. All other services within Google Cloud Platform are unaffected as we take care of that for you.
How will I be affected?
All of our Compute Engine services will automatically receive this “smeared” time, so if you are using the default NTP service (metadata.google.internal) or the system clock, everything should be taken care of for you automatically (note that the default NTP service does not set the Leap Indicator bit). If, however, you are using an external time service, you may see a full-second “step”, or perhaps several small steps. We don’t know how external NTP services will handle the leap second, and thus cannot speculate on exactly how time will be kept in sync. If you use an external NTP service with your Compute Engine virtual machines, you should be prepared to understand how those time sources handle the leap second, and how that behavior might affect your applications and services. If possible, you should avoid using external NTP sources on Compute Engine during the leap event.
The worst possible configuration during a leap second is to use a mixture of non-smearing and smearing NTP servers (or servers that smear differently): behavior will be undefined, but probably bad.
If you run services on both Google Compute Engine and other providers that do not smear leap seconds, you should be aware that your services can see discrepancies in time during the leap second.
What is Google's NTP service?
From inside a virtual machine running on Google Compute Engine, you can use metadata.google.internal. You can also just use the system clock, which is automatically synced with the smeared leap second. Google does not offer an external NTP service that advertises smeared leap seconds.
You can find documentation about configuring NTP on Compute Engine instances