Last active May 16, 2021

We've had a number of NTP and clock-based anomalies over approximately the past 12 to 18 hours. Upon further investigation, as unlikely as this might seem, we think the hypervisor may be presenting an incorrect time to the guest.

Whilst I'll discuss a specific host in this ticket, we've seen it in multiple places over this time period. All examples have been spot instances in eu-west-1, although the instance type varies.

Consider that /sys/devices/system/clocksource/clocksource0/current_clocksource returns xen.
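For checking this across several hosts, a minimal sketch along the following lines could read the active clock source and the alternatives the kernel offers. The function name read_clocksources is hypothetical; the only paths assumed are the standard Linux sysfs files current_clocksource and available_clocksource in the directory named above.

```python
from pathlib import Path

# Standard Linux sysfs location for the first clocksource device.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clock sources found under `base`.

    `base` is a parameter only so the function can be pointed at a
    different directory, for example when testing.
    """
    base = Path(base)
    # current_clocksource holds a single name, e.g. "xen".
    current = (base / "current_clocksource").read_text().strip()
    # available_clocksource holds a space-separated list of names.
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    current, available = read_clocksources()
    print("current:", current)
    print("available:", " ".join(available))
```

On a host matching the description in this ticket, the current value would be expected to read xen.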

With no NTP daemon running, the following can be observed:

# ntpdate && sleep 900 && ntpdate
13 Dec 13:42:04 ntpdate[4889]: step time server offset 4.531351 sec
13 Dec 13:57:29 ntpdate[5639]: step time server offset 16.268760 sec

That is to say, the first ntpdate run stepped the clock by 4.53 seconds; we then waited 15 minutes, and the second run had to correct a further 16.27 seconds of lag that had accumulated while using the xen clock source.
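To put a number on that rate, the two ntpdate lines can be turned into a drift figure. The following is a rough sketch, not a measurement tool: the names parse_step and drift_ppm are hypothetical, the year is a placeholder because ntpdate does not print one, and it assumes the printed timestamps reflect the clock after each step, so that their difference approximates real elapsed time.

```python
import re
from datetime import datetime

# Matches lines such as:
#   13 Dec 13:42:04 ntpdate[4889]: step time server offset 4.531351 sec
# The server address is optional because it is absent from the output above.
LINE_RE = re.compile(
    r"^(?P<stamp>\d{1,2} \w{3} \d{2}:\d{2}:\d{2}) ntpdate\[\d+\]: "
    r"step time server (?:\S+ )?offset (?P<offset>-?\d+\.\d+) sec$"
)


def parse_step(line):
    """Return (timestamp, offset_seconds) for one ntpdate step line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not an ntpdate step line: {line!r}")
    # ntpdate prints no year; 2000 is an arbitrary placeholder, since only
    # the difference between two timestamps is used.
    stamp = datetime.strptime("2000 " + m["stamp"], "%Y %d %b %H:%M:%S")
    return stamp, float(m["offset"])


def drift_ppm(first_line, second_line):
    """Drift in parts per million accumulated between two ntpdate steps."""
    t1, _ = parse_step(first_line)
    t2, offset2 = parse_step(second_line)
    elapsed = (t2 - t1).total_seconds()
    # The second offset is the error accumulated since the first correction.
    return offset2 / elapsed * 1e6


if __name__ == "__main__":
    first = "13 Dec 13:42:04 ntpdate[4889]: step time server offset 4.531351 sec"
    second = "13 Dec 13:57:29 ntpdate[5639]: step time server offset 16.268760 sec"
    print(f"{drift_ppm(first, second):.0f} ppm")
```

For the two lines above this works out to roughly 17,600 ppm, or about 1.76% slow, which is far beyond the 500 ppm that ntpd is able to correct by slewing.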

We see the same behaviour with tsc as the clock source. Given that these are all PV instances, though, we believe the underlying time source is broadly the same in both cases:

root@ip-172-31-20-47:/var/lib/ntp# echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
root@ip-172-31-20-47:/var/lib/ntp# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
root@ip-172-31-20-47:/var/lib/ntp# ntpdate && sleep 900 && ntpdate
13 Dec 14:00:25 ntpdate[5786]: step time server offset 3.049115 sec
13 Dec 14:15:48 ntpdate[8165]: step time server offset 16.155150 sec