We've had a number of NTP and clock-based anomalies over approximately the past 12 to 18 hours. On further investigation, unlikely as it may seem, we think the hypervisor may be presenting an incorrect time to the guest.
Whilst I'll discuss a specific host in this ticket, we've seen the same behaviour on multiple instances over this period. All examples have been spot instances in eu-west-1, although the instance type varies.
With no NTP daemon running, the following can be observed:
# ntpdate 0.amazon.pool.ntp.org && sleep 900 && ntpdate 0.amazon.pool.ntp.org
13 Dec 13:42:04 ntpdate: step time server 188.8.131.52 offset 4.531351 sec
13 Dec 13:57:29 ntpdate: step time server 184.108.40.206 offset 16.268760 sec
That is to say, ntpdate corrected a 4.53 second offset; we then waited 15 minutes, after which it had to correct a further 16.27 seconds of lag. This was with the xen clock source.
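To put these numbers in perspective, the drift rate can be estimated from the two log lines: falling about 16.27 seconds behind in roughly 925 seconds (13:42:04 to 13:57:29) works out at around 17,600 ppm, or about 1.76%. For comparison, ntpd can only slew a clock by at most 500 ppm, so a daemon could not keep up with drift of this size. The sketch below is illustrative only: `parse` and `drift_ppm` are hypothetical helper names, it assumes the clock was close to correct immediately after the first step, and it uses the guest's own log timestamps as the elapsed time, so the result is an approximation.

```python
import re
from datetime import datetime

# Matches ntpdate summary lines such as
# "13 Dec 13:57:29 ntpdate: step time server 184.108.40.206 offset 16.268760 sec"
# (the optional "[pid]" covers builds that log the process ID).
LINE_RE = re.compile(
    r"(?P<stamp>\d{1,2} \w{3} \d\d:\d\d:\d\d) ntpdate(?:\[\d+\])?: "
    r"(?:step|adjust) time server \S+ offset (?P<offset>-?\d+(?:\.\d+)?) sec"
)

NTPD_MAX_SLEW_PPM = 500  # ntpd's maximum slew rate, in parts per million


def parse(line: str) -> tuple[datetime, float]:
    """Return (timestamp, offset in seconds) from one ntpdate summary line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised ntpdate line: {line!r}")
    # The log has no year; prefix an arbitrary one, which is fine for same-day differences.
    stamp = datetime.strptime("2000 " + m.group("stamp"), "%Y %d %b %H:%M:%S")
    return stamp, float(m.group("offset"))


def drift_ppm(first_line: str, second_line: str) -> float:
    """Approximate drift: offset accumulated after the first step / elapsed guest time."""
    t1, _ = parse(first_line)
    t2, offset = parse(second_line)
    elapsed = (t2 - t1).total_seconds()
    return offset / elapsed * 1e6


xen = drift_ppm(
    "13 Dec 13:42:04 ntpdate: step time server 188.8.131.52 offset 4.531351 sec",
    "13 Dec 13:57:29 ntpdate: step time server 184.108.40.206 offset 16.268760 sec",
)
print(f"xen: {xen:.0f} ppm, {xen / NTPD_MAX_SLEW_PPM:.0f}x ntpd's slew limit")
```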
We also see the same behaviour with tsc as the clock source, although, given that these are all PV instances, we believe the underlying time source is broadly the same in both cases:
root@ip-172-31-20-47:/var/lib/ntp# echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
root@ip-172-31-20-47:/var/lib/ntp# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
root@ip-172-31-20-47:/var/lib/ntp# ntpdate 0.amazon.pool.ntp.org && sleep 900 && ntpdate 0.amazon.pool.ntp.org
13 Dec 14:00:25 ntpdate: step time server 220.127.116.11 offset 3.049115 sec
13 Dec 14:15:48 ntpdate: step time server 18.104.22.168 offset 16.155150 sec
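The clock source comparison above can be scripted so that every available clock source gets the same 15-minute test. This is a minimal sketch, assuming Python 3.9+, root access and `ntpdate` on the PATH; `parse_offset`, `to_ppm`, `measure_drift` and `main` are hypothetical helper names. It writes `current_clocksource` just as the `echo` above does, steps the clock once, waits, and converts the second offset into ppm. Note that `time.monotonic()` is driven by the clock source under test, so the elapsed time is itself approximate.

```python
import re
import subprocess
import time
from pathlib import Path

# Same sysfs interface used by the echo/cat commands above.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")
SERVER = "0.amazon.pool.ntp.org"
# Summary line ntpdate prints after stepping ("step") or slewing ("adjust") the clock.
OFFSET_RE = re.compile(r"(?:step|adjust) time server \S+ offset (-?\d+(?:\.\d+)?) sec")


def parse_offset(output: str) -> float:
    """Return the offset in seconds from ntpdate's summary line."""
    match = OFFSET_RE.search(output)
    if match is None:
        raise ValueError(f"no offset found in ntpdate output: {output!r}")
    return float(match.group(1))


def to_ppm(offset: float, elapsed: float) -> float:
    """Convert an offset accumulated over `elapsed` seconds into parts per million."""
    return offset / elapsed * 1e6


def run_ntpdate(server: str = SERVER) -> float:
    """Set the clock against `server` and return the offset ntpdate corrected."""
    result = subprocess.run(["ntpdate", server], capture_output=True, text=True, check=True)
    return parse_offset(result.stdout + result.stderr)


def measure_drift(clocksource: str, interval: float = 900.0) -> float:
    """Switch clock source, zero the clock, wait, and return the drift in ppm."""
    (CLOCKSOURCE_DIR / "current_clocksource").write_text(clocksource + "\n")
    run_ntpdate()  # first run: afterwards the offset should be close to zero
    start = time.monotonic()
    time.sleep(interval)
    offset = run_ntpdate()
    # CLOCK_MONOTONIC follows the clock source under test, so this is approximate.
    return to_ppm(offset, time.monotonic() - start)


def main() -> None:
    """Run the 15-minute test once per available clock source (needs root)."""
    for source in (CLOCKSOURCE_DIR / "available_clocksource").read_text().split():
        print(f"{source}: {measure_drift(source):+.0f} ppm")
```

Nothing runs on import; calling `main()` from a root session takes roughly 15 minutes per clock source and leaves the last source in the list selected, so it may be worth switching back to xen afterwards.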