After installing Ubuntu 20.04 my machine frequently wakes from sleep for no obvious reason. This gist describes my attempts to resolve this.

Diagnosing spurious wakes

This gist documents my attempts to get to the bottom of spurious wakes after installing Ubuntu 20.04 LTS on my system.

Initially, I thought it might be another system on my network sending Wake-on-LAN (WoL) packets. Then I thought it might be a known XHCI spurious wake kernel issue. In the end, I resolved things by disabling the ability of USB devices, e.g. the mouse, to wake the system.

Update: I later came up with a better way of disabling wake-on-mouse that's covered here.

Note: as one of these steps, I upgraded the system BIOS - while this didn't resolve this particular issue, it did resolve an annoying issue with the graphic state not being properly restored for certain applications after wake-up.

First attempt

Look at logs:

$ journalctl | less

Go to end - G - and search backwards with ? for sleep:

Dec 08 02:54:41 ghawkins-OptiPlex-3020 kernel: rfkill: input handler disabled
Dec 08 02:54:41 ghawkins-OptiPlex-3020 kernel: Generic FE-GE Realtek PHY r8169-300:00: attached PHY driver [Generic FE-GE Realtek PHY] (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)
Dec 08 02:54:41 ghawkins-OptiPlex-3020 NetworkManager[637]: <info>  [1607392481.5050] manager: sleep: wake requested (sleeping: yes  enabled: yes)
Dec 08 02:54:41 ghawkins-OptiPlex-3020 NetworkManager[637]: <info>  [1607392481.5051] device (enp3s0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'managed')

I'm guessing that something on the network has requested a wake via enp3s0 (the primary network interface).
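
The manual search in less can also be done non-interactively. The following is a minimal sketch: `sleep_events` is a hypothetical helper name, and the three keywords are just a guess at what is worth matching.

```shell
# Hypothetical helper: keep only the log lines on stdin that mention
# sleep, suspend or wake (case-insensitive).
sleep_events() {
    grep -i -E 'sleep|suspend|wake'
}

# Example use (not run here):
#   journalctl --since yesterday | sleep_events | tail -n 50
```

Reading from stdin keeps the filter usable with saved log files as well as with live journalctl output.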

ifconfig seems to be out of favor and is no longer installed by default, so instead:

$ ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    ...
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether f8:bc:12:64:e6:2b brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.125/24 brd 192.168.0.255 scope global dynamic noprefixroute enp3s0
       valid_lft 56382sec preferred_lft 56382sec
    inet6 2a02:aa16:577d:df80::149/128 scope global dynamic noprefixroute 
       valid_lft 56385sec preferred_lft 56385sec
    inet6 2a02:aa16:577d:df80:7c5d:23f7:19fd:c2e0/64 scope global temporary dynamic 
       valid_lft 574785sec preferred_lft 56229sec
    inet6 2a02:aa16:577d:df80:1a3e:567a:d75a:57c5/64 scope global dynamic mngtmpaddr noprefixroute 
       valid_lft 1194771sec preferred_lft 589971sec
    inet6 fe80::d716:5396:c3b8:fc9f/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:d7:2f:32:9b brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0

Then, from the Arch WoL wiki page:

$ sudo apt install ethtool
$ ethtool enp3s0
...
Cannot get wake-on-lan settings: Operation not permitted
...

Which might lead one to believe WoL isn't available, but...

$ sudo ethtool enp3s0
...
Supports Wake-on: pumbg
Wake-on: d
...

d apparently means it's disabled, so this idea looks like a dead end.
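
The letters in the Wake-on lines are one flag per character. As a rough aid, here is a sketch that spells them out. The meanings are taken from the ethtool man page, while `decode_wol_flags` is a hypothetical name of my own.

```shell
# Hypothetical helper: expand an ethtool Wake-on flag string such as
# "pumbg" or "d" into readable names (meanings per the ethtool man page).
decode_wol_flags() {
    flags=$1
    out=''
    while [ -n "$flags" ]; do
        c=${flags%"${flags#?}"}   # first character of the remaining flags
        flags=${flags#?}          # drop that character
        case "$c" in
            p) name='PHY activity' ;;
            u) name='unicast' ;;
            m) name='multicast' ;;
            b) name='broadcast' ;;
            a) name='ARP' ;;
            g) name='magic packet' ;;
            s) name='SecureOn password' ;;
            d) name='disabled' ;;
            *) name="unknown($c)" ;;
        esac
        out="${out:+$out, }$name"
    done
    printf '%s\n' "$out"
}

# Example: decode_wol_flags pumbg
```

So the card supports several wake triggers (the Supports line), but none of them is currently active (the Wake-on line).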

But let's keep trying...

Using Network Manager to find its name for the primary network interface:

$ nmcli con show

It turns out that it's Wired connection 1, so let's look at its settings:

$ nmcli c show 'Wired connection 1'

It pipes its output through less; search for wake and it shows:

802-3-ethernet.wake-on-lan:             default
802-3-ethernet.wake-on-lan-password:    --

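Apparently, this value would have to be magic (or one of the other wake triggers) for NetworkManager to turn WoL on. It can also be set to ignore, which stops NetworkManager from managing the WoL setting at all and leaves it as the driver reports it (d, i.e. disabled, above).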

TODO: try the /etc/NetworkManager/conf.d/wake-on-lan.conf described on the Arch page to set the 802-3-ethernet.wake-on-lan to ignore.


To trigger WoL from another machine, work out the broadcast address of the subnet that your machine is on:

$ ip address show enp3s0 | sed -n '/inet .* brd / s/.* brd \([0-9\.]*\) .*/\1/p'
192.168.0.255

And get the MAC address:

$ ip link
...
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether f8:bc:12:64:e6:2b brd ff:ff:ff:ff:ff:ff
               ^^^^^^^^^^^^^^^^^

Then, in my case, using a Mac on the same subnet:

$ brew install wakeonlan
$ wakeonlan -i 192.168.0.255 f8:bc:12:64:e6:2b

This did not trigger my machine to wake. But was my machine receiving the packets? To check...

On the receiving machine (while woken):

$ sudo apt install ngrep
$ sudo ngrep '\xff{6}(.{6})\1{15}' -x port 9

Then back on the Mac, redo the above:

$ wakeonlan -i 192.168.0.255 f8:bc:12:64:e6:2b

And I can see that the packet is seen. In fact I can also see that the default broadcast address - 255.255.255.255 - works fine:

$ wakeonlan f8:bc:12:64:e6:2b
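
The ngrep pattern used above mirrors the layout of a WoL magic packet: six 0xff bytes followed by the target MAC address repeated 16 times. The sketch below builds that payload as a hex string so the structure is easier to see. `magic_packet_hex` is a hypothetical helper, and it only prints the payload; it does not send anything.

```shell
# Hypothetical helper: print the 102-byte magic packet payload for a MAC
# address as lowercase hex (6 x ff, then the MAC repeated 16 times).
magic_packet_hex() {
    # Strip separators and normalise to lowercase.
    mac=$(printf '%s' "$1" | tr -d ':-' | tr 'A-F' 'a-f')
    case "$mac" in
        ''|*[!0-9a-f]*) echo "invalid MAC: $1" >&2; return 1 ;;
    esac
    [ "${#mac}" -eq 12 ] || { echo "invalid MAC: $1" >&2; return 1; }

    packet='ffffffffffff'
    i=0
    while [ "$i" -lt 16 ]; do
        packet="${packet}${mac}"
        i=$((i + 1))
    done
    printf '%s\n' "$packet"
}

# Example: magic_packet_hex f8:bc:12:64:e6:2b | cut -c1-24
```

For the MAC address above, the first 24 hex characters are ffffffffffff followed by f8bc1264e62b, which is the part the ngrep pattern anchors on.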

Check WoL in BIOS:

  • Press F2 repeatedly (do not hold down) during restart
  • Go to Power Management / Wake on LAN

This showed WoL is disabled at the BIOS level.


Is there an issue with the BIOS? Let's update it...

First determine the serial number and BIOS version:

$ sudo inxi -F | fgrep -A1 Machine
Machine:   Type: Desktop System: Dell product: OptiPlex 3020 v: 01 serial: 7abcdef2 
           Mobo: Dell model: 0VHWTR v: A02 serial: /75NTP02/CN7016343407OO/ BIOS: Dell v: A03 date: 04/14/2014 

So the serial number is 7abcdef2 and the BIOS version is A03. Using the serial number find the latest BIOS at DELL support.
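
If inxi isn't installed, the kernel exposes the same BIOS details as plain files under /sys/class/dmi/id. A minimal sketch, assuming that directory is present on your system (`bios_info` is a hypothetical helper name):

```shell
# Hypothetical helper: print BIOS vendor, version and date from the DMI
# files the kernel exposes. The directory can be overridden for testing.
bios_info() {
    dir=${1:-/sys/class/dmi/id}
    for f in bios_vendor bios_version bios_date; do
        # Skip anything that is missing or unreadable.
        [ -r "$dir/$f" ] && printf '%s: %s\n' "$f" "$(cat "$dir/$f")"
    done
    return 0
}

# Example: bios_info
```

The serial number lives in the same directory (product_serial) but is only readable by root.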

It turns out that it's A20 - so there have been quite a lot of updates since A03.

The update is a .exe so this necessitates booting to Windows or DOS.

Note: DELL do support updating the BIOS etc. in Linux but only for RHEL - see here.

After various dead ends, this didn't prove too difficult - the main dead end was this AskUbuntu answer. If SystemRescueCD ever supported FreeDOS, it does not anymore.

So the obvious answer would seem to be to use FreeDOS directly. Unfortunately, many web pages and, amazingly, their own wiki have out of date instructions on how to create a bootable FreeDOS USB stick.

In the end it's easy:

  • Download their Lite USB zip.
  • Unpack it and use Etcher to create a bootable USB key from the .img file that was in the zip.

Oddly, this process creates a disk whose partition is only exactly big enough for its contents, so I couldn't copy the BIOS update .exe to the drive. I also had no luck booting to FreeDOS and getting it to see another USB stick with the .exe on it.

After a number of dangerous experiments where I almost repartitioned my primary disk, I came to the conclusion that it is not currently possible to resize the main partition on the USB stick (parted says they're working on support for DOS disks but currently only support EXT etc.).

It turns out that the resulting disk isn't really a live CD, it's primarily an installation disk for FreeDOS. So in the end I just freed up space by deleting some of the packages that it would install but doesn't need immediately:

$ cd /media/$USER/FD-SETUP/FDSETUP/PACKAGES/
$ rm -rf UTIL
$ cd ../..
$ cp ~/Downloads/O3020A20.exe .

O3020A20.exe is the BIOS update downloaded from DELL.

Then I booted the system from the FreeDOS USB stick - for whatever reason, I had to shut down and start the system; simply rebooting just rebooted to Linux.

Rather frighteningly, FreeDOS defaults to suggesting it install FreeDOS on your primary drive - instead, just exit to DOS.

Now you can run O3020A20.exe - the prompt supports tab completion:

C:\> O3020A20.exe

The process went without a hitch and after removing the USB stick and rebooting I could confirm that the BIOS had been updated to A20:

$ sudo inxi -F | fgrep -A1 Machine
Machine:   Type: Desktop System: Dell product: OptiPlex 3020 v: 00 serial: 75NTP02 
           Mobo: Dell model: 0VHWTR v: A02 serial: /75NTP02/CN7016343407OO/ BIOS: Dell v: A20 date: 05/27/2019

Enabling XHCI HCD quirks

The BIOS update fixed something unrelated - an annoying issue where various panels in UI applications showed up filled with noise (rather than whatever flat color or image should have been there) after wake-up. But it didn't fix the spurious wake-up issue itself.

The Kernel quirks section of the Arch page describes enabling the XHCI_SPURIOUS_REBOOT and XHCI_SPURIOUS_WAKEUP quirks to potentially solve this problem:

$ sudo vim /etc/default/grub

Then I changed the line:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"

To:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash xhci_hcd.quirks=270336"

I.e. I added xhci_hcd.quirks=270336 as covered in the Arch page. Then:

$ sudo update-grub

And reboot.

Checking XHCI setup

Before making the above change, you can check the existing XHCI module parameters like this:

$ systool -v -m xhci_hcd
Module = "xhci_hcd"

  Attributes:
    uevent              = <store method only>

  Parameters:
    link_quirk          = "0"
    quirks              = "0"

Or simply like this:

$ cd /sys/module/xhci_hcd/parameters
$ cat quirks 
0
$ cat link_quirk 
0

After setting the xhci_hcd.quirks parameter, as shown above and then rebooting, you can see that the value has been set:

$ cat /sys/module/xhci_hcd/parameters/quirks
270336

There's no generic way of decoding such values; you have to look at the module code itself - in this case the defines for XHCI_SPURIOUS_REBOOT and XHCI_SPURIOUS_WAKEUP are in drivers/usb/host/xhci.h. There, you see:

#define XHCI_SPURIOUS_REBOOT	BIT_ULL(13)
...
#define XHCI_SPURIOUS_WAKEUP	BIT_ULL(18)

And if you convert 270336 to a binary number:

$ bc
obase=2
270336
1000010000000000000

Then you see that the 18th and 13th bit are set (if you consider the right-most bit as the 0th bit rather than the 1st bit).
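
The same check can be done with shell arithmetic instead of reading the binary digits by eye. This is a small sketch; `set_bits` is a hypothetical helper name.

```shell
# Hypothetical helper: list the positions of the set bits in a
# non-negative integer, counting the right-most bit as bit 0.
set_bits() {
    n=$1
    bit=0
    out=''
    while [ "$n" -gt 0 ]; do
        if [ $((n & 1)) -eq 1 ]; then
            out="${out:+$out }$bit"
        fi
        n=$((n >> 1))
        bit=$((bit + 1))
    done
    printf '%s\n' "$out"
}

# Going the other way, build the value from the two bit numbers:
#   echo $(( (1 << 13) | (1 << 18) ))
```

Running set_bits 270336 prints 13 18, and the echo in the comment prints 270336, which matches the two defines.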

More on the XHCI driver

While looking for how to query the xhci_hcd module parameters, I found lots of ways of querying related information. Note that all of the following information is unaffected by the xhci_hcd.quirks change made above. I.e. it was the same before and after this change.

1. Using lspci:

$ sudo lspci -v | fgrep -i -A4 xhci
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 04) (prog-if 30 [XHCI])
	Subsystem: Dell 8 Series/C220 Series Chipset Family USB xHCI
	Flags: bus master, medium devsel, latency 0, IRQ 27
	Memory at f7200000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [70] Power Management version 2
	Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+
	Kernel driver in use: xhci_hcd

Note: sudo is needed here to retrieve the information for the Capabilities lines.

2. Using the /boot/config-* files:

$ grep -i xhci /boot/config-$(uname -r)
CONFIG_USB_XHCI_HCD=y
CONFIG_USB_XHCI_DBGCAP=y
CONFIG_USB_XHCI_PCI=y
CONFIG_USB_XHCI_PLATFORM=m
CONFIG_USB_ROLES_INTEL_XHCI=m

3. Using lshw:

$ sudo lshw
...
*-usb:0
     description: USB controller
     product: 8 Series/C220 Series Chipset Family USB xHCI
     vendor: Intel Corporation
     physical id: 14
     bus info: pci@0000:00:14.0
     version: 04
     width: 64 bits
     clock: 33MHz
     capabilities: pm msi xhci bus_master cap_list
     configuration: driver=xhci_hcd latency=0
     resources: irq:27 memory:f7200000-f720ffff
   *-usbhost:0
        product: xHCI Host Controller
        vendor: Linux 5.4.0-56-generic xhci-hcd
        physical id: 0
        bus info: usb@3
        logical name: usb3
        version: 5.04
        capabilities: usb-2.00
        configuration: driver=hub slots=10 speed=480Mbit/s
   ...
   *-usbhost:1
        product: xHCI Host Controller
        vendor: Linux 5.4.0-56-generic xhci-hcd
        physical id: 1
        bus info: usb@4
        logical name: usb4
        version: 5.04
        capabilities: usb-3.00
        configuration: driver=hub slots=2 speed=5000Mbit/s
   ...

4. Using modinfo:

$ modinfo xhci_hcd
name:           xhci_hcd
filename:       (builtin)
license:        GPL
author:         Sarah Sharp
description:    'eXtensible' Host Controller (xHC) Driver
parm:           link_quirk:Don't clear the chain bit on a link TRB (int)
parm:           quirks:Bit flags for quirks to be enabled as default (ullong)

Next steps

If this still doesn't fix things, there's a section on the Arch WoL page describing issues with the Realtek 8168 NIC. And that's exactly the NIC I have:

$ sudo inxi -F | fgrep Network
Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet driver: r8169

Note that my driver, though, is r8169 (and the Arch page refers to driver r8168).

It doesn't seem to be possible to query this module for its parameters (I tried the first two answers here). And if I go into /sys/module/r8169 there are no obvious parameters.

It may be possible to disable WoL with s5wol (they describe using it to enable WoL).

But given that WoL is disabled in BIOS, I think the spurious wakeups quirks settings are a more likely fix.

A further step might be getting NetworkManager to log at debug level as described here.

Disabling the ability of devices to wake the system

The WoL steps documented above and the quirks changes didn't resolve the issue. So next I tried disabling the ability of various devices to wake up the system (as described here on Hacker News).

You can list all the devices that can wake up the computer:

$ fgrep -w -e enabled -e Device /proc/acpi/wakeup
Device	S-state	  Status   Sysfs node
RP04      S4    *enabled   pci:0000:00:1c.3
EHC1      S3    *enabled   pci:0000:00:1d.0
EHC2      S3    *enabled   pci:0000:00:1a.0
XHC       S4    *enabled   pci:0000:00:14.0
PEG0      S4    *enabled   pci:0000:00:01.0
PWRB      S3    *enabled   platform:PNP0C0C:00
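
If you want the enabled devices in a form that's easier to feed into other commands, awk can pick out the name and sysfs node columns. A minimal sketch (`enabled_wake_devices` is a hypothetical helper name):

```shell
# Hypothetical helper: print "NAME SYSFS-NODE" for every wake-up source
# currently marked *enabled. Reads /proc/acpi/wakeup unless another
# file is given as the first argument.
enabled_wake_devices() {
    awk '$3 == "*enabled" { print $1, $4 }' "${1:-/proc/acpi/wakeup}"
}

# Example: enabled_wake_devices
```

The header line and any *disabled entries are skipped because their third column isn't *enabled.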

Oddly, there's no way to look up what these device names mean, they're determined by individual vendors. However, some names are used fairly consistently across vendors and luckily all of the above are well-known names. For a list of these well-known names see this SO answer.

In the above case, all the devices, except PWRB, are PCI devices - so we can work out what they are:

$ lspci -tv
-[0000:00]-+-00.0  Intel Corporation 4th Gen Core Processor DRAM Controller
           +-01.0-[01]--+-00.0  NVIDIA Corporation GK208B [GeForce GT 730]
           |            \-00.1  NVIDIA Corporation GK208 HDMI/DP Audio Controller
           +-14.0  Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI
           +-16.0  Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1
           +-1a.0  Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2
           +-1b.0  Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller
           +-1c.0-[02]--
           +-1c.3-[03]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
           +-1d.0  Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1
           +-1f.0  Intel Corporation H81 Express LPC Controller
           +-1f.2  Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode]
           \-1f.3  Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller

So from the SO answer and the data above we can determine:

  • RP0x means a PCI slot - in this case RP04 is the ethernet controller.
  • EHCx means USB 2.0 - here we can't determine anything more than that EHC1 and EHC2 are USB related.
  • XHC means USB 3.0 - similarly we can't work out much more.
  • PEGx means a PCIe for Graphics slot - in this case PEG0 is an NVIDIA graphics card.

PWRB isn't a PCI device, but from the SO answer we can see that it's the power button.

You can find out some more information about USB devices with lsusb, e.g. above we can see that the node for EHC2 is 0000:00:1a.0. We can grep for that like so:

$ sudo lsusb -v 2> /dev/null | fgrep --before-context=9 0000:00:1a.0
  bDeviceClass            9 Hub
  bDeviceSubClass         0
  bDeviceProtocol         0 Full speed (or root) hub
  bMaxPacketSize0        64
  idVendor           0x1d6b Linux Foundation
  idProduct          0x0002 2.0 root hub
  bcdDevice            5.08
  iManufacturer           3 Linux 5.8.0-43-generic ehci_hcd
  iProduct                2 EHCI Host Controller
  iSerial                 1 0000:00:1a.0

We can see that EHC2 is a USB hub.

Note: you don't really need to use sudo but in some situations it gives you slightly more information.

So the devices listed above for /proc/acpi/wakeup aren't individual devices like mice or keyboards. We can only disable e.g. USB devices at the hub level.

So let's try just disabling the USB devices, leaving the ethernet and graphics card alone - we can then just use the power button to wake things up.

$ sudo bash
# cd /usr/lib/systemd/system-sleep
# cat > wakefix << "EOF"
#!/bin/bash -e

for device in EHC1 EHC2 XHC
do
    echo $device > /proc/acpi/wakeup
done
EOF
# chmod a+x wakefix
# ./wakefix
# fgrep -w -e enabled -e Device /proc/acpi/wakeup
Device	S-state	  Status   Sysfs node
RP04	  S4	*enabled   pci:0000:00:1c.3
PEG0	  S4	*enabled   pci:0000:00:01.0
PWRB	  S3	*enabled   platform:PNP0C0C:00

So we sudo to root, go to /usr/lib/systemd/system-sleep, create a simple bash script called wakefix, run it and finally check that EHC1, EHC2 and XHC are no longer enabled.

Important: the double-quotes around "EOF" are actually important - they stop variable substitution, e.g. of $device, occurring when we create the script. See this SO answer.

The systemd suspend service runs every executable in this directory just before entering the suspend state, and again just after resuming. I had to reboot my system for this to start working; it wasn't enough to have simply created the script or to have run it manually. A likely explanation is that writing a device name to /proc/acpi/wakeup toggles its state rather than disabling it: after the manual run above the devices were already disabled, so the next run before suspend would have switched them back on, whereas after a reboot they start out enabled. The same toggling would explain why the devices are reenabled when the system wakes from sleep, i.e. this script only disables them for the period that the system is suspended. See the suspend service man page for more information.

Now the system can only be woken by a short press to the power button.
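
Because a write to /proc/acpi/wakeup toggles a device rather than setting it, a variant that checks the current state first is less surprising when run by hand. The following is only a sketch of that idea, not the script I actually used. The function names and the WAKEUP_FILE override are my own, and it relies on systemd passing pre as the first argument before suspend.

```shell
#!/bin/bash
# Hypothetical, non-toggling variant of wakefix (a sketch only).
# WAKEUP_FILE can be overridden, e.g. to try this against a copy.
WAKEUP_FILE="${WAKEUP_FILE:-/proc/acpi/wakeup}"

# Succeeds only if the named device is currently marked *enabled.
is_wake_enabled() {
    awk -v dev="$1" '$1 == dev && $3 == "*enabled" { found = 1 }
                     END { exit !found }' "$WAKEUP_FILE"
}

# Writing a name toggles its state, so only write when it is enabled.
disable_wake() {
    if is_wake_enabled "$1"; then
        echo "$1" > "$WAKEUP_FILE"
    fi
}

# systemd passes "pre" before suspend and "post" after resume;
# do nothing on "post" (or when run with no arguments).
if [ "${1:-}" = "pre" ]; then
    for device in EHC1 EHC2 XHC; do
        disable_wake "$device"
    done
fi
```

With this version the USB controllers stay disabled after the first suspend instead of being switched back on at resume, so whether it suits you depends on whether you want that.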
