@wmealing
Last active March 29, 2024 22:51
What are CPU "C-states" and how to disable them if needed?

To limit a CPU to a certain C-state, you can pass the processor.max_cstate=X option in the kernel line of /boot/grub/grub.conf.

Here we limit the system to only C-State 1:

    kernel /vmlinuz-2.6.18-371.1.2.el5 ... processor.max_cstate=1

On some systems, the kernel can override the BIOS setting, and the parameter intel_idle.max_cstate=0 may be required to ensure sleep states are not entered:

    kernel /vmlinuz-2.6.32-431.el6.x86_64 ... processor.max_cstate=1 intel_idle.max_cstate=0

You can confirm the maximum allowed CPU C-State with:

# cat /sys/module/intel_idle/parameters/max_cstate
0
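
To check which of these options the running kernel was actually booted with, the command line can be inspected. This is a minimal sketch, not part of the original note; the helper name cmdline_param is made up for illustration, and if a parameter is given more than once the sketch reports the last occurrence.

```sh
# Print the value of a "name=value" kernel parameter found in a command
# line string. Prints nothing if the parameter is absent.
# (cmdline_param is a hypothetical helper name.)
cmdline_param() (
    set -f                  # do not glob-expand words while splitting
    seen=0
    value=
    for word in $2; do
        case $word in
            "$1"=*) value=${word#*=}; seen=1 ;;
        esac
    done
    [ "$seen" -eq 1 ] && printf '%s\n' "$value"
)

# Usage on a live system:
#   cmdline_param processor.max_cstate  "$(cat /proc/cmdline)"
#   cmdline_param intel_idle.max_cstate "$(cat /proc/cmdline)"
```

An empty result means the option was not passed at boot, so the driver default is in effect.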

To save energy when the CPU is idle, the CPU can be commanded to enter a low-power mode. Each CPU has several power modes, collectively called “C-states” or “C-modes”.

A low-power mode was first introduced with the 486DX4 processor. Since then, more power modes have been introduced, and each mode has been enhanced so that the CPU consumes less power while in it. The idea behind these modes is to cut the clock signal and power from idle units inside the CPU. The more units you stop (by cutting the clock, reducing the voltage, or shutting them down completely), the more energy you save. On the other hand, the CPU then needs more time to “wake up” and become 100% operational again.

These modes are known as C-states. They start at C0, which is the normal CPU operating mode, i.e., the CPU is 100% turned on. The higher the C number, the deeper the CPU sleep mode: more circuits and signals are turned off, and the CPU needs more time to return to C0, i.e., to wake up. Each mode is also known by a name, and several of them have sub-modes with different power saving (and thus wake-up time) levels.

| Mode | Name | What it does | CPUs |
|------|------|--------------|------|
| C0 | Operating State | CPU fully turned on, currently executing instructions. | All CPUs |
| C1 | Operating State | CPU fully turned on, awaiting instructions | All CPUs |
| C1 | Halt | Stops CPU main internal clocks via software; bus interface unit and APIC are kept running at full speed | 486DX4 and above |
| C1E | Enhanced Halt | Stops CPU main internal clocks via software and reduces CPU voltage; bus interface unit and APIC are kept running at full speed | All socket 775 CPUs |
| C1E | -- | Stops all CPU internal clocks | Turion 64, 65-nm Athlon X2 and Phenom CPUs |
| C2 | Stop Grant | Stops CPU main internal clocks via hardware; bus interface unit and APIC are kept running at full speed | 486DX4 and above |
| C2 | Stop Clock | Stops CPU internal and external clocks via hardware | Only 486DX4, Pentium, Pentium MMX, K5, K6, K6-2, K6-III |
| C2E | Extended Stop Grant | Stops CPU main internal clocks via hardware and reduces CPU voltage; bus interface unit and APIC are kept running at full speed | Core 2 Duo and above (Intel only) |
| C3 | Sleep | Stops all CPU internal clocks | Pentium II, Athlon and above, but not on Core 2 Duo E4000 and E6000 |
| C3 | Deep Sleep | Stops all CPU internal and external clocks | Pentium II and above, but not on Core 2 Duo E4000 and E6000; Turion 64 |
| C3 | AltVID | Stops all CPU internal clocks and reduces CPU voltage | AMD Turion 64 |
| C4 | Deeper Sleep | Reduces CPU voltage | Pentium M and above, but not on Core 2 Duo E4000 and E6000 series; AMD Turion 64 |
| C4E/C5 | Enhanced Deeper Sleep | Reduces CPU voltage even more and turns off the memory cache | Core Solo, Core Duo and 45-nm mobile Core 2 Duo only |
| C6 | Deep Power Down | Reduces the CPU internal voltage to any value, including 0 V | 45-nm mobile Core 2 Duo only |

@andrei-korshikov

> It appears that the intel_idle kernel module is also loaded on AMD CPUs (on the AMD Ryzen 5 2400G at least). The default setting on my system was 9 -- does power state Cn correspond directly to kernel module parameter n?

Kernel intel_idle module parameter n is a state number. Cn is a state name. They do correspond to each other, but state numbers are CPU-specific. So the same n means different Cns on different platforms. You can find out the correspondence between intel_idle driver state numbers and state names using, for example, the following command:

grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name

And here is the output on my laptop (with Intel Core i3-7100U CPU):

/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:C1E
/sys/devices/system/cpu/cpu0/cpuidle/state3/name:C3
/sys/devices/system/cpu/cpu0/cpuidle/state4/name:C6
/sys/devices/system/cpu/cpu0/cpuidle/state5/name:C7s
/sys/devices/system/cpu/cpu0/cpuidle/state6/name:C8
/sys/devices/system/cpu/cpu0/cpuidle/state7/name:C9
/sys/devices/system/cpu/cpu0/cpuidle/state8/name:C10

So, as an example, on my platform intel_idle.max_cstate=4 means "Deepest allowed state is C6".
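
The index-to-name lookup above can be wrapped in a small helper that walks the states in numeric order (a plain glob would sort state10 before state2). This is only a sketch: the function name list_idle_states is made up, and the default path assumes cpu0 is representative of the other cores.

```sh
# Print "index name" for every cpuidle state of one CPU, in numeric order.
# (list_idle_states is a hypothetical helper name.)
list_idle_states() (
    dir=${1:-/sys/devices/system/cpu/cpu0/cpuidle}
    i=0
    while [ -r "$dir/state$i/name" ]; do
        printf '%s %s\n' "$i" "$(cat "$dir/state$i/name")"
        i=$((i + 1))
    done
)

# Usage on a live system:
#   list_idle_states
#   list_idle_states /sys/devices/system/cpu/cpu3/cpuidle
```

Going by the explanation above, the name printed next to index n is the deepest state that intel_idle.max_cstate=n still allows on that machine.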

@andrei-korshikov

> Okay, thanks. So is there a reason you use two different values like this?
> processor.max_cstate=1 intel_idle.max_cstate=0

There are two CPU idle drivers: intel_idle (the default nowadays) and acpi_idle (the older one).
With the intel_idle driver, C-state numbers are processor-specific (see my comment above), and the driver ignores the BIOS C-state settings.
With the acpi_idle driver, C-states are defined by the ACPI standard and exported by the BIOS, and the driver follows the BIOS settings.

intel_idle.max_cstate=0 means "Disable the intel_idle driver and use acpi_idle instead". So processor.max_cstate=1 is a command to the acpi_idle driver, but I'm not sure about its exact meaning. I've read that processor.max_cstate=1 and processor.max_cstate=0 are equivalent, and that both disable C-states completely when using the acpi_idle driver. Also, idle=poll is usually added to the kernel parameters when processor.max_cstate=1 is used.
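
Which of the two drivers is active decides which parameter matters. The sketch below maps a driver name to the matching option; the function name describe_idle_driver is made up, and the sysfs path in the usage comment is where recent kernels expose the active driver, which may not hold for very old kernels.

```sh
# Tell which max_cstate parameter applies to a given cpuidle driver name.
# (describe_idle_driver is a hypothetical helper name.)
describe_idle_driver() (
    case $1 in
        intel_idle) echo "intel_idle: limited by intel_idle.max_cstate" ;;
        acpi_idle)  echo "acpi_idle: limited by processor.max_cstate" ;;
        *)          echo "$1: not covered by this note" ;;
    esac
)

# Usage on a live system:
#   describe_idle_driver "$(cat /sys/devices/system/cpu/cpuidle/current_driver)"
```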

@nospam2000

nospam2000 commented Mar 11, 2023

What is the relationship to /dev/cpu_dma_latency? It collects the latency requirements of multiple applications and automatically limits the maximum C-state so that the maximum latency allowed by each application is respected.
I tried some values, but the actual latencies were much higher than the values written to cpu_dma_latency, so we now always use cpu_dma_latency=0.

The Intel tool i7z displays the current C-states, but a quick check with /sys/bus/cpu/devices/cpu0/cpuidle/state1/disable=1 showed no effect on the output of the tool. The values of /sys/bus/cpu/devices/cpu0/cpuidle/state1/time and /sys/bus/cpu/devices/cpu0/cpuidle/state1/usage no longer change when disable=1, so there seems to be an effect, but I would trust the output of i7z more.

@wmealing
Author

the /dev/cpu_dma_latency was written well after this document was created. If you want to write something up that doesn't require a specific tool to query, i'd be willing to update this doc!

Thanks!

@nospam2000

nospam2000 commented Mar 12, 2023

I can give a practical example of using cpu_dma_latency. It is a global setting for all CPU cores, and the set of allowed states depends on the value you write. It is practical when no CPU pinning is used. Your CPU cooling should be able to handle running the CPU without power saving; in my short example you can see that the CPU temperature was already going up.

Starting point: /dev/cpu_dma_latency is not set by any application

# show the current value of /dev/cpu_dma_latency in hex
root:~# xxd < /dev/cpu_dma_latency -g4 -ps 
00943577

root:~# i7z
Cpu speed from cpuinfo 1833.00Mhz
cpuinfo might be wrong if cpufreq is enabled. To guess correctly try estimating via tsc
Linux's inbuilt cpu_khz code emulated now
True Frequency (without accounting Turbo) 1833 MHz
  CPU Multiplier 22x || Bus clock frequency (BCLK) 83.32 MHz

Socket [0] - [physical cores=4, logical cores=4, max online cores ever=4]
  TURBO ENABLED on 4 Cores, Hyper Threading OFF
  Max Frequency without considering Turbo 1916.32 MHz (83.32 x [23])
  Max TURBO Multiplier (if Enabled) with 1/2/3/4 Cores is  0x/0x/0x/0x
  Real Current Frequency 1429.29 MHz [83.32 x 17.15] (Max of below)
        Core [core-id]  :Actual Freq (Mult.)      C0%   Halt(C1)%  C3 %   C6 %  Temp      VCore
        Core 1 [0]:       1429.29 (17.15x)      4.47    1.26       0    95.2    28      3.7500
        Core 2 [1]:       961.85 (11.54x)       2.36    2.05       0    96.7    28      3.7500
        Core 3 [2]:       817.00 (9.81x)        2.21    1.55       0    97.5    31      3.7500
        Core 4 [3]:       906.77 (10.88x)       2.36    1.47       0    97.4    31      3.7500

C0 = Processor running without halting
C1 = Processor running with halts (States >C0 are power saver modes with cores idling)
C3 = Cores running with PLL turned off and core cache turned off
C6, C7 = Everything in C3 + core state saved to last level cache, C7 is deeper than C6
  Above values in table are in percentage over the last 1 sec
[core-id] refers to core-id number in /proc/cpuinfo

Setting /dev/cpu_dma_latency to 0 to completely disable C1..Cn and P1..Pn

# set /dev/cpu_dma_latency to 0 (file must be kept open; when closing the file the effect is reverted)
root:~# cat >/dev/cpu_dma_latency <(echo -e -n "\x0\x0\x0\x0" ; sleep inf) &
[1] 30919
# save the PID to use it later for closing /dev/cpu_dma_latency
root:~# PID1=$!

# show the current value of /dev/cpu_dma_latency in hex
root:~# xxd < /dev/cpu_dma_latency -g4 -ps
00000000

root:~# i7z
Cpu speed from cpuinfo 1833.00Mhz
cpuinfo might be wrong if cpufreq is enabled. To guess correctly try estimating via tsc
Linux's inbuilt cpu_khz code emulated now
True Frequency (without accounting Turbo) 1833 MHz
  CPU Multiplier 22x || Bus clock frequency (BCLK) 83.32 MHz

Socket [0] - [physical cores=4, logical cores=4, max online cores ever=4]
  TURBO ENABLED on 4 Cores, Hyper Threading OFF
  Max Frequency without considering Turbo 1916.32 MHz (83.32 x [23])
  Max TURBO Multiplier (if Enabled) with 1/2/3/4 Cores is  0x/0x/0x/0x
  Real Current Frequency 2166.27 MHz [83.32 x 26.00] (Max of below)
        Core [core-id]  :Actual Freq (Mult.)      C0%   Halt(C1)%  C3 %   C6 %  Temp      VCore
        Core 1 [0]:       2166.27 (26.00x)       100       0       0       0    32      3.7500
        Core 2 [1]:       2166.27 (26.00x)       100       0       0       0    32      3.7500
        Core 3 [2]:       2166.27 (26.00x)       100       0       0       0    33      3.7500
        Core 4 [3]:       2166.27 (26.00x)       100       0       0       0    33      3.7500

C0 = Processor running without halting
C1 = Processor running with halts (States >C0 are power saver modes with cores idling)
C3 = Cores running with PLL turned off and core cache turned off
C6, C7 = Everything in C3 + core state saved to last level cache, C7 is deeper than C6
  Above values in table are in percentage over the last 1 sec
[core-id] refers to core-id number in /proc/cpuinfo

# the values of C-State1 are not changing, that means C-State1 is not used
root:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state1/usage /sys/devices/system/cpu/cpu0/cpuidle/state1/time
18893627
17872365167
root:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state1/usage /sys/devices/system/cpu/cpu0/cpuidle/state1/time
18893627
17872365167
root:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state1/usage /sys/devices/system/cpu/cpu0/cpuidle/state1/time
18893627
17872365167

# kill the command which keeps /dev/cpu_dma_latency open, so that the state before the initial echo command was executed is restored
root:~# kill -s TERM -$PID1
[1]+  Terminated              cat <(echo -e -n "\x0\x0\x0\x0" ; sleep inf) > /dev/cpu_dma_latency

# show the current value of /dev/cpu_dma_latency in hex
root:~# xxd < /dev/cpu_dma_latency -g4 -ps
00943577

# the values of C-State1 are now changing again
root:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state1/usage /sys/devices/system/cpu/cpu0/cpuidle/state1/time
18893658
17872366108
root:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state1/usage /sys/devices/system/cpu/cpu0/cpuidle/state1/time
18893667
17872366481
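
The repeated cat calls above can be folded into one helper that samples the usage counter twice and prints the difference. This is a sketch with a made-up function name (usage_delta); a result of 0 over the interval suggests the state was not entered during that time.

```sh
# Print how many times an idle state was entered during an interval.
# $1 = state directory, $2 = interval in seconds (default 1).
# (usage_delta is a hypothetical helper name.)
usage_delta() (
    before=$(cat "$1/usage")
    sleep "${2:-1}"
    after=$(cat "$1/usage")
    echo $((after - before))
)

# Usage on a live system:
#   usage_delta /sys/devices/system/cpu/cpu0/cpuidle/state1 5
```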

Using a value of 256 to allow C1-state

# check the latency value of C1-state
root# cat /sys/devices/system/cpu/cpu0/cpuidle/state1/latency
1
# check the latency value of C2-state
root# cat /sys/devices/system/cpu/cpu0/cpuidle/state2/latency
300
# use a value of 256 to be above C1 but below C2 state (just for testing, in reality your maximum allowed latency defines this value)
# the value 256=0x00000100 here is for a little endian machine
root:~# cat >/dev/cpu_dma_latency <(echo -e -n "\x0\x1\x0\x0" ; sleep inf) &
[1] 32433
root:~# PID1=$!
root:~# i7z
  Real Current Frequency 1557.26 MHz [83.32 x 18.69] (Max of below)
        Core [core-id]  :Actual Freq (Mult.)      C0%   Halt(C1)%  C3 %   C6 %  Temp      VCore
        Core 1 [0]:       1066.57 (12.80x)      3.69    97.9       0       0    30      3.7500
        Core 2 [1]:       1557.26 (18.69x)       9.3    92.1       0       0    31      3.7500
        Core 3 [2]:       1401.23 (16.82x)      3.13    97.6       0       0    32      3.7500
        Core 4 [3]:       1231.34 (14.78x)      1.95    98.7       0       0    32      3.7500
root:~# kill -s TERM -$PID1

The information that i7z shows can also be extracted from sysfs by reading the following values from /sys/bus/cpu/devices/cpu0/cpuidle/stateN/ (one directory per idle state):

  • time : Total time spent in this idle state (in microseconds)
  • usage : Number of times this state was entered (count)

see also cpuidle/sysfs.txt
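
As a rough stand-in for the i7z table, the two counters can be dumped for every state of one CPU. This is a sketch under the assumption that each state directory contains name, usage and time files; the function name idle_stats is made up.

```sh
# Print name, entry count and total residency for every idle state.
# $1 = cpuidle directory (optional, defaults to cpu0).
# (idle_stats is a hypothetical helper name.)
idle_stats() (
    dir=${1:-/sys/devices/system/cpu/cpu0/cpuidle}
    i=0
    while [ -r "$dir/state$i/usage" ]; do
        printf '%s usage=%s time_us=%s\n' \
            "$(cat "$dir/state$i/name")" \
            "$(cat "$dir/state$i/usage")" \
            "$(cat "$dir/state$i/time")"
        i=$((i + 1))
    done
)

# Usage on a live system:
#   idle_stats
```

Unlike i7z, which reports percentages over the last second, these counters are cumulative since boot, so two samples have to be subtracted to get a rate.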

@h1ght

h1ght commented Apr 14, 2023

I tried your command for the named idle states. It looks like some are missing on my system. I've read that they are disabled on some CPUs.
/sys/devices/system/cpu/cpu1/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu1/cpuidle/state1/name:C1E
/sys/devices/system/cpu/cpu1/cpuidle/state2/name:C6
/sys/devices/system/cpu/cpu1/cpuidle/state3/name:C8
/sys/devices/system/cpu/cpu1/cpuidle/state4/name:C10
My system doesn't want to go deeper than C3, so now I'm doing trial and error with the BIOS settings. I'm a bit confused by this.

Also, from what I've read, RC6pp is disabled on all systems.

@ksingh7

ksingh7 commented May 14, 2023

So I have tried to install this deb file, and next I am going to monitor my system and report back. Hopefully this fixes the problem, fingers crossed.

@ksingh7

ksingh7 commented May 14, 2023

And also tried this

cpupower idle-set --disable-by-latency 0
# make systemd run it at startup:
cat >/etc/systemd/system/disable_cpu_idle_states.service <<'EOT'
[Unit]
Description=Disable idle CPU states
After=cpufrequtils.service
[Service]
Type=oneshot
ExecStart=/usr/bin/cpupower idle-set --disable-by-latency 0
[Install]
WantedBy=multi-user.target
EOT
systemctl daemon-reload
systemctl enable disable_cpu_idle_states
# if everything went well, reboot

@its0ka

its0ka commented Jan 17, 2024

on my system with intel core 2 T5500 only "cpupower idle-set --disable-by-latency 0" or echo 1 to "/sys/devices/system/cpu/cpu*/cpuidle/state*/disable" works. kernel parameters from the first post don't change which c-states are enabled.
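
The sysfs variant mentioned above can be looped over all CPUs and states. This is a minimal sketch with a made-up function name (disable_deep_states); it needs root on a real system, and the setting does not persist across a reboot.

```sh
# Write 1 to the "disable" file of every idle state with index >= $1,
# on every CPU below $2 (defaults to the real sysfs tree).
# (disable_deep_states is a hypothetical helper name.)
disable_deep_states() (
    cpu_root=${2:-/sys/devices/system/cpu}
    for cpu in "$cpu_root"/cpu[0-9]*; do
        i=$1
        while [ -w "$cpu/cpuidle/state$i/disable" ]; do
            echo 1 > "$cpu/cpuidle/state$i/disable"
            i=$((i + 1))
        done
    done
)

# Usage on a live system (as root), keeping only state0:
#   disable_deep_states 1
```

Starting at index 1 leaves state0 enabled, so the CPU still has an idle state to fall back to.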

@wujtt

wujtt commented Jan 31, 2024

Hello, may I ask where the bottom table was obtained from? Do you have any relevant documents or references? Thank you very much!

@wmealing
Author

wmealing commented Feb 2, 2024

I think it was the intel hardware reference manuals, and source code.. the doc was written a long.. long time ago.
