@eduncan911

eduncan911/README.md

Last active Mar 19, 2021
Fixing Thermal Throttling on Thinkpad P1 and X1 Extreme - Linux Edition

Lenovo messed up with the Gen 1 X1E and P1 (and maybe later generations): the system boots with its thermal limit (aka Tjunction or TjMax) set to 82C (some report 80C). This means that regardless of power draw or undervolting settings, when your CPU hits 82C it drops to the "Configurable TDP-down" frequency, or even lower. It may also limit the system's power draw.

Thermal Paste and Stress Testing

First, note that I have already replaced the thermal paste on my P1's CPU and GPU with Noctua NT-H2 thermal compound. This immediately made a very noticeable difference in idle temps: the laptop stayed cool on my lap, and the keyboard no longer got hot to the touch.

For stress testing under Linux, I used the s-tui application to dig into the details for all testing below.

How to Fix It

The fix is really two steps:

  • Set the Tjunction higher, say, 3C under your CPU's rated Tjunction value.
  • Undervolt the CPU, Cache, Uncore, and iGPU to maximize your performance.

Windows has a "driver" fix

Lenovo released a software update that effectively sets the Tjunction back up to 97C. However, this is only for Windows, and there are many posts reporting that Hyper-V negates the setting. I am not sure, but perhaps Lenovo has fixed this with newer drivers since others reported it back in Q1 2018.

Linux instructions

For Linux, we are left to fend for ourselves. Therefore, here's how to verify your system is affected, and how to fix it.

Verify your system is affected

There are two ways to do this.

Use msr-tools

You can install the msr-tools utility.

sudo apt install msr-tools
sudo modprobe msr 

Then read the field and print it as a decimal number:

$ sudo rdmsr --bitfield 29:24 -d 0x00001a2
18

This means your system is set to 18C under your Tjunction max, which for my Xeon E-2176M is 100C. So that works out to 100 - 18 = 82C max.
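The subtraction above can be wrapped in a small helper. This is only a minimal sketch: the function name `effective_target` is made up for illustration, and the Tjunction max has to come from Intel's spec sheet for your own CPU.

```shell
# effective_target TJMAX OFFSET
# Prints the temperature (in C) at which throttling starts:
# the CPU's rated Tjunction max minus the offset read from the MSR.
# (Hypothetical helper, not part of msr-tools.)
effective_target() {
    echo $(( $1 - $2 ))
}

# Values from this write-up: E-2176M (100C max) with an offset of 18.
effective_target 100 18   # prints 82
```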

Use the undervolt utility

Current install instructions are on GitHub:

https://github.com/georgewhewell/undervolt

But in short, install it via pip as root (I know, that goes against Python best practice, but the tool needs root to access the CPU's model-specific registers, the MSRs).

sudo pip install undervolt

Now, you can read the Tjunction directly (called temperature target):

$ sudo undervolt --read
temperature target: -18 (82C)
core: 0.0 mV
gpu: 0.0 mV
cache: 0.0 mV
uncore: 0.0 mV
analogio: 0.0 mV
powerlimit: 78.0W (short: 0.00244140625s - enabled) / 45.0W (long: 96.0s - enabled)

As you can see, mine is set to 82C.
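If you want to use that value in a script, the temperature can be pulled out of the readout with `sed`. This is a sketch that assumes the output format shown above; other versions of the tool may print it differently, and the function name `temp_target_from_read` is made up for illustration.

```shell
# temp_target_from_read
# Reads `undervolt --read` output on stdin and prints the temperature shown
# in parentheses on the "temperature target" line, without the trailing "C".
# (Hypothetical helper; assumes the output format shown in this write-up.)
temp_target_from_read() {
    sed -n 's/^temperature target: .*(\([0-9][0-9]*\)C).*/\1/p'
}

# Sample lines copied from the readout above.
printf 'temperature target: -18 (82C)\ncore: 0.0 mV\n' | temp_target_from_read   # prints 82
```

On a real system the call would be `sudo undervolt --read | temp_target_from_read`.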

1. Set Tjunction to proper setting

Go look up your CPU on Intel's Ark site and find its Tjunction value. My E-2176M has a max of 100C. You do NOT want to hit this 100C, ever! So we are going to set it to 97C instead, to leave a little headroom, as CPU temps sometimes spike 1C or 2C higher than your target temp while waiting on the fans to ramp up. If you do hit your Tjunction max, your system will shut down out of safety.

Armed with a target temp, mine being 97C, we can use the undervolt utility from the verify section above.

sudo undervolt --temp 97

We can check it now:

$ sudo undervolt --read
temperature target: -3 (97C)
core: 0.0 mV
gpu: 0.0 mV
cache: 0.0 mV
uncore: 0.0 mV
analogio: 0.0 mV
powerlimit: 78.0W (short: 0.00244140625s - enabled) / 45.0W (long: 96.0s - enabled)

2. Undervolting

Now that my CPU ramps up to 97C, I went from 2700MHz to 3400MHz across all cores! However, this is still a far cry from its rated 4.4GHz turbo setting. And it only lasts about 10 seconds before it throttles down to 1500MHz, then climbs back up to 3400MHz again. The reason is that the CPU is running at full voltage, which is hot. Intel processors run with more voltage than they need, to account for unstable or inaccurate system voltage regulation.

To address this, I used undervolt to find a safe undervolt setting. Here are the settings I found to be stable for the E-2176M:

sudo undervolt --temp 97 --core -150 --cache -150 --gpu -100 --uncore -100

And checking it's all set correctly:

$ sudo undervolt --read
temperature target: -3 (97C)
core: -150.39 mV
gpu: -99.61 mV
cache: -150.39 mV
uncore: -99.61 mV
analogio: 0.0 mV
powerlimit: 78.0W (short: 0.00244140625s - enabled) / 45.0W (long: 96.0s - enabled)
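The readout shows -150.39 mV and -99.61 mV rather than the round numbers requested. This appears to be because the offset is stored in steps of 1/1.024 mV, so each request is rounded to the nearest step. The sketch below reproduces that rounding; the step size is an assumption about the register encoding, and `quantized_mv` is a made-up name, not part of the undervolt tool.

```shell
# quantized_mv MV
# Prints the offset that would be read back after requesting MV millivolts,
# assuming the register stores the offset in units of 1/1.024 mV
# (an assumption about the encoding, not taken from this write-up).
quantized_mv() {
    awk -v mv="$1" 'BEGIN {
        steps = mv * 1.024
        # Round to the nearest whole step (awk has no built-in round()).
        steps = (steps < 0) ? int(steps - 0.5) : int(steps + 0.5)
        printf "%.2f\n", steps / 1.024
    }'
}

quantized_mv -150   # prints -150.39
quantized_mv -100   # prints -99.61
```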

With these settings, I am connected to two Thunderbolt 3 docking stations, three 1080p monitors, and five external USB accessories, with the Brave browser open with about 29 tabs and a couple of terminals on Pop!_OS.

I ran the s-tui stress test for about 3 hours straight while using the Brave browser, watching YouTube, and doing some general surfing. Zero issues.

All cores now hover around 3900MHz to 4000MHz, much closer to that 4.4GHz turbo, at 35W of usage. It still drops after a minute or two, but only to 2200MHz or 2400MHz now, which is much better than the previous low.

Your mileage may vary. Adjust the voltages 20mV at a time.
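One way to work through the 20mV steps is to print the candidate commands first, then run them one at a time with a stress test in between. The sketch below only prints commands and executes nothing. The function name `undervolt_steps` is made up for illustration, and stepping core and cache together is an assumption based on the matching values used above.

```shell
# undervolt_steps STOP STEP
# Prints one candidate `undervolt` command per line, stepping the core and
# cache offsets from -STEP down to -STOP millivolts. Nothing is executed.
# (Hypothetical helper, not part of the undervolt tool.)
undervolt_steps() {
    local stop="$1" step="$2" mv
    mv=$step
    while [ "$mv" -le "$stop" ]; do
        echo "sudo undervolt --core -${mv} --cache -${mv}"
        mv=$(( mv + step ))
    done
}

# Print the first three 20mV steps.
undervolt_steps 60 20
```

Run each printed command by hand, stress test with s-tui, and stop at the last value that stayed stable.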

Persist it all across reboots

You'll want to read up on undervolt's GitHub page for how to persist it with a systemd service. While I do use it, and my undervolting persists, my max temp isn't sticking across all reboots yet. It's hit or miss, most likely a race condition with another service on startup. I'll set up the timer as described in the undervolt instructions later.
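For orientation, a oneshot unit could look roughly like the one generated below. This is an illustrative sketch, not the example from the undervolt project: the function name `print_undervolt_unit`, the unit layout, and the `/usr/local/bin/undervolt` path are all assumptions (check the real path with `command -v undervolt`).

```shell
# print_undervolt_unit BIN ARGS...
# Prints a oneshot systemd unit to stdout that runs BIN with ARGS at boot.
# The unit layout is an illustrative assumption, not the upstream example.
print_undervolt_unit() {
    local bin="$1"
    shift
    cat <<EOF
[Unit]
Description=Apply undervolt and temperature target

[Service]
Type=oneshot
ExecStart=${bin} $*

[Install]
WantedBy=multi-user.target
EOF
}

# Binary path is an assumption; the flags are the ones used in this write-up.
print_undervolt_unit /usr/local/bin/undervolt --temp 97 --core -150 --cache -150 --gpu -100 --uncore -100
```

The output would be saved as a unit file under `/etc/systemd/system/` and enabled with `systemctl enable`. A unit like this runs once per boot, so on its own it would not fix a startup race that resets the temperature target afterwards.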

Enjoy!
