Fixing Thermal Throttling on ThinkPad P1 and X1 Extreme - Linux Edition
Lenovo messed up with the X1E and P1 Gen 1 (and maybe later generations): the system boots with a thermal limit (aka Tjunction or tjmax) set to 82C (some report 80C). This means that regardless of power draw or undervolting settings, when your CPU hits 82C it drops to the "Configurable TDP-down" frequency, or even lower. It may also limit the system power draw.
Thermal Paste and Stress Testing
First, note that I have already replaced the thermal paste on my P1's CPU and GPU with Noctua NT-H2 thermal compound. This immediately made a very noticeable difference in idle temps, and the laptop stayed cool on my lap. Also, the keyboard no longer got hot to the touch.
For stress testing under Linux, I used the s-tui application to dig into the details for all testing below.
How to Fix It
The fix is really two steps:
- Set the Tjunction higher, say, 3C under your CPU's rated Tjunction value.
- Undervolt the CPU, Cache, Uncore, and iGPU to maximize your performance.
Windows has a "driver" fix
Lenovo released a software update that effectively sets the Tjunction back up to 97C. However, this is only for Windows, and there are many posts reporting that Hyper-V negates the setting. I am not sure, but perhaps Lenovo has fixed this with newer drivers, since others reported it back in Q1 2018.
For Linux, we are left to fend for ourselves. Therefore, here's how to verify your system is affected, and how to fix it.
Verify your system is affected
There are two different ways to do this.
You can install the msr-tools package and load the msr kernel module:
sudo apt install msr-tools
sudo modprobe msr
Then, read the offset field (bits 29:24 of MSR 0x1a2; bits 23:16 hold the Tjunction max itself) and convert it to decimal:
$ sudo rdmsr --bitfield 29:24 -d 0x00001a2
18
This means your system is set to 18C under your Tjunction max, which for my Xeon E-2176M is 100C. So, that would be 100 - 18, which is 82C max.
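The arithmetic above can be sketched as a small shell helper. The function name effective_limit is made up for this illustration and is not part of msr-tools; it only does the subtraction and never touches the hardware.

```shell
# Effective throttle point = rated Tjunction minus the offset read from the MSR.
# effective_limit is a hypothetical helper name, not part of msr-tools.
effective_limit() {
    tjmax=$1    # rated Tjunction in C (100 for the E-2176M)
    offset=$2   # offset reported by rdmsr (18 in the example above)
    echo $((tjmax - offset))
}

# Values from this article; prints 82
effective_limit 100 18
```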
Use the undervolt utility
Current install instructions are on the undervolt project's GitHub page.
But in short, install it via pip under root (I know, not very Pythonic, but this needs root to access the CPU's MSRs).
sudo pip install undervolt
Now, you can read the Tjunction directly (called temperature target):
$ sudo undervolt --read
temperature target: -18 (82C)
core: 0.0 mV
gpu: 0.0 mV
cache: 0.0 mV
uncore: 0.0 mV
analogio: 0.0 mV
powerlimit: 78.0W (short: 0.00244140625s - enabled) / 45.0W (long: 96.0s - enabled)
As you can see, mine is set to 82C.
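If you want to script this check, here is a minimal sketch that pulls the limit in C out of the undervolt --read output. It assumes the tool prints one field per line with the "temperature target: OFFSET (LIMITC)" shape shown in this article; read_target_c is a made-up helper name, and the sample input is fed in with printf so nothing here needs root.

```shell
# Extract the temperature limit in C from `undervolt --read` output on stdin.
# read_target_c is a hypothetical helper name for this sketch.
read_target_c() {
    sed -n 's/^temperature target: -\{0,1\}[0-9]* (\([0-9]*\)C).*$/\1/p'
}

# Sample input copied from the output in this article; on a real system use:
#   sudo undervolt --read | read_target_c
# Prints 82
printf 'temperature target: -18 (82C)\ncore: 0.0 mV\n' | read_target_c
```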
1. Set Tjunction to proper setting
Go look up your CPU on Intel's Ark site and find its Tjunction value. My E-2176M has a max of 100C. You do NOT want to hit this 100C, ever! So we are going to set it to 97C instead, to leave a little headroom, as CPU temps sometimes spike 1C or 2C higher than your target temp while waiting on the fans to ramp up. If you do hit your Tjunction max, your system will shut down out of safety.
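As a rough sketch of that choice, the helper below derives the target from the rated Tjunction and a headroom value, and rejects a headroom that would put the target at or above the rated max. target_temp is a hypothetical name, and the 3C headroom is simply the value used in this article, not a recommendation for every CPU.

```shell
# Pick a target a few degrees under the rated Tjunction.
# target_temp is a hypothetical helper; it only does arithmetic.
target_temp() {
    tjmax=$1      # rated Tjunction from Intel Ark, in C
    headroom=$2   # safety margin in C (3 in this article)
    if [ "$headroom" -le 0 ] || [ "$headroom" -ge "$tjmax" ]; then
        echo "headroom must be between 1 and $((tjmax - 1))" >&2
        return 1
    fi
    echo $((tjmax - headroom))
}

# Prints: sudo undervolt --temp 97
echo "sudo undervolt --temp $(target_temp 100 3)"
```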
Armed with the target temp, mine being 97C, we can use the undervolt utility listed under the Verify section above.
sudo undervolt --temp 97
We can check it now:
$ sudo undervolt --read
temperature target: -3 (97C)
core: 0.0 mV
gpu: 0.0 mV
cache: 0.0 mV
uncore: 0.0 mV
analogio: 0.0 mV
powerlimit: 78.0W (short: 0.00244140625s - enabled) / 45.0W (long: 96.0s - enabled)
Now that my CPU can ramp up to 97C, I went from 2700MHz to 3400MHz across all cores! However, this is still a far cry from its rated 4.4GHz turbo setting. And it only lasts about 10 seconds before it throttles quickly down to 1500MHz, then climbs back up to 3400MHz again. The reason is that our CPU is running at full voltage, which is hot. Intel processors run with more voltage than they need, to account for unstable/inaccurate system voltage regulation.
To address this, I used undervolt to find a safe setting for undervolting.
Here are my settings I found to be stable for the E-2176M:
sudo undervolt --temp 97 --core -150 --cache -150 --gpu -100 --uncore -100
And checking it's all set correctly:
$ sudo undervolt --read
temperature target: -3 (97C)
core: -150.39 mV
gpu: -99.61 mV
cache: -150.39 mV
uncore: -99.61 mV
analogio: 0.0 mV
powerlimit: 78.0W (short: 0.00244140625s - enabled) / 45.0W (long: 96.0s - enabled)
With these settings, I am connected to two Thunderbolt 3 docking stations, three 1080p monitors, and five external USB accessories, with the Brave browser open with about 29 tabs and a couple of terminals on Pop_OS.
I ran the s-tui stress test for about 3 hours straight while using the Brave browser, watching YouTube, and general surfing. Zero issues.
All cores now hover around 3900MHz to 4000MHz, much closer to that turbo of 4.4GHz, at about 35W of usage. It would still drop after a minute or two, but it only drops to 2200MHz or 2400MHz now, which is much better than the previous low.
Your mileage may vary. Adjust the voltages 20mV at a time.
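One way to work through the steps is to print the candidate commands first and run them one at a time, stress testing between each. This is only a sketch: undervolt_steps is a made-up helper, it prints commands rather than applying them, and it steps core and cache together as an assumption (you may prefer to tune each separately).

```shell
# Print the undervolt commands for a stepwise search, most conservative first.
# undervolt_steps is a hypothetical helper; it only prints, it never applies.
undervolt_steps() {
    limit=$1   # deepest offset to try, as a positive mV number (e.g. 150)
    step=$2    # step size in mV (20, as suggested above)
    [ "$step" -gt 0 ] || return 1   # guard against an endless loop
    mv=$step
    while [ "$mv" -le "$limit" ]; do
        echo "sudo undervolt --core -$mv --cache -$mv"
        mv=$((mv + step))
    done
}

# Prints three commands: -20, -40 and -60 mV
undervolt_steps 60 20
```

Run each printed command by hand, stress test for a while, and back off to the last stable value if the system freezes or throws errors.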
Persist it all across reboots
You'll want to read up on undervolt's GitHub page for how to persist it with a systemd service. While I do use it, and my undervolting remains, my max temp isn't sticking across all reboots yet. It's hit or miss, most likely a race condition with another service on startup. I'll set up the timer as described in the undervolt instructions later.
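For orientation only, here is a minimal sketch of what such a oneshot unit could look like, generated from a shell function. This is not the unit from the undervolt instructions, and it does not include the timer. The print_undervolt_unit name and the /usr/local/bin/undervolt path are assumptions; check your own path with command -v undervolt and swap in the values you found stable.

```shell
# Print a oneshot systemd unit that reapplies the settings at boot.
# The binary path is an assumption; check yours with `command -v undervolt`.
print_undervolt_unit() {
    bin=${1:-/usr/local/bin/undervolt}
    cat <<EOF
[Unit]
Description=Apply undervolt and temperature target

[Service]
Type=oneshot
ExecStart=$bin --temp 97 --core -150 --cache -150 --gpu -100 --uncore -100

[Install]
WantedBy=multi-user.target
EOF
}

# Review the output, then save it as root to
# /etc/systemd/system/undervolt.service
print_undervolt_unit
```

After saving the file, sudo systemctl daemon-reload followed by sudo systemctl enable undervolt.service should make it run at boot. A plain oneshot like this is likely to hit the same startup race described above, which is what the timer approach is meant to address.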