@baatochan
Last active April 8, 2024 10:38
Living with a faulty RAM in 2023

How to blacklist particular defective RAM areas on a machine with Linux?

Why am I even writing this gist?

Well, the first and main reason, which you have probably guessed, is that my laptop recently turned out to have defective RAM, and I had to do something about it because it had become too unstable to work on. The second reason is that I didn't even know there was another option besides replacing the RAM. (For the record: if you can simply replace the RAM stick, I don't recommend using a PC with faulty RAM. In my case, however, the laptop has soldered-in RAM, which makes replacing it impossible for me and pretty expensive at a repair shop.) The third reason is that, even though I learned from a friend what I should use to "fix" my issue, I couldn't find any reliable, up-to-date info on how to do it, because in 2023 hardly anyone does it anymore (it was far more popular 20 years ago, when RAM was expensive).

Disclaimer

Even though I tried to gather information that is as correct as possible, some of it is likely not 100% accurate. Not everything I describe here was done by me; some of it is based only on info I found on the Internet. I have also linked, at the end, the pages I relied on when doing this on my machine; they may contain more info that I forgot to describe here. If you spot that something is missing or not 100% true, correct me in the comments. If you know of any other or better methods, feel free to put them in the comments as well.

Where to start?

If you are reading this guide, I assume you probably use some kind of Linux and probably already know that your RAM is faulty. However, I will start with the most basic thing: testing for faulty RAM. Faulty RAM can make your system unstable. In my case it showed up in Chrome, where every once in a while some tabs (usually just one) would crash with an error code related to a memory issue. However, faulty RAM can cause other issues as well, such as random app crashes, system crashes and many more. If you encounter any weird behavior on your PC, it's a good idea to test your RAM.

To test your RAM (especially if you use Linux), I suggest Memtest86+, which is probably already installed on your Linux system; if not, you can install it easily. Once it is installed, there should be a dedicated GRUB entry to boot into Memtest86+. After it boots, the RAM test starts automatically and takes about 2h to complete (it depends on your RAM size and speed, but for a modern 16GB setup it's around 2h). The default view in Memtest86+ shows every RAM error, so if you see any red line, your RAM is most probably broken. A pass is done when all individual tests have completed (you can see the progress of the current test as well as of the whole pass).

How to fix your faulty RAM?

Okay, so I may have scared you off with my long introduction, but in reality blacklisting a RAM area is very easy (if you know how, that is; I lost way too much time on this during my first attempt). I assume you ran Memtest86+ and it showed you errors at specific failing RAM addresses. The solution I use here is BadRAM. It started out as a standalone Linux kernel patch, but nowadays GRUB 2 has a built-in badram command that filters the listed regions out of the memory map, so no kernel changes are needed. The solution presented here also assumes that you use GRUB (if not, you need to find a way on your own).

  1. Run Memtest86+ and, at the beginning of the first pass, switch the error reporting mode to BadRAM patterns
    Note: To switch the error reporting mode press c (configuration) -> 3 (Error Reporting Mode) -> 3 (BadRAM Patterns). The menu numbers may differ between Memtest86+ versions, so double-check them in yours. This changes the Memtest86+ output to a ready-made BadRAM pattern, which you will need later.
  2. When one whole pass is finished, write down the pattern (or scan it with Google Lens)
  3. If you can boot into your Linux on this machine, do so and go to step 9 (no chroot needed)
  4. If you can't, prepare a Linux installation USB drive using a different PC
  5. Locate the /boot/grub/grub.cfg file on the USB drive (not on your main installation!)
  6. Append badram #PATTERN# at the end of that file, replacing #PATTERN# with the one you got from Memtest86+, e.g. badram 0xa84c8b0c,0xfffffffc,0xa04c8ca8,0xfffffffc,0xa84c8c88,0xfffffffc
  7. Boot your faulty PC with the USB drive
  8. chroot into your main installation
  9. Append GRUB_BADRAM="#PATTERN#" at the end of the /etc/default/grub file, replacing #PATTERN# with the one you got from Memtest86+, e.g. GRUB_BADRAM="0xa84c8b0c,0xfffffffc,0xa04c8ca8,0xfffffffc,0xa84c8c88,0xfffffffc"
  10. Regenerate the GRUB config for your main installation (e.g. by running update-grub or grub-mkconfig -o /boot/grub/grub.cfg, still inside the chroot environment if you used one)
  11. Exit the chroot environment (if you used one) and reboot the PC
  12. Run Memtest86+ again (from your GRUB); this time there should be a clean pass
    Note: If you don't have Memtest86+ in your GRUB and you used a separate bootable USB drive with Memtest86+, it will still show the errors; only Memtest86+ launched from GRUB with BadRAM set up should have a clean pass.
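Steps 6 and 9 boil down to pasting the same pattern into two different lines, and a typo there is easy to miss. As a hedged convenience, here is a small Python sketch that sanity-checks a pattern copied from Memtest86+ (an even number of comma-separated hex values, i.e. address,mask pairs) and prints both lines. The function names are hypothetical helpers for illustration, not part of GRUB or Memtest86+, and the check only looks at the format, not at whether the addresses make sense for your machine.

```python
import re

# A single BadRAM value as printed by Memtest86+: "0x" followed by hex digits.
HEX_VALUE = re.compile(r"0x[0-9a-fA-F]+")


def validate_pattern(pattern: str) -> str:
    """Check that `pattern` is a comma-separated list of address,mask pairs.

    Returns the pattern with stray whitespace removed; raises ValueError otherwise.
    """
    values = [v.strip() for v in pattern.strip().split(",")]
    if len(values) % 2 != 0:
        raise ValueError("expected address,mask pairs (an even number of values)")
    for value in values:
        if not HEX_VALUE.fullmatch(value):
            raise ValueError(f"not a hex value: {value!r}")
    return ",".join(values)


def grub_cfg_line(pattern: str) -> str:
    """Line to append to /boot/grub/grub.cfg on the USB drive (step 6)."""
    return "badram " + validate_pattern(pattern)


def default_grub_line(pattern: str) -> str:
    """Line to append to /etc/default/grub on the main installation (step 9)."""
    return 'GRUB_BADRAM="' + validate_pattern(pattern) + '"'


if __name__ == "__main__":
    pattern = "0xa84c8b0c,0xfffffffc,0xa04c8ca8,0xfffffffc,0xa84c8c88,0xfffffffc"
    print(grub_cfg_line(pattern))
    print(default_grub_line(pattern))
```

Running it prints the badram line for step 6 and the GRUB_BADRAM line for step 9; an odd number of values or a non-hex entry raises an error instead.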

Summary of the fix

Memtest86+ can output a ready-made BadRAM pattern. It looks something like this: 0xa84c8b0c,0xfffffffc,0xa04c8ca8,0xfffffffc,0xa84c8c88,0xfffffffc. This pattern has to end up in /boot/grub/grub.cfg (the GRUB config file) as badram 0xa84c8b0c,0xfffffffc,0xa04c8ca8,0xfffffffc,0xa84c8c88,0xfffffffc. However, this file should never be edited manually, as it is overwritten every time the GRUB config is regenerated. All user-specified config goes into the file /etc/default/grub, which is used by the GRUB config generation tools. In that file the correct syntax for BadRAM is GRUB_BADRAM="0xa84c8b0c,0xfffffffc,0xa04c8ca8,0xfffffffc,0xa84c8c88,0xfffffffc". So the correct way to apply the fix is to put GRUB_BADRAM="0xa84c8b0c,0xfffffffc,0xa04c8ca8,0xfffffffc,0xa84c8c88,0xfffffffc" at the end of /etc/default/grub and regenerate the GRUB config.
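To make the pattern less opaque: each pair is a base address followed by a mask. Roughly speaking, an address counts as bad when it agrees with the base address in every bit that is set in the mask. The Python sketch below models that rule (an assumed interpretation, not GRUB's actual source code) at byte level, plus a page-level variant that treats a whole 4KB page as bad if any byte in it matches, since the operating system manages memory in whole pages. All helper names are hypothetical.

```python
# A simplified model of BadRAM address/mask matching (assumed semantics, not GRUB's source).
PAGE_SIZE = 0x1000  # 4 KB pages


def parse_badram(pattern: str) -> list[tuple[int, int]]:
    """Split 'addr,mask,addr,mask,...' into (addr, mask) integer pairs."""
    values = [int(v, 16) for v in pattern.split(",")]  # int() accepts the 0x prefix
    if len(values) % 2 != 0:
        raise ValueError("expected address,mask pairs")
    return list(zip(values[0::2], values[1::2]))


def address_is_bad(address: int, pairs: list[tuple[int, int]]) -> bool:
    """True if `address` agrees with some base address in every bit set in its mask."""
    return any((address & mask) == (base & mask) for base, mask in pairs)


def page_is_bad(page_start: int, pairs: list[tuple[int, int]]) -> bool:
    """True if any byte of the 4 KB page starting at `page_start` matches a pair.

    The low 12 bits of an address inside the page can take any value, so only
    the mask bits at or above PAGE_SIZE have to match.
    """
    for base, mask in pairs:
        high_mask = mask & ~(PAGE_SIZE - 1)
        if (page_start & high_mask) == (base & high_mask):
            return True
    return False


if __name__ == "__main__":
    pairs = parse_badram("0xa84c8b0c,0xfffffffc,0xa04c8ca8,0xfffffffc,0xa84c8c88,0xfffffffc")
    # Mask 0xfffffffc ignores the two lowest bits, so each pair covers 4 bytes.
    print(address_is_bad(0xa84c8b0f, pairs))  # True: within 0xa84c8b0c..0xa84c8b0f
    print(address_is_bad(0xa84c8b10, pairs))  # False: just past that range
    print(page_is_bad(0xa84c8000, pairs))     # True: page holding two of the addresses
    print(page_is_bad(0xa84c9000, pairs))     # False: the next page
```

With masks like 0xfffffffc each pair pins down a tiny 4-byte range, so in practice the whole 4KB page around each reported address is what gets withheld.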

Sources

Another method - memmap kernel parameter

There is a second method for blacklisting specific RAM areas: the memmap kernel parameter. If I understand correctly, it's an older solution.

At first I wanted to use this method, as I didn't understand BadRAM and thought I would need to manually patch BadRAM into my kernel (I've never compiled the Linux kernel and I didn't want to learn how). However, the issue I ran into with memmap was that it doesn't seem to work on a UEFI installation. I might be wrong here, but this is what I found (I listed some posts in the sources).

This kernel parameter uses a different pattern than BadRAM. One of the sources listed above explains how to convert a BadRAM pattern into the memmap one, so if you're interested, go check it out.

In short, it should work as follows: you take the memmap pattern, add it as a kernel parameter in the /etc/default/grub file (e.g. to GRUB_CMDLINE_LINUX_DEFAULT), regenerate the GRUB config, and it should work.
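The full BadRAM-to-memmap conversion from that source isn't reproduced here. As a simpler, hedged illustration: when each faulty location is a single address (as with the 0xfffffffc masks in the BadRAM example, where a pair covers only 4 bytes), reserving the 4KB page that contains it can be written in the kernel's memmap=nn[KMG]$ss[KMG] form, which reserves nn bytes starting at address ss. The Python helper below (a hypothetical name) builds such parameters; whether memmap actually works on your UEFI setup is the open question discussed above. Also note that the $ character usually needs escaping once the parameter goes into /etc/default/grub, so check how your distribution handles that.

```python
PAGE_SIZE = 0x1000  # 4 KB


def memmap_params(addresses: list[int]) -> str:
    """Build memmap=4K$<page start> parameters reserving each faulty address's page.

    Assumes every address stands for a single bad location; addresses that share
    a page produce only one parameter.
    """
    pages = sorted({addr & ~(PAGE_SIZE - 1) for addr in addresses})
    return " ".join(f"memmap={PAGE_SIZE // 1024}K${page:#x}" for page in pages)


if __name__ == "__main__":
    # The three addresses from the BadRAM pattern example in this gist.
    print(memmap_params([0xa84c8b0c, 0xa04c8ca8, 0xa84c8c88]))
```

For those three addresses it prints two parameters, because 0xa84c8b0c and 0xa84c8c88 share the page starting at 0xa84c8000.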

Blacklisting RAM in Windows

I didn't try that one, but the first source I listed explains a potential way to do it in Windows, so it might be useful for some of you.

Summary

In summary, this is a pretty easy solution that can be applied in a few hours at most (counting the Memtest86+ run). However, it took me more than a week to figure out how to do it (at some point I even reinstalled Linux as a non-UEFI installation, which turned out to be a problem in itself). I know it mostly came down to me not being able to connect the info I found, but that's exactly why I wanted to create a short summary/tutorial that explains everything in an accessible way.

baatochan (author) commented Apr 8, 2024

Windows solution

Recently I had to run Windows 10 on this device with the faulty RAM, so I looked up how to set this up in Windows. It's somewhat similar, but you can't use it during the Windows installation, so you may end up with a corrupted Windows installation (there's no real way around that, other than installing Windows on a different PC and moving the drive over).

Step by step

Setting it up is easy if you already have Windows installed.

Windows doesn't work with addresses and masks like Linux badram; instead, you can blacklist particular pages of memory (on most devices a page is a 4KB chunk of RAM). To calculate the required values, take the address reported by Memtest86+ and remove its last 3 hex digits (4KB is 0x1000, so this gives the page number) -> 0x1a84c8b0c becomes 0x1a84c8. If you have more than one faulty address in the same page, it is enough to specify that page once.
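The page arithmetic above is just a right shift by 12 bits (dividing by 0x1000). Here is a minimal Python sketch, assuming 4KB pages, that turns Memtest86+ addresses into the value list for bcdedit and merges addresses that fall into the same page. Only 0x1a84c8b0c comes from the text; the other two addresses are hypothetical, picked so that the result matches the page list used in the bcdedit example below. page_number and badmemorylist_values are hypothetical helper names.

```python
PAGE_SHIFT = 12  # 4 KB pages: 1 << 12 == 0x1000


def page_number(address: int) -> int:
    """Page containing `address`; same as dropping the last 3 hex digits."""
    return address >> PAGE_SHIFT


def badmemorylist_values(addresses: list[int]) -> str:
    """Space-separated, de-duplicated page numbers for 'bcdedit ... badmemorylist'."""
    pages = sorted({page_number(addr) for addr in addresses})
    return " ".join(hex(page) for page in pages)


if __name__ == "__main__":
    # 0x1a84c8b0c is from the text; the other two addresses are hypothetical.
    faulty = [0x1a84c8b0c, 0x1a04c8ca8, 0x1a84c8c88]
    print("bcdedit /set {badmemory} badmemorylist " + badmemorylist_values(faulty))
```

Three faulty addresses collapse into two pages here, because 0x1a84c8b0c and 0x1a84c8c88 both live in page 0x1a84c8.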

I just want to note that I got a slightly different faulty RAM address under Windows than under Manjaro Linux. For Windows I ran Memtest86+ from a Rescuzilla CD, while under Manjaro I ran Memtest86+ from my GRUB. Under Linux the faulty address was 0xa84c8b0c, while under Windows I got 0x1a84c8b0c, and the 0xa84c8b0c address was not reported at all. I don't really know the reason for that (the two addresses differ by exactly 0x100000000, i.e. 4GB).

Then run cmd as administrator and enter the following commands.

REM Enable memory blacklisting
bcdedit /set {badmemory} badmemoryaccess no
REM Specify which pages to blacklist; replace "0x1a04c8 0x1a84c8" with your own pages.
REM You may enter as many pages as you want, separated by spaces.
bcdedit /set {badmemory} badmemorylist 0x1a04c8 0x1a84c8
REM Verify the blacklisted pages
bcdedit /enum {badmemory}

Then reboot the device and run the Sysinternals RAMMap tool to check whether the RAM is properly blacklisted. You should be able to see it in the "Physical Ranges" tab.

If you want to remove the blacklisted pages you may use the following command.

REM Remove badmemorylist
bcdedit /deletevalue {badmemory} badmemorylist

Issue with Windows 10 2004 and 20H2

Some users couldn't use badmemorylist on Windows 10 versions 2004 and 20H2. I tried it with Win10 version 22H2 (the last version of Win10) and it worked, so it seems that at least in 22H2 the bug is no longer present.

Updating Windows

As Windows 10 22H2 is the final version of Windows 10, this issue is no longer that big of a deal, but when updating between Windows releases the badmemorylist values are lost and have to be set again. Moreover, they are not applied during the update process, so you may end up with a corrupted Windows installation (the same issue as with installing). The same applies to Windows 11 if you want to use this there (my device doesn't support Win11, so I don't think I will be upgrading).

Sources
