@Luxcium
Created July 20, 2023 08:43
Expanded NVMe / PCIe / EXT4 Fedora Linux Drive Management Cheat Sheet

There are several tools in Linux to scan disks for errors, though the specific methods and tools depend on the type of storage medium (SSD, HDD, etc.), the file system in use, and the nature of the errors you expect to encounter. Here are a couple of general methods to consider:

ChatGPT Recommendations

To maximize hardware resilience on a 1TB NVMe drive connected over PCIe, here are some specific recommendations and best practices:

Partitioning and Filesystem Optimization

While the earlier steps provided a baseline setup, we can go further in optimizing the partitions and filesystems for performance and durability.

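| Command | Explanation |
| --- | --- |
| `sudo parted -a optimal /dev/nvme1n1 mkpart primary ext4 1MiB 100%` | Create one partition spanning the drive, starting at 1MiB so it begins on a boundary that is a multiple of common physical block sizes. This assumes a partition table already exists (e.g. `mklabel gpt`); parted does not create the filesystem, so run `mkfs.ext4 /dev/nvme1n1p1` afterwards. |
| `sudo tune2fs -o journal_data_writeback /dev/nvme1n1p1` | Make `data=writeback` the default mount option. Only metadata is journaled, which can speed up writes, but recently written files may contain stale data after a crash, so this trades away some resilience for performance. |
| `sudo tune2fs -E stride=128,stripe_width=128 /dev/nvme1n1p1` | Record RAID stride and stripe width (in filesystem blocks) as hints for the block allocator. These matter mainly for striped RAID arrays and must be derived from the real chunk size; on a single NVMe drive they have little effect. |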
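
The alignment and RAID-hint values in the table above reduce to simple arithmetic. The sketch below assumes bash and the standard sysfs layout, in which a partition's `start` file counts 512-byte sectors regardless of the drive's logical block size. `is_mib_aligned` and `ext4_raid_hints` are hypothetical helper names, not part of parted or e2fsprogs; the stock alignment check is `sudo parted /dev/nvme1n1 align-check optimal 1`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (illustrative names, not from any package).

# is_mib_aligned SYSFS_PART_DIR
# Reads the partition start from sysfs (always in 512-byte units) and
# reports whether it falls on a 1 MiB boundary.
is_mib_aligned() {
    local part_dir=$1 start bytes
    start=$(<"$part_dir/start") || return 2
    bytes=$(( start * 512 ))
    if (( bytes % (1024 * 1024) == 0 )); then
        echo "aligned (starts at $bytes bytes)"
        return 0
    fi
    echo "misaligned (starts at $bytes bytes)"
    return 1
}

# ext4_raid_hints CHUNK_KIB BLOCK_KIB DATA_DISKS
# stride       = RAID chunk size / filesystem block size
# stripe_width = stride * number of data-bearing disks
ext4_raid_hints() {
    local chunk_kib=$1 block_kib=$2 data_disks=$3
    local stride=$(( chunk_kib / block_kib ))
    echo "stride=$stride,stripe_width=$(( stride * data_disks ))"
}
```

For example, `is_mib_aligned /sys/block/nvme1n1/nvme1n1p1` checks the partition created above, and `ext4_raid_hints 512 4 1` prints `stride=128,stripe_width=128`, which is how the table's values would arise from an assumed 512 KiB chunk, 4 KiB blocks and one data disk.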

Advanced Monitoring and Diagnostics

For advanced monitoring and diagnostics, we can use additional tools and nvme-cli commands.

| Command | Explanation |
| --- | --- |
| `sudo nvme error-log /dev/nvme1n1` | Retrieve the controller's error information log. |
| `sudo nvme fw-log /dev/nvme1n1` | Retrieve the firmware slot log (the firmware revision stored in each slot). |
| `sudo nvme telemetry-log /dev/nvme1n1 --output-file=telemetry.bin` | Save the host-initiated telemetry log to a binary file, typically for vendor analysis. |
| `sudo nvme show-regs /dev/nvme1n1` | Display the controller's registers. |
| `sudo nvme id-ns /dev/nvme1n1 --namespace-id=1` | Identify the first namespace on the drive. |
| `sudo nvme id-ctrl /dev/nvme1n1` | Identify the drive's controller (model, firmware revision, capabilities). |
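
To capture these diagnostics in one pass, for example before and after a firmware update, a small wrapper can write each log to its own file. This is a sketch assuming bash, nvme-cli on the `PATH`, and a root shell; `collect_nvme_logs` is a hypothetical name. Because the telemetry log is binary, it is written through `--output-file` rather than by redirecting standard output.

```bash
#!/usr/bin/env bash
# collect_nvme_logs DEVICE OUTDIR   (hypothetical helper; run as root)
# Saves the output of several nvme-cli diagnostics into OUTDIR.
collect_nvme_logs() {
    local dev=$1 out=$2 sub
    mkdir -p "$out" || return 1
    for sub in error-log fw-log id-ctrl show-regs; do
        # Keep going if one command fails, but report which one.
        if ! nvme "$sub" "$dev" > "$out/$sub.txt" 2>&1; then
            echo "warning: nvme $sub failed, see $out/$sub.txt" >&2
        fi
    done
    nvme id-ns "$dev" --namespace-id=1 > "$out/id-ns.txt" 2>&1
    # Telemetry is binary: nvme-cli writes it to the named file itself,
    # and any messages it prints go to telemetry.txt.
    nvme telemetry-log "$dev" --output-file="$out/telemetry.bin" \
        > "$out/telemetry.txt" 2>&1
    return 0
}
```

For example, `collect_nvme_logs /dev/nvme1n1 ./nvme1-logs` from a root shell leaves one text file per command in `./nvme1-logs`, which makes later comparisons with `diff` straightforward.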

Performance Tuning

Several system settings can impact NVMe performance, including I/O scheduler choice, IRQ affinities, and power management settings.

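| Command | Explanation |
| --- | --- |
| `sudo sh -c 'echo mq-deadline > /sys/block/nvme1n1/queue/scheduler'` | Switch the I/O scheduler to `mq-deadline`. NVMe drives usually default to `none`, which is often the fastest choice for fast SSDs; the change lasts until reboot. |
| `cat /sys/block/nvme1n1/queue/nr_requests` | Inspect the block-layer request queue size. NVMe namespaces do not normally expose `device/queue_depth` (a SCSI attribute), and forcing a depth of 1 would serialize I/O rather than speed it up. |
| `cat /sys/module/nvme/parameters/use_threaded_interrupts` | Check whether threaded interrupts are enabled. The parameter belongs to the `nvme` module (not `nvme_core`), is read-only at runtime and defaults to 0; to change it, pass `nvme.use_threaded_interrupts=` on the kernel command line. |
| `cat /proc/interrupts` | Check how the NVMe queue interrupts (`nvme1q0`, `nvme1q1`, ...) are distributed across CPUs. |
| `sudo sh -c 'echo "VVVV DDDD" > /sys/bus/pci/drivers/nvme/new_id'` | Not a performance setting: this makes the `nvme` driver bind to an additional PCI vendor/device ID pair (hex values as shown by `lspci -nn`; `VVVV DDDD` is a placeholder). Writing `1` to this file is invalid. |
| `sudo cpupower frequency-set -g performance` | Set the CPU frequency governor to `performance`, which can lower I/O completion latency at the cost of higher power use. |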
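
Two of the checks above, the active scheduler and the spread of queue interrupts, are easy to script. The sketch below assumes bash, the usual sysfs format in which the active scheduler is shown in brackets, and the common `nvmeXqY` naming of NVMe queue interrupts in `/proc/interrupts`; `active_scheduler`, `set_scheduler` and `nvme_irq_totals` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for inspecting NVMe tuning state (illustrative names).

# active_scheduler SCHED_FILE
# The sysfs scheduler file lists the available schedulers and brackets
# the active one, e.g. "[none] mq-deadline kyber bfq".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# set_scheduler SCHED_FILE NAME
# Writes NAME only if it appears in the file's list of schedulers.
# Writing the real /sys/block/<dev>/queue/scheduler requires root.
set_scheduler() {
    local file=$1 name=$2 word found=0
    local -a words
    read -ra words < "$file"
    for word in "${words[@]}"; do
        word=${word#\[}        # strip the brackets around the active entry
        word=${word%\]}
        [[ $word == "$name" ]] && found=1
    done
    if (( ! found )); then
        echo "scheduler '$name' is not listed in $file" >&2
        return 1
    fi
    echo "$name" > "$file"
}

# nvme_irq_totals INTERRUPTS_FILE CONTROLLER
# Prints each queue interrupt named CONTROLLER + "q<N>" (e.g. nvme1q1)
# with its count summed over all CPU columns.
nvme_irq_totals() {
    awk -v ctrl="$2" '
        NR == 1 { ncpu = NF; next }      # header row: CPU0 CPU1 ...
        $NF ~ ("^" ctrl "q[0-9]+$") {
            total = 0
            for (i = 2; i <= ncpu + 1; i++) total += $i
            print $NF, total
        }' "$1"
}
```

For example, `active_scheduler /sys/block/nvme1n1/queue/scheduler` and `nvme_irq_totals /proc/interrupts nvme1` need no privileges, while `set_scheduler` must run from a root shell. Checking the name against the kernel's list first gives a clear message instead of a bare write error when a scheduler module is not available.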

Handling Drive Errors

When dealing with drive errors, the right approach depends on the nature of the error.

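| Issue | Solution |
| --- | --- |
| Read errors | Run `sudo badblocks -sv /dev/nvme1n1` for a read-only scan to identify bad blocks (`-n` instead performs a slower non-destructive read-write test and requires the device to be unmounted). To mark bad blocks in an unmounted ext4 filesystem, use `sudo e2fsck -c /dev/nvme1n1p1`. NVMe controllers also remap failing blocks internally, so watch `media_errors` in `nvme smart-log`. |
| Write errors | Check the drive's health with `sudo smartctl -a /dev/nvme1n1` or `sudo nvme smart-log /dev/nvme1n1`, and consider replacing the drive if available spare keeps dropping or media errors keep rising. |
| Controller errors | Try resetting the controller with `sudo nvme reset /dev/nvme1` (the reset command takes the controller character device, not the namespace), or reboot the system. |
| Firmware issues | Update the drive's firmware using vendor-specific tools, `fwupdmgr` if the vendor publishes firmware to LVFS, or `nvme fw-download` followed by `nvme fw-commit`. |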
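
The health check in the "Write errors" row can be turned into a quick pass/fail screen. This sketch assumes bash and nvme-cli's default (non-JSON) `smart-log` output, where each field is printed as `name : value`; the field names `critical_warning` and `media_errors` should be checked against your nvme-cli version. `smart_field` and `smart_check` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for screening `nvme smart-log` text output.

# smart_field NAME   (reads smart-log text on stdin)
# Prints the first token of NAME's value with thousands separators removed,
# e.g. "temperature : 35 C (308 Kelvin)" -> 35.
smart_field() {
    awk -F':' -v key="$1" '
        { k = $1; gsub(/[ \t]+/, "", k) }   # field name without padding
        k == key {
            v = $2; sub(/^[ \t]+/, "", v)
            split(v, parts, " ")
            gsub(/,/, "", parts[1])
            print parts[1]
            exit
        }'
}

# smart_check   (reads smart-log text on stdin)
# Returns 0 if critical_warning and media_errors are both zero,
# 1 if either is non-zero, 2 if the fields could not be found.
smart_check() {
    local log cw me
    log=$(cat)
    cw=$(smart_field critical_warning <<< "$log")
    me=$(smart_field media_errors <<< "$log")
    if [[ -z $cw || -z $me ]]; then
        echo "could not find critical_warning/media_errors in input" >&2
        return 2
    fi
    if (( cw != 0 || me != 0 )); then
        echo "attention: critical_warning=$cw media_errors=$me"
        return 1
    fi
    echo "ok: critical_warning=$cw media_errors=$me"
}
```

For example, `sudo nvme smart-log /dev/nvme1n1 | smart_check` prints a one-line summary and returns non-zero when the drive reports a critical warning or media errors, so it can be called from a cron job or systemd timer.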

Please note, the above suggestions are provided as general guidance and may not work for all specific cases. Always test changes in a controlled environment before deploying them in a production environment. Many of the more advanced tuning options should be used with caution, as they can have negative side effects if used improperly. Make sure you understand what each setting does before changing it.
