Last active: September 26, 2023 20:45
Gist: rjurney/296c56cc0151121af8bac920a92b0ba3
Heracles Ubuntu 20.04 Logs - Machine that Shuts Down for No Reason
# ChatGPT says:
1) The nvidia_fs module (the NVIDIA GPUDirect Storage driver, which logs as "nvidia-fs") cannot retrieve NUMA (Non-Uniform Memory Access) node information for several PCI devices. NUMA is a memory architecture for multiprocessor systems in which each processor reaches its own local memory faster than memory attached to other processors. The warnings alternate between the two GPUs (0000:01:00.0 and 0000:02:00.0) and other PCI devices (0000:03:00.0, 0000:06:00.0, 0000:07:00.0 and 0000:09:00.0). The later kernel log identifies 0000:07:00.0 as the atlantic Ethernet controller (eno1), so the warnings are not specific to the multi-GPU setup. On a single-socket machine the kernel typically assigns no NUMA node (-1) to PCI devices, which is a likely and usually harmless cause.
2) The NVIDIA kernel module for the graphics cards is being loaded: version 530.30.02, compiled on Wed Feb 22 04:11:39 UTC 2023.
3) This is a warning from the kernel's ACPI subsystem. The _DSM (Device Specific Method) of \_SB.PC00.PEG1.PEGP was called with a Buffer as argument #4 (Arg3), where ACPI requires a Package. Judging by its name, this is the device behind the PCIe graphics port PEG1. Such ACPI warnings can result from inconsistencies between the BIOS/UEFI firmware and what the OS expects.
4) Lastly, the nvidia-drm, nvidia_uvm, and other NVIDIA-related entries show the NVIDIA driver's components being initialized. nvidia-drm is the driver's Direct Rendering Manager (DRM) module, which the graphics stack uses. nvidia_uvm implements NVIDIA's Unified Virtual Memory, a feature of the CUDA platform.
Given these logs and the repeated ACPI errors (AE_ALREADY_EXISTS), I suspect a slight incompatibility or configuration issue between the system's BIOS/UEFI firmware and the installed version of Ubuntu. All of these errors come from \_SB.PC00.PEG1.PEGP._DSM failing to create the named object USRG, and the kernel itself labels them "ACPI BIOS Error (bug)". Having multiple identical GPUs could amplify this. A BIOS/UEFI update may help, since such updates sometimes improve compatibility and resolve ACPI-related issues. If the system otherwise works fine, these are probably benign warnings. If you are facing actual issues, search the NVIDIA and Ubuntu forums for your GPU model and these specific errors for more tailored advice.
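To check whether the kernel reports a NUMA node for the devices named in the nvidia-fs warnings, you can read the standard sysfs attribute /sys/bus/pci/devices/<address>/numa_node. A value of -1 there means no node is assigned. The following is a minimal Python sketch under that assumption. The function `pci_numa_nodes` and its `sysfs_root` parameter are hypothetical helpers for illustration, and the device list is copied from the warnings in the logs.

```python
from pathlib import Path

# PCI addresses named in the nvidia-fs "error retrieving numa node" warnings.
DEVICES = [
    "0000:01:00.0", "0000:02:00.0", "0000:03:00.0",
    "0000:06:00.0", "0000:07:00.0", "0000:09:00.0",
]


def pci_numa_nodes(addresses, sysfs_root="/sys/bus/pci/devices"):
    """Map each PCI address to the NUMA node the kernel reports for it.

    -1 means the kernel assigned no NUMA node; None means the device or its
    numa_node file could not be read (for example, on a different machine).
    """
    nodes = {}
    for addr in addresses:
        path = Path(sysfs_root) / addr / "numa_node"
        try:
            nodes[addr] = int(path.read_text().strip())
        except (OSError, ValueError):
            nodes[addr] = None
    return nodes


if __name__ == "__main__":
    for addr, node in pci_numa_nodes(DEVICES).items():
        if node is None:
            status = "not found"
        elif node < 0:
            status = "no NUMA node assigned (-1)"
        else:
            status = f"NUMA node {node}"
        print(f"{addr}: {status}")
```

If every device reports -1, the nvidia-fs warnings most likely reflect a machine without NUMA topology rather than a fault.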
------------------------Logs-follow---------------------------
[ 2.201430] kernel: EXT4-fs (nvme0n1p2): re-mounted. Opts: errors=remount-ro. Quota mode: none.
[ 2.201984] kernel: usb 1-11: new full-speed USB device number 6 using xhci_hcd
[ 2.202214] systemd[1]: Finished Remount Root and Kernel File Systems.
[ 2.202618] systemd[1]: Activating swap /swapfile...
[ 2.202831] systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
[ 2.203450] systemd[1]: Starting Load/Save Random Seed...
[ 2.203995] systemd[1]: Starting Create System Users...
[ 2.206252] kernel: nvidia_fs: loading out-of-tree module taints kernel.
[ 2.206284] kernel: nvidia_fs: module verification failed: signature and/or required key missing - tainting kernel
[ 2.206428] kernel: nvidia_fs: Initializing nvfs driver module
...
[ 2.206429] kernel: nvidia_fs: registered correctly with major number 236
[ 2.206769] kernel: nvidia-fs:warning: error retrieving numa node for device 0000:02:00.0
[ 2.206772] kernel: nvidia-fs:warning: error retrieving numa node for device 0000:03:00.0
[ 2.206791] kernel: nvidia-fs:warning: error retrieving numa node for device 0000:02:00.0
[ 2.206806] kernel: nvidia-fs:warning: error retrieving numa node for device 0000:06:00.0
[ 2.206820] kernel: nvidia-fs:warning: error retrieving numa node for device 0000:02:00.0
[ 2.206829] kernel: nvidia-fs:warning: error retrieving numa node for device 0000:07:00.0
[ 2.206837] kernel: nvidia-fs:warning: error retrieving numa node for device 0000:02:00.0
[ 2.206846] kernel: nvidia-fs:warning: error retrieving numa node for device 0000:09:00.0
[ 2.206857] kernel: nvidia-fs:warning: error retrieving numa node for device 0000:01:00.0
[ 2.206866] kernel: nvidia-fs:warning: error retrieving numa node for device 0000:03:00.0
[ 2.206874] kernel: nvidia-fs:warning: error retrieving numa node for device 0000:01:00.0
[ 2.206883] kernel: nvidia-fs:warning: error retrieving numa node for device 0000:06:00.0
[ 2.206891] kernel: nvidia-fs:warning: error retrieving numa node for device 0000:01:00.0
[ 2.206900] kernel: nvidia-fs:warning: error retrieving numa node for device 0000:07:00.0
[ 2.206908] kernel: nvidia-fs:warning: error retrieving numa node for device 0000:01:00.0
[ 2.206917] kernel: nvidia-fs:warning: error retrieving numa node for device 0000:09:00.0
...
[ 3.030228] kernel: thermal thermal_zone2: failed to read out thermal zone (-61)
[ 3.040338] kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 530.30.02 Wed Feb 22 04:11:39 UTC 2023
[ 3.047667] kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 530.30.02 Wed Feb 22 03:45:40 UTC 2023
[ 3.049037] kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 3.066900] kernel: ACPI Warning: \_SB.PC00.PEG1.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20210730/nsarguments-61)
[ 3.082046] kernel: apple 0003:05AC:0220.0003: input,hidraw4: USB HID v1.11 Device [Apple, Inc Apple Keyboard] on usb-0000:00:14.0-7.2/input1
[ 3.090402] kernel: Bluetooth: hci0: MSFT filter_enable is already on
[ 3.177673] kernel: iwlwifi 0000:08:00.0: loaded PNVM version 0x5a8dfca
[ 3.189916] kernel: iwlwifi 0000:08:00.0: Detected RF GF, rfid=0x10d000
[ 3.242652] kernel: NET: Registered PF_ALG protocol family
[ 3.245306] kernel: intel_rapl_common: Found RAPL domain package
[ 3.245307] kernel: intel_rapl_common: Found RAPL domain core
[ 3.249626] kernel: aufs 5.15.5-20211129
[ 3.259982] kernel: iwlwifi 0000:08:00.0: base HW address: 2c:0d:a7:de:22:0e
[ 3.265433] kernel: ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210730/dsfield-184)
[ 3.265451] kernel: ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210730/dswload2-477)
[ 3.265461] kernel:
[ 3.265462] kernel: No Local Variables are initialized for Method [_DSM]
[ 3.265462] kernel:
[ 3.265463] kernel: Initialized Arguments for Method [_DSM]: (4 arguments defined for method invocation)
[ 3.265463] kernel: Arg0: 00000000be1f3f11 <Obj> Buffer(16) 75 0B A5 D4 C7 65 F7 46
[ 3.265467] kernel: Arg1: 00000000f2d76158 <Obj> Integer 0000000000000102
[ 3.265469] kernel: Arg2: 000000003988fe13 <Obj> Integer 0000000000000010
[ 3.265470] kernel: Arg3: 000000005055a5cc <Obj> Buffer(4) 00 10 52 44
[ 3.265472] kernel:
[ 3.265473] kernel: ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210730/psparse-529)
[ 3.265617] kernel: ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210730/dsfield-184)
[ 3.265630] kernel: ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210730/dswload2-477)
[ 3.265639] kernel:
[ 3.265639] kernel: No Local Variables are initialized for Method [_DSM]
[ 3.265640] kernel:
[ 3.265640] kernel: Initialized Arguments for Method [_DSM]: (4 arguments defined for method invocation)
[ 3.265640] kernel: Arg0: 000000005055a5cc <Obj> Buffer(16) 75 0B A5 D4 C7 65 F7 46
[ 3.265643] kernel: Arg1: 000000003988fe13 <Obj> Integer 0000000000000102
[ 3.265644] kernel: Arg2: 00000000f2d76158 <Obj> Integer 0000000000000010
[ 3.265646] kernel: Arg3: 00000000be1f3f11 <Obj> Buffer(4) 00 20 52 44
[ 3.265648] kernel:
[ 3.265648] kernel: ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210730/psparse-529)
[ 3.265790] kernel: ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210730/dsfield-184)
[ 3.265802] kernel: ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210730/dswload2-477)
... lots of ...
[ 3.269891] kernel: No Local Variables are initialized for Method [_DSM]
[ 3.269891] kernel:
[ 3.269891] kernel: Initialized Arguments for Method [_DSM]: (4 arguments defined for method invocation)
[ 3.269891] kernel: Arg0: 000000005055a5cc <Obj> Buffer(16) 75 0B A5 D4 C7 65 F7 46
[ 3.269894] kernel: Arg1: 000000003988fe13 <Obj> Integer 0000000000000102
[ 3.269896] kernel: Arg2: 00000000f2d76158 <Obj> Integer 0000000000000010
[ 3.269897] kernel: Arg3: 00000000be1f3f11 <Obj> Buffer(4) 00 80 52 44
[ 3.269899] kernel:
[ 3.269899] kernel: ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210730/psparse-529)
[ 3.271233] kernel: ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210730/dsfield-184)
[ 3.272020] kernel: ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210730/dswload2-477)
[ 3.272424] kernel:
[ 3.272424] kernel: No Local Variables are initialized for Method [_DSM]
[ 3.272424] kernel:
[ 3.272425] kernel: Initialized Arguments for Method [_DSM]: (4 arguments defined for method invocation)
[ 3.272425] kernel: Arg0: 00000000be1f3f11 <Obj> Buffer(16) 75 0B A5 D4 C7 65 F7 46
[ 3.272428] kernel: Arg1: 00000000f2d76158 <Obj> Integer 0000000000000102
[ 3.272429] kernel: Arg2: 000000003988fe13 <Obj> Integer 0000000000000010
[ 3.272430] kernel: Arg3: 000000005055a5cc <Obj> Buffer(4) 00 90 52 44
[ 3.272432] kernel:
[ 3.272433] kernel: ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210730/psparse-529)
[ 3.273390] kernel: ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210730/dsfield-184)
[ 3.274433] kernel: ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210730/dswload2-477)
... ending in ...
[ 3.930684] kernel: No Local Variables are initialized for Method [_DSM]
[ 3.930684] kernel:
[ 3.930685] kernel: Initialized Arguments for Method [_DSM]: (4 arguments defined for method invocation)
[ 3.930686] kernel: Arg0: 0000000075b229ee <Obj> Buffer(16) 75 0B A5 D4 C7 65 F7 46
[ 3.930689] kernel: Arg1: 00000000c31cd1cf <Obj> Integer 0000000000000102
[ 3.930691] kernel: Arg2: 00000000e42d8972 <Obj> Integer 0000000000000010
[ 3.930692] kernel: Arg3: 000000005055a5cc <Obj> Buffer(4) 00 00 50 4F
[ 3.930694] kernel:
[ 3.930695] kernel: ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210730/psparse-529)
[ 4.363840] kernel: NFSD: Using UMH upcall client tracking operations.
[ 4.363845] kernel: NFSD: starting 90-second grace period (net f0000000)
[ 4.836443] kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
[ 4.836781] kernel: [drm] [nvidia-drm] [GPU ID 0x00000200] Loading driver
[ 4.837468] kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:02:00.0 on minor 1
[ 4.854094] kernel: nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[ 4.856824] kernel: nvidia-uvm: Loaded the UVM driver, major device number 505.
---------------------------------------------------------------
heracles Tue Sep 26 13:24:52 2023 530.30.02
[0] NVIDIA GeForce RTX 3060 | 40'C, 0 % | 613 / 12288 MB | gdm(101M) rjurney(207M) rjurney(44M) rjurney(2M)
[1] NVIDIA GeForce RTX 3060 | 39'C, 0 % | 254 / 12288 MB | gdm(4M) rjurney(4M)
---------------------------------------------------------------
# Note: ChatGPT says: the kernel log entries below relate to ACPI (Advanced Configuration and Power Interface) and to the NVIDIA GPU driver on this system.
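The same ACPI error block repeats many times in these logs, so it can help to collapse it before reading further. Below is a minimal Python sketch that counts the "Failure creating named object" errors per object and status, and collects the distinct Arg3 buffers passed to the failing _DSM calls. The regular expressions assume the exact message text shown in these logs, which may differ on other kernel versions. `summarize_acpi_errors` and the `summarize_acpi.py` command line are hypothetical names, not part of any existing tool.

```python
import re
import sys
from collections import Counter

# These patterns assume the exact kernel message format shown in the logs.
NAMED_OBJECT_RE = re.compile(r"Failure creating named object \[([^\]]+)\], (AE_\w+)")
ARG3_RE = re.compile(
    r"Arg3:\s+[0-9a-fA-F]+\s+<Obj>\s+Buffer\(\d+\)\s+((?:[0-9A-Fa-f]{2} ?)+)"
)


def summarize_acpi_errors(lines):
    """Count ACPI named-object failures and collect the distinct Arg3 buffers."""
    failures = Counter()
    arg3_buffers = set()
    for line in lines:
        match = NAMED_OBJECT_RE.search(line)
        if match:
            # Key on (object path, ACPI status), e.g. (..._DSM.USRG, AE_ALREADY_EXISTS).
            failures[(match.group(1), match.group(2))] += 1
        match = ARG3_RE.search(line)
        if match:
            arg3_buffers.add(match.group(1).strip())
    return failures, arg3_buffers


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 summarize_acpi.py /var/log/kern.log
    with open(sys.argv[1], errors="replace") as log:
        failures, buffers = summarize_acpi_errors(log)
    for (obj, status), count in failures.most_common():
        print(f"{count:5d}  {obj}  {status}")
    print("distinct Arg3 buffers:", ", ".join(sorted(buffers)))
```

A single object with a high count and several distinct Arg3 values would suggest that one firmware method fails the same way on each of several calls. The output would not mean there are many independent problems.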
------------------------Logs-follow---------------------------
Sep 25 11:22:10 localhost kernel: [ 3.924315] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210730/dsfield-184)
Sep 25 11:22:10 localhost kernel: [ 3.925958] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210730/dswload2-477)
Sep 25 11:22:10 localhost kernel: [ 3.927158]
Sep 25 11:22:10 localhost kernel: [ 3.927160] No Local Variables are initialized for Method [_DSM]
Sep 25 11:22:10 localhost kernel: [ 3.927160]
Sep 25 11:22:10 localhost kernel: [ 3.927161] Initialized Arguments for Method [_DSM]: (4 arguments defined for method invocation)
Sep 25 11:22:10 localhost kernel: [ 3.927162] Arg0: 000000005055a5cc <Obj> Buffer(16) 75 0B A5 D4 C7 65 F7 46
Sep 25 11:22:10 localhost kernel: [ 3.927170] Arg1: 00000000e42d8972 <Obj> Integer 0000000000000102
Sep 25 11:22:10 localhost kernel: [ 3.927173] Arg2: 00000000c31cd1cf <Obj> Integer 0000000000000010
Sep 25 11:22:10 localhost kernel: [ 3.927176] Arg3: 0000000075b229ee <Obj> Buffer(4) 00 00 4B 56
Sep 25 11:22:10 localhost kernel: [ 3.927180]
Sep 25 11:22:10 localhost kernel: [ 3.927182] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210730/psparse-529)
Sep 25 11:22:10 localhost kernel: [ 3.928723] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210730/dsfield-184)
Sep 25 11:22:10 localhost kernel: [ 3.929891] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210730/dswload2-477)
Sep 25 11:22:10 localhost kernel: [ 3.930683]
Sep 25 11:22:10 localhost kernel: [ 3.930684] No Local Variables are initialized for Method [_DSM]
Sep 25 11:22:10 localhost kernel: [ 3.930684]
Sep 25 11:22:10 localhost kernel: [ 3.930685] Initialized Arguments for Method [_DSM]: (4 arguments defined for method invocation)
Sep 25 11:22:10 localhost kernel: [ 3.930686] Arg0: 0000000075b229ee <Obj> Buffer(16) 75 0B A5 D4 C7 65 F7 46
Sep 25 11:22:10 localhost kernel: [ 3.930689] Arg1: 00000000c31cd1cf <Obj> Integer 0000000000000102
Sep 25 11:22:10 localhost kernel: [ 3.930691] Arg2: 00000000e42d8972 <Obj> Integer 0000000000000010
Sep 25 11:22:10 localhost kernel: [ 3.930692] Arg3: 000000005055a5cc <Obj> Buffer(4) 00 00 50 4F
Sep 25 11:22:10 localhost kernel: [ 3.930694]
Sep 25 11:22:10 localhost kernel: [ 3.930695] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210730/psparse-529)
Sep 25 11:22:11 localhost kernel: [ 4.363840] NFSD: Using UMH upcall client tracking operations.
Sep 25 11:22:11 localhost kernel: [ 4.363845] NFSD: starting 90-second grace period (net f0000000)
Sep 25 11:22:11 localhost kernel: [ 4.836443] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
Sep 25 11:22:11 localhost kernel: [ 4.836781] [drm] [nvidia-drm] [GPU ID 0x00000200] Loading driver
Sep 25 11:22:11 localhost kernel: [ 4.837468] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:02:00.0 on minor 1
Sep 25 11:22:11 localhost kernel: [ 4.854094] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
Sep 25 11:22:11 localhost kernel: [ 4.856824] nvidia-uvm: Loaded the UVM driver, major device number 505.
Sep 25 11:22:17 localhost kernel: [ 10.854179] atlantic 0000:07:00.0 eno1: atlantic: link change old 0 new 1000
Sep 25 11:22:17 localhost kernel: [ 10.854376] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
Sep 25 11:22:19 localhost kernel: [ 12.336963] TCP: eno1: Driver has suspect GRO implementation, TCP performance may be compromised.
Sep 25 11:22:22 localhost kernel: [ 15.411945] bpfilter: Loaded bpfilter_umh pid 1352
Sep 25 11:22:22 localhost kernel: [ 15.646997] kauditd_printk_skb: 44 callbacks suppressed
Sep 25 11:22:22 localhost kernel: [ 15.647004] audit: type=1400 audit(1695666142.455:55): apparmor="STATUS" operation="profile_load" profile="unconfined" name="docker-default" pid=1396 comm="apparmor_parser"
Sep 25 11:22:23 localhost kernel: [ 16.452563] atlantic 0000:07:00.0 eno1: atlantic: link change old 1000 new 0
---------------------------------------------------------------
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Fri Aug 18 12:49:41 2023
installer version: 535.98
PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
nvidia-installer command line:
./nvidia-installer
Using: nvidia-installer ncurses v6 user interface
-> Detected 24 CPUs online; setting concurrency level to 24.
ERROR: An NVIDIA kernel module 'nvidia-modeset' appears to already be loaded in your kernel. This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading. Please be sure to exit any programs that may be using the GPU(s) before attempting to upgrade your driver. If no GPU-based programs are running, you know that your kernel supports module unloading, and you still receive this message, then an error may have occurred that has corrupted an NVIDIA kernel module's usage count, for which the simplest remedy is to reboot your computer.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
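The installer error reports that a loaded 'nvidia-modeset' module blocked the upgrade, and it mentions the module's usage count. On Linux, /proc/modules lists each loaded module with its name, size, use count, the modules using it, its state, and its load address. Module names there use underscores (nvidia_modeset rather than nvidia-modeset). The following is a minimal Python sketch, assuming that format, that reports the loaded NVIDIA modules and their use counts. `loaded_nvidia_modules` is a hypothetical helper, not an existing tool.

```python
from pathlib import Path


def loaded_nvidia_modules(proc_modules_text):
    """Return {module: (use_count, [users])} for loaded modules named nvidia*."""
    result = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith("nvidia"):
            continue
        name, _size, refcount, used_by = fields[:4]
        # The count may be "-" when the kernel lacks module-unloading support.
        count = int(refcount) if refcount.isdigit() else None
        # "-" means no other module uses this one; otherwise a comma-separated
        # list with a trailing comma, e.g. "nvidia_modeset,nvidia_uvm,".
        users = [] if used_by == "-" else [u for u in used_by.split(",") if u]
        result[name] = (count, users)
    return result


if __name__ == "__main__":
    modules_file = Path("/proc/modules")
    if modules_file.exists():
        for name, (count, users) in loaded_nvidia_modules(modules_file.read_text()).items():
            print(f"{name}: use count {count}, used by {users or 'none'}")
```

A nonzero use count on nvidia_modeset means something still holds the module. That matches the first cause the installer suggests.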
---------------------------------------------------------------
Tue Sep 26 12:04:24 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 Off | N/A |
| 0% 39C P8 14W / 170W| 367MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 3060 On | 00000000:02:00.0 Off | N/A |
| 0% 38C P8 14W / 170W| 10MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1482 G /usr/lib/xorg/Xorg 101MiB |
| 0 N/A N/A 5615 G /usr/lib/xorg/Xorg 207MiB |
| 0 N/A N/A 5745 G /usr/bin/gnome-shell 44MiB |
| 0 N/A N/A 6065 G /usr/libexec/gnome-initial-setup 2MiB |
| 1 N/A N/A 1482 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 5615 G /usr/lib/xorg/Xorg 4MiB |
+---------------------------------------------------------------------------------------+