@dlangille
Last active September 2, 2017 18:32
CAM status: SCSI Status Error
Aug 25 06:10:32 knew kernel: (da18:mps2:0:11:0): READ(10). CDB: 28 00 32 84 c2 f8 00 00 c0 00 length 98304 SMID 130 terminated ioc 804b scsi 0 state c xfer 81920
Aug 25 06:10:32 knew kernel: (da18:mps2:0:11:0): READ(10). CDB: 28 00 32 84 c3 b8 00 01 00 00 length 131072 SMID 852 terminated ioc 804b scsi 0 state c xfer 0
Aug 25 06:10:32 knew kernel: (da18:mps2:0:11:0): READ(10). CDB: 28 00 32 84 c2 f8 00 00 c0 00
Aug 25 06:10:32 knew kernel: (da18:mps2:0:11:0): CAM status: SCSI Status Error
Aug 25 06:10:32 knew kernel: (da18:mps2:0:11:0): SCSI status: Check Condition
Aug 25 06:10:32 knew kernel: (da18:mps2:0:11:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Aug 25 06:10:32 knew kernel: (da18:mps2:0:11:0): Retrying command (per sense data)
Aug 25 06:10:33 knew kernel: (da18:mps2:0:11:0): READ(10). CDB: 28 00 32 86 d0 c8 00 00 18 00
Aug 25 06:10:33 knew kernel: (da18:mps2:0:11:0): CAM status: SCSI Status Error
Aug 25 06:10:33 knew kernel: (da18:mps2:0:11:0): SCSI status: Check Condition
Aug 25 06:10:33 knew kernel: (da18:mps2:0:11:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Aug 25 06:10:33 knew kernel: (da18:mps2:0:11:0): Retrying command (per sense data)
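The CDB bytes in the log lines above can be decoded by hand to see which blocks the failed reads covered. The sketch below assumes the standard READ(10) layout (opcode 0x28, a 4-byte big-endian LBA in bytes 2-5, a 2-byte transfer length in bytes 7-8) and the 512-byte logical sector size that smartctl reports for this drive. The function name `decode_read10` is hypothetical, chosen for illustration.

```python
def decode_read10(cdb_hex: str, block_size: int = 512) -> dict:
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes.

    block_size is an assumption: 512 bytes, the logical sector size
    reported by smartctl for this drive.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # starting logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # transfer length in blocks
    return {"lba": lba, "blocks": blocks, "bytes": blocks * block_size}


# The two READ(10) commands terminated at 06:10:32 on da18.
first = decode_read10("28 00 32 84 c2 f8 00 00 c0 00")
second = decode_read10("28 00 32 84 c3 b8 00 01 00 00")

print(f"first:  LBA {first['lba']:#x}, {first['blocks']} blocks, {first['bytes']} bytes")
print(f"second: LBA {second['lba']:#x}, {second['blocks']} blocks, {second['bytes']} bytes")
```

The decoded byte counts agree with the `length 98304` and `length 131072` fields in the log, and the second read starts at the block immediately after the first one ends, which is consistent with a sequential read workload.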
[dan@knew:~] $ sudo smartctl -a /dev/da18
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-RELEASE-p20 amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Toshiba 3.5" MD04ACA... Enterprise HDD
Device Model: TOSHIBA MD04ACA500
Serial Number: 653IK1IBFS9A
LU WWN Device Id: 5 000039 65bf80144
Firmware Version: FP2A
User Capacity: 5,000,981,078,016 bytes [5.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Aug 25 12:23:51 2017 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 542) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       529
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       53
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   063   063   000    Old_age   Always       -       15169
 10 Spin_Retry_Count        0x0033   101   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       53
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       5
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       44
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       694
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       43 (Min/Max 18/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   253   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       0
222 Loaded_Hours            0x0032   063   063   000    Old_age   Always       -       15000
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       204
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     15169         -
# 2  Extended offline    Completed without error       00%         9         -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
[dan@knew:~] $
Sep 2 11:28:17 knew kernel: (da19:mps2:0:12:0): READ(10). CDB: 28 00 f3 65 a5 00 00 01 00 00 length 131072 SMID 207 terminated ioc 804b scsi 0 state c xfer 16384
Sep 2 11:28:17 knew kernel: (da19:mps2:0:12:0): READ(10). CDB: 28 00 f3 65 a6 00 00 01 00 00 length 131072 SMID 628 terminated ioc 804b scsi 0 state c xfer 0
Sep 2 11:28:17 knew kernel: (da19:mps2:0:12:0): READ(10). CDB: 28 00 f3 65 a5 00 00 01 00 00
Sep 2 11:28:17 knew kernel: (da19:mps2:0:12:0): CAM status: SCSI Status Error
Sep 2 11:28:17 knew kernel: (da19:mps2:0:12:0): SCSI status: Check Condition
Sep 2 11:28:17 knew kernel: (da19:mps2:0:12:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Sep 2 11:28:17 knew kernel: (da19:mps2:0:12:0): Retrying command (per sense data)
Sep 2 11:28:18 knew kernel: (da19:mps2:0:12:0): READ(10). CDB: 28 00 f3 66 7e c8 00 00 20 00
Sep 2 11:28:18 knew kernel: (da19:mps2:0:12:0): CAM status: SCSI Status Error
Sep 2 11:28:18 knew kernel: (da19:mps2:0:12:0): SCSI status: Check Condition
Sep 2 11:28:18 knew kernel: (da19:mps2:0:12:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Sep 2 11:28:18 knew kernel: (da19:mps2:0:12:0): Retrying command (per sense data)
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-RELEASE-p20 amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Toshiba 3.5" MD04ACA... Enterprise HDD
Device Model: TOSHIBA MD04ACA500
Serial Number: 653BK12FFS9A
LU WWN Device Id: 5 000039 65bc00179
Firmware Version: FP2A
User Capacity: 5,000,981,078,016 bytes [5.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sat Sep 2 18:30:44 2017 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 535) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       546
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       56
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   062   062   000    Old_age   Always       -       15367
 10 Spin_Retry_Count        0x0033   101   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       56
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       269
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       47
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       663
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       40 (Min/Max 18/51)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   253   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       0
222 Loaded_Hours            0x0032   063   063   000    Old_age   Always       -       15199
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       206
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     15181         -
# 2  Short offline       Completed without error       00%     15169         -
# 3  Extended offline    Completed without error       00%         9         -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

dlangille commented Aug 25, 2017

Perhaps relevant: a ZFS scrub started at about 03:30 today, and the drives in question are part of the pool being scrubbed:

[dan@knew:~] $ zpool status tank_data
  pool: tank_data
 state: ONLINE
  scan: scrub in progress since Fri Aug 25 03:32:30 2017
        23.9T scanned out of 34.5T at 777M/s, 3h57m to go
        0 repaired, 69.37% done
config:

	NAME        STATE     READ WRITE CKSUM
	tank_data   ONLINE       0     0     0
	  raidz3-0  ONLINE       0     0     0
	    da11p1  ONLINE       0     0     0
	    da12p1  ONLINE       0     0     0
	    da8p1   ONLINE       0     0     0
	    da13p1  ONLINE       0     0     0
	    da14p1  ONLINE       0     0     0
	    da15p1  ONLINE       0     0     0
	    da16p1  ONLINE       0     0     0
	    da17p1  ONLINE       0     0     0
	    da18p1  ONLINE       0     0     0
	    da19p1  ONLINE       0     0     0

errors: No known data errors
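
The progress figures in the scrub line above can be roughly reproduced from the scanned size, total size, and scan rate. This is a minimal sketch that assumes zpool reports sizes in binary units (T as TiB, M as MiB); because the displayed values are rounded, the result only approximates the 69.37% and 3h57m that zpool printed. The function name `scrub_progress` is hypothetical.

```python
def scrub_progress(scanned_tib: float, total_tib: float, rate_mib_s: float):
    """Return (percent done, estimated seconds remaining) for a scrub.

    Assumes binary units: 1 TiB = 1024 * 1024 MiB.
    """
    remaining_mib = (total_tib - scanned_tib) * 1024 * 1024
    eta_s = remaining_mib / rate_mib_s
    pct = 100 * scanned_tib / total_tib
    return pct, eta_s


# Values taken from the zpool status output for tank_data.
pct, eta_s = scrub_progress(23.9, 34.5, 777)
hours, rem = divmod(int(eta_s), 3600)
print(f"{pct:.1f}% done, about {hours}h{rem // 60}m to go")
```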

@dlangille

Also, a resilver was in progress on another pool at the time, but the drives in question are not part of that pool, which is shown below:

[dan@knew:~] $ zpool status system
  pool: system
 state: ONLINE
  scan: resilvered 1.68T in 20h5m with 0 errors on Fri Aug 25 09:20:04 2017
config:

	NAME        STATE     READ WRITE CKSUM
	system      ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    da7p3   ONLINE       0     0     0
	    ada2p3  ONLINE       0     0     0
	    da6p3   ONLINE       0     0     0
	    da1p3   ONLINE       0     0     0
	    da2p3   ONLINE       0     0     0
	    da0p3   ONLINE       0     0     0
	    da9p3   ONLINE       0     0     0
	    da5p3   ONLINE       0     0     0
	    da10p3  ONLINE       0     0     0
	    da4p3   ONLINE       0     0     0
	logs
	  mirror-1  ONLINE       0     0     0
	    ada1p1  ONLINE       0     0     0
	    ada0p1  ONLINE       0     0     0

errors: No known data errors
