@Hashbrown777
Last active June 22, 2023 17:01
In files identified as containing corrupted blocks, punch holes in the valid blocks so they don't take up as much space.
$blockSize = 4KB
$buffer = [byte[]]::new($blockSize)
Get-ChildItem -File `
| Sort-Object -Property Length
| %{
    "$($_.Name)`t$($_.Length / 1MB -bor 0)MB"
    $path = $_.FullName

    #keep a reflinked copy; useful if your files are patchable, eg torrent-sourced
    sudo cp --reflink=always $path "${path}_rescued"

    #restore read permission just long enough to open a handle
    chmod u+r $path
    $file = [System.IO.File]::OpenRead($path)
    chmod u-r $path

    $current = $NULL   #whether the block just attempted was readable
    $read = $NULL
    $occupied = 0      #start of the current run of readable blocks (-1 = no run)
    $bad = 0
    while ($file.CanRead) {
        $position = $file.Position
        #progress output every 256MB
        if ($position -and !($position % 256MB)) {
            "`t$($position / 1MB)MB"
        }
        try {
            $read = $file.Read($buffer, 0, $blockSize)
            $current = $True
            if ($read -lt $blockSize) {
                #short read means end of file
                $file.Close()
                $current = $False
                $position += $read
            }
        }
        catch {
            #unreadable block: count it, skip past it, and punch it out of the rescue copy
            $current = $False
            ++$bad
            $file.Seek($blockSize, 1) | Out-Null
            sudo fallocate -p -o $position -l $blockSize "${path}_rescued"
        }
        if ($occupied -lt 0) {
            #not currently in a run; a readable block starts a new one
            if ($current) {
                $occupied = $position
            }
        }
        elseif (!$current) {
            #run of readable blocks ended; punch the whole stretch out of the original
            if ($current = $position - $occupied) {
                "`tPunching $($current / 1KB -bor 0)KB"
                sudo fallocate -p -o $occupied -l $current $path
            }
            $occupied = -1
        }
    }
    "`t$bad bad ${blockSize}B blocks of $($_.Length / $blockSize -bor 0)`n"
}
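The core trick above is `fallocate --punch-hole`, which deallocates a byte range while leaving the file's logical size untouched. A minimal standalone demonstration (my own sketch, assuming util-linux `fallocate` and a filesystem that supports hole punching, e.g. ext4/btrfs/tmpfs):

```shell
# 1MiB of random data, so compression can't deallocate anything for us
dd if=/dev/urandom of=demo.bin bs=4K count=256 status=none
stat -c 'size=%s blocks=%b' demo.bin
# punch out the first 512KiB; the logical size stays the same
fallocate --punch-hole --offset 0 --length $((512 * 1024)) demo.bin
stat -c 'size=%s blocks=%b' demo.bin
```

After the punch, `stat` reports the same size but roughly half the allocated blocks, which is exactly what the script above does per readable run.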
#!/usr/bin/pwsh
Filter Output { Param($colour)
    $num = ($_ -split '(?<=^[a-f0-9]+:\s)')[0]
    $num | Write-Host -NoNewline
    $_ = $_ -replace "^$num",'' -split ' (?=.{10}$)'
    # $_[0] | Write-Host -NoNewline -ForegroundColor $colour
    # ' ' | Write-Host -NoNewline
    # $_[1].PadRight(10, ' ') | Write-Host -NoNewline -BackgroundColor $colour
    (
        [char]27,
        '[',
        (31,32)[$colour -eq 'Green'],
        'm',
        $_[0],
        [char]27,
        '[39m',
        ' ',
        [char]27,
        '[',
        (41,42)[$colour -eq 'Green'],
        'm',
        $_[1].PadRight(10, ' '),
        [char]27,
        '[49m'
    ) | Write-Host -NoNewline
}

#$context = [System.Collections.Queue]::new()
$context = [PSCustomObject]@{
    Count = 0
    Index = 0
    Queue = @($NULL, $NULL, $NULL, $NULL, $NULL, $NULL)
} `
| Add-Member `
    -Name Enqueue `
    -MemberType ScriptMethod `
    -Value { Param($item)
        if ($this.Count -eq $this.Queue.Count) {
            throw $this.Count
        }
        $this.Queue[($this.Index + $this.Count) % $this.Queue.Count] = $item
        ++$this.Count
    } `
    -PassThru `
| Add-Member `
    -Name Dequeue `
    -MemberType ScriptMethod `
    -Value { Param()
        if ($this.Count -eq 0) {
            throw $this.Count
        }
        $this.Queue[$this.Index++]
        $this.Index %= $this.Queue.Count
        --$this.Count
    } `
    -PassThru

$elipsed = @($False)
Function Context {
    for ($count = 0; $context.Count -and $count -lt 3; ++$count) {
        $line = $context.Dequeue()
        $line[0] | Output -colour Green
        ' ' | Write-Host -NoNewline
        $line[1] | Output -colour Green
        '' | Write-Host
    }
    if ($elipsed[0] = !!$context.Count) {
        # ' ' * 61 + '...' | Write-Host -ForegroundColor Green
        [char]27,'[32m',(' ' * 61),'...',[char]27,'[39m' -join ''
    }
}

'diff -y <(xxd "$1") <(xxd "$2")' `
| bash -s $args `
| %{
    $line = $_ -split '\t' -replace '(?<= .{10}) \|',''
    if ($line[0] -eq $line[1]) {
        if ($context.Count -eq 6) {
            Context
        }
        $context.Enqueue($line)
        if ($elipsed[0]) {
            $context.Dequeue() | Out-Null
        }
    }
    else {
        Context
        $line[0] | Output -colour Red
        ' ' | Write-Host -NoNewline
        $line[1] | Output -colour Red
        '' | Write-Host
    }
}
Context
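The pipeline above shells out to `diff -y <(xxd "$1") <(xxd "$2")` and colourises the result. The underlying bash one-liner works on its own; a toy run with file names of my choosing (assuming `xxd`, which ships with vim):

```shell
printf 'AAAABBBBCCCC' > left.bin
printf 'AAAAXBBBCCCC' > right.bin
# side-by-side hex dumps; diff marks differing lines with '|'
diff -y <(xxd left.bin) <(xxd right.bin) || true
```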
Hashbrown777 commented Apr 4, 2023

I have an old HDD array where each disk is individually LUKS'd and the encrypted array as a whole is collated under btrfs. One of the disks has bad sectors, but like my other arrays of this nature (which are equally old, over 15 years, yet have no failures), it somehow continues to function 24/7 without degrading further.

In this situation, possibly because filesystem encryption is enabled, I find that unreadable blocks turn up as an "I/O error", and I have a sneaking suspicion the bad block isn't being taken out of the pool of blocks available for writing by the filesystem/kernel/disk table.
So instead of deleting unreadable files outright, as a precaution against further writes being corrupted, I've taken to moving them to a hidden directory and removing all permissions from them, so their blocks stay allocated but unused.
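That quarantine step is just a move plus a permissions strip; a sketch with a stand-in file name of my choosing:

```shell
touch badfile.dat                  # stand-in for a file that threw I/O errors
mkdir -p .quarantine
mv badfile.dat .quarantine/
chmod 000 .quarantine/badfile.dat  # blocks stay allocated; nothing can read or write it
```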

The data isn't critical, just the amount of available space, so replacing the disks isn't really the goal.
However, some of these files are GBs in size, and when reading them I can see that some take quite some time before the read aborts with the error, meaning that portions of the file are readable and are using up valid, working, addressable disk. Indeed, tailing some files (using dd, not tail, since tail seems to fast-forward over the whole file instead of seeking) reveals that readable blocks also exist after corrupted portions.
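The reason `dd` works here is that `skip=` seeks directly to the requested block, so later regions can be sampled without streaming through a bad stretch first. A sketch using a healthy stand-in file:

```shell
# 16KiB stand-in file
dd if=/dev/urandom of=sample.bin bs=4K count=4 status=none
# jump straight to the 4th 4KiB block; nothing before it is read
dd if=sample.bin bs=4K skip=3 status=none | wc -c   # → 4096
```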

This script reads through all files in my bad files directory, identifies all valid and invalid blocks of the given blocksize, then asks the kernel to punch holes in the file such that all valid blocks are freed back for use, and the resultant files are sparse with only broken sectors allocated.

I had 30 offending files ranging from 4KB to 8GB, totalling over 50GB of logical space and reportedly using 47GB on disk (compression and zero-blocks, probably); running the script successfully reclaimed 44GB. Logically the files still take up 53GB (as reported by ls), but physically du now reports about 6MB(!) used, and btrfs fi df $mount reports the space as freed.
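The ls/du discrepancy is exactly what sparse files look like, and you can reproduce it from scratch (a sketch, assuming GNU coreutils):

```shell
truncate -s 1G sparse.bin        # logical 1GiB, zero blocks allocated
ls -l sparse.bin                 # reports the logical size: 1073741824
du --block-size=1 sparse.bin     # reports physical usage: 0
```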

The worst files have 128 individual unreadable 4KB portions, and these tend to be consecutive in runs of up to 16, with huge, completely readable stretches between them. The holes do manage to reclaim blocks surrounded on either side by corrupted sections, and on a re-run these sections remain unreadable, meaning we didn't accidentally clear them nor copy and re-allocate them.
Awesome.

@Hashbrown777

Once you have fixed your file, check what was missing:
vimdiff <(xxd FILE_rescued) <(xxd DIR/FILE)
