@miku
Last active June 30, 2020 11:06
Check the status of tar files
/tarcheck
/elsevier_tarcheck.ndj

tarcheck

Mass tarball file checks.

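Reads one tar filename per line from stdin and writes one JSON object per archive to stdout (JSON lines). Each object holds the filename, the names of the archive members that were read, and any errors encountered while reading the archive.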

We had to check 1588 tar files containing millions of files in total. This program takes about 3 minutes, which is much faster than running tar tf on each archive and checking its exit code (possibly because tarcheck checks the archives in parallel), and it writes its report as JSON.
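tarcheck hands the concurrency to github.com/miku/parallel. To illustrate the fan-out idea, here is a minimal sketch using only the standard library; the file name fanout.go, the helper checkAll and the stub check function are hypothetical, and this is not how miku/parallel is implemented. Each result is stored at its input index, so output order matches input order.

```go
// fanout.go (hypothetical): check many archives concurrently with only the
// standard library. Illustrates the fan-out idea; it is not a copy of
// github.com/miku/parallel.
package main

import (
	"encoding/json"
	"log"
	"os"
	"runtime"
	"sync"
)

// Result mirrors the record type in tarcheck.go.
type Result struct {
	Filename string   `json:"f"`
	Errs     []string `json:"errs"`
	Contents []string `json:"c"`
}

// checkAll runs check on every path with a fixed pool of worker goroutines.
// Each result is stored at its input index, so the output order matches the
// input order no matter which worker finishes first.
func checkAll(paths []string, workers int, check func(string) Result) []Result {
	if workers < 1 {
		workers = 1
	}
	results := make([]Result, len(paths))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = check(paths[i]) // each index is written by exactly one goroutine
			}
		}()
	}
	for i := range paths {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// Hypothetical stand-in for the real per-archive check in tarcheck.go.
	stub := func(path string) Result {
		return Result{Filename: path, Errs: []string{}, Contents: []string{}}
	}
	paths := []string{"a.tar", "b.tar", "c.tar"}
	enc := json.NewEncoder(os.Stdout) // Encode writes one JSON object per line
	for _, r := range checkAll(paths, runtime.NumCPU(), stub) {
		if err := enc.Encode(r); err != nil {
			log.Fatal(err)
		}
	}
}
```

Unlike tarcheck, this sketch keeps every result in memory until all checks are done; a streaming design would write each record as soon as it is ready.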

$ echo "myfile.tar" | tarcheck
SHELL := /bin/bash

elsevier_tarcheck.ndj: tarcheck
# takes about 3 minutes
	taskcat ElsevierJournalsPaths | grep -E 'tar$$' | ./tarcheck > $@

tarcheck: tarcheck.go
	go build -o $@ $^

.PHONY: clean
clean:
	rm -f tarcheck
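package main

import (
	"archive/tar"
	"encoding/json"
	"io"
	"log"
	"os"
	"strings"

	"github.com/miku/parallel"
)

// Result is the report for a single archive, written as one JSON line.
type Result struct {
	Filename string   `json:"f"`    // path of the tar file, as read from stdin
	Errs     []string `json:"errs"` // errors encountered while reading; empty if none
	Contents []string `json:"c"`    // names of the members read before any error
}

func main() {
	// The processor reads lines from stdin, applies the function to each
	// line concurrently and writes the returned bytes to stdout.
	pp := parallel.NewProcessor(os.Stdin, os.Stdout, func(p []byte) ([]byte, error) {
		filename := strings.TrimSpace(string(p))
		log.Println(filename) // progress goes to stderr
		result := Result{
			Filename: filename,
			Errs:     make([]string, 0),
			Contents: make([]string, 0),
		}
		f, err := os.Open(filename)
		if err != nil {
			// Record the failure in the report instead of returning an
			// error from this function, so every input line gets a record.
			result.Errs = append(result.Errs, err.Error())
		} else {
			defer f.Close()
			tr := tar.NewReader(f)
			for {
				hdr, err := tr.Next()
				if err == io.EOF {
					break // end of archive
				}
				if err != nil {
					result.Errs = append(result.Errs, err.Error())
					break
				}
				if hdr == nil { // defensive; Next should not return a nil header without an error
					result.Errs = append(result.Errs, "unreadable tar")
					break
				}
				result.Contents = append(result.Contents, hdr.Name)
			}
		}
		b, err := json.Marshal(result)
		if err != nil {
			return nil, err
		}
		return append(b, '\n'), nil
	})
	if err := pp.Run(); err != nil {
		log.Fatal(err)
	}
}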