@rctay
Last active January 2, 2016 15:19
[python] aggregate m,...,n to m-n

The program reads a sorted sequence of integers, one per line, and prints each run of consecutive integers as an x-y range; an integer that is not part of a run is printed verbatim.

cat | python ranges.py
1
2
3
7
8
9
^D
1-3
7-9
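The sample session above only contains runs. To illustrate the "verbatim otherwise" case, here is a minimal sketch that uses itertools.groupby instead of the generator in ranges.py. It is an alternative formulation, not the script itself, and the helper name collapse is made up for this example.

```python
from itertools import groupby


def collapse(sorted_ints):
    """Return 'x-y' strings for runs and 'x' strings for isolated integers."""
    out = []
    # Inside a run of consecutive integers, (value - position) is constant,
    # so grouping on that difference splits the input into runs.
    for _, group in groupby(enumerate(sorted_ints), key=lambda p: p[1] - p[0]):
        run = [value for _, value in group]
        if len(run) == 1:
            out.append(str(run[0]))                  # isolated integer, verbatim
        else:
            out.append("%d-%d" % (run[0], run[-1]))  # first and last of the run
    return out


print("\n".join(collapse([1, 2, 3, 5, 7, 8, 9])))
# prints:
# 1-3
# 5
# 7-9
```

This version buffers each run in a list, whereas a generator only needs to remember the current start and end, which may matter for very long block lists.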

ranges is actually a DFA/FSM!
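The state-machine reading can be made explicit. The sketch below is one possible way to spell it out, assuming three states named START, SINGLE and RANGE; those names and the function ranges_fsm are hypothetical and do not appear in ranges.py. Strictly speaking the machine also carries the integers start and end alongside its finite state.

```python
# Hypothetical state names, chosen for this illustration only.
START, SINGLE, RANGE = "START", "SINGLE", "RANGE"


def ranges_fsm(ints):
    """Yield (state, start, end) each time a held run is emitted."""
    state, start, end = START, None, None
    for x in ints:
        if state != START and x == end + 1:
            # Input class "consecutive": SINGLE -> RANGE or RANGE -> RANGE.
            end = x
            state = RANGE
        else:
            # Input class "gap": emit whatever is held, then hold x alone.
            if state != START:
                yield (state, start, end)
            start = end = x
            state = SINGLE
    # End of input: flush the held run, if any.
    if state != START:
        yield (state, start, end)


for item in ranges_fsm([1, 2, 3, 5, 7, 8, 9]):
    print(item)
# prints:
# ('RANGE', 1, 3)
# ('SINGLE', 5, 5)
# ('RANGE', 7, 9)
```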

Use case: for an NTFS partition, show which files are in bad sectors marked by ddrescue.

ddrescuelog outputs individual blocks (sectors). On the other hand, the -s option to ntfscluster can accept x-y ranges. By "factoring" the list of blocks into ranges, we can reduce the number of (slow) disk accesses.

ddrescuelog --domain-logfile=sdN.domain --list-blocks=- rescue.log \
| python ranges.py \
| xargs -I{} \
  sudo ntfscluster -s {} 2>/dev/null

(stderr is redirected to /dev/null because ntfscluster tends to print many "Error reading inode NNN" lines.)
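The pipeline above runs ntfscluster once per line it receives, so the saving is the difference between the number of blocks and the number of runs. A rough sketch of that arithmetic follows; the block numbers are made up for illustration and count_runs is a hypothetical helper, not part of ranges.py.

```python
def count_runs(sorted_blocks):
    """Count maximal runs of consecutive integers in a sorted sequence."""
    runs = 0
    prev = None
    for b in sorted_blocks:
        # A new run starts at the first block and after every gap.
        if prev is None or b != prev + 1:
            runs += 1
        prev = b
    return runs


# Made-up bad-block list: two damaged stretches plus one stray block.
blocks = list(range(2048, 2056)) + [4100] + list(range(9000, 9004))

print(len(blocks), "blocks ->", count_runs(blocks), "ntfscluster invocations")
# prints: 13 blocks -> 3 ntfscluster invocations
```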

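import fileinput


def ranges(ints):
    """Yield (start, end) for each run of consecutive integers in sorted input."""
    start = None
    end = None
    for x in ints:
        if end is not None and end + 1 == x:
            # x extends the current run
            end = x
            continue
        if end is not None:
            # x breaks the current run: emit it before starting a new one
            yield (start, end)
        start = x
        end = x
    # emit the final run; a lone trailing integer is a run of length one
    if start is not None:
        yield (start, end)


def read_ints():
    # reads from the files named on the command line, or stdin if none
    for line in fileinput.input():
        yield int(line)


if __name__ == '__main__':
    for start, end in ranges(read_ints()):
        if start == end:
            # isolated integer: output it verbatim
            print("%d" % start)
        else:
            print("%d-%d" % (start, end))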