@kousu
Last active April 8, 2021 22:13
extract functions

Extract an indented block from a file, starting from the first line that matches HEADER. HEADER is treated as a regular expression.

Usage: extract.awk HEADER file [file2 ...]

e.g.

$ ./extract.awk a t
   a() {

      printf("lololol\n");
   }
$ ./extract.awk b t
   b() {


     call_the_thing();
     for(i = 1; i<a; i++) {
        i--;
     }
   }
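
Because the script tests each line with `$0 ~ pattern`, a short HEADER such as `a` selects the first line that contains that letter anywhere, which is not necessarily the function header. The following is a minimal sketch of that behaviour and is not part of the gist: the sample text and the `first_header` helper are made up for illustration, and only the header-matching rule is reproduced, in plain awk.

```shell
#!/bin/sh
# Hypothetical sample input, loosely modeled on the test file in this gist.
sample='   call_a_thing();
   a() {
      printf("lololol\n");
   }'

# Hypothetical helper: print the first line of $sample that matches the
# pattern in $1, using the same dynamic-regex test as the script.
first_header() {
    printf '%s\n' "$sample" |
        awk 'BEGIN { pattern = ARGV[1]; delete ARGV[1] }
             $0 ~ pattern { print; exit }' "$1"
}

loose=$(first_header 'a')            # matches any line containing "a"
anchored=$(first_header '^ *a\(\)')  # matches only a line starting with "a()"

printf 'loose:    [%s]\n' "$loose"
printf 'anchored: [%s]\n' "$anchored"
```

Here the loose pattern picks up the `call_a_thing();` line, while the anchored one finds `a() {`. With the real script the equivalent call would presumably be `./extract.awk '^ *a\(\)' t`.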

Installation

git clone https://gist.github.com/kousu/d6a945cfc6ffa9f93e82a630098acc90
mkdir -p ~/.local/bin
ln d6a945cfc6ffa9f93e82a630098acc90/extract-block ~/.local/bin
chmod +x ~/.local/bin/extract-block

If ~/.local/bin is not already on your PATH, also run

echo 'export PATH=~/.local/bin:$PATH' >> ~/.profile # or equivalent

#!/usr/bin/env -S awk -f
# extract an indented block from a file
#
# usage: extract-block search-pattern file1 [file2 ...]
#
# note: the three-argument match() used below is a gawk extension,
# so `awk` must resolve to gawk

BEGIN {
    pattern = ARGV[1]
    delete ARGV[1]

    extracting = 0
    current_indentation = ""
    target_indentation = ""
}

# detect the current line's indentation level
{
    match($0, "^([[:space:]]*)", G)
    current_indentation = G[1]
}

# find the requested header line
# pattern is used as a dynamic regex: https://www.gnu.org/software/gawk/manual/html_node/Computed-Regexps.html#Computed-Regexps
# XXX is this safe? the pattern is only compiled as a regex, never evaluated as awk code,
# so it should not allow code injection; regex metacharacters in it are interpreted, though
extracting == 0 && $0 ~ pattern {
    extracting = 1
    target_indentation = current_indentation

    # hack:
    # the stopping condition below would see that length(current_indentation) == length(target_indentation)
    # on the header line and stop immediately, so each block would print only one line.
    # circumvent that by inlining the print step here and skipping the stop step with `next`.
    print; next
}

# print every line while inside the block
extracting == 1 {
    #print "Current line |" $0 "|'s indentation level is: |" current_indentation "| which is " length(current_indentation) " long" " and |" target_indentation "| which is " length(target_indentation) " long" #>/dev/stderr
    print
}

# stop on the last line of the block
# - skip blank lines, because most editors cull trailing whitespace, so a fully blank line shouldn't count as being dedented
# - skip if there's *no* indentation; in this case, we should print the entire rest of the file
extracting == 1 && length() > 0 && length(target_indentation) > 0 && length(current_indentation) <= length(target_indentation) {
    # problem: this condition also matches the header line, because there length(current_indentation) == length(target_indentation)
    #
    # solution 1: put this rule *before* the rule that detects the header line
    #   -> problem: then this exits before the *last* line of the block is printed
    # (hence the `print; next` hack in the header rule above)
    exit
}
# test file
   a() {

      printf("lololol\n");
   }
   b() {


     call_the_thing();
     for(i = 1; i<a; i++) {
        i--;
     }
   }
ofa sdf oisd
c
allphabetitt
dsfosup
spoufd
kousu commented Apr 8, 2021

The idea is that this would be useful in CI to detect when particular code blocks have changed.

Doing this correctly requires a full parser that builds an AST; this is only an approximation. But most projects follow reasonably consistent indentation rules, so it will work for them, and if you do use it to detect changes, it will also flag when someone breaks the indentation rules.
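
One way the CI idea could look, as a rough sketch under assumptions rather than anything the gist ships: checksum the extracted block and compare it with a previously recorded value. The `check_block` helper, the record file, and the made-up block text below are all hypothetical; `cksum` is used only because it is available nearly everywhere.

```shell
#!/bin/sh
# Hypothetical helper: read one extracted block on stdin, compare its checksum
# with the one recorded in the file $1, then update the record.
# Prints "changed" or "unchanged".
check_block() {
    rec=$1
    new=$(cksum)                    # checksum of stdin
    old=$(cat "$rec" 2>/dev/null)   # empty if nothing was recorded yet
    printf '%s\n' "$new" > "$rec"
    if [ "$new" = "$old" ]; then
        echo unchanged
    else
        echo changed
    fi
}

# Demo with made-up block text standing in for the extractor's output.
record=$(mktemp)
first=$(printf '   a() {\n      x;\n   }\n' | check_block "$record")
second=$(printf '   a() {\n      x;\n   }\n' | check_block "$record")
# Same code, but the inner line is indented differently.
third=$(printf '   a() {\n     x;\n   }\n' | check_block "$record")
rm -f "$record"

printf '%s %s %s\n' "$first" "$second" "$third"
```

The first run reports a change because nothing was recorded yet, the second sees identical text, and the third differs only in indentation yet is still reported. In CI the input would come from the extractor, e.g. `extract-block 'HEADER' file | check_block path/to/record`, assuming it is installed as described under Installation.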
