grep -Fxvf file1 file2
Flags mean:
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.
-x, --line-regexp
Select only those matches that exactly match the whole line.
-v, --invert-match
Invert the sense of matching, to select non-matching lines.
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file contains zero patterns, and therefore matches nothing.
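To see how the four flags combine, here is a minimal sketch on two throwaway files. The file contents and the `tmp` directory are made up for illustration.

```shell
# Hypothetical sample files, created in a throwaway directory.
tmp=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmp/file1"
printf 'banana\ncherry pie\ndate\n' > "$tmp/file2"

# Print the lines of file2 that do not appear verbatim in file1.
# "cherry pie" survives because -x requires a whole-line match.
only_in_file2=$(grep -Fxvf "$tmp/file1" "$tmp/file2")
printf '%s\n' "$only_in_file2"

rm -rf "$tmp"
```

Swapping the two file arguments gives the lines that are only in file1.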
# Get lines that are exactly 5 characters long
grep -x '.\{5\}'
# To run the operation on a file, pass the file name to grep directly
grep -x '.\{5\}' input_file.txt
We can use a combination of head and tail. For example, to get lines 50 to 100 (inclusive, so 51 lines) of a file:
head -n 100 input_file.txt | tail -n 51
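A range from line M to line N inclusive contains N - M + 1 lines, which is easy to get wrong by one. The sketch below checks the arithmetic on a generated file (the `sample` file is hypothetical) and compares it with `sed -n`, which takes the two line numbers directly.

```shell
# Hypothetical input: a file whose line k contains the number k.
sample=$(mktemp)
seq 1 200 > "$sample"

# Lines 50..100 inclusive are 100 - 50 + 1 = 51 lines.
via_head_tail=$(head -n 100 "$sample" | tail -n 51)

# Same range with sed, no arithmetic needed.
via_sed=$(sed -n '50,100p' "$sample")

printf '%s\n' "$via_sed" | head -n 1   # first selected line
printf '%s\n' "$via_sed" | tail -n 1   # last selected line

rm -f "$sample"
```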
wc -l < input_file.txt
Copying a lot of files (argument list too long)
for i in source_dir/images/*.jpg ; do cp "$i" target_dir/images/; done
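An alternative sketch that also avoids the "argument list too long" error is to let `find` run `cp` itself, so the shell never has to expand the full file list. The `src` and `dst` directories below are throwaway stand-ins for source_dir/images and target_dir/images.

```shell
# Hypothetical directories standing in for source_dir/images and target_dir/images.
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/a.jpg" "$src/b.jpg" "$src/notes.txt"

# find hands each matching file to cp one at a time.
find "$src" -maxdepth 1 -name '*.jpg' -exec cp {} "$dst/" \;

copied=$(ls "$dst" | wc -l | tr -d ' ')
echo "copied $copied files"

rm -rf "$src" "$dst"
```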
Sometimes a CSV's line breaks include an extra \r (a Windows-style CRLF line ending). To remove these before using other CLI tools like xargs in Linux, run the following instead of cat (ref):
sed $'s/\r$//' your_file.csv
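If the `$'...'` quoting is not available in your shell, a rough alternative is `tr`, assuming the file has no carriage returns you want to keep: `tr -d '\r'` deletes every \r, not only those at the end of a line. The sample file below is made up.

```shell
# Hypothetical CSV with Windows-style CRLF line endings.
crlf_csv=$(mktemp)
printf 'a,b\r\n1,2\r\n' > "$crlf_csv"

# Delete every carriage return in the stream.
cleaned=$(tr -d '\r' < "$crlf_csv")
printf '%s\n' "$cleaned"

rm -f "$crlf_csv"
```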
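For example, to run a Python script once per image in target_dir/:
ls target_dir/*.jpg | xargs -n 1 -I{} python script.py --image {}
The -I{} flag gives you control over where exactly to place the piped value. Because ls target_dir/*.jpg already prints each path with the target_dir/ prefix, {} is used as-is. The -n 1 flag tells xargs to pass one argument per command invocation (-I already runs the command once per input line, so -n 1 is redundant here).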
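Parsing `ls` output breaks on file names that contain spaces. A minimal sketch of a more robust variant, assuming your `find` and `xargs` support `-print0` and `-0` (GNU and BSD versions do), is shown below. Here `echo` stands in for the real script, and the image directory is a throwaway one.

```shell
# Hypothetical image directory, including a name with a space.
img_dir=$(mktemp -d)
touch "$img_dir/cat.jpg" "$img_dir/dog 2.jpg"

# NUL-separated names survive spaces; echo is a placeholder for
# "python script.py --image {}".
runs=$(find "$img_dir" -maxdepth 1 -name '*.jpg' -print0 \
  | xargs -0 -I{} echo "processing {}" \
  | wc -l | tr -d ' ')
echo "command ran $runs times"

rm -rf "$img_dir"
```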
use case: generating a train-test split ref: link
ls | shuf -n 10 | xargs -I{} mv {} path-to-new-folder
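A small dry run of the split on throwaway data can confirm the counts before touching real files. This sketch assumes GNU `shuf` is installed; the `data` and `test_dir` directories and the file names are hypothetical.

```shell
# Hypothetical dataset of 20 files and an empty test folder.
data=$(mktemp -d)
test_dir=$(mktemp -d)
for i in $(seq 1 20); do touch "$data/img_$i.jpg"; done

# Move 5 randomly chosen files into the test folder.
(cd "$data" && ls | shuf -n 5 | xargs -I{} mv {} "$test_dir/")

n_train=$(ls "$data" | wc -l | tr -d ' ')
n_test=$(ls "$test_dir" | wc -l | tr -d ' ')
echo "train: $n_train, test: $n_test"

rm -rf "$data" "$test_dir"
```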
use case: load testing an API
ls | xargs -I{} curl -X 'POST' 'http://your.api.url/apply?model_name=FancyModel' -H 'accept: image/jpeg' -H 'Content-Type: multipart/form-data' -F 'file_obj=@{};type=image/jpeg' -o path_to_output_folder/{}
- the output will be saved with the same filename as the input but in the specified output directory (using the -o flag)
- add 2>&1 | tee path_to_your_log_file.log to save a log of each of your API calls and also see it in the terminal. Hat tip to this stackoverflow post
- to time the whole command, just add time at the beginning
use case: retry an API call if the JSON response was Time-out or Error
grep -Ril "Error 500" path_to_output_folder/ | cut -c12- | rev | cut -c6- | rev | xargs -I{} curl -X 'POST' 'http://your.api.url/apply?model_name=FancyModel' -H 'accept: image/jpeg' -H 'Content-Type: multipart/form-data' -F 'file_obj=@{};type=image/jpeg' -o path_to_output_folder/{}.json
- grep -Ril: finds files in the given directory (recursively, ignoring case) whose content contains the given search pattern, in this case Error 500 (could also be Time-out or Internal Server Error). This command returns a list of files. For more, see this stackoverflow post
- cut: removes part of the returned file path, for example .json
- rev: a hack that lets cut remove the last X characters
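The cut and rev steps are easier to follow on a single path. The sketch below uses a made-up path, `output_dir/cat.jpg.json`, whose directory prefix happens to be 11 characters long, and also shows `basename` as a possible alternative when the prefix and suffix are known.

```shell
# Hypothetical path as grep -l might print it.
path='output_dir/cat.jpg.json'

# cut -c12- keeps character 12 onwards, dropping the 11-character "output_dir/".
no_dir=$(printf '%s\n' "$path" | cut -c12-)

# Reverse, drop the first 5 characters (".json" reversed), reverse back.
no_ext=$(printf '%s\n' "$no_dir" | rev | cut -c6- | rev)

# Alternative: basename strips the directory and a given suffix in one step.
alt=$(basename "$path" .json)

echo "$no_dir -> $no_ext (basename gives $alt)"
```

The character counts in `cut` depend on the exact directory name, so `basename` is less brittle if the folder is renamed.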
assuming that all the CSVs have the same columns:
cat first.csv <(tail -n +2 second.csv) <(tail -n +2 third.csv) > all.csv
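The `<( ... )` process substitution needs bash or zsh. A sketch of a variant that should work in any POSIX shell uses `awk` to skip the first line of every file except the first. The three small CSVs below are made up.

```shell
# Hypothetical CSVs that share the same header.
work=$(mktemp -d)
printf 'id,name\n1,a\n' > "$work/first.csv"
printf 'id,name\n2,b\n' > "$work/second.csv"
printf 'id,name\n3,c\n' > "$work/third.csv"

# FNR is the line number within the current file, NR the overall line number,
# so this drops line 1 of every file except the very first one.
awk 'FNR == 1 && NR != 1 { next } { print }' \
  "$work/first.csv" "$work/second.csv" "$work/third.csv" > "$work/all.csv"

merged=$(cat "$work/all.csv")
printf '%s\n' "$merged"

rm -rf "$work"
```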
This command will show only column 3 of input.csv:
cut -f3 -d, input.csv
while this command will NOT show column 3:
cut --complement -f 3 -d, input.csv
- note that --complement is not available in the macOS version of cut; for a workaround, keep reading below...
And finally this command will show column 1 to 2 and column 4 onwards (inclusive):
cut -f1-2,4- -d, input.csv
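The field selections can be checked on a single made-up row. The last form gives the same result as dropping column 3, without needing --complement.

```shell
# Hypothetical CSV row with five columns.
row='a,b,c,d,e'

# Keep only column 3.
only_third=$(printf '%s\n' "$row" | cut -d, -f3)

# Keep columns 1-2 and 4 onwards, which drops column 3.
without_third=$(printf '%s\n' "$row" | cut -d, -f1-2,4-)

echo "$only_third"
echo "$without_third"
```

Note that cut splits on every comma, so this assumes no quoted fields that contain commas.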
Selecting Lines from a CSV using sed
for example:
sed -n '1p;1001,2001p' path/to/your.csv > path/to/your_subset.csv
will print the 1st line (the CSV header) and the 1000th to 2000th data rows (file lines 1001 to 2001) to your_subset.csv
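Because the header occupies line 1, data row k sits on file line k + 1. A quick sketch on a generated CSV (hypothetical columns `id,value`) makes the offset visible.

```shell
# Hypothetical CSV: a header plus 3000 data rows, where row k starts with k.
csv=$(mktemp)
{ echo 'id,value'; seq 1 3000 | sed 's/$/,x/'; } > "$csv"

# Header (line 1) plus file lines 1001..2001.
subset=$(sed -n '1p;1001,2001p' "$csv")

printf '%s\n' "$subset" | head -n 2   # header and first selected data row
printf '%s\n' "$subset" | tail -n 1   # last selected data row

rm -f "$csv"
```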
for example, when your script is expecting a slightly different column name in your dataframe:
sed -i '1cCol1,Col2,NewCol3' dataframe.csv
-i: this makes changes to the CSV in place (see original solution here)
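The behavior of `sed -i` differs between GNU and BSD/macOS sed, so here is a sketch of a variant that avoids it by writing to a temporary file and moving it back. It uses `1s/.*/.../` to replace the whole first line; the CSV contents are made up.

```shell
# Hypothetical CSV whose third column needs a new name.
csv=$(mktemp)
printf 'Col1,Col2,Col3\n1,2,3\n' > "$csv"

# Replace the entire first line, then swap the new file into place.
sed '1s/.*/Col1,Col2,NewCol3/' "$csv" > "$csv.new" && mv "$csv.new" "$csv"

header=$(head -n 1 "$csv")
first_row=$(sed -n '2p' "$csv")
echo "$header"

rm -f "$csv"
```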
- this cheatsheet is amazing!
- there's a lesson on the Programming Historian or a tutorial on Digital Ocean if you got the time
- and finally this guide or even just this medium post could be useful too!
- use -O if you want to specify the output filename (see reference)
- use -P if you want to download the file to a specific directory
cd target/dir/
tar -czvf logs_archive.tar.gz *
To create a zip file, it's easiest to put everything you want to zip in a folder first (e.g. ~/dir_to_zip/), then run:
zip -r package.zip ~/dir_to_zip
tar -xvf filename.tar.gz -C /target/dir/
-v: verbose; optional here
-C: also optional; specifies where to extract (x) the file (f) to, instead of the current working directory
for zipped files:
unzip file.zip -d destination_folder
-d is optional if you are okay with extracting the contents of the zipped file into the current directory
There are about 10 ways, as described in detail here, but my favorite is:
command |& tee output.txt
which will capture both the standard output and standard error streams into output.txt while still being visible in the terminal (this variation will overwrite output.txt if it exists)
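`|&` is a bash shorthand for `2>&1 |`. For shells that lack it, a minimal sketch of the longer spelling is below; the braces group stands in for a real command, and the `log` file is a throwaway one.

```shell
# Hypothetical log file.
log=$(mktemp)

# The group writes one line to stdout and one to stderr;
# 2>&1 merges stderr into stdout before the pipe reaches tee.
{ echo 'to stdout'; echo 'to stderr' >&2; } 2>&1 | tee "$log"

captured=$(cat "$log")
rm -f "$log"
```

Use `tee -a` instead of `tee` to append to the log rather than overwrite it.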
curl ifconfig.me
curl checkip.amazonaws.com