@victoriastuart
Last active July 26, 2017 00:47
[Linux] Recurse through child directories, searching and replacing text in files (two methods), with optional filename or filename-extension matching
==============================================================================
/mnt/Vancouver/Programming/scripts/claws_sed_tests/notes.txt
# ----------------------------------------------------------------------------
[victoria@victoria claws_sed_tests]$ date
Fri Jul 7 10:07:39 PDT 2017
[victoria@victoria claws_sed_tests]$ pwd
/mnt/Vancouver/Programming/scripts/claws_sed_tests
[victoria@victoria claws_sed_tests]$ tree . -F
.
├── 0
├── 0.txt
├── 1/
│   ├── 1
│   ├── 1.txt
│   └── 2/
│       ├── 2
│       ├── 2.txt
│       └── 3/
│           ├── 3
│           └── 3.txt
├── notes.txt
└── sed.sh*

3 directories, 10 files
[victoria@victoria claws_sed_tests]$ cat 0
The = neurodegeneration == that = = occurs = = = in Parkinson's disease is a result of str=
ess on the endoplasmic reticulum in the cell rather than failure of the mit=
ochondria as previously thought, according to a study in fruit flies. It wa=
s found that the death of neurons associated with the disease was prevented=
when chemicals that block the effects of endoplasmic reticulum stress were=
used. Some inherited forms of early-onset Parkinson's disease have typical=
ly been blamed on poorly functioning mitochondria, the powerhouses of cells=
. Without reliable sources of energy, neurons wither and die. This may not =
be the complete picture of what is happening within cells affected by Parki=
nson's. Researchers from the MRC Toxicology Unit at the University of Leice=
ster used a common fruit fly to investigate this further; fruit flies were =
used because they provide a good genetic model for humans.
[victoria@victoria claws_sed_tests]$ cat 0.txt
The = neurodegeneration == that = = occurs = = = in Parkinson's disease is a result of str=
ess on the endoplasmic reticulum in the cell rather than failure of the mit=
ochondria as previously thought, according to a study in fruit flies. It wa=
s found that the death of neurons associated with the disease was prevented=
when chemicals that block the effects of endoplasmic reticulum stress were=
used. Some inherited forms of early-onset Parkinson's disease have typical=
ly been blamed on poorly functioning mitochondria, the powerhouses of cells=
. Without reliable sources of energy, neurons wither and die. This may not =
be the complete picture of what is happening within cells affected by Parki=
nson's. Researchers from the MRC Toxicology Unit at the University of Leice=
ster used a common fruit fly to investigate this further; fruit flies were =
used because they provide a good genetic model for humans.
# ----------------------------------------
# THE OTHER FILES (1/1, 1/1.txt, 1/2/2, 1/2/2.txt, 1/2/3/3, 1/2/3/3.txt)
# HAVE THE SAME CONTENT (**PRIOR** to running sed.sh)
# ----------------------------------------
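To repeat this test on another machine, a tree with the same layout can be generated in a scratch directory. This is a minimal sketch, assuming bash and GNU coreutils (`mktemp -d`); the one-sentence sample text is a hypothetical, shortened stand-in for the article above that keeps a soft line break (`str=` / `ess`) and a few mid-line `=` characters.

```bash
#!/bin/bash
# Rebuild a claws_sed_tests-style tree in a fresh scratch directory.
# The sample text is a hypothetical, shortened stand-in for the article above.
set -euo pipefail

root=$(mktemp -d)   # scratch directory; nothing outside it is touched
sample=$'The = neurodegeneration == that occurs is a result of str=\ness on the endoplasmic reticulum.\n'

dir=$root
for n in 0 1 2 3; do
  if [ "$n" -gt 0 ]; then   # 1/, 1/2/, 1/2/3/ : one level deeper each time
    dir=$dir/$n
    mkdir -p "$dir"
  fi
  printf '%s' "$sample" > "$dir/$n"       # numeric name (Claws Mail style)
  printf '%s' "$sample" > "$dir/$n.txt"   # same text, .txt extension
done

echo "$root"
```

Running `./sed.sh` (or the find command) from inside the printed directory should then reproduce the before/after comparison below.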
[victoria@victoria claws_sed_tests]$ cat sed.sh
#!/bin/bash
# /mnt/Vancouver/Programming/scripts/claws_sed_tests/sed.sh
# ============================================================================
# METHOD 1 (GLOBBING + for LOOP + sed COMMAND):
# =============================================
# ----------------------------------------
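# FILENAME GLOBBING:
# ------------------
# https://superuser.com/questions/112078/delete-matching-files-in-all-subdirectories
# zz/**/*      ## all files, recursively through zz/ and its child directories
# zz/**/*.txt  ## files ending in .txt, recursively through zz/ and its child directories
# zz/*/*.txt   ## files ending in .txt, exactly one level below zz/ (NOT recursive)
# NEEDED: '**' only recurses when bash's globstar option is set (shopt -s globstar);
# otherwise bash treats '**' like '*'.
# ----------------------------------------
# For "Method 1," uncomment the following 5 lines:
# ------------------------------------------------
#shopt -s globstar nullglob   ## '**' recurses; a pattern with no matches expands to nothing
#for f in zz/**/*.txt         ## files ending in .txt, recursively through child directories
#do
#sed -i ':a;N;$!ba;s/=\n//g' "$f"   ## quote "$f": file names may contain spaces
#done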
# ----------------------------------------
# NOTES:
# ------
# sed -i option [ man sed | grep -i -C2 \\-i ]:
# -i[SUFFIX], --in-place[=SUFFIX]
# edit files in place (makes backup if SUFFIX supplied)
# ------------------
# sed ':a;N;$!ba;s/\n//g' file
# https://stackoverflow.com/questions/1251999/how-can-i-replace-a-newline-n-using-sed
# This reads the whole file into the pattern space in a loop, then deletes the newline(s).
# Explanation:
# Create a label via :a.
# Append a newline plus the next input line to the pattern space via N.
# If we are not yet on the last line, branch back to the label: $!ba ($! means
# "not on the last line"; there the loop must stop, because N with no next line
# would end the script before the substitution runs).
# Finally, the substitution deletes every newline in the pattern space
# (which is now the whole file).
# In this script, s/=\n//g deletes only "=" + newline pairs (quoted-printable
# "soft" line breaks), rejoining the wrapped lines and keeping other "=" characters.
# ============================================================================
# METHOD 2 (find COMMAND + sed COMMAND):
# ======================================
# Victoria: this is my preferred solution:
# 1. One-liner;
# 2. Better control over directory recursion, other options
# (filename matching; file names containing spaces; ...)
# ----------------------------------------
# Parse files in CHILD directories, relative to current ( ./ ) directory:
# ----------------------------------------------------------------------
# [Child dir = ./1/ ]
# Files ending in .txt:
#find ./1 -type f -iname "*.txt" -exec sed -i -e ':a;N;$!ba;s/=\n//g' {} \;
# Files named with a single digit, 0-9 (e.g., Claws Mail messages 1 to 9):
find ./1 -type f -iname '[0-9]' -exec sed -i -e ':a;N;$!ba;s/=\n//g' {} \;
# ----------------------------------------
# Variants:
# ---------
# Hide errors (if any):
# find ./1 2>/dev/null -type f -iname '[0-9]' -exec sed -i -e ':a;N;$!ba;s/=\n//g' {} \;
# ----------------------------------------
# NOTES:
# ------
# For a dry run, insert 'echo' after -exec; find then prints each sed command instead of running it, e.g.:
# find ./1 -type f -iname '[0-9]' -exec echo sed -i -e ':a;N;$!ba;s/=\n//g' {} \;
# find ./1 -type f -iname "*.txt" -exec echo sed -i -e ':a;N;$!ba;s/=\n//g' {} \;
# ------------------
# https://stackoverflow.com/questions/6758963/find-and-replace-with-sed-in-directory-and-sub-directories
# https://stackoverflow.com/questions/16541582/finding-multiple-files-recursively-and-renaming-in-linux
# https://superuser.com/questions/112078/delete-matching-files-in-all-subdirectories
# -type f means find will only process regular files (not directories)
# -iname '[0-9]' matches against the file's base name only (-i: ignore case);
#   '[0-9]' : names consisting of exactly ONE digit (so '12' is NOT matched)
# -iname "*.txt" ditto, except only match files ending in .txt (any case)
# '-exec' makes find execute sed for every matching file found
# '{}' will be replaced with the path name of the file
# '\;' is only there to mark the end of the exec expression
# (runs once per file)
# ============================================================================
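The `:a;N;$!ba` loop explained in the script's notes can be checked on a short string before touching real files. This sketch assumes GNU sed (as on the Linux system shown; BSD sed parses the `:a;N` label differently) and contrasts the script's `s/=\n//g` with a plain line-by-line `s/=$//`, which strips the trailing `=` but cannot join the lines, because sed never sees the newline in that mode.

```bash
#!/bin/bash
# Quoted-printable style input: "=" at end of line marks a soft line break.
input=$'str=\ness on the = reticulum\n'

# Slurp the whole input into the pattern space, then delete only "=" + newline.
joined=$(printf '%s' "$input" | sed ':a;N;$!ba;s/=\n//g')

# Line-at-a-time alternative: the trailing "=" goes, but the lines stay split.
split=$(printf '%s' "$input" | sed 's/=$//')

printf 'joined: %s\n' "$joined"
printf 'split:  %s\n' "${split//$'\n'/|}"   # show the remaining newline as "|"
```

The mid-line ` = ` survives in both cases, matching what the transcript shows for the edited files.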
[victoria@victoria claws_sed_tests]$ ./sed.sh
[victoria@victoria claws_sed_tests]$ cat 0 ## [file '0' : untouched]
The = neurodegeneration == that = = occurs = = = in Parkinson's disease is a result of str=
ess on the endoplasmic reticulum in the cell rather than failure of the mit=
ochondria as previously thought, according to a study in fruit flies. It wa=
s found that the death of neurons associated with the disease was prevented=
when chemicals that block the effects of endoplasmic reticulum stress were=
used. Some inherited forms of early-onset Parkinson's disease have typical=
ly been blamed on poorly functioning mitochondria, the powerhouses of cells=
. Without reliable sources of energy, neurons wither and die. This may not =
be the complete picture of what is happening within cells affected by Parki=
nson's. Researchers from the MRC Toxicology Unit at the University of Leice=
ster used a common fruit fly to investigate this further; fruit flies were =
used because they provide a good genetic model for humans.
[victoria@victoria claws_sed_tests]$ cat 0.txt ## [file '0.txt' : untouched]
The = neurodegeneration == that = = occurs = = = in Parkinson's disease is a result of str=
ess on the endoplasmic reticulum in the cell rather than failure of the mit=
ochondria as previously thought, according to a study in fruit flies. It wa=
s found that the death of neurons associated with the disease was prevented=
when chemicals that block the effects of endoplasmic reticulum stress were=
used. Some inherited forms of early-onset Parkinson's disease have typical=
ly been blamed on poorly functioning mitochondria, the powerhouses of cells=
. Without reliable sources of energy, neurons wither and die. This may not =
be the complete picture of what is happening within cells affected by Parki=
nson's. Researchers from the MRC Toxicology Unit at the University of Leice=
ster used a common fruit fly to investigate this further; fruit flies were =
used because they provide a good genetic model for humans.
# ----------------------------------------
## ABOVE (PARENT DIRECTORY CONTENTS): UNTOUCHED, AS PLANNED.
## BELOW (CHILD DIRECTORY FILES): THE DIGIT-NAMED FILES (1, 2, 3) ARE EDITED
## AND THE .txt FILES ARE UNTOUCHED, PER THAT SINGLE-LINE COMMAND:
## find ./1 -type f -iname '[0-9]' -exec sed -i -e ':a;N;$!ba;s/=\n//g' {} \;
# ----------------------------------------
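Because `-iname '[0-9]'` must match the whole base name, this command only picks up one-digit names; a message file named `12` would be skipped. Below is a sketch of a variant, tried on a hypothetical scratch tree, that matches all-digit names of any length (`-name '[0-9]*'` plus `! -name '*[!0-9]*'` to reject any name containing a non-digit) and uses `-exec ... {} +` to hand many files to each sed run. It assumes GNU sed, whose `-i` treats every file separately, so `$` still means the last line of each file.

```bash
#!/bin/bash
set -euo pipefail

work=$(mktemp -d)                       # hypothetical scratch tree
mkdir -p "$work/1/2"
printf 'str=\ness\n' > "$work/1/7"        # one-digit name
printf 'str=\ness\n' > "$work/1/2/12"     # two-digit name: missed by '[0-9]'
printf 'str=\ness\n' > "$work/1/2/1a"     # not all digits: must be left alone
printf 'str=\ness\n' > "$work/1/2/2.txt"  # .txt file: must be left alone

# Names made only of digits, any length:
#   -name '[0-9]*'      starts with a digit
#   ! -name '*[!0-9]*'  contains no non-digit character
# '{} +' passes many paths to one sed; GNU sed -i still edits each file separately.
find "$work/1" -type f -name '[0-9]*' ! -name '*[!0-9]*' \
  -exec sed -i -e ':a;N;$!ba;s/=\n//g' {} +
```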
[victoria@victoria claws_sed_tests]$ cat 1/1 ## [file '1' : edited]
The = neurodegeneration == that = = occurs = = = in Parkinson's disease is a result of stress on the endoplasmic reticulum in the cell rather than failure of the mitochondria as previously thought, according to a study in fruit flies. It was found that the death of neurons associated with the disease was prevented when chemicals that block the effects of endoplasmic reticulum stress were used. Some inherited forms of early-onset Parkinson's disease have typically been blamed on poorly functioning mitochondria, the powerhouses of cells. Without reliable sources of energy, neurons wither and die. This may not be the complete picture of what is happening within cells affected by Parkinson's. Researchers from the MRC Toxicology Unit at the University of Leicester used a common fruit fly to investigate this further; fruit flies were used because they provide a good genetic model for humans.
[victoria@victoria claws_sed_tests]$ cat 1/1.txt ## [file '1.txt' : untouched]
The = neurodegeneration == that = = occurs = = = in Parkinson's disease is a result of str=
ess on the endoplasmic reticulum in the cell rather than failure of the mit=
ochondria as previously thought, according to a study in fruit flies. It wa=
s found that the death of neurons associated with the disease was prevented=
when chemicals that block the effects of endoplasmic reticulum stress were=
used. Some inherited forms of early-onset Parkinson's disease have typical=
ly been blamed on poorly functioning mitochondria, the powerhouses of cells=
. Without reliable sources of energy, neurons wither and die. This may not =
be the complete picture of what is happening within cells affected by Parki=
nson's. Researchers from the MRC Toxicology Unit at the University of Leice=
ster used a common fruit fly to investigate this further; fruit flies were =
used because they provide a good genetic model for humans.
[victoria@victoria claws_sed_tests]$ cat 1/2/2 ## [file '2' : edited]
The = neurodegeneration == that = = occurs = = = in Parkinson's disease is a result of stress on the endoplasmic reticulum in the cell rather than failure of the mitochondria as previously thought, according to a study in fruit flies. It was found that the death of neurons associated with the disease was prevented when chemicals that block the effects of endoplasmic reticulum stress were used. Some inherited forms of early-onset Parkinson's disease have typically been blamed on poorly functioning mitochondria, the powerhouses of cells. Without reliable sources of energy, neurons wither and die. This may not be the complete picture of what is happening within cells affected by Parkinson's. Researchers from the MRC Toxicology Unit at the University of Leicester used a common fruit fly to investigate this further; fruit flies were used because they provide a good genetic model for humans.
[victoria@victoria claws_sed_tests]$ cat 1/2/2.txt ## [file '2.txt' : untouched]
The = neurodegeneration == that = = occurs = = = in Parkinson's disease is a result of str=
ess on the endoplasmic reticulum in the cell rather than failure of the mit=
ochondria as previously thought, according to a study in fruit flies. It wa=
s found that the death of neurons associated with the disease was prevented=
when chemicals that block the effects of endoplasmic reticulum stress were=
used. Some inherited forms of early-onset Parkinson's disease have typical=
ly been blamed on poorly functioning mitochondria, the powerhouses of cells=
. Without reliable sources of energy, neurons wither and die. This may not =
be the complete picture of what is happening within cells affected by Parki=
nson's. Researchers from the MRC Toxicology Unit at the University of Leice=
ster used a common fruit fly to investigate this further; fruit flies were =
used because they provide a good genetic model for humans.
[victoria@victoria claws_sed_tests]$ cat 1/2/3/3 ## [file '3' : edited]
The = neurodegeneration == that = = occurs = = = in Parkinson's disease is a result of stress on the endoplasmic reticulum in the cell rather than failure of the mitochondria as previously thought, according to a study in fruit flies. It was found that the death of neurons associated with the disease was prevented when chemicals that block the effects of endoplasmic reticulum stress were used. Some inherited forms of early-onset Parkinson's disease have typically been blamed on poorly functioning mitochondria, the powerhouses of cells. Without reliable sources of energy, neurons wither and die. This may not be the complete picture of what is happening within cells affected by Parkinson's. Researchers from the MRC Toxicology Unit at the University of Leicester used a common fruit fly to investigate this further; fruit flies were used because they provide a good genetic model for humans.
[victoria@victoria claws_sed_tests]$ cat 1/2/3/3.txt ## [file '3.txt' : untouched]
The = neurodegeneration == that = = occurs = = = in Parkinson's disease is a result of str=
ess on the endoplasmic reticulum in the cell rather than failure of the mit=
ochondria as previously thought, according to a study in fruit flies. It wa=
s found that the death of neurons associated with the disease was prevented=
when chemicals that block the effects of endoplasmic reticulum stress were=
used. Some inherited forms of early-onset Parkinson's disease have typical=
ly been blamed on poorly functioning mitochondria, the powerhouses of cells=
. Without reliable sources of energy, neurons wither and die. This may not =
be the complete picture of what is happening within cells affected by Parki=
nson's. Researchers from the MRC Toxicology Unit at the University of Leice=
ster used a common fruit fly to investigate this further; fruit flies were =
used because they provide a good genetic model for humans.
[victoria@victoria claws_sed_tests]$ ## ALL CHECKS OUT! :-D
[victoria@victoria claws_sed_tests]$
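Checking each file with `cat` works for a handful of test files; for a real Claws Mail folder, a small helper can list which files still contain soft line breaks. This is a sketch assuming bash with GNU find, grep and sort; the function name `check_soft_breaks` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: for every regular file under directory "$1", report
# whether any line still ends in "=" (a quoted-printable soft line break).
check_soft_breaks() {
  local f
  # -print0 / read -d '' keep file names with spaces (or newlines) intact.
  while IFS= read -r -d '' f; do
    if grep -q '=$' "$f"; then
      printf 'soft breaks: %s\n' "$f"
    else
      printf 'clean: %s\n' "$f"
    fi
  done < <(find "$1" -type f -print0 | sort -z)
}
```

For example, `check_soft_breaks ./1` run after `./sed.sh` should report the .txt files under `soft breaks` and the digit-named files as `clean`.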
==============================================================================
==============================================================================
END OF FILE
==============================================================================
==============================================================================
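Method 1's `**` pattern only recurses in bash when the `globstar` shell option is enabled (bash 4 or later); otherwise `**` behaves like `*` and `zz/**/*.txt` reaches only files one level below `zz/`. A runnable sketch of that loop with `globstar` and `nullglob` enabled and `"$f"` quoted, assuming GNU sed; here `zz` is just a hypothetical scratch directory standing in for the one named in the script's comments.

```bash
#!/bin/bash
set -euo pipefail
shopt -s globstar nullglob   # '**' recurses; unmatched globs expand to nothing

zz=$(mktemp -d)              # hypothetical stand-in for the script's zz/ directory
mkdir -p "$zz/a/b"
printf 'str=\ness\n' > "$zz/top.txt"
printf 'str=\ness\n' > "$zz/a/b/deep file.txt"   # name containing a space
printf 'str=\ness\n' > "$zz/a/b/12"              # no .txt extension: left alone

for f in "$zz"/**/*.txt; do          # matches zz/*.txt, zz/a/*.txt, zz/a/b/*.txt, ...
  sed -i ':a;N;$!ba;s/=\n//g' "$f"   # quoted: one argument per file, spaces and all
done
```

With `**/`, globstar also matches zero directories, which is why `top.txt` directly under `zz/` is included.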