Created March 28, 2017 08:13
Download the grid corpus dataset and extract it.
# prepare the directory layout for the download
mkdir "gridcorpus"
cd "gridcorpus"
mkdir "raw" "audio" "video"
cd "raw" && mkdir "audio" "video"

# $1 = first speaker, $2 = last speaker, $3 = "y" to extract after downloading
for i in `seq $1 $2`
do
    printf "\n\n------------------------- Downloading $i th speaker -------------------------\n\n"

    # download the audio and video of the ith speaker
    cd "audio" && curl "http://spandh.dcs.shef.ac.uk/gridcorpus/s$i/audio/s$i.tar" > "s$i.tar" && cd ..
    cd "video" && curl "http://spandh.dcs.shef.ac.uk/gridcorpus/s$i/video/s$i.mpg_vcd.zip" > "s$i.zip" && cd ..

    # compare as a string; (( ... )) would evaluate $3 arithmetically and fail when it is empty
    if [ "$3" = "y" ]
    then
        unzip -q "video/s$i.zip" -d "../video"
        tar -xf "audio/s$i.tar" -C "../audio"
    fi
done
Hi @KarthikMAM, thanks a lot for providing such an easy way to download this dataset.
However, it did not work for me, so I changed the for loop format. Now it works like a charm 👍
for ((i=1; i<=2; i++)); do printf "\n\n------------------------- Downloading $i th speaker -------------------------\n\n"
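The loop above hard-codes speakers 1 and 2. As a rough sketch, the range can still come from the script's arguments without calling `seq`, here with a plain `while` loop that should also run in shells lacking the `for (( ))` form. The helper name `speaker_ids` is hypothetical and not part of the original gist.

```shell
# Hypothetical helper: print every speaker number from $1 to $2, one per line.
speaker_ids() {
    n=$1
    while [ "$n" -le "$2" ]; do
        printf '%s\n' "$n"
        n=$((n + 1))
    done
}

# Usage: iterate over speakers 1..3 without calling seq.
for i in $(speaker_ids 1 3); do
    printf 'Downloading speaker %s\n' "$i"
done
```

In the download script, `speaker_ids "$1" "$2"` would take the place of `seq $1 $2`.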
line 15: ((: == y: syntax error: operand expected (error token is "== y").
How to fix this error?
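A likely cause: `(( ... ))` is arithmetic evaluation, so when the third argument is omitted, `$3` expands to nothing and bash is left with `(( == y ))`, which has no left operand. Even with an argument, words inside `(( ))` are read as variable names, so `n == y` compares two unset variables (0 == 0) and is true. A string test avoids both problems. The sketch below assumes the flag is passed as the third argument, as in the gist; the helper name `should_extract` is hypothetical.

```shell
# Hypothetical helper: succeed only when the extract flag is exactly "y".
should_extract() {
    [ "${1:-}" = "y" ]
}

# A flag of "y" triggers extraction; an empty or missing flag does not.
if should_extract "y"; then echo "extract"; fi
if ! should_extract ""; then echo "skip"; fi
```

In the script itself, this amounts to replacing `if (( $3 == "y" ))` with `if [ "$3" = "y" ]`.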
Just to keep sharing the latest greatest:
http --> https
# prepare the directory layout for the download
mkdir "gridcorpus"
cd "gridcorpus"
mkdir "raw" "audio" "video"
cd "raw" && mkdir "audio" "video"

# $1 = first speaker, $2 = last speaker, $3 = "y" to extract after downloading
for i in `seq $1 $2`
do
    printf "\n\n------------------------- Downloading $i th speaker -------------------------\n\n"

    # download the audio and video of the ith speaker
    cd "audio" && curl "https://spandh.dcs.shef.ac.uk/gridcorpus/s$i/audio/s$i.tar" > "s$i.tar" && cd ..
    cd "video" && curl "https://spandh.dcs.shef.ac.uk/gridcorpus/s$i/video/s$i.mpg_vcd.zip" > "s$i.zip" && cd ..

    # compare as a string; (( ... )) would evaluate $3 arithmetically and fail when it is empty
    if [ "$3" = "y" ]
    then
        unzip -q "video/s$i.zip" -d "../video"
        tar -xf "audio/s$i.tar" -C "../audio"
    fi
done
Sharing for Python Users
Python Script
import os
import zipfile
import subprocess
import tarfile

# Create directories
os.makedirs("gridcorpus/raw/audio", exist_ok=True)
os.makedirs("gridcorpus/raw/video", exist_ok=True)
os.makedirs("gridcorpus/audio", exist_ok=True)
os.makedirs("gridcorpus/video", exist_ok=True)

# Define range of speakers 1 to 34
start_speaker = int(input("Enter the starting speaker number: "))
end_speaker = int(input("Enter the ending speaker number: "))
extract_files = input("Do you want to extract files after downloading? (y/n): ")

for i in range(start_speaker, end_speaker + 1):
    print(f"\n\n------------------------- Downloading {i}th speaker -------------------------\n\n")

    # Download audio and video files
    subprocess.run(["curl", f"https://spandh.dcs.shef.ac.uk/gridcorpus/s{i}/audio/s{i}.tar", "-o", f"gridcorpus/raw/audio/s{i}.tar"])
    subprocess.run(["curl", f"https://spandh.dcs.shef.ac.uk/gridcorpus/s{i}/video/s{i}.mpg_vcd.zip", "-o", f"gridcorpus/raw/video/s{i}.zip"])

    # Extract files if requested
    if extract_files.lower() == "y":
        with zipfile.ZipFile(f"gridcorpus/raw/video/s{i}.zip", 'r') as zip_ref:
            zip_ref.extractall(f"gridcorpus/video/s{i}")
        with tarfile.open(f"gridcorpus/raw/audio/s{i}.tar", 'r') as tar_ref:
            tar_ref.extractall(f"gridcorpus/audio/s{i}")

print("Download completed.")