@KarthikMAM
Created March 28, 2017 08:13
Download the GRID corpus dataset and, optionally, extract it.
# Arguments: $1 = first speaker number, $2 = last speaker number,
#            $3 = "y" to extract the archives after downloading

# prepare the directory layout for the download
mkdir "gridcorpus"
cd "gridcorpus"
mkdir "raw" "audio" "video"
cd "raw" && mkdir "audio" "video"

for i in $(seq "$1" "$2")
do
    printf "\n\n------------------------- Downloading $i th speaker -------------------------\n\n"

    # download the audio and video archives of the ith speaker
    cd "audio" && curl "http://spandh.dcs.shef.ac.uk/gridcorpus/s$i/audio/s$i.tar" > "s$i.tar" && cd ..
    cd "video" && curl "http://spandh.dcs.shef.ac.uk/gridcorpus/s$i/video/s$i.mpg_vcd.zip" > "s$i.zip" && cd ..

    # compare the flag as a string; (( ... )) only evaluates arithmetic
    if [ "$3" = "y" ]
    then
        unzip -q "video/s$i.zip" -d "../video"
        tar -xf "audio/s$i.tar" -C "../audio"
    fi
done
@marziehoghbaie

marziehoghbaie commented May 10, 2021

Hi @KarthikMAM, thanks a lot for providing such an easy way to download this dataset.
However, it did not work for me, so I changed the format of the for loop. Now it works like a charm 👍
for ((i=1; i<=2; i++)); do printf "\n\n------------------------- Downloading $i th speaker -------------------------\n\n"
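
The comment does not say why the original loop failed. One possible cause is running the script without the two speaker-number arguments, which leaves `seq` with no operands. The loop above avoids that by hard-coding speakers 1 to 2. A minimal sketch that keeps the C-style loop but still accepts a range could look like this. The helper name `speaker_range` and the defaults of 1 and 2 are assumptions for illustration, not part of the gist:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the speaker numbers from $1 to $2, one per line.
# The defaults (1 and 2) are an assumption that mirrors the hard-coded loop above.
speaker_range() {
    local start="${1:-1}" end="${2:-2}" i
    # C-style arithmetic loop (bash-specific), no dependency on seq
    for ((i = start; i <= end; i++)); do
        echo "$i"
    done
}

# Iterate over the range given on the command line, or 1..2 if none is given
for i in $(speaker_range "$@"); do
    printf "Downloading speaker %s\n" "$i"
done
```

The download and extraction commands from the gist would go inside the final loop in place of the `printf` line.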

@649459021

Hi @KarthikMAM, thanks a lot for providing such an easy way to download this dataset.
However, it did not work for me, so I changed the format of the for loop. Now it works like a charm 👍
for ((i=1; i<=2; i++)); do printf "\n\n------------------------- Downloading $i th speaker -------------------------\n\n"

With this change I get the following error:
line 15: ((: == y: syntax error: operand expected (error token is "== y")
How can I fix it?
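
A likely explanation, offered as a sketch rather than a confirmed diagnosis: `(( ... ))` is the shell's arithmetic context, so `(( $3 == "y" ))` does not compare strings. When the script runs without a third argument, `$3` expands to nothing and the shell is left with `(( == "y" ))`, which matches the "operand expected" message. A plain string test avoids the problem. The helper name `should_extract` below is hypothetical:

```shell
# Hypothetical helper: succeeds only when its first argument is exactly "y".
should_extract() {
    # [ ... = ... ] compares strings; ${1:-} expands to an empty string if unset
    [ "${1:-}" = "y" ]
}

# $3 is the script's third argument, as in the gist; it may be missing
if should_extract "${3:-}"; then
    echo "extracting"
else
    echo "skipping extraction"
fi
```

In the gist itself, the equivalent change is to replace `if (( $3 == "y" ))` with `if [ "$3" = "y" ]`.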

@Zikovich

Sharing an updated version: the download URLs now use https instead of http.

# Arguments: $1 = first speaker number, $2 = last speaker number,
#            $3 = "y" to extract the archives after downloading

# prepare the directory layout for the download
mkdir "gridcorpus"
cd "gridcorpus"
mkdir "raw" "audio" "video"
cd "raw" && mkdir "audio" "video"

for i in $(seq "$1" "$2")
do
    printf "\n\n------------------------- Downloading $i th speaker -------------------------\n\n"

    # download the audio and video archives of the ith speaker
    cd "audio" && curl "https://spandh.dcs.shef.ac.uk/gridcorpus/s$i/audio/s$i.tar" > "s$i.tar" && cd ..
    cd "video" && curl "https://spandh.dcs.shef.ac.uk/gridcorpus/s$i/video/s$i.mpg_vcd.zip" > "s$i.zip" && cd ..

    # compare the flag as a string; (( ... )) only evaluates arithmetic
    if [ "$3" = "y" ]
    then
        unzip -q "video/s$i.zip" -d "../video"
        tar -xf "audio/s$i.tar" -C "../audio"
    fi
done

@sthasmn

sthasmn commented May 9, 2024

Sharing for Python Users

Python Script

import os
import zipfile
import subprocess
import tarfile

# Create directories
os.makedirs("gridcorpus/raw/audio", exist_ok=True)
os.makedirs("gridcorpus/raw/video", exist_ok=True)
os.makedirs("gridcorpus/audio", exist_ok=True)
os.makedirs("gridcorpus/video", exist_ok=True)

# Ask for the speaker range (speakers are numbered 1 to 34) and whether to extract
start_speaker = int(input("Enter the starting speaker number: "))
end_speaker = int(input("Enter the ending speaker number: "))
extract_files = input("Do you want to extract files after downloading? (y/n): ")

for i in range(start_speaker, end_speaker + 1):
    print(f"\n\n------------------------- Downloading {i}th speaker -------------------------\n\n")

    # Download audio and video files
    subprocess.run(["curl", f"https://spandh.dcs.shef.ac.uk/gridcorpus/s{i}/audio/s{i}.tar", "-o", f"gridcorpus/raw/audio/s{i}.tar"])
    subprocess.run(["curl", f"https://spandh.dcs.shef.ac.uk/gridcorpus/s{i}/video/s{i}.mpg_vcd.zip", "-o", f"gridcorpus/raw/video/s{i}.zip"])

    # Extract files if requested
    if extract_files.lower() == "y":
        with zipfile.ZipFile(f"gridcorpus/raw/video/s{i}.zip", 'r') as zip_ref:
            zip_ref.extractall(f"gridcorpus/video/s{i}")
        with tarfile.open(f"gridcorpus/raw/audio/s{i}.tar", 'r') as tar_ref:
            tar_ref.extractall(f"gridcorpus/audio/s{i}")

print("Download completed.")
