curl is used to download data from HTTP(S) sites and FTP servers
Run man curl to check that curl is installed and to read the manual.
Run curl --help to list the available options.
Use -O to save the file under its original name: curl -O <url>. Use -o to save it under a new name: curl -o newfilename.txt <url>.
curl does not expand a * wildcard in a URL the way a shell expands file names, so curl -O <url.com/datafilename*.txt> will not fetch every matching file.
Use curl's globbing parser for numbered files instead: curl -O https://website.com/filename[001-100].txt. To download every 10th file, add a step to the range: curl -O https://website.com/filename[001-100:10].txt.
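To see which files a stepped range such as filename[001-100:10].txt covers, the expansion can be previewed locally before any download. This is a minimal sketch, assuming seq is available; preview_range is a hypothetical helper and website.com is the placeholder host used in these notes.

```shell
# Print the URLs that a stepped range [start-end:step] would cover.
# Mirrors filename[001-100:10].txt from the notes; website.com is a placeholder.
preview_range() {
  start="$1"; end="$2"; step="$3"
  for n in $(seq "$start" "$step" "$end"); do
    # %03d zero-pads to three digits, matching the 001-style numbering
    printf 'https://website.com/filename%03d.txt\n' "$n"
  done
}

preview_range 1 100 10
```

The call prints ten URLs, starting at filename001.txt and ending at filename091.txt, which shows that the step selects every 10th file rather than only the 10th one.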
Preemptive Troubleshooting
curl has two particularly useful option flags in case of timeouts during download:
-L follows the redirect when the server responds with a 3xx status code
-C resumes a previous file transfer that timed out before completion; pass - as its argument (-C -) so that curl works out automatically where to resume.
Putting everything together: curl -L -O -C - https://website.com/filename[001-100].txt. The - must directly follow -C because it is that flag's argument.
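The two flags combine naturally with a retry loop, so that a timed-out transfer is picked up again where it stopped. The following is a sketch only: fetch_with_resume, the CURL override variable, the retry count and the 2-second pause are all assumptions made for illustration, not part of curl.

```shell
# Retry a resumable download a few times before giving up.
# CURL is a hypothetical override: set CURL=echo to print the command instead of running it.
CURL="${CURL:-curl}"

fetch_with_resume() {
  url="$1"
  max_tries="${2:-3}"
  try=1
  while [ "$try" -le "$max_tries" ]; do
    # -L follows redirects, -O keeps the remote name, -C - resumes a partial file
    if "$CURL" -L -O -C - "$url"; then
      return 0
    fi
    try=$((try + 1))
    sleep 2
  done
  return 1
}

# Example call (placeholder URL from the notes):
# fetch_with_resume "https://website.com/filename001.txt"
```

Because -C - inspects the partially downloaded file on disk, each retry continues from the bytes already saved instead of starting over.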
wget
The name derives from World Wide Web and get
Native to Linux but compatible with all operating systems
Used to download data from HTTP(S) and FTP
Better than curl at downloading multiple files recursively
Check installation with which wget. Check manual with wget --help.
Option flags unique to wget:
-b go to background immediately after startup
-q turn off output
-c resume broken download (continue getting a partially downloaded file)
wget -bqc https://website.com/file.txt
Run cat wget-log to check the download status and to see whether any file went amiss.
Advanced Downloading With wget
Store a list of URLs in a file such as url_list.txt, then download them all with the -i flag. Put any other flags before -i, because the file name must directly follow it.
Limit the download rate with --limit-rate.
wget --limit-rate={rate}k {file_location}
wget --limit-rate=200k -i url_list.txt
For many small files, set a mandatory pause (in seconds) between downloads with --wait.
wget --wait=2.5 -i url_list.txt
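The list-based workflow can be sketched end to end as follows. The three URLs are placeholders that will not resolve to real files, and RUN_DOWNLOAD is a hypothetical guard variable added here so the script can be tried without network access.

```shell
# Build url_list.txt with one URL per line (placeholder URLs).
printf '%s\n' \
  "https://website.com/file1.txt" \
  "https://website.com/file2.txt" \
  "https://website.com/file3.txt" > url_list.txt

# Options go before -i; the list file name follows -i directly.
# RUN_DOWNLOAD is a hypothetical guard: set it to 1 to actually call wget.
if [ "${RUN_DOWNLOAD:-0}" = "1" ]; then
  wget --limit-rate=200k --wait=2.5 -i url_list.txt
else
  echo "Would download $(wc -l < url_list.txt | tr -d ' ') files from url_list.txt"
fi
```

Combining --limit-rate and --wait in one call caps the bandwidth of each transfer and also spaces the requests out, which is gentler on the server when the list is long.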
curl VS wget
curl advantages:
Can be used for downloading and uploading files over 20+ protocols
Easier to install
wget advantages:
Has many built-in functionalities for handling multiple file downloads
Can handle various file formats for download (file directory, HTML pages)
csvkit
Is a suite of command-line tools
Is developed in Python by Wireservice
Offers data processing and cleaning capabilities on CSV files
Has data capabilities that rival Python, R, and SQL
Install with pip install csvkit or upgrade with pip install --upgrade csvkit.
The package has no man page; its documentation is web-based (HTML) instead.
The tools only process CSV files. csvkit is the name of the suite rather than a command itself; use the commands below to convert files to CSV and then process and analyze them.
in2csv
This tool is part of the suite and has built-in help text: in2csv --help or in2csv -h.
in2csv SpotifyData.xlsx > SpotifyData.csv
Without the redirect (>), in2csv SpotifyData.xlsx only prints the first sheet to the console instead of saving it to a file.
Use --names or -n to print all sheet names.
Use the --sheet option followed by a sheet name to convert that specific sheet.
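The sheet options can be chained to export every sheet of a workbook in one pass. This is a minimal sketch, assuming csvkit is installed and the workbook has an .xlsx extension; convert_all_sheets and the <workbook>_<sheet>.csv naming scheme are hypothetical choices made for illustration.

```shell
# Convert every sheet of an Excel workbook to its own CSV file.
# Hypothetical helper; output files are named <workbook>_<sheet>.csv.
convert_all_sheets() {
  workbook="$1"
  base="${workbook%.xlsx}"
  # in2csv -n prints one sheet name per line
  in2csv -n "$workbook" | while IFS= read -r sheet; do
    # --sheet selects the sheet; > saves the CSV instead of printing it
    in2csv --sheet "$sheet" "$workbook" < /dev/null > "${base}_${sheet}.csv"
  done
}

# Example call, using the workbook name from the notes:
# convert_all_sheets SpotifyData.xlsx
```

Reading the names with IFS= read -r keeps sheet names that contain spaces intact, and the < /dev/null redirect stops the inner command from consuming the list of names.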