@PomoML
Created November 7, 2018 05:30
download_images file-type problem
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`download_images` saves every image with a .jpg extension instead of using the file type indicated by the URL.\n",
"\n",
"In this example, the second and fourth URLs point to PNG files, but when downloaded they are given the incorrect extension .jpg.\n",
"\n",
"This has probably gone unnoticed because many image viewers detect the file type from the file's contents rather than its extension. On Ubuntu, however, the default Image Viewer will not open an image whose extension does not match its actual format.\n",
"\n",
"This causes unnecessary hassle when cleaning up the training data. It is relevant to the fast.ai v3 Lesson 2-download notebook."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"from fastai import *\n",
"from fastai.vision import *"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"path = Path.cwd()\n",
"urlsource = path/'urls_mixed.txt'\n",
"dest = path/'mixedImages'  # folder that receives the downloaded images\n",
"dest.mkdir(parents=True, exist_ok=True)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
" <div>\n",
" <style>\n",
" \t/* Turns off some styling */\n",
" \tprogress {\n",
"\n",
" \t/* gets rid of default border in Firefox and Opera. */\n",
" \tborder: none;\n",
"\n",
" \t/* Needs to be in here for Safari polyfill so background images work as expected. */\n",
" \tbackground-size: auto;\n",
" }\n",
"\n",
" .progress-bar-interrupted, .progress-bar-interrupted::-webkit-progress-bar {\n",
" background: #F44336;\n",
" }\n",
" </style>\n",
" <progress value='4' class='' max='4', style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
" 100.00% [4/4 00:00<00:00]\n",
" </div>\n",
" "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"download_images(urlsource, dest, max_pics=500)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
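One way to honor the file type given in the URL is to take the extension from the URL's path and fall back to .jpg only when none is recognized. The sketch below assumes that approach; it is not fastai's code. `KNOWN_EXTS`, `ext_from_url` and `dest_name` are hypothetical names, and the zero-padded naming scheme is an assumption chosen for illustration.

```python
import os
from urllib.parse import urlparse

# Hypothetical allow-list of extensions we trust when they appear in a URL.
KNOWN_EXTS = {'.jpg', '.jpeg', '.png', '.gif', '.bmp', '.webp'}

def ext_from_url(url, default='.jpg'):
    """Return the lower-cased extension of the URL's path, or `default`."""
    path = urlparse(url).path               # query string and fragment are dropped
    ext = os.path.splitext(path)[1].lower()
    return ext if ext in KNOWN_EXTS else default

def dest_name(i, url):
    """Hypothetical helper: zero-padded file name that keeps the URL's extension."""
    return f"{i:08d}{ext_from_url(url)}"

urls = [
    "https://as1.ftcdn.net/jpg/00/98/16/38/500_F_98163853_8GIQCcwAe5fy7WtoMRnzL7gxKYk3kcsj.jpg",
    "https://d29p2nwx3pv59i.cloudfront.net/images-general/5-D-maj-photo-300x233.png",
]
for i, u in enumerate(urls):
    print(dest_name(i, u))   # prints 00000000.jpg, then 00000001.png
```

Parsing the URL first means a query string such as `?w=300` does not end up in the extension. The trade-off is that a server can still return a format that disagrees with the URL, so this only helps when the URL is truthful.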
https://as1.ftcdn.net/jpg/00/98/16/38/500_F_98163853_8GIQCcwAe5fy7WtoMRnzL7gxKYk3kcsj.jpg
https://d29p2nwx3pv59i.cloudfront.net/images-general/5-D-maj-photo-300x233.png
https://i.pinimg.com/originals/dd/92/21/dd9221214b184b7ddc704114d23b5e20.jpg
http://www.guitarresources.net/wp-content/uploads/2014/10/D-Major-chord.png
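For images that have already been downloaded with the wrong extension, a cleanup pass can read each file's leading "magic" bytes and rename files whose extension disagrees with their content. This is a minimal sketch under the assumption that the folder holds only images. The signature table covers just PNG, JPEG, GIF and WebP, and `sniff_ext` and `fix_extensions` are hypothetical helpers, not part of fastai.

```python
from pathlib import Path

# Leading bytes for a few common image formats (hand-picked, not exhaustive).
SIGNATURES = [
    (b'\x89PNG\r\n\x1a\n', '.png'),
    (b'\xff\xd8\xff', '.jpg'),
    (b'GIF87a', '.gif'),
    (b'GIF89a', '.gif'),
]

def sniff_ext(data):
    """Guess an extension from a file's first bytes; None if unrecognized."""
    for magic, ext in SIGNATURES:
        if data.startswith(magic):
            return ext
    if data[:4] == b'RIFF' and data[8:12] == b'WEBP':
        return '.webp'
    return None

def fix_extensions(folder):
    """Rename files whose extension disagrees with their content.

    Returns a list of (old_name, new_name) pairs for the files renamed.
    """
    renamed = []
    for f in sorted(Path(folder).iterdir()):   # snapshot taken before renaming
        if not f.is_file():
            continue
        with open(f, 'rb') as fh:
            ext = sniff_ext(fh.read(16))
        suffix = f.suffix.lower()
        current = {'.jpeg': '.jpg'}.get(suffix, suffix)  # treat .jpeg as .jpg
        if ext is None or ext == current:
            continue                             # unknown format or already correct
        target = f.with_suffix(ext)
        if target.exists():                      # never overwrite an existing file
            continue
        f.rename(target)
        renamed.append((f.name, target.name))
    return renamed
```

Run against the notebook's download folder (for example `fix_extensions(dest)`), this would rename the two PNGs saved as .jpg so that a viewer that trusts extensions can open them.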