grab.py retrieves the URLs of either the "Intermediate Files" or the "Original Files" and appends them to a CSV file.
The script can resume where it left off after an interruption or error. As a precaution, it rewrites production-server URLs so that requests go to the staging server instead.
It also handles timeouts gracefully with a try/except-and-retry approach.
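The URL rewriting and the retry-on-timeout behavior described above can be sketched roughly as follows. This is not the code from grab.py, only a minimal illustration: the host names are taken from the sample data further down, while `to_staging`, `with_retries`, and the `attempts`/`delay` parameters are hypothetical names chosen for this example. It catches Python's built-in `TimeoutError`; a Selenium-based fetch would more likely need to catch `selenium.common.exceptions.TimeoutException` instead.

```python
import time
from urllib.parse import urlsplit, urlunsplit

# Hosts as they appear in the sample input and output rows of this README.
PRODUCTION_HOST = "digital.library.jhu.edu"
STAGING_HOST = "stage.digital.library.jhu.edu"


def to_staging(url):
    """Return url with the production host swapped for the staging host."""
    parts = urlsplit(url)
    if parts.netloc == PRODUCTION_HOST:
        parts = parts._replace(netloc=STAGING_HOST)
    return urlunsplit(parts)


def with_retries(fetch, url, attempts=3, delay=2.0):
    """Call fetch(url); on TimeoutError wait `delay` seconds and try again.

    The exception is re-raised once `attempts` calls have all timed out.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except TimeoutError:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

A caller would combine the two, for example `with_retries(fetch_page, to_staging(row["url"]))`, where `fetch_page` stands in for whatever function actually loads the page.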
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install selenium chromedriver_autoinstaller
python3 -m pip install pyppeteer
# Possibly also needed (unconfirmed):
# sudo apt-get install chromium-browser
Place an input.csv file next to grab.py.
The input CSV needs a column named "url" and must not contain a "file" column. For example:
id,url,uuid,title,field_collection_number,field_date_available,field_date_published,field_finding_aid,field_item_barcode,field_library_catalog_link,field_oclc_number,field_unique_id,field_abstract,field_access_rights,field_access_terms,field_contributor,field_copyright_and_use,field_description,field_digital_publisher,field_genre,field_model,field_publisher,field_resource_type,field_subject,field_alternative_title,field_featured_item,field_dspace_item_id,field_digital_identifier,field_creator,field_publisher_country,field_spatial_coverage,field_date_copyrighted
1,https://digital.library.jhu.edu/node/11943,d3ba594a-a226-4bab-9185-72fe54cd7328,Is X-ray harmful?,COLL-0008,2021-06-24,1957-01-20,https://aspace.library.jhu.edu/repositories/3/resources/1109,mq2415107mmmmm,https://catalyst.library.jhu.edu/catalog/bib_2415107,54860034,83aeefed-fbc8-4772-b2a8-a9a3865ab6e2,"Lynn Poole discusses x-rays for treatment and diagnosis of disease and displays a recent report from the National Academy or Sciences and National Research Council on the biological effects of radiation. Dr. Russell Morgan, Director of Radiology Dept. at Johns Hopkins University, fields questions from members of the press: Nate Hazeltine, a ""Washington Post"" science writer; Pare Lorentz, a film producer; and Earl Ubell, a reporter and science editor with the ""New York Herald Tribune"". Dr. Morgan explains that x-rays affect both individual cells and the whole body, making them more susceptible to premature aging. He discusses the research by John Lawrence on the effects of radiation on mice and their extrapolation to man. He also notes a study on radiation vs. non-radiation workers that showed no difference in life spans of the two groups. It is the amount of radiation exposure that determines the effects of the damage. For example, a chest x-ray only delivers about 1/20th roentgen, a unit of radiation. However, Dr. Morgan discusses the feasibility of a reporting system for patients' total x-ray exposure and the need for a set of standards. And he does admit that the complexity and amount of radiation exposure is increasing in diagnostic studies and could double by 1960-65. A film clip demonstrates that this radiation exposure can be reduced by filtration, distance from the x-ray machine, length of time of exposure, and protection of areas not being radiated. Mr. Poole points out that Dr. Morgan has developed a fluoroscopy machine reducing by up to ten times the radiation time. In conclusion, Dr. Morgan discusses whether the Atomic Energy Commission or the U. 
S. Public Health Services should be responsible for the public's radiation health problems.",Public digital access,Science Review,"relators:brd:ABC Television Network||relators:drt:Calfee, Kennard||relators:nrt:Chaseman, Joel||relators:prd:Hazeltine, Nate||relators:prd:Lorentz, Pare||relators:prd:Morgan, Russell H. (Russell Hedley), 1911-1986||relators:prd:Poole, Lynn||relators:prd:Ubell, Earl||relators:pro:Geier, Leo, 1926-2017||relators:pro:Poole, Lynn||relators:pro:WAAM (Television station : Baltimore, Md.)||relators:aus:Comte, Gilbert",Copyright Not Evaluated,"Originally broadcast as a segment of the television program Johns Hopkins File 7 on January 20, 1957 from the studios of WAAM in Baltimore, Md. Black and white. Lynn Poole, Leo Geier, producers; Kennard Calfee, director; Gilbert Comte, writer; Joel Chaseman, narrator; produced by WAAM television station in Baltimore, Md. for the ABC Television Network. Lynn Poole, Russell H. Morgan, Nate Hazeltine, Pare Lorentz, Earl Ubell, presenters. Digitized in 2004.",Johns Hopkins University. Sheridan Libraries,Educational television programs,Video,Johns Hopkins University,Moving Image,X-rays--Physiological effect||X-rays,,,,,,,,,
2,https://digital.library.jhu.edu/node/11944,9d4a0ed0-3b08-450f-ad7a-32ab30dc104a,Seeing in the dark,COLL-0008,2021-06-24,1957-01-13,https://aspace.library.jhu.edu/repositories/3/resources/1109,mq2415085mmmmm,https://catalyst.library.jhu.edu/catalog/bib_2415085,54859985,793edfd5-d2f7-4604-8742-bc560016cb1f,"Lynn Poole tells how the tenth century Islamic scholar Alhazan described the workings of the camera obscura. Later, Frenchman Niepce discovered an emulsion that could retain a photographic image. Dr. Walter Driscoll, director of research at Baird-Atomic Inc., then shows a chart of the electromagnetic spectrum and notes that while x-rays yield only shadowy pictures and radar waves detect but don't create pictures, germanium and silicon filters block radiated energy and allow infrared light to pass through to form an image. Dr. Driscoll displays a scanning bolometer, which can see in the dark, but the shapes it creates need to be interpreted. He also shows a snooperscope and a film clip of a sniperscope with infrared scope. Previous research on infrared or thermal detection was done by Sir John Frederick William Herschel. Potter Trainer demonstrates and explains the Evaporagraph (EVA), which is based on the principle that all things radiate heat as infrared rays, and shows some of the actual pictures made from heat rather than light. Dr. Walter Baird describes applications of EVA to industry, such as detecting problem-causing hot spots in electronic equipment or indicating heat escape or insulation deficiency in a building. EVA's resolution is 10 lines/mm at best, and it shows temperature contrast of .2 degree. The machine's weakness is the slow speed of response to small temperature differences and the inability to obtain the temperature scale of the item viewed. Nonetheless, Mr. 
Poole says EVA could play a vital role in civil defense and medicine.",Public digital access,Science Review,"relators:brd:ABC Television Network||relators:drt:Calfee, Kennard||relators:nrt:Chaseman, Joel||relators:prd:Baird, Walter S.||relators:prd:Driscoll, Walter G.||relators:prd:Poole, Lynn||relators:prd:Trainer, Potter.||relators:pro:Geier, Leo, 1926-2017||relators:pro:Poole, Lynn||relators:pro:WAAM (Television station : Baltimore, Md.)||relators:aus:Comte, Gilbert",Copyright Not Evaluated,"Originally broadcast as a segment of the television program Johns Hopkins File 7 on January 13, 1957 from the studios of WAAM in Baltimore, Md. Black and white. Lynn Poole, Leo Geier, producers; Kennard Calfee, director; Gilbert Comte, writer; Joel Chaseman, narrator; produced by WAAM television station in Baltimore, Md. for the ABC Television Network. Lynn Poole, Walter S. Baird, Walter G. Driscoll, Potter Trainer, presenters. Digitized in 2004.",Johns Hopkins University. Sheridan Libraries,Educational television programs,Video,Johns Hopkins University,Moving Image,Infrared radiation||Camera obscuras,,,,,,,,,
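Before a long run it may be worth checking that the input meets the two header requirements stated above. The following is a small sketch, not part of grab.py; `check_header` is a hypothetical helper name. It uses the standard `csv` module, which also copes with quoted fields that span several lines, as the abstract field does in the sample rows.

```python
import csv
import io


def check_header(csv_text):
    """Return a list of header problems; an empty list means the input is OK."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    columns = reader.fieldnames or []
    problems = []
    if "url" not in columns:
        problems.append('missing required "url" column')
    if "file" in columns:
        problems.append('unexpected "file" column')
    return problems
```

For a file on disk, the text can be read with `open("input.csv", newline="", encoding="utf-8").read()` and passed to the function.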
id,url,uuid,title,field_collection_number,field_date_available,field_date_published,field_finding_aid,field_item_barcode,field_library_catalog_link,field_oclc_number,field_unique_id,field_abstract,field_access_rights,field_access_terms,field_contributor,field_copyright_and_use,field_description,field_digital_publisher,field_genre,field_model,field_publisher,field_resource_type,field_subject,field_alternative_title,field_featured_item,field_dspace_item_id,field_digital_identifier,field_creator,field_publisher_country,field_spatial_coverage,field_date_copyrighted,field_date_created
1,https://digital.library.jhu.edu/node/11943,d3ba594a-a226-4bab-9185-72fe54cd7328,Is X-ray harmful?,COLL-0008,2021-06-24,1957-01-20,https://aspace.library.jhu.edu/repositories/3/resources/1109,mq2415107mmmmm,https://catalyst.library.jhu.edu/catalog/bib_2415107,54860034,83aeefed-fbc8-4772-b2a8-a9a3865ab6e2,"Lynn Poole discusses x-rays for treatment and diagnosis of disease and displays a recent report from the National Academy or Sciences and National Research Council on the biological effects of radiation. Dr. Russell Morgan, Director of Radiology Dept. at Johns Hopkins University, fields questions from members of the press: Nate Hazeltine, a ""Washington Post"" science writer; Pare Lorentz, a film producer; and Earl Ubell, a reporter and science editor with the ""New York Herald Tribune"". Dr. Morgan explains that x-rays affect both individual cells and the whole body, making them more susceptible to premature aging. He discusses the research by John Lawrence on the effects of radiation on mice and their extrapolation to man. He also notes a study on radiation vs. non-radiation workers that showed no difference in life spans of the two groups. It is the amount of radiation exposure that determines the effects of the damage. For example, a chest x-ray only delivers about 1/20th roentgen, a unit of radiation. However, Dr. Morgan discusses the feasibility of a reporting system for patients' total x-ray exposure and the need for a set of standards. And he does admit that the complexity and amount of radiation exposure is increasing in diagnostic studies and could double by 1960-65. A film clip demonstrates that this radiation exposure can be reduced by filtration, distance from the x-ray machine, length of time of exposure, and protection of areas not being radiated. Mr. Poole points out that Dr. Morgan has developed a fluoroscopy machine reducing by up to ten times the radiation time. In conclusion, Dr. Morgan discusses whether the Atomic Energy Commission or the U. 
S. Public Health Services should be responsible for the public's radiation health problems.",Public digital access,Science Review,"relators:brd:ABC Television Network||relators:drt:Calfee, Kennard||relators:nrt:Chaseman, Joel||relators:prd:Hazeltine, Nate||relators:prd:Lorentz, Pare||relators:prd:Morgan, Russell H. (Russell Hedley), 1911-1986||relators:prd:Poole, Lynn||relators:prd:Ubell, Earl||relators:pro:Geier, Leo, 1926-2017||relators:pro:Poole, Lynn||relators:pro:WAAM (Television station : Baltimore, Md.)||relators:aus:Comte, Gilbert",Copyright Not Evaluated,"Originally broadcast as a segment of the television program Johns Hopkins File 7 on January 20, 1957 from the studios of WAAM in Baltimore, Md. Black and white. Lynn Poole, Leo Geier, producers; Kennard Calfee, director; Gilbert Comte, writer; Joel Chaseman, narrator; produced by WAAM television station in Baltimore, Md. for the ABC Television Network. Lynn Poole, Russell H. Morgan, Nate Hazeltine, Pare Lorentz, Earl Ubell, presenters. Digitized in 2004.",Johns Hopkins University. Sheridan Libraries,Educational television programs,Video,Johns Hopkins University,Moving Image,X-rays--Physiological effect||X-rays,,,,,,,,,,https://stage.digital.library.jhu.edu/system/files/2022-03-17/jhu_coll-0008_A6024.mp4
2,https://digital.library.jhu.edu/node/11944,9d4a0ed0-3b08-450f-ad7a-32ab30dc104a,Seeing in the dark,COLL-0008,2021-06-24,1957-01-13,https://aspace.library.jhu.edu/repositories/3/resources/1109,mq2415085mmmmm,https://catalyst.library.jhu.edu/catalog/bib_2415085,54859985,793edfd5-d2f7-4604-8742-bc560016cb1f,"Lynn Poole tells how the tenth century Islamic scholar Alhazan described the workings of the camera obscura. Later, Frenchman Niepce discovered an emulsion that could retain a photographic image. Dr. Walter Driscoll, director of research at Baird-Atomic Inc., then shows a chart of the electromagnetic spectrum and notes that while x-rays yield only shadowy pictures and radar waves detect but don't create pictures, germanium and silicon filters block radiated energy and allow infrared light to pass through to form an image. Dr. Driscoll displays a scanning bolometer, which can see in the dark, but the shapes it creates need to be interpreted. He also shows a snooperscope and a film clip of a sniperscope with infrared scope. Previous research on infrared or thermal detection was done by Sir John Frederick William Herschel. Potter Trainer demonstrates and explains the Evaporagraph (EVA), which is based on the principle that all things radiate heat as infrared rays, and shows some of the actual pictures made from heat rather than light. Dr. Walter Baird describes applications of EVA to industry, such as detecting problem-causing hot spots in electronic equipment or indicating heat escape or insulation deficiency in a building. EVA's resolution is 10 lines/mm at best, and it shows temperature contrast of .2 degree. The machine's weakness is the slow speed of response to small temperature differences and the inability to obtain the temperature scale of the item viewed. Nonetheless, Mr. 
Poole says EVA could play a vital role in civil defense and medicine.",Public digital access,Science Review,"relators:brd:ABC Television Network||relators:drt:Calfee, Kennard||relators:nrt:Chaseman, Joel||relators:prd:Baird, Walter S.||relators:prd:Driscoll, Walter G.||relators:prd:Poole, Lynn||relators:prd:Trainer, Potter.||relators:pro:Geier, Leo, 1926-2017||relators:pro:Poole, Lynn||relators:pro:WAAM (Television station : Baltimore, Md.)||relators:aus:Comte, Gilbert",Copyright Not Evaluated,"Originally broadcast as a segment of the television program Johns Hopkins File 7 on January 13, 1957 from the studios of WAAM in Baltimore, Md. Black and white. Lynn Poole, Leo Geier, producers; Kennard Calfee, director; Gilbert Comte, writer; Joel Chaseman, narrator; produced by WAAM television station in Baltimore, Md. for the ABC Television Network. Lynn Poole, Walter S. Baird, Walter G. Driscoll, Potter Trainer, presenters. Digitized in 2004.",Johns Hopkins University. Sheridan Libraries,Educational television programs,Video,Johns Hopkins University,Moving Image,Infrared radiation||Camera obscuras,,,,,,,,,,https://stage.digital.library.jhu.edu/system/files/2022-03-17/jhu_coll-0008_A6014.mp4
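One way the resume feature could work is to compare the input against what has already been written: rows whose "url" value is present in the output are skipped on the next run. How grab.py actually tracks progress is not documented here, so treat this as an assumption; `pending_rows` and the file paths passed to it are hypothetical.

```python
import csv
import os


def pending_rows(input_path, output_path):
    """Yield input rows whose "url" does not yet appear in the output CSV."""
    done = set()
    if os.path.exists(output_path):
        # Collect the URLs that a previous, interrupted run already handled.
        with open(output_path, newline="", encoding="utf-8") as f:
            done = {row["url"] for row in csv.DictReader(f)}
    with open(input_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["url"] not in done:
                yield row
```

Counting the rows this generator yields would also give a "left to complete" figure of the kind shown in the progress output.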
./grab.py
. . .
Processing https://stage.digital.library.jhu.edu/node/11855...5663 left to complete.