@dannguyen
Last active February 15, 2022 04:26
Bash script to collect New York Times-authored stories from their ArticleSearch API V2

Counting front-page NYT bylines with Bash and jq

A quick Bash script that uses jq, the command-line JSON parser, to query the New York Times Article Search API v2 and count up the bylines. You can count whatever you want, obviously, but a student was interested in replicating the gender calculation found at Who Writes for the New York Times?

The NYT Article Search API

The first thing you have to do is sign up and register as a developer. API keys are assigned per API, so make sure you specify the Article Search API.

Even before you register, you can use the NYT's handy API Console to interactively test your queries: http://developer.nytimes.com/io-docs

The Article Search API is pretty flexible; you can call it with no parameters except for your api-key and it will return (presumably) a list of articles, in reverse chronological order, starting from Sept. 18, 1851. However, it only returns 10 articles per request. And it won't let you paginate beyond a page parameter of 100 (i.e. you can't go to page 100000 to retrieve the 1,000,000th oldest Times article). To put it another way, at 10 articles per page you can only paginate through roughly 1,000 results per query, so you'll have to facet your search.
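
To make the page arithmetic concrete, here is a minimal Python sketch, assuming 10 results per page and a highest accepted page value of 100 as described above. The function name and its defaults are made up for illustration and are not part of the API.

```python
import math

def pages_needed(hits, per_page=10, max_page=100):
    """Return (last_page, truncated) for a query with `hits` results.

    Pages are numbered from 0. `max_page` is the highest page value
    assumed to be accepted (an assumption taken from the text above).
    """
    if hits <= 0:
        return 0, False
    last_page = math.ceil(hits / per_page) - 1
    truncated = last_page > max_page
    return min(last_page, max_page), truncated

print(pages_needed(237))    # roughly a single day's worth of hits
print(pages_needed(50000))  # a query too broad to page through fully
```

If the second value comes back True, the query is too broad and should be narrowed, for example by date.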

In the shell script attached to this gist, I set a few parameters to limit the number of possible results (the API refers to them as "hits"):

  • begin_date - specifies the oldest day to include in the search, and takes in a date formatted as YYYYMMDD, e.g. 20150316
  • end_date - specifies the most recent day to include in the search. If set equal to begin_date, you can effectively limit your search to that day
  • fq - This parameter lets you filter results with Lucene syntax. I don't know what that is so I just Googled around and copied from this IPython notebook. The API returns objects with a source field. I just want articles from the New York Times (as opposed to wire news from Reuters and Associated Press), so I use this key-value parameter in the API call: fq=source.contains:("New York")
  • page - Increment this parameter (starting at 0 for results 0 to 9) to paginate the results.
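
As a rough sketch of how these four parameters fit together, the Python below builds a request URL for a single day. The endpoint is the one used in the shell script attached to this gist; the function name `build_url` is hypothetical, and "YOURKEY" is a placeholder.

```python
from urllib.parse import urlencode

ENDPOINT = "http://api.nytimes.com/svc/search/v2/articlesearch.json"

def build_url(api_key, day, page=0):
    """Build an Article Search URL for a single day (a YYYYMMDD string)."""
    params = {
        "fq": 'source.contains:("New York")',  # Lucene-style filter from the text
        "begin_date": day,
        "end_date": day,   # same as begin_date, so only one day is searched
        "page": page,      # 0 returns results 0 to 9
        "api-key": api_key,
    }
    # urlencode percent-escapes the colon, parentheses, and quotes in fq
    return ENDPOINT + "?" + urlencode(params)

print(build_url("YOURKEY", "20150316"))
```

Letting `urlencode` handle the escaping avoids hand-writing sequences like `%3A%28%22`, which is what the shell script does.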

For any given day, there seem to be around 200 to 300 NYT-bylined articles. Here's the first page (page=0) of results for Mar. 15, 2015.

In my sample shell script, each day is downloaded into a subdirectory such as ./data-hold/20150316.

To get the bylines – last name, first name, and listed position (i.e. rank) – for a given day, after the data's been downloaded, you can use jq.

The following jq query uses the select function to filter the list to front page bylines:

cat data-hold/20150316/*.json | \
  jq -r '.response .docs[] | 
  select(.print_page == "1") .byline .person[] | 
  [.lastname, .firstname, .rank] | @csv'

– which results in this

"ELIGON","John",1
"BRANTLEY","Ben",1
"SISARIO","Ben",1
"HOROWITZ","Jason",1
"RICHTEL","Matt",1
"TRACY","Marc",1
"GOLDSTEIN","Matthew",1
"PERLROTH","Nicole",2
"WEISMAN","Jonathan",1
"CORKERY","Michael",1
"SILVER-GREENBERG","Jessica",2
"HADID","Diaa",1
"DOUGHERTY","Conor",1
"HARDY","Quentin",2
"CARAMANICA","Jon",1
"GENZLINGER","Neil",1
"TABUCHI","Hiroko",1
"TRACY","Marc",1
"GRIMES","William",1
"BAGLI","Charles",1
"YEE","Vivian",2
"SIEGAL","Nina",1

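Since the point is to count bylines, here is a minimal sketch of tallying CSV rows like the ones above with Python's standard library. The sample rows are copied from that output; the function name `count_bylines` is made up for illustration, and in practice you would read the jq output from a file instead.

```python
import csv
import io
from collections import Counter

# A few rows copied from the jq output above
sample = '''"GOLDSTEIN","Matthew",1
"PERLROTH","Nicole",2
"TRACY","Marc",1
"TRACY","Marc",1
'''

def count_bylines(csv_text):
    """Count how many times each (lastname, firstname) pair appears."""
    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        lastname, firstname, _rank = row
        counts[(lastname, firstname)] += 1
    return counts

for (lastname, firstname), n in count_bylines(sample).most_common():
    print(n, lastname, firstname)
```
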
To include the actual headlines per byline, it's kind of a pain in the ass to do it just from jq, so might as well use Python:

import glob
import json

print('|', '|'.join(('lastname', 'firstname', 'rank', 'desk', 'headline')), '|')
print('|--|--|--|--|--|')
for filename in glob.glob('./data-hold/20150316/*.json'):
    # json.load reads straight from the file; `with` closes it afterwards
    with open(filename) as f:
        data = json.load(f)
    for article in data['response']['docs']:
        # print_page is a string in the API response (or null, as in the sample response)
        if article['print_page'] == "1":
            headline = article['headline']['main']
            desk = article['news_desk']
            for p in article['byline']['person']:
                print('|', '|'.join((p['lastname'], p['firstname'], str(p['rank']), desk, headline)), '|')

The results, as Markdown-formatted tables:

| lastname | firstname | rank | desk | headline |
|--|--|--|--|--|
| ELIGON | John | 1 | National | Crackdown in a Detroit Stripped of Metal Parts |
| BRANTLEY | Ben | 1 | Culture | Review: ‘On the Twentieth Century,’ With Kristin Chenoweth, Opens on Broadway |
| SISARIO | Ben | 1 | Business | ‘Blurred Lines’ Lawyer Rocks Music Industry Again |
| HOROWITZ | Jason | 1 | National | Evangelicals Aim to Mobilize an Army for Republicans in 2016 |
| RICHTEL | Matt | 1 | Business | A Police Gadget Tracks Phones? Shhh! It’s Secret |
| TRACY | Marc | 1 | Sports | Kentucky, No. 1 in Height, Too, Relishes the View From on High |
| GOLDSTEIN | Matthew | 1 | Business | Authorities Closing In on Hackers Who Stole Data From JPMorgan Chase |
| PERLROTH | Nicole | 2 | Business | Authorities Closing In on Hackers Who Stole Data From JPMorgan Chase |
| WEISMAN | Jonathan | 1 | National | Chasm Grows Within G.O.P. Over Spending |
| CORKERY | Michael | 1 | Business | Many Buyers for Subprime Auto Loan Bundle |
| SILVER-GREENBERG | Jessica | 2 | Business | Many Buyers for Subprime Auto Loan Bundle |
| HADID | Diaa | 1 | Foreign | Arab Alliance Rises as Force in Israeli Elections |
| DOUGHERTY | Conor | 1 | Business | Managers Turn to Computer Games, Aiming for More Efficient Employees |
| HARDY | Quentin | 2 | Business | Managers Turn to Computer Games, Aiming for More Efficient Employees |
| CARAMANICA | Jon | 1 | Culture | Without Joan Rivers, ‘Fashion Police’ Is Falling Apart |
| GENZLINGER | Neil | 1 | Culture | Review: ‘iZombie,’ the Undead as a Force for Good |
| TABUCHI | Hiroko | 1 | Business | Etsy’s Success Gives Rise to Problems of Credibility and Scale |
| TRACY | Marc | 1 | Sports | N.C.A.A. Tournament 2015: Villanova, Duke and Wisconsin Join Kentucky as No. 1 Seeds |
| GRIMES | William | 1 | Culture | Asia Week Is Highlighted by Robert Hatfield Ellsworth Sale |
| BAGLI | Charles | 1 | Metro | Robert Durst of HBO’s ‘The Jinx’ Says He ‘Killed Them All’ |
| YEE | Vivian | 2 | Metro | Robert Durst of HBO’s ‘The Jinx’ Says He ‘Killed Them All’ |
| SIEGAL | Nina | 1 | Culture | For Michaela DePrince, a Dream Comes True at the Dutch National Ballet |

{
"response": {
"meta": {
"hits": 237,
"time": 70,
"offset": 0
},
"docs": [
{
"web_url": "http://artsbeat.blogs.nytimes.com/2015/03/16/better-call-saul-recap-season-1-episode-7-bingo/",
"snippet": "We have the makings here of a fine anti-buddy show.",
"lead_paragraph": null,
"abstract": "We have the makings here of a fine anti-buddy show.",
"print_page": null,
"blog": [],
"source": "The New York Times",
"multimedia": [
{
"width": 190,
"url": "images/2015/03/17/arts/16SAULWEB/16SAULWEB-thumbWide.jpg",
"height": 126,
"subtype": "wide",
"legacy": {
"wide": "images/2015/03/17/arts/16SAULWEB/16SAULWEB-thumbWide.jpg",
"wideheight": "126",
"widewidth": "190"
},
"type": "image"
},
{
"width": 600,
"url": "images/2015/03/17/arts/16SAULWEB/16SAULWEB-articleLarge.jpg",
"height": 399,
"subtype": "xlarge",
"legacy": {
"xlargewidth": "600",
"xlarge": "images/2015/03/17/arts/16SAULWEB/16SAULWEB-articleLarge.jpg",
"xlargeheight": "399"
},
"type": "image"
},
{
"width": 75,
"url": "images/2015/03/17/arts/16SAULWEB/16SAULWEB-thumbStandard.jpg",
"height": 75,
"subtype": "thumbnail",
"legacy": {
"thumbnailheight": "75",
"thumbnail": "images/2015/03/17/arts/16SAULWEB/16SAULWEB-thumbStandard.jpg",
"thumbnailwidth": "75"
},
"type": "image"
}
],
"headline": {
"main": "‘Better Call Saul’ Recap: Every Single Penny",
"kicker": "ArtsBeat"
},
"keywords": [
{
"rank": "1",
"name": "persons",
"value": "Banks, Jonathan"
},
{
"rank": "2",
"name": "persons",
"value": "Odenkirk, Bob"
},
{
"rank": "1",
"name": "glocations",
"value": "Albuquerque (NM)"
},
{
"rank": "1",
"name": "subject",
"value": "Television"
}
],
"pub_date": "2015-03-16T23:01:03Z",
"document_type": "blogpost",
"news_desk": "Culture",
"section_name": "Arts",
"subsection_name": null,
"byline": {
"person": [
{
"organization": "",
"role": "reported",
"firstname": "David",
"rank": 1,
"lastname": "SEGAL"
}
],
"original": "By DAVID SEGAL"
},
"type_of_material": "Blog",
"_id": "5507994c38f0d87501d703e1",
"word_count": "1110"
},
{
"web_url": "http://www.nytimes.com/video/us/politics/100000003575558/hillary-clinton-at-irish-american-event.html",
"snippet": "Former Secretary of State Hillary Rodham Clinton spoke at her induction into the Irish America Hall of Fame, recalling an early decision of her husband’s in the Northern Ireland peace process.",
"lead_paragraph": "Former Secretary of State Hillary Rodham Clinton spoke at her induction into the Irish America Hall of Fame, recalling an early decision of her husband’s in the Northern Ireland peace process.",
"abstract": null,
"print_page": null,
"blog": [],
"source": "The New York Times",
"multimedia": [
{
"width": 190,
"url": "images/2015/03/17/us/17CLINTON/17CLINTON-thumbWide.jpg",
"height": 126,
"subtype": "wide",
"legacy": {
"wide": "images/2015/03/17/us/17CLINTON/17CLINTON-thumbWide.jpg",
"wideheight": "126",
"widewidth": "190"
},
"type": "image"
},
{
"width": 600,
"url": "images/2015/03/17/us/17CLINTON/17CLINTON-articleLarge.jpg",
"height": 400,
"subtype": "xlarge",
"legacy": {
"xlargewidth": "600",
"xlarge": "images/2015/03/17/us/17CLINTON/17CLINTON-articleLarge.jpg",
"xlargeheight": "400"
},
"type": "image"
},
{
"width": 75,
"url": "images/2015/03/17/us/17CLINTON/17CLINTON-thumbStandard.jpg",
"height": 75,
"subtype": "thumbnail",
"legacy": {
"thumbnailheight": "75",
"thumbnail": "images/2015/03/17/us/17CLINTON/17CLINTON-thumbStandard.jpg",
"thumbnailwidth": "75"
},
"type": "image"
}
],
"headline": {
"main": "Hillary Clinton at Irish American Event"
},
"keywords": [
{
"rank": "1",
"is_major": "N",
"name": "persons",
"value": "Clinton, Hillary Rodham"
},
{
"rank": "2",
"is_major": "N",
"name": "persons",
"value": "Adams, Gerry"
},
{
"rank": "3",
"is_major": "N",
"name": "glocations",
"value": "United States"
},
{
"rank": "4",
"is_major": "N",
"name": "glocations",
"value": "Northern Ireland"
}
],
"pub_date": "2015-03-16T22:06:28Z",
"document_type": "multimedia",
"news_desk": "U.S. / Politics",
"section_name": "U.S.",
"subsection_name": "Politics",
"byline": {
"person": [],
"original": "Reuters",
"organization": "Reuters"
},
"type_of_material": "Video",
"_id": "55078c4738f0d87501d703cb",
"word_count": "31"
},
{
"web_url": "http://wordplay.blogs.nytimes.com/2015/03/16/some-tests/",
"snippet": "David Phillips tests our solving abilities.",
"lead_paragraph": null,
"abstract": "David Phillips tests our solving abilities.",
"print_page": null,
"blog": [],
"source": "The New York Times",
"multimedia": [
{
"width": 190,
"url": "images/2015/03/16/crosswords/0316godzillajpg/0316godzillajpg-thumbWide.jpg",
"height": 126,
"subtype": "wide",
"legacy": {
"wide": "images/2015/03/16/crosswords/0316godzillajpg/0316godzillajpg-thumbWide.jpg",
"wideheight": "126",
"widewidth": "190"
},
"type": "image"
},
{
"width": 600,
"url": "images/2015/03/16/crosswords/0316godzillajpg/0316godzillajpg-articleLarge.jpg",
"height": 419,
"subtype": "xlarge",
"legacy": {
"xlargewidth": "600",
"xlarge": "images/2015/03/16/crosswords/0316godzillajpg/0316godzillajpg-articleLarge.jpg",
"xlargeheight": "419"
},
"type": "image"
},
{
"width": 75,
"url": "images/2015/03/16/crosswords/0316godzillajpg/0316godzillajpg-thumbStandard.jpg",
"height": 75,
"subtype": "thumbnail",
"legacy": {
"thumbnailheight": "75",
"thumbnail": "images/2015/03/16/crosswords/0316godzillajpg/0316godzillajpg-thumbStandard.jpg",
"thumbnailwidth": "75"
},
"type": "image"
}
],
"headline": {
"main": "Some Tests",
"kicker": "Wordplay"
},
"keywords": [
{
"rank": "1",
"name": "subject",
"value": "Crossword Puzzles"
}
],
"pub_date": "2015-03-16T22:00:21Z",
"document_type": "blogpost",
"news_desk": "Business",
"section_name": "Crosswords/Games",
"subsection_name": null,
"byline": {
"person": [
{
"organization": "",
"role": "reported",
"firstname": "Deb",
"rank": 1,
"lastname": "AMLEN"
}
],
"original": "By DEB AMLEN"
},
"type_of_material": "Blog",
"_id": "55078afc38f0d87501d703c7",
"word_count": "740"
},
{
"web_url": "http://www.nytimes.com/video/us/politics/100000003575232/obama-on-republican-education-budget.html",
"snippet": "While meeting with the Council of the Great City Schools leadership, the president said that there would be a “major debate” if the Republican budget lowered federal spending on education.",
"lead_paragraph": "While meeting with the Council of the Great City Schools leadership, the president said that there would be a “major debate” if the Republican budget lowered federal spending on education.",
"abstract": null,
"print_page": null,
"blog": [],
"source": "The New York Times",
"multimedia": [
{
"width": 190,
"url": "images/2015/03/17/us/17OBAMA/17OBAMA-thumbWide.jpg",
"height": 126,
"subtype": "wide",
"legacy": {
"wide": "images/2015/03/17/us/17OBAMA/17OBAMA-thumbWide.jpg",
"wideheight": "126",
"widewidth": "190"
},
"type": "image"
},
{
"width": 600,
"url": "images/2015/03/17/us/17OBAMA/17OBAMA-articleLarge.jpg",
"height": 600,
"subtype": "xlarge",
"legacy": {
"xlargewidth": "600",
"xlarge": "images/2015/03/17/us/17OBAMA/17OBAMA-articleLarge.jpg",
"xlargeheight": "600"
},
"type": "image"
},
{
"width": 75,
"url": "images/2015/03/17/us/17OBAMA/17OBAMA-thumbStandard.jpg",
"height": 75,
"subtype": "thumbnail",
"legacy": {
"thumbnailheight": "75",
"thumbnail": "images/2015/03/17/us/17OBAMA/17OBAMA-thumbStandard.jpg",
"thumbnailwidth": "75"
},
"type": "image"
}
],
"headline": {
"main": "Obama on Republican Education Budget"
},
"keywords": [
{
"rank": "1",
"is_major": "N",
"name": "subject",
"value": "Federal Budget (US)"
},
{
"rank": "2",
"is_major": "N",
"name": "subject",
"value": "Education (K-12)"
},
{
"rank": "3",
"is_major": "N",
"name": "persons",
"value": "Obama, Barack"
},
{
"rank": "4",
"is_major": "N",
"name": "organizations",
"value": "House of Representatives"
},
{
"rank": "5",
"is_major": "N",
"name": "organizations",
"value": "Senate"
},
{
"rank": "6",
"is_major": "N",
"name": "subject",
"value": "United States Politics and Government"
}
],
"pub_date": "2015-03-16T21:35:01Z",
"document_type": "multimedia",
"news_desk": "U.S. / Politics",
"section_name": "U.S.",
"subsection_name": "Politics",
"byline": {
"person": [],
"original": "AP",
"organization": "AP"
},
"type_of_material": "Video",
"_id": "550784ea38f0d87501d703ba",
"word_count": "30"
},
{
"web_url": "http://www.legacy.com/Link.asp?I=LS000174412985",
"snippet": "HOCKERT--Lorance, 76, died on March 11, 2015. A native New Yorker, graduate of NYU Law School, he used his training to serve his clients, and the community. He received numerous awards for public service: New York State Bar Association - Pro Bono...",
"lead_paragraph": "<!-- LORANCE HOCKERT -->HOCKERT--Lorance, 76, died on March 11, 2015. A native New Yorker, graduate of NYU Law School, he used his training to serve his clients, and the community. He received numerous awards for public service: New York State Bar Association - Pro Bono Award for service to the mentally ill, New York City Parks Council - Leadership for Parks, and the Riverdale Mental Health Association Dodge Award among others. He held leadership roles in Community Board 8 in the Bronx and Manhattan and was founder of the Chinese American Culture and Art Association among other affiliations. Lorance is survived by his wife Ruth, sons Geoffrey and Kenneth, and sister Joan Donnelly.<br/><br/><br><br>\n ",
"abstract": null,
"print_page": null,
"blog": [],
"source": "The New York Times",
"multimedia": [],
"headline": {
"main": "Paid Notice: Deaths HOCKERT, LORANCE "
},
"keywords": [],
"pub_date": "2015-03-16T21:12:02Z",
"document_type": "article",
"news_desk": "Classified",
"section_name": "Paid Death Notices",
"subsection_name": null,
"byline": null,
"type_of_material": "Paid Death Notice",
"_id": "5507ed1c38f0d87501d704bf",
"word_count": "110"
},
{
"web_url": "http://www.legacy.com/Link.asp?I=LS000174412984",
"snippet": "FRIEDMAN--Kathe. Kathe A. Friedman. Suddenly on March 15, 2015 beloved wife of Cory E., Devoted mother of Weston H. and Cameron S., Dear sister of Barbara Rothnberg. Services Thursday, March 19, 2015 at 12:00pm at Temple Emanu-El, Fifth Avenue at...",
"lead_paragraph": "<!-- KATHE FRIEDMAN --><IMG SRC=\"/Images/Cobrands/NYTimes/Photos/NYT-0001968632-FRIEDMANK_22_095922002.1_201041.jpg\" lgyOrigName=\"NYT-0001968632-FRIEDMANK_22_095922002.1.jpg\" ALIGN=\"LEFT\" vspace=\"4\" hspace=\"10\">FRIEDMAN--Kathe.<br/><br/> Kathe A. Friedman. Suddenly on March 15, 2015 beloved wife of Cory E., Devoted mother of Weston H. and Cameron S., Dear sister of Barbara Rothnberg. Services Thursday, March 19, 2015 at 12:00pm at Temple Emanu-El, Fifth Avenue at 65th Street.<br><br>\n ",
"abstract": null,
"print_page": null,
"blog": [],
"source": "The New York Times",
"multimedia": [],
"headline": {
"main": "Paid Notice: Deaths FRIEDMAN, KATHE "
},
"keywords": [],
"pub_date": "2015-03-16T21:12:00Z",
"document_type": "article",
"news_desk": "Classified",
"section_name": "Paid Death Notices",
"subsection_name": null,
"byline": null,
"type_of_material": "Paid Death Notice",
"_id": "5507ed1b38f0d87501d704bc",
"word_count": "42"
},
{
"web_url": "http://www.legacy.com/Link.asp?I=LS000174412983",
"snippet": "NEWMAN--Eileen. NewYork-Presbyterian Hospital mourns the loss of our dear friend Eileen Newman, whose grace and empathy will continue to inspire our mission. Eileen, along with her late husband and former Hospital trustee Arthur, embodied the ethos...",
"lead_paragraph": "<!-- EILEEN NEWMAN -->NEWMAN--Eileen.<br/><br/> NewYork-Presbyterian Hospital mourns the loss of our dear friend Eileen Newman, whose grace and empathy will continue to inspire our mission. Eileen, along with her late husband and former Hospital trustee Arthur, embodied the ethos of patient- centered, compassionate care. Together, their generosity has touched and comforted the lives of many in need, as evidenced by the Infusion Center named in their honor. We offer our deepest condolences to their son David and all of her family at this time. Frank A. Bennack, Jr., Chair, Board of Trustees Steven J. Corwin, MD, CEO Herbert Pardes, MD, Executive Vice Chair, Board of Trustees NewYork-Presbyterian Hospital<br><br>\n ",
"abstract": null,
"print_page": null,
"blog": [],
"source": "The New York Times",
"multimedia": [],
"headline": {
"main": "Paid Notice: Deaths NEWMAN, EILEEN "
},
"keywords": [],
"pub_date": "2015-03-16T21:11:57Z",
"document_type": "article",
"news_desk": "Classified",
"section_name": "Paid Death Notices",
"subsection_name": null,
"byline": null,
"type_of_material": "Paid Death Notice",
"_id": "5507ed1f38f0d87501d704cb",
"word_count": "105"
},
{
"web_url": "http://www.legacy.com/Link.asp?I=LS000174412982",
"snippet": "NEWMAN--Eileen. On behalf of the trustees, faculty, and staff of NYU Langone Medical Center, we extend our heartfelt condolences to the family of Eileen Newman. Eileen was a longtime friend and generous benefactor of NYU Langone, and served as a...",
"lead_paragraph": "<!-- EILEEN NEWMAN -->NEWMAN--Eileen.<br/><br/> On behalf of the trustees, faculty, and staff of NYU Langone Medical Center, we extend our heartfelt condolences to the family of Eileen Newman. Eileen was a longtime friend and generous benefactor of NYU Langone, and served as a member of the Board of Trustees. We are most thankful for her commitment to our Department of Plastic Surgery, which she supported through myFace (formerly the National Foundation for Facial Reconstruction) where she served as president for seven years. Through her personal foundation, Eileen also generously supported our work within the KiDS of NYU Langone Foundation and FACES (Finding a Cure for Epilepsy and Seizures). We offer our deepest sympathies to her children Sondra K. Neuschotz, Allison Beth Newman, and David E. Newman, and her six grandchildren. Kenneth G. Langone, Chair, Board of Trustees, Robert I. Grossman, MD, Dean and CEO<br><br>\n ",
"abstract": null,
"print_page": null,
"blog": [],
"source": "The New York Times",
"multimedia": [],
"headline": {
"main": "Paid Notice: Deaths NEWMAN, EILEEN "
},
"keywords": [],
"pub_date": "2015-03-16T21:11:55Z",
"document_type": "article",
"news_desk": "Classified",
"section_name": "Paid Death Notices",
"subsection_name": null,
"byline": null,
"type_of_material": "Paid Death Notice",
"_id": "5507ed1f38f0d87501d704ca",
"word_count": "141"
},
{
"web_url": "http://www.legacy.com/Link.asp?I=LS000174412981",
"snippet": "KYRIAKOU--Konstantinos, known as Dinos, died in Florida on March 11 at age 79. A long-time Manhattan resident, he divided his retirement time between Long Island and Florida. Dinos was born on the Greek island of Kalymnos and left in 1960 to build...",
"lead_paragraph": "<!-- KONSTANTINOS KYRIAKOU -->KYRIAKOU--Konstantinos, known as Dinos, died in Florida on March 11 at age 79. A long-time Manhattan resident, he divided his retirement time between Long Island and Florida. Dinos was born on the Greek island of Kalymnos and left in 1960 to build his life in the US. He arrived in New York with no money, no family, and no friends to begin the remarkable journey that brought him professional success, a loving family, and a host of friends. His career centered on Wall Street, where he was both a stock and futures broker. In the 1980s he served as New York manager of Clayton Brokerage. Dinos married Linda La Gamma in 1967, and together they welcomed their daughter, Christina, in 1978. Family was paramount to Dinos. He embraced his personal family and regarded his friends and associates as an extension of family. In addition to his wife and daughter, he is survived by his son-in-law, David Kahn; his sister-in-law, Madeleine Blanc; his nieces, Jennifer Blanc-Tal, Victoria Blanc, and Kalliope Kakanakis; his nephews, Jason Tal, and Dino and George Kakanakis; grandnephews, William Tal and Nicholas Stephanopoulis, and grand nieces, Josette Tal and Zoe Kakanakis. Celebrations of his life will be held in Florida and New York. Donations in his memory may be made to Memorial Sloan Kettering.<br/><br/><br><br>\n ",
"abstract": null,
"print_page": null,
"blog": [],
"source": "The New York Times",
"multimedia": [],
"headline": {
"main": "Paid Notice: Deaths KYRIAKOU, KONSTANTINOS "
},
"keywords": [],
"pub_date": "2015-03-16T21:11:52Z",
"document_type": "article",
"news_desk": "Classified",
"section_name": "Paid Death Notices",
"subsection_name": null,
"byline": null,
"type_of_material": "Paid Death Notice",
"_id": "5507ed1d38f0d87501d704c2",
"word_count": "216"
},
{
"web_url": "http://www.legacy.com/Link.asp?I=LS000174412980",
"snippet": "VAN LEER--Lia. You were so special and you touched so many lives. You changed the world to be a better place not only in Cinema but with your kindness and love. You made us feel like family. We love you and miss you very much. Leon, Michaela, Joanna...",
"lead_paragraph": "<!-- LIA VAN LEER -->VAN LEER--Lia.<br/><br/> You were so special and you touched so many lives. You changed the world to be a better place not only in Cinema but with your kindness and love. You made us feel like family. We love you and miss you very much. Leon, Michaela, Joanna & Philippe<br><br>\n ",
"abstract": null,
"print_page": null,
"blog": [],
"source": "The New York Times",
"multimedia": [],
"headline": {
"main": "Paid Notice: Deaths VAN LEER, LIA "
},
"keywords": [],
"pub_date": "2015-03-16T21:11:50Z",
"document_type": "article",
"news_desk": "Classified",
"section_name": "Paid Death Notices",
"subsection_name": null,
"byline": null,
"type_of_material": "Paid Death Notice",
"_id": "5507ed2038f0d87501d704d2",
"word_count": "50"
}
]
},
"status": "OK",
"copyright": "Copyright (c) 2013 The New York Times Company. All Rights Reserved."
}
# API documentation: http://developer.nytimes.com/docs/read/article_search_api_v2
# requires the JQ parser http://stedolan.github.io/jq/
apikey="YOURKEY"
endpoint="http://api.nytimes.com/svc/search/v2/articlesearch.json"
# limit to source of New York times
fq_sourceterm="New+York"
base_url="${endpoint}?fq=source.contains%3A%28%22${fq_sourceterm}%22%29&api-key=$apikey&begin_date=BEGIN_DATE&end_date=END_DATE&page=PAGENUM"
day1=20150301
day2=20150317
# Note: you have to use the date function to iterate across months/years
# This is just simple incrementing
for day in $(seq "$day1" "$day2"); do
  end_date=$day
  begin_date=$day
  mkdir -p "data-hold/$day"
  url=$(echo "$base_url" | sed "s/PAGENUM/0/" | sed "s/BEGIN_DATE/$begin_date/" | sed "s/END_DATE/$end_date/")
  # first page
  curl -sS "$url" -o "data-hold/$day/0.json"
  hits=$(jq '.response .meta .hits' "data-hold/$day/0.json")
  echo "$day has $hits hits ----------------------"
  # there are 10 hits per page, and page 0 is already saved,
  # so the last page index is (hits - 1) / 10
  first_page=1
  last_page=$(( (hits - 1) / 10 ))
  for ((pg = first_page; pg <= last_page; pg++)); do
    page_url=$(echo "$url" | sed "s/page=0/page=$pg/")
    echo "$page_url"
    curl -sS "$page_url" -o "data-hold/$day/$pg.json"
  done
done
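
The script's comment notes that simple incrementing only works within a single month. As a rough sketch of one way to iterate across month or year boundaries, Python's datetime module can generate the YYYYMMDD strings; the function name `day_range` is made up for illustration.

```python
from datetime import date, timedelta

def day_range(start, end):
    """Yield YYYYMMDD strings for every day from start to end, inclusive."""
    current = start
    while current <= end:
        yield current.strftime("%Y%m%d")
        current += timedelta(days=1)

# Crosses a month boundary, which plain integer incrementing cannot do
for day in day_range(date(2015, 2, 27), date(2015, 3, 2)):
    print(day)
```

Each string can be used directly as both begin_date and end_date for a single-day query.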