@Brlaney
Created December 21, 2021 23:28
A simple script that saves all rows of data for a specified page number. This endpoint contains 130 pages as of 12/21/2021.
const cheerio = require('cheerio');
const axios = require('axios');
const fs = require('fs');

// Scrape one page of the disciplinary-actions table and save its rows as JSON.
async function obtainData(num) {
    const endpoint = 'https://www.tbpr.org/news-publications/recent-disciplinary-actions?page=' + num;
    const file = 'data/pg' + num + '.json';

    const urlResponse = await axios.get(endpoint);
    const $ = cheerio.load(urlResponse.data);
    const data = [];

    $('.table tr').each(function (i, elem) {
        // The original stored rows at data[i-1], which silently dropped row 0
        // (the header row). Skip it explicitly instead.
        if (i === 0) return;
        data.push({
            date: $(elem).find('td:nth-child(1)').text().trim(),
            title: $(elem).find('td:nth-child(3)').text().trim(),
            link: $(elem).find('td:nth-child(3) > a').attr('href'),
            bpr: $(elem).find('td:nth-child(4)').text().trim(),
            attorney: $(elem).find('td:nth-child(5)').text().trim(),
        });
    });

    // Output a json file containing the data obtained.
    // Create the output directory first so the write cannot fail on a missing folder.
    fs.mkdirSync('data', { recursive: true });
    await fs.promises.writeFile(file, JSON.stringify(data, null, 4));
    console.log('File for pg number ' + num + ' successfully saved.');
}
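
The script defines obtainData but never calls it, and the description mentions roughly 130 pages. A minimal sketch of a driver that walks a page range one request at a time might look like the following. The names scrapePages, sleep, and delayMs are hypothetical and not part of the gist, and the first page number is left as a parameter because the gist does not say whether the site counts pages from 0 or 1.

```javascript
// Hypothetical helper: resolve after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical driver: call scrapeFn(num) for every page in [firstPage, lastPage],
// sequentially, pausing delayMs between requests to avoid hammering the server.
// Returns the list of page numbers whose scrape threw or rejected.
async function scrapePages(scrapeFn, firstPage, lastPage, delayMs = 1000) {
    const failed = [];
    for (let num = firstPage; num <= lastPage; num++) {
        try {
            await scrapeFn(num);
        } catch (err) {
            // Record the failure and continue so one bad page does not stop the run.
            console.error('Page ' + num + ' failed: ' + err.message);
            failed.push(num);
        }
        if (num < lastPage) await sleep(delayMs);
    }
    return failed;
}
```

Under those assumptions, a call such as scrapePages(obtainData, 1, 130) would save one JSON file per page and resolve to the page numbers that need a retry. Running the requests sequentially with a pause is a deliberate choice here: firing all of them at once would be faster but is more likely to be rate-limited.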