@vdavez
Last active February 12, 2017 22:29
Public Burden Research
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Paperwork Reduction and Public Burden\n",
"\n",
"Recently, the new Administration issued an [Executive Order](https://www.whitehouse.gov/the-press-office/2017/01/30/presidential-executive-order-reducing-regulation-and-controlling) aimed at Reducing Regulation and Controlling Regulatory Costs. As part of this effort, the Administration is supposed to offset regulated costs.\n",
"\n",
"So, that got me thinking. The Office of Information and Regulatory Affairs (OIRA) is charged with reviewing not only regulations, but also is charged with reviewing agency's information-collection requests under the Paperwork Reduction Act. And as part of that review, OIRA and the agencies are supposed to track the public burden associated with the information collection.\n",
"\n",
"As a thought experiment, I decided to see whether we could find some low-hanging fruit, namely paper-based information requests. And the results were interesting...\n",
"\n",
"## The analysis\n",
"\n",
"First, we need to find the data. Fortunately, that data is already available in bulk [from OIRA](https://www.reginfo.gov/public/do/PRAXML). Well done, OIRA.\n",
"\n",
"From here, it's simple. First we use [lxml](http://lxml.de/) to parse the XML file."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"from lxml import etree\n",
"tree = etree.parse('CurrentInventoryReport.xml')\n",
"root = tree.getroot()"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"So, now that we have the data and it's parsed, where to begin? Let's see what this data looks like by checking out the first Information Collection Request in the data. "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<InformationCollectionRequest xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">\n",
" <OMBControlNumber>0503-0007</OMBControlNumber>\n",
" <ICRReferenceNumber>201405-0503-002</ICRReferenceNumber>\n",
" <AgencyCode>0503</AgencyCode>\n",
" <Title>National Appeals Division Customer Service Survey</Title>\n",
" <Abstract>To conduct a customer survey to gather date on the quality of how to make improvements in NAD processes and establish customer service standards. </Abstract>\n",
" <ICRTypeCode>Revision of a currently approved collection</ICRTypeCode>\n",
" <Expiration>\n",
" <ExpirationDate>2017-08-31-04:00</ExpirationDate>\n",
" </Expiration>\n",
" <ICRStatus>Active</ICRStatus>\n",
" <AgencyContact>\n",
" <Person>\n",
" <FirstName>Jerry</FirstName>\n",
" <MiddleName>L</MiddleName>\n",
" <LastName>Jobe</LastName>\n",
" <PhoneNumber>703 305-2514</PhoneNumber>\n",
" </Person>\n",
" </AgencyContact>\n",
" <StimulusIndicator>No</StimulusIndicator>\n",
" <HealthcareIndicator>No</HealthcareIndicator>\n",
" <DoddFrankActIndicator>No</DoddFrankActIndicator>\n",
" <AuthorizingStatutes>\n",
" <AuthorizingStatute>\n",
" <ExecutiveOrder>\n",
" <EONumber>12862</EONumber>\n",
" <NameOfEO>Setting Customer Service Standards</NameOfEO>\n",
" </ExecutiveOrder>\n",
" </AuthorizingStatute>\n",
" </AuthorizingStatutes>\n",
" <Burden>\n",
" <BurdenResponse>\n",
" <TotalQuantity>2400</TotalQuantity>\n",
" <PreviousTotalQuantity>2400</PreviousTotalQuantity>\n",
" </BurdenResponse>\n",
" <BurdenHour>\n",
" <TotalQuantity>353</TotalQuantity>\n",
" <PreviousTotalQuantity>561</PreviousTotalQuantity>\n",
" </BurdenHour>\n",
" <BurdenCost>\n",
" <TotalAmount>0</TotalAmount>\n",
" <PreviousTotalAmount>0</PreviousTotalAmount>\n",
" </BurdenCost>\n",
" </Burden>\n",
" <InformationCollections>\n",
" <InformationCollection>\n",
" <Title>National Appeals Division Customer Service Survey</Title>\n",
" <StandardFormIndicator>No</StandardFormIndicator>\n",
" <ObligationCode>Voluntary</ObligationCode>\n",
" <FEABusinessReferenceModule>\n",
" <LineOfBusiness Code=\"116\">Litigation and Judicial Activities</LineOfBusiness>\n",
" <Subfunction Code=\"055\">Resolution Facilitation</Subfunction>\n",
" </FEABusinessReferenceModule>\n",
" <Instruments>\n",
" <Instrument>\n",
" <FormNumber>None</FormNumber>\n",
" <FormName>NAD Customer Survey</FormName>\n",
" <AvailableElectronically>No</AvailableElectronically>\n",
" <ElectronicCapability>Paper Only</ElectronicCapability>\n",
" <InstrumentDocument>\n",
" <documentType>Form</documentType>\n",
" </InstrumentDocument>\n",
" </Instrument>\n",
" </Instruments>\n",
" <AffectedPublicCode>\n",
" <PublicCode>Individuals or Households</PublicCode>\n",
" </AffectedPublicCode>\n",
" <NumberResponses>\n",
" <AnnualQuantity>2000</AnnualQuantity>\n",
" </NumberResponses>\n",
" <BurdenHour>\n",
" <TotalQuantity>333</TotalQuantity>\n",
" <BurdenHourPerResponse>\n",
" <ReportingFrequencies>\n",
" <ReportingFrequency>Annually</ReportingFrequency>\n",
" </ReportingFrequencies>\n",
" </BurdenHourPerResponse>\n",
" </BurdenHour>\n",
" <BurdenCost>\n",
" <TotalAmount>0</TotalAmount>\n",
" </BurdenCost>\n",
" </InformationCollection>\n",
" <InformationCollection>\n",
" <Title>National Appeals Division Customer Service Survey Non-Respondents</Title>\n",
" <StandardFormIndicator>No</StandardFormIndicator>\n",
" <ObligationCode>Voluntary</ObligationCode>\n",
" <FEABusinessReferenceModule>\n",
" <LineOfBusiness Code=\"116\">Litigation and Judicial Activities</LineOfBusiness>\n",
" <Subfunction Code=\"055\">Resolution Facilitation</Subfunction>\n",
" </FEABusinessReferenceModule>\n",
" <Instruments>\n",
" <Instrument>\n",
" <FormNumber>None</FormNumber>\n",
" <FormName>NAD Customer Survey</FormName>\n",
" <AvailableElectronically>No</AvailableElectronically>\n",
" <ElectronicCapability>Paper Only</ElectronicCapability>\n",
" <InstrumentDocument>\n",
" <documentType>Form</documentType>\n",
" </InstrumentDocument>\n",
" </Instrument>\n",
" </Instruments>\n",
" <AffectedPublicCode>\n",
" <PublicCode>Individuals or Households</PublicCode>\n",
" </AffectedPublicCode>\n",
" <NumberResponses>\n",
" <AnnualQuantity>400</AnnualQuantity>\n",
" </NumberResponses>\n",
" <BurdenHour>\n",
" <TotalQuantity>20</TotalQuantity>\n",
" <BurdenHourPerResponse>\n",
" <ReportingFrequencies>\n",
" <ReportingFrequency>Annually</ReportingFrequency>\n",
" </ReportingFrequencies>\n",
" </BurdenHourPerResponse>\n",
" </BurdenHour>\n",
" <BurdenCost>\n",
" <TotalAmount>0</TotalAmount>\n",
" </BurdenCost>\n",
" </InformationCollection>\n",
" </InformationCollections>\n",
" <OIRAConclusion>\n",
" <ConcludedDate>\n",
" <Date>2014-08-27-04:00</Date>\n",
" <Time>05:41:10.660-04:00</Time>\n",
" </ConcludedDate>\n",
" </OIRAConclusion>\n",
" </InformationCollectionRequest>\n",
" \n",
"\n"
]
}
],
"source": [
"print(str(etree.tostring(root[0], pretty_print = True).decode('UTF-8')))"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Well, would you look at that?! There's an `AvailableElectronically` element.\n",
"\n",
"So, how about we try and find all the agencies that have some information collection requests that *are not* available electronically.\n",
"\n",
"To do this, we use `xpath` to find all the Information Collection Requests that has a \"AvailableElectronically\" element with \"No\". Then, we simply pick the fields we want to dump into a JSON dict."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"results = []\n",
"\n",
"def getInfoRequests(element):\n",
" res = []\n",
" collections = element.xpath('./InformationCollections/InformationCollection')\n",
" for collection in collections:\n",
" \n",
" res.append({\n",
" \"title\": str(collection.xpath('./Title/text()')[0]).strip(),\n",
" 'obligation': str(collection.xpath('./ObligationCode/text()')[0]).strip(),\n",
" 'affected': str(collection.xpath('./AffectedPublicCode/PublicCode/text()')[0].strip()),\n",
" 'number_responses': str(collection.xpath('./NumberResponses/AnnualQuantity/text()')[0].strip()),\n",
" 'burden_hour': str(collection.xpath('./BurdenHour/TotalQuantity/text()')[0].strip()),\n",
" 'frequency': collection.xpath('.//ReportingFrequency/text()'),\n",
" })\n",
" return res\n",
"\n",
"requests = root.xpath('//InformationCollectionRequest[.//AvailableElectronically/text()[. = \"No\"]]')\n",
"\n",
"for request in requests:\n",
" results.append({\n",
" \"agency_code\": request.xpath('./AgencyCode//text()')[0],\n",
" \"omb_control_number\": request.xpath('./OMBControlNumber//text()')[0],\n",
" \"icr_reference_number\": request.xpath('./ICRReferenceNumber//text()')[0],\n",
" \"title\": str(request.xpath('./Title//text()')[0]).strip(),\n",
" \"abstract\": str(request.xpath('./Abstract//text()')[0]).strip(),\n",
" \"expiration_date\": str(request.xpath('./Expiration/ExpirationDate//text()')[0]).strip(),\n",
" \"burden\": int(request.xpath('./Burden/BurdenHour/TotalQuantity/text()')[0]),\n",
" \"requests\": getInfoRequests(request),\n",
" \"cost\": int(request.xpath('./Burden/BurdenCost/TotalAmount/text()')[0]),\n",
" })"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Now that we have a JSON dict, time for the payoff. We sum the burden for each Information Collection Request and print the results."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"There are 1110 information requests that cannot be filed electronically from 149 different agencies with a total public burden of 3,298,793,451 hours.\n"
]
}
],
"source": [
"burden = sum([result[\"burden\"] for result in results])\n",
"agencies = set([result[\"agency_code\"] for result in results])\n",
"\n",
"print(\"There are %s information requests that cannot be filed electronically from %s different agencies with a total public burden of %s hours.\" % (len(results), len(agencies), \"{:,}\".format(burden)))"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"That's not a typo. That's a total of **3.3 billion hours** of public burden associated with paper-based information requests. Seems like a target-rich environment. \n",
"\n",
"Unfortunately, I've run out of time to really get in there and visualize where to begin. So, for now, I'll simply save the results in a `json` file for later."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"import json\n",
"with open('results.json', 'w') as fp:\n",
" json.dump(results, fp, indent=2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Hope you enjoyed this little exploration in how open government data can be used to make government work better. It's important to note that none of this would be possible if OIRA did not publish the data in bulk... Again, nice work OIRA."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
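
A note on the 3.3 billion figure computed in the notebook: it sums request-level burden hours for every request with at least one instrument marked as not available electronically. The sketch below is not part of the original notebook. It contrasts that request-level sum with a collection-level sum on a tiny made-up sample. It assumes the standard library's `xml.etree.ElementTree` in place of lxml so that it runs without extra packages. The `Root` wrapper element, the second agency code, the `Yes` values, and the helper names `is_paper` and `hours` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that reuses the element names seen in the report.
# The root element name and the "Yes" instruments are made up for illustration.
SAMPLE = """
<Root>
  <InformationCollectionRequest>
    <AgencyCode>0503</AgencyCode>
    <Burden><BurdenHour><TotalQuantity>353</TotalQuantity></BurdenHour></Burden>
    <InformationCollections>
      <InformationCollection>
        <Instruments><Instrument>
          <AvailableElectronically>No</AvailableElectronically>
        </Instrument></Instruments>
        <BurdenHour><TotalQuantity>333</TotalQuantity></BurdenHour>
      </InformationCollection>
      <InformationCollection>
        <Instruments><Instrument>
          <AvailableElectronically>Yes</AvailableElectronically>
        </Instrument></Instruments>
        <BurdenHour><TotalQuantity>20</TotalQuantity></BurdenHour>
      </InformationCollection>
    </InformationCollections>
  </InformationCollectionRequest>
  <InformationCollectionRequest>
    <AgencyCode>0504</AgencyCode>
    <Burden><BurdenHour><TotalQuantity>100</TotalQuantity></BurdenHour></Burden>
    <InformationCollections>
      <InformationCollection>
        <Instruments><Instrument>
          <AvailableElectronically>Yes</AvailableElectronically>
        </Instrument></Instruments>
        <BurdenHour><TotalQuantity>100</TotalQuantity></BurdenHour>
      </InformationCollection>
    </InformationCollections>
  </InformationCollectionRequest>
</Root>
"""


def is_paper(element):
    """Return True if any AvailableElectronically descendant reads 'No'."""
    return any(
        (e.text or "").strip() == "No"
        for e in element.iter("AvailableElectronically")
    )


def hours(element, path):
    """Read an integer hour count, treating a missing or empty element as 0."""
    text = element.findtext(path)
    return int(text) if text and text.strip() else 0


root = ET.fromstring(SAMPLE)
paper_requests = [
    r for r in root.iter("InformationCollectionRequest") if is_paper(r)
]

# Request-level total: the whole request counts if any instrument is paper.
request_level = sum(
    hours(r, "./Burden/BurdenHour/TotalQuantity") for r in paper_requests
)

# Collection-level total: only collections with a paper instrument count.
collection_level = sum(
    hours(c, "./BurdenHour/TotalQuantity")
    for r in paper_requests
    for c in r.iter("InformationCollection")
    if is_paper(c)
)

print(request_level, collection_level)
```

Run against the real report, the gap between the two totals would indicate how much of the headline figure comes from collections that can already be filed electronically. Which total is the better target is a judgment call that the original analysis leaves open.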