Created
October 2, 2023 19:37
-
-
Save shaunagm/9b7185f75ef692b777d3317f09025a16 to your computer and use it in GitHub Desktop.
Sample code for Oct 2 2023 training
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
""" | |
This file functions as a brief introduction to Parsons. To install Parsons, follow this | |
guide: https://parsons.pubpub.org/installation. If you're used to installing with pip, the tldr is | |
"pip install parsons". | |
""" | |
# print("Let's do this!") | |
""" | |
This is a Parsons Table. It's how we store data using Parsons. You can create a Table from a variety of sources, | |
like a regular old Python dictionary, a csv of data, a Pandas dataframe, a local database, or | |
from a third-party platform. | |
See the docs for this here: https://move-coop.github.io/parsons/html/latest/table.html#to-parsons-table | |
""" | |
# from parsons import Table | |
# table = Table.from_csv("testdata.csv") # won't work without a testdata.csv file in this directory | |
# table = Table.from_postgres("SELECT * FROM people WHERE registration == 'confirmed'") # won't work without postgres connection data | |
# table = Table([{"a": 1, "b": 2}, {"a": 10, "b": 20}]) | |
# print(table) | |
""" | |
To get data from a third-party platform, you'll use connectors. There are nearly 50 connectors, from A to Z - ActBlue to Zoom. | |
Let's demonstrate getting some data from Action Network. | |
""" | |
# from parsons import ActionNetwork | |
# action_network = ActionNetwork() | |
""" | |
You should have just gotten an error. That's because when we work with third party platforms, we usually need to | |
authenticate ourselves. You can see in the Parsons documentation how to do so - each connector has authentication | |
instructions in a little blue box at the top of the page. | |
Action Network: https://move-coop.github.io/parsons/html/stable/action_network.html | |
"You can generate your key from the API & Sync page, located in the Start Organizing menu, under Details." | |
We can save authentication info by exporting an environmental variable (recommended) or setting a variable in our script. | |
""" | |
# from parsons import ActionNetwork | |
# api_token="" | |
# action_network = ActionNetwork(api_token=api_token) | |
# people = action_network.get_people() | |
# print(people) | |
""" | |
A Parsons Table has some convenience methods. | |
""" | |
# print("Columns: ", people.columns) | |
# print("Rows: ", people.num_rows) | |
# print("First row: ", people.first) | |
""" | |
There are also convenience methods for transforming data. We'll go into them in more detail soon, but for now, | |
let's just show one: cut, which allows us to select only the columns we care about. | |
""" | |
# columns = ['given_name', 'family_name', 'email_addresses', 'phone_numbers', 'postal_addresses', 'created_date'] | |
# people = people.cut(columns) # Gets a subset of columns | |
""" | |
Parsons is most useful for moving data from one place to another. | |
Getting credentials from Google Sheets is very tedious so I have already gotten them and saved them as an | |
environmental variable: | |
export GOOGLE_DRIVE_CREDENTIALS="credentials.json" | |
""" | |
# from parsons import GoogleSheets | |
# google_sheets = GoogleSheets() | |
# sheet_id = google_sheets.create_spreadsheet('Action Network Data', editor_email="shaunagm@gmail.com") | |
""" | |
Once you've made your sheet, comment out the 'create_spreadsheet' line of code so you don't create it again. | |
Save the ID given on the command line to the variable sheet_id. | |
You can access your sheet in the browser by adding it to this url: | |
https://docs.google.com/spreadsheets/d/$SHEET_ID | |
Now you can move your Action Network Data to the sheet! | |
""" | |
# sheet_id = "" | |
# google_sheets.overwrite_sheet(sheet_id, people) | |
""" | |
Whoops! What went wrong? | |
Well, sometimes platforms give and take different structures. For instance, Google Sheets can't handle | |
nested data that we get from Action Network. We'll need to transform it with Parsons. | |
Action Network uses list structure for their contact fields, but Google sheet cells need a single value. | |
ie: 'email_addresses': [{'primary': True, 'address': 'email@email.com'}] | |
This custom helper method gets the primary value for each field. | |
""" | |
# def get_primary_contact_or_first(contact_list): | |
# extracted_contacts = [] | |
# for contact in contact_list: | |
# if "number" in contact: selector = "number" | |
# if "postal_code" in contact: selector = "postal_code" | |
# if "address" in contact: selector = "address" | |
# if contact["primary"]: | |
# return contact[selector] | |
# else: | |
# extracted_contacts.append(contact[selector]) | |
# if extracted_contacts: | |
# return extracted_contacts[0] | |
# return None | |
# people.convert_column('phone_numbers', get_primary_contact_or_first) | |
# people.convert_column('postal_addresses', get_primary_contact_or_first) | |
# people.convert_column('email_addresses', get_primary_contact_or_first) | |
""" | |
Pretty cool, right? But there's other transformations we might want to do. | |
For example, maybe we want to alphabetize the first letter of people's first and last names. Or maybe | |
we want to rename the column. | |
""" | |
# def capitalize_names(name): | |
# if name: | |
# name = name.capitalize() | |
# return name | |
# people.convert_column("given_name", capitalize_names) | |
# people.rename_column("given_name", "first_name") | |
""" | |
There's a lot of different transformations we can try out! | |
https://move-coop.github.io/parsons/html/stable/table.html#parsons-table-transformations | |
Another one that might be useful here is "deduplicate". | |
""" | |
# One key | |
people.deduplicate("email_addresses") | |
# Two or more keys | |
people.deduplicate(["given_name", "family_name"]) | |
""" | |
You can also select only specific rows | |
""" | |
# created_last_year = people.select_rows("'2022-' in {created_date}") | |
# print("Created last year: ") | |
# print(created_last_year) | |
# has_phone_number = people.select_rows(lambda row: row.phone_numbers) | |
# print("Has phone number: ") | |
# print(has_phone_number) | |
""" | |
You can also access underlying PETL methods with .table | |
And, for a connector that's based on an underlying package, you can access it with (usually) .client | |
(This works for Google Sheets but not Action Network) | |
""" |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment