@mroswell
Last active August 20, 2023 12:16
  • tctetv (total_cases_this_event_this_vax) [integer] (should match ctetv in table upper left, but collect anyway)

  • tcaetv (total_cases_ae_this_vax) [integer] (should match ctetv + coaetv, the left column of the table, but collect anyway)

  • total_anomaly1 [boolean] (append this field, and populate if tctetv != ctetv)

  • total_anomaly2 [boolean] (append this field, and populate if tcaetv != ctetv + coaetv)

--- from code ---

            # As written, this appears to compare tctetv with tcaetv;
            # the field definition above calls for tctetv != ctetv.
            "total_anomaly1": True if int(
                re.search(r"Total cases of this event for the vaccine:.*?<", response.text)
                .group().replace("Total cases of this event for the vaccine:", "").replace("<", "")) !=
                                      int(re.search(r"Total case reports associated with this vaccine:.*?<", response.text)
                                          .group().replace("Total case reports associated with this vaccine:", "").replace("<", "")) else "",
            # As written, this adds df[1].iloc[2, 1] to itself;
            # the field definition above calls for ctetv + coaetv.
            "total_anomaly2": True if int(
                re.search(r"Total case reports associated with this vaccine:.*?<", response.text)
                .group().replace("Total case reports associated with this vaccine:", "").replace("<", "")) !=
                                      int(df[1].iloc[2, 1]) + int(df[1].iloc[2, 1]) else ""
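
The excerpt above repeats the same search-and-strip steps for each labelled total. A minimal sketch of one way to factor that out, and to compute both anomaly flags the way the field definitions describe them, might look like this. The helper names `extract_total` and `total_anomalies`, the sample HTML fragment, and the handling of thousands separators are all assumptions for illustration; only the two label strings come from the excerpt.

```python
import re


def extract_total(html: str, label: str) -> int:
    """Return the integer that follows `label` and precedes the next '<'."""
    match = re.search(re.escape(label) + r"\s*([\d,]+)\s*<", html)
    if match is None:
        raise ValueError(f"no total found for label: {label!r}")
    # Assumption: totals might carry thousands separators; strip them.
    return int(match.group(1).replace(",", ""))


def total_anomalies(tctetv: int, tcaetv: int, ctetv: int, coaetv: int) -> dict:
    """Flag mismatches between the reported totals and the table cells."""
    return {
        "total_anomaly1": tctetv != ctetv,
        "total_anomaly2": tcaetv != ctetv + coaetv,
    }


# Hypothetical fragment of a result page, for illustration only.
sample_html = (
    "Total cases of this event for the vaccine: 12<br>"
    "Total case reports associated with this vaccine: 3,456<br>"
)
tctetv = extract_total(sample_html, "Total cases of this event for the vaccine:")
tcaetv = extract_total(sample_html, "Total case reports associated with this vaccine:")
flags = total_anomalies(tctetv, tcaetv, ctetv=12, coaetv=3000)
```

This sketch returns plain booleans. The excerpt stores `True` or an empty string instead, so the flag values would need adapting if that convention is kept.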

1. I do not want to drop data that comes from the original scrape, so keep 'tcaetv', 'tctetv', 'df'.
2. We can't drop records as it will impact our ability to calculate "other adverse events" that are important to PRR calculations. So we s

Scrape COVID-19 Vaccine Knowledge Base Statistical Analysis

URL to scrape: https://violinet.org/cov19vaxkb/cov19vaxafe/stat_analysis_cov19vae.php

VAERS = Vaccine Adverse Event Reporting System
AE = Adverse Events

Collect all combinations of the roughly 15,000 events and the 5 vaccine categories.
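
One way to enumerate those combinations is a Cartesian product of the event names and the vaccine categories. This is only a sketch: `all_combinations` is a hypothetical helper, the event list would have to come from the site itself, and the category names are the shortened ones given in the field list.

```python
from itertools import product

# Shortened vaccine category names from the field list.
VACCINES = ["Janssen", "Moderna", "Novavax", "Pfizer", "Other"]


def all_combinations(event_names):
    """Pair every event name with every vaccine category."""
    return list(product(event_names, VACCINES))


# Hypothetical event names, for illustration only.
pairs = all_combinations(["Headache", "Fever"])
```

With about 15,000 events this gives about 75,000 pairs, so each pair corresponds to one result page to fetch.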

Use short names. The parenthetical names are there to help you find them on the result page.

  • event_name [string]
  • vaccine [string] (Shorten to Janssen, Moderna, Novavax, Pfizer, Other)
  • ctetv (cases_this_event_this_vax) [integer] (upper left in table)
  • cteo (cases_this_event_other) [integer] (upper right in table)
  • coaetv (cases_other_ae_this_vax) [integer] (lower left in table)
  • coaeo (cases_other_ae_other) [integer] (lower right in table)
  • tctetv (total_cases_this_event_this_vax) [integer] (should match ctetv in table upper left, but collect anyway)
  • tcaetv (total_cases_ae_this_vax) [integer] (should match ctetv + coaetv, the left column of the table, but collect anyway)
  • percent (percent_total_this) [float] (Note: sometimes the result uses exponential notation as a string. Convert to float.)
  • prr [float]
  • x2 [float]
  • df [integer]
  • pv (p-value) [float]
  • significant [boolean]
  • prr_2plus [boolean] (append this field, and populate if PRR is > 2)
  • x2_4plus [boolean] (append this field, and populate if x2 > 4)
  • num_point2 [boolean, I think] (append this field, and populate if the number of cases > 0.2% of total reports. I think that means percent; use that for now.)
  • total_anomaly1 [boolean] (append this field, and populate if tctetv != ctetv)
  • total_anomaly2 [boolean] (append this field, and populate if tcaetv != ctetv + coaetv)
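
The percent conversion and the three threshold fields in the list above can be sketched as follows. The helper names `parse_percent` and `derived_flags` are hypothetical. The sketch also assumes that `percent` is already expressed in percent units (so 0.35 means 0.35%) and that a trailing "%" sign, if the page shows one, should be stripped.

```python
def parse_percent(raw: str) -> float:
    """Convert a percent string, possibly in exponential notation, to float."""
    # Assumption: a trailing '%' may or may not be present in the page text.
    return float(raw.strip().rstrip("%"))


def derived_flags(prr: float, x2: float, percent: float) -> dict:
    """Compute the appended threshold fields from the scraped values."""
    return {
        "prr_2plus": prr > 2,
        "x2_4plus": x2 > 4,
        # Assumption: 'percent' is in percent units, so compare against 0.2.
        "num_point2": percent > 0.2,
    }


small = parse_percent("1.2E-5")
flags = derived_flags(prr=2.5, x2=3.9, percent=parse_percent("0.35"))
```

Python's `float` accepts exponential notation directly, so no special-case parsing is needed for strings such as "1.2E-5".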

If references are always the same, ignore. If not, store.

Save as a CSV file.
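
A minimal sketch of the CSV step, using the short field names from the list above as the column order. `write_rows` is a hypothetical helper, and the in-memory buffer stands in for a real output file.

```python
import csv
import io

# Column order follows the field list in this document.
FIELDNAMES = [
    "event_name", "vaccine", "ctetv", "cteo", "coaetv", "coaeo",
    "tctetv", "tcaetv", "percent", "prr", "x2", "df", "pv",
    "significant", "prr_2plus", "x2_4plus", "num_point2",
    "total_anomaly1", "total_anomaly2",
]


def write_rows(rows, fileobj):
    """Write one CSV line per scraped record, with a header row first."""
    # restval="" leaves a blank cell for any field missing from a record.
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical partial record, for illustration only.
buffer = io.StringIO()
write_rows([{"event_name": "Headache", "vaccine": "Moderna", "ctetv": 12}], buffer)
lines = buffer.getvalue().splitlines()
```

When writing to a real file, opening it with `newline=""` is the usage the `csv` module documentation recommends, so that line endings are not doubled on some platforms.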
