mlbach/README.md Secret

## README.md

      
    Raw
  

              README.md
            
          
    Database guide for the U4S SignalDetectionTool

This gist will teach you how to hook up your database to the United4Surveillance SignalDetectionTool Shiny app.
Prerequisites


R is installed on your system
The SignalDetectionTool is installed as a R package
You have access to a database
You have working knowledge of R

Three-step process to connect your database


Load the SDT package: library(SignalDetectionTool)
Define custom functions that retrieve data from your DB
Monkey patch DB related package functions

Detailed steps & explanations

Recently the SignalDetectionTool added the ability to start the app with a pre-defined configuration.
There is currently full support for loading data from a file and in the future this will be extended to fully support data retrieval from relational databases.
In the meantime you can make use of the tool's database mock functions and apply monkey patching to overwrite existing package functions.
The functions of interest are contained in the package's file R/database.R:

get_database_connection()
query_database()
transform_required_format()
load_data_db()

As stated above, these functions are merely mocks of real implementations and currently only load data from an example SQLite database file.
To actually connect the Shiny app to your own database right now, you need to write custom functions and monkey patch the package's equivalents.
This gist's file db_functions.R shows an example of how to implement custom functions based on the assumption that the DB is on a MS SQL Server with this schema.
1. Adjust the database connections

Here we use the data config YAML of the SignalDetectionTool and added new scalars server, database, driver and port to the db mapping. You would need to fill these with your actual values.
To retrieve the values from the config file we use the package's function get_data_config_value() and we supply a string with the parameter name, separating each nested level with a colon (i.e. "datasource:db:driver").
Alternatively, you could also get these from system environment variables like this:
get_env_config <- function() {
  list(
    host = Sys.getenv("DB_HOST", unset = "localhost"),
    port = as.integer(Sys.getenv("DB_PORT", unset = "5432")),
    driver = Sys.getenv("DB_DRIVER", unset = "postgresql"),
    database = Sys.getenv("DB_NAME", unset = "test_db")
  )
}

2. Adjust the authentication and handling of credentials

In the example we use the keyring package to get the username and password from the OS credentials manager.
If your login details are not stored there yet, you first have to save them there using keyring::key_set().
There are of course other alternatives and you might want to read about best practices of managing credentials.
3. Adjust the SQL query to your DB's schema

Depending on your database schema you need to adjust the SQL query (in query_database()). Certainly you will have different table structures, attribute names and you might have different data availabilities.
If you are not familiar with SQL you could also try dbplyr. This allows you to write standard dplyr style code and when you want to collect the data from the DB, dbplyr will write the necessary query and send it to the DB.
4. Adjust the data transformation

Depending on the format of the data you retrieve from your DB you will need to do some transformations. This can be done by adjusting transform_required_format() inside db_functions.R to your needs.
The SDT is rather strict at the moment. Your data needs to adhere to the required format.
If you want to perform the transformations on the SQL server side you can also create a that does nothing i.e. transform_required_format <- function(data){data)}.
How to monkey patch package functions

After you have defined your custom functions you need to call the monkey_patch function from monkey_patching.R function like this:
monkey_patch(your_function, “SignalDetectionTool", "function_name_to_replace")

It doesn't matter how you name your custom functions as long as you substitute them for the tool's functions listed above.
Run the Shiny app with your custom settings and database connection

Now start the Shiny app using
run_app(path_to_yaml='your/path/config.yml').
If you have any troubles you can read more about the data configuration file.

  
## data_config.yml
default:
  datasource:
    file: FALSE # TRUE when external csv or excel should be used, FALSE when db should be used
    filepath: ~
    db: # If using access to db
      server: your_server_name
      database: your_database_name
      driver: "SQL Server"
      port: 5432
  params:
    pathogen: Pertussis
    strata:
      - age_group
      - community
    signal_detection_period: 6
    signal_detection_algorithm: ears
  shapefile_path: ~

pertussis:
  params:
    pathogen: Pertussis
    strata:
      - age_group
      - sex
    signal_detection_period: 6
    signal_detection_algorithm: glm harmonic
    pandemic_correction: TRUE
    intervention_date: 2020-03-18
  shapefile_path: ~


salmonella:
  params:
    pathogen: Salmonella
    strata:
      - subtype
      - county
      - sex
    signal_detection_algorithm: farrington
  shapefile_path: ~

## db_functions.R
library(DBI)
library(dplyr)
library(keyring)

#' Establish a Database Connection
#'
#' @description
#' This function retrieves database credentials from the keyring (make sure your keyring keys
#' "db_user" and "db_password" are configured) and establishes a connection to a Microsoft SQL Server
#' using the \code{odbc} package. Adjust the \code{Driver}, \code{Server}, and \code{Database} parameters
#' according to your database setup.
#'
#' @return A \code{DBIConnection} object that represents an active connection to the database.
#'
#' @details
#' The connection is established via the \code{dbConnect} function from the \code{DBI} package.
#' Ensure that the ODBC driver for SQL Server is installed and properly configured on your system.
#'
#' @import DBI
#' @import odbc
#' @import keyring
#'
#' @examples
#' \dontrun{
#'   con <- get_database_connection()
#'   # Use the connection 'con' to query the database, then disconnect when finished.
#'   dbDisconnect(con)
#' }
#'
#' @export
get_database_connection <- function() {

  # Retrieve credentials from keyring; ensure your keyring keys ("db_user", "db_password") are set up
  user <- key_get("db_user")
  password <- key_get("db_password")

  # Connect to the database (uses get_data_config_value() from SignalDetectionTool)
  con <- dbConnect(odbc::odbc(),
                   Driver   = get_data_config_value("datasource:db:driver"),
                   Server   = get_data_config_value("datasource:db:server"),
                   Database = get_data_config_value("datasource:db:database"),
                   UID      = user,
                   PWD      = password)
  return(con)
}

#' Query the Database and Join Tables
#'
#' @description
#' This function executes an SQL query that retrieves raw data from several tables (Cases, Regions,
#' Pathogens, and Subtypes) and performs multiple JOIN operations to assemble a region hierarchy.
#'
#' @param dbcon A \code{DBIConnection} object obtained from \code{get_database_connection}.
#'
#' @return A data frame containing the results of the query which includes fields from cases, pathogens, subtypes,
#' and the region hierarchy (community, county, state, country).
#'
#' @details
#' The SQL query joins the \code{Regions} table three times to capture the hierarchical structure:
#' \itemize{
#'   \item \code{r3}: The NUTS-3 level (community) corresponding to \code{Cases.region_id}.
#'   \item \code{r2}: The parent region (NUTS-2, e.g., county).
#'   \item \code{r1}: The parent of \code{r2} (NUTS-1, e.g., state).
#'   \item \code{r0}: The parent of \code{r1} (NUTS-0, e.g., country).
#' }
#' Additional joins include linking to the \code{Pathogens} table and optionally the \code{Subtypes} table.
#'
#' @import DBI
#'
#' @examples
#' \dontrun{
#'   con <- get_database_connection()
#'   df <- query_database(con)
#' }
#'
#' @export
query_database <- function(dbcon) {

  query <- "
    SELECT
      c.case_id,
      c.report_date,
      c.symptom_start_date,
      c.hospitalization_date,
      c.death_date,
      c.vaccination_date,
      c.age,
      c.sex,
      c.occupation,
      p.pathogen_id,
      p.pathogen_name,
      s.subtype_id,
      s.subtype_name,
      c.hospitalization,
      c.death,
      c.vaccination_status,
      r3.region_name AS community,
      r3.region_id AS community_id,
      r2.region_name AS county,
      r2.region_id AS county_id,
      r1.region_name AS state,
      r1.region_id AS state_id,
      r0.region_name AS country,
      r0.region_id AS country_id
    FROM Cases c
    JOIN Regions r3 ON c.region_id = r3.region_id
    LEFT JOIN Regions r2 ON r3.parent_region_id = r2.region_id
    LEFT JOIN Regions r1 ON r2.parent_region_id = r1.region_id
    LEFT JOIN Regions r0 ON r1.parent_region_id = r0.region_id
    JOIN Pathogens p ON c.pathogen_id = p.pathogen_id
    LEFT JOIN Subtypes s ON c.subtype_id = s.subtype_id
  "
  dbdata <- dbGetQuery(dbcon, query)
  return(dbdata)
}

#' Transform Queried Data into the Required Output Format
#'
#' @description
#' This function transforms the raw queried data into a format suitable for analysis and reporting.
#' It converts date columns to character strings, categorizes age into 10-year intervals, and converts
#' logical fields (hospitalization, death, vaccination status) to user-friendly character labels.
#'
#' @param dbdata A data frame obtained from \code{query_database} containing the raw query results.
#'
#' @return A transformed data frame with:
#' \itemize{
#'   \item Date fields converted to character strings (formatted as ISO dates).
#'   \item A new \code{age_group} column categorizing ages in 10-year intervals.
#'   \item Logical fields converted to character labels ("yes", "no", "unknown").
#'   \item Columns renamed and ordered to match the required output specification.
#' }
#'
#' @details
#' - The function uses \code{dplyr} functions for data manipulation, including \code{mutate} and \code{select}.
#' - Date columns (\code{report_date}, \code{symptom_start_date}, \code{hospitalization_date}, \code{death_date}, \code{vaccination_date})
#'   are converted to \code{Date} objects and then to character strings.
#' - Ages are grouped into ranges such as \code{"<10"}, \code{"10-19"}, \code{"20-29"}, etc.
#' - Fields that are logical (or have missing values) for \code{hospitalization}, \code{death}, and \code{vaccination_status} are
#'   mapped to \code{"yes"}, \code{"no"}, or \code{"unknown"}.
#'
#' @import dplyr
#'
#' @examples
#' \dontrun{
#'   df <- query_database(get_database_connection())
#'   transformed_df <- transform_required_format(df)
#' }
#'
#' @export
transform_required_format <- function(dbdata) {

  dbdata %>%
    # Transform date columns, create new variables, and duplicate region columns
    mutate(
      date_report          = as.character(as.Date(report_date)),
      date_onset           = as.character(as.Date(symptom_start_date)),
      date_hospitalization = as.character(as.Date(hospitalization_date)),
      date_death           = as.character(as.Date(death_date)),
      date_vaccination     = as.character(as.Date(vaccination_date)),

      # Create age_group with 10-year intervals; if age is missing, set as NA.
      age_group = ifelse(!is.na(age),
                         as.character(
                           cut(age,
                               breaks = c(-Inf, 9, 19, 29, 39, 49, 59, 69, 79, 89, 99, Inf),
                               labels = c("<10", "10-19", "20-29", "30-39", "40-49", "50-59",
                                          "60-69", "70-79", "80-89", "90-99", "100+"),
                               right = TRUE)
                         ),
                         NA),

      # Convert logical fields to character labels ("yes", "no", "unknown")
      hospitalization = ifelse(is.na(hospitalization), "unknown",
                               ifelse(hospitalization, "yes", "no")),
      death           = ifelse(is.na(death), "unknown",
                               ifelse(death, "yes", "no")),
      vaccination     = ifelse(is.na(vaccination_status), "unknown",
                               ifelse(vaccination_status, "yes", "no"))
    ) %>%
    # Select and rename columns to match the required final output order and naming
    select(
      case_id,
      date_report,
      date_onset,
      date_hospitalization,
      date_death,
      date_vaccination,
      country,
      country_id,
      state,
      state_id,
      county,
      county_id,
      community,
      community_id,
      age,
      age_group,
      sex,
      occupation,
      pathogen_name,
      pathogen_id,
      subtype_name,
      subtype_id,
      hospitalization,
      death,
      vaccination
    ) %>%
    # Rename columns to match output specification
    rename(
      pathogen = pathogen_name,
      subtype  = subtype_name
    )
}

## db_schema.sql
-- Table: Cases
CREATE TABLE Cases (
    case_id INT PRIMARY KEY,  -- Unique case identifier (INT)
    report_date DATE NOT NULL,  -- Date when the case was reported
    symptom_start_date DATE,  -- Date when symptoms started
    hospitalization_date DATE,  -- Date when admitted to the hospital
    death_date DATE,  -- Date of death
    vaccination_date DATE,  -- Date of vaccination
    age INT CHECK(age BETWEEN 0 AND 125),  -- Age of the patient (in years)
    sex VARCHAR(10),  -- Sex of the reported case (e.g., 'male', 'female', 'diverse')
    occupation VARCHAR(50),  -- Occupation (e.g., 'care', 'kindergarten')
    region_id VINT,  -- Reporting region of the case (NUTS-3)
    pathogen_id INT,  -- Foreign key to Pathogens table (INT)
    subtype_id INT,  -- Foreign key to Subtypes table (INT)
    hospitalization BOOLEAN,  -- Whether the case was hospitalized (TRUE/FALSE)
    death BOOLEAN,  -- Whether the case died (TRUE/FALSE)
    vaccination_status BOOLEAN,  -- Whether vaccinated (TRUE/FALSE)
	FOREIGN KEY (region_id) REFERENCES Regions(region_id),
    FOREIGN KEY (pathogen_id) REFERENCES Pathogens(pathogen_id),
    FOREIGN KEY (subtype_id) REFERENCES Subtypes(subtype_id)
);

-- Table: Regions
CREATE TABLE Regions (
    region_id INT PRIMARY KEY,  -- Unique ID for each region (INT)
    region_name VARCHAR(255) NOT NULL,  -- Name of the region (e.g., country, state, county, community)
    region_level INT,  -- Indicates level in the hierarchy (1 = country, 2 = state, etc.)
    parent_region_id INT,  -- Foreign key to the parent region (null for top-level)
    region_code VARCHAR(20),  -- Region code (e.g., ISO, NUTS)
    FOREIGN KEY (parent_region_id) REFERENCES Regions(region_id)
);

-- Table: Pathogens
CREATE TABLE Pathogens (
    pathogen_id INT PRIMARY KEY,  -- Unique ID for each pathogen (INT)
    pathogen_name VARCHAR(255) NOT NULL  -- Name of the pathogen (e.g., 'COVID-19')
);

-- Table: Subtypes
CREATE TABLE Subtypes (
    subtype_id INT PRIMARY KEY,  -- Unique ID for each subtype (INT)
    subtype_name VARCHAR(255),  -- Name of the subtype (e.g., 'Alpha variant')
    pathogen_id INT,  -- Foreign key to the Pathogens table (INT)
    FOREIGN KEY (pathogen_id) REFERENCES Pathogens(pathogen_id)
);

## monkey_patching.R
monkey_patch <- function(func, package_name, their_function_name){
  envir <- asNamespace(package_name)
  unlockBinding(their_function_name, envir)
  assign(their_function_name, func , envir = envir)
}

library(SignalDetectionTool)
source("db_functions.R")

monkey_patch(get_database_connection, "SignalDetectionTool", "get_database_connection")
monkey_patch(query_database, "SignalDetectionTool", "query_database")
monkey_patch(transform_required_format, "SignalDetectionTool", "transform_required_format")

# start the SignalDetectionTool app with custom functions and configuration
run_app(path_to_yaml = "./data_config.yml")
	default:
	datasource:
	file: FALSE # TRUE when external csv or excel should be used, FALSE when db should be used
	filepath: ~
	db: # If using access to db
	server: your_server_name
	database: your_database_name
	driver: "SQL Server"
	port: 5432
	params:
	pathogen: Pertussis
	strata:
	- age_group
	- community
	signal_detection_period: 6
	signal_detection_algorithm: ears
	shapefile_path: ~

	pertussis:
	params:
	pathogen: Pertussis
	strata:
	- age_group
	- sex
	signal_detection_period: 6
	signal_detection_algorithm: glm harmonic
	pandemic_correction: TRUE
	intervention_date: 2020-03-18
	shapefile_path: ~


	salmonella:
	params:
	pathogen: Salmonella
	strata:
	- subtype
	- county
	- sex
	signal_detection_algorithm: farrington
	shapefile_path: ~
	library(DBI)
	library(dplyr)
	library(keyring)

	#' Establish a Database Connection
	#'
	#' @description
	#' This function retrieves database credentials from the keyring (make sure your keyring keys
	#' "db_user" and "db_password" are configured) and establishes a connection to a Microsoft SQL Server
	#' using the \code{odbc} package. Adjust the \code{Driver}, \code{Server}, and \code{Database} parameters
	#' according to your database setup.
	#'
	#' @return A \code{DBIConnection} object that represents an active connection to the database.
	#'
	#' @details
	#' The connection is established via the \code{dbConnect} function from the \code{DBI} package.
	#' Ensure that the ODBC driver for SQL Server is installed and properly configured on your system.
	#'
	#' @import DBI
	#' @import odbc
	#' @import keyring
	#'
	#' @examples
	#' \dontrun{
	#' con <- get_database_connection()
	#' # Use the connection 'con' to query the database, then disconnect when finished.
	#' dbDisconnect(con)
	#' }
	#'
	#' @export
	get_database_connection <- function() {

	# Retrieve credentials from keyring; ensure your keyring keys ("db_user", "db_password") are set up
	user <- key_get("db_user")
	password <- key_get("db_password")

	# Connect to the database (uses get_data_config_value() from SignalDetectionTool)
	con <- dbConnect(odbc::odbc(),
	Driver = get_data_config_value("datasource:db:driver"),
	Server = get_data_config_value("datasource:db:server"),
	Database = get_data_config_value("datasource:db:database"),
	UID = user,
	PWD = password)
	return(con)
	}

	#' Query the Database and Join Tables
	#'
	#' @description
	#' This function executes an SQL query that retrieves raw data from several tables (Cases, Regions,
	#' Pathogens, and Subtypes) and performs multiple JOIN operations to assemble a region hierarchy.
	#'
	#' @param dbcon A \code{DBIConnection} object obtained from \code{get_database_connection}.
	#'
	#' @return A data frame containing the results of the query which includes fields from cases, pathogens, subtypes,
	#' and the region hierarchy (community, county, state, country).
	#'
	#' @details
	#' The SQL query joins the \code{Regions} table three times to capture the hierarchical structure:
	#' \itemize{
	#' \item \code{r3}: The NUTS-3 level (community) corresponding to \code{Cases.region_id}.
	#' \item \code{r2}: The parent region (NUTS-2, e.g., county).
	#' \item \code{r1}: The parent of \code{r2} (NUTS-1, e.g., state).
	#' \item \code{r0}: The parent of \code{r1} (NUTS-0, e.g., country).
	#' }
	#' Additional joins include linking to the \code{Pathogens} table and optionally the \code{Subtypes} table.
	#'
	#' @import DBI
	#'
	#' @examples
	#' \dontrun{
	#' con <- get_database_connection()
	#' df <- query_database(con)
	#' }
	#'
	#' @export
	query_database <- function(dbcon) {

	query <- "
	SELECT
	c.case_id,
	c.report_date,
	c.symptom_start_date,
	c.hospitalization_date,
	c.death_date,
	c.vaccination_date,
	c.age,
	c.sex,
	c.occupation,
	p.pathogen_id,
	p.pathogen_name,
	s.subtype_id,
	s.subtype_name,
	c.hospitalization,
	c.death,
	c.vaccination_status,
	r3.region_name AS community,
	r3.region_id AS community_id,
	r2.region_name AS county,
	r2.region_id AS county_id,
	r1.region_name AS state,
	r1.region_id AS state_id,
	r0.region_name AS country,
	r0.region_id AS country_id
	FROM Cases c
	JOIN Regions r3 ON c.region_id = r3.region_id
	LEFT JOIN Regions r2 ON r3.parent_region_id = r2.region_id
	LEFT JOIN Regions r1 ON r2.parent_region_id = r1.region_id
	LEFT JOIN Regions r0 ON r1.parent_region_id = r0.region_id
	JOIN Pathogens p ON c.pathogen_id = p.pathogen_id
	LEFT JOIN Subtypes s ON c.subtype_id = s.subtype_id
	"
	dbdata <- dbGetQuery(dbcon, query)
	return(dbdata)
	}

	#' Transform Queried Data into the Required Output Format
	#'
	#' @description
	#' This function transforms the raw queried data into a format suitable for analysis and reporting.
	#' It converts date columns to character strings, categorizes age into 10-year intervals, and converts
	#' logical fields (hospitalization, death, vaccination status) to user-friendly character labels.
	#'
	#' @param dbdata A data frame obtained from \code{query_database} containing the raw query results.
	#'
	#' @return A transformed data frame with:
	#' \itemize{
	#' \item Date fields converted to character strings (formatted as ISO dates).
	#' \item A new \code{age_group} column categorizing ages in 10-year intervals.
	#' \item Logical fields converted to character labels ("yes", "no", "unknown").
	#' \item Columns renamed and ordered to match the required output specification.
	#' }
	#'
	#' @details
	#' - The function uses \code{dplyr} functions for data manipulation, including \code{mutate} and \code{select}.
	#' - Date columns (\code{report_date}, \code{symptom_start_date}, \code{hospitalization_date}, \code{death_date}, \code{vaccination_date})
	#' are converted to \code{Date} objects and then to character strings.
	#' - Ages are grouped into ranges such as \code{"<10"}, \code{"10-19"}, \code{"20-29"}, etc.
	#' - Fields that are logical (or have missing values) for \code{hospitalization}, \code{death}, and \code{vaccination_status} are
	#' mapped to \code{"yes"}, \code{"no"}, or \code{"unknown"}.
	#'
	#' @import dplyr
	#'
	#' @examples
	#' \dontrun{
	#' df <- query_database(get_database_connection())
	#' transformed_df <- transform_required_format(df)
	#' }
	#'
	#' @export
	transform_required_format <- function(dbdata) {

	dbdata %>%
	# Transform date columns, create new variables, and duplicate region columns
	mutate(
	date_report = as.character(as.Date(report_date)),
	date_onset = as.character(as.Date(symptom_start_date)),
	date_hospitalization = as.character(as.Date(hospitalization_date)),
	date_death = as.character(as.Date(death_date)),
	date_vaccination = as.character(as.Date(vaccination_date)),

	# Create age_group with 10-year intervals; if age is missing, set as NA.
	age_group = ifelse(!is.na(age),
	as.character(
	cut(age,
	breaks = c(-Inf, 9, 19, 29, 39, 49, 59, 69, 79, 89, 99, Inf),
	labels = c("<10", "10-19", "20-29", "30-39", "40-49", "50-59",
	"60-69", "70-79", "80-89", "90-99", "100+"),
	right = TRUE)
	),
	NA),

	# Convert logical fields to character labels ("yes", "no", "unknown")
	hospitalization = ifelse(is.na(hospitalization), "unknown",
	ifelse(hospitalization, "yes", "no")),
	death = ifelse(is.na(death), "unknown",
	ifelse(death, "yes", "no")),
	vaccination = ifelse(is.na(vaccination_status), "unknown",
	ifelse(vaccination_status, "yes", "no"))
	) %>%
	# Select and rename columns to match the required final output order and naming
	select(
	case_id,
	date_report,
	date_onset,
	date_hospitalization,
	date_death,
	date_vaccination,
	country,
	country_id,
	state,
	state_id,
	county,
	county_id,
	community,
	community_id,
	age,
	age_group,
	sex,
	occupation,
	pathogen_name,
	pathogen_id,
	subtype_name,
	subtype_id,
	hospitalization,
	death,
	vaccination
	) %>%
	# Rename columns to match output specification
	rename(
	pathogen = pathogen_name,
	subtype = subtype_name
	)
	}
	-- Table: Cases
	CREATE TABLE Cases (
	case_id INT PRIMARY KEY, -- Unique case identifier (INT)
	report_date DATE NOT NULL, -- Date when the case was reported
	symptom_start_date DATE, -- Date when symptoms started
	hospitalization_date DATE, -- Date when admitted to the hospital
	death_date DATE, -- Date of death
	vaccination_date DATE, -- Date of vaccination
	age INT CHECK(age BETWEEN 0 AND 125), -- Age of the patient (in years)
	sex VARCHAR(10), -- Sex of the reported case (e.g., 'male', 'female', 'diverse')
	occupation VARCHAR(50), -- Occupation (e.g., 'care', 'kindergarten')
	region_id VINT, -- Reporting region of the case (NUTS-3)
	pathogen_id INT, -- Foreign key to Pathogens table (INT)
	subtype_id INT, -- Foreign key to Subtypes table (INT)
	hospitalization BOOLEAN, -- Whether the case was hospitalized (TRUE/FALSE)
	death BOOLEAN, -- Whether the case died (TRUE/FALSE)
	vaccination_status BOOLEAN, -- Whether vaccinated (TRUE/FALSE)
	FOREIGN KEY (region_id) REFERENCES Regions(region_id),
	FOREIGN KEY (pathogen_id) REFERENCES Pathogens(pathogen_id),
	FOREIGN KEY (subtype_id) REFERENCES Subtypes(subtype_id)
	);

	-- Table: Regions
	CREATE TABLE Regions (
	region_id INT PRIMARY KEY, -- Unique ID for each region (INT)
	region_name VARCHAR(255) NOT NULL, -- Name of the region (e.g., country, state, county, community)
	region_level INT, -- Indicates level in the hierarchy (1 = country, 2 = state, etc.)
	parent_region_id INT, -- Foreign key to the parent region (null for top-level)
	region_code VARCHAR(20), -- Region code (e.g., ISO, NUTS)
	FOREIGN KEY (parent_region_id) REFERENCES Regions(region_id)
	);

	-- Table: Pathogens
	CREATE TABLE Pathogens (
	pathogen_id INT PRIMARY KEY, -- Unique ID for each pathogen (INT)
	pathogen_name VARCHAR(255) NOT NULL -- Name of the pathogen (e.g., 'COVID-19')
	);

	-- Table: Subtypes
	CREATE TABLE Subtypes (
	subtype_id INT PRIMARY KEY, -- Unique ID for each subtype (INT)
	subtype_name VARCHAR(255), -- Name of the subtype (e.g., 'Alpha variant')
	pathogen_id INT, -- Foreign key to the Pathogens table (INT)
	FOREIGN KEY (pathogen_id) REFERENCES Pathogens(pathogen_id)
	);
	monkey_patch <- function(func, package_name, their_function_name){
	envir <- asNamespace(package_name)
	unlockBinding(their_function_name, envir)
	assign(their_function_name, func , envir = envir)
	}

	library(SignalDetectionTool)
	source("db_functions.R")

	monkey_patch(get_database_connection, "SignalDetectionTool", "get_database_connection")
	monkey_patch(query_database, "SignalDetectionTool", "query_database")
	monkey_patch(transform_required_format, "SignalDetectionTool", "transform_required_format")

	# start the SignalDetectionTool app with custom functions and configuration
	run_app(path_to_yaml = "./data_config.yml")