Skip to content

Instantly share code, notes, and snippets.

@mrecos
Created October 25, 2022 12:43
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save mrecos/09aa0c1cd4cd04731b01d102f5239040 to your computer and use it in GitHub Desktop.
Save mrecos/09aa0c1cd4cd04731b01d102f5239040 to your computer and use it in GitHub Desktop.
Code snippet to query O/T Pairs
## DR Set up
# predict on all 65M rows of DRPrimaryv4.csv
# using deployment [BOOKED predcitons] (634728e0ba6623c209179662)
# using model [eXtreme Gradient Boosted Trees Classifier with Early Stopping (learning rate =0.01) (Fast Feature Binning) M60 BP52] (63457d2fcc011ba8bee7ec78)
# from project Test1004 created by Adam
## Post-processing (R & SQL)
# load predictions (result-6356a08dabe595d288af9a3e.csv) into SQLite DB
sqlite_file <- "AMN.sqlite"
table_name <- "DRPrimaryv4_Test1004"
# Run this query to assign ranks to DR predictions and return the Booked pairs
query1 <- "SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY `TRAVELERID` ORDER BY `BOOKED_1_PREDICTION` DESC ROWS UNBOUNDED PRECEDING) AS `DR_rank`
FROM `DRPrimaryv4_Test1004`
WHERE (NOT((`MODEL_RANK` IS NULL)) AND `MODEL_RANK` >= 1.0)
)
WHERE (`BOOKED` = 1.0)"
# Collect results and pivot for plotting
x4 <- dbGetQuery(mydb, query1)
x5 <- x4 %>%
select(DR_rank, MODEL_RANK) %>%
pivot_longer(everything(), names_to = "Model", values_to = "Rank") %>%
mutate(class = cut(Rank, breaks=seq(0,600,20)))
# Plot results (a different looking plot from Cory, but same data)
ggplot(x5, aes(x = class, fill = Model)) +
geom_bar(position="dodge",stat = "count") +
scale_y_continuous(labels = seq(0,260,20),
breaks = seq(0,260,20)) +
scale_fill_DataRobot(palette = "DR_Diverging") +
labs(x = "Prediction Rank") +
theme_bw() +
# facet_wrap(~Model) +
theme(
axis.text.x = element_text(angle = -90, hjust = 0)
)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment