🤸‍♀️
statistician playing in everyone's backyard ~ John Tukey

Mindy Ng mindyng

@mindyng
mindyng / gist:35b418e11ff480815841080bbb7cf71d
Last active February 17, 2017 04:10
Mode Analytics Case Study: Validating A/B Test Results
Validating A/B Test Results for Yammer
Possible Causes of Increased Messages in the Treatment Group
1. The metric may need to be redefined
2. Calculation errors
3. Users were not randomly assigned, which would bias the test setup
4. A confounding factor that is hard to detect but still affects the test results
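Cause 3 can be partly checked before digging into the metric itself: if users were randomly assigned to a designed split, the observed group sizes should match that split. A sample ratio mismatch check is one standard way to test this. The sketch below assumes a 50/50 design and uses a chi-square goodness-of-fit test with one degree of freedom; the function name and the counts in the usage line are hypothetical, not Yammer's data, and a passing check does not rule out the other causes.

```python
import math


def sample_ratio_mismatch(n_control, n_treatment, expected_share=0.5):
    """Chi-square goodness-of-fit test (1 degree of freedom) comparing the
    observed group sizes with the split the experiment was designed for.

    expected_share is the designed fraction of users in the treatment group
    (assumed 0.5 here). Returns (chi2 statistic, p-value).
    """
    total = n_control + n_treatment
    expected_control = total * (1 - expected_share)
    expected_treatment = total * expected_share
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # With 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: a very small p-value suggests assignment was not as designed.
print(sample_ratio_mismatch(1000, 1200))
```

A tiny p-value here would point toward a faulty test setup rather than a real treatment effect.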
mindyng / gist:62c76312d969da12997572a41314513a
Created March 1, 2017 08:14
Meetup #2 - Data Science Career Inspiration Night
I was at first unwilling to attend tonight's meetup because it seemed aimed at people who were still exploring Data Science as a career, and I was already committed to Data Science. So maybe not tonight.
However, the point of going to these meetups was not just to hear advice but to meet and network with others in the Data Science space. And I was able to meet Michelle Kelsey, who is part of IBM's Watson Cognitive team. This really excited me because I was first exposed to IBM's Watson Cognitive through Serena's Watson: she was able to feed her play data into Watson, which predicted the best moves for her next game.
The same cognitive solution was demonstrated with the CogniToys Dino, a toy that learns, remembers, and responds through dialogue with the user. This got the best of me: now I want my own CogniToys Dino. I ended up meeting Michelle face-to-face as planned, got her business card, and took up her offer to meet again in the city (SF) sometime in April to discuss more.
mindyng / gist:baca4114da1af15a43fca0458e40c147
Last active March 10, 2017 03:59
Meetup #2 - Inspirational Night
Last night, I got to meet a huge variety of people interested in data analytics/science! I really enjoyed meeting people who had recently whetted their appetite for data science, as well as people so seasoned in the field that writing 1,000 lines of code is a breeze and who talk about R as if they are more fluent in it than in English. A nice surprise was that Math and Economics professors added their input to the discussion on whether Peer Assisted Learning would help raise performance in STEM classes offered at Sacramento State University. This talk taught me that a result from an experiment can always be questioned: the experiment design can be reassessed even after the experiment has been completed. So even after drawing conclusions from my own experiment, it is wise to keep an open mind to feedback.
The second talk was the one I was more interested in, since I have been wondering how to pick the model that best addresses my Capstone Project problem. I had a
mindyng / gist:3b97e11092140310253cb56a619f1324
Last active March 10, 2017 04:00
Capstone Project I Proposal
The problem: I want to assign a sentiment (+, -, or neutral) to each product review based on the words it uses. (Given a review, the goal is to predict the user's attitude.)
According to Wikipedia, sentiment analysis (sometimes known as opinion mining or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information.
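One common way to obtain +/-/neutral training labels is to derive them from each review's score. The sketch below is hypothetical: it assumes a 1 to 5 star scale with 3 as the neutral cut-off, neither of which is stated in the proposal.

```python
def score_to_sentiment(score, neutral=3):
    """Map a numeric review score to a '+', '-', or 'neutral' label.

    Assumes a 1-5 star scale where `neutral` (default 3) is the midpoint;
    both the scale and the cut-off are illustrative assumptions.
    """
    if score > neutral:
        return "+"
    if score < neutral:
        return "-"
    return "neutral"


print([score_to_sentiment(s) for s in (1, 3, 5)])  # ['-', 'neutral', '+']
```

Moving the cut-off (or dropping the neutral class) changes the class balance, so it is worth checking the score distribution first.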
My client would be amazon.com or another e-commerce giant that wants to know which of its products are well liked, so that it can invest accordingly. Based on my analysis, the company could make certain products more available or recommend similar products in order to retain and grow its customer base.
For negative sentiment, the client could research the drivers behind it, especially in relation to competitors. Where there is negative conversation, the client could reach out to those reviewers.
With sentime
The very first step after downloading and unzipping the dataset was to import all 8 separate .csv files as individual pandas data frames. Each data frame has one review per row and 4 columns (from left to right): “Review Score”, “Tail of Review URL”, “Review Title” and “Review Text”.
All reviews were combined into one big data frame to make data wrangling, such as applying functions to it, easier.
Then the “Review Score” and “Review Text” columns were separated out as their own variables, since these would be the main objects handled by the Machine Learning algorithm.
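The loading steps above can be sketched with pandas as follows. This assumes the eight files sit in one folder matched by a glob pattern and have no header row; the `load_reviews` helper and the pattern are hypothetical, while the column names are the four listed above.

```python
import glob

import pandas as pd

# Column order as described above; the files are assumed to have no header row.
COLUMNS = ["Review Score", "Tail of Review URL", "Review Title", "Review Text"]


def load_reviews(pattern):
    """Read every CSV matching `pattern`, stack them into one DataFrame,
    and split out the score (target) and text (feature) columns."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no CSV files match {pattern!r}")
    frames = [pd.read_csv(path, header=None, names=COLUMNS) for path in paths]
    # ignore_index gives the combined frame a fresh 0..n-1 index.
    reviews = pd.concat(frames, ignore_index=True)
    scores = reviews["Review Score"]
    texts = reviews["Review Text"]
    return reviews, scores, texts
```

For example, `load_reviews("reviews/*.csv")` would return the combined frame along with the score and text columns, ready to be passed to a model.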
mindyng / binary_search_tree.py
Created December 16, 2020 21:53
TestDome Python "Basic" Practice Q's
# Binary search tree (BST) is a binary tree where the value of each node is larger than or equal to the values in all the nodes in that node's left subtree
# and is smaller than the values in all the nodes in that node's right subtree.
# Write a function that, efficiently with respect to time used, checks if a given binary search tree contains a given value.
# For example, for the following tree:
# n1 (Value: 1, Left: null, Right: null)
# n2 (Value: 2, Left: n1, Right: n3)
# n3 (Value: 3, Left: null, Right: null)
# Call to contains(n2, 3) should return True since a tree with root at n2 contains number 3.
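A minimal sketch of `contains`, assuming a simple `Node` class with `value`, `left`, and `right` attributes (the gist's own node type is not shown here, so this class is hypothetical). Because each comparison discards one subtree, the search follows a single root-to-leaf path and runs in O(h) time, where h is the tree height.

```python
class Node:
    """Hypothetical BST node with a value and optional left/right children."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def contains(root, value):
    """Return True if the BST rooted at `root` contains `value`."""
    node = root
    while node is not None:
        if value == node.value:
            return True
        # Smaller values can only be in the left subtree, larger ones in the right.
        node = node.left if value < node.value else node.right
    return False


# The example tree from the problem statement.
n1 = Node(1)
n3 = Node(3)
n2 = Node(2, n1, n3)
print(contains(n2, 3))  # True
```

An iterative loop is used instead of recursion so very deep (unbalanced) trees cannot hit Python's recursion limit.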
mindyng / app_session.sql
Last active January 24, 2024 08:23
TestDome SQL Practice Q's
/*App usage data are kept in the following table:
TABLE sessions
id INTEGER PRIMARY KEY,
userId INTEGER NOT NULL,
duration DECIMAL NOT NULL
Write a query that selects userId and average session duration for each user who has more than one session.*/
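One possible answer, written as a sketch: group sessions by `userId`, keep only groups with more than one row via `HAVING`, and average `duration`. To keep the examples in one language, the query is run here against an in-memory SQLite database through Python's `sqlite3` module; SQLite is an assumption (the grader may use a different engine), the `repeat_user_averages` helper is hypothetical, and the sample rows are made up.

```python
import sqlite3

# Candidate answer: one row per user who has more than one session.
# ORDER BY is not required by the question; it only makes the output deterministic.
QUERY = """
SELECT userId, AVG(duration) AS averageDuration
FROM sessions
GROUP BY userId
HAVING COUNT(*) > 1
ORDER BY userId
"""


def repeat_user_averages(rows):
    """Load (id, userId, duration) tuples into an in-memory `sessions`
    table mirroring the schema above, then run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sessions ("
            "id INTEGER PRIMARY KEY, "
            "userId INTEGER NOT NULL, "
            "duration DECIMAL NOT NULL)"
        )
        conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# Hypothetical data: user 1 has two sessions, user 2 has only one.
print(repeat_user_averages([(1, 1, 10), (2, 2, 18), (3, 1, 14)]))  # [(1, 12.0)]
```

Filtering with `HAVING` rather than `WHERE` matters here, because the session count only exists after grouping.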
-- Example case create statement: