Shafaq Aslam (shafaq-aslam)


| Role | Analytical Skills | Business Acumen | Data Storytelling | Soft Skills | Software Skills |
|---|---|---|---|---|---|
| Data Analyst | High | Medium to High | High | Medium to High | Medium |
| Data Engineer | Medium | Low | Low | Medium | High |
| Data Scientist | High | High | High | High | Medium |
| ML Engineer | Medium to High | Medium | Low | High | High |

| Instance-based Learning | Model-based Learning |
|---|---|
| Requires data preprocessing | Requires data preprocessing |
| No explicit training phase; generalizes from stored examples only when a new data point arrives | Builds a mathematical model from the training data to capture hidden patterns |
| No model to store | Stores the trained model for future predictions |
| Original training data must be kept for predictions | Training data can be discarded after the model is trained |
| Examples: k-Nearest Neighbors (kNN), Locally Weighted Regression | Examples: Linear Regression, Logistic Regression, Decision Trees, Neural Networks |
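
To make the contrast concrete, here is a minimal Python sketch on a small hypothetical dataset (not taken from any table here): kNN, an instance-based method, keeps every training pair and searches them at prediction time, while a least-squares line, a model-based method, reduces the data to two parameters. The data, `k=2`, and the function names `knn_predict` and `fit_line` are illustrative assumptions.

```python
# Hypothetical 1-D training data as (x, y) pairs; not taken from any table here.
TRAIN = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.9)]


def knn_predict(train, x, k=2):
    """Instance-based: no training step. At query time, average the
    targets of the k stored examples closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def fit_line(train):
    """Model-based: fit y = slope * x + intercept by ordinary least squares.
    Only the two returned parameters are needed for later predictions."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(TRAIN)
print(knn_predict(TRAIN, 2.6))   # needs TRAIN at prediction time
print(slope * 2.6 + intercept)   # needs only slope and intercept
```

With `k=2`, the kNN prediction at x = 2.6 averages the targets of x = 3.0 and x = 2.0 (5.0), while the fitted line (slope 1.99, intercept 0.05) gives about 5.22 without consulting the training data again.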

| Customer ID | Spending Behavior (Spend Level–Discount Sensitivity) | Shopping Frequency | Brand Preference | Cluster (Segment) |
|---|---|---|---|---|
| C101 | Low–High Discount | Rare | None | Budget Shoppers |
| C205 | High–Low Discount | Frequent | Yes | Brand Loyal |
| C309 | Medium–Medium | Frequent | No | Frequent Buyers |
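
The Spending Behavior column packs two attributes, spend level and discount sensitivity, into a single label; the next table stores them as separate columns. A minimal Python sketch of that split, assuming every label uses an en dash (`–`) as the separator and may end with the word "Discount"; the helper name `split_spending` is hypothetical.

```python
# Rows from the table above: (customer ID, spending behavior label).
ROWS = [
    ("C101", "Low–High Discount"),
    ("C205", "High–Low Discount"),
    ("C309", "Medium–Medium"),
]


def split_spending(label):
    """Split a combined 'spend–discount' label (en dash separator) into
    (spend level, discount sensitivity)."""
    spend, discount = label.split("–")
    # Some labels end with the word "Discount"; remove it if present.
    return spend.strip(), discount.replace("Discount", "").strip()


for customer_id, label in ROWS:
    spend, discount = split_spending(label)
    print(customer_id, spend, discount)
```

The printed pairs (`C101 Low High`, `C205 High Low`, `C309 Medium Medium`) match the Avg. Monthly Spend and Discount Sensitivity columns of the next table, which is why keeping the two attributes in separate columns is easier for a clustering algorithm to use.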

| Customer ID | Avg. Monthly Spend | Shopping Frequency | Preferred Brands | Discount Sensitivity | Cluster (Segment) |
|---|---|---|---|---|---|
| C101 | Low | Rare | None | High | Budget Shoppers |
| C205 | High | Frequent | Yes | Low | Brand Loyal |
| C309 | Medium | Frequent | No | Medium | Frequent Buyers |

| Customer ID | Avg. Monthly Spend | Shopping Frequency | Preferred Brands | Cluster (Segment) |
|---|---|---|---|---|
| C101 | Low | Rare | No | Budget Shoppers |
| C205 | High | Frequent | Yes | Brand Loyal Customers |
| C309 | Medium | Frequent | No | Frequent Buyers |
| C412 | Low | Moderate | No | Budget Shoppers |
| C523 | High | Rare | Yes | Brand Loyal Customers |
| C634 | Medium | Frequent | Yes | Frequent Buyers |
| C745 | Low | Frequent | No | Budget Shoppers |
| C856 | High | Frequent | Yes | Brand Loyal Customers |
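
The tables do not say how these segments were produced. Purely as an illustration, the Python sketch below swaps in a simple nearest-centroid rule (not k-means, and not a method confirmed by the source): it encodes each column as integers using an assumed mapping (Low/Medium/High → 0/1/2, Rare/Moderate/Frequent → 0/1/2, No/Yes → 0/1), averages the vectors within each labelled segment, and assigns a new customer to the closest centroid. The function names are hypothetical.

```python
# Rows from the table above: (customer ID, spend, frequency, preferred brands, segment).
CUSTOMERS = [
    ("C101", "Low", "Rare", "No", "Budget Shoppers"),
    ("C205", "High", "Frequent", "Yes", "Brand Loyal Customers"),
    ("C309", "Medium", "Frequent", "No", "Frequent Buyers"),
    ("C412", "Low", "Moderate", "No", "Budget Shoppers"),
    ("C523", "High", "Rare", "Yes", "Brand Loyal Customers"),
    ("C634", "Medium", "Frequent", "Yes", "Frequent Buyers"),
    ("C745", "Low", "Frequent", "No", "Budget Shoppers"),
    ("C856", "High", "Frequent", "Yes", "Brand Loyal Customers"),
]

# Assumed ordinal encodings; the table itself does not define numeric values.
SPEND = {"Low": 0, "Medium": 1, "High": 2}
FREQUENCY = {"Rare": 0, "Moderate": 1, "Frequent": 2}
BRANDS = {"No": 0, "Yes": 1}


def encode(spend, frequency, brands):
    return (SPEND[spend], FREQUENCY[frequency], BRANDS[brands])


def segment_centroids(rows):
    """Average the encoded feature vectors of the customers in each segment."""
    groups = {}
    for _, spend, frequency, brands, segment in rows:
        groups.setdefault(segment, []).append(encode(spend, frequency, brands))
    return {
        segment: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for segment, vectors in groups.items()
    }


def nearest_segment(centroids, spend, frequency, brands):
    """Return the segment whose centroid has the smallest squared Euclidean distance."""
    point = encode(spend, frequency, brands)
    return min(
        centroids,
        key=lambda seg: sum((p - c) ** 2 for p, c in zip(point, centroids[seg])),
    )


centroids = segment_centroids(CUSTOMERS)
print(nearest_segment(centroids, "Medium", "Frequent", "No"))  # Frequent Buyers
print(nearest_segment(centroids, "Low", "Rare", "No"))         # Budget Shoppers
```

Treating the levels as equally spaced integers is itself an assumption; a different numeric scale would move the centroids and could change which segment a borderline customer lands in.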

| Email ID | Contains "Free" | Sender Reputation | Has Attachments | Predicted Class |
|---|---|---|---|---|
| 1 | Yes | Low | No | Spam |
| 2 | No | High | No | Not Spam |
| 3 | Yes | Medium | Yes | Spam |
| 4 | No | High | Yes | Not Spam |
| 5 | Yes | Low | Yes | Spam |
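
One simple way to see which feature drives these labels is a one-rule (OneR-style) decision stump: for each feature, predict the majority class observed for each of its values, then keep the feature whose rule is most accurate. The Python sketch below treats the Predicted Class column as the training label; the choice of this technique and the names `fit_rule` and `accuracy` are illustrative assumptions, not the classifier behind the table.

```python
from collections import Counter

FEATURES = ["contains_free", "sender_reputation", "has_attachments"]
# Emails 1-5 from the table above; the last field is the Predicted Class,
# used here as the training label.
EMAILS = [
    ("Yes", "Low", "No", "Spam"),
    ("No", "High", "No", "Not Spam"),
    ("Yes", "Medium", "Yes", "Spam"),
    ("No", "High", "Yes", "Not Spam"),
    ("Yes", "Low", "Yes", "Spam"),
]


def fit_rule(rows, i):
    """Map each value of feature i to the majority class observed with it."""
    labels_by_value = {}
    for row in rows:
        labels_by_value.setdefault(row[i], []).append(row[-1])
    return {
        value: Counter(labels).most_common(1)[0][0]
        for value, labels in labels_by_value.items()
    }


def accuracy(rows, i, rule):
    """Fraction of rows whose label matches the rule's prediction."""
    return sum(rule[row[i]] == row[-1] for row in rows) / len(rows)


# Score every single-feature rule; max() keeps the first feature on ties.
scores = {}
for i, name in enumerate(FEATURES):
    rule = fit_rule(EMAILS, i)
    scores[name] = (accuracy(EMAILS, i, rule), rule)

best = max(scores, key=lambda name: scores[name][0])
print(best, scores[best][1])  # contains_free {'Yes': 'Spam', 'No': 'Not Spam'}
```

On these five emails, both Contains "Free" and Sender Reputation separate the classes perfectly, while Has Attachments reaches only 3/5; a table this small cannot distinguish the two perfect rules, so the tie is broken by feature order.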

| Plant | Sunlight (hours/day) | Water (liters/day) | Growth (cm/week) |
|---|---|---|---|
| 1 | 4 | 1 | 5 |
| 2 | 6 | 1.5 | 8 |
| 3 | 5 | 1.2 | 6 |
| 4 | 7 | 2 | 10 |
| 5 | 3 | 0.8 | 4 |
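
As a worked example of fitting a model to this table, the Python sketch below computes an ordinary least-squares fit of growth ≈ a·sunlight + b·water + c. Linear regression is chosen here for illustration; the table itself does not prescribe a model, and the function name `fit_two_feature_ols` is hypothetical.

```python
# (sunlight hours/day, water liters/day, growth cm/week) from the table above.
PLANTS = [(4, 1.0, 5), (6, 1.5, 8), (5, 1.2, 6), (7, 2.0, 10), (3, 0.8, 4)]


def fit_two_feature_ols(rows):
    """Least-squares fit of growth = a*sunlight + b*water + c.

    Centering each column removes the intercept from the normal equations,
    leaving a 2x2 system that is solved here with Cramer's rule.
    """
    n = len(rows)
    ms = sum(r[0] for r in rows) / n
    mw = sum(r[1] for r in rows) / n
    mg = sum(r[2] for r in rows) / n
    s = [r[0] - ms for r in rows]
    w = [r[1] - mw for r in rows]
    g = [r[2] - mg for r in rows]
    sss = sum(x * x for x in s)
    sww = sum(x * x for x in w)
    ssw = sum(x * y for x, y in zip(s, w))
    ssg = sum(x * y for x, y in zip(s, g))
    swg = sum(x * y for x, y in zip(w, g))
    det = sss * sww - ssw * ssw
    a = (ssg * sww - ssw * swg) / det
    b = (sss * swg - ssw * ssg) / det
    c = mg - a * ms - b * mw  # the fitted plane passes through the means
    return a, b, c


a, b, c = fit_two_feature_ols(PLANTS)
print(round(a, 3), round(b, 3), round(c, 3))  # 0.385 3.846 -0.323
```

Sunlight and water are strongly correlated in these five rows (correlation about 0.98), so the split of the effect between `a` and `b` is unstable; more varied data would be needed before reading either coefficient on its own.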

| Data Type | Examples |
|---|---|
| Numeric (Continuous) | Height, temperature, stock price |
| Numeric (Discrete) | Number of children, count of clicks |
| Categorical | Country names, product category, car brand |
| Ordinal | T-shirt sizes (S, M, L, XL), survey ratings (1–5 stars) |
| Binary | Gender (M/F), customer churn (yes/no) |
| Text | Emails, product reviews, news articles |
| Image | Photographs, X-rays, handwritten digits |
| Audio | Speech, music, environmental sounds |
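
Most models need numbers, so each data type is usually encoded differently. A minimal Python sketch of three common encodings, assuming rank mapping for ordinal data, one-hot vectors for categorical data, and a 0/1 indicator for binary data; the car-brand list is hypothetical, and text, image, and audio data need specialized feature extraction that is not shown here.

```python
# Ordinal: categories have a meaningful order, so map each level to its rank.
SIZE_RANK = {size: rank for rank, size in enumerate(["S", "M", "L", "XL"])}

# Categorical: no natural order; a hypothetical list of car brands.
CAR_BRANDS = ["Toyota", "Ford", "BMW"]


def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the value's position."""
    return [1 if value == category else 0 for category in categories]


def encode_binary(value, positive="yes"):
    """Binary: a single 0/1 indicator for the positive class."""
    return 1 if value == positive else 0


print(SIZE_RANK["L"])               # 2
print(one_hot("Ford", CAR_BRANDS))  # [0, 1, 0]
print(encode_binary("no"))          # 0
```

One-hot encoding avoids implying that one brand is "greater than" another, which a plain integer code would suggest to a distance-based model such as kNN; ordinal ranks, by contrast, deliberately keep the S < M < L < XL order.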