100DaysofMLCode

100 Days Of ML Code

Hi! I am Abhini, a Machine Learning enthusiast, and this is my log for the 100DaysOfMLCode challenge.

Day 1: July 08, 2018

Today's Progress: Understood the basics of Neural Networks and how to build an ANN. Also practiced Python on HackerRank.

Thoughts: Cleared up ANN concepts I had earlier found confusing, such as activation and cost functions, batch and stochastic gradient descent, and backpropagation.

Links to work:

Hackerrank Profile

Day1 Learning

Day 2: July 09, 2018

Today's Progress: Learnt about the Keras library. Installed it and learnt the theory behind implementing an ANN using Keras. Practiced statistics on HackerRank.

Thoughts: Seeing how to implement the concepts learnt in theory yesterday made them much clearer. I will implement the code for the same tomorrow.

Links to work:

Day2 Learning

Hackerrank Profile

Day 3: July 10, 2018

Today's Progress: Implemented and practiced an ANN using the Keras library for churn prediction

Thoughts: After implementing the ANN, I tried several combinations of parameters by varying the activation functions, the number of nodes in the hidden layers and the number of hidden layers. Changing these parameters affected the accuracy; the best combinations found were the relu activation with 5 hidden layers of 6 nodes each, and the elu activation with 2 hidden layers of 6 nodes each. Both combinations gave an accuracy of around 86%.

Link to work:

ANNBasics Repository
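
A minimal sketch of this kind of Keras ANN for binary churn classification (the 6-node relu layers follow the combinations above; the synthetic data and 11-feature input are placeholders for the preprocessed churn dataset):

```python
# Minimal ANN sketch for binary churn classification with Keras.
# Synthetic data stands in for the preprocessed churn dataset used in the log.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

X_train = np.random.rand(1000, 11)          # 11 scaled features (placeholder)
y_train = np.random.randint(0, 2, 1000)     # 0 = stays, 1 = churns

model = Sequential()
model.add(Dense(units=6, activation='relu', input_dim=11))
model.add(Dense(units=6, activation='relu'))
model.add(Dense(units=1, activation='sigmoid'))   # churn probability

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32, epochs=10)
```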

Day 4: July 11, 2018

Today's Progress: Studied about how to improve and tune the model.

Thoughts: Studied K-fold cross validation and the bias-variance tradeoff. Also learnt how to improve model accuracy with hyper-parameter tuning using the grid search method. Making use of Keras and sklearn together for k-fold cross validation via a wrapper was something interesting to learn.
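
A rough sketch of that Keras + sklearn wrapper idea, pairing KerasClassifier with GridSearchCV for k-fold cross validation; the wrapper's import path varies across Keras/TensorFlow versions, and the data and parameter grid here are placeholders:

```python
# Sketch: wrapping a Keras model so scikit-learn can run k-fold CV / grid search.
# The KerasClassifier import path differs between Keras/TensorFlow versions.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def build_model(optimizer='adam'):
    model = Sequential()
    model.add(Dense(6, activation='relu', input_dim=11))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer=optimizer, loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

X = np.random.rand(200, 11)            # placeholder features
y = np.random.randint(0, 2, 200)       # placeholder labels

clf = KerasClassifier(build_fn=build_model, epochs=5, batch_size=32, verbose=0)
params = {'batch_size': [16, 32], 'optimizer': ['adam', 'rmsprop']}
grid = GridSearchCV(estimator=clf, param_grid=params,
                    scoring='accuracy', cv=5)   # 5-fold cross validation
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```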

Day 5: July 12, 2018

Today's Progress: Studied the basics of Convolutional Neural Networks

Thoughts: Learnt how an image is translated into a language a machine can understand. Going from a raw 2D image to a 1D array of values while still retaining the spatial correlations of the original image was great to understand. It involves processes like convolution (applying filters), pooling (retaining the most relevant features), flattening (converting into a 1D array of values) and full connection (feeding the convolved features into an ANN). Watched the 2D visualization of all the above processes at the following link.

2D visualization of CNN

Day 6: July 13, 2018

Today's Progress: Studied the implementation of a CNN using the Keras library

Thoughts: Working with images was something new. I had to learn right from how to import the dataset, since it was no longer a CSV file. Then learnt how to implement and compile a CNN using convolution, pooling, flattening and full connection. One more interesting thing learnt today is image augmentation and how it can be used to prevent overfitting. The code takes much longer to train than any of the models run so far.
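
A condensed sketch of the CNN-plus-augmentation workflow described above; the `dataset/training_set` folder name, image size and layer sizes are placeholder assumptions, not the exact code from the log:

```python
# Sketch: CNN built with Keras plus image augmentation via ImageDataGenerator.
# Directory name is a placeholder for a folder-per-class image dataset.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Augmentation: random shears, zooms and flips help fight overfitting.
train_gen = ImageDataGenerator(rescale=1./255, shear_range=0.2,
                               zoom_range=0.2, horizontal_flip=True)
training_set = train_gen.flow_from_directory('dataset/training_set',
                                             target_size=(64, 64),
                                             batch_size=32,
                                             class_mode='binary')
model.fit_generator(training_set, steps_per_epoch=len(training_set), epochs=25)
```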

Day 7: July 14, 2018

Today's Progress: Watched videos to learn the basics of RNNs and LSTMs

Thoughts: A Recurrent Neural Network basically mimics the short-term memory of the human brain. Using an RNN is important when the network needs to remember something from the recent past to perform operations in the future. Common applications of RNNs lie in Natural Language Processing, because it requires remembering the context and the subjects in order to process the information. I learnt about the vanishing gradient problem: if the weights are small, the gradient keeps diminishing and eventually backpropagation fails to train the network. This problem can be solved with LSTM (Long Short Term Memory). It was interesting to watch a movie written by an algorithm trained using an RNN with LSTM units. The link to the video is given below.

Movie written by an Algorithm

Day 8: July 15, 2018

Today's Progress: Started off with building RNN

Thoughts: Started with implementation of RNN for predicting Google Stock price. The model is not complete yet. Shall complete it tomorrow.

Day 9: July 16, 2018

Today's Progress: Implemented an RNN

Thoughts: Implemented an RNN for predicting the Google stock price.

Link to work:

RNNBasics Repository
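
A small sketch of the windowed LSTM approach typically used for this kind of stock prediction; the sine wave stands in for the scaled Google price series, and the 60-step window is an assumption:

```python
# Sketch: LSTM-based RNN for next-step price prediction on windowed sequences.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

prices = np.sin(np.linspace(0, 50, 1000))        # placeholder for scaled prices
window = 60
X = np.array([prices[i - window:i] for i in range(window, len(prices))])
y = prices[window:]
X = X.reshape((X.shape[0], window, 1))           # (samples, timesteps, features)

model = Sequential()
model.add(LSTM(50, input_shape=(window, 1)))     # remembers the recent past
model.add(Dense(1))                              # next value in the series
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X, y, epochs=5, batch_size=32)
```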

Day 10: July 17, 2018

Today's Progress: Started with unsupervised deep learning and learnt about Self-Organizing Maps.

Thoughts: SOMs are basically used for simplified visualization of results involving many output variables, and are closely related to dimensionality reduction. It was interesting to see how the BMU (Best Matching Unit) pulls the nodes around it towards itself to form clusters of inputs sharing similar features. Also revised the topic of K-Means clustering.

Day 11: July 18, 2018

Today's Progress: Started implementation of SOM using minisom

Thoughts: Started implementing an SOM using the minisom library to detect probable bank frauds.

Day 12: July 19, 2018

Today's Progress: Completed implementation of SOM

Thoughts: Implemented and visualized the SOM for probable bank frauds (unsupervised learning) and used this information to label the input dataset. Then also used supervised learning (an ANN) to classify the customers based on the outcome from the SOM.
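
A minimal sketch of the minisom usage described above, with random placeholder data standing in for the scaled credit-application features:

```python
# Sketch: Self-Organizing Map with the minisom library on placeholder data.
import numpy as np
from minisom import MiniSom

X = np.random.rand(500, 15)                    # 500 customers, 15 scaled features

som = MiniSom(x=10, y=10, input_len=15, sigma=1.0, learning_rate=0.5)
som.random_weights_init(X)
som.train_random(data=X, num_iteration=100)

# Customers mapped to nodes with a large mean inter-neuron distance are the
# "outlier" cells flagged as probable frauds in the log.
distance_map = som.distance_map()              # 10x10 grid of normalized distances
winner = som.winner(X[0])                      # BMU coordinates for one customer
print(winner, distance_map[winner])
```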

Day 13: July 20, 2018

Today's Progress: Learnt about Boltzmann Machines

Thoughts: Boltzmann machines get their name because they follow the Boltzmann distribution. There are input and hidden layers connected to each other, and neurons within the same layer also connect to each other. The connections are bi-directional. I learnt the concept of energy-based models and how the model is most stable at its lowest energy state. I saw how Boltzmann machines can be used in recommender systems. Learnt about the Restricted Boltzmann Machine, in which neurons are not connected within the same layer.

Day 14: July 23, 2018

Today's Progress: Learnt about Auto Encoders

Thoughts: Autoencoders are basically unsupervised deep networks which encode inputs and reproduce them at the output. I learnt about overcomplete hidden layers in autoencoders: in this architecture there are more hidden nodes than input nodes, so there is a possibility that the model will just pass the signal through the layers and not encode at all. The methods to overcome this are sparse and denoising autoencoders.

Day 15: July 24, 2018

Today's Progress: Discussed and learnt about the ways to avoid overfitting in CNN

Thoughts: My CNN model was overfitting; today I found out various ways to avoid that.

How to avoid over fitting

Batch Normalization

Day 16: July 25, 2018

Today's Progress: Added dropout regularization in CNN code

Thoughts: Tried to reduce overfitting by adding a 0.2 dropout after the full connection layer. It improved the test set accuracy from 78% to 84%.
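
A small sketch of where the 0.2 dropout sits in the classifier head (the flattened input shape is a placeholder for the output of the last pooling layer):

```python
# Sketch: the full-connection end of a CNN with a 0.2 dropout layer added
# (20% of activations are randomly dropped during training to reduce overfitting).
from keras.models import Sequential
from keras.layers import Flatten, Dense, Dropout

model = Sequential()
model.add(Flatten(input_shape=(32, 32, 32)))   # placeholder shape of last pool output
model.add(Dense(128, activation='relu'))       # full connection layer
model.add(Dropout(0.2))                        # dropout added after it
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```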

Day 17/18: July 26 and 27, 2018

Today's Progress: Implemented recommender system using Boltzmann machine

Thoughts: Implemented a recommender system to predict whether a particular user will like a movie. The implementation was done using the PyTorch library.

Day 19/20: July 28 and 29, 2018

Today's Progress: Implemented recommender system using Auto Encoder

Thoughts: Implemented a recommender system to predict the movie rating given by a particular user. The implementation was done using the PyTorch library.

Day 21/22: July 30 and 31, 2018

Today's Progress: Completed Google's tutorial on Image Classification

Thoughts: Implemented a basic CNN using the Keras module of the TensorFlow library. The model contained 3 conv layers. It was overfitting, hence dropout regularization and image augmentation were used. Next, transfer learning with the Inception model was used to further increase the accuracy.

Google Image Classification

Day 23/24: August 1 and 2, 2018

Today's Progress: Built a CNN by implementing transfer learning using Keras

Thoughts: In order to reduce overfitting and improve accuracy, I implemented transfer learning using the VGG19 model pre-trained on the ImageNet dataset. This was done using the Keras library.

Transfer Learning using Keras
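
A minimal sketch of this VGG19 transfer-learning setup in Keras; the input size, the 256-unit dense layer and the 2-class head are assumptions:

```python
# Sketch: transfer learning with VGG19 pre-trained on ImageNet, using Keras.
# The convolutional base is frozen and a new classifier head is trained on top.
from keras.applications import VGG19
from keras.models import Model
from keras.layers import Flatten, Dense

base = VGG19(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False                    # freeze the pre-trained layers

x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
output = Dense(2, activation='softmax')(x)     # 2 placeholder classes

model = Model(inputs=base.input, outputs=output)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```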

Day 25: August 3, 2018

Today's Progress: Studied the filters used in CNN in detail with the focus on Edge Detection

Thoughts: Filters are used in CNNs to detect features in an image. One such feature is edges, and I learnt about vertical and horizontal edge detection filters. The 3x3 Prewitt operator is a matrix with a first column of 1s, a second column of 0s and a third column of -1s. To put more emphasis on the middle pixels of the filter, the Sobel or Scharr filter is used. To visually see how filters detect features, the GIMP image editing platform can be used.

GIMP Documentation

UCI PPT

Day 26: August 4, 2018

Today's Progress: Studied about Padding and Stride used in CNN

Thoughts: When filters are used to convolve an image, the image size reduces, and if many such operations are performed the image shrinks significantly and can degrade. To avoid this, padding is used. Padding is basically adding rows and columns of zeros around the matrix. If the image size is nxn and a filter of size fxf is used, the output size is n-f+1. If padding of p is used on both sides, the output size becomes n+2p-f+1. So if we want the output image to be the same size as the input, "same padding" is used: we need n+2p-f+1 = n, hence p = (f-1)/2. If f is odd then p is an integer, and integer padding is generally preferred since we cannot add fractional pixels. Hence the filter size is mostly odd, e.g. 3x3 or 5x5.
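
A tiny helper illustrating the arithmetic above, where the output size is n + 2p - f + 1 and "same" padding uses p = (f - 1) / 2:

```python
# Output size of a convolution: n x n image, f x f filter, padding p on each side.
def conv_output_size(n, f, p=0):
    return n + 2 * p - f + 1

def same_padding(f):
    return (f - 1) // 2          # integer for odd filter sizes

print(conv_output_size(6, 3))                      # 4: valid convolution shrinks the image
print(conv_output_size(6, 3, same_padding(3)))     # 6: same padding preserves the size
```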

Day 27: August 5, 2018

Today's Progress: Learnt about 3D convolutions and pooling layer

Thoughts: 3D convolution is also termed convolution over volume. This is used when the image is coloured, i.e. when it has R, G and B channels. The pooling layer is used to retain the important features of the image even after the reduction in size.

Day 28: August 6, 2018

Today's Progress: Learnt about LeNet 5 Architecture

Thoughts: LeNet-5 was one of the early models, built to recognize handwritten digits. It was trained on greyscale images and applied no padding. Also, instead of max pooling it used average pooling. The model followed the usual sequence of conv layers, pooling layers and then 2 fully connected layers, but instead of a softmax function at the end it used tanh or sigmoid.

Day 29: August 7, 2018

Today's Progress: Learnt AlexNet and VGG16

Thoughts:

Day 30: August 9, 2018

Today's Progress: Learnt about ResNet

Thoughts:

Day 31: August 10, 2018

Today's Progress: Learnt about Transfer Learning

Thoughts: After learning about the benchmark models, I learnt how we can make use of them for our own use case. Models like VGG16, AlexNet and Inception are very deep and trained on huge datasets. In these models the earlier layers identify less complex features like colour or texture, the middle layers identify contours, and the later layers identify complex features like objects. So for our use case we can keep the earlier layers and add some fully connected layers and a softmax function after them. This way we utilize the learning from the pre-trained networks to make our models perform better. If we have a very small dataset we can freeze all the layers and train just the softmax, since there is inadequate data to train the later layers. If we have a substantial amount of data we can freeze the first few layers and train the later layers and the softmax unit. And if we have a huge dataset then maybe we can think of training all the layers.

Transfer Learning

Day 32: August 11, 2018

Today's Progress: Read about Neural Style Transfer

Thoughts: I was always curious about how a computer can generate art. I was amazed by the Prisma app and wanted to know the concept behind it. Today I spent time getting an idea of Neural Style Transfer, which is the idea of imposing the style of an artwork onto any other image. I learnt that style transfer actually uses transfer learning and works by minimizing different losses.

Neural Style Transfer1

Neural Style Transfer2

Day 33: August 12, 2018

Today's Progress: Learnt in detail about Neural Style Transfer

Thoughts: Neural style transfer is a great concept where you have two images, one style image and one content image, and we extract the style (colour, texture) from the style image and impose it onto the content image. For doing so we make use of transfer learning. Say we use VGG16: this model is originally trained as an image classifier, but here we tweak it and use it to extract features. We feed in the style image and extract the style from the output of the earlier layers of the model. Similarly, we extract the shapes and objects from the content image using the output of a later layer. We then construct a stylized image by mixing the extracted features, compare this mixed image with the two original images, and calculate the content loss and the style loss. Style transfer then becomes an optimization task where we try to minimize these losses.

Neural Style Transfer (Useful Link)

Day 34/35: August 13/14, 2018

Today's Progress: Understood and implemented the Open Source implementation of Neural Style Transfer

Thoughts: Studied the awesome Jupyter notebook by Siraj Raval which implemented Neural Style Transfer using Tensorflow. It made use of the VGG16 model.

Open Source Implementation

Day 36: August 19, 2018

Today's Progress: Learnt about Object Localization

Thoughts: Object localization is figuring out where in the image the object lies. This can be done by adding more parameters to the output layer so that the output y comprises: 1. the probability that an object exists in the given grid cell, 2. the coordinates of the mid-point of the object with respect to the grid cell in which it is found, 3. the height of the object, 4. the width of the object, and 5. one binary variable per possible class, set to 1 if the respective object is predicted in that grid cell. The box made from parameters 2, 3 and 4 is called the bounding box, and this box localizes the object in the image.

Day 37: August 20, 2018

Today's Progress: Learnt about Landmark Detection and Object Detection

Thoughts: Landmark detection is detecting the position of specific parts of a given object, for example facial features such as the eyes, nose or jaw line. This is done by defining the landmarks first, then training a supervised learning model to learn them, with the output layer returning the relative coordinates of the landmark points. For object detection, the basic method uses a sliding window. The model is first trained to identify the object using closely cropped images of the object without any background. Then a window, i.e. a small grid, is slid across the test image with some stride, and the detection is repeated at each position. This process is computationally very expensive.

Day 38: August 21, 2018

Today's Progress: Learnt about Sliding Window and Bounding Box Prediction

Thoughts:

Day 39: August 22, 2018

Today's Progress: Learnt about Non-max suppression and Anchor Boxes

Thoughts: Sometimes there can be multiple bounding boxes for the same object, and non-max suppression is used to eliminate all the secondary boxes and keep just one box per object. For doing so it uses Intersection over Union (IoU): the box with the highest detection probability is kept, the IoU between it and each of the remaining boxes is computed, and boxes that overlap it strongly are suppressed. Anchor boxes are used when one is expecting more than one object at the same location.
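
A small sketch of the IoU computation that non-max suppression relies on, for two boxes given as (x1, y1, x2, y2) corners:

```python
# Intersection over Union for two axis-aligned boxes (x1, y1, x2, y2).
def iou(box_a, box_b):
    # corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # partially overlapping boxes -> ~0.14
```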

Day 40: August 23, 2018

Today's Progress: Learnt about YOLO Algorithm

Thoughts: YOLO stands for You Only Look Once and is an object detection algorithm. It basically combines all the concepts covered in the last 4 days to make the prediction. The training set comprises images divided into grid cells, and the target output has dimensions n x n x a x (5+c), where n x n is the grid size, a is the number of anchor boxes and c is the number of classes. The next step is prediction, where we get an output vector y for each grid cell indicating whether an object is detected in that cell and giving its bounding box. Next we eliminate the boxes in cells returning a low probability of an object being detected, and use non-max suppression to eliminate the extra bounding boxes.

Day 41: August 24, 2018

Today's Progress: Learnt about One Shot Learning and Siamese Network

Thoughts: A CNN does a good job when it has a lot of training images to learn from, but in a real use case the network might have to learn from just one image, for example an attendance system using facial recognition. Such a system has just one image of every student fed into it and has to identify students based on that. This is called one-shot learning, and it cannot be achieved through a traditional CNN because of the training set constraint, and also because if a new student is added the entire system needs to be retrained. To solve one-shot learning, Siamese networks are used. In a Siamese network, a network similar to a CNN but without the final classification layer is used to compute an encoding for every image. Then the difference between these encodings is found: if the difference is very small, the images have a high degree of similarity and are treated as the same class, and vice versa.

Day 42: August 25, 2018

Today's Progress: Learnt about Triplet Loss and Face Verification

Thoughts:

Day 43: August 26, 2018

Today's Progress: Implemented basic Face Detection

Thoughts: Today I started learning OpenCV. There are some good tutorials and I started with Face Detection using Haar Cascade.

Face Detection using Haar Cascades
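
A minimal sketch of Haar-cascade face detection with OpenCV, along the lines of the tutorial linked above; 'input.jpg' is a placeholder path:

```python
# Sketch: face detection with OpenCV's pre-trained frontal-face Haar cascade.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

img = cv2.imread('input.jpg')                     # placeholder image path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)      # cascade works on greyscale

faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)

cv2.imwrite('faces.jpg', img)                     # save the annotated image
```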

Day 44: August 27, 2018

Today's Progress: Implemented Color Detection

Thoughts: Implemented colour detection by changing the colour space to HSV and using masking. The code was implemented to detect blue colour and extended to detect blue and green colour in one frame.

Colour Detection
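
A minimal sketch of the HSV masking approach for blue detection; 'frame.jpg' and the HSV bounds are placeholder assumptions:

```python
# Sketch: detecting blue regions by converting to HSV and masking.
import cv2
import numpy as np

frame = cv2.imread('frame.jpg')                      # placeholder image path
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

lower_blue = np.array([110, 50, 50])                 # approximate blue range in HSV
upper_blue = np.array([130, 255, 255])

mask = cv2.inRange(hsv, lower_blue, upper_blue)      # white where the pixel is blue
result = cv2.bitwise_and(frame, frame, mask=mask)    # keep only the blue regions
cv2.imwrite('blue_only.jpg', result)
```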

Day 45: August 28, 2018

Today's Progress: Implemented OCR of Digit using KNN and OpenCV

Thoughts: Optical Character Recognition was done using K-Nearest Neighbours algorithm and OpenCV.

OCR: Digit Detection

Day 46: August 29, 2018

Today's Progress: Implemented Object Detection in Video

Thoughts: Implemented Object detection in Video using Meanshift algorithm.

Object Detection in Video

Day 47/48/49/50: August 30/31, September 1/2, 2018

Today's Progress: Studied about the Capsule Theory

Thoughts: Attended a session by Tarry Singh in which he talked about applications of deep learning in the healthcare sector. He also threw light upon the concept of capsule theory. I spent the last 4 days studying Capsule Theory, or CapsNet. This theory tries to overcome the drawbacks of ConvNets by taking into consideration spatial orientation, or pose, rather than just pixel intensity. I read Tarry Singh's view on CapsNet and also watched a video of Geoffrey Hinton explaining "What's wrong with ConvNets".

Capsule Theory - Tarry Singh

Capsule Theory - Geoffrey Hinton

Capsule Theory - MyGist

Day 51: September 7, 2018

Today's Progress: Basics of Statistics: Types of data

Thoughts: Learnt about the broad types of data. Data can be qualitative (descriptions of qualities) or quantitative (measures of quantities). Qualitative data can be either nominal or ordinal: nominal data has no natural ordering (e.g. gender), whereas ordinal data has a natural ordering (e.g. ratings, levels of agreement). Quantitative data is also of two types, discrete and continuous: discrete data takes specific values whereas continuous data can take any value.

Day 52: September 8, 2018

Today's Progress: Basics of Statistics: Lifecycle of a statistical experiment

Thoughts: Solving a problem with the help of statistics can be an iterative process. The process starts by posing a relevant question to understand the problem. Once the problem statement is clear, the next step is to collect the data needed for the analysis; we need to decide what data is needed and in which format. Once we have all the needed data, we proceed with the analysis step, applying several statistical techniques to the data. The next step is to interpret the results obtained from this analysis. These results may give rise to new questions, and hence the whole process repeats itself until we finally reach a conclusion.

Day 53: September 9, 2018

Today's Progress: Basics of Statistics: Trimmed Mean and Winsorized Mean

Thoughts: Whenever your data is skewed, the ordinary mean does not give the correct interpretation, so either the trimmed mean or the Winsorized mean is used. In the trimmed mean, the outliers (i.e. extreme values) are trimmed off; e.g. a trimmed mean of 0.2 trims off 20% of the extreme values and takes the mean of the remaining samples. On the other hand, if a Winsorized mean of 0.2 is applied, the 10th and 90th percentiles of the dataset are calculated; every sample below the 10th percentile is assigned the value of the 10th percentile, every sample above the 90th percentile is assigned the value of the 90th percentile, and the mean of this new dataset is taken.
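
A small sketch of both means using SciPy (note that SciPy's trim_mean cuts the given proportion from each tail, so the convention differs slightly from the 0.2 example above):

```python
# Sketch: ordinary, trimmed and Winsorized means on a skewed sample.
import numpy as np
from scipy import stats
from scipy.stats.mstats import winsorize

data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 100])        # 100 is an outlier

print(np.mean(data))                                      # pulled up by the outlier
print(stats.trim_mean(data, 0.1))                         # drops 10% from each tail
print(np.mean(winsorize(data, limits=[0.1, 0.1])))        # caps each tail instead of dropping it
```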

Day 54/55: September 10/11, 2018

Today's Progress: Python: objects and Data Structure and coding practice

Thoughts: Learnt the basic objects in Python such as numbers, booleans, strings (indexing, slicing, print formatting with strings), tuples, lists and dictionaries, and practiced the same.

Day 56: September 12, 2018

Today's Progress: Python: comparison operator and coding practice

Thoughts: Learnt about the comparison and logical operators in Python and practiced the same.

Day 57: September 13, 2018

Today's Progress: Python: Loops and coding practice

Thoughts: Learnt about if/elif/else statements and while and for loops, and practiced the same.

Day 58: September 14, 2018

Today's Progress: Python: Useful operators and list comprehension/ coding practice

Thoughts: Learnt about useful operators in Python like range, enumerate, zip, in, max, min and shuffle, as well as list comprehensions. Practiced the same.

Day 59/60: September 15/16, 2018

Today's Progress: Python: Methods and Functions/ coding practice

Thoughts: Started with the basics of defining a function with 'def'. Further saw how passing an argument works and learnt about the return statement. Docstrings are useful for writing the documentation of a user-defined function, which becomes available to other users through help(). Learnt about *args and **kwargs and saw how they can be used to accept any number of arguments a user wishes to pass into the function: *args stores the arguments as a tuple and **kwargs stores them as a dictionary.
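
A tiny sketch of *args and **kwargs in action (the function name and arguments are made up for illustration):

```python
# *args collects extra positional arguments as a tuple,
# **kwargs collects extra keyword arguments as a dict.
def describe(*args, **kwargs):
    """Docstring: shows up in help(describe)."""
    print('positional:', args)     # e.g. (1, 2, 3)
    print('keyword:   ', kwargs)   # e.g. {'unit': 'kg'}
    return sum(args)

total = describe(1, 2, 3, unit='kg')
print(total)                       # 6
```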

Day 61: September 17, 2018

Today's Progress: Python: Lambda Expressions

Thoughts:

Day 62: September 18, 2018

Today's Progress: Python: Numpy Array and Array Indexing

Thoughts: Learnt how to create a NumPy array (by casting a list) and the advantages of the same. Went through functions like arange, zeros, ones, linspace, the random methods, reshape and dtype. Then learnt about indexing and slicing. One point to note is that NumPy array slicing does not create a new array; it references the old array. This is done to avoid memory issues from duplicating large arrays. If you want a copy of an array you need to explicitly run arr.copy() to create a separate array.
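
A small sketch of the view-versus-copy behaviour described above:

```python
# NumPy slices are views on the original array, not copies.
import numpy as np

arr = np.arange(10)        # array([0, 1, ..., 9])
view = arr[0:5]
view[:] = 99               # modifies arr as well
print(arr)                 # [99 99 99 99 99  5  6  7  8  9]

copy = arr[0:5].copy()     # explicit copy: independent of arr
copy[:] = 0
print(arr[0:5])            # still all 99s
```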

Day 63: September 19, 2018

Today's Progress: Python: Numpy Operations

Thoughts: Learnt about 2D arrays (matrices) and two ways to index a matrix: double-bracket notation and comma single-bracket notation. Also went through concepts like conditional selection, arithmetic operations and trigonometric functions.

Day 64-66: September 20-22, 2018

Today's Progress: Python: Numpy coding practice

Thoughts: Practiced exercises on NumPy.

Day 67: September 23, 2018

Today's Progress: Python: Pandas Series

Thoughts: Started with Pandas. Learnt about the Pandas Series. A Series holds objects with an index, and the index can be set to anything. A Series can hold a wide variety of objects. Pro: very fast lookup.

Day 68: September 24, 2018

Today's Progress: Python: Pandas Dataframe

Thoughts: A DataFrame in Pandas is a bunch of Series sharing the same index. Pandas methods usually have inplace=False by default, so the user does not lose any information accidentally. Learnt some ways to access the entries in a DataFrame: df.loc is used to get a row by label and df.iloc is used to get a row by integer position; df.reset_index() resets the index to numerical values and df.set_index("states") sets a column as the index. Pandas also supports multi-level indexing.
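
A small sketch of label-based versus position-based access (the two-row "states" dataframe is a placeholder):

```python
# Label-based vs position-based row access in a pandas DataFrame.
import pandas as pd

df = pd.DataFrame({'states': ['CA', 'TX'],
                   'population': [39.5, 29.0]})

df = df.set_index('states')   # use the 'states' column as the index
print(df.loc['CA'])           # row by label
print(df.iloc[0])             # row by integer position
df = df.reset_index()         # back to a default numerical index
```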

Day 69: September 26, 2018

Today's Progress: Python: Groupby, Merging, joining and concatenating

Thoughts: Learnt how to handle missing values, as well as functions like dropna, fillna, groupby (with describe), concat, merge and join.

Day 70: September 27, 2018

Today's Progress: Python: Pandas Operations

Thoughts: Learnt about some functions like unique(), which gives the unique entries in a dataframe column; nunique(), which gives the number of unique entries; value_counts(); apply(), used to apply custom functions to a dataframe; sort_values(); and pivot_table().

Day 71: September 28, 2018

Today's Progress: Python: Pandas Data Input and Output

Thoughts: Read about the ways in which different types of files can be read with Pandas. CSV, Excel, HTML and SQL are some common formats. Reading from HTML is usually done while web scraping.

Day 72-75: September 29-October 2, 2018

Today's Progress: Python: Pandas coding practice

Thoughts: Practiced exercises on Pandas.

Day 76: October 3, 2018

Today's Progress: Visualization: matplotlib

Thoughts: Learnt about visualizing with matplotlib using the functional method and the object-oriented method. The functional method is simple and just involves a plot statement, but this simplicity comes at the cost of customization. For customizing plots, the object-oriented method is preferred: it involves initializing a figure object (plt.figure()), setting the axes (fig.add_axes()) and then plotting (axes.plot()). Subplots can also be made to visualize more than one plot on the same screen; here the tight_layout function is used to avoid axes overlapping. Also went through options like savefig, legend, color, linewidth, linestyle and marker.

Day 77: October 4, 2018

Today's Progress: Visualization: Seaborn

Thoughts: Seaborn is a great library for visualization and many interesting plots can be made with it. Some of the plots I studied are: distplot, which gives a histogram; jointplot, used to plot 2 variables; pairplot, which plots joint plots for all the available numerical variables (PairGrid is used to customize pairplot); rugplot; and kdeplot (Kernel Density Estimation). For categorical variables the following plots can be used: barplot, boxplot, violinplot, stripplot, swarmplot, factorplot. Next I learnt the matrix plots, heatmap and clustermap; for a heatmap the data needs to be arranged in matrix format, and correlation (df.corr()) or a pivot table can be used for this purpose. Finally there is the regression plot, a simple lmplot which plots the regression line.

Day 78: October 5, 2018

Today's Progress: Visualization: Pandas In-Built visualization

Thoughts: Pandas provides built-in visualization which runs matplotlib at the backend. Simple plots can be made using pandas; it is not very advanced and is mostly used for quick data exploration. Some of the common plots which can be made are histogram, area plot, bar graph, line graph, scatter plot, box plot, hexbin and KDE plot. The convention for using pandas for visualization is df.plot.plot_type(), e.g. df.plot.hist() or df.plot.line().

Day 79: October 6, 2018

Today's Progress: Visualization: Plotly and Cufflinks

Thoughts: Plotly is an interactive visualization library, and Cufflinks connects plotly with pandas. Plotly provides nice interactive plots where you can zoom, pan, hover, save, switch variable entries on/off and much more. After importing the libraries one uses the iplot() function for plotting. The iplot function takes a 'kind' argument which specifies the type of plot. Some examples of the kind argument are 'scatter', 'bar' (in a bar plot you can use additional aggregate functions like sum or count on raw data), 'box', 'surface' (to make a 3D surface plot), 'hist', 'spread' and 'bubble'.

Day 80: October 7, 2018

Today's Progress: Visualization: Geographical Plotting

Thoughts: Geographical plotting is usually difficult because projecting the earth's surface onto a single plane is hard. Choropleth maps are used for this, with the plotly library doing the plotting. For a choropleth plot two things need to be specified: data and layout. Data is a dictionary which contains type (choropleth), locations (individual location codes), locationmode (the geography, e.g. USA-states), colorscale, text (text to be displayed on hover), z (the variable whose value is used for colouring) and colorbar (title). Layout is a dictionary with the scope, e.g. USA. Then a map is created with go.Figure by specifying the data and layout, and iplot is used to visualize the created choromap.
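
A sketch of this data/layout/iplot workflow with offline plotly; the two-state values are placeholders and the exact call style depends on the plotly version:

```python
# Sketch: USA-states choropleth built from a data dict and a layout dict.
import plotly.graph_objs as go
from plotly.offline import iplot

data = dict(type='choropleth',
            locations=['CA', 'TX'],                  # individual location codes
            locationmode='USA-states',
            colorscale='Portland',
            text=['California', 'Texas'],            # shown on hover
            z=[39.5, 29.0],                          # values used for colouring
            colorbar={'title': 'Population (millions)'})

layout = dict(geo={'scope': 'usa'})

choromap = go.Figure(data=[data], layout=layout)
iplot(choromap)                                      # renders in a notebook
```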

Day 81-82: October 16-17, 2018

Today's Progress: Machine Learning: Data Preprocessing

Thoughts: Revised the first and essential part of any machine learning project, data preprocessing. It involves the following major steps: i. importing the libraries, ii. importing the dataset, iii. handling missing data, iv. encoding categorical data, v. splitting the data into train and test sets, vi. feature scaling.

Day 83-84: October 18-19, 2018

Today's Progress: Regression: Simple linear and Multiple Linear

Thoughts: In simple terms, simple linear regression draws multiple candidate lines through the data and selects the one for which the error is minimum. Multiple linear regression is used when there is more than one independent variable. I then learnt about the dummy variable trap, and went through the 5 methods of model building: 1. All-in, 2. Backward Elimination, 3. Forward Selection, 4. Bidirectional Elimination, 5. Score Comparison.

Day 85: October 20, 2018

Today's Progress: Regression: Polynomial and SVR

Thoughts: Polynomial regression is used when the data grows non-linearly, for example exponentially. It is still called linear regression because the model is a linear combination of the coefficients.

Day 86: October 21, 2018

Today's Progress: Regression: Decision Tree and Random Forest

Thoughts: A decision tree works by splitting the dataset into groups displaying similar characteristics and assigning a value (the average of the data points in that group) for further predictions. Random forest is an ensemble technique in which many decision trees vote to predict a value.

Day 87: October 22, 2018

Today's Progress: R Squared and adjusted R squared

Thoughts: R squared is the goodness of fit of the regression line. It basically measures how much better the regression line performs compared to an average line; the closer R squared is to 1, the better. Adjusted R squared is used because whenever a new variable is added, R squared never decreases; it either stays the same or increases by a small amount. Adjusted R squared decreases when an added variable makes the model worse and increases when it makes the model better.

Day 88: October 23, 2018

Today's Progress: Classification: Logistic and K-Nearest Neighbours

Thoughts: Logistic regression gives out probabilities and, when used with a threshold, can classify the dataset into distinct classes; it makes use of the sigmoid function for doing so. The KNN algorithm checks the K nearest data points to a new data point and assigns it to the class in which the majority of those K points lie. Usually K is 5; if it is too small then overfitting happens and if it is too large then underfitting happens.

Day 89: October 24, 2018

Today's Progress: Classification: SVM and Kernel SVM

Thoughts: The SVM model uses support vectors, i.e. the boundary data points of the neighbouring classes, and searches for the decision line with the maximum margin. Sometimes the classes cannot be separated using a straight line; in this case kernel SVM, which is non-linear, is used.

Day 90: October 25, 2018

Today's Progress: Classification: Decision Tree and Random Forest

Thoughts:

Day 91: October 26, 2018

Today's Progress: Clustering: K-Means

Thoughts:

Day 92: October 27, 2018

Today's Progress: Clustering: Hierarchical

Thoughts:

Day 93: October 28, 2018

Today's Progress: Association Rule Learning: Apriori

Thoughts: Apriori works on the principle "people who bought ___ also bought ___". It has 3 parts: support, confidence and lift. Support, in the case of movies, is (# user watchlists containing M) / (# user watchlists). Confidence is (# user watchlists containing M1 and M2) / (# user watchlists containing M1). Lift is the gain from recommending a movie M2 to people who have watched M1 versus recommending it to random people. The Apriori algorithm gives the rules for the items that relate most strongly to each other and can be used in recommender systems.

Day 94: October 29, 2018

Today's Progress: Association Rule Learning: Eclat

Thoughts:

Day 95: October 30, 2018

Today's Progress: Reinforcement Learning: UCB

Thoughts: Reinforcement learning is learning based on rewards and punishments. UCB, or Upper Confidence Bound, is used to find and exploit, during the experiment itself, the option out of many that gives the best result. The UCB algorithm maintains a confidence interval for each possible option, and as a particular option is selected, its confidence interval becomes narrower or wider depending on the result of that trial. At each round the option with the highest upper bound is chosen, and over the course of the experiment the optimal option is found using these bounds.

Day 96: October 31, 2018

Today's Progress: Reinforcement Learning: Thompson Sampling

Thoughts:

Day 97: November 1, 2018

Today's Progress: Natural Language Processing Basics

Thoughts:

Day 98: November 2, 2018

Today's Progress: Dimensionality Reduction: PCA

Thoughts:

Day 99: November 3, 2018

Today's Progress: Dimensionality Reduction: LDA

Thoughts:

Day 100: November 5, 2018

Today's Progress: Dimensionality Reduction: Kernel PCA

Thoughts:
