This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import re | |
from collections import Counter, defaultdict | |
def build_vocab(corpus: str) -> dict: | |
"""Step 1. Build vocab from text corpus""" | |
# Separate each char in word by space and add mark end of token | |
tokens = [" ".join(word) + " </w>" for word in corpus.split()] | |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
''' | |
Linear Regression - Vectorized Implementation w/ Numpy | |
Setup: | |
- features X = Feature Vector of shape (m, n) [Could append bias term to feature matrix with ones(m, 1)] | |
- Target y = continuous variable - shape (m, 1) | |
- Weights = Weight matrix of shape (n, 1) - initialize with zeros | |
- Standardize features to have zero mean and unit variance. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
''' | |
Logistic Regression - Vectorized Implementation w/ Numpy | |
Setup: | |
- features X = Feature Vector of shape (m, n) [Could append bias term to feature matrix with ones(m, 1)] | |
- Target y = discrete variable - shape (m, 1) (Consider Binary for now) | |
- Weights = Weight matrix of shape (n, 1) - initialize with zeros | |
- Standardize features to have zero mean and unit variance. | |
Gradient Descent Algorithm: |