My study notes
#include <iostream>
#include <cmath>
using namespace std;

// Deterministic primality test using 6k ± 1 trial division
bool is_prime(int n) {
    if (n <= 1) return false;
    if (n <= 3) return true;
    if (n % 2 == 0 || n % 3 == 0) return false;
    for (int i = 5; i * i <= n; i += 6)
        if (n % i == 0 || n % (i + 2) == 0) return false;
    return true;
}

// n is a Fibonacci number iff 5n^2 - 4 or 5n^2 + 4 is a perfect square
bool is_perfect_square(long long x) {
    if (x < 0) return false;
    long long root = llround(sqrt((double)x));
    return root * root == x;
}

bool is_fibo(int n) {
    long long sq = 5LL * n * n;
    return is_perfect_square(sq - 4) || is_perfect_square(sq + 4);
}

// n-th Fibonacci number via Binet's formula (accurate only for small n)
int get_fibo(int n) {
    double phi = (1 + sqrt(5)) / 2;
    return round(pow(phi, n) / sqrt(5));
}

// SAKAMOTO'S ALGORITHM: day of the week (0 = Sunday, ..., 6 = Saturday)
int day_of_week(int year, int month, int day) {
    int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    year -= month < 3;
    return (year + year / 4 - year / 100 + year / 400 + t[month - 1] + day) % 7;
}
function getWebName(url) {
    // http://example1.com/a/b?c=d => example1
    // http://www.example2.com/b?c=d => example2
    // https://ww.example3.com.vn => example3
    const hostnameParts = new URL(url).hostname.split('.');
    return hostnameParts[hostnameParts.length - 1].length === 2 // 2-letter country-code TLD, e.g. ".vn"
        ? hostnameParts[hostnameParts.length - 3]
        : hostnameParts[hostnameParts.length - 2];
}
// Check even and odd without `if else`
const number = 3;
["even", "odd"][number % 2]; // "odd"
// Get intersection
const a = new Set([1,2,3]);
const b = new Set([4,3,2]);
const intersection = [...a].filter(x => b.has(x))
console.log(intersection) // [2, 3]
function getCookieField(name) {
    const cookie = document.cookie.split("; ").find(item => item.startsWith(`${name}=`));
    // Take everything after the first "=" so values that themselves contain "=" stay intact
    return cookie ? decodeURIComponent(cookie.substring(name.length + 1)) : null;
}
(265 >>> 0).toString(2); // "100001001"
(_$=($,_=[]+[])=>$?_$($>>+!![],($&+!![])+_):_)(265);
/*
This is not a RegEx but an arrow function whose function name, variable names, and the number 1
are written with special characters; the number 1 is expressed with the array expression +!![].
Here is a slightly easier-to-read version of the code:
(toBinary = (val, str = "") => val ? toBinary(val >> 1, (val & 1) + str) : str)(265);
[]+[] is just the empty string "".
+!![] is just the number 1.
It uses recursion to take each bit and prepend it to the string str (initially empty "").
The stopping condition is val equal to 0 (that's the ternary operator at val ? ...).
Written out with comments for readability:
(
    toBinary = (val, str = "") =>               // assign toBinary to an arrow function with 2 parameters val and str (default "")
        val ?                                   // if val is non-zero...
            toBinary(val >> 1, (val & 1) + str) // ...recurse for the next bit
            : str                               // ...otherwise stop the recursion and return the accumulated string
)(265);                                         // invoke toBinary immediately
*/

Anomaly detection

Try to make sure the features you give the algorithm are more or less Gaussian. If a feature is not Gaussian, you can sometimes transform it to make it a bit more Gaussian -> Train the model, then look at which anomalies in the cross-validation set the algorithm fails to detect -> Examine those examples to see if they can inspire new features that would allow the algorithm to spot them.
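A minimal Python sketch of that workflow, assuming NumPy; the data, the log transform, and the threshold epsilon are illustrative choices, not from the notes:

import numpy as np

def make_more_gaussian(x, c=1.0):
    # Typical transforms to make a skewed feature more Gaussian: log(x + c), sqrt(x), x**(1/3)
    return np.log(x + c)

def fit_gaussian(X):
    # Per-feature mean and variance of the (transformed) training set
    return X.mean(axis=0), X.var(axis=0)

def density(X, mu, var):
    # Product of independent univariate Gaussians, one factor per feature
    p = np.exp(-((X - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
    return p.prod(axis=1)

def predict_anomaly(X, mu, var, epsilon=1e-3):
    # Flag an example as anomalous when its density falls below epsilon;
    # epsilon is tuned on a labeled cross-validation set (e.g. via F1 score)
    return density(X, mu, var) < epsilon

The cross-validation examples where predict_anomaly misses a known anomaly are the ones worth inspecting for new features.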


Collaborative Filtering


If you were to run the algorithm on this dataset, you would actually end up with the parameters w = [0 0] and b = 0 for the user Eve. Because Eve hasn't rated any movies yet, w and b don't affect the first term of the cost function, since none of Eve's movie ratings appear in the squared-error term -> Normalize the rows (mean normalization) so that the algorithm can still give reasonable ratings for such a user.

Normalizing the columns would help if there were a brand-new movie that no one has rated yet. But if there is a brand-new movie that no one has rated yet, you probably shouldn't show it to many users initially, because you don't know much about that movie. So normalizing columns is less important than normalizing the rows to cope with the case of a new user who has hardly rated any movies yet.
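A minimal Python sketch of that row-wise mean normalization, assuming NumPy, a ratings matrix Y (movies × users), and an indicator matrix R marking which entries were actually rated (both hypothetical here):

import numpy as np

def mean_normalize(Y, R):
    # Per-movie (per-row) mean over the entries that were actually rated
    Y_mean = (Y * R).sum(axis=1) / (R.sum(axis=1) + 1e-12)
    # Shift only the rated entries, so unrated entries stay at 0
    Y_norm = Y - Y_mean.reshape(-1, 1) * R
    return Y_norm, Y_mean

# After training on Y_norm, a prediction adds the mean back:
#   pred[i, j] = w[j] @ x[i] + b[j] + Y_mean[i]
# so a brand-new user (w = 0, b = 0) defaults to each movie's average rating.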


Content-based filtering


The retrieval step tries to prune out a lot of items that are just not worth running the more detailed inference and inner product on. The ranking step then makes a more careful prediction of which items the user is actually likely to enjoy. Retrieving more items results in better performance but slower recommendations.

To optimize the trade-off, carry out offline experiments to see whether retrieving additional items results in more relevant recommendations (e.g., whether $p(y^{(i,j)} = 1)$ for the items displayed to the user is higher).
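A minimal Python sketch of the retrieval-then-ranking pipeline, assuming NumPy and precomputed user and item embedding vectors (v_u, V_items) from the two networks; the function names and the candidate count k are illustrative:

import numpy as np

def retrieve(v_u, V_items, k=100):
    # Cheap candidate generation: the k items whose embeddings best match the user's
    # (in practice also "similar to recently watched", "top 10 in the user's country", etc.)
    scores = V_items @ v_u
    return np.argsort(-scores)[:k]

def rank(v_u, V_items, candidates):
    # Careful ranking: score only the retrieved candidates with the inner product v_u . v_m
    scores = V_items[candidates] @ v_u
    order = np.argsort(-scores)
    return candidates[order], scores[order]

# Increasing k tends to make recommendations more relevant but slower,
# which is exactly the trade-off tuned with offline experiments.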



Reinforcement learning

One way to think of why reinforcement learning is so powerful is that you have to tell it what to do rather than how to do it:

  • For an autonomous helicopter, you could try to train a neural network with supervised learning to directly learn the mapping from the state $s$ (x) to the action $a$ (y).
  • But it turns out that when the helicopter is moving through the air, the exact right action to take is actually very ambiguous, so it's very difficult to get a dataset of x and the ideal action y -> For many robot-control tasks like the helicopter and other robots, the supervised learning approach doesn't work well, and we use reinforcement learning instead.
  • Specifying the reward function (e.g., one that makes the agent 'impatient') rather than the optimal action gives you a lot more flexibility in how to design the system.

Reinforcement learning is more finicky in terms of the choice of hyperparameters. For example, in supervised learning, if you set the learning rate a little too small, the algorithm may take 3 times longer to train, which is annoying but not that bad. In reinforcement learning, if you choose the value of ε or other hyperparameters poorly, it may take 10 or even 100 times longer to learn.
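As one concrete example of such a hyperparameter, a minimal Python sketch of ε-greedy action selection with a decaying ε; the names, the schedule, and the episode loop are illustrative:

import numpy as np

def epsilon_greedy_action(Q_values, epsilon):
    # Q_values: estimated action values for the current state
    if np.random.rand() < epsilon:
        return np.random.randint(len(Q_values))  # explore: pick a random action
    return int(np.argmax(Q_values))              # exploit: pick the greedy action

epsilon, eps_min, eps_decay = 1.0, 0.01, 0.995   # explore a lot early on
for episode in range(1000):
    # ... run the episode, selecting actions with epsilon_greedy_action(...) ...
    epsilon = max(eps_min, eps_decay * epsilon)  # decay, but never stop exploring entirely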


Bellman Equation


When the RL problem is stochastic, there isn't a single sequence of rewards that you see for sure -> what we're interested in is not maximizing the return (because that's a random quantity) but maximizing the average (expected) value of the sum of discounted rewards. In cases where both the state and action spaces are discrete, we can estimate the action-value function iteratively by using the Bellman equation:

$$ Q_{i+1}(s,a) = R + \gamma \max_{a'}Q_i(s',a') $$

This iterative method converges to the optimal action-value function $Q^*(s,a)$ as $i\to\infty$. This means that the agent just needs to gradually explore the state-action space and keep updating the estimate of $Q(s,a)$ until it converges to the optimal action-value function $Q^*(s,a)$:

  • However, in cases where the state space is continuous, it becomes practically impossible to explore the entire state-action space, which in turn makes it practically impossible to gradually estimate $Q(s,a)$ until it converges to $Q^*(s,a)$ (a sketch of the discrete, tabular iteration follows this list).
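For the discrete case, a minimal Python sketch of that iterative Bellman update; the toy deterministic environment (step), the number of states/actions, and γ = 0.5 are all made up for illustration:

import numpy as np

num_states, num_actions, gamma = 6, 2, 0.5
Q = np.zeros((num_states, num_actions))

def step(s, a):
    # Toy deterministic dynamics: action 0 moves left, action 1 moves right
    s_next = max(0, min(num_states - 1, s - 1 if a == 0 else s + 1))
    reward = 1.0 if s_next == num_states - 1 else 0.0
    return reward, s_next

# Q_{i+1}(s, a) = R + gamma * max_{a'} Q_i(s', a'), swept over all (s, a) until it converges
for i in range(100):
    Q_next = np.zeros_like(Q)
    for s in range(num_states):
        for a in range(num_actions):
            R, s_next = step(s, a)
            Q_next[s, a] = R + gamma * Q[s_next].max()
    Q = Q_next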

In Deep $Q$-Learning, we solve this problem by using a neural network to estimate the action-value function $Q(s,a)\approx Q^*(s,a)$. The network can be trained by adjusting its weights at each iteration to minimize the mean-squared error in the Bellman equation (see the sketch after this list):

  • Using neural networks in RL to estimate action-value functions has proven to be highly unstable -> Use a Target Network (updated with soft updates) and Experience Replay, which stores the states, actions, and rewards the agent receives in a memory buffer and then samples random mini-batches from it to generate uncorrelated experiences for training the agent.
  • Towards the end of training, the agent will lean towards selecting the action that it believes (based on past experience) will maximize $Q(s,a)$ -> We set the minimum ε value to 0.01 (not 0) because we always want to keep a little bit of exploration during training.
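A minimal Python sketch of the replay buffer and the soft target-network update; the buffer size, TAU, GAMMA, and the per-layer weight lists standing in for the two networks are illustrative:

import random
from collections import deque, namedtuple
import numpy as np

Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])
memory = deque(maxlen=100_000)   # experience replay buffer
TAU, GAMMA = 1e-3, 0.995

def store(state, action, reward, next_state, done):
    memory.append(Experience(state, action, reward, next_state, done))

def sample_minibatch(batch_size=64):
    # Random sampling breaks the correlation between consecutive experiences
    batch = random.sample(memory, batch_size)
    states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
    return states, actions, rewards, next_states, dones

def soft_update(q_weights, target_weights):
    # target <- TAU * online + (1 - TAU) * target, applied per layer, so the targets move slowly
    return [TAU * w + (1.0 - TAU) * w_t for w, w_t in zip(q_weights, target_weights)]

# The mini-batch training targets come from the Bellman equation, computed with the
# *target* network:  y = r + GAMMA * max_a' Q_target(s', a') * (1 - done)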


Deep reinforcement learning

(Video: rl_formalism.mp4)
In the standard "agent-environment loop" formalism, an agent interacts with the environment in discrete time steps $t = 0, 1, 2, \dots$ At each $t$, the agent uses a policy $\pi$ to select an action $A_t$ based on its observation of the environment's state $S_t$. The agent receives a numerical reward $R_t$ and, on the next time step, moves to a new state $S_{t+1}$.
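A minimal Python sketch of that loop, assuming a Gymnasium-style environment interface (env.reset() / env.step(action)); the environment id and the random placeholder policy are illustrative:

import gymnasium as gym

env = gym.make("CartPole-v1")          # any environment with the standard interface

def pi(state):
    # Placeholder policy: act randomly; a trained agent would pick argmax_a Q(s, a) here
    return env.action_space.sample()

state, info = env.reset()
for t in range(1000):                                               # discrete time steps t = 0, 1, 2, ...
    action = pi(state)                                              # A_t chosen by the policy from S_t
    state, reward, terminated, truncated, info = env.step(action)   # receive R_t and move to S_{t+1}
    if terminated or truncated:
        state, info = env.reset()
env.close()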
