Alexandros Nikou alexshmmy

## m1-max-numpy-setup.md

      
MarkDana / m1-max-numpy-setup.md, last active October 28, 2023 11:42

Install NumPy on M1 Max

How do you install NumPy on an M1 Max with the best accelerated performance (Apple's vecLib)? Here's the answer as of Dec 6 2021.

Steps

I. Install miniforge

So that your Python runs natively on arm64, not translated via Rosetta.

1. Download Miniforge3-MacOSX-arm64.sh.
2. Run the script, then open another shell:

$ bash Miniforge3-MacOSX-arm64.sh
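
Once the new shell is open, it can be worth confirming that the interpreter really is running natively rather than under Rosetta. The snippet below is a minimal sketch using only the standard library; the helper name `is_native_arm64` is made up for illustration and is not part of Miniforge or NumPy. It relies on `platform.machine()` reporting `arm64` for a native Apple Silicon build and `x86_64` for a translated one.

```python
import platform
from typing import Optional


def is_native_arm64(machine: Optional[str] = None) -> bool:
    """Return True if the given (or detected) machine architecture is arm64.

    Hypothetical helper for illustration: passing `machine` explicitly
    makes the check easy to try out on any platform.
    """
    if machine is None:
        # Architecture the running interpreter reports for itself.
        machine = platform.machine()
    return machine.lower() == "arm64"


if __name__ == "__main__":
    print("machine:", platform.machine())
    print("native arm64:", is_native_arm64())
```

If this reports `x86_64` on an M1 Max, the shell is most likely still picking up an Intel build of Python rather than the one Miniforge installed.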

  
## cem.md

      
kashif / cem.md, last active November 7, 2023 12:56

Cross Entropy Method

How do we solve the policy optimization problem, that is, maximize the total reward given some parametrized policy?

Discounted future reward

To begin with, the total reward for an episode is the sum of all its rewards. If the environment is stochastic, we can never be sure that we will get the same rewards the next time we perform the same actions, so the further we look into the future, the more the total future reward may diverge. For that reason it is common to use the discounted future reward, in which later rewards are weighted by a parameter discount, called the discount factor, that lies between 0 and 1.
A good strategy for an agent is to always choose the action that maximizes the (discounted) future reward. In other words, we want to maximize the expected reward per episode.
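
To make the discounted future reward described above concrete, here is a minimal sketch of how it could be computed for one episode. It assumes the common convention that the reward at step t is weighted by discount raised to the power t, with the first reward at t = 0 left unweighted, and it treats the bounds 0 and 1 as inclusive; the function name `discounted_return` is chosen for illustration and does not come from the original text.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], discount: float) -> float:
    """Sum the rewards of one episode, weighting step t by discount**t.

    Assumed convention: the first reward (t = 0) has weight 1.
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    total = 0.0
    # Work backwards through the episode: G_t = r_t + discount * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + discount * total
    return total


if __name__ == "__main__":
    episode_rewards = [1.0, 1.0, 1.0]
    # discount = 1 recovers the plain (undiscounted) sum of rewards.
    print(discounted_return(episode_rewards, discount=1.0))  # 3.0
    # discount = 0.5 gives 1 + 0.5 + 0.25.
    print(discounted_return(episode_rewards, discount=0.5))  # 1.75
```

A smaller discount makes the agent care mostly about immediate rewards, while a value close to 1 makes distant rewards count almost as much as near ones.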