Kirill Bobyrev kirillbobyrev

## eda.py
import requests
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
from collections import Counter

PLAYER = "Hikaru"
START_DATE = "2023-01-01"
END_DATE = "2023-11-28"

## td-gamma.ipynb

      
kirillbobyrev / td-gamma.ipynb (file: TD(\gamma).ipynb), last active June 6, 2018 15:35.

## tic_tac_toe.py
'''
Author: Kirill Bobyrev (https://github.com/kirillbobyrev)

This module implements "An Extended Example: Tic-Tac-Toe" from the
`Reinforcement Learning: An Introduction`_ book by Richard S. Sutton and Andrew
G. Barto (January 1, 2018 complete draft), described in Section 1.5. The
implemented Reinforcement Learning algorithm is TD(0), trained via self-play
between two agents. The update rule is slightly modified, given the environment
specifics, to comply with the one introduced in Chapter 1, but, as shown
later, it is equivalent to the one used in generic settings.
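
The equivalence mentioned above can be illustrated with a minimal sketch. This is not the gist's own code: the function names, the dictionary-based value table, and the step size of 0.1 are assumptions made for illustration. The Chapter 1 form stores the game outcome as the value of the terminal state and uses no reward term, while the generic TD(0) form keeps terminal values at zero and delivers the outcome as a reward on the final transition. With a discount factor of 1 and zero intermediate rewards, both produce the same update.

```python
def td0_update_chapter1(values, state, next_state, alpha=0.1):
    """Chapter 1 form: V(s) <- V(s) + alpha * (V(s') - V(s)).

    The outcome of the game is assumed to be stored directly as the
    value of the terminal state (for example 1.0 for a win).
    """
    values[state] += alpha * (values[next_state] - values[state])
    return values[state]


def td0_update_generic(values, state, reward, next_state, alpha=0.1, gamma=1.0):
    """Generic form: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).

    Terminal states are assumed to have value 0.0; the outcome arrives
    as the reward on the transition into the terminal state.
    """
    target = reward + gamma * values[next_state]
    values[state] += alpha * (target - values[state])
    return values[state]


if __name__ == "__main__":
    # Hypothetical state labels: "s" is the last non-terminal state, "win" is terminal.
    chapter1_values = {"s": 0.5, "win": 1.0}  # outcome stored as terminal value
    generic_values = {"s": 0.5, "win": 0.0}   # outcome delivered as reward

    print(td0_update_chapter1(chapter1_values, "s", "win"))
    print(td0_update_generic(generic_values, "s", 1.0, "win"))
```

Both calls move the value of `"s"` toward the outcome by the same amount, which is the sense in which the two update rules agree under these assumptions.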