Thiago Coelho Vieira tcvieira

## min-char-rnn.py
"""
Minimal character-level Vanilla RNN model. Written by Andrej Karpathy (@karpathy)
BSD License
"""
import numpy as np

# data I/O
data = open('input.txt', 'r').read() # should be simple plain text file
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
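
The excerpt stops after computing the vocabulary. A minimal sketch of a typical next step, mapping characters to integer indices and back, might look like the following. It uses an in-memory string instead of `input.txt`, and the names `char_to_ix`, `ix_to_char`, `encoded` and `decoded` are assumptions, since they do not appear in the excerpt. Sorting the characters is also an addition here, made so that the mapping is the same on every run.

```python
# Stand-in for the contents of input.txt (assumption: any plain text works).
data = "hello world"

# sorted() makes the index assignment deterministic; list(set(...)) alone is not.
chars = sorted(set(data))
data_size, vocab_size = len(data), len(chars)

# Hypothetical lookup tables: character -> index and index -> character.
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}

# Encode the text as a list of integers, then decode it back to a string.
encoded = [char_to_ix[ch] for ch in data]
decoded = "".join(ix_to_char[i] for i in encoded)
```

Decoding the encoded sequence should give back the original text, which is a quick way to check both tables.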

## speed_up.py
import numpy as np
import multiprocessing as multi

def chunks(n, page_list):
    """Splits the list into n chunks"""
    return np.array_split(page_list, n)

cpus = multi.cpu_count()
workers = []
page_list = ['www.website.com/page1.html', 'www.website.com/page2.html']
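
To illustrate what `chunks` returns, here is a small sketch that splits five placeholder URLs into two chunks. The URLs and the chunk count are made up for the example. `np.array_split` accepts splits that do not divide evenly, so the earlier chunks receive the extra items.

```python
import numpy as np

def chunks(n, page_list):
    """Splits the list into n chunks."""
    return np.array_split(page_list, n)

# Placeholder URLs (hypothetical, for illustration only).
page_list = ['www.website.com/page%d.html' % i for i in range(1, 6)]

# Each element of parts is a NumPy array holding one chunk of URLs.
parts = chunks(2, page_list)
sizes = [len(part) for part in parts]
```

In the original script the number of chunks would presumably come from `cpus`, with one chunk handed to each worker.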

## setupFastaiV1.md

      
Last active October 12, 2021 13:10
Setup Fast.ai v1 on Paperspace Fast.ai Template

    Setup Fastai v1 on Paperspace

Machine


Create a Fast.ai machine from the public templates with a P4000 GPU and a public IP

Connect to the machine


$ source deactivate fastai
$ pip install virtualenv


## format.py
# Formatting data
data['state'] = data['state'].str.upper() # Convert the whole string to upper case
data['state'] = data['state'].replace( # Map the variants to one canonical value
                                      to_replace=["CA", "C.A", "CALI"],
                                      value=["CALIFORNIA", "CALIFORNIA", "CALIFORNIA"])

# Dates and times are quite common in large datasets
# Converting all strings to datetime objects is good standardisation practice
# Here, the data["time"] strings will look like "2019-01-15", which is exactly
# how we set the "format" variable below

## missing.py
# Filling in NaN values of a particular feature variable
avg_height = 67 # Maybe this is a good number
data["height"] = data["height"].fillna(avg_height)

# Filling in NaN values with a calculated one
avg_height = data["height"].median() # This is probably more accurate
data["height"] = data["height"].fillna(avg_height)

# Dropping rows with missing values
# Here we check which rows of "height" aren't null

## dropping.py
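# Computing correlation coefficients between each feature and the output
x_cols = [col for col in data.columns if col not in ['output']]

corr_coeffs = {}
for col in x_cols:
  # np.corrcoef returns a 2x2 matrix; entry [0, 1] is the coefficient
  corr_coeffs[col] = np.corrcoef(data[col].values, data.output.values)[0, 1]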

# Get the number of missing values in each column / feature variable
data.isnull().sum()

# Drop a feature variable

## display_closestwords_tsnescatterplot.ipynb

      
Created January 29, 2019 05:23, forked from aneesha/display_closestwords_tsnescatterplot.ipynb
Use TSNE to only plot similar words using Word2Vec

## install_ngrok_gcp_fastai.md

      
Last active March 14, 2019 23:13

Installing ngrok on Google Cloud Shell

A workaround for the problem of using ssh tunneling to access the Jupyter notebook.
The idea is to use ngrok to access Jupyter without the need for ssh tunneling.
Opening Google Cloud Shell

https://console.cloud.google.com/compute/instances
Start the instance and open the shell:

  
## Acesso_gcp_jupyter_pela_rede_ISC.md

      
Created March 23, 2019 13:20, forked from EMFS/Acesso_gcp_jupyter_pela_rede_ISC.md

Access to GCP and the Jupyter server over the ISC network

A workaround for the problem of using ssh tunneling to access the Jupyter notebook, which is blocked by the firewall of the ISC wifi network, where the in-person meetings of the Brasília Deep Learning study group take place.
The idea is to make the Jupyter server running on Google Cloud Platform (GCP) reachable from the external network, directly through its IP, without the need for ssh tunneling.
Changing the instance settings in the GCP console

https://console.cloud.google.com/networking/addresses/
While logged in to the GCP console, open the address above and reserve a static IP address: "Reserve Static Address".

  
## ludwig.ipynb

      
Created August 6, 2019 15:01
Ludwig Example