István Ketykó ketyi

## PyTorch_bucket_by_sequence_length.py
"""
PyTorch has pack_padded_sequence, but it doesn't work with dense layers. For sequence data with high variance in length,
the best way to minimize padding and masking within a batch is to feed in data that is already grouped by sequence length
(while still shuffling it somewhat). Here is my current solution in numpy.
I still need to convert every function to torch so that it can run on the GPU, and I am sure there are many other
ways to optimize it further. I hope this helps others, and that it may become a new PyTorch batch sampler someday.

General approach to how it works:

Decide what your bucket boundaries for the data are.
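
The preview stops after the first step. As a rough illustration of the general idea only (not the gist's actual code, which is not visible here), a minimal numpy sketch could assign each sequence to a bucket by length, shuffle within each bucket, cut the buckets into batches, and then shuffle the batch order. The names `bucket_batches`, `boundaries`, `batch_size`, and `rng` are hypothetical.

```python
import numpy as np


def bucket_batches(lengths, boundaries, batch_size, rng=None):
    """Return a shuffled list of index arrays, each drawn from one length bucket.

    Hypothetical helper: the name and signature are assumptions made for
    illustration, not part of the original gist.
    """
    rng = np.random.default_rng() if rng is None else rng
    lengths = np.asarray(lengths)
    # np.digitize maps each length to a bucket id: 0 for lengths below
    # boundaries[0], len(boundaries) for lengths >= boundaries[-1].
    bucket_ids = np.digitize(lengths, boundaries)
    batches = []
    for bucket in np.unique(bucket_ids):
        idx = np.flatnonzero(bucket_ids == bucket)
        rng.shuffle(idx)  # shuffle examples inside the bucket
        for start in range(0, len(idx), batch_size):
            batches.append(idx[start:start + batch_size])
    # Shuffle the batch order so buckets are not visited in sequence.
    order = rng.permutation(len(batches))
    return [batches[i] for i in order]
```

Each returned index array can then be used to gather and pad one batch, so padding is bounded by the width of the bucket rather than by the longest sequence in the whole dataset.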

## NLP Reading List.md

      
yoavg / NLP Reading List.md (1 file, 0 forks, 0 comments, 5 stars; last active March 6, 2024 14:42)

NLP Reading

Books

- Deep Learning for NLP: book by Yoav Goldberg, and a Primer version (without the NLP bits, without some of the advanced bits)
- Manning and Schütze, Foundations of Statistical Natural Language Processing. Buy at Amazon
  - Classic book, a bit outdated by now, but some chapters are still worth reading today.
- Jurafsky and Martin, Speech and Language Processing (3rd Edition)


## sn_for_rnn.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-


"""
Most of this code is borrowed from niffler92's project.
https://github.com/niffler92/SNGAN
"""


## weight_init.py
#!/usr/bin/env python
# -*- coding:UTF-8 -*-

import torch
import torch.nn as nn
import torch.nn.init as init


def weight_init(m):
    '''

## sgdr.py
from keras.callbacks import Callback
import keras.backend as K
import numpy as np

class SGDRScheduler(Callback):
    '''Cosine annealing learning rate scheduler with periodic restarts.

    # Usage
        ```python
            schedule = SGDRScheduler(min_lr=1e-5,
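
The preview is cut off before the scheduler's body. For orientation, the cosine annealing rule with warm restarts that the docstring names can be sketched in plain Python as below. This is a sketch under assumptions, not the gist's implementation: the function names, the `max_lr`, `cycle_length`, and `mult_factor` parameters, and the per-epoch granularity are all hypothetical.

```python
import math


def sgdr_lr(min_lr, max_lr, t_cur, cycle_length):
    """Cosine-annealed learning rate at position t_cur within one cycle."""
    fraction = t_cur / cycle_length
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * fraction))


def sgdr_schedule(min_lr, max_lr, cycle_length, mult_factor, n_epochs):
    """Per-epoch learning rates, restarting at max_lr after each cycle.

    Hypothetical helper: after every restart the cycle length is
    multiplied by mult_factor.
    """
    rates, t_cur = [], 0
    for _ in range(n_epochs):
        rates.append(sgdr_lr(min_lr, max_lr, t_cur, cycle_length))
        t_cur += 1
        if t_cur >= cycle_length:
            # Restart: jump back to the top of the cosine curve.
            t_cur = 0
            cycle_length *= mult_factor
    return rates
```

Within a cycle the rate decays from `max_lr` toward `min_lr` along half a cosine wave; at a restart it returns to `max_lr`.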

## tensorflow-memory-1
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.9
config.gpu_options.visible_device_list = "0"
set_session(tf.Session(config=config))

## principal_component_analysis.ipynb

      
infinite-Joy / principal_component_analysis.ipynb (1 file, 1 fork, 0 comments, 3 stars; last active August 29, 2017 12:21)

description for principal component analysis
    
## ThresholdingAlgo.py
#!/usr/bin/env python
# Implementation of algorithm from http://stackoverflow.com/a/22640362/6029703
import numpy as np
import pylab

def thresholding_algo(y, lag, threshold, influence):
    signals = np.zeros(len(y))
    filteredY = np.array(y)
    avgFilter = [0]*len(y)
    stdFilter = [0]*len(y)

## tensorflow-add-noise.py
import tensorflow as tf
import numpy as np


def gaussian_noise_layer(input_layer, std):
    noise = tf.random_normal(shape=tf.shape(input_layer), mean=0.0, stddev=std, dtype=tf.float32)
    return input_layer + noise


inp = tf.placeholder(tf.float32, shape=[None, 8], name='input')

## cua8_install.MD

      
sono-bfio / cua8_install.MD (1 file, 0 forks, 0 comments, 1 star; created January 25, 2017 02:06)

Cuda 8.0 Install

Nvidia Repo Setup

NVIDIA_GPGKEY_SUM=d1be581509378368edeec8c1eb2958702feedf3bc3d17011adbf24efacce4ab5 && \
NVIDIA_GPGKEY_FPR=ae09fe4bbd223a84b2ccfce3f60f4b3d7fa2af80 && \
apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/7fa2af80.pub && \
apt-key adv --export --no-emit-version -a $NVIDIA_GPGKEY_FPR | tail -n +2 > cudasign.pub && \
echo "$NVIDIA_GPGKEY_SUM  cudasign.pub" | sha256sum -c --strict - && rm cudasign.pub && \
echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64 /" > /etc/apt/sources.list.d/cuda.list