Fabian Schreiber fabsta

## pandas_columns_to_dict.py
area_dict = dict(zip(lakes.area, lakes.count))

## pandas_read_csv.py
# often works
df = pd.read_csv('file.csv')
df = pd.read_csv('file.csv', header=0, index_col=0, quotechar='"',sep=':', na_values = ['na', '-', '.', ''])
# specifying "." and "NA" as missing values in the Last Name column and "." as missing values in Pre-Test Score column
df = pd.read_csv('../data/example.csv', na_values={'Last Name': ['.', 'NA'], 'Pre-Test Score': ['.']})
# skipping the top 3 rows
df = pd.read_csv('../data/example.csv', na_values=sentinels, skiprows=3)
# interpreting "," in strings around numbers as thousands separators
df = pd.read_csv('../data/example.csv', thousands=',')

## python_for_list.py
# list = [1, 3, 5, 7, 9]
# Using for loop
for i in list:
    print(i)
# 1
# 3

# for index
for i in range(length):
    print(list[i])

## Data science.md

      
              1 file
            
          
              2 forks
            
          
                3 comments
              
            
              9 stars
            
          
                fabsta
                / Data science.md
            
            
              Last active
              January 10, 2025 11:15
            
              
                Data science
              
          
    Pandas cheat sheet for data science


Pandas cheat sheet for data science
Statistics

Multi-variate analysis
Feature understanding


Preliminaries
Import


## reading_data.md

      
              1 file
            
          
              0 forks
            
          
                0 comments
              
            
              0 stars
            
          
                fabsta
                / reading_data.md
            
            
              Last active
              December 5, 2017 17:40
            
              
                Reading data #deeplearning #InputOutput
              
          
    minst

from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
>> ((60000, 28, 28), (60000,), (10000, 28, 28), (10000,))

X_test = np.expand_dims(X_test,1)
X_train = np.expand_dims(X_train,1)

  
## adding_data_augmentation.md

      
              2 files
            
          
              0 forks
            
          
                0 comments
              
            
              0 stars
            
          
                fabsta
                / adding_data_augmentation.md
            
            
              Last active
              November 26, 2017 17:23
            
              
                [Reduce Overfitting] #deeplearning
              
          
    About data augmentation

Keras comes with very convenient features for automating data augmentation. You simply define what types and maximum amounts of augmentation you want, and keras ensures that every item of every batch randomly is changed according to these settings. Here's how to define a generator that includes data augmentation:
In [26]:
dim_ordering='tf' uses tensorflow dimension ordering,
which is the same order as matplotlib uses for display.
Therefore when just using for display purposes, this is more convenient
gen = image.ImageDataGenerator(rotation_range=10, width_shift_range=0.1, 

  
## examples.md

      
              2 files
            
          
              0 forks
            
          
                0 comments
              
            
              0 stars
            
          
                fabsta
                / examples.md
            
            
              Last active
              November 25, 2017 13:36
            
              
                Visualization #datascience
              
          
    Observing Model Predictions

source: https://www.cs.utah.edu/~cmertin/dogs+cats+redux.html
First, we need to calculate the predictions on the validation set, since we know those labels, rather than looking at the test set.
In [19]:
vgg.model.load_weights(latest_weights_filename)
In [20]:

  
## clipping_predictions.md

      
              4 files
            
          
              0 forks
            
          
                0 comments
              
            
              0 stars
            
          
                fabsta
                / clipping_predictions.md
            
            
              Last active
              November 30, 2017 13:21
            
              
                [Kaggle tipps] useful kaggle tips collected along the way #deeplearning
              
          
    Input

array([[  1.9247e-01,   7.2496e-04,   3.7586e-05,   2.4820e-05,   8.0483e-01,   1.4839e-03,
          3.4440e-06,   4.3349e-04],
       [  7.4949e-02,   2.5567e-04,   9.0141e-05,   2.7097e-04,   3.8967e-01,   8.0172e-04,
          4.2277e-04,   5.3354e-01],
       [  7.3892e-02,   8.5835e-04,   4.3923e-05,   8.5646e-04,   4.6396e-01,   4.9485e-05,
          1.5451e-03,   4.5879e-01],
       [  8.8657e-01,   2.1959e-03,   9.6101e-05,   3.6997e-04,   6.2324e-02,   1.6894e-05,
          3.1924e-05,   4.8398e-02]], dtype=float32)


## R_plotting.md

      
              1 file
            
          
              0 forks
            
          
                0 comments
              
            
              0 stars
            
          
                fabsta
                / R_plotting.md
            
            
              Created
              November 21, 2017 08:46
            
              
                R plotting
              
          
    Barplot (with order)
https://rstudio-pubs-static.s3.amazonaws.com/7433_4537ea5073dc4162950abb715f513469.html

  
## Filelink.md

      
              4 files
            
          
              0 forks
            
          
                0 comments
              
            
              0 stars
            
          
                fabsta
                / Filelink.md
            
            
              Last active
              December 15, 2017 08:40
            
              
                Jupyter useful stuff #jupyter
              
          
    from IPython.display import FileLink
FileLink(file_name)
	# often works
	df = pd.read_csv('file.csv')
	df = pd.read_csv('file.csv', header=0, index_col=0, quotechar='"',sep=':', na_values = ['na', '-', '.', ''])
	# specifying "." and "NA" as missing values in the Last Name column and "." as missing values in Pre-Test Score column
	df = pd.read_csv('../data/example.csv', na_values={'Last Name': ['.', 'NA'], 'Pre-Test Score': ['.']})
	# skipping the top 3 rows
	df = pd.read_csv('../data/example.csv', na_values=sentinels, skiprows=3)
	# interpreting "," in strings around numbers as thousands separators
	df = pd.read_csv('../data/example.csv', thousands=',')
	# list = [1, 3, 5, 7, 9]
	# Using for loop
	for i in list:
	print(i)
	# 1
	# 3

	# for index
	for i in range(length):
	print(list[i])