David Yerrington dyerrington

## subplots.py
##
# Create a figure space matrix consisting of 3 columns and 2 rows
#
# Here is a useful template to use for working with subplots.
#
##################################################################
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 5), ncols=3, nrows=2)

left   =  0.125  # the left side of the subplots of the figure
right  =  0.9    # the right side of the subplots of the figure

## linear_regression_kfold_cross_validation.py
# k-fold regression
# we need our modules for this:
from sklearn.linear_model import LinearRegression
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import cross_val_score, cross_val_predict
from matplotlib import pyplot as plt

from sklearn import metrics

# Make the plots bigger
plt.rcParams['figure.figsize'] = 10, 10

## hiring_guidelines.md

      
Created July 17, 2019 20:27

Great Data Science Project Criteria:

Problem statement that defines a measurable and/or falsifiable outcome.  “Frequency of [specific event] is influential over [some outcome]”. “Users who use [some feature in app] are differentiable from users who less frequently use [some feature in app]”. etc.  If you can’t frame a data problem properly, none of it has purpose.  The biggest challenge in data science is making sense of and defining the gray areas of business problems.  This also comes with experience.
EDA EDA EDA.  Define your scope.  Report only what is necessary and relevant to your problem statement.  If the model reports only 4-5 common variables as parameters (logistic regression, for instance), focus on those when summarizing your work in terms of EDA.
How much data is necessary to make this analysis work?  Are you sampling?  Is a t-test or a rank-order test necessary to gain assurance?
Explain which model makes the most sense to use. Are you trying to draw inferences about a data problem?
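
To make the t-test versus rank-order question concrete, here is a minimal sketch using only the Python standard library. It computes Welch's t statistic and the Mann-Whitney U statistic (one common rank-order test) on two small samples. The session counts, the group names, and the helper functions `welch_t` and `mann_whitney_u` are hypothetical and only for illustration. P-values are left out to keep the sketch short.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances allowed)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def mann_whitney_u(a, b):
    """Mann-Whitney U for sample a: number of pairs (x, y) with x > y; ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Hypothetical weekly session counts for two user groups
heavy_users = [12, 15, 14, 16, 13, 17]
light_users = [8, 9, 11, 10, 7, 9]

# The same heavy group with its largest value replaced by an extreme outlier
heavy_with_outlier = heavy_users[:-1] + [170]

for label, sample in [("clean", heavy_users), ("outlier", heavy_with_outlier)]:
    t_stat = welch_t(sample, light_users)
    u_stat = mann_whitney_u(sample, light_users)
    print(f"{label:8s} t = {t_stat:.2f}  U = {u_stat:.1f}")
```

In this toy data the single outlier inflates the variance and pulls the t statistic down sharply, while U is unchanged because the ordering of the two groups is the same. That is the kind of trade-off to consider when choosing between the two tests.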


## google_translate_api_demo.ipynb

      
Last active November 9, 2022 19:35

Google Translate API demo, tested with Python 3.9.x. I suspect it may not work as well with Python 3.10, for reasons I haven't pinned down, but if you follow the guide I referenced you should be in business. I highly recommend creating a new Python environment before doing any serious development, if you haven't already.
              
          
    
## config.md

      
Last active July 8, 2022 17:42

Installation

It's recommended that you install the requirements for these T5 models in a new environment, since they are known to conflict with common package requirements in the scientific Python stack.
Conda ENV Setup

Create

conda create -n nlp-t5

  
## finplot_example.py
import finplot as fplt
import yfinance

df = yfinance.download('AAPL')
d = df[['Open', 'Close', 'High', 'Low']].reset_index(drop=True)
fplt.candlestick_ochl(d)
fplt.show()

## daves_pearson.py


def daves_pearson(corr, threashold = False, empty_dimensions = True, title = False):
    """
        Based on http://seaborn.pydata.org/examples/many_pairwise_correlations.html

        Parameters
        -----------------------
        corr : Pandas Pearson correlation matrix object
        threashold : Threshold filter for absolute score value.  Useful to suppress display of

## basic_linear_regression.py
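import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_diabetes
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split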

# We load some test data
data = load_diabetes()

# Put it in a data frame for future reference -- or you work from your own dataframe
df = pd.DataFrame(data['data'])

## overfitting_decision_trees.md

      
Created May 12, 2017 19:10

Re: Why are decision trees prone to overfitting, I’ll do my best --

Overfitting in decision trees

Overfitting is one reason a model may stop generalizing well.
Overfitting happens when a learning process over-optimizes training-set error at the cost of test error.  While it’s possible for training and test sets to perform equally well in cross-validation, that can simply be the result of the two sets being very similar in their characteristics, which may not be a huge problem.  Decision trees can learn a training set to such a fine level of granularity that they overfit easily.  Allowing a tree to keep splitting down to that granular level is what makes the model prone to learning every point extremely well, to the point of perfect classification of the training data, i.e., overfitting.
I recommend the following steps to avoid overfitting:

Use a test set that is not exactly like the training set, or is different enough that differences in error rates are easy to see.
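
The granularity effect described above can be seen in a small sketch, assuming scikit-learn is installed. The synthetic dataset, the 20% label noise (`flip_y=0.2`), the 70/30 split, and the depth values are all illustrative assumptions, not part of the original write-up. The sketch compares training and held-out accuracy for shallow trees and for a tree allowed to split without a depth limit.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, deliberately noisy data (assumed for illustration only)
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

scores = {}
for depth in (2, 4, None):  # None lets the tree split until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    scores[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    train_acc, test_acc = scores[depth]
    print(f"max_depth={depth}: train={train_acc:.2f}  test={test_acc:.2f}")
```

The unrestricted tree memorizes the training set, including its noisy labels, so its training accuracy tells you little about generalization. The gap between the training and held-out scores is the signal to watch, and limiting `max_depth` is one common way to narrow it.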


## craigstlist_spider_images.py
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

import scrapy

# item models
from craigslist.items import CraigslistItem, CraigslistItemDetail, CraigslistImage

class CraigslistSpider(CrawlSpider):