Daniel Grady DGrady

## oracle-query.org

      
Example of querying an Oracle database using Python, SQLAlchemy, and Pandas. Last active March 21, 2024 11:57.

Query Oracle databases with Python and SQLAlchemy

N.B. SQLAlchemy now incorporates all of this information in its documentation; I’m leaving this post here, but recommend referring to SQLAlchemy instead of these instructions.

Install requirements

- We’ll assume you already have SQLAlchemy and Pandas installed; these are included by default in many Python distributions.
- Install the cx_Oracle package in your Python environment, using either pip or conda, for example `pip install cx_Oracle`.


## scikit-learn-character-tokenization.ipynb

      
Demonstration of the `char_wb` tokenization strategy in scikit-learn. Created September 18, 2019 17:05.
    
## subprocess_filter.py
"""
Problem: provide two-way communication with a subprocess in Python.

See also:
- https://kevinmccarthy.org/2016/07/25/streaming-subprocess-stdin-and-stdout-with-asyncio-in-python/
- http://eli.thegreenplace.net/2017/interacting-with-a-long-running-child-process-in-python/
"""

import asyncio
import sys
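
One way to approach the problem stated in the docstring is with asyncio's subprocess pipes. The sketch below is not the gist's implementation: the child program (`CHILD_CODE`) and the `filter_lines` helper are hypothetical names introduced for illustration, and it assumes the child answers each input line with exactly one output line.

```python
import asyncio
import sys

# Hypothetical child program: echoes each input line back in upper case.
CHILD_CODE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)


async def filter_lines(lines):
    """Send each line to the child process and read one reply line back."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD_CODE,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
    )
    replies = []
    for line in lines:
        # Write one line, flush it to the pipe, then wait for the reply.
        proc.stdin.write((line + "\n").encode())
        await proc.stdin.drain()
        reply = await proc.stdout.readline()
        replies.append(reply.decode().rstrip("\r\n"))
    # Closing stdin signals end-of-input so the child can exit.
    proc.stdin.close()
    await proc.wait()
    return replies


if __name__ == "__main__":
    print(asyncio.run(filter_lines(["hello", "world"])))
```

The strict write-then-read loop only works because of the one-line-in, one-line-out assumption; a child that buffers its output or emits a variable number of lines would need separate reader and writer tasks.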

## frequency_histogram.py
import numpy as np
import pandas as pd


def frequency_histogram(
    data: pd.DataFrame,
    n_bins=20,
    bins=None,
    log_bins=False,
    normalize=False,

## random_seed.py
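# To reproduce a random sample, we need a fixed seed.
# This generates a candidate seed; the underscore format makes it
# readable as a Python integer literal.

import numpy as np

"{:_}".format(np.random.randint(np.iinfo(np.uint32).max))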

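Once a seed has been generated, it can be hard-coded so that a sample is reproducible. A minimal sketch, assuming NumPy's `default_rng` interface; the `SEED` value and the `draw_sample` helper are hypothetical and only illustrate the idea.

```python
import numpy as np

# Hypothetical seed value, of the kind the snippet above would print.
SEED = 2_718_281_828


def draw_sample(seed: int, size: int = 5) -> np.ndarray:
    """Draw `size` uniform samples from a generator seeded with `seed`."""
    rng = np.random.default_rng(seed)
    return rng.random(size)


first = draw_sample(SEED)
second = draw_sample(SEED)

# The same seed yields the same sample.
assert np.array_equal(first, second)
```
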
## flatten_spark_schema.py
"""
The schemas that Spark produces for DataFrames are typically
nested, and these nested schemas are quite difficult to work with
interactively. In many cases, it's possible to flatten a schema
into a single level of column names.
"""

import typing as T

import cytoolz.curried as tz
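
The flattening idea in the docstring can be sketched without Spark. The example below is an illustration only, not the gist's code: it models a nested schema as a plain dict (nested dicts stand in for struct columns), and `flatten_schema` is a hypothetical helper name.

```python
import typing as T


def flatten_schema(schema: T.Mapping[str, T.Any], prefix: str = "") -> T.List[str]:
    """Return dotted column names for every leaf field of a nested schema."""
    names: T.List[str] = []
    for name, dtype in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(dtype, dict):
            # A nested dict stands in for a struct: recurse into its fields.
            names.extend(flatten_schema(dtype, path))
        else:
            names.append(path)
    return names


# Stand-in for a nested DataFrame schema (not a real Spark StructType).
schema = {
    "id": "long",
    "user": {
        "name": "string",
        "address": {"city": "string", "zip": "string"},
    },
}

print(flatten_schema(schema))
# ['id', 'user.name', 'user.address.city', 'user.address.zip']
```

Array and map types are ignored here; a real schema would need a rule for each of them.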

## template-xgboost.ipynb

      
A template for XGBoost models. Last active June 24, 2019 23:01.
    
## README.org

      
Pretty printing delimited text files at the command line. Last active March 11, 2019 19:10.

Sometimes, you’d like to look at delimited files on the command line:
cat test.csv
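
The README preview stops before showing its own approach. As a stand-in, here is a minimal Python sketch of the same goal, aligning the columns of a delimited file for reading in a terminal. It is not the method from the README; `pretty_print_delimited` is a hypothetical helper and the sample data is made up.

```python
import csv
import io


def pretty_print_delimited(text: str, delimiter: str = ",") -> str:
    """Pad every column of delimited text to a common width."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ""
    # Pad short rows so that every row has the same number of cells.
    n_cols = max(len(row) for row in rows)
    rows = [row + [""] * (n_cols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in rows) for i in range(n_cols)]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]
    return "\n".join(lines)


sample = "name,qty\napple,3\nkiwi,12\n"
print(pretty_print_delimited(sample))
# name   qty
# apple  3
# kiwi   12
```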


## custom.js
// Automatically hide the toolbar and header in Jupyter Notebook 4.1.0
// This should go in ~/.jupyter/custom/custom.js
require(
    ['base/js/namespace', 'base/js/events'],
    function(Jupyter, events) {
        events.on("notebook_loaded.Notebook", function () {
            Jupyter.toolbar.actions.call('jupyter-notebook:toggle-toolbar')
            Jupyter.toolbar.actions.call('jupyter-notebook:toggle-header')
        })
    }
);

## describe_population.py
import pandas as pd

def describe_population(df: pd.DataFrame) -> pd.DataFrame:
    """
    Report the populated and uniqueness counts for each column of the input.

    The ratio columns are given as percents.
    """

    N = len(df)
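
The preview stops after `N = len(df)`. A minimal sketch of a report matching the docstring could look like the following; the function name `describe_population_sketch` and the output column names are assumptions, not the author's.

```python
import pandas as pd


def describe_population_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Count populated (non-null) and unique values per column, with percents."""
    n = len(df)
    populated = df.notna().sum()
    unique = df.nunique()  # null values are not counted as a unique value
    return pd.DataFrame(
        {
            "populated": populated,
            "populated_pct": 100 * populated / n if n else float("nan"),
            "unique": unique,
            "unique_pct": 100 * unique / n if n else float("nan"),
        }
    )


example = pd.DataFrame({"a": [1, 1, None, 2], "b": ["x", "y", "z", "w"]})
report = describe_population_sketch(example)
print(report)
```

Each row of the report corresponds to one column of the input, so wide tables stay readable.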