@rapatil
rapatil / Automating Salesforce Data Extraction Using Python.ipynb
Last active March 22, 2024 05:11
Approach: Automating Salesforce Data Extraction Using Python
@datajoely
datajoely / layers.md
Last active September 4, 2023 10:22
Kedro data layers
| Layer | Order | Description |
| --- | --- | --- |
| raw | Sequential | The initial start of the pipeline, containing the sourced data model(s) that should never be changed; it forms your single source of truth to work from. These data models can be un-typed in most cases (e.g. csv), but this will vary from case to case. Given the relative cost of storage today, painful experience suggests it's safer to never work with the original data directly! |
| intermediate | Sequential | This stage is optional if your data is already typed. A typed representation of the raw layer, e.g. converting string-based values into their correct typed representation as numbers, dates, etc. Our recommended approach is to mirror the raw layer in a typed format like Apache Parquet. Avoid transforming the structure of the data, but simple operations like cleaning up field names or unioning multi-part CSVs are permitted. |
| primary | Sequential | |
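The raw-to-intermediate step above can be sketched with pandas. This is a minimal illustration, not Kedro API usage: the column names and data are hypothetical, and the in-memory CSV stands in for a file that would live under a raw-layer folder.

```python
import io
import pandas as pd

# Stand-in for a raw-layer CSV; in a real pipeline this would be a file on disk
raw_csv = io.StringIO("Order Date,Amount\n2020-01-03,19.99\n2020-01-04,5.00\n")

# raw layer: load everything as strings -- no interpretation, no changes
raw = pd.read_csv(raw_csv, dtype=str)

# intermediate layer: same structure, but typed, with cleaned field names
typed = raw.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))
typed["order_date"] = pd.to_datetime(typed["order_date"])  # string -> datetime
typed["amount"] = pd.to_numeric(typed["amount"])           # string -> number

print(typed.dtypes)
# The typed frame would then be persisted in a typed format, e.g.
# typed.to_parquet("data/02_intermediate/orders.parquet")
```

Note that only types and field names change; the row/column structure of the raw data is preserved, as the table recommends.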
@wingkwong
wingkwong / install.sh
Last active May 19, 2023 15:54
Setting up Airflow in AWS Cloud9 (min requirement: t2.large)
# setup docker-compose
sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
# setup airflow 1.10.14
git clone https://github.com/xnuinside/airflow_in_docker_compose
cd airflow_in_docker_compose
docker-compose -f docker-compose-with-celery-executor.yml up --build
@jirihnidek
jirihnidek / sub-sub-command.py
Last active February 17, 2024 14:18
Python example of using argparse sub-parser, sub-commands and sub-sub-commands
"""
Example of using sub-parser, sub-commands and sub-sub-commands :-)
"""
import argparse
def main(args):
    """Just do something with the parsed arguments."""
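The sub-command and sub-sub-command pattern from this gist can be sketched with nested `add_subparsers` calls. The `config get`/`config set` command names below are illustrative assumptions, not the gist's own commands:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tool")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # "config" sub-command, which itself has sub-sub-commands
    config = subparsers.add_parser("config", help="manage configuration")
    config_sub = config.add_subparsers(dest="subcommand", required=True)

    config_get = config_sub.add_parser("get", help="read a setting")
    config_get.add_argument("key")

    config_set = config_sub.add_parser("set", help="write a setting")
    config_set.add_argument("key")
    config_set.add_argument("value")

    return parser

parser = build_parser()
args = parser.parse_args(["config", "set", "retries", "3"])
print(args.command, args.subcommand, args.key, args.value)  # config set retries 3
```

Each nesting level just repeats the same `add_subparsers` / `add_parser` pair on the parser one level up.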
@Ze1598
Ze1598 / custom_sort_order.py
Created November 1, 2020 21:45
Create a custom sort for a pandas DataFrame column: months example
import pandas as pd
import numpy as np
def generate_random_dates(num_dates: int) -> np.array:
    """Generate a 1D array of `num_dates` random dates."""
    start_date = "2020-01-01"
    # Generate all days for 2020
    available_dates = [np.datetime64(start_date) + days for days in range(365)]
    # Get `num_dates` random dates from 2020
    return np.random.choice(available_dates, size=num_dates)
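The custom-sort idea the gist's title describes can be shown with an ordered `pd.Categorical`: give the column an explicit category order, and `sort_values` respects it instead of sorting alphabetically. The month abbreviations and sales figures below are made-up sample data:

```python
import pandas as pd

# Hypothetical data: month names sort alphabetically ("Feb" < "Jan" < "Mar") by default
df = pd.DataFrame({
    "month": ["Mar", "Jan", "Feb", "Jan"],
    "sales": [30, 10, 20, 15],
})

# Impose the calendar order instead of the lexicographic one
month_order = ["Jan", "Feb", "Mar"]
df["month"] = pd.Categorical(df["month"], categories=month_order, ordered=True)

df_sorted = df.sort_values("month")
print(df_sorted["month"].tolist())  # ['Jan', 'Jan', 'Feb', 'Mar']
```

Any custom ordering (weekdays, size labels, priority tiers) works the same way: list the categories in the order you want them sorted.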
@cicdw
cicdw / prefect_coiled_demo.ipynb
Last active December 7, 2020 21:27
Outline of Prefect + Coiled demo
@gene1wood
gene1wood / 01-explanation-of-python-logging-and-the-root-logger.md
Last active February 8, 2023 16:09
Explanation of the relationship between python logging root logger and other loggers
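The core of that relationship can be shown in a few lines: named loggers form a dot-separated hierarchy that ends at the root logger, and a logger with no level of its own defers to its ancestors. This is a minimal sketch; the logger name `myapp.module` is an arbitrary example:

```python
import logging

logging.basicConfig(level=logging.INFO)  # configures the ROOT logger

# A named logger; its effective behavior comes from the hierarchy above it
logger = logging.getLogger("myapp.module")

print(logger.level)                # 0 (NOTSET): no level set on this logger itself
print(logger.getEffectiveLevel())  # 20 (INFO): inherited from the root logger
print(logger.parent is logging.getLogger())  # True: nearest configured ancestor is root
```

Because records propagate up to the root logger's handlers, calling `basicConfig` once is usually enough to see output from every logger in the program.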
@iamaziz
iamaziz / read_csv_files_in_tar_gz_from_s3_bucket.py
Last active November 22, 2022 14:45
Read csv files from tar.gz in S3 into pandas dataframes without untar or download (using with S3FS, tarfile, io, and pandas)
# -- read csv files from tar.gz in S3 with s3fs and tarfile (https://s3fs.readthedocs.io/en/latest/)
import tarfile
import pandas as pd
import s3fs

bucket = 'mybucket'
key = 'mycompressed_csv_files.tar.gz'

fs = s3fs.S3FileSystem()
with fs.open(f'{bucket}/{key}', 'rb') as fileobj, tarfile.open(fileobj=fileobj, mode='r:gz') as tar:
    # one DataFrame per .csv member, read without untarring or downloading
    dataframes = {m.name: pd.read_csv(tar.extractfile(m))
                  for m in tar.getmembers() if m.name.endswith('.csv')}
@ericmjl
ericmjl / ds-project-organization.md
Last active April 21, 2024 16:48
How to organize your Python data science project

UPDATE: I have baked the ideas in this file inside a Python CLI tool called pyds-cli. Please find it here: https://github.com/ericmjl/pyds-cli

Having done a number of data projects over the years, and having seen many more up on GitHub, I've come to see that projects vary widely in how "readable" they are. I'd like to share some practices that I have come to adopt in my projects, which I hope will bring some organization to yours.

Disclaimer: I'm hoping nobody takes this to be "the definitive guide" to organizing a data project; rather, I hope you, the reader, find useful tips that you can adapt to your own projects.

Disclaimer 2: What I’m writing below is primarily geared towards Python language users. Some ideas may be transferable to other languages; others may not be so. Please feel free to remix whatever you see here!