Ayla Khan a-y-khan

## tweet_deleting_script.py
# -*- coding: utf-8 -*-

""" Deletes all tweets below a certain retweet threshold.
"""

import tweepy
from datetime import datetime

# Constants
CONSUMER_KEY = ''

## astar.py
# Credit for this: Nicholas Swift
# as found at https://medium.com/@nicholas.w.swift/easy-a-star-pathfinding-7e6689c7f7b2
from warnings import warn
import heapq

class Node:
    """
    A node class for A* Pathfinding
    """
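
The snippet above is cut off after the class docstring. As a rough sketch only, and not necessarily the credited author's code, a node used with `heapq` usually needs a position, a parent link for path reconstruction, the three A* costs, and an ordering on the total cost. The attribute names `g`, `h`, and `f` are the conventional A* names and are assumed here.

```python
import heapq


class Node:
    """Illustrative A* node (assumed layout, not the original gist's body)."""

    def __init__(self, parent=None, position=None):
        self.parent = parent      # node we arrived from, used to rebuild the path
        self.position = position  # e.g. a (row, col) tuple
        self.g = 0                # cost from the start node
        self.h = 0                # heuristic estimate to the goal
        self.f = 0                # g + h, the priority used by the open list

    def __eq__(self, other):
        # Two nodes are the same if they sit on the same cell.
        return self.position == other.position

    def __lt__(self, other):
        # heapq compares items with <, so order nodes by total cost.
        return self.f < other.f


if __name__ == "__main__":
    open_list = []
    far, near = Node(position=(0, 0)), Node(position=(1, 0))
    far.f, near.f = 5, 2
    heapq.heappush(open_list, far)
    heapq.heappush(open_list, near)
    print(heapq.heappop(open_list).position)  # the node with the lowest f
```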

## science_installs.md

      
stirlingw / science_installs.md
Last active March 6, 2019 03:06 (1 file, 0 forks, 0 comments, 2 stars)

Introduction to Installing PySpark & Jupyter Notebooks on Mac OSX

Spark is used for large-scale distributed data processing. It has become the go-to standard for many companies in the technology industry. The Spark framework computes at high speed, processes massive amounts of data in resilient datasets, and does it all in a highly distributed manner.

Jupyter Notebooks, commonly called "Jupyter", have been a popular application in the data science community for many years. Jupyter lets you edit, run, and share Python code in a web view. You can execute your code step by step and share individual parts of it, which makes it very flexible for data analysis work. This is why Jupyter is a great tool for prototyping, and why data-centric companies should use it.

Why use PySpark in a Jupyter Notebook?

Most data engineers argue that the Scala version of Spark is more performant than the Python version, and it is. Howev

  
## callback_retry_clear_subdag.py
import logging
from airflow.models import DagBag

def callback_subdag_clear(context):
    """Clears a subdag's tasks on retry."""
    dag_id = "{}.{}".format(
        context['dag'].dag_id,
        context['ti'].task_id,
    )
    execution_date = context['execution_date']

## 0_article.md

      
hemebond / 0_article.md
Last active March 13, 2022 11:51 (3 files, 6 forks, 6 comments, 16 stars)

A SaltStack AWS Auto Scaling Solution

Overview

The AWS Auto Scaling Group, configured with a customised Cloud-Init file, sends a notification to an SNS Topic,
which in turn passes it on to an SQS queue that the Salt Master is subscribed to. A Reactor watches for the auto
scaling events and pre-approves the new minion based on its Auto Scaling group name and instance ID.
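
To make the message flow concrete, here is a minimal sketch of how a launch notification could be unpacked once it arrives on the SQS queue. It assumes the default SNS-to-SQS delivery, where the SQS body is a JSON envelope whose `Message` field holds the Auto Scaling notification as a JSON string. The function name and the `<group>-<instance id>` minion ID scheme are hypothetical and only illustrate the idea; the article's actual naming may differ.

```python
import json

LAUNCH_EVENT = "autoscaling:EC2_INSTANCE_LAUNCH"


def minion_id_from_sqs_body(body):
    """Return a minion ID for a launch notification, or None for other events."""
    envelope = json.loads(body)                # SNS envelope as delivered to SQS
    message = json.loads(envelope["Message"])  # the Auto Scaling notification itself
    if message.get("Event") != LAUNCH_EVENT:
        return None
    # Hypothetical naming scheme: "<group name>-<instance id>".
    return "{}-{}".format(message["AutoScalingGroupName"], message["EC2InstanceId"])


if __name__ == "__main__":
    sample = json.dumps({
        "Type": "Notification",
        "Message": json.dumps({
            "Event": LAUNCH_EVENT,
            "AutoScalingGroupName": "web-asg",
            "EC2InstanceId": "i-0123456789abcdef0",
        }),
    })
    print(minion_id_from_sqs_body(sample))
```

A reactor could then compare the derived ID against the ID a new minion presents before accepting its key.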

Salt Master Configuration


## External_GTest.cmake
find_package(Threads REQUIRED)

ExternalProject_Add(
  googletest
  GIT_REPOSITORY https://github.com/google/googletest.git
  UPDATE_COMMAND ""
  INSTALL_COMMAND ""
  LOG_DOWNLOAD ON
  LOG_CONFIGURE ON
  LOG_BUILD ON)

## atom_clojure_setup.md

      
jasongilman / atom_clojure_setup.md
Last active May 11, 2024 02:25 (3 files, 51 forks, 20 comments, 403 stars)

Atom Clojure Setup

This describes how I set up Atom for an ideal Clojure development workflow. The setup fixes indentation on newlines, handles parentheses, and so on. The keybinding settings for enter (in keymap.cson) are important for getting proper newlines with indentation at the right level. Other helpers in init.coffee and keymap.cson are useful for cutting, copying, pasting, deleting, and indenting Lisp expressions.

Install Atom

Download Atom
The Atom documentation is excellent. The flight manual is well worth reading.

  
## threadedgenerator.py
# A simple generator wrapper, not sure if it's good for anything at all.
# With basic python threading
from threading import Thread

try:
    from queue import Queue

except ImportError:
    from Queue import Queue
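
The snippet above ends after its imports. As an illustration of the general idea only, and not the original author's implementation, a generator can be wrapped so that its items are produced in a background thread and handed over through a bounded `Queue`. The name `threaded` and the `maxsize` default are assumptions, and the sketch targets Python 3 only.

```python
from queue import Queue
from threading import Thread

_DONE = object()  # sentinel marking the end of the wrapped iterable


def threaded(iterable, maxsize=8):
    """Yield items from `iterable` while a worker thread produces them.

    Caveats of this sketch: an exception raised by `iterable` ends the
    stream silently, and a consumer that stops early leaves the daemon
    worker blocked on a full queue until the process exits.
    """
    queue = Queue(maxsize=maxsize)

    def produce():
        try:
            for item in iterable:
                queue.put(item)  # blocks while the queue is full
        finally:
            queue.put(_DONE)     # always signal completion

    Thread(target=produce, daemon=True).start()
    while True:
        item = queue.get()
        if item is _DONE:
            return
        yield item


if __name__ == "__main__":
    for value in threaded(x * x for x in range(5)):
        print(value)
```

The bounded queue keeps the producer from running arbitrarily far ahead of the consumer, which is the main design choice here.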


## gist:8172796

      
debasishg / gist:8172796
Last active May 10, 2024 13:37 (1 file, 404 forks, 23 comments, 1644 stars)
A collection of links for streaming algorithms and data structures

General Background and Overview


Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
Models and Issues in Data Stream Systems
Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the historical perspectives and how it all began with Flajolet
Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
[Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t


## middlewares.py
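import os
import random

# NOTE: scrapy.conf is deprecated in newer Scrapy releases;
# spider.settings is the usual replacement.
from scrapy.conf import settings


class RandomUserAgentMiddleware(object):
    def process_request(self, request, spider):
        # Guard against a missing or empty list before choosing.
        user_agents = settings.get('USER_AGENT_LIST')
        if user_agents:
            request.headers.setdefault('User-Agent', random.choice(user_agents))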

class ProxyMiddleware(object):