Skip to content

Instantly share code, notes, and snippets.

View rmax's full-sized avatar
:octocat:
ヾ(⌐■_■)ノ♪

R Max Espinoza rmax

:octocat:
ヾ(⌐■_■)ノ♪
View GitHub Profile
@rgaidot
rgaidot / gist:792451
Created January 23, 2011 21:24
Entity Extraction using NLTK
import nltk
text = """Barack Hussein Obama II (born August 4, 1961) is the 44th and current President of the United States. He is the first African American to hold the office. Obama previously served as a United States Senator from Illinois, from January 2005 until he resigned after his election to the presidency in November 2008."""
sentences = nltk.sent_tokenize(text)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
chunked_sentences = nltk.batch_ne_chunk(tagged_sentences, binary=True)
def extract_entity_names(t):
@mallamanis
mallamanis / texttilling.py
Created December 31, 2011 18:47
Text-tilling implementation with nltk
#!/usr/bin/env python
import nltk
from nltk.stem.porter import PorterStemmer
def preprocessText(text):
# To lower case
text = text.lower();
# Tokenize text
@laserson
laserson / avrostreaming.py
Created February 11, 2014 18:55
Allow streaming of Avro data using the Python client. Simulates a seekable file type.
# The Python avro client expects a seekable Avro data file, which makes it annoying
# to stream bytes through it using HDFS clients that just give you cat (like snakebite).
# It's idiotic because the client only seeks to the end in order to call tell() to get
# the file size, which in turn is only used to determine when you get to EOF.
import snakebite.client
class AvroStreamWrapper(object):
# this class can be provided to DataFileReader to read Avro data.
def __init__(self, hdfs_client, path):
@PurpleBooth
PurpleBooth / README-Template.md
Last active June 17, 2024 16:45
A template to make good README.md

Project Title

One Paragraph of project description goes here

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

@arindampradhan
arindampradhan / spin.py
Last active August 12, 2019 05:38
Spinners for python | python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import signal
from os import system
### MENU ###
# Here are all the elements you can import
# Box elements
@rcknr
rcknr / README.md
Last active July 19, 2018 12:36
Using Let's Encrypt certificates with Amazon API Gateway

##Using Let's Encrypt certificates with AWS API Gateway

Before starting off with API Gateway set up it's worth mentioning that certificate configuration for this particular service is so far isn't well integrated, therefore different from other AWS services. Despite it using CloudFrount to serve on custom domains it won't let you customize distributions it creates, however all the limitations of CloudFront naturally apply to API Gateway. The most important in this case is the size of the key, which is limited by 2048 bit. Many tutorials provide ready to use terminal commands that have the key size preset at 4096 bit for the sake of better security. This won't work with API Gateway and you'll get an error message about certificate's validity or incorrect chain which won't suggest you the real cause of the issue. Another consideration is that to add a custom domain to API Gateway you have to have a certif

@mcg1969
mcg1969 / spaces.md
Last active November 12, 2021 14:28
Conda hackery: namespaces

Conda Proposal: namespaces

Motivation

We would like to position Conda as a language-agnostic package manager, but at present it maintains a distinct bias towards Python. Given its origins this was expected and, frankly, reasonable. Nevertheless, as we begin to use it to subsume other packaging ecosystems, such as CRAN, NPM, Ruby Gems, etc., we are going to want to overcome this history; and one key challenge is to address naming conflicts across platforms.

@rmax
rmax / dask_avro.py
Last active September 17, 2018 19:28
An Avro reader for Dask (with fastavro)
"""A fastavro-based avro reader for Dask.
Disclaimer: This code was recovered from dask's distributed project.
"""
import io
import fastavro
import json
from dask import delayed
@rmax
rmax / dask_elasticsearch.py
Last active May 3, 2018 13:51
An Elasticsearch reader for Dask
from dask import delayed
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan
def read_elasticsearch(query=None, npartitions=8, client_cls=None,
client_kwargs=None, **kwargs):
"""Reads documents from Elasticsearch.
By default, documents are sorted by ``_doc``. For more information see the

WannaCry|WannaDecrypt0r NSA-Cyberweapon-Powered Ransomware Worm

  • Virus Name: WannaCrypt, WannaCry, WanaCrypt0r, WCrypt, WCRY
  • Vector: All Windows versions before Windows 10 are vulnerable if not patched for MS-17-010. It uses EternalBlue MS17-010 to propagate.
  • Ransom: between $300 to $600. There is code to 'rm' (delete) files in the virus. Seems to reset if the virus crashes.
  • Backdooring: The worm loops through every RDP session on a system to run the ransomware as that user. It also installs the DOUBLEPULSAR backdoor. It corrupts shadow volumes to make recovery harder. (source: malwarebytes)
  • Kill switch: If the website www.iuqerfsodp9ifjaposdfjhgosurijfaewrwergwea.com is up the virus exits instead of infecting the host. (source: malwarebytes). This domain has been sinkholed, stopping the spread of the worm. Will not work if proxied (source).

update: A minor variant of the viru