One Paragraph of project description goes here
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
import nltk
text = """Barack Hussein Obama II (born August 4, 1961) is the 44th and current President of the United States. He is the first African American to hold the office. Obama previously served as a United States Senator from Illinois, from January 2005 until he resigned after his election to the presidency in November 2008."""
sentences = nltk.sent_tokenize(text)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
# batch_ne_chunk was renamed to ne_chunk_sents in NLTK 3
chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
def extract_entity_names(t):
#!/usr/bin/env python
import nltk
from nltk.stem.porter import PorterStemmer
def preprocessText(text):
    # To lower case
    text = text.lower()
    # Tokenize text
# The Python avro client expects a seekable Avro data file, which makes it annoying
# to stream bytes through it using HDFS clients that just give you cat (like snakebite).
# It's idiotic because the client only seeks to the end in order to call tell() to get
# the file size, which in turn is only used to determine when you get to EOF.
import snakebite.client
class AvroStreamWrapper(object):
    # this class can be provided to DataFileReader to read Avro data.
    def __init__(self, hdfs_client, path):
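The comments above describe the idea: the reader only seeks to the end to learn the file size, so a wrapper can fake that one seek on top of a forward-only stream. The following is a minimal sketch of that technique, not the original `AvroStreamWrapper`. The class name `SeekableStreamWrapper`, the `chunks` iterator and the `total_size` argument are all hypothetical; it assumes the caller already knows the file size (for example from file metadata) and that the reader only ever seeks to the end and back to its current position.

```python
import io


class SeekableStreamWrapper(object):
    """File-like view over a forward-only iterator of byte chunks.

    Hypothetical sketch: `chunks` yields bytes objects and `total_size`
    is the full length in bytes, both supplied by the caller.
    """

    def __init__(self, chunks, total_size):
        self._chunks = iter(chunks)
        self._size = total_size
        self._buffer = b""
        self._pos = 0          # bytes actually handed out so far
        self._at_end = False   # True while a faked "seek to end" is active

    def tell(self):
        # While the faked seek is active, report the known size.
        return self._size if self._at_end else self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_END and offset == 0:
            self._at_end = True            # pretend to jump to EOF
        elif whence == io.SEEK_SET and offset == self._pos:
            self._at_end = False           # jump back to where we were
        else:
            raise io.UnsupportedOperation("only size probing is supported")
        return self.tell()

    def read(self, n=-1):
        if self._at_end:
            return b""
        # Pull chunks until the buffer can satisfy the request.
        while n < 0 or len(self._buffer) < n:
            try:
                self._buffer += next(self._chunks)
            except StopIteration:
                break
        if n < 0:
            data, self._buffer = self._buffer, b""
        else:
            data, self._buffer = self._buffer[:n], self._buffer[n:]
        self._pos += len(data)
        return data
```

Any other seek raises `io.UnsupportedOperation`, so a reader that needs true random access fails loudly instead of silently returning wrong data.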
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import signal
from os import system
### MENU ###
# Here are all the elements you can import
# Box elements
## Using Let's Encrypt certificates with AWS API Gateway
Before setting up API Gateway, it's worth mentioning that certificate configuration for this service is not yet well integrated and therefore differs from other AWS services. Although API Gateway uses CloudFront to serve custom domains, it won't let you customize the distributions it creates; all the limitations of CloudFront nevertheless apply to API Gateway. The most important one here is the key size, which is limited to 2048 bits. Many tutorials provide ready-to-use terminal commands with the key size preset to 4096 bits for the sake of better security. This won't work with API Gateway: you'll get an error message about the certificate's validity or an incorrect chain, which gives no hint of the real cause. Another consideration is that to add a custom domain to API Gateway you have to have a certif
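As a hedged illustration of the key-size constraint described above, the sketch below builds a command line that requests a 2048-bit RSA key and refuses anything larger. It assumes the `certbot` client with manual DNS validation, which the text does not specify; the helper name `build_certbot_command` and the domain `api.example.com` are hypothetical, and the flags should be checked against the installed certbot version.

```python
# Limit stated in the text above for API Gateway (inherited from CloudFront).
MAX_RSA_BITS = 2048


def build_certbot_command(domain, rsa_bits=MAX_RSA_BITS):
    """Return a certbot argv list for a manually validated certificate.

    Hypothetical helper: it only assembles the arguments, it does not
    run certbot or talk to AWS.
    """
    if rsa_bits > MAX_RSA_BITS:
        raise ValueError(
            "API Gateway accepts RSA keys of at most %d bits, got %d"
            % (MAX_RSA_BITS, rsa_bits)
        )
    return [
        "certbot", "certonly",
        "--manual",
        "--preferred-challenges", "dns",
        "--rsa-key-size", str(rsa_bits),
        "-d", domain,
    ]


if __name__ == "__main__":
    print(" ".join(build_certbot_command("api.example.com")))
```

Failing early on an oversized key avoids the misleading validity or chain error that only shows up once the certificate is uploaded.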
We would like to position Conda as a language-agnostic package manager, but at present it maintains a distinct bias towards Python. Given its origins this was expected and, frankly, reasonable. Nevertheless, as we begin to use it to subsume other packaging ecosystems, such as CRAN, NPM, Ruby Gems, etc., we are going to want to overcome this history; and one key challenge is to address naming conflicts across platforms.
"""A fastavro-based avro reader for Dask.
Disclaimer: This code was recovered from dask's distributed project.
"""
import io
import fastavro
import json
from dask import delayed
from dask import delayed
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan
def read_elasticsearch(query=None, npartitions=8, client_cls=None,
                       client_kwargs=None, **kwargs):
    """Reads documents from Elasticsearch.
    By default, documents are sorted by ``_doc``. For more information see the
www.iuqerfsodp9ifjaposdfjhgosurijfaewrwergwea.com
is up, the virus exits instead of infecting the host (source: malwarebytes). This domain has been sinkholed, stopping the spread of the worm. Will not work if proxied (source). Update: A minor variant of the viru