Ian Downard iandow

############################################################################
# This code shows how to split large text documents along sentence
# boundaries using NLTK and process each chunk with AWS Translate.
############################################################################
# Be sure to first install nltk and boto3
import nltk.data
import boto3
# Define the source document that needs to be translated
source_document = "My little pony heart is yours..."
# Tell the NLTK data loader to look for resource files in /tmp/
{
  "tracks": [
    {
      "track_type": "General",
      "count": "331",
      "count_of_stream_of_this_kind": "1",
      "kind_of_stream": "General",
      "other_kind_of_stream": [
        "General"
      ],
# Tell the NLTK data loader to look for resource files in /tmp/
nltk.data.path.append("/tmp/")
# Download NLTK tokenizers to /tmp/
# We use /tmp because that's where AWS Lambda provides write access to the local file system.
nltk.download('punkt', download_dir='/tmp/')
# Load the English language tokenizer
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
# Split input text into a list of sentences
sentences = tokenizer.tokenize(transcript)
print("Input text length: " + str(len(transcript)))
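Once the text is split into sentences, the sentences still need to be regrouped into chunks small enough for a single translation request. The preview above is cut off before that step, so the following is only a minimal sketch of one way to do it, not the author's code. The function name `chunk_sentences` and the `max_bytes` parameter are hypothetical, and the 5000-byte default is an assumption that should be checked against the current AWS Translate request limit.

```python
def chunk_sentences(sentences, max_bytes=5000):
    """Group sentences into chunks whose UTF-8 size stays within max_bytes.

    max_bytes=5000 is an assumed limit, not a value taken from the gist.
    """
    chunks = []
    current = ""
    for sentence in sentences:
        # Try appending the next sentence to the chunk being built
        candidate = sentence if not current else current + " " + sentence
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            # The chunk is full: store it and start a new one
            if current:
                chunks.append(current)
            # A single sentence longer than max_bytes is kept whole,
            # so the caller must handle that oversized chunk separately
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    demo = ["First sentence.", "Second sentence.", "Third sentence."]
    for chunk in chunk_sentences(demo, max_bytes=35):
        print(len(chunk.encode("utf-8")), chunk)
```

Each resulting chunk could then be passed to the boto3 Translate client's `translate_text` call, and the translated chunks joined back together in order. Measuring size in UTF-8 bytes rather than characters matters for non-ASCII text, where one character can take several bytes.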
$ aws cloudformation describe-stack-events --stack-name mas1
{
  "StackEvents": [
    {
      "StackId": "arn:aws:cloudformation:us-west-2:773074507832:stack/mas1/e0df81c0-54c7-11e9-9a27-0ae01deb57d8",
      "EventId": "49ea43c0-54c9-11e9-bfc1-064b816f3a4a",
      "StackName": "mas1",
      "LogicalResourceId": "mas1",
      "PhysicalResourceId": "arn:aws:cloudformation:us-west-2:773074507832:stack/mas1/e0df81c0-54c7-11e9-9a27-0ae01deb57d8",
      "ResourceType": "AWS::CloudFormation::Stack",
[Feb 22, 2019 1:49:45 PM]: Integration test for 'Connection' started.
[Feb 22, 2019 1:49:45 PM]: Using Radoop version 9.1.0.
[Feb 22, 2019 1:49:45 PM]: Running 8 tests: [Fetch dynamic settings, NameNode networking, DataNode networking, YARN services networking, MapReduce, HDFS upload, Radoop jar upload, Import job]
[Feb 22, 2019 1:49:45 PM]: Running test 1/8: Fetch dynamic settings
[Feb 22, 2019 1:49:45 PM]: Retrieving required configuration properties...
[Feb 22, 2019 1:49:45 PM]: Successfully fetched property: hive.execution.engine
[Feb 22, 2019 1:49:45 PM]: Successfully fetched property: yarn.resourcemanager.scheduler.address
[Feb 22, 2019 1:49:45 PM]: Successfully fetched property: yarn.resourcemanager.resource-tracker.address
[Feb 22, 2019 1:49:45 PM]: Successfully fetched property: yarn.resourcemanager.admin.address
[Feb 22, 2019 1:49:45 PM]: Successfully fetched property: yarn.app.mapreduce.am.staging-dir
@iandow
iandow / license.txt
Last active February 12, 2019 23:06
-----BEGIN SIGNED MESSAGE-----
clusterid: "5427061983691452296"
version: "4.0"
customerid: "ignore"
issuer: "MapR Technologies"
licType: Demo
description: "MapR Enterprise Trial Edition"
enforcement: HARD
gracePeriod: 0
issuedate: 1550010076
# Copyright (c) 2009 & onwards. MapR Tech, Inc., All rights reserved
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-static-pvc
  namespace: test-csi
spec:
  accessModes:
    - ReadWriteOnce
  resources:
@iandow
iandow / teststaticpv.yaml
Last active January 29, 2019 21:40
teststaticpv.yaml
# Copyright (c) 2009 & onwards. MapR Tech, Inc., All rights reserved
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-static-pv
  namespace: test-csi
spec:
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
@iandow
iandow / mapred-site.xml
Created January 28, 2019 22:42
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>nodec:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>nodec:19888</value>
  </property>
  <property>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>nodeb</value>
    <description>host is the hostname of the resourcemanager</description>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>