Original content: https://github.com/aws-samples/data-pipeline-samples/tree/master/samples/DynamoDBImportCSV
- Appropriate IAM
  - IAM users should use AWS following the principle of least privilege: https://en.wikipedia.org/wiki/Principle_of_least_privilege
  - Roles:
    - [DataPipelineDefaultRole, AWSDataPipelineRole]
    - [DataPipelineDefaultResourceRole, AmazonEC2RoleforDataPipelineRole]
  - Group:
    - [DataPipelineDevelopers, AWSDataPipeline_FullAccess]
Ref. https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-get-setup.html
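The role/policy and group/policy pairs above can be set up from the CLI. A sketch, assuming the trust-policy JSON files exist locally (their names here are placeholders) and that the listed AWS managed policies are available in the account:

```shell
# Sketch only: create the two roles and the developer group listed above.
# datapipeline-trust-policy.json / ec2-trust-policy.json are placeholder
# file names for trust policies allowing Data Pipeline / EC2 to assume the roles.
ROLE1=DataPipelineDefaultRole
ROLE2=DataPipelineDefaultResourceRole
GROUP=DataPipelineDevelopers

aws iam create-role --role-name "$ROLE1" \
    --assume-role-policy-document file://datapipeline-trust-policy.json
aws iam attach-role-policy --role-name "$ROLE1" \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSDataPipelineRole

aws iam create-role --role-name "$ROLE2" \
    --assume-role-policy-document file://ec2-trust-policy.json
aws iam attach-role-policy --role-name "$ROLE2" \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforDataPipelineRole

aws iam create-group --group-name "$GROUP"
aws iam attach-group-policy --group-name "$GROUP" \
    --policy-arn arn:aws:iam::aws:policy/AWSDataPipeline_FullAccess
```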
- Custom IAM roles
Do not create the VPC with the wizard; instead re-create a default VPC, since I deleted the default VPC in us-west-2 during the hands-on exercise for the previous certification test.
Ref. https://docs.aws.amazon.com/vpc/latest/userguide/default-vpc.html#create-default-vpc
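Re-creating the default VPC is a single CLI call (it only succeeds if the region currently has no default VPC):

```shell
# Re-create the default VPC in us-west-2.
REGION=us-west-2
aws ec2 create-default-vpc --region "$REGION"
```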
- [aws-workshop, workshop]
- [ARN_FOR_CREATED_SNS_TOPIC]
Ref. https://docs.aws.amazon.com/sns/latest/dg/sns-getting-started.html
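Reading [aws-workshop, workshop] above as topic name and display name, the topic setup can be sketched as follows; the topic ARN and email address below are placeholders, and the email subscription must be confirmed from the inbox:

```shell
# Sketch: create the SNS topic and set its display name.
TOPIC_NAME=aws-workshop
aws sns create-topic --name "$TOPIC_NAME" --region us-west-2
# create-topic prints the topic ARN; substitute it for the placeholder below.
aws sns set-topic-attributes \
    --topic-arn arn:aws:sns:us-west-2:123456789012:aws-workshop \
    --attribute-name DisplayName --attribute-value workshop
aws sns subscribe \
    --topic-arn arn:aws:sns:us-west-2:123456789012:aws-workshop \
    --protocol email --notification-endpoint you@example.com
```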
- DynamoDB table: aws-workshop-dynamodb, partition key: id (Number)
- tags: [env: dev, user: SET_USER_HERE]
- Submit the initial data:
aws dynamodb put-item --table-name aws-workshop-dynamodb --item file://aws-workshop-dynamodb-init-data.json --return-consumed-capacity TOTAL
Ref. https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-prereq.html
https://docs.aws.amazon.com/cli/latest/reference/dynamodb/put-item.html
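The `--item file://` argument expects a DynamoDB-JSON attribute map. A hypothetical example of what aws-workshop-dynamodb-init-data.json could look like (the actual attributes depend on the CSV columns; only `id` as the Number partition key is implied above):

```shell
# Hypothetical init-data file; "name" is an illustrative attribute only.
cat > aws-workshop-dynamodb-init-data.json <<'EOF'
{
  "id": {"N": "1"},
  "name": {"S": "sample-item"}
}
EOF
```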
- hello-aws-workshop-us-west-2-201901, us-west-2
- tags: [env: dev, user: SET_USER_HERE]
- folder: [data, log]
Access to the data folder
Ref. https://docs.aws.amazon.com/AmazonS3/latest/gsg/GetStartedWithS3.html
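The bucket and folder setup can be sketched with the CLI; S3 has no real folders, so empty objects with trailing-slash keys are used as folder placeholders:

```shell
# Sketch: create the bucket and the data/ and log/ "folders".
BUCKET=hello-aws-workshop-us-west-2-201901
aws s3 mb "s3://$BUCKET" --region us-west-2
aws s3api put-object --bucket "$BUCKET" --key data/
aws s3api put-object --bucket "$BUCKET" --key log/
```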
- aws-workshop-data-pipeline
- [s3://us-west-2-aws-workshop-s3-bucket/data/, aws-workshop-dynamodb, us-west-2, s3://us-west-2-aws-workshop-s3-bucket/log/]
- tags: [env: dev, user: SET_USER_HERE]
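As an alternative to the console wizard, the pipeline can be created and activated from the CLI. A sketch; pipeline-definition.json is a placeholder for an exported definition, and df-EXAMPLE stands in for the id that create-pipeline returns:

```shell
# Sketch: create, define, and activate the data pipeline.
PIPELINE_NAME=aws-workshop-data-pipeline
aws datapipeline create-pipeline --name "$PIPELINE_NAME" \
    --unique-id "$PIPELINE_NAME"
# Replace df-EXAMPLE with the pipelineId printed by create-pipeline.
aws datapipeline put-pipeline-definition --pipeline-id df-EXAMPLE \
    --pipeline-definition file://pipeline-definition.json
aws datapipeline activate-pipeline --pipeline-id df-EXAMPLE
```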
DROP TABLE IF EXISTS tempHiveTable;
DROP TABLE IF EXISTS s3TempTable;
CREATE EXTERNAL TABLE tempHiveTable (#{myDDBColDefn})
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "#{myDDBTableName}", "dynamodb.column.mapping" = "#{myDDBTableColMapping}");
CREATE EXTERNAL TABLE s3TempTable (#{myS3ColMapping})
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
LOCATION '#{myInputS3Loc}'
TBLPROPERTIES ("skip.header.line.count"="1");
INSERT OVERWRITE TABLE tempHiveTable SELECT * FROM s3TempTable;
Ref. https://stackoverflow.com/questions/15751999/hive-external-table-skip-first-row
Ref. https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-emrcluster.html
https://aws.amazon.com/emr/pricing/
- Unsubscribe from the SNS notification by clicking the unsubscribe link in the email body
- Delete the data pipeline
- Delete the DynamoDB table
- Delete the topic for the SnsAlarm
- Delete the topic for the DynamoDB alert
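The deletions above can also be done from the CLI. A sketch; the pipeline id and topic ARNs are placeholders to be replaced with the real values:

```shell
# Sketch: cleanup from the CLI (ids/ARNs below are placeholders).
TABLE=aws-workshop-dynamodb
aws datapipeline delete-pipeline --pipeline-id df-EXAMPLE
aws dynamodb delete-table --table-name "$TABLE" --region us-west-2
aws sns delete-topic \
    --topic-arn arn:aws:sns:us-west-2:123456789012:aws-workshop
aws sns delete-topic \
    --topic-arn arn:aws:sns:us-west-2:123456789012:aws-workshop-dynamodb-alert
```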
Some altered parameters
- Hive script: an additional statement to skip the header line in the CSV file