This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Terraform: AWS provider, input variables, and the bucket-name output
provider "aws" {
  region = var.region
}

variable "region" {}
variable "account_id" {}

output "bucket_name" {
  value = aws_s3_bucket.data.id
}
# Makefile: wrappers around common Terraform commands
REGION := us-east-1
ACCOUNT_ID := $(shell aws sts get-caller-identity --query Account --output text)

init:
	@terraform init -reconfigure -upgrade

clean:
	@rm -rf .terraform terraform.*

plan:
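# Connect to the mirror and prepare a pattern for .xml.bz2 / .xml.gz archives
from ftplib import FTP
from re import compile

names = list()
# Raw string so the escaped dots reach the regex engine unchanged
archive = compile(r'[^0-9](\.xml\.bz2|\.xml\.gz)$')

ftp = FTP('ftp.acc.umu.se')
ftp.login()
ftp.cwd('mirror/wikimedia.org/dumps/enwiki/20201020/')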
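The snippet above ends after changing into the dump directory. As a minimal sketch of how the `archive` pattern could be applied to a directory listing: the helper `select_archives` and the sample names are hypothetical, and with a live connection the listing would presumably come from something like `ftp.nlst()`.

```python
from re import compile

# Same pattern as in the snippet: the name must end in .xml.bz2 or .xml.gz,
# and the character just before ".xml" must not be a digit.
archive = compile(r'[^0-9](\.xml\.bz2|\.xml\.gz)$')


def select_archives(listing):
    """Return the names in listing that match the archive pattern."""
    return [name for name in listing if archive.search(name)]


# Hypothetical listing used only for an offline demonstration.
sample = [
    'enwiki-20201020-pages-articles.xml.bz2',
    'enwiki-20201020-pages-articles1.xml-p1p41242.bz2',
    'enwiki-20201020-stub-meta-history27.xml.gz',
    'enwiki-20201020-abstract.xml.gz',
    'enwiki-20201020-langlinks.sql.gz',
]
names = select_archives(sample)
print(names)
```

The `[^0-9]` prefix is what filters out numbered split files, so only the single-file archives remain in `names`.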
# Download a single dump file over anonymous FTP
from ftplib import FTP

ftp = FTP('ftp.acc.umu.se')
ftp.login()
ftp.cwd('mirror/wikimedia.org/dumps/enwiki/20201020/')

filename = 'enwiki-20201020-langlinks.sql.gz'
with open(filename, 'wb') as fp:
    # retrbinary calls fp.write once for every block it receives
    ftp.retrbinary(f'RETR {filename}', fp.write)
# Download a dump file and compute its checksums
from ftplib import FTP
from hashlib import md5, sha1

ftp = FTP('ftp.acc.umu.se')
ftp.login()
ftp.cwd('mirror/wikimedia.org/dumps/enwiki/20201020/')

filename = 'enwiki-20201020-langlinks.sql.gz'
md5data = md5()
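The snippet above stops right after creating the MD5 object. One way it might continue, sketched under assumptions: a callback that writes each received block to the file and updates the digests in the same pass, so the file is never read a second time. The helper `make_hashing_writer` is hypothetical, and the demonstration feeds it in-memory chunks instead of a live FTP session.

```python
from hashlib import md5, sha1
from io import BytesIO


def make_hashing_writer(fp, *digests):
    """Build a callback that writes each chunk to fp and updates every digest."""
    def callback(chunk):
        fp.write(chunk)
        for digest in digests:
            digest.update(chunk)
    return callback


# Offline demonstration: a BytesIO stands in for the output file.
md5data, sha1data = md5(), sha1()
buffer = BytesIO()
write = make_hashing_writer(buffer, md5data, sha1data)
for chunk in (b'hello ', b'world'):
    write(chunk)

print(md5data.hexdigest())
print(sha1data.hexdigest())
```

With a real connection the callback would be passed in place of `fp.write`, for example `ftp.retrbinary(f'RETR {filename}', make_hashing_writer(fp, md5data, sha1data))`, and the resulting hex digests compared against the checksums published alongside the dump.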
# Upload with a single PutObject call (reads the whole file into memory)
import boto3

filename = 'enwiki-20201020-langlinks.sql.gz'

s3client = boto3.client('s3')
with open(filename, 'rb') as fp:
    s3client.put_object(Bucket='la-labs-279215538049', Key=filename, Body=fp.read())
# Upload with the managed transfer helper, which reads from the file object
import boto3

filename = 'enwiki-20201020-langlinks.sql.gz'

s3client = boto3.client('s3')
with open(filename, 'rb') as fp:
    s3client.upload_fileobj(fp, 'la-labs-279215538049', filename)
# Read the file in 32 MiB chunks for a part-by-part S3 upload
import boto3

parts = list()
chunksize = 32 * 1024 * 1024
filename = 'enwiki-20201020-langlinks.sql.gz'

s3client = boto3.client('s3')
with open(filename, 'rb') as fp:
    part = 1
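The snippet above breaks off at `part = 1`. The sketch below shows how such a loop is commonly written with the S3 multipart calls (`create_multipart_upload`, `upload_part`, `complete_multipart_upload`, `abort_multipart_upload`). The function name `multipart_upload` and its parameters are assumptions for illustration, not the author's code; the client is passed in so the function can be exercised without AWS credentials.

```python
def multipart_upload(s3client, bucket, key, fp, chunksize=32 * 1024 * 1024):
    """Upload the contents of fp to bucket/key in chunksize parts.

    Returns the list of {'ETag', 'PartNumber'} entries that was sent to
    complete_multipart_upload. S3 requires every part except the last
    to be at least 5 MiB.
    """
    upload = s3client.create_multipart_upload(Bucket=bucket, Key=key)
    upload_id = upload['UploadId']
    parts = []
    try:
        part = 1
        while True:
            data = fp.read(chunksize)
            if not data:
                break
            response = s3client.upload_part(
                Bucket=bucket, Key=key, PartNumber=part,
                UploadId=upload_id, Body=data)
            # S3 needs the ETag of every part to assemble the object.
            parts.append({'ETag': response['ETag'], 'PartNumber': part})
            part += 1
        s3client.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={'Parts': parts})
    except Exception:
        # Abort so that already uploaded parts do not linger in the bucket.
        s3client.abort_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id)
        raise
    return parts
```

With the names from the snippet, a call could look like `multipart_upload(boto3.client('s3'), 'la-labs-279215538049', filename, fp, chunksize)` inside the `with open(...)` block.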
# Stream the file from the FTP mirror to S3 in 32 MiB chunks
import boto3
from ftplib import FTP

parts = list()
chunksize = 32 * 1024 * 1024
filename = 'enwiki-20201020-langlinks.sql.gz'

s3client = boto3.client('s3')
ftp = FTP('ftp.acc.umu.se')
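The snippet above ends before the transfer itself. Because `retrbinary` delivers blocks far smaller than `chunksize`, some buffering is needed between the FTP callback and the S3 part upload. A minimal sketch of that buffering, assuming a hypothetical `PartBuffer` class whose `emit` callback would wrap `s3client.upload_part` in a real run; here it only records the parts so the example runs offline.

```python
class PartBuffer:
    """Collect incoming chunks and emit a part once chunksize bytes are buffered."""

    def __init__(self, chunksize, emit):
        self.chunksize = chunksize
        self.emit = emit  # called as emit(part_number, data)
        self.buffer = bytearray()
        self.part = 1

    def write(self, chunk):
        """Callback for retrbinary: buffer the chunk and flush full parts."""
        self.buffer.extend(chunk)
        while len(self.buffer) >= self.chunksize:
            self._flush(self.chunksize)

    def close(self):
        """Emit whatever is left as the final, possibly shorter, part."""
        if self.buffer:
            self._flush(len(self.buffer))

    def _flush(self, size):
        data = bytes(self.buffer[:size])
        del self.buffer[:size]
        self.emit(self.part, data)
        self.part += 1


# Offline demonstration with a tiny chunksize and hand-made chunks.
emitted = []
buf = PartBuffer(4, lambda number, data: emitted.append((number, data)))
for chunk in (b'ab', b'cdefg', b'hi'):
    buf.write(chunk)
buf.close()
print(emitted)
```

In a live transfer the buffer's `write` method would be passed to `ftp.retrbinary(f'RETR {filename}', buf.write)`, followed by `buf.close()` and a `complete_multipart_upload` call with the collected ETags, so the file never has to touch the local disk.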