@andyyuan78
Forked from bwhitman/msd.txt
Last active August 29, 2015 14:20
The toughest part was getting access to an EC2 instance. I followed
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html#access-ec2
To set up the aws command line interface, I followed
http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html
-> http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-set-up.html
-> http://docs.aws.amazon.com/cli/latest/userguide/installing.html#install-bundle-other-os
(for my MacBook)
-> http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html
I don't recall how I set up the public/private key pair, but it wasn't
that hard.
Once I had a default, minimum-cost Ubuntu EC2 instance
running in the us-east-1 region ("N. Virginia"), I was able to use the
AWS web EC2 Dashboard to create an EBS volume from the Million Song
Dataset snapshot, snap-5178cf30, following the directions on:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-public-data-sets.html#using-public-data-sets-launching-mounting
then attach it to my Ubuntu instance following:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-attaching-volume.html
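I did these two steps in the web dashboard, but the same thing can probably be done from the aws command line interface. The following is only a sketch: it prints the two commands for review instead of running them. The availability zone, instance ID, volume ID and device name are placeholders I made up, not values from my setup. The volume has to be created in the same availability zone as the instance it will be attached to.

```shell
# Sketch only: print the AWS CLI commands for creating a volume from the
# MSD snapshot and attaching it. Nothing is executed against AWS.
SNAPSHOT_ID="snap-5178cf30"    # Million Song Dataset snapshot (from the notes)
REGION="us-east-1"             # region used in the notes
AZ="${AZ:-us-east-1a}"         # assumption: must match your instance's zone
INSTANCE_ID="${INSTANCE_ID:-i-0123456789abcdef0}"   # hypothetical placeholder
DEVICE="${DEVICE:-/dev/sdf}"   # assumption: appeared as /dev/xvdf on my Ubuntu instance

# Print the command that creates an EBS volume from the snapshot.
create_volume_cmd() {
  echo aws ec2 create-volume --region "$REGION" \
    --availability-zone "$AZ" --snapshot-id "$SNAPSHOT_ID"
}

# Print the command that attaches volume $1 to the instance.
attach_volume_cmd() {
  echo aws ec2 attach-volume --region "$REGION" \
    --volume-id "$1" --instance-id "$INSTANCE_ID" --device "$DEVICE"
}

create_volume_cmd
attach_volume_cmd "vol-REPLACE_ME"   # use the VolumeId returned by create-volume
```

Copy the printed lines into a terminal once the placeholders are replaced with your own values.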
Then I ssh'd into the instance:
> ssh -i ~/Documents/aws/amazonec2.pem ubuntu@54.173.207.160
(where amazonec2.pem is the private key file I created when setting up EC2)
Because I had already attached the volume, the device was already there:
ubuntu@ip-172-30-0-62:~$ sudo file -s /dev/xvdf
/dev/xvdf: Linux rev 1.0 ext3 filesystem data,
UUID=21a8ff2f-0b14-46a8-8e69-62951b27dfd4 (large files)
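The filesystem type reported by file -s is what goes into the -t argument of mount. As a small sketch, the type can be pulled out of that output with sed and used to build the mount command. The helper name is hypothetical, and the script only prints the mount command rather than running it.

```shell
# Sample line as printed by `sudo file -s /dev/xvdf` in the session above.
SAMPLE='/dev/xvdf: Linux rev 1.0 ext3 filesystem data, UUID=21a8ff2f-0b14-46a8-8e69-62951b27dfd4 (large files)'

# Hypothetical helper: print ext2, ext3 or ext4 if the text names one,
# and print nothing otherwise (e.g. for a blank device reported as "data").
fs_type_from_file_output() {
  printf '%s\n' "$1" | sed -n 's/.* \(ext[234]\) filesystem.*/\1/p'
}

fstype=$(fs_type_from_file_output "$SAMPLE")
# Print the mount command for review; /mnt/snap is the mount point used in these notes.
echo "sudo mount -t ${fstype} /dev/xvdf /mnt/snap"
```

On a real instance, replace SAMPLE with the live output, for example "$(sudo file -s /dev/xvdf)".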
So I just had to mount it (the ext4 driver also handles ext3
filesystems, so -t ext4 works even though file reports ext3):
ubuntu@ip-172-30-0-62:~$ sudo mkdir /mnt/snap
ubuntu@ip-172-30-0-62:~$ sudo mount -t ext4 /dev/xvdf /mnt/snap
ubuntu@ip-172-30-0-62:~$ ls /mnt/snap
AdditionalFiles data LICENSE lost+found README
ubuntu@ip-172-30-0-62:~$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 7.8G 808M 6.6G 11% /
none 4.0K 0 4.0K 0% /sys/fs/cgroup
udev 492M 12K 492M 1% /dev
tmpfs 100M 328K 99M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 497M 0 497M 0% /run/shm
none 100M 0 100M 0% /run/user
/dev/xvdf 493G 272G 196G 59% /mnt/snap
The 493G filesystem at the end (of which only 272G is used) is the MSD
data. You could scp it off that Linux instance, but it probably makes
sense to run your processing on the EC2 instance itself.
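To get a feel for why copying is unattractive, here is a back-of-the-envelope estimate of how long it would take to pull the 272G off the instance. The throughput figures are assumptions for illustration only, not measurements. I treat the df -h "G" as GiB.

```shell
# Rough transfer-time estimate for copying the used part of the volume.
USED_GIB=272   # "Used" column for /dev/xvdf in the df output above

# $1 = sustained throughput in MiB/s (an assumption you supply).
# Prints the transfer time in whole hours, rounded up.
transfer_hours() {
  rate=$1
  total_mib=$(( USED_GIB * 1024 ))
  seconds=$(( total_mib / rate ))
  echo $(( (seconds + 3599) / 3600 ))
}

# Example rates are illustrative guesses, not measured values.
for rate in 5 10 50; do
  echo "${rate} MiB/s: about $(transfer_hours "$rate") hours"
done
```

Even at the fastest of these assumed rates the copy takes hours, while processing on the instance can start as soon as the volume is mounted.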
DAn.