Note: This originally appeared on my personal blog, which I no longer maintain due to time constraints, so I have moved it here for reference to the code.
The SaaS product that we are currently redeveloping needs to store images uploaded by the user before sending them off to a publishing site. We also want to extend this functionality to include video among the media types available to the user.
At the moment the system grabs a multi-part form post and stores the entire byte-array of the image in the database, along with the rest of the details of that piece of content. When the platform was originally built (2012), this may have sufficed; however, now that we have much larger image files and the need to store videos up to 1 GB in size, this approach is no longer ideal.
I decided to implement storage via AWS S3 – it’s nice and fast, has lots of locations around the world and some great capabilities as far as security and access control are concerned. I have had a little experience in the past integrating with the SDKs, but mostly from PHP – and this bad boy is built in Java (not my first language of choice, nor the one I am most versed in). I built a class that just wrapped up what I needed it to do: sending a file blob to an S3 bucket of my choosing in a region of my choosing, attaching a nice bunch of meta-data along the way. All was going well when I had the epiphany – why not move the S3 integration to the client-side? Then we wouldn’t have to deal with the upload bandwidth at all.
The benefits of moving to a front-end solution: it would be quicker for the end-user (the file would go straight to S3, rather than to our server and then on to S3); it would save bandwidth on our server (and ultimately money); and it would save on disk space requirements on our server – happy days.
Now, it would be remiss of me not to outline the drawbacks of this change. First, it would take more time to implement than the server-side S3 integration (though this is certainly outweighed by the benefits). The other, much bigger issue was that if we were to integrate via the front-end, our AWS credentials would be exposed for the world to see (a great little solution to this problem, next). All in all, I think the benefits greatly outweighed the drawbacks, so we were a go on the front-end integration.
As stated previously, one of the drawbacks of integrating with AWS from the front-end is that you need to expose your AWS credentials to the world in your JavaScript. So the solution was to do the following:
- Create a temporary bucket with a CORS configuration that allows only PUT requests, and only from our domain origins
- Create a final storage bucket with NO CORS configuration, meaning it cannot be uploaded to from browser JavaScript
- Add a 24-hour file expiry (lifecycle) policy to the temporary bucket (this will probably come down considerably after launch) – so files will automatically be deleted after this period of time
- Create an IAM user in AWS that has very limited capabilities – ONLY S3 access and ONLY the ability to upload (no delete, move, etc.) – this will be used by the JavaScript
- Create an IAM role in AWS that has much more powerful capabilities (still only S3 access, but more control over all aspects of S3) – this will be used by the Java backend
- Upload the file to the temporary bucket from javascript, using the credentials for the limited IAM user
- Send the details of the file (along with the rest of the form details) to our server
- Our server will move (or rather, copy) the file from one bucket to another (note: the file would never go to our server), using the more powerful IAM role
- Store the file details in the database along with the rest of the form data
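The client-side part of the steps above can be sketched roughly as follows. This is not the actual gist code – the bucket name, region, credential placeholders and function names are all my own illustrative assumptions – and it assumes the AWS SDK for JavaScript (v2) has been loaded globally via a script tag:

```javascript
// Sketch of the browser-side upload. Assumes the AWS SDK for JavaScript v2
// is loaded globally (e.g. via a <script> tag); all names are placeholders.

// Build a collision-resistant object key for the temporary bucket.
function buildObjectKey(fileName) {
  var safeName = fileName.replace(/[^a-zA-Z0-9._-]/g, '_');
  return 'uploads/' + Date.now() + '-' +
    Math.random().toString(36).slice(2, 10) + '-' + safeName;
}

// Upload the file straight to the temporary bucket using the limited
// (upload-only) IAM user's credentials.
function uploadToTempBucket(file, done) {
  var s3 = new AWS.S3({
    accessKeyId: 'LIMITED_USER_KEY',       // upload-only IAM user
    secretAccessKey: 'LIMITED_USER_SECRET',
    region: 'us-east-1'                    // placeholder region
  });
  var key = buildObjectKey(file.name);
  s3.putObject({
    Bucket: 'my-temp-upload-bucket',       // placeholder bucket name
    Key: key,
    Body: file,
    ContentType: file.type,
    Metadata: { 'original-name': file.name }
  }, function (err) {
    // Hand the key back so it can be posted to our server with the form.
    done(err, key);
  });
}
```

The key returned from `uploadToTempBucket` is what gets sent along with the rest of the form details, so the server knows which object to copy across to the final bucket.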
Obviously we would need to support content with both the old blob-storage images AND the new S3-URL-stored images, but that is a single check on the DB record, and it is only required in a couple of places in the rest of the codebase.
Using the limited IAM user and the multi-bucket system, alongside Amazon’s lifecycle file management in S3, means the worst that can happen is that someone maliciously uploads a really large file to our S3 bucket, and it gets deleted automatically after a short period of time (granted it would be 24 hours to begin with, but this will come down).
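For reference, the policy attached to the limited IAM user might look something like this (the bucket name is a placeholder, not our real one) – it grants `s3:PutObject` on the temporary bucket and nothing else, which is what caps the damage a leaked key can do:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-temp-upload-bucket/*"
    }
  ]
}
```

With no `s3:GetObject`, `s3:DeleteObject` or `s3:ListBucket` in the policy, the exposed credentials cannot read, list or remove anything – only add objects that the lifecycle rule will sweep away anyway.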
Here is a cut-back gist of the code – note that it is written as plain functions; in a future build of the platform this will all get wrapped up in a nice component within a larger framework, but for now we needed to get this going as fast as we could.
I will go through a more technical document for the actual code at a later date, but the code has a few comments if you want to follow along at home… 😉
https://gist.github.com/3cb2180f19fc59e482d2
Of course there can always be improvements, and the big one here will be to implement signed URLs, which would mean that I would no longer need to have the AWS secret in the front-end at all. I would still use the temp bucket, as that means the main bucket cannot be touched via JavaScript or CORS, adding a small amount of extra security.
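As a rough sketch of that improvement (shown in JavaScript for brevity, though in this platform it would live in the Java backend; the bucket, region and content-type whitelist are my own assumed placeholders, and it assumes the AWS SDK for JavaScript v2 is available as a global `AWS`): the server, which holds the credentials, hands the browser a short-lived presigned PUT URL, and the browser simply PUTs the file to that URL.

```javascript
// Sketch of the signed-URL improvement. Assumes the AWS SDK for JavaScript
// v2 is available as a global `AWS`; all names below are placeholders.

// Only hand out URLs for the media types the platform actually supports.
function isAllowedContentType(contentType) {
  var allowed = ['image/jpeg', 'image/png', 'image/gif', 'video/mp4'];
  return allowed.indexOf(contentType) !== -1;
}

// Server-side: generate a short-lived PUT URL for the temp bucket, so the
// browser never needs to see any AWS credentials at all.
function createUploadUrl(key, contentType) {
  if (!isAllowedContentType(contentType)) {
    throw new Error('Unsupported media type: ' + contentType);
  }
  var s3 = new AWS.S3({ region: 'us-east-1' }); // placeholder region
  return s3.getSignedUrl('putObject', {
    Bucket: 'my-temp-upload-bucket',            // placeholder bucket
    Key: key,
    Expires: 300,                               // URL valid for 5 minutes
    ContentType: contentType
  });
}
```

The browser then uploads with a plain `PUT` to the returned URL; because the URL is scoped to one key, one content type and a five-minute window, even an intercepted URL is of very limited use.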
I am sure I will think of more improvements – in fact this entire solution will probably look positively archaic in a few months! 😉