@jeremybmerrill
Forked from anonymous/instructions.md
Last active January 23, 2021 21:16
Moreland Ave traffic single-datapoint site with Amazon Lambda, for NICAR
  1. Log in to your Amazon AWS account. What we're doing should be totally free, so you can use your personal account. We'll tell you how to disable the function when we're done, but it should likely be free forever (or maybe like a penny if you use it a TON).
  2. Go to create a new Lambda function here: https://console.aws.amazon.com/lambda/home?region=us-east-1#/create
  3. Fill in the form:
    • Name: whatever you want, maybe "NICAR18: serverless single-datapoint site"
    • Runtime: Node 6.10.3
    • Role: "Create new role from template(s)"
    • Role name: whatever you want, maybe "nicar18-serverless-site-role"
    • Policy templates: Basic Lambda Permissions or Basic Edge Lambda Permissions (This gives our Lambda just enough permissions to run, but not to do anything else.)
  4. Click "Create Function"
  5. You should get a little weird tree diagram, with the name for your Lambda up top, with a right-branching line (for outputs) going to Amazon Cloudwatch Logs and a left-branching line (for inputs) going to a blank list.
  6. To the left of that tree diagram, under "Add Triggers" hit "API Gateway". Configuring this will let us create our little ugly website.
  7. Scroll down to see Configure Triggers. This is where we're configuring the API Gateway, a.k.a. the piece of Amazon software that triggers our Lambda function when someone visits the URL and displays the Lambda function's output.
    • You can set the API Name to whatever you want.
    • You can leave the Deployment Stage as 'prod'.
    • Set Security to "Open". We're not going to tell anyone about the secret URL this generates, and it's not like the crown jewels are in here anyway. :)
    • Click Add (in the bottom right) when you're done, to save our API Gateway trigger.
  8. Click the orange Save button in the top right to save the whole thing.
  9. Now that we've set up all the plumbing, we have to actually write the function. Luckily for you, I already wrote it. In that tree diagram, click the top element, with the name of your Lambda function. Paste the code below in there. Then hit the top-right orange Save button again.
  10. Okay, now go back to that tree diagram and back to the API Gateway thing. Scroll down and you'll see an ugly URL labeled "Invoke URL". Click it. You should see your page.
  11. For instance, here's what mine is: https://c52ghc4rgh.execute-api.us-east-1.amazonaws.com/prod/moreland_ave_speeds2
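
The function you paste in step 9 answers API Gateway by handing back an object with a `statusCode`, `headers`, and a `body`. Here is a minimal sketch of that shape, pulled into a helper so the HTML isn't written out three times. `htmlResponse` and its arguments are names made up for this illustration; they are not part of the original script.

```javascript
// Hypothetical helper (not in the original gist): builds the object that
// the handler passes to context.done() for API Gateway to serve as a page.
function htmlResponse(title, message) {
  return {
    statusCode: 200,
    headers: {"Content-Type": "text/html"},
    body: "<html><head><title>" + title + "</title></head>" +
          "<body><h1>" + message + "</h1></body></html>"
  };
}

// Quick local check: print what the "traffic is fine" page would look like.
var page = htmlResponse("How's Moreland Ave Traffic?", "Moreland Ave traffic is fine");
console.log(page.statusCode, page.headers["Content-Type"]);
console.log(page.body);
```

Inside the handler, each branch could then call something like `context.done(null, htmlResponse(title, message))` instead of repeating the whole object.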
// exports.handler is a "magic word" to tell Amazon where to start running our program.
exports.handler = function(event, context){
  var http = require('http');
  // here's how to get our data. it's totally specific to this website, so if you're changing what data you want to fetch,
  // you'll want to change all of this.
  var options = {
    host: 'ga511maps0.iteriscdn.com',
    port: 80,
    path: '/mapserv/511ga-traffic/?LAYERS=all&QUERY_LAYERS=all&STYLES=&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetFeatureInfo&BBOX=-9390143.427899%2C3996082.6502%2C-9388160.842479%2C3998652.845275&FEATURE_COUNT=10&HEIGHT=538&WIDTH=415&FORMAT=image%2Fpng&INFO_FORMAT=application%2Fvnd.ogc.gml&SRS=EPSG%3A900913&X=96&Y=200'
  };
  // actually fetch the XML from the Georgia Dept. of Transportation website, using the `options` above.
  // SO this means that every time someone loads your page, Amazon's (invisible to you) server will visit the GDOT site
  // and get the data you want.
  http.get(options, function(resp){
    resp.setEncoding("utf8");
    let body = "";
    resp.on("data", data => {
      body += data;
    });
    resp.on('end', function(){
      // once we've obtained the data, we parse it.
      var output = parse_xml_to_speeds(body);
      // now that we have the data parsed, like {"N": 35, "S": 25},
      // we get to make a decision about what we want to output.
      // we can choose what kind of output we want -- maybe sending you text messages with Twilio or writing to a database
      // but in this case, we're going to return an HTML page. See the `body` attribute?
      // It's a super basic HTML page. You could easily return a much better HTML page -- or
      // JSON data to be integrated into a normal page in your CMS.
      if (output["N"] > 30 && output["S"] > 30){
        context.done(null, {statusCode: 200, headers: {"Content-Type": "text/html"}, body: "<html><head><title>How's Moreland Ave Traffic?</title></head><body><h1>Moreland Ave traffic is fine</h1></body></html>"})
      }else if(output["N"] > 20 && output["S"] > 20){
        context.done(null, {statusCode: 200, headers: {"Content-Type": "text/html"}, body: "<html><head><title>How's Moreland Ave Traffic?</title></head><body><h1>Moreland Ave traffic is kinda slow</h1></body></html>"})
      }else{
        context.done(null, {statusCode: 200, headers: {"Content-Type": "text/html"}, body: "<html><head><title>How's Moreland Ave Traffic?</title></head><body><h1>You'd be better off walking on Moreland Ave.</h1></body></html>"})
      }
    });
  }).on("error", function(e){
    console.log("Got error: " + e.message);
    context.done("Got error: " + e.message, null)
  });
}
// this is a little scraping function (but it's awful, so don't look too closely!).
// your scraper will be different -- and totally specific to what you're building.
// How to scrape a website (or an XML or JSON document) is outside the scope of this hands-on class, so you don't need to pay that much attention to how this works.
// but, the steps for doing it are very similar in JavaScript with the Cheerio library as they are in Python with Beautiful Soup or Ruby with Nokogiri.
// all you need to know about this function is that it returns an Object with the northbound and southbound speeds
// e.g. {"N": 35, "S": 25}
function parse_xml_to_speeds(xml){
  var split_features = xml.split("</speeds_feature>")
  var output = {"N": null, "S": null}
  split_features.forEach(function(feature){
    var key;
    if(feature.indexOf("<route>SR 42/Moreland Ave</route>") >= 0 && feature.indexOf("<dir>N</dir>") >= 0){
      key = "N"
    }else if(feature.indexOf("<route>SR 42/Moreland Ave</route>") >= 0 && feature.indexOf("<dir>S</dir>") >= 0){
      key = "S"
    }else{
      return
    }
    // always pass the radix so the speed is read as a base-10 number.
    output[key] = parseInt(feature.split("<mph>")[1].split("</mph>")[0], 10)
  })
  return output;
}
// a little helper that'll let you test your script from your computer.
if (require.main === module) {
  exports.handler(null, {done: function(err, output){console.log("would return", output)} })
}
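
If you'd rather check the parsing step without hitting the GDOT server, a sketch like the one below can help. It uses a condensed copy of `parse_xml_to_speeds` (same behavior on this input) and a made-up XML sample. The sample contains only the tags the parser looks for; the real GDOT response isn't shown here, so its structure is an assumption.

```javascript
// A condensed copy of the gist's parser so this snippet runs on its own.
function parse_xml_to_speeds(xml) {
  var output = {"N": null, "S": null};
  xml.split("</speeds_feature>").forEach(function(feature) {
    if (feature.indexOf("<route>SR 42/Moreland Ave</route>") < 0) { return; }
    var key;
    if (feature.indexOf("<dir>N</dir>") >= 0) { key = "N"; }
    else if (feature.indexOf("<dir>S</dir>") >= 0) { key = "S"; }
    else { return; }
    output[key] = parseInt(feature.split("<mph>")[1].split("</mph>")[0], 10);
  });
  return output;
}

// Hypothetical sample data: two Moreland Ave features and one unrelated road.
var sampleXml =
  "<speeds_feature><route>SR 42/Moreland Ave</route><dir>N</dir><mph>35</mph></speeds_feature>" +
  "<speeds_feature><route>SR 42/Moreland Ave</route><dir>S</dir><mph>25</mph></speeds_feature>" +
  "<speeds_feature><route>I-20</route><dir>E</dir><mph>60</mph></speeds_feature>";

var speeds = parse_xml_to_speeds(sampleXml);
console.log(speeds); // { N: 35, S: 25 }
```

Note that a direction missing from the data stays `null`, and `null > 30` is false in JavaScript, so the handler would fall through to its last branch in that case.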

What should you use Lambda for -- and what shouldn't you?

The benefit of serverless functions is that you don't pay for servers when you're not using them. So scheduled tasks like cron jobs and scrapers are a good fit, as are APIs that respond to reader interest that spikes rarely but heavily.

For instance, if you had a text-message bot that responded to readers' text messages with data about school closings: most days of the year, no one would text it, but when it was snowy or the water supply was disrupted, it would get many messages. If you needed to pay for enough servers to cope with that maximum load all the time, you'd be wasting a lot of money. With serverless functions, you don't need to pay for any servers when no one is using the tool.
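
To make the cost argument concrete, here's a rough back-of-the-envelope sketch. Every number in it is a placeholder assumption for illustration (the message counts, the per-request price, the per-server price, and how many messages one server can handle). None of them are quotes from AWS's price list, so swap in current prices before relying on it.

```javascript
// All figures below are made-up placeholders, not real AWS prices.
var ASSUMED_PRICE_PER_REQUEST = 0.000001; // dollars per function call (assumption)
var ASSUMED_SERVER_PRICE_PER_DAY = 1.0;   // dollars per always-on server (assumption)

// Pay-per-use: cost follows the number of messages actually received.
function serverlessCost(messagesPerDay) {
  return messagesPerDay.reduce(function(total, n) {
    return total + n * ASSUMED_PRICE_PER_REQUEST;
  }, 0);
}

// Always-on: you pay for peak capacity every day, busy or not.
function alwaysOnCost(messagesPerDay, messagesOneServerHandlesPerDay) {
  var peak = Math.max.apply(null, messagesPerDay);
  var serversNeeded = Math.ceil(peak / messagesOneServerHandlesPerDay);
  return serversNeeded * ASSUMED_SERVER_PRICE_PER_DAY * messagesPerDay.length;
}

// A hypothetical year: 5 snow days with 200,000 texts each, 360 silent days.
var year = [];
for (var day = 0; day < 365; day++) {
  year.push(day < 5 ? 200000 : 0);
}

console.log("serverless:", serverlessCost(year).toFixed(2));
console.log("always-on: ", alwaysOnCost(year, 100000).toFixed(2));
```

With these placeholder numbers the always-on total comes out hundreds of times larger, because it charges for peak capacity on all 365 days rather than the 5 days anyone texted.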

Here's a list:

Consider using serverless:

  • scrapers that run regularly, like downloading crime data that's updated each week and summarizing it
  • cron jobs (e.g. generating archives or screenshotting your homepage)
  • processing user-generated content, e.g. resizing and cropping images that readers upload
  • responding to user input, like text messages
  • single-serving websites that respond, live, with data from a third party (e.g. the site above, which displays the current traffic speed on Moreland Ave. in Atlanta using data scraped from the Georgia DOT).

Probably not a good idea to use serverless:

  • your CMS
  • your website
  • your databases (but you may want to connect a serverless function to a database)
  • training machine learning models (because they use a lot of memory and take a long time, and Lambda limits both the memory and the running time of each function call)