Basics of running anything on AWS part 2 — getting the task done

EC2, ECR, Lambda, CloudWatch, S3 … whatever you might need

Now, setting up an application to run in the cloud does not differ that much when it is a short-lived server performing a computation-heavy task; it is just that the nature of the problem it solves suggests it does not need continuous deployment, since it runs ad hoc rather than continuously…

That is actually wrong - it is just a different use of the continuous delivery capability: whenever the task needs to run, it first needs to deploy the newest version of the code.

For the following to make complete sense, you should read the previous part, which explains how to set up a machine, how to prepare a basic code deployment mechanism with systemd, and so on.

Running anything on AWS 1

The data script scenario

We could also call it a small data script since it is, for sure, not a Big Data Script™. But it can nevertheless be one if you so desire…

This script should run (in this case on Node.js) inside a reaaaalllly strong EC2 instance and, of course, the instance should shut down when the task is finished.

This script does tons of stuff - it gets some data from different online sources, reads a CSV file and compares everything to a local in-memory database, yielding a really big dataset that it pushes further down the data pipeline. Or it might just run a really complex and resource-consuming build.

The ultimate goal would be to have these completely code agnostic and to be able to spin up dozens at the same time, giving each different code to run. Since that is a whole other complex subject, I’ll leave it for another time.

Running the task

How do you run a script in Node.js? By just running node some-script.js - the script executes everything and exits.

We can do the same thing in a Docker container: it will start, install everything, run the script and then exit when the script exits (a Docker container stops when the last process running inside it exits).

  • The some-script.js script is run inside a Docker container.
  • Docker is running on an EC2 instance, which boots up and starts it as a daemon via systemd.
  • As soon as the job is done, the EC2 instance is shut down.

Setup

The setup of the EC2 machine and the way you build and make code available for deployment are the same as in part 1 - that is why it exists. If you have not read it, go and do so right now.

To run the task

Code

Simply put, the code you run inside Docker is not a server this time… It is a script, so let us call it script.js.

For the purpose of this exercise, I will not invent a fancy task; I’ll just write a Node.js script that creates an S3 bucket, writes “Hello!” into it and shuts the instance down. For this example, I am going to install two node packages - winston (a logging utility) and aws-sdk (obviously).
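
Since the Dockerfile later copies package.json and runs npm i, the script needs one. A minimal package.json might look like this (the name and version ranges are my assumptions - pin whatever you actually use; the script below relies on the winston 2.x API):

{
  "name": "data-script",
  "version": "1.0.0",
  "private": true,
  "dependencies": {
    "aws-sdk": "^2.0.0",
    "winston": "^2.4.0"
  }
}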

const winston = require('winston');
const aws = require('aws-sdk');

aws.config.update({region: 'your_region'});
winston.level = 'debug';
winston.log('info', 'Pushing to S3');

const s3 = new aws.S3();
const ec2 = new aws.EC2();
const myBucket = 'bucket.name';
const myKey = 'myBucketKey';

// Stops the EC2 instance this script is running on
const shutDownEC2 = (instance_id) => {
    winston.info("Shutting down");
    ec2.stopInstances({
        InstanceIds: [ instance_id ]
    }, function(err, data) {
        if (err) {
            winston.error(JSON.stringify(err));
            // Trigger some alerting here
        } else {
            winston.info('Done');
        }
    });
};

s3.createBucket({Bucket: myBucket}, function(err, data) {
    if (err) {
        // Creating bucket failed
        winston.error(err);
        shutDownEC2('ec2_id');
    } else {
        const params = {
            Bucket: myBucket,
            Key: myKey,
            Body: 'Hello!'
        };
        s3.putObject(params, function(err, data) {
            if (err) {
                // Putting object failed
                winston.error(err);
                shutDownEC2('ec2_id');
            } else {
                // It was successfully done
                winston.info("Successfully uploaded data to myBucket/myKey");
                shutDownEC2('ec2_id');
            }
        });
    }
});

So, it is as simple as that. You have to pay attention to two things:

  • The instance id is the id of the instance that runs the code (it is basically executed on self).
  • Shutdown has to be triggered on every process exit, meaning on success and on every failure. I also advise putting everything in a try/catch and shutting down in the catch, with nice logging of the exception and alerting, as in the sketch after this list… You do not want an exception to leave your expensive EC2 instance up.
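
Here is a minimal sketch of both points, under my own assumptions (it is not part of the script above): getInstanceId reads the id of the machine the script runs on from the EC2 instance metadata endpoint instead of hardcoding it, runTask is a hypothetical entry point wrapping the S3 logic, and winston and shutDownEC2 are the ones from the script. The uncaughtException handler plus try/catch make sure shutDownEC2 is called no matter how the script fails.

const http = require('http');

// Ask the EC2 instance metadata service for the id of the current instance
const getInstanceId = () => new Promise((resolve, reject) => {
    http.get('http://169.254.169.254/latest/meta-data/instance-id', (res) => {
        let id = '';
        res.on('data', (chunk) => { id += chunk; });
        res.on('end', () => resolve(id));
    }).on('error', reject);
});

getInstanceId().then((instanceId) => {
    // Catches errors thrown inside async callbacks, which try/catch cannot see
    process.on('uncaughtException', (err) => {
        winston.error(JSON.stringify(err));
        // Trigger some alerting here
        shutDownEC2(instanceId);
    });

    try {
        runTask(instanceId); // hypothetical wrapper around the S3 calls above
    } catch (err) {
        // Catches synchronous errors thrown while kicking the task off
        winston.error(JSON.stringify(err));
        shutDownEC2(instanceId);
    }
});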

Docker

The Docker setup is as simple as in the previous example; it just runs a different script.

FROM node:boron
WORKDIR /usr/src/app
COPY package.json .
RUN npm i
COPY . .
CMD ["node", "script.js"]

I added two extra lines, since I assume you are going to need to install dependencies, and it is optimal for that step to be cached… So we first copy package.json and run the install, and only then copy everything else into the working directory; that way the installation layer is cached for the next build.

As you can see, we can now shut the EC2 instance down once the script is done. What we still do not have is a way to start it on demand.

I remind you that, once the instance starts, it will download the newest Docker image from ECR and spin it up as described in part 1.

Control

When the EC2 instance is set up, we are going to stop it (shut it down) and leave it that way. We now need a trigger to start it the first time and every other time, since it automatically shuts down when it is done.

Start

The ideal way to start the job is to use an AWS Lambda that triggers EC2 through the AWS JS API (since recently, you can use Golang in lambdas as well).

There are two tools that I recommend for easily setting up the lambdas:

Claudia.js - https://claudiajs.com/

Serverless framework - https://serverless.com/

The latter is my current tool of choice, since I had to work with a couple of cloud providers at the same time. Serverless has more options and can work with all cloud providers in the western hemisphere (no Alibaba Cloud yet), but it is super slow with the frequent changes you make while you develop, since it relies on CloudFormation. Claudia, on the other hand, is focused on lambdas, but it is also super fast and probably the best and easiest tool to use if you are doing serverless stuff on AWS only.

With the above in mind, I’ll just provide you with the code to put in the lambda, so we can keep this as tool agnostic as possible… Maybe you want to create it without a tool guiding your hand.

Code to put in lambda:

'use strict';
// Load the AWS SDK for Node.js
const AWS = require('aws-sdk');
const ec2 = new AWS.EC2();

module.exports.handler = (event, context, callback) => {
    const params = {
        InstanceIds: [ 'instance_id' ],
        AdditionalInfo: 'START IT UP'
    };
    ec2.startInstances(params, function(err, data) {
        if (err) {
            callback(err, null);
        } else {
            callback(null, {
                statusCode: 200,
                body: JSON.stringify({
                    message: 'STARTING'
                })
            });
        }
    });
};

Stop

Just take the shutDownEC2 function described above and put it in a lambda… You can use it to confirm shutdown when an alert is triggered, or to shut the instance down when it gets stuck (and it will get stuck :)).
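
For completeness, a minimal stop lambda might look like the following (a sketch mirroring the start lambda above; 'instance_id' is the same placeholder):

'use strict';
// Load the AWS SDK for Node.js
const AWS = require('aws-sdk');
const ec2 = new AWS.EC2();

module.exports.handler = (event, context, callback) => {
    ec2.stopInstances({ InstanceIds: [ 'instance_id' ] }, function(err, data) {
        if (err) {
            callback(err, null);
        } else {
            callback(null, {
                statusCode: 200,
                body: JSON.stringify({
                    message: 'STOPPING'
                })
            });
        }
    });
};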

Scheduling

To schedule a task like this (or to schedule anything that is controlled by or executed in AWS lambda), you can use CloudWatch.

You can set this up by doing the following:

  • Go to CloudWatch in your AWS account.
  • Go to Events > Rules and click Schedule.
  • You are given the option to choose between fixed-rate intervals and cron expressions; pick whichever you want. For instance, you can choose 1 and “Days”, and that will schedule the task to run once a day.
  • On the right, choose “Add Target” and pick your lambda from the Functions dropdown.
  • Click Configure details, fill it in and click Create rule.
  • Tadaaaaah! Your task is now scheduled to run at the desired interval (a scripted alternative is sketched right after this list).
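
If you would rather script this than click through the console, the same rule can be created with the aws-sdk. This is only a sketch under my own assumptions: the rule name, the lambda ARN and the once-a-day rate are placeholders, and the addPermission call is there because CloudWatch Events also needs permission to invoke the lambda.

'use strict';
const AWS = require('aws-sdk');
const events = new AWS.CloudWatchEvents();
const lambda = new AWS.Lambda();

// Placeholders - replace with your own rule name and lambda ARN
const ruleName = 'run-data-script-daily';
const lambdaArn = 'arn:aws:lambda:your_region:your_account_id:function:start-data-script';

// Create (or update) the scheduled rule
events.putRule({
    Name: ruleName,
    ScheduleExpression: 'rate(1 day)'
}, function(err, rule) {
    if (err) { return console.error(err); }
    // Allow CloudWatch Events to invoke the start lambda
    lambda.addPermission({
        FunctionName: lambdaArn,
        StatementId: ruleName,
        Action: 'lambda:InvokeFunction',
        Principal: 'events.amazonaws.com',
        SourceArn: rule.RuleArn
    }, function(err) {
        if (err) { return console.error(err); }
        // Point the rule at the lambda
        events.putTargets({
            Rule: ruleName,
            Targets: [{ Id: '1', Arn: lambdaArn }]
        }, function(err) {
            if (err) { return console.error(err); }
            console.log('Scheduled');
        });
    });
});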

Ideas

Using the tools and patterns I gave you in part 1 and part 2, you can easily implement:

  • Your own web CD system triggered by lambdas - trigger one lambda to build, use EC2 as a build machine, and at the end trigger a second lambda to deploy.
  • Your own regular database backups.
  • Scheduled crawling tasks.

Conclusion

As you can see, this simple combination of easily accessible and cheap (or free) cloud computing tools gives us powerful tooling to set up and maintain the most commonly needed DevOps tasks.
