This tutorial shows you how to deploy a web app on AWS reliably (similar to the way we do it at Transcend.io). We use the following stack, which later sections explore in more depth:

brew install node
brew install terraform
brew install docker
brew install awscli

This tutorial assumes basic familiarity with web applications and hosting.

In this section

We will create a basic web app that can run on localhost.

Tutorial

npm init
npm install --save express

Create a file named app.js with the contents:

const app = require('express')();

app.get('/', (req, res) => {
  res.send('Hello, World!\n');
});

app.listen(3000, '0.0.0.0');

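This file, as you might expect, starts a web server on port 3000. The 0.0.0.0 address means the server listens on every network interface of the machine (including localhost), which matters later when the app runs inside a Docker container.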
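To make the meaning of 0.0.0.0 concrete, here is a minimal sketch that lists the IPv4 addresses of the current machine using Node's built-in os module. A server bound to 0.0.0.0 accepts connections on any of them, whereas one bound to 127.0.0.1 only accepts local connections. The listIpv4Addresses helper is a hypothetical name used for illustration only.

```javascript
const os = require('os');

// Hypothetical helper: collect every IPv4 address assigned to this machine.
function listIpv4Addresses() {
  const addresses = [];
  for (const interfaces of Object.values(os.networkInterfaces())) {
    for (const iface of interfaces || []) {
      // Some Node releases report the family as the number 4 instead of 'IPv4'.
      if (iface.family === 'IPv4' || iface.family === 4) {
        addresses.push(iface.address);
      }
    }
  }
  return addresses;
}

// Prints an array of addresses; the exact contents depend on your machine.
console.log(listIpv4Addresses());
```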

node app.js

Open http://localhost:3000 to view the app in your browser.

In the last step, we used a package.json file to explicitly remember our dependencies. This is a common approach for ensuring that other team members or cloud environments can easily use the same libraries as we do.

However, it is not always enough. The previous step lacked:

With Docker, you can:

Blueprints of our image

Create a file named Dockerfile. In it, paste the following code:

FROM node:10
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm i
COPY . .
EXPOSE 3000
CMD [ "node", "app.js" ]

Let's examine this code line by line:

FROM node:10

This says that your image is built on top of the official Node image, version 10.

WORKDIR /usr/src/app

This says that every following command in your Dockerfile runs from the /usr/src/app directory inside the image's file system.

COPY package*.json ./

This tells the image to copy over your package.json and package-lock.json files. The COPY command takes its first argument from your current directory (outside of docker) and its second argument relative to your WORKDIR inside docker.

RUN npm i

This installs the dependencies you listed in the package.json file.

COPY . .

This copies over all the other files from your current workspace, minus those in your .dockerignore file. We do not want to copy over our node_modules because we already ran npm i last step. So we can create a new file named .dockerignore with the contents:

node_modules
npm-debug.log

This also saves a bit of time as node_modules can be quite large and slow to copy.

EXPOSE 3000

By default, Docker does not give any external process access to the inside of a container. EXPOSE documents that the container listens on port 3000, the same port our Node app used locally earlier. It does not open the port by itself: the port only becomes reachable from outside once it is published with the -p flag of docker run, as we do below.

CMD [ "node", "app.js" ]

The last step is to tell the container to run the app.js file we copied over using the node binary, which hosts the app.

Running our image locally

Let's build an image!

docker build -t some-image-name .

This will build an image to your local machine named some-image-name.

In the output, you may notice that it also tags your image as some-image-name:latest. Tags are a way to version and, well, tag, your images in case you ever want multiple images related to the same app/job.
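As a rough illustration of how a name like some-image-name:latest splits into a repository and a tag, the sketch below parses an image reference in JavaScript. It is a simplification written for this tutorial, not Docker's actual parser: the splitImageReference name is made up, and digest references such as name@sha256:... are not handled.

```javascript
// Hypothetical helper: split "repository[:tag]" into its parts,
// defaulting the tag to "latest" the way docker build -t does.
function splitImageReference(reference) {
  const lastSlash = reference.lastIndexOf('/');
  const lastColon = reference.lastIndexOf(':');
  // A colon before the last slash belongs to a registry port
  // (e.g. localhost:5000/app), not to a tag.
  if (lastColon > lastSlash) {
    return {
      repository: reference.slice(0, lastColon),
      tag: reference.slice(lastColon + 1),
    };
  }
  return { repository: reference, tag: 'latest' };
}

console.log(splitImageReference('some-image-name'));
// { repository: 'some-image-name', tag: 'latest' }
```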

Now, let's run it on localhost:

docker run -p 12345:3000 -d some-image-name

This command says to run the some-image-name image locally. The -d flag runs the container in the background (detached). The -p 12345:3000 flag is an example of port forwarding. Your local machine and the Docker container have their own, distinct sets of ports. Port forwarding enables you to say "Whenever someone asks for my 12345 port, send them to this container's 3000 port instead." This is kind of like a proxy server, if that helps you.
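To illustrate the proxy analogy, here is a minimal sketch of a TCP port forwarder built on Node's net module. It only demonstrates the idea of "connections to one port are handed to another port"; Docker implements port publishing differently, and forwardPort is a hypothetical helper.

```javascript
const net = require('net');

// Forward every TCP connection made to listenPort on to targetPort.
// Resolves with the listening server so the caller can close it later.
function forwardPort(listenPort, targetPort, targetHost = '127.0.0.1') {
  const server = net.createServer((client) => {
    const upstream = net.connect(targetPort, targetHost);
    // Copy bytes in both directions.
    client.pipe(upstream);
    upstream.pipe(client);
    // Tear down the other side if either connection fails.
    client.on('error', () => upstream.destroy());
    upstream.on('error', () => client.destroy());
  });
  return new Promise((resolve, reject) => {
    server.once('error', reject);
    server.listen(listenPort, () => resolve(server));
  });
}
```

Calling forwardPort(12345, 3000) while the app from the first section runs on port 3000 would make it reachable on port 12345 as well, which mirrors what -p 12345:3000 achieves for the container.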

Open http://localhost:12345 to view the app in your browser.

Kill the Docker process

Let's clean up, as we no longer want to run this web app locally.

Running

docker ps

will give you an overview of the currently running Docker containers. Copy the name (or ID) of the container you just started, and run

docker kill <container_name>

to stop it. You may want to run docker ps one last time to verify it has stopped.

Terraform lets you declare infrastructure as code. This is a pretty popular paradigm predicated on the idea that code is easier to version, share, and change than infrastructure made through web consoles. Here are a few more specific benefits of Terraform:

ECR is the Amazon Elastic Container Registry. If you have ever used Docker Hub, it is basically the same thing. At Transcend, we use ECR because it gives us cheap/free private repos, unlike Docker Hub (TODO: verify this. Right now this is just my best guess).

Its entire job is to host Docker images in repos. You use it similarly to using S3, where you create a repo (instead of an S3 bucket) and then can upload images (instead of files) to that repo. Just like S3, it keeps track of versions and tags for you.

Create a new folder named deployment to store your terraform code and cd into it.

To start, create a file named provider.tf. In this, we will specify that we want to deploy to AWS specifically. Terraform supports many cloud providers. This looks like:

provider "aws" {
  region  = "eu-west-1"
  profile = "test"
}

This says that all deploys will be in the eu-west-1 region. It also says that I would like to use my test profile in the awscli, which I set up to be my personal account.

Now, create a file named ecr.tf with the contents:

resource "aws_ecr_repository" "ecr_repo" {
    name = "ecr_example_repo"
}

This follows the syntax:

resource "some aws resource" "some terraform name that lets you reference this resource from other resources in terraform" {
    name = "the name that will appear in the AWS console for this resource"
    ...other args...
}

To find a list of usable aws resource names and the arguments they take, check out the docs.

Deploy to AWS

Run the command

terraform init

to initialize your directory as containing terraform code. This downloads the plugins for the providers you use, in this case the aws provider you listed in provider.tf.

Next, run

terraform plan

This step is optional, but highly recommended anytime you change infrastructure.

You should see output that looks something like:

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_ecr_repository.ecr_repo will be created
  + resource "aws_ecr_repository" "ecr_repo" {
      + arn                  = (known after apply)
      + id                   = (known after apply)
      + image_tag_mutability = "MUTABLE"
      + name                 = "ecr_example_repo"
      + registry_id          = (known after apply)
      + repository_url       = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

That looks good. It shows us that an aws_ecr_repository will be created. As this matches our expectation, we can run:

terraform apply

After confirming the plan, you can go to your ECR page on your AWS account and will see that an empty repository was made!

Authenticating to Docker

We have a repo, now we need to make sure we are authenticated to it so we can push and pull images.

This can be done by using the following commands:

ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin "$ACCOUNT_ID.dkr.ecr.eu-west-1.amazonaws.com"
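The registry hostname in the login command follows a fixed pattern, and the repository URL used in the next step extends it with the repo name. The sketch below composes both. The function names are hypothetical and 123456789012 is a placeholder account ID.

```javascript
// Registry hostname for a given AWS account and region.
function ecrRegistryHost(accountId, region) {
  return `${accountId}.dkr.ecr.${region}.amazonaws.com`;
}

// Full image reference: registry host, repo name, and tag.
function ecrImageUri(accountId, region, repoName, tag = 'latest') {
  return `${ecrRegistryHost(accountId, region)}/${repoName}:${tag}`;
}

console.log(ecrImageUri('123456789012', 'eu-west-1', 'ecr_example_repo'));
// 123456789012.dkr.ecr.eu-west-1.amazonaws.com/ecr_example_repo:latest
```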

Pushing our Docker Image to ECR

Now that we're authenticated, we can push our local docker image to the remote repo. This is done in two steps, tagging our local image and pushing our changes.

Find the repository URL of your ECR repo (the repository_url attribute, also shown on the ECR page of your AWS console), and copy it. Then, run:

docker tag some-image-name:latest <repo_url>:latest

This is kind of similar to a git remote add origin <repo_url> in git.

Then, run:

docker push <repo_url>:latest

to upload the image to the remote repo. This is similar to a git push in git.

Head back to your AWS console, and verify you can see the image you uploaded.

The remaining step is to deploy the ECR image to AWS, which requires quite a few aws services, each with some terraform code to specify it.

Let's start with the fun stuff, permissions and roles!

IAM roles

Create a file named iam.tf with the contents:

resource "aws_iam_role" "ecs_role" {
  name = "ecs_role_example_app"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "ecs_policy_attachment" {
  role = "${aws_iam_role.ecs_role.name}"

  // This policy adds logging + ecr permissions
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

This creates a new IAM role named ecs_role_example_app with an attached AmazonECSTaskExecutionRolePolicy. This policy ensures that the role will be able to pull from ECR.

A Custom Virtual Private Cloud

Next, create a file network.tf that contains:

resource "aws_vpc" "vpc_example_app" {
    cidr_block = "10.0.0.0/16"
    enable_dns_hostnames = true
    enable_dns_support = true
}

resource "aws_subnet" "public_a" {
    vpc_id = "${aws_vpc.vpc_example_app.id}"
    cidr_block = "10.0.1.0/24"
    availability_zone = "${var.aws_region}a"
}

resource "aws_subnet" "public_b" {
    vpc_id = "${aws_vpc.vpc_example_app.id}"
    cidr_block = "10.0.2.0/24"
    availability_zone = "${var.aws_region}b"
}

resource "aws_internet_gateway" "internet_gateway" {
    vpc_id = "${aws_vpc.vpc_example_app.id}"
}

resource "aws_route" "internet_access" {
    route_table_id = "${aws_vpc.vpc_example_app.main_route_table_id}"
    destination_cidr_block = "0.0.0.0/0"
    gateway_id = "${aws_internet_gateway.internet_gateway.id}"
}

resource "aws_security_group" "security_group_example_app" {
    name = "security_group_example_app"
    description = "Allow inbound HTTP traffic on port 3000"
    vpc_id = "${aws_vpc.vpc_example_app.id}"

    ingress {
        from_port = 3000
        to_port = 3000
        protocol = "tcp"
        cidr_blocks = ["0.0.0.0/0"]
    }

    egress {
        from_port = 0
        to_port = 0
        protocol = "-1"
        cidr_blocks = ["0.0.0.0/0"]
    }
}

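This creates a VPC that other resources can go into. It has two public subnets (one in each of two availability zones) that can connect to the internet via an internet gateway. The availability zones are derived from var.aws_region, an input variable that must be declared before this code can be applied (input variables are covered later in this tutorial).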

For security reasons, we specify that only port 3000 should be exposed to the public, but outgoing traffic from our resources is unrestricted.
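The CIDR blocks above decide which addresses belong to the VPC and to each subnet. As a small worked example, the sketch below checks that a subnet range such as 10.0.1.0/24 lies inside the VPC range 10.0.0.0/16. The helper names are hypothetical, and only IPv4 is handled.

```javascript
// Convert a dotted IPv4 address to a number.
function ipv4ToInt(ip) {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Compute the first and last address covered by a CIDR block.
function parseCidr(cidr) {
  const [ip, bits] = cidr.split('/');
  const size = 2 ** (32 - Number(bits));
  // Round down to the start of the block.
  const start = Math.floor(ipv4ToInt(ip) / size) * size;
  return { start, end: start + size - 1, size };
}

// True when every address of `inner` is also part of `outer`.
function cidrContains(outer, inner) {
  const o = parseCidr(outer);
  const i = parseCidr(inner);
  return i.start >= o.start && i.end <= o.end;
}

console.log(cidrContains('10.0.0.0/16', '10.0.1.0/24')); // true
console.log(cidrContains('10.0.0.0/16', '10.1.0.0/24')); // false
console.log(parseCidr('10.0.1.0/24').size);              // 256
```

The same arithmetic explains why 0.0.0.0/0 in the security group and route means "any address": a prefix length of 0 covers the entire IPv4 range.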

If this is confusing (it was for me at first), then I would recommend this youtube playlist.

Fargate deployment

Fargate is the final and most exciting step. It is a service that deploys Docker containers for us, which means we're finally at the step of having our simple Node.js app running on AWS infrastructure!

Create a file fargate.tf with the contents:

resource "aws_ecs_task_definition" "backend_task" {
    family = "backend_example_app_family"

    // Fargate is a type of ECS that requires awsvpc network_mode
    requires_compatibilities = ["FARGATE"]
    network_mode = "awsvpc"

    // Valid sizes are shown here: https://aws.amazon.com/fargate/pricing/
    memory = "512"
    cpu = "256"

    // Fargate requires task definitions to have an execution role ARN to support ECR images
    execution_role_arn = "${aws_iam_role.ecs_role.arn}"

    container_definitions = <<EOT
[
    {
        "name": "example_app_container",
        "image": "<your_ecr_repo_url>:latest",
        "memory": 512,
        "essential": true,
        "portMappings": [
            {
                "containerPort": 3000,
                "hostPort": 3000
            }
        ]
    }
]
EOT
}

resource "aws_ecs_cluster" "backend_cluster" {
    name = "backend_cluster_example_app"
}

resource "aws_ecs_service" "backend_service" {
    name = "backend_service"

    cluster = "${aws_ecs_cluster.backend_cluster.id}"
    task_definition = "${aws_ecs_task_definition.backend_task.arn}"

    launch_type = "FARGATE"
    desired_count = 1

    network_configuration {
        subnets = ["${aws_subnet.public_a.id}", "${aws_subnet.public_b.id}"]
        security_groups = ["${aws_security_group.security_group_example_app.id}"]
        assign_public_ip = true
    }
}

Replace <your_ecr_repo_url> with the repository URL of the ECR repo you created earlier.

Fargate is a launch type of the Elastic Container Service (ECS), which has three concepts: task definitions, clusters, and services.

It should be pretty easy to map those concepts to the three terraform resource blocks above.

There are quite a few arguments I won't go over in detail here, but they mostly relate to:

Find the public IP Address on the task page in your AWS console, and go to http://<your_public_ip>:3000 to view your super scalable hello world application!

Sometimes you need to put sensitive data in your terraform code, or otherwise you need to repeat the same values over and over (such as with an AWS region). That's where variables come in.

This page is a summary of the official terraform docs on input variables.

To declare a variable, you can write a variable block:

variable "aws_region" {
  default     = "eu-west-1"
  description = "Which region should the resources be deployed into?"
}

Anywhere you want to use the value of that variable in your resource or provider blocks, you can just enter something like:

provider "aws" {
  region  = "${var.aws_region}"
}

and the variable will be injected.

Overriding variables with the CLI

You can specify a variable in a terraform plan or terraform apply command by running something like

terraform apply -var="aws_region=us-east-1"

Overriding variables from a file

You can store your secrets in a file, and then load them all in with the -var-file flag.

Example vars.tfvars file:

aws_region = "us-east-1"
family = "some_other_var"

Usage:

terraform apply -var-file="vars.tfvars"

If you have sensitive data in this file, make sure it is in your .gitignore.

Overriding at runtime

If you don't specify a default value, running terraform plan or terraform apply will ask you for an input before running.

Overriding using Environment Variables

Any env var with the prefix TF_VAR_ will be picked up automatically.

From the terminal, type:

export TF_VAR_aws_region="us-east-1"
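The naming convention is simple: strip the TF_VAR_ prefix and what remains is the variable name. The sketch below mimics that mapping in JavaScript purely as an illustration. It is not how Terraform is implemented, and terraformVarsFromEnv is a hypothetical helper.

```javascript
// Collect every environment variable that starts with TF_VAR_ and
// key it by the Terraform variable name it would set.
function terraformVarsFromEnv(env = process.env) {
  const prefix = 'TF_VAR_';
  const vars = {};
  for (const [key, value] of Object.entries(env)) {
    if (key.startsWith(prefix) && key.length > prefix.length) {
      vars[key.slice(prefix.length)] = value;
    }
  }
  return vars;
}

console.log(terraformVarsFromEnv({ TF_VAR_aws_region: 'us-east-1', PATH: '/usr/bin' }));
// { aws_region: 'us-east-1' }
```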

Datadog

Datadog is a tool for collecting metrics about your apps, and provides the options to add dashboards and alerts to stay on top of out of line metrics. It even has some fancy ML code that watches over your stats and looks for anomalies. Some examples of useful questions Datadog can answer for you are:

and many more.

Datadog data collection is often automatic once you install the Datadog Agent, but can also require installation of an integration. They have integrations for dozens of popular services, including:

and more. Most of these integrations require a few short lines of code to add in, and are rather painless.

Installing the Agent

Let's start by installing the agent, which is software that runs on your servers and sends the metrics to Datadog. You never have to send data manually: the agent simply runs in the background and sends the data for you without blocking your tasks. How neat is that? That's pretty neat.

In your fargate.tf file from earlier, add the following JSON object to the container_definitions array of your task definition. We are using the publicly available Datadog agent Docker image from Docker Hub and are running it in the same task as our webapp. By doing so, the agent will examine Fargate for us and will give us useful slices in our dashboard by Docker image, EC2 server, etc. Because we are using Fargate, the ECS_FARGATE flag must be set to true so the auto discovery can happen. The agent also needs your API key so that it can publish the metrics it collects to your dashboard; here it is read from a datadog_api_key variable, which you need to declare like any other input variable.

{
  "name": "datadog-agent",
  "image": "datadog/agent:latest",
  "essential": true,
  "environment": [
    {
      "name": "DD_API_KEY",
      "value": "${var.datadog_api_key}"
    },
    {
      "name": "ECS_FARGATE",
      "value": "true"
    }
  ]
}

After running terraform apply, you should see metrics about your Fargate cluster appear in Datadog within 5 minutes or so :)
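For orientation, this sketch shows the shape of the container_definitions array once both containers are in it, built as a plain JavaScript value and serialized to JSON. The buildContainerDefinitions helper and its arguments are hypothetical; in the tutorial itself the JSON lives directly in fargate.tf.

```javascript
// Hypothetical helper: the web app container and the Datadog agent
// side by side in one task's container definitions.
function buildContainerDefinitions(imageUri, datadogApiKey) {
  return [
    {
      name: 'example_app_container',
      image: imageUri,
      memory: 512,
      essential: true,
      portMappings: [{ containerPort: 3000, hostPort: 3000 }],
    },
    {
      name: 'datadog-agent',
      image: 'datadog/agent:latest',
      essential: true,
      environment: [
        { name: 'DD_API_KEY', value: datadogApiKey },
        { name: 'ECS_FARGATE', value: 'true' },
      ],
    },
  ];
}

// Placeholder values; substitute your own repo URL and API key.
const definitions = buildContainerDefinitions('<your_ecr_repo_url>:latest', '<your_api_key>');
console.log(JSON.stringify(definitions, null, 2));
```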

Collecting some stats

StatsD is a daemon for aggregating arbitrary stats. Datadog supports it as an easy to install integration.

So why would you use it?

Say you want to keep track of how many times a specific line of code has run. At Transcend, an example is that we keep track of how many times a user submits a DSR.

Let's create a new express route where we will keep track of how many times it is requested (this is a simple example as Datadog already tracks this, but the concept can be used anywhere).

First, we need to install a DogStatsD client for Node:

npm install --save node-dogstatsd

Then, we need to initialize the stats client:

const StatsD = require('node-dogstatsd').StatsD;
const dogstatsd = new StatsD();

Lastly, we can use the client from our routes:

app.get('/one', (req, res) => {
  dogstatsd.increment('page.views.one');
  res.send('one');
});

I encourage you to be very liberal with counters, histograms, and any other supported StatsD data types you want a metric for. They are great any time you want to track a metric that doesn't have an existing integration that works out of the box from Datadog. As we'll see later, it is very easy to set up alerts in the Datadog console for when thresholds are crossed.
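Under the hood, a StatsD client sends small plain-text datagrams over UDP. The sketch below formats a counter increment by hand and sends it with Node's built-in dgram module, assuming an agent listening on the conventional port 8125 on localhost. It is meant to show roughly what dogstatsd.increment puts on the wire, not to replace the client library; formatCounter and sendMetric are hypothetical helpers.

```javascript
const dgram = require('dgram');

// Format a counter in the DogStatsD datagram format: name:value|c|#tags
function formatCounter(name, value = 1, tags = []) {
  const base = `${name}:${value}|c`;
  return tags.length > 0 ? `${base}|#${tags.join(',')}` : base;
}

// Fire-and-forget send over UDP; delivery is not confirmed,
// which is why sending a metric never blocks the request.
function sendMetric(line, host = '127.0.0.1', port = 8125) {
  const socket = dgram.createSocket('udp4');
  socket.send(Buffer.from(line), port, host, () => socket.close());
}

console.log(formatCounter('page.views.one'));
// page.views.one:1|c
```

Calling sendMetric(formatCounter('page.views.one')) would then emit the same kind of datagram as the increment call in the route above.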

It's important to clean up any resources you created in this codelab so that you don't get charged for them going forward.

To do so, all it takes is a:

terraform destroy

When prompted, type yes and all the resources will magically disappear.