Deploy NodeJs App on AWS Using Terraform and Docker

This tutorial shows you how to deploy a web app on AWS in a reliable way (similar to the way we do it at Transcend.io). We use the following stack, each part of which is covered in more depth in later sections:

NodeJs: For making the web app

brew install node

Terraform: Infrastructure as Code tool for deploying to AWS

brew install terraform

Negative : I am looking to start using a terraform version manager in place of the brew version. https://github.com/tfutils/tfenv

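Docker: For bundling the web app and its dependencies into a container

brew install docker

Note that the docker formula installs only the Docker command-line client. To build and run containers on macOS you also need a Docker engine, such as Docker Desktop.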

Amazon Web Services: For hosting the web app

brew install awscli

Positive : If you work at Transcend, it may be helpful to familiarize yourself with these services using our internal Notion docs.

This tutorial assumes basic familiarity with web applications and hosting.

In this section

We will create a basic web app that can run on localhost.

Tutorial

In a new directory, initialize a package.json file to track dependencies

npm init

For this app, we will use Express as our web server

npm install --save express

Create a file named app.js for the web server with the following contents:

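// Create an Express application
const app = require('express')();

// Respond to GET / with a plain-text greeting
app.get('/', (req, res) => {
  res.send('Hello, World!\n');
});

// Listen on port 3000 on all network interfaces
app.listen(3000, '0.0.0.0');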

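This file starts a web server on port 3000. The address 0.0.0.0 means "listen on all network interfaces", so you can reach the server at localhost:3000. Listening on all interfaces instead of only on 127.0.0.1 matters later: inside a Docker container, a server bound to 127.0.0.1 cannot be reached from outside the container.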
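Optionally, you can make the port and host configurable instead of hard-coding them. This is handy when a platform or a Docker setup wants a different port. The sketch below assumes a hypothetical helper named listenConfig that reads the common (but not mandatory) PORT and HOST environment variables. It falls back to the values used in app.js and avoids newer syntax so it still runs on Node 10.

```javascript
// Hypothetical helper: read the listen address from environment variables,
// falling back to the values hard-coded in app.js (port 3000 on 0.0.0.0).
function listenConfig(env) {
  const port = Number.parseInt(env.PORT || '3000', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  // '0.0.0.0' accepts connections on every IPv4 interface; '127.0.0.1' would
  // only accept connections from the same machine (or the same container).
  const host = env.HOST || '0.0.0.0';
  return { port, host };
}

console.log(listenConfig({}));               // { port: 3000, host: '0.0.0.0' }
console.log(listenConfig({ PORT: '8080' })); // { port: 8080, host: '0.0.0.0' }
```

In app.js you would then write something like `const { port, host } = listenConfig(process.env); app.listen(port, host);`.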

Verify that your server works. On the command line, run:

node app.js

Open http://localhost:3000 in your browser to view the app.

In the last step, we used a package.json file to explicitly remember our dependencies. This is a common approach for ensuring that other team members or cloud environments can easily use the same libraries as we do.

However, it is not always enough. The previous step lacked:

Complete Dependencies: While a package.json lists JavaScript dependencies, it does not list everything the app needs, such as Node itself or the operating system it must run on.
Container Isolation: By creating a container, you can explicitly decide how that container can communicate with other processes, helping with security.
Full Instructions for running the app: While a package.json may have helpful scripts (run with npm run), there is no standard set of commands that will properly build and run every app.

With Docker, you can:

Fully list all dependencies (excluding a Linux kernel, which is assumed)
Isolate your web app by default
Give simple instructions for how to run the app

Blueprints of our image

Create a file named Dockerfile. In it, paste the following code:

FROM node:10
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm i
COPY . .
EXPOSE 3000
CMD [ "node", "app.js" ]

Let's examine this code line by line:

FROM node:10

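This bases your image on the official node:10 image from Docker Hub, so your app will run on Node version 10. Node 10 has since reached end of life, so consider using a tag for a currently supported version.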

WORKDIR /usr/src/app

This says that all later instructions in your Dockerfile run in the /usr/src/app directory of the image's file system. Docker creates the directory if it does not exist.

COPY package*.json ./

This copies your package.json and package-lock.json files into the image. The first argument to COPY is a path in the build context, which is the directory you pass to docker build (here, your project directory outside of Docker). The second argument is a destination relative to your WORKDIR inside the image.

RUN npm i

This installs the dependencies you listed in the package.json file.

COPY . .

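This copies over all the other files from your current workspace, minus those listed in your .dockerignore file. We do not want to copy over our local node_modules, because npm i already installed the dependencies inside the image in the previous step. A locally installed node_modules may also contain native binaries built for your machine rather than for the container. To exclude it, create a new file named .dockerignore with the contents: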

node_modules
npm-debug.log

This also saves a bit of time as node_modules can be quite large and slow to copy.
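To make the filtering concrete, here is a deliberately simplified model of how the two .dockerignore entries above prune the build context. It assumes each entry is a literal path relative to the context root. Real .dockerignore files also support wildcards and "!" exceptions, which this hypothetical isIgnored function does not implement.

```javascript
// Deliberately simplified model of .dockerignore matching. Each entry is
// treated as a literal path relative to the build context root; real
// .dockerignore files also support wildcards (*, **, ?) and "!" exceptions.
function isIgnored(filePath, entries) {
  return entries.some((entry) => {
    const pattern = entry.trim().replace(/\/+$/, ''); // drop trailing slashes
    if (pattern === '' || pattern.startsWith('#')) return false; // blank line or comment
    // Matches the entry itself, or anything inside it when it is a directory.
    return filePath === pattern || filePath.startsWith(pattern + '/');
  });
}

const dockerignore = ['node_modules', 'npm-debug.log'];
const contextFiles = [
  'app.js',
  'package.json',
  'package-lock.json',
  'node_modules/express/index.js',
  'npm-debug.log',
];
console.log(contextFiles.filter((f) => !isIgnored(f, dockerignore)));
// [ 'app.js', 'package.json', 'package-lock.json' ]
```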

EXPOSE 3000

EXPOSE documents that the app inside the container listens on port 3000, which happens to be the port we hosted our Node app on locally earlier. By itself, it does not make the port reachable. Docker does not let outside processes into the container unless you publish a port when you run it, which we will do with the -p flag below.

CMD [ "node", "app.js" ]

The last step is to tell the container to run the app.js file we copied over using the node binary, which hosts the app.

Running our image locally

Let's build an image!

docker build -t some-image-name .

This builds an image named some-image-name on your local machine, using the current directory (.) as the build context.

In the output, you may notice that it also tags your image as some-image-name:latest. Tags are a way to version and, well, tag, your images in case you ever want multiple images related to the same app/job.
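The sketch below shows how an image reference splits into a name and a tag, and why some-image-name quietly becomes some-image-name:latest. It is a simplification written for this tutorial: parseImageRef is a hypothetical helper, and digest references (name@sha256:...) are not handled.

```javascript
// Simplified sketch of how an image reference splits into name and tag.
// Digest references (name@sha256:...) are not handled.
function parseImageRef(ref) {
  const lastSlash = ref.lastIndexOf('/');
  const lastColon = ref.lastIndexOf(':');
  // A colon starts a tag only if it comes after the last "/"; otherwise it
  // is part of a registry host:port, as in localhost:5000/app.
  if (lastColon > lastSlash) {
    return { name: ref.slice(0, lastColon), tag: ref.slice(lastColon + 1) };
  }
  return { name: ref, tag: 'latest' }; // no tag given: Docker assumes "latest"
}

console.log(parseImageRef('some-image-name'));    // { name: 'some-image-name', tag: 'latest' }
console.log(parseImageRef('some-image-name:v2')); // { name: 'some-image-name', tag: 'v2' }
```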

Now, let's run it on localhost:

docker run -p 12345:3000 -d some-image-name

This command runs the some-image-name image locally. The -d flag runs the container in the background (detached mode). The -p 12345:3000 flag is an example of port forwarding. Your local machine and the Docker container have their own, distinct sets of ports. Port forwarding enables you to say "Whenever someone asks for my 12345 port, send them to the container's 3000 port instead." This is kind of like a proxy server, if that helps you.
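A common stumbling block is remembering which side of the colon is which. The hypothetical parser below handles only the two simplest forms of the -p value, HOST_PORT:CONTAINER_PORT and HOST_IP:HOST_PORT:CONTAINER_PORT. It makes explicit that the host port comes first. Port ranges, /udp suffixes, and a bare container port are out of scope for this sketch.

```javascript
// Hypothetical parser for the simple forms of docker run's -p flag:
//   "HOST_PORT:CONTAINER_PORT" or "HOST_IP:HOST_PORT:CONTAINER_PORT".
// Not handled: port ranges, "/udp" suffixes, or a container port alone.
function parsePortMapping(spec) {
  const parts = spec.split(':');
  if (parts.length < 2 || parts.length > 3) {
    throw new Error(`Unsupported -p spec: ${spec}`);
  }
  const toPort = (s) => {
    const n = Number(s);
    if (!Number.isInteger(n) || n < 1 || n > 65535) throw new Error(`Bad port: ${s}`);
    return n;
  };
  const containerPort = toPort(parts[parts.length - 1]);
  const hostPort = toPort(parts[parts.length - 2]);
  // Without an explicit IP, Docker publishes on all host interfaces.
  const hostIp = parts.length === 3 ? parts[0] : '0.0.0.0';
  return { hostIp, hostPort, containerPort };
}

// "Requests to port 12345 on my machine go to port 3000 in the container."
console.log(parsePortMapping('12345:3000'));
```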

Open http://localhost:12345 in your browser to view the app.

Kill the Docker process

Let's clean up, as we no longer want to run this web app locally.

Running

docker ps

will give you an overview of the currently running Docker containers. Copy the name (or ID) of the container you just started, and run

docker kill <container_name>

to stop it. You may want to run docker ps one last time to verify it has stopped.

Terraform lets you declare infrastructure as code. This is a pretty popular paradigm predicated on the idea that code is easier to version, share, and change than infrastructure made through web consoles. Here are a few more specific benefits of Terraform:

You can put your infrastructure under version control.
Ease of automation. If your infra is defined in code, it is easy to automate deploys.
Speed of automation. Terraform deploys changes very quickly, compared to a human manually navigating a cloud service provider's website.
Running terraform plan shows you what infra will change before you make any changes so you can easily see exactly how it will change.
Running terraform graph outputs the dependency graph of your resources in DOT format, which you can render as an image with a tool like Graphviz.
Immutable Infrastructure: Terraform applies many changes by replacing resources rather than modifying them in place, and every change goes through code. That makes it far less likely that someone SSH'd into your EC2 instance at some point and ran undocumented commands to fix the system. With Terraform, changes are forced to be reproducible.
Declarative code: Say your Terraform code says to make an EC2 server. If you run terraform apply 10 times, you'll still only have one server, not 10. And if you delete a resource from your code, the next terraform apply destroys it, which saves some money and confusion.
Masterless: Terraform does not require you to run any sort of server that your infra must depend on. You do need to keep track of state, but we'll get to that later.
It's source-available! Unlike AWS CloudFormation, you can easily find the source code for the Terraform CLI online. Note that releases from version 1.6 onward use the Business Source License rather than an open-source license.
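The "declarative code" point can be illustrated with a toy model of reconciliation. It compares the resources declared in code with those recorded in state and computes what to create or destroy. This is not Terraform's actual algorithm: it ignores in-place updates and dependencies. The resource address aws_instance.web is hypothetical.

```javascript
// Toy model of declarative reconciliation (not Terraform's real algorithm):
// diff the resources declared in code against those recorded in state.
function plan(desired, current) {
  const want = new Set(desired);
  const have = new Set(current);
  return {
    toCreate: desired.filter((r) => !have.has(r)),
    toDestroy: current.filter((r) => !want.has(r)),
  };
}

// "Applying" a plan makes the state match the code exactly.
function apply(desired, current) {
  const { toCreate, toDestroy } = plan(desired, current);
  const destroyed = new Set(toDestroy);
  return current.filter((r) => !destroyed.has(r)).concat(toCreate);
}

// 'aws_instance.web' is a hypothetical resource address.
let state = [];
for (let i = 0; i < 10; i++) {
  state = apply(['aws_instance.web'], state);
}
console.log(state); // [ 'aws_instance.web' ]: one server, not ten

// Deleting the resource from the code means the next plan destroys it.
console.log(plan([], state));
```

Because each apply only closes the gap between code and state, running it repeatedly is harmless. That is the property that makes repeated terraform apply runs safe.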

ECR is the Amazon Elastic Container Registry. If you have ever used Docker Hub, it is basically the same thing. At Transcend, we use ECR, most likely because it gives us cheap/free private repos, unlike Docker Hub. This is my best guess rather than a verified reason.

Its entire job is to host Docker images in repos. You use it similarly to using S3, where you create a repo (instead of an S3 bucket) and then can upload images (instead of files) to that repo. Just like S3, it keeps track of versions and tags for you.

Positive : You should have already set up an AWS profile locally that you have permissions to deploy AWS resources with. You can also use a personal account (which I did), or just follow along without actually deploying anything if you won't be working on Terraform changes very often.

Create a new folder named deployment to store your terraform code and cd into it.

To start, create a file named provider.tf. Terraform supports many cloud providers, so in this file we specify that we want to deploy to AWS specifically. It looks like:

provider "aws" {
  region  = "eu-west-1"
  profile = "test"
}

This says that all deploys will be in the eu-west-1 region. It also says that I would like to use my test profile in the awscli, which I set up to be my personal account.

Now, create a file named ecr.tf with the contents:

resource "aws_ecr_repository" "ecr_repo" {
    name = "ecr_example_repo"
}

This follows the syntax:

resource "some aws resource" "some terraform name that lets you reference this resource from other resources in terraform" {
    name = "the name that will appear in the AWS console for this resource"
    ...other args...
}

To find a list of usable AWS resource types and the arguments they take, check out the AWS provider docs in the Terraform Registry.

Deploy to AWS

Run the command

terraform init

to initialize your directory as containing Terraform code. This downloads the AWS provider plugin you listed in provider.tf.

Next, run

terraform plan

This step is optional, but highly recommended anytime you change infrastructure.

You should see output that looks something like:

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_ecr_repository.ecr_repo will be created
  + resource "aws_ecr_repository" "ecr_repo" {
      + arn                  = (known after apply)
      + id                   = (known after apply)
      + image_tag_mutability = "MUTABLE"
      + name                 = "ecr_example_repo"
      + registry_id          = (known after apply)
      + repository_url       = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

That looks good. It shows us that an aws_ecr_repository will be created. As this matches our expectation, we can run:

terraform apply

After confirming the plan, you can go to your ECR page on your AWS account and will see that an empty repository was made!

Negative : If you don't see the repo, ensure that you are looking in the correct region.

Authenticating Docker to ECR

We have a repo, now we need to make sure we are authenticated to it so we can push and pull images.

This can be done with the following commands. If you use a named AWS profile (such as test), add --profile test to both aws commands.

ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin "$ACCOUNT_ID.dkr.ecr.eu-west-1.amazonaws.com"

Pushing our Docker Image to ECR

Now that we're authenticated, we can push our local docker image to the remote repo. This is done in two steps, tagging our local image and pushing our changes.

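Find the repository URL of your ECR repo and copy it. It is shown on the ECR page of the AWS console, in the form <account_id>.dkr.ecr.eu-west-1.amazonaws.com/ecr_example_repo. Then, run:

docker tag some-image-name:latest <repo_url>:latest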

This is kind of similar to a git remote add origin in git.
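The repository URL has a predictable structure, which explains why the same registry host appears in both docker login and docker tag. The hypothetical helper below assembles a full image URI. It assumes the standard AWS partition (amazonaws.com); AWS China regions use a different domain. The account ID is a placeholder.

```javascript
// Hypothetical helper: build the full image URI that docker tag / docker push expect.
// Assumes the standard AWS partition (amazonaws.com); China regions differ.
function ecrImageUri({ accountId, region, repoName, tag = 'latest' }) {
  if (!/^\d{12}$/.test(accountId)) {
    throw new Error(`Expected a 12-digit AWS account ID, got "${accountId}"`);
  }
  // Registry host: the same value passed to docker login above.
  const registry = `${accountId}.dkr.ecr.${region}.amazonaws.com`;
  return `${registry}/${repoName}:${tag}`;
}

// 123456789012 is a placeholder, not a real account.
console.log(ecrImageUri({
  accountId: '123456789012',
  region: 'eu-west-1',
  repoName: 'ecr_example_repo',
}));
// 123456789012.dkr.ecr.eu-west-1.amazonaws.com/ecr_example_repo:latest
```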

Then, run:

docker push <repo_url>:latest

to upload the image to the remote repo. This is similar to a git push in git.

Head back to your AWS console, and verify you can see the image you uploaded.

The remaining step is to deploy the ECR image to AWS. This requires quite a few AWS services, each with some Terraform code to specify it.

Negative : Terraform can be very verbose for simple examples, which you'll see in this section. With great control comes annoying levels of specification. This is as basic an example as I could think of, with no logging, load balancer, etc.

Let's start with the fun stuff, permissions and roles!

IAM roles

Create a file named iam.tf with the contents:

resource "aws_iam_role" "ecs_role" {
  name = "ecs_role_example_app"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "ecs_policy_attachment" {
  role = aws_iam_role.ecs_role.name

  // This policy adds logging + ecr permissions
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

This creates a new IAM role named ecs_role_example_app that ECS tasks can assume, with the AmazonECSTaskExecutionRolePolicy attached. This policy lets the role pull images from ECR and write logs to CloudWatch.

A Custom Virtual Private Cloud

Next, create a file network.tf that contains:

resource "aws_vpc" "vpc_example_app" {
    cidr_block = "10.0.0.0/16"
    enable_dns_hostnames = true
    enable_dns_support = true
}

resource "aws_subnet" "public_a" {
    vpc_id = aws_vpc.vpc_example_app.id
    cidr_block = "10.0.1.0/24"
    // Availability zones must be in the region set in provider.tf (eu-west-1)
    availability_zone = "eu-west-1a"
}

resource "aws_subnet" "public_b" {
    vpc_id = aws_vpc.vpc_example_app.id
    cidr_block = "10.0.2.0/24"
    availability_zone = "eu-west-1b"
}

resource "aws_internet_gateway" "internet_gateway" {
    vpc_id = aws_vpc.vpc_example_app.id
}

// Send all non-local traffic through the internet gateway. Subnets without an
// explicit route table association use the VPC's main route table.
resource "aws_route" "internet_access" {
    route_table_id = aws_vpc.vpc_example_app.main_route_table_id
    destination_cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.internet_gateway.id
}

resource "aws_security_group" "security_group_example_app" {
    name = "security_group_example_app"
    description = "Allow inbound HTTP traffic on port 3000"
    vpc_id = aws_vpc.vpc_example_app.id

    // Inbound: only TCP port 3000 (the app's port), from anywhere
    ingress {
        from_port = 3000
        to_port = 3000
        protocol = "tcp"
        cidr_blocks = ["0.0.0.0/0"]
    }

    // Outbound: all protocols and ports (needed, for example, to pull from ECR)
    egress {
        from_port = 0
        to_port = 0
        protocol = "-1"
        cidr_blocks = ["0.0.0.0/0"]
    }
}

This creates a VPC that other resources can go into. It has two public subnets (one in each of two availability zones) that can connect to the internet via an internet gateway.

For security reasons, we allow inbound traffic only on port 3000, but outgoing traffic from our resources is unrestricted.

If this is confusing (it was for me at first), then I would recommend this YouTube playlist.
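CIDR notation and security group rules are the parts that usually cause confusion. In a block such as 10.0.0.0/16, the number after the slash says how many leading bits of an address must match. So the /24 subnets fit inside the /16 VPC, and 0.0.0.0/0 matches any address. In a rule, from_port and to_port form an inclusive range, so a rule from 80 to 3000 would admit every port in between. The sketch below models both ideas. The function names are hypothetical, and the security group model is simplified (protocol is ignored).

```javascript
// Convert a dotted IPv4 address such as "10.0.1.5" to an unsigned 32-bit number.
function ipToInt(ip) {
  const octets = ip.split('.').map(Number);
  if (octets.length !== 4 || octets.some((o) => !Number.isInteger(o) || o < 0 || o > 255)) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  return octets.reduce((acc, o) => acc * 256 + o, 0);
}

// True if `ip` lies inside the CIDR block, e.g. cidrContains('10.0.0.0/16', '10.0.1.5').
function cidrContains(cidr, ip) {
  const [base, bits] = cidr.split('/');
  const prefix = Number(bits);
  // The first `prefix` bits must match. JS shifts are mod 32, so /0 needs a special case.
  const mask = prefix === 0 ? 0 : (0xffffffff << (32 - prefix)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(base) & mask) >>> 0);
}

// Simplified security group check: from/to ports form an inclusive range.
// Protocol is ignored here (the ingress rule in network.tf is TCP only).
function ingressAllows(rule, sourceIp, port) {
  return port >= rule.fromPort && port <= rule.toPort &&
    rule.cidrBlocks.some((block) => cidrContains(block, sourceIp));
}

console.log(cidrContains('10.0.0.0/16', '10.0.1.5')); // true: public_a is inside the VPC
console.log(cidrContains('10.0.1.0/24', '10.0.2.5')); // false: that address is in public_b

const appRule = { fromPort: 3000, toPort: 3000, cidrBlocks: ['0.0.0.0/0'] };
console.log(ingressAllows(appRule, '203.0.113.7', 3000)); // true
console.log(ingressAllows(appRule, '203.0.113.7', 22));   // false
```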

Fargate deployment

Fargate is the final, and most exciting, step. It is a service that runs Docker containers for us without our having to manage any servers. That means we're finally at the step of having our simple NodeJs app running on AWS infrastructure!

Create a file fargate.tf with the contents:

resource "aws_ecs_task_definition" "backend_task" {
    family = "backend_example_app_family"

    // Fargate is a type of ECS that requires awsvpc network_mode
    requires_compatibilities = ["FARGATE"]
    network_mode = "awsvpc"

    // Valid sizes are shown here: https://aws.amazon.com/fargate/pricing/
    memory = "512"
    cpu = "256"

    // Fargate requires task definitions to have an execution role ARN to support ECR images
    execution_role_arn = aws_iam_role.ecs_role.arn

    container_definitions = <<EOT
[
    {
        "name": "example_app_container",
        "image": "<your_ecr_repo_url>:latest",
        "memory": 512,
        "essential": true,
        "portMappings": [
            {
                "containerPort": 3000,
                "hostPort": 3000
            }
        ]
    }
]
EOT
}

resource "aws_ecs_cluster" "backend_cluster" {
    name = "backend_cluster_example_app"
}

resource "aws_ecs_service" "backend_service" {
    name = "backend_service"

    cluster = aws_ecs_cluster.backend_cluster.id
    task_definition = aws_ecs_task_definition.backend_task.arn

    launch_type = "FARGATE"
    desired_count = 1

    network_configuration {
        subnets = [aws_subnet.public_a.id, aws_subnet.public_b.id]
        security_groups = [aws_security_group.security_group_example_app.id]
        assign_public_ip = true
    }
}

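Replace <your_ecr_repo_url> in the task definition with the URL of your ECR repository, the same URL you pushed the image to. Alternatively, because the heredoc supports interpolation, you can write ${aws_ecr_repository.ecr_repo.repository_url}:latest and let Terraform fill it in.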

Fargate is a launch type of the Elastic Container Service (ECS), which has three concepts:

Clusters: a logical grouping of tasks or services
Services: where you specify which task definition to run and how many copies of the task you want running at a time
Tasks: defined by a task definition, which specifies the Docker image(s) to use, along with CPU, memory, and the IAM roles needed to run them

It should be pretty easy to map those concepts to the three terraform resource blocks above.

There are quite a few arguments I won't go over in detail here, but they mostly relate to:

Connecting the service to the VPC we created in network.tf
Ensuring this is the cheapest possible Fargate instance you can run so you don't spend more than a few cents on your demo app.
Connecting the task, service, and cluster to each other.
Giving the web app a public IP Address so you can visit the website you just made.
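The cpu and memory values in the task definition cannot be chosen freely, because Fargate accepts only certain combinations. The sketch below encodes the classic table from the AWS Fargate documentation as I understand it at the time of writing. Treat the table as an assumption to check against the current docs, since AWS has also added larger (8 and 16 vCPU) sizes that are omitted here. isValidFargateSize is a hypothetical helper.

```javascript
// Inclusive list of values from start to end in steps of `step`.
function range(start, end, step) {
  const values = [];
  for (let v = start; v <= end; v += step) values.push(v);
  return values;
}

// CPU units -> allowed task memory (MiB), for the classic Fargate sizes.
// Assumption: based on the AWS table at the time of writing; larger sizes
// (8 and 16 vCPU) exist on newer platform versions and are omitted here.
const FARGATE_SIZES = {
  256: [512, 1024, 2048],         // 0.25 vCPU
  512: range(1024, 4096, 1024),   // 0.5 vCPU
  1024: range(2048, 8192, 1024),  // 1 vCPU
  2048: range(4096, 16384, 1024), // 2 vCPU
  4096: range(8192, 30720, 1024), // 4 vCPU
};

// Terraform passes cpu/memory as strings ("256", "512"), so accept both.
function isValidFargateSize(cpu, memory) {
  const allowed = FARGATE_SIZES[Number(cpu)];
  return Array.isArray(allowed) && allowed.includes(Number(memory));
}

console.log(isValidFargateSize('256', '512'));  // true: the size used in fargate.tf
console.log(isValidFargateSize('256', '4096')); // false: too much memory for 0.25 vCPU
```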

Find the public IP address on the task page in your AWS console, and go to http://<public_ip>:3000 to view your super scalable hello world application!

Sometimes you need to put sensitive data in your Terraform code, or you need to repeat the same values over and over (such as an AWS region). That's where variables come in.

This page is a summary of the official terraform docs on input variables.

To declare a variable, you can write a variable block:

variable "aws_region" {
  default     = "eu-west-1"
  description = "Which region should the resources be deployed into?"
}

Anywhere you want to use the value of that variable in your resource or provider blocks, you can just enter something like:

provider "aws" {
  region  = var.aws_region
}

and the variable will be injected.

Overriding variables with the CLI

You can specify a variable in a terraform plan or terraform apply command by running something like

terraform apply -var="aws_region=us-east-1"

Overriding variables from a file

You can store your secrets in a file, and then load them all in with the -var-file flag.

Example vars.tfvars file:

aws_region = "us-east-1"
family = "some_other_var"

Usage:

terraform apply -var-file="vars.tfvars"

If you have sensitive data in this file, make sure it is in your .gitignore.

Overriding at runtime

If you don't specify a default value, running terraform plan or terraform apply will ask you for an input before running.

Overriding using Environment Variables

Terraform reads any environment variable named TF_VAR_ followed by a variable name as the value of that variable.

From the terminal, type:

export TF_VAR_aws_region="us-east-1"
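When the same variable is set in several places, later sources override earlier ones. The sketch below models that precedence for the sources covered on this page: default, then TF_VAR_ environment variables, then -var-file, then -var. It is a simplification. Real Terraform also reads terraform.tfvars and *.auto.tfvars files, and it processes -var and -var-file in the order they appear on the command line. The function names are hypothetical.

```javascript
// Collect Terraform input variables from the environment:
// TF_VAR_aws_region=us-east-1 becomes { aws_region: 'us-east-1' }.
function tfVarsFromEnv(env) {
  const prefix = 'TF_VAR_';
  const vars = {};
  for (const [key, value] of Object.entries(env)) {
    if (key.startsWith(prefix)) vars[key.slice(prefix.length)] = value;
  }
  return vars;
}

// Simplified precedence: each later source overrides earlier ones.
//   default < TF_VAR_ env vars < -var-file < -var
function resolveVariable(name, { defaults = {}, env = {}, varFile = {}, cli = {} } = {}) {
  let value;
  for (const source of [defaults, tfVarsFromEnv(env), varFile, cli]) {
    if (Object.prototype.hasOwnProperty.call(source, name)) value = source[name];
  }
  if (value === undefined) {
    // No default and no value supplied: terraform would prompt interactively.
    throw new Error(`No value for variable "${name}"`);
  }
  return value;
}

const defaults = { aws_region: 'eu-west-1' };
console.log(resolveVariable('aws_region', { defaults })); // eu-west-1
console.log(resolveVariable('aws_region', {
  defaults,
  env: { TF_VAR_aws_region: 'us-east-1' },
})); // us-east-1
```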

Datadog

Datadog is a tool for collecting metrics about your apps. It lets you add dashboards and alerts to stay on top of out-of-line metrics, and it even has some fancy ML code that watches over your stats and looks for anomalies. Some examples of useful questions Datadog can answer for you are:

How long does my /some/url/endpoint url take to return a response on average?
How many times is a particular line of code run?
What is the memory usage on my EC2 resources?
How many rows are returned from my postgres queries?
How long are my Apollo queries taking?

and many more.

Datadog data collection is often automatic once you install the Datadog Agent, but can also require installation of an integration. They have integrations for dozens of popular services, including:

Apollo Engine
Express
NodeJs (collecting statsd metrics)
Postgres
AWS services

and more. Most of these integrations require a few short lines of code to add in, and are rather painless.

Installing the Agent

Let's start by installing the agent, which is software that runs on your servers and sends the metrics to Datadog. You never have to send data manually: the agent runs in the background and sends the data for you without blocking your tasks. How neat is that? That's pretty neat.

In your fargate.tf file from earlier, add the following JSON object to the container_definitions array in your task definition, separated from the app container by a comma. We are using the publicly available Datadog agent Docker image from Docker Hub and running it in the same task as our web app. By doing so, the agent can collect metrics about the Fargate task for us and will give us useful slices in our dashboard by Docker image, container, etc. Because we are using Fargate, we must set the ECS_FARGATE environment variable to true so autodiscovery can happen. The agent also needs your API key so that it can publish the metrics it collects to your dashboard. Here the key comes from a datadog_api_key variable, which you need to declare with a variable block (see the section on variables).

{
  "name": "datadog-agent",
  "image": "datadog/agent:latest",
  "essential": true,
  "environment": [
    {
      "name": "DD_API_KEY",
      "value": "${var.datadog_api_key}"
    },
    {
      "name": "ECS_FARGATE",
      "value": "true"
    }
  ]
}

After running terraform apply, you should see metrics about your Fargate cluster appear in Datadog within 5 minutes or so :)

Collecting some stats

StatsD is a daemon for aggregating arbitrary stats. Datadog supports it as an easy to install integration.

So why would you use it?

Say you want to keep track of how many times a specific line of code has run. At Transcend, for example, we keep track of how many times a user submits a DSR (data subject request).

Let's create a new express route where we will keep track of how many times it is requested (this is a simple example as Datadog already tracks this, but the concept can be used anywhere).

First, we need to install hot-shots, a StatsD client for Node that supports DogStatsD:

npm install --save hot-shots

Then, we need to initialize the stats client:

const StatsD = require('hot-shots');
const dogstatsd = new StatsD();

Lastly, we can use the client from our routes:

app.get('/one', (req, res) => {
  dogstatsd.increment('page.views.one');
  res.send('one');
});
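Under the hood, a call like dogstatsd.increment('page.views.one') sends a short text datagram over UDP to the agent. The sketch below builds that datagram in the DogStatsD format (name:value|type, with optional sample rate and tags). This is roughly what a client such as hot-shots sends, not its actual implementation. It also shows a fire-and-forget UDP send to port 8125, the DogStatsD default. In the Fargate task above, the app and agent containers share the task's network namespace, so the agent is reachable on the loopback address. formatMetric and sendMetric are hypothetical names.

```javascript
const dgram = require('dgram');

// Build a DogStatsD datagram: <name>:<value>|<type>[|@<sample_rate>][|#<tags>]
// Common types: 'c' counter, 'g' gauge, 'h' histogram, 'ms' timer, 's' set.
function formatMetric(name, value, type, { sampleRate, tags = [] } = {}) {
  let line = `${name}:${value}|${type}`;
  if (sampleRate !== undefined && sampleRate < 1) line += `|@${sampleRate}`;
  if (tags.length > 0) line += `|#${tags.join(',')}`;
  return line;
}

// Fire-and-forget UDP send to a local DogStatsD listener (port 8125 by default).
// Nothing waits for a reply, which is why metrics don't slow down requests.
function sendMetric(line, host = '127.0.0.1', port = 8125) {
  const socket = dgram.createSocket('udp4');
  socket.send(line, port, host, () => socket.close());
}

console.log(formatMetric('page.views.one', 1, 'c')); // page.views.one:1|c
console.log(formatMetric('page.views.one', 1, 'c', { tags: ['route:one'] })); // page.views.one:1|c|#route:one
```

Calling sendMetric(formatMetric('page.views.one', 1, 'c')) inside a route would have roughly the same effect as the hot-shots call above, minus the buffering and error handling a real client provides.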

I encourage you to be very liberal with counters, histograms, and any other supported StatsD data types you want a metric for. They are great whenever you want to track a metric that doesn't have an existing integration that works out of the box from Datadog. As we'll see later, it is very easy to set up alerts in the Datadog console for when thresholds are crossed.

It's important to clean up any resources you created in this codelab so that we don't get charged for them going forward.

To do so, all it takes is a:

terraform destroy

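When prompted, type yes and all the resources will magically disappear. One exception: AWS will not delete an ECR repository that still contains images. Delete the image you pushed first, or, on recent versions of the AWS provider, set force_delete = true on the aws_ecr_repository resource.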

Topics for future sections:

Terraform outputs
Logging
Dev/Prod env modules
AWS Tags
Load balancing
Moving Terraform state to the Cloud
SSL certs and Domain Names
Datadog APM + a few integrations + alerts