Permission boundaries are hard, especially with databases. You need them hidden away in private subnets, but you still want easy, reliable access to them without hassle.
Traditionally, you would use a bastion host (AKA Jump Server) in a public subnet to get access to your resources in private subnets, which works for its purpose. But managing these servers is cumbersome and annoying: they live with public DNS, and they often rely on SSH for security enforcement, which means admins need to manage SSH keys in a way that stays up to date with their IAM policies.
Enter AWS Session Manager, AKA SSM. This tool has been widely blogged about, as it gives access to servers through IAM Policies instead of SSH keys, and a quick search turns up plenty of great resources on it.
Even with those resources, it wasn't immediately clear how to get started, especially with modern infrastructure management practices like terraform. And tools like ssh-over-ssm require a significant amount of prerequisite knowledge to make use of them.
Just about everyone on the planet with RDS instances wants to access them from a local port, so the goal of this codelab is to explore how to get secure access from scratch. It will walk through some older ways of getting access, which will hopefully help explain why the industry has moved to the current best-practice approach of combining EC2 Instance Connect with SSM.
At each step, I will create a fully working example with a basic security setup. The goal of this tutorial is to create near-production-ready examples.
Now for a haiku:
Start with the basics
Security can be hard
It takes time to learn
To start with, let's whip up a quick RDS Server in a private subnet with no internet access. In a new terraform module, add the following code:
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 2.18.0"
name = "codelab-vpc"
cidr = "10.0.0.0/16"
azs = ["eu-west-1a", "eu-west-1b"]
# For the bastion host
public_subnets = ["10.0.101.0/24", "10.0.102.0/24"]
# For the RDS Instance
database_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
# Allow private DNS
enable_dns_hostnames = true
enable_dns_support = true
}
This describes a VPC we want to put all of our resources from this codelab into. It creates two public subnets that we can put the bastion host into, and two private subnets that we can put a database in. It also enables private DNS so that our bastion server will be able to reach out to the RDS endpoint.
To actually create these resources, let's run terraform init followed by terraform plan -out plan. If the plan looks good to you, you can apply it with terraform apply plan.
Now, let's create an RDS instance. In the same terraform file, add:
module "db" {
source = "terraform-aws-modules/rds/aws"
version = "2.5.0"
# Put the DB in a private subnet of the VPC created above
vpc_security_group_ids = [module.db_security_group.this_security_group_id]
create_db_subnet_group = false
db_subnet_group_name = module.vpc.database_subnet_group
# Make it postgres just as an example
identifier = "codelab-db"
name = "codelab_db"
engine = "postgres"
engine_version = "10.6"
username = "codelab_user"
password = "codelab_password"
port = 5432
# Disable stuff we don't care about
create_db_option_group = false
create_db_parameter_group = false
# Other random required variables that we don't care about in this codelab
allocated_storage = 5 # GB
instance_class = "db.t2.small"
maintenance_window = "Tue:00:00-Tue:03:00"
backup_window = "03:00-06:00"
}
module "db_security_group" {
source = "terraform-aws-modules/security-group/aws"
version = "3.1.0"
name = "codelab-db-sg"
vpc_id = module.vpc.vpc_id
# Allow all incoming SSL traffic from the VPC
ingress_cidr_blocks = [module.vpc.vpc_cidr_block]
ingress_rules = ["postgresql-tcp"]
# Allow all outgoing HTTP and HTTPS traffic for updates
egress_cidr_blocks = ["0.0.0.0/0"]
egress_rules = ["http-80-tcp", "https-443-tcp"]
}
You can use any other config you'd like for the RDS instance; my setup here was just meant to be as simple as possible with a small postgres instance. It is important to note that the security group is open to incoming TCP on port 5432 so that we can query it through the bastion host.
Run another terraform init, then terraform plan -out plan. If the plan looks good, apply it with terraform apply plan. It can take up to 40 minutes to provision a new RDS instance (in my case it took 9 minutes), so please be patient. We only have to do this once.
Wrap Up
Woohoo! You now have a database we can use to test bastion configurations with :) It is not accessible to the public internet, and has a strict security policy.
Here's a haiku about RDS Instances in VPCs:
Made a VPC
Also made an RDS
One in the other
The RDS instance isn't super interesting yet, because it doesn't have any tables, data, or access set up. Because our security policy and DNS are so strict, we can't directly query it in any way yet.
In this step, we will set up a standard bastion server that we can SSH to that will let us query the database.
In the same terraform file as before, add the following:
module "ssh_key_pair" {
source = "terraform-aws-modules/key-pair/aws"
version = "0.2.0"
# Feel free to change if you want to use a different public key
public_key = file("~/.ssh/id_rsa.pub")
key_name = "bastion_public_key"
}
module "bastion_security_group" {
source = "terraform-aws-modules/security-group/aws"
version = "3.1.0"
name = "codelab-bastion-sg"
vpc_id = module.vpc.vpc_id
# Allow all incoming SSH traffic
ingress_cidr_blocks = ["0.0.0.0/0"]
ingress_rules = ["ssh-tcp"]
# Allow all outgoing HTTP and HTTPS traffic, as well as communication to db
egress_cidr_blocks = ["0.0.0.0/0"]
egress_rules = ["http-80-tcp", "https-443-tcp", "postgresql-tcp"]
}
module "bastion" {
source = "terraform-aws-modules/ec2-instance/aws"
version = "2.12.0"
# Ubuntu 18.04 LTS AMI
ami = "ami-035966e8adab4aaad"
name = "codelab-bastion"
associate_public_ip_address = true
instance_type = "t2.small"
key_name = module.ssh_key_pair.this_key_pair_key_name
vpc_security_group_ids = [module.bastion_security_group.this_security_group_id]
subnet_ids = module.vpc.public_subnets
}
###########
# Outputs #
###########
output "bastion_ip" {
value = module.bastion.public_ip[0]
}
output "rds_endpoint" {
value = module.db.this_db_instance_endpoint
}
This creates an Ubuntu 18.04 EC2 server in a public subnet of the VPC we created earlier. Its security group is open to incoming SSH and to outgoing communication with the RDS instance. It is set up so that your local ~/.ssh/id_rsa ssh key can be used to authenticate to the instance.
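If you don't already have a key at ~/.ssh/id_rsa, you can generate one first; this is just a convenience sketch, and you can point the public_key input at any key you prefer:
# Creates ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub with no passphrase
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa -N ''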
After the outputs of running terraform init and terraform plan -out plan look good, run terraform apply plan to create the resources. Once completed, we're ready to ssh onto the instance!
To begin, run the command:
ssh -L 5432:`terraform output -raw rds_endpoint` -Nf ubuntu@`terraform output -raw bastion_ip`
Breaking this down, it says:
- ubuntu@`terraform output -raw bastion_ip`: ssh to the bastion server we just created
- -L 5432:`terraform output -raw rds_endpoint`: forward the remote database socket to local port 5432
- -f: send the ssh command execution to a background process so the tunnel stays open after the command completes
- -N: don't execute anything remotely. As we are just port forwarding, this is fine.
Once complete, our tunnel has been established. Now let's create some test data!
We're going to add some test data to the RDS instance here just to make it easy to validate that we can still access the data later once we use more complicated setups. But it should be noted that this is entirely optional.
Install psql and postgres any way you know how, such as with sudo apt-get -y install postgresql postgresql-contrib on Ubuntu.
Then, let's create a table:
psql -d codelab_db -p 5432 \
-h localhost \
-U codelab_user \
-c "CREATE TABLE codelab_table (Name varchar(255))"
and add some data:
psql -d codelab_db -p 5432 \
-h localhost \
-U codelab_user \
-c "INSERT INTO codelab_table (Name) VALUES ('codelab_data')"
If you're prompted for a password, use the password we set in terraform earlier, codelab_password.
Finally, let's close the tunnel with kill $(lsof -t -i :5432), which says to kill the process that controls local port 5432.
Do you hike? Cuz I want to haiku:
We did it. We're in!
Let the bastions grind you down?
Nah, I conquer them.
Pre-September 2018, what we have now would have been considered standard, even cutting edge among those adopting cloud services. But in that September, AWS Systems Manager Session Manager (SSM) was announced, which provides secure, access-controlled, and audited EC2 management.
Source: https://aws.amazon.com/about-aws/whats-new/2018/09/introducing-aws-systems-manager-session-manager/
A related service, EC2 Instance Connect, was introduced in June of 2019: https://aws.amazon.com/about-aws/whats-new/2019/06/introducing-amazon-ec2-instance-connect/. It enabled IAM-based SSH controls with CloudTrail auditing, as well as browser-based SSH from the AWS Console for those who like web GUIs.
These two products are superior to our current bastion setup for a few reasons: access is governed by IAM policies instead of long-lived SSH keys, sessions are auditable in CloudTrail, and there are no key files to distribute or rotate.
For the next step, we'll add support for SSH'ing to our instance over EC2 Instance Connect.
Did you ever hear of the time when the villagers of a mountainous domain overthrew their government? It was called the high coup. And it went a little something like this:
AWS
Services out the wazzu
Gotta learn 'em all
One of the cool things about EC2 Instance Connect is that you don't use long-term SSH keys that need to live on the bastion instance. Instead, you use the aws ec2-instance-connect CLI to send a temporary public key to the instance, which you then have 60 seconds to authenticate against using the matching private key. After the 60 seconds are up, the public key is forgotten.
This is more powerful than it may at first seem. What it really means is that SSH access is granted to whoever IAM allows to call aws ec2-instance-connect send-ssh-public-key, so access control (and its audit trail) lives in IAM and CloudTrail rather than in key files scattered across machines.
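To make that a bit more concrete, here is a minimal sketch (not part of this codelab's terraform) of an IAM policy you might attach to your engineers so they can only push temporary keys for the ubuntu user on one specific instance. The policy name, account ID, and instance ID are hypothetical placeholders:
resource "aws_iam_policy" "allow_instance_connect" {
  name = "codelab-allow-instance-connect"

  # Only allow pushing temporary keys for the "ubuntu" OS user on one instance
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = "ec2-instance-connect:SendSSHPublicKey"
        Resource = "arn:aws:ec2:eu-west-1:123456789012:instance/i-0123456789abcdef0"
        Condition = {
          StringEquals = { "ec2:osuser" = "ubuntu" }
        }
      }
    ]
  })
}
Anyone without this permission simply can't get a key onto the box, no matter what's sitting in their ~/.ssh directory.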
This requires a few quick updates to our terraform.
First, you can entirely remove the ssh_key_pair module, as we no longer need to manage SSH keys :)
Then, add an IAM Instance Profile that will allow your bastion to make use of the EC2InstanceConnect policy:
module "instance_profile_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role"
  version = "~> 3.0"

  role_name               = "codelab-role"
  create_role             = true
  create_instance_profile = true
  role_requires_mfa       = false

  trusted_role_services   = ["ec2.amazonaws.com"]
  custom_role_policy_arns = ["arn:aws:iam::aws:policy/EC2InstanceConnect"]
}
Next, update your bastion module to remove the key_name, make use of the new instance profile, and install ec2-instance-connect, by making it look like:
module "bastion" {
source = "terraform-aws-modules/ec2-instance/aws"
version = "2.12.0"
# Ubuntu 18.04 LTS AMI
ami = "ami-035966e8adab4aaad"
name = "codelab-bastion"
associate_public_ip_address = true
instance_type = "t2.small"
vpc_security_group_ids = [module.bastion_security_group.this_security_group_id]
subnet_ids = module.vpc.public_subnets
iam_instance_profile = module.instance_profile_role.this_iam_instance_profile_name
# Install dependencies
user_data = <<USER_DATA
#!/bin/bash
sudo apt-get update
sudo apt-get -y install ec2-instance-connect
USER_DATA
}
The very last update is to add two new outputs:
output "instance_id" {
value = module.bastion.id[0]
}
output "az" {
value = module.bastion.availability_zone[0]
}
As per usual, run terraform init then terraform plan -out plan. If the output looks good (note that the bastion instance should be recreated), then run terraform apply plan.
Hooray! You have now enabled EC2 Instance Connect on the instance. To connect, we need to generate a temporary ssh key, send the public key to the instance, and then create an SSH tunnel using the private key.
This is as easy as:
echo -e 'y\n' | ssh-keygen -t rsa -f /tmp/temp -N '' >/dev/null 2>&1
aws ec2-instance-connect send-ssh-public-key \
--instance-id `terraform output -raw instance_id` \
--availability-zone `terraform output -raw az` \
--instance-os-user ubuntu \
--ssh-public-key file:///tmp/temp.pub
ssh -L 5432:`terraform output -raw rds_endpoint` -Nf -i /tmp/temp ubuntu@`terraform output -raw bastion_ip`
Checking that it worked is as easy as executing a psql command to verify the data we added earlier is still there:
psql -d codelab_db -p 5432 \
-h localhost \
-U codelab_user \
-c "SELECT * FROM codelab_table"
To clean up, let's close the tunnel with kill $(lsof -t -i :5432).
"Hey there," Kew said. "Hi Kew," I replied:
Short lived keys are good
To dust - the keys shall return
But the tunnel lives
We already have a pretty darn secure and auditable system, but at Transcend we strive to minimize public DNS endpoints wherever possible to reduce our attack surface.
In the current setup, our bastion still has a public IP address, which is a requirement for EC2 Instance Connect (without using the SSM trick this codelab is leading up to). If we moved our bastion to a private subnet, where it could have no public DNS, we would lose EC2 Instance Connect access.
However, we could still access it with AWS Systems Manager, so let's talk about it.
Systems Manager comes with a bunch of premade "Documents" that you can run on sets of EC2s, similar-ish to running Ansible Playbooks on servers, if you're familiar. If you aren't familiar with Ansible or other automation tools, no worries - the idea is pretty simple: automate running the same commands on a bunch of machines at once.
In this codelab, we just need to worry about the AWS-StartSSHSession Document, which lets you create SSH sessions to EC2s. The only requirements are that the EC2 is running the SSM agent (it comes pre-installed on official Ubuntu 18.04 AMIs), can reach the SSM service endpoints, and has an instance profile granting the AmazonSSMManagedInstanceCore permissions.
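To make this concrete, here's roughly what using SSM from the CLI looks like once an instance is managed; the instance ID below is a hypothetical placeholder, and neither command is required for this codelab:
# Run a shell command on one (or many) instances via a Document
aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --targets "Key=instanceids,Values=i-0123456789abcdef0" \
  --parameters 'commands=["uptime"]'

# Or open an interactive shell session, with no SSH and no open inbound ports
aws ssm start-session --target i-0123456789abcdef0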
To start, update the VPC module to have some private subnets, and to have NAT gateways that will allow those private subnets to access the internet:
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 2.18.0"
name = "codelab-vpc"
cidr = "10.0.0.0/16"
azs = ["eu-west-1a", "eu-west-1b"]
# For the NAT gateways
public_subnets = ["10.0.101.0/24", "10.0.102.0/24"]
# For the bastion host
private_subnets = ["10.0.201.0/24", "10.0.202.0/24"]
# For the RDS Instance
database_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
# Ensure the private gateways can talk to the internet for SSM
enable_nat_gateway = true
# Allow private DNS
enable_dns_hostnames = true
enable_dns_support = true
}
Next, in the db_security_group module, update the ingress_cidr_blocks param to be module.vpc.private_subnets_cidr_blocks (notice that it should no longer be wrapped in a list, since that output is already one). This is really cool, as it ensures that only instances in the private subnets can access your database, a huge security win!
On your bastion_security_group, you can entirely remove the ingress_cidr_blocks and ingress_rules lines, because we no longer need the SSH port open. SSM works by having the instance reach out to the SSM endpoints, so we can eliminate the ingress entirely.
Think about how cool that is: our instance will have no public DNS and no security group ingress. It's like a dream! If you're like me and dream about highly restricted network access, anyways.
On the bastion module, there are two changes:
- Set associate_public_ip_address to false (or remove the line entirely, which has the same effect)
- Update the subnet_ids input to be module.vpc.private_subnets
Lastly, we need to update the IAM permissions to allow the bastion host to talk to the SSM service endpoints.
In your instance_profile_role module, add the "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore" ARN to the custom_role_policy_arns list. Make sure to add it at the beginning of the list, as it must come before "arn:aws:iam::aws:policy/EC2InstanceConnect".
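If you'd like to sanity-check your edits before planning, the modified pieces should look roughly like this (only the changed modules and attributes are shown; the complete listing is at the end of the codelab):
module "db_security_group" {
  # ...other attributes unchanged...

  # Only the private subnets can reach the database now
  ingress_cidr_blocks = module.vpc.private_subnets_cidr_blocks
}

module "bastion_security_group" {
  # ...other attributes unchanged, with the ingress lines removed entirely...
}

module "bastion" {
  # ...other attributes unchanged, with associate_public_ip_address removed...

  # The bastion now lives in a private subnet with no public IP
  subnet_ids = module.vpc.private_subnets
}

module "instance_profile_role" {
  # ...other attributes unchanged...

  custom_role_policy_arns = [
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
    "arn:aws:iam::aws:policy/EC2InstanceConnect",
  ]
}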
And we're all done with our complete terraform setup!
Run terraform plan -out plan, verify the changes look good, and run terraform apply plan to finish off.
Private subnets rock
But you need a NAT gateway
to get to the web
We've finally reached the final step: tunneling through a bastion instance in a private subnet to reach an RDS instance in its own, even more private, subnet.
Bastions and Jump Servers have long been used this way: you SSH onto the bastion, and from there you SSH to an instance that can't be reached without first going through the bastion.
To help with this, ssh has an option called ProxyCommand. There's a great blog at https://www.cyberciti.biz/faq/linux-unix-ssh-proxycommand-passing-through-one-host-gateway-server/ if you're unfamiliar and want to see examples of its usage in depth, but the main idea is just that it allows you to make two ssh jumps in a single command.
A command like ssh -o ProxyCommand="ssh -W %h:%p user@bastion.com" other_user@private.server.com is roughly equivalent to:
ssh user@bastion.com
ssh other_user@private.server.com # run from the bastion.com server
We want to use our bastion instance as a jump server in a ProxyCommand to get to our RDS instance, but our bastion does not have any public DNS that we can put as the host in our ssh command.
That's where the SSM Document we talked about earlier, AWS-StartSSHSession, comes in. It uses some SSM TCP magic to create SSH sessions to any server with SSM enabled.
Quick prereq: Install the SSM plugin for your cli: https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-install-plugin.html
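As an optional convenience, you can also bake this into your ~/.ssh/config so that any instance ID works as an SSH hostname; this is the same pattern shown in the AWS Session Manager docs (we won't rely on it below, where the ProxyCommand is passed inline instead):
# ~/.ssh/config: route ssh to instance IDs through Session Manager
host i-* mi-*
    ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"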
Putting this all together, we can run the commands:
echo -e 'y\n' | ssh-keygen -t rsa -f /tmp/temp -N '' >/dev/null 2>&1
aws ec2-instance-connect send-ssh-public-key \
--instance-id `terraform output -raw instance_id` \
--availability-zone `terraform output -raw az` \
--instance-os-user ubuntu \
--ssh-public-key file:///tmp/temp.pub
ssh -i /tmp/temp \
-Nf -M \
-L 5432:`terraform output -raw rds_endpoint` \
-o "UserKnownHostsFile=/dev/null" \
-o "StrictHostKeyChecking=no" \
-o ProxyCommand="aws ssm start-session --target %h --document AWS-StartSSHSession --parameters portNumber=%p --region=eu-west-1" \
ubuntu@`terraform output -raw instance_id`
To test that this worked, let's run a query:
psql -d codelab_db -p 5432 \
-h localhost \
-U codelab_user \
-c "SELECT * FROM codelab_table"
Yay. :)
Let's kill the tunnel with kill $(lsof -t -i :5432) and then remove all AWS resources with terraform destroy -auto-approve.
Gosh I love terraform :)
Got to kill the port
Got to destroy resources
It's time to clean up
Here is the final code after completing the tutorial:
##################
# Create the VPC #
##################

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 2.18.0"

  name = "codelab-vpc"
  cidr = "10.0.0.0/16"
  azs  = ["eu-west-1a", "eu-west-1b"]

  # For the bastion host
  private_subnets = ["10.0.101.0/24", "10.0.102.0/24"]

  # For the NAT gateways
  public_subnets = ["10.0.201.0/24", "10.0.202.0/24"]

  # For the RDS Instance
  database_subnets = ["10.0.1.0/24", "10.0.2.0/24"]

  # Ensure the private subnets can talk to the internet for SSM
  enable_nat_gateway = true

  # Allow private DNS
  enable_dns_hostnames = true
  enable_dns_support   = true
}

#######################
# Create the database #
#######################

module "db" {
  source  = "terraform-aws-modules/rds/aws"
  version = "2.5.0"

  # Put the DB in a private subnet of the VPC created above
  vpc_security_group_ids = [module.db_security_group.this_security_group_id]
  create_db_subnet_group = false
  db_subnet_group_name   = module.vpc.database_subnet_group

  # Make it postgres just as an example
  identifier     = "codelab-db"
  name           = "codelab_db"
  engine         = "postgres"
  engine_version = "10.6"
  username       = "codelab_user"
  password       = "codelab_password"
  port           = 5432

  # Disable stuff we don't care about
  create_db_option_group    = false
  create_db_parameter_group = false

  # Other random required variables that we don't care about in this codelab
  allocated_storage  = 5 # GB
  instance_class     = "db.t2.small"
  maintenance_window = "Tue:00:00-Tue:03:00"
  backup_window      = "03:00-06:00"
}

module "db_security_group" {
  source  = "terraform-aws-modules/security-group/aws"
  version = "3.1.0"

  name   = "codelab-db-sg"
  vpc_id = module.vpc.vpc_id

  # Allow incoming PostgreSQL traffic from the private subnets
  ingress_cidr_blocks = module.vpc.private_subnets_cidr_blocks
  ingress_rules       = ["postgresql-tcp"]

  # Allow all outgoing HTTP and HTTPS traffic for updates
  egress_cidr_blocks = ["0.0.0.0/0"]
  egress_rules       = ["http-80-tcp", "https-443-tcp"]
}

###############################
# Create the bastion instance #
###############################

module "bastion_security_group" {
  source  = "terraform-aws-modules/security-group/aws"
  version = "3.1.0"

  name   = "codelab-bastion-sg"
  vpc_id = module.vpc.vpc_id

  # Allow all outgoing HTTP and HTTPS traffic, as well as communication to db
  egress_cidr_blocks = ["0.0.0.0/0"]
  egress_rules       = ["http-80-tcp", "https-443-tcp", "postgresql-tcp"]
}

module "bastion" {
  source  = "terraform-aws-modules/ec2-instance/aws"
  version = "2.12.0"

  # Ubuntu 18.04 LTS AMI
  ami = "ami-035966e8adab4aaad"

  name                   = "codelab-bastion"
  instance_type          = "t2.small"
  vpc_security_group_ids = [module.bastion_security_group.this_security_group_id]
  subnet_ids             = module.vpc.private_subnets
  iam_instance_profile   = module.instance_profile_role.this_iam_instance_profile_name

  # Install dependencies
  user_data = <<USER_DATA
#!/bin/bash
sudo apt-get update
sudo apt-get -y install ec2-instance-connect
USER_DATA
}

module "instance_profile_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role"
  version = "~> 2.7.0"

  role_name               = "codelab-role"
  create_role             = true
  create_instance_profile = true
  role_requires_mfa       = false

  trusted_role_services = ["ec2.amazonaws.com"]
  custom_role_policy_arns = [
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
    "arn:aws:iam::aws:policy/EC2InstanceConnect",
  ]
}

###########
# Outputs #
###########

output "instance_id" {
  value = module.bastion.id[0]
}

output "az" {
  value = module.bastion.availability_zone[0]
}

output "rds_endpoint" {
  value = module.db.this_db_instance_endpoint
}
145 lines for a VPC with solid security, a private database, and a private bastion host isn't too shabby.