1. Overview
Permission boundaries are hard, especially with databases. You need them hidden away in private subnets, but you also want highly available access to them without hassle.
Traditionally, you would use a bastion host (AKA jump server) in a public subnet to reach your resources in private subnets, which works for its purpose. But managing these servers is cumbersome and annoying: they live with public DNS, and they typically rely on SSH for access control, which means admins need to manage SSH keys in a way that stays in sync with their IAM policies.
Enter AWS Systems Manager Session Manager, commonly shortened to SSM. This tool has been widely blogged about, as it grants access to servers through IAM policies instead of SSH keys. From a quick search, I found these great resources:
- https://medium.com/@dnorth98/hello-aws-session-manager-farewell-ssh-7fdfa4134696 Says that RDS access is possible, but doesn't show how
- https://www.reddit.com/r/aws/comments/df6uip/ssm_tunnelling_ec2_what_about_rds/ Gives a bash script to access RDS, but doesn't explain it or how to set up the necessary Infra
- https://aws.amazon.com/blogs/aws/new-session-manager/ Introduces shell access, without much depth with examples
- https://binx.io/blog/2019/02/02/how-to-login-to-ec2-instances-without-ssh/ Has CloudFormation templates, which is nice, but doesn't show RDS access
- https://cloudonaut.io/goodbye-ssh-use-aws-session-manager-instead/ No talk of RDS
- https://medium.com/tensult/use-aws-system-manager-bastion-free-ssh-key-free-access-to-ec2-instances-e6897c4143c5 No talk of RDS
- https://github.com/elpy1/ssh-over-ssm Gives a CLI tool to get secure access to RDS, without much instructions on how to create infrastructure
Even with these awesome resources, it wasn't immediately clear how to get started, especially with modern infrastructure management practices like terraform. And tools like ssh-over-ssm require significant prerequisite knowledge to make use of them.
Just about everyone on the planet with RDS instances wants to access them from a local port, so the goal of this codelab is to explore how to get secure access from scratch. It will walk through some older ways of getting access, which will hopefully help explain why the industry has moved to the current best-practice approach of combining EC2 Instance Connect with SSM.
At each step, I will create a fully working example with a basic security setup. The goal of this tutorial is to create near-production-ready examples.
Now for a haiku:
Start with the basics
Security can be hard
It takes time to learn
2. Setting up an RDS Instance
Making a VPC
To start with, let's whip up a quick RDS Server in a private subnet with no internet access. In a new terraform module, add the following code:
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 2.18.0"
name = "codelab-vpc"
cidr = "10.0.0.0/16"
azs = ["eu-west-1a", "eu-west-1b"]
# For the bastion host
public_subnets = ["10.0.101.0/24", "10.0.102.0/24"]
# For the RDS Instance
database_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
# Allow private DNS
enable_dns_hostnames = true
enable_dns_support = true
}
This describes a VPC we want to put all of our resources from this codelab into. It creates two public subnets that we can put the bastion host into, and two private subnets that we can put a database in. It also enables private DNS so that our bastion server will be able to reach out to the RDS endpoint.
To actually create these resources, let's run `terraform init` followed by `terraform plan -out plan`. If the plan looks good to you, you can apply it with `terraform apply plan`.
Making an RDS Instance in that VPC
Now, let's create an RDS instance. In the same terraform file, add:
module "db" {
source = "terraform-aws-modules/rds/aws"
version = "2.5.0"
# Put the DB in a private subnet of the VPC created above
vpc_security_group_ids = [module.db_security_group.this_security_group_id]
create_db_subnet_group = false
db_subnet_group_name = module.vpc.database_subnet_group
# Make it postgres just as an example
identifier = "codelab-db"
name = "codelab_db"
engine = "postgres"
engine_version = "10.6"
username = "codelab_user"
password = "codelab_password"
port = 5432
# Disable stuff we don't care about
create_db_option_group = false
create_db_parameter_group = false
# Other random required variables that we don't care about in this codelab
allocated_storage = 5 # GB
instance_class = "db.t2.small"
maintenance_window = "Tue:00:00-Tue:03:00"
backup_window = "03:00-06:00"
}
module "db_security_group" {
source = "terraform-aws-modules/security-group/aws"
version = "3.1.0"
name = "codelab-db-sg"
vpc_id = module.vpc.vpc_id
# Allow all incoming SSL traffic from the VPC
ingress_cidr_blocks = [module.vpc.vpc_cidr_block]
ingress_rules = ["postgresql-tcp"]
# Allow all outgoing HTTP and HTTPS traffic for updates
egress_cidr_blocks = ["0.0.0.0/0"]
egress_rules = ["http-80-tcp", "https-443-tcp"]
}
You can use any other config you'd like for the RDS instance; my setup here is just meant to be as simple as possible, with a small postgres instance. It is important to note that the security group is open to incoming TCP on port `5432` so that we can query it through the bastion host.
Run another `terraform init`, then `terraform plan -out plan`. If the plan looks good, apply it with `terraform apply plan`. It can take up to 40 minutes to provision a new RDS instance (in my case it took 9 minutes), so please be patient. We only have to do this once.
Wrap Up
Woohoo! You now have a database we can use to test bastion configurations with :) It is not accessible to the public internet, and has a strict security policy.
Here's a haiku about RDS Instances in VPCs:
Made a VPC
Also made an RDS
One in the other
3. Setting up a Bastion Server
Intro
The RDS instance isn't super interesting yet, because it doesn't have any tables, data, or access set up. Because the security group is so strict and there is no public endpoint, we can't directly query it in any way yet.
In this step, we will set up a standard bastion server that we can SSH to that will let us query the database.
Terraform Changes
In the same terraform file as before, add the following:
module "ssh_key_pair" {
source = "terraform-aws-modules/key-pair/aws"
version = "0.2.0"
# Feel free to change if you want to use a different public key
public_key = file("~/.ssh/id_rsa.pub")
key_name = "bastion_public_key"
}
module "bastion_security_group" {
source = "terraform-aws-modules/security-group/aws"
version = "3.1.0"
name = "codelab-bastion-sg"
vpc_id = module.vpc.vpc_id
# Allow all incoming SSH traffic
ingress_cidr_blocks = ["0.0.0.0/0"]
ingress_rules = ["ssh-tcp"]
# Allow all outgoing HTTP and HTTPS traffic, as well as communication to db
egress_cidr_blocks = ["0.0.0.0/0"]
egress_rules = ["http-80-tcp", "https-443-tcp", "postgresql-tcp"]
}
module "bastion" {
source = "terraform-aws-modules/ec2-instance/aws"
version = "2.12.0"
# Ubuntu 18.04 LTS AMI
ami = "ami-035966e8adab4aaad"
name = "codelab-bastion"
associate_public_ip_address = true
instance_type = "t2.small"
key_name = module.ssh_key_pair.this_key_pair_key_name
vpc_security_group_ids = [module.bastion_security_group.this_security_group_id]
subnet_ids = module.vpc.public_subnets
}
###########
# Outputs #
###########
output "bastion_ip" {
value = module.bastion.public_ip[0]
}
output "rds_endpoint" {
value = module.db.this_db_instance_endpoint
}
This creates an Ubuntu 18.04 EC2 server in a public subnet of the VPC we created earlier. Its security group is open to incoming SSH and to outgoing communication with the RDS instance. It is set up so that your local `~/.ssh/id_rsa` SSH key can be used to authenticate to the instance.
After the outputs of running `terraform init` and `terraform plan -out plan` look good, run `terraform apply plan` to create the resources. Once completed, we're ready to SSH onto the instance!
Establishing SSH Tunnel
To begin, run the command:
```bash
ssh -L 5432:`terraform output -raw rds_endpoint` -Nf ubuntu@`terraform output -raw bastion_ip`
```
Breaking this down, it says:
- `` ssh ubuntu@`terraform output -raw bastion_ip` ``: SSH to the bastion server we just created
- `` -L 5432:`terraform output -raw rds_endpoint` ``: forward the remote database socket to local port 5432
- `-f`: send the ssh command execution to a background process, so the tunnel stays open after the command completes
- `-N`: don't execute anything remotely; as we are just port forwarding, this is fine
Once complete, our tunnel has been established. Now let's create some test data!
Creating Test Data
We're going to add some test data to the RDS instance here just to make it easy to validate that we can still access the data later once we use more complicated setups. But it should be noted that this is entirely optional.
Install `psql` and postgres any way you know how, such as `sudo apt-get -y install postgresql postgresql-contrib` on Ubuntu.
Then, let's create a table:
```bash
psql -d codelab_db -p 5432 \
  -h localhost \
  -U codelab_user \
  -c "CREATE TABLE codelab_table (Name varchar(255))"
```
and add some data:
```bash
psql -d codelab_db -p 5432 \
  -h localhost \
  -U codelab_user \
  -c "INSERT INTO codelab_table (Name) VALUES ('codelab_data')"
```
If you're prompted for a password, use the password we set in terraform earlier, `codelab_password`.
Finally, let's close the tunnel with `kill $(lsof -t -i :5432)`, which kills the process that controls local port `5432`.
Haiku
Do you hike? Cuz I want to haiku:
We did it. We're in!
Let the bastions grind you down?
Nah, I conquer them.
4. Security Problems of a Bastion Server
Cool AWS Services
Before September 2018, what we have now would be considered standard, even cutting edge to those adopting cloud services. But in that September, AWS Systems Manager Session Manager (SSM) was announced, which helps to provide secure, access-controlled, and audited EC2 management.
Source: https://aws.amazon.com/about-aws/whats-new/2018/09/introducing-aws-systems-manager-session-manager/
A related service, EC2 Instance Connect, was introduced in June of 2019: https://aws.amazon.com/about-aws/whats-new/2019/06/introducing-amazon-ec2-instance-connect/. It enables IAM-based SSH controls with CloudTrail auditing, as well as browser-based SSH from the AWS Console for those who like web GUIs.
Why they are so cool
These two products are superior to our current bastion setup for a few reasons:
- Managing SSH keys is hard, especially across multiple production environments. IAM-based access is easier, especially if you're already managing IAM access to other resources. In our current example, managing a single key is easy, but you'd need to be creative to manage multiple keys, especially if you wanted them to dynamically stay in sync with new members of your company.
- Databases are... well... important. Auditing access to them can be critical, but auditing SSH sessions manually is a pain.
- SSM lets you access instances in private subnets (as long as they have outbound internet access through a NAT gateway).
- By not requiring public DNS access, attackers can only attack your instance if they already have access to your AWS account. This is a huge improvement over a public IP address that is exposed to any sort of SSH-based attack. We all know you don't constantly keep OpenSSH up to date on all your exposed servers ;)
For the next step, we'll add support for SSH'ing to our instance over EC2 Instance Connect.
Haiku
Did you ever hear of the time when the villagers of a mountainous domain overthrew their government? It was called the high coup. And it went a little something like this:
AWS
Services out the wazzu
Gotta learn 'em all
5. Setting Up EC2 Instance Connect
Why EC2 Instance Connect is Awesome
One of the cool things about EC2 Instance Connect is that you don't use long-term SSH keys that need to live on the bastion instance. Instead, you use the `aws ec2-instance-connect` CLI tools to send temporary public keys to the instance, which you then have 60 seconds to authenticate against using the corresponding private key. After the 60 seconds are up, the public key is forgotten.
This is more powerful than it may at first seem. What this really means is that:
- Access is now controlled by IAM policies governing who can use `aws ec2-instance-connect send-ssh-public-key` (see the policy sketch after this list)
- You don't need to manage SSH keys on the instance
- AWS can automagically audit all access, just by logging who sends SSH public keys
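To make that first point concrete, here's a minimal sketch of what a user-side policy could look like. This is not part of the codelab's terraform, and the account ID, policy name, and resource scoping are placeholders, but it shows how tunnel access becomes a plain IAM concern:

```hcl
# Hypothetical user-side policy: anyone holding it may push temporary SSH
# keys for the "ubuntu" OS user to instances in this (placeholder) account.
resource "aws_iam_policy" "allow_instance_connect" {
  name = "codelab-allow-instance-connect"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "ec2-instance-connect:SendSSHPublicKey"
      Resource = "arn:aws:ec2:eu-west-1:123456789012:instance/*"
      Condition = {
        StringEquals = { "ec2:osuser" = "ubuntu" }
      }
    }]
  })
}
```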
On the instance side, this requires a few quick updates to our terraform.
Updating Terraform Code
First, you can entirely remove the `ssh_key_pair` module, as we no longer need to manage SSH keys :)
Then, add an IAM instance profile that will allow your bastion to make use of the `EC2InstanceConnect` policy:
```hcl
module "instance_profile_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role"
  version = "~> 3.0"

  role_name               = "codelab-role"
  create_role             = true
  create_instance_profile = true
  role_requires_mfa       = false
  trusted_role_services   = ["ec2.amazonaws.com"]
  custom_role_policy_arns = ["arn:aws:iam::aws:policy/EC2InstanceConnect"]
}
```
Next, update your bastion module to remove the key pair, make use of the new instance profile, and install `ec2-instance-connect`, by making it look like:
module "bastion" {
source = "terraform-aws-modules/ec2-instance/aws"
version = "2.12.0"
# Ubuntu 18.04 LTS AMI
ami = "ami-035966e8adab4aaad"
name = "codelab-bastion"
associate_public_ip_address = true
instance_type = "t2.small"
vpc_security_group_ids = [module.bastion_security_group.this_security_group_id]
subnet_ids = module.vpc.public_subnets
iam_instance_profile = module.instance_profile_role.this_iam_instance_profile_name
# Install dependencies
user_data = <<USER_DATA
#!/bin/bash
sudo apt-get update
sudo apt-get -y install ec2-instance-connect
USER_DATA
}
The very last update is to add two new outputs:
output "instance_id" {
value = module.bastion.id[0]
}
output "az" {
value = module.bastion.availability_zone[0]
}
As per usual, run a `terraform init` then `terraform plan -out plan`. If the output looks good (note that the bastion instance should be recreated), then run a `terraform apply plan`.
Hooray! You have now enabled EC2 Instance Connect on the instance. To connect, we need to generate a temporary SSH key, send the public key to the instance, and then create an SSH tunnel using the private key.
Establishing SSH Tunnel
This is as easy as:
```bash
echo -e 'y\n' | ssh-keygen -t rsa -f /tmp/temp -N '' >/dev/null 2>&1

aws ec2-instance-connect send-ssh-public-key \
  --instance-id `terraform output -raw instance_id` \
  --availability-zone `terraform output -raw az` \
  --instance-os-user ubuntu \
  --ssh-public-key file:///tmp/temp.pub

ssh -L 5432:`terraform output -raw rds_endpoint` -Nf -i /tmp/temp ubuntu@`terraform output -raw bastion_ip`
```
Verifying our Test Data is still there
Checking that it worked is as easy as executing a `psql` command to verify the data we added earlier is still there:
```bash
psql -d codelab_db -p 5432 \
  -h localhost \
  -U codelab_user \
  -c "SELECT * FROM codelab_table"
```
To clean up, let's close the tunnel with `kill $(lsof -t -i :5432)`.
Haiku
"Hey there," Kew said. "Hi Kew," I replied:
Short lived keys are good
To dust - the keys shall return
But the tunnel lives
6. Setting Up Systems Manager
There's still room for improvement
We already have a pretty darn secure and auditable system, but at Transcend we strive to minimize public DNS endpoints wherever possible to reduce our attack surface.
In the current setup, our bastion still has a public IP address, which is a requirement for EC2 Instance Connect (without using the SSM trick this codelab is leading up to). If we moved our bastion to a private subnet, where it could have no public DNS, we would lose EC2 Instance Connect access.
However, we could still access it with AWS Systems Manager, so let's talk about it.
About SSM
Systems Manager comes with a bunch of premade "Documents" that you can run on sets of EC2s, similar-ish to running Ansible Playbooks on servers, if you're familiar. If you aren't familiar with Ansible or other automation tools, no worries - the idea is pretty simple: automate running the same commands on a bunch of machines at once.
In this codelab, we just need to worry about the `AWS-StartSSHSession` Document, which lets you create SSH sessions to EC2s (see the quick example after this list). The only requirements are that the EC2:
- Has internet access
- Has an instance profile allowing it to talk to an SSM endpoint
- Has SSM installed and running on the server (which Ubuntu 18.04 LTS AMIs do by default)
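To get a feel for SSM on its own before wiring it into our tunnel: once the terraform changes below are applied (and with the Session Manager plugin installed, a prerequisite covered in the final step), you can open a plain interactive shell with no SSH involved at all:

```bash
# Opens an interactive shell session over SSM: no SSH keys, no open ports.
# This uses SSM's default shell document rather than AWS-StartSSHSession.
aws ssm start-session --target `terraform output -raw instance_id`
```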
Enabling SSM Access
To start, update the VPC module to have some private subnets, and to have NAT gateways that will allow those private subnets to access the internet:
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 2.18.0"
name = "codelab-vpc"
cidr = "10.0.0.0/16"
azs = ["eu-west-1a", "eu-west-1b"]
# For the NAT gateways
public_subnets = ["10.0.101.0/24", "10.0.102.0/24"]
# For the bastion host
private_subnets = ["10.0.201.0/24", "10.0.202.0/24"]
# For the RDS Instance
database_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
# Ensure the private gateways can talk to the internet for SSM
enable_nat_gateway = true
# Allow private DNS
enable_dns_hostnames = true
enable_dns_support = true
}
Next, in the `db_security_group` module, update the `ingress_cidr_blocks` param to be `module.vpc.private_subnets_cidr_blocks` (notice that it should no longer be wrapped in a list, as this output is already a list). This is really cool, as it ensures that only instances in the private subnets can access your database, a huge security win!
On your `bastion_security_group` module, you can entirely remove the `ingress_cidr_blocks` and `ingress_rules` lines, because we no longer need the SSH port open. SSM works by the instance sending outbound requests to the SSM endpoint, so we can eliminate the ingress entirely. Both updated security group modules are shown below for reference.
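After those two changes, the security group modules should match what ends up in the final reference code:

```hcl
module "db_security_group" {
  source  = "terraform-aws-modules/security-group/aws"
  version = "3.1.0"

  name   = "codelab-db-sg"
  vpc_id = module.vpc.vpc_id

  # Allow incoming PostgreSQL traffic from the private subnets only
  ingress_cidr_blocks = module.vpc.private_subnets_cidr_blocks
  ingress_rules       = ["postgresql-tcp"]

  # Allow all outgoing HTTP and HTTPS traffic for updates
  egress_cidr_blocks = ["0.0.0.0/0"]
  egress_rules       = ["http-80-tcp", "https-443-tcp"]
}

module "bastion_security_group" {
  source  = "terraform-aws-modules/security-group/aws"
  version = "3.1.0"

  name   = "codelab-bastion-sg"
  vpc_id = module.vpc.vpc_id

  # No ingress at all: SSM only needs outbound connectivity
  egress_cidr_blocks = ["0.0.0.0/0"]
  egress_rules       = ["http-80-tcp", "https-443-tcp", "postgresql-tcp"]
}
```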
Think about how cool that is: our instance will have no public DNS and no security group ingress. It's like a dream! If you're like me and dream about highly restricted network access, anyways.
On the `bastion` module, there are two changes (the full updated module is shown after this list):
- Set `associate_public_ip_address` to `false` (or remove the line entirely, which has the same effect)
- Change the `subnet_ids` input to be `module.vpc.private_subnets`
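After these changes, the bastion module should look like this (matching the final reference code):

```hcl
module "bastion" {
  source  = "terraform-aws-modules/ec2-instance/aws"
  version = "2.12.0"

  # Ubuntu 18.04 LTS AMI
  ami = "ami-035966e8adab4aaad"

  name                   = "codelab-bastion"
  instance_type          = "t2.small"
  vpc_security_group_ids = [module.bastion_security_group.this_security_group_id]
  subnet_ids             = module.vpc.private_subnets
  iam_instance_profile   = module.instance_profile_role.this_iam_instance_profile_name

  # Install dependencies
  user_data = <<USER_DATA
#!/bin/bash
sudo apt-get update
sudo apt-get -y install ec2-instance-connect
USER_DATA
}
```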
Lastly, we need to update the IAM permissions to allow the bastion host to talk to the SSM service endpoints.
In your `instance_profile_role` module, add the `arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore` ARN to the `custom_role_policy_arns` list. Make sure to add it at the beginning of the list, as it must come before `arn:aws:iam::aws:policy/EC2InstanceConnect`. The finished module is shown below.
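For reference, the finished role module:

```hcl
module "instance_profile_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role"
  version = "~> 3.0"

  role_name               = "codelab-role"
  create_role             = true
  create_instance_profile = true
  role_requires_mfa       = false
  trusted_role_services   = ["ec2.amazonaws.com"]
  custom_role_policy_arns = [
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
    "arn:aws:iam::aws:policy/EC2InstanceConnect",
  ]
}
```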
And we're all done with our complete terraform setup!
Run `terraform plan -out plan`, verify the changes look good, and run `terraform apply plan` to finish off.
Haiku
Private subnets rock
But you need a NAT gateway
to get to the web
7. Tunneling with SSM + EC2 Instance Connect
We've finally reached the final step: tunneling through a bastion instance in a private subnet to reach an RDS instance in its own, even more private, subnet.
SSH'ing with ProxyCommand
Bastions, or jump servers, have been common for a long time: you SSH onto one system, then SSH from that instance to a different instance that can't be reached without first going through the bastion.
To help with this, `ssh` has an option called `ProxyCommand`. There's a great blog at https://www.cyberciti.biz/faq/linux-unix-ssh-proxycommand-passing-through-one-host-gateway-server/ if you're unfamiliar and want to see examples of its usage in depth, but the main idea is just that it allows you to make two `ssh` jumps in a single command.
A command like `ssh -o ProxyCommand="ssh -W %h:%p user@bastion.com" other_user@private.server.com` is roughly equivalent to:
```bash
ssh user@bastion.com
ssh other_user@private.server.com  # run from the bastion.com server
```
How does that help us
We want to use our bastion instance as a jump server in a `ProxyCommand` to get to our RDS instance, but our bastion does not have any public DNS that we can put as the host in our `ssh` command.
That's where the SSM Document we talked about earlier, `AWS-StartSSHSession`, comes in. It uses some SSM TCP magic to create SSH sessions to any server with SSM enabled.
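As an aside: if you end up doing this often, the AWS Session Manager docs suggest adding a host pattern to your `~/.ssh/config` so that a plain `ssh i-xxxxxxxx` transparently goes over SSM. A sketch of that setup is below, though in this codelab we'll stick to an explicit `-o ProxyCommand` flag:

```bash
# Append a host pattern so that ssh'ing to any instance ID tunnels over SSM.
# This is the configuration suggested in the AWS Session Manager docs.
cat >> ~/.ssh/config <<'EOF'
# SSH over Session Manager
host i-* mi-*
    ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
EOF
```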
The final command
Quick prereq: Install the SSM plugin for your cli: https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-install-plugin.html
Putting this all together, we can run the commands:
```bash
echo -e 'y\n' | ssh-keygen -t rsa -f /tmp/temp -N '' >/dev/null 2>&1

aws ec2-instance-connect send-ssh-public-key \
  --instance-id `terraform output -raw instance_id` \
  --availability-zone `terraform output -raw az` \
  --instance-os-user ubuntu \
  --ssh-public-key file:///tmp/temp.pub

ssh -i /tmp/temp \
  -Nf -M \
  -L 5432:`terraform output -raw rds_endpoint` \
  -o "UserKnownHostsFile=/dev/null" \
  -o "StrictHostKeyChecking=no" \
  -o ProxyCommand="aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters portNumber=%p --region eu-west-1" \
  ubuntu@`terraform output -raw instance_id`
```
To test that this worked, let's run a query:
```bash
psql -d codelab_db -p 5432 \
  -h localhost \
  -U codelab_user \
  -c "SELECT * FROM codelab_table"
```
Yay. :)
Cleanup
Let's kill the tunnel with `kill $(lsof -t -i :5432)`, and then remove all AWS resources with `terraform destroy -auto-approve`.
Gosh, I love terraform :)
Haiku
Got to kill the port
Got to destroy resources
It's time to clean up
8. Reference Code
Here is the final code after completing the tutorial:
```hcl
##################
# Create the VPC #
##################

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 2.18.0"

  name = "codelab-vpc"
  cidr = "10.0.0.0/16"
  azs  = ["eu-west-1a", "eu-west-1b"]

  # For the NAT gateways
  public_subnets = ["10.0.101.0/24", "10.0.102.0/24"]

  # For the bastion host
  private_subnets = ["10.0.201.0/24", "10.0.202.0/24"]

  # For the RDS Instance
  database_subnets = ["10.0.1.0/24", "10.0.2.0/24"]

  # Ensure the private subnets can talk to the internet for SSM
  enable_nat_gateway = true

  # Allow private DNS
  enable_dns_hostnames = true
  enable_dns_support   = true
}

#######################
# Create the database #
#######################

module "db" {
  source  = "terraform-aws-modules/rds/aws"
  version = "2.5.0"

  # Put the DB in a private subnet of the VPC created above
  vpc_security_group_ids = [module.db_security_group.this_security_group_id]
  create_db_subnet_group = false
  db_subnet_group_name   = module.vpc.database_subnet_group

  # Make it postgres just as an example
  identifier     = "codelab-db"
  name           = "codelab_db"
  engine         = "postgres"
  engine_version = "10.6"
  username       = "codelab_user"
  password       = "codelab_password"
  port           = 5432

  # Disable stuff we don't care about
  create_db_option_group    = false
  create_db_parameter_group = false

  # Other required variables that we don't care about in this codelab
  allocated_storage  = 5 # GB
  instance_class     = "db.t2.small"
  maintenance_window = "Tue:00:00-Tue:03:00"
  backup_window      = "03:00-06:00"
}

module "db_security_group" {
  source  = "terraform-aws-modules/security-group/aws"
  version = "3.1.0"

  name   = "codelab-db-sg"
  vpc_id = module.vpc.vpc_id

  # Allow incoming PostgreSQL traffic from the private subnets
  ingress_cidr_blocks = module.vpc.private_subnets_cidr_blocks
  ingress_rules       = ["postgresql-tcp"]

  # Allow all outgoing HTTP and HTTPS traffic for updates
  egress_cidr_blocks = ["0.0.0.0/0"]
  egress_rules       = ["http-80-tcp", "https-443-tcp"]
}

###############################
# Create the bastion instance #
###############################

module "bastion_security_group" {
  source  = "terraform-aws-modules/security-group/aws"
  version = "3.1.0"

  name   = "codelab-bastion-sg"
  vpc_id = module.vpc.vpc_id

  # Allow all outgoing HTTP and HTTPS traffic, as well as communication to the db
  egress_cidr_blocks = ["0.0.0.0/0"]
  egress_rules       = ["http-80-tcp", "https-443-tcp", "postgresql-tcp"]
}

module "bastion" {
  source  = "terraform-aws-modules/ec2-instance/aws"
  version = "2.12.0"

  # Ubuntu 18.04 LTS AMI
  ami = "ami-035966e8adab4aaad"

  name                   = "codelab-bastion"
  instance_type          = "t2.small"
  vpc_security_group_ids = [module.bastion_security_group.this_security_group_id]
  subnet_ids             = module.vpc.private_subnets
  iam_instance_profile   = module.instance_profile_role.this_iam_instance_profile_name

  # Install dependencies
  user_data = <<USER_DATA
#!/bin/bash
sudo apt-get update
sudo apt-get -y install ec2-instance-connect
USER_DATA
}

module "instance_profile_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role"
  version = "~> 3.0"

  role_name               = "codelab-role"
  create_role             = true
  create_instance_profile = true
  role_requires_mfa       = false
  trusted_role_services   = ["ec2.amazonaws.com"]
  custom_role_policy_arns = [
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
    "arn:aws:iam::aws:policy/EC2InstanceConnect",
  ]
}

###########
# Outputs #
###########

output "instance_id" {
  value = module.bastion.id[0]
}

output "az" {
  value = module.bastion.availability_zone[0]
}

output "rds_endpoint" {
  value = module.db.this_db_instance_endpoint
}
```
Roughly 150 lines of terraform for a VPC with solid security, a private database, and a private bastion host isn't too shabby.