AWS Control Tower setup issues and solutions

Here’s a list of possible issues and solutions when setting up your landing zone in AWS Control Tower.

Your AWS environment is not ready for AWS Control Tower to be set up.

AWS Control Tower detected issues with your AWS account environment that prevent successful setup.

  • You must unsubscribe your organization from AWS Config so that AWS Control Tower can proceed. You’ll incur additional costs if you turn off AWS Config to set up AWS Control Tower and then turn AWS Config on again.
  • Only one AWS Control Tower environment is permitted per organization.

Issue: “You must unsubscribe your organization from AWS Config so that AWS Control Tower can proceed. You’ll incur additional costs if you turn off AWS Config to set up AWS Control Tower and then turn AWS Config on again.”

Solution: Disable AWS Config in the AWS Organizations services.

(Screenshot: disabling AWS Config in the AWS Organizations console)

Issue: Only one AWS Control Tower environment is permitted per organization.

Solution: Disable AWS Control Tower in the AWS Organizations services.

(Screenshot: disabling AWS Control Tower in the AWS Organizations console)
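
If you prefer the CLI, disabling trusted access for a service in AWS Organizations can be done like this (a sketch; run it from the management account, and double-check the service principals before disabling anything):

# Disable organization-wide trusted access for AWS Config, then Control Tower
aws organizations disable-aws-service-access --service-principal config.amazonaws.com
aws organizations disable-aws-service-access --service-principal controltower.amazonaws.com

# Verify what is still enabled
aws organizations list-aws-service-access-for-organization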

Visit the AWS Control Tower Troubleshooting page for other possible issues and solutions.


Elastic Container Registry ECR – AWS CLI commands

Let’s get started with the Amazon Elastic Container Registry (ECR) AWS CLI commands. The AWS CLI is a quick and easy way to verify something or learn about the various APIs available for an AWS service. This post covers tagging a container image for Amazon ECR, creating a private Amazon ECR repository, authenticating with the repository, pushing a container image, getting image scan findings, deleting images, and the unexpected error messages I received along with their possible solutions.

Push Images to Elastic Container Registry ECR

I want to transfer the container images I downloaded from Docker Hub to Amazon Elastic Container Registry (ECR).

You’ll need an AWS IAM policy to create and manage the repository. Below is the minimum to get started. For ECR IAM policy best practices, see the AWS documentation, or use IAM Access Analyzer policy generation.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ecr",
            "Effect": "Allow",
            "Action": [
                "ecr:PutLifecyclePolicy",
                "ecr:DescribeImageScanFindings",
                "ecr:StartImageScan",
                "ecr:GetLifecyclePolicyPreview",
                "ecr:GetDownloadUrlForLayer",
                "ecr:PutImageScanningConfiguration",
                "ecr:DescribeImageReplicationStatus",
                "ecr:ListTagsForResource",
                "ecr:ListImages",
                "ecr:BatchGetRepositoryScanningConfiguration",
                "ecr:DeleteRepository",
                "ecr:PutImage",
                "ecr:UntagResource",
                "ecr:BatchGetImage",
                "ecr:DescribeImages",
                "ecr:TagResource",
                "ecr:DescribeRepositories",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetRepositoryPolicy",
                "ecr:GetLifecyclePolicy",
                "ecr:SetRepositoryPolicy",
                "ecr:DeleteRepositoryPolicy",
                "ecr:DeleteLifecyclePolicy"
            ],
            "Resource": "arn:aws:ecr:<aws-region>:<account-id>:repository/<repo-name-here>"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetRegistryPolicy",
                "ecr:CreateRepository",
                "ecr:DescribePullThroughCacheRules",
                "ecr:DescribeRegistry",
                "ecr:GetAuthorizationToken",
                "ecr:PutRegistryScanningConfiguration",
                "ecr:GetRegistryScanningConfiguration"
            ],
            "Resource": "*"
        }
    ]
}

Create ECR repository – AWS CLI

aws ecr create-repository \
    --repository-name <repo-name> \
    --image-scanning-configuration scanOnPush=true \
    --encryption-configuration encryptionType="AES256" # Dev: use KMS in production for fine-grained control of the key

aws ecr put-lifecycle-policy \
    --repository-name <repo-name> \
    --lifecycle-policy-text "file://policy.json"

# Contents of policy.json:

{
    "rules": [
        {
            "rulePriority": 1,
            "description": "Keep the last 5 images.",
            "selection": {
                "tagStatus": "tagged",
                "tagPrefixList": [
                    "v"
                ],
                "countType": "imageCountMoreThan",
                "countNumber": 5
            },
            "action": {
                "type": "expire"
            }
        }
    ]
}
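
To confirm the lifecycle policy took effect, you can read it back:

aws ecr get-lifecycle-policy --repository-name <repo-name> --no-cli-pager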

Create ECR repository – Terraform

For this tutorial we’ll create a private repository with basic image scanning, enable scans on push, and for now use an Amazon-managed encryption key. For production environments, please take the time to create a KMS key and use that for your repo.


#tfsec:ignore:aws-ecr-repository-customer-key 
module "repo" {
  source = "terraform-aws-modules/ecr/aws"

  repository_name               = format("%s-%s-image-gallery", var.environment, var.aws_region)
  repository_type               = "private"
  registry_scan_type            = "BASIC" # No additional cost for basic scanning!
  repository_image_scan_on_push = true
  repository_encryption_type    = "AES256" # Production: use KMS for fine-grained control of the key

  repository_lifecycle_policy = jsonencode({
    rules = [
      {
        rulePriority = 1,
        description  = "Keep the last 5 images.",
        selection = {
          tagStatus     = "tagged",
          tagPrefixList = ["v"],
          countType     = "imageCountMoreThan",
          countNumber   = 5
        },
        action = {
          type = "expire"
        }
      }
    ]
  })

  tags = {
    Name = format("%s-%s-img-repo", var.environment, var.aws_region)
  }
}
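
After terraform apply, a quick way to verify the repository settings from the CLI (the repository name is whatever the format() expression produced for your environment and region):

aws ecr describe-repositories --repository-names <repo-name> --no-cli-pager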

Authenticate with Amazon ECR

Get the commands from the Amazon ECR console. Navigate to the repo and click the View push commands button at the top right.

(Screenshot: the View push commands button)

Select your OS tab and follow the instructions as provided.

(Screenshot: the ECR push commands)

aws ecr get-login-password --region <aws-region> | docker login --username AWS --password-stdin <account-id>.dkr.ecr.<aws-region>.amazonaws.com

Tag existing container images

I downloaded images from Docker Hub, so I don’t need to build any. We just need to tag them before we push them.
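
If you’re following along and haven’t pulled them yet:

docker pull bitnami/mariadb:10.6
docker pull bitnami/wordpress:6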

docker tag bitnami/mariadb:10.6 <account-id>.dkr.ecr.<aws-region>.amazonaws.com/<repo-name>:10.6

docker tag bitnami/wordpress:6 <account-id>.dkr.ecr.<aws-region>.amazonaws.com/<repo-name>:6

Push image to Amazon ECR

docker push <account-id>.dkr.ecr.<aws-region>.amazonaws.com/<repo-name>:10.6

List ECR images – Amazon ECR

aws ecr list-images --repository-name <repo-name> --output json --no-cli-pager

Get image scan findings – Amazon ECR

aws ecr describe-image-scan-findings --repository-name <repo-name> --image-id imageDigest=sha256:<image-id> --no-cli-pager --query "imageScanFindings.findingSeverityCounts"
{
    "HIGH": 1,
    "MEDIUM": 4,
    "LOW": 5,
    "UNDEFINED": 5,
    "INFORMATIONAL": 29
}
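
If the findings aren’t ready yet, the AWS CLI ships a waiter that blocks until the scan completes (same placeholders as above):

aws ecr wait image-scan-complete --repository-name <repo-name> --image-id imageDigest=sha256:<image-id>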

Delete image – Amazon ECR

aws ecr batch-delete-image --image-ids imageDigest=sha256:5879db2e34d5c237677f0af681780543b50be28e8f3b6b4bd44363e0833bdad1 --repository-name <repo-name>



Issues and solutions

  1. An error occurred (InvalidParameterException) when calling the PutLifecyclePolicy operation.

An error occurred (InvalidParameterException) when calling the PutLifecyclePolicy operation: Invalid parameter at 'LifecyclePolicyText' failed to satisfy constraint: 'Lifecycle policy validation failure: instance failed to match exactly one schema (matched 0 out of 2)'

Solution: Remove “countUnit”: “days” from your policy.json file

2. ImageReferencedByManifestList failure code

You can’t delete a child image while its parent manifest list still references it; delete the parent first.

{
    "imageIds": [],
    "failures": [
        {
            "imageId": {
                "imageDigest": "sha256:5879db2e34d5c237677f0af681780543b50be28e8f3b6b4bd44363e0833bdad1"
            },
            "failureCode": "ImageReferencedByManifestList",
            "failureReason": "Requested image referenced by manifest list: [sha256:d326a7001411a2ffae889ed70007258fb70585b4a08caeb3bf21de1ce6552f01]"
        }
    ]
}

{
    "imageIds": [
        {
            "imageDigest": "sha256:d326a7001411a2ffae889ed70007258fb70585b4a08caeb3bf21de1ce6552f01",
            "imageTag": "10.6"
        },
        {
            "imageDigest": "sha256:5879db2e34d5c237677f0af681780543b50be28e8f3b6b4bd44363e0833bdad1"
        },
        {
            "imageDigest": "sha256:8f0dfdf7b1012c368633cf88542d38f4c7ea63656fca5dc474cc8661d3e6c617"
        }
    ]
}

Solution: Run list-images and delete the root imageDigest first. In the example above, the image digest that starts with “d326” is the manifest list: delete it first, then the ones beneath it.
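
Using the digests from the output above, the deletion order looks like this:

# Delete the manifest list (parent) first
aws ecr batch-delete-image --repository-name <repo-name> \
    --image-ids imageDigest=sha256:d326a7001411a2ffae889ed70007258fb70585b4a08caeb3bf21de1ce6552f01

# Then delete the child images
aws ecr batch-delete-image --repository-name <repo-name> \
    --image-ids imageDigest=sha256:5879db2e34d5c237677f0af681780543b50be28e8f3b6b4bd44363e0833bdad1 imageDigest=sha256:8f0dfdf7b1012c368633cf88542d38f4c7ea63656fca5dc474cc8661d3e6c617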

3. unexpected status: 403 Forbidden. You get the “unexpected status: 403 Forbidden” when attempting to push the image.

Solution: Verify your AWS credentials haven’t expired and that you have access to the repository. Also, rerun ‘aws ecr get-login-password‘ as specified above.

4. failed to do request error. If you forget to add the repository name in the image tag, then you may get the following error:

docker push <account-id>.dkr.ecr.<aws-region>.amazonaws.com/mariadb:10.6

failed to do request: Post "https://<account-id>.dkr.ecr.<aws-region>.amazonaws.com/v2/mariadb/blobs/uploads/": EOF

Solution: Include the repository name between .com/ and :<version>

docker push <account-id>.dkr.ecr.<aws-region>.amazonaws.com/<repo-name>:10.6

Amazon EKS IAM roles and policies with Terraform

Before you use or approve Amazon EKS in production you must have a security checklist. Everyone’s list is different, but every list should include the must-have items that keep authentication and authorization to a minimum; in other words, least privilege. Let’s explore Amazon EKS IAM roles and policies written in Terraform!

What are some suggestions to improve your Amazon EKS IAM design?

  • Start with the managed roles and policies, then review AWS CloudTrail logs to see which events or API calls actually occur
  • Start creating your own managed IAM policies and IAM roles; one at a time
  • If possible, require MFA to ensure users are who they say they are
  • Once you have validated your custom roles and policies, add conditions to your IAM policies; again, one condition at a time

Before I show any code it’s important to know basic AWS IAM terminology. Let’s add identity-based policies and resource-based policies to your vocabulary. Resource-based policies are about the “what”. Identity-based policies are about the “who”. Then there’s the “action” that identity or resource can use based on the “effect”. There are dozens of EKS IAM actions available; see the Actions defined by Amazon Elastic Kubernetes Service page.

EKS Cluster Authentication

Just a reminder on how EKS cluster authentication works: the bottom line is that all permissions are essentially managed by Kubernetes Role-Based Access Control (RBAC). However, AWS IAM is involved in authentication.

Prerequisites

Prior to creating your EKS cluster, be sure to identify which IAM role or user will be the “primary” identity to create the EKS cluster. The identity that creates the EKS cluster is automatically added to the Kubernetes system:masters group. Which is great; however, you will not be able to see that identity in the aws-auth ConfigMap!

Amazon EKS IAM Resource

The Amazon EKS cluster resource has the following ARN:

arn:${Partition}:eks:${Region}:${Account}:cluster/${ClusterName}

Note: Use a wildcard (“*”) if you really need to specify all clusters. Also, you cannot use this resource filter pattern for certain actions, such as creating a new cluster. How would anyone know that? Take another look at the Actions defined by Amazon Elastic Kubernetes Service page: you’ll notice that for the CreateCluster action the Resource box is empty.

(Screenshot: the EKS actions table; note that cluster* isn’t listed in the Resource type cell for CreateCluster)

IAM policies based on the cluster name

Read/View all clusters: Terraform IAM example

Initially the user or role may not have any EKS permissions, so attempting to list all the clusters returns an error like this.

aws eks list-clusters

An error occurred (AccessDeniedException) when calling the ListClusters operation: User: arn:aws:iam::1234567890:user/read-only-all is not authorized to perform: eks:ListClusters on resource: arn:aws:eks:us-east-2:1234567890:cluster/*

The “eks:ListClusters” action doesn’t support resource-level restrictions, so its Resource must stay “*”; that’s why it sits in a different statement than the other actions that do accept the EKS resource ARN. See the example below.

{
    "Statement": [
        {
            "Action": [
                "eks:ListUpdates",
                "eks:ListTagsForResource",
                "eks:ListNodegroups",
                "eks:ListIdentityProviderConfigs",
                "eks:ListFargateProfiles",
                "eks:ListAddons",
                "eks:DescribeCluster"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:eks:us-east-2:1234567890:cluster/*",
            "Sid": "ReadAllEKSclusters"
        },
        {
            "Action": "eks:ListClusters",
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "ListAllEKSclusters"
        }
    ],
    "Version": "2012-10-17"
}
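
To try it out, you could create and attach the policy with the AWS CLI (a sketch; the policy name and file name are assumptions, the user comes from the error above):

aws iam create-policy --policy-name eks-read-all --policy-document file://eks-read-all.json
aws iam attach-user-policy --user-name read-only-all \
    --policy-arn arn:aws:iam::1234567890:policy/eks-read-all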

Rerun the list clusters action with this newly attached policy.

aws eks list-clusters

# My results
{
    "clusters": [
        "aws001-preprod-dev-eks"
    ]
}

The “describe cluster” action and a few others require the exact name of the cluster you want to describe, and because the resource is set to all clusters (asterisk in the ARN), this policy allows describing all clusters.

aws eks describe-cluster --name aws001-preprod-dev-eks

# result

{
    "cluster": {
        "name": "aws001-preprod-dev-eks",
        "arn": "arn:aws:eks:us-east-2:1234567890:cluster/aws001-preprod-dev-eks",
        "createdAt": "2022-03-22T06:37:57.278000-04:00",
        "version": "1.21",
        "endpoint": "https://abc123456789.gr7.us-east-2.eks.amazonaws.com",
        "roleArn": "arn:aws:iam::1234567890:role/aws001-preprod-dev-eks-cluster-role",
        "resourcesVpcConfig": {
            "subnetIds": [
...
}

Read only specific cluster name

This time simply replace the last asterisk in the resource name with the cluster name. In this case I’m using Terraform, so I’m passing the name via a local variable.

  statement {
    actions = [
      "eks:AccessKubernetesApi",
      "eks:DescribeCluster",
      "eks:ListAddons",
      "eks:ListFargateProfiles",
      "eks:ListIdentityProviderConfigs",
      "eks:ListNodegroups",
      "eks:ListTagsForResource",
      "eks:ListUpdates"
    ]

    resources = [
      "arn:aws:eks:${var.region}:${data.aws_caller_identity.current.account_id}:cluster/${local.cluster_name}",
    ]
  }

aws eks list-clusters --profile iam

An error occurred (AccessDeniedException) when calling the ListClusters operation: User: arn:aws:iam::1234567890:user/Waleed is not authorized to perform: eks:ListClusters on resource: arn:aws:eks:us-east-2:1234567890:cluster/*

Now listing all EKS clusters is not allowed.

Modify all clusters

This role should only be assigned to a few EKS/Kubernetes administrators who are highly experienced or certified to manage Kubernetes clusters in the AWS Cloud. Larger organizations may have a single administrator per cluster, while others have one administrator managing multiple clusters.

statement {
    sid = "ModifyAllEKSclusters"

    actions = [
      "eks:AccessKubernetesApi",
      "eks:Associate*",
      "eks:Create*",
      "eks:Delete*",
      "eks:DeregisterCluster",
      "eks:Describe*",
      "eks:List*",
      "eks:RegisterCluster",
      "eks:TagResource",
      "eks:UntagResource",
      "eks:Update*"
    ]

    resources = [
      "*"
    ]
  }

  statement {
    sid    = "Deny"
    effect = "Deny"

    # No major updates allowed in this example
    actions = [
      "eks:CreateCluster",
      "eks:DeleteCluster"
    ]

    resources = [
      "*"
    ]
  }

Modify a specific cluster

statement {
    sid = "ModifyaEKScluster"

    actions = [
      "eks:AccessKubernetesApi",
      "eks:Associate*",
      "eks:Create*",
      "eks:Delete*",
      "eks:DeregisterCluster",
      "eks:DescribeCluster",
      "eks:DescribeUpdate",
      "eks:List*",
      "eks:TagResource",
      "eks:UntagResource",
      "eks:Update*"
    ]

    resources = [
      "arn:aws:eks:${var.region}:${data.aws_caller_identity.current.account_id}:cluster/${local.cluster_name}",
    ]
  }

  statement {
    sid = "ModifyaEKSclusterResource"

    actions = [
      "eks:DescribeNodegroup",
      "eks:DescribeFargateProfile",
      "eks:DescribeIdentityProviderConfig",
      "eks:DescribeAddon"
    ]

    resources = [
      "arn:aws:eks:${var.region}:${data.aws_caller_identity.current.account_id}:cluster/${local.cluster_name}",
      "arn:aws:eks:${var.region}:${data.aws_caller_identity.current.account_id}:nodegroup/${local.cluster_name}/*/*",
      "arn:aws:eks:${var.region}:${data.aws_caller_identity.current.account_id}:addon/${local.cluster_name}/*/*",
      "arn:aws:eks:${var.region}:${data.aws_caller_identity.current.account_id}:identityproviderconfig/${local.cluster_name}/*/*/*",
      "arn:aws:eks:${var.region}:${data.aws_caller_identity.current.account_id}:fargateprofile/${local.cluster_name}/*/*"
    ]
  }

  # These actions don't use the 'cluster' resource type
  statement {
    sid = "Modify"

    actions = [
      "eks:RegisterCluster",
      "eks:DisassociateIdentityProviderConfig"
    ]

    resources = [
      "*",
    ]
  }

  statement {
    sid    = "Deny"
    effect = "Deny"

    # No major updates allowed in this example
    actions = [
      "eks:CreateCluster",
      "eks:DeleteCluster"
    ]

    resources = [
      "*"
    ]
  }

For a complete list of Amazon EKS actions see the original documentation.

Amazon EKS resources ARN

Use these various ARN patterns to constrain permissions for certain teams.

  • cluster: arn:aws:eks:${Region}:${Account}:cluster/${ClusterName}
  • nodegroup: arn:aws:eks:${Region}:${Account}:nodegroup/${ClusterName}/${NodegroupName}/${UUID}
  • addon: arn:aws:eks:${Region}:${Account}:addon/${ClusterName}/${AddonName}/${UUID}
  • fargateprofile: arn:aws:eks:${Region}:${Account}:fargateprofile/${ClusterName}/${FargateProfileName}/${UUID}
  • identityproviderconfig: arn:aws:eks:${Region}:${Account}:identityproviderconfig/${ClusterName}/${IdentityProviderType}/${IdentityProviderConfigName}/${UUID}

EKS resource ARN patterns

EKS IAM Condition Keys

In addition to filtering EKS permissions with the resource name(s), you can further filter with condition keys.

  • aws:RequestTag/${TagKey} (String): ensures the given tags are present in the “request”/create call, basically before creating new resources.
  • aws:ResourceTag/${TagKey} (String): matches resources that already carry the tags; it’s for existing resources with tags.
  • aws:TagKeys (ArrayOfString): similar to aws:RequestTag/${TagKey}, but it’s a list of tag keys instead of just one.
  • eks:clientId (String): the “clientId” value in the associateIdentityProviderConfig call.
  • eks:issuerUrl (String): the “issuerUrl” value in the associateIdentityProviderConfig call.

For example, the following statement only allows creating or tagging EKS clusters when the request carries the expected “environment” and “jobfunction” tags.

{
    "Sid": "TagEKSWithTheseTags",
    "Effect": "Allow",
    "Action": [
        "eks:CreateCluster",
        "eks:TagResource"
    ],
    "Resource": "*",
    "Condition": {
        "StringEqualsIfExists": {
            "aws:RequestTag/environment": [
                "development",
                "sandbox"
            ],
            "aws:RequestTag/jobfunction": "DevOps"
        },
        "ForAllValues:StringEquals": {
            "aws:TagKeys": [
                "environment",
                "jobfunction"
            ]
        }
    }
}
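
A quick way to exercise the TagResource half of this statement (a sketch, using the cluster from earlier; the tag keys and values match the condition):

aws eks tag-resource \
    --resource-arn arn:aws:eks:us-east-2:1234567890:cluster/aws001-preprod-dev-eks \
    --tags environment=development,jobfunction=DevOps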

Other IAM policies

EKS Console Admin policy: This permission allows full read and write access to the Configuration tab of the EKS console. The Resources and Overview tabs require Kubernetes RBAC permissions.

data "aws_iam_policy_document" "console_admin" {

  statement {
    sid = "admin"

    actions = [
      "eks:*"
    ]

    resources = [
      "*"
    ]
  }

  statement {
    sid = "console"

    effect = "Allow"
    actions = [
      "iam:PassRole"
    ]

    resources = [
      "*"
    ]

    condition {
      test     = "StringEquals"
      variable = "iam:PassedToService"
      values   = ["eks.amazonaws.com"]
    }
  }
}

Update a Kubernetes cluster version: This policy only allows updating the Kubernetes cluster version. In the Terraform example below, updating the cluster version is only allowed when the EKS cluster has a tag of “environment” with a value of “sandbox”; the EKS cluster that’s in the current account and region, of course.

data "aws_iam_policy_document" "cluster_version" {

  statement {
    sid = "admin"

    actions = [
      "eks:UpdateClusterVersion"
    ]

    resources = [
      "arn:aws:eks:${var.region}:${data.aws_caller_identity.current.account_id}:cluster/*"
    ]

    condition {
      test     = "StringEquals"
      variable = "aws:ResourceTag/environment"
      values   = ["sandbox"]
    }
  }
}
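
With this policy attached, an update call should only succeed against a sandbox-tagged cluster (a sketch; the target version is an assumption):

aws eks update-cluster-version --name aws001-preprod-dev-eks --kubernetes-version 1.22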

EKS Service-linked roles

The following table shows all the service-linked roles that are automatically created when you create the cluster and its components.

  • EKS Cluster role: AWSServiceRoleForAmazonEKS, used by eks.amazonaws.com; IAM policy: AmazonEKSServiceRolePolicy (arn:aws:iam::aws:policy/aws-service-role/AmazonEKSServiceRolePolicy)
  • EKS node groups: AWSServiceRoleForAmazonEKSNodegroup, used by eks-nodegroup.amazonaws.com; IAM policy: AWSServiceRoleForAmazonEKSNodegroup (arn:aws:iam::aws:policy/aws-service-role/AWSServiceRoleForAmazonEKSNodegroup)
  • EKS Fargate profiles: AWSServiceRoleForAmazonEKSForFargate, used by eks-fargate.amazonaws.com; IAM policy: AmazonEKSForFargateServiceRolePolicy (arn:aws:iam::aws:policy/aws-service-role/AmazonEKSForFargateServiceRolePolicy)
  • EKS Connector: AWSServiceRoleForAmazonEKSConnector, used by eks-connector.amazonaws.com; IAM policy: AmazonEKSConnectorServiceRolePolicy (arn:aws:iam::aws:policy/aws-service-role/AmazonEKSConnectorServiceRolePolicy)

EKS service-linked roles

EKS IAM Roles

Amazon EKS Cluster Role

The AmazonEKSClusterPolicy must be attached to your EKS cluster role before you create your cluster.

resource "aws_iam_role" "eks_cluster_role" {
  name = "eks-cluster-role"
  tags = local.required_tags

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "eks_cluster_role" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks_cluster_role.name
}

Amazon EKS node IAM role

Each node (an EC2 instance, for example) uses IAM roles to make AWS API calls. Before you can create and register nodes to the EKS cluster, they must have an IAM role with the following policies attached: AmazonEKSWorkerNodePolicy and AmazonEC2ContainerRegistryReadOnly. We’ll add the AmazonEKS_CNI_Policy later.

locals {
  eks_node_policies = ["AmazonEC2ContainerRegistryReadOnly", "AmazonEKSWorkerNodePolicy"]
}

resource "aws_iam_role" "eks_node_role" {
  name = "eks-node-role"
  tags = local.required_tags

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "eks_node_role" {
  for_each = toset(local.eks_node_policies)

  policy_arn = "arn:aws:iam::aws:policy/${each.value}"
  role       = aws_iam_role.eks_node_role.name
}

Amazon EKS CNI Policy

You can attach the AmazonEKS_CNI_Policy to the node role above (that policy applies if you’re using IPv4). However, you should follow the least privilege model to protect your nodes as much as possible, so we’ll create an IAM role for the Kubernetes service account, or IRSA. There are multiple steps to create it, but we’re in luck: there’s an IRSA Terraform module built by the AWS open source community, and another IRSA Terraform module maintained by the community.

locals {
  addon_context = {
    aws_caller_identity_account_id = data.aws_caller_identity.current.account_id
    aws_caller_identity_arn        = data.aws_caller_identity.current.arn
    aws_eks_cluster_endpoint       = data.aws_eks_cluster.eks_cluster.endpoint
    aws_partition_id               = data.aws_partition.current.partition
    aws_region_name                = data.aws_region.current.name
    eks_oidc_issuer_url            = local.eks_oidc_issuer_url
    eks_cluster_id                 = aws_eks_cluster.this.id
    eks_oidc_provider_arn          = "arn:${data.aws_partition.current.partition}:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${local.eks_oidc_issuer_url}"
    tags                           = local.required_tags
  }
}

module "vpc_cni_irsa" {
  source = "git@github.com:aws-ia/terraform-aws-eks-blueprints.git//modules/irsa?ref=v4.2.1"

  kubernetes_namespace              = "kube-system"
  kubernetes_service_account        = "aws-node"
  create_kubernetes_namespace       = false
  create_kubernetes_service_account = false
  irsa_iam_policies                 = ["arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"]
  addon_context                     = local.addon_context
}

  • EKS cluster role: eks.amazonaws.com; IAM policy: AmazonEKSClusterPolicy (arn:aws:iam::aws:policy/AmazonEKSClusterPolicy)
  • EKS node role: ec2.amazonaws.com; IAM policies: AmazonEKSWorkerNodePolicy (arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy) and AmazonEC2ContainerRegistryReadOnly (arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly)
  • EKS Fargate profiles: eks-fargate.amazonaws.com; IAM policy: AmazonEKSForFargateServiceRolePolicy (arn:aws:iam::aws:policy/aws-service-role/AmazonEKSForFargateServiceRolePolicy)
  • EKS Connector: eks-connector.amazonaws.com; IAM policy: AmazonEKSConnectorServiceRolePolicy (arn:aws:iam::aws:policy/aws-service-role/AmazonEKSConnectorServiceRolePolicy)

EKS IAM roles and their policies

EKS Fargate profiles

We cannot use the node IAM role for EKS Fargate profiles; we have to create a pod execution IAM role. Kubernetes Role-Based Access Control (RBAC) will use this pod execution IAM role for authorization to AWS services, for example to pull an image from Amazon Elastic Container Registry (ECR). The code below creates the Amazon EKS pod execution IAM role with the required policy and trust settings.

resource "aws_iam_role" "eks_pod_exe_role" {
  name = "eks-fargate-pod-execution-role"
  tags = local.required_tags

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks-fargate-pods.amazonaws.com"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
         "ArnLike": {
            "aws:SourceArn": "arn:aws:eks:${var.region}:${data.aws_caller_identity.current.account_id}:fargateprofile/${local.cluster_name}/*"
         }
      }
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "eks_pod_exe_role" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSFargatePodExecutionRolePolicy"
  role       = aws_iam_role.eks_pod_exe_role.name
}
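
With the role in place, a Fargate profile can reference it. A rough CLI sketch (the profile name and namespace selector are assumptions; cluster and role names come from the examples above):

aws eks create-fargate-profile \
    --fargate-profile-name default-profile \
    --cluster-name aws001-preprod-dev-eks \
    --pod-execution-role-arn arn:aws:iam::1234567890:role/eks-fargate-pod-execution-role \
    --selectors namespace=default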

EKS Connector

This is a read-only feature to view the Kubernetes clusters you run in other cloud providers, on premises, or on your own EC2 instances. It also needs a different IAM role.

# #########################################
# EKS Connector
# #########################################

data "aws_iam_policy_document" "connector" {

  statement {
    sid = "SsmControlChannel"

    actions = [
      "ssmmessages:CreateControlChannel"
    ]

    resources = [
      "arn:aws:eks:*:*:cluster/*"
    ]
  }

  statement {
    sid = "ssmDataplaneOperations"

    actions = [
      "ssmmessages:CreateDataChannel",
      "ssmmessages:OpenDataChannel",
      "ssmmessages:OpenControlChannel"
    ]

    resources = ["*"]
  }
}

resource "aws_iam_policy" "connector" {
  name   = "eks-connector"
  path   = "/"
  policy = data.aws_iam_policy_document.connector.json

  tags = {
    "Name" = "eks-connector"
  }
}
resource "aws_iam_role" "eks_connector_role" {
  name = "eks-connector-role"
  tags = local.required_tags

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ssm.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "eks_connector_role" {
  policy_arn = aws_iam_policy.connector.arn
  role       = aws_iam_role.eks_connector_role.name
}
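
Once the connector role exists, registering an external cluster looks roughly like this (the cluster name and provider value are assumptions; check the EKS Connector documentation for the valid providers):

aws eks register-cluster \
    --name my-external-cluster \
    --connector-config provider=OTHER,roleArn=arn:aws:iam::1234567890:role/eks-connector-role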

To learn more about AWS managed policies, see https://docs.aws.amazon.com/eks/latest/userguide/security-iam-awsmanpol.html

To see all the code in Terraform, visit the GitHub repo.

If you don’t know how Terraform works, then jump to the Intro to Terraform guide first.

two spoons

Terraform AWS KMS Multi-Region Keys

Terraform just (November 2021) released the resource to create replica KMS keys! As the name says, a Multi-Region Key is a single key that’s available in two different AWS regions. There are a few use cases, such as reducing the cost of keys. An even better case is the ability to share encrypted objects like AMIs with other regions or accounts. Before I start showing the Terraform AWS KMS Multi-Region Keys module, you have to know what AWS KMS is. Check out my previous posts, AWS Key management service (KMS) – Part 1 and AWS KMS Customer Managed CMK with Terraform.

Terraform AWS KMS Multi-Region Keys Module code

We’ll need another “aws” provider. The second provider is for your replica key, and its region must be different from the first provider’s.

provider "aws" {
  alias  = "replica"
  region = var.replica_region
}

The primary key will still use the original “aws_kms_key” Terraform resource. I just added additional tags. Don’t forget the key alias!

resource "aws_kms_key" "primary" {
  multi_region             = true
  description              = var.description
  customer_master_key_spec = var.key_spec
  is_enabled               = var.is_enabled
  enable_key_rotation      = var.rotation_enabled
  policy                   = var.primary_key_policy
  deletion_window_in_days  = var.deletion_window_in_days

  tags = merge(
    var.tags,
    {
      "Multi-Region" = "true",
      "Primary"      = "true"
    }
  )
}

# Add an alias to the primary key
resource "aws_kms_alias" "primary" {
  name          = "alias/${var.alias}"
  target_key_id = aws_kms_key.primary.key_id
}

Here comes the boom! The “aws_kms_replica_key” Terraform resource replicates the key that was just created with the resource above. That’s done with the “primary_key_arn” parameter, which is set to the key ARN of the primary key.

Notice the “provider” is required in order to ensure the replica is created in another region. You can reverse this design and put the provider on your primary key instead, but this is my preference.

You can have a different or the same key policy. The alias, tags, description, and deletion_window_in_days can be the same or different; it doesn’t matter. The replica is “enabled”, and there’s no option to rotate a replica key because rotation is managed by the primary key.

# Create the replica key using the primary's arn.
resource "aws_kms_replica_key" "replica" {
  provider = aws.replica

  description             = var.description
  deletion_window_in_days = var.deletion_window_in_days
  primary_key_arn         = aws_kms_key.primary.arn
  policy                  = var.replica_key_policy

  tags = merge(
    var.tags,
    {
      "Multi-Region" = "true",
      "Primary"      = "false"
    }
  )
}

# Add an alias to the replica key
resource "aws_kms_alias" "replica" {
  provider = aws.replica

  name          = "alias/${var.alias}"
  target_key_id = aws_kms_replica_key.replica.key_id
}

Module usage

Here’s an example of how to use this module.

data "aws_iam_policy_document" "ebs_key" {
  statement {
    sid       = "Enable IAM User Permissions"
    effect    = "Allow"
    actions   = ["kms:*"]
    resources = ["*"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${local.account_id}:root"]
    }
  }

  statement {
    sid    = "Allow access for Key Administrators"
    effect = "Allow"
    actions = [
      "kms:Create*",
      "kms:Describe*",
      "kms:Enable*",
      "kms:List*",
      "kms:Put*",
      "kms:Update*",
      "kms:Revoke*",
      "kms:Disable*",
      "kms:Get*",
      "kms:Delete*",
      "kms:TagResource",
      "kms:UntagResource",
      "kms:ScheduleKeyDeletion",
      "kms:CancelKeyDeletion"
    ]
    resources = ["*"]

    principals {
      type = "AWS"
      identifiers = [
        "arn:aws:iam::${local.account_id}:user/${local.admin_username}",
        "arn:aws:iam::${local.account_id}:role/${local.role_name}"
      ]
    }
  }

  statement {
    sid    = "Allow use of the key"
    effect = "Allow"
    actions = [
      "kms:Encrypt",
      "kms:Decrypt",
      "kms:ReEncrypt*",
      "kms:GenerateDataKey*",
      "kms:DescribeKey"
    ]
    resources = ["*"]

    principals {
      type = "AWS"
      identifiers = [
        "arn:aws:iam::${local.account_id}:user/${local.admin_username}",
        "arn:aws:iam::${local.account_id}:role/${local.role_name}"
      ]
    }
  }

  statement {
    sid    = "Allow attachment of persistent resources"
    effect = "Allow"
    actions = [
      "kms:CreateGrant",
      "kms:ListGrants",
      "kms:RevokeGrant"
    ]
    resources = ["*"]

    principals {
      type = "AWS"
      identifiers = [
        "arn:aws:iam::${local.account_id}:user/${local.admin_username}",
        "arn:aws:iam::${local.account_id}:role/${local.role_name}"
      ]
    }

    condition {
      test     = "Bool"
      variable = "kms:GrantIsForAWSResource"
      values   = ["true"]
    }
  }
}

module "ebs_key" {
  source = "git@github.com:masterwali/terraform-kms-multi-region-module.git"

  description        = "KMS key for EBS volumes."
  alias              = "multi-region-ebs"
  primary_key_policy = data.aws_iam_policy_document.ebs_key.json
  replica_key_policy = data.aws_iam_policy_document.ebs_key.json
  replica_region     = "us-west-2"

  tags = {
    Name  = "multi-region-ebs"
    Owner = "Waleed"
  }
}

Here’s my applied code. I set the EBS default encryption to use the multi-region-ebs key that I created using the module. Notice multi-Region key IDs start with “mrk”, for Multi-Region Key.

(Screenshot: a volume resource using a multi-Region KMS key)
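
A quick way to confirm a key is multi-Region from the CLI (a sketch; run it against each region with the alias defined above):

aws kms describe-key --key-id alias/multi-region-ebs \
    --query 'KeyMetadata.[KeyId,MultiRegion]' --output text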

Regions supported

Multi-Region keys are supported in all AWS Regions where AWS KMS is available.

Cost

Every pair, primary and replica, is priced as a single key! But the KMS quotas are still counted separately.

Complete Code

Here’s what you came for, https://github.com/masterwali/terraform-kms-multi-region-module. Learn more about AWS KMS Multi-Region Keys.



Export AWS Security Groups & rules to CSV

As of this month, October 2021, there’s a super easy way to export AWS security groups & rules to CSV! Yes, finally, go ahead and jump up and down! Now let’s settle down; but seriously, we have been waiting for a way to export one or more security groups to CSV, and to export just the security group rules to CSV. This solution is just a click away!

Export all security groups to CSV

  1. Login to the AWS console, navigate to the EC2 service
  2. Select Security Groups
  3. Select the topmost square checkbox
  4. You’ll see a big dropdown button that says, “Export Security Groups to CSV“, simply click it!
  5. Done, your CSV will automatically download.
(Screenshot: all selected security groups exported to CSV, shown in spreadsheet format)

Export one security group to CSV

  1. Login to the AWS console, navigate to the EC2 service
  2. Select Security Groups
  3. Select the square checkbox just for one or more security groups
  4. You’ll see a big dropdown button that says, “Export Security Groups to CSV“, simply click it!
  5. Done, your CSV will automatically download.
(Screenshot: a single security group exported to CSV)

Export all inbound and outbound rules only to CSV

  1. Login to the AWS console, navigate to the EC2 service
  2. Select Security Groups
  3. Select the square checkbox just for one or more security groups
  4. You’ll see a big dropdown button that says, “Export Security Groups to CSV”; this time click the triangle to view more options and select “Export security groups inbound/outbound rules to CSV”.
  5. Done, your CSV will automatically download.
(Screenshots: exporting inbound/outbound rules, and the resulting CSV)

Pro tip: The first two options do include the tags in the export!
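
If you ever need something similar outside the console, a rough AWS CLI approximation (a sketch; the columns are an arbitrary pick):

aws ec2 describe-security-groups \
    --query 'SecurityGroups[*].[GroupId,GroupName,Description,VpcId]' \
    --output text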

Interested in AWS infrastructure automation? Check out my previous blogs on Terraform and Terragrunt!


Get started with EC2 Image Builder in Terraform

I can safely assume a lot of engineers already know of HashiCorp’s Packer utility. Packer is simply an automated virtual machine image template maker; it can create images for all the major cloud providers, such as Amazon Machine Images (AMIs) in AWS or virtual machine images in Azure. Not too long ago, AWS released its own automated image builder, called EC2 Image Builder! In this get started with EC2 Image Builder in Terraform post I will show you how to quickly put together your Terraform code to create a simple AMI.

Tools

You will need Terraform, and if you are deploying this exact code then you’ll need Terragrunt too. Here are the Setup infrastructure as code environment instructions, and if you’re new to Terragrunt then check out the Intro to Terragrunt and Terraform post. I also suggest installing pre-commit.

EC2 Image Builder Cost

This service doesn’t cost anything, but the various resources created by it could cost you. For example, you have to select an EC2 instance type to run for the duration of the AMI creation; it will terminate the EC2 instance once the job is completed. Also, AMIs are backed by EBS snapshots, hint: cost of storage. You get the point, let’s continue!

Permissions

You will need full permissions on the EC2 Image Builder service.

"imagebuilder:*"

Now the EC2 Image Builder IAM role will need at least the block below. Here I’m creating the policy and role with Terraform. You may need more or less; adjust accordingly!

data "aws_iam_policy_document" "image_builder" {
  statement {
    effect = "Allow"
    actions = [
      "ssm:DescribeAssociation",
      "ssm:GetDeployablePatchSnapshotForInstance",
      "ssm:GetDocument",
      "ssm:DescribeDocument",
      "ssm:GetManifest",
      "ssm:GetParameter",
      "ssm:GetParameters",
      "ssm:ListAssociations",
      "ssm:ListInstanceAssociations",
      "ssm:PutInventory",
      "ssm:PutComplianceItems",
      "ssm:PutConfigurePackageResult",
      "ssm:UpdateAssociationStatus",
      "ssm:UpdateInstanceAssociationStatus",
      "ssm:UpdateInstanceInformation",
      "ssmmessages:CreateControlChannel",
      "ssmmessages:CreateDataChannel",
      "ssmmessages:OpenControlChannel",
      "ssmmessages:OpenDataChannel",
      "ec2messages:AcknowledgeMessage",
      "ec2messages:DeleteMessage",
      "ec2messages:FailMessage",
      "ec2messages:GetEndpoint",
      "ec2messages:GetMessages",
      "ec2messages:SendReply",
      "imagebuilder:GetComponent",

    ]
    resources = ["*"]
  }

  statement {
    effect = "Allow"
    actions = [
      "s3:List",
      "s3:GetObject"
    ]
    resources = ["*"]
  }

  statement {
    effect = "Allow"
    actions = [
      "s3:PutObject"
    ]
    resources = ["arn:aws:s3:::${var.aws_s3_log_bucket}/image-builder/*"]
  }

  statement {
    effect = "Allow"
    actions = [
      "logs:CreateLogStream",
      "logs:CreateLogGroup",
      "logs:PutLogEvents"
    ]
    resources = ["arn:aws:logs:*:*:log-group:/aws/imagebuilder/*"]
  }

  statement {
    effect = "Allow"
    actions = [
      "kms:Decrypt"
    ]
    resources = ["*"]
    condition {
      test     = "ForAnyValue:StringEquals"
      variable = "kms:EncryptionContextKeys"

      values = [
        "aws:imagebuilder:arn"
      ]
    }

    condition {
      test     = "ForAnyValue:StringEquals"
      variable = "aws:CalledVia"

      values = [
        "imagebuilder.amazonaws.com"
      ]
    }
  }
}

EC2 Image Builder features

It has automated pipelines! You can set it to build your AMI on a schedule or on-demand. It has a package installer and security components. You can share the AMI across multiple AWS accounts. You can read more about its features at EC2 Image Builder Features.

EC2 Image Builder Pipeline

I’m setting this pipeline to run every Tuesday morning at 8 AM. The pipeline will trigger on that schedule only if there are any dependency updates available. I have enabled testing of the image and set a timeout of 60 minutes.

resource "aws_imagebuilder_image_pipeline" "this" {
  image_recipe_arn                 = aws_imagebuilder_image_recipe.this.arn
  infrastructure_configuration_arn = aws_imagebuilder_infrastructure_configuration.this.arn
  name                             = "amazon-linux-baseline"
  status                           = "ENABLED"
  description                      = "Creates an Amazon Linux 2 image."

  schedule {
    schedule_expression = "cron(0 8 ? * tue)"
    # This cron expression means every Tuesday at 8 AM.
    pipeline_execution_start_condition = "EXPRESSION_MATCH_AND_DEPENDENCY_UPDATES_AVAILABLE"
  }

  # Test the image after build
  image_tests_configuration {
    image_tests_enabled = true
    timeout_minutes     = 60
  }

  tags = {
    "Name" = "${var.ami_name_tag}-pipeline"
  }
}
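
Besides the schedule, you can also kick off the pipeline on demand (a sketch; fill in your own pipeline ARN):

aws imagebuilder start-image-pipeline-execution \
    --image-pipeline-arn arn:aws:imagebuilder:<aws-region>:<account-id>:image-pipeline/amazon-linux-baseline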

EC2 Image Builder Recipe

In the image recipe, I’m defining the AMI’s volume size and type, and the components. For this simple example, I’m only installing the CloudWatch agent on the AMI.

resource "aws_imagebuilder_image" "this" {
  distribution_configuration_arn   = aws_imagebuilder_distribution_configuration.this.arn
  image_recipe_arn                 = aws_imagebuilder_image_recipe.this.arn
  infrastructure_configuration_arn = aws_imagebuilder_infrastructure_configuration.this.arn

  depends_on = [
    data.aws_iam_policy_document.image_builder
  ]
}

resource "aws_imagebuilder_image_recipe" "this" {
  block_device_mapping {
    device_name = "/dev/xvdb"

    ebs {
      delete_on_termination = true
      volume_size           = var.ebs_root_vol_size
      volume_type           = "gp3"
    }
  }

  component {
    component_arn = aws_imagebuilder_component.cw_agent.arn
  }

  name         = "amazon-linux-recipe"
  parent_image = "arn:${data.aws_partition.current.partition}:imagebuilder:${data.aws_region.current.name}:aws:image/amazon-linux-2-x86/x.x.x"
  version      = var.image_receipe_version
}

resource "aws_s3_bucket_object" "cw_agent_upload" {
  bucket = var.aws_s3_bucket_object
  key    = "/files/amazon-cloudwatch-agent-linux.yml"
  source = "${path.module}/files/amazon-cloudwatch-agent-linux.yml"
  # If the md5 hash is different it will re-upload
  etag = filemd5("${path.module}/files/amazon-cloudwatch-agent-linux.yml")
}

data "aws_kms_key" "image_builder" {
  key_id = "alias/image-builder"
}

# Amazon Cloudwatch agent component
resource "aws_imagebuilder_component" "cw_agent" {
  name       = "amazon-cloudwatch-agent-linux"
  platform   = "Linux"
  uri        = "s3://${var.aws_s3_bucket_object}/files/amazon-cloudwatch-agent-linux.yml"
  version    = "1.0.0"
  kms_key_id = data.aws_kms_key.image_builder.arn

  depends_on = [
    aws_s3_bucket_object.cw_agent_upload
  ]
}

EC2 Image Builder Infrastructure Configuration

Select the EC2 instance type, the IAM role, security group, subnet, logging bucket, and much more in the infrastructure configuration resource.

resource "aws_imagebuilder_infrastructure_configuration" "this" {
  description           = "Simple infrastructure configuration"
  instance_profile_name = var.ec2_iam_role_name
  instance_types        = ["t2.micro"]
  key_pair              = var.aws_key_pair_name
  name                  = "amazon-linux-infr"
  security_group_ids    = [data.aws_security_group.this.id]

  subnet_id                     = data.aws_subnet.this.id
  terminate_instance_on_failure = true

  logging {
    s3_logs {
      s3_bucket_name = var.aws_s3_log_bucket
      s3_key_prefix  = "image-builder"
    }
  }

  tags = {
    Name = "amazon-linux-infr"
  }
}

EC2 Image Builder Distribution Configuration

Here you can choose to share this AMI with other accounts or keep it just for this account. You can tag the AMI in this resource too.

resource "aws_imagebuilder_distribution_configuration" "this" {
  name = "local-distribution"

  distribution {
    ami_distribution_configuration {
      ami_tags = {
        Project = "IT"
      }

      name = "amzn-linux-{{ imagebuilder:buildDate }}"

      launch_permission {
        user_ids = ["123456789012"]
      }
    }
    region = var.aws_region
  }
}

Test the AMI

Since I have enabled testing of the AMI, EC2 Image Builder will create an instance from the AMI automatically.

(Screenshots: the build and test instances, and the resulting AMI)

Complete Code

A lot of other files and code aren’t shown here. You can find a complete working example at https://github.com/masterwali/ec2-image-builder



AWS Three-Tier VPC with ALB in Terraform

This AWS Three-Tier VPC with ALB in Terraform post is the second part of AWS Three-Tier VPC network with Terraform. In the first post I created many of the VPC components, such as the VPC, app subnets, web subnets, data subnets, route tables for each subnet, internet and NAT gateways, NACLs for each subnet, and a generic security group. In this post I’ll reveal the Terraform code for creating an Elastic Load Balancer, specifically the Application Load Balancer (ALB). The ALB will require a listener, so I’ll add that too. For simplicity, the listener will monitor the non-secure port 80. Remember to use certificates and port 443 in production!

Cost

Charges may occur on various resources created using this module. The load balancers have a low hourly price.

The design

Here’s the original diagram from the previous post.

(Diagram: the three-tier VPC design from the previous post)

ALB (Application Load Balancer)

Here’s the ALB diagram. The ALB does a health check on port 80 on every instance in the target group and only routes traffic to the healthy ones. In this drawing there’s only one web server; your production environment should have at least two, in different availability zones.

(Diagram: the ALB)

ALB Terraform Code

# ALB for the web servers
resource "aws_lb" "web_servers" {
  name               = format("%s-alb", var.vpc_name)
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.web.id]
  subnets            = aws_subnet.public.*.id
  enable_http2       = false
  enable_deletion_protection = true

  tags = {
    Name = format("%s-alb", var.vpc_name)
  }
}

This is an example of an external ALB; note the “internal = false” argument. The load balancer type is what makes this an application load balancer; the other options are network or gateway. Since this is an ALB, it does require a security group. This external load balancer must be in public subnets so outside users can reach the web servers, which are hosted in private subnets. That’s one of the design features that makes this a three-tier VPC! See the Terraform documentation for more options.

Target Groups

Let’s make the ALB useful by creating a target group and a load balancer listener.

# Target group for the web servers
resource "aws_lb_target_group" "web_servers" {
  name     = "sharepoint-web-servers-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = aws_vpc.this.id
}

resource "aws_lb_listener" "front_end" {
  load_balancer_arn = aws_lb.web_servers.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.web_servers.arn
  }
}

An empty target group is not useful either. From now on, when you launch a new EC2 web server, be sure to add it to the target group like this.

# Find the target group
data "aws_lb_target_group" "web_servers" {
  name = "sharepoint-web-servers-tg"
}

# Attach an EC2 instance to the target group on port 80
resource "aws_lb_target_group_attachment" "web" {
  target_group_arn = data.aws_lb_target_group.web_servers.arn
  target_id        = aws_instance.web.id
  port             = 80
}
(Screenshot: the EC2 instance registered in the target group)

Notice the URL is using the ALB’s DNS record to reach the Nginx web server.

(Screenshot: the Nginx website using the ALB URL)
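
To grab the ALB’s DNS name and test it from the command line (a sketch; the load balancer name comes from the format() expression above):

alb_dns=$(aws elbv2 describe-load-balancers --names <vpc-name>-alb \
    --query 'LoadBalancers[0].DNSName' --output text)
curl -I "http://${alb_dns}"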

Here’s the complete terraform module code: https://github.com/masterwali/tf-module-aws-three-tier-network-vpc

Here’s an example of how to call or reference the module.

module "sharepoint_network" {
  source = "git@github.com:masterwali/tf-module-aws-three-tier-network-vpc.git"
  
  aws_cli_profile = var.aws_cli_profile
  additional_tags = var.additional_tags
}


Full disclosure: I am an AWS employee; this post is my own opinion.


AWS Three-Tier VPC network with Terraform

A three-tier network is an enterprise architecture that delivers the best performance and security to end users. Each component of the design is separated into tiers. As a reminder, a typical three-tier network consists of a website, then the application, then the database, from an end-user perspective. Not every website automatically works like that: the developers and engineers have to build the web application by separating the user interface from the logic and data. An AWS three-tier VPC network is not too difficult to build in the cloud, either. In this post, I’ll be using Terraform and Terragrunt to build and deploy an AWS three-tier VPC network using, of course, a VPC, subnets, route tables, network access control lists (NACLs), and a few other VPC parts. Next, I’ll share how to create the AWS Application Load Balancer (ALB) and the target groups with health checks in another post.

The design

This will be a simple start and in future posts, I’ll add more details. This AWS three-tier VPC network module will create a VPC, subnets, Network Access Control Lists (NACLs), Internet Gateway, NAT Gateways, route tables, Elastic IPs, and few other resources using Terraform and I’ll deploy it with Terragrunt.

Notice there are two NAT gateways; this provides high availability and fault tolerance. If the NAT gateway in availability zone (AZ) A fails or gets corrupted, the EC2 instances in AZ B will still be able to function as expected. It’s a little bit more costly… it all depends on your requirements.

The Terraform Module

This module will be generic so I can reuse the three-tier VPC network over and over again. Creating a module keeps my main code small, and therefore a lot cleaner to view and understand.

What’s Terraform and Terragrunt? Visit my Intro to Terragrunt and Terraform post first, then come back here! You can name your module anything you like; I named mine three-tier-vpc. Reminder: in Terraform we can use one or more .tf files to build a module. I’ll be separating this module into a few different Terraform files just for organizational purposes. Here’s my structure.

aws/tf-modules/three-tier-vpc
               ├── README.md
               ├── gateways.tf
               ├── main.tf
               ├── nacls.tf
               ├── outputs.tf
               ├── routes.tf
               ├── sec-grps.tf
               └── vars.tf

AWS VPC

Let’s start with the main.tf file which contains the VPC resource. The VPC CIDR will be a variable so we can plugin any CIDR during deployment.

# Create the VPC
resource "aws_vpc" "this" {
  cidr_block           = var.vpc_cidr
  instance_tenancy     = "default"
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = merge(
    var.additional_tags,
    {
      Name = "${var.vpc_name}-vpc"
    }
  )
}

In my module I’m giving vpc_cidr a default value, but you’re not required to do so. This default won’t hurt because if the network range is already taken, terraform apply will fail cleanly.

# vars.tf
variable "vpc_cidr" {
  type    = string
  default = "10.0.0.0/16"
}

The public subnet CIDR blocks will be a variable too; never hardcode values in modules that could or should be changed. The public_subnet_cidrs variable is a list of one or many CIDR blocks. If we provide one string value in the list, it will create one public subnet. If we provide four CIDR blocks, it will create four, how sweet is that!

# Create the public subnets
resource "aws_subnet" "public" {
  count = length(var.public_subnet_cidrs)

  vpc_id                  = aws_vpc.this.id
  cidr_block              = var.public_subnet_cidrs[count.index]
  availability_zone       = "${var.aws_region}${var.zones[count.index]}"
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.vpc_name}-public-subnet-${var.zones[count.index]}"
  }
}

The map_public_ip_on_launch attribute set to true is one of the configurations that makes this a public subnet; just because we named it public doesn’t make it public. Next are the private subnets. Note that map_public_ip_on_launch is now set to false.

# Create the web subnets
resource "aws_subnet" "web" {
  count = length(var.web_subnet_cidrs)

  vpc_id                  = aws_vpc.this.id
  cidr_block              = var.web_subnet_cidrs[count.index]
  availability_zone       = "${var.aws_region}${var.zones[count.index]}"
  map_public_ip_on_launch = false

  tags = {
    Name = "${var.vpc_name}-web-subnet-${var.zones[count.index]}"
  }
}

The rest of the main.tf file contains the resources for the data and the app subnets; these subnets are non-public too.

VPC Gateways

In this three-tier VPC network architecture, we have only a few public subnets (meaning the instances launched in the public subnets get public IPs, which makes them routable without a NAT gateway). The public subnets need an internet gateway to complete the design. The private or non-public subnets need a NAT gateway to allow instances with just private IPs to communicate with the internet (AKA outbound internet access). NAT gateways need 1) a public IP, and 2) to be placed in a public subnet, for each availability zone you are using!

# Add internet gateway
resource "aws_internet_gateway" "this" {
  vpc_id = aws_vpc.this.id

  tags = {
    Name = "${var.vpc_name}-internet-gateway"
  }
}

# Charges may occur

# Reserve EIPs
resource "aws_eip" "nat_a" {
  vpc = true

  tags = {
    Name = "${var.vpc_name}-eip-nat-a"
  }

}

# NAT Gateway in AZ A
resource "aws_nat_gateway" "zone_a" {
  allocation_id = aws_eip.nat_a.id
  subnet_id     = aws_subnet.public[0].id

  tags = {
    Name = "${var.vpc_name}-nat-gateway-aza"
  }

  depends_on = [
    aws_subnet.public
  ]
}

# Reserve EIPs
resource "aws_eip" "nat_b" {
  vpc = true

  tags = {
    Name = "${var.vpc_name}-eip-nat-b"
  }

}

# NAT Gateway in AZ B
resource "aws_nat_gateway" "zone_b" {
  allocation_id = aws_eip.nat_b.id
  subnet_id     = aws_subnet.public[1].id

  tags = {
    Name = "${var.vpc_name}-nat-gateway-azb"
  }

  depends_on = [
    aws_subnet.public
  ]
}

Note one of the comments in the code: EIPs and the gateways may cost you for the duration of their existence.

VPC Routes

Every time a VPC is created, the main route table is automatically provisioned too. I’m going to tag it and create my own route tables for this module. Do note I have some hardcoded values in this module: I have decided this will be for two AZs and no more, so I hardcoded the index values of zero and one in the resources below.

# Tag the main route table
resource "aws_ec2_tag" "main_route_table" {
  resource_id = aws_vpc.this.main_route_table_id
  key         = "Name"
  value       = "${var.vpc_name}-main-route-table"
}

# Create route table for the public subnets
# Uses IG
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.this.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.this.id
  }

  tags = {
    Name = "${var.vpc_name}-public-route-table"
  }

  depends_on = [
    aws_internet_gateway.this
  ]
}

# Associate the public subnets with the public route table
resource "aws_route_table_association" "public" {
  count = length(var.public_subnet_cidrs)

  subnet_id      = element(aws_subnet.public.*.id, count.index)
  route_table_id = aws_route_table.public.id
}

# Create a route table for the web and app subnets in AZ A
# Uses NAT gateway in AZ A
resource "aws_route_table" "private_aza" {
  vpc_id = aws_vpc.this.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.zone_a.id
  }

  tags = {
    Name = "${var.vpc_name}-private-route-table-aza"
  }

  depends_on = [
    aws_nat_gateway.zone_a
  ]
}

# Create a route table for the web and app subnets in AZ B
# Uses NAT gateway in AZ B
resource "aws_route_table" "private_azb" {
  vpc_id = aws_vpc.this.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.zone_b.id
  }

  tags = {
    Name = "${var.vpc_name}-private-route-table-azb"
  }

  depends_on = [
    aws_nat_gateway.zone_b
  ]
}

resource "aws_route_table" "data" {
  vpc_id = aws_vpc.this.id

  tags = {
    Name = "${var.vpc_name}-data-route-table"
  }
}

# Associate these subnets with the private route tables accordingly 
resource "aws_route_table_association" "web_aza" {
  subnet_id      = aws_subnet.web[0].id
  route_table_id = aws_route_table.private_aza.id
}

resource "aws_route_table_association" "app_aza" {
  subnet_id      = aws_subnet.app[0].id
  route_table_id = aws_route_table.private_aza.id
}

resource "aws_route_table_association" "web_azb" {
  subnet_id      = aws_subnet.web[1].id
  route_table_id = aws_route_table.private_azb.id
}

resource "aws_route_table_association" "app_azb" {
  subnet_id      = aws_subnet.app[1].id
  route_table_id = aws_route_table.private_azb.id
}

resource "aws_route_table_association" "data" {
  count = length(var.data_subnet_cidrs)

  subnet_id      = element(aws_subnet.data.*.id, count.index)
  route_table_id = aws_route_table.data.id
}

The routes are another set of configurations that differentiate public and internal sub-networks. In the public subnet route table, all non-local traffic is sent to the internet gateway. A private route table sends its non-local traffic to a NAT gateway, which then forwards it through the internet gateway and routes the responses back.

VPC NACLs

Now it’s time to control which traffic is allowed into or out of this network. You may need to modify these rules to meet your requirements, but for most web application projects the public NACLs will look something like this.

# Public NACLS
resource "aws_network_acl" "public" {
  vpc_id     = aws_vpc.this.id
  subnet_ids = [aws_subnet.public[0].id, aws_subnet.public[1].id]

  # Ingress rules
  # Allow all local traffic
  ingress {
    protocol   = -1
    rule_no    = 100
    action     = "allow"
    cidr_block = aws_vpc.this.cidr_block
    from_port  = 0
    to_port    = 0
  }

  # Allow HTTPS traffic from the internet
  ingress {
    protocol   = "6"
    rule_no    = 105
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 443
    to_port    = 443
  }

  # Allow HTTP traffic from the internet
  ingress {
    protocol   = "6"
    rule_no    = 110
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 80
    to_port    = 80
  }

  # Allow the ephemeral ports from the internet
  ingress {
    protocol   = "6"
    rule_no    = 120
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 1025
    to_port    = 65534
  }

  # Allow UDP ephemeral ports from the internet (return traffic for UDP-based services)
  ingress {
    protocol   = "17"
    rule_no    = 125
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 1025
    to_port    = 65534
  }

  # Egress rules
  # Allow all ports, protocols, and IPs outbound
  egress {
    protocol   = -1
    rule_no    = 100
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 0
    to_port    = 0
  }

  tags = {
    Name = "${var.vpc_name}-public-nacl"
  }

  depends_on = [aws_subnet.public]
}

Here are the web subnet NACLs for this module.

resource "aws_network_acl" "web" {
  vpc_id     = aws_vpc.this.id
  subnet_ids = [aws_subnet.web[0].id, aws_subnet.web[1].id]

  # Ingress rules
  # Allow all local traffic
  ingress {
    protocol   = -1
    rule_no    = 100
    action     = "allow"
    cidr_block = aws_vpc.this.cidr_block
    from_port  = 0
    to_port    = 0
  }

  # Allow HTTP web traffic from anywhere
  ingress {
    protocol   = 6
    rule_no    = 105
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 80
    to_port    = 80
  }

  # Allow HTTPS web traffic from anywhere
  ingress {
    protocol   = 6
    rule_no    = 110
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 443
    to_port    = 443
  }

  # Allow the ephemeral ports from the internet
  ingress {
    protocol   = "6"
    rule_no    = 120
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 1025
    to_port    = 65534
  }

  # Allow UDP ephemeral ports from the internet (return traffic for UDP-based services)
  ingress {
    protocol   = "17"
    rule_no    = 125
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 1025
    to_port    = 65534
  }

  # Egress rules
  # Allow all ports, protocols, and IPs outbound
  egress {
    protocol   = -1
    rule_no    = 100
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 0
    to_port    = 0
  }

  tags = {
    Name = "${var.vpc_name}-web-nacl"
  }
}

The app and data subnet NACLs have been set to allow only local traffic; you may adjust these rules to suit your needs.
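As a sketch, an internal-only NACL for the app subnets might look like the following, allowing traffic only to and from the VPC’s own CIDR block (the actual rules in the module may differ):

resource "aws_network_acl" "app" {
  vpc_id     = aws_vpc.this.id
  subnet_ids = [aws_subnet.app[0].id, aws_subnet.app[1].id]

  # Allow all traffic from within the VPC only
  ingress {
    protocol   = -1
    rule_no    = 100
    action     = "allow"
    cidr_block = aws_vpc.this.cidr_block
    from_port  = 0
    to_port    = 0
  }

  # Allow all traffic to within the VPC only
  egress {
    protocol   = -1
    rule_no    = 100
    action     = "allow"
    cidr_block = aws_vpc.this.cidr_block
    from_port  = 0
    to_port    = 0
  }

  tags = {
    Name = "${var.vpc_name}-app-nacl"
  }
}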

VPC Security Group

A default security group is created every time a new VPC is provisioned. Here I’ll just give it some tags and a few generic rules.

# Modify the default security group
resource "aws_default_security_group" "this" {
  vpc_id = aws_vpc.this.id

  dynamic "ingress" {
    for_each = var.default_security_group_ingress
    content {
      self        = lookup(ingress.value, "self", null)
      cidr_blocks = compact(split(",", lookup(ingress.value, "cidr_blocks", "")))
      description = lookup(ingress.value, "description", null)
      from_port   = lookup(ingress.value, "from_port", 0)
      to_port     = lookup(ingress.value, "to_port", 0)
      protocol    = lookup(ingress.value, "protocol", "-1")
    }
  }

  dynamic "egress" {
    for_each = var.default_security_group_egress
    content {
      self        = lookup(egress.value, "self", null)
      cidr_blocks = compact(split(",", lookup(egress.value, "cidr_blocks", "")))
      description = lookup(egress.value, "description", null)
      from_port   = lookup(egress.value, "from_port", 0)
      to_port     = lookup(egress.value, "to_port", 0)
      protocol    = lookup(egress.value, "protocol", "-1")
    }
  }

  tags = merge(
    {
      Name = format("%s-default-security-group", var.vpc_name)
    },
    var.additional_tags
  )
}

Now the values for this security group are passed in as variables, like so. Be sure to change the ports and protocols to meet your needs.

variable "default_security_group_ingress" {
  description = "List of maps of ingress rules to set on the default security group"
  type        = list(map(string))
  default = [
    {
      cidr_blocks = "10.0.0.0/16"
      description = "Allow all from the local network."
      from_port   = 0
      protocol    = "-1"
      self        = false
      to_port     = 0
    },
    {
      cidr_blocks = "0.0.0.0/0"
      description = "Allow all HTTPS from the internet."
      from_port   = 443
      protocol    = "6"
      self        = false
      to_port     = 443
    },
    {
      cidr_blocks = "0.0.0.0/0"
      description = "Allow all HTTP from the internet."
      from_port   = 80
      protocol    = "6"
      self        = false
      to_port     = 80
    },
    {
      cidr_blocks = "0.0.0.0/0"
      description = "Allow all ephemeral ports from the internet."
      from_port   = 32768
      protocol    = "6"
      self        = false
      to_port     = 60999
    }
  ]
}

variable "default_security_group_egress" {
  description = "List of maps of egress rules to set on the default security group"
  type        = list(map(string))
  default = [
    {
      cidr_blocks = "0.0.0.0/0"
      description = "Allow all"
      from_port   = 0
      protocol    = "-1"
      self        = false
      to_port     = 0
    }
  ]
}
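To put the module to work, a caller just supplies the variables we defined. Here’s a minimal, hypothetical usage sketch (the source path and CIDR values are illustrative; the variable names match the ones referenced throughout the module):

module "vpc" {
  source = "./modules/three-tier-vpc" # hypothetical path

  vpc_name            = "dev"
  aws_region          = "us-east-1"
  vpc_cidr            = "10.0.0.0/16"
  zones               = ["a", "b"]
  public_subnet_cidrs = ["10.0.1.0/24", "10.0.2.0/24"]
  web_subnet_cidrs    = ["10.0.11.0/24", "10.0.12.0/24"]
  app_subnet_cidrs    = ["10.0.21.0/24", "10.0.22.0/24"]
  data_subnet_cidrs   = ["10.0.31.0/24", "10.0.32.0/24"]
}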

Here’s a link to the complete code so far in the dev branch. That’s all for now! In the next post, we’ll add application load balancers, target groups, listeners, and more. For a more in-depth explanation of VPC resources, check out the AWS technical documentation. Be sure to subscribe for more content like this!

AWS Service Control Policies with Terraform

AWS Organizations

AWS Organizations is a cloud service designed to centrally manage AWS accounts and to roll up billing from multiple AWS accounts into a single account. That payer account may be referred to as the “master” account (AWS now calls it the “management” account) because it can manage the permissions of all the accounts attached to it. “Billing” is another name for this account, because it’s the account that receives the invoice for the monthly charges. You can select any commercial AWS account to play this role; an account becomes a member as soon as it joins the management account’s organization. Next come AWS Service Control Policies, a feature that allows permission management across all the AWS accounts in your organization. In this post, I’ll share with you how to implement AWS Service Control Policies with Terraform! Let’s break it down.

How to join an AWS Organization

Check out my previous post on the details.

AWS Organization Units

This has nothing to do with Microsoft’s Active Directory (AD) organizational units, and there’s no integration between MS AD and this feature either. AWS organizational units (OUs) are a way of grouping AWS accounts so we can manage account permissions in groups or in a hierarchical format. There are dozens of ways to create this hierarchy, and it all depends on your objective: you can group accounts by project, department, mission, environment, classification, etc.

Simple OU structure by environments

AWS OU by environments
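
In Terraform, each of those environment OUs is just an aws_organizations_organizational_unit resource parented to the root. Here’s a sketch for the dev OU (staging follows the same pattern, and the prod version appears later in this post):

resource "aws_organizations_organizational_unit" "dev" {
  name      = "dev"
  parent_id = aws_organizations_organization.this.roots[0].id
}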

Service Control Policies with Terraform

Service Control Policies (SCPs) are a critical feature to learn and understand. Questions about them appear on many, many AWS certification exams, and the feature is highly useful and heavily used in the real world. These policies can allow or deny actions or services at a high level, meaning from outside the AWS account itself. For example, say you want to experiment with the most expensive EC2 instance type, and say you also have IAM permissions that allow those actions. You try to launch an EC2 instance with the p4d.24xlarge instance type and bam: you get the encoded authorization failure message! How?! You gave yourself full permissions! Bingo… it’s the service control policies. Even with a full administrator policy on the account, you can still be denied by an SCP.

It’s best practice to enable SCPs, create the OUs and policies, and attach the policies even before building your systems! For more information on SCPs, check out AWS’s documentation.

Show me the Terraform code!

I personally like to break the main.tf Terraform file down into separate, manageable Terraform files. Here’s my file structure. Notice there’s only one environment (aka account), “master”, here. Yes, I also use terragrunt.

org/
├── README.md
├── dev-ou.tf
├── main.tf
├── master
│   ├── inputs.yml
│   ├── terragrunt.hcl
│   └── vars.tf
├── prod-ou.tf
├── root-ou.tf
├── staging-ou.tf
└── terragrunt.hcl

main.tf usually contains common code. Enable SCPs by adding "SERVICE_CONTROL_POLICY" to the enabled_policy_types list.

provider "aws" {
  region  = var.aws_region
  profile = var.aws_cli_profile
}

terraform {
  backend "s3" {}
}

# Provides a resource to create an AWS organization.
resource "aws_organizations_organization" "this" {

  # List of AWS service principal names for which 
  # you want to enable integration with your organization
  aws_service_access_principals = [
    "cloudtrail.amazonaws.com",
    "config.amazonaws.com",
  ]

  feature_set = "ALL"

  enabled_policy_types = [
    "TAG_POLICY",
    "SERVICE_CONTROL_POLICY"
  ]
}
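
Notice the empty backend "s3" {} block: terragrunt fills in the actual backend settings at init time. A minimal root terragrunt.hcl might look like this sketch (the bucket name is illustrative; path_relative_to_include is a built-in terragrunt function):

# terragrunt.hcl
remote_state {
  backend = "s3"
  config = {
    bucket  = "my-terraform-state-bucket" # hypothetical bucket
    key     = "org/${path_relative_to_include()}/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true
  }
}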

My root-ou.tf contains the master account code and all the service control policies that I want applied to all accounts. Notice each policy is attached to the root OU, so it’s inherited by every account below the root!

resource "aws_organizations_account" "master" {
  # A friendly name for the member account
  name  = "my-master"
  email = "mymaster@email.com"

  # Enables IAM users to access account billing information 
  # if they have the required permissions
  # iam_user_access_to_billing = "ALLOW"

  tags = {
    Name  = "my-master"
    Owner = "Waleed"
    Role  = "billing"
  }

  parent_id = aws_organizations_organization.this.roots[0].id
}

# ---------------------------------------- # 
# Service Control Policies for all accounts
# ---------------------------------------- #

# ---------------------------- #
# REGION RESTRICTION 
# ---------------------------- #

data "aws_iam_policy_document" "restrict_regions" {
  statement {
    sid       = "RegionRestriction"
    effect    = "Deny"
    actions   = ["*"]
    resources = ["*"]
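    # Caution: a blanket region deny can also block calls to global services
    # (IAM, Organizations, Route 53, etc.); AWS's example region-deny SCP
    # carves those out with NotAction, so review before applying broadly.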

    condition {
      test     = "StringNotEquals"
      variable = "aws:RequestedRegion"

      values = [
        "us-east-1"
      ]
    }
  }
}

resource "aws_organizations_policy" "restrict_regions" {
  name        = "restrict_regions"
  description = "Deny all regions except US East 1."
  content     = data.aws_iam_policy_document.restrict_regions.json
}

resource "aws_organizations_policy_attachment" "restrict_regions_on_root" {
  policy_id = aws_organizations_policy.restrict_regions.id
  target_id = aws_organizations_organization.this.roots[0].id
}

# ---------------------------- #
# EC2 INSTANCE TYPE RESTRICTION 
# ---------------------------- #

data "aws_iam_policy_document" "restrict_ec2_types" {
  statement {
    sid       = "RestrictEc2Types"
    effect    = "Deny"
    actions   = ["ec2:RunInstances"]
    resources = ["arn:aws:ec2:*:*:instance/*"]

    condition {
      # StringNotLike is required here because the values below use wildcards
      test     = "StringNotLike"
      variable = "ec2:InstanceType"

      values = [
        "t3*",
        "t4g*",
        "a1.medium",
        "a1.large"
      ]
    }
  }
}

resource "aws_organizations_policy" "restrict_ec2_types" {
  name        = "restrict_ec2_types"
  description = "Allow certain EC2 instance types only."
  content     = data.aws_iam_policy_document.restrict_ec2_types.json
}

resource "aws_organizations_policy_attachment" "restrict_ec2_types_on_root" {
  policy_id = aws_organizations_policy.restrict_ec2_types.id
  target_id = aws_organizations_organization.this.roots[0].id
}

# ---------------------------- #
# REQUIRE EC2 TAGS 
# ---------------------------- #

data "aws_iam_policy_document" "require_ec2_tags" {
  statement {
    sid    = "RequireTag"
    effect = "Deny"
    actions = [
      "ec2:RunInstances",
      "ec2:CreateVolume"
    ]
    resources = [
      "arn:aws:ec2:*:*:instance/*",
      "arn:aws:ec2:*:*:volume/*"
    ]

    condition {
      test     = "Null"
      variable = "aws:RequestTag/Name"

      values = ["true"]
    }
  }
}

resource "aws_organizations_policy" "require_ec2_tags" {
  name        = "require_ec2_tags"
  description = "Name tag is required for EC2 instances and volumes."
  content     = data.aws_iam_policy_document.require_ec2_tags.json
}

resource "aws_organizations_policy_attachment" "require_ec2_tags_on_root" {
  policy_id = aws_organizations_policy.require_ec2_tags.id
  target_id = aws_organizations_organization.this.roots[0].id
}

Here’s the authorization failure message I got when I attempted to launch an EC2 instance with a type not approved by the SCP defined above. The console returns an encoded message; decoding it (for example with the AWS CLI’s sts decode-authorization-message command) reveals the explicit deny statement from the SCP. Likewise, creating a volume now fails without a Name tag, and the decoded failure message shows the deny statement from the require_ec2_tags SCP.

Account or OU specific SCP

What if you want to restrict certain actions or services in a single account? The code below shows how to block internet access in the prod environment: create the prod account, create the SCP, and attach it directly to the prod OU, which contains only the prod account. To attach an SCP directly to an account instead, see the sketch after the prod-ou.tf code and check the documentation.

prod-ou.tf

resource "aws_organizations_account" "prod" {
  # A friendly name for the member account
  name  = "my-prod"
  email = "my-prod@email.com"

  # Enables IAM users to access account billing information 
  # if they have the required permissions
  iam_user_access_to_billing = "ALLOW"

  tags = {
    Name  = "my-prod"
    Owner = "Waleed"
    Role  = "prod"
  }

  parent_id = aws_organizations_organizational_unit.prod.id
}

resource "aws_organizations_organizational_unit" "prod" {
  name      = "prod"
  parent_id = aws_organizations_organization.this.roots[0].id
}

# ------------------------------- #
# PREVENT INTERNET ACCESS TO A VPC 
# ------------------------------- #

data "aws_iam_policy_document" "block_internet" {
  statement {
    sid    = "BlockInternet"
    effect = "Deny"
    actions = [
      "ec2:AttachInternetGateway",
      "ec2:CreateInternetGateway",
      "ec2:CreateEgressOnlyInternetGateway",
      "ec2:CreateVpcPeeringConnection",
      "ec2:AcceptVpcPeeringConnection",
      "globalaccelerator:Create*",
      "globalaccelerator:Update*"
    ]
    resources = ["*"]

  }
}

resource "aws_organizations_policy" "block_internet" {
  name        = "block_internet"
  description = "Block internet access to the production network."
  content     = data.aws_iam_policy_document.block_internet.json
}

resource "aws_organizations_policy_attachment" "block_internet_on_prod" {
  policy_id = aws_organizations_policy.block_internet.id
  target_id = aws_organizations_organizational_unit.prod.id
}
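
To attach the same SCP directly to the account rather than the OU, point target_id at the account instead; here’s a sketch:

resource "aws_organizations_policy_attachment" "block_internet_on_prod_account" {
  policy_id = aws_organizations_policy.block_internet.id
  target_id = aws_organizations_account.prod.id
}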

I think you get the idea. Go forth and implement governance with AWS Organizations and Service Control Policies! Subscribe for more cloud code!