Enforcing Policies with Cloud Custodian and CloudWatch Events

Basic concepts and terms

Cloud Custodian works with the following basic concepts, terms, and relationships between them.

Policy

Defined in YAML, a policy specifies a set of filters and actions to apply to a given AWS resource type.

Resource

Provides for retrieval of a resource of a given type (typically via AWS API) and defines the vocabulary of filters and actions that can be used on those resources (e.g., ASG, S3, EC2, ELBs, etc).

Mode

Defines how the policy will execute (lambda, config rule, poll, etc).

mode:
  type: cloudtrail
  events:
    - RunInstances

Filters

Given a set of resources, filters select the subset that we are interested in operating on. The filtering language has some default behaviors that work across resource types, such as value filtering with JMESPath expressions against the JSON representation of a resource, as well as specific filters for particular resource types (instance age, tag count, etc).

filters:
  - "tag:aws:autoscaling:groupName": absent
  - type: ebs
    key: Encrypted
    value: false
    skip-devices:
      - "/dev/sda1"
      - "/dev/xvda"
  - type: event
    key: "detail.userIdentity.sessionContext.sessionIssuer.userName"
    value: "SuperUser"
    op: ne
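To make the idea of value filtering concrete, here is a minimal Python sketch. It is not Custodian's implementation: Custodian evaluates JMESPath expressions, while this sketch only walks dotted keys, and the function names and sample instances are hypothetical.

```python
# Illustrative only: a tiny stand-in for Custodian-style value filtering.
# Real Custodian evaluates JMESPath expressions; this sketch only walks
# dotted keys, and every name here is hypothetical.

def lookup(resource, dotted_key):
    """Walk a dotted key such as 'State.Name' through nested dicts."""
    current = resource
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def tag_value(resource, tag_name):
    """Return the value of an EC2-style tag, or None if the tag is missing."""
    for tag in resource.get("Tags", []):
        if tag["Key"] == tag_name:
            return tag["Value"]
    return None


def matches(resource, key, value, op="eq"):
    """Evaluate one filter: 'tag:<name>' keys read tags, others read fields."""
    if key.startswith("tag:"):
        actual = tag_value(resource, key[len("tag:"):])
    else:
        actual = lookup(resource, key)
    if value == "absent":
        return actual is None
    if value == "present":
        return actual is not None
    if op == "ne":
        return actual != value
    return actual == value


# Two made-up instances in the shape returned by describe-instances.
instances = [
    {"InstanceId": "i-aaa", "State": {"Name": "running"},
     "Tags": [{"Key": "Custodian", "Value": ""}]},
    {"InstanceId": "i-bbb", "State": {"Name": "stopped"}, "Tags": []},
]

tagged = [i["InstanceId"] for i in instances
          if matches(i, "tag:Custodian", "present")]
not_running = [i["InstanceId"] for i in instances
               if matches(i, "State.Name", "running", op="ne")]
print(tagged, not_running)
```

A tag with an empty value still counts as present, which is why an instance tagged with Key=Custodian and no value matches a "present" filter.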

Actions

A verb to use on a given resource, e.g. stop, start, suspend, delete, encrypt.

actions:
  - type: tag
    key: c7n_status
    value: "Unencrypted EBS! Please recreate with Encryption"
  - type: terminate
    force: true

Create Required AWS Resources

Before you move forward, make sure you have an IAM user that can create resources on AWS. With this user you can log in to the AWS Console or use aws-cli to create the required resources. We will also use this IAM user on the Custodian development instance.

For OS-specific installation instructions, check out the AWS CLI Installation document.

  • We will create an EC2 key pair.
    • We will name this key pair custodian-key.
  • We will create a security group that allows ingress on port 22.
  • Finally, we will create an EC2 instance named custodian01 as our Cloud Custodian development machine.

Create EC2 Key Pairs

We will need an EC2 key pair in order to log in to the EC2 instance that we will provision.

Switch to the folder where you keep your key pairs and create an EC2 key pair named custodian-key.

$ cd ~/Workdocs/keys # this is the folder where I keep key pairs
$ aws ec2 create-key-pair --key-name custodian-key --query 'KeyMaterial' --output text --region us-east-1 > custodian-key.pem

Now we have a .pem file.

$ ls custodian*.pem
custodian-key.pem  # we use this to login to `custodian01`

Restrict the permissions on the .pem file.

$ chmod 600 custodian-key.pem

Create Security Group

To create a security group, we need to pick the VPC.

$ aws ec2 describe-vpcs --region us-east-1
{
    "Vpcs": [
        {
            "VpcId": "vpc-6fe5000b",
            "InstanceTenancy": "default",
            "Tags": [
                {
                    "Value": "Default",
                    "Key": "Name"
                }
            ],
            "State": "available",
            "DhcpOptionsId": "dopt-928173f7",
            "CidrBlock": "172.31.0.0/16",
            "IsDefault": true
        }
    ]
}

I chose vpc-6fe5000b which is the default VPC. You can choose any VPC.

Now, we can create the security group.

$ aws ec2 create-security-group --group-name custodian-sg --description "Cloud Custodian security group" --vpc-id vpc-6fe5000b --region us-east-1
{
    "GroupId": "sg-987feee4"
}

Note the GroupId value as we are using it in the following commands to add ingress rules to the security group.

$ aws ec2 authorize-security-group-ingress --group-id sg-987feee4 --protocol tcp --port 22 --cidr 0.0.0.0/0 --region us-east-1

Let’s describe the security group to check the rules we’ve added.

$ aws ec2 describe-security-groups --group-names custodian-sg --region us-east-1
{
    "SecurityGroups": [
        {
            "IpPermissionsEgress": [
                {
                    "IpProtocol": "-1",
                    "PrefixListIds": [],
                    "IpRanges": [
                        {
                            "CidrIp": "0.0.0.0/0"
                        }
                    ],
                    "UserIdGroupPairs": [],
                    "Ipv6Ranges": []
                }
            ],
            "Description": "Cloud Custodian security group",
            "IpPermissions": [
                {
                    "PrefixListIds": [],
                    "FromPort": 22,
                    "IpRanges": [
                        {
                            "CidrIp": "0.0.0.0/0"
                        }
                    ],
                    "ToPort": 22,
                    "IpProtocol": "tcp",
                    "UserIdGroupPairs": [],
                    "Ipv6Ranges": []
                }
            ],
            "GroupName": "custodian-sg",
            "VpcId": "vpc-6fe5000b",
            "OwnerId": "676452272092",
            "GroupId": "sg-987feee4"
        }
    ]
}
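If you would rather check the rules programmatically than read the JSON by eye, a short Python sketch like the one below can flatten the output into one line per ingress rule. The sample is a trimmed copy of the output above, and the helper name ingress_rules is hypothetical.

```python
import json

# Trimmed copy of the `describe-security-groups` output shown above.
SAMPLE = """
{"SecurityGroups": [{
    "GroupName": "custodian-sg",
    "GroupId": "sg-987feee4",
    "IpPermissions": [{
        "FromPort": 22, "ToPort": 22, "IpProtocol": "tcp",
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}]
    }]
}]}
"""


def ingress_rules(describe_output):
    """Yield (group_id, protocol, from_port, to_port, cidr) for each rule."""
    for group in describe_output["SecurityGroups"]:
        for perm in group.get("IpPermissions", []):
            for ip_range in perm.get("IpRanges", []):
                yield (group["GroupId"], perm["IpProtocol"],
                       perm.get("FromPort"), perm.get("ToPort"),
                       ip_range["CidrIp"])


rules = list(ingress_rules(json.loads(SAMPLE)))
for rule in rules:
    print(rule)
```

To use it on real output, redirect the aws command to a file and load that file with json.load instead of the embedded sample. Note that 0.0.0.0/0 opens port 22 to any address, so a narrower CIDR is worth considering outside of a test setup.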

Create Custodian Development Instance

List the subnet IDs available in your vpc (vpc-6fe5000b is the one I chose earlier).

$ aws ec2 describe-subnets --filters "Name=vpc-id,Values=vpc-6fe5000b" --region us-east-1
{
    "Subnets": [
        {
            "VpcId": "vpc-6fe5000b",
            "AvailableIpAddressCount": 4079,
            "MapPublicIpOnLaunch": true,
            "DefaultForAz": true,
            "Ipv6CidrBlockAssociationSet": [],
            "State": "available",
            "AvailabilityZone": "us-east-1c",
            "SubnetId": "subnet-da8afdf1",
            "CidrBlock": "172.31.48.0/20",
            "AssignIpv6AddressOnCreation": false
        },
        {
            "VpcId": "vpc-6fe5000b",
            "AvailableIpAddressCount": 4075,
            "MapPublicIpOnLaunch": true,
            "DefaultForAz": true,
            "Ipv6CidrBlockAssociationSet": [],
            "State": "available",
            "AvailabilityZone": "us-east-1a",
            "SubnetId": "subnet-897032d0",
            "CidrBlock": "172.31.16.0/20",
            "AssignIpv6AddressOnCreation": false
        },
        ....
        ....

I chose subnet-897032d0 from the list.

I am creating custodian01 instance in subnet subnet-897032d0 and using security group sg-987feee4.

$ aws ec2 run-instances --image-id ami-0b33d91d --count 1 --instance-type t2.micro --key-name custodian-key --security-group-ids sg-987feee4 --subnet-id subnet-897032d0 --query 'Instances[0].{ID:InstanceId}'  --output text --region us-east-1
i-08d7b9609a8c1f12d

Add a name tag to the instance.

$ aws ec2 create-tags --resources i-08d7b9609a8c1f12d --tags Key=Name,Value=custodian01 --region us-east-1

Find the public DNS name of custodian01.

$ aws ec2 describe-instances --instance-ids i-08d7b9609a8c1f12d --query 'Reservations[0].Instances[0].PublicDnsName' --output text --region us-east-1
ec2-52-90-80-237.compute-1.amazonaws.com

Copy custodian-key.pem from your computer to the Custodian development instance. You can use scp or just copy and paste the contents of the custodian-key.pem file.

# on your laptop, in the folder where you have `pem` file.
$ scp -i custodian-key.pem custodian-key.pem ec2-user@ec2-52-55-208-165.compute-1.amazonaws.com:~/.ssh/custodian-key.pem
The authenticity of host 'ec2-52-55-208-165.compute-1.amazonaws.com (52.55.208.165)' can't be established.
ECDSA key fingerprint is SHA256:4gyFwnuRi68ZghCg3G8Wxtv9/2UlPR7cywHcogfaTYk.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ec2-52-55-208-165.compute-1.amazonaws.com,52.55.208.165' (ECDSA) to the list of known hosts.
custodian-key.pem                                                                                                                    100% 1671    28.2KB/s   00:00

Log in to the custodian01 instance using its public DNS name.

$ ssh ec2-user@ec2-52-55-208-165.compute-1.amazonaws.com -i ~/WorkDocs/keys/custodian-key.pem

    __|  __|_  )
    _|  (     /   Amazon Linux AMI
    ___|\___|___|

https://aws.amazon.com/amazon-linux-ami/2016.09-release-notes/

Do the updates and configure ssh timeout.

$ sudo yum update -y
$ echo "ServerAliveInterval 50" > ~/.ssh/config
$ chmod 644 ~/.ssh/config
$ sudo yum install tree -y #optional
$ sudo yum install emacs -y #optional

Change the permissions of the custodian-key.pem file.

$ chmod 600 /home/ec2-user/.ssh/custodian-key.pem

Finally, create the ~/.aws/credentials file and put in it the credentials of an IAM user that can use the AWS API to run queries on resources.

[default]
aws_access_key_id = AKIAJGBFBGOJJJJMKYXQ
aws_secret_access_key = EnlAfEjw4S4JWfX9ABCAAAAAAAAAABC

Configure Custodian Development Instance

If you prefer, create a screen session. More about screen is here.

$ screen -S custodian

Install Cloud Custodian

We will have to create a virtualenv in order to install Cloud Custodian properly. (Without a virtualenv everything except Lambda functionality works.)

$ virtualenv .custodian
$ source .custodian/bin/activate
$ pip install c7n  # no sudo, so that pip installs into the active virtualenv
...
...
Successfully installed boto3-1.4.4 botocore-1.5.7 c7n-0.8.22.0 functools32-3.2.3.post2 ipaddress-1.0.18 jsonschema-2.5.1 s3transfer-0.1.10

Create a folder to keep custodian configuration.

$ mkdir custodian; cd custodian/
$ mkdir output

Write Your First Policy

In the ~/custodian/ folder, create a file named custodian.yml with the content below.

policies:
  - name: my-first-policy
    resource: ec2
    filters:
      - "tag:Custodian": present
    actions:
      - stop

Run Your First Policy

$ custodian run --output-dir=output/ --config=custodian.yml --region=us-east-1

Output should look like below.

2017-02-01 18:33:25,557: custodian.policy:INFO Running policy my-first-policy resource: ec2 region:us-east-1 c7n:0.8.22.0
2017-02-01 18:33:26,445: custodian.policy:INFO policy: my-first-policy resource:ec2 has count:0 time:0.89

Since we didn’t have any instance tagged with Custodian, no instance was stopped.

Create A Test Instance

Now let's create an instance named custodian-test, tag it with Custodian, and run this policy again.

$ INSTANCE=`aws ec2 run-instances --image-id ami-0b33d91d --count 1 --instance-type t2.micro --key-name custodian-key --security-group-ids sg-987feee4 --subnet-id subnet-897032d0 --query 'Instances[0].{ID:InstanceId}'  --output text --region us-east-1`
$ aws ec2 create-tags --resources $INSTANCE --tags Key=Name,Value=custodian-test Key=Custodian,Value= --region=us-east-1

If you check the console, you can see that we have an instance with the Custodian tag.

Run Policy Again

Let’s run our first policy again.

$ custodian run --output-dir=output/ --config=custodian.yml --region=us-east-1
2017-02-01 18:48:28,354: custodian.policy:INFO Running policy my-first-policy resource: ec2 region:us-east-1 c7n:0.8.22.0
2017-02-01 18:48:28,860: custodian.policy:INFO policy: my-first-policy resource:ec2 has count:1 time:0.51
2017-02-01 18:48:28,861: custodian.actions:INFO Stop 1 of 1 instances
2017-02-01 18:48:29,038: custodian.policy:INFO policy: my-first-policy action: stop resources: 1 execution_time: 0.18

This time Custodian found the instance with the Custodian tag and stopped it.

If Custodian cannot match the instance, wait a few minutes and run the command again.

Getting Familiar with Custodian

Supported Resources

Let’s find which resources Custodian supports.

$ custodian schema
resources:
- account
- acm-certificate
- alarm
- ami
...
...
- ec2
- ecr
...
...
- vpn-gateway
- waf
- waf-regional

Let’s find which actions and filters Custodian has for the ec2 resource.

$ custodian schema ec2
ec2:
actions: [auto-tag-user, invoke-lambda, mark, mark-for-op, modify-security-groups,
    normalize-tag, notify, remove-tag, rename-tag, resize, snapshot, start, stop,
    tag, tag-trim, terminate, unmark, untag]
filters: [and, default-vpc, ebs, ephemeral, event, image, image-age, instance-age,
    instance-uptime, marked-for-op, metrics, offhour, onhour, or, security-group,
    state-age, subnet, tag-count, value]

In our first policy we already used the stop action and a tag filter.

We want to go a step further and tag the instance that Custodian shut down.

Using Tags

Let’s see details on tag actions.

$ custodian schema ec2.actions.tag

Help
----

Tag an ec2 resource.


Schema
------

{   'additionalProperties': False,
    'properties': {   'key': {   'type': 'string'},
                    'tag': {   'type': 'string'},
                    'tags': {   'type': 'object'},
                    'type': {   'enum': ['tag', 'mark']},
                    'value': {   'type': 'string'}},
    'required': ['type'],
    'type': 'object'}

Modify custodian.yml from this

policies:
  - name: my-first-policy
    resource: ec2
    filters:
      - "tag:Custodian": present
    actions:
      - stop

to this

policies:
  - name: my-first-policy
    resource: ec2
    filters:
      - "tag:Custodian": present
    actions:
      - stop
      - type: tag
        key: Custodian
        value: shutdown

Run the policy again

$ custodian run --output-dir=output/ --config=custodian.yml --region=us-east-1

If the run succeeds, the output should look like the one below.

2017-02-01 19:34:34,351: custodian.policy:INFO Running policy my-first-policy resource: ec2 region:us-east-1 c7n:0.8.22.0
2017-02-01 19:34:34,353: custodian.policy:INFO policy: my-first-policy resource:ec2 has count:1 time:0.00
2017-02-01 19:34:34,353: custodian.actions:INFO Stop 0 of 1 instances
2017-02-01 19:34:34,353: custodian.policy:INFO policy: my-first-policy action: stop resources: 1 execution_time: 0.00
2017-02-01 19:34:34,563: custodian.policy:INFO policy: my-first-policy action: tag resources: 1 execution_time: 0.21

If you check the logs, you will see Custodian found a match and ran two actions listed in the policy.

We can check the tag in the AWS Console.

Using Query Filters

In this case, the custodian-test instance was already shut down. We would like a policy that only shuts down instances that are running. We can achieve this with a query filter, instance-state-name: running.

policies:
  - name: my-first-policy
    resource: ec2
    query:
      - instance-state-name: running
    filters:
      - "tag:Custodian": present
    actions:
      - stop
      - type: tag
        key: Custodian
        value: shutdown

When we run the policy again, we see it won’t match any instances as custodian-test is already in stopped state.

$ custodian run --output-dir=output/ --config=custodian.yml --region=us-east-1
2017-02-01 20:33:21,727: custodian.policy:INFO Running policy my-first-policy resource: ec2 region:us-east-1 c7n:0.8.22.0
2017-02-01 20:33:21,728: custodian.policy:INFO policy: my-first-policy resource:ec2 has count:0 time:0.00

Below are the valid EC2 query filters, as listed in the source code.

EC2_VALID_FILTERS = {
    'architecture': ('i386', 'x86_64'),
    'availability-zone': str,
    'iam-instance-profile.arn': str,
    'image-id': str,
    'instance-id': str,
    'instance-lifecycle': ('spot',),
    'instance-state-name': (
        'pending',
        'terminated',
        'running',
        'shutting-down',
        'stopping',
        'stopped'),
    'instance.group-id': str,
    'instance.group-name': str,
    'tag-key': str,
    'tag-value': str,
    'tag:': str,
    'vpc-id': str}
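The table above can be read as a validation rule set: a tuple lists the allowed values, str accepts any string, and the 'tag:' entry covers any key with that prefix. The Python sketch below checks a single query entry against the table and converts it to the Name/Values shape that the EC2 DescribeInstances API expects for its Filters parameter. The function to_api_filter is hypothetical, and whether Custodian performs the check in exactly this way is an assumption.

```python
# Copy of the table above, kept here so the sketch is self-contained.
EC2_VALID_FILTERS = {
    'architecture': ('i386', 'x86_64'),
    'availability-zone': str,
    'iam-instance-profile.arn': str,
    'image-id': str,
    'instance-id': str,
    'instance-lifecycle': ('spot',),
    'instance-state-name': (
        'pending', 'terminated', 'running',
        'shutting-down', 'stopping', 'stopped'),
    'instance.group-id': str,
    'instance.group-name': str,
    'tag-key': str,
    'tag-value': str,
    'tag:': str,
    'vpc-id': str}


def to_api_filter(entry):
    """Validate one single-key query entry and return an EC2 API filter."""
    if len(entry) != 1:
        raise ValueError("expected exactly one key: %r" % (entry,))
    key, value = next(iter(entry.items()))
    # Keys such as 'tag:Custodian' are checked against the 'tag:' rule.
    rule_key = "tag:" if key.startswith("tag:") else key
    if rule_key not in EC2_VALID_FILTERS:
        raise ValueError("unsupported query filter: %s" % key)
    allowed = EC2_VALID_FILTERS[rule_key]
    if isinstance(allowed, tuple):
        if value not in allowed:
            raise ValueError("%s must be one of %r" % (key, allowed))
    elif not isinstance(value, allowed):
        raise ValueError("%s must be a string" % key)
    return {"Name": key, "Values": [value]}


api_filter = to_api_filter({"instance-state-name": "running"})
print(api_filter)
```

The practical difference from the filters section is that query entries are applied by the EC2 API itself, so non-matching instances are never returned to Custodian at all.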

Multiple Accounts via STS Assume Role

If you want to manage multiple AWS accounts, you will have to specify a different cache file for each account. Below is a sample command that can be put in a cron job.

/usr/local/custodian/bin/custodian run \
--cache-period=15 \
--cache /home/custodian/.accountname.cache \
-v \
-m \
-l /cloud-custodian/sts-prod/us-east-1 \
-s s3://mybucketnamehere/accounts/aws-account-name-here/us-east-1/policies \
--assume="arn:aws:iam::00000000000:role/Custodian" \
-c /etc/custodian/policies/hourly.yml &>> /var/log/custodian/hourly.log
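With more than a couple of accounts, writing these commands by hand gets error-prone. The Python sketch below generates one command per account, each with its own cache file. It only reuses the flags from the command above; the account names, role ARNs, and the way the account name is substituted into the log group and S3 prefix are assumptions made for illustration.

```python
import shlex

# Hypothetical accounts: the names and role ARNs are placeholders.
ACCOUNTS = {
    "account-a": "arn:aws:iam::111111111111:role/Custodian",
    "account-b": "arn:aws:iam::222222222222:role/Custodian",
}


def custodian_command(account, role_arn, region="us-east-1"):
    """Build the argument list for one account, with its own cache file."""
    return [
        "/usr/local/custodian/bin/custodian", "run",
        "--cache-period=15",
        "--cache", "/home/custodian/.%s.cache" % account,
        "-v", "-m",
        "-l", "/cloud-custodian/%s/%s" % (account, region),
        "-s", "s3://mybucketnamehere/accounts/%s/%s/policies" % (account, region),
        "--assume=%s" % role_arn,
        "-c", "/etc/custodian/policies/hourly.yml",
    ]


commands = {name: custodian_command(name, arn)
            for name, arn in ACCOUNTS.items()}
for name, argv in commands.items():
    # shlex.quote keeps each argument safe to paste into a crontab line.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Each printed line can be pasted into a crontab entry, followed by the same log redirection used in the sample command.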

Using CloudWatch Events

Before triggering a Custodian lambda function via events, we will create a new policy file named ami-policy.yml to check whether instances were launched from an allowed AMI. We will run this policy from the command line, as we did previously.

policies:
  - name: ec2-ami-check
    resource: ec2
    filters:
      - type: value
        key: ImageId
        op: in
        value:
          - ami-0b33d91d
    actions:
      - type: tag
        key: ami-status
        value: approved

Now let’s run the policy.

$ custodian run --output-dir=output/ --config=ami-policy.yml --region=us-east-1
# output
2017-02-01 21:10:01,188: custodian.policy:INFO Running policy ec2-ami-check resource: ec2 region:us-east-1 c7n:0.8.22.0
2017-02-01 21:10:01,865: custodian.policy:INFO policy: ec2-ami-check resource:ec2 has count:2 time:0.68
2017-02-01 21:10:02,149: custodian.policy:INFO policy: ec2-ami-check action: tag resources: 2 execution_time: 0.28

So far we have created two instances, custodian01 and custodian-test, from the same AMI, ami-0b33d91d. Both of them matched the policy.

Being able to find instances which match the policy is very convenient but we wouldn’t like to run this policy over and over again. Instead we would like to run it only when a new instance is created.

Let’s change the policy mode so that the policy is deployed as a lambda function and triggered whenever an instance enters the running state. Notice the role we specify for the lambda function. This role must have the permissions needed to execute the actions defined in the policy.

policies:
  - name: ec2-ami-check
    resource: ec2
    mode:
      type: ec2-instance-state
      role: arn:aws:iam::676452272092:role/lambdaAll
      events:
        - running
    filters:
      - type: value
        key: ImageId
        op: in
        value:
          - ami-0b33d91d
    actions:
      - type: tag
        key: ami-status
        value: approved

Now, run the policy.

$ custodian run --output-dir=output/ --config=ami-policy.yml --region=us-east-1
2017-02-01 21:36:47,878: custodian.policy:INFO Provisioning policy lambda ec2-ami-check
/usr/local/lib/python2.7/site-packages/c7n/mu.py:156: UserWarning: Duplicate name: './README'
self._zip_file.writestr(dest, contents)
2017-02-01 21:36:48,016: custodian.lambda:INFO Publishing custodian policy lambda function custodian-ec2-ami-check

Once the lambda function is deployed, create an instance with the AMI id ami-0b33d91d.

$ aws ec2 run-instances --image-id ami-0b33d91d --count 1 --instance-type t2.micro --key-name custodian-key --security-group-ids sg-987feee4 --subnet-id subnet-897032d0 --query 'Instances[0].{ID:InstanceId}'  --output text --region us-east-1

If you check the instance tags a few seconds after the state changes to running, you will see the ami-status tag on the instance.

It is also important to note that Custodian creates a CloudWatch Events rule which triggers the lambda function. The policy name is used in the names of the resources Custodian creates.
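To picture what happens on each trigger, here is a rough Python sketch of the decision the policy encodes. It is not Custodian's lambda code. The event is an abridged EC2 Instance State-change Notification, which carries only the instance id and the new state, so the AMI has to be looked up separately; the lookup table and function names below are hypothetical stand-ins.

```python
# Abridged shape of an "EC2 Instance State-change Notification" event.
SAMPLE_EVENT = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-08d7b9609a8c1f12d", "state": "running"},
}

# The AMI list from the policy's value filter.
APPROVED_AMIS = {"ami-0b33d91d"}

# Hypothetical lookup table standing in for a describe-instances call.
IMAGE_BY_INSTANCE = {"i-08d7b9609a8c1f12d": "ami-0b33d91d"}


def instance_id_if_relevant(event, states=("running",)):
    """Return the instance id when the event is a state change we care about."""
    if event.get("source") != "aws.ec2":
        return None
    detail = event.get("detail", {})
    if detail.get("state") not in states:
        return None
    return detail.get("instance-id")


def tags_to_apply(image_id):
    """Mirror the policy: tag only instances launched from an approved AMI."""
    if image_id in APPROVED_AMIS:
        return {"ami-status": "approved"}
    return {}


instance_id = instance_id_if_relevant(SAMPLE_EVENT)
tags = tags_to_apply(IMAGE_BY_INSTANCE[instance_id])
print(instance_id, tags)
```

Events for other states, such as stopped or terminated, return None from the first function and are ignored, which matches the single entry under events in the policy.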

Examples

Terminating Instances without the approved AMIs

policies:
  - name: ec2-ami-check
    resource: ec2
    mode:
      type: ec2-instance-state
      role: arn:aws:iam::676452272092:role/lambdaAll
      events:
        - running
    filters:
      - type: value
        key: ImageId
        op: ni
        value:
          - ami-0b33d91d
          - ami-0b12dda1
    actions:
      - type: tag
        key: ami-status
        value: Unapproved AMI
      - stop
      - type: mark-for-op
        op: terminate
        days: 30

If you deploy the policy above and create an instance from an AMI other than the ones listed in the policy, the instance will be tagged, stopped, and marked for termination.

A policy like the one below would then terminate the instance once the 30 days have passed.

- name: ec2-terminate-marked
  resource: ec2
  comments: |
    Delete any EC2 instances with un-approved AMIs after 30 days.
  filters:
    - type: marked-for-op
      op: terminate
  actions:
    - terminate
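The pairing of mark-for-op and marked-for-op works by recording the pending operation and its due date in a tag, then matching on that tag later. The Python sketch below shows the idea. The tag name, message format, and date format used here are assumptions for illustration and may not match what your Custodian version writes, so check the tags on a marked instance before relying on them.

```python
from datetime import date, datetime, timedelta

# Assumed tag name and message format, used here only for illustration.
MARK_TAG = "maid_status"
MARK_FORMAT = "Resource does not meet policy: {op}@{when}"
DATE_FORMAT = "%Y/%m/%d"


def mark_for_op(tags, op, days, today):
    """Return a copy of tags with a marker scheduling op in the given days."""
    when = (today + timedelta(days=days)).strftime(DATE_FORMAT)
    marked = dict(tags)
    marked[MARK_TAG] = MARK_FORMAT.format(op=op, when=when)
    return marked


def is_due(tags, op, today):
    """True when the marker names op and its date is today or earlier."""
    marker = tags.get(MARK_TAG, "")
    if "@" not in marker:
        return False
    head, _, when = marker.rpartition("@")
    action = head.rsplit(":", 1)[-1].strip()
    due = datetime.strptime(when, DATE_FORMAT).date()
    return action == op and due <= today


marked = mark_for_op({"ami-status": "Unapproved AMI"}, "terminate", 30,
                     today=date(2017, 2, 2))
print(marked[MARK_TAG])
print(is_due(marked, "terminate", date(2017, 3, 3)))
print(is_due(marked, "terminate", date(2017, 3, 4)))
```

Because the due date lives on the resource itself, the terminating policy needs no state of its own and can run on any schedule.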

Multiple Resources and Policies in a Single File

The policy file below contains two policies. One of them listens for EC2 instance state changes; the other listens for EBS volume creation. The main difference between the two is that the EBS policy listens for a CloudTrail event, which takes longer to appear than a CloudWatch Event.

policies:
  - name: ec2-ami-check
    resource: ec2
    mode:
      type: ec2-instance-state
      role: arn:aws:iam::676452272092:role/lambdaAll
      events:
        - running
    filters:
      - type: value
        key: ImageId
        op: ni
        value:
          - ami-0b33d91d
    actions:
      - type: tag
        key: ami-status
        value: Unapproved AMI
      - stop
      - type: mark-for-op
        op: terminate
        days: 30
  - name: ebs-volume
    resource: ebs
    mode:
      type: cloudtrail
      role: arn:aws:iam::676452272092:role/lambdaAll
      events:
        - CreateVolume
    actions:
      - type: tag
        key: custodian
        value: Verified volume

When you run the policy above, Custodian will create two lambda functions as seen below.

$ custodian run  -c multi-test.yml -s . --region us-east-1

2017-02-02 17:27:27,127: custodian.policy:INFO Provisioning policy lambda ec2-ami-check
2017-02-02 17:27:27,471: custodian.lambda:INFO Publishing custodian policy lambda function custodian-ec2-ami-check
2017-02-02 17:27:28,800: custodian.policy:INFO Provisioning policy lambda ebs-volume
2017-02-02 17:27:28,887: custodian.lambda:INFO Publishing custodian policy lambda function custodian-ebs-volume

If you mix push-mode policies with pull-mode policies in the same file, Custodian will deploy lambda functions for the push policies and simply run the pull policies.
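One way to predict what a mixed file will do is to sort its policies by whether they carry a mode block. The Python sketch below does that on policies that have already been parsed into dictionaries. The function split_by_mode is hypothetical, and treating a missing mode as pull is an assumption based on the behavior of the earlier examples, which had no mode and ran immediately.

```python
def split_by_mode(policies):
    """Separate policies with a non-pull mode block from those without one."""
    push, pull = [], []
    for policy in policies:
        # Assumption: a policy without a mode block behaves as pull.
        mode_type = policy.get("mode", {}).get("type", "pull")
        (pull if mode_type == "pull" else push).append(policy["name"])
    return push, pull


# Abridged versions of policies used earlier in this post.
policies = [
    {"name": "ec2-ami-check", "resource": "ec2",
     "mode": {"type": "ec2-instance-state", "events": ["running"]}},
    {"name": "ebs-volume", "resource": "ebs",
     "mode": {"type": "cloudtrail", "events": ["CreateVolume"]}},
    {"name": "my-first-policy", "resource": "ec2"},
]

push, pull = split_by_mode(policies)
print("deploy as lambda:", push)
print("run immediately:", pull)
```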

Scheduling policies

You can also schedule a policy to run as a scheduled lambda function by using the periodic mode type.

- name: tag-check
  resource: ec2
  mode:
    type: periodic
    role: arn:aws:iam::676452272092:role/lambdaAll
    schedule: "rate(1 day)"
  filters:
    - "tag:Custodian": present
  actions:
    - stop
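The schedule value uses the CloudWatch Events rate syntax, rate(value unit), where the unit is minute, hour, or day, singular for a value of 1 and plural otherwise. A small Python sketch such as the one below can catch a malformed expression before the policy is deployed. The function parse_rate is a hypothetical helper, not part of Custodian, and it does not cover cron expressions.

```python
import re
from datetime import timedelta

RATE_PATTERN = re.compile(
    r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")


def parse_rate(expression):
    """Convert a rate expression such as 'rate(1 day)' into a timedelta."""
    match = RATE_PATTERN.match(expression)
    if not match:
        raise ValueError("not a rate expression: %r" % expression)
    value, unit = int(match.group(1)), match.group(2)
    if value < 1:
        raise ValueError("rate value must be a positive number")
    # A value of 1 takes a singular unit; any other value takes a plural one.
    if (value == 1) == unit.endswith("s"):
        raise ValueError("unit does not agree with value: %r" % expression)
    # timedelta keywords are plural: minutes, hours, days.
    return timedelta(**{unit.rstrip("s") + "s": value})


interval = parse_rate("rate(1 day)")
print(interval)
```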

Tips

You can validate the policy without running it.

$ custodian validate -c custodian.yml

You can also test the policy without running the actions

$ custodian run --dryrun -c custodian.yml -s .