Policy Language¶
The policies/
directory in the configuration repository contains the c7n policies, one per file. Policies must have a name that matches their filename (NAME.yml
) and should all have a “comment” or “comments” section that provides a human-readable summary of what the policy does (this comment is used to generate the Current Policies documentation).
All policies are built on top of defaults.yml
; see Defaults merging for further information.
Policies are built via the policygen
command (or the manheim-c7n-tools
policygen step), which runs Policygen and generates per-region custodian_REGION.yml
files.
Policy Repository Layout¶
The overall layout of the configuration repository must be as follows:
mailer-templates/ (optional)
manheim-c7n-tools.yml
policies/
├── all_accounts
│ └── common
│ └── policy-one.yml
├── defaults.yml
└── ACCOUNT-NAME
├── common
│ ├── policy-three.yml
│ └── policy-two.yml
├── us-east-1
│ ├── policy-five-us-east-1.yml
│ └── policy-four-us-east-1.yml
├── us-east-2
│ └── policy-four-us-east-2.yml
├── us-west-1
│ └── policy-four-us-west-1.yml
└── us-west-2
└── policy-four-us-west-2.yml
The policies/
directory contains:
defaults.yml
, the defaults used for ALL policies in all accounts (see Defaults merging for further information).A
all_accounts/
directory of policies shared identically across all accounts.A directory of account-specific policies for each account; the directory name must match the
account_name
value inmanheim-c7n-tools.yml
.
Within each subdirectory (all_accounts
or an account name) is a directory called common
and optionally directories for one or more specific regions. Policies in the common/
directory will be applied in all regions and policies in a region-specific directory will only be applied in that region.
When building the final configuration, policies from the account-specific directory will be layered on top of policies from the all_accounts/
directory. A policy with the exact same file and policy name in a per-account directory will override a policy with the same name from the all_accounts/
directory. Similarly, within the all_accounts/ or account-named directories, a region-specific policy will override a common/
policy with the same name and filename.
An example configuration repository can be seen at https://github.com/manheim/manheim-c7n-tools/tree/master/example_config_repo.
Multiple Repository Layout¶
mailer-templates/ (optional)
manheim-c7n-tools.yml
policies/
├── app
│ ├── defaults.yml
│ ├── mailer-templates/ (optional)
│ └── ACCOUNT-NAME
│ ├── us-east-1
│ │ ├── policy-five-us-east-1.yml
│ │ └── policy-six-us-east-1.yml
│ ├── us-east-2
│ │ └── policy-six-us-east-2.yml
│ ├── us-west-1
│ │ └── policy-six-us-west-1.yml
│ └── us-west-2
│ └── policy-six-us-west-2.yml
├── common
│ ├── all_accounts
│ │ └── common
│ │ └── policy-one.yml
│ ├── defaults.yml
│ ├── mailer-templates/ (optional)
│ └── ACCOUNT-NAME
│ └── common
│ ├── policy-three.yml
│ └── policy-two.yml
└── team
├── all_accounts
│ └── common
│ └── policy-six.yml
├── mailer-templates/ (optional)
└── ACCOUNT-NAME
├── us-east-1
│ ├── policy-five-us-east-1.yml
│ └── policy-four-us-east-1.yml
├── us-east-2
│ └── policy-four-us-east-2.yml
├── us-west-1
│ └── policy-four-us-west-1.yml
└── us-west-2
└── policy-four-us-west-2.yml
An example configuration for a multiple repository setup can be seen at https://github.com/manheim/manheim-c7n-tools/tree/master/example_config_multi_repo.
If mailer-templates/
directories are present in one or more of the subdirectories, their contents will be combined into ./mailer-templates/
, with later files of the same name overwriting earlier ones according to the order defined in the policy_source_paths
configuration item.
Policy Interpolation¶
When Policygen generates configuration files for each AWS Region that we deploy into, it will replace all instances of the string %%AWS_REGION%%
with the specific region name. As such, the %%AWS_REGION%%
macro must be used in all policies as well as the mailer config, where the current region needs to be referenced.
The list of regions that we generate configs for is taken from the regions
key of manheim-c7n-tools.yml
.
There are also some other values from manheim-c7n-tools.yml
(the ManheimConfig
class) that can be interpolated in the policies:
String |
Config Value |
Description |
---|---|---|
%%AWS_REGION%% |
n/a |
Replaced with the current region name, for each per-region config |
%%BUCKET_NAME%% |
output_s3_bucket_name |
Name of the S3 bucket used for cloud-custodian output |
%%LOG_GROUP%% |
custodian_log_group |
Name of the CloudWatch Log Group for custodian to log to |
%%DLQ_ARN%% |
dead_letter_queue_arn |
ARN of the Dead Letter Queue for Custodian Lambdas |
%%ROLE_ARN%% |
role_arn |
ARN of the IAM Role to run Custodian functions with |
%%MAILER_QUEUE_URL%% |
mailer_config.queue_url |
c7n-mailer SQS queue URL |
%%ACCOUNT_NAME%% |
account_name |
Configured name of the current AWS account |
%%ACCOUNT_ID%% |
account_id |
Configured ID of the current AWS account |
In addition, any POLICYGEN_ENV_
-prefixed environment variables present when policygen
is run will be interpolated into the configuration. Running policygen with a POLICYGEN_ENV_foo
environment variable set to bar
will result in all occurrences of %%POLICYGEN_ENV_foo%%
in the configuration replaced with bar
.
Anatomy of a Policy¶
Policies in this repository are augmented with the contents of defaults.yml
according to the rules described under Defaults merging.
As an example, our onhour-start-ec2
policy contains:
# REMINDER: defaults.yml will be merged in to this. See the README.
name: onhour-start-ec2
comments: Start tagged EC2 Instances daily at 06:00 Eastern, or per tag value
resource: ec2
filters:
- type: onhour
onhour: 6
default_tz: America/New_York
tag: custodian_downtime
actions:
- start
- type: notify
violation_desc: The following EC2 Instance(s)
action_desc: have been started per onhour configuration
subject: '[cloud-custodian {{ account }}] Onhour Started EC2 Instances in {{ region }}'
mode:
schedule: rate(1 hour)
And our defaults.yml
contains:
actions:
- type: notify
questions_email: foo@example.com
questions_slack: our-channel
template: redefault.html
to:
- resource-owner
- 'splunkhec://%%POLICYGEN_ENV_SPLUNK_INDEX%%'
owner_absent_contact:
- bar@example.com
- baz@example.com
transport:
queue: 'https://sqs.us-east-1.amazonaws.com/111111111111/cloud-custodian-111111111111'
type: sqs
mode:
execution-options: {log_group: /cloud-custodian/111111111111/us-east-1, output_dir: 's3://c7n-logs-111111111111/logs'}
role: arn:aws:iam::111111111111:role/cloud-custodian-111111111111
schedule: rate(1 hour)
tags: {Component: onhour-start-ec2, Environment: dev, OwnerEmail: foo@example.com,
Project: cloud-custodian}
timeout: 300
type: periodic
After merging with defaults.yml
, the policy for the us-east-1 region of a sample “dev” account becomes (this example has been manually sorted to look more like the original, above; the actual output will have keys sorted alphabetically):
name: onhour-start-ec2
comments: Start tagged EC2 Instances daily at 06:00 Eastern, or per tag value
resource: ec2
filters:
- type: onhour
onhour: 6
default_tz: America/New_York
tag: custodian_downtime
actions:
- start
- type: notify
violation_desc: The following EC2 Instance(s)
action_desc: have been started per onhour configuration
subject: '[cloud-custodian {{ account }}] Onhour Started EC2 Instances in {{ region }}'
questions_email: foo@example.com
questions_slack: our-channel
template: redefault.html
to:
- resource-owner
- 'splunkhec://%%POLICYGEN_ENV_SPLUNK_INDEX%%'
owner_absent_contact:
- bar@example.com
- baz@example.com
transport:
queue: 'https://sqs.us-east-1.amazonaws.com/111111111111/cloud-custodian-111111111111'
type: sqs
mode:
execution-options: {log_group: /cloud-custodian/111111111111/us-east-1, output_dir: 's3://c7n-logs-111111111111/logs'}
role: arn:aws:iam::111111111111:role/cloud-custodian-111111111111
schedule: rate(1 hour)
tags: {Component: onhour-start-ec2, Environment: dev, OwnerEmail: foo@example.com,
Project: cloud-custodian}
timeout: 300
type: periodic
The full list of top-level keys valid for a policy can be found by viewing the source code of c7n.schema.generate
or via the custodian
CLI schema
command, but the above example illustrates the keys that most, if not all, of our policies will have.
name - The unique name of the policy. For this repo, the filename must be the policy name with a
.yml
suffix.comments - A one- or two-sentence description of what the policy does. The Jenkins deployment job extracts all of these and uses them to build the generated documentation for the configuration repo.
resource - The AWS resource type that this policy acts on; e.g.
ec2
,asg
,rds
, etc. Supported resource types can be found in the upstream documentation; see the"type" attributes (strings) of the various c7n.resources classes
.filters - Filters tell a policy which resources it should match. The
filters
key here is an array/list of 0 or more filters to select resources that the policy should match. Multiple filters areand
-ed together, unless you nest them under anor
block (see the upstream documentation on collection operators). See the Filters section, below, for more information.actions - Actions tell c7n what to do with or about resources that the filters matched. The
actions
key here is an array/list of 0 or more actions for this policy to take. See the Actions section, below, for more information.mode - The
mode
key determines how the policy will be deployed and run. See the Mode section, below, for more information.notify_only - This is a manheim-c7n-tools addition, which is used internally and removed from the policy before Policygen generates the final YAML files for custodian. See Notify-Only Option for Policies for further information.
disable - This is a manheim-c7n-tools addition, which is used internally and removed from the policy before Policygen generates the final YAML files for custodian. See Disabling a policy for further information.
Filters¶
Cloud-custodian has support for many different kinds of filters to match various resource attributes.
Upstream documentation exists on both the Generic filters
as well as the AWS-specific filters.
In addition to that manually-curated documentation, there is also generated
documentation for the generic
and resource-specific filters
, as well as the source
code for each (which is liked from that documentation).
The Generic value filters can match any attribute of the resource instance, which is generally the return value of the Describe AWS API call for the resource type. There are also some transformations that can be performed on the values, such as type conversion, array counting, normalization (lower-case) or calculating age from a date type.
VPC filters
for things like subnet, security groups, etc.IAM filters
to assist with finding cross-account or public access in policies.Health filters
to identify resources with associated AWS Health events.Metric filters
to retrieve and filter based on CloudWatch metrics for resources.The
offhours filters
.
Actions¶
Note
manheim-c7n-tools’ Notify-Only Option for Policies option on a policy can effect the actions specified. See that section for more information.
Cloud-custodian has both generic/global actions (such as notify
) and resource-specific actions
(such as stop
and start
). Some actions are specified as only a string (i.e. stop
or
start
), whereas others need to be specified as a dictionary/hash/mapping including configuration options.
Global actions
include:
Notify
- Send email to static addresses, or addresses from tags on the resource, via c7n_mailer. Our defaults include configuration required for using this action with our c7n_mailer instance. The only configuration needed to make this action work is as shown in the example above; specifically, thetype: notify
key and thesubject
,violation_desc
andaction_desc
keys.invoke-lambda
- Invoke an arbitraty Lambda function, passing it details of the policy, action, triggering event, and matched resource(s).modify-security-groups
- Modify the security groups assigned to a resource.put-metric
- Send a custom metric to CloudWatch
To identify available resource-specific actions, either find the appropriate resource type module in the
cloud-custodian AWS documentation or the
c7n source code
and find all classes in it that are based on c7n.actions.Action
, or use the custodian schema
command line tool. There is also
manually-curated documentation on resource-specific filters and actions
that is helpful but incomplete.
In addition to notify
, some of our most-used actions are the various resource-specific stop
or
suspend
and start
or resume
actions, as well as the terminate
or delete
actions,
as well as the resource-specific actions to add/modify/delete tags and tag (“mark”) a resource for later action.
Marking Resources for Later Action¶
IMPORTANT: See the Data Collection/Notification to Action Transition section, below.
c7n has built-in logic for using tags to “mark” resources for action at a future time. Note that these actions are actually resource-specific, and unfortunately some of them have different names on different resources.
The following snippet will mark matched resources with a c7n-foo
tag, with a value of the specified message.
In the message, {op}
will be replaced with the operation (delete
) and {action_date}
will be replaced
with the date when the action should occur (in this example, the current time plus 5 days).
filters:
# not tagged for this policy; otherwise, we'd just keep pushing the mark date forward
- {'tag:c7n-foo': absent}
actions:
- type: mark-for-op
tag: c7n-foo
op: delete
message: "asg-inactive-mark: {op}@{action_date}"
days: 5
In a separate policy, we can then filter for resources which were marked for a specific action
at or before the current date/time with the marked-for-op
filter:
filters:
- type: marked-for-op
tag: c7n-asg-inactive
op: delete
That example will filter all resources that were marked for deletion at
or before the current time, with the c7n-asg-inactive
tag.
The skew
parameter on the marked filter skews the current date by adding a number of days to it.
This allows us to filter for resources that are marked for an operation N days in the future, i.e.
to send out a warning notification ahead of time. The following filter will match the same
resources as the previous example, but two days before that example.
filters:
- type: marked-for-op
tag: c7n-asg-inactive
op: delete
skew: 2
The combination of these actions and filters are commonly used to build a “group” of four complementary policies:
A
-mark
policy matches desired resources with a filter and uses themark-for-op
action to tag them for action at a later date. Note that it is extremely important to make sure the policy also incldes a filter to exclude resources that already have the marking tag present; if not, the date to take action will continually move forward every time the policy runs, and the action will never be taken.An
-unmark
policy matches resources that have themark
tag present but no longer meet the desired criteria, and removes the mark tag from them. For example: if we’re writing a policy to identify and terminate EC2 instances lacking required tags, the-unmark
policy would match resources that were previously marked by its counterpart (1) but now have the required tags, and would remove the marking tag from them.An early-action policy using
skew
that warns owners of impending action, and may take some preliminary action (i.e. stopping an EC2 instance a few days before it will be terminated).A termination/deletion policy that takes the final action.
Mode¶
We have standardized on deploying our policies as Lambda functions, to take advantage of c7n’s excellent
:std:label:`cloud custodian:lambda`. The type
key of the mode
section
of the policy defines how the policy will be deployed and executed.
defaults.yml
should specify everything needed to deploy a policy in periodic
mode. If the mode
section is completely
omitted from a policy, the default periodic mode will be applied.
Supported mode
type
options for Lambda functions include:
periodic - (default for our policies) runs on a set schedule using timer-based CloudWatch Events as a trigger.
cloudtrail - runs every time a CloudTrail event of a certain type is received. Note that tags may not have been applied to resources yet when this triggers.
ec2-instance-state - runs every time an EC2 Instance enters the specified state (e.g.
running
,stopped
,pending
, etc). Note that tags may not have been applied to instances yet when this triggers.config-rule - triggers via AWS Config rules. Note that not all resource types are supported by AWS Config; see the AWS Config - Supported Resources documentation for a list of which resource types are supported.
For full documentation on the required and optional configuration keys for each mode, see the upstream documentation.
Other keys under the mode
section include:
role - the IAM role that the policy executes under. They should all use the same terraform-managed role.
tags - Tags to apply to the Lambda function.
policygen.py
will add the policy name as theComponent
tag.timeout - The timeout, in seconds, for the Lambda function. This should be left at the default (maximum) of 300.
execution_options - Internal options of the Lambda function. Our defaults send logs to a CloudWatch log group and output to an S3 bucket, and setup the Dead Letter Queue.
Disabling a policy¶
It is possible to disable a policy. Simply setting the disable
key in a policy to true
will stop that policy from being
deployed.
name: onhour-start-ec2
comments: Start tagged EC2 Instances daily at 06:00 Eastern, or per tag value
resource: ec2
disable: true
The disable
key can be added to an existing policy to temporarily disable the policy, and can also be used to disable a
policy inherited from a higher-level policy source location by creating a new policy with the same name as the inherited
policy and adding the disable
key. Only the name
and disable
keys are required in this case, though adding about
comment can help explain why the policy is disabled.
name: onhour-start-ec2
comments: Disabled due to ...
disable: true
Data Collection/Notification to Action Transition¶
A common pattern that we use when testing new policies is to set up some
policies to either only send email notifications or to only collect data,
analyze that data, and then enable real actions (i.e. stop,
terminate, delete, etc.) after some data has been collected. However it
is very important to note that if a “testing only” policy used the
mark-for-op
action to tag a resource for later action, and actions
are later enabled for corresponding policies, the actions might be taken
immediately when enabled as a result of the “notify only” policies
marking resources for action.
As of version 1.3.0, manheim-c7n-tools supports a Notify-Only Option for Policies flag to help simplify this transition. For older versions, or policies that existed prior to 1.3.0, see the following section on manual tag cleanup.
Manual Tag Cleanup¶
As a result, when adding actions to policies that have been running in data collection mode, it’s important to manually purge the relevant tags so the policies don’t take any action based on tags applied during data collection.
For example, if you’re adding a “delete” action to policies that were previously only collecting data and included a mark action like:
- type: mark-for-op
tag: c7n-foo-policy
op: delete
message: "foo-mark {op}@{action_date}"
days: 7
Before enabling the real delete action, you should purge all of those tags with something like (example for EC2 instances):
TAGNAME=c7n-foo-policy
for i in $(aws ec2 describe-instances --filters Name=tag-key,Values=$tagname --output text --query 'Reservations[*].Instances[*].[InstanceId]')
do
echo "removing tag from: $i"
aws ec2 delete-tags --resources $i --tags Key=$tagname
done
Notify-Only Option for Policies¶
As described above in Data Collection/Notification to Action Transition, it’s common to want to run new policies in a “notify only” mode that sends notifications (and collects data) but does not yet take actions, assess those notifications, and enable actually taking action at a later date.
To support this, manheim-c7n-tools (specifically Policygen) supports the addition of a boolean notify_only
option at the top level of policy files, or in defaults.yml
for account- / repository-wide notify-only. Setting this flag will cause Policygen to pass the effected policies through NotifyOnlyPolicy
for pre-processing. This will cause the following changes to the final YAML policy:
The
comment
/comments
/description
fields will be prefixed with the stringNOTIFY ONLY:
If the policy has a
tags
list, anotify-only
tag will be appended to it.All tagging actions will have the string
-notify-only
appended to their tag names, to automate the above-described transition. Specifically:Any
mark
ortag
actions in the actions list will have the string-notify-only
appended to theirtag
orkey
values (if present) or appended to every item in theirtags
list (if present). If none of the above are present, thetag
item will be set to custodian’sDEFAULT_TAG
value, with-notify-only
appended.Any
mark-for-op
actions will have the string-notify-only
appended to theirtag
value. If they do not already have atag
value, it will be set to custodian’sDEFAULT_TAG
value, with-notify-only
appended.Any
remove-tag
/unmark
/untag
actions wukk have the string-notify-only
appended to all items in theirtags
list.
All
notify
actions will have theirviolation_desc
, if present, prefixed withNOTIFY ONLY:
. Theiraction_desc
, if present, will be prefixed within the future (currently notify-only)
.Any
filters
items withtag:NAME
keys, which match up withNAME
tags used inmark-for-op
actions, will be updated totag:NAME-notify-only
to retain their intended functionality.All other action types, not listed above, will be removed from the policy. We enforce notify-only by only retaining specifically whitelisted actions in the policy.