Policy Language¶
The policies/ directory in the configuration repository contains the c7n policies, one per file. Policies must have a name that matches their filename (NAME.yml) and should all have a “comment” or “comments” section that provides a human-readable summary of what the policy does (this comment is used to generate the Current Policies documentation).
All policies are built on top of defaults.yml; see Defaults merging for further information.
Policies are built via the policygen command (or the manheim-c7n-tools policygen step), which runs Policygen and generates per-region custodian_REGION.yml files.
Policy Repository Layout¶
The overall layout of the configuration repository must be as follows:
mailer-templates/ (optional)
manheim-c7n-tools.yml
policies/
├── all_accounts
│   └── common
│       └── policy-one.yml
├── defaults.yml
└── ACCOUNT-NAME
    ├── common
    │   ├── policy-three.yml
    │   └── policy-two.yml
    ├── us-east-1
    │   ├── policy-five-us-east-1.yml
    │   └── policy-four-us-east-1.yml
    ├── us-east-2
    │   └── policy-four-us-east-2.yml
    ├── us-west-1
    │   └── policy-four-us-west-1.yml
    └── us-west-2
        └── policy-four-us-west-2.yml
The policies/ directory contains:
- defaults.yml, the defaults used for ALL policies in all accounts (see Defaults merging for further information).
- An all_accounts/ directory of policies shared identically across all accounts.
- A directory of account-specific policies for each account; the directory name must match the account_name value in manheim-c7n-tools.yml.
Within each subdirectory (all_accounts or an account name) is a directory called common and optionally directories for one or more specific regions. Policies in the common/ directory will be applied in all regions and policies in a region-specific directory will only be applied in that region.
When building the final configuration, policies from the account-specific directory will be layered on top of policies from the all_accounts/ directory. A policy with the exact same file and policy name in a per-account directory will override a policy with the same name from the all_accounts/ directory. Similarly, within the all_accounts/ or account-named directories, a region-specific policy will override a common/ policy with the same name and filename.
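The layering rules above can be sketched in a few lines of Python. This is only an illustration, not Policygen's actual implementation: the function name `layer_policies` and the sample policies are hypothetical, and the precedence order (all_accounts/common, then all_accounts/REGION, then ACCOUNT-NAME/common, then ACCOUNT-NAME/REGION) is an assumption inferred from the description.

```python
from typing import Dict, Iterable

Policy = Dict[str, object]


def layer_policies(layers: Iterable[Dict[str, Policy]]) -> Dict[str, Policy]:
    """Combine {policy_name: policy} mappings, lowest precedence first.

    A policy in a later layer replaces a same-named policy from an
    earlier layer; policies with unique names are simply accumulated.
    """
    effective: Dict[str, Policy] = {}
    for layer in layers:
        effective.update(layer)
    return effective


# Hypothetical contents of each directory for one region.
all_accounts_common = {
    "policy-one": {"name": "policy-one", "resource": "ec2"},
    "policy-two": {"name": "policy-two", "resource": "asg"},
}
all_accounts_region: Dict[str, Policy] = {}
account_common = {"policy-two": {"name": "policy-two", "resource": "rds"}}
account_region = {"policy-four": {"name": "policy-four", "resource": "ebs"}}

effective = layer_policies(
    [all_accounts_common, all_accounts_region, account_common, account_region]
)
```

Here the account-specific `policy-two` replaces the shared one, while `policy-one` and `policy-four` pass through unchanged.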
An example configuration repository can be seen at https://github.com/manheim/manheim-c7n-tools/tree/master/example_config_repo.
Multiple Repository Layout¶
mailer-templates/ (optional)
manheim-c7n-tools.yml
policies/
├── app
│   ├── defaults.yml
│   ├── mailer-templates/ (optional)
│   └── ACCOUNT-NAME
│       ├── us-east-1
│       │   ├── policy-five-us-east-1.yml
│       │   └── policy-six-us-east-1.yml
│       ├── us-east-2
│       │   └── policy-six-us-east-2.yml
│       ├── us-west-1
│       │   └── policy-six-us-west-1.yml
│       └── us-west-2
│           └── policy-six-us-west-2.yml
├── common
│   ├── all_accounts
│   │   └── common
│   │       └── policy-one.yml
│   ├── defaults.yml
│   ├── mailer-templates/ (optional)
│   └── ACCOUNT-NAME
│       └── common
│           ├── policy-three.yml
│           └── policy-two.yml
└── team
    ├── all_accounts
    │   └── common
    │       └── policy-six.yml
    ├── mailer-templates/ (optional)
    └── ACCOUNT-NAME
        ├── us-east-1
        │   ├── policy-five-us-east-1.yml
        │   └── policy-four-us-east-1.yml
        ├── us-east-2
        │   └── policy-four-us-east-2.yml
        ├── us-west-1
        │   └── policy-four-us-west-1.yml
        └── us-west-2
            └── policy-four-us-west-2.yml
An example configuration for a multiple repository setup can be seen at https://github.com/manheim/manheim-c7n-tools/tree/master/example_config_multi_repo.
If mailer-templates/ directories are present in one or more of the subdirectories, their contents will be combined into ./mailer-templates/, with later files of the same name overwriting earlier ones according to the order defined in the policy_source_paths configuration item.
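The combining behaviour described above can be pictured as a sequence of copies in which later sources win. The following is a minimal sketch under stated assumptions, not the tool's real code: `combine_mailer_templates` is a hypothetical helper, and the source order used in the demonstration (common, then team) stands in for whatever order policy_source_paths defines.

```python
import shutil
import tempfile
from pathlib import Path


def combine_mailer_templates(source_dirs, dest):
    """Copy each source's mailer-templates/ files into dest, in order.

    Files copied later overwrite same-named files copied earlier.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    for source in source_dirs:
        template_dir = Path(source) / "mailer-templates"
        if not template_dir.is_dir():
            continue  # mailer-templates/ is optional in every source
        for template in sorted(template_dir.iterdir()):
            if template.is_file():
                shutil.copy(template, dest / template.name)
    return dest


# Demonstration in a throwaway directory with two hypothetical sources.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, body in (("common", "from common"), ("team", "from team")):
        directory = root / "policies" / name / "mailer-templates"
        directory.mkdir(parents=True)
        (directory / "default.html").write_text(body)
    out = combine_mailer_templates(
        [root / "policies" / "common", root / "policies" / "team"],
        root / "mailer-templates",
    )
    combined = (out / "default.html").read_text()
```

Because the team source comes last in this example, its default.html is the one that ends up in the combined directory.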
Policy Interpolation¶
When Policygen generates configuration files for each AWS Region that we deploy into, it will replace all instances of the string %%AWS_REGION%% with the specific region name. As such, the %%AWS_REGION%% macro must be used in all policies as well as the mailer config, where the current region needs to be referenced.
The list of regions that we generate configs for is taken from the regions key of manheim-c7n-tools.yml.
There are also some other values from manheim-c7n-tools.yml (the ManheimConfig class) that can be interpolated in the policies:
| String | Config Value | Description |
|---|---|---|
| %%AWS_REGION%% | n/a | Replaced with the current region name, for each per-region config |
| %%BUCKET_NAME%% | output_s3_bucket_name | Name of the S3 bucket used for cloud-custodian output |
| %%LOG_GROUP%% | custodian_log_group | Name of the CloudWatch Log Group for custodian to log to |
| %%DLQ_ARN%% | dead_letter_queue_arn | ARN of the Dead Letter Queue for Custodian Lambdas |
| %%ROLE_ARN%% | role_arn | ARN of the IAM Role to run Custodian functions with |
| %%MAILER_QUEUE_URL%% | mailer_config.queue_url | c7n-mailer SQS queue URL |
| %%ACCOUNT_NAME%% | account_name | Configured name of the current AWS account |
| %%ACCOUNT_ID%% | account_id | Configured ID of the current AWS account |
In addition, any POLICYGEN_ENV_-prefixed environment variables present when policygen is run will be interpolated into the configuration. For example, running policygen with a POLICYGEN_ENV_foo environment variable set to bar will result in all occurrences of %%POLICYGEN_ENV_foo%% in the configuration being replaced with bar.
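As a rough illustration of the interpolation described in this section, the sketch below performs plain string replacement of the %%NAME%% macros. It is not Policygen's code: the `interpolate` function, the `CONFIG_KEYS` mapping, and the sample values are hypothetical, and the configuration is modelled as a plain dict rather than the ManheimConfig class.

```python
import os

# Macro name -> key in manheim-c7n-tools.yml, taken from the table above.
CONFIG_KEYS = {
    "BUCKET_NAME": "output_s3_bucket_name",
    "LOG_GROUP": "custodian_log_group",
    "DLQ_ARN": "dead_letter_queue_arn",
    "ROLE_ARN": "role_arn",
    "ACCOUNT_NAME": "account_name",
    "ACCOUNT_ID": "account_id",
}


def interpolate(text, region, config, environ=None):
    """Replace %%NAME%% macros in text for one region."""
    environ = os.environ if environ is None else environ
    values = {"AWS_REGION": region}
    for macro, key in CONFIG_KEYS.items():
        if key in config:
            values[macro] = config[key]
    queue_url = config.get("mailer_config", {}).get("queue_url")
    if queue_url is not None:
        values["MAILER_QUEUE_URL"] = queue_url
    # POLICYGEN_ENV_* variables are exposed under their full names.
    for name, value in environ.items():
        if name.startswith("POLICYGEN_ENV_"):
            values[name] = value
    for macro, value in values.items():
        text = text.replace("%%" + macro + "%%", str(value))
    return text


sample_config = {
    "output_s3_bucket_name": "c7n-logs-111111111111",
    "account_name": "dev",
}
rendered = interpolate(
    "s3://%%BUCKET_NAME%%/%%ACCOUNT_NAME%%/%%AWS_REGION%%/%%POLICYGEN_ENV_foo%%",
    "us-east-1",
    sample_config,
    environ={"POLICYGEN_ENV_foo": "bar"},
)
```

Calling the function once per entry in the regions list would yield one rendered copy per region, which mirrors the per-region output files.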
Anatomy of a Policy¶
Policies in this repository are augmented with the contents of defaults.yml according to the rules described under Defaults merging.
As an example, our onhour-start-ec2 policy contains:
# REMINDER: defaults.yml will be merged in to this. See the README.
name: onhour-start-ec2
comments: Start tagged EC2 Instances daily at 06:00 Eastern, or per tag value
resource: ec2
filters:
  - type: onhour
    onhour: 6
    default_tz: America/New_York
    tag: custodian_downtime
actions:
  - start
  - type: notify
    violation_desc: The following EC2 Instance(s)
    action_desc: have been started per onhour configuration
    subject: '[cloud-custodian {{ account }}] Onhour Started EC2 Instances in {{ region }}'
mode:
  schedule: rate(1 hour)
And our defaults.yml contains:
actions:
  - type: notify
    questions_email: foo@example.com
    questions_slack: our-channel
    template: redefault.html
    to:
      - resource-owner
      - 'splunkhec://%%POLICYGEN_ENV_SPLUNK_INDEX%%'
    owner_absent_contact:
      - bar@example.com
      - baz@example.com
    transport:
      queue: 'https://sqs.us-east-1.amazonaws.com/111111111111/cloud-custodian-111111111111'
      type: sqs
mode:
  execution-options: {log_group: /cloud-custodian/111111111111/us-east-1, output_dir: 's3://c7n-logs-111111111111/logs'}
  role: arn:aws:iam::111111111111:role/cloud-custodian-111111111111
  schedule: rate(1 hour)
  tags: {Component: onhour-start-ec2, Environment: dev, OwnerEmail: foo@example.com,
    Project: cloud-custodian}
  timeout: 300
  type: periodic
After merging with defaults.yml, the policy for the us-east-1 region of a sample “dev” account becomes (this example has been manually sorted to look more like the original, above; the actual output will have keys sorted alphabetically):
name: onhour-start-ec2
comments: Start tagged EC2 Instances daily at 06:00 Eastern, or per tag value
resource: ec2
filters:
  - type: onhour
    onhour: 6
    default_tz: America/New_York
    tag: custodian_downtime
actions:
  - start
  - type: notify
    violation_desc: The following EC2 Instance(s)
    action_desc: have been started per onhour configuration
    subject: '[cloud-custodian {{ account }}] Onhour Started EC2 Instances in {{ region }}'
    questions_email: foo@example.com
    questions_slack: our-channel
    template: redefault.html
    to:
      - resource-owner
      - 'splunkhec://%%POLICYGEN_ENV_SPLUNK_INDEX%%'
    owner_absent_contact:
      - bar@example.com
      - baz@example.com
    transport:
      queue: 'https://sqs.us-east-1.amazonaws.com/111111111111/cloud-custodian-111111111111'
      type: sqs
mode:
  execution-options: {log_group: /cloud-custodian/111111111111/us-east-1, output_dir: 's3://c7n-logs-111111111111/logs'}
  role: arn:aws:iam::111111111111:role/cloud-custodian-111111111111
  schedule: rate(1 hour)
  tags: {Component: onhour-start-ec2, Environment: dev, OwnerEmail: foo@example.com,
    Project: cloud-custodian}
  timeout: 300
  type: periodic
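To make the merged result above easier to follow, here is a simplified Python sketch that reproduces it for this one example. It is not Policygen's implementation, and the authoritative rules are those in Defaults merging. The function names are hypothetical, and the sketch assumes three things: dictionaries merge recursively with the policy's values winning, list items that are dictionaries with the same type are merged with each other, and default list items with no counterpart in the policy are appended.

```python
import copy


def merge_defaults(defaults, policy):
    """Return defaults overlaid with policy (policy values win)."""
    result = copy.deepcopy(defaults)
    for key, value in policy.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_defaults(result[key], value)
        elif isinstance(value, list) and isinstance(result.get(key), list):
            result[key] = merge_lists(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def merge_lists(default_items, policy_items):
    """Merge list items that share a 'type'; keep the policy's order."""
    by_type = {
        item["type"]: item
        for item in default_items
        if isinstance(item, dict) and "type" in item
    }
    merged, matched = [], set()
    for item in policy_items:
        if isinstance(item, dict) and item.get("type") in by_type:
            merged.append(merge_defaults(by_type[item["type"]], item))
            matched.add(item["type"])
        else:
            merged.append(copy.deepcopy(item))
    # Assumption: defaults with no counterpart in the policy are appended.
    for item in default_items:
        if isinstance(item, dict) and item.get("type") in matched:
            continue
        if item not in merged:
            merged.append(copy.deepcopy(item))
    return merged


defaults = {
    "actions": [{"type": "notify", "template": "redefault.html"}],
    "mode": {"type": "periodic", "schedule": "rate(1 hour)", "timeout": 300},
}
policy = {
    "name": "onhour-start-ec2",
    "actions": ["start", {"type": "notify", "subject": "Onhour Started"}],
    "mode": {"schedule": "rate(1 hour)"},
}
merged_policy = merge_defaults(defaults, policy)
```

In the result, the policy's notify action gains the template key from the defaults while keeping its own subject, and the mode section picks up type and timeout.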
The full list of top-level keys valid for a policy can be found by viewing the source code of c7n.schema.generate or via the custodian CLI schema command, but the above example illustrates the keys that most, if not all, of our policies will have.
- name - The unique name of the policy. For this repo, the filename must be the policy name with a .yml suffix.
- comments - A one- or two-sentence description of what the policy does. The Jenkins deployment job extracts all of these and uses them to build the generated documentation for the configuration repo.
- resource - The AWS resource type that this policy acts on; e.g. ec2, asg, rds, etc. Supported resource types can be found in the upstream documentation; see the "type" attributes (strings) of the various c7n.resources classes.
- filters - Filters tell a policy which resources it should match. The filters key here is an array/list of 0 or more filters to select resources that the policy should match. Multiple filters are and-ed together, unless you nest them under an or block (see the upstream documentation on collection operators). See the Filters section, below, for more information.
- actions - Actions tell c7n what to do with or about resources that the filters matched. The actions key here is an array/list of 0 or more actions for this policy to take. See the Actions section, below, for more information.
- mode - The mode key determines how the policy will be deployed and run. See the Mode section, below, for more information.
- notify_only - This is a manheim-c7n-tools addition, which is used internally and removed from the policy before Policygen generates the final YAML files for custodian. See Notify-Only Option for Policies for further information.
- disable - This is a manheim-c7n-tools addition, which is used internally and removed from the policy before Policygen generates the final YAML files for custodian. See Disabling a policy for further information.
Filters¶
Cloud-custodian has support for many different kinds of filters to match various resource attributes.
Upstream documentation exists on both the Generic filters
and the AWS-specific filters.
In addition to that manually-curated documentation, there is also generated
documentation for the generic
and resource-specific filters, as well as the source
code for each (which is linked from that documentation).
The Generic value filters can match any attribute of the resource instance, which is generally the return value of the Describe AWS API call for the resource type. There are also some transformations that can be performed on the values, such as type conversion, array counting, normalization (lower-case) or calculating age from a date type.
- VPC filters for things like subnet, security groups, etc.
- IAM filters to assist with finding cross-account or public access in policies.
- Health filters to identify resources with associated AWS Health events.
- Metric filters to retrieve and filter based on CloudWatch metrics for resources.
- The offhours filters.
Actions¶
Note
The manheim-c7n-tools Notify-Only Option for Policies can affect the actions specified on a policy. See that section for more information.
Cloud-custodian has both generic/global actions (such as notify) and resource-specific actions
(such as stop and start). Some actions are specified as only a string (e.g. stop or
start), whereas others need to be specified as a dictionary/hash/mapping including configuration options.
Global actions include:
- Notify - Send email to static addresses, or addresses from tags on the resource, via c7n_mailer. Our defaults include configuration required for using this action with our c7n_mailer instance. The only configuration needed to make this action work is as shown in the example above; specifically, the type: notify key and the subject, violation_desc and action_desc keys.
- invoke-lambda - Invoke an arbitrary Lambda function, passing it details of the policy, action, triggering event, and matched resource(s).
- modify-security-groups - Modify the security groups assigned to a resource.
- put-metric - Send a custom metric to CloudWatch.
To identify available resource-specific actions, either find the appropriate resource type module in the
cloud-custodian AWS documentation or the
c7n source code
and find all classes in it that are based on c7n.actions.Action, or use the custodian schema
command line tool. There is also
manually-curated documentation on resource-specific filters and actions
that is helpful but incomplete.
In addition to notify, some of our most-used actions are the various resource-specific stop or
suspend and start or resume actions, the terminate or delete actions,
and the resource-specific actions to add/modify/delete tags and tag (“mark”) a resource for later action.
Marking Resources for Later Action¶
IMPORTANT: See the Data Collection/Notification to Action Transition section, below.
c7n has built-in logic for using tags to “mark” resources for action at a future time. Note that these actions are actually resource-specific, and unfortunately some of them have different names on different resources.
The following snippet will mark matched resources with a c7n-foo tag, with a value of the specified message.
In the message, {op} will be replaced with the operation (delete) and {action_date} will be replaced
with the date when the action should occur (in this example, the current time plus 5 days).
filters:
  # not tagged for this policy; otherwise, we'd just keep pushing the mark date forward
  - {'tag:c7n-foo': absent}
actions:
  - type: mark-for-op
    tag: c7n-foo
    op: delete
    message: "asg-inactive-mark: {op}@{action_date}"
    days: 5
In a separate policy, we can then filter for resources which were marked for a specific action
at or before the current date/time with the marked-for-op filter:
filters:
  - type: marked-for-op
    tag: c7n-asg-inactive
    op: delete
That example will match all resources that were marked for deletion at
or before the current time, with the c7n-asg-inactive tag.
The skew parameter on the marked-for-op filter skews the current date by adding a number of days to it.
This allows us to filter for resources that are marked for an operation N days in the future, e.g.
to send out a warning notification ahead of time. The following filter will match the same
resources as the previous example, but two days earlier.
filters:
  - type: marked-for-op
    tag: c7n-asg-inactive
    op: delete
    skew: 2
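The date comparison behind marked-for-op and skew can be sketched as follows. This is an illustration of the idea rather than c7n's own code: the `is_due` function is hypothetical, and it assumes the tag value ends in `op@YYYY/MM/DD`, as produced by a message such as the one in the marking example above when the mark is made in whole days.

```python
from datetime import datetime, timedelta


def is_due(tag_value, op, now, skew_days=0):
    """Return True if tag_value marks `op` at or before now + skew_days.

    Assumes a value such as 'asg-inactive-mark: delete@2024/05/10'.
    """
    # Keep only the text after the last colon, e.g. 'delete@2024/05/10'.
    _, _, action = tag_value.rpartition(":")
    tagged_op, separator, date_text = action.strip().partition("@")
    if not separator or tagged_op != op:
        return False
    action_date = datetime.strptime(date_text, "%Y/%m/%d")
    return action_date <= now + timedelta(days=skew_days)


mark = "asg-inactive-mark: delete@2024/05/10"
today = datetime(2024, 5, 8)

due_now = is_due(mark, "delete", today)               # action date not reached
due_with_skew = is_due(mark, "delete", today, 2)      # matches two days early
other_op = is_due(mark, "stop", today, skew_days=5)   # different operation
```

With a skew of 2, a resource marked for deletion on the 10th already matches on the 8th, which is what makes an advance warning policy possible.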
The combination of these actions and filters is commonly used to build a “group” of four complementary policies:
1. A -mark policy matches desired resources with a filter and uses the mark-for-op action to tag them for action at a later date. Note that it is extremely important to make sure the policy also includes a filter to exclude resources that already have the marking tag present; if not, the date to take action will continually move forward every time the policy runs, and the action will never be taken.
2. An -unmark policy matches resources that have the mark tag present but no longer meet the desired criteria, and removes the mark tag from them. For example: if we’re writing a policy to identify and terminate EC2 instances lacking required tags, the -unmark policy would match resources that were previously marked by its counterpart (1) but now have the required tags, and would remove the marking tag from them.
3. An early-action policy using skew that warns owners of impending action, and may take some preliminary action (e.g. stopping an EC2 instance a few days before it will be terminated).
4. A termination/deletion policy that takes the final action.
Mode¶
We have standardized on deploying our policies as Lambda functions, to take advantage of c7n’s excellent
Lambda support. The type key of the mode section
of the policy defines how the policy will be deployed and executed.
defaults.yml should specify everything needed to deploy a policy in periodic mode. If the mode section is completely
omitted from a policy, the default periodic mode will be applied.
Supported mode type options for Lambda functions include:
- periodic - (default for our policies) runs on a set schedule using timer-based CloudWatch Events as a trigger.
- cloudtrail - runs every time a CloudTrail event of a certain type is received. Note that tags may not have been applied to resources yet when this triggers.
- ec2-instance-state - runs every time an EC2 Instance enters the specified state (e.g. running, stopped, pending, etc). Note that tags may not have been applied to instances yet when this triggers.
- config-rule - triggers via AWS Config rules. Note that not all resource types are supported by AWS Config; see the AWS Config - Supported Resources documentation for a list of which resource types are supported.
For full documentation on the required and optional configuration keys for each mode, see the upstream documentation.
Other keys under the mode section include:
- role - The IAM role that the policy executes under. All policies should use the same Terraform-managed role.
- tags - Tags to apply to the Lambda function. policygen.py will add the policy name as the Component tag.
- timeout - The timeout, in seconds, for the Lambda function. This should be left at the default (maximum) of 300.
- execution-options - Internal options of the Lambda function. Our defaults send logs to a CloudWatch log group and output to an S3 bucket, and set up the Dead Letter Queue.
Disabling a policy¶
It is possible to disable a policy. Simply setting the disable key in a policy to true will stop that policy from being
deployed.
name: onhour-start-ec2
comments: Start tagged EC2 Instances daily at 06:00 Eastern, or per tag value
resource: ec2
disable: true
The disable key can be added to an existing policy to temporarily disable the policy, and can also be used to disable a
policy inherited from a higher-level policy source location by creating a new policy with the same name as the inherited
policy and adding the disable key. Only the name and disable keys are required in this case, though adding a
comment can help explain why the policy is disabled.
name: onhour-start-ec2
comments: Disabled due to ...
disable: true
Data Collection/Notification to Action Transition¶
A common pattern that we use when testing new policies is to set up some
policies to either only send email notifications or to only collect data,
analyze that data, and then enable real actions (e.g. stop,
terminate, delete, etc.) after some data has been collected. However, it
is very important to note that if a “testing only” policy used the
mark-for-op action to tag a resource for later action, and actions
are later enabled for corresponding policies, the actions might be taken
immediately when enabled as a result of the “notify only” policies
marking resources for action.
As of version 1.3.0, manheim-c7n-tools supports a Notify-Only Option for Policies flag to help simplify this transition. For older versions, or policies that existed prior to 1.3.0, see the following section on manual tag cleanup.
Manual Tag Cleanup¶
When adding actions to policies that have been running in data collection mode, it’s important to manually purge the relevant tags so the policies don’t take any action based on tags applied during data collection.
For example, if you’re adding a “delete” action to policies that were previously only collecting data and included a mark action like:
- type: mark-for-op
tag: c7n-foo-policy
op: delete
message: "foo-mark {op}@{action_date}"
days: 7
Before enabling the real delete action, you should purge all of those tags with something like (example for EC2 instances):
# Shell variable names are case-sensitive, so the tag name is referenced as $TAGNAME throughout.
TAGNAME=c7n-foo-policy
for i in $(aws ec2 describe-instances --filters "Name=tag-key,Values=$TAGNAME" --output text --query 'Reservations[*].Instances[*].[InstanceId]')
do
    echo "removing tag from: $i"
    aws ec2 delete-tags --resources "$i" --tags "Key=$TAGNAME"
done
Notify-Only Option for Policies¶
As described above in Data Collection/Notification to Action Transition, it’s common to want to run new policies in a “notify only” mode that sends notifications (and collects data) but does not yet take actions, assess those notifications, and enable actually taking action at a later date.
To support this, manheim-c7n-tools (specifically Policygen) supports the addition of a boolean notify_only option at the top level of policy files, or in defaults.yml for account- / repository-wide notify-only. Setting this flag will cause Policygen to pass the affected policies through NotifyOnlyPolicy for pre-processing. This will cause the following changes to the final YAML policy:
- The comment/comments/description fields will be prefixed with the string NOTIFY ONLY:
- If the policy has a tags list, a notify-only tag will be appended to it.
- All tagging actions will have the string -notify-only appended to their tag names, to automate the above-described transition. Specifically:
  - Any mark or tag actions in the actions list will have the string -notify-only appended to their tag or key values (if present) or appended to every item in their tags list (if present). If none of the above are present, the tag item will be set to custodian’s DEFAULT_TAG value, with -notify-only appended.
  - Any mark-for-op actions will have the string -notify-only appended to their tag value. If they do not already have a tag value, it will be set to custodian’s DEFAULT_TAG value, with -notify-only appended.
  - Any remove-tag/unmark/untag actions will have the string -notify-only appended to all items in their tags list.
- All notify actions will have their violation_desc, if present, prefixed with NOTIFY ONLY:. Their action_desc, if present, will be prefixed with "in the future (currently notify-only)".
- Any filters items with tag:NAME keys, which match up with NAME tags used in mark-for-op actions, will be updated to tag:NAME-notify-only to retain their intended functionality.
- All other action types, not listed above, will be removed from the policy. We enforce notify-only by only retaining specifically whitelisted actions in the policy.
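The transformations listed above can be sketched in Python as follows. This is a simplified illustration and not the real NotifyOnlyPolicy class: the function names are hypothetical, the `DEFAULT_TAG` value of `maid_status` and the space after each prefix are assumptions, and only top-level filter items are rewritten (filters nested under or/and blocks are ignored here).

```python
import copy

SUFFIX = "-notify-only"
PREFIX = "NOTIFY ONLY: "  # the space after the colon is an assumption
DEFAULT_TAG = "maid_status"  # assumed value of custodian's DEFAULT_TAG


def _rewrite_filter(item, marked_tags):
    """Rename tag:NAME keys whose NAME is used by a mark-for-op action."""
    if not isinstance(item, dict):
        return item
    return {
        (key + SUFFIX if key.startswith("tag:") and key[4:] in marked_tags else key): value
        for key, value in item.items()
    }


def to_notify_only(policy):
    """Return a notify-only copy of policy; the input is not modified."""
    result = copy.deepcopy(policy)
    for key in ("comment", "comments", "description"):
        if key in result:
            result[key] = PREFIX + result[key]
    if isinstance(result.get("tags"), list):
        result["tags"].append("notify-only")

    marked_tags, kept = set(), []
    for action in result.get("actions", []):
        if isinstance(action, str):
            action = {"type": action}
        kind = action.get("type")
        if kind == "notify":
            if "violation_desc" in action:
                action["violation_desc"] = PREFIX + action["violation_desc"]
            if "action_desc" in action:
                action["action_desc"] = (
                    "in the future (currently notify-only) " + action["action_desc"]
                )
        elif kind == "mark-for-op":
            original = action.get("tag", DEFAULT_TAG)
            marked_tags.add(original)
            action["tag"] = original + SUFFIX
        elif kind in ("mark", "tag"):
            if "tag" in action or "key" in action:
                for field in ("tag", "key"):
                    if field in action:
                        action[field] += SUFFIX
            elif "tags" in action:
                action["tags"] = [tag + SUFFIX for tag in action["tags"]]
            else:
                action["tag"] = DEFAULT_TAG + SUFFIX
        elif kind in ("remove-tag", "unmark", "untag"):
            if "tags" in action:
                action["tags"] = [tag + SUFFIX for tag in action["tags"]]
        else:
            continue  # not whitelisted, so dropped from the policy
        kept.append(action)
    if "actions" in result:
        result["actions"] = kept
    if "filters" in result:
        result["filters"] = [_rewrite_filter(f, marked_tags) for f in result["filters"]]
    return result


example = {
    "name": "asg-inactive-mark",
    "comments": "Mark inactive ASGs",
    "filters": [{"tag:c7n-foo": "absent"}],
    "actions": [
        {"type": "mark-for-op", "tag": "c7n-foo", "op": "delete", "days": 5},
        {"type": "notify", "violation_desc": "The following ASGs",
         "action_desc": "will be deleted"},
        "delete",
    ],
}
converted = to_notify_only(example)
```

In the converted policy the delete action is gone, the marking tag and the matching filter both use c7n-foo-notify-only, and the original policy dictionary is left untouched.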