Sending EC2 logs to S3 via lifecycle hooks

Shishir Khandelwal
8 min read · Dec 24, 2021


An EC2 instance may get terminated for a variety of reasons. Since it usually runs something crucial for the application, such as the frontend, the backend, or the database, it’s important to figure out what actually went wrong with it.

Therefore, as a DevOps/SRE/SysAdmin, it’s a good idea to have logs and other important files available as a backup somewhere safe before the VM gets terminated completely.

One of the ways to approach this is to use the lifecycle hooks of an EC2 to trigger the action of transferring log files from the terminating VM to an S3 bucket.

In this article, I am going to talk about the various configurations and steps to be followed in order to make this idea possible.

After spending some time with this problem and trying out different things for it, I am confident that it will be a good project for anyone looking to learn about lifecycle hooks in AWS.

Just so we are on the same page,

SSM = Systems Manager.
ASG = Auto Scaling Group.

Steps for setup

The diagram below displays the architecture that we would be creating.

The steps, in brief, would be as follows:

  1. Create an instance role that gives SSM access to the EC2s.
  2. Create a launch configuration for launching the EC2s.
  3. Create an ASG to spin up EC2s.
  4. Create a ‘termination’ lifecycle hook for the ASG.
  5. Create a Lambda function.
  6. Create an event to trigger the Lambda function using EventBridge.
  7. Create an S3 bucket where logs would be stored.
  8. Create an SSM document to run a shell script that does S3 operations inside the EC2.
  9. Update the EC2 instance IAM role.
  10. Configure the Lambda function to listen to EventBridge events & run the SSM document.

Step 1

In order for SSM to be able to run commands on an EC2, the EC2 needs a role with some specific permissions. The role can have other permissions as well, but the managed policy “AmazonSSMManagedInstanceCore” is compulsory.

So, let’s create one.

Since this role would be applied on an EC2, it must have “EC2” as its trusted entity.

Let’s name the role as “ssm-ec2”.
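If you prefer to script this step instead of clicking through the console, here is a minimal sketch of what it could look like with boto3. It assumes you pass in a `boto3.client("iam")` object; the helper names (`ec2_trust_policy`, `create_ssm_instance_role`) are my own and not part of any AWS API.

```python
import json

ROLE_NAME = "ssm-ec2"  # role name used in this article
# AWS managed policy that SSM needs on the instance
SSM_CORE_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"


def ec2_trust_policy() -> dict:
    """Trust policy that lets the EC2 service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_ssm_instance_role(iam_client, role_name: str = ROLE_NAME) -> None:
    """Create the role, attach the SSM policy, and add an instance profile.

    `iam_client` is assumed to be boto3.client("iam").
    """
    iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(ec2_trust_policy()),
    )
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=SSM_CORE_POLICY_ARN)
    # The console creates a same-named instance profile automatically;
    # through the API it has to be created and linked explicitly.
    iam_client.create_instance_profile(InstanceProfileName=role_name)
    iam_client.add_role_to_instance_profile(
        InstanceProfileName=role_name, RoleName=role_name
    )
```

Calling `create_ssm_instance_role(boto3.client("iam"))` once should leave you with the same role the console steps produce.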

Step 2

Since our goal is to deal with the lifecycle hooks and S3 operations, we can set up a bare-minimum launch configuration. I used the following configuration; you may change it according to your use case.

AMI: ami-04505e74c0741db8d (Ubuntu 20 in the us-east-1 region)
Instance Type: t2.micro
IAM Instance Profile: “ssm-ec2” (the IAM role we created in Step 1)
No extra storage volumes.
Security group with port 22 open to the world.
(I usually create a key pair as well, in case I need to SSH in and debug something.)

If you are a beginner trying to create a project like this, I suggest you follow along with the same configuration as mine.

The plan is to do the S3 operations via the aws-cli. Let’s add the installation steps of aws-cli to the launch configuration so that it’s already installed when a new EC2 is launched.

#!/bin/bash
# User data runs unattended at first boot, so every install must be non-interactive.
cd /home/ubuntu/
sudo apt update -y
sudo apt install -y unzip
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

Add this script under “advanced details” -> “userdata”.

Let’s name the launch configuration “webapp-lc” & create it.

Step 3

Let’s create an ASG with the newly created launch configuration. There’s no requirement for a load balancer, and the instance count can be kept to a minimum of 1.

Here are the configurations that I used -

Name the ASG as “webapp-asg” & create it.

Step 4

Before creating a lifecycle hook, it’s important to understand what they are & how they work.

What are lifecycle hooks?

Lifecycle hooks let you pause an instance to perform custom actions whenever an EC2 starts or is terminated.

How does it work?

In order to understand how they work, let’s first try to understand the behavior of EC2s without a lifecycle hook.

The diagrams below depict the behaviors. It’s pretty straightforward.

Now, let’s see how lifecycle hooks intercept the states to run user-defined custom actions.

From the diagrams, you can make out that lifecycle hooks intercept the states to run custom actions. After the actions are run, they expect a command to be invoked which instructs the EC2 to move to the next state.

Command

The command looks like this:

aws autoscaling complete-lifecycle-action --lifecycle-action-result CONTINUE --instance-id i-1a2b3c4d --lifecycle-hook-name my-launch-hook --auto-scaling-group-name my-asg

Heartbeat timeout

The amount of time, in seconds, for the instances to remain in a wait state.

If the command above is not run, the instance remains paused for this many seconds and then proceeds with the hook’s default result. For a termination hook, the instance is terminated either way.

Creation

We know pretty much everything about lifecycle hooks. Let’s create one now by going to ASG -> Instance Management.

Since we want to take custom action (of copying log files to the S3 bucket) at termination, we will create a “termination” lifecycle hook.

Let’s name it “e” (as in “ending”).
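For reference, the same hook can be created from code. This is only a sketch: it assumes a `boto3.client("autoscaling")` object is passed in, the helper names are mine, and the 300-second heartbeat timeout is an illustrative value, not a setting taken from this article.

```python
def termination_hook_params(
    asg_name: str = "webapp-asg",  # ASG name from Step 3
    hook_name: str = "e",          # hook name used in this article
    heartbeat_timeout: int = 300,  # assumed value; pick what your upload needs
) -> dict:
    """Arguments for put_lifecycle_hook describing a termination hook."""
    return {
        "AutoScalingGroupName": asg_name,
        "LifecycleHookName": hook_name,
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
        "HeartbeatTimeout": heartbeat_timeout,
        # What happens if nobody completes the hook before the timeout.
        "DefaultResult": "CONTINUE",
    }


def create_termination_hook(autoscaling_client, **overrides) -> dict:
    """Create the hook; `autoscaling_client` is assumed to be
    boto3.client("autoscaling")."""
    params = termination_hook_params(**overrides)
    autoscaling_client.put_lifecycle_hook(**params)
    return params
```

The heartbeat timeout is the upper bound on how long the log copy can take, so it is worth setting it with the size of your log files in mind.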

Step 5

Let’s create a Lambda function with the following settings. Let’s name it “ec2-lifecycle”.

In later steps, we will add some more permissions to the IAM role of this Lambda function.

Step 6

What is EventBridge?

EventBridge is a serverless event bus that makes it easier to build event-driven applications using events generated by other AWS services.

What’s an EventBridge rule?

An EventBridge rule watches for certain events and then routes them to AWS targets that you choose. You can create a rule that performs an AWS action automatically when another AWS action happens.

Creation

Let’s create a rule “eventbridge-ec2” that watches for the instance-terminate events emitted by Auto Scaling. The configurations below are appropriate for our use case.
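To make the rule concrete, here is a sketch of an event pattern that fits the Lambda code in Step 10, which reads `detail.EC2InstanceId`. It assumes the rule should match the “EC2 Instance-terminate Lifecycle Action” event that Auto Scaling emits when a termination hook fires. The helper names and the target ID are hypothetical, and `events_client` is assumed to be `boto3.client("events")`.

```python
import json


def terminate_lifecycle_pattern(asg_name: str = "webapp-asg") -> dict:
    """Event pattern for termination lifecycle events of one ASG."""
    return {
        "source": ["aws.autoscaling"],
        "detail-type": ["EC2 Instance-terminate Lifecycle Action"],
        "detail": {"AutoScalingGroupName": [asg_name]},
    }


def create_rule(events_client, lambda_arn: str, rule_name: str = "eventbridge-ec2") -> None:
    """Create the rule and point it at the Lambda function.

    The console also grants EventBridge permission to invoke the function;
    when scripting, that resource-based permission must be added separately.
    """
    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(terminate_lifecycle_pattern()),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        # "ec2-lifecycle" is just a target ID chosen for this sketch.
        Targets=[{"Id": "ec2-lifecycle", "Arn": lambda_arn}],
    )
```

Filtering on the ASG name keeps the Lambda function from being triggered by other Auto Scaling groups in the same account.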

Step 7

Create an S3 bucket where the logs would be stored when instances terminate.

I had named my bucket “shishir-personal-04-b”.

Step 8

In this step, we are going to create an SSM document to run a shell script on the EC2. The goal is to copy the log files to an S3 bucket.

Let’s start creating the document by going to AWS Systems Manager -> Documents -> Create Document -> Command option.

Parameters

Let’s define some parameters, since we are going to trigger this SSM document via the Lambda function. These parameters are used to run the complete-lifecycle-action command discussed in Step 4.

  • hookname: Name of the lifecycle hook. “e” in our case.
  • asgname: Name of the ASG. “webapp-asg” in our case.
  • instanceid: ID of the instance for which the command needs to be executed. This would be present in the EventBridge event and would be passed to the SSM document via the Lambda function.

Commands

The find command can be used to filter out the log files that need to be copied to the S3 bucket created in Step 7.

for i in `find /var/log -maxdepth 1 -type f -name '*.log'`; do echo $i; /usr/local/bin/aws s3 cp $i s3://shishir-personal-04-b/; done

The ASG’s complete-lifecycle-action command uses the parameters defined above:

aws autoscaling complete-lifecycle-action --lifecycle-action-result CONTINUE --instance-id {{ instanceid }} --lifecycle-hook-name {{ hookname }} --auto-scaling-group-name {{ asgname }}

Document

The document would look something like this — let’s save it as “ssm-poc”.

{
  "schemaVersion": "2.2",
  "description": "Command Document Example JSON Template",
  "parameters": {
    "hookname": {
      "type": "String",
      "description": "hook_name",
      "default": "hook_name"
    },
    "asgname": {
      "type": "String",
      "description": "asg_name",
      "default": "asg_name"
    },
    "instanceid": {
      "type": "String",
      "description": "instance id",
      "default": "none"
    }
  },
  "mainSteps": [
    {
      "action": "aws:runShellScript",
      "name": "example",
      "inputs": {
        "runCommand": [
          "#!/bin/bash",
          "for i in `find /var/log -maxdepth 1 -type f -name '*.log'`; do echo $i; /usr/local/bin/aws s3 cp $i s3://shishir-personal-04-b/; done",
          "aws autoscaling complete-lifecycle-action --lifecycle-action-result CONTINUE --instance-id {{ instanceid }} --lifecycle-hook-name {{ hookname }} --auto-scaling-group-name {{ asgname }}"
        ]
      }
    }
  ]
}

Step 9

Since the SSM document’s commands run on the EC2 and involve S3 and ASG actions, it’s important to add permissions for these operations to the EC2 instance IAM role we created in Step 1.
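As a rough guide to what “permissions for these operations” could mean, here is a sketch of a narrowly scoped inline policy. It assumes the two actions the document’s commands need are `s3:PutObject` (for `aws s3 cp` uploads) and `autoscaling:CompleteLifecycleAction`. The policy name `ec2-log-upload` and the helper names are hypothetical, and `iam_client` is assumed to be `boto3.client("iam")`.

```python
import json


def instance_log_policy(bucket: str = "shishir-personal-04-b", asg_arn: str = "*") -> dict:
    """Inline policy for the instance role: upload logs and complete the hook."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "UploadLogs",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "CompleteHook",
                "Effect": "Allow",
                "Action": "autoscaling:CompleteLifecycleAction",
                # "*" keeps the sketch short; pass the ASG's ARN to narrow it.
                "Resource": asg_arn,
            },
        ],
    }


def attach_inline_policy(
    iam_client, role_name: str = "ssm-ec2", policy_name: str = "ec2-log-upload"
) -> None:
    """Attach the policy above to the instance role as an inline policy."""
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(instance_log_policy()),
    )
```

Attaching broad managed policies from the console also works; the scoped version simply limits what a compromised instance could do.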

Step 10

Until this point, our Lambda function has a trigger configured but no action. We need to write code that extracts data such as the instance ID and runs the SSM document that we created in Step 8.

I will be using Python along with the boto3 library to invoke the SSM document from the Lambda function. The code below extracts the data and invokes the command using the ssm_client object. It also prints the response from SSM.

import json
import time

import boto3

ssm_client = boto3.client('ssm')


def lambda_handler(event, context):
    # The instance ID comes from the EventBridge event sent for the ASG.
    ec2_instance = event['detail']['EC2InstanceId']
    document_name = 'ssm-poc'
    document_version = '1'

    # Run the SSM document on the terminating instance.
    response = ssm_client.send_command(
        InstanceIds=[ec2_instance],
        DocumentName=document_name,
        DocumentVersion=document_version,
        TimeoutSeconds=300,
        Parameters={
            'hookname': ['e'],
            'asgname': ['webapp-asg'],
            'instanceid': [ec2_instance],
        },
    )

    command_id = response['Command']['CommandId']
    # Give SSM a moment to register the invocation before querying it.
    time.sleep(5)

    output = ssm_client.get_command_invocation(
        CommandId=command_id,
        InstanceId=ec2_instance,
    )
    print(output)

    return {
        'statusCode': 200,
        'body': json.dumps(output, default=str),
    }

But this isn’t enough. Currently, the Lambda function has only the basic permissions. We need to give it more.

To do so, go to the “Configuration” tab of the function and then go to the IAM role. We need to add the “AmazonSSMFullAccess” policy to this role so that it can trigger the SSM document.
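As a possible variation on the handler above, the hook and ASG names do not have to be hard-coded: the lifecycle event from Auto Scaling carries them in `detail.LifecycleHookName` and `detail.AutoScalingGroupName`. The sketch below also replaces the fixed sleep with a short polling loop. The helper names, the attempt count and the delay are my own choices, and `ssm_client` is assumed to be `boto3.client("ssm")`.

```python
import time


def ssm_parameters_from_event(event: dict) -> dict:
    """Build the SSM document parameters from the lifecycle event."""
    detail = event["detail"]
    return {
        "hookname": [detail["LifecycleHookName"]],
        "asgname": [detail["AutoScalingGroupName"]],
        "instanceid": [detail["EC2InstanceId"]],
    }


def wait_for_command(ssm_client, command_id: str, instance_id: str,
                     attempts: int = 10, delay: float = 3.0) -> dict:
    """Poll get_command_invocation until the command leaves a pending state.

    Returns the last response seen, even if the command is still running
    after `attempts` polls.
    """
    pending = {"Pending", "InProgress", "Delayed"}
    output = {}
    for _ in range(attempts):
        # Sleep first so SSM has time to register the invocation.
        time.sleep(delay)
        output = ssm_client.get_command_invocation(
            CommandId=command_id, InstanceId=instance_id
        )
        if output["Status"] not in pending:
            break
    return output
```

In the handler, `Parameters=ssm_parameters_from_event(event)` would replace the hard-coded dictionary. Keep in mind that the function’s timeout has to be longer than the total polling time, and Lambda’s default timeout is only 3 seconds.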

That’s it. The setup is complete.

Steps for testing

  • Set the instance count in the ASG to 0 to terminate the instance. This should create an EventBridge event, which would be sent to the Lambda function where the SSM document would be invoked.
  • To check the state of the SSM document invocation, go to “Systems Manager” -> “Run Command” -> “Command History”.
  • In case it shows a failure, check the output to get some hint about what is going wrong. If you get stuck, leave a comment on this post.

Verification

To verify, check out the contents of the S3 bucket.
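If you would rather check from a script than from the console, a small sketch like the following lists the uploaded keys. It assumes a `boto3.client("s3")` object is passed in; the function name is hypothetical.

```python
def list_uploaded_logs(s3_client, bucket: str = "shishir-personal-04-b") -> list:
    """Return every object key in the bucket, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is absent from the response when the bucket is empty.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

After a test termination, the returned list should contain the `.log` files from `/var/log` of the terminated instance.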

Final Words

If you found this post helpful & informative, be sure to follow & leave lots of 👏🏻 Claps 👏🏻 It encourages me to keep writing and helps other people find it :)

I share tips, experiences & articles on my Linkedin Account. You’ll love it if you are into Cloud, DevOps, Kubernetes, Integrations, etc. Follow me on LinkedIn — https://www.linkedin.com/in/shishirkhandelwal/

Shishir Khandelwal

I spend my day learning AWS, Kubernetes & Cloud Native tools. Nights on LinkedIn & Medium. Work: Engineering @ PayPal.