Using AWS Systems Manager (SSM) to manage infrastructure
Hello folks. This is Utpal, and in this article I have discussed AWS Systems Manager. Please read the entire article.
AWS Systems Manager (SSM) is an AWS management service with which we can manage EC2 instances and on-premises systems at scale. It gives us visibility into the state of our infrastructure and helps us detect problems. Most of the features in AWS Systems Manager are free.
To manage EC2 instances or on-premises systems, we need to install the SSM Agent on them. The agent reports to and communicates with the SSM service, and for this we have to give the instances the proper IAM permissions.
The important features of SSM include:
Session Manager,
Run Command,
Patch Manager,
Maintenance Windows,
Automation,
State Manager,
Parameter Store,
AppConfig,
Change Calendar, etc.
Here is an example-
I have created an IAM role with the AWS managed policy AmazonEC2RoleforSSM.
Then I have launched 3 EC2 instances using the Amazon Linux 2 AMI, the t2.micro instance type, and the IAM role I just created. The SSM Agent comes pre-installed on Amazon Linux 2.
After they are launched, the instances appear with 'Online' status in the 'Managed Instances' window of SSM. This will not work without a proper IAM role.
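If you prefer the CLI, here is a minimal sketch, assuming the role already exists, for attaching that managed policy and then verifying that the agents have registered (the role name is just a placeholder):

```bash
# Attach the AmazonEC2RoleforSSM managed policy to an existing role
# ("EC2SSMRole" is a hypothetical role name used only for illustration)
aws iam attach-role-policy \
  --role-name EC2SSMRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM

# List the instances that have registered with SSM and their ping status
aws ssm describe-instance-information \
  --query "InstanceInformationList[].[InstanceId,PingStatus]" \
  --output table
```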
To create, view, and manage resources as logical groups, we can use tags. With tags we can group resources, drive automation, and allocate costs. Here I have added a tag with 'environment' as the key and 'Dev' or 'prod' as the value. Then, in the Resource Groups menu (at the top of the console), I clicked on 'Create a group'.
There we can group the resources based on tags or a CloudFormation stack; I have chosen tags here. Then, in the grouping criteria section, there are many options in the drop-down menu. I have chosen AWS::EC2::Instance. Then I chose 'environment' as the tag key and 'Dev' as the value.
Then, under Group resources, the two instances tagged with the Dev environment appeared. I gave the group the name 'MyDevInstances' and clicked on Create group. Similarly, we can create as many resource groups as needed.
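For reference, a rough CLI equivalent of creating this tag-based group might look like the following (using the same group name and tag as above):

```bash
# Create a resource group of EC2 instances tagged environment=Dev
aws resource-groups create-group \
  --name MyDevInstances \
  --resource-query '{"Type":"TAG_FILTERS_1_0","Query":"{\"ResourceTypeFilters\":[\"AWS::EC2::Instance\"],\"TagFilters\":[{\"Key\":\"environment\",\"Values\":[\"Dev\"]}]}"}'
```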
In SSM, we can write documents (scripts) in JSON or YAML to define parameters and actions. Documents can be used with State Manager, Patch Manager, Automation, and Run Command.
AWS provides many documents for us. We can view them under Shared Resources in the left menu of SSM. Documents are grouped as 'Owned by Amazon', 'Owned by me' (those we create), and 'Shared with me'. There are many document types, platform types, and versions.
We can create our own document by clicking on 'Create command or session'. I have created a document with the name 'ApacheInstallandConfig' and the type 'Command document' (there are also Policy and Automation types) and written the content in YAML format.
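I haven't reproduced my exact document here, but a schema 2.2 command document that installs Apache could look something like this sketch (the 'Message' parameter is a hypothetical example):

```yaml
schemaVersion: "2.2"
description: "Install Apache and write a simple index page (illustrative sketch)"
parameters:
  Message:
    type: String
    description: "Text to write to the index page"
    default: "Hello from SSM"
mainSteps:
  - action: aws:runShellScript
    name: installAndConfigureApache
    inputs:
      runCommand:
        - sudo yum install -y httpd
        - sudo systemctl enable --now httpd
        - echo "{{ Message }}" | sudo tee /var/www/html/index.html
```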
After creation, I can view it in the group 'Owned by me'. By clicking on it, I can see its description, content, versions, and details.
To execute commands on many EC2 instances or on-premises servers, grouped into resource groups, we execute the documents against them. We can also control the execution rate and the error threshold. To run commands using SSM Run Command we don't need to open the SSH port in the security group. We can run the commands and view the results from the console. This service is tightly integrated with IAM & CloudTrail.
To run the command with the document, either click on the 'Run command' option on the document's description page or click on 'Run command' in the left menu. Here we can choose documents based on their name prefix, owner, platform type, and tag key. I have chosen 'Owner: Owned by me' and the default version 1. Then, in the Command parameters section, I entered 'Hello from Utpal'.
Then, in the Targets section, I selected 'Choose instances manually'. Here we can also select 'Specify instance tags' or 'Choose a resource group'.
In the 'Other parameters' section, I entered a timeout value of 600 seconds.
In the Rate control section, 'Concurrency' specifies on how many targets (as a count or a percentage) the command can run simultaneously; I have chosen 1. 'Error threshold' specifies on how many targets (as a count or a percentage) the command may fail before the execution stops; I have chosen 1 here.
In the 'Output options' section, we can send the complete output to an S3 bucket, since only the last 2500 characters of output are displayed in the console. For this, we need to give the EC2 instances S3 write permission. Alternatively, we can write the command output to Amazon CloudWatch Logs; I have entered 'CustomRunCommands' as the log group name.
We can optionally enable SNS notifications. Finally, AWS provides the equivalent CLI command so we can reproduce this run in the future. I clicked on Run.
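That generated command would look roughly like the sketch below (the instance ID is a placeholder, and the 'Message' parameter assumes the sample document shown earlier):

```bash
aws ssm send-command \
  --document-name "ApacheInstallandConfig" \
  --document-version "1" \
  --targets '[{"Key":"InstanceIds","Values":["i-0123456789abcdef0"]}]' \
  --parameters '{"Message":["Hello from Utpal"]}' \
  --timeout-seconds 600 \
  --max-concurrency "1" \
  --max-errors "1" \
  --cloud-watch-output-config '{"CloudWatchOutputEnabled":true,"CloudWatchLogGroupName":"CustomRunCommands"}'
```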
The command has been started on 3 targets. Click on refresh to view progress. In the console, we can view the Command description and parameters.
Also, we can check the result by browsing to the public IP address of the instances (for this, open the HTTP port in the instance security group).
In SSM, Inventory lists the software on an instance, i.e. it gives information about the software installed on the instances. We can patch that software using Inventory and Run Command. To patch the OS on the EC2 instances, we use Patch Manager & Maintenance Windows.
We can find the Inventory console in the left menu of AWS SSM. Then, in the 'Managed instances with inventory enabled' section, click on the link to enable inventory on all instances.
After clicking, we can see the red indicator (disabled) turn green (enabled). Then click on View detail. In the description, we can see that the 'AWS-GatherSoftwareInventory' document has run. The status is initially Pending, but after some time it becomes Success. The schedule expression is rate(30 minutes), i.e. every 30 minutes the instances report to SSM about the software installed on them.
Back in the Inventory dashboard, we can view the details: inventory coverage per type, top 5 OS versions, top 5 applications, etc.
At the bottom, we can view the corresponding managed instances. By clicking on any instance, we can view its description, inventory information, tags, associations, patches, and configuration compliance. In the Inventory tab, we can view all the software installed on the instance with version, publisher, type, installed time, architecture & URL. We can even search and filter.
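The same inventory data can also be pulled from the CLI, for example, listing the installed applications on a single instance (the instance ID is a placeholder):

```bash
aws ssm list-inventory-entries \
  --instance-id i-0123456789abcdef0 \
  --type-name "AWS:Application"
```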
In the Patch Manager console, we can configure patching for the OS. By clicking on 'Configure patching' we get the configuration page. Patch Manager uses Run Command to patch the instances. There we select the instances to be patched based on tags, a patch group, or manually; I have selected the instances manually.
In the 'Patching schedule' section we can specify a schedule, but here I have selected 'Skip scheduling and patch instances now'.
In the ‘Patching operation’ section we can select 'Scan and install' or 'Patch only the target instances I specify'.
After clicking on 'Configure patching', we can view the patch baselines.
When I go to the Run Command console and click on 'Command history', I can see that 'AWS-RunPatchBaseline' has succeeded. By clicking on the command ID, I can view the details.
By clicking on the 'Compliance' option under 'Instances & Nodes', we can view the outcome of the scan, i.e. the compliance resources summary and the details overview for resources.
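The same compliance data can also be queried from the CLI, for example:

```bash
# Account-level compliance summary (patch, association, etc.)
aws ssm list-compliance-summaries

# Per-resource compliance summary
aws ssm list-resource-compliance-summaries
```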
SSM Session Manager is a service that gives us shell access to instances without SSH (no need to open port 22) or any bastion host, using only the SSM Agent. All the actions performed through Session Manager can be logged to CloudWatch Logs or to an S3 bucket. All we have to do is enable CloudWatch logging and give the EC2 instances IAM permissions to access SSM, S3, and CloudWatch. We can audit the sessions with AWS CloudTrail.
In the left menu under Instances & Nodes, click on Session Manager. Then click on the Preferences tab, where we can edit and set up S3 and CloudWatch logging. There we can also enable encryption using AWS KMS. Choose the S3 bucket and the CloudWatch log group, then click on Save. The CloudWatch log stream will then be enabled.
Now, to start a session, click on Start session in the Sessions tab, select the instance, and click on Start session again. A browser-based terminal will then open.
There we can enter commands and perform the intended actions. After completion, click on Terminate at the upper right corner. A confirmation window will appear; click on Terminate again.
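Sessions can also be started from the CLI, assuming the Session Manager plugin for the AWS CLI is installed (the instance ID is a placeholder):

```bash
# Open an interactive shell on the instance through SSM (no SSH, no port 22)
aws ssm start-session --target i-0123456789abcdef0
```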
We can view all the sessions in the
Session history tab.
In the CloudWatch console click on Logs. Then by clicking on the log group name which has been selected in Session Manager, we can view the log details.
In the left menu under Actions & Change, click on Automation. There we can execute automation documents. There are four execution modes: Simple execution, Rate control, Multi-account and Region, and Manual execution.
In Simple execution, we can run the automation on one or more targets by specifying the resource IDs of those targets (see the CLI sketch after these modes).
In Rate control mode, we can run the automation across a fleet of resources. For this, we have to specify tags or AWS Resource Groups.
In Multi-account and Region mode, we can run the automation across multiple accounts and Regions.
In Manual execution mode, we can control when the workflow proceeds. The Automation workflow starts in a Waiting status and pauses in the Waiting status between each step.
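As an example of Simple execution from the CLI, we could start an AWS-owned automation document like this (the document choice and instance ID are only for illustration):

```bash
# Start a simple automation execution that restarts one instance
aws ssm start-automation-execution \
  --document-name "AWS-RestartEC2Instance" \
  --parameters '{"InstanceId":["i-0123456789abcdef0"]}'

# Check the status using the AutomationExecutionId returned above
aws ssm get-automation-execution --automation-execution-id <execution-id>
```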
SSM Parameter Store is a scalable, durable, serverless store where we can keep configuration data and secrets securely in a hierarchical manner. We can encrypt the values using AWS KMS, track changes to the saved data using parameter versions, and restrict access to Parameter Store using IAM. We can even enable notifications with CloudWatch Events.
We can find its console in the left menu of SSM, or just by searching in the Services menu at the top of the AWS console. By clicking on Create parameter we get the configuration page. There are two tiers: Standard and Advanced. There are three types: String, StringList, and SecureString (encrypted using KMS). Also, we can optionally add a tag.
After clicking on Create parameter, we can see the new parameter. By clicking on its name, we can see the overview.
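From the CLI, creating and reading a SecureString parameter looks roughly like this (the parameter name and value are placeholders):

```bash
# Store an encrypted parameter (uses the default AWS managed key unless --key-id is given)
aws ssm put-parameter \
  --name "/myapp/dev/db-password" \
  --type SecureString \
  --value "example-secret"

# Read it back, decrypted
aws ssm get-parameter \
  --name "/myapp/dev/db-password" \
  --with-decryption
```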
AWS AppConfig is used to create, manage, and deploy application configurations in a controlled manner from a central location, for applications hosted on EC2 instances, AWS Lambda, containers, mobile applications, or IoT devices. With it we can reduce errors in configuration changes and update applications without interruption. AppConfig also offers built-in validation checks and monitoring.
First, an IAM role has to be created which allows AppConfig to monitor CloudWatch alarms ("cloudwatch:DescribeAlarms").
From the left menu under Application Management, click on AppConfig. Then click on Create application.
Then we have to create an environment, which is a logical deployment group of AppConfig targets.
Click on Create environment and enter a name and description. Then select the IAM role created for AppConfig and the CloudWatch alarm(s) to monitor, and click on Create environment.
Next, we have to create a configuration profile, which allows AppConfig to access the configuration stored in a source location.
At the 1st step, enter the name and, optionally, a description and tags.
At the 2nd step, select the source of configuration. I have selected AWS Systems Manager Parameter and entered the parameter previously created on SSM Parameter Store.
At the 3rd step, select the service role and the validator. A validator provides a syntactic or semantic check to ensure the configuration will work as intended. I have selected AWS Lambda with the latest version. Note: we have to add a resource policy that allows AppConfig to invoke the Lambda function (a sample command for this is shown after these steps).
Finally, click on Create configuration profile. We can view the details and version.
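As noted in the 3rd step, the validator Lambda needs a resource-based policy that allows AppConfig to invoke it. A minimal sketch (the function name is a placeholder):

```bash
aws lambda add-permission \
  --function-name MyAppConfigValidator \
  --statement-id appconfig-validator \
  --action lambda:InvokeFunction \
  --principal appconfig.amazonaws.com
```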
Then click on Start deployment. Select the environment, parameter version, and deployment strategy; add a description and, optionally, tags. Here we can also create a custom deployment strategy. AppConfig monitors the CloudWatch alarms for the defined time interval. If no alarms fire in this interval, the deployment completes. But if an alarm is received, AppConfig rolls back the deployment. Finally, click on Start deployment.
During the deployment, AppConfig monitors the application, and if an error occurs it rolls back the change to minimize the impact on the application users. The deployment shows Complete when the configuration profile is fully deployed.
SSM Maintenance Windows is a service with which we can schedule various tasks like patching the OS, updating drivers, installing software, creating AMIs, etc.
First, a schedule is defined and the duration is specified. Then target instances are registered based on tags, resource groups, or manually. Finally, tasks (Run Command, Automation, Lambda functions) are registered.
On the left menu under Actions & Change, click on Maintenance Window, and then Create maintenance window. Enter the name and description.
In the scheduling section, I selected the Cron schedule builder and set the time to every Saturday at 2:30 AM. Then I set the maintenance window duration to 2 hours and chose to stop initiating new tasks 1 hour before the window closes. You can also select the date & time to start & stop the maintenance window. Then I selected the time zone and clicked on Create maintenance window.
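The same window can also be created from the CLI; a sketch matching the schedule above (the window name and time zone are just example values):

```bash
# Every Saturday at 2:30 AM, 2-hour window, stop starting new tasks 1 hour before it closes
aws ssm create-maintenance-window \
  --name "WeeklySaturdayWindow" \
  --schedule "cron(30 2 ? * SAT *)" \
  --schedule-timezone "Asia/Kolkata" \
  --duration 2 \
  --cutoff 1 \
  --allow-unassociated-targets
```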
Then, after it has been created, I clicked on it to view the description. Then, from the Actions menu at the upper right corner, I clicked on Register targets.
Here I entered the target name and description, then selected the instances manually. Finally, I clicked on Register target.
Then, again from the Actions menu at the upper right corner, I clicked on Register Run command task.
Here I want to update the SSM Agent on the EC2 instances, so I entered a name & description. From the Command document list, I selected the 'AWS-UpdateSSMAgent' document with the default version.
In the Targets section, I chose to select registered target groups and then picked the target ID. In the Rate control section, I entered 1 as both the concurrency & the error threshold.
Then I selected 'Create and use a service-linked role for Systems Manager'.
In the output options, I enabled writing to S3 and entered the bucket name. Optionally, you can enable SNS notifications about command statuses. I haven't selected any parameter version and left it as the default.
The registered task can then be viewed in the Tasks tab.
AWS Change Calendar is a fully managed service used to create a calendar and define important business events. It can also whitelist or blacklist specific times and days. There are two calendar types: open and closed. By default, an open calendar allows actions and a closed calendar denies actions. We can create a master change calendar and use it across multiple AWS accounts.
First, a calendar is created. Then scheduled events are added to whitelist or blacklist specific time frames. Then, using the SSM API or an Automation document, we can query whether the calendar state is open or closed.
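For example, the current state of a calendar can be checked like this (the calendar name is a placeholder):

```bash
# Returns OPEN or CLOSED, plus the time of the next state change
aws ssm get-calendar-state --calendar-names "MyChangeCalendar"
```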
On the left menu under Actions & Change, click on Change Calendar, then Create change calendar. Then enter name, description, and select whether it is open or closed type. Click on Create calendar.
Then, after it has been created, select it to view the calendar.
To define an event, click on Create event. Then enter the event name and an optional description, select the start & end date & time, and select the time zone. Then click on Create schedule event.
Then we can view the event dates in the calendar.
There are many more features in AWS Systems Manager that I haven't discussed in this article. You can learn about all of them from the AWS documentation and the official AWS YouTube channel.