Serverless cloud infrastructure on AWS

Hi folks, this is Utpal. In this article I discuss serverless architecture on AWS with multiple projects, explanations and use cases.


NOTE: Many of the pictures and screenshots in this article may not appear clearly. Just click on them and they will open clearly in a new tab.

Before we begin, we should clarify some primary concepts-

An API, or Application Programming Interface, is a set of protocols that allows applications or services to communicate with each other programmatically without knowing how they are implemented. It helps business and IT teams collaborate by enabling developers to integrate application components into an existing infrastructure.

While using an app that connects to a server, we want some data from it. The app sends a request to the server with all the required information (as part of the URL, query-string parameters, body, or headers). The server understands and processes the request, creates an object of the requested data and sends the value of that object back as a response in a structured format (JSON or XML). For example, in a news app, if we request the score of a team in a cricket match, an object is created on the server side and we get the value or state of that object in JSON or XML format. This is Representational State Transfer, or REST. It is a resource-based architectural style that also helps keep bandwidth usage low.
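For illustration, a JSON response for such a cricket-score request might look like the following (dummy values, not from any real API)-

{
  "team": "India",
  "runs": 245,
  "wickets": 4,
  "overs": 38.2
}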

A REST API uses four HTTP methods to perform CRUD (Create, Read, Update, Delete) operations: POST, GET, PUT and DELETE.

To provide better performance, some apps are made cacheable. For responses that are defined as cacheable, the client cache reuses the response data for equivalent requests.

A REST API can be created in multiple ways; in this article I have created one in a serverless way.

Now the question is: why serverless?

As the name suggests, there are no servers for us to manage. We build and run our applications and services without managing the underlying infrastructure. It is a cloud architecture in which our operational responsibilities are shifted to the cloud service provider, which charges only for the amount of resources actually used. That's why serverless is gaining popularity day by day.

The application code runs inside stateless containers. The containers can be triggered by various events, for example HTTP requests, database events, queuing services, monitoring alerts, file uploads, scheduled events, etc.

Usually the code is sent to the cloud provider in the form of a function, which is why serverless is also referred to as 'Functions as a Service (FaaS)'. Examples are AWS Lambda, Azure Functions and Google Cloud Functions.

One drawback of serverless architecture is that it ties you to one specific cloud provider. To overcome this, provider-agnostic or multi-provider applications are needed. For this, Event Gateway (an open-source tool) can be used, which lets us react to any event with serverless functions across providers by making different cloud providers compatible with each other.

Now let's dive into more depth.

In serverless architectures and microservices, an event-driven architecture is used, which is a form of asynchronous service-to-service communication. Here the function code is executed in response to an event such as a change in state or an endpoint request. In event-driven architectures, state and code are decoupled; the integration between components is typically done with messaging to create asynchronous connections. Asynchronous connections are key to creating a resilient serverless architecture, and they contribute to a good customer experience.

Asynchronous connections-

  • Decouple the user experience and code execution.
  • Provide reduced latency on HTTP responses.
  • Reduce the potential for timeouts or deadlocks.

Serverless applications are event-driven and use technology-agnostic APIs with decoupled communications.

Amazon SQS and SNS are used to decouple components.

Amazon SQS, or Simple Queue Service, is a secure, durable queue service that integrates and decouples the components of a distributed architecture. One or more producers send messages to the queue and one or more consumers poll messages from the queue. There are two types: Standard and FIFO queues.

Amazon SNS, or Simple Notification Service, is a pub-sub messaging service. In SNS, when an event gets published, the message is sent to every subscriber. Subscribers do not need to know anything about the publishers, and vice versa, other than that the publisher is legitimate and the subscriber is entitled to receive the information.
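As a rough illustration, this is how a producer might send a message to SQS and publish a notification to SNS using the AWS SDK for Node.js (the queue URL and topic ARN are placeholders)-

const AWS = require('aws-sdk');
const sqs = new AWS.SQS();
const sns = new AWS.SNS();

exports.handler = async (event) => {
    // Push work onto a queue for consumers to poll later
    await sqs.sendMessage({
        QueueUrl: 'https://sqs.us-east-1.amazonaws.com/ACCOUNT_ID/MyQueue', // placeholder
        MessageBody: JSON.stringify(event)
    }).promise();

    // Fan the same event out to all subscribers of a topic
    await sns.publish({
        TopicArn: 'arn:aws:sns:us-east-1:ACCOUNT_ID:MyTopic', // placeholder
        Message: JSON.stringify(event)
    }).promise();
};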

The best practices of event-driven architecture are-

  • Use managed services when possible.
  • Don't just copy code from other applications to run it in Lambda. Instead apply event-driven thinking.
  • Be aware of updates in AWS cloud. As services and available serverless applications evolve quickly, there might be an easier way to do something.
  • Don’t write code for the communication between the AWS services. Instead, let them talk to each other directly whenever possible.
  • Verify the limits of all the involved services.
  • No point-to-point integrations.
  • Build high-performance, highly scalable systems.

Before discussing more, let’s see a simple architecture-

I have used the AWS cloud to build a serverless REST API with these services- S3, API Gateway, AWS Lambda and DynamoDB. To create an asynchronous connection between API Gateway and Lambda, SQS can be introduced in front of Lambda. This allows the API request to be satisfied without regard for how long the Lambda function (or other back-end services) will run. But in this example, I haven't used this.

The typical architecture is depicted in the diagram-

At first a REST API is created in API Gateway [us-east-1 (North Virginia) region]. Then from the Action menu a resource is created with the name ‘addstudentdetails’ with CORS enabled.

Next, I have created an IAM role with ‘DynamoDBwriteOnly’ (custom policy) & AWSLambdaFullAccess policies and named the role- ‘addStudentDetailsonDynamoDB’.

In the AWS Lambda console, with us-east-1 selected as the region, a new function is created with the name 'addStudentDetails', runtime Node.js 10.x, and the created role 'addStudentDetailsonDynamoDB' as its execution role. Be careful while creating and choosing the execution role for a Lambda function, because most 'access denied' or 'function does not have permission to do this' errors happen due to improper permissions in the execution role (NOTE: here I have created a custom policy). Also, as per AWS best practice (principle of least privilege), grant only the required permissions.

After creation, the function code (Node.js) is written and saved with the Save button at the upper-right corner. As this is simple code, I have written it directly in the console. You can get this code from my GitHub repository.
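The actual code is in my GitHub repository; a minimal sketch of such a function could look like this (it assumes the event already carries the mapped attributes and writes them to the StudentDetails table created in the next step)-

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

exports.handler = async (event) => {
    // event contains the attributes mapped by API Gateway
    await docClient.put({
        TableName: 'StudentDetails',
        Item: {
            Roll: event.Roll,
            FirstName: event.FirstName,
            LastName: event.LastName,
            Department: event.Department,
            Year: event.Year
        }
    }).promise();
    return { message: 'Student details added' };
};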

Next, opened the DynamoDB console to create a table named 'StudentDetails' with 'Roll' as the partition key of type string. In this project I haven't selected any sort key and have used the default settings.

After creation, went to the 'Items' tab and created a dummy item using the 'Create item' option. The table then looks like this-

Next, to test the Lambda function, went back to the Lambda console, clicked Test at the upper-right corner and a dialog appeared. There I selected 'Create new test event', named it 'test' and inserted one dummy student's details in JSON format for testing. Then clicked 'Save', clicked 'Test', and it succeeded.


A new DynamoDB item was also created with the given values.

Next, in the API Gateway console, within the 'addstudentdetails' resource, created a POST method from the Action menu. Selected Lambda Function as the integration type, us-east-1 as the region and 'addStudentDetails' as the Lambda function.

Now I have to configure validation of the query string parameters, so that the API will not work if someone accesses it without them. I clicked on Method Request, changed the Request Validator to 'Validate query string parameters and headers', then under 'URL Query String Parameters' added Roll as the name and ticked the Required checkbox. If you want, you may also enable caching.

I want to map the request body based on the input parameters, using a Body Mapping Template. I clicked on Integration Request, then under Mapping Templates selected 'When there are no templates defined (recommended)', and then for application/json entered this template-

{
    "Roll": "$input.params('Roll')",
    "FirstName": "$input.params('FirstName')",
    "LastName": "$input.params('LastName')",
    "Department": "$input.params('Department')",
    "Year": "$input.params('Year')"
}

You have to put these as per your parameters.

Then deployed it from the Action menu into a stage named ‘Dev’.


After deployment, a URL is generated in the format https://restAPI_id.execute-api.region.amazonaws.com/stage; at the end I appended the resource path (/addstudentdetails). Then I opened Postman, selected the POST method and pasted the URL. In the 'Params' tab I added Roll as the key and 10303 as the value. In the Body tab I entered another dummy student's details in JSON format for testing. Clicked on Send, and it succeeded.

[NOTE: In Postman's Params tab, you have to pass the parameter configured in the Method Request. Also, in the Body tab, you have to pass the parameters configured in the Body Mapping Template. Otherwise you will get errors like "Missing required request parameters: [****]" or "Could not parse request body into json: ……………………….".]
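For reference, a rough curl equivalent of this Postman request would look like the following (the API id, stage and values are placeholders; adjust the query string and body to match your own Method Request and mapping template)-

curl -X POST "https://restAPI_id.execute-api.us-east-1.amazonaws.com/Dev/addstudentdetails?Roll=10303" -H "Content-Type: application/json" -d "{\"Roll\":\"10303\",\"FirstName\":\"John\",\"LastName\":\"Doe\",\"Department\":\"CSE\",\"Year\":\"3\"}"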

In the Lambda console, I edited the code to add a new attribute, 'mobile', and then published versions of the function. For testing I published versions 1 and 2. Then I added an alias named 'addstudentinfo' pointing to the second version.

NOTE: here we can use traffic shifting, i.e. send some weight (%) of the traffic to the newer version and the rest to the current stable version. This is good practice for production. We can change the version an alias points to from the Lambda management console as well as the CLI. The command is-

C:\Users\user> aws lambda update-alias --function-name NAMEofFUNCTION --name ALIASNAME --function-version VERSIONNUMBER

On success, the result is like this-

{
"AliasArn": "arn:aws:lambda:REGION:ACCOUNTID:function:NAMEofFUNCTION:ALIASNAME",
"Name": "ALIASNAME",
"FunctionVersion": "VERSIONNUMBER",
"Description": "",
"RevisionId": "REVISION_ID"
}

Then I went to the API Gateway console and clicked 'Stages' in the left menu. Added a stage variable named 'collegealias' with the value 'addstudentinfo' (the alias name of the Lambda function) in the Dev stage. Then clicked Integration Request in the Resources section in the left menu, edited the Lambda function and, at the end of the function name, appended :${stageVariables.collegealias}

The form of the name must be like these – functionName:${stageVariables.NAMEOFVARIABLE} or

${stageVariables.NAMEOFVARIABLE}

[Configure the stage variable carefully and correctly, or you will get an error. I have also got this error previously-

{
"logref": "some_uid",
"message":"invalid stage variable value: null. please use values with alphanumeric characters and the symbols ' ', -', '.', '_', ':', '/', '?', '&', '=', and ','."
}
]

Then, while saving, API Gateway prompted me to run a CLI command to add the lambda:InvokeFunction permission. I ran this command from the CLI and got a success response-

C:\Users\user> aws lambda add-permission   --function-name "arn:aws:lambda:us-east-1:myaccid:function:addStudentDetails:addstudentinfo"   --source-arn "arn:aws:execute-api:us-east-1:myaccid:dn0oexij07/*/POST/addstudentdetails"   --principal apigateway.amazonaws.com   --statement-id 96f653df-0cdb-4c21-9992-e1124cb5cb00   --action lambda:InvokeFunction

Result-

{
"Statement": "{\"Sid\":\"96f653df-0cdb-4c21-9992-e1124cb5cb00\",\"Effect\":\"Allow\",\"Principal\":{\"Service\":\"apigateway.amazonaws.com\"},\"Action\":\"lambda:InvokeFunction\",\"Resource\":\"arn:aws:lambda:us-east-1:myaccid:function:addStudentDetails:addstudentinfo\",\"Condition\":{\"ArnLike\":{\"AWS:SourceArn\":\"arn:aws:execute-api:us-east-1:myaccid:dn0oexij07/*/POST/addtudentdetails\"}}}"
}

Though a stage variable is not needed in this project, I have used it anyway.

Then, in the Integration Request, appended the corresponding template code for 'mobile' to the Body Mapping Template and deployed it. Tested this again from Postman and it succeeded.

To add an API key, I clicked 'API Keys' in the left menu, entered a name and description and chose the 'Auto Generate' API key option (you can choose Custom if needed). Then clicked Save. The console then shows the created key; I clicked Show at the right side of the API key, copied the key and saved it in Notepad.

Then clicked on Usage Plans >> Create. Entered a name and description, enabled throttling with a rate of 100 requests per second and a burst of 50 requests, and also enabled a quota of 10,000 requests per month. Then clicked Next.

On the next page added the API stage (Dev), configured method throttling and clicked on the tick sign followed by Next.

On the next page added the previously created API key and clicked Done. It shows the details of the newly created usage plan. Finally, in the Method Request under the Resources section, set the API Key Required option to true. Don't forget to do this.

Now, from Postman, I tried to add student information with the POST method without the API key and got a forbidden error.

But when I added x-api-key as a key in the Headers tab with the API key (saved in Notepad earlier) as its value, the POST method succeeded and the item was added to the DynamoDB table.

Similarly, I have created another Lambda function named 'getStudentDetails' to retrieve data from the DynamoDB table, with runtime Node.js 10.x and the created role 'getStudentDetails' as its execution role (with 'DynamoDBReadOnlyAccess' and AWSLambdaFullAccess policies). You can get this code from my GitHub repository.

Then in API Gateway console, created a resource with the name ‘getstudentdetails’ with CORS enabled.

Next created GET method from the Action menu. Selected Lambda Function as Integration type, us-east-1 as region and ‘getStudentDetails’ as the Lambda Function.

Then similar to the POST method configured Query String Parameters and Body Mapping Template, enabled request validation and API key requirement. Finally, deployed it from the Action menu into a stage named ‘Dev’.

[Though I have created two resources, this is not necessary. You can create one resource and create two or more methods within it.]

Finally, tested it from Postman to retrieve data, and succeeded.

Next, I have created two S3 buckets with unique names, uploaded the index.html, error.html and jQuery files, and enabled Static Website Hosting with a bucket policy allowing public access on the buckets (one for POST and one for GET). Replaced the API URL in the AJAX call. [As I am not a web developer and have only basic knowledge of HTML, CSS, etc., I have taken this code from GitHub.]

Next, I have enabled a DynamoDB stream with StreamViewType=NEW_IMAGE and configured a Lambda function that reads data from the stream records for new items and publishes a message to a topic in Amazon Simple Notification Service.

At first, I have created a topic in Amazon SNS with the name ‘StudentNotification’ and added an email subscription.

Then created an IAM role (name ‘Lambdarolefordynamodbstream’) with the following policies that allow the role to-

1. Read data from the DynamoDB stream for my table.

{
"Effect": "Allow",
"Action": [
"dynamodb:DescribeStream",
"dynamodb:GetRecords",
"dynamodb:GetShardIterator",
"dynamodb:ListStreams"
],
"Resource": "arn:aws:dynamodb:region:accountID:table/StudentDetails/stream/*"
}

        2. Publish messages to Amazon SNS.

{
"Effect": "Allow",
"Action": [
"sns:Publish"
],
"Resource": "arn:aws:sns:region:accountID:StudentNotification"
}

         3. Access Amazon CloudWatch Logs to write diagnostics at runtime.

{
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource":"arn:aws:logs:region:accountID:*"
}

In the Lambda console, created a Lambda function named 'streamfunction', with runtime Node.js 10.x and the created role 'Lambdarolefordynamodbstream'. Then wrote the Node.js code for the intended job. You can get this code from my GitHub repository.
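The actual code is in my GitHub repository; a minimal sketch of the idea could look like this (the topic ARN account id is a placeholder, and the handler assumes the stream is configured with StreamViewType=NEW_IMAGE)-

const AWS = require('aws-sdk');
const sns = new AWS.SNS();

exports.handler = async (event) => {
    for (const record of event.Records) {
        // Only notify for newly inserted items
        if (record.eventName === 'INSERT') {
            await sns.publish({
                TopicArn: 'arn:aws:sns:us-east-1:ACCOUNT_ID:StudentNotification', // placeholder account id
                Subject: 'New student added',
                Message: JSON.stringify(record.dynamodb.NewImage) // available because StreamViewType is NEW_IMAGE
            }).promise();
        }
    }
};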

In the DynamoDB console, clicked on More, then in the drop-down menu chose Existing Lambda Function and selected the Lambda function 'streamfunction', set BatchSize=1 (you can choose more as per your need) and checked Enable Trigger. Then it looks like this-

Then opened the URL of the S3 static website, and both adding and retrieving student details succeeded.

Also got an email for the new entry in the table.

Also, CloudWatch log events show all the events in this process.

In today's serverless architectures, many Lambda functions are used for different purposes. One giant Lambda function could do various tasks, or many functions could invoke one another. But there are some common requirements-

  • Need of sequential functions
  • Need to run functions in parallel
  • Need to select functions based on data
  • Need to retry functions for a number of times before failing
  • Need try/catch/finally to handle errors and avoid failure.

Use of Amazon SQS or database can be a solution. But then more overhead of development and maintenance arise.

Instead use AWS Step Functions.

AWS Step Functions is used to coordinate components and orchestrate serverless workflows. With Step Functions it's easy to connect and coordinate the distributed components of microservices to quickly create apps. It also makes it easy to manage application logic and hence remove repeated code.

Step Functions automatically triggers and tracks each step, and can retry steps when there are errors. It also logs the state of each step, so when things do go wrong, we can diagnose and debug problems quickly. With Step Functions, we can orchestrate different types of back-end processes based on the work. We can also use retry and catch.

All work in Step Functions is done by tasks. A task performs work by using an activity (which may be an application that we write and host on AWS/on-premises/mobile devices) or an AWS Lambda function or by passing parameters to the API actions of other services.

There are two types of workflow in AWS Step Functions-

I. Standard workflow- a durable workflow designed for long-duration jobs such as ML, DevOps automation and ETL, and
II. Express workflow- designed for high-volume, short-duration jobs.

AWS Step Functions is also based on the concept of a state machine. A state machine is defined using a JSON-based language called the Amazon States Language (a minimal definition is sketched after the list below). There are 7 types of state-

  1. Task- a single unit of work.
  2. Choice- adds branching logic.
  3. Parallel- forks and joins the data across tasks.
  4. Wait- delays for a specific time.
  5. Fail- stops an execution and marks it as a failure.
  6. Succeed- stops an execution successfully.
  7. Pass- passes its input to its output.
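For illustration, a minimal state machine definition in the Amazon States Language might look like this (the Lambda ARN is a placeholder)-

{
  "Comment": "A minimal example with a Pass state and a Task state",
  "StartAt": "PrepareInput",
  "States": {
    "PrepareInput": {
      "Type": "Pass",
      "Next": "DoWork"
    },
    "DoWork": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:ACCOUNT_ID:function:myFunction",
      "Retry": [
        {
          "ErrorEquals": ["States.TaskFailed"],
          "IntervalSeconds": 2,
          "MaxAttempts": 2
        }
      ],
      "End": true
    }
  }
}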

Let’s see a simple example-

Here the Step Function reads values from a DynamoDB table and sends them to SQS. It uses a Lambda function to populate the DynamoDB table, uses a loop to read each entry of the table, and then sends them to Amazon SQS.

In the AWS Management Console, opened the Step Functions console and clicked on Create. Then in the dashboard, selected 'Run a sample project' followed by 'Transfer data records'. A definition is then shown in Amazon States Language along with a workflow diagram. Then clicked on Next.

Then clicked on Deploy resources.

[When we create a state machine by authoring with a code snippet, we have to create an IAM role for Step Functions and select it on the Specify Details page while creating the state machine. Then in the logging section, we can indicate which execution history events to log- ALL, ERROR, FATAL. We can keep it OFF (the default). Optionally we can add tags.]

After deploying resources clicked on Start Execution. Here optionally we can enter execution name and then input values. Then clicked on Start Execution.

Then the details of the execution are shown with the graph inspector and execution history. We can view the execution input and output. By clicking on each state in the diagram we can view its details. We can also view a list of execution history events; by clicking on each event, we can expand it to see the details in JSON format. In the screenshot, I have shown a few of the events in the history section. In case of failure, we can troubleshoot using these event details.

When one Step Functions workflow invokes another Step Functions workflow via a service integration, it is called a nested workflow.

In a serverless architecture we can use an Application Load Balancer for a steady stream of traffic and API Gateway for spiky traffic.

We can use AWS Fargate as a serverless compute service for-

  • longer-running processes or larger deployment packages;
  • predictable, consistent workload;
  • processes where we need more than 3 GB of memory;
  • application with a non-HTTP/S listener etc.

Let’s see a simple example-

In the AWS Management console opened Elastic Container Services and clicked on Create Cluster.

Selected Networking only option Powered by AWS Fargate and clicked on Next step.

Entered a cluster name. You can also create a new VPC and add tags. Clicked Create.

Then within a few seconds it shows the cluster. Here we have to create the service. But before that, a Task Definition has to be created, which is a JSON-formatted text file that describes one or more containers and is required to run containers in ECS.

From the left menu clicked on Task Definition. Then Create New Task Definition.

Selected Fargate launch type and clicked on Next step.

Then entered Task Definition Name, optionally can select Task Role.

Then in the Task size section, selected 0.5GB Task memory (you can select other value from the drop-down menu where memories are given in 0.5GB increment).

In Task CPU, selected 0.25 vCPU (you can select other value from the drop-down menu where vCPUs are given).

Clicked on Add container.

Here I have entered httpd as the Container name and httpd:2.4 as image. It will pull the image with tag 2.4 from Dockerhub.com (you can enter your image name and tag). Also, image can be pulled from AWS Elastic Container Registry by entering the image URL.

Then in the Memory Limit section entered 512 MiB hard limit; you can also enter soft limit.

In Port mappings entered 80 as Container port and protocol TCP. Finally clicked on Add.

Then left all other options as default and clicked on Create.
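For reference, the equivalent task definition expressed as JSON would look roughly like this (the family name is arbitrary; the values mirror the console choices above)-

{
  "family": "httpd-task",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "256",
  "memory": "512",
  "containerDefinitions": [
    {
      "name": "httpd",
      "image": "httpd:2.4",
      "memory": 512,
      "portMappings": [
        { "containerPort": 80, "protocol": "tcp" }
      ],
      "essential": true
    }
  ]
}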

Then returned in my created cluster and clicked on Create under Services tab.

In the first step, the service has to be configured. Here, selected the Fargate launch type followed by the task definition (just created) and the LATEST platform version. Then entered the service name, number of tasks as 2, minimum healthy percent 100 and maximum 200, and selected Rolling update as the deployment type. (You can select as per your requirements.) Clicked on Next step.

In the second step, selected Cluster VPC, Subnets, security group. Enabled Auto-assign public IP.

In the load balancer type selected Application Load Balancer. Then I have chosen a previously created Application Load Balancer name.

In the ‘Container to load balance’ section, selected httpd:80:80 as Container name: port and clicked on Add to load balancer. Then selected same port as Production listener port and entered new target group name. In Path pattern entered ‘/’ and in evaluation order entered 1.

Also entered '/' in Health check path and unchecked the option Enable service discovery integration. Clicked on Next step.

In the third step, we can optionally set up auto scaling. But I haven’t done it. Clicked on Next step.

Finally, after review clicked on Create Service.

Then after few seconds the service has been created.

In the Tasks tab, two tasks were being provisioned, and after a few moments their status became PENDING and then finally RUNNING.

Here we haven't provisioned any EC2 instances; AWS automatically provisioned the Docker containers for us.

Hence this is serverless.

We can also use Amazon Aurora serverless as database. It starts up on demand, automatically scales with demand and shuts down when not in use.

Let’s see a simple example-

In the AWS Management console opened the RDS console. Then clicked on Create database.

Chose 'Standard Create' as the database creation method and Amazon Aurora as the database engine. Then chose Amazon Aurora with MySQL compatibility, the Serverless capacity type and a version.

In the Settings section, entered the DB cluster identifier, master username and master password (also confirmed it).

In the Capacity settings section, chose 1 (2 GB RAM) as the minimum Aurora capacity unit and 2 (4 GB RAM) as the maximum Aurora capacity unit. Left the Additional scaling configuration options unchecked.

Next, in the Connectivity section, selected the VPC. In Additional connectivity configuration, left the subnet group as default and selected a previously created database security group which only allows database connections from another security group. Also enabled the Data API.

In Additional configuration, entered Initial database name and selected the default DB cluster parameter group. It acts as a container for the engine configuration values that are applied to one or more DB instances.

[We can create our custom parameter group from the left menu in the RDS console. There we can compare two parameter groups to identify changes and to troubleshoot the incompatible-parameters issues. Also, we can edit, copy, reset and delete parameter group. To know more view AWS documentation.]

Then selected Backup retention period as 1day, chosen default aws/rds encryption master key and checked Enable deletion protection.

Finally clicked on create database.


After few minutes, database has been created.


By clicking on the database, we can see details like its endpoint.
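Since the Data API was enabled, a Lambda function can query this Aurora Serverless cluster without managing connections. A minimal sketch with the AWS SDK for Node.js could look like this (the cluster ARN, secret ARN, database and table names are placeholders)-

const AWS = require('aws-sdk');
const rdsData = new AWS.RDSDataService();

exports.handler = async () => {
    const result = await rdsData.executeStatement({
        resourceArn: 'arn:aws:rds:us-east-1:ACCOUNT_ID:cluster:my-aurora-cluster',    // placeholder cluster ARN
        secretArn: 'arn:aws:secretsmanager:us-east-1:ACCOUNT_ID:secret:my-db-secret', // placeholder secret with DB credentials
        database: 'mydb',
        sql: 'SELECT * FROM students'
    }).promise();
    return result.records;
};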

Scaling consideration in serverless architecture-

A synchronous serverless architecture works well with low or moderate traffic.

But as scale increases (millions of requests per day), many problems arise.

Few of them are- 

  • API Gateway integration timeout (29 seconds),
  • Lambda execution timeout (15 minutes),
  • Lambda-RDS connection failures,
  • latency of dependencies in the function impacts Lambda,
  • throttling in Lambda,
  • Secrets Manager limits (1,500 requests per second),
  • CloudWatch metrics limits,
  • cost increases for Lambda and CloudWatch,
  • as concurrency increases, the number of execution contexts increases.

To scale up serverless architecture to millions of users, we should consider-

Evaluate trade-offs- Don't design for infinite scaling of the architecture. Instead, know the key drivers for the serverless application and build what is needed.

Production-like end to end load testing- We need to know where the bottlenecks are, across the application and remove them with an eye toward the overall flow.

Stay updated- AWS improves their services and adds new features time to time. So, we should stay aware of service updates and take advantage of improvements.

As per Mr Ben Thurgood (Principal Solutions Architect, AWS), it's not a single thing that fixes everything. We should follow some best practices-

  • Take advantage of AWS global infrastructure and edge locations.
  • Identify and avoid heavy lifting.
  • Separate application and database.
  • Refactor as we go.
  • Use all of the available monitoring tools in the testing and the production environment.

AWS Lambda Power Tuning is an open-source project that runs a Lambda function at multiple memory configurations and provides feedback on execution time and cost across Lambda power levels to help make the best choice.

Let’s see how to do it-

Here I have a simple REST API using AWS Lambda and API Gateway with GET method-

To do Lambda power tuning, I have used the 'aws-lambda-power-tuning' application from the AWS Serverless Application Repository, by Mr. Alex Casalboni - Link.

After clicking on Deploy, the 'Review, configure and deploy' page appeared under the 'Lambda create function' option. Then checked 'I acknowledge that this app creates custom IAM roles' and clicked on Deploy.

Then, after a few moments, five Lambda functions, a Step Functions state machine and their IAM roles were created using CloudFormation.

Then in the state machine, clicked on Start Execution, entered the input (as shown in the picture) and clicked on Start Execution. Here you should replace the Lambda ARN with your own.
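The input is roughly of this shape (the Lambda ARN is a placeholder; adjust the power values, number of invocations and strategy as needed)-

{
  "lambdaARN": "arn:aws:lambda:us-east-1:ACCOUNT_ID:function:myFunction",
  "powerValues": [128, 256, 512, 1024, 3008],
  "num": 10,
  "payload": {},
  "parallelInvocation": true,
  "strategy": "cost"
}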

After the execution completed, clicked on execution output tab and copied the link given in ‘visualization’. Pasted it in a new tab and viewed the result.

Here it is seen that best cost is in 128 MB Lambda power (i.e. memory) and worst cost is in 3008 MB. Also, best time is in 256 MB Lambda power (i.e. memory) and worst time is in 128 MB. We can also hover over the line and view instantaneous values.

Again, in the State Machine clicked on Start Execution, entered input and changed strategy to ‘speed’ then clicked on Start Execution.

After the execution completed, copied the ‘visualization’ link. Pasted it in a new tab and viewed the result.

Here it is seen that best cost is in 128 MB Lambda power and worst cost is in 3008 MB. Also, best time is in 256 MB Lambda power and worst time is in 128 MB.

Again, in the State Machine clicked on Start Execution, entered input and changed strategy to ‘balanced’ then clicked on Start Execution.

After the execution completed, copied the ‘visualization’ link. Pasted it in a new tab and viewed the result.

Here it is seen that best cost is in 128 MB Lambda power and worst cost is in 3008 MB. Also, best time is in 512 MB Lambda power and worst time is in 128 MB.

Also, you can add payload in the input of Start Execution in the State Machine.

Concurrency is the number of Lambda invocations that can be executing at the same time. If the number of requests for invocations exceeds the account or Lambda function concurrency limit, requests are throttled.

The two primary impacts on concurrency are: the invocation model of the event source that’s triggering Lambda and the service limits of the AWS services we use.

To manage concurrency, we should-

  • Testing in all scenarios where different Lambda functions may be in use on the same account.
  • Consider multi-account strategy.
  • Use AWS Organizations to group accounts into logical categories.
  • Consider AWS Control tower for new multi-account environment.

AWS Lambda is designed to manage sudden bursts of concurrency within the account and function limits in a predictable way. Bursts are handled based on a combination of the limits and a predetermined Immediate Concurrency Increase amount that depends on the region where the Lambda function is running.

For example, suppose the account limit is 5,000 concurrent executions in a region and the Immediate Concurrency Increase is 3,000. If the concurrent executions burst from 100 to 4,000, Lambda will add the Immediate Concurrency Increase (3,000) to the current concurrent executions (100).

After one minute, Lambda will evaluate whether more are needed. If so, it will add 500 additional concurrent executions.

This evaluation and addition of 500 more concurrent executions continues until the available concurrent executions are enough to handle the burst, the account limit is reached, or the burst level is reached.

The performance of a Lambda function can be improved by reusing the execution environment. To do this, we should follow some practices (a sketch follows the list)-

a. Store and reference external dependencies (database connections, SSM Parameter Store values, Secrets Manager secrets) locally after the initial execution.

b. Limit the re-initialization of variables in the Lambda function by declaring them outside of the handler.

c. Check and reuse existing connection by adding some logic in code.

d. Use /tmp space as the local cache.

e. Check that background processes and callbacks are completed before the code exits, because they will resume on a warm start.
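A minimal sketch of practices (a)-(c), assuming a hypothetical parameter name, looks like this: the SDK clients and the cached value are created outside the handler, so warm invocations reuse them-

const AWS = require('aws-sdk');

// Created once per execution environment and reused on warm starts
const docClient = new AWS.DynamoDB.DocumentClient();
const ssm = new AWS.SSM();
let cachedConfig = null;

exports.handler = async (event) => {
    if (!cachedConfig) {
        // Fetched only on a cold start (add an expiry if the value changes often)
        const param = await ssm.getParameter({
            Name: '/myapp/config',   // hypothetical parameter name
            WithDecryption: true
        }).promise();
        cachedConfig = param.Parameter.Value;
    }
    // ... use docClient and cachedConfig to do the real work ...
    return { config: cachedConfig };
};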

So, some of the solutions of scaling problem in serverless architecture are-

  • Understand the scaling behavior of services by doing load test.
  • Decouple the architecture with SQS, SNS.
  • Use retry with back-off.
  • Write to SQS with batches.
  • Use DynamoDB Accelerator in front of DynamoDB.
  • Don't orchestrate inside Lambda function. Instead use multiple Lambda functions and AWS Step Function.
  • Fetch secrets outside of the Lambda handler and use in-memory cache (with expiry policy) for frequent secret refreshment.
  • For CloudWatch, log metrics to stdout in a standard format, use open source helper libraries.
  • Use asynchronous pattern.
  • In SQS, set visibility timeout to 6 times the Lambda function timeout.
  • Use Amazon RDS Proxy, which is a fully managed service. Lambda connects to the proxy and the proxy connects to the database. The proxy also preserves connections during a DB failover and manages the DB credentials with Secrets Manager and IAM. It can be found in the left menu of the RDS console-

Security consideration in serverless architecture-

In serverless architecture, we are responsible for the security of code, storage of data, client side data encryption and controlling access to data and resources.

There are three best practices-

Follow the principle of least privilege.

Protect data at rest and in transit by encryption.

Audit system for changes, unexpected access, unusual patterns, or errors.

 

As API Gateway is the front door of a serverless app, we should secure it using-

  • AWS IAM- for users that already exist in the AWS environment or can retrieve temporary IAM credentials. We should add only the minimum permissions needed to invoke the API to the IAM role.
  • Lambda authorizers- a Lambda function is invoked to authenticate or validate users against an existing identity provider. This is also useful for centralizing API access across accounts.
  • Amazon Cognito- to authenticate users from mobile apps or third-party identity providers. Amazon Cognito implements 3 types of tokens-
  1. Identity tokens- a JSON web token used for authentication; it includes user profile information.
  2. Access tokens- a JSON web token used to authorize requests, including Amazon Cognito API requests.
  3. Refresh tokens- an opaque token used to get new identity and access tokens without re-authentication.

Also, we should enable a request validator in the Method Request of API Gateway, and we can use API keys and usage plans (as shown in the first REST API project in this article).

To protect AWS Lambda functions, we should implement access control with resource policies. Also, instead of hard-coding sensitive data, use environment variables, SSM Parameter Store or AWS Secrets Manager.

To protect DynamoDB we should-

  • Use IAM roles to authenticate access to DynamoDB.
  • Use VPC Endpoint if we only need the access from within a VPC. In that case, we should add proper rules in network ACL and security group.
  • To store sensitive data, we should consider client-side encryption. We can also specify whether DynamoDB uses an AWS owned CMK (the default) or an AWS managed CMK to encrypt user data.

To protect S3 we should-

  • Implement S3 bucket policy.
  • Implement S3 Access Control List.
  • Implement encryption at rest. There are three modes of server-side encryption - SSE-S3, SSE-C, SSE-KMS. Also, we can encrypt data at client side before sending to S3.
  • In the S3 bucket policy we can include encryption headers to deny uploads that don't use the desired type of encryption. We can also deny S3 actions without SSL using a condition (a sample policy follows this list).
  • Also, we can enable access logging for auditing purpose.
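For illustration, a bucket policy combining both conditions might look like this (MY_BUCKET is a placeholder, and the example assumes SSE-KMS is the desired encryption type)-

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": ["arn:aws:s3:::MY_BUCKET", "arn:aws:s3:::MY_BUCKET/*"],
      "Condition": { "Bool": { "aws:SecureTransport": "false" } }
    },
    {
      "Sid": "DenyUnencryptedUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::MY_BUCKET/*",
      "Condition": { "StringNotEquals": { "s3:x-amz-server-side-encryption": "aws:kms" } }
    }
  ]
}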

We can use AWS Web Application Firewall to protect resources.

We can use CloudFront distribution and AWS Shield Advanced to prevent DDoS attack.

As per Mr George Mao (Serverless Specialist, AWS), the security best practices are-

  • Take advantage of AWS managed services to reduce the security burden.
  • Think about security end to end at each integration point in the distributed architecture.
  • Following the principle of least privilege, use narrowly scoped IAM permissions and roles to protect access to AWS services.
  • Create smaller Lambda functions that perform scoped activities and don’t share IAM roles between functions.
  • Protect data at rest and in transit by encrypting the data and using SSL.
  • Do not send, log, or store unencrypted sensitive data, whether it’s part of an HTTP request path/query string or standard output of a Lambda function. Encrypt what we write.
  • Audit the system for changes, unexpected access, unusual patterns, or errors.
  • To authenticate users, store their information in database as encrypted data or use Amazon Cognito.
  • Instead of managing multiple identities, centralize identity and privilege management using a serverless, standards-based identity service.

SAM & serverless deployment-

When we first begin doing labs with Lambda, API Gateway, DynamoDB, etc., we do it in the AWS Management Console. That is great at the beginner level but not ideal for real-world production environments.

It’s like logging into the production servers and editing the files. This is not a recommended practice.

Also, in industry, continuous updates and changes are required due to customer requirements, bug fixes and new features.

To successfully deploy serverless app we need-

  • The ability to audit and validate changes during deployment.
  • The ability to roll back a bad deployment to a previous, more stable version of the code.
  • The ability to deploy changes through a planned and automated process: first deploy the application into the development environment, then into test and finally into production. This needs to be automated with minimal human error.

For this we need to deploy serverless apps with CI/CD tools.

Also, there are 2 key truths about serverless:

  1. Developers need the ability to build and test code locally and
  2. They need the ability to deploy code into a sandbox.

Both of these problems can be solved by using the AWS Serverless Application Model, or SAM, which is an extension of AWS CloudFormation that simplifies building and deploying serverless applications. With SAM, we can create templates in YAML format to define Lambda functions, API Gateway APIs, DynamoDB tables and serverless applications from the AWS Serverless Application Repository.

AWS SAM is made up of two main components-

  1. SAM template- a YAML-formatted template compatible with AWS CloudFormation. It uses infrastructure as code to define Lambda functions, API Gateway APIs, DynamoDB tables and serverless applications from the AWS Serverless Application Repository. If any error is detected while deploying the template, AWS CloudFormation rolls back the deployment and deletes any resources that were created, leaving the environment exactly as it was before the deployment.
  2. SAM Command Line Interface (CLI)- using Docker, it allows us to test serverless code and emulate the Lambda environment locally. To install the SAM CLI, we need to install Docker and Python.

Here is a sample SAM template-
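In case the screenshot does not render, a minimal sketch of such a template might look like this (the handler, runtime and paths are illustrative)-

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: A minimal sample serverless application

Resources:
  HelloFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.lambda_handler      # illustrative handler
      Runtime: python3.8
      CodeUri: .
      Events:
        HelloApi:
          Type: Api
          Properties:
            Path: /hello
            Method: get

  StudentTable:
    Type: AWS::Serverless::SimpleTable
    Properties:
      PrimaryKey:
        Name: Roll
        Type: String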

There are several resources types-

AWS::Serverless::Function- AWS Lambda.

AWS::Serverless::Api- API Gateway.

AWS::Serverless::SimpleTable- DynamoDB.

AWS::Serverless::Application- AWS Serverless Application Repository.

AWS::Serverless::HttpApi- API Gateway HTTP API.

AWS::Serverless::LayerVersion- Lambda layers.

In the process of developing with AWS SAM, we first write the Lambda function code and define all serverless resources inside an AWS SAM template. Then the SAM CLI can be used to emulate the Lambda environment and perform local tests on the Lambda functions. After the code and templates are validated, the SAM package command (an alias of the aws cloudformation package command) creates a deployment package, a .zip file that SAM stores in Amazon S3. After that, the SAM deploy command (an alias of the aws cloudformation deploy command) instructs AWS CloudFormation to deploy the .zip file and create the resources.

Here is an example. I have created a python code and a template.yaml file which is a SAM template.

Then in the terminal given the command- aws cloudformation package --template-file template.yaml --s3-bucket utpalsambucket --output-template-file packaged-template.yml

It created a zip file of my code & dependencies, uploaded it into the utpalsambucket bucket on AWS S3, and returned a copy of the template file with the S3 location as an output template file (which is packaged-template.yml).

Then I have run the deploy command- aws cloudformation deploy --template-file packaged-template.yml --stack-name mysamstack --capabilities CAPABILITY_IAM

Now the change set and CloudFormation stack are created, and the resources are created by CloudFormation as per the template. During the CloudFormation stack creation an IAM role is also created; for this we have to explicitly allow CloudFormation to do so. That's why CAPABILITY_IAM or CAPABILITY_NAMED_IAM (for IAM resources with custom names) is added to the command.

In the SAM template, I have added API key and usage plan for the API Gateway. So, after creation of the resources while performing GET operation from Postman without the API key, it shows forbidden. But when added the API key in the header, it succeeded.

There are three types of deployment options for configuration data-

  1. Hard-coded in the application- specific to one Lambda function, with low latency. This is not a good practice, because the configuration data and secrets may be accidentally exposed. Always separate configuration and secrets from the code to prevent accidental check-ins to public source control.
  2. Environment variables- also specific to one Lambda function, with low latency, but more secure.
  3. Loading data at runtime from Parameter Store- works for multiple Lambda functions, with higher latency. The Lambda function makes calls to retrieve the configuration information.

AWS Systems Manager Parameter Store is a free, fully managed, centralized storage system for configuration data and secrets. Data can be stored as plaintext or encrypted with AWS KMS. It tracks all parameter changes through versioning, so if we need to roll back a deployment, we can also choose to use an earlier version of the configuration data.
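For example, a parameter can be stored and retrieved from the CLI like this (the parameter name and value are placeholders)-

C:\Users\user> aws ssm put-parameter --name "/myapp/dbpassword" --value "MySecretValue" --type SecureString

C:\Users\user> aws ssm get-parameter --name "/myapp/dbpassword" --with-decryption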

Parameter Store offers standard and advanced parameters-

In real world, serverless architectures are deployed by deploying the SAM templates and codes through AWS developer tools like CodeCommit, CodeBuild, CodePipeline, CodeDeploy. Also audit the changes by using AWS CloudTrail and CloudWatch.

I have another article for AWS CICD, please read this.

While deploying serverless app, code is pushed into a source code repository like AWS CodeCommit, GitHub, Bitbucket.

Then, in the build stage, a service like AWS CodeBuild prepares the code for deployment by preparing the environment, compiling the code, running unit tests and style checks, and creating the function deployment packages.

Then, in the test stage, various tests (integration, load, UI, security) are performed in a production-like environment by a service like AWS CodeBuild.

Finally, the code is deployed to the production environment by a service like AWS CodeDeploy.

The whole process can be orchestrated by AWS CodePipeline.

I have created a small project on SAM deployment using CodeCommit, CodeBuild and CodePipeline. But before this, let see the process with more detailed view-

The source code is pushed, along with the SAM template (and optionally buildspec.yml and other files), into the source code repository. When changes to the source code are committed (git commit) and pushed to the repository (git push), AWS CodePipeline copies the source into an Amazon S3 bucket and passes it to CodeBuild.

AWS CodeBuild gets the source code from the pipeline and does various testing, runs security checks, installs dependencies and prepares the SAM template for deployment. When CodeBuild hits the deployment preferences section of the SAM template, CodeDeploy will take over the execution of the resources.

To deploy the resources, CodeDeploy calls CloudFormation to create or update a CloudFormation stack using change sets. To provision the resources in the template, CloudFormation must assume an IAM role that has the appropriate permissions.

In the CodePipeline settings we can skip the build or deploy stage depending on the requirement. Also, in the build and deploy stages we can use other tools and services rather than CodeBuild and CodeDeploy. In my project, I have skipped the deploy stage and given CodeBuild all the required IAM permissions to perform the tasks.

Before starting this project, it is suggested to read my article CICD in AWS with explanation.

At first, I have written the app.py, template.yml and buildspec.yml files in a folder (a sketch of the buildspec is shown a little further below). I have also created an AWS CodeCommit repository.

Then, in Git Bash, ran git init, git add . , git status, git commit -am "the serverless commit-1", git remote add origin URL_of_Repository and git push -u origin master. A guide to the git commands can be found on GitHub, in the AWS Documentation and on many other websites. The files are pushed into the CodeCommit repository.
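For reference, a minimal sketch of what the buildspec.yml could contain is shown below; it assumes the same package/deploy flow as earlier, with a placeholder artifact bucket and stack name-

version: 0.2
phases:
  install:
    runtime-versions:
      python: 3.8
  build:
    commands:
      - aws cloudformation package --template-file template.yml --s3-bucket MY_ARTIFACT_BUCKET --output-template-file packaged-template.yml
      - aws cloudformation deploy --template-file packaged-template.yml --stack-name MY_STACK_NAME --capabilities CAPABILITY_IAM
artifacts:
  files:
    - packaged-template.yml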

Then created the AWS CodeBuild project. Here I have faced many 'access denied' errors, so in the IAM role of CodeBuild I have given permissions for S3, CloudFormation, CloudWatch and others.

Then created AWS CodePipeline project.

The CloudFormation stack was created, and the API Gateway, AWS Lambda function and DynamoDB table were also created.

Next, I have edited the pipeline and added a stage named 'approval' with an action group for manual approval. Here you can add more stages and action groups, such as build and prod.

Then I have changed the app.py file which is the Lambda function code. Committed and pushed it into the CodeCommit repository.

At the second stage (approval), CodePipeline went into a pending state, waiting for manual approval.

To keep it simple I haven't added an SNS notification to the manual approval action group. So here I have approved the pipeline execution manually.

Then the CodePipeline resumed and the build stage succeeded. The change in Lambda function code got reflected.

In a Lambda function, the code is first written to the $LATEST version, which is mutable, i.e. it can be modified. When the function code runs successfully, we can publish a version; the published version becomes immutable.

Now, when sending traffic to the Lambda function, we can create an alias which points to a specific version and sends 100% of the traffic to it. When a new version is published, we can change the alias to send 100% of the traffic to the newer version. This is known as all-at-once, which is the fastest deployment type. It is good for a low-concurrency event model.

In a real-world production environment, all-at-once is not recommended, because the newer version of the code may have an issue (due to a bug or any other cause) which will affect 100% of customers. We would then have to redeploy the older version to roll back, which will affect business revenue.

Instead we should use the Canary/Linear types. Here, after publication of a new version of the code, we send a small portion of the traffic (say 10%) to the newer version and the rest goes to the older, stable version. This is good for a high-concurrency event model.

Here, if some issue occurs in the newer version, it will affect only a few customers, and we can then revert them (hence 100% of the traffic) back to the older, stable version of the code. This is not the fastest deployment type but is better for a production environment.

In the canary type we shift a specific portion (say 10%) of the traffic to the new version and, if all goes well, we then shift the rest, i.e. 100% of the traffic, to the new version.

In the linear type we shift a specific portion (say 10%) of the traffic to the new version every predetermined interval (say 1 minute), which means that after 10 minutes, 10 × 10 = 100% of the traffic will have shifted to the new version. If any issue occurs in between, we can roll the traffic back to the older, stable version.

In the first REST API example of this article, Lambda canary deployment is shown.

These three types of deployments can be automated by SAM.

In the SAM template we can add a 'DeploymentPreference' section. These are the types-

  • CanaryXPercentYMinutes- send X% of the traffic to the new version; after Y minutes send the rest (hence all) of the traffic to the new version.
  • LinearXPercentEveryYMinutes- send X% of the traffic to the new version every Y minutes until all the traffic has shifted to the new version.
  • AllAtOnce- send all the traffic to the new version at once.

If any issue occurs, we can use CloudWatch alarm to automatically trigger rollback.

Using Hooks we can test the code both before and after traffic shifting. After deploying new version we can perform tests using PreTraffic hook and then shift traffic to the new version. Then after traffic shifting we can perform tests using PostTraffic hook.

An example code is-
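In case the screenshot does not render, a minimal sketch of the relevant part of a SAM template might look like this (the alarm and hook function names are illustrative)-

MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: app.lambda_handler
    Runtime: python3.8
    CodeUri: .
    AutoPublishAlias: live
    DeploymentPreference:
      Type: Canary10Percent10Minutes
      Alarms:
        - !Ref MyErrorAlarm                        # roll back automatically if this alarm fires
      Hooks:
        PreTraffic: !Ref PreTrafficHookFunction    # runs tests before traffic shifting
        PostTraffic: !Ref PostTrafficHookFunction  # runs tests after traffic shifting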


Like AWS SAM, we can use the Serverless Framework offered by Serverless Inc. It is provider-agnostic, i.e. we can use it for other cloud platforms; we can even use this framework to run serverless apps on different cloud platforms simultaneously.

I have installed this through the Windows command prompt using npm and checked the version. Here we have to run commands starting with serverless or sls (an alias of serverless).
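The installation and version check were done with commands like these-

C:\Users\user> npm install -g serverless

C:\Users\user> serverless --version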

After the installation, while running serverless commands from VS Code or others terminals, you may get this error- File C:\---\---\---\---\npm\serverless.ps1 cannot be loaded because running scripts is disabled on this system. For more information, see about_Execution_Policies at https:/go.microsoft.com/fwlink/?LinkID=135170

This is because the execution policy in Windows protects our system from scripts we don't trust. To change it, run this command in Windows PowerShell (Administrator mode)- Set-ExecutionPolicy RemoteSigned. This will allow local scripts and remotely signed scripts. Press Y for confirmation.

Then you can run serverless commands from VS Code or others terminal.

To get started create a folder and open it with VS Code. Then to create a boilerplate run this command-

serverless create --template aws-nodejs --path my-serverless

Then deploy it with these commands- cd my-serverless and serverless deploy

It has created a S3 bucket to upload JSON and .zip file.

The CloudFormation stack and resources are created.


We can remove the stack and all the created resources by using serverless remove command.

We can use the serverless-offline plugin to run and test APIs locally. To install it, run the command npm install --save-dev serverless-offline (the plugin will be installed as a dev dependency).

Then, inside the serverless.yml file, after the service declaration, declare the serverless-offline plugin (as shown in the picture and in the sketch below). Then run the command serverless offline to run the server locally at http://localhost:3000
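A minimal serverless.yml with the plugin declared might look like this (it assumes an HTTP event so that the offline server has an endpoint to serve)-

service: my-serverless

plugins:
  - serverless-offline

provider:
  name: aws
  runtime: nodejs12.x

functions:
  hello:
    handler: handler.hello
    events:
      - http:
          path: hello
          method: get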

We can test this using Postman.

There is much more to the Serverless Framework, but as this article has already become very long, the details are not discussed here; they will be covered in another article.

So, this is the end of this article.


Thanks for reading this article. In case of any feedback, please leave a comment.
You can also connect with me on LinkedIn.


Also see my other articles-

AWS Systems Manager
