Lesson 0004: Auto Scaling and Elastic Load Balancing
Distributing traffic, monitoring resources, and scaling compute capacity automatically
1. Why Scaling Matters
Scaling is the ability to increase or decrease the compute capacity of your application based on demand. Without automatic scaling, you face a capacity dilemma:
Over-provision — You allocate enough capacity for peak demand, but resources sit idle most of the time. Costs are not optimized.
Under-provision — You allocate less capacity to save money, but your application underperforms or becomes unavailable during traffic spikes.
In the cloud, computing power is a programmatic resource. You can take a flexible approach to scaling by using services that automatically respond to changes in demand.
2. Elastic Load Balancing
Elastic Load Balancing (ELB) is an AWS service that distributes incoming application or network traffic across multiple targets in a single Availability Zone or across multiple Availability Zones. It scales your load balancer automatically as traffic changes over time.
ELB Targets: EC2 instances, containers, IP addresses, and Lambda functions.
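Load balancer | OSI layer | Traffic | Typical use
Application Load Balancer (ALB) | Layer 7 (Application) | HTTP, HTTPS | Content-based routing of web requests, for example to microservices
Network Load Balancer (NLB) | Layer 4 (Transport) | TCP, UDP, TLS | Millions of requests per second, ultra-low latency, volatile traffic patterns
Gateway Load Balancer (GWLB) | Layer 3 (Network) | All IP packets | Virtual appliances such as firewalls, intrusion detection, deep packet inspection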
How ELB Works
A listener checks for connection requests. It is configured with a protocol and port number for connections from clients to the load balancer, and from the load balancer to the targets.
You register targets in target groups and route traffic to those groups.
The load balancer performs health checks on registered targets and routes traffic only to healthy ones. If a target becomes unhealthy, the load balancer stops sending it traffic until it passes health checks again.
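The health check behavior described above can be sketched in a few lines of Python. This is a simplified model, not the ELB API: the Target and TargetGroup classes are hypothetical, the healthy flag stands in for the result of the most recent health check, and round-robin selection is assumed for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """A registered target; `healthy` stands in for the latest health check result."""
    name: str
    healthy: bool = True


class TargetGroup:
    """Hypothetical stand-in for a target group, not the ELB API."""

    def __init__(self, targets):
        self.targets = list(targets)
        self._next = 0  # position of the next round-robin candidate

    def route(self):
        """Return the next healthy target in round-robin order, or None if none is healthy."""
        for _ in range(len(self.targets)):
            target = self.targets[self._next % len(self.targets)]
            self._next += 1
            if target.healthy:
                return target
        return None


group = TargetGroup([Target("i-a"), Target("i-b"), Target("i-c")])
group.targets[1].healthy = False  # i-b fails its health check
picked = [group.route().name for _ in range(4)]
print(picked)  # ['i-a', 'i-c', 'i-a', 'i-c']
```

Setting the flag back to True on i-b would return it to the rotation on the next pass, which mirrors a target recovering.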
ELB Use Cases
High availability and fault tolerance — Traffic is balanced across healthy targets in multiple Availability Zones. If targets in one AZ become unhealthy, traffic routes to targets in other AZs.
Containerized applications — Deep integration with Amazon ECS. You register a service with a load balancer, and ECS transparently manages container registration and deregistration.
Automatic scaling — ELB works with Amazon CloudWatch and EC2 Auto Scaling to scale applications to meet customer demand.
VPC entry point — Create a public or internal load balancer within your VPC. An internal load balancer does not need an internet gateway.
Hybrid environments — Load balance across AWS and on-premises resources using the same load balancer.
Invoke Lambda functions — Register Lambda functions as targets with an Application Load Balancer to serve HTTP(S) requests.
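As an illustration of the Lambda use case, here is a minimal sketch of a Python handler that could sit behind an Application Load Balancer. The response keys (statusCode, statusDescription, isBase64Encoded, headers, body) follow the format an ALB expects from a Lambda target. The name query parameter and the greeting are assumptions made up for this example.

```python
import json


def lambda_handler(event, context):
    """Handle an HTTP request forwarded by an Application Load Balancer."""
    # queryStringParameters can be absent or None when the URL has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")  # "name" is a hypothetical parameter
    return {
        "statusCode": 200,
        "statusDescription": "200 OK",
        "isBase64Encoded": False,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}", "path": event.get("path")}),
    }


# Local call with a hand-written event that mimics what the ALB would send.
sample_event = {
    "httpMethod": "GET",
    "path": "/hello",
    "queryStringParameters": {"name": "ELB"},
}
print(lambda_handler(sample_event, None)["body"])
```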
Monitoring Load Balancers
Amazon CloudWatch metrics — ELB publishes metrics to CloudWatch. You can create alarms that initiate actions when metrics go outside acceptable ranges.
Access logs — Capture detailed information about requests sent to your load balancer and store them in Amazon S3.
AWS CloudTrail logs — Capture detailed information about API calls made to the ELB API, stored in Amazon S3.
3. Amazon CloudWatch
Amazon CloudWatch is a monitoring and observability service that tracks your AWS resources and the applications you run on AWS in real time.
CloudWatch Capabilities
Collect and track metrics — Standard metrics from AWS services, plus custom metrics from your own applications.
Alarms — Monitor any CloudWatch metric and automatically send a notification to an Amazon SNS topic, or perform an EC2 Auto Scaling or EC2 action.
Events — Define rules that match changes in your AWS environment and route them to targets for processing, such as Lambda functions, Kinesis streams, ECS tasks, and Step Functions.
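To make the idea of an event rule concrete, the sketch below matches an event against a pattern in the style used by CloudWatch Events (the feature now offered as Amazon EventBridge). The matches function is a hypothetical, heavily simplified matcher written for this lesson: it only supports exact-value lists and nested objects, not the prefix, numeric, or other matching the real service offers. The instance ID is made up.

```python
def matches(pattern, event):
    """Return True if every field in `pattern` is satisfied by `event`."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of accepted values.
            return False
    return True


# Rule: react when any EC2 instance is terminated.
PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["terminated"]},
}

EVENT = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "terminated"},
}

print(matches(PATTERN, EVENT))  # True
```

In the real service, a matching event would then be delivered to the rule's targets, such as a Lambda function.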
CloudWatch Alarm Components
Component | Description
Namespace | Contains the metric, for example AWS/EC2
Metric | The variable to measure, for example CPUUtilization
Statistic | Average, sum, minimum, maximum, sample count, or percentile
Period | The length of time over which the statistic is evaluated for the alarm
Conditions | Threshold comparison: Greater, Greater or Equal, Lower or Equal, or Lower
Actions | Send a notification to SNS, or perform an EC2 Auto Scaling or EC2 action
Alarm Types: You can create alarms based on a static threshold, anomaly detection, or a metric math expression.
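The components above combine roughly as in the following sketch of a static-threshold alarm. It is a simplified model, not CloudWatch itself: the alarm_state function and its evaluation_periods parameter are assumptions for illustration, every period is assumed to contain at least one sample, and options such as missing-data handling are left out. The comparison names and the three state names do match the ones CloudWatch uses.

```python
import operator
import statistics

COMPARISONS = {
    "GreaterThanThreshold": operator.gt,
    "GreaterThanOrEqualToThreshold": operator.ge,
    "LessThanOrEqualToThreshold": operator.le,
    "LessThanThreshold": operator.lt,
}

STATISTICS = {
    "Average": statistics.mean,
    "Sum": sum,
    "Minimum": min,
    "Maximum": max,
    "SampleCount": len,
}


def alarm_state(periods, statistic, comparison, threshold, evaluation_periods=1):
    """Evaluate a static-threshold alarm.

    `periods` is a list of lists: the raw samples collected in each period,
    oldest first. The alarm fires only if the statistic breaches the threshold
    in each of the last `evaluation_periods` periods.
    """
    if len(periods) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    aggregate = STATISTICS[statistic]
    breaches = COMPARISONS[comparison]
    recent = periods[-evaluation_periods:]
    if all(breaches(aggregate(samples), threshold) for samples in recent):
        return "ALARM"
    return "OK"


# Per-period CPU samples; the period averages are 45, 80, and 85.
cpu = [[40, 50], [75, 85], [90, 80]]
print(alarm_state(cpu, "Average", "GreaterThanThreshold", 70, evaluation_periods=2))  # ALARM
print(alarm_state(cpu, "Average", "GreaterThanThreshold", 70, evaluation_periods=3))  # OK
```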
4. Amazon EC2 Auto Scaling
Amazon EC2 Auto Scaling helps you maintain application availability by automatically adding or removing EC2 instances according to conditions you define.
Auto Scaling Group
An Auto Scaling group is a collection of EC2 instances that are treated as a logical grouping for automatic scaling and management. You configure three key values:
Minimum size — The lowest number of instances the group will maintain.
Desired capacity — The number of instances the group should have.
Maximum size — The highest number of instances the group will allow.
If you specify scaling policies, EC2 Auto Scaling can launch or terminate instances as demand increases or decreases, staying within the minimum and maximum bounds.
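The effect of the minimum and maximum bounds can be shown with a small calculation. The function names below are hypothetical and written for this lesson. They model only the arithmetic of keeping the desired capacity inside the group's bounds.

```python
def clamp_desired_capacity(desired, minimum, maximum):
    """Keep the desired capacity inside the group's [minimum, maximum] range."""
    if minimum > maximum:
        raise ValueError("minimum size cannot exceed maximum size")
    return max(minimum, min(desired, maximum))


def apply_adjustment(current, change, minimum, maximum):
    """Apply a scaling adjustment (positive or negative) and respect the bounds."""
    return clamp_desired_capacity(current + change, minimum, maximum)


# A group with minimum 1, desired 3, maximum 5.
print(apply_adjustment(3, +4, minimum=1, maximum=5))  # 5: capped at the maximum
print(apply_adjustment(3, -4, minimum=1, maximum=5))  # 1: held at the minimum
print(apply_adjustment(3, +1, minimum=1, maximum=5))  # 4: within bounds
```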
Launch Configuration
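A launch configuration is an instance configuration template that defines what you are scaling. It typically includes the Amazon Machine Image (AMI) ID, instance type, key pair, security groups, and block device mappings.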
Scaling Options
Option | How it works | Best for
Manual scaling | You manually change the minimum, maximum, or desired capacity. | One-off changes or testing.
Scheduled scaling | Scaling actions run automatically based on date and time. | Predictable workloads, such as weekly traffic patterns.
Dynamic scaling | Scaling policies respond to changing demand in real time, triggered by CloudWatch alarms. | Unpredictable or variable workloads.
Predictive scaling | Uses machine learning on your EC2 usage data to predict traffic and scale proactively. | Workloads with daily or weekly patterns. Requires at least one day of historical data.
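As a rough illustration of scheduled scaling, the sketch below checks which weekly actions are due at a given time. The ScheduledAction class and due_actions function are hypothetical and written for this lesson. In EC2 Auto Scaling itself, a recurring schedule is expressed as a cron-style recurrence, and a weekly 06:00 Monday action would look something like 0 6 * * 1.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ScheduledAction:
    """A weekly scaling action (hypothetical model, not the Auto Scaling API)."""
    name: str
    weekday: int  # Monday is 0, matching datetime.weekday()
    hour: int
    minute: int
    desired_capacity: int


def due_actions(actions, now):
    """Return the actions whose weekly schedule matches `now` to the minute."""
    return [
        action for action in actions
        if (action.weekday, action.hour, action.minute)
        == (now.weekday(), now.hour, now.minute)
    ]


actions = [
    ScheduledAction("monday-morning-scale-out", weekday=0, hour=6, minute=0, desired_capacity=6),
    ScheduledAction("monday-evening-scale-in", weekday=0, hour=20, minute=0, desired_capacity=2),
]

# 1 January 2024 was a Monday.
for action in due_actions(actions, datetime(2024, 1, 1, 6, 0)):
    print(action.name, action.desired_capacity)  # monday-morning-scale-out 6
```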
Scale Out vs. Scale In: Launching instances is called scaling out. Terminating instances is called scaling in.
How Dynamic Scaling Works
A typical dynamic scaling configuration uses CloudWatch, EC2 Auto Scaling, and Elastic Load Balancing together:
CloudWatch monitors a metric such as average CPU utilization across your EC2 fleet.
When the metric breaches a threshold for a specified duration, a CloudWatch alarm triggers.
The alarm runs an EC2 Auto Scaling policy that launches or terminates instances.
EC2 Auto Scaling registers new instances with the load balancer.
The load balancer performs health checks and begins distributing traffic to the new instances.
The load balancer feeds performance metrics back to CloudWatch, completing the loop.
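The loop described above can be approximated with a short simulation. This is a sketch under stated assumptions, not how the services are implemented: the function name, the 70% and 30% thresholds, the two-period breach duration, and the one-instance step are all illustrative choices, and the CPU readings are replayed from a fixed list instead of reacting to the new instance count.

```python
from statistics import mean


def simulate_dynamic_scaling(cpu_by_period, instances, minimum, maximum,
                             scale_out_above=70.0, scale_in_below=30.0,
                             breach_periods=2, step=1):
    """Replay per-period fleet CPU readings; return the instance count after each period."""
    high_streak = low_streak = 0
    history = []
    for readings in cpu_by_period:
        average = mean(readings)  # the metric CloudWatch would track
        # Count consecutive periods in breach; any non-breaching period resets the streak.
        high_streak = high_streak + 1 if average > scale_out_above else 0
        low_streak = low_streak + 1 if average < scale_in_below else 0
        if high_streak >= breach_periods:
            instances = min(maximum, instances + step)  # alarm fires: scale out
            high_streak = 0
        elif low_streak >= breach_periods:
            instances = max(minimum, instances - step)  # alarm fires: scale in
            low_streak = 0
        history.append(instances)
    return history


# Period averages: 85, 80, 55, 15, 20, 20.
cpu = [[80, 90], [85, 75], [60, 50], [20, 10], [15, 25], [20, 20]]
print(simulate_dynamic_scaling(cpu, instances=2, minimum=1, maximum=4))
# [2, 3, 3, 3, 2, 2]
```

The group scales out only after two consecutive high periods and scales in only after two consecutive low periods, which is the role the alarm's duration plays in the steps above.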
5. AWS Auto Scaling
AWS Auto Scaling is a separate service from Amazon EC2 Auto Scaling. It monitors your applications and automatically adjusts capacity to maintain steady, predictable performance at the lowest possible cost.
AWS Auto Scaling can build scaling plans for:
Amazon EC2 instances and Spot Fleets
Amazon ECS tasks
Amazon DynamoDB tables and indexes
Amazon Aurora Replicas
EC2 Auto Scaling vs. AWS Auto Scaling: EC2 Auto Scaling scales EC2 instances. AWS Auto Scaling is a broader service that can scale multiple resource types, including ECS, DynamoDB, and Aurora.
6. Decision Guide
Need | Service | Why
Distribute HTTP/HTTPS traffic with content-based routing | Application Load Balancer | Layer 7, routes based on request content
Handle millions of requests per second with ultra-low latency | Network Load Balancer | Layer 4, optimized for sudden traffic spikes
Deploy virtual appliances like firewalls or IDS/IPS | Gateway Load Balancer | Layer 3, transparent gateway for virtual appliances
Monitor AWS resources and trigger actions on metrics | Amazon CloudWatch | Metrics, alarms, and events in one service
Automatically add or remove EC2 instances based on demand | Amazon EC2 Auto Scaling | Maintains availability and optimizes cost
Scale multiple AWS resource types from a single interface | AWS Auto Scaling | EC2, ECS, DynamoDB, and Aurora in one plan
Quiz: Auto Scaling and Elastic Load Balancing
1. A company runs a web application on EC2 instances and expects traffic to spike every Monday morning. They want to add instances automatically at 6:00 AM every Monday. Which scaling option should they use?
2. Which load balancer type operates at OSI Layer 7 and is best for routing HTTP traffic to microservices based on the content of the request?
3. An Auto Scaling group has a minimum size of 2, a desired capacity of 4, and a maximum size of 8. A scaling policy triggers and needs to add 6 instances. How many instances will the group have?
4. Which AWS service publishes metrics for your load balancers and targets, and can trigger alarms that initiate EC2 Auto Scaling actions?
5. A company needs to deploy third-party firewall virtual appliances across multiple Availability Zones. Which load balancer type should they use?
6. What is the term for terminating EC2 instances in an Auto Scaling group when demand decreases?
7. A company wants to scale their Amazon DynamoDB tables and Amazon Aurora Replicas automatically from a single scaling plan. Which service should they use?
8. Which load balancer type is optimized to handle sudden and volatile traffic patterns while maintaining ultra-low latencies?
Ask your teacher: Ask your agent about anything unclear: scaling policies, health check behavior, or the difference between EC2 Auto Scaling and AWS Auto Scaling.
Primary Source: AWS Academy Module 10: Automatic Scaling and Monitoring.