Lesson 0004: Auto Scaling and Elastic Load Balancing

Distributing traffic, monitoring resources, and scaling compute capacity automatically

1. Why Scaling Matters

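Scaling is the ability to increase or decrease the compute capacity of your application based on demand. Without automatic scaling, you face a capacity dilemma: provision for peak demand and pay for capacity that sits idle most of the time, or provision for average demand and risk degraded performance when traffic spikes.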

In the cloud, computing power is a programmatic resource. You can take a flexible approach to scaling by using services that automatically respond to changes in demand.

2. Elastic Load Balancing

Elastic Load Balancing (ELB) is an AWS service that distributes incoming application or network traffic across multiple targets in a single Availability Zone or across multiple Availability Zones. It scales your load balancer automatically as traffic changes over time.

ELB Targets: EC2 instances, containers, IP addresses, and Lambda functions.
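
To make the idea of distributing traffic across targets concrete, here is a minimal sketch that hands requests to healthy targets in round-robin order. It is a toy model, not how ELB is implemented: the function name `distribute`, the target IDs, and the health map are all hypothetical, and the routing algorithm a real load balancer uses depends on its type and configuration.

```python
from itertools import cycle


def distribute(requests, targets, healthy):
    """Assign each request to the next healthy target in round-robin order."""
    eligible = [t for t in targets if healthy.get(t, False)]
    if not eligible:
        raise RuntimeError("no healthy targets registered")
    rotation = cycle(eligible)
    return {request: next(rotation) for request in requests}


# Hypothetical target IDs spread over two Availability Zones.
targets = ["i-az1-a", "i-az1-b", "i-az2-a"]
healthy = {"i-az1-a": True, "i-az1-b": False, "i-az2-a": True}

# The unhealthy target i-az1-b receives no traffic.
assignments = distribute(["req-1", "req-2", "req-3", "req-4"], targets, healthy)
```

In this sketch the four requests alternate between the two healthy targets, one in each Availability Zone.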

Types of Load Balancers

Type | OSI Layer | Protocols | Best For
Application Load Balancer (ALB) | Layer 7 (Application) | HTTP, HTTPS | Content-based routing, microservices, container-based applications
Network Load Balancer (NLB) | Layer 4 (Transport) | TCP, UDP, TLS | Millions of requests per second, ultra-low latency, volatile traffic patterns
Gateway Load Balancer (GWLB) | Layer 3 (Network) | All IP packets | Virtual appliances such as firewalls, intrusion detection, deep packet inspection

How ELB Works

ELB Use Cases

Monitoring Load Balancers

3. Amazon CloudWatch

Amazon CloudWatch is a monitoring and observability service that monitors your AWS resources and the applications that run on AWS in real time.

CloudWatch Capabilities

CloudWatch Alarm Components

Component | Description
Namespace | Contains the metric, for example AWS/EC2
Metric | The variable to measure, for example CPUUtilization
Statistic | Average, sum, minimum, maximum, sample count, or percentile
Period | The length of time over which the statistic is computed for each data point
Conditions | Threshold comparison: Greater, Greater or Equal, Lower or Equal, or Lower
Actions | Send a notification to SNS, or perform an EC2 Auto Scaling or EC2 action

Alarm Types: You can create alarms based on a static threshold, anomaly detection, or a metric math expression.
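
The way these components combine can be sketched with a small static-threshold evaluator. This is a simplification for illustration, not CloudWatch's actual evaluation logic: the function `alarm_state` and its `evaluation_periods` argument are assumptions made for this sketch, each period is assumed to contain at least one sample, and only the comparison operator and state names are taken from the CloudWatch API.

```python
import operator
from statistics import mean

# Comparison operator names follow the CloudWatch API.
COMPARISONS = {
    "GreaterThanThreshold": operator.gt,
    "GreaterThanOrEqualToThreshold": operator.ge,
    "LessThanOrEqualToThreshold": operator.le,
    "LessThanThreshold": operator.lt,
}

# Percentile statistics are left out to keep the sketch short.
STATISTICS = {
    "Average": mean,
    "Sum": sum,
    "Minimum": min,
    "Maximum": max,
    "SampleCount": len,
}


def alarm_state(periods, statistic, comparison, threshold, evaluation_periods):
    """Return "ALARM" when the most recent periods all breach the threshold.

    periods: one list of raw samples per period, oldest first.
    """
    if len(periods) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = periods[-evaluation_periods:]
    values = [STATISTICS[statistic](samples) for samples in recent]
    breached = all(COMPARISONS[comparison](v, threshold) for v in values)
    return "ALARM" if breached else "OK"


# Hypothetical CPU samples (percent) for three consecutive periods.
cpu_samples = [[40, 50], [75, 85], [90, 80]]
state = alarm_state(cpu_samples, "Average", "GreaterThanThreshold", 70, 2)
```

Here the averages of the last two periods (80 and 85) both exceed 70, so `state` is "ALARM". Requiring three breaching periods instead would return "OK", because the first period averages 45.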

4. Amazon EC2 Auto Scaling

Amazon EC2 Auto Scaling helps you maintain application availability by automatically adding or removing EC2 instances according to conditions you define.

Auto Scaling Group

An Auto Scaling group is a collection of EC2 instances that are treated as a logical grouping for automatic scaling and management. You configure three key values: the minimum size, the desired capacity, and the maximum size.

If you specify scaling policies, EC2 Auto Scaling can launch or terminate instances as demand increases or decreases, staying within the minimum and maximum bounds.
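
The effect of the minimum and maximum bounds can be shown with a few lines of arithmetic. The helper `clamp_desired` below is a hypothetical name used only for this sketch, and the numbers are made up. It models the bounds check alone, not the full behavior of the service.

```python
def clamp_desired(current, change, minimum, maximum):
    """Apply a scaling adjustment, keeping the result within the group's bounds."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return max(minimum, min(maximum, current + change))


# A group with minimum 1 and maximum 5, currently running 3 instances.
scaled_out = clamp_desired(current=3, change=4, minimum=1, maximum=5)   # capped at 5
scaled_in = clamp_desired(current=3, change=-5, minimum=1, maximum=5)   # floored at 1
```

A request to add 4 instances to a group of 3 stops at the maximum of 5, and a request to remove 5 stops at the minimum of 1.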

Launch Configuration

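A launch configuration is an instance configuration template that defines what you are scaling. It includes settings such as the Amazon Machine Image (AMI) ID, instance type, key pair, security groups, and block device mapping.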

Scaling Options

Option | How It Works | Best For
Maintain current levels | Periodically health-checks running instances. Replaces unhealthy instances automatically. | Always. This is the baseline behavior.
Manual scaling | You manually change the minimum, maximum, or desired capacity. | One-off changes or testing.
Scheduled scaling | Scaling actions run automatically based on date and time. | Predictable workloads, such as weekly traffic patterns.
Dynamic scaling | Scaling policies respond to changing demand in real time, triggered by CloudWatch alarms. | Unpredictable or variable workloads.
Predictive scaling | Uses machine learning on your EC2 usage data to predict traffic and scale proactively. | Workloads with daily or weekly patterns. Requires at least one day of historical data.

Scale Out vs. Scale In: Launching instances is called scaling out. Terminating instances is called scaling in.
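
As an illustration of scheduled scaling, the sketch below looks up the desired capacity that a weekly schedule would have set at a given moment. The schedule entries, the function name `desired_capacity_at`, and the capacities are hypothetical. Real scheduled actions are defined in the service itself, so treat this only as a model of the idea that capacity changes at fixed dates and times.

```python
from datetime import datetime

# Hypothetical weekly schedule: (weekday, hour, desired capacity).
# Weekdays follow datetime.weekday(): Monday is 0, Sunday is 6.
SCHEDULE = [
    (4, 18, 6),   # Friday 18:00: scale out ahead of weekend traffic
    (6, 23, 2),   # Sunday 23:00: scale back in
]


def desired_capacity_at(moment, schedule, default):
    """Return the capacity set by the latest scheduled action so far this week."""
    capacity = default
    for weekday, hour, desired in sorted(schedule):
        if (weekday, hour) <= (moment.weekday(), moment.hour):
            capacity = desired
    return capacity


# 2024-01-03 is a Wednesday and 2024-01-06 is a Saturday.
midweek = desired_capacity_at(datetime(2024, 1, 3, 12), SCHEDULE, default=2)
weekend = desired_capacity_at(datetime(2024, 1, 6, 12), SCHEDULE, default=2)
```

With this schedule, `midweek` stays at the default of 2 and `weekend` is 6, because the Friday evening action has already run.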

How Dynamic Scaling Works

A typical dynamic scaling configuration uses CloudWatch, EC2 Auto Scaling, and Elastic Load Balancing together:

  1. CloudWatch monitors a metric such as average CPU utilization across your EC2 fleet.
  2. When the metric breaches a threshold for a specified duration, a CloudWatch alarm triggers.
  3. The alarm runs an EC2 Auto Scaling policy that launches or terminates instances.
  4. EC2 Auto Scaling registers new instances with the load balancer.
  5. The load balancer performs health checks and begins distributing traffic to the new instances.
  6. The load balancer feeds performance metrics back to CloudWatch, completing the loop.
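
The loop above can be sketched as a small simulation. Everything here is a simplifying assumption: the thresholds of 70 and 30 percent, the one-instance adjustment, the load figures, and the function names are made up for illustration. The alarm also fires as soon as the threshold is crossed rather than after a specified duration, and load is assumed to be spread evenly across instances, which stands in for the load balancer in steps 4 and 5.

```python
def scaling_step(instance_count, total_load, minimum, maximum,
                 scale_out_above=70.0, scale_in_below=30.0):
    """One pass of the loop: measure average CPU, compare to thresholds, adjust."""
    # Step 1: the monitored metric, average CPU across the fleet.
    average_cpu = total_load / instance_count
    # Steps 2 and 3: a breached threshold triggers the matching scaling policy.
    if average_cpu > scale_out_above:
        change = 1       # scale out
    elif average_cpu < scale_in_below:
        change = -1      # scale in
    else:
        change = 0
    # The group never leaves its minimum and maximum bounds.
    return max(minimum, min(maximum, instance_count + change))


def simulate(load_per_interval, start, minimum, maximum):
    """Return the fleet size after each interval of a hypothetical load trace."""
    count, history = start, []
    for load in load_per_interval:
        count = scaling_step(count, load, minimum, maximum)
        history.append(count)
    return history


# Total load is expressed as a percentage of one instance's CPU.
history = simulate([100, 250, 250, 250, 60, 60], start=2, minimum=2, maximum=4)
```

For this trace the fleet grows from 2 to 4 instances while load is high, holds at 4 once average CPU falls back between the thresholds, and shrinks to 2 again as load drops, so `history` is `[2, 3, 4, 4, 3, 2]`.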

5. AWS Auto Scaling

AWS Auto Scaling is a separate service from Amazon EC2 Auto Scaling. It monitors your applications and automatically adjusts capacity to maintain steady, predictable performance at the lowest possible cost.

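AWS Auto Scaling can build scaling plans for resources including Amazon EC2 Auto Scaling groups, Amazon ECS services, Amazon DynamoDB tables, and Amazon Aurora Replicas.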

EC2 Auto Scaling vs. AWS Auto Scaling: EC2 Auto Scaling scales EC2 instances. AWS Auto Scaling is a broader service that can scale multiple resource types, including ECS, DynamoDB, and Aurora.

6. Decision Guide

Need | Service | Why
Distribute HTTP/HTTPS traffic with content-based routing | Application Load Balancer | Layer 7, routes based on request content
Handle millions of requests per second with ultra-low latency | Network Load Balancer | Layer 4, optimized for sudden traffic spikes
Deploy virtual appliances like firewalls or IDS/IPS | Gateway Load Balancer | Layer 3, transparent gateway for virtual appliances
Monitor AWS resources and trigger actions on metrics | Amazon CloudWatch | Metrics, alarms, and events in one service
Automatically add or remove EC2 instances based on demand | Amazon EC2 Auto Scaling | Maintains availability and optimizes cost
Scale multiple AWS resource types from a single interface | AWS Auto Scaling | EC2, ECS, DynamoDB, and Aurora in one plan
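
The load balancer rows of this guide can be expressed as a small lookup. The function `choose_load_balancer` is a hypothetical helper, and it deliberately ignores factors a real decision would weigh, such as latency targets or static IP requirements. It only encodes the protocol and virtual appliance distinctions stated in this lesson.

```python
def choose_load_balancer(protocol, virtual_appliance=False):
    """Suggest a load balancer type from the traffic protocol, per the guide."""
    protocol = protocol.upper()
    if virtual_appliance:
        # Firewalls, IDS/IPS, and similar appliances sit behind a GWLB.
        return "Gateway Load Balancer"
    if protocol in {"HTTP", "HTTPS"}:
        return "Application Load Balancer"
    if protocol in {"TCP", "UDP", "TLS"}:
        return "Network Load Balancer"
    raise ValueError(f"no recommendation for protocol {protocol!r}")


web_choice = choose_load_balancer("https")
stream_choice = choose_load_balancer("UDP")
firewall_choice = choose_load_balancer("tcp", virtual_appliance=True)
```

Checking the virtual appliance case first reflects that the Gateway Load Balancer handles all IP packets regardless of protocol.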

Quiz: Auto Scaling and Elastic Load Balancing

1. A company runs a web application on EC2 instances and expects traffic to spike every Monday morning. They want to add instances automatically at 6:00 AM every Monday. Which scaling option should they use?
2. Which load balancer type operates at OSI Layer 7 and is best for routing HTTP traffic to microservices based on the content of the request?
3. An Auto Scaling group has a minimum size of 2, a desired capacity of 4, and a maximum size of 8. A scaling policy triggers and needs to add 6 instances. How many instances will the group have?
4. Which AWS service publishes metrics for your load balancers and targets, and can trigger alarms that initiate EC2 Auto Scaling actions?
5. A company needs to deploy third-party firewall virtual appliances across multiple Availability Zones. Which load balancer type should they use?
6. What is the term for terminating EC2 instances in an Auto Scaling group when demand decreases?
7. A company wants to scale their Amazon DynamoDB tables and Amazon Aurora Replicas automatically from a single scaling plan. Which service should they use?
8. Which load balancer type is optimized to handle sudden and volatile traffic patterns while maintaining ultra-low latencies?
Primary Source: AWS Academy Module 10: Automatic Scaling and Monitoring.