Lesson 0011: NoSQL Database and Data Warehouse

Amazon DynamoDB and Amazon Redshift

1. Relational vs. Non-Relational Databases

Feature	Relational (SQL)	Non-Relational (NoSQL)
Data storage	Rows and columns	Key-value, document, graph
Schema	Fixed	Dynamic / flexible
Querying	Uses SQL	Uses key lookups, document queries, or other model-specific APIs
Scalability	Vertical (larger instances)	Horizontal (more nodes)

2. Amazon DynamoDB

Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit-millisecond latency at any scale. It is fully managed, and AWS handles all the underlying data infrastructure.

Core idea: DynamoDB is a serverless NoSQL database that runs exclusively on SSDs. It supports document and key-value store models with virtually unlimited storage and throughput.

Key Characteristics

No practical table size limit: Some customers have production tables with billions of items.
Flexible schema: Items in the same table can have different attributes. No schema migrations needed.
Scalable throughput: Provision read/write capacity manually or enable automatic scaling. With automatic scaling enabled, DynamoDB monitors load and adjusts capacity for you.
Global Tables: Automatically replicate tables across your choice of AWS Regions.
Security and data lifecycle: Supports encryption at rest and item Time-to-Live (TTL), which automatically deletes items after their expiry time.

Core Components

Component	Description
Table	A collection of data.
Item	A group of attributes that is uniquely identifiable among all other items (like a row in a relational database).
Attributes	Fundamental data elements (like columns in a relational database).
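
To make the table, item, and attribute vocabulary concrete, here is a minimal sketch that models a table as a plain Python list of dictionaries. It is a toy in-memory model, not the DynamoDB API, and the table name, attribute names, and values are all made up for illustration.

```python
# Toy model only: a "table" is a list, an "item" is a dict,
# and each key/value pair in the dict is an "attribute".
# All names and values below are hypothetical.
books_table = [
    {"Author": "Jane Doe", "Title": "Cloud Basics", "Year": 2020},
    {"Author": "John Roe", "Title": "NoSQL Notes", "Formats": ["ebook", "print"]},
]


def attribute_names(table):
    """Return the set of attribute names used by each item, in order."""
    return [set(item) for item in table]


# Items in the same table do not need the same attributes.
for names in attribute_names(books_table):
    print(sorted(names))
```

Both items share the Author and Title attributes, but only one has Year and only one has Formats, which is the flexible-schema behavior described earlier.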

Primary Keys

Partition key (simple primary key): A single attribute that uniquely identifies an item.
Partition key + Sort key (composite key): Two attributes combined to identify items. Useful when frequently querying by a category plus a detail (e.g., author + title).
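
As a rough illustration of the two key types, the sketch below builds the keyword arguments that the boto3 DynamoDB client's create_table call accepts. The helper name build_table_definition, the table names, and the attribute names are assumptions for this example, and every key attribute is declared as a string for simplicity. No AWS call is made here.

```python
def build_table_definition(table_name, partition_key, sort_key=None):
    """Build create_table keyword arguments for a simple or composite key.

    In the DynamoDB API the partition key has KeyType "HASH" and the
    sort key has KeyType "RANGE". Key attributes are typed as strings ("S").
    """
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    attribute_definitions = [
        {"AttributeName": partition_key, "AttributeType": "S"}
    ]
    if sort_key is not None:
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
        attribute_definitions.append(
            {"AttributeName": sort_key, "AttributeType": "S"}
        )
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attribute_definitions,
        "BillingMode": "PAY_PER_REQUEST",  # on-demand capacity
    }


# Simple primary key: partition key only (hypothetical names).
simple = build_table_definition("Users", "UserId")

# Composite key: author (partition key) + title (sort key).
composite = build_table_definition("Books", "Author", "Title")
```

With boto3 installed and AWS credentials configured, a dictionary like composite could be passed as boto3.client("dynamodb").create_table(**composite). Treat that as a starting point to check against the current boto3 documentation rather than a finished setup.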

Query vs. Scan

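Query: Locates items by primary key. You supply a partition key value and, optionally, a condition on the sort key, so DynamoDB reads only the matching partition.
Scan: Reads every item in the table and then filters for matches, typically on non-key attributes. Cost and latency grow with table size, so prefer Query on large tables.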
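
The difference in work done can be sketched with a toy in-memory table that groups items by partition key and counts how many items each operation reads. ToyTable and its sample data are hypothetical, and this is not how DynamoDB is implemented. It only illustrates why a key-based lookup touches fewer items than a full pass over the table.

```python
from collections import defaultdict


class ToyTable:
    """Toy table: items are grouped by their partition key ("Author")."""

    def __init__(self):
        self.partitions = defaultdict(list)
        self.items_read = 0  # items touched by the most recent operation

    def put(self, item):
        self.partitions[item["Author"]].append(item)

    def query(self, author):
        """Read only the partition that matches the given key value."""
        self.items_read = 0
        result = []
        for item in self.partitions.get(author, []):
            self.items_read += 1
            result.append(item)
        return result

    def scan(self, predicate):
        """Read every item in every partition, then keep the matches."""
        self.items_read = 0
        matches = []
        for items in self.partitions.values():
            for item in items:
                self.items_read += 1
                if predicate(item):
                    matches.append(item)
        return matches


table = ToyTable()
for author, title, year in [
    ("Jane Doe", "Cloud Basics", 2020),
    ("Jane Doe", "Data Lakes", 2018),
    ("John Roe", "NoSQL Notes", 2021),
    ("Ann Lee", "Warehousing", 2019),
    ("Ann Lee", "Streams", 2022),
]:
    table.put({"Author": author, "Title": title, "Year": year})

by_author = table.query("Jane Doe")
query_reads = table.items_read   # 2: only Jane Doe's partition

recent = table.scan(lambda item: item["Year"] > 2019)
scan_reads = table.items_read    # 5: every item in the table
```

In this sketch the query reads 2 items while the scan reads all 5, even though the scan returns only 3 matches.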

Common Use Cases

Mobile and web applications, gaming, ad tech, and IoT applications, especially when you have a large number of clients generating data and making many requests per second.

3. Amazon Redshift

Amazon Redshift is a fast, fully managed petabyte-scale data warehouse. It enables you to run complex analytic queries against structured data using standard SQL and your existing business intelligence (BI) tools.

Core idea: Redshift is for analytics, not transaction processing. It is an OLAP (Online Analytical Processing) system that runs analytic queries over massive datasets, whereas an OLTP (Online Transaction Processing) system handles many small reads and writes of individual records.
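
The OLTP versus OLAP contrast can be sketched with two SQL queries over the same small table. The example uses Python's built-in sqlite3 module purely as a stand-in so it runs anywhere. It is not Redshift, and the orders table and its rows are made up.

```python
import sqlite3

# In-memory database with a small, hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "east", 20.0), (2, "west", 35.0), (3, "east", 15.0), (4, "west", 30.0)],
)

# OLTP-style: fetch one record by its key.
one_order = conn.execute(
    "SELECT region, amount FROM orders WHERE order_id = ?", (2,)
).fetchone()

# OLAP-style: aggregate across all records.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(one_order)  # ('west', 35.0)
print(totals)     # [('east', 35.0), ('west', 65.0)]
```

The first query touches a single row, which is the typical shape of transactional work. The second reads every row to produce a summary, which is the kind of query a data warehouse is designed for.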

Architecture

Leader node: Manages communications with clients, parses queries, develops execution plans, and compiles code for compute nodes.
Compute nodes: Run compiled code and send intermediate results back to the leader node for final aggregation.
Redshift Spectrum: Runs queries against exabytes of data directly in Amazon S3 without loading it into Redshift.
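
The division of labor between leader and compute nodes can be sketched as a partial-aggregation pattern: each compute node summarizes its own slice of the data, and the leader combines the intermediate results. The function names and data below are hypothetical, and the "nodes" run one after another in a single process. This shows the data flow only, not Redshift's actual execution engine.

```python
def compute_node_partial(rows):
    """Work done on one compute node: summarize only its own slice."""
    return sum(row["amount"] for row in rows), len(rows)


def leader_average(slices):
    """Work done on the leader: combine partial results into a final answer."""
    partials = [compute_node_partial(rows) for rows in slices]
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count if count else None


# Data distributed across two hypothetical compute nodes.
slices = [
    [{"amount": 10.0}, {"amount": 20.0}],
    [{"amount": 30.0}],
]
average_amount = leader_average(slices)  # 20.0
```

Each node returns a (sum, count) pair rather than its own average, because averaging per-node averages would give the wrong answer when the slices differ in size.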

Key Features

Columnar storage: Data is stored by column instead of row, which dramatically speeds up analytic queries.
Massively parallel processing: Distributes data and queries across multiple nodes for high performance. Most results return in seconds.
Automatic monitoring and backup: Continuously monitors the cluster and backs up data for easy restore.
Built-in encryption: Encryption at rest and in transit.
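Scalable: Add more nodes as your data grows, with little or no downtime. The source material quotes pricing starting at 25 cents per hour; check current AWS pricing before relying on this figure.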
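
To see why columnar storage suits analytic queries, the sketch below stores the same hypothetical data in a row layout and a column layout. Summing one column in the row layout walks through every full record, while the column layout reads a single list. This is a simplified illustration with made-up data, not Redshift's storage format.

```python
# Row-oriented layout: each record holds all of its fields together.
rows = [
    {"order_id": 1, "region": "east", "amount": 20.0},
    {"order_id": 2, "region": "west", "amount": 35.0},
    {"order_id": 3, "region": "east", "amount": 15.0},
]

# Column-oriented layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_from_rows(rows):
    """Visit every record, even though only one field is needed."""
    return sum(row["amount"] for row in rows)


def total_from_columns(columns):
    """Read only the single column the query asks for."""
    return sum(columns["amount"])


row_total = total_from_rows(rows)
column_total = total_from_columns(columns)
```

Both functions return the same total. The difference is how much unrelated data each one has to pass over, which matters when tables have many columns and billions of rows.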

Use Cases

Enterprise data warehouse migration with agility and low upfront cost
Big data analytics at a low price point
SaaS applications providing analytic capabilities

DynamoDB vs. RDS vs. Redshift

Transactional workload with complex queries and joins? → Amazon RDS or Aurora (relational).

Simple key-value lookups at massive scale with single-digit millisecond latency? → Amazon DynamoDB (NoSQL).

Analytic queries on petabytes of structured data using BI tools? → Amazon Redshift (data warehouse).
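
The three questions above can be written as a small decision helper. This is only a study mnemonic under simplified assumptions: the function name and its two yes/no parameters are hypothetical, and real service selection depends on many more factors.

```python
def suggest_service(analytics_on_large_datasets, needs_joins_or_complex_queries):
    """Map the decision guide's three questions to a suggested AWS service."""
    if analytics_on_large_datasets:
        # Analytic queries on structured data using SQL and BI tools.
        return "Amazon Redshift"
    if needs_joins_or_complex_queries:
        # Transactional workload with complex queries and joins.
        return "Amazon RDS or Aurora"
    # Simple key-value lookups at massive scale.
    return "Amazon DynamoDB"


print(suggest_service(False, False))  # Amazon DynamoDB
print(suggest_service(False, True))   # Amazon RDS or Aurora
print(suggest_service(True, False))   # Amazon Redshift
```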

4. Quick Quiz

Primary Source: AWS Academy Module 8: Databases (module-8.pdf).

Ask your teacher: If you are unsure when to use DynamoDB versus RDS, or Redshift versus RDS, remember: DynamoDB = NoSQL key-value at scale, RDS = transactional SQL, Redshift = analytics/data warehouse with SQL and BI tools.