IoT platforms present unique engineering challenges: thousands of devices sending continuous telemetry, real-time alerting requirements, and the need for both hot and cold analytics paths. AWS provides a powerful set of serverless services for building IoT platforms that scale automatically and cost-effectively.
IoT Platform Architecture Overview
A production IoT platform typically consists of four layers:
- Ingestion: Securely receiving data from devices
- Processing: Real-time transformation and enrichment
- Storage: Hot, warm, and cold data tiers
- Analytics: Dashboards, alerts, and ML models
Device Ingestion with AWS IoT Core
AWS IoT Core handles secure device connectivity at scale. Key features we leverage:
Device Authentication
Each device gets a unique X.509 certificate for mutual TLS authentication. This provides:
- Device identity: Every message is cryptographically tied to a specific device
- Revocation: Compromised devices can be immediately blocked
- No shared secrets: Unlike API keys, the private key never needs to leave the device and can be kept in a secure element
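Provisioning and revocation look roughly like this with the AWS CLI (the policy and thing names are illustrative, and the policy is assumed to exist already):

```shell
# Create an active certificate and key pair; capture the certificate ARN
CERT_ARN=$(aws iot create-keys-and-certificate --set-as-active \
  --certificate-pem-outfile sensor-001.cert.pem \
  --public-key-outfile sensor-001.public.key \
  --private-key-outfile sensor-001.private.key \
  --query certificateArn --output text)

# Bind the certificate to a policy and a thing
aws iot attach-policy --policy-name SensorTelemetryPolicy --target "$CERT_ARN"
aws iot attach-thing-principal --thing-name sensor-001 --principal "$CERT_ARN"

# Revoke a compromised device (the certificate ID is the ARN's last path segment)
aws iot update-certificate --certificate-id "${CERT_ARN##*/}" --new-status REVOKED
```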
MQTT Topics and Message Structure
Design your topic hierarchy for flexibility and efficient routing:
```
# Topic structure
sensors/{device_id}/telemetry
sensors/{device_id}/status
sensors/{device_id}/alerts

# Example message
{
  "deviceId": "sensor-001",
  "timestamp": "2026-01-31T10:30:00Z",
  "readings": {
    "flowRate": 125.5,
    "pressure": 45.2,
    "temperature": 18.3
  },
  "batteryLevel": 85,
  "signalStrength": -65
}
```
Device Shadows for Offline Capability
Device shadows maintain the last known state of each device, enabling:
- Offline devices: Apps can query the shadow even when devices are disconnected
- Desired state: Set configuration that devices receive when they reconnect
- State diff: Track changes between desired and reported state
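A shadow document makes the desired/reported split concrete. A sketch (field names follow the IoT shadow document format; the interval setting and values are illustrative):

```json
{
  "state": {
    "desired":  { "reportingIntervalSeconds": 30 },
    "reported": { "reportingIntervalSeconds": 60 },
    "delta":    { "reportingIntervalSeconds": 30 }
  },
  "version": 12
}
```

The `delta` section is computed by the service whenever desired and reported differ, so the device only has to subscribe to delta updates to learn what to change.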
Real-Time Processing with Kinesis
IoT Rules Engine routes messages to Kinesis Data Streams for real-time processing:
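The routing rule itself can be declared next to the stream. A CloudFormation sketch (the IAM role granting `kinesis:PutRecord` is assumed to be defined elsewhere):

```yaml
TelemetryToKinesisRule:
  Type: AWS::IoT::TopicRule
  Properties:
    TopicRulePayload:
      AwsIotSqlVersion: '2016-03-23'
      Sql: "SELECT * FROM 'sensors/+/telemetry'"
      Actions:
        - Kinesis:
            StreamName: iot-telemetry
            PartitionKey: "${topic(2)}"  # device_id segment keeps per-device ordering
            RoleArn: !GetAtt IotKinesisRole.Arn
```

Partitioning by the device ID segment of the topic guarantees that each device's messages stay in order within a shard.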
Stream Configuration
```yaml
Resources:
  TelemetryStream:
    Type: AWS::Kinesis::Stream
    Properties:
      Name: iot-telemetry
      ShardCount: 4 # Start with 4, scale based on throughput
      StreamModeDetails:
        StreamMode: PROVISIONED # Or ON_DEMAND for unpredictable loads
```
Lambda Consumer Pattern
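Wiring the stream to the consumer is an event source mapping. A sketch (the function name and batch settings are assumptions, starting points rather than prescriptions):

```yaml
TelemetryConsumerMapping:
  Type: AWS::Lambda::EventSourceMapping
  Properties:
    EventSourceArn: !GetAtt TelemetryStream.Arn
    FunctionName: !Ref TelemetryProcessor
    StartingPosition: LATEST
    BatchSize: 100                      # larger batches cut invocation cost
    MaximumBatchingWindowInSeconds: 5   # at the price of a little latency
    BisectBatchOnFunctionError: true    # isolate poison-pill records
```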
Lambda functions process Kinesis records in batches:
```javascript
// isAnomaly() and sendAlerts() are application-specific helpers
export const handler = async (event) => {
  const alerts = [];

  for (const record of event.Records) {
    // Kinesis delivers record data base64-encoded
    const payload = JSON.parse(
      Buffer.from(record.kinesis.data, 'base64').toString()
    );

    // Check thresholds
    if (payload.readings.pressure > 60) {
      alerts.push({
        deviceId: payload.deviceId,
        type: 'HIGH_PRESSURE',
        value: payload.readings.pressure,
        threshold: 60,
        timestamp: payload.timestamp
      });
    }

    // Check for anomalies
    if (await isAnomaly(payload)) {
      alerts.push({
        deviceId: payload.deviceId,
        type: 'ANOMALY_DETECTED',
        readings: payload.readings,
        timestamp: payload.timestamp
      });
    }
  }

  if (alerts.length > 0) {
    await sendAlerts(alerts);
  }

  return { processed: event.Records.length, alerts: alerts.length };
};
```
Alerting System Design
A robust alerting system needs multiple components:
Alert Types and Configuration
Store alert configurations in DynamoDB for runtime flexibility:
```json
{
  "alertId": "high-pressure-zone-a",
  "type": "THRESHOLD",
  "metric": "pressure",
  "operator": "GREATER_THAN",
  "threshold": 60,
  "duration": 300, // Must exceed for 5 minutes
  "severity": "HIGH",
  "zones": ["zone-a", "zone-b"],
  "notifications": {
    "sms": ["+1234567890"],
    "email": ["[email protected]"],
    "webhook": "https://..."
  }
}
```
Alert Deduplication
Prevent alert storms with deduplication:
```javascript
// getLastAlert() / recordAlert() are persistence helpers, e.g. reads and
// writes against a small DynamoDB table keyed by deviceId:type
const shouldSendAlert = async (alert) => {
  const key = alert.deviceId + ':' + alert.type;
  const lastAlert = await getLastAlert(key);

  if (lastAlert) {
    const timeSince = Date.now() - lastAlert.timestamp;
    // HIGH severity: 5-minute cooldown; everything else: 15 minutes
    const cooldown = alert.severity === 'HIGH' ? 300000 : 900000;
    if (timeSince < cooldown) {
      return false; // Still in cooldown
    }
  }

  await recordAlert(key, Date.now());
  return true;
};
```
Multi-Channel Notification
Use SNS for fan-out to multiple channels:
```javascript
const sendAlerts = async (alerts) => {
  for (const alert of alerts) {
    const config = await getAlertConfig(alert.type);

    // Fan out to all configured channels; falsy entries (unconfigured
    // channels) resolve immediately under Promise.all
    await Promise.all([
      config.notifications.sms?.length &&
        sendSMS(config.notifications.sms, formatSMS(alert)),
      config.notifications.email?.length &&
        sendEmail(config.notifications.email, formatEmail(alert)),
      config.notifications.webhook &&
        callWebhook(config.notifications.webhook, alert)
    ]);
  }
};
```
Time-Series Storage with Amazon Timestream
Amazon Timestream is purpose-built for IoT time-series data.
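Writes go through the `WriteRecords` API. A sketch of mapping one telemetry message into Timestream's record shape (the helper name and `zone` argument are ours, not part of the SDK):

```javascript
// Map one telemetry payload to Timestream records (one record per measure).
// The record shape follows the timestream-write WriteRecords API.
function toTimestreamRecords(payload, zone) {
  const dimensions = [
    { Name: 'device_id', Value: payload.deviceId },
    { Name: 'zone', Value: zone },
  ];
  const time = String(Date.parse(payload.timestamp)); // ms since epoch

  return Object.entries(payload.readings).map(([name, value]) => ({
    Dimensions: dimensions,
    MeasureName: name,
    MeasureValue: String(value),
    MeasureValueType: 'DOUBLE',
    Time: time,
    TimeUnit: 'MILLISECONDS',
  }));
}
```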
Table Design
Timestream tables are schemaless: you create the database and table through the API, and the columns are implied by the records you write. For this platform, the dimensions (indexed) are device_id, zone, and sensor_type; the measures are flow_rate, pressure, and temperature as DOUBLE, and battery_level as BIGINT.

```shell
# Create the database and table (Timestream has no SQL DDL)
aws timestream-write create-database --database-name iot_analytics
aws timestream-write create-table \
  --database-name iot_analytics \
  --table-name sensor_data \
  --retention-properties \
    "MemoryStoreRetentionPeriodInHours=24,MagneticStoreRetentionPeriodInDays=30"
```
Query Patterns
```sql
-- Average readings over time windows
SELECT
  device_id,
  bin(time, 1h) as hour,
  AVG(flow_rate) as avg_flow,
  MAX(pressure) as max_pressure
FROM iot_analytics.sensor_data
WHERE time > ago(24h)
  AND zone = 'zone-a'
GROUP BY device_id, bin(time, 1h)
ORDER BY hour DESC;

-- Anomaly detection with statistics
SELECT
  device_id,
  time,
  pressure,
  AVG(pressure) OVER (
    PARTITION BY device_id
    ORDER BY time
    RANGE INTERVAL '1' HOUR PRECEDING
  ) as rolling_avg
FROM iot_analytics.sensor_data
WHERE time > ago(6h);
```
Data Tiering Strategy
IoT data has different access patterns over time:
- Hot (0-24 hours): Timestream memory tier for real-time dashboards
- Warm (1-30 days): Timestream magnetic tier for operational analytics
- Cold (30+ days): S3 with Parquet format for compliance and ML training
```yaml
# Timestream retention configuration
MemoryStoreRetentionPeriodInHours: 24
MagneticStoreRetentionPeriodInDays: 30

# S3 export for long-term storage:
# EventBridge Schedule -> Lambda -> Timestream Query -> S3 (Parquet)
```
Dashboard Architecture
Real-time dashboards need WebSocket connections for live updates:
API Gateway WebSocket
```javascript
// dynamodb and apiGateway are promise-returning AWS SDK clients
// initialized elsewhere

// Register a connection; the $connect route has no body, so zone
// subscriptions arrive as query-string parameters
export const connect = async (event) => {
  const connectionId = event.requestContext.connectionId;
  const zones = (event.queryStringParameters?.zones || '').split(',');

  await dynamodb.put({
    TableName: 'WebSocketConnections',
    Item: { connectionId, zones, connectedAt: Date.now() }
  });

  return { statusCode: 200 };
};

// Broadcast updates to subscribed connections
export const broadcast = async (zone, data) => {
  const connections = await getConnectionsByZone(zone);

  await Promise.all(
    connections.map(conn =>
      apiGateway.postToConnection({
        ConnectionId: conn.connectionId,
        Data: JSON.stringify(data)
      }).catch(() => deleteConnection(conn.connectionId)) // stale connection
    )
  );
};
```
Cost Optimization Tips
- Message batching: Devices should batch readings before sending (e.g., every 30 seconds)
- Kinesis on-demand: Use on-demand mode for unpredictable traffic patterns
- Lambda batch size: Process 100+ records per invocation to reduce costs
- Timestream retention: Move data to the magnetic store quickly (24h in the memory store is often enough)
- S3 intelligent tiering: For cold storage, let AWS optimize storage class automatically
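The first tip is easy to get right on the device. A minimal batching sketch, where `publish` stands in for the real MQTT publish call:

```javascript
// Device-side batching: buffer readings and publish them as one message
class ReadingBatcher {
  constructor(publish, maxBatch = 10) {
    this.publish = publish;
    this.maxBatch = maxBatch;
    this.buffer = [];
  }

  add(reading) {
    this.buffer.push(reading);
    if (this.buffer.length >= this.maxBatch) this.flush();
  }

  // Call flush() on a timer too (e.g. every 30 seconds) so quiet
  // periods still drain the buffer
  flush() {
    if (this.buffer.length === 0) return;
    this.publish(this.buffer); // one message instead of maxBatch messages
    this.buffer = [];
  }
}
```

On the device, pair `add()` with something like `setInterval(() => batcher.flush(), 30000)` so a slow trickle of readings still gets delivered.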
Key Takeaways
- Design for scale from day one: IoT device counts grow rapidly
- Use device shadows: They solve many offline/sync challenges
- Implement alert deduplication: Prevent alert fatigue
- Tier your data: Different access patterns need different storage
- Monitor ingestion lag: Real-time means nothing if data is delayed
These patterns form the foundation of IoT platforms processing millions of messages daily. The serverless approach means you pay only for what you use while maintaining the ability to handle traffic spikes automatically.