IoT platforms present unique engineering challenges: thousands of devices sending continuous telemetry, real-time alerting requirements, and the need for both hot and cold analytics paths. AWS provides a powerful set of serverless services for building IoT platforms that scale automatically and cost-effectively.
IoT Platform Architecture Overview
A production IoT platform typically consists of four layers:
- Ingestion: Securely receiving data from devices
- Processing: Real-time transformation and enrichment
- Storage: Hot, warm, and cold data tiers
- Analytics: Dashboards, alerts, and ML models
Device Ingestion with AWS IoT Core
AWS IoT Core handles secure device connectivity at scale. Key features we leverage:
Device Authentication
Each device gets a unique X.509 certificate for mutual TLS authentication. This provides:
- Device identity: Every message is cryptographically tied to a specific device
- Revocation: Compromised devices can be immediately blocked
- No shared secrets: Unlike API keys, the private key never needs to leave the device and can be kept in a secure element
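Provisioning and revocation look roughly like this with the AWS CLI (the policy and thing names are illustrative, and the policy is assumed to exist already):

```shell
# Create an active certificate and key pair; capture the certificate ARN
CERT_ARN=$(aws iot create-keys-and-certificate --set-as-active \
  --certificate-pem-outfile sensor-001.cert.pem \
  --public-key-outfile sensor-001.public.key \
  --private-key-outfile sensor-001.private.key \
  --query certificateArn --output text)

# Bind the certificate to a policy and a thing
aws iot attach-policy --policy-name SensorTelemetryPolicy --target "$CERT_ARN"
aws iot attach-thing-principal --thing-name sensor-001 --principal "$CERT_ARN"

# Revoke a compromised device (the certificate ID is the ARN's last path segment)
aws iot update-certificate --certificate-id "${CERT_ARN##*/}" --new-status REVOKED
```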
MQTT Topics and Message Structure
Design your topic hierarchy for flexibility and efficient routing:
```
# Topic structure
sensors/{device_id}/telemetry
sensors/{device_id}/status
sensors/{device_id}/alerts

# Example message
{
  "deviceId": "sensor-001",
  "timestamp": "2026-01-31T10:30:00Z",
  "readings": {
    "flowRate": 125.5,
    "pressure": 45.2,
    "temperature": 18.3
  },
  "batteryLevel": 85,
  "signalStrength": -65
}
```
Device Shadows for Offline Capability
Device shadows maintain the last known state of each device, enabling:
- Offline devices: Apps can query the shadow even when devices are disconnected
- Desired state: Set configuration that devices receive when they reconnect
- State diff: Track changes between desired and reported state
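A shadow document makes the desired/reported split concrete. A sketch (field names follow the IoT shadow document format; the interval setting and values are illustrative):

```json
{
  "state": {
    "desired":  { "reportingIntervalSeconds": 30 },
    "reported": { "reportingIntervalSeconds": 60 },
    "delta":    { "reportingIntervalSeconds": 30 }
  },
  "version": 12
}
```

The `delta` section is computed by the service whenever desired and reported differ, so the device only has to subscribe to delta updates to learn what to change.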
Real-Time Processing with Kinesis
IoT Rules Engine routes messages to Kinesis Data Streams for real-time processing:
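The routing rule itself can be declared next to the stream. A CloudFormation sketch (the IAM role granting `kinesis:PutRecord` is assumed to be defined elsewhere):

```yaml
TelemetryToKinesisRule:
  Type: AWS::IoT::TopicRule
  Properties:
    TopicRulePayload:
      AwsIotSqlVersion: '2016-03-23'
      Sql: "SELECT * FROM 'sensors/+/telemetry'"
      Actions:
        - Kinesis:
            StreamName: iot-telemetry
            PartitionKey: "${topic(2)}"  # device_id segment keeps per-device ordering
            RoleArn: !GetAtt IotKinesisRole.Arn
```

Partitioning by the device ID segment of the topic guarantees that each device's messages stay in order within a shard.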
Stream Configuration
```yaml
Resources:
  TelemetryStream:
    Type: AWS::Kinesis::Stream
    Properties:
      Name: iot-telemetry
      ShardCount: 4 # Start with 4, scale based on throughput
      StreamModeDetails:
        StreamMode: PROVISIONED # Or ON_DEMAND for unpredictable loads
```
Lambda Consumer Pattern
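Wiring the stream to the consumer is an event source mapping. A sketch (the function name and batch settings are assumptions, starting points rather than prescriptions):

```yaml
TelemetryConsumerMapping:
  Type: AWS::Lambda::EventSourceMapping
  Properties:
    EventSourceArn: !GetAtt TelemetryStream.Arn
    FunctionName: !Ref TelemetryProcessor
    StartingPosition: LATEST
    BatchSize: 100                      # larger batches cut invocation cost
    MaximumBatchingWindowInSeconds: 5   # at the price of a little latency
    BisectBatchOnFunctionError: true    # isolate poison-pill records
```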
Lambda functions process Kinesis records in batches:
```javascript
// isAnomaly() and sendAlerts() are application-specific helpers
export const handler = async (event) => {
  const alerts = [];

  for (const record of event.Records) {
    // Kinesis delivers record data base64-encoded
    const payload = JSON.parse(
      Buffer.from(record.kinesis.data, 'base64').toString()
    );

    // Check thresholds
    if (payload.readings.pressure > 60) {
      alerts.push({
        deviceId: payload.deviceId,
        type: 'HIGH_PRESSURE',
        value: payload.readings.pressure,
        threshold: 60,
        timestamp: payload.timestamp
      });
    }

    // Check for anomalies
    if (await isAnomaly(payload)) {
      alerts.push({
        deviceId: payload.deviceId,
        type: 'ANOMALY_DETECTED',
        readings: payload.readings,
        timestamp: payload.timestamp
      });
    }
  }

  if (alerts.length > 0) {
    await sendAlerts(alerts);
  }

  return { processed: event.Records.length, alerts: alerts.length };
};
```
Alerting System Design
A robust alerting system needs multiple components:
Alert Types and Configuration
Store alert configurations in DynamoDB for runtime flexibility:
```json
{
  "alertId": "high-pressure-zone-a",
  "type": "THRESHOLD",
  "metric": "pressure",
  "operator": "GREATER_THAN",
  "threshold": 60,
  "duration": 300, // Must exceed for 5 minutes
  "severity": "HIGH",
  "zones": ["zone-a", "zone-b"],
  "notifications": {
    "sms": ["+1234567890"],
    "email": ["[email protected]"],
    "webhook": "https://..."
  }
}
```
Alert Deduplication
Prevent alert storms with deduplication:
```javascript
// getLastAlert() / recordAlert() are persistence helpers, e.g. reads and
// writes against a small DynamoDB table keyed by deviceId:type
const shouldSendAlert = async (alert) => {
  const key = alert.deviceId + ':' + alert.type;
  const lastAlert = await getLastAlert(key);

  if (lastAlert) {
    const timeSince = Date.now() - lastAlert.timestamp;
    // HIGH severity: 5-minute cooldown; everything else: 15 minutes
    const cooldown = alert.severity === 'HIGH' ? 300000 : 900000;
    if (timeSince < cooldown) {
      return false; // Still in cooldown
    }
  }

  await recordAlert(key, Date.now());
  return true;
};
```
Multi-Channel Notification
Use SNS for fan-out to multiple channels:
```javascript
const sendAlerts = async (alerts) => {
  for (const alert of alerts) {
    const config = await getAlertConfig(alert.type);

    // Fan out to all configured channels; falsy entries (unconfigured
    // channels) resolve immediately under Promise.all
    await Promise.all([
      config.notifications.sms?.length &&
        sendSMS(config.notifications.sms, formatSMS(alert)),
      config.notifications.email?.length &&
        sendEmail(config.notifications.email, formatEmail(alert)),
      config.notifications.webhook &&
        callWebhook(config.notifications.webhook, alert)
    ]);
  }
};
```
Time-Series Storage with Amazon Timestream
Amazon Timestream is purpose-built for IoT time-series data.
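Writes go through the `WriteRecords` API. A sketch of mapping one telemetry message into Timestream's record shape (the helper name and `zone` argument are ours, not part of the SDK):

```javascript
// Map one telemetry payload to Timestream records (one record per measure).
// The record shape follows the timestream-write WriteRecords API.
function toTimestreamRecords(payload, zone) {
  const dimensions = [
    { Name: 'device_id', Value: payload.deviceId },
    { Name: 'zone', Value: zone },
  ];
  const time = String(Date.parse(payload.timestamp)); // ms since epoch

  return Object.entries(payload.readings).map(([name, value]) => ({
    Dimensions: dimensions,
    MeasureName: name,
    MeasureValue: String(value),
    MeasureValueType: 'DOUBLE',
    Time: time,
    TimeUnit: 'MILLISECONDS',
  }));
}
```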
Table Design
Timestream tables are schemaless: you create the database and table through the API, and the columns are implied by the records you write. For this platform, the dimensions (indexed) are device_id, zone, and sensor_type; the measures are flow_rate, pressure, and temperature as DOUBLE, and battery_level as BIGINT.

```shell
# Create the database and table (Timestream has no SQL DDL)
aws timestream-write create-database --database-name iot_analytics
aws timestream-write create-table \
  --database-name iot_analytics \
  --table-name sensor_data \
  --retention-properties \
    "MemoryStoreRetentionPeriodInHours=24,MagneticStoreRetentionPeriodInDays=30"
```
Query Patterns
```sql
-- Average readings over time windows
SELECT
  device_id,
  bin(time, 1h) as hour,
  AVG(flow_rate) as avg_flow,
  MAX(pressure) as max_pressure
FROM iot_analytics.sensor_data
WHERE time > ago(24h)
  AND zone = 'zone-a'
GROUP BY device_id, bin(time, 1h)
ORDER BY hour DESC;

-- Anomaly detection with statistics
SELECT
  device_id,
  time,
  pressure,
  AVG(pressure) OVER (
    PARTITION BY device_id
    ORDER BY time
    RANGE INTERVAL '1' HOUR PRECEDING
  ) as rolling_avg
FROM iot_analytics.sensor_data
WHERE time > ago(6h);
```
Data Tiering Strategy
IoT data has different access patterns over time:
- Hot (0-24 hours): Timestream memory tier for real-time dashboards
- Warm (1-30 days): Timestream magnetic tier for operational analytics
- Cold (30+ days): S3 with Parquet format for compliance and ML training
```yaml
# Timestream retention configuration
MemoryStoreRetentionPeriodInHours: 24
MagneticStoreRetentionPeriodInDays: 30

# S3 export for long-term storage:
# EventBridge Schedule -> Lambda -> Timestream Query -> S3 (Parquet)
```
Dashboard Architecture
Real-time dashboards need WebSocket connections for live updates:
API Gateway WebSocket
```javascript
// dynamodb and apiGateway are promise-returning AWS SDK clients
// initialized elsewhere

// Register a connection; the $connect route has no body, so zone
// subscriptions arrive as query-string parameters
export const connect = async (event) => {
  const connectionId = event.requestContext.connectionId;
  const zones = (event.queryStringParameters?.zones || '').split(',');

  await dynamodb.put({
    TableName: 'WebSocketConnections',
    Item: { connectionId, zones, connectedAt: Date.now() }
  });

  return { statusCode: 200 };
};

// Broadcast updates to subscribed connections
export const broadcast = async (zone, data) => {
  const connections = await getConnectionsByZone(zone);

  await Promise.all(
    connections.map(conn =>
      apiGateway.postToConnection({
        ConnectionId: conn.connectionId,
        Data: JSON.stringify(data)
      }).catch(() => deleteConnection(conn.connectionId)) // stale connection
    )
  );
};
```
Cost Optimization Tips
- Message batching: Devices should batch readings before sending (e.g., every 30 seconds)
- Kinesis on-demand: Use on-demand mode for unpredictable traffic patterns
- Lambda batch size: Process 100+ records per invocation to reduce costs
- Timestream retention: Move data to the magnetic store quickly (24h in the memory store is often enough)
- S3 intelligent tiering: For cold storage, let AWS optimize storage class automatically
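The first tip is easy to get right on the device. A minimal batching sketch, where `publish` stands in for the real MQTT publish call:

```javascript
// Device-side batching: buffer readings and publish them as one message
class ReadingBatcher {
  constructor(publish, maxBatch = 10) {
    this.publish = publish;
    this.maxBatch = maxBatch;
    this.buffer = [];
  }

  add(reading) {
    this.buffer.push(reading);
    if (this.buffer.length >= this.maxBatch) this.flush();
  }

  // Call flush() on a timer too (e.g. every 30 seconds) so quiet
  // periods still drain the buffer
  flush() {
    if (this.buffer.length === 0) return;
    this.publish(this.buffer); // one message instead of maxBatch messages
    this.buffer = [];
  }
}
```

On the device, pair `add()` with something like `setInterval(() => batcher.flush(), 30000)` so a slow trickle of readings still gets delivered.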
Key Takeaways
- Design for scale from day one: IoT device counts grow rapidly
- Use device shadows: They solve many offline/sync challenges
- Implement alert deduplication: Prevent alert fatigue
- Tier your data: Different access patterns need different storage
- Monitor ingestion lag: Real-time means nothing if data is delayed
These patterns form the foundation of IoT platforms processing millions of messages daily. The serverless approach means you pay only for what you use while maintaining the ability to handle traffic spikes automatically.