Data Architecture & CAP Trade-offs
✅ Required: How do you model document state for distributed operations?
✅ Required: What are your CAP theorem trade-offs? Justify your consistency model choice
✅ Optional: How do you handle data partitioning and replication?
✅ Optional: What's your approach to version control and state synchronization?
1. Modeling Document State for Distributed Operations
- Each document is modeled as a CRDT sequence using the LSEQ algorithm: users perform edits locally, producing deterministic operations (inserts, deletes) that preserve ordering.
- All operations are propagated via WebSocket and Kafka, so all replicas eventually converge.
- LSEQ assigns unique, deterministic IDs, allowing operations to be applied multiple times without corrupting the document state (see the sketch after this list).
- Changes made while offline are stored locally and merged upon reconnection, ensuring eventual consistency.
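As a minimal illustration of the idempotent apply path, here is a sketch in TypeScript (the language used for all examples in this document). The `DocumentReplica` class, the operation shape, and `elementId` are illustrative assumptions; the LSEQ identifier-allocation algorithm itself is omitted.

```typescript
// Minimal sketch, not the full LSEQ algorithm: identifiers are taken as
// given, and only the idempotent application of operations is shown.
type Operation =
  | { kind: "insert"; elementId: string; value: string }
  | { kind: "delete"; elementId: string };

class DocumentReplica {
  // elementId -> character; a real LSEQ keeps these ordered by identifier.
  private elements = new Map<string, string>();
  // Deterministic operation IDs already applied.
  private applied = new Set<string>();

  apply(opId: string, op: Operation): void {
    if (this.applied.has(opId)) return; // re-delivery is a no-op
    this.applied.add(opId);
    if (op.kind === "insert") this.elements.set(op.elementId, op.value);
    else this.elements.delete(op.elementId);
  }
}
```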
1.1 Document Modeling
The document itself contains only metadata and the properties of the edited content. It does not need to store client-specific state such as the last processed offset or idempotency keys, because those responsibilities are handled by other components:
- CRDT/LSEQ: provides deterministic IDs for each operation, ensuring idempotency and correct ordering of edits across all replicas.
- Client state: each client maintains its own lastOffsetProcessed to track which deltas from the global log (Kafka topic) have been applied.
"Document": {
"id": "string",
"title": "string",
"createdAt": "timestamp",
"updatedAt": "timestamp",
"authorId": "string",
"content": LSEQ sequence
...any other relevant metadata
}
In this design, clients never transmit the entire document content; only deltas are sent. Each delta includes the document ID, the LSEQ element ID, the operation type (insert, delete, etc.), and the affected value. Clients apply these deltas locally to update their copy of the LSEQ sequence. A possible message shape is sketched below.
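An illustrative shape for such a delta message, matching the fields listed above; the field names are assumptions, not a fixed wire format.

```typescript
// Illustrative delta message; field names are assumptions.
interface DocumentDelta {
  documentId: string;
  operationId: string; // deterministic CRDT operation ID
  elementId: string;   // LSEQ element (position) identifier
  kind: "insert" | "delete";
  value?: string;      // present for inserts
}
```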
2. CAP Theorem Trade-offs
2.1 Consistency (C)
In CAP terms, the system favors availability and partition tolerance (AP) and accepts eventual consistency at the document level: CRDT operations guarantee that all replicas converge to the same state over time, even after disconnections or message delays. Immediate global consistency is not required, as local edits can safely diverge temporarily.
Kafka: Ensures ordered writes per partition and provides durable commit logs. While replicas may momentarily lag behind the leader, the log guarantees eventual consistency across brokers.
PostgreSQL: Acts as the source of truth for persisted snapshots and recent deltas. Provides strong consistency (ACID transactions) at the database level, ensuring correct ordering and atomicity when merging deltas.
S3: Used for archival and long-term storage of compacted deltas and snapshots. S3 now provides strong read-after-write consistency, and its durability (eleven nines) ensures no data loss, even across regions.
2.2 Availability (A)
The system maintains high availability, allowing users to continue editing documents locally even when disconnected from the network. Operations are queued and later synchronized when connectivity is restored.
WebSocket layer: Keeps active sessions for real-time collaboration. In case of disconnection, the client caches operations locally and resumes automatically (a reconnection sketch follows this list).
Kafka: Enables high throughput and distributed availability. If a broker fails, producers and consumers automatically reroute to replicas, maintaining event flow continuity.
PostgreSQL with read replicas: Supports horizontal scaling for read-heavy workloads. Even if the primary database is temporarily unavailable, replicas can continue to serve recent data for read operations.
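As an illustration of the cache-and-resume behaviour described above, here is a hedged sketch of a client-side connection wrapper with exponential-backoff reconnection. The class and its members are hypothetical; durable offline storage is covered separately in section 3.4.

```typescript
// Hypothetical connection wrapper: queues deltas while offline and
// reconnects with capped exponential backoff.
class CollabConnection {
  private ws?: WebSocket;
  private pending: string[] = [];
  private attempts = 0;

  constructor(private url: string) {
    this.connect();
  }

  send(delta: string): void {
    if (this.ws?.readyState === WebSocket.OPEN) this.ws.send(delta);
    else this.pending.push(delta); // cache locally while disconnected
  }

  private connect(): void {
    const ws = new WebSocket(this.url);
    ws.onopen = () => {
      this.attempts = 0;
      this.pending.forEach((m) => ws.send(m)); // flush queued deltas
      this.pending = [];
    };
    ws.onclose = () => {
      const delay = Math.min(1000 * 2 ** this.attempts++, 30_000);
      setTimeout(() => this.connect(), delay); // resume automatically
    };
    this.ws = ws;
  }
}
```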
2.3 Partition tolerance (P)
Network partitions are tolerated. Clients can continue editing locally, accumulate operations, and synchronize once the partition heals.
Kafka: Designed with partition tolerance as a core principle, it continues to operate even if some brokers or network links fail, ensuring durability of messages in isolated partitions.
PostgreSQL: Can be deployed in multi-AZ or multi-region setups with logical replication, mitigating partition risks between nodes.
S3: Naturally partition-tolerant and globally available. Data replication across multiple availability zones ensures resilience during regional outages.
3. Storage Strategy
DynamoDB handles high-throughput writes and replayability, PostgreSQL and Redis handle efficient reads and snapshot access, and S3 ensures long-term durability and recovery.
PostgreSQL is used solely for storing consolidated snapshots, enabling efficient queries and read operations without replaying the full event log. Redis can be leveraged similarly for fast access to recent snapshots.
Amazon S3 serves only for storing snapshots to guarantee durability and to support disaster recovery strategies, ensuring that historical document states can be restored if necessary.
3.1 DynamoDB
DynamoDB acts as the event-sourcing layer, providing high-throughput writes and automatic scaling. For a high volume of events from 100k+ clients this is the most suitable choice: each CRDT operation's deterministic ID serves as the table's partition key, while the Kafka event offset serves as the sort key, avoiding reliance on timestamps for ordering.
```json
{
  "pk": "documentId#<documentId>#operationId#<crdtOperationId>",
  "sk": "documentId#<documentId>#kafkaEventOffset#<kafkaEventOffset>",
  "documentId": "string",
  "title": "string",
  "author_id": "string",
  "delta": "object", // CRDT delta
  "version": "number",
  "metadata": "object",
  "created_at": "timestamp",
  "updated_at": "timestamp"
}
```
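To make the append path concrete, the sketch below writes one event item with a conditional put, so a replayed Kafka event cannot overwrite an item that is already stored. It assumes the AWS SDK for JavaScript v3; the table name document_events is hypothetical.

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Append one CRDT operation; the condition rejects duplicate pk values,
// so re-delivered events are ignored instead of overwriting state.
async function appendOperation(item: Record<string, unknown>): Promise<void> {
  try {
    await ddb.send(
      new PutCommand({
        TableName: "document_events", // hypothetical table name
        Item: item,
        ConditionExpression: "attribute_not_exists(pk)",
      })
    );
  } catch (err) {
    if ((err as Error).name === "ConditionalCheckFailedException") {
      return; // duplicate delivery: safe to ignore
    }
    throw err;
  }
}
```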
3.2 PostgreSQL
PostgreSQL acts as the source of truth for document metadata and latest consistent state. It stores the most recent snapshot of each document, as well as operational metadata required for consistency and access control.
PostgreSQL provides strong durability guarantees (ACID), ensuring no data loss once operations are committed.
```sql
CREATE TABLE documents (
  id UUID PRIMARY KEY,
  title TEXT,
  author_id UUID,
  content JSONB,
  version BIGINT,
  metadata JSONB,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);
```
It supports JSONB for flexible document structures while preserving queryability.
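A sketch of a snapshot upsert against this table using node-postgres; saveSnapshot is a hypothetical helper, and the version guard prevents an older snapshot from overwriting a newer one.

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the environment

// Hypothetical helper: upsert a snapshot, keeping only the newest version.
async function saveSnapshot(
  id: string,
  title: string,
  authorId: string,
  content: object,
  version: number
): Promise<void> {
  await pool.query(
    `INSERT INTO documents (id, title, author_id, content, version)
     VALUES ($1, $2, $3, $4, $5)
     ON CONFLICT (id) DO UPDATE
       SET content = EXCLUDED.content,
           version = EXCLUDED.version,
           updated_at = NOW()
       WHERE documents.version < EXCLUDED.version`,
    [id, title, authorId, JSON.stringify(content), version]
  );
}
```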
3.3 S3 Cold Storage / Archival Layer
Amazon S3 will be the long-term, cost-efficient storage layer for historical and infrequently accessed data. It is not required for day-to-day operation but provides critical benefits for durability, scalability, and disaster recovery.
This multi-tier storage strategy separates hot operational data (PostgreSQL) from cold historical data (S3), balancing performance, cost, and durability.
It also facilitates future scalability as S3 can act as the backbone for multi-region replication.
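A minimal archival sketch using the AWS SDK v3 S3 client; the bucket name and key layout are assumptions.

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Archive a compacted snapshot; bucket and key layout are hypothetical.
async function archiveSnapshot(
  documentId: string,
  version: number,
  snapshot: object
): Promise<void> {
  await s3.send(
    new PutObjectCommand({
      Bucket: "doc-snapshots-archive", // hypothetical bucket name
      Key: `snapshots/${documentId}/${version}.json`,
      Body: JSON.stringify(snapshot),
      ContentType: "application/json",
    })
  );
}
```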
3.4 Offline Persistence with IndexedDB for LSEQ Operations
When the client is offline, LSEQ operations must be stored locally so that no user actions are lost. IndexedDB provides a persistent, client-side storage mechanism for queueing these operations. Once the client comes back online, the stored operations are synchronized with the server, maintaining eventual consistency and preserving the correct order of edits. This approach ensures seamless offline support and data integrity for real-time collaborative applications; a minimal queue is sketched below.
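A minimal sketch of such a queue using the raw IndexedDB API; the database and store names are illustrative.

```typescript
// Open (or create) the local queue of pending LSEQ operations.
function openQueue(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open("lseq-offline", 1);
    req.onupgradeneeded = () => {
      // autoIncrement keys preserve the local order in which edits were made
      req.result.createObjectStore("ops", { autoIncrement: true });
    };
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Append one operation to the queue; it is drained on reconnection.
async function enqueueOp(op: object): Promise<void> {
  const db = await openQueue();
  return new Promise((resolve, reject) => {
    const tx = db.transaction("ops", "readwrite");
    tx.objectStore("ops").add(op);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}
```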
4. Data Partitioning and Replication
- Global replication: CRDT operations can be replicated across regions to reduce latency for global users, leveraging DynamoDB Global Tables, cross-region Kafka replication, and S3 multi-region replication depending on requirements.
- Partitioning: Kafka topics can be partitioned per document or group of documents to distribute load and parallelize processing (see the sketch after this list).
- Broker replication: Kafka replicates data across brokers, providing durability and failover.
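As referenced above, a sketch of per-document partitioning with kafkajs: keying each message by documentId routes all deltas for a document to the same partition, which preserves per-document ordering. The topic name and broker address are assumptions, and the producer is assumed to have been connected at startup.

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "collab-editor", brokers: ["kafka:9092"] });
const producer = kafka.producer(); // assumes `await producer.connect()` at startup

// The message key determines the partition, so all deltas for a given
// document land on the same partition and remain ordered.
async function publishDelta(documentId: string, delta: object): Promise<void> {
  await producer.send({
    topic: "document-deltas", // hypothetical topic name
    messages: [{ key: documentId, value: JSON.stringify(delta) }],
  });
}
```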
5. Version Control and State Synchronization
Snapshots of the full document state are taken and stored periodically (Redis as a fast cache, PostgreSQL as the source of truth for documents, and S3 for durable storage), but they are used only for reconnections or new clients. During normal operation, sending only deltas keeps the network efficient and lets the system scale to hundreds of thousands of simultaneous clients without overloading the server or the network.
5.1 Version Control and Delta Replay
- Each CRDT operation is recorded as a delta and stored in the event sourcing layer (DynamoDB / Kafka).
- Snapshots (PostgreSQL / Redis) are taken periodically to provide a fast access point for reconnections or new clients.
- Deltas allow replay of operations, enabling clients to reconstruct document state from any point in time.
- Conflict resolution is handled deterministically by the LSEQ algorithm, ensuring eventual consistency across all replicas.
This approach supports auditability, as each operation is tied to a deterministic ID and metadata (author, timestamp), making it possible to trace the document’s edit history.
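Putting these pieces together, a hedged sketch of state reconstruction on reconnect, reusing DocumentReplica and Operation from the sketch in section 1; fetchSnapshot and fetchDeltasSince are hypothetical helpers over the snapshot store (PostgreSQL/Redis) and the event log (DynamoDB).

```typescript
// Hypothetical accessors over the snapshot store and the event log.
declare function fetchSnapshot(
  documentId: string
): Promise<{ replica: DocumentReplica; offset: number }>;
declare function fetchDeltasSince(
  documentId: string,
  offset: number
): Promise<Array<{ opId: string; op: Operation }>>;

// Load the latest snapshot, then replay every delta recorded after it.
// Because apply() is idempotent, any overlap between the snapshot and
// the replayed deltas is harmless.
async function restoreDocument(documentId: string): Promise<DocumentReplica> {
  const { replica, offset } = await fetchSnapshot(documentId);
  for (const d of await fetchDeltasSince(documentId, offset)) {
    replica.apply(d.opId, d.op);
  }
  return replica;
}
```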