Non-functional requirements

1. Security and compliance

  • For private documents, only users with permission should be able to access the document.
  • Is expected to encrypt documents in transit and at rest.
  • Is expected to encrypt operations in transit.
  • Is expected to implement firewall rules.
  • Is expected to mitigate the most common attacks (DDoS, XSS, CSRF and SQL injection).
  • Is expected to provide authentication and authorization.
  • Should implement client-side throttling or debouncing when sending operations to the sync service, to avoid flooding it with requests.
  • All document access and modifications should be logged with user identity, timestamp, and operation type.
  • Audit logs should be retained according to retention policies.
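
The client-side throttling requirement above could be sketched as follows. This is a minimal illustration, not the system's actual client: the `Operation` shape, the `ThrottledSender` class and the interval value are all assumptions made for the example. Operations are queued locally and sent as one batch at most once per interval.

```typescript
// Hypothetical operation shape; the real sync protocol is not specified here.
interface Operation {
  docId: string;
  payload: string;
}

type SendFn = (batch: Operation[]) => void;

// Queues operations and forwards them in batches, at most once per interval.
class ThrottledSender {
  private pending: Operation[] = [];
  private lastSentAt = -Infinity;
  private readonly send: SendFn;
  private readonly minIntervalMs: number;

  constructor(send: SendFn, minIntervalMs: number) {
    this.send = send;
    this.minIntervalMs = minIntervalMs;
  }

  // Queue an operation and send immediately if the interval has elapsed.
  enqueue(op: Operation, nowMs: number): void {
    this.pending.push(op);
    this.flushIfDue(nowMs);
  }

  // Send everything queued so far as a single batch, if allowed by the interval.
  flushIfDue(nowMs: number): boolean {
    if (this.pending.length === 0) return false;
    if (nowMs - this.lastSentAt < this.minIntervalMs) return false;
    this.send(this.pending);
    this.pending = [];
    this.lastSentAt = nowMs;
    return true;
  }
}
```

In a real client, `flushIfDue(Date.now())` would also be called from a timer so that queued operations are not held indefinitely. The clock is passed in explicitly here only to keep the sketch deterministic.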

2. Availability

2.1 Balancing high availability and offline support

  • The system should let users keep editing documents while offline; changes are synchronized and conflicts are resolved when they reconnect.
  • Availability requirements focus on backend services for real-time synchronization and collaboration, rather than local offline editing.
  • Multi-region replication with automatic conflict resolution ensures consistency when offline changes are merged.

2.1.1 Recovery Time Objective (RTO)

  • Should be minimized to ensure quick system restoration after incidents.

2.1.2 Recovery Point Objective (RPO)

  • Should aim to prevent loss of collaborative operations, using snapshots and replication strategies.

2.2 Distributed architecture

  • Handle network partitions, latency, and failure scenarios.
  • Database replication.
  • Document snapshots and restoration.
  • Global load balancer.
  • Global DNS.

2.3 Avoid data loss (CRDT)

  • Is expected to support offline collaboration, so users can keep editing without internet connectivity.
  • Is expected to use the LSEQ sequence CRDT to keep operations sequential and ordered.
  • LSEQ is expected to keep operations sequential, ordered and idempotent while keeping position identifiers compact enough to support larger documents.
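
To illustrate how LSEQ-style position identifiers keep elements ordered, here is a simplified sketch of allocating an identifier between two neighbours. It is not a complete CRDT and not this system's implementation. The constants `INITIAL_BASE` and `BOUNDARY` are assumed values, the site identifier and counter that make identifiers globally unique are omitted, and the edge strategy alternates by depth, whereas LSEQ picks it at random per depth.

```typescript
// An identifier is a path of digits; depth d allows digits 0 .. baseAt(d) - 1.
type Id = number[];

const INITIAL_BASE = 16; // assumed value; the base doubles at each depth
const BOUNDARY = 10;     // assumed maximum distance from the chosen edge

function baseAt(depth: number): number {
  return INITIAL_BASE * 2 ** depth;
}

// Lexicographic order; a strict prefix sorts before its extensions.
function compareIds(a: Id, b: Id): number {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

// Returns a new identifier strictly between p and q (requires p < q).
// Assumes q does not end in digit 0, which holds for every identifier
// this function returns.
function allocateBetween(p: Id, q: Id, rand: () => number = Math.random): Id {
  if (compareIds(p, q) >= 0) throw new Error("expected p < q");
  const id: Id = [];
  let boundedByQ = true; // false once the new path has dropped below q
  for (let depth = 0; ; depth++) {
    const lo = depth < p.length ? p[depth] : 0;
    const hi = boundedByQ && depth < q.length ? q[depth] : baseAt(depth);
    const room = hi - lo - 1; // free digits strictly between lo and hi
    if (room >= 1) {
      const step = Math.min(BOUNDARY, room);
      const offset = Math.floor(rand() * step);
      // Even depths allocate near the left edge, odd depths near the right
      // edge. This alternation is a simplification of LSEQ's random choice.
      id.push(depth % 2 === 0 ? lo + 1 + offset : hi - 1 - offset);
      return id;
    }
    id.push(lo); // no room at this depth: follow p one level deeper
    if (hi > lo) boundedByQ = false;
  }
}
```

With sentinels `[0]` for the start of the document and `[INITIAL_BASE - 1]` for the end, every insert asks for an identifier between its two neighbours. Because the base doubles with depth, identifiers grow slowly even when many elements are inserted at the same spot.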

2.4 Fault tolerant

  • Should be able to handle network partitions, latency, and failure scenarios.
  • Is expected to use a multi-region strategy.
  • Is expected to use a fail-over strategy.
  • Is expected to use data replication strategy.

3. Scalability

  • Is expected to support from 3 to 100,000+ concurrent users.
  • Should enable horizontal scalability.
  • Should enable distributed sessions across instances.
  • Should provide partitioned storage for documents.
  • Vertical scaling may be used as a fallback.
  • Is expected to implement rate limiting to keep concurrency manageable as load grows.
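
One common way to implement the rate limiting mentioned above is a token bucket. The sketch below is only an illustration under assumptions: the requirements do not name an algorithm, and the `TokenBucket` class, its capacity and its refill rate are hypothetical.

```typescript
// Token bucket: allows short bursts up to `capacity`, then a steady rate.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefillMs = nowMs;
  }

  // Returns true and spends `cost` tokens if the request may proceed.
  tryConsume(nowMs: number, cost = 1): boolean {
    if (nowMs > this.lastRefillMs) {
      const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
      this.tokens = Math.min(
        this.capacity,
        this.tokens + elapsedSeconds * this.refillPerSecond,
      );
      this.lastRefillMs = nowMs;
    }
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}
```

A service would typically keep one bucket per user or per connection and reject or delay operations when `tryConsume` returns false.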

3.1 Memory usage

  • The synchronization service is expected to merge conflicts using Node.js streams, to reduce memory overhead on each instance.
  • Is expected to use Kafka persisted streams to keep track of checkpoints, avoiding data loss if stream processing fails or is interrupted.
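
The checkpoint idea above can be sketched without any Kafka client: an in-memory array stands in for a persisted stream, and `CheckpointStore` is a hypothetical stand-in for committed offsets. The point is only the ordering: a record's offset is committed after it has been handled, so a restart resumes at the first unprocessed record.

```typescript
// Stand-in for durable offset storage (an assumption for this sketch).
class CheckpointStore {
  private nextOffset = 0;

  load(): number {
    return this.nextOffset;
  }

  commit(nextOffset: number): void {
    this.nextOffset = nextOffset;
  }
}

// Processes records starting at the last committed offset.
// If `handle` throws, the failed record stays uncommitted and is retried
// on the next run (at-least-once delivery).
function consumeFromCheckpoint(
  log: string[],
  store: CheckpointStore,
  handle: (record: string, offset: number) => void,
): void {
  for (let offset = store.load(); offset < log.length; offset++) {
    handle(log[offset], offset);
    store.commit(offset + 1); // commit only after successful handling
  }
}
```

Because a record may be delivered again after a failure, handlers are assumed to be idempotent, which matches the idempotent-operation requirement in the CRDT section.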

3.2 Horizontal scaling

  • System should support horizontal scaling across multiple instances to handle peak loads.
  • Is expected to use autoscaling strategy.
  • Is expected to use Node.js cluster mode to use all available CPU cores on each instance.

4. Performance

  • Is expected to apply local operations in under 50 ms, with instantaneous rendering.
  • Should ensure online synchronization with a p99 response time under 200 ms for collaborative operations.
  • Is expected to use a Content Delivery Network (CDN) to serve users from the edge.
  • Is expected to provide edge caching.
  • Is expected to provide low latency for global users through a multi-region strategy.
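
As a reading aid for the p99 target above: the 99th percentile is the latency that 99% of requests stay at or below. The sketch uses the nearest-rank method, which is one of several percentile definitions; the function names are hypothetical, and the 200 ms limit restates the target from this section.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Checks sync latencies (in milliseconds) against the sub-200 ms p99 target.
function meetsSyncLatencyTarget(latenciesMs: number[], limitMs = 200): boolean {
  return percentile(latenciesMs, 99) < limitMs;
}
```

Note that with 100 samples a single slow request does not break the target, but two do, which is why p99 is usually evaluated over a large window of requests.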

5. Consistency

  • Should maintain data integrity across distributed components.
  • Is expected to provide eventual consistency across multi-region databases and services.

6. Traceability: Version control

  • Should provide checkpoints with snapshots to enable restorations.
  • Should provide version control with audit mode (metadata and author).
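
Restoring a version from checkpoints can be pictured as loading the latest snapshot at or before the target version and replaying the operations recorded after it. The sketch below assumes a much simpler model than a CRDT: plain-text inserts with a sequence number and author metadata. The `EditOp` and `Snapshot` shapes are hypothetical.

```typescript
// Hypothetical audit-friendly operation record.
interface EditOp {
  seq: number;      // monotonically increasing version number
  author: string;   // who made the change
  insertAt: number; // character index in the document
  text: string;     // inserted text
}

interface Snapshot {
  seq: number;      // version captured by this snapshot
  content: string;
}

function applyOp(content: string, op: EditOp): string {
  return content.slice(0, op.insertAt) + op.text + content.slice(op.insertAt);
}

// Rebuilds the document as of `targetSeq`. Assumes `ops` is sorted by seq.
function restoreVersion(
  snapshots: Snapshot[],
  ops: EditOp[],
  targetSeq: number,
): string {
  // Start from the latest snapshot at or before the target, else empty.
  let base: Snapshot = { seq: 0, content: "" };
  for (const s of snapshots) {
    if (s.seq <= targetSeq && s.seq >= base.seq) base = s;
  }
  let content = base.content;
  for (const op of ops) {
    if (op.seq > base.seq && op.seq <= targetSeq) content = applyOp(content, op);
  }
  return content;
}
```

Snapshots bound the replay cost, while the operation log keeps the author and ordering metadata needed for auditing.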

7. Observability

  • Should provide logs, distributed traces, metrics, alarms and dashboards to monitor the system and support incident response.