Scalable NoSQL Database Design for E-Commerce

Project Overview

As part of my postgraduate program in Data Architecture and Engineering, I developed a scalable NoSQL database solution for an e-commerce platform. This project applied advanced data modeling techniques to optimize read and write operations, ensuring high availability and efficient query performance using Amazon DynamoDB.

Project Context

The project focused on designing and implementing a data architecture for Amazonas, a growing e-commerce platform that sells electronics, apparel, household items, and books. The system required a NoSQL database to handle user registrations, shopping carts, orders, payments, and product reviews while maintaining scalability and efficiency.

Key Technologies & Tools

  • Database: Amazon DynamoDB
  • Data Modeling Tool: Hackolade
  • Development & Deployment: Python, Boto3, NoSQL Workbench
Solution Architecture

1. Conceptual Model

The core entities and relationships of the system included:
  • Customers (1:N with Orders, 1:1 with Shopping Cart)
  • Products (1:N with Order Items, 1:N with Reviews)
  • Orders (N:1 with Customers, 1:N with Order Items, 1:1 with Payments)
  • Order Items (N:1 with Orders, N:1 with Products)
  • Reviews (N:1 with Customers, N:1 with Products)
2. Logical Model & NoSQL Adaptation

To optimize DynamoDB’s query performance, I applied denormalization. This involved embedding frequently accessed data within related entities. Key decisions included:
  • Storing order items inside Orders and Shopping Cart to minimize JOIN-like queries.
  • Embedding addresses in Customers and categories in Products to streamline lookups.
  • Separating Reviews into a dedicated table to prevent performance degradation for products with a high number of reviews.
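
To illustrate the embedding decisions above, the following sketch builds a denormalized order document in Python. The attribute names (customer_id, items, unit_price, and so on) and the sample values are assumptions chosen for the example, not the project's actual schema.

```python
from decimal import Decimal


def build_order(order_id, customer_id, lines):
    """Return one order document with its line items embedded.

    `lines` is a list of (product_id, name, unit_price, quantity) tuples.
    All attribute names are hypothetical.
    """
    items = [
        {
            "product_id": product_id,
            "name": name,  # copied from Products so a read needs no second lookup
            "unit_price": Decimal(unit_price),
            "quantity": quantity,
        }
        for product_id, name, unit_price, quantity in lines
    ]
    total = sum(item["unit_price"] * item["quantity"] for item in items)
    return {
        "_id": order_id,
        "customer_id": customer_id,
        "items": items,  # embedded order items instead of a separate table
        "total": total,
    }


order = build_order(
    "order-1001",
    "cust-42",
    [("prod-7", "USB-C cable", "9.99", 2), ("prod-9", "Paperback novel", "12.50", 1)],
)
```

Embedding suits lists that stay small, such as the lines of one order. DynamoDB limits a single item to 400 KB, which is one reason an unbounded collection like reviews is better kept in its own table.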
3. Partitioning & Sharding Strategy

To ensure horizontal scalability, I implemented a strategic sharding approach:
  • Customers: Partitioned by _id for uniform distribution.
  • Orders: Partitioned by order _id so that a customer with many orders does not concentrate traffic on a single partition.
  • Products: Partitioned by _id to optimize product lookups.
  • Reviews: Used composite keys (_id_product as partition key, _id as sort key) to avoid hotspotting.
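
The key choices above could be expressed as table definitions along these lines. This is a sketch only: the table names, the string key type, and on-demand billing are assumptions, and the helper table_definition is hypothetical.

```python
def table_definition(name, partition_key, sort_key=None):
    """Build keyword arguments for DynamoDB's CreateTable operation."""
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    attributes = [{"AttributeName": partition_key, "AttributeType": "S"}]
    if sort_key is not None:
        # A sort key turns the primary key into a composite key.
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
        attributes.append({"AttributeName": sort_key, "AttributeType": "S"})
    return {
        "TableName": name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attributes,
        "BillingMode": "PAY_PER_REQUEST",  # assumed; provisioned mode also works
    }


TABLES = [
    table_definition("Customers", "_id"),
    table_definition("Orders", "_id"),
    table_definition("Products", "_id"),
    table_definition("Reviews", "_id_product", sort_key="_id"),
]
```

Each dictionary can be passed to Boto3 as create_table(**definition). With the composite key on Reviews, all reviews of one product share a partition key and can be read with a single Query, while each review remains individually addressable through its sort key.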
4. High Availability & Scalability Measures

To enhance system resilience and performance in the future, the following measures can be incorporated:
  • Global Tables: Replicate tables across regions for multi-region availability; the AWS service-level agreement for Global Tables targets 99.999% availability.
  • Auto Scaling: Adjust read/write capacity based on traffic patterns.
  • Secondary Indexes: Design additional indexes for high-traffic queries that the primary keys do not serve, such as listing all orders of a given customer.
  • Caching: Integrate Redis to reduce database load for frequent reads.
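
The caching measure could follow a cache-aside pattern, sketched below. To keep the example runnable without a server, a small in-memory class stands in for Redis; TTLCache, get_product, and the load_from_db callback are hypothetical names, and the key format is an assumption.

```python
import time


class TTLCache:
    """In-memory stand-in for Redis so the sketch runs without a server."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)


def get_product(product_id, cache, load_from_db):
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    product = load_from_db(product_id)
    if product is not None:
        cache.set(key, product)  # populate the cache for later reads
    return product
```

In a deployment, load_from_db would wrap a DynamoDB read and the cache object would be a Redis client with an expiry on each key. Writes to a product would also need to update or invalidate its cache entry so readers do not see stale data for longer than the expiry.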
Implementation

The database schema was implemented in a local DynamoDB instance using NoSQL Workbench. A Python script using Boto3 automates table creation and inserts sample data. The complete codebase is available on GitHub.
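
A minimal sketch of what such a script might look like follows; it is not the code from the repository. The endpoint http://localhost:8000 is the default port of DynamoDB Local, while the region, the placeholder credentials, the table definition, and the sample products are assumptions. Running main() requires Boto3 and a local DynamoDB instance.

```python
from decimal import Decimal

# Hypothetical sample data. Prices use Decimal because Boto3's resource
# layer rejects Python floats.
SAMPLE_PRODUCTS = [
    {"_id": "prod-1", "name": "Wireless mouse", "category": "electronics",
     "price": Decimal("24.90")},
    {"_id": "prod-2", "name": "Cotton T-shirt", "category": "apparel",
     "price": Decimal("15.00")},
]


def load_items(table, items):
    """Write items through a batch writer and return how many were written.

    `table` is expected to behave like a Boto3 DynamoDB Table resource.
    """
    count = 0
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
            count += 1
    return count


def main():
    import boto3  # imported here so the helper above works without Boto3

    dynamodb = boto3.resource(
        "dynamodb",
        endpoint_url="http://localhost:8000",  # assumed DynamoDB Local address
        region_name="us-east-1",
        aws_access_key_id="local",  # placeholder credentials for local use
        aws_secret_access_key="local",
    )
    table = dynamodb.create_table(
        TableName="Products",
        KeySchema=[{"AttributeName": "_id", "KeyType": "HASH"}],
        AttributeDefinitions=[{"AttributeName": "_id", "AttributeType": "S"}],
        BillingMode="PAY_PER_REQUEST",
    )
    table.wait_until_exists()
    print(load_items(table, SAMPLE_PRODUCTS), "products written")
```

The batch writer groups the puts into batched requests, which keeps the number of round trips low when loading larger sample sets.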

Conclusion

This project provided hands-on experience in designing a scalable NoSQL database for a realistic e-commerce application. The denormalized data model and partitioning strategy, together with the planned use of DynamoDB Global Tables, auto scaling, and caching, applied data engineering principles that are central to building robust distributed systems.