Migrating On-Premises Data Warehouse to AWS RedShift

Customer Profile

Our customer is a prominent player in e-learning solutions globally, based in India. We are associated with them since past 1 year through multiple projects on AWS cloud platform along with major DevOps tools, including Docker and Kubernetes.

Goal

During our discussion with customer SPOC {VP-Technology} we zeroed in for following goals:

  1. Migrate Data from on-prem data warehouse to AWS Redshift
  2. Remove single point of failures
  3. Create a staging environment
  4. Create data-pipelines and configure alerts on pipeline failures

Solution

By utilizing principles of AWS’ well architected framework, we implemented following solution:

  1. Deployed AWS Schema Conversion Tool (SCT) on the customers premises for schema conversion and data migration
  2. Moved Data to S3 from the user’s on-premises data warehouse
  3. Created Data Pipelines on AWS
  4. Copied Data to AWS Redshift in staging environment using highly available EC2 instances
  5. Saved Pipeline statistics to RDS instances for Analytics
  6. Created a AWS Quicksight Dashboard for Redshift and RDS instances data

Tech Stack

This solution used following tool, platforms, services and programming languages:

  1. Public cloud platform–AWS. Following services are prominently used:
    • Simple Storage Service (S3)
    • Elastic Compute Cloud (EC2)
    • AWS Redshift
    • SNS for Pipeline alerts and notifications
    • Lambda to save Pipeline stats on RDS
    • RDS to save Pipeline statistics
    • Dynamodb for pipeline configurations
    • AWS IAM Role to allow SCT to put records on S3
    • AWS Quicksight for Analytics

Solution Architecture

Click to see large view

Results

Based on changes implemented as per AWS’ well architected framework, customer gained following in addition to predefined goals:

  1. Improved performance with RedShift
  2. Fast scaling with fewer complications to meet the changing storage demands
  3. Better security with SSL encryption for data in transit
  4. Notifications for pipeline failures
  5. Detailed statistics on QuickSight