Project Overview
Zydus Lifesciences, a global pharmaceutical leader, operated a complex on-premises infrastructure spanning 70 servers and 14 mission-critical applications across nine locations. With 100 TB of cGMP-compliant data at risk, the company faced escalating challenges in scalability, cybersecurity, hardware failures, and disaster recovery preparedness. Frequent unplanned outages underscored the need for a robust disaster recovery (DR) solution. To strengthen business continuity, Zydus Lifesciences set out to establish a highly resilient DR site as part of its Business Continuity Plan (BCP), with all remote locations interconnected through MPLS (Multiprotocol Label Switching).
About Zydus Lifesciences: A Global Pharma Leader
Zydus Group is a leading global pharmaceutical and healthcare company headquartered in India, committed to manufacturing excellence under stringent cGMP standards. The company is dedicated to improving global healthcare accessibility through innovation and high-quality products. Zydus Lifesciences, the flagship division of Zydus Group, is a fully integrated healthcare organization specializing in pharmaceuticals, active pharmaceutical ingredients (APIs), biologics, and wellness products. With a presence in highly regulated markets such as the U.S., Europe, Latin America, and emerging economies, Zydus Lifesciences is a key player in the global pharmaceutical landscape. The company operates state-of-the-art research and manufacturing facilities strategically located worldwide, ensuring efficiency, innovation, and compliance with global healthcare standards. Its commitment to sustainability and patient-centric solutions has solidified its reputation as a trusted partner in the pharmaceutical industry.
Challenges Faced by the Client
1. Scalability Issues: Expanding the traditional DR setup required significant investment in hardware and software to accommodate growing workloads.
2. Ransomware & Cybersecurity Risks: The company needed a resilient environment that could withstand ransomware attacks and support rapid recovery.
3. Compliance & Infrastructure Modernization: Constant hardware and software upgrades were necessary to maintain compliance and security, increasing operational complexity.
4. Security Incidents & Accidental Deletions: Slow response to security breaches and accidental data deletions posed operational and regulatory risks.
5. Licensing & Investment Challenges: Ongoing expenses for the traditional DR solution, including compute hardware, software licenses, and data center management, strained financial resources.
6. Complex Infrastructure Management: Keeping a synchronized, readily available DR system added to the IT workload.
The AWS-Driven Disaster Recovery Solution
1. Continuous Replication & Rapid Failover
• AWS Elastic Disaster Recovery (AWS DRS) for continuous block-level replication, targeting a near-zero Recovery Point Objective (RPO).
• DNS cutover to an Application Load Balancer for seamless application access after failover.
• Private hosting for critical servers, with encryption at rest for Amazon EBS volumes and Amazon S3 data.
2. Secure & Scalable Hybrid Architecture
• AWS Direct Connect for a high-speed, secure link between the on-premises sites and AWS.
• Geo-redundant VPCs and subnets for uninterrupted failover operations.
3. Cost-Optimized Staging & Compute Efficiency
• On-Demand instances for replication servers.
• Right-sized instances and a move to a Compute Savings Plan, reducing compute costs by 78%.
• Migration of storage volumes from gp2 to gp3 to reduce costs further.
4. Automation & Governance
• AWS Systems Manager for automated patching and disk management.
• Amazon CloudWatch and AWS CloudTrail for real-time monitoring and compliance tracking.
• Centralized governance using AWS Config and IAM role management.
5. Disaster Recovery Drills & Compliance Assurance
• Quarterly DR drills to verify operational readiness.
• Recovery Time Objective (RTO): 2-3 hours; RPO: 5-15 minutes.
• AWS Key Management Service (KMS) for managing the keys used to encrypt protected data.
6. Proactive Monitoring & 24x7 Support
• Centralized dashboard for monitoring replication health, CPU utilization, and network performance.
• 24x7 L1/L2 support for incident resolution and root-cause analysis.
• On-premises infrastructure and operating systems upgraded to the latest versions.
• Traditional backup system replaced with AWS cloud backup.
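A DNS cutover like the one described above is often done by upserting an alias record so that the application's hostname resolves to the DR-site load balancer. The case study does not name the DNS service, so the following is a minimal sketch that assumes Amazon Route 53 and an alias A record; the hostname, ALB DNS name, and hosted zone ID are hypothetical placeholders. It only builds the change request, so it can be reviewed or tested without touching AWS.

```python
"""Sketch: build a Route 53 ChangeBatch that repoints an application's DNS
name at a DR-site Application Load Balancer during failover.

Assumptions (not stated in the case study): Amazon Route 53 hosts the zone and
the record is an alias A record. All names and IDs are hypothetical placeholders.
"""
from typing import Any, Dict


def build_dr_cutover_batch(
    record_name: str,
    alb_dns_name: str,
    alb_hosted_zone_id: str,
) -> Dict[str, Any]:
    """Return a ChangeBatch dict that UPSERTs an alias A record to the ALB."""
    return {
        "Comment": f"DR cutover of {record_name} to {alb_dns_name}",
        "Changes": [
            {
                # UPSERT creates the record, or overwrites the existing
                # (on-premises) target if the record already exists.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "AliasTarget": {
                        # The ALB's canonical hosted zone ID, not the ID of
                        # the zone that contains record_name.
                        "HostedZoneId": alb_hosted_zone_id,
                        "DNSName": alb_dns_name,
                        # Lets Route 53 take the ALB's target health into account.
                        "EvaluateTargetHealth": True,
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    batch = build_dr_cutover_batch(
        record_name="app.example.com.",                        # hypothetical
        alb_dns_name="dr-alb-123.ap-south-1.elb.amazonaws.com",  # hypothetical
        alb_hosted_zone_id="ZEXAMPLEALBZONE",                  # hypothetical
    )
    print(batch["Changes"][0]["Action"])
    # With credentials configured, the batch could be applied with boto3:
    # boto3.client("route53").change_resource_record_sets(
    #     HostedZoneId="<zone containing app.example.com>", ChangeBatch=batch)
```

Building the request separately from the API call keeps the cutover step reviewable during DR drills: the same batch can be inspected in a dry run and then applied during an actual failover.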
Post-Deployment Challenges & Solutions
1. Initial Replication Lag
• Fine-tuned replication settings to handle high volumes of concurrent writes.
• Adjusted firewall policies so the AWS replication agents could connect reliably.
2. Bandwidth Limitations & Sync Delays
• Implemented AWS Direct Connect for stable connectivity.
• Scheduled replication during off-peak hours to minimize network congestion.
3. Legacy System Compatibility Issues
• Upgraded outdated drivers and reconfigured legacy systems for AWS compatibility.
• Increased server resources, including RAM, and optimized storage.
4. Challenges during DR Drills
• Resolved data duplication and missing-volume issues by refining disk mapping.
• Fixed boot failures by aligning server configurations with AWS recovery requirements.
• Tuned instance performance for smooth failover transitions.
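Scheduling replication during off-peak hours requires a check for whether the current time falls inside the agreed window, including windows that cross midnight. The following is a minimal sketch that assumes a hypothetical 22:00-06:00 local-time window; the case study does not state the actual schedule, and how the result would pause or resume replication (for example, from a scheduled job) is outside its scope.

```python
"""Sketch: decide whether bandwidth-heavy sync work falls inside an off-peak
window. The default 22:00-06:00 window is a hypothetical example."""
from datetime import time


def in_off_peak_window(
    now: time,
    start: time = time(22, 0),  # hypothetical window start (local time)
    end: time = time(6, 0),     # hypothetical window end (local time)
) -> bool:
    """Return True if `now` is in the half-open window [start, end)."""
    if start <= end:
        # Same-day window, e.g. 13:00-17:00.
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00-06:00.
    return now >= start or now < end


if __name__ == "__main__":
    for t in (time(23, 30), time(3, 0), time(12, 0)):
        print(t, in_off_peak_window(t))
```

The half-open interval avoids double-counting the boundary minute when consecutive windows are chained, and handling the midnight wrap explicitly avoids the common bug of treating 22:00-06:00 as an empty range.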
Key Results & Business Impact
• Reduced Disaster Recovery Costs by 75% through optimized compute and storage strategies.
• Minimized Downtime, with an RTO of 2-3 hours and an RPO of 5-15 minutes.
• Ensured Compliance with stringent regulatory standards through AWS-native governance.
• Enhanced Cybersecurity & Resilience against ransomware and accidental data loss.
• Optimized Performance & Scalability through an AWS-first approach to DR readiness.
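RTO and RPO figures like those above are measured per drill: the achieved RPO is the gap between the start of the disruption and the last data point that was recovered, and the achieved RTO is the time until the service is usable again. The following minimal sketch checks one drill against the upper ends of the stated ranges (3 hours RTO, 15 minutes RPO). The `DrillResult` class and the timestamps are hypothetical illustrations, not measurements from this project.

```python
"""Sketch: check a DR drill's measured recovery against RTO/RPO targets.
Targets are the upper ends of the ranges stated in the case study; the drill
timestamps are hypothetical."""
from dataclasses import dataclass
from datetime import datetime, timedelta

RTO_TARGET = timedelta(hours=3)
RPO_TARGET = timedelta(minutes=15)


@dataclass
class DrillResult:
    disruption_at: datetime        # when the (simulated) outage began
    recovery_point: datetime       # timestamp of the data state that was restored
    service_restored_at: datetime  # when the application was usable at the DR site

    @property
    def achieved_rpo(self) -> timedelta:
        # Writes made after the recovery point are lost.
        return self.disruption_at - self.recovery_point

    @property
    def achieved_rto(self) -> timedelta:
        # How long the service was unavailable.
        return self.service_restored_at - self.disruption_at

    def meets_targets(self) -> bool:
        return self.achieved_rpo <= RPO_TARGET and self.achieved_rto <= RTO_TARGET


if __name__ == "__main__":
    drill = DrillResult(
        disruption_at=datetime(2024, 1, 15, 10, 0),
        recovery_point=datetime(2024, 1, 15, 9, 52),
        service_restored_at=datetime(2024, 1, 15, 12, 20),
    )
    print(drill.achieved_rpo, drill.achieved_rto, drill.meets_targets())
    # → 0:08:00 2:20:00 True
```

Recording the three timestamps for every quarterly drill makes the RTO/RPO claims auditable over time, which matters for regulated (cGMP) environments.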