Patterns for Deploying On-prem BigData Applications on Amazon Web Services (AWS)

Cloud Automation

One of our clients, a leading financial organization, was looking to move their existing on-premises applications to the AWS cloud. They turned to Apps Associates to help design and develop several frameworks to automate secure deployment to AWS.

Working closely with them, Apps Associates applied a set of principles that any application should follow before moving to AWS. Our approach included:

    • Automate Everything
      • Develop automated routines to create, manage and update your applications
      • Refrain from using the AWS console to make manual changes


    • Encrypt Everything
      • All data must be encrypted at rest
      • All S3 buckets must be encrypted using the KMS master key
      • All EBS volumes must be encrypted
      • All data in transit must be encrypted using TLS


    • Secure Everything
      • Do not store any credentials in the source code
      • Always externalize the credentials from the application artifacts and make the application artifacts common to all environments
      • Always use the right role, policy and permissions to operate the application. Do not share roles and policies across applications.


    • Be Frugal
      • Always shut down resources when they are not needed. Use automation to bring them back online when they are needed
      • Prefer serverless capabilities over dedicated infrastructure


    • Operate with Least Privilege
      • Use the minimum set of permissions needed to operate a service. Do not use an overly broad set of permissions
      • Do not use security groups or firewall rules that are overly permissive
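
As an illustration of the last principle, the check below flags ingress rules that are open to every address. It is a minimal sketch, not the client's tooling: it assumes rules shaped like the `IpPermissions` entries that the EC2 API returns for a security group, and the function name `find_open_ingress` is hypothetical.

```python
from typing import Dict, List

# CIDR blocks that match every address; treated here as "overly permissive".
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def find_open_ingress(ip_permissions: List[Dict]) -> List[Dict]:
    """Return the ingress rules that allow traffic from any address."""
    flagged = []
    for rule in ip_permissions:
        # Collect both IPv4 and IPv6 ranges attached to this rule.
        cidrs = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
        cidrs += [r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])]
        if any(cidr in OPEN_CIDRS for cidr in cidrs):
            flagged.append(rule)
    return flagged


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample_rules = [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
    ]
    for rule in find_open_ingress(sample_rules):
        print("Overly permissive rule on ports",
              rule["FromPort"], "to", rule["ToPort"])
```

A check like this could run inside an automated pipeline so that broad rules are caught before deployment rather than corrected by hand in the console.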

Framework for Deploying On-prem BigData Applications into AWS EMR

Apps Associates worked with the client to develop the framework to deploy their on-prem Big Data Applications. The main objectives included:

  • Enable teams to focus on application development
  • Provide a configuration-driven EMR platform service
  • Provision transient EMR clusters
  • Submit a job and, once it completes, terminate the EMR cluster and clean up resources
  • Create a consistent logging and alerting mechanism
  • Ease management of core EMR-related services

Shared EMR Cluster

The shared EMR cluster is designed for application teams to submit their BigData jobs (Spark jobs or Hive/Presto queries) for processing data on AWS. It is deployed in an Active-StandBy configuration to avoid downtime and to support disaster recovery. The Active and StandBy master nodes sit behind a load balancer that routes the application load. All data is encrypted at rest and in transit. Auto scaling is enabled on metrics such as memory utilization, so core nodes are provisioned automatically to handle the processing load. The design works as follows:

  • Provision an ELB and the Active and StandBy EMR clusters
  • The ELB sits in front of the primary EMR cluster, and a CloudWatch alarm monitors the health of the cluster
  • If the master node becomes unhealthy, the alarm fires and the SNS topic notifies all subscribers, including email recipients and the Lambda function
  • The Lambda function then deregisters the current primary cluster master from the ELB and registers the standby master
  • The standby cluster can be created small; once failover starts and queries are routed to the standby cluster, auto scaling or manual scaling can grow it to the ideal size
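
The failover step can be sketched as a small function that the Lambda handler would call. This is a hedged illustration rather than the client's code: it assumes a Classic Load Balancer (the boto3 `elb` client), and the function name and the idea of passing the instance IDs in as arguments are assumptions.

```python
from typing import Any


def fail_over_to_standby(elb_client: Any, load_balancer_name: str,
                         primary_master_id: str,
                         standby_master_id: str) -> None:
    """Swap the master node behind the load balancer from primary to standby."""
    # Take the unhealthy primary master out of rotation.
    elb_client.deregister_instances_from_load_balancer(
        LoadBalancerName=load_balancer_name,
        Instances=[{"InstanceId": primary_master_id}],
    )
    # Route traffic to the standby master instead.
    elb_client.register_instances_with_load_balancer(
        LoadBalancerName=load_balancer_name,
        Instances=[{"InstanceId": standby_master_id}],
    )
```

Inside a Lambda function the client would typically be created with `boto3.client("elb")`. Passing it in as a parameter keeps the function easy to test with a stub.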

Patterns for Transient EMR provisioning

The transient EMR provisioning patterns help application teams deploy their BigData applications to the AWS cloud. The clusters are transient: as soon as the application job completes, the EMR cluster is terminated and the corresponding backend resources are cleaned up. This lets application teams focus on application development instead of provisioning EMR clusters. Everything is configuration driven, so teams can update the parameters to match their requirements.
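
To make "configuration driven" and "transient" concrete, the sketch below turns a small configuration dictionary into a request for the EMR `RunJobFlow` API. Note that it calls the EMR API directly rather than using the CloudFormation templates described in this post, and that the configuration keys (`cluster_name`, `job_script`, and so on) are hypothetical names chosen for illustration.

```python
from typing import Dict


def build_transient_cluster_request(config: Dict) -> Dict:
    """Translate a team's configuration into EMR RunJobFlow parameters."""
    spark_step = {
        "Name": config["job_name"],
        # Shut the cluster down if the job fails.
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     config["job_script"]],
        },
    }
    return {
        "Name": config["cluster_name"],
        "ReleaseLabel": config["release_label"],
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER",
                 "InstanceType": config["instance_type"],
                 "InstanceCount": 1},
                {"InstanceRole": "CORE",
                 "InstanceType": config["instance_type"],
                 "InstanceCount": config["core_nodes"]},
            ],
            # False makes the cluster transient: it terminates once
            # all submitted steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [spark_step],
        "JobFlowRole": config["instance_profile"],
        "ServiceRole": config["service_role"],
    }
```

The resulting dictionary could be passed to `boto3.client("emr").run_job_flow(**request)`. Application teams would then only edit the configuration, never the provisioning logic.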

We developed multiple patterns based on application team needs, including:

  • S3 notification event pattern
  • CloudWatch schedule pattern

S3 notification event pattern

  • Create a notification on the S3 bucket
  • When an object arrives in the S3 bucket, a Lambda function is triggered
  • The Lambda function executes the CloudFormation templates to provision the EMR cluster
  • Once the EMR cluster is created, a Spark job is submitted
  • Once the Spark job completes, the EMR cluster is terminated and the resources are cleaned up
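
The Lambda step of this pattern can be sketched as follows. It is a minimal illustration under stated assumptions: the template parameter names `InputBucket` and `InputKey`, the `transient-emr-` stack name prefix, and the function names are all hypothetical, and the CloudFormation client is passed in so the logic can be exercised without AWS access.

```python
import hashlib
from typing import Any, Dict
from urllib.parse import unquote_plus


def stack_request_from_s3_event(event: Dict, template_url: str) -> Dict:
    """Build CloudFormation create_stack arguments from an S3 notification."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 notifications.
    key = unquote_plus(record["object"]["key"])
    # Stack names allow only letters, digits and hyphens, so derive a
    # short stable suffix from the object location instead of using the key.
    suffix = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).hexdigest()[:12]
    return {
        "StackName": f"transient-emr-{suffix}",
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": "InputBucket", "ParameterValue": bucket},
            {"ParameterKey": "InputKey", "ParameterValue": key},
        ],
        "Capabilities": ["CAPABILITY_IAM"],
    }


def handle_s3_event(event: Dict, cfn_client: Any, template_url: str) -> str:
    """Start provisioning for the object that triggered the notification."""
    request = stack_request_from_s3_event(event, template_url)
    cfn_client.create_stack(**request)
    return request["StackName"]
```

In a deployed function, `cfn_client` would be `boto3.client("cloudformation")` and the template URL would come from the function's configuration rather than from source code.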

CloudWatch schedule pattern

  • Create a CloudWatch schedule event
  • On the specified schedule, a Lambda function is triggered
  • The Lambda function executes the CloudFormation templates to provision the EMR cluster
  • Once the EMR cluster is created, a Spark job is submitted
  • Once the Spark job completes, the EMR cluster is terminated and the resources are cleaned up
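
The first two steps of this pattern could be automated along these lines. This is a sketch, assuming the schedule is created through the CloudWatch Events API (`put_rule` and `put_targets`); the function name and the target ID `provision-emr` are hypothetical.

```python
from typing import Any


def schedule_provisioning(events_client: Any, rule_name: str,
                          schedule_expression: str, lambda_arn: str) -> None:
    """Create a scheduled rule and point it at the provisioning Lambda."""
    # Schedule expressions take the form rate(...) or cron(...).
    if not schedule_expression.startswith(("rate(", "cron(")):
        raise ValueError("schedule must be a rate(...) or cron(...) expression")
    events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule_expression,
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "provision-emr", "Arn": lambda_arn}],
    )
```

For example, `schedule_provisioning(boto3.client("events"), "nightly-emr", "cron(0 2 * * ? *)", lambda_arn)` would request a run at 02:00 UTC each day. The Lambda function also needs a resource policy that allows the rule to invoke it, which this sketch does not cover.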

By utilizing the approach above we were able to provide the client with key benefits that include:

  • Configuration Driven EMR Platform Service
  • Enable teams to focus on application development
  • Centralized security and compliance routines
  • Consistent Logging and Alerting Mechanism
  • Enhanced AWS infrastructure design with better maintainability and cost optimization
  • Ease management of core EMR-related services

With Apps Associates' help, the client was able to address the following challenges:

  • Application teams can focus on application development instead of creating CloudFormation templates and other platform functions for provisioning EMR and related services
  • Application teams now only need to select parameters and execute a Jenkins job to provision EMR
  • Once the EMR job completes, the cluster terminates immediately, which saves the cost of idle clusters
  • If the EMR cluster fails to provision or the EMR job fails to complete, a detailed email notification is sent to the person who initiated the provisioning
  • Core and task nodes scale automatically based on utilization
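
The failure notification could be implemented in several ways; one possibility is an SNS topic with an email subscription. The sketch below assumes that approach, which the post does not confirm. The function name is hypothetical, and the state names are the EMR cluster state `TERMINATED_WITH_ERRORS` and the EMR step state `FAILED`.

```python
from typing import Any

# EMR cluster and step states treated as failures in this sketch.
FAILURE_STATES = {"TERMINATED_WITH_ERRORS", "FAILED"}


def notify_on_failure(sns_client: Any, topic_arn: str, cluster_id: str,
                      state: str, reason: str) -> bool:
    """Publish a failure notice; return True if a message was sent."""
    if state not in FAILURE_STATES:
        return False
    sns_client.publish(
        TopicArn=topic_arn,
        # SNS subjects are limited to 100 characters.
        Subject=f"EMR failure: {cluster_id}"[:100],
        Message=(
            f"Cluster {cluster_id} ended in state {state}.\n"
            f"Reason: {reason}"
        ),
    )
    return True
```

With `sns_client = boto3.client("sns")`, everyone subscribed to the topic by email would receive the message, so the initiator's address would need to be subscribed when the provisioning job starts.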

If you have any questions or need help with an integration, please reach out to me directly at [email protected].