AWS for Industries
Deploying multi-physics simulations for biopharma process development on AWS
This blog was co-authored by Fabrice Schlegel, Senior Manager of Data Sciences at Amgen; Joao Alberto de Faria, Senior Associate Software Engineer at Amgen; Ammar Latif, Senior SA at AWS; and Pierre-Yves Aquilanti, Principal HPC Specialist SA at AWS.
Amgen relies on computational modeling to gain insight into its biopharma processes. Modeling improves product designs through rapid prototyping and reduces simulation cycle times. Moreover, these computational models help reduce total development costs, shorten product-process development cycles, and reduce time to market.
These multi-physics computational models encompass structural mechanics, Computational Fluid Dynamics (CFD), and more, such as heat transfer and chemistry. They are often very CPU and memory intensive, requiring a powerful computer or cluster of computers to be solved. Such a large cluster is referred to as a High Performance Computing (HPC) environment.
The Amgen Digital Integration and Predictive Technologies (DIPT) and Information Systems (IS) teams built and integrated an HPC platform based on the open-source solution Scale-Out Computing on AWS (SOCA) as an extension of their current AWS cloud infrastructure. The integration of the new platform took into account Amgen’s cloud networking best practices and its security roles and guardrails, and streamlined user authentication when accessing the HPC cluster.
In this post, we describe Amgen’s approach to build and integrate a SOCA cluster to host multi-physics workloads and the details of the integration with existing Amgen cloud infrastructure and policies to enable users to connect transparently to the SOCA cluster. Then, we cover the benefits that Amgen found by using their new infrastructure in production. Scientists and engineers involved in multi-physics simulation and IT decision makers supporting them will understand the benefit of SOCA as a platform that enables customers of all sizes to fully realize the benefits of HPC.
Business and system requirements
The Amgen team partnered with AWS to build a solution for Amgen scientists who needed a scalable and easy-to-use platform to run their multi-physics simulations. The new platform had to meet the following requirements:
- Scalability: Ability to increase or decrease system resources as needed to meet changing demands based on scientists’ evolving projects
- Ease of use: Intuitive and accessible user interface (UI) for scientists using existing tools
- Cost/performance flexibility: Allow scientists to select compute nodes with varying performance and cost
- Automation: Minimize manual steps in platform administration and maintenance, and provide infrastructure as code options
- Built-in visualization tools for scientists: 3D modeling and visualization, as well as debugging of design applications
- Enterprise guardrails: Fits within Amgen’s existing enterprise security and infrastructure guardrails
- Visibility: Dashboards for scientists to easily track job status, usage, and costs
After validating the performance of their multi-physics applications on Amazon Elastic Compute Cloud (Amazon EC2), Amgen deployed Scale-Out Computing on AWS (SOCA), an open-source solution that provides a platform to run a user-friendly HPC system in the cloud. Selecting an open-source platform aligned with the team’s vision of embracing a strong builder culture and engineering expertise instead of selecting off-the-shelf alternatives.
With SOCA, end users can submit their computational jobs using pre-defined templates through a web interface or a command line interface (CLI). Once submitted, a job runs on Amazon EC2 instances that are created for the duration of the job.
In addition, SOCA provides several dashboards to monitor cluster utilization by user, project, or application. The dashboards also enable administrators to set up budgets. As an open-source solution, SOCA allows customers to deploy and operate it themselves or through a partner who operates it on the customers’ behalf. (If you are looking to use an AWS-supported service with functionality similar to SOCA, see AWS ParallelCluster.)
Onboarding SOCA into Amgen’s AWS Cloud environment
Amgen deployed an AWS Landing Zone to manage a multi-account AWS environment. It is configured to comply with key security policies as well as integrate with important shared services like a cloud-based identity provider that handles authentication for all Amgen users. The open-source SOCA solution provides a quick and easy out-of-box experience that works well in a typical AWS account, but required a bit of customization to be an ideal fit for the Amgen environment.
The primary customizations made to allow SOCA to be a great fit in the Amgen environment were:
- Cloud connectivity
- IAM roles and policies
- User authentication
Cloud connectivity
The Amgen landing zone environment leverages shared Amazon Virtual Private Cloud (Amazon VPC) and AWS Transit Gateway together with AWS Direct Connect in order to provide a low-maintenance, highly scalable AWS network infrastructure. Subnets within each Amazon VPC are shared from a primary network account managed by a central team, so each account owner does not need to configure their own network infrastructure.
Outside of security-group creation, most VPC configurations are prohibited through AWS Identity and Access Management (IAM) permission boundaries. However, SOCA includes a sample AWS CloudFormation template, install-with-existing-resources.template, that demonstrates using existing VPC and subnet IDs, along with other resources discussed below, rather than creating them at provisioning time. This let Amgen simply plug the appropriate shared VPC and subnet information into the configuration.
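To make the idea of plugging in existing network resources concrete, here is a minimal Python sketch that assembles stack parameters in the list-of-dictionaries shape that CloudFormation APIs accept. The parameter keys (VpcId, SubnetIds) and the example IDs are hypothetical placeholders, not the actual parameter names used by the SOCA template.

```python
# Sketch only: "VpcId" and "SubnetIds" are hypothetical parameter keys,
# not the real names in SOCA's install-with-existing-resources.template.
def existing_network_parameters(vpc_id, subnet_ids):
    """Return CloudFormation-style parameters for a pre-existing VPC and subnets."""
    if not vpc_id.startswith("vpc-"):
        raise ValueError(f"not a VPC ID: {vpc_id!r}")
    bad = [s for s in subnet_ids if not s.startswith("subnet-")]
    if bad or not subnet_ids:
        raise ValueError(f"expected at least one subnet ID, got: {list(subnet_ids)}")
    values = {
        "VpcId": vpc_id,
        # List-typed parameters are passed as one comma-separated string.
        "SubnetIds": ",".join(subnet_ids),
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


# Example with made-up IDs for subnets shared from a central network account.
params = existing_network_parameters("vpc-0abc1234", ["subnet-0aaa", "subnet-0bbb"])
for p in params:
    print(p["ParameterKey"], "=", p["ParameterValue"])
```

Validating the IDs up front keeps a typo from surfacing only after stack creation has started.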
IAM roles and policies
There is a standard IAM boundary policy that must be set for all user-assumable roles in Amgen’s landing zone environment. Part of the enforcement mechanism is that the boundary policy prohibits creation of any new IAM roles that do not also have the same permission boundary. This often means that automated installation of solutions like SOCA will fail when attempting to provision new IAM roles, because they do not have a way to specify a permission boundary policy.
To meet the compliance requirement that every new role has the standard permission boundary attached, the Amgen team pre-created the roles required by SOCA in the SOCA prerequisites template (described under Persistent infrastructure) and modified each role to include the standard permission boundary. An example IAM role is shown in the following image.
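As a rough illustration of pre-creating roles with a boundary attached, the following Python sketch builds CloudFormation AWS::IAM::Role resources that set the PermissionsBoundary property. The logical IDs, account ID, and policy name are hypothetical; the roles and policies SOCA actually requires are not reproduced here.

```python
import json


def role_with_boundary(service, boundary_policy_arn, managed_policy_arns=()):
    """Build an AWS::IAM::Role resource that carries a permissions boundary."""
    return {
        "Type": "AWS::IAM::Role",
        "Properties": {
            "AssumeRolePolicyDocument": {
                "Version": "2012-10-17",
                "Statement": [{
                    "Effect": "Allow",
                    "Principal": {"Service": service},
                    "Action": "sts:AssumeRole",
                }],
            },
            # The boundary caps what the role can do, whatever policies are attached.
            "PermissionsBoundary": boundary_policy_arn,
            "ManagedPolicyArns": list(managed_policy_arns),
        },
    }


# Hypothetical placeholders: account ID, policy name, and logical IDs.
BOUNDARY = "arn:aws:iam::123456789012:policy/standard-permission-boundary"
resources = {
    "SchedulerRole": role_with_boundary("ec2.amazonaws.com", BOUNDARY),
    "ComputeNodeRole": role_with_boundary("ec2.amazonaws.com", BOUNDARY),
}
print(json.dumps({"Resources": resources}, indent=2))
```

Generating every role through one helper makes it hard to forget the boundary on any single role.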
User authentication
Amgen uses a standard cloud-based identity provider for all user authentication in the landing zone environment and wanted the same for the SOCA UI. Fortunately, SOCA includes an option for users to create an Amazon Cognito user pool for authentication. This is simply a matter of creating the user pool, setting up Amgen’s cloud identity provider as a federated identity provider (IdP) in Amazon Cognito, then configuring the SOCA web app to use the Amazon Cognito user pool for authentication. This allows for the following authentication path, which is quicker than it looks and requires little to no intervention by the users.
HPC system operations
Persistent infrastructure
The Amgen team also wanted to minimize any permanent Amazon EC2 instances that would require ongoing maintenance and patching. Since most data and custom applications in SOCA reside on managed infrastructure, the Amgen and AWS teams were able to collaborate on factoring out a set of permanent infrastructure components that would continue to persist through updates to SOCA’s infrastructure.
This resulted in a SOCA “prerequisites” CloudFormation template that includes:
- Amazon Elastic File System (EFS) for applications and data
- Application Load Balancer (ALB) with appropriate listeners and a single default target
- Security groups to enable communications to flow between different components of the infrastructure
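The shape of such a prerequisites template can be sketched as a Python dictionary that serializes to CloudFormation JSON. This is an assumption-laden outline, not Amgen’s template: the logical IDs, parameters, and the choice of an internal HTTPS listener are hypothetical, and security group rules are omitted for brevity.

```python
import json


def prerequisites_template():
    """Outline of long-lived resources kept outside the SOCA stack itself."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "VpcId": {"Type": "AWS::EC2::VPC::Id"},
            "SubnetIds": {"Type": "List<AWS::EC2::Subnet::Id>"},
            "CertificateArn": {"Type": "String"},
        },
        "Resources": {
            # Security group shared by cluster components (rules not shown).
            "ClusterSecurityGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": "Traffic between cluster components",
                    "VpcId": {"Ref": "VpcId"},
                },
            },
            # File system for applications and data; retained if the stack is deleted.
            "SharedFileSystem": {
                "Type": "AWS::EFS::FileSystem",
                "DeletionPolicy": "Retain",
                "Properties": {"Encrypted": True},
            },
            # Single default target group that the listener forwards to.
            "DefaultTargetGroup": {
                "Type": "AWS::ElasticLoadBalancingV2::TargetGroup",
                "Properties": {
                    "VpcId": {"Ref": "VpcId"},
                    "Protocol": "HTTPS",
                    "Port": 443,
                },
            },
            "LoadBalancer": {
                "Type": "AWS::ElasticLoadBalancingV2::LoadBalancer",
                "Properties": {
                    "Type": "application",
                    "Scheme": "internal",
                    "Subnets": {"Ref": "SubnetIds"},
                    "SecurityGroups": [
                        {"Fn::GetAtt": ["ClusterSecurityGroup", "GroupId"]}
                    ],
                },
            },
            "HttpsListener": {
                "Type": "AWS::ElasticLoadBalancingV2::Listener",
                "Properties": {
                    "LoadBalancerArn": {"Ref": "LoadBalancer"},
                    "Protocol": "HTTPS",
                    "Port": 443,
                    "Certificates": [{"CertificateArn": {"Ref": "CertificateArn"}}],
                    "DefaultActions": [{
                        "Type": "forward",
                        "TargetGroupArn": {"Ref": "DefaultTargetGroup"},
                    }],
                },
            },
        },
    }


template = prerequisites_template()
print(json.dumps(template, indent=2))
```

Keeping these resources in their own stack means the file system and load balancer endpoint survive when the rest of the cluster is replaced.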
Through testing, Amgen’s team discovered that using m5.large Amazon EC2 instances for the scheduler node and m5.12xlarge Amazon EC2 instances for worker nodes worked best. This configuration may be different in other use cases, generally depending on the application and dataset, among other parameters.
The SOCA platform provides users with a remote desktop session using NICE DCV, a high-performance remote display protocol that gives customers a secure way to deliver remote desktops and application streaming. These sessions are like typical remote desktop sessions, but natively support compression and encryption. The Amazon EC2 instance behind a session is the user’s own private virtual machine for running computational applications and submitting additional simulation jobs. Desktop nodes that require graphics processing units (GPUs) for 3D image processing use the g4dn.8xlarge Amazon EC2 instance type.
Computational software can either be installed onto a base Amazon Machine Image (AMI) or be used from prebuilt AMIs from the AWS Marketplace. Amgen’s image management policy allows for private AMIs to be launched in Amgen accounts. To comply with this policy, the Amgen team configured EC2 Image Builder pipelines to make lightly modified versions of the default AMIs that SOCA would typically use, and to ensure that the platform administrators always use the latest, patched Amazon Linux 2 AMI when building a new SOCA release.
As most workloads are not throughput or input/output (I/O) intensive, the Amgen team opted to use Amazon Elastic File System (Amazon EFS). Amgen used provisioned throughput for the EFS file systems to ensure consistent application performance while the file system is still relatively small and therefore has few burst credits. This may become less necessary as the file system utilization grows.
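The trade-off between bursting and provisioned throughput can be sketched with a little arithmetic. The example below assumes a bursting-mode baseline of 50 MiB/s per TiB stored, a figure that should be checked against the current Amazon EFS documentation; the 64 MiB/s requirement and the file system sizes are made-up illustrations.

```python
def bursting_baseline_mibps(stored_gib, mibps_per_tib=50.0):
    """Baseline throughput in bursting mode, which scales with stored data.

    The default rate is an assumption to verify against current EFS documentation.
    """
    return stored_gib / 1024 * mibps_per_tib


def efs_throughput_properties(stored_gib, required_mibps):
    """Pick AWS::EFS::FileSystem throughput properties for a target throughput."""
    if bursting_baseline_mibps(stored_gib) >= required_mibps:
        # A large enough file system sustains the target without provisioning.
        return {"ThroughputMode": "bursting"}
    return {
        "ThroughputMode": "provisioned",
        "ProvisionedThroughputInMibps": required_mibps,
    }


# Hypothetical numbers: a small, new file system versus a larger, mature one.
print(efs_throughput_properties(stored_gib=100, required_mibps=64))
print(efs_throughput_properties(stored_gib=2048, required_mibps=64))
```

Under these assumptions, a small file system needs provisioned throughput to hold the target rate, while a multi-terabyte one can sustain it in bursting mode.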
Blue-green deployment and update
The rest of the SOCA infrastructure was deployed using readily available CloudFormation templates, with a few notable exceptions in the Amgen customization section in the preceding image. As the SOCA infrastructure was ready to move to production, Amgen’s team shifted focus to the system upgrade process and ensuring that it was automated with minimal downtime for end users. The Amgen team selected the blue/green upgrade pattern.
A blue/green deployment is a strategy in which you create two separate but identical environments. One environment (blue) is running the current application version and one environment (green) is running the new application version. Using a blue/green deployment strategy increases application availability and reduces deployment risk by simplifying the rollback process if a deployment fails. Once testing is completed on the green environment, live application traffic is directed to the green environment and the blue environment is deprecated.
With that approach, updating to a new version of SOCA is reduced to a few automatable steps that minimize downtime for users:
- Export user and group information to maintain consistency following the upgrade
- Re-merge Amgen’s specific customizations and deploy the latest SOCA build by running manual_build.py
- Import saved user and group information on a new scheduler
- Validate the new environment before switching the ALB
- Switch the Application Load Balancer (ALB) target definition to the new stack
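The steps above can be sketched as a small Python orchestration in which traffic moves only after the new environment validates. The step functions are hypothetical callables supplied by the caller. The switch assumes an Elastic Load Balancing v2 client, such as the one returned by boto3.client("elbv2"), and uses its modify_listener call to repoint the listener’s default action. This is an illustration, not Amgen’s automation.

```python
def switch_alb_target(elbv2_client, listener_arn, new_target_group_arn):
    """Point the listener's default action at the new stack's target group."""
    return elbv2_client.modify_listener(
        ListenerArn=listener_arn,
        DefaultActions=[{"Type": "forward", "TargetGroupArn": new_target_group_arn}],
    )


def run_upgrade(export_users, deploy_new_stack, import_users, validate, switch):
    """Run the upgrade steps in order; return True only if traffic was switched."""
    saved = export_users()          # 1. export user and group information
    new_stack = deploy_new_stack()  # 2. deploy the new build with customizations
    import_users(new_stack, saved)  # 3. import users and groups on the new scheduler
    if not validate(new_stack):     # 4. validate before touching the ALB
        return False                #    the old stack keeps serving traffic
    switch(new_stack)               # 5. switch the ALB to the new stack
    return True
```

Because the load balancer is the last thing to change, a failed validation leaves users on the previous environment, and rolling back is a matter of pointing the listener at the old target group again.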
Updating the environment this way keeps the SOCA cluster up to date by bringing its images to the latest level of approved patching. The cluster can also adopt the latest Amazon EC2 instance types as needed.
Lessons learned
By selecting a cloud-based HPC solution, the Amgen team no longer needs to worry about the availability of computation time, allocate time slices among Amgen engineering teams to run their simulations, or oversize the HPC infrastructure for peak usage, as it would have had to when rebuilding the existing HPC system.
Platform selection is not just about the speeds and feeds; it should also focus on the user experience. This new platform provided end users with an enhanced experience, with access to interactive remote desktop sessions that allow for 3D rendering to visualize simulation results. This is a major advantage for engineering users of the simulation software when compared to running visualization on local laptops that lack a discrete GPU. It is also a benefit over situations where model data had to be downloaded through network connections with limited bandwidth.
Platform performance is about the end users and how easily they can interact with the platform. The SOCA platform is more convenient to use through its web interface, and its remote desktops make the required 3D simulation work easier.
“High-Performance Computing capability is foundational to driving increased speed and process efficiencies at optimal cost. The advances made for Amgen’s HPC capabilities as a part of this initiative are fully aligned with IS City Plan & architectural standards. They not only benefit the DIPT and FPT organizations, but the design, methodology and experience will help accelerate HPC implementations for other projects & functions [such as Next Gen Sequencing, etc.].”
— Justin Porth & Venki Anantharam, OIS Sr. Manager & OIS Director, Amgen
Conclusion
With the SOCA platform, the Amgen DIPT and IS teams were able to offer their scientists and engineers a convenient and easy-to-use platform to run computational fluid dynamics simulations, expediting medical device development and biopharmaceutical process development. They achieved the following:
- Improved end user experience: By using the simple UI and remote desktop sessions offered by SOCA, the Amgen team simplified interaction with the HPC platform and enhanced the end user experience. It also allowed scientists and engineers to test and validate simulations without needing specialist compute power on their laptops.
- Adherence to Amgen governance practices: Using the SOCA platform allowed the Amgen team to integrate the platform seamlessly with existing Amgen cloud governance practices, leading to approval and adoption by the IS organization and a faster path to production.
- Cost saving: Running HPC simulation jobs in SOCA uses AWS services with pay-as-you-go pricing models, so processing tasks cost far less than managing internal servers and the personnel needed for ongoing management of resources.
Amgen is an active member of open-source communities, and the Amgen IS team enjoys identifying ways to contribute to open-source projects to support the growth of the community and the field. To learn more about Amgen’s digital innovation opportunities, visit careers.amgen.com/opportunities/digital-innovation-technology/.
To learn more about Healthcare & Life Sciences on AWS, visit aws.amazon.com/health.