Welcome to the AWS PEARC 2020 Research Computing Tutorial

Cloud computing lets researchers match computational infrastructure to the needs of specific research applications, run jobs without waiting in a queue, scale resources to meet workload demands, and share and reproduce scientific analyses. However, there are many ways to run high performance computing (HPC) applications in the cloud, and choosing the best and most cost-effective one can be daunting. This tutorial describes the specific benefits of cloud computing for research workloads, defines cloud-based computational methods such as auto-scaling and serverless computing, and teaches best practices for running computationally intensive research workloads in the cloud. Attendees will apply what they have learned in hands-on exercises on Amazon Web Services (AWS). We will use scientific examples from genomics and weather forecasting to illustrate the appropriate use of both serverless batch computation and tightly coupled clusters running applications that use the Message Passing Interface (MPI). We will describe how to parallelize workflows and ensure reproducibility with workflow management systems and containers, illustrated with the open-source workflow manager Nextflow. We will also cover practical aspects of cloud computing for research, such as budgeting and minimizing cost, taking advantage of ongoing technological advances, sharing data and research infrastructure, and adapting to changing research methods and standards.