Today’s guest blog comes from Rodrigo Borges, senior bioinformatician at Mendelics, a molecular diagnostics company based in São Paulo, Brazil. The company diagnoses genetic diseases by sequencing the human genome to identify the mutations that cause inherited diseases. While the company currently has fewer than 20 employees, it tackles a multibillion-dollar problem by helping the more than 100,000 children who are born with genetic diseases in Brazil.

Next-generation sequencing, which reads DNA and stores the information as a digital file, generates a huge amount of data: 25 GB per exome and 150 GB per genome. These files contain hundreds of millions of short sequences, or reads, along with information such as where each read came from in the patient’s genome and which sequences are present in the patient but not in healthy individuals. These reads then need to be assembled like a jigsaw puzzle and genotyped to determine all of the differences between the patient’s DNA sequence and a reference sequence. After that, we interpret the list of genetic variants, or mutations, a process that typically occupies a cluster for up to seven days for a single patient.

Our workload for processing DNA sequencing requests varies from day to day, so the ability to scale rapidly as demand for processing power increases was the main reason we migrated to the cloud. We have a web-based app in Google App Engine that controls the workflow of samples at Mendelics. At the end of the process, physicians can search across samples through an easy-to-use interface where almost all the necessary information is one click away. In this dynamic app, several filters are applied automatically so that the more significant variants emerge from among millions of possibilities. Before this app, physicians had to manually examine spreadsheets with thousands of genetic mutations; now they are thrilled to do real-time analysis with Google Cloud Platform.
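To make the idea of automatically applied filters concrete, here is a minimal sketch in Python. It is not the Mendelics implementation: the `Variant` fields, the gene names, the 1% frequency cutoff and the impact labels are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    # All fields are hypothetical; a real pipeline would carry many more annotations.
    gene: str
    population_frequency: float  # fraction of the general population carrying the variant
    impact: str                  # illustrative labels: "high", "moderate" or "low"


def rare(variant, max_frequency=0.01):
    """Keep variants that are uncommon in the general population (assumed 1% cutoff)."""
    return variant.population_frequency <= max_frequency


def damaging(variant):
    """Keep variants whose predicted impact is at least moderate."""
    return variant.impact in {"high", "moderate"}


def apply_filters(variants, filters):
    """Return the variants that pass every filter, preserving input order."""
    return [v for v in variants if all(f(v) for f in filters)]


variants = [
    Variant("GENE_A", 0.25, "low"),
    Variant("GENE_B", 0.0004, "high"),
    Variant("GENE_C", 0.002, "low"),
    Variant("GENE_D", 0.15, "high"),
]

shortlist = apply_filters(variants, [rare, damaging])
print([v.gene for v in shortlist])  # prints ['GENE_B']
```

Writing each filter as a small predicate keeps the list easy to extend: adding another criterion means adding one more function to the list passed to `apply_filters`.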

We moved to Google Compute Engine for better integration with App Engine, which we were already using for our web-based workflow. We also use Google Cloud Storage for our bioinformatics pipeline and Google BigQuery for extremely fast and flexible processing and interpretation of DNA variants. The migration to Compute Engine was straightforward and took only one month.

We’re currently using Compute Engine for our development and analysis processing. Our app in App Engine starts the pipeline, and one instance is created for each test. The code for the processing pipeline is on a persistent disk attached to all instances running a test. Instances live only while their test is being processed, which takes advantage of Google’s per-minute pricing.
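As a rough illustration of why short-lived instances pair well with per-minute pricing, the sketch below estimates the charge for a single instance. The hourly rate in the example and the optional minimum charge are assumptions made for the arithmetic, not figures from this post or from Google’s price list.

```python
import math


def billed_minutes(runtime_seconds, minimum_minutes=1):
    """Round the runtime up to whole minutes, applying an assumed minimum charge."""
    minutes = math.ceil(runtime_seconds / 60)
    return max(minutes, minimum_minutes)


def instance_cost(runtime_seconds, hourly_rate, minimum_minutes=1):
    """Cost of one instance under per-minute billing at a hypothetical hourly rate."""
    return billed_minutes(runtime_seconds, minimum_minutes) * hourly_rate / 60


# A test that runs for 61 minutes and 5 seconds is billed for 62 minutes,
# not for two full hours as it would be under per-hour billing.
print(billed_minutes(61 * 60 + 5))  # prints 62
```

Under these assumptions the cost of a test tracks its actual runtime closely, so deleting the instance as soon as the test finishes avoids paying for idle time.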

The pipeline and App Engine exchange information about test status while the process runs. When the process is done, the results are uploaded to Cloud Storage so that App Engine can process them and deliver them to the physicians. Finally, the instance is killed automatically.
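The lifecycle described above (one instance per test, status updates, upload of results, automatic teardown) can be sketched as follows. This is a simulation only: `FakeInstance` and `FakeStorage` are hypothetical in-memory stand-ins rather than Compute Engine or Cloud Storage client calls, and the "failed" status is an assumption, since the post does not describe error handling.

```python
class FakeStorage:
    """In-memory stand-in for a results bucket (hypothetical)."""

    def __init__(self):
        self.objects = {}

    def upload(self, name, data):
        self.objects[name] = data


class FakeInstance:
    """Stand-in for a virtual machine dedicated to a single test (hypothetical)."""

    def __init__(self, test_id):
        self.test_id = test_id
        self.running = True

    def delete(self):
        self.running = False


def run_test(test_id, analyze, storage, report_status):
    """Create an instance for one test, run the analysis, upload results, tear down."""
    instance = FakeInstance(test_id)
    try:
        report_status(test_id, "running")
        results = analyze(test_id)
        storage.upload(f"results/{test_id}.json", results)
        report_status(test_id, "done")
    except Exception:
        # Assumed behavior: report the failure, then let the error propagate.
        report_status(test_id, "failed")
        raise
    finally:
        # The instance is removed whether or not the analysis succeeded.
        instance.delete()
    return instance


statuses = []
storage = FakeStorage()
instance = run_test(
    "test-001",
    analyze=lambda test_id: {"test": test_id, "variants": []},
    storage=storage,
    report_status=lambda test_id, status: statuses.append((test_id, status)),
)
print(statuses)          # prints [('test-001', 'running'), ('test-001', 'done')]
print(instance.running)  # prints False
```

Placing the teardown in a `finally` block is one way to ensure that no instance keeps running, and keeps being billed, after its test has ended.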

We find that Compute Engine scales quickly, allowing us to keep up easily with the flow of new sequencing requests. In addition to scalability and integration with App Engine, it is simple to use, requires little maintenance and has high availability. Compute Engine also has great security management, custom metadata and friendly APIs.

Compute Engine has helped us scale with our demands and has been a key component in helping our physicians diagnose and cure genetic diseases in Brazil and around the world.

-Contributed by Rodrigo Borges, Senior Bioinformatician, Mendelics