“Finally she sat in the wee little chair and it was just right. But Goldilocks rocked so hard in the chair that it broke into pieces!” The Story of the Three Bears by Robert Southey (1837)

Getting auto scaling just right is one of the hardest things about cloud apps. Scale too slowly and you can't meet customer demand. Scale too quickly and you pay for more than you need. For new applications in particular, you might need to have strong prediction skills to provision the correct number of virtual machine (VM) instances to meet customer demand.

We don’t want your Google Compute Engine app to break when it faces increased demand but, just as importantly, we’d like to ensure you don’t pay for excess capacity when you don't need it.

Sure, you could estimate peak load and provision for it. However, if your application’s peak load only occurs during the holiday period, your infrastructure will be largely unused for the majority of the year. Alternatively, you could provision for estimated average use. But then your infrastructure will be over-provisioned roughly half the time and under-provisioned the other half, so you end up disappointing either your customers or your managers all the time.

To learn how to automatically scale Compute Engine instances to meet demand, read the newly published article Auto Scaling on the Google Cloud Platform. In it, we explore a framework for a Google App Engine application that scales the number of Compute Engine instances up or down as demand increases or decreases. We designed the framework to be extremely flexible, so that you can adjust it to meet your business needs.
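To make the idea of scaling with demand concrete, here is a minimal sketch of the kind of decision an orchestrator might make on each check. It is not the framework from the article: the function names, the per-instance capacity, the headroom percentage, and the instance limits are all hypothetical values chosen for illustration.

```python
def desired_instances(load, capacity_per_instance,
                      min_instances=1, max_instances=10,
                      headroom_percent=20):
    """Return how many instances to run for the given load.

    All parameters are illustrative assumptions:
    load and capacity_per_instance are in the same unit
    (for example, requests per second), and headroom_percent
    adds spare capacity on top of the measured load.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    if load < 0:
        raise ValueError("load must not be negative")

    # Integer ceiling division avoids floating-point rounding surprises.
    numerator = load * (100 + headroom_percent)
    denominator = capacity_per_instance * 100
    needed = (numerator + denominator - 1) // denominator

    # Clamp to the configured bounds so we never scale to zero
    # or beyond the budgeted maximum.
    return max(min_instances, min(max_instances, needed))


def scaling_action(current_instances, target_instances):
    """Describe the change needed to move from current to target."""
    delta = target_instances - current_instances
    if delta > 0:
        return ("scale_up", delta)
    if delta < 0:
        return ("scale_down", -delta)
    return ("hold", 0)
```

For example, with a hypothetical load of 450 requests per second and instances that each handle 100, `desired_instances(450, 100)` asks for 6 instances (450 plus 20% headroom is 540), and `scaling_action(4, 6)` reports that two more instances should be started. The upper bound is what keeps you from paying for unlimited capacity during a spike.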

To get started, download the sample apps:

Orchestrating an App Engine + Compute Engine application

Unlike Goldilocks’s adventures, this isn’t a fairy tale. You can download the apps and start implementing today.

-Posted by Kathrin Probst, Solutions Architect