Aerospike Hits One Million Writes Per Second with just 50 Nodes on Google Compute Engine
Thursday, December 4, 2014
Today’s guest blogger is Sunil Sayyaparaju, Director of Product & Technology at Aerospike, the open source, flash-optimized, in-memory NoSQL database.
What exactly is the speed of Google? We at Aerospike take pride in meeting the challenges of high throughput, consistently low latency, and real-time processing that we expect will be characteristic of tomorrow’s cloud applications. That’s why, after we saw Ivan Santa Maria Filho, Performance Engineering Lead at Google, demonstrate 1 Million Writes Per Second with Cassandra on Google Compute Engine, our team at Aerospike decided to benchmark our product’s performance on Google Compute Engine and push the boundaries of Google’s speed.
Here’s what we found: Aerospike scaled on Google Compute Engine with consistently low latency, required smaller clusters, and was simpler to operate. The combined Aerospike-Google Cloud Platform solution could fuel an entirely new category of applications that must process data in real time and at scale from the very start, enabling a new class of startups whose business models were previously not economically viable.
Our benchmark used a setup similar to the Cassandra benchmark: 100 Million records at 200 bytes each, Debian 7 backports images, servers on n1-standard-8 instances at $0.504/hr with data in memory and on-disk persistence to a 500GB non-SSD persistent disk per node, and clients on n1-highcpu-8 instances at $0.32/hr, and followed these steps (a rough sketch of the write workload appears after the findings list below). In addition to pure write performance, we also documented pure read and mixed read/write performance. Our findings:
- High Throughput for both Reads and Writes
- 1 Million Writes per Second with just 50 Aerospike servers
- 1 Million Reads per Second with just 10 Aerospike servers
- Consistent low latency, no jitter for both Reads and Writes
- 7ms median latency for Writes with 83% of writes < 16ms and 96% < 32ms
- 1ms median latency for Reads with 80% of reads < 4ms and 96.5% < 16ms
- Note that latencies are measured on the server (latencies on the client will be higher)
- Unmatched Price / Performance for both Reads and Writes
- 1 Million Writes Per Second for just $41.20/hour
- 1 Million Reads Per Second for just $11.44/hour
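The load itself was generated with the client tooling linked from the steps above. As a rough illustration of what a single writer thread does, here is a minimal Python sketch using the `aerospike` client package; the host address, namespace, set name, and bin name are placeholders, and the record is modeled as a single ~200-byte bin rather than whatever schema the benchmark tool actually uses.

```python
import os
import random

import aerospike

# Placeholder seed address -- point this at any server node in the cluster.
config = {'hosts': [('10.240.0.10', 3000)]}
client = aerospike.client(config).connect()

NAMESPACE, SET_NAME = 'test', 'benchmark'   # illustrative names, not the benchmark's
NUM_KEYS = 100 * 1000 * 1000                # 100 Million records, as in the benchmark
PAYLOAD = os.urandom(200)                   # one ~200-byte bin per record

def write_one(client):
    """Write one record at a key chosen uniformly from the 100M-key space."""
    key = (NAMESPACE, SET_NAME, random.randrange(NUM_KEYS))
    client.put(key, {'payload': PAYLOAD})

for _ in range(1000000):
    write_one(client)

client.close()
```

A single synchronous loop like this comes nowhere near one million operations per second on its own; the benchmark fans the same pattern out across many threads on each of the 20 client machines.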
Here are the details of our experiments with Aerospike on Google Compute Engine. Using 10 server nodes and 20 client nodes, we first examined throughput for a variety of read/write ratios and documented latency results for those same workloads. Then we documented how throughput scaled with cluster size, as we pushed a 100% read load and a 100% write load onto Aerospike clusters ranging from 2 nodes to 10.
High Throughput at different Read / Write ratios (10 server nodes, 20 client nodes)
For all read/write ratios, 80% of the TPS in this graph is achieved with only 50% of the clients (10); adding more clients improves throughput only marginally.
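The mixed workloads in this graph boil down to choosing, per operation, whether to issue a read or a write according to the target ratio. Below is a hedged sketch of that loop, assuming the same `aerospike` client and placeholder names as in the earlier sketch.

```python
import os
import random

def run_mixed_workload(client, read_fraction, num_ops,
                       namespace='test', set_name='benchmark',
                       num_keys=100 * 1000 * 1000):
    """Issue num_ops operations; read_fraction of them are reads, the rest writes."""
    payload = os.urandom(200)
    for _ in range(num_ops):
        key = (namespace, set_name, random.randrange(num_keys))
        if random.random() < read_fraction:
            client.get(key)                        # read path: one network round trip
        else:
            client.put(key, {'payload': payload})  # write path: synchronously replicated

# For example, a 50/50 read/write mix:
# run_mixed_workload(client, read_fraction=0.5, num_ops=100000)
```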
Disk IOPS depend on disk size. We used 500GB non-SSD persistent disks to ensure high IOPS, so the disk was not the bottleneck. For larger clusters, 500GB is a large over-allocation and can be reduced to lower costs. We did not need SSD persistent disks to achieve this level of performance.
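Because standard persistent disk performance on Compute Engine scales with volume size, sizing the disk is effectively provisioning IOPS. The per-GB rates in the sketch below are illustrative assumptions, not Google’s published figures (check the current persistent disk documentation); the point is only the arithmetic behind “bigger disk, more IOPS.”

```python
# Illustrative per-GB rates (assumptions, not official figures) -- standard persistent
# disk IOPS grow roughly linearly with provisioned size, which is why we over-allocated 500GB.
READ_IOPS_PER_GB = 0.75
WRITE_IOPS_PER_GB = 1.5

def standard_pd_iops(size_gb):
    """Estimated sustained read/write IOPS for a standard persistent disk of size_gb."""
    return size_gb * READ_IOPS_PER_GB, size_gb * WRITE_IOPS_PER_GB

read_iops, write_iops = standard_pd_iops(500)
print("500GB standard PD: ~%d read IOPS, ~%d write IOPS" % (read_iops, write_iops))
```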
Consistent Low Latency for different Read / Write ratios (10 server nodes, 20 client nodes)
For a 100% read load, only ~20% of reads took more than 4ms and ~3.5% of reads took more than 16ms. This is because reads in Aerospike are only one hop (one network round trip) away from the client, while writes take two network round trips for synchronous replication. Even with a 100% write load, only 16% of writes took more than 32ms. We ran Aerospike on 7 of the 8 cores on each server node; leaving one core idle helped latency, since network latencies increased when all cores were busy. Latencies are as measured on the server.
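Aerospike’s server-side histograms bucket operations by power-of-two latency thresholds, which is where the >4ms, >16ms, and >32ms figures above come from. A minimal client-side analogue (again assuming the client and placeholder names from the first sketch) is shown below; numbers measured this way will run somewhat higher than the server-side figures because they include the client-to-server round trip.

```python
import os
import random
import time

def measure_write_latency(client, num_ops, namespace='test', set_name='benchmark',
                          num_keys=100 * 1000 * 1000, buckets_ms=(1, 4, 16, 32)):
    """Time num_ops writes on the client and report the share exceeding each threshold."""
    payload = os.urandom(200)
    over = {b: 0 for b in buckets_ms}
    for _ in range(num_ops):
        key = (namespace, set_name, random.randrange(num_keys))
        start = time.time()
        client.put(key, {'payload': payload})
        elapsed_ms = (time.time() - start) * 1000.0
        for b in buckets_ms:
            if elapsed_ms > b:
                over[b] += 1
    for b in buckets_ms:
        print(">%2dms: %5.2f%%" % (b, 100.0 * over[b] / num_ops))

# measure_write_latency(client, num_ops=10000)
```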
Linear Scalability for both Reads and Writes
This graph shows linear scalability for 100% reads and 100% writes, but you can expect linear scaling for mixed workloads too. For reads, a 2:1 client:server ratio was used, e.g. for a 6-node cluster we used 12 client machines to saturate the cluster. For writes, a 1:1 client:server ratio was enough because of the lower throughput of writes.
A new generation of applications with mixed read/write data access patterns senses and responds to what users do on websites and in mobile apps across the Internet. These applications perform data writes with every click and swipe, make decisions, record, and respond in real time.
Aerospike running on Google Compute Engine shows what is possible for applications that require very high throughput and consistently low latency for both reads and writes. Aerospike processes 1 Million writes per second with just 50 servers, a new level of price and performance for us. You too can follow these steps to see the results for yourself, and maybe even challenge us.
- Posted by Sunil Sayyaparaju, Director of Product & Technology at Aerospike