Big Data 2022 • By Yves Mulkers

Improve Performance and Data Availability with Elastic Block Store (EBS)

Curated from datasciencecentral.com →

Nowadays, many Database-as-a-Service (DBaaS) solutions separate the computation layer from the storage layer. Examples include Amazon Aurora and Google BigQuery. This design is attractive because data storage and data replication can be handled by existing services, so the DBaaS does not need to manage that complexity itself. However, the performance of this design is sometimes not as good as the traditional approach of using a local disk as storage.

In this blog, we show that with a careful selection of Elastic Block Store (EBS) types and clever optimizations, deploying DBaaS on EBS can achieve even better performance than on local disks. 

Why do we consider EBS in the first place? 

To explain our motivation for using EBS, we’d like to briefly introduce TiDB. TiDB is a MySQL-compatible, distributed database. TiDB Servers are the computation nodes, which process SQL requests. The Placement Driver (PD) is the brain of TiDB: it configures load balancing and provides metadata services. TiKV is a row-oriented key-value store that processes transactional queries. TiFlash is a columnar storage extension that handles analytical queries. In this blog, we focus on TiKV.

TiKV provides a distributed key-value service. First, it splits the data into several Regions; a Region is the smallest data unit for replication and load balancing. To achieve High Availability (HA), each Region is replicated three times, and the replicas are distributed among different TiKV nodes. The replicas of one Region form a Raft group. Losing one node, and thus one replica of some Regions, is acceptable for TiDB, because the two remaining replicas still form a majority. However, losing two replicas simultaneously causes problems, because the majority of the Raft group's members is lost. The Region then becomes unavailable, and its data can no longer be accessed. Human intervention is needed to resolve such issues.
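
The majority rule described above can be illustrated with a minimal sketch. The function names `majority` and `region_available` are hypothetical and are not part of TiKV; the code only models the arithmetic of a Raft majority, not how TiKV implements it.

```python
def majority(replicas: int) -> int:
    """Smallest number of members that forms a majority of a Raft group."""
    return replicas // 2 + 1


def region_available(replicas: int, lost: int) -> bool:
    """A Region stays available while the surviving replicas still form a majority."""
    return replicas - lost >= majority(replicas)


# With three replicas per Region, report availability for 0, 1, and 2 lost replicas.
for lost in range(3):
    print(f"lost={lost} available={region_available(3, lost)}")
```

With three replicas, the majority is two, so the Region tolerates one lost replica but not two.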

When deploying TiDB Cloud, we use placement rules that guarantee the replicas of a Region are spread across multiple Availability Zones (AZs). Losing one AZ therefore does not have a huge impact on TiDB Cloud. However, with an AZ + 1 failure (the loss of one AZ plus at least one node in another AZ), a Region becomes unavailable. We had such a failure in production, and it took a lot of work to bring the TiDB cluster back online. To avoid repeating such a painful experience, we turned to EBS.
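
To make the AZ + 1 scenario concrete, here is a small sketch under stated assumptions: a hypothetical layout of three AZs with two TiKV nodes each, and two Regions that each keep one replica per AZ. The node names, AZ names, and the `unavailable_regions` helper are invented for illustration and do not reflect the actual TiDB Cloud topology.

```python
# Hypothetical layout: three AZs with two TiKV nodes each.
NODE_AZ = {
    "tikv-a1": "az-a", "tikv-a2": "az-a",
    "tikv-b1": "az-b", "tikv-b2": "az-b",
    "tikv-c1": "az-c", "tikv-c2": "az-c",
}

# Each Region keeps one replica per AZ, as the placement rules require.
REGIONS = {
    "region-1": ["tikv-a1", "tikv-b1", "tikv-c1"],
    "region-2": ["tikv-a2", "tikv-b2", "tikv-c2"],
}


def unavailable_regions(failed_azs: set[str], failed_nodes: set[str]) -> list[str]:
    """Return the Regions whose surviving replicas no longer form a majority."""
    lost = []
    for region, replicas in REGIONS.items():
        alive = [
            node for node in replicas
            if NODE_AZ[node] not in failed_azs and node not in failed_nodes
        ]
        if len(alive) < len(replicas) // 2 + 1:
            lost.append(region)
    return lost


# Losing one AZ alone leaves every Region with two of three replicas.
print(unavailable_regions({"az-a"}, set()))
# Losing one AZ plus one node in another AZ takes a Region offline.
print(unavailable_regions({"az-a"}, {"tikv-b1"}))
```

In this model, only the Regions that had a replica on the extra failed node are affected, which matches the observation that an AZ + 1 failure makes some Regions unavailable rather than the whole cluster's data.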

AWS Elastic Block Store (EBS) is a block storage service provided by AWS whose volumes can be attached to EC2 instances. The data on an EBS volume, however, is independent of the EC2 instance, so when an EC2 instance fails, the data persists, and Kubernetes can automatically remount the volume to a working EC2 instance. Moreover, EBS volumes are designed for mission-critical systems, so they are replicated within an AZ. This means that EBS volumes are less likely to fail, which gives us extra peace of mind.

In general, there are four SSD-based EBS volume types: gp2, gp3, io1, and io2. (When we designed and implemented TiDB Cloud, io2 Block Express was still in preview mode, so we didn’t consider it.) The following table summarizes the characteristics of these volume types. 

Now, let’s get our hands dirty and compare performance. Note that in the following figures, the four EBS volume types are attached to an r5b instance, while the local-disk measurements are conducted on an i3 instance. This is because the r5b instance can only use EBS, so we use i3 as a close alternative. Each figure shows the average and 99th percentile latency for all operations.

We’ll start by benchmarking read and write latency. The first workload is a simple one: 1,000 IOPS, with each I/O being 4 KB. The following two figures show the average and 99th percentile latency.

Write latency in a simple workload with one thread. (Lower numbers are better) 

Read latency in a simple workload with one thread.
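
As a rough illustration of the two metrics reported in these figures, the sketch below computes an average and a 99th percentile from a list of per-operation latencies. The nearest-rank percentile method and the sample values are assumptions made for illustration; the samples are synthetic and are not measurements from this benchmark.

```python
import math


def average(samples: list[float]) -> float:
    """Arithmetic mean of the latency samples."""
    return sum(samples) / len(samples)


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic latencies in microseconds, NOT results from the article:
# 98 fast operations and 2 slow outliers.
samples = [100.0] * 98 + [900.0] * 2

print("average:", average(samples))
print("p99:", percentile(samples, 99))
```

The average barely moves when a few operations are slow, while the 99th percentile exposes them, which is why both values are worth reporting for storage latency.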

Yves Mulkers

Yves Mulkers is the founder of 7wData and a widely followed voice in the data and AI community. He curates the 7wData and AI Beat newsletters, reaching hundreds of thousands of data and AI professionals, and writes on data strategy, analytics, AI, and the evolving data ecosystem.
