TSDBs at Scale – Part Two

This is the second half of a two-part series focusing on the challenges of Time Series Databases (TSDBs) at scale. This half focuses on the challenges of balancing read vs. write performance, data aggregation, large dataset analysis, and operational complexity in TSDBs.

Balancing Read vs. Write Performance

Time series databases are tasked with ingesting concurrent metric streams, often in large volumes. This data ultimately needs to be persisted to permanent storage, where later it can be retrieved. While portions of the ingest pipeline may be temporarily aggregated in memory, certain workloads require either write queuing or a high-speed data storage layer to keep up with high inbound data volumes.

Data structures such as Adaptive Radix Trees (ARTs) and Log-Structured Merge-trees (LSMs) provide a good starting point for in-memory and memory/disk indexed data stores. However, the requirement to quickly persist large volumes of data presents a conundrum of read/write asymmetry. The greater the capacity to ingest and store metrics, the larger the volume of data available for analysis, creating challenges for read-based data analysis and visualization.

Analyzing time-series data reveals an inherent constraint: data must be read at a rate orders of magnitude higher than the rate at which it was ingested. For example, retrieving a week’s worth of time-series data within a single second, to support visualization or analysis, means reading roughly 600,000 times faster than the data arrived.

Read/write asymmetry of analyzing one week’s worth of time-series telemetry data
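The size of this asymmetry is simple arithmetic. The sketch below is only an illustration; the function name ReadSpeedup is hypothetical and not part of any TSDB API.

```go
package main

import "fmt"

// ReadSpeedup returns how many times faster than real time a query must read
// in order to scan spanSeconds of history within budgetSeconds.
func ReadSpeedup(spanSeconds, budgetSeconds float64) float64 {
	return spanSeconds / budgetSeconds
}

func main() {
	const week = 7 * 24 * 60 * 60 // seconds in one week
	fmt.Printf("%.0fx\n", ReadSpeedup(week, 1)) // 604800x
}
```

The ratio grows linearly with the time span being queried, which is why longer dashboards lean so heavily on aggregation.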

How do you scale reads for large amounts of data in a non-volatile storage medium? The typical solution to this asymmetry is data aggregation — reducing the requisite volume of read data while simultaneously attempting to maintain its fidelity.

Data Aggregation

Data aggregation is a crucial component of performant reads. Many TSDBs define downsampling aggregation policies, storing distinct sampling resolutions for various retention periods. These aggregations can be asynchronously applied during ingestion, allowing for write performance optimizations. Recently ingested data is often left close to sample resolution, as it loses value when aggregated or downsampled.

These aggregations are accomplished by applying an aggregation function over data spanning a time interval. Averaging is the most common aggregation function used, but certain TSDBs such as IRONdb and OpenTSDB provide the ability to implement other aggregation functions such as max(), sum(), or histogram merges. The table below lists some well-known TSDBs and the aggregation methods they use.

TSDB/Monitoring Platform | Aggregation Method
IRONdb | Automatic rollups of raw data after configurable time range
DalmatinerDB | DQL aggregation via query clause
Graphite (default without IRONdb) | In memory rollups via carbon aggregator
InfluxDB | InfluxQL user defined queries run periodically within a database, GROUP BY time()
OpenTSDB | Batch processed, queued TSDs, or stream based aggregation via Spark (or other)
Riak | User defined SQL type aggregations
TimescaleDB | SQL based aggregation functions
M3DB | User defined rollup rules applied at ingestion time, data resolution per time interval

As mentioned in part one, histograms are useful in improving storage efficiency, as are other approximation approaches. These techniques often provide significant read performance optimizations. IRONdb uses log linear histograms to provide these read performance optimizations. Log linear histograms allow one to store large volumes of numeric data which can be statistically analyzed with quantifiable error rates, in the band of 0-5% on the bottom of the log range, and 0-0.5% on the top of the log range.

Approximate histograms such as t-digest are storage efficient, but can exhibit error rates upwards of 50% for median quantile calculations. Uber’s M3DB uses Bloom filters to speed data access times, which exhibit single digit false positive error rates for large data sets in return for storage efficiency advantages. Efficiency versus accuracy tradeoffs should be understood when choosing an approximation-based aggregation implementation.

Default aggregation policies, broken down by type, within IRONdb

It is important to note a crucial trade-off of data aggregation: spike erosion. Spike erosion is the phenomenon in which visualizations of data aggregated over wide intervals show lower maximums than the underlying samples. It occurs when averages are used as the aggregation function (which is the case for most TSDBs). Using histograms as the data source can guard against spike erosion by allowing a max() aggregation function to be applied over each interval. However, a histogram is significantly larger to store than a rollup, so that accuracy comes at a cost.
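Spike erosion is easy to reproduce. The following is a minimal sketch, not taken from any particular TSDB: Downsample, Mean, and Max are hypothetical helper names, and the five samples are made-up values containing a single spike.

```go
package main

import "fmt"

// Downsample splits samples into consecutive windows of width points and
// reduces each window with agg. A trailing partial window is reduced as well.
// width must be positive.
func Downsample(samples []float64, width int, agg func([]float64) float64) []float64 {
	var out []float64
	for start := 0; start < len(samples); start += width {
		end := start + width
		if end > len(samples) {
			end = len(samples)
		}
		out = append(out, agg(samples[start:end]))
	}
	return out
}

// Mean returns the arithmetic mean of a non-empty window.
func Mean(w []float64) float64 {
	sum := 0.0
	for _, v := range w {
		sum += v
	}
	return sum / float64(len(w))
}

// Max returns the largest value of a non-empty window.
func Max(w []float64) float64 {
	m := w[0]
	for _, v := range w[1:] {
		if v > m {
			m = v
		}
	}
	return m
}

func main() {
	// Five made-up samples with a single spike.
	raw := []float64{10, 10, 100, 10, 10}
	fmt.Println(Downsample(raw, 5, Mean)) // [28]
	fmt.Println(Downsample(raw, 5, Max))  // [100]
}
```

Averaging the window reports 28 and hides the spike of 100, while a max() rollup preserves it. The wider the window, the more the averaged value erodes.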

Analysis of Large Datasets

One of the biggest challenges with analyzing epic data sets is that moving or copying data for analysis becomes impractical due to the sheer mass of the dataset. Waiting days or weeks for these operations to complete is incompatible with the needs of today’s data scientists.

The platform must not only handle large volumes of data, but also provide tools to perform internally consistent statistical analyses. Workarounds won’t suffice here, as cheap tricks such as averaging percentiles produce mathematically incorrect results. Meeting this requirement means performing computations across raw data, or rollups that have not suffered a loss of fidelity from averaging calculations.
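To see why averaging percentiles is incorrect, consider two hosts with very different latency ranges. This is an illustrative sketch with made-up data; Percentile is a hypothetical helper that uses the nearest-rank definition.

```go
package main

import (
	"fmt"
	"sort"
)

// Percentile returns the nearest-rank p-th percentile (1 <= p <= 100) of a
// non-empty slice. The input slice is not modified.
func Percentile(values []float64, p int) float64 {
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	rank := (p*len(sorted) + 99) / 100 // ceil(p/100 * n) using integers
	return sorted[rank-1]
}

func main() {
	hostA := []float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	hostB := []float64{100, 200, 300, 400, 500, 600, 700, 800, 900, 1000}

	// Tempting but wrong: average the per-host percentiles.
	averaged := (Percentile(hostA, 90) + Percentile(hostB, 90)) / 2

	// Correct: compute the percentile over the combined samples.
	combined := append(append([]float64(nil), hostA...), hostB...)
	actual := Percentile(combined, 90)

	fmt.Println(averaged, actual) // 454.5 800
}
```

The averaged value of 454.5 does not correspond to any real quantile of the combined data, whose 90th percentile is 800. Computing the right answer requires the raw samples, or a mergeable summary such as a histogram.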

Additionally, a human-readable interface is required, affording users the ability to query and introspect their datasets in arbitrary ways. Many TSDBs use the “in place” query approach. Since data cannot be easily moved at scale, you have to bring the computation to the data.

PromQL, from Prometheus, is one example of such a query language. IRONdb, on the other hand, uses the Circonus Analytics Query Language (CAQL), which affords custom user-definable functions via Lua.

Anyone who has worked with relational databases and non-procedural languages has experienced the benefits of this “in place” approach. It is much more performant to delegate analytics workloads to resources which are computationally closer to the data. Sending gigabytes of data over the wire for transformation is grossly inefficient when it can be reduced at the source.

Operational Complexity

Operational complexity is not necessarily a “hard problem,” but is sadly an often ignored one. Many TSDBs will eventually come close to the technical limits imposed by information theory. The primary differentiator then becomes efficiency and overall complexity of operation.

In an optimal operational scenario, a TSDB could automatically scale up and down as additional storage or compute resources are needed. Of course, this type of idealized infrastructure is only present in trade-show marketing literature. In the real world, operators are needed, and generally some level of specialized knowledge is required to keep the infrastructure properly running.

Let’s take a quick look at what’s involved in scaling out some of the more common TSDBs in the market:

TSDB/Monitoring Platform | Scaling Method
IRONdb | Generate new topology config, kick off
DalmatinerDB | Rebalance via REST call
Graphite (default without IRONdb) | Manual, file based. Add HAProxy or other stateless load balancer
InfluxDB | Configure additional name and/or data nodes
OpenTSDB | Expand your HBase cluster
Riak | RiakTS cluster tools
TimescaleDB | Write clustering in development
M3DB | M3DB Docs

There are other notable aspects of operational complexity. For example, what data protection mechanisms are in place?

For most distributed TSDBs, the ability to retain an active availability zone is sufficient. When that’s not enough (or if you don’t have an online backup), ZFS snapshots offer another solution. There are, unfortunately, few other alternatives to consider. Typical data volumes are often large enough that snapshots and redundant availability zones are the only practical options.

A key ingestion performance visualization for IRONdb, a PUT latency histogram, shown as it appears within IRONdb’s management console

Another important consideration is observability of the system, especially for distributed TSDBs. Each of the previously mentioned options conveniently exposes some form of performance metrics, providing a way to monitor the health of the system. IRONdb is no exception, offering a wealth of performance metrics and associated visualizations such that one can easily operate and monitor it.


There are a number of factors to consider when either building your own TSDB, or choosing an open-source or commercial option. It’s important to remember that your needs may differ from those of Very Large Companies. These companies often have significant engineering and operations resources to support the creation of their own bespoke implementations. However, these same companies often have niche requirements that prevent them from using some of the readily available options in the market, requirements that smaller companies simply don’t have.

If you have questions about this article, or Time Series Databases in general, feel free to join our Slack channel and ask us. To see what we’ve been working on which inspired this article, feel free to have a look here.

TSDBs at Scale – Part One

TSDBs at Scale

This two-part series examines the technical challenges of Time Series Databases (TSDBs) at scale. Like relational databases, TSDBs have certain technical challenges at scale, especially when the dataset becomes too large to host on a single server. In such a case you usually end up with a distributed system, which is subject to many challenges, including those described by the CAP theorem. This first half focuses on the challenges of data storage and data safety in TSDBs.

What is a Time Series Database?

Time series databases are databases designed to track events, or “metrics,” over time. The three most often used types of metrics (counters, gauges, and distributions) were popularized in 2011 with the advent of statsd by Etsy.

Time Series graph of API response latency median, 90th, and 99th percentiles.
The design goals of a TSDB are different from those of an RDBMS (relational database management system). RDBMSs organize data into tables of columns and rows, with unique identifiers linking together rows in different tables. In contrast, TSDB data is indexed primarily by event timestamps. As a result, RDBMSs are generally ill-suited for storing time series data, just as TSDBs are ill-suited for storing relational data.


TSDB design has been driven in large part by industry as opposed to academic theory. For example, PostgreSQL evolved from academia to industry, having started as Stonebraker’s Ingres (Post Ingres). But sometimes ideas flow in the other direction, with industry informing academia. Facebook’s Gorilla is an example of practitioner-driven development that has become noted in academia.


Data Storage

Modern TSDBs must be able to store hundreds of millions of distinct metrics. Operational telemetry from distributed systems can generate thousands of samples per host, thousands of times per second, for thousands of hosts. Historically successful storage architectures now face challenges of scale orders of magnitude higher than they were designed for.

How do you store massive volumes of high-precision time-series data such that you can run analysis on them after the fact? Storing averages or rollups usually won’t cut it, as you lose fidelity required to do a mathematically correct analysis.


From 2009 to 2012, Circonus stored time series data in PostgreSQL. We encountered a number of operational challenges, the biggest of which was data storage volume. A meager 10K concurrent metric streams, sampling once every 60 seconds, generates roughly 5 billion data points per year. Redundantly storing these same streams requires approximately 500GB of disk space. This solution scaled, but the operational overhead for rollups and associated operations was significant. Storage nodes would go down, and master/slave swaps took time. Promoting a read-only copy to write master could take hours. That is an unexpectedly high amount of overhead, especially if you want to reduce or eliminate downtime, and all to handle “just” 10K concurrent metric streams.
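The data point figure can be checked with a few lines of arithmetic. This is only a sketch; PointsPerYear is a hypothetical helper, and it assumes a 365-day year.

```go
package main

import "fmt"

// PointsPerYear returns the number of samples produced in a 365-day year by
// streams metric streams that each emit one sample every intervalSeconds.
func PointsPerYear(streams, intervalSeconds int64) int64 {
	const secondsPerYear = 365 * 24 * 60 * 60
	return streams * (secondsPerYear / intervalSeconds)
}

func main() {
	fmt.Println(PointsPerYear(10000, 60)) // 5256000000
}
```

Each stream contributes 525,600 samples per year at one-minute resolution, so 10K streams land a little above 5 billion.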

While PostgreSQL is able to meet these challenges, the implementation ends up being cost prohibitive to operate at scale. Circonus evaluated other time series storage solutions as well, but none sufficiently addressed the challenges listed above.

So how do we store trillions of metric samples?

TSDBs typically store numeric data (i.e. gauge data) for a given metric as a set of aggregates (average, 50th / 95th / 99th percentiles) over a short time interval, say 10 seconds. This approach can generally yield a footprint of three to four bytes per sample. While this may seem like an efficient use of space, because aggregates are being stored, only the average value can be rolled up to longer time intervals. Percentiles cannot be rolled up except in the rare cases where the distributions which generated them are identical. So 75% of that storage footprint is only usable within the original sampling interval, and unusable for analytics purposes outside that interval.

A more practical, yet still storage-efficient, approach is to store high-frequency numeric data as histograms, in particular log linear histograms, which keep a count of data points in approximately 100 bins for each power of ten. A histogram bin can be expressed with 1 byte for the sample value, 1 byte for the bin exponent (factor of 10), and 8 bytes for the bin sample count. At 10 bytes per bin, this approach may initially seem less efficient, but because histograms are mergeable, samples from multiple sources can be merged together for an N-fold gain in efficiency, where N is the number of sample sources. Log linear histograms also offer the ability to calculate quantiles over arbitrary sample intervals, providing a significant advantage over storing aggregates.
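The bin layout and the merge property can be sketched in a few lines of Go. This is a simplified illustration under stated assumptions, not IRONdb's implementation: the names Bin, Histogram, and BinFor are hypothetical, the bin convention (two significant digits times a power of ten) is chosen for brevity, inputs are assumed to be finite, and zero or negative values are lumped into a single catch-all bin.

```go
package main

import "fmt"

// Bin identifies a log linear bucket covering [Val*10^Exp, (Val+1)*10^Exp),
// with Val in 10..99, i.e. two significant decimal digits.
type Bin struct {
	Val int8
	Exp int8
}

// Histogram maps each bin to the number of samples that fell into it.
type Histogram map[Bin]uint64

// BinFor returns the bin for a finite positive value v. Zero and negative
// values share the zero-valued Bin{} to keep this sketch short, and the
// exponent is assumed to fit in an int8.
func BinFor(v float64) Bin {
	if v <= 0 {
		return Bin{}
	}
	exp := 0
	for v >= 100 {
		v /= 10
		exp++
	}
	for v < 10 {
		v *= 10
		exp--
	}
	return Bin{Val: int8(v), Exp: int8(exp)}
}

// Insert records one sample.
func (h Histogram) Insert(v float64) { h[BinFor(v)]++ }

// Merge adds every bin count from other into h. Addition is commutative, so
// the merged result does not depend on the order of the sources.
func (h Histogram) Merge(other Histogram) {
	for b, n := range other {
		h[b] += n
	}
}

func main() {
	hostA, hostB := Histogram{}, Histogram{}
	for _, v := range []float64{120, 125, 4.25} {
		hostA.Insert(v)
	}
	hostB.Insert(129)
	hostA.Merge(hostB)
	fmt.Println(hostA[Bin{Val: 12, Exp: 1}]) // 3 samples in [120, 130)
}
```

However many sources are merged, the result never needs more bins than a single histogram, which is where the storage gain comes from.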

Data is rolled up dynamically, generally at 1 minute, 30 minute, and 12 hour intervals, lending itself to analysis and visualization. These intervals are configurable, and can be changed by the operator to provide optimal results for their particular workloads. These rollups are a necessary evil in a high volume environment, largely due to the read vs write performance challenges when handling time series data. We’ll discuss this further in part 2.

Data Safety

Data safety is a problem common to all distributed systems. How do you deal with failures in a distributed system? How do you deal with data that arrives out of order? If you put the data in, will you be able to get it back out?

All computer systems can fail, but distributed systems tend to fail in new and spectacular ways. Kyle Kingsbury’s series “Call Me Maybe” is a great educational read on some of the more entertaining ways that distributed databases can fail.

Before developing IRONdb in 2011, Circonus examined the technical considerations of data safety. We quickly realized that the guarantees and constraints related to ACID databases can be relaxed. You’re not doing left outer joins across a dozen tables; that’s a solution for a different class of problem.


Can a solution scale to hundreds of millions of metrics per second WITHOUT sacrificing data safety? Some of the existing solutions could scale to that volume, but they sacrificed data safety in return.


Kyle Kingsbury’s series provides hours of reading detailing the various failure modes experienced when data safety was lost or abandoned.

IRONdb uses OpenZFS as its filesystem. OpenZFS is the open source successor to the ZFS filesystem, which was developed by Sun Microsystems for the Solaris operating system. OpenZFS compares the checksums of incoming writes to the checksums of the existing on-disk data and avoids issuing any write I/O for data that has not changed. It supports on-the-fly LZ4 compression of all user data. It also provides repair on reads, and the ability to scrub the file system and repair bitrot.

To make sure that whatever data is written to the system can be gotten out, IRONdb writes data to multiple nodes. The number of nodes is dependent on the replication factor, which is 2 in this example diagram. On each node, data is mirrored to multiple disks for redundancy.


Data written to multiple nodes, with multiple disks per node, and a replication factor of 2. Cross data center node replication, no single node can be on both sides.


The CAP theorem says that since all systems experience network partitions, they will sacrifice either availability or consistency. The effects of the CAP theorem can be minimized, as shown in Google’s Cloud Spanner blog post by Eric Brewer (the creator of the CAP theorem). But in systems that provide high availability, it’s inevitable that some of the data will be out of order.

Dealing with data that arrives out of order is a difficult problem in computing. Most systems attempt to resolve this problem through the use of consensus algorithms such as Raft or Paxos. Managing consistency is a HARD problem. A system can give up availability in favor of consistency, but this ultimately sacrifices scalability. The alternative is to give up consistency and present possibly inconsistent data to the end user. Again, see Kyle Kingsbury’s series above for examples of the many ways that goes wrong.

IRONdb tries to avoid these issues altogether through the use of commutative operations. This means that the final result recorded by the system is independent of the order of operations applied. Commutative operations can be expressed mathematically by f(A,B) = f(B,A). This attribute separates IRONdb from pretty much every other TSDB.
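The property f(A,B) = f(B,A) can be demonstrated with a toy store. The sketch below is not IRONdb's actual write path: the Store type and its conflict rule (keep the larger value when two samples share a timestamp) are assumptions chosen only because max is commutative.

```go
package main

import (
	"fmt"
	"reflect"
)

// Sample is one observation of a metric.
type Sample struct {
	Timestamp int64
	Value     float64
}

// Store holds one value per timestamp.
type Store map[int64]float64

// Apply records a sample. On a timestamp collision it keeps the larger value
// (a hypothetical rule). Because max(a, b) == max(b, a), the final state is
// the same no matter which sample arrives first.
func (s Store) Apply(x Sample) {
	if cur, ok := s[x.Timestamp]; !ok || x.Value > cur {
		s[x.Timestamp] = x.Value
	}
}

// Replay applies samples in the given order and returns the resulting store.
func Replay(samples []Sample) Store {
	s := Store{}
	for _, x := range samples {
		s.Apply(x)
	}
	return s
}

func main() {
	inOrder := []Sample{{10, 1.5}, {20, 2.5}, {20, 3.5}}
	shuffled := []Sample{{20, 3.5}, {10, 1.5}, {20, 2.5}}
	fmt.Println(reflect.DeepEqual(Replay(inOrder), Replay(shuffled))) // true
}
```

Because replicas converge to the same state regardless of delivery order, no coordination round is needed to agree on an ordering.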

The use of commutative operations provides the core of IRONdb’s data safety, and avoids the most complicated part of developing and operating a distributed system. The net result is improved reliability and operational simplicity, allowing operators to focus on producing business value as opposed to rote system administration.

There are a lot of other ways to try and solve the consistency problem. All have weaknesses and are operationally expensive. Here’s a comparison of known consensus algorithms used by other TSDBs:

TSDB/Monitoring Platform | Solution to Consistency Problem
IRONdb | Avoids problem via commutative operations
DalmatinerDB | Multi-Paxos via Riak core
DataDog | Unknown, rumored to use Cassandra (Paxos)
Graphite (default without IRONdb) | None
InfluxDB | Raft
OpenTSDB | HDFS/HBase (Zookeeper’s Zab)
Riak | Multi-Paxos
TimescaleDB | Unknown, PostgreSQL based

That concludes part 1 of our series on solving the technical challenges of TSDBs at scale. We’ve covered data storage and data safety. Next time we’ll describe more of the technical challenges inherent in distributed systems and how we address them in IRONdb. Check back for part 2, when we’ll cover balancing read vs. write performance, handling sophisticated analysis of large datasets, and operational simplicity.

Introducing the IRONdb Prometheus Adapter

Prometheus, an open-source project from CNCF, is an infrastructure and service monitoring system which has become popular due to its ease of deployment and general purpose feature set. Prometheus supports features such as metric collection, alerting, and metric visualizations — but falls short when it comes to long-term data retention.

Prometheus is based on autonomous, single-server nodes which rely on local storage. While this model has its advantages, it introduces some obvious data storage scaling issues. As a result, Prometheus tends to be deployed with shorter retention intervals, which can limit its overall utility.

Today we’re happy to introduce the Beta of our IRONdb Prometheus adapter. IRONdb is Circonus’s internally developed TSDB. It’s a drop-in solution for organizations struggling to scale Prometheus or ones that have become frustrated with maintaining a high-availability metrics infrastructure. Prometheus users who integrate with IRONdb unlock the potential for historical analysis of their metric data, while simultaneously benefiting from IRONdb’s support for replication and clustering.

Here’s a high-level overview of the features that a Prometheus installation gains by adding IRONdb into its data storage architecture.


Feature | IRONdb | Prometheus without IRONdb
Storage node cluster ceiling | >100 | 1
Data retention | Years | Weeks
High Availability | Yes | No
Partitioning methods | Automatic | Sharding
Consistency methods | Immediate per node, catches up across nodes | None
Replication methods | Configurable replication factor, 2 by default; multi-datacenter capability | By federation
Server-side scripts | Yes, in Lua | No
Data scheme | Schema-free | Yes
Data typing | Supports numeric, text, and histogram data | Supports numeric data

How it Works

The IRONdb Prometheus Adapter provides remote read and write capabilities for Prometheus to IRONdb. These capabilities allow IRONdb to provide metric storage for Prometheus installations, providing a seamless integration between Prometheus and IRONdb. No other data storage solutions can operate at the scale of cardinality that IRONdb can, with one IRONdb cluster supporting many individual Prometheus instances.

Multiple Prometheus instances supported by IRONdb nodes, through our adapter (created in Go)
The Go Gopher created by Renee French is licensed under Creative Commons.

Once configured, Prometheus writes are translated into Circonus’s raw flatbuffer message format for performant writes into IRONdb. The adapter takes protobuf from Prometheus and converts it into raw flatbuffer messages that are then handled by IRONdb.

When Prometheus needs data from IRONdb, it makes a request to the configured ‘remote_read’ endpoint. This endpoint points at the Adapter. The endpoint proxies the request to IRONdb, receives back a response, and ultimately translates the response such that Prometheus can understand it.

Implementation Details

Prometheus support is handled within the adapter through a read handler and a write handler. Both handlers take Snappy-encoded protobuf messages as the request payload from Prometheus and decode them. After the messages are decoded and classified into read or write operations, the adapter prepares the associated IRONdb-specific submissions or rollup retrievals outlined below.

For write operations, the decoded request is converted into a list of metrics and then sent to an available IRONdb node. The process through which the adapter translates the Prometheus time-series request is shown here:


for _, ts := range timeseries {
	// convert metric and labels to IRONdb format:
	for _, sample := range ts.GetSamples() {
		// MakeMetric takes in a flatbuffer builder, the metric from
		// the prometheus metric family and results in an offset for
		// the metric inserted into the builder
		mOffset, err := MakeMetric(b, ts.GetLabels(), sample, accountID, checkName, checkUUID)
		if err != nil {
			return []byte{}, errors.Wrap(err,
				"failed to encode metric to flatbuffer")
		}
		// keep track of all of the metric offsets so we can build the
		// MetricList Metrics Vector
		offsets = append(offsets, mOffset)
	}
}


For remote read capabilities, the Prometheus protobuf request contains a query which we extract from the request payload. We will take each query matcher from the request and fold them all into a single stream tag query for IRONdb, as shown here:


switch m.Type {
case prompb.LabelMatcher_EQ:
	// query equal
	tag := fmt.Sprintf(`b"%s":b"%s"`, matcherName, matcherValue)
	streamTags = append(streamTags, tag)
case prompb.LabelMatcher_NEQ:
	// query not equal; how the tag is negated and added to the query
	// is not shown in this excerpt
	tag := fmt.Sprintf(`b"%s":b"%s"`, matcherName, matcherValue)
case prompb.LabelMatcher_RE:
	// query regular expression
	tag := fmt.Sprintf(`b"%s":b/%s/`, matcherName, matcherValue)
	streamTags = append(streamTags, tag)
case prompb.LabelMatcher_NRE:
	// query not regular expression; negation is likewise not shown
	tag := fmt.Sprintf(`b"%s":b/%s/`, matcherName, matcherValue)
}


After we retrieve all of the matched metric names from the IRONdb find tags API, we perform a rollup request against IRONdb, encode the results as a Prometheus QueryResult set, and return them to the caller.

Getting Started

The repository includes a quick guide to getting started, with everything you need to know to build and run the adapter. A Makefile is included, which will perform all the build tasks you will need in order to build the service. After building, you can run the Prometheus Adapter either directly or through Docker.


Configuration Screenshot


You’ll be able to modify the behavior of the IRONdb Prometheus adapter by modifying the “prometheus.yml” config file. The config file specifies how often to read and write to stores and defaults to scraping every 15 seconds, which may not fit your needs.
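As a rough sketch, the relevant portion of a prometheus.yml might look like the fragment below. The keys scrape_interval, remote_write, remote_read, and url are standard Prometheus configuration settings, but the adapter host, port, and paths are placeholders; consult the adapter repository for the actual endpoint URLs.

```yaml
# prometheus.yml (fragment)
global:
  scrape_interval: 15s   # raise or lower this to suit your workload

# Placeholder URLs: substitute the address and paths your adapter exposes.
remote_write:
  - url: "http://<adapter-host>:<port>/<write-path>"
remote_read:
  - url: "http://<adapter-host>:<port>/<read-path>"
```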

What’s Next?

The IRONdb Prometheus Adapter opens the door for Prometheus to integrate with the suite of alerting and analytics tools in the Circonus monitoring platform. Once data is stored in IRONdb, it’s a simple matter to set up Prometheus to integrate with Circonus. We’re working towards this goal right now.

Later, once we’re finished with the Beta, we’ll release the Adapter for use with all on-premise IRONdb installations.

You can learn more about IRONdb here or contact us to participate in the IRONdb Prometheus Adapter Beta.

Grafana heatmaps with IRONdb have landed

GrafanaCon EU 2018 Recap

A couple of weeks ago at GrafanaCon EU 2018 we announced the beta release of the IRONdb Data Source for Grafana. We’ve continued to make improvements to some features, such as the heatmap visualization of histogram data. In this blog post, we’ll show you how to use the IRONdb data source to produce these visualizations. We will start with the assumption that you already have IRONdb up and running; if not, the installation instructions are here. If you don’t want to install IRONdb, and instead want to try out the free hosted IRONdb version on Circonus, just keep reading to see the hosted example; we’ve got you covered!

Data Source Installation

The first order of business is to get the data source installed. You’ll need Grafana v5.0 or 4.6.3 installed as a prerequisite, as it contains a number of updates needed for rendering the heatmap visualization. The IRONdb data source can be found here on GitHub. As with most Grafana plugins, the code is installed in /var/lib/grafana/plugins, and a server restart makes the data source available in the UI. Follow the data source configuration instructions, and you should have the IRONdb datasource installed on the Grafana host.

Data Source Configuration

Hosted or Standalone

Your data source should look something like this; note that this is an example using the Circonus API (URL is set to https://api.circonus.com). If you don’t want to install IRONdb, you can create a Circonus account, grab an API token, and set up a hosted instance. Select IRONdb for the Type field under settings. Enter your IRONdb cluster URL in the URL field (https://api.circonus.com for hosted, something like http://localhost:8112 for standalone). You’ll want proxy set under the Access field, since direct mode is not supported yet (this means requests to IRONdb are proxied through Grafana).

Auth and Advanced HTTP Settings

No changes are needed here from the default.

IRONdb Details

The rest of the configuration is specifying hosted or standalone under the installation type, and entering in the API Token.

For standalone IRONdb installation:

  • Set the IRONdb Type field to standalone.
  • Set the Account ID field to the value set in your irondb.conf file.
  • Set the Query Prefix field to the root value of your metrics namespace for the metrics selector.

For hosted IRONdb installation:

  • Set the IRONdb Type field to hosted.
  • Enter the API Token from the API Token Page in your Circonus account.
  • You will not need to make any change to the Query Prefix setting unless you are collecting your own custom metrics (for example, via statsd).

Save & Test

Click to save the configuration and test the datasource; if it is working, you’ll see the “Data source is working” status message. If not, revisit the values you entered. Feel free to reach out to us at the Circonus Labs Slack #irondb channel if you have questions or problems you can’t resolve.

Collecting Histogram data

If you are an existing IRONdb user who has histogram metrics already available, you can go to the next step. If not, you’ll need to get histogram data into your instance. To generate a meaningful heatmap, you’ll likely want to be using data that represents latency or a duration, such as HTTP request duration.

For standalone IRONdb installations, see the IRONdb documentation on how to write histogram metrics.

For hosted IRONdb installations, modify the metric type on an existing check you have (Integrations -> Checks) by clicking the histogram icon.

Creating the Heatmap Panel

Panel Creation

In your Grafana instance, click the + sign on the left nav, then select Heatmap from the grid.

Data Source Selection

Select Edit at the top of the panel, and then under the Metrics tab, select your IRONdb data source from the drop down. Then to create a new metric, click the Histogram and CAQL boxes, and click the hamburger menu to the right and select Toggle Edit Mode.

Add Metrics

Now enter the check uuid and the metric name in the following CAQL (Circonus Analytics Query Language) format:

metric:histogram("<check_uuid>", "<metric_name>")

You can also merge multiple histogram metrics. If the upper left of the panel shows a red exclamation point, click the Query Inspector to debug the issue. Again, feel free to reach out to us at the Circonus Labs Slack #irondb channel if you don’t see data – this step isn’t always easy to get right the first time.

histogram:merge{
    metric:histogram("<check_uuid>", "<metric1_name>"),
    metric:histogram("<check_uuid>", "<metric2_name>")
}

Setup Axes, Data format, and Colors

Select Time series buckets for Data Format, and select a Y-Max value so that you can see data on your display on the right scale.

On the Display tab, select spectrum for Mode, and Spectral for Scheme. Click the Show legend box to display the legend and range.


Let there be data!

At this point you should see a display something like this. There is a wealth of data encoded in this visualization of load balancer request latency. The band of red centers around the median response time of 500 nanoseconds, and is quite consistent across the map. This is an excellent visualization of the health of services, and is the D component in our RED dashboard. One thing to note here is that you can display the aggregate performance of an entire cluster, which is something that you can’t do with most other TSDB based tools. Other implementations can give you quantiles for individual hosts, but unless you are storing the telemetry data as a histogram (vs a quantile), you can’t calculate the cluster wide quantiles.


Other graphs

You can also display standard line graphs that have been the mainstay of monitoring displays. Here is a graph of load balancer request rates. I used the following CAQL statements to generate this by taking the rate of the requests serviced by each load balancer. Note that this is done on IRONdb and not in Grafana because we are using CAQL. We can still use the is_rate() function with the metric selector, but this is more efficient.

    metric:histogram("<check_uuid>", "<metric1_requests>"),
    metric:histogram("<check_uuid>", "<metric2_requests>"),
    metric:histogram("<check_uuid>", "<metric_n_requests>")
} | histogram:rate()

Metrics Selector

The regular Grafana metrics selector is also available with the IRONdb data source, so you can select metrics via this standard interface. Leave the CAQL box unchecked for now; that’s for manually specifying the queries shown above.

Take it for a spin

We hope you enjoyed the walkthrough of the capabilities of our IRONdb Data Source. We’ll be adding some cool new features to this plugin over the next few months. You can keep up to date with the latest on this data source by following IRONdb on Twitter.

Announcing the beta release of the IRONdb Grafana Data Source

Our team is at GrafanaCon 2018 in Amsterdam this week, and we are pleased to share the news of the beta release of our IRONdb Grafana Data Source. IRONdb has been in production at some of the largest technology companies in the world for nearly a year now. The release of this data source enables existing Grafana users to simplify their TSDB operational workloads and expand their metrics capacity with IRONdb.



Click here to join the wait list and be one of the first to access the new Grafana Data Source plugin when it becomes available.


The IRONdb Grafana Data Source also allows users to unlock the power of Grafana’s new Histogram and Heatmap visualizations. IRONdb is the only time series database to store histogram data, which allows users to access a wealth of metric metadata other than just aggregated percentiles and averages. To allow users to get up and running with these visualizations, we will be open sourcing our IRONdb RED Grafana dashboard. RED (Rate, Errors, Duration) dashboards surface all the crucial metrics needed to visualize microservice health.



In addition, we are pleased to announce that we will be open sourcing our IRONdb USE Grafana dashboard. USE (Utilization, Saturation, Errors) dashboards surface all the crucial metrics needed to visualize system host health.



Introducing IRONdb

Software is eating the world. Devices that run that software are ubiquitous and multiplying rapidly. Without adequate monitoring on these services, operators are mostly flying blind, either relying on customers to report issues or manually jumping on boxes and spot checking. Centralized collection of telemetry data is becoming even more important than it ever was. It is becoming significantly harder to monitor the volume of telemetry generated by the sheer number of devices and the increasingly elastic architecture of modern infrastructure. On top of this, your telemetry data store needs to be extremely reliable and performant in today’s world. The last thing you need while researching and diagnosing a production issue is for the stethoscope into your system to break down or introduce debilitating delays to diagnosing the problem. In a lot of ways, it is the most important piece of infrastructure you can run.

Circonus has been delivering a Time Series Database (TSDB) to our on-prem customers for many years, specifically architected to be reliable, scalable, and fast. Many of these on-prem customers run large installations with millions of time-series and Circonus’s TSDB has been meeting these demands for years. What has been missing up until today was the ability to ingest other sources of data, and to interoperate with other data collection tools and monitoring systems already in place.

Today, starting with Graphite, that is no longer a barrier.


IRONdb is Circonus’s internally developed TSDB, now with extensions to support the Graphite data format on ingestion and with interoperability with Graphite-web (and Grafana by extension). It is a drop-in solution for organizations struggling to scale Graphite or frustrated with maintaining a high availability metrics infrastructure during surges and outages and even routine maintenance. Highly scalable and robust, IRONdb has support for replication and clustering for redundancy of data, is multi-data center capable, and comes with a full suite of administration tools.


  • Replication – Don’t lose data during routine maintenance or outages.
  • Multi-DC Capabilities – Don’t lose data if Amazon has an outage in us-east-1.
  • Performance – It’s fast to write and equally fast to read.
  • Data Robustness and compression – Keep more data and don’t worry about corruption.
  • Administration tools – View system health and show op latencies.
  • Graphite-web interoperability – Plug it right into your existing tooling.

Replication

One of the major issues with data management in Graphite or other time series databases is the management of nodes. If you lose a node or have to take it down intentionally for maintenance, what happens to all your data during this outage? Sadly, many TSDB users simply live with the outage, and any poll-based alerts that happen to trigger are treated as just another quirk of the system.

IRONdb completely sidesteps this problem by keeping multiple copies of your data in the cluster of IRONdb nodes. As data arrives, we store it on the local node if a consistent hash of the incoming metric name determines that it belongs there, and we also journal it out to other nodes based on the number of copies you have configured. A background thread then replicates this journaled data to the other nodes. You can see and hear more details about this process in the talk by Circonus Founder and CEO Theo Schlossnagle on the Architecture of an Analytics and Storage Engine for Time Series Data. Theo and I discuss more TSDB design in this video:

The HTTP POST of data to the Graphite handler in IRONdb is guaranteed to not respond with a 200 OK until data has both been committed locally and written to the journals for the other nodes it belongs on.
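The write path described above can be sketched roughly as follows. This is a simplified illustration in Python and not IRONdb's actual hashing or journaling code. The `Ring` class, the `route_write` helper, the node names, and the virtual-node count are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class Ring:
    """Toy consistent hash ring with virtual nodes (hypothetical, not IRONdb's)."""

    def __init__(self, nodes, vnodes=64):
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [point[0] for point in self._points]
        self._node_count = len(set(nodes))

    def owners(self, metric_name, copies):
        """Walk clockwise from the metric's hash, collecting distinct nodes."""
        copies = min(copies, self._node_count)
        if copies <= 0:
            return []
        start = bisect.bisect(self._keys, _hash(metric_name))
        owners = []
        for offset in range(len(self._points)):
            node = self._points[(start + offset) % len(self._points)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == copies:
                    break
        return owners


def route_write(ring, local_node, metric_name, copies):
    """Decide what the receiving node does with an incoming data point.

    Returns (store_locally, journal_to). In this sketch, a caller would only
    acknowledge the write after the local commit (if any) and every journal
    append have succeeded.
    """
    owners = ring.owners(metric_name, copies)
    store_locally = local_node in owners
    journal_to = [node for node in owners if node != local_node]
    return store_locally, journal_to


ring = Ring(["n1", "n2", "n3", "n4"])
owners = ring.owners("web01.cpu.user", copies=2)
print(owners, route_write(ring, owners[0], "web01.cpu.user", copies=2))
```

Because placement depends only on the metric name and the node list, any node can compute where a metric belongs without consulting a central directory.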

The Graphite network listener cannot make these same claims because there is no acknowledgement mechanism in a plain network socket. It is provided for interoperability purposes, but keep in mind you can lose data if you employ this method of ingestion.
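For reference, the Graphite plaintext protocol is one line per data point in the form `path value timestamp`. The sketch below, in Python, formats such lines and writes them to a socket. The host and port are placeholders you would replace with your own listener address. Note that nothing comes back over the wire, which is exactly why this path cannot offer the guarantee the HTTP handler does.

```python
import socket
import time


def format_plaintext(path, value, timestamp=None):
    """Build one Graphite plaintext line: '<path> <value> <epoch seconds>\\n'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_plaintext(host, port, lines):
    """Fire-and-forget send.

    sendall() returning only means the bytes were handed to the local
    network stack, not that the server parsed or stored them.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


line = format_plaintext("web01.cpu.user", 42.5, 1500000000)
print(line, end="")
# send_plaintext("irondb.example.com", 2003, [line])  # hypothetical host and port
```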

Multi Data Center Capabilities

As an extension of replication, IRONdb can be deployed in a sided configuration which makes it aware that a piece of data must reside on nodes on both sides of the topology.

A piece of data which was destined for n2-2 in the above diagram would be guaranteed to be replicated to a node on the other side of the ring in “Availability Zone 2”. By setting up your IRONdb cluster in this way, you could lose an entire availability zone and still have all of your data available for querying and your cluster available for ingesting. When the downed nodes come back online, the journaled data that has been waiting for them is then replicated in the background until they catch back up.
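One way to picture sided placement is the Python sketch below. It is a guess at the general idea rather than IRONdb's placement algorithm, and the `sided_owners` function, the node names, and the side labels are hypothetical. Given candidate nodes in ring order, it picks the requested number of copies and, if they all landed on one side, swaps the last one for the next candidate on the other side.

```python
def sided_owners(candidates, side_of, copies):
    """Pick replica nodes so that at least two sides are covered when possible.

    candidates: nodes in ring-walk order for a given metric
    side_of:    mapping of node name -> side label
    copies:     number of replicas to keep
    """
    chosen = list(candidates[:copies])
    if copies < 2 or not chosen:
        return chosen
    sides = {side_of[node] for node in chosen}
    if len(sides) == 1:
        # Every replica landed on one side: trade the last one for the
        # first remaining candidate that lives on a different side.
        for node in candidates[copies:]:
            if side_of[node] not in sides:
                chosen[-1] = node
                break
    return chosen


side_of = {"n1-1": "az1", "n1-3": "az1", "n2-2": "az2", "n2-4": "az2"}
print(sided_owners(["n1-1", "n1-3", "n2-2", "n2-4"], side_of, copies=2))
print(sided_owners(["n2-2", "n1-1", "n1-3", "n2-4"], side_of, copies=2))
```

In the first call both natural owners sit in az1, so the second copy is moved to n2-2. In the second call the natural owners already span both sides and are left alone.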

Since IRONdb is a distributed database, it would not be complete if you had to know where the data was to ask for it. You can ask any node in the IRONdb cluster for a time series and range; it will satisfy as much of the query as it can from local data for performance reasons and proxy to other nodes for time series that don’t live on that node. Keep in mind that in sided configurations where one side is geographically distant, this can incur a speed-of-light penalty for data fetches. We plan to address this weakness by preferring local replicas when they exist, but for now, if your data centers are far apart and you use a sided config, you will likely pay the speed-of-light RTT to proxy data fetches.

Performance

There are lies, damned lies, and benchmarks, as the saying goes. I encourage you to take all of this with a grain of salt and to test for yourself. In the wild, the numbers you can achieve with IRONdb depend on your hardware, the replication factor of data within the cluster, and what your data actually looks like on the way in.

With that caveat out of the way, InfluxData has created a nice benchmark suite to compare its time series database to other popular solutions in the open source world. This isn’t an exact analog to IRONdb because IRONdb does not yet support stream-based tags (that is coming soon), but the test of ingestion speed with a fixed cardinality can be mimicked. Basically, I synthesized a Graphite metric name from the unique set of tags + fields + measurement name that the Influx benchmark suite uses. This ensured that IRONdb was ingesting the same unique set of metrics that InfluxDB, Cassandra, or OpenTSDB would ingest in the same test.
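The name synthesis could look something like the Python sketch below. The exact scheme used for the benchmark is not spelled out here, so the component order, the separator handling, and the sample tag values are assumptions. The only property that matters is that each unique measurement + tag set + field combination maps to exactly one Graphite name.

```python
def graphite_name(measurement, tags, field):
    """Flatten an Influx-style series key into a dotted Graphite metric name.

    Tags are sorted by key so the same tag set always yields the same name,
    regardless of the order the tags arrived in.
    """
    def clean(part):
        # Dots separate path components in Graphite and spaces separate
        # fields in the plaintext protocol, so neither may appear in a part.
        return str(part).replace(".", "_").replace(" ", "_")

    parts = [clean(measurement)]
    for key in sorted(tags):
        parts.append(clean(key))
        parts.append(clean(tags[key]))
    parts.append(clean(field))
    return ".".join(parts)


name = graphite_name(
    "cpu", {"region": "us-west-1", "hostname": "host_0"}, "usage_user"
)
print(name)
```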

In the InfluxData comparison of InfluxDB vs. Cassandra, Cassandra achieved 100K metrics ingested per second and InfluxDB achieved 470K metrics per second. I repeated this test using IRONdb:

The Cassandra and InfluxDB numbers were pulled from their original post. I did not repeat the benchmark for these other two databases on this hardware.

IRONdb (single node) is on the same scale as InfluxDB for the same ingestion set, and maybe slightly faster. There is no information in the original InfluxData post about the hardware its test was run on. This initial test used a remote sender sending data to IRONdb over HTTP. I then repeated the test, sending data from localhost on the IRONdb node itself:

When eliminating the roundtrip penalty of the benchmark test suite, IRONdb goes significantly faster.

I ran the IRONdb test on a zone on a development box with the following configuration:


root@circonus:/root# psrinfo -vp
The physical processor has 8 cores and 16 virtual processors (0-7 16-23)
  The core has 2 virtual processors (0 16)
  The core has 2 virtual processors (1 17)
  The core has 2 virtual processors (2 18)
  The core has 2 virtual processors (3 19)
  The core has 2 virtual processors (4 20)
  The core has 2 virtual processors (5 21)
  The core has 2 virtual processors (6 22)
  The core has 2 virtual processors (7 23)
    x86 (GenuineIntel 306E4 family 6 model 62 step 4 clock 2600 MHz)
      Intel(r) Xeon(r) CPU E5-2650 v2 @ 2.60GHz
The physical processor has 8 cores and 16 virtual processors (8-15 24-31)
  The core has 2 virtual processors (8 24)
  The core has 2 virtual processors (9 25)
  The core has 2 virtual processors (10 26)
  The core has 2 virtual processors (11 27)
  The core has 2 virtual processors (12 28)
  The core has 2 virtual processors (13 29)
  The core has 2 virtual processors (14 30)
  The core has 2 virtual processors (15 31)
    x86 (GenuineIntel 306E4 family 6 model 62 step 4 clock 2600 MHz)
      Intel(r) Xeon(r) CPU E5-2650 v2 @ 2.60GHz


       0. c0t5000CCA05CCEDCDDd0 
       1. c0t5000CCA05CCF8421d0 
       2. c0t5000CCA07313BA35d0 
       3. c0t5000CCA073109B75d0 
       4. c0t5000CCA073111CCDd0 
       5. c0t5000CCA073156BFDd0 
       9. c3t55CD2E404C53549Bd0 

Configured in a 6-way stripe under ZFS with an L2ARC on a single SSD drive:

root@circonus:/root# zpool status data
  pool: data
 state: ONLINE
  scan: none requested

        NAME                     STATE     READ WRITE CKSUM
        data                     ONLINE       0     0     0
          c0t5000CCA07313BA35d0  ONLINE       0     0     0
          c0t5000CCA073156BFDd0  ONLINE       0     0     0
          c0t5000CCA073111CCDd0  ONLINE       0     0     0
          c0t5000CCA073109B75d0  ONLINE       0     0     0
          c0t5000CCA05CCEDCDDd0  ONLINE       0     0     0
          c0t5000CCA05CCF8421d0  ONLINE       0     0     0
          c3t55CD2E404C53549Bd0  ONLINE       0     0     0

The CPUs on this box were mostly bored during this ingestion test:

This is a 16-core box with hyper-threading on, so there are 32 vCPUs to address here. The IRONdb process (called snowthd here) is using about 5 of the cores, mostly for parsing the incoming ASCII text. The drives are pretty busy, though:

Data Robustness and Compression

IRONdb runs on OmniOS. OmniOS uses ZFS as its file system. ZFS has many amazing features that keep your data safe and small. A few of the important ones:

“A 2012 research showed that neither any of the then-major and widespread filesystems (such as UFS, Ext,[12] XFS, JFS, or NTFS) nor hardware RAID (which has some issues with data integrity) provided sufficient protection against data corruption problems.[13][14][15][16] Initial research indicates that ZFS protects data better than earlier efforts.[17][18] It is also faster than UFS[19][20] and can be seen as its replacement.” – Wikipedia

“In addition to handling whole-disk failures, … can also detect and correct silent data corruption, offering “self-healing data”: when reading a … block, ZFS compares it against its checksum, and if the data disks did not return the right answer, ZFS reads the parity and then figures out which disk returned bad data. Then, it repairs the damaged data and returns good data to the requestor.” – Wikipedia

“ZFS is a 128-bit file system,[31][32] so it can address 1.84 × 10^19 times more data than 64-bit systems such as Btrfs. The limitations of ZFS are designed to be so large that they should never be encountered in practice. For instance, fully populating a single zpool with 2^128 bits of data would require 10^24 3 TB hard disk drives.[33]” – Wikipedia

  • Compression

“Transparent filesystem compression. Supports LZJB, gzip[55] and LZ4.” – Wikipedia

In the above ingestion test, the resultant data occupied 5 bytes per data point due to LZ4 compression enabled on the zpool that the data was written to.

Our CEO has written about this before.
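To get a feel for what 5 bytes per data point means for capacity planning, here is a back-of-the-envelope calculation in Python. The ingest rate and retention period are example inputs, not measurements, and the estimate ignores replication, indexes, and any rollups.

```python
SECONDS_PER_DAY = 86_400


def storage_bytes(points_per_second, retention_days, bytes_per_point=5):
    """Estimate on-disk bytes for raw data points at a steady ingest rate."""
    return points_per_second * SECONDS_PER_DAY * retention_days * bytes_per_point


# Example inputs: 100K points per second kept for 30 days
estimate = storage_bytes(100_000, 30)
print(f"{estimate / 1e12:.3f} TB")
```

At those example inputs the estimate comes to about 1.3 TB per copy of the data, so multiply by the number of copies your cluster keeps.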

Administration tools

This admin UI gives you introspection into what each IRONdb node is doing. A thorough discussion of the Admin UI would require its own blog post, but at a high level you can see:

  • Current ingest rates and storage space info
  • Replication latency among nodes in the cluster
  • The topology of the cluster
  • Latency measurements of almost every single operation we perform in the database on the “Internals” tab

This last one also provides a nice histogram visualization of the distribution of each operation. Here is an example of the distribution of write operations for the Influx-data ingestion benchmark above:

The 50th to 75th percentile band of write operation latency for batch PUTS to IRONdb was between 570µs and 920µs

Graphite-web Interoperability

IRONdb is compatible with Graphite-web 0.10 and above. We have written a storage finder plugin for use with Graphite-web. Simply install this plugin and configure it to point to one or more of your IRONdb nodes and you can render metrics right from Graphite-web:

Or use Grafana with the Graphite datasource:

There is also a Grafana Datasource in the works which will expose even more of the power of IRONdb, so stay tuned!


IRONdb is a replacement for Whisper and Carbon-cache in Graphite that’s faster, more efficient, and easier to operate and scale.

Read the IRONdb Documentation or click below to sign up and install IRONdb today!