Introduction

When running time series data processing systems, we often need to scale data ingestion. Two common scenarios are:

  1. Granular Monitoring: Shortening a 1-minute collection interval to 1 second to increase data density.
  2. System Performance Testing: Stress-testing a Time Series Database (TSDB) with massive data to find its limits.

Initially, I took a simple approach: why not just copy the existing data multiple times? However, this “simple” method caused unexpected problems. It forced me to rethink the challenge from a completely different angle.

This article shares my trial-and-error process and the improved architecture we built as a result.

The First Try: Simple Multiplication

Initial Requirement and Intuitive Solution

Current: Low-density metric data (e.g., N-minute intervals)
Target: High-density metric data (e.g., M-second intervals)

When faced with this, my first thought was to simply “copy the existing data by the required multiplier.”

// Initial implementation: Simple multiplication
public void processData(List<Metric> metrics) {
    List<Metric> expandedMetrics = new ArrayList<>();
    
    for (Metric metric : metrics) {
        // Generate data by the target multiplier
        for (int multiplier = 0; multiplier < TARGET_MULTIPLIER; multiplier++) {
            Metric duplicated = metric.copy();
            duplicated.adjustTimestamp(multiplier);
            expandedMetrics.add(duplicated);
        }
    }
    
    // By this point the list already holds every copy (X * N objects),
    // so the heap can be exhausted before the write loop even finishes.
    for (Metric expanded : expandedMetrics) {
        sink.write(expanded);
    }
}

Unexpected Problems

This simple code quickly revealed serious issues:

1. OutOfMemoryError (OOM)

Existing data volume: X
Multiplier: N
Result: X * N items loaded into memory simultaneously → OOM
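
To put rough numbers on this, the sketch below estimates how much heap is needed to hold every expanded copy at once. The input size (one million metrics) and the 200-byte average object size are illustrative assumptions, not measurements from our system, and the class and method names are hypothetical.

```java
// Back-of-the-envelope heap estimate for the "copy everything first" approach.
// All numbers below are illustrative assumptions, not measured values.
final class ExpansionMemoryEstimate {

    /** Bytes held at once when all X * N copies sit in one list. */
    static long bytesHeld(long metricCount, long multiplier, long bytesPerMetric) {
        return Math.multiplyExact(Math.multiplyExact(metricCount, multiplier), bytesPerMetric);
    }

    public static void main(String[] args) {
        long x = 1_000_000L;        // assumed number of input metrics (X)
        long bytesPerMetric = 200L; // assumed average size of one Metric object

        for (long n : new long[] {1, 10, 60}) {
            long bytes = bytesHeld(x, n, bytesPerMetric);
            System.out.printf("N=%d -> about %.1f GB held at once%n", n, bytes / 1e9);
        }
    }
}
```

Under these assumptions, a multiplier of 60 already requires about 12 GB for the list alone, before counting any other heap usage.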

2. Garbage Collection (GC) Pressure

  • The app created massive amounts of temporary objects.
  • This caused long GC pause times, which delayed the whole system.

3. Scalability Limitations

  • Memory usage grew in direct proportion to the multiplier (X × N), so every increase pushed us closer to the heap limit.
  • We couldn’t use this method for tests that required even larger data volumes.

These problems led me to ask a fundamental question: “Is this approach truly practical?”

Analyzing the Problem and Finding a New Design

Limitations of the Existing Method

Why did the first approach fail? Here are the main reasons:

  1. Memory-centric thinking: We tried to load all data into memory before processing it.
  2. Batch processing mindset: We treated continuous streaming data as a single batch.
  3. Lack of realism: The design didn’t match how a real-world production environment works.

The key insight: “The code wasn’t the problem. The way we thought about the data was the problem.”

Root Cause Analysis: Two Different Test Objectives

Looking closer, our initial requirement actually hid two different test objectives:

1. The M-Dimension: Increasing Data Density

Objective: More granular monitoring.
Method: 1-minute intervals → 1-second intervals (temporal refinement).
Impact: More data points for the *same* time series.
Test Target: TSDB write throughput and storage capacity limits.

2. The N-Dimension: Increasing Cardinality

Objective: System performance testing (measuring TSDB limits).
Method: Diversifying unique identifiers (e.g., server IDs, instance IDs).
Impact: An N-fold increase in the number of unique time series, i.e., N-fold cardinality.
Test Target: TSDB performance for indexing, metadata processing, and label-based queries.

Cardinality = The number of unique time series (a combination of metric name + labels).
※ The timestamp does not affect cardinality.
※ Different unique identifiers are treated as distinct time series.
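
The definition above can be made concrete with a small counter. This is a minimal sketch, assuming a series is identified by its metric name plus its sorted labels; the `Sample` class is a hypothetical stand-in for the `Metric` type used elsewhere in this article.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Minimal sketch: a series is identified by metric name + labels, never by timestamp.
final class CardinalityCounter {

    /** Hypothetical stand-in for the article's Metric type. */
    static final class Sample {
        final String name;
        final Map<String, String> labels;
        final long timestampSeconds;

        Sample(String name, Map<String, String> labels, long timestampSeconds) {
            this.name = name;
            this.labels = labels;
            this.timestampSeconds = timestampSeconds;
        }
    }

    /** Builds a canonical series key; labels are sorted so their order does not matter. */
    static String seriesKey(Sample s) {
        return s.name + new TreeMap<String, String>(s.labels).toString();
    }

    /** Cardinality = number of distinct series keys among the samples. */
    static int cardinality(List<Sample> samples) {
        Set<String> keys = new HashSet<>();
        for (Sample s : samples) {
            keys.add(seriesKey(s)); // the timestamp is deliberately ignored
        }
        return keys.size();
    }
}
```

Two samples of `cpu_usage{server="web01"}` at different timestamps count as one series, while adding a sample for `server="web02"` raises the count to two.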

I failed initially because I didn’t understand this distinction. I only focused on “increasing data volume.”

This realization demanded a new approach. I needed to drop the “load everything into memory” mindset. Instead, I needed to embrace stream processing while handling both the M and N dimensions.

I decided to tackle the more complex M-dimension (increasing data density) first. I looked at two alternative approaches.

New Approaches for the M-Dimension: Memory Holding vs. Immediate Transformation

Note: The N-dimension (increasing cardinality) is easy to implement. Each processing engine simply adds its own unique identifier label to the data it handles. Therefore, we will focus on the harder M-dimension here.

Approach 1: The Memory Holding Method

My first idea was to distribute the data perfectly over time.

// Pseudocode: Memory Holding Method
public class WindowedProcessor {
    // Holds every metric received during the current 1-minute window
    private final List<Metric> buffer = new ArrayList<>();
    
    public void process(Metric metric) {
        // Collect data for a 1-minute window
        buffer.add(metric);
        
        if (windowComplete()) {
            // Dispatch at precise intervals (e.g., every 1 second)
            sendAt("14:00:00", createMetrics(buffer, 0));
            sendAt("14:00:01", createMetrics(buffer, 1));
            sendAt("14:00:02", createMetrics(buffer, 2));
            // ... (interval is configurable)
        }
    }
}

Advantages:

  • It perfectly copies the time distribution pattern of high-density monitoring.
  • It provides a very accurate simulation.

Disadvantages:

  • High memory usage: O(M × time_window).
  • It requires complex timers and state management.
  • We risk a memory explosion when we scale the N-dimension later.

Approach 2: The Immediate Transformation Method

// Pseudocode: Immediate Transformation Method
public class ImmediateProcessor {
    public void process(Metric metric) {
        LocalDateTime baseTime = metric.getTimestamp()
            .truncatedTo(ChronoUnit.MINUTES);
        
        // Immediately transform one metric into M metrics upon arrival
        for (int offset = 0; offset < INTERVAL_SECONDS; offset += TARGET_INTERVAL) {
            Metric transformed = metric.copy();
            transformed.setTimestamp(baseTime.plusSeconds(offset));
            emit(transformed);
        }
    }
}

Advantages:

  • Memory efficient: O(M).
  • Simple to code.
  • Scales very well.

Disadvantages:

  • It creates a sudden burst of data instead of spreading the load evenly over time.
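
A self-contained version of the pseudocode above might look like the following. It is a sketch under stated assumptions: `Point` is a hypothetical immutable stand-in for `Metric`, the source window is fixed at 60 seconds, and the downstream `Consumer` plays the role of `emit`.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.function.Consumer;

// Sketch of the Immediate Transformation method with a pluggable downstream sink.
final class ImmediateExpander {

    /** Hypothetical immutable stand-in for the article's Metric type. */
    static final class Point {
        final String series;
        final Instant timestamp;
        final double value;

        Point(String series, Instant timestamp, double value) {
            this.series = series;
            this.timestamp = timestamp;
            this.value = value;
        }

        Point withTimestamp(Instant newTimestamp) {
            return new Point(series, newTimestamp, value);
        }
    }

    private static final int WINDOW_SECONDS = 60; // assumed 1-minute source interval

    private final int targetIntervalSeconds;
    private final Consumer<Point> downstream;

    ImmediateExpander(int targetIntervalSeconds, Consumer<Point> downstream) {
        if (targetIntervalSeconds <= 0 || WINDOW_SECONDS % targetIntervalSeconds != 0) {
            throw new IllegalArgumentException("target interval must evenly divide 60 seconds");
        }
        this.targetIntervalSeconds = targetIntervalSeconds;
        this.downstream = downstream;
    }

    /** Expands one input into M = 60 / targetIntervalSeconds points and emits each at once. */
    void process(Point input) {
        Instant base = input.timestamp.truncatedTo(ChronoUnit.MINUTES);
        for (int offset = 0; offset < WINDOW_SECONDS; offset += targetIntervalSeconds) {
            downstream.accept(input.withTimestamp(base.plusSeconds(offset)));
        }
    }
}
```

Passing the sink in as a `Consumer` keeps the expander free of buffering decisions: whether output goes to a message topic or a test list is the caller's choice.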

The Critical Insight: “The Essence of Streaming”

While debating between the two approaches, I had an “Aha!” moment.

How Does Real-World Data Arrive?

I thought about how data actually arrives in a real monitoring environment.

Data flow in high-density monitoring:
Actual Arrival Time   Metric Data Content
──────────────────────────────────────────
12:00:03              server1_cpu (timestamp: 12:00:00)
12:00:07              server2_cpu (timestamp: 12:00:00)  
12:00:13              server1_cpu (timestamp: 12:00:10)
12:00:15              server3_cpu (timestamp: 12:00:00)
12:00:17              server2_cpu (timestamp: 12:00:10)
12:00:23              server1_cpu (timestamp: 12:00:20)
...

→ Characteristic: Each server sends data at its own, uncoordinated time.
→ Result: Data arrives as a continuous, irregular stream.

The Key Insight:

“In a real streaming environment, data arrives individually and continuously. Is there any good reason to buffer it and process it as a batch?”

This changed everything. In the real world:

  • Metrics arrive one by one.
  • They naturally spread out over time.
  • The system handles them as a stream, not a bulk batch.

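Even though the Immediate Transformation method creates a small burst of M points for each input, those bursts are triggered by individual arrivals, which are already spread out over time. As a result, it actually mirrored real-world streaming much better than the buffering method!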

Detailed Technical Review

Checking Memory Usage

Let’s look closely at the memory usage of the Immediate Transformation method.

public void process(Metric input) {
    // M objects are created momentarily
    for (int i = 0; i < M; i++) {
        Metric transformed = input.copy(); // M objects
        emit(transformed);
    }
}

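The conclusion? It allocates O(M) objects per input metric, but it emits each one immediately and keeps no reference to it. As long as emit() does not retain the copies, only one of them needs to be live at a time, and the garbage collector can reclaim the rest as short-lived garbage. This is vastly more efficient than holding O(M × time_window) data in memory.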

Checking Data Volume Impact

Someone asked me, “If we increase M, does that also increase cardinality?”

The answer is no: Changing the timestamp interval does not increase cardinality.

Cardinality only depends on the metric name + labels.
The timestamp does not change cardinality.

Example:
- cpu_usage{server="web01"} → 1 time series
- If we collect this every 10 seconds instead of 1 minute, it is still only 1 time series.

However, increasing M definitely increases the number of data points, which affects our storage and raw write performance.
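
The effect on data points is simple arithmetic, sketched below. The class and method names are hypothetical; the only inputs are the collection intervals already mentioned in this article.

```java
// Worked arithmetic: how the collection interval changes points per series per day.
final class DataPointMath {

    static final long SECONDS_PER_DAY = 86_400L;

    /** Points a single series produces per day at the given collection interval. */
    static long pointsPerDay(long intervalSeconds) {
        if (intervalSeconds <= 0 || SECONDS_PER_DAY % intervalSeconds != 0) {
            throw new IllegalArgumentException("interval must evenly divide one day");
        }
        return SECONDS_PER_DAY / intervalSeconds;
    }

    /** Density multiplier M when moving from a longer interval to a shorter one. */
    static long densityMultiplier(long sourceIntervalSeconds, long targetIntervalSeconds) {
        return pointsPerDay(targetIntervalSeconds) / pointsPerDay(sourceIntervalSeconds);
    }

    public static void main(String[] args) {
        for (long interval : new long[] {60, 10, 1}) {
            System.out.println(interval + "s interval -> "
                    + pointsPerDay(interval) + " points/day per series");
        }
    }
}
```

A single series goes from 1,440 points per day at a 1-minute interval to 8,640 at 10 seconds and 86,400 at 1 second, while remaining exactly one time series.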

Final Architecture Decision

Selected Method: Immediate Transformation

In the end, I chose the Immediate Transformation method.

flowchart LR
    A["Source Topic<br/>(Low-Density Metrics)"] --> B["Stream Processor<br/>(Transformation Engine)"]
    B --> |"Immediate M-fold Transformation<br/>(1 → M)"| C["Target Topic<br/>(High-Density Metrics)"]
    C --> D["TSDB<br/>(Time Series DB)"]

    subgraph "M-Dimension: Time Division"
        E["timestamp: T1<br/>metric{labels}: value"]
        E --> F["timestamp: T1<br/>metric{labels}: value"]
        E --> G["timestamp: T1+Δ<br/>metric{labels}: value"]
        E --> H["timestamp: T1+2Δ<br/>metric{labels}: value"]
        E --> I["..."]
    end

    B -.-> E

    style A fill:#e1f5fe
    style B fill:#f3e5f5
    style C fill:#e8f5e8
    style D fill:#fff3e0

How we scaled the N-Dimension: We ran processing engines in parallel.

  • Each engine: Created unique time series by altering label values.
  • Implementation: We added a unique identifier to the labels.
  • Example: metric{server="web01"} became metric{server="web01", metric_id="1"}, metric{server="web01", metric_id="2"}

Result: We generated N times the number of unique time series simply by changing the identifiers.

Why We Chose This

  1. Realism: The data flow matches a real-world streaming environment.
  2. Efficiency: It keeps memory usage low.
  3. Simplicity: We avoided complex state and timer management.
  4. Scalability: It runs smoothly even when we crank up the N-dimension.
  5. Test Objective: It perfectly hits our goal of generating massive data volumes.

The Integrated M×N Scaling Strategy

I built a two-dimensional strategy to control data volume and cardinality independently.

Why test both dimensions?

  • M-Dimension (Data Density): Tests pure data throughput. It measures how fast the TSDB can write data and how much disk space it uses.
  • N-Dimension (Cardinality): Tests the TSDB’s indexing and metadata engine. High cardinality usually breaks TSDBs faster than raw data volume.

We must separate them because a TSDB handles raw data points very differently than it handles unique time series indexes.

M-Dimension: Increasing Data Points

// M-Implementation: Timestamp division
LocalDateTime baseTime = metric.getTimestamp().truncatedTo(ChronoUnit.MINUTES);

for (int offset = 0; offset < INTERVAL_SECONDS; offset += TARGET_INTERVAL) {
    Metric transformed = metric.copy();
    transformed.setTimestamp(baseTime.plusSeconds(offset));
    emit(transformed); // Generates M times the data points
}

Effect:

  • Cardinality Impact: None (same metric + labels).
  • Data Volume Impact: M-fold increase.

N-Dimension: Increasing Cardinality

// N-Implementation: Diversifying time series via unique identifiers
public class TimeSeriesIdentifierTransformer {
    private final int metricId;
    
    public TimeSeriesIdentifierTransformer(int metricId) {
        this.metricId = metricId;
    }
    
    public Metric transform(Metric input) {
        Metric transformed = input.copy();
        
        // Add a new unique identifier to the labels
        transformed.addLabel("metric_id", String.valueOf(metricId));
        
        return transformed; // Generates N-fold cardinality
    }
}

Effect:

  • Cardinality Impact: N-fold increase (creates new time series).
  • Data Volume Impact: N-fold increase.

Putting It Together

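// M×N Integrated Implementation
// M = INTERVAL_SECONDS / TARGET_INTERVAL, matching the M-dimension loop shown earlier
LocalDateTime baseTime = metric.getTimestamp().truncatedTo(ChronoUnit.MINUTES);
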
for (int timeOffset = 0; timeOffset < M; timeOffset++) {
    for (int identifierOffset = 0; identifierOffset < N; identifierOffset++) {
        Metric transformed = metric.copy();
        
        // M: Refine the timestamp (increase data density)
        transformed.setTimestamp(baseTime.plusSeconds(timeOffset * TARGET_INTERVAL));
        
        // N: Diversify the unique identifier (increase cardinality)
        transformed.addLabel("metric_id", 
            String.valueOf(identifierOffset));
        
        emit(transformed); // Total data volume is M×N
    }
}

Final Effect:

Total Data Volume = Original Volume × M × N
Total Cardinality = Original Cardinality × N (M has no effect)
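
These two formulas can be checked with a small runnable sketch. It assumes a flattened `Point` (a series key plus an epoch-second timestamp) as a hypothetical stand-in for `Metric`, and a `Consumer` sink in place of `emit`.

```java
import java.util.function.Consumer;

// Sketch of the combined M x N fan-out for a single input metric.
final class MxNFanOut {

    /** Hypothetical flattened point: series key plus epoch-second timestamp. */
    static final class Point {
        final String seriesKey;
        final long epochSeconds;

        Point(String seriesKey, long epochSeconds) {
            this.seriesKey = seriesKey;
            this.epochSeconds = epochSeconds;
        }
    }

    /** Emits M x N points for one input: M timestamps for each of N metric_id values. */
    static void fanOut(String metricName, String server, long baseEpochSeconds,
                       int m, int n, int targetIntervalSeconds, Consumer<Point> sink) {
        for (int timeOffset = 0; timeOffset < m; timeOffset++) {
            // M: refine the timestamp (more points, same series)
            long timestamp = baseEpochSeconds + (long) timeOffset * targetIntervalSeconds;
            for (int id = 0; id < n; id++) {
                // N: a distinct metric_id label value creates a distinct series
                String key = metricName + "{server=\"" + server + "\",metric_id=\"" + id + "\"}";
                sink.accept(new Point(key, timestamp));
            }
        }
    }
}
```

Collecting the output for M=6 and N=3 yields 18 points spread across 3 distinct series keys, so volume scales with M × N while cardinality scales with N only.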

How This Actually Impacts the TSDB

Let’s look at exactly what happens to the database when we turn these dials.

Impact of M-Dimension Scaling

What it stresses:

  • Write I/O: The disks must write more data points per second.
  • Network Bandwidth: The network must transfer M times more data.
  • Disk Storage: The disk fills up M times faster.
  • Compression Efficiency: Because data points arrive closer together, compression often improves.

What we monitor:

- Write throughput (points/sec)
- Disk usage growth rate
- Memory usage for write buffers
- Query response time for time-range queries

Impact of N-Dimension Scaling

What it stresses:

  • Index Memory: The TSDB must create and hold indexes for new label combinations in RAM.
  • Metadata Management: The system does N times more work to discover and manage series.
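  • Label-based Search: Pattern-matching queries, such as a PromQL-style regex matcher like {server=~"web01_virtual_.*"}, become much slower because more series must be checked against the pattern.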
  • Aggregate Queries: GROUP BY operations must scan N times more series.

What we monitor:

- Memory usage for the series index
- Label query performance (milliseconds)
- Cardinality limit warnings
- Query planning time for complex aggregations

Integrated Load Testing Scenarios

With the M×N strategy, we can run targeted scenarios to find exact weaknesses:

Scenario 1: M=60, N=1  → High data density, existing cardinality
Scenario 2: M=1, N=100 → Existing density, high cardinality
Scenario 3: M=10, N=10 → Balanced load test

This lets us find out exactly which part of the TSDB breaks first.
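
To compare the scenarios before running them, a quick sizing calculation helps. The sketch below assumes a baseline of 1,000 series, each producing one point per minute; that baseline is illustrative only and not a figure from our system.

```java
// Rough sizing for the three scenarios above, under an assumed baseline.
final class ScenarioSizing {

    /** Unique series after scaling: only N multiplies cardinality. */
    static long series(long baseSeries, long n) {
        return Math.multiplyExact(baseSeries, n);
    }

    /** Points written per minute after scaling: both M and N multiply volume. */
    static long pointsPerMinute(long baseSeries, long m, long n) {
        return Math.multiplyExact(Math.multiplyExact(baseSeries, m), n);
    }

    public static void main(String[] args) {
        long baseSeries = 1_000L; // assumed baseline cardinality
        long[][] scenarios = { {60, 1}, {1, 100}, {10, 10} }; // each entry is {M, N}

        for (long[] s : scenarios) {
            System.out.printf("M=%d, N=%d -> %d series, %d points/min%n",
                    s[0], s[1], series(baseSeries, s[1]), pointsPerMinute(baseSeries, s[0], s[1]));
        }
    }
}
```

With this baseline, Scenarios 2 and 3 both write 100,000 points per minute, but Scenario 2 spreads them over 100,000 series and Scenario 3 over 10,000. Comparing the two isolates the cost of cardinality from the cost of raw volume.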

Lessons Learned

1. Understand the Real Goal

It was important to look past the basic request of “make more data.” The true goal was measuring the performance limits of the TSDB.

2. Realism Over Perfection

Building a system that behaves like the real world is much more valuable than building a theoretically “perfect” but unnatural simulation.

3. Weigh the Trade-offs

I learned to constantly weigh performance against complexity, and perfection against practicality.

4. Talk to Your Team

Discussing these ideas with colleagues helped me fix blind spots I never would have seen on my own.

5. Take Small Steps

Starting with a simple multiplication idea and slowly refining it worked much better than trying to design a massive, complex system on day one.

Conclusion

This experience taught me how vital it is to understand how data flows over time.

It’s easy to fall into the trap of over-engineering a solution. This process reminded me that the best development involves finding the core problem, matching the real-world environment, and validating ideas with your team.


Part 2 Preview

In Part 2, I’ll explain how we tried to implement this M×N strategy in a real Proof of Concept (PoC) environment.

  • Technology Selection: Why we compared Kafka Streams and Flink.
  • PoC Architecture Design: How we planned to build the M×N engine.
  • Validation: Did the “Immediate Transformation” approach actually work in practice?

Spoiler Alert: When I showed this to my team, their feedback led us to scrap the Flink idea entirely. We pivoted to a completely different, much simpler approach. Discover how teamwork turned a complex architecture into a pragmatic solution in Part 2.


All technical content in this article is based on actual production experience. Specific system names and configuration values have been generalized for security.
