Scaling Time Series Data Processing: A Pivot to Pragmatism (Part 2)

Introduction

In Part 1, I explained the “M×N” strategy for scaling time series data. My original idea was to build a dedicated stream processing layer using Flink or Kafka Streams. It was a clean, theoretically perfect design that kept data transformation completely separate from data ingestion.

However, a “theoretically perfect” design in a whiteboard session often clashes with the gritty reality of a production environment.

Before diving into the Proof of Concept (PoC) development, we conducted a rigorous architectural review. This review process proved to be incredibly valuable, leading us to completely scrap the Flink idea in favor of a brutally simple, highly pragmatic alternative.

This post details that architectural pivot. It highlights how critically evaluating operational constraints guided us away from a complex new system and toward an efficient, battle-tested solution.

The Architectural Review: Idealism vs. Operational Reality

During our design review, we stress-tested the Flink proposal against the realities of our infrastructure. As we analyzed the operational requirements, we quickly identified two critical risks:

Deployment and Operational Overhead: Building a new Flink job solely for a PoC meant provisioning new clusters, managing configurations, and maintaining a new deployment pipeline. This introduced significant delay before we could even begin our core task: testing the database.
Cross-IDC Network Latency: This was the dealbreaker. Our existing data writer service uses a custom parallel consumer specifically built to handle the severe network delays between our isolated data centers (IDCs). Standard Kafka consumers struggle in this environment. If we deployed Flink in a separate IDC, it would immediately hit this exact network bottleneck, invalidating our performance tests.

It became clear that introducing a new stream processing engine would force us to solve infrastructure and network problems before we could even test the Time Series Database (TSDB).

We needed a new direction.

The Pragmatic Pivot: Scaling the Existing Writer

To work within these constraints, we pivoted to a simpler and more practical alternative: why not embed the M×N transformation logic directly into our existing data writer service and scale that horizontally?

The concept was simple: Have each instance of the data writer service read the same raw data from Kafka, but configure each instance to generate a specific, non-overlapping slice of the final M×N data.
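One detail this design depends on: Kafka divides a topic's partitions among the members of a consumer group, so for every writer instance to see the same raw records, the instances cannot share a group. The sketch below shows one way that could be configured with a stock consumer. It is an assumption for illustration, not the configuration of the custom parallel consumer described above; the `WriterConsumerConfig` class, the broker address, and the group naming scheme are all hypothetical.

```java
import java.util.Properties;

// Hypothetical helper: builds per-instance consumer settings so that each
// writer instance receives the full stream rather than a share of it.
final class WriterConsumerConfig {

    // "bootstrap.servers" and "group.id" are standard Kafka consumer
    // configuration keys; the values used here are illustrative only.
    static Properties forInstance(String instanceName) {
        if (instanceName == null || instanceName.trim().isEmpty()) {
            throw new IllegalArgumentException("instanceName must not be blank");
        }
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka.example.internal:9092"); // assumed address
        // A distinct group per instance means the writers never split partitions between them.
        props.setProperty("group.id", "tsdb-poc-writer-" + instanceName);
        return props;
    }
}
```

Calling `WriterConsumerConfig.forInstance("a")` and `forInstance("b")` yields two different `group.id` values, so both instances would consume every record. Deserializer settings would still be needed before handing these properties to a real consumer.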

How It Works

Each writer instance handles both the M (time division) and N (cardinality expansion) dimensions internally. For example, one instance handles timestamps at 00s, 10s, and 20s. Another handles 30s, 40s, and 50s. Both instances also multiply the data to increase the cardinality (N) before writing it to the database.

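// Inside the existing Data Writer Service
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.List;

public class DataProcessor {
    private final List<Integer> assignedTimestampOffsets; // M: second-offsets this instance owns
    private final int cardinalityMultiplier;              // N: data volume expansion

    public DataProcessor(List<Integer> assignedTimestampOffsets, int cardinalityMultiplier) {
        this.assignedTimestampOffsets = assignedTimestampOffsets;
        this.cardinalityMultiplier = cardinalityMultiplier;
    }

    // Metric and emitToWriterQueue(...) belong to the existing writer service and are not shown here.
    public void process(Metric metric) {
        // Snap the source timestamp to the start of its minute
        LocalDateTime baseTime = metric.getTimestamp().truncatedTo(ChronoUnit.MINUTES);

        // M: Time division - emit one copy per assigned second-offset
        for (int offset : assignedTimestampOffsets) {
            // N: Cardinality expansion - emit each timestamp under N instance numbers
            for (int instanceOffset = 0; instanceOffset < cardinalityMultiplier; instanceOffset++) {
                Metric transformed = metric.copy();
                transformed.setTimestamp(baseTime.plusSeconds(offset));
                transformed.setInstanceNo(metric.getInstanceNo() + instanceOffset);

                emitToWriterQueue(transformed);
            }
        }
    }
}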

Configuration Example

// Writer Instance A: handles 00s, 10s, 20s with 6x cardinality
DataProcessor writerA = new DataProcessor(
    Arrays.asList(0, 10, 20),  // M: 3 timestamps
    6                           // N: 6x data volume
);

// Writer Instance B: handles 30s, 40s, 50s with 6x cardinality  
DataProcessor writerB = new DataProcessor(
    Arrays.asList(30, 40, 50), // M: 3 timestamps
    6                           // N: 6x data volume
);
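Hardcoding the offsets works for two instances, but the slices could also be derived from an instance index. The following is a minimal sketch under that assumption; `OffsetSlices` and its parameters are hypothetical names, not part of the existing service. It cuts the minute into evenly spaced slots and hands each instance a contiguous, non-overlapping run of them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original writer service.
final class OffsetSlices {

    // Returns the second-offsets owned by one writer instance when a minute is
    // cut into `slots` evenly spaced timestamps shared by `instances` writers.
    static List<Integer> forInstance(int instanceIndex, int instances, int slots) {
        if (slots <= 0 || 60 % slots != 0) {
            throw new IllegalArgumentException("slots must evenly divide 60 seconds");
        }
        if (instances <= 0 || slots % instances != 0) {
            throw new IllegalArgumentException("instances must evenly divide slots");
        }
        if (instanceIndex < 0 || instanceIndex >= instances) {
            throw new IllegalArgumentException("instanceIndex out of range");
        }
        int step = 60 / slots;               // 10 seconds when slots = 6
        int perInstance = slots / instances; // 3 timestamps when instances = 2
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < perInstance; i++) {
            offsets.add((instanceIndex * perInstance + i) * step);
        }
        return offsets;
    }
}
```

With `slots = 6` and `instances = 2`, index 0 yields `[0, 10, 20]` and index 1 yields `[30, 40, 50]`, matching the two hand-written configurations above.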

Data Volume Calculation

Let’s do the math:

M = 6: We split 1-minute data into 6 timestamps (00s, 10s, 20s, 30s, 40s, 50s)
N = 6: We emit each timestamp 6 times, each copy under a different instance number.
Total Increase: M × N = 6 × 6 = 36x data volume

This approach achieved the exact same M×N scaling effect (36x data volume) as the complex Flink architecture, but completely bypassed the need for new infrastructure.

Most importantly, it leveraged a battle-tested service that had already solved our cross-IDC network latency issues.
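The 36x figure can be sanity-checked with a small sketch that enumerates what the two writer instances would emit for a single input record. `VolumeCheck` is a hypothetical class written for this post, and the base instance number 100 is an arbitrary illustrative value.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical check, not part of the original writer service.
final class VolumeCheck {

    // Counts the distinct (offset, instanceNo) pairs produced from one input
    // record, and fails if two writers would ever emit the same pair.
    static int distinctOutputs(List<List<Integer>> offsetsPerWriter,
                               int cardinalityMultiplier,
                               int baseInstanceNo) {
        Set<String> seen = new HashSet<>();
        for (List<Integer> offsets : offsetsPerWriter) {
            for (int offset : offsets) {
                for (int k = 0; k < cardinalityMultiplier; k++) {
                    String key = offset + ":" + (baseInstanceNo + k);
                    if (!seen.add(key)) {
                        throw new IllegalStateException("overlapping slice: " + key);
                    }
                }
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<List<Integer>> writers = Arrays.asList(
            Arrays.asList(0, 10, 20),   // Writer Instance A
            Arrays.asList(30, 40, 50)); // Writer Instance B
        System.out.println(distinctOutputs(writers, 6, 100)); // prints 36
    }
}
```

One caveat the sketch does not cover: it looks at a single source record. If source instance numbers are consecutive, `instanceNo + k` from neighbouring records would produce the same instance numbers, so the full 36x assumes the source numbering leaves room for the expansion.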

Head-to-Head: Comparing the Architectures

We formalized the comparison to ensure we were making the right long-term trade-offs.

Aspect	Flink/Stream Processor Approach	Direct Writer Scaling Approach	Analysis
Time to Market (PoC)	Slow	Fast	No new infrastructure or deployment pipelines needed.
Operational Risk	High	Low	Introduces an unproven component vs. scaling a stable, known one.
Resource Efficiency	Low	High	Avoids duplicating data in a second Kafka topic, saving massive storage.
M×N Implementation	High	High	Both approaches successfully implement the scaling strategy.
Architectural Purity	High	Low	Concerns are cleanly separated vs. writer handling both transformation and I/O.
Network Resilience	Low	High	Flink would struggle with our IDC latency; the writer already handles it perfectly.

For our PoC, the decision was obvious. We happily traded textbook architectural “purity” for a massive gain in speed, resource efficiency, and network resilience.

Visual Comparison of Approaches

graph TB
    subgraph "Theoretical Approach (Complex Flink)"
        A1[Source Topic] --> B1[Flink Cluster 1]
        A1 --> B2[Flink Cluster 2]
        B1 --> C1[Target Topic]
        B2 --> C1
        C1 --> D1[TSDB Writer]
        D1 --> E1[TSDB]
        style A1 fill:#ffcccc
        style C1 fill:#ffcccc
        style B1 fill:#ffcc99
        style B2 fill:#ffcc99
    end
    subgraph "Pragmatic Pivot (Direct Writer Scaling)"
        A2[Source Topic] --> D2[Writer Instance 1<br/>00s, 10s, 20s]
        A2 --> D3[Writer Instance 2<br/>30s, 40s, 50s]
        D2 --> E2[TSDB]
        D3 --> E2
        style A2 fill:#ccffcc
        style D2 fill:#ccffcc
        style D3 fill:#ccffcc
        style E2 fill:#ccffcc
    end
    subgraph "Key Differences"
        F1["Flink: 2 Topics, High Latency Risk<br/>Complex, Resource-Heavy"]
        F2["Direct: 1 Topic, Latency Optimized<br/>Simple, Resource-Efficient"]
        style F1 fill:#ffcccc
        style F2 fill:#ccffcc
    end

Conclusion: Architectural Pragmatism in Practice

This pivot from a theoretical ideal to a highly pragmatic solution reinforced several core principles of senior-level system design.

1. The Goal is Validation, Not Perfection

A “good” architecture is one that solves the business problem efficiently. Our goal was to test TSDB limits safely and quickly. By recognizing that building a perfect stream processing pipeline was a distraction from our actual goal, we saved weeks of engineering time.

2. Infrastructure Constraints Drive Design

You cannot design software in a vacuum. The specific network latency between our data centers was a hard constraint that immediately invalidated our theoretical Flink design. Truly robust architectures are built around—and optimized for—the unique limitations of their physical environments.

3. Complexity is a Cost

Every new component (like Flink) introduces operational overhead, deployment risk, and maintenance burdens. By strategically reusing and scaling an existing, battle-tested component, we achieved our high-throughput goal while keeping the system architecture as lean as possible.

This experience proved that the best architectures are rarely the most complex ones. They are the ones that balance technical rigor with operational reality to deliver results efficiently.

All technical content in this article is based on actual production experience. Specific system names and configuration values have been generalized for security.

SeungHyeon Lee