<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://2nanoori.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://2nanoori.github.io/" rel="alternate" type="text/html" /><updated>2026-05-10T08:32:04+00:00</updated><id>https://2nanoori.github.io/feed.xml</id><title type="html">SeungHyeon Lee</title><subtitle>A passionate Cloud Platform &amp; Backend Engineer sharing insights on technology, algorithms, and software development.</subtitle><author><name>SeungHyeon Lee</name><email>tmdgus8490@gmail.com</email></author><entry><title type="html">Scaling Time Series Data Processing: Discovering TSDB Limits in Practice (Part 3)</title><link href="https://2nanoori.github.io/system%20architecture/work%20experience/scaling-timeseries-data-processing-breaking-100m-barrier-part3/" rel="alternate" type="text/html" title="Scaling Time Series Data Processing: Discovering TSDB Limits in Practice (Part 3)" /><published>2025-09-25T00:00:00+00:00</published><updated>2025-09-25T00:00:00+00:00</updated><id>https://2nanoori.github.io/system%20architecture/work%20experience/scaling-timeseries-data-processing-breaking-100m-barrier-part3</id><content type="html" xml:base="https://2nanoori.github.io/system%20architecture/work%20experience/scaling-timeseries-data-processing-breaking-100m-barrier-part3/"><![CDATA[<script src="https://unpkg.com/mermaid@9.4.3/dist/mermaid.min.js"></script>

<script>
  mermaid.initialize({
    startOnLoad: true,
    theme: 'default',
    securityLevel: 'loose',
    flowchart: {
      useMaxWidth: true,
      htmlLabels: true
    }
  });
</script>

<h2 id="prologue-testing-tsdb-limits-in-the-real-world">Prologue: Testing TSDB Limits in the Real World</h2>

<p>In <a href="/scaling-timeseries-data-processing-mx-n-strategy-part1">Part 1</a>, we introduced the M×N scaling strategy to test the performance limits of our Time Series Database (TSDB). In <a href="/scaling-timeseries-data-processing-mx-n-strategy-part2">Part 2</a>, we shared how our team’s feedback led us to drop a complex stream processing design in favor of a simpler, more practical approach using our existing data writer service.</p>

<p>Now comes the moment of truth: <strong>actually testing the TSDB’s limits and finding the real bottlenecks</strong>.</p>

<p>What we found changed how we look at system performance. The bottlenecks were not where we expected them to be. Fixing them required us to look far beyond our application code. This is the story of how we uncovered our database’s true limits and learned to scale beyond them.</p>

<h2 id="chapter-1-preparing-for-the-test">Chapter 1: Preparing for the Test</h2>

<h3 id="our-strategy">Our Strategy</h3>

<p>Our goal was simple: <strong>push the TSDB to its breaking point</strong>. Using our M×N strategy, we planned to turn up the data volume step by step until the system buckled.</p>

<p>Here is what we expected to see in theory:</p>

<ul>
  <li><strong>Base throughput</strong>: ~21.7M metrics/minute</li>
  <li><strong>M=6 (10-second intervals)</strong>: 21.7M × 6 = 130.2M expected</li>
  <li><strong>M=12 (5-second intervals)</strong>: 21.7M × 12 = 260.4M expected</li>
</ul>

<h3 id="the-first-obstacle-application-bottlenecks">The First Obstacle: Application Bottlenecks</h3>

<p>When we ran the tests, the results were sobering:</p>

<table>
  <thead>
    <tr>
      <th>Test</th>
      <th>Parallelism</th>
      <th>M</th>
      <th>Expected/min</th>
      <th>Actual/min</th>
      <th>Efficiency</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>4</td>
      <td>6</td>
      <td>130.2M</td>
      <td>~73M</td>
      <td><strong>56%</strong></td>
    </tr>
    <tr>
      <td>2</td>
      <td>6</td>
      <td>6</td>
      <td>130.2M</td>
      <td>~104M</td>
      <td><strong>80%</strong></td>
    </tr>
    <tr>
      <td>3</td>
      <td>6</td>
      <td>12</td>
      <td>260.4M</td>
      <td>~130M</td>
      <td><strong>50%</strong></td>
    </tr>
    <tr>
      <td>4</td>
      <td>6</td>
      <td>20</td>
      <td>434M</td>
      <td>~130M</td>
      <td><strong>30%</strong></td>
    </tr>
  </tbody>
</table>

<p><strong>The hard truth</strong>: When we pushed more data (increased M), our efficiency dropped fast. Throughput flattened out at roughly 130 million metrics per minute, no matter how we tweaked the application settings.</p>
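<p>The Efficiency column in the table above is actual throughput divided by expected throughput (base rate × M). Here is a minimal sketch of that arithmetic. The class and method names are hypothetical, and the base rate of 21.7M metrics/minute is an assumption inferred from the table's expected values (130.2M ÷ 6).</p>

```java
// Illustrative only: ThroughputMath is a hypothetical helper, not part of the writer service.
final class ThroughputMath {
    // Assumed base rate in millions of metrics per minute, inferred from 130.2M / 6.
    static final double BASE_MILLIONS_PER_MIN = 21.7;

    // Expected throughput when each 1-minute sample is split into m timestamps.
    static double expectedMillions(int m) {
        return BASE_MILLIONS_PER_MIN * m;
    }

    // Efficiency as a whole-number percentage of the expected throughput.
    static long efficiencyPercent(double actualMillions, int m) {
        return Math.round(100.0 * actualMillions / expectedMillions(m));
    }

    public static void main(String[] args) {
        int[] m = {6, 6, 12, 20};
        double[] actual = {73, 104, 130, 130}; // "Actual/min" column, in millions
        for (int i = 0; i < m.length; i++) {
            System.out.printf("M=%d expected=%.1fM efficiency=%d%%%n",
                    m[i], expectedMillions(m[i]), efficiencyPercent(actual[i], m[i]));
        }
    }
}
```

<p>Running it reproduces the 56%, 80%, 50%, and 30% figures from the table, which makes the pattern easier to see: actual throughput stops growing while the expected value keeps scaling with M.</p>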

<h3 id="what-was-going-wrong">What Was Going Wrong?</h3>

<p>We checked the TSDB monitoring dashboard, and to our surprise, the database was barely breaking a sweat:</p>
<ul>
  <li><strong>CPU Utilization</strong>: 20-40%</li>
  <li><strong>Peak Ingestion Rate</strong>: 18M datapoints/min</li>
  <li><strong>Memory/Disk</strong>: All perfectly normal</li>
</ul>

<p><strong>The realization</strong>: The database wasn’t the problem. Our own application was too slow to generate enough load to stress the database!</p>

<p>Before we could test the TSDB’s limits, we had to fix our own code.</p>

<h2 id="chapter-2-eliminating-application-bottlenecks">Chapter 2: Eliminating Application Bottlenecks</h2>

<h3 id="problem-1-blocking-io">Problem #1: Blocking I/O</h3>

<p><strong>The Issue</strong>: When our application sent data to the database, it stopped and waited for a response. During this waiting time, the thread was blocked and couldn’t process any new messages from Kafka.</p>

<p><strong>Before: Synchronous Architecture</strong></p>

<div class="mermaid">
graph TD
    subgraph "Consumer Thread"
        A[Kafka Poll] --&gt; B[Data Amplification]
        B --&gt; C["timeSeriesService.write()"]
        C -- "I/O Wait (seconds)" --&gt; D[Write Complete]
        D --&gt; A
    end
</div>

<p><strong>After: Asynchronous Architecture with Virtual Threads</strong></p>

<div class="mermaid">
graph TD
    subgraph "Consumer Thread"
        A[Kafka Poll] --&gt; B[Data Amplification]
        B --&gt; Q[Submit Write Task]
        Q --&gt; A
    end

    subgraph "Virtual Thread Pool"
        Q --&gt; VT1[Virtual Thread 1: Write Batch 1]
        Q --&gt; VT2[Virtual Thread 2: Write Batch 2]
        Q --&gt; VTx[Virtual Thread N...]
    end
</div>

<p><strong>The Fix</strong>: We used Java 21’s Virtual Threads to make the database writes asynchronous. Now, the main thread simply hands the write task to a virtual thread and immediately goes back to fetching more data.</p>

<details>
<summary><b>Code Example: Asynchronous Write Implementation</b></summary>

```java
// Before: Synchronous blocking write
try {
    if(!timeSeries.isEmpty()) {
        timeSeriesService.write(storageTier, timeSeries); // Blocks here
    }
} catch (Exception e) {
    log.error("Write failed", e);
    // Retry logic...
}

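// After: Asynchronous non-blocking write
if (!timeSeries.isEmpty()) {
    writerExecutor.submit(() -&gt; {
        int retries = writeRetryCount; // retries allowed after the first attempt
        while (retries &gt;= 0) {
            try {
                timeSeriesService.write(storageTier, timeSeries);
                writtenCounter.addAndGet(timeSeries.size());
                break; // Success
            } catch (Exception e) {
                retries--;
                log.error("Write failed. Retries left: {}", Math.max(retries, 0), e);
                if (retries &gt;= 0) {
                    try {
                        Thread.sleep(1000); // Retry delay
                    } catch (InterruptedException ie) {
                        // A Runnable cannot throw checked exceptions:
                        // restore the interrupt flag and stop retrying
                        Thread.currentThread().interrupt();
                        break;
                    }
                }
            }
        }
    });
}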
```
</details>
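<p>The example above does not show how <code class="language-plaintext highlighter-rouge">writerExecutor</code> is built. One plausible setup, and it is only an assumption about the real service, is a virtual-thread-per-task executor from Java 21. In the sketch below the class name, the simulated write, and the batch counts are all hypothetical; <code class="language-plaintext highlighter-rouge">Thread.sleep</code> stands in for the blocking database call.</p>

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: the class, the fake write, and the batch counts are illustrative.
final class VirtualThreadWriterSketch {
    // Submits `batches` simulated blocking writes and returns how many completed.
    static int writeAll(int batches, Duration simulatedIoWait) {
        AtomicInteger written = new AtomicInteger();
        // One virtual thread per task (Java 21+); close() waits for submitted tasks.
        try (ExecutorService writerExecutor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < batches; i++) {
                writerExecutor.submit(() -> {
                    try {
                        Thread.sleep(simulatedIoWait); // stands in for timeSeriesService.write(...)
                        written.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        }
        return written.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = writeAll(1_000, Duration.ofMillis(100));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " writes finished in ~" + elapsedMs + " ms");
    }
}
```

<p>Because each blocked write parks only a virtual thread, the submitting loop is never held up by I/O. The trade-off is that nothing here limits how many writes are in flight, so a real service would likely want some form of backpressure, for example a semaphore around the submit call.</p>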

<h3 id="problem-2-memory-overload-from-object-creation">Problem #2: Memory Overload from Object Creation</h3>

<p><strong>The Issue</strong>: Even after fixing the I/O problem, our CPU usage was still too high. Why? Because we were creating millions of temporary <code class="language-plaintext highlighter-rouge">HashMap</code> objects every minute during the M×N data multiplication. This caused the Garbage Collector (GC) to work overtime, slowing down the whole system.</p>

<p><strong>The Fix</strong>: Instead of creating a new <code class="language-plaintext highlighter-rouge">HashMap</code> in every loop iteration, we pre-created the required maps once and reused them. This cut map allocations from M×N per source metric down to N.</p>

<details>
<summary><b>Code Example: Memory Optimization</b></summary>

```java
// Before: Creating new HashMap in every loop iteration
for (int i = 0; i &lt; timeMultiplier; i++) {
    for (int instanceOffset = 0; instanceOffset &lt; instanceMultiplier; instanceOffset++) {
        // New HashMap created M×N times
        String newInstanceNo = generateNewInstanceNo(originalInstanceNo, instanceOffset);
        TimeSeries expanded = original.copyWithTimestampAndInstanceNo(newTimestamp, newInstanceNo);
        result.add(expanded);
    }
}

// After: Pre-create dimension variants (N times) and reuse
List&lt;Map&lt;String, String&gt;&gt; dimensionVariants = new ArrayList&lt;&gt;(instanceMultiplier);
for (int instanceOffset = 0; instanceOffset &lt; instanceMultiplier; instanceOffset++) {
    Map&lt;String, String&gt; variant = new HashMap&lt;&gt;(baseDimensions);
    variant.put("instanceNo", generateNewInstanceNo(originalInstanceNo, instanceOffset));
    dimensionVariants.add(variant);
}

// Main loop reuses pre-created Maps
for (int i = 0; i &lt; timeMultiplier; i++) {
    for (Map&lt;String, String&gt; dims : dimensionVariants) {
        TimeSeries expanded = TimeSeries.builder()
                .timestamp(newTimestamp)
                .dimensions(dims) // Reuse, don't recreate
                .value(original.getValue())
                .build();
        result.add(expanded);
    }
}
```
</details>

<h3 id="the-result-of-our-optimizations">The Result of Our Optimizations</h3>

<ul>
  <li><strong>Efficiency shot up</strong>: M=6 tests went from 56% to ~90% efficiency.</li>
  <li><strong>Stable processing</strong>: We comfortably hit ~120M metrics/minute.</li>
  <li><strong>Application bottlenecks gone</strong>: Our generator was finally ready.</li>
</ul>

<p><strong>Mission Accomplished</strong>: With our application running smoothly, it was time to find the database’s real limits.</p>

<h2 id="chapter-3-discovering-the-real-tsdb-bottleneck">Chapter 3: Discovering the Real TSDB Bottleneck</h2>

<h3 id="the-true-test">The True Test</h3>

<p>Now we could push harder. We cranked M up to 12, aiming for over 260M metrics per minute.</p>

<p><strong>The Surprise</strong>: Performance didn’t go up. It actually <strong>dropped</strong> to around 100M/minute. We also started seeing <code class="language-plaintext highlighter-rouge">Slow write request</code> warnings that lasted 7 to 9 seconds!</p>

<p>We had finally found the TSDB’s breaking point. But why was it breaking?</p>

<h3 id="the-smoking-gun-database-concurrency-limits">The Smoking Gun: Database Concurrency Limits</h3>

<p>We dug back into the TSDB dashboards:</p>
<ol>
  <li><strong>CPU Usage</strong>: Still only ~30%. The database wasn’t working too hard.</li>
  <li><strong>Concurrent Inserts</strong>: Here was the clue. Even though our application opened ~200 connections, the database was only processing <strong>16 write operations at a time</strong>.</li>
</ol>

<h3 id="the-root-cause">The Root Cause</h3>

<p>This wasn’t a bug; it was a built-in safety feature. Our TSDB (VictoriaMetrics) uses a component called <code class="language-plaintext highlighter-rouge">vminsert</code> which protects itself from overload by limiting concurrent insert processing, by default to <code class="language-plaintext highlighter-rouge">CPU cores × 2</code>.</p>

<p><strong>The Math:</strong></p>
<ul>
  <li><strong>Our Database Servers</strong>: 8 CPU cores</li>
  <li><strong>Hard Limit</strong>: 8 × 2 = 16 concurrent operations</li>
</ul>

<p><strong>The Reality</strong>: Our hundreds of parallel write requests were piling up in the database’s waiting line. They had to wait for one of those 16 slots to open up. That waiting time is exactly what caused the 7-9 second delays.</p>
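<p>The queueing effect described above can be imitated with a plain semaphore. This is a toy model under stated assumptions, not a description of how <code class="language-plaintext highlighter-rouge">vminsert</code> is implemented: the class name, the simulated write duration, and the thread pool are all hypothetical. It only shows that, however many clients connect, the number of writes in flight never exceeds the slot count.</p>

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: it imitates a fixed number of insert slots, not vminsert's real code.
final class InsertSlotModel {
    // Runs `requests` simulated writes from `clients` threads through `slots` permits
    // and returns the highest number of writes that were in flight at the same time.
    static int peakInFlight(int clients, int requests, int slots, long writeMillis) {
        Semaphore insertSlots = new Semaphore(slots);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                try {
                    insertSlots.acquire();               // wait in line for a free slot
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(writeMillis);       // simulated write work
                    } finally {
                        inFlight.decrementAndGet();
                        insertSlots.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 200 client connections, as in the article; 8 cores x 2 = 16 slots.
        System.out.println("peak in-flight writes: " + peakInFlight(200, 200, 16, 50));
    }
}
```

<p>In this model, requests that cannot get a permit simply wait, so their observed latency is mostly time spent in line rather than time spent writing. That matches the symptom we saw: low database CPU combined with slow individual requests.</p>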

<h3 id="a-successful-discovery">A Successful Discovery</h3>

<p>This is exactly what we wanted to find. We discovered a hard concurrency limit, derived from the database’s CPU core count, that would also affect our production systems.</p>

<p><strong>What we learned:</strong></p>
<ul>
  <li><strong>The Limit</strong>: 16 simultaneous write operations.</li>
  <li><strong>The Cause</strong>: Database self-protection, not bad application code.</li>
  <li><strong>The Solution</strong>: Adding more application servers wouldn’t help, because the queue forms inside the database. We needed more insert slots, which by default means more CPU cores on the database servers.</li>
</ul>

<h2 id="chapter-4-scaling-beyond-the-limits">Chapter 4: Scaling Beyond the Limits</h2>

<h3 id="breaking-the-ceiling">Breaking the Ceiling</h3>

<p>Now we knew the problem: the 16-slot concurrency limit. Since the default limit is derived from the CPU core count, the most direct way to get more slots without overriding that default was to get more cores.</p>

<h3 id="the-pragmatic-move-a-massive-hardware-upgrade">The Pragmatic Move: A Massive Hardware Upgrade</h3>

<p>Instead of trying to hack the database settings, we chose a straightforward infrastructure upgrade:</p>

<p><strong>The Solution: High-Performance Physical Servers</strong></p>
<ul>
  <li><strong>What we found</strong>: 8 idle Physical Machines (PMs) were available in our datacenter.</li>
  <li><strong>The Specs</strong>: 40 cores and 256GB RAM each! (A huge jump from our 8-core VMs).</li>
  <li><strong>The Win</strong>: This solved both our compute needs and our concurrency limits at the same time.</li>
</ul>

<p><strong>The Migration Impact:</strong></p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Before: Virtual Machines (VMs)</span>
<span class="na">Servers</span><span class="pi">:</span> <span class="s">8 × VMs (8 cores, 32GB)</span>
<span class="na">Concurrency Limit</span><span class="pi">:</span> <span class="s">8 × 2 = 16 slots per server</span>
<span class="na">Capacity</span><span class="pi">:</span> <span class="s">~120M metrics/min</span>

<span class="c1"># After: Physical Machines (PMs)</span>
<span class="na">Servers</span><span class="pi">:</span> <span class="s">8 × PMs (40 cores, 256GB)</span>
<span class="na">Concurrency Limit</span><span class="pi">:</span> <span class="s">40 × 2 = 80 slots per server</span>
<span class="na">Expected Jump</span><span class="pi">:</span> <span class="s">5x more concurrent insert slots per server (16 → 80)</span>
</code></pre></div></div>

<h3 id="why-this-was-the-right-call">Why This Was the Right Call</h3>

<p>This decision highlights some great lessons about real-world engineering:</p>

<ol>
  <li><strong>Hardware over Hacking</strong>: Sometimes, using powerful, idle hardware is much smarter than spending weeks writing complex software workarounds.</li>
  <li><strong>Whole-System Thinking</strong>: We didn’t just scale our app; we scaled the underlying compute and database concurrency limits together.</li>
  <li><strong>Speed to Value</strong>: Reusing idle servers gave us an immediate performance boost without waiting for new budgets or orders.</li>
</ol>

<p>This upgrade gave us 5x more insert slots per server and set us up for our final tests.</p>

<h2 id="chapter-5-lessons-learned">Chapter 5: Lessons Learned</h2>

<h3 id="technical-takeaways">Technical Takeaways</h3>

<ol>
  <li><strong>Async I/O is Magic</strong>: Virtual Threads let us keep simple blocking write code while freeing the Kafka consumer thread, which is what removed our I/O-bound bottleneck.</li>
  <li><strong>Watch Your Objects</strong>: Reusing objects (like our HashMaps) can save your Garbage Collector and speed up your app.</li>
  <li><strong>Know Your Limits</strong>: Systems have hard limits built in for protection. You need to find them.</li>
  <li><strong>Dashboards Don’t Lie</strong>: We never would have found the 16-slot limit without good monitoring.</li>
</ol>

<h3 id="architectural-rules-of-thumb">Architectural Rules of Thumb</h3>

<ol>
  <li><strong>Fix Your Own House First</strong>: Optimize your application before blaming the database.</li>
  <li><strong>Test With Real Data</strong>: Guesses are dangerous. Measure everything.</li>
  <li><strong>Respect Boundaries</strong>: Understand how the systems you connect to actually work.</li>
  <li><strong>Scale Boldly</strong>: When you hit a hard hardware limit, don’t be afraid to upgrade the hardware.</li>
</ol>

<h3 id="why-simple-is-better">Why Simple is Better</h3>

<p>Choosing the “Direct Writer” approach back in Part 2 proved to be a brilliant move:</p>

<ul>
  <li><strong>Speed</strong>: We spent our time optimizing code, not setting up Flink clusters.</li>
  <li><strong>Familiarity</strong>: Because we used our existing code, debugging was fast and easy.</li>
  <li><strong>Clear Progress</strong>: Every tweak we made showed immediate, measurable results.</li>
</ul>

<h2 id="conclusion-1-second-granularity-achieved-and-tsdb-choice-validated">Conclusion: 1-Second Granularity Achieved and TSDB Choice Validated</h2>

<p>The journey from the M×N strategy concept to successfully testing our TSDB’s limits was challenging but incredibly rewarding. Through application-level optimizations and strategic infrastructure scaling, we achieved our ultimate goal.</p>

<p><strong>Mission Accomplished: From 60 Seconds to 1 Second</strong>
We completed the PoC and showed that our system can handle the move from a 60-second collection interval down to a 1-second interval (M=60). By upgrading our infrastructure to 40-core Physical Machines, we broke through the initial concurrency bottleneck and absorbed more than 400M metrics per minute.</p>

<p><strong>Validating Our TSDB Choice (VictoriaMetrics)</strong>
Perhaps the most important outcome of this extreme load testing was the validation of our core architectural decision. Hitting the <code class="language-plaintext highlighter-rouge">vminsert</code> concurrency limit early on wasn’t a failure of the database; rather, it was a testament to its self-protective design. Once provided with the appropriate hardware, VictoriaMetrics demonstrated incredible vertical scalability and raw ingestion efficiency.</p>

<p>It proved beyond a doubt that it can handle the punishing load of 1-second granularity metrics across our entire cloud infrastructure without breaking a sweat. This PoC gave us the definitive answer we needed: <strong>our choice of VictoriaMetrics as our TSDB was absolutely the right one.</strong></p>

<hr />

<p><strong>What’s Next?</strong> With our ingestion capabilities proven for 1-second intervals, we’re now ready to tackle the next challenge: optimizing for cardinality explosion and complex query patterns on the read side. But that’s a story for another day.</p>

<hr />

<p><em>This concludes our three-part series on scaling time series data processing. From initial strategy through team collaboration to discovering real-world database constraints, we’ve covered the complete journey from theoretical concept to practical system success.</em></p>]]></content><author><name>SeungHyeon Lee</name><email>tmdgus8490@gmail.com</email></author><category term="System Architecture" /><category term="Work Experience" /><category term="Time Series" /><category term="Data Processing" /><category term="System Design" /><category term="Performance Optimization" /><category term="TSDB" /><category term="Load Testing" /><category term="Virtual Threads" /><category term="Memory Optimization" /><category term="Scalability" /><category term="Architecture Patterns" /><summary type="html"><![CDATA[The journey from theory to reality: implementing the M×N strategy, discovering the real TSDB bottlenecks, and overcoming system-level limitations through infrastructure scaling.]]></summary></entry><entry><title type="html">Scaling Time Series Data Processing: A Pivot to Pragmatism (Part 2)</title><link href="https://2nanoori.github.io/system%20architecture/work%20experience/scaling-timeseries-data-processing-mx-n-strategy-part2/" rel="alternate" type="text/html" title="Scaling Time Series Data Processing: A Pivot to Pragmatism (Part 2)" /><published>2025-08-27T00:00:00+00:00</published><updated>2025-08-27T00:00:00+00:00</updated><id>https://2nanoori.github.io/system%20architecture/work%20experience/scaling-timeseries-data-processing-mx-n-strategy-part2</id><content type="html" xml:base="https://2nanoori.github.io/system%20architecture/work%20experience/scaling-timeseries-data-processing-mx-n-strategy-part2/"><![CDATA[<script src="https://unpkg.com/mermaid@9.4.3/dist/mermaid.min.js"></script>

<script>
  mermaid.initialize({
    startOnLoad: true,
    theme: 'default',
    securityLevel: 'loose',
    flowchart: {
      useMaxWidth: true,
      htmlLabels: true
    }
  });
</script>

<h2 id="introduction">Introduction</h2>

<p>In <a href="/system%20architecture/work%20experience/scaling-timeseries-data-processing-mx-n-strategy-part1/">Part 1</a>, I explained the “M×N” strategy for scaling time series data. My original idea was to build a dedicated stream processing layer using Flink or Kafka Streams. It was a clean, theoretically perfect design that kept data transformation completely separate from data ingestion.</p>

<p>However, a “theoretically perfect” design in a whiteboard session often clashes with the gritty reality of a production environment.</p>

<p>Before diving into the Proof of Concept (PoC) development, we conducted a rigorous architectural review. This review process proved to be incredibly valuable, leading us to completely scrap the Flink idea in favor of a brutally simple, highly pragmatic alternative.</p>

<p>This post details that architectural pivot. It highlights how critically evaluating operational constraints guided us away from a complex new system and toward an efficient, battle-tested solution.</p>

<h2 id="the-architectural-review-idealism-vs-operational-reality">The Architectural Review: Idealism vs. Operational Reality</h2>

<p>During our design review, we stress-tested the Flink proposal against the realities of our infrastructure. As we analyzed the operational requirements, we quickly identified two critical risks:</p>

<ol>
  <li><strong>Deployment and Operational Overhead</strong>: Building a new Flink job solely for a PoC meant provisioning new clusters, managing configurations, and maintaining a new deployment pipeline. This introduced significant delay before we could even begin our core task: testing the database.</li>
  <li><strong>Cross-IDC Network Latency</strong>: This was the dealbreaker. Our existing <code class="language-plaintext highlighter-rouge">data writer service</code> uses a custom parallel consumer specifically built to handle the severe network delays between our isolated data centers (IDCs). Standard Kafka consumers struggle in this environment. If we deployed Flink in a separate IDC, it would immediately hit this exact network bottleneck, invalidating our performance tests.</li>
</ol>

<p>It became clear that introducing a new stream processing engine would force us to solve infrastructure and network problems before we could even test the Time Series Database (TSDB).</p>

<p>We needed a new direction.</p>

<h2 id="the-pragmatic-pivot-scaling-the-existing-writer">The Pragmatic Pivot: Scaling the Existing Writer</h2>

<p>To solve these constraints, we pivoted to an elegant and highly practical alternative: <strong>Why not embed the M×N transformation logic directly into our existing <code class="language-plaintext highlighter-rouge">data writer service</code> and scale that horizontally?</strong></p>

<p>The concept was simple: Have each instance of the <code class="language-plaintext highlighter-rouge">data writer service</code> read the same raw data from Kafka, but configure each instance to generate a specific, non-overlapping slice of the final M×N data.</p>

<h4 id="how-it-works">How It Works</h4>

<p>Each writer instance handles both the <strong>M (time division)</strong> and <strong>N (cardinality expansion)</strong> dimensions internally. 
For example, one instance handles timestamps at 00s, 10s, and 20s. Another handles 30s, 40s, and 50s. Both instances also multiply the data to increase the cardinality (N) before writing it to the database.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Inside the existing Data Writer Service</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">DataProcessor</span> <span class="o">{</span>
    <span class="kd">private</span> <span class="kd">final</span> <span class="nc">List</span><span class="o">&lt;</span><span class="nc">Integer</span><span class="o">&gt;</span> <span class="n">assignedTimestampOffsets</span><span class="o">;</span> <span class="c1">// M: time division</span>
    <span class="kd">private</span> <span class="kd">final</span> <span class="kt">int</span> <span class="n">cardinalityMultiplier</span><span class="o">;</span>              <span class="c1">// N: data volume expansion</span>
    
    <span class="kd">public</span> <span class="kt">void</span> <span class="nf">process</span><span class="o">(</span><span class="nc">Metric</span> <span class="n">metric</span><span class="o">)</span> <span class="o">{</span>
        <span class="nc">LocalDateTime</span> <span class="n">baseTime</span> <span class="o">=</span> <span class="n">metric</span><span class="o">.</span><span class="na">getTimestamp</span><span class="o">().</span><span class="na">truncatedTo</span><span class="o">(</span><span class="nc">ChronoUnit</span><span class="o">.</span><span class="na">MINUTES</span><span class="o">);</span>
        
        <span class="c1">// M: Time division - split into multiple timestamps</span>
        <span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">offset</span> <span class="o">:</span> <span class="n">assignedTimestampOffsets</span><span class="o">)</span> <span class="o">{</span>
            <span class="c1">// N: Cardinality expansion - increase data volume</span>
            <span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">instanceOffset</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">instanceOffset</span> <span class="o">&lt;</span> <span class="n">cardinalityMultiplier</span><span class="o">;</span> <span class="n">instanceOffset</span><span class="o">++)</span> <span class="o">{</span>
                <span class="nc">Metric</span> <span class="n">transformed</span> <span class="o">=</span> <span class="n">metric</span><span class="o">.</span><span class="na">copy</span><span class="o">();</span>
                <span class="n">transformed</span><span class="o">.</span><span class="na">setTimestamp</span><span class="o">(</span><span class="n">baseTime</span><span class="o">.</span><span class="na">plusSeconds</span><span class="o">(</span><span class="n">offset</span><span class="o">));</span>
                <span class="n">transformed</span><span class="o">.</span><span class="na">setInstanceNo</span><span class="o">(</span><span class="n">metric</span><span class="o">.</span><span class="na">getInstanceNo</span><span class="o">()</span> <span class="o">+</span> <span class="n">instanceOffset</span><span class="o">);</span>
                
                <span class="n">emitToWriterQueue</span><span class="o">(</span><span class="n">transformed</span><span class="o">);</span>
            <span class="o">}</span>
        <span class="o">}</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<h4 id="configuration-example">Configuration Example</h4>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Writer Instance A: handles 00s, 10s, 20s with 6x cardinality</span>
<span class="nc">DataProcessor</span> <span class="n">writerA</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">DataProcessor</span><span class="o">(</span>
    <span class="nc">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="mi">0</span><span class="o">,</span> <span class="mi">10</span><span class="o">,</span> <span class="mi">20</span><span class="o">),</span>  <span class="c1">// M: 3 timestamps</span>
    <span class="mi">6</span>                           <span class="c1">// N: 6x data volume</span>
<span class="o">);</span>

<span class="c1">// Writer Instance B: handles 30s, 40s, 50s with 6x cardinality  </span>
<span class="nc">DataProcessor</span> <span class="n">writerB</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">DataProcessor</span><span class="o">(</span>
    <span class="nc">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="mi">30</span><span class="o">,</span> <span class="mi">40</span><span class="o">,</span> <span class="mi">50</span><span class="o">),</span> <span class="c1">// M: 3 timestamps</span>
    <span class="mi">6</span>                           <span class="c1">// N: 6x data volume</span>
<span class="o">);</span>
</code></pre></div></div>
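<p>The configuration above hard-codes each writer’s offsets. As a sketch of how those non-overlapping slices could be derived instead, the hypothetical helper below computes the offsets for one writer from its index, the number of writers, and M. The class and method names are assumptions for illustration and are not part of the existing service.</p>

```java
import java.util.Arrays;

// Hypothetical helper: derives the per-writer timestamp offsets instead of hard-coding them.
final class OffsetPlanner {
    // Returns the second-offsets owned by writer `index` (0-based) out of `writers`,
    // when one minute is split into `m` evenly spaced timestamps.
    static int[] offsetsFor(int index, int writers, int m) {
        if (m <= 0 || 60 % m != 0 || writers <= 0 || m % writers != 0
                || index < 0 || index >= writers) {
            throw new IllegalArgumentException("m must divide 60 and writers must divide m");
        }
        int perWriter = m / writers;
        int stepSeconds = 60 / m;
        int[] offsets = new int[perWriter];
        for (int i = 0; i < perWriter; i++) {
            offsets[i] = (index * perWriter + i) * stepSeconds;
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(offsetsFor(0, 2, 6))); // [0, 10, 20]
        System.out.println(Arrays.toString(offsetsFor(1, 2, 6))); // [30, 40, 50]
    }
}
```

<p>With two writers and M=6 this yields the same slices as Writer Instance A and Writer Instance B. The argument check is deliberately strict so that a misconfigured writer fails fast instead of silently overlapping with another instance.</p>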

<h4 id="data-volume-calculation">Data Volume Calculation</h4>

<p>Let’s do the math:</p>
<ul>
  <li><strong>M = 6</strong>: We split 1-minute data into 6 timestamps (00s, 10s, 20s, 30s, 40s, 50s)</li>
  <li><strong>N = 6</strong>: We multiply each timestamp by 6 using different instance numbers.</li>
  <li><strong>Total Increase</strong>: M × N = 6 × 6 = <strong>36x data volume</strong></li>
</ul>

<p>This approach achieved the exact same M×N scaling effect (36x data volume) as the complex Flink architecture, but completely bypassed the need for new infrastructure.</p>

<p>Most importantly, it leveraged a battle-tested service that had already solved our cross-IDC network latency issues.</p>

<h2 id="head-to-head-comparing-the-architectures">Head-to-Head: Comparing the Architectures</h2>

<p>We formalized the comparison to ensure we were making the right long-term trade-offs.</p>

<table>
  <thead>
    <tr>
      <th>Aspect</th>
      <th>Flink/Stream Processor Approach</th>
      <th>Direct Writer Scaling Approach</th>
      <th>Analysis</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Time to Market (PoC)</strong></td>
      <td>Slow</td>
      <td><strong>Fast</strong></td>
      <td>No new infrastructure or deployment pipelines needed.</td>
    </tr>
    <tr>
      <td><strong>Operational Risk</strong></td>
      <td>High</td>
      <td><strong>Low</strong></td>
      <td>Introduces an unproven component vs. scaling a stable, known one.</td>
    </tr>
    <tr>
      <td><strong>Resource Efficiency</strong></td>
      <td>Low</td>
      <td><strong>High</strong></td>
      <td>Avoids duplicating data in a second Kafka topic, saving massive storage.</td>
    </tr>
    <tr>
      <td><strong>M×N Implementation</strong></td>
      <td><strong>High</strong></td>
      <td><strong>High</strong></td>
      <td>Both approaches successfully implement the scaling strategy.</td>
    </tr>
    <tr>
      <td><strong>Architectural Purity</strong></td>
      <td><strong>High</strong></td>
      <td>Low</td>
      <td>Concerns are cleanly separated vs. writer handling both transformation and I/O.</td>
    </tr>
    <tr>
      <td><strong>Network Resilience</strong></td>
      <td>Low</td>
      <td><strong>High</strong></td>
      <td>Flink would struggle with our IDC latency; the writer already handles it perfectly.</td>
    </tr>
  </tbody>
</table>

<p>For our PoC, the decision was obvious. We happily traded textbook architectural “purity” for a massive gain in speed, resource efficiency, and network resilience.</p>

<h4 id="visual-comparison-of-approaches">Visual Comparison of Approaches</h4>

<div class="mermaid">
graph TB
    subgraph "Theoretical Approach (Complex Flink)"
        A1[Source Topic] --&gt; B1[Flink Cluster 1]
        A1 --&gt; B2[Flink Cluster 2]
        B1 --&gt; C1[Target Topic]
        B2 --&gt; C1
        C1 --&gt; D1[TSDB Writer]
        D1 --&gt; E1[TSDB]
        
        style A1 fill:#ffcccc
        style C1 fill:#ffcccc
        style B1 fill:#ffcc99
        style B2 fill:#ffcc99
    end
    
    subgraph "Pragmatic Pivot (Direct Writer Scaling)"
        A2[Source Topic] --&gt; D2[Writer Instance 1<br />00s, 10s, 20s]
        A2 --&gt; D3[Writer Instance 2<br />30s, 40s, 50s]
        D2 --&gt; E2[TSDB]
        D3 --&gt; E2
        
        style A2 fill:#ccffcc
        style D2 fill:#ccffcc
        style D3 fill:#ccffcc
        style E2 fill:#ccffcc
    end
    
    subgraph "Key Differences"
        F1["Flink: 2 Topics, High Latency Risk<br />Complex, Resource-Heavy"]
        F2["Direct: 1 Topic, Latency Optimized<br />Simple, Resource-Efficient"]
        
        style F1 fill:#ffcccc
        style F2 fill:#ccffcc
    end
</div>
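
<p>To make the "Pragmatic Pivot" side of the diagram concrete, here is a minimal sketch of how the second offsets within one minute could be divided between writer instances. The names <code>OffsetAssignment</code> and <code>offsetsFor</code>, and the contiguous-range split itself, are assumptions for illustration; the actual writer configuration is not shown in this article.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits the second offsets in one minute into
// contiguous ranges, one range per writer instance.
final class OffsetAssignment {

    // Returns the offsets (in seconds) that one writer instance emits.
    // instanceIndex is zero-based; intervalSeconds is the target interval.
    static List<Integer> offsetsFor(int instanceIndex, int instanceCount, int intervalSeconds) {
        int slots = 60 / intervalSeconds;        // e.g., 6 slots for a 10-second interval
        int perInstance = slots / instanceCount; // assumes the slots divide evenly
        int firstSlot = instanceIndex * perInstance;

        List<Integer> offsets = new ArrayList<>();
        for (int slot = firstSlot; slot < firstSlot + perInstance; slot++) {
            offsets.add(slot * intervalSeconds);
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(offsetsFor(0, 2, 10)); // [0, 10, 20]
        System.out.println(offsetsFor(1, 2, 10)); // [30, 40, 50]
    }
}
```

<p>With two instances and a 10-second interval, this reproduces the split shown in the diagram. Each instance reads the same source topic and only generates its own share of timestamps.</p>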

<h2 id="conclusion-architectural-pragmatism-in-practice">Conclusion: Architectural Pragmatism in Practice</h2>

<p>This pivot from a theoretical ideal to a highly pragmatic solution reinforced several core principles of senior-level system design.</p>

<h4 id="1-the-goal-is-validation-not-perfection">1. The Goal is Validation, Not Perfection</h4>
<p>A “good” architecture is one that solves the business problem efficiently. Our goal was to test TSDB limits safely and quickly. By recognizing that building a perfect stream processing pipeline was a distraction from our actual goal, we saved weeks of engineering time.</p>

<h4 id="2-infrastructure-constraints-drive-design">2. Infrastructure Constraints Drive Design</h4>
<p>You cannot design software in a vacuum. The specific network latency between our data centers was a hard constraint that immediately invalidated our theoretical Flink design. Truly robust architectures are built around—and optimized for—the unique limitations of their physical environments.</p>

<h4 id="3-complexity-is-a-cost">3. Complexity is a Cost</h4>
<p>Every new component (like Flink) introduces operational overhead, deployment risk, and maintenance burdens. By strategically reusing and scaling an existing, battle-tested component, we achieved our high-throughput goal while keeping the system architecture as lean as possible.</p>

<p>This experience proved that the best architectures are rarely the most complex ones. They are the ones that balance technical rigor with operational reality to deliver results efficiently.</p>

<hr />
<p><em>All technical content in this article is based on actual production experience. Specific system names and configuration values have been generalized for security.</em></p>]]></content><author><name>SeungHyeon Lee</name><email>tmdgus8490@gmail.com</email></author><category term="System Architecture" /><category term="Work Experience" /><category term="Time Series" /><category term="Data Processing" /><category term="System Design" /><category term="Stream Processing" /><category term="TSDB" /><category term="Load Testing" /><category term="Cardinality" /><category term="Scalability" /><category term="Performance Optimization" /><category term="Architecture Patterns" /><category term="Collaboration" /><summary type="html"><![CDATA[The story of how team feedback transformed a complex stream processing architecture into a simple, pragmatic solution for our PoC, with a deep dive into the trade-offs and lessons learned.]]></summary></entry><entry><title type="html">Scaling Time Series Data Processing: The M×N Strategy and a Stream-Based Approach (Part 1)</title><link href="https://2nanoori.github.io/system%20architecture/work%20experience/scaling-timeseries-data-processing-mx-n-strategy-part1/" rel="alternate" type="text/html" title="Scaling Time Series Data Processing: The M×N Strategy and a Stream-Based Approach (Part 1)" /><published>2025-08-25T00:00:00+00:00</published><updated>2025-08-25T00:00:00+00:00</updated><id>https://2nanoori.github.io/system%20architecture/work%20experience/scaling-timeseries-data-processing-mx-n-strategy-part1</id><content type="html" xml:base="https://2nanoori.github.io/system%20architecture/work%20experience/scaling-timeseries-data-processing-mx-n-strategy-part1/"><![CDATA[<script src="https://unpkg.com/mermaid@9.4.3/dist/mermaid.min.js"></script>

<script>
  mermaid.initialize({
    startOnLoad: true,
    theme: 'default',
    securityLevel: 'loose',
    flowchart: {
      useMaxWidth: true,
      htmlLabels: true
    }
  });
</script>

<h2 id="introduction">Introduction</h2>

<p>When running time series data processing systems, we often need to scale data ingestion. Two common scenarios are:</p>

<ol>
  <li><strong>Granular Monitoring</strong>: Shortening a 1-minute collection interval to 1 second to increase data density.</li>
  <li><strong>System Performance Testing</strong>: Stress-testing a Time Series Database (TSDB) with massive data to find its limits.</li>
</ol>

<p>Initially, I took a simple approach: why not just copy the existing data multiple times? However, this “simple” method caused unexpected problems. It forced me to rethink the challenge from a completely different angle.</p>

<p>This article shares my trial-and-error process and the improved architecture we built as a result.</p>

<h2 id="the-first-try-simple-multiplication">The First Try: Simple Multiplication</h2>

<h3 id="initial-requirement-and-intuitive-solution">Initial Requirement and Intuitive Solution</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Current: Low-density metric data (e.g., N-minute intervals)
Target: High-density metric data (e.g., M-second intervals)
</code></pre></div></div>

<p>When faced with this, my first thought was to simply <strong>“copy the existing data by the required multiplier.”</strong></p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Initial implementation: Simple multiplication</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">processData</span><span class="o">(</span><span class="nc">List</span><span class="o">&lt;</span><span class="nc">Metric</span><span class="o">&gt;</span> <span class="n">metrics</span><span class="o">)</span> <span class="o">{</span>
    <span class="nc">List</span><span class="o">&lt;</span><span class="nc">Metric</span><span class="o">&gt;</span> <span class="n">expandedMetrics</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">ArrayList</span><span class="o">&lt;&gt;();</span>
    
    <span class="k">for</span> <span class="o">(</span><span class="nc">Metric</span> <span class="n">metric</span> <span class="o">:</span> <span class="n">metrics</span><span class="o">)</span> <span class="o">{</span>
        <span class="c1">// Generate data by the target multiplier</span>
        <span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">multiplier</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">multiplier</span> <span class="o">&lt;</span> <span class="no">TARGET_MULTIPLIER</span><span class="o">;</span> <span class="n">multiplier</span><span class="o">++)</span> <span class="o">{</span>
            <span class="nc">Metric</span> <span class="n">duplicated</span> <span class="o">=</span> <span class="n">metric</span><span class="o">.</span><span class="na">copy</span><span class="o">();</span>
            <span class="n">duplicated</span><span class="o">.</span><span class="na">adjustTimestamp</span><span class="o">(</span><span class="n">multiplier</span><span class="o">);</span>
            <span class="n">expandedMetrics</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="n">duplicated</span><span class="o">);</span>
        <span class="o">}</span>
    <span class="o">}</span>
    
    <span class="c1">// Send all data to the sink at once</span>
    <span class="k">for</span> <span class="o">(</span><span class="nc">Metric</span> <span class="n">expanded</span> <span class="o">:</span> <span class="n">expandedMetrics</span><span class="o">)</span> <span class="o">{</span>
        <span class="n">sink</span><span class="o">.</span><span class="na">write</span><span class="o">(</span><span class="n">expanded</span><span class="o">);</span> <span class="c1">// The X * N objects buffered above are what trigger the OOM</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<h3 id="unexpected-problems">Unexpected Problems</h3>

<p>This simple code quickly revealed serious issues:</p>

<p><strong>1. OutOfMemoryError (OOM)</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Existing data volume: X
Multiplier: N
Result: X * N items loaded into memory simultaneously → OOM
</code></pre></div></div>

<p><strong>2. Garbage Collection (GC) Pressure</strong></p>
<ul>
  <li>The app created massive amounts of temporary objects.</li>
  <li>This caused long GC pause times, which delayed the whole system.</li>
</ul>

<p><strong>3. Scalability Limitations</strong></p>
<ul>
  <li>Memory usage grew linearly with the multiplier (X × N objects held at once), so every increase brought us closer to the heap limit.</li>
  <li>We couldn’t use this method for tests that required even larger data volumes.</li>
</ul>

<p>These problems led me to ask a fundamental question: <strong>“Is this approach truly practical?”</strong></p>

<h2 id="analyzing-the-problem-and-finding-a-new-design">Analyzing the Problem and Finding a New Design</h2>

<h3 id="limitations-of-the-existing-method">Limitations of the Existing Method</h3>

<p>Why did the first approach fail? Here are the main reasons:</p>

<ol>
  <li><strong>Memory-centric thinking</strong>: We tried to load all data into memory before processing it.</li>
  <li><strong>Batch processing mindset</strong>: We treated continuous streaming data as a single batch.</li>
  <li><strong>Lack of realism</strong>: The design didn’t match how a real-world production environment works.</li>
</ol>

<p><strong>The key insight</strong>: <em>“The code wasn’t the problem. The way we thought about the data was the problem.”</em></p>

<h3 id="root-cause-analysis-two-different-test-objectives">Root Cause Analysis: Two Different Test Objectives</h3>

<p>Looking closer, our initial requirement actually hid <strong>two different test objectives</strong>:</p>

<h4 id="1-the-m-dimension-increasing-data-density">1. The M-Dimension: Increasing Data Density</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Objective: More granular monitoring.
Method: 1-minute intervals → 1-second intervals (temporal refinement).
Impact: More data points for the *same* time series.
Test Target: TSDB write throughput and storage capacity limits.
</code></pre></div></div>

<h4 id="2-the-n-dimension-increasing-cardinality">2. The N-Dimension: Increasing Cardinality</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Objective: System performance testing (measuring TSDB limits).
Method: Diversifying unique identifiers (e.g., server IDs, instance IDs).
Impact: An N-fold increase in the number of unique time series, i.e., N-fold cardinality.
Test Target: TSDB performance for indexing, metadata processing, and label-based queries.

Cardinality = The number of unique time series (a combination of metric name + labels).
※ The timestamp does not affect cardinality.
※ Different unique identifiers are treated as distinct time series.
</code></pre></div></div>
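
<p>The cardinality definition above can be sketched in a few lines. This is a minimal illustration, not how any particular TSDB computes it: the <code>Point</code> record and the <code>seriesKey</code> helper are hypothetical names, and the key is simply the metric name plus its sorted labels.</p>

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class CardinalityCounter {

    // Hypothetical data point: metric name, labels, epoch-second timestamp, value.
    record Point(String name, Map<String, String> labels, long timestamp, double value) {}

    // The series key uses the name and the sorted labels only; the timestamp is ignored.
    static String seriesKey(Point p) {
        Map<String, String> sorted = new TreeMap<>(p.labels());
        return p.name() + sorted;
    }

    // Cardinality = number of distinct series keys among the points.
    static int cardinality(List<Point> points) {
        Set<String> keys = new HashSet<>();
        for (Point p : points) {
            keys.add(seriesKey(p));
        }
        return keys.size();
    }

    public static void main(String[] args) {
        List<Point> points = List.of(
            new Point("cpu_usage", Map.of("server", "web01"), 0L, 0.5),
            new Point("cpu_usage", Map.of("server", "web01"), 10L, 0.6), // same series, new timestamp
            new Point("cpu_usage", Map.of("server", "web02"), 0L, 0.4)); // different label value
        System.out.println(cardinality(points)); // 2
    }
}
```

<p>Three data points produce only two series, because the second point differs from the first only in its timestamp.</p>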

<p>I failed initially because I didn’t understand this distinction. I only focused on “increasing data volume.”</p>

<p>This realization demanded a new approach. I needed to drop the “load everything into memory” mindset. Instead, I needed to embrace <strong>stream processing</strong> while handling both the M and N dimensions.</p>

<p>I decided to tackle the more complex <strong>M-dimension (increasing data density)</strong> first. I looked at two alternative approaches.</p>

<h3 id="new-approaches-for-the-m-dimension-memory-holding-vs-immediate-transformation">New Approaches for the M-Dimension: Memory Holding vs. Immediate Transformation</h3>

<blockquote>
  <p><strong>Note</strong>: The N-dimension (increasing cardinality) is easy to implement. We simply add unique identifiers to the data inside each processing cluster. Therefore, we will focus on the harder M-dimension here.</p>
</blockquote>

<h3 id="approach-1-the-memory-holding-method">Approach 1: The Memory Holding Method</h3>

<p>My first idea was to distribute the data perfectly over time.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Pseudocode: Memory Holding Method</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">WindowedProcessor</span> <span class="o">{</span>
    <span class="kd">private</span> <span class="kd">final</span> <span class="nc">List</span><span class="o">&lt;</span><span class="nc">Metric</span><span class="o">&gt;</span> <span class="n">buffer</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">ArrayList</span><span class="o">&lt;&gt;();</span> <span class="c1">// holds one window of metrics</span>
    
    <span class="kd">public</span> <span class="kt">void</span> <span class="nf">process</span><span class="o">(</span><span class="nc">Metric</span> <span class="n">metric</span><span class="o">)</span> <span class="o">{</span>
        <span class="c1">// Collect data for a 1-minute window</span>
        <span class="n">buffer</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="n">metric</span><span class="o">);</span>
        
        <span class="k">if</span> <span class="o">(</span><span class="n">windowComplete</span><span class="o">())</span> <span class="o">{</span>
            <span class="c1">// Dispatch at precise intervals (e.g., every 1 second)</span>
            <span class="n">sendAt</span><span class="o">(</span><span class="s">"14:00:00"</span><span class="o">,</span> <span class="n">createMetrics</span><span class="o">(</span><span class="n">buffer</span><span class="o">,</span> <span class="mi">0</span><span class="o">));</span>
            <span class="n">sendAt</span><span class="o">(</span><span class="s">"14:00:01"</span><span class="o">,</span> <span class="n">createMetrics</span><span class="o">(</span><span class="n">buffer</span><span class="o">,</span> <span class="mi">1</span><span class="o">));</span>
            <span class="n">sendAt</span><span class="o">(</span><span class="s">"14:00:02"</span><span class="o">,</span> <span class="n">createMetrics</span><span class="o">(</span><span class="n">buffer</span><span class="o">,</span> <span class="mi">2</span><span class="o">));</span>
            <span class="c1">// ... (interval is configurable)</span>
        <span class="o">}</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p><strong>Advantages:</strong></p>
<ul>
  <li>It perfectly copies the time distribution pattern of high-density monitoring.</li>
  <li>It provides a very accurate simulation.</li>
</ul>

<p><strong>Disadvantages:</strong></p>
<ul>
  <li>High memory usage: O(M × time_window).</li>
  <li>It requires complex timers and state management.</li>
  <li>We risk a memory explosion when we scale the N-dimension later.</li>
</ul>

<h3 id="approach-2-the-immediate-transformation-method">Approach 2: The Immediate Transformation Method</h3>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Pseudocode: Immediate Transformation Method</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">ImmediateProcessor</span> <span class="o">{</span>
    <span class="kd">public</span> <span class="kt">void</span> <span class="nf">process</span><span class="o">(</span><span class="nc">Metric</span> <span class="n">metric</span><span class="o">)</span> <span class="o">{</span>
        <span class="nc">LocalDateTime</span> <span class="n">baseTime</span> <span class="o">=</span> <span class="n">metric</span><span class="o">.</span><span class="na">getTimestamp</span><span class="o">()</span>
            <span class="o">.</span><span class="na">truncatedTo</span><span class="o">(</span><span class="nc">ChronoUnit</span><span class="o">.</span><span class="na">MINUTES</span><span class="o">);</span>
        
        <span class="c1">// Immediately transform one metric into M metrics upon arrival</span>
        <span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">offset</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">offset</span> <span class="o">&lt;</span> <span class="no">INTERVAL_SECONDS</span><span class="o">;</span> <span class="n">offset</span> <span class="o">+=</span> <span class="no">TARGET_INTERVAL</span><span class="o">)</span> <span class="o">{</span>
            <span class="nc">Metric</span> <span class="n">transformed</span> <span class="o">=</span> <span class="n">metric</span><span class="o">.</span><span class="na">copy</span><span class="o">();</span>
            <span class="n">transformed</span><span class="o">.</span><span class="na">setTimestamp</span><span class="o">(</span><span class="n">baseTime</span><span class="o">.</span><span class="na">plusSeconds</span><span class="o">(</span><span class="n">offset</span><span class="o">));</span>
            <span class="n">emit</span><span class="o">(</span><span class="n">transformed</span><span class="o">);</span>
        <span class="o">}</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p><strong>Advantages:</strong></p>
<ul>
  <li>Memory efficient: O(M).</li>
  <li>Simple to code.</li>
  <li>Scales very well.</li>
</ul>

<p><strong>Disadvantages:</strong></p>
<ul>
  <li>It creates a sudden burst of data instead of spreading the load evenly over time.</li>
</ul>
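
<p>For readers who want to run the idea, here is a self-contained sketch of the Immediate Transformation method. The <code>Metric</code> record and the <code>Consumer</code>-based sink are stand-ins I am assuming for illustration; the real <code>Metric</code> type and <code>emit</code> function are not shown in this article.</p>

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class ImmediateTransformSketch {

    // Hypothetical immutable metric; the production type will differ.
    record Metric(String name, LocalDateTime timestamp, double value) {}

    static final int INTERVAL_SECONDS = 60; // source interval: 1 minute

    // Expands one metric into 60 / targetInterval metrics and
    // hands each one to the sink as soon as it is created.
    static void process(Metric metric, int targetInterval, Consumer<Metric> sink) {
        LocalDateTime baseTime = metric.timestamp().truncatedTo(ChronoUnit.MINUTES);
        for (int offset = 0; offset < INTERVAL_SECONDS; offset += targetInterval) {
            sink.accept(new Metric(metric.name(), baseTime.plusSeconds(offset), metric.value()));
        }
    }

    public static void main(String[] args) {
        Metric input = new Metric("cpu_usage", LocalDateTime.of(2025, 1, 1, 12, 0, 42), 0.5);
        List<Metric> emitted = new ArrayList<>(); // collected here only so we can print them
        process(input, 10, emitted::add);
        emitted.forEach(m -> System.out.println(m.timestamp())); // six timestamps, ten seconds apart
    }
}
```

<p>The processor itself keeps no state between calls, which is why no timers or window bookkeeping are needed.</p>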

<h2 id="the-critical-insight-the-essence-of-streaming">The Critical Insight: “The Essence of Streaming”</h2>

<p>While debating between the two approaches, I had an “Aha!” moment.</p>

<h3 id="how-does-real-world-data-arrive">How Does Real-World Data Arrive?</h3>

<p>I thought about how data actually arrives in a real monitoring environment.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Data flow in high-density monitoring:
Actual Arrival Time   Metric Data Content
──────────────────────────────────────────
12:00:03              server1_cpu (timestamp: 12:00:00)
12:00:07              server2_cpu (timestamp: 12:00:00)  
12:00:13              server1_cpu (timestamp: 12:00:10)
12:00:15              server3_cpu (timestamp: 12:00:00)
12:00:17              server2_cpu (timestamp: 12:00:10)
12:00:23              server1_cpu (timestamp: 12:00:20)
...

→ Characteristic: Each server sends data at its own, uncoordinated time.
→ Result: Data arrives as a continuous, irregular stream.
</code></pre></div></div>

<p><strong>The Key Insight</strong>:</p>
<blockquote>
  <p>“In a real streaming environment, data arrives individually and continuously. Is there any good reason to buffer it and process it as a batch?”</p>
</blockquote>

<p>This changed everything. In the real world:</p>
<ul>
  <li>Metrics arrive <strong>one by one</strong>.</li>
  <li>They naturally <strong>spread out over time</strong>.</li>
  <li>The system handles them as a <strong>stream</strong>, not a bulk batch.</li>
</ul>

<p>Even though the <strong>Immediate Transformation</strong> method created small bursts, it actually mirrored real-world streaming much better than the buffering method!</p>

<h2 id="detailed-technical-review">Detailed Technical Review</h2>

<h3 id="checking-memory-usage">Checking Memory Usage</h3>

<p>Let’s look closely at the memory usage of the Immediate Transformation method.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kt">void</span> <span class="nf">process</span><span class="o">(</span><span class="nc">Metric</span> <span class="n">input</span><span class="o">)</span> <span class="o">{</span>
    <span class="c1">// M objects are created momentarily</span>
    <span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="no">M</span><span class="o">;</span> <span class="n">i</span><span class="o">++)</span> <span class="o">{</span>
        <span class="nc">Metric</span> <span class="n">transformed</span> <span class="o">=</span> <span class="n">input</span><span class="o">.</span><span class="na">copy</span><span class="o">();</span> <span class="c1">// M objects</span>
        <span class="n">emit</span><span class="o">(</span><span class="n">transformed</span><span class="o">);</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>The conclusion? It uses at most <strong>O(M) memory</strong> per input metric. Yes, it creates M objects, but each one is emitted as soon as it is created, so it becomes eligible for garbage collection once the downstream sink is done with it. This is far cheaper than keeping O(M × time_window) objects alive in a buffer.</p>

<h3 id="checking-data-volume-impact">Checking Data Volume Impact</h3>

<p>Someone asked me, “If we increase M, does that also increase cardinality?”</p>

<p><strong>The answer is no</strong>: Changing the timestamp interval <strong>does not increase cardinality.</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Cardinality only depends on the metric name + labels.
The timestamp does not change cardinality.

Example:
- cpu_usage{server="web01"} → 1 time series
- If we collect this every 10 seconds instead of 1 minute, it is still only 1 time series.
</code></pre></div></div>

<p>However, increasing M definitely increases the <strong>number of data points</strong>, which affects our storage and raw write performance.</p>

<h2 id="final-architecture-decision">Final Architecture Decision</h2>

<h3 id="selected-method-immediate-transformation">Selected Method: Immediate Transformation</h3>

<p>In the end, I chose the Immediate Transformation method.</p>

<div class="mermaid">
flowchart LR
    A["Source Topic<br />(Low-Density Metrics)"] --&gt; B["Stream Processor<br />(Transformation Engine)"]
    B --&gt; |"Immediate M-fold Transformation<br />(1 → M)"| C["Target Topic<br />(High-Density Metrics)"]
    C --&gt; D["TSDB<br />(Time Series DB)"]
    
    subgraph "M-Dimension: Time Division"
        E["timestamp: T1<br />metric{labels}: value"] 
        E --&gt; F["timestamp: T1<br />metric{labels}: value"]
        E --&gt; G["timestamp: T1+Δ<br />metric{labels}: value"]
        E --&gt; H["timestamp: T1+2Δ<br />metric{labels}: value"]
        E --&gt; I["..."]
    end
    
    B -.-&gt; E
    style A fill:#e1f5fe
    style B fill:#f3e5f5
    style C fill:#e8f5e8
    style D fill:#fff3e0
</div>

<p><strong>How we scaled the N-Dimension</strong>: We ran processing engines in parallel.</p>
<ul>
  <li><strong>Each engine</strong>: Created unique time series by altering label values.</li>
  <li><strong>Implementation</strong>: We added a unique identifier to the labels.</li>
  <li><strong>Example</strong>: <code class="language-plaintext highlighter-rouge">metric{server="web01"}</code> became <code class="language-plaintext highlighter-rouge">metric{server="web01", metric_id="1"}</code>, <code class="language-plaintext highlighter-rouge">metric{server="web01", metric_id="2"}</code></li>
</ul>

<p>→ <strong>Result</strong>: We generated N times the number of unique time series simply by changing the identifiers.</p>

<h3 id="why-we-chose-this">Why We Chose This</h3>

<ol>
  <li><strong>Realism</strong>: The data flow matches a real-world streaming environment.</li>
  <li><strong>Efficiency</strong>: It keeps memory usage low.</li>
  <li><strong>Simplicity</strong>: We avoided complex state and timer management.</li>
  <li><strong>Scalability</strong>: It runs smoothly even when we crank up the N-dimension.</li>
  <li><strong>Test Objective</strong>: It perfectly hits our goal of generating massive data volumes.</li>
</ol>

<h3 id="the-integrated-mn-scaling-strategy">The Integrated M×N Scaling Strategy</h3>

<p>I built a two-dimensional strategy to control data volume and cardinality independently.</p>

<p><strong>Why test both dimensions?</strong></p>
<ul>
  <li><strong>M-Dimension (Data Density)</strong>: Tests pure data throughput. It measures how fast the TSDB can write data and how much disk space it uses.</li>
  <li><strong>N-Dimension (Cardinality)</strong>: Tests the TSDB’s indexing and metadata engine. In many TSDBs, high cardinality becomes the bottleneck well before raw data volume does.</li>
</ul>

<p>We must separate them because a TSDB handles raw data points very differently than it handles unique time series indexes.</p>

<h4 id="m-dimension-increasing-data-points">M-Dimension: Increasing Data Points</h4>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// M-Implementation: Timestamp division</span>
<span class="nc">LocalDateTime</span> <span class="n">baseTime</span> <span class="o">=</span> <span class="n">metric</span><span class="o">.</span><span class="na">getTimestamp</span><span class="o">().</span><span class="na">truncatedTo</span><span class="o">(</span><span class="nc">ChronoUnit</span><span class="o">.</span><span class="na">MINUTES</span><span class="o">);</span>

<span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">offset</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">offset</span> <span class="o">&lt;</span> <span class="no">INTERVAL_SECONDS</span><span class="o">;</span> <span class="n">offset</span> <span class="o">+=</span> <span class="no">TARGET_INTERVAL</span><span class="o">)</span> <span class="o">{</span>
    <span class="nc">Metric</span> <span class="n">transformed</span> <span class="o">=</span> <span class="n">metric</span><span class="o">.</span><span class="na">copy</span><span class="o">();</span>
    <span class="n">transformed</span><span class="o">.</span><span class="na">setTimestamp</span><span class="o">(</span><span class="n">baseTime</span><span class="o">.</span><span class="na">plusSeconds</span><span class="o">(</span><span class="n">offset</span><span class="o">));</span>
    <span class="n">emit</span><span class="o">(</span><span class="n">transformed</span><span class="o">);</span> <span class="c1">// Generates M times the data points</span>
<span class="o">}</span>
</code></pre></div></div>

<p><strong>Effect</strong>:</p>
<ul>
  <li>Cardinality Impact: <strong>None</strong> (same metric + labels).</li>
  <li>Data Volume Impact: <strong>M-fold increase</strong>.</li>
</ul>

<h4 id="n-dimension-increasing-cardinality">N-Dimension: Increasing Cardinality</h4>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// N-Implementation: Diversifying time series via unique identifiers</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">TimeSeriesIdentifierTransformer</span> <span class="o">{</span>
    <span class="kd">private</span> <span class="kd">final</span> <span class="kt">int</span> <span class="n">metricId</span><span class="o">;</span>
    
    <span class="kd">public</span> <span class="nf">TimeSeriesIdentifierTransformer</span><span class="o">(</span><span class="kt">int</span> <span class="n">metricId</span><span class="o">)</span> <span class="o">{</span>
        <span class="k">this</span><span class="o">.</span><span class="na">metricId</span> <span class="o">=</span> <span class="n">metricId</span><span class="o">;</span>
    <span class="o">}</span>
    
    <span class="kd">public</span> <span class="nc">Metric</span> <span class="nf">transform</span><span class="o">(</span><span class="nc">Metric</span> <span class="n">input</span><span class="o">)</span> <span class="o">{</span>
        <span class="nc">Metric</span> <span class="n">transformed</span> <span class="o">=</span> <span class="n">input</span><span class="o">.</span><span class="na">copy</span><span class="o">();</span>
        
        <span class="c1">// Add a new unique identifier to the labels</span>
        <span class="n">transformed</span><span class="o">.</span><span class="na">addLabel</span><span class="o">(</span><span class="s">"metric_id"</span><span class="o">,</span> <span class="nc">String</span><span class="o">.</span><span class="na">valueOf</span><span class="o">(</span><span class="n">metricId</span><span class="o">));</span>
        
        <span class="k">return</span> <span class="n">transformed</span><span class="o">;</span> <span class="c1">// Generates N-fold cardinality</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p><strong>Effect</strong>:</p>
<ul>
  <li>Cardinality Impact: <strong>N-fold increase</strong> (creates new time series).</li>
  <li>Data Volume Impact: <strong>N-fold increase</strong>.</li>
</ul>

<h4 id="putting-it-together">Putting It Together</h4>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// M×N Integrated Implementation</span>
<span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">timeOffset</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">timeOffset</span> <span class="o">&lt;</span> <span class="no">M</span><span class="o">;</span> <span class="n">timeOffset</span><span class="o">++)</span> <span class="o">{</span>
    <span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">identifierOffset</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">identifierOffset</span> <span class="o">&lt;</span> <span class="no">N</span><span class="o">;</span> <span class="n">identifierOffset</span><span class="o">++)</span> <span class="o">{</span>
        <span class="nc">Metric</span> <span class="n">transformed</span> <span class="o">=</span> <span class="n">metric</span><span class="o">.</span><span class="na">copy</span><span class="o">();</span>
        
        <span class="c1">// M: Refine the timestamp (increase data density)</span>
        <span class="n">transformed</span><span class="o">.</span><span class="na">setTimestamp</span><span class="o">(</span><span class="n">baseTime</span><span class="o">.</span><span class="na">plusSeconds</span><span class="o">(</span><span class="n">timeOffset</span> <span class="o">*</span> <span class="no">TARGET_INTERVAL</span><span class="o">));</span>
        
        <span class="c1">// N: Diversify the unique identifier (increase cardinality)</span>
        <span class="n">transformed</span><span class="o">.</span><span class="na">addLabel</span><span class="o">(</span><span class="s">"metric_id"</span><span class="o">,</span> 
            <span class="nc">String</span><span class="o">.</span><span class="na">valueOf</span><span class="o">(</span><span class="n">identifierOffset</span><span class="o">));</span>
        
        <span class="n">emit</span><span class="o">(</span><span class="n">transformed</span><span class="o">);</span> <span class="c1">// Total data volume is M×N</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p><strong>Final Effect</strong>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Total Data Volume = Original Volume × M × N
Total Cardinality = Original Cardinality × N (M has no effect)
</code></pre></div></div>
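
<p>To make the two formulas concrete, here is a minimal, self-contained sketch of the same nested loop. It is an illustration under stated assumptions rather than our production code: <code class="language-plaintext highlighter-rouge">SamplePoint</code>, <code class="language-plaintext highlighter-rouge">MxNExpansion</code>, and the <code class="language-plaintext highlighter-rouge">_0</code>, <code class="language-plaintext highlighter-rouge">_1</code>, ... identifier suffix are hypothetical names chosen for the example.</p>

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the article's Metric: one data point of one series.
class SamplePoint {
    final String seriesId;
    final Instant timestamp;

    SamplePoint(String seriesId, Instant timestamp) {
        this.seriesId = seriesId;
        this.timestamp = timestamp;
    }
}

class MxNExpansion {
    // Expands one source point into m timestamps x n series identifiers.
    static List<SamplePoint> expand(SamplePoint source, int m, int n, long intervalSeconds) {
        List<SamplePoint> out = new ArrayList<>(m * n);
        for (int timeOffset = 0; timeOffset < m; timeOffset++) {
            // M: a finer timestamp grid (more points per series)
            Instant ts = source.timestamp.plusSeconds(timeOffset * intervalSeconds);
            for (int idOffset = 0; idOffset < n; idOffset++) {
                // N: a new identifier (more distinct series)
                out.add(new SamplePoint(source.seriesId + "_" + idOffset, ts));
            }
        }
        return out;
    }

    // Counts distinct series identifiers, i.e. the cardinality of the output.
    static int distinctSeries(List<SamplePoint> points) {
        Set<String> ids = new HashSet<>();
        for (SamplePoint p : points) {
            ids.add(p.seriesId);
        }
        return ids.size();
    }
}
```

<p>With M=60 and N=5, one source point becomes 300 points spread over 5 series: the volume grows by M×N, while the cardinality grows only by N.</p>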

<h3 id="how-this-actually-impacts-the-tsdb">How This Actually Impacts the TSDB</h3>

<p>Let’s look at exactly what happens to the database when we turn these dials.</p>

<h4 id="impact-of-m-dimension-scaling">Impact of M-Dimension Scaling</h4>

<p><strong>What it stresses:</strong></p>
<ul>
  <li><strong>Write I/O</strong>: The disks must write more data points per second.</li>
  <li><strong>Network Bandwidth</strong>: The network must transfer M times more data.</li>
  <li><strong>Disk Storage</strong>: The disk fills up M times faster.</li>
  <li><strong>Compression Efficiency</strong>: Because data points arrive closer together, compression often improves.</li>
</ul>

<p><strong>What we monitor:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>- Write throughput (points/sec)
- Disk usage growth rate
- Memory usage for write buffers
- Query response time for time-range queries
</code></pre></div></div>

<h4 id="impact-of-n-dimension-scaling">Impact of N-Dimension Scaling</h4>

<p><strong>What it stresses:</strong></p>
<ul>
  <li><strong>Index Memory</strong>: The TSDB must create and hold indexes for new label combinations in RAM.</li>
  <li><strong>Metadata Management</strong>: The system does N times more work to discover and manage series.</li>
  <li><strong>Label-based Search</strong>: Pattern-matching label queries like <code class="language-plaintext highlighter-rouge">{server="web01_virtual_*"}</code> become much slower, because the pattern has to be checked against N times more series.</li>
  <li><strong>Aggregate Queries</strong>: <code class="language-plaintext highlighter-rouge">GROUP BY</code> operations must scan N times more series.</li>
</ul>

<p><strong>What we monitor:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>- Memory usage for the series index
- Label query performance (milliseconds)
- Cardinality limit warnings
- Query planning time for complex aggregations
</code></pre></div></div>

<h4 id="integrated-load-testing-scenarios">Integrated Load Testing Scenarios</h4>

<p>With the M×N strategy, we can run targeted scenarios to find exact weaknesses:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Scenario 1: M=60, N=1  → High data density, existing cardinality
Scenario 2: M=1, N=100 → Existing density, high cardinality
Scenario 3: M=10, N=10 → Balanced load test
</code></pre></div></div>
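
<p>The load each scenario puts on the TSDB follows directly from the two formulas above. The sketch below works through that arithmetic. <code class="language-plaintext highlighter-rouge">LoadScenario</code> is a hypothetical helper, and the baseline figures in the note after it are made-up round numbers, not measurements from our system.</p>

```java
// Hypothetical helper: derives the expected load of one M x N scenario.
class LoadScenario {
    final int m; // timestamp multiplier (data density)
    final int n; // identifier multiplier (cardinality)

    LoadScenario(int m, int n) {
        this.m = m;
        this.n = n;
    }

    // Total Data Volume = Original Volume x M x N
    long pointsPerSecond(long basePointsPerSecond) {
        return basePointsPerSecond * m * n;
    }

    // Total Cardinality = Original Cardinality x N (M has no effect)
    long seriesCount(long baseSeries) {
        return baseSeries * n;
    }

    String describe(long basePointsPerSecond, long baseSeries) {
        return String.format("M=%d, N=%d -> %d points/sec, %d series",
                m, n, pointsPerSecond(basePointsPerSecond), seriesCount(baseSeries));
    }
}
```

<p>Assuming a baseline of 10,000 points/sec across 1,000 series, Scenarios 2 and 3 both produce 1,000,000 points/sec, but Scenario 2 holds 100,000 series while Scenario 3 holds 10,000. Comparing those two runs isolates the cost of cardinality at equal write volume.</p>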

<p>This lets us find out <strong>exactly which part of the TSDB breaks first.</strong></p>

<h2 id="lessons-learned">Lessons Learned</h2>

<h3 id="1-understand-the-real-goal">1. Understand the Real Goal</h3>
<p>It was important to look past the basic request of “make more data.” The true goal was measuring the performance limits of the TSDB.</p>

<h3 id="2-realism-over-perfection">2. Realism Over Perfection</h3>
<p>Building a system that behaves like the real world is much more valuable than building a theoretically “perfect” but unnatural simulation.</p>

<h3 id="3-weigh-the-trade-offs">3. Weigh the Trade-offs</h3>
<p>I learned to constantly weigh performance against complexity, and perfection against practicality.</p>

<h3 id="4-talk-to-your-team">4. Talk to Your Team</h3>
<p>Discussing these ideas with colleagues helped me fix blind spots I never would have seen on my own.</p>

<h3 id="5-take-small-steps">5. Take Small Steps</h3>
<p>Starting with a simple multiplication idea and slowly refining it worked much better than trying to design a massive, complex system on day one.</p>

<h2 id="conclusion">Conclusion</h2>

<p>This experience taught me how vital it is to understand how data flows over time.</p>

<p>It’s easy to fall into the trap of over-engineering a solution. This process reminded me that the best development involves <strong>finding the core problem, matching the real-world environment, and validating ideas with your team.</strong></p>

<hr />

<p><strong>Part 2 Preview</strong></p>

<p>In Part 2, I’ll explain how we tried to implement this M×N strategy in a real Proof of Concept (PoC) environment.</p>

<ul>
  <li><strong>Technology Selection</strong>: Why we compared Kafka Streams and Flink.</li>
  <li><strong>PoC Architecture Design</strong>: How we planned to build the M×N engine.</li>
  <li><strong>Validation</strong>: Did the “Immediate Transformation” approach actually work in practice?</li>
</ul>

<p><strong>Spoiler Alert</strong>: When I showed this to my team, their feedback led us to scrap the Flink idea entirely. We pivoted to a completely different, much simpler approach. Discover how teamwork turned a complex architecture into a pragmatic solution in Part 2.</p>

<hr />

<p><em>All technical content in this article is based on actual production experience. Specific system names and configuration values have been generalized for security.</em></p>]]></content><author><name>SeungHyeon Lee</name><email>tmdgus8490@gmail.com</email></author><category term="System Architecture" /><category term="Work Experience" /><category term="Time Series" /><category term="Data Processing" /><category term="System Design" /><category term="Stream Processing" /><category term="TSDB" /><category term="Load Testing" /><category term="Cardinality" /><category term="Scalability" /><category term="Performance Optimization" /><category term="OOM" /><category term="Timeseries Database" /><category term="Architecture Patterns" /><summary type="html"><![CDATA[A deep dive into the architectural challenges and solutions for scaling time series data ingestion, based on real-world production experience.]]></summary></entry><entry><title type="html">Exception Handling Best Practices: Lessons from Effective Java</title><link href="https://2nanoori.github.io/work%20experience/exception-handling-best-practices/" rel="alternate" type="text/html" title="Exception Handling Best Practices: Lessons from Effective Java" /><published>2025-08-22T00:00:00+00:00</published><updated>2025-08-22T00:00:00+00:00</updated><id>https://2nanoori.github.io/work%20experience/exception-handling-best-practices</id><content type="html" xml:base="https://2nanoori.github.io/work%20experience/exception-handling-best-practices/"><![CDATA[<script src="https://unpkg.com/mermaid@9.4.3/dist/mermaid.min.js"></script>

<script>
  mermaid.initialize({
    startOnLoad: true,
    theme: 'default',
    securityLevel: 'loose',
    flowchart: {
      useMaxWidth: true,
      htmlLabels: true
    }
  });
</script>

<h2 id="1-overview">1. Overview</h2>

<p>Recently, while working on a software development project, I had the opportunity to think deeply about Exception Handling.</p>

<p>While integrating a third-party library, I encountered conflicts between the library’s guidelines and our application’s exception handling approach. This led me to reconsider exception handling practices, and I decided to revisit the concepts from Effective Java 3rd Edition - a true bible for Java developers.</p>

<p>As the book’s table of contents shows, exception handling gets an entire chapter (Chapter 10, Items 69–77) rather than a single item, which says a lot about how much weight the topic carries.</p>

<p>Let me explore Effective Java’s exception handling principles and reflect on how to solve the real-world problems I’ve encountered.</p>

<h2 id="2-exception-handling-principles">2. Exception Handling Principles</h2>

<p>The chapter’s opening statement sets the tone perfectly:</p>

<blockquote>
  <p>“When used to best advantage, exceptions can improve a program’s readability, reliability, and maintainability. When used improperly, they can have the opposite effect.”</p>
</blockquote>

<p>I completely agree with this statement, and I believe developers should always consider whether improper usage might occur.</p>

<h3 id="21-use-exceptions-only-for-exceptional-conditions">2.1. Use Exceptions Only for Exceptional Conditions</h3>

<p>Exceptions should be used only for <strong>truly exceptional situations that disrupt the normal flow</strong> of a program.</p>

<p>Using exceptions for situations that can be handled with simple conditional statements leads to <strong>performance degradation</strong>, <strong>reduced readability</strong>, and <strong>debugging difficulties</strong>.</p>

<p><strong>Why shouldn’t exceptions be overused?</strong></p>

<ol>
  <li><strong>Performance Issues</strong>
    <ul>
      <li>Throwing an exception is relatively expensive, largely because the JVM captures a stack trace when the exception is created</li>
      <li>Frequent exceptions in loops can significantly impact performance</li>
    </ul>
  </li>
  <li><strong>Reduced Code Readability</strong>
    <ul>
      <li>Too much exception handling code makes normal logic hard to read</li>
      <li>Overusing exceptions turns them into a control-flow mechanism, which is not intuitive to read</li>
    </ul>
  </li>
  <li><strong>Debugging Difficulties</strong>
    <ul>
      <li>Unnecessary exceptions add noisy stack traces to the logs, making the actual problem harder to identify</li>
    </ul>
  </li>
</ol>
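
<p>The “control flow via exceptions” problem is easiest to see side by side. The following is a small sketch in the spirit of the array-traversal example from Effective Java. The class and method names are hypothetical. Both methods return the same sum, but the first one uses an exception where a loop condition belongs.</p>

```java
class LoopStyles {
    // Anti-pattern: relies on ArrayIndexOutOfBoundsException to end the loop.
    static long sumWithException(int[] values) {
        long sum = 0;
        try {
            int i = 0;
            while (true) {
                sum += values[i++];
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            // Reaching the end of the array is not exceptional at all;
            // the exception is doing the job of a loop condition.
        }
        return sum;
    }

    // Standard idiom: the loop itself states when it ends.
    static long sumWithLoop(int[] values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }
}
```

<p>Besides being harder to read, the first version would also silently swallow an unrelated <code class="language-plaintext highlighter-rouge">ArrayIndexOutOfBoundsException</code> thrown by a bug inside the loop body, which is exactly the debugging difficulty described above.</p>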

<p><strong>Proper Usage Examples</strong></p>

<ul>
  <li><strong>Situations for exception handling:</strong>
    <ul>
      <li>Unpredictable errors: network disconnection, missing files, DB connection failures</li>
      <li>External system dependency failures</li>
      <li>Programming contract violations (e.g., <code class="language-plaintext highlighter-rouge">IllegalArgumentException</code>, used within reason)</li>
    </ul>
  </li>
  <li><strong>Situations to avoid exception handling:</strong>
    <ul>
      <li>Cases where a simple conditional check is sufficient</li>
      <li>Normal situations that frequently occur in loops</li>
    </ul>
  </li>
</ul>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Wrong: using exceptions to skip null elements in a loop</span>
<span class="k">for</span> <span class="o">(</span><span class="nc">String</span> <span class="n">s</span> <span class="o">:</span> <span class="n">list</span><span class="o">)</span> <span class="o">{</span>
    <span class="k">try</span> <span class="o">{</span>
        <span class="n">process</span><span class="o">(</span><span class="n">s</span><span class="o">);</span>
    <span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="nc">NullPointerException</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
        <span class="c1">// Thrown simply because the element is null</span>
        <span class="c1">// A conditional check is clearer and much cheaper</span>
    <span class="o">}</span>
<span class="o">}</span>

<span class="c1">// Correct:</span>
<span class="k">for</span> <span class="o">(</span><span class="nc">String</span> <span class="n">s</span> <span class="o">:</span> <span class="n">list</span><span class="o">)</span> <span class="o">{</span>
    <span class="k">if</span> <span class="o">(</span><span class="n">s</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
        <span class="n">process</span><span class="o">(</span><span class="n">s</span><span class="o">);</span>
    <span class="o">}</span>
<span class="o">}</span>

<span class="c1">// File reading where file doesn't exist -&gt; exception appropriate</span>
<span class="k">try</span> <span class="o">{</span>
    <span class="n">readFile</span><span class="o">(</span><span class="s">"nonexistent.txt"</span><span class="o">);</span>
<span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="nc">IOException</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
    <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">"Error reading file: "</span> <span class="o">+</span> <span class="n">e</span><span class="o">.</span><span class="na">getMessage</span><span class="o">());</span>
<span class="o">}</span>
</code></pre></div></div>

<h3 id="22-use-checked-exceptions-for-recoverable-conditions-and-runtime-exceptions-for-programming-errors">2.2. Use Checked Exceptions for Recoverable Conditions and Runtime Exceptions for Programming Errors</h3>

<p>Java provides three types of throwables: checked exceptions, runtime exceptions, and errors. Here’s guidance on when to use each:</p>

<blockquote>
  <p><strong>Recoverable conditions</strong> → <strong>Checked exceptions</strong>
<strong>Programming errors</strong> → <strong>Runtime exceptions</strong></p>
</blockquote>

<ul>
  <li><strong>Checked Exceptions</strong>
    <ul>
      <li>Force callers to handle exceptions</li>
      <li>Use for situations where the program can recover</li>
    </ul>
  </li>
  <li><strong>Unchecked Exceptions</strong>
    <ul>
      <li>Subclasses of <code class="language-plaintext highlighter-rouge">RuntimeException</code></li>
      <li>The compiler does not force callers to handle them</li>
      <li>Represent programming errors that should be fixed through code changes</li>
    </ul>
  </li>
</ul>

<div class="mermaid">
flowchart TD
    A["Should throw an exception?"] --&gt; B{"Normal flow?"}
    
    B --&gt;|Yes| C["Use conditionals<br />if / for etc."]
    B --&gt;|No| D["Use exceptions"]
    
    D --&gt; E{"Exception type selection"}
    
    E --&gt;|Recoverable situation| F["Checked Exception<br />try-catch or throws required<br />Example: File not found, Network error"]
    E --&gt;|Programming error| G["Runtime Exception<br />Code fix required<br />Example: IndexOutOfBounds, NullPointer"]
</div>
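
<p>As a rough illustration of the two branches at the bottom of the flowchart, the sketch below puts both exception types into a single method. Everything in it (<code class="language-plaintext highlighter-rouge">DemoAccount</code>, <code class="language-plaintext highlighter-rouge">InsufficientFundsException</code>, the <code class="language-plaintext highlighter-rouge">getShortfall()</code> accessor) is a hypothetical example, not code from the project.</p>

```java
// Checked: the caller can reasonably recover, e.g. by retrying with a smaller amount.
class InsufficientFundsException extends Exception {
    private final long shortfall;

    InsufficientFundsException(long shortfall) {
        super("Insufficient funds, short by " + shortfall);
        this.shortfall = shortfall;
    }

    // Gives the caller the information it needs to recover.
    long getShortfall() {
        return shortfall;
    }
}

class DemoAccount {
    private long balance;

    DemoAccount(long balance) {
        this.balance = balance;
    }

    long withdraw(long amount) throws InsufficientFundsException {
        if (amount <= 0) {
            // Programming error: the caller broke the method contract, so the code must be fixed.
            throw new IllegalArgumentException("Amount must be positive: " + amount);
        }
        if (amount > balance) {
            // Recoverable condition: a legitimate runtime situation the caller must handle.
            throw new InsufficientFundsException(amount - balance);
        }
        balance -= amount;
        return balance;
    }
}
```

<p>The checked exception carries the shortfall so the caller has something to act on. The runtime exception carries only a message, because the right response is to fix the calling code.</p>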

<h3 id="23-avoid-unnecessary-checked-exceptions">2.3. Avoid Unnecessary Checked Exceptions</h3>

<p><strong>Checked exceptions</strong> force callers to handle exceptions. However, if it’s not truly a recoverable situation, using checked exceptions <strong>hurts API usability and makes code messy</strong>.</p>

<p><strong>Why is this problematic?</strong></p>

<ol>
  <li><strong>Messy caller code</strong>
    <div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">try</span> <span class="o">{</span>
    <span class="n">obj</span><span class="o">.</span><span class="na">action</span><span class="o">();</span>
<span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="nc">SomeCheckedException</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
    <span class="c1">// No recovery is actually possible here</span>
    <span class="k">throw</span> <span class="k">new</span> <span class="nf">RuntimeException</span><span class="o">(</span><span class="n">e</span><span class="o">);</span>
<span class="o">}</span>
</code></pre></div>    </div>
    <p>→ Callers inevitably end up with unnecessary “catch and rethrow” code.</p>
  </li>
  <li><strong>Reduced API usability</strong>
    <ul>
      <li>Developers always have to write try-catch blocks</li>
      <li>APIs become unnecessarily complex</li>
    </ul>
  </li>
</ol>

<p><strong>Correct Design Principles</strong></p>

<ul>
  <li>Use <strong>runtime exceptions (Unchecked Exception)</strong> instead of checked exceptions if no recovery is possible</li>
  <li>Use checked exceptions only when clients must respond</li>
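  <li><strong>Returning an <code class="language-plaintext highlighter-rouge">Optional</code></strong> (or, less safely, <code class="language-plaintext highlighter-rouge">null</code>) instead of throwing might be better in some cases</li>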
</ul>
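
<p>The last principle can be sketched as follows. This is a minimal example, assuming a hypothetical <code class="language-plaintext highlighter-rouge">PortParser</code> helper: instead of declaring a checked exception for malformed input, the method returns an empty <code class="language-plaintext highlighter-rouge">Optional</code> and lets the caller decide what to do.</p>

```java
import java.util.Optional;

class PortParser {
    // Returns an empty Optional instead of throwing when the text is not a valid TCP port.
    static Optional<Integer> parsePort(String text) {
        // At most 5 digits, so Integer.parseInt below cannot overflow.
        if (text == null || text.isEmpty() || text.length() > 5) {
            return Optional.empty();
        }
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < '0' || c > '9') {
                return Optional.empty();
            }
        }
        int port = Integer.parseInt(text);
        return (port >= 1 && port <= 65535) ? Optional.of(port) : Optional.empty();
    }
}
```

<p>A caller can then write something like <code class="language-plaintext highlighter-rouge">PortParser.parsePort(input).orElse(8080)</code> with no try-catch at all. The trade-off is that an empty <code class="language-plaintext highlighter-rouge">Optional</code> cannot explain why the input was rejected, so this fits best when the caller does not need that detail.</p>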

<p><strong>Examples</strong></p>

<ul>
  <li>Wrong approach
    <div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Caller cannot actually recover</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">connect</span><span class="o">()</span> <span class="kd">throws</span> <span class="nc">IOException</span> <span class="o">{</span>     
  <span class="c1">// IOException thrown on connection failure</span>
<span class="o">}</span>
</code></pre></div>    </div>
  </li>
</ul>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">try</span> <span class="o">{</span>
    <span class="n">service</span><span class="o">.</span><span class="na">connect</span><span class="o">();</span>
<span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="nc">IOException</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
    <span class="c1">// Cannot recover but must catch and rethrow or just log</span>
<span class="o">}</span>
</code></pre></div></div>

<ul>
  <li>Improved approach
    <div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Not recoverable → change to runtime exception</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">connect</span><span class="o">()</span> <span class="o">{</span>
  <span class="k">if</span> <span class="o">(</span><span class="cm">/* failure */</span><span class="o">)</span> <span class="o">{</span>
      <span class="k">throw</span> <span class="k">new</span> <span class="nf">IllegalStateException</span><span class="o">(</span><span class="s">"Cannot connect to server"</span><span class="o">);</span>
  <span class="o">}</span>
<span class="o">}</span>
</code></pre></div>    </div>
  </li>
</ul>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Alternative: provide a state-checking method so callers can avoid the exception</span>
<span class="k">if</span> <span class="o">(</span><span class="n">service</span><span class="o">.</span><span class="na">canConnect</span><span class="o">())</span> <span class="o">{</span>
    <span class="n">service</span><span class="o">.</span><span class="na">connect</span><span class="o">();</span>
<span class="o">}</span>
</code></pre></div></div>

<h3 id="24-favor-the-use-of-standard-exceptions">2.4. Favor the Use of Standard Exceptions</h3>

<p>Java provides <strong>well-defined standard exception classes</strong>. Rather than defining new exception classes, using appropriate standard exceptions is advantageous for <strong>consistency, readability, and maintainability</strong>.</p>

<p><strong>Standard exceptions should be the first choice</strong>, and creating new exceptions should be a last resort.</p>

<p><strong>Why use standard exceptions?</strong></p>

<ol>
  <li><strong>Consistency</strong>
    <ul>
      <li>All Java developers can easily understand the meaning</li>
      <li>Names like <code class="language-plaintext highlighter-rouge">NullPointerException</code>, <code class="language-plaintext highlighter-rouge">IllegalArgumentException</code> are self-explanatory</li>
    </ul>
  </li>
  <li><strong>Avoiding unnecessary duplication</strong>
    <ul>
      <li>No need to create custom exceptions with the same functionality as existing ones</li>
    </ul>
  </li>
  <li><strong>API simplification</strong>
    <ul>
      <li>Prevents unnecessary proliferation of exception classes → easier maintenance</li>
    </ul>
  </li>
</ol>

<p><strong>Commonly Used Standard Exceptions</strong></p>

<table>
  <thead>
    <tr>
      <th>Exception</th>
      <th>Usage</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>IllegalArgumentException</strong></td>
      <td>When arguments are invalid</td>
    </tr>
    <tr>
      <td><strong>IllegalStateException</strong></td>
      <td>When object state is inappropriate for method call</td>
    </tr>
    <tr>
      <td><strong>NullPointerException</strong></td>
      <td>When a null argument is passed where null is not allowed</td>
    </tr>
    <tr>
      <td><strong>IndexOutOfBoundsException</strong></td>
      <td>When index is out of range</td>
    </tr>
    <tr>
      <td><strong>ConcurrentModificationException</strong></td>
      <td>When concurrent modification is detected where it is prohibited</td>
    </tr>
    <tr>
      <td><strong>UnsupportedOperationException</strong></td>
      <td>When the object does not support the called method</td>
    </tr>
  </tbody>
</table>

<p><strong>Examples</strong></p>

<ul>
  <li>Using standard exceptions
    <div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kt">void</span> <span class="nf">setAge</span><span class="o">(</span><span class="kt">int</span> <span class="n">age</span><span class="o">)</span> <span class="o">{</span>
  <span class="k">if</span> <span class="o">(</span><span class="n">age</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="o">)</span> <span class="o">{</span>
      <span class="k">throw</span> <span class="k">new</span> <span class="nf">IllegalArgumentException</span><span class="o">(</span><span class="s">"Age cannot be negative: "</span> <span class="o">+</span> <span class="n">age</span><span class="o">);</span>
  <span class="o">}</span>
  <span class="k">this</span><span class="o">.</span><span class="na">age</span> <span class="o">=</span> <span class="n">age</span><span class="o">;</span>
<span class="o">}</span>
</code></pre></div>    </div>
  </li>
  <li>Unnecessary custom exception
    <div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Actually IllegalArgumentException is sufficient</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">InvalidAgeException</span> <span class="kd">extends</span> <span class="nc">RuntimeException</span> <span class="o">{</span>
  <span class="kd">public</span> <span class="nf">InvalidAgeException</span><span class="o">(</span><span class="nc">String</span> <span class="n">message</span><span class="o">)</span> <span class="o">{</span>
      <span class="kd">super</span><span class="o">(</span><span class="n">message</span><span class="o">);</span>
  <span class="o">}</span>
<span class="o">}</span>
</code></pre></div>    </div>
  </li>
</ul>
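
<p>Several rows of the table can be combined in one small class. The sketch below is illustrative only. <code class="language-plaintext highlighter-rouge">NamedCounter</code> is a hypothetical class, while <code class="language-plaintext highlighter-rouge">Objects.requireNonNull</code> is the standard JDK helper that throws <code class="language-plaintext highlighter-rouge">NullPointerException</code> with the given message.</p>

```java
import java.util.Objects;

class NamedCounter {
    private final String name;
    private int value;
    private boolean closed;

    NamedCounter(String name) {
        // NullPointerException: null is prohibited for this argument.
        this.name = Objects.requireNonNull(name, "name must not be null");
    }

    int add(int step) {
        if (closed) {
            // IllegalStateException: the object is in the wrong state for this call.
            throw new IllegalStateException("Counter '" + name + "' is closed");
        }
        if (step <= 0) {
            // IllegalArgumentException: the argument value itself is invalid.
            throw new IllegalArgumentException("step must be positive: " + step);
        }
        value += step;
        return value;
    }

    void close() {
        closed = true;
    }
}
```

<p>No custom exception class is needed here: each failure maps onto a standard type that any Java developer already knows how to read.</p>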

<h3 id="25-throw-exceptions-appropriate-to-the-abstraction">2.5. Throw Exceptions Appropriate to the Abstraction</h3>

<p>Exceptions thrown by a method should <strong>match the abstraction level</strong> of that method.</p>

<p><strong>Lower-level implementation details should not leak through exceptions</strong>; instead, they should be translated into exceptions appropriate for the higher abstraction level.</p>

<p><strong>Why is this important?</strong></p>

<ol>
  <li><strong>Maintaining Encapsulation</strong>
    <ul>
      <li>External APIs shouldn’t change when internal implementation technology changes</li>
      <li>Exposing internal exceptions leaks implementation details</li>
    </ul>
  </li>
  <li><strong>Consistent API</strong>
    <ul>
      <li>Callers only need to think at the method’s abstraction level</li>
      <li>“What situations can cause this method to fail?” is all they need to understand</li>
    </ul>
  </li>
  <li><strong>Maintenance ease</strong>
    <ul>
      <li>If the internal technology changes (e.g., DB → file storage), client code doesn’t need updates as long as the API’s exceptions stay the same</li>
    </ul>
  </li>
</ol>

<p><strong>Wrong Example (Exposing Implementation Exceptions)</strong></p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Internal library code</span>
<span class="kd">public</span> <span class="nc">List</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">&gt;</span> <span class="nf">readNames</span><span class="o">()</span> <span class="kd">throws</span> <span class="nc">SQLException</span> <span class="o">{</span>
    <span class="c1">// DB access logic</span>
<span class="o">}</span>
</code></pre></div></div>
<ul>
  <li>Problem: Clients depend on <code class="language-plaintext highlighter-rouge">SQLException</code> → API changes needed when DB is replaced</li>
</ul>

<p><strong>Correct Example (Matching Abstraction Level)</strong></p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Abstracted API</span>
<span class="kd">public</span> <span class="nc">List</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">&gt;</span> <span class="nf">readNames</span><span class="o">()</span> <span class="kd">throws</span> <span class="nc">DataAccessException</span> <span class="o">{</span>
    <span class="k">try</span> <span class="o">{</span>
        <span class="c1">// DB access</span>
    <span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="nc">SQLException</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
        <span class="k">throw</span> <span class="k">new</span> <span class="nf">DataAccessException</span><span class="o">(</span><span class="s">"Database read failed"</span><span class="o">,</span> <span class="n">e</span><span class="o">);</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<ul>
  <li>Clients only need to understand <strong>“data cannot be read”</strong> in abstract terms</li>
  <li>Internal implementation (DB vs file) can change without affecting the API</li>
</ul>

<p><strong>Exception Translation</strong></p>

<ul>
  <li>Convert lower-level exceptions → higher-level abstraction exceptions</li>
  <li>Methods:
    <ul>
      <li><strong>Exception Translation</strong>: Wrap lower exceptions in higher-level exceptions</li>
      <li><strong>Exception Chaining</strong>: Pass lower exception as cause (<code class="language-plaintext highlighter-rouge">new MyException("msg", cause)</code>)</li>
    </ul>
  </li>
</ul>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">try</span> <span class="o">{</span>
    <span class="c1">// DB access</span>
<span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="nc">SQLException</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
    <span class="k">throw</span> <span class="k">new</span> <span class="nf">DataAccessException</span><span class="o">(</span><span class="s">"Data access failed"</span><span class="o">,</span> <span class="n">e</span><span class="o">);</span> <span class="c1">// include cause</span>
<span class="o">}</span>
</code></pre></div></div>
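
<p>To check that chaining really preserves the original failure, here is a self-contained sketch. It makes a few assumptions for the sake of a runnable example: <code class="language-plaintext highlighter-rouge">DataAccessException</code> is defined here as an unchecked exception, <code class="language-plaintext highlighter-rouge">NameRepository</code> is a hypothetical class, and <code class="language-plaintext highlighter-rouge">queryDatabase()</code> simply simulates a failing low-level call.</p>

```java
import java.sql.SQLException;

// Higher-level exception; declared unchecked here as an assumption for this sketch.
class DataAccessException extends RuntimeException {
    DataAccessException(String message, Throwable cause) {
        super(message, cause); // chaining: keep the low-level exception as the cause
    }
}

class NameRepository {
    // Simulates a failing low-level database call.
    private static void queryDatabase() throws SQLException {
        throw new SQLException("connection refused");
    }

    static void readNames() {
        try {
            queryDatabase();
        } catch (SQLException e) {
            // Translation: callers see only the abstraction-level exception.
            throw new DataAccessException("Data access failed", e);
        }
    }
}
```

<p>Callers catch only <code class="language-plaintext highlighter-rouge">DataAccessException</code>, yet <code class="language-plaintext highlighter-rouge">getCause()</code> still returns the original <code class="language-plaintext highlighter-rouge">SQLException</code>, so the low-level detail remains available in logs and stack traces without becoming part of the API.</p>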

<h2 id="3-conclusion-applying-lessons-to-real-world-problems">3. Conclusion: Applying Lessons to Real-World Problems</h2>

<p>After studying the theory, let me share how I resolved the <strong>library integration problem</strong> mentioned in the overview.</p>

<h3 id="31-problem-situation-two-conflicting-exception-handling-approaches">3.1. Problem Situation: Two Conflicting Exception Handling Approaches</h3>

<p>The third-party library we were integrating validates requests before they reach application controllers. The library guidelines were:</p>

<ul>
  <li>The <code class="language-plaintext highlighter-rouge">checkPermission()</code> method throws a single <code class="language-plaintext highlighter-rouge">AuthorizationException</code> on failure</li>
  <li>This exception contains an <code class="language-plaintext highlighter-rouge">ErrorCode</code> indicating the failure reason (<code class="language-plaintext highlighter-rouge">TOKEN_EXPIRED</code>, <code class="language-plaintext highlighter-rouge">INSUFFICIENT_PERMISSIONS</code>)</li>
  <li>The <code class="language-plaintext highlighter-rouge">GlobalExceptionHandler</code> should catch this <code class="language-plaintext highlighter-rouge">AuthorizationException</code>, check the internal <code class="language-plaintext highlighter-rouge">ErrorCode</code>, and use <code class="language-plaintext highlighter-rouge">if/else</code> branching to return different HTTP status codes (401, 403, etc.)</li>
</ul>

<p><strong>[Library-recommended approach]</strong></p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@RestControllerAdvice</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">GlobalExceptionHandler</span> <span class="o">{</span>

    <span class="nd">@ExceptionHandler</span><span class="o">(</span><span class="nc">AuthorizationException</span><span class="o">.</span><span class="na">class</span><span class="o">)</span>
    <span class="kd">public</span> <span class="nc">ResponseEntity</span><span class="o">&lt;</span><span class="nc">ErrorResponse</span><span class="o">&gt;</span> <span class="nf">handleAuthorizationException</span><span class="o">(</span><span class="nc">AuthorizationException</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
        <span class="c1">// 😫 Logic to analyze the cause inside ExceptionHandler</span>
        <span class="k">if</span> <span class="o">(</span><span class="n">e</span><span class="o">.</span><span class="na">getErrorCode</span><span class="o">()</span> <span class="o">==</span> <span class="nc">ErrorCode</span><span class="o">.</span><span class="na">TOKEN_EXPIRED</span><span class="o">)</span> <span class="o">{</span>
            <span class="k">return</span> <span class="nc">ResponseEntity</span><span class="o">.</span><span class="na">status</span><span class="o">(</span><span class="nc">HttpStatus</span><span class="o">.</span><span class="na">UNAUTHORIZED</span><span class="o">)</span>
                <span class="o">.</span><span class="na">body</span><span class="o">(</span><span class="k">new</span> <span class="nc">ErrorResponse</span><span class="o">(</span><span class="s">"Token has expired."</span><span class="o">));</span>
        <span class="o">}</span> <span class="k">else</span> <span class="k">if</span> <span class="o">(</span><span class="n">e</span><span class="o">.</span><span class="na">getErrorCode</span><span class="o">()</span> <span class="o">==</span> <span class="nc">ErrorCode</span><span class="o">.</span><span class="na">INSUFFICIENT_PERMISSIONS</span><span class="o">)</span> <span class="o">{</span>
            <span class="k">return</span> <span class="nc">ResponseEntity</span><span class="o">.</span><span class="na">status</span><span class="o">(</span><span class="nc">HttpStatus</span><span class="o">.</span><span class="na">FORBIDDEN</span><span class="o">)</span>
                <span class="o">.</span><span class="na">body</span><span class="o">(</span><span class="k">new</span> <span class="nc">ErrorResponse</span><span class="o">(</span><span class="s">"Access denied."</span><span class="o">));</span>
        <span class="o">}</span>
        <span class="c1">// ... various other error code branches</span>
        <span class="k">return</span> <span class="nc">ResponseEntity</span><span class="o">.</span><span class="na">status</span><span class="o">(</span><span class="nc">HttpStatus</span><span class="o">.</span><span class="na">INTERNAL_SERVER_ERROR</span><span class="o">)</span>
            <span class="o">.</span><span class="na">body</span><span class="o">(</span><span class="k">new</span> <span class="nc">ErrorResponse</span><span class="o">(</span><span class="s">"Unknown authentication error."</span><span class="o">));</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p>However, this conflicted with our project’s exception handling principles:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">ExceptionHandler</code> should have a <strong>single, simple responsibility</strong>: converting exception types to appropriate HTTP responses</li>
  <li>Throw <strong>clear custom exceptions matching the root cause</strong> (e.g., <code class="language-plaintext highlighter-rouge">ProductNotFoundException</code>, <code class="language-plaintext highlighter-rouge">InvalidOrderException</code>)</li>
</ul>

<h3 id="32-finding-solutions-with-effective-java-principles">3.2. Finding Solutions with Effective Java Principles</h3>

<p>The Effective Java principles I just summarized provided clear direction:</p>

<ul>
  <li><strong>Item 73: Throw exceptions appropriate to the abstraction</strong>
    <ul>
      <li>The interceptor’s role is the <strong>abstract concept</strong> of ‘authentication/authorization’. Letting the library’s <code class="language-plaintext highlighter-rouge">AuthorizationException</code> escape, along with its <strong>library implementation details</strong> such as error codes, is inappropriate at that level. <code class="language-plaintext highlighter-rouge">TOKEN_EXPIRED</code> should be translated to <code class="language-plaintext highlighter-rouge">UnauthorizedException</code>, and <code class="language-plaintext highlighter-rouge">INSUFFICIENT_PERMISSIONS</code> to <code class="language-plaintext highlighter-rouge">ForbiddenException</code>, both of which belong to the application’s higher level of abstraction.</li>
    </ul>
  </li>
  <li><strong>Exception translation (the idiom described in Item 73)</strong>
    <ul>
      <li>Instead of exposing lower-level exceptions directly, I decided to apply ‘exception translation’: catching them and throwing appropriate higher-level exceptions in their place. The <code class="language-plaintext highlighter-rouge">Interceptor</code> would act as an ‘adapter’, catching the library’s <code class="language-plaintext highlighter-rouge">AuthorizationException</code> and converting it to project-compliant exceptions.</li>
    </ul>
  </li>
</ul>
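
<p>Item 73 also recommends <em>exception chaining</em>: passing the lower-level exception to the higher-level one as its cause, so the original stack trace stays available for debugging. The following is a minimal, framework-free sketch of translation with chaining, not the project’s actual code. <code class="language-plaintext highlighter-rouge">LibraryAuthException</code>, its <code class="language-plaintext highlighter-rouge">Code</code> enum (including <code class="language-plaintext highlighter-rouge">SIGNATURE_INVALID</code>), the two-argument constructors, and the <code class="language-plaintext highlighter-rouge">translate</code> helper are all hypothetical stand-ins introduced for illustration:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Framework-free sketch of exception translation with chaining (Effective Java, Item 73).
// Every type name below is a hypothetical stand-in, not the real library's API.
class ExceptionTranslationSketch {

    // Stand-in for the library's error codes.
    enum Code { TOKEN_EXPIRED, INSUFFICIENT_PERMISSIONS, SIGNATURE_INVALID }

    // Stand-in for the library's low-level exception.
    static class LibraryAuthException extends RuntimeException {
        final Code code;

        LibraryAuthException(Code code, String message) {
            super(message);
            this.code = code;
        }
    }

    // Higher-level exceptions that keep the original exception as their cause.
    static class UnauthorizedException extends RuntimeException {
        UnauthorizedException(String message, Throwable cause) { super(message, cause); }
    }

    static class ForbiddenException extends RuntimeException {
        ForbiddenException(String message, Throwable cause) { super(message, cause); }
    }

    // Translates a low-level exception into a higher-level one.
    // Any code other than INSUFFICIENT_PERMISSIONS is treated as "not authenticated".
    static RuntimeException translate(LibraryAuthException e) {
        if (e.code == Code.INSUFFICIENT_PERMISSIONS) {
            return new ForbiddenException("Access denied.", e);
        }
        return new UnauthorizedException("Authentication failed.", e);
    }

    public static void main(String[] args) {
        LibraryAuthException low = new LibraryAuthException(Code.TOKEN_EXPIRED, "token expired");
        RuntimeException high = translate(low);
        System.out.println(high.getClass().getSimpleName()); // prints "UnauthorizedException"
        System.out.println(high.getCause() == low);          // prints "true"
    }
}
</code></pre></div></div>

<p>Callers only ever see the higher-level type, while logs can still print the library’s original exception through <code class="language-plaintext highlighter-rouge">getCause()</code>. Whether to chain the cause is a judgment call: it helps debugging, but the cause should not be serialized into the HTTP response.</p>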

<h3 id="33-final-solution-maintaining-architectural-consistency-through-exception-translation">3.3. Final Solution: Maintaining Architectural Consistency Through Exception Translation</h3>

<p><strong>1. Define Project-Appropriate Custom Exceptions</strong></p>

<p>First, I defined exceptions appropriate for our project’s abstraction level:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// 401 Unauthorized</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">UnauthorizedException</span> <span class="kd">extends</span> <span class="nc">RuntimeException</span> <span class="o">{</span>
    <span class="kd">public</span> <span class="nf">UnauthorizedException</span><span class="o">(</span><span class="nc">String</span> <span class="n">message</span><span class="o">)</span> <span class="o">{</span> <span class="kd">super</span><span class="o">(</span><span class="n">message</span><span class="o">);</span> <span class="o">}</span>
<span class="o">}</span>

<span class="c1">// 403 Forbidden</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">ForbiddenException</span> <span class="kd">extends</span> <span class="nc">RuntimeException</span> <span class="o">{</span>
    <span class="kd">public</span> <span class="nf">ForbiddenException</span><span class="o">(</span><span class="nc">String</span> <span class="n">message</span><span class="o">)</span> <span class="o">{</span> <span class="kd">super</span><span class="o">(</span><span class="n">message</span><span class="o">);</span> <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p><strong>2. Apply Exception Translation in Interceptor</strong></p>

<p>Then, I modified the <code class="language-plaintext highlighter-rouge">AuthInterceptor</code> to catch the library’s exceptions and translate them to appropriate custom exceptions:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@Component</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">AuthInterceptor</span> <span class="kd">implements</span> <span class="nc">HandlerInterceptor</span> <span class="o">{</span>

    <span class="kd">private</span> <span class="kd">final</span> <span class="nc">AuthenticationService</span> <span class="n">authService</span><span class="o">;</span> <span class="c1">// External library service</span>

    <span class="c1">// ... constructor omitted ...</span>

    <span class="nd">@Override</span>
    <span class="kd">public</span> <span class="kt">boolean</span> <span class="nf">preHandle</span><span class="o">(</span><span class="nc">HttpServletRequest</span> <span class="n">request</span><span class="o">,</span> <span class="nc">HttpServletResponse</span> <span class="n">response</span><span class="o">,</span> <span class="nc">Object</span> <span class="n">handler</span><span class="o">)</span> <span class="o">{</span>
        <span class="k">try</span> <span class="o">{</span>
            <span class="nc">String</span> <span class="n">token</span> <span class="o">=</span> <span class="n">request</span><span class="o">.</span><span class="na">getHeader</span><span class="o">(</span><span class="s">"Authorization"</span><span class="o">);</span>
            <span class="n">authService</span><span class="o">.</span><span class="na">checkPermission</span><span class="o">(</span><span class="n">token</span><span class="o">);</span> <span class="c1">// This method throws AuthorizationException</span>
        <span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="nc">AuthorizationException</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
            <span class="c1">// ✨ Exception Translation happens here</span>
            <span class="k">if</span> <span class="o">(</span><span class="n">e</span><span class="o">.</span><span class="na">getErrorCode</span><span class="o">()</span> <span class="o">==</span> <span class="nc">ErrorCode</span><span class="o">.</span><span class="na">TOKEN_EXPIRED</span><span class="o">)</span> <span class="o">{</span>
                <span class="k">throw</span> <span class="k">new</span> <span class="nf">UnauthorizedException</span><span class="o">(</span><span class="s">"Authentication token is invalid."</span><span class="o">);</span>
            <span class="o">}</span> <span class="k">else</span> <span class="k">if</span> <span class="o">(</span><span class="n">e</span><span class="o">.</span><span class="na">getErrorCode</span><span class="o">()</span> <span class="o">==</span> <span class="nc">ErrorCode</span><span class="o">.</span><span class="na">INSUFFICIENT_PERMISSIONS</span><span class="o">)</span> <span class="o">{</span>
                <span class="k">throw</span> <span class="k">new</span> <span class="nf">ForbiddenException</span><span class="o">(</span><span class="s">"You don't have permission to access this resource."</span><span class="o">);</span>
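            <span class="o">}</span>
            <span class="c1">// Fail closed: an unmapped error code must not fall through to "return true"</span>
            <span class="k">throw</span> <span class="k">new</span> <span class="nf">UnauthorizedException</span><span class="o">(</span><span class="s">"Authentication failed."</span><span class="o">);</span>
        <span class="o">}</span>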
        <span class="k">return</span> <span class="kc">true</span><span class="o">;</span>
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>

<p><strong>3. Simplified ExceptionHandler</strong></p>

<p>As a result, the <code class="language-plaintext highlighter-rouge">GlobalExceptionHandler</code> returned to its clean original role of ‘simple conversion based on exception type’:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@RestControllerAdvice</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">GlobalExceptionHandler</span> <span class="o">{</span>

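    <span class="c1">// SLF4J logger (an assumption; Lombok's @Slf4j on the class would work as well)</span>
    <span class="kd">private</span> <span class="kd">static</span> <span class="kd">final</span> <span class="nc">Logger</span> <span class="n">log</span> <span class="o">=</span> <span class="nc">LoggerFactory</span><span class="o">.</span><span class="na">getLogger</span><span class="o">(</span><span class="nc">GlobalExceptionHandler</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>

    <span class="c1">// 👍 Handler with clear roles and responsibilities</span>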
    <span class="nd">@ExceptionHandler</span><span class="o">(</span><span class="nc">UnauthorizedException</span><span class="o">.</span><span class="na">class</span><span class="o">)</span>
    <span class="nd">@ResponseStatus</span><span class="o">(</span><span class="nc">HttpStatus</span><span class="o">.</span><span class="na">UNAUTHORIZED</span><span class="o">)</span>
    <span class="kd">public</span> <span class="nc">ErrorResponse</span> <span class="nf">handleUnauthorized</span><span class="o">(</span><span class="nc">UnauthorizedException</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
        <span class="n">log</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="n">e</span><span class="o">.</span><span class="na">getMessage</span><span class="o">());</span>
        <span class="k">return</span> <span class="k">new</span> <span class="nf">ErrorResponse</span><span class="o">(</span><span class="n">e</span><span class="o">.</span><span class="na">getMessage</span><span class="o">());</span>
    <span class="o">}</span>

    <span class="nd">@ExceptionHandler</span><span class="o">(</span><span class="nc">ForbiddenException</span><span class="o">.</span><span class="na">class</span><span class="o">)</span>
    <span class="nd">@ResponseStatus</span><span class="o">(</span><span class="nc">HttpStatus</span><span class="o">.</span><span class="na">FORBIDDEN</span><span class="o">)</span>
    <span class="kd">public</span> <span class="nc">ErrorResponse</span> <span class="nf">handleForbidden</span><span class="o">(</span><span class="nc">ForbiddenException</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
        <span class="n">log</span><span class="o">.</span><span class="na">warn</span><span class="o">(</span><span class="n">e</span><span class="o">.</span><span class="na">getMessage</span><span class="o">());</span>
        <span class="k">return</span> <span class="k">new</span> <span class="nf">ErrorResponse</span><span class="o">(</span><span class="n">e</span><span class="o">.</span><span class="na">getMessage</span><span class="o">());</span>
    <span class="o">}</span>
    
    <span class="c1">// ... other handlers ...</span>
<span class="o">}</span>
</code></pre></div></div>
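
<p>To make ‘simple conversion based on exception type’ concrete, here is a framework-free sketch of the same decision. It is only an illustration: <code class="language-plaintext highlighter-rouge">statusFor</code> is a hypothetical helper, not how Spring dispatches to <code class="language-plaintext highlighter-rouge">@ExceptionHandler</code> methods, and the exception classes are redeclared locally so the snippet compiles on its own:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Framework-free sketch: the status depends only on the exception type.
// statusFor is a hypothetical helper, not part of Spring.
class TypeBasedMappingSketch {

    static class UnauthorizedException extends RuntimeException {
        UnauthorizedException(String message) { super(message); }
    }

    static class ForbiddenException extends RuntimeException {
        ForbiddenException(String message) { super(message); }
    }

    // One branch per exception type; no error-code inspection anywhere.
    static int statusFor(RuntimeException e) {
        if (e instanceof UnauthorizedException) return 401;
        if (e instanceof ForbiddenException) return 403;
        return 500; // anything unrecognized is reported as an internal error
    }

    public static void main(String[] args) {
        System.out.println(statusFor(new UnauthorizedException("expired"))); // prints 401
        System.out.println(statusFor(new ForbiddenException("denied")));     // prints 403
        System.out.println(statusFor(new IllegalStateException("boom")));    // prints 500
    }
}
</code></pre></div></div>

<p>Because the mapping never looks inside the exception, adding a new library error code only changes the interceptor’s translation step, not this layer.</p>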

<h2 id="4-final-thoughts">4. Final Thoughts</h2>

<p>Library and framework guidelines are important, but when they conflict with our application’s overall design principles and consistency, it is often better to integrate the library non-intrusively through an ‘adapter’ layer, as the interceptor above does.</p>

<p>Ultimately, good exception handling goes beyond simply catching errors - it’s an important design activity that enhances code <strong>readability, maintainability, and overall system stability</strong>.</p>

<hr />

<p><em>This article is based on real work experiences, with specific system names and configuration values generalized for security purposes.</em></p>

<h2 id="references">References</h2>

<ul>
  <li>Bloch, J. (2018). <em>Effective Java (3rd Edition)</em>. Addison-Wesley Professional.</li>
  <li>Martin, R. C. (2008). <em>Clean Code: A Handbook of Agile Software Craftsmanship</em>. Prentice Hall.</li>
  <li>Fowler, M. (2018). <em>Refactoring: Improving the Design of Existing Code (2nd Edition)</em>. Addison-Wesley Professional.</li>
  <li>Oracle. (2021). <em>The Java™ Tutorials - Exception Handling</em>. Oracle Documentation.</li>
  <li>Spring Framework Documentation. (2023). <em>Exception Handling in Spring MVC</em>. VMware, Inc.</li>
</ul>]]></content><author><name>SeungHyeon Lee</name><email>tmdgus8490@gmail.com</email></author><category term="Work Experience" /><category term="Exception Handling" /><category term="Effective Java" /><category term="Clean Code" /><category term="Spring Framework" /><category term="Best Practices" /><summary type="html"><![CDATA[A comprehensive guide to exception handling best practices based on Effective Java, including real-world implementation experiences.]]></summary></entry></feed>