Executive Summary
Global AI spending is projected to reach $2.5 trillion in 2026, with over half flowing into
infrastructure. NVIDIA dominates this landscape, capturing roughly 86% of data center GPU revenue.
Yet this dominance is not purely a hardware story: on identical H100 chips, software optimization
alone can produce over 3x differences in actual throughput. The software ecosystem built over nearly
two decades since CUDA's 2006 launch constitutes a structural barrier that competitors cannot easily
replicate.
This report classifies the AI software stack into four layers—Framework, Compiler, Acceleration
Library, and Driver/Runtime—and identifies three distinct lock-in mechanisms: performance lock-in,
where optimization asymmetries cause de facto convergence toward specific hardware; design
lock-in, where framework-compiler-hardware co-design fixes the hardware path at the point of
software selection; and structural lock-in, where the closed-source driver/runtime physically blocks
hardware substitution. When these types overlap, switching costs increase exponentially. Meanwhile,
open-source inference-serving engines such as vLLM and SGLang are emerging as a countervailing
force that partially loosens these traditional lock-in structures.
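The claim that overlapping lock-in types compound switching costs can be illustrated with a toy model. The three type names follow the report's classification, but the cost units and the multiplicative interaction factor are illustrative assumptions, not measurements:

```python
# Toy model of the report's three lock-in types. All costs are in
# illustrative units, NOT empirical data; the interaction factor encodes
# the report's claim that overlapping lock-ins raise switching costs
# super-additively rather than as a simple sum.

# Hypothetical standalone switching cost per lock-in type.
BASE_COST = {
    "performance": 1.0,  # re-optimizing kernels/libraries for new hardware
    "design": 1.0,       # re-targeting framework-compiler co-design choices
    "structural": 1.0,   # replacing a closed driver/runtime dependency
}

def switching_cost(active: set[str], interaction: float = 2.0) -> float:
    """Total switching cost for a set of active lock-in types.

    Each overlapping type beyond the first multiplies the running total
    by `interaction`, so cost grows geometrically with overlap.
    """
    cost = sum(BASE_COST[t] for t in active)
    overlaps = max(len(active) - 1, 0)  # number of types beyond the first
    return cost * interaction ** overlaps

print(switching_cost({"performance"}))                          # 1.0
print(switching_cost({"performance", "design"}))                # 4.0
print(switching_cost({"performance", "design", "structural"}))  # 12.0
```

With these placeholder values, facing all three lock-in types costs 12 units versus the 3 units a purely additive model would predict, which is the qualitative behavior the paragraph above describes.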
Analyzing major players through this framework reveals that NVIDIA maintains a dual barrier of
performance and structural lock-in, Google constructs a separate design lock-in pathway through
TPU-XLA-JAX, and Huawei replicates all three lock-in types domestically through
Ascend-CANN-MindSpore.
Applying this framework to K-NPU, we diagnose that Korea's NPU ecosystem has successfully
entered the framework layer through PyTorch native support and vLLM integration, but faces a
sequential three-stage challenge: Stage 1 (framework entry) has been achieved, Stage 2 (resolving
performance lock-in in the compiler and library layers) is in progress, and Stage 3 (building
operational ecosystem scale) remains nascent. A circular dependency exists between Stages 2 and
3—performance gaps hinder reference accumulation, while lack of references undermines investment
justification.
Based on this diagnosis, we recommend lock-in-type-specific policy responses: (1) resolving
performance lock-in by expanding R&D from chip design to the full software stack; (2) mitigating
design lock-in through participation in global open-source projects such as OpenXLA and
MLIR; (3) circumventing structural lock-in by creating public-sector demand to build large-scale
operational references that break the circular dependency; (4) introducing a TCO evaluation
framework to make switching costs across all three lock-in types visible and quantifiable; and (5)
establishing talent pipelines for AI compiler and system software specialists as the execution
foundation for all policy measures.
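Recommendation (4) can be made concrete with a minimal TCO sketch. The cost categories and every numeric value below are hypothetical placeholders; a real evaluation framework would populate them from procurement and operations data. The point is structural: migration costs tied to each lock-in type appear as explicit line items instead of being hidden behind the sticker price.

```python
# Minimal TCO comparison sketch for an accelerator procurement decision.
# All figures are hypothetical placeholders in arbitrary cost units.

def tco(hardware, power, ops, switching_costs, years=3):
    """Total cost of ownership over an amortization window.

    switching_costs: dict of one-off migration costs keyed by
    lock-in type (performance / design / structural).
    """
    one_off = sum(switching_costs.values())
    return hardware + (power + ops) * years + one_off

# Hypothetical incumbent GPU: no migration needed.
incumbent = tco(hardware=100, power=20, ops=10, switching_costs={})

# Hypothetical alternative NPU: cheaper hardware, but migration costs
# from each lock-in type are surfaced explicitly.
alternative = tco(
    hardware=70, power=15, ops=12,
    switching_costs={
        "performance": 25,  # kernel/library re-optimization effort
        "design": 15,       # compiler/framework re-targeting
        "structural": 10,   # driver/runtime integration work
    },
)

print(incumbent, alternative)  # 190 201
```

With these placeholder numbers the alternative's lower hardware price (70 vs 100) is outweighed by its switching costs over a three-year window (201 vs 190 units), which is precisely the trade-off the evaluation framework is meant to make visible to buyers.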