Storage-Compute Symbiosis: Redefining Data Storage in the AI Era
Neil · 2026-02-25 20:29 · Published from China

Neil Sun | Secretary-General, Data Storage Committee of China Electronics Standardization Association

— Included in Data Dialogue, Issue 002

A consensus is emerging: the data storage industry is undergoing a fundamental, structural transformation driven by AI-native workloads. This is not a linear upgrade in capacity or speed, but a comprehensive restructuring of system architecture, the relationship between storage and computing, and the industry ecosystem.

We have all encountered this scenario: a compute cluster training a trillion-parameter model hits a bottleneck, not from insufficient GPU power, but because the storage system cannot "feed" the GPUs fast enough. This reveals a critical shift: In the AI era, storage is evolving from a silent auxiliary component into an active participant in building intelligence [1].

We are at an inflection point. As data volumes surge from petabytes (PBs) toward exabytes (EBs)[2], as storage systems are expected to understand data rather than just store bits, and as the line between memory and storage blurs, a new storage epoch is dawning.

I. From PB to EB: Architectural Reconstruction

The training data for a single large model has jumped from terabytes to petabytes, with exabyte-scale datasets on the horizon. This magnitude of growth breaks traditional storage design logic, which assumes data-access locality. AI workloads violate this assumption: training requires near-random access to massive numbers of small files, while inference must maintain vast, dynamic states such as the KV cache. At exabyte scale, the traditional PCIe-based I/O model becomes a bottleneck, bogged down by data copying and protocol-conversion overhead.

CXL and Unified Memory: Redefining the Data Path

Technologies like Compute Express Link (CXL) have emerged to address this. Using the PCIe physical layer, CXL introduces cache-coherent memory semantics, allowing CPUs, GPUs, and storage devices to share a unified memory space [3]. This drastically reduces data movement. A GPU can directly access data on an SSD as if it were its own memory, turning storage from a remote peripheral into a direct compute partner. The performance implications are profound.

The Unified Namespace Challenge

Managing billions or trillions of files renders traditional hierarchical directory structures ineffective. Next-generation systems are exploring paradigms where data is addressed not by path but by meaning, using content hashes, vector embeddings, or knowledge graphs [4]. For example, a file containing an image of a cat may no longer reside under a fixed directory like /images/animals/cats/, but instead be directly associated with all other cat images via its visual feature vector. This shift requires storage systems to possess fundamental content-understanding capabilities [5], marking their evolution from "bit managers" to "semantic understanders."
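As a toy illustration of addressing by meaning rather than by path, the sketch below ranks stored objects by cosine similarity between a query vector and each object's embedding. The object IDs, 3-dimensional vectors, and flat `store` dictionary are all hypothetical; a production system would use a learned encoder and an approximate-nearest-neighbor index rather than a linear scan.

```python
import math

# Hypothetical sketch: addressing data by embedding similarity instead of
# a directory path. Vectors are toy 3-d "visual features".
store = {
    "obj-001": {"vector": [0.9, 0.1, 0.0], "blob": b"cat photo A"},
    "obj-002": {"vector": [0.8, 0.2, 0.1], "blob": b"cat photo B"},
    "obj-003": {"vector": [0.0, 0.1, 0.9], "blob": b"car photo"},
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def find_by_meaning(query_vec, k=2):
    """Return the k object IDs whose embeddings are closest to the query."""
    ranked = sorted(store,
                    key=lambda oid: cosine(query_vec, store[oid]["vector"]),
                    reverse=True)
    return ranked[:k]

print(find_by_meaning([0.85, 0.15, 0.05]))  # the two cat photos rank first
```

The point of the sketch is the lookup key: nothing in the query refers to a path, only to content features.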

II. Storage-Compute Symbiosis: A Pivotal Role Shift

Diverging Roles in Training and Inference

In the AI workflow, storage plays complementary yet distinct roles.

• Training: The primary challenge is efficiently supplying data and saving massive checkpoints without interrupting workflow. Intelligent checkpointing and fast recovery mechanisms exemplify "enhancing compute with storage," boosting continuity and resilience.

• Inference: Storage evolves into a "state maintainer" and "knowledge carrier." The KV cache, which acts as short-term memory to avoid redundant computation, can grow to hundreds of gigabytes and is accessed randomly. AI-optimized storage minimizes this overhead through multi-tier caching and tight GPU coordination.
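A minimal sketch of the multi-tier caching idea for KV-cache state, assuming a simple LRU spill policy: the hottest blocks stay in a small fast tier standing in for HBM, and the least-recently-used blocks drop to a larger slow tier standing in for DRAM or NVMe. The class and tier names are illustrative, not any vendor's API.

```python
from collections import OrderedDict

class TieredKVCache:
    """Two-tier KV-cache holder with LRU spill from fast to slow."""
    def __init__(self, fast_capacity):
        self.fast = OrderedDict()   # token-block id -> KV bytes (hot tier)
        self.slow = {}              # spill tier (stand-in for DRAM/NVMe)
        self.fast_capacity = fast_capacity

    def put(self, block_id, kv_bytes):
        self.fast[block_id] = kv_bytes
        self.fast.move_to_end(block_id)
        while len(self.fast) > self.fast_capacity:
            victim, data = self.fast.popitem(last=False)  # evict coldest
            self.slow[victim] = data

    def get(self, block_id):
        if block_id in self.fast:
            self.fast.move_to_end(block_id)   # refresh recency
            return self.fast[block_id]
        data = self.slow.pop(block_id)        # promote on reuse
        self.put(block_id, data)
        return data

cache = TieredKVCache(fast_capacity=2)
for i in range(4):
    cache.put(f"blk{i}", b"kv")
# blk0 and blk1 have spilled to the slow tier; blk2 and blk3 stay fast.
```

Real systems add prefetching and GPU-direct paths, but the recency-driven promotion/demotion loop is the core mechanism.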

More groundbreaking is the rise of "storage-as-compute," or "query-over-compute." In scenarios like recommendation systems, pre-computed results (e.g., user-item vector matches) are stored and retrieved, which is more efficient than real-time calculation. This blurs the compute-storage boundary, akin to an expert consulting a manual for common answers to save mental effort for novel problems.
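The lookup-versus-compute trade can be sketched as a memoizing layer: precomputed scores are served straight from storage, and the scoring function runs only on a miss. All names and the dot-product scorer below are hypothetical stand-ins.

```python
# Sketch of "query-over-compute": serve precomputed results from storage
# when available, fall back to (more expensive) computation only on a miss.
precomputed = {}      # (user, item) -> score, normally filled by a batch job
compute_calls = 0

def score(user_vec, item_vec):
    global compute_calls
    compute_calls += 1                 # count how often we actually compute
    return sum(u * i for u, i in zip(user_vec, item_vec))

def get_score(user, item, user_vec, item_vec):
    key = (user, item)
    if key not in precomputed:         # miss: compute once, then store
        precomputed[key] = score(user_vec, item_vec)
    return precomputed[key]            # hit: pure storage lookup

u, v = [1.0, 2.0], [0.5, 0.5]
first = get_score("alice", "item9", u, v)
second = get_score("alice", "item9", u, v)   # served from storage
print(first, second, compute_calls)          # 1.5 1.5 1
```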

Data Mobility: From Tiering to Two-Way Activation

AI disrupts the traditional one-way "hot → warm → cold" data-tiering model by introducing the ability to "reactivate" cold data. For instance, a decade's worth of e-commerce purchase records, once relegated to cold storage for compliance, becomes invaluable for training models to predict long-term consumer trends or uncover latent market opportunities. Storage systems must now support efficient, large-scale "cold data activation," which requires automated cross-tier data movement and predictive data warming based on the evolving needs of AI tasks.
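Predictive warming can be sketched as a pre-job step that consults an upcoming job's declared datasets and promotes them out of the cold tier before the first epoch reads them. The manifest format and tier dictionaries below are hypothetical.

```python
# Illustrative sketch of "cold data activation": before a training job
# starts, promote the cold objects its manifest declares into a warm tier
# so first-epoch reads don't stall on cold media.
cold_tier = {"orders-2015": b"...", "orders-2016": b"...", "logs-2024": b"..."}
warm_tier = {}

def prewarm(job_manifest):
    """Move every dataset the upcoming job lists from cold to warm."""
    moved = []
    for name in job_manifest["datasets"]:
        if name in cold_tier:
            warm_tier[name] = cold_tier.pop(name)
            moved.append(name)
    return moved

job = {"name": "long-term-trend-model", "datasets": ["orders-2015", "orders-2016"]}
print(prewarm(job))   # ['orders-2015', 'orders-2016']
```

A real implementation would overlap this movement with queue wait time and predict manifests from historical job patterns rather than requiring explicit declarations.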

Storage as a Function: Ubiquitous Data Service

The most fundamental shift is the redefinition of storage from a device to a capability set embedded across infrastructure. Huawei's Unified Cache Manager (UCM) is a prime example [6]. It is not a standalone box but a software suite that orchestrates a unified memory pool from GPU HBM and host DRAM to NVMe SSDs. For AI inference, it intelligently schedules data like the KV cache across these tiers based on access patterns. Here, "storage" is no longer a fixed location but an intelligent, orchestrated data service—wherever the data resides.

III. Media Convergence: Blurring Hierarchies

AI workloads are actively dissolving the discrete layers of the traditional storage-media pyramid (register/cache/DRAM/SSD/HDD), pushing it toward a continuous performance-capacity spectrum. Technologies like Persistent Memory (PMem), despite commercial hurdles, validated the need for a medium that bridges the gap between memory-like speed and storage-like persistence—ideal for AI data structures such as embedding tables. Concurrently, High Bandwidth Memory (HBM) is evolving from a dedicated GPU resource into a potential system-wide caching layer, accessible to CPUs and other accelerators via interconnects like CXL, catering to the extreme bandwidth demands of inference.

Intelligent, Semantic-Aware Tiering

This media diversification enables data placement strategies far more sophisticated than those based solely on access frequency. An AI-aware storage system can make placement decisions by analyzing:

(1) Compute affinity: Is the data frequently accessed by a specific GPU?

(2) Access patterns: Is the data read sequentially or accessed randomly? Does it involve large or small I/O blocks?

(3) Semantic importance: What role does the data play within the model? Is it a critical attention head or a compressible, redundant parameter?

(4) Lifecycle: Is the data a permanent model parameter, a temporary intermediate state, or a cache entry nearing expiration?

This multi-dimensional analysis allows for dynamic, semantic-based tiering. In a Mixture of Experts model, for instance, only the parameters of the currently activated "experts" need to reside in the fastest tier (HBM), while inactive experts can be cost-effectively stored on higher-capacity QLC SSDs.
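A toy placement routine for the Mixture-of-Experts example, assuming the router has already identified the currently activated experts; the tier names and slot count are illustrative, not a real scheduler's interface.

```python
# Sketch of semantic-based tiering for an MoE model: parameters of the
# routed experts are pinned to the fastest tier, everything else sits on
# capacity media.
def place_experts(all_experts, active_experts, hbm_slots):
    placement = {}
    hot = [e for e in all_experts if e in active_experts][:hbm_slots]
    for e in all_experts:
        placement[e] = "HBM" if e in hot else "QLC-SSD"
    return placement

experts = [f"expert{i}" for i in range(8)]
print(place_experts(experts, active_experts={"expert2", "expert5"}, hbm_slots=2))
```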

Long-Term Memory Storage

A defining characteristic of human intelligence is the possession of long-term memory. We do not simply react to immediate inputs but draw upon years or even decades of experience. In contrast, current AI systems largely operate without this capability, often starting each interaction from a blank slate. The emerging concept of "long-term memory storage" aims to build a continuously growing, queryable knowledge base for AI, enabling it to connect present queries with long-term context and past interactions. This requires storing not just raw data but its associated context, relationships, and evolution over time, a challenge that may involve converging the capabilities of vector databases, graph databases, and time-series stores at the infrastructure layer. The significant technical hurdle lies in maintaining high data throughput while simultaneously supporting complex semantic searches and traversals across terabytes of this "memory."
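A minimal sketch of such a memory record, assuming a hypothetical schema that keeps content, context, relationship links, and a timestamp, with retrieval by link traversal rather than byte scan:

```python
import time

# Each memory entry keeps not just raw content but its context, links to
# related memories, and a timestamp. The schema is illustrative only.
memory = {}

def remember(mem_id, content, context, related=()):
    memory[mem_id] = {
        "content": content,
        "context": context,
        "related": list(related),
        "ts": time.time(),
    }

def recall_related(mem_id, depth=1):
    """Follow 'related' links outward from one memory up to `depth` hops."""
    seen, frontier = {mem_id}, [mem_id]
    for _ in range(depth):
        frontier = [r for m in frontier
                    for r in memory[m]["related"] if r not in seen]
        seen.update(frontier)
    return seen - {mem_id}

remember("m1", "user asked about CXL", context="chat-42")
remember("m2", "user builds storage systems", context="profile", related=["m1"])
remember("m3", "follow-up on memory pooling", context="chat-42", related=["m2"])
print(recall_related("m3", depth=2))   # the related chain: m2 and m1
```

The graph traversal stands in for the hybrid vector/graph/time-series queries the text describes; at terabyte scale, sustaining this alongside raw throughput is the open problem.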

IV. Semantic Awareness: The Storage System's Cognitive Leap

From Bits to Meaning: Content-Aware Storage

Conventional storage is content-agnostic, managing bits without understanding their meaning. This becomes a critical bottleneck in AI pipelines, whether for identifying prunable weights, retrieving knowledge semantically, or cleaning corrupted training samples. Content-Aware Storage (CAS) addresses this by embedding or tightly integrating lightweight analysis engines. These engines can extract features, generate indexes, and tag data as it is ingested. For example, a CAS system for medical imaging could automatically classify modalities and flag anomalies. Solutions like the one from IBM and NVIDIA go further, using integrated AI microservices to transform unstructured documents into searchable vector knowledge, continuously updating only changed content [7]. For training, this means many preprocessing steps (deduplication, augmentation) can occur at the point of ingestion, with results cached to drastically reduce future data movement overhead.
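Ingestion-time deduplication and tagging can be sketched as follows; the content-hash index uses real SHA-256, while the `classify` step is a placeholder for the lightweight analysis engine the text describes, not any product's API.

```python
import hashlib

# Hedged sketch of content-aware ingestion: hash-based deduplication plus
# a stub tagging step at write time, so later jobs can filter by tag
# instead of re-scanning raw bytes.
index = {}   # content hash -> {"blob": ..., "tags": [...]}

def classify(blob):
    # Placeholder for a lightweight analysis engine / AI microservice.
    return ["text"] if blob.isascii() else ["binary"]

def ingest(blob):
    digest = hashlib.sha256(blob).hexdigest()
    if digest in index:                  # exact duplicate: store nothing new
        return digest, False
    index[digest] = {"blob": blob, "tags": classify(blob)}
    return digest, True

h1, fresh1 = ingest(b"sample record")
h2, fresh2 = ingest(b"sample record")    # duplicate write is deduplicated
print(h1 == h2, fresh1, fresh2)          # True True False
```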

AI-Native Protocols and Specialized Stacks

General-purpose storage protocols, designed for broad compatibility, often incur performance penalties when faced with the distinct patterns of AI workloads. This has spurred the development of AI-native protocols. Universal Checkpointing is a prime example [8]. It avoids the I/O storm caused by thousands of GPUs simultaneously writing a monolithic checkpoint file. Instead, each GPU writes its local model shard, and a separate universal format describes how these shards map to the global model. This approach decouples the checkpoint from the hardware configuration, enabling efficient recovery and elastic scaling. Similarly, the rise of Retrieval-Augmented Generation (RAG) is transforming vector similarity search from an application-layer task into a native storage primitive, necessitating efficient, standardized interfaces for vector index management that support streaming updates and hybrid searches.
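The sharded-save idea can be sketched in miniature: each rank writes only its own shard, and a small metadata file maps shards back to the global layout so a restarted job, even with a different rank count, can reassemble the full state. The file layout and JSON format here are illustrative, not the actual Universal Checkpointing format [8].

```python
import json
import os
import tempfile

def save_sharded(ckpt_dir, global_params, world_size):
    """Each rank writes its local shard; meta.json maps shards to the global layout."""
    shard_size = len(global_params) // world_size   # assumes an even split
    meta = {"world_size": world_size, "shards": []}
    for rank in range(world_size):
        shard = global_params[rank * shard_size:(rank + 1) * shard_size]
        path = os.path.join(ckpt_dir, f"rank{rank}.json")
        with open(path, "w") as f:
            json.dump(shard, f)                     # each rank writes locally
        meta["shards"].append({"path": path, "offset": rank * shard_size})
    with open(os.path.join(ckpt_dir, "meta.json"), "w") as f:
        json.dump(meta, f)                          # the "universal" map

def load_full(ckpt_dir):
    """Reassemble the global state from shards, independent of rank count."""
    with open(os.path.join(ckpt_dir, "meta.json")) as f:
        meta = json.load(f)
    params = []
    for shard in sorted(meta["shards"], key=lambda s: s["offset"]):
        with open(shard["path"]) as f:
            params.extend(json.load(f))
    return params

with tempfile.TemporaryDirectory() as d:
    save_sharded(d, list(range(8)), world_size=4)
    print(load_full(d))   # [0, 1, 2, 3, 4, 5, 6, 7]
```

Because only meta.json encodes the mapping, a reload onto a different world size can repartition the flat parameter list however it likes, which is the decoupling the text describes.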

The Modularity of Storage Microservices

The monolithic storage array is giving way to a disaggregated, microservices-based design. In this model, a storage system comprises a set of collaborative services—dedicated modules for metadata indexing, block I/O, multi-tier caching, vector retrieval, and resilience enforcement.

This architecture offers compelling advantages: independent scalability of each function, the flexibility to compose tailored storage stacks for specific workloads, and accelerated innovation, as new capabilities (e.g., a "sparsity-aware data service" for sparse model training) can be developed and deployed as independent microservices.

V. AI-Native Resilience: An Intrinsic Immune System

The traditional resilience model of "perimeter defense"—locking data inside a fortified vault—is obsolete in the AI era. Data must flow freely to be aggregated for training, accessed with low latency for inference, and shared for collaboration, making it more exposed than ever. AI-native resilience adopts a different philosophy: instead of trying to immobilize data, it embeds protection into the data itself and every layer of the infrastructure, creating an "intrinsic resilience" architecture.

Core Principles Include:

• Data-Centric Traceability: Each data unit carries its provenance, transformation history, and usage policies, enforceable wherever it travels.

• Policy Enforcement Bound to Data: Resilience rules travel with the data (e.g., "for training only"), not just reside on the storage device.

• Privacy-Preserving Computation: Integration of techniques like homomorphic encryption or secure multi-party computation, allowing data to be utilized while remaining encrypted.

• Defense Against AI-Specific Threats: Proactive measures to detect and mitigate novel threats like training data poisoning, model extraction attacks, and membership inference.

The Convergence of Resilient Computing and Trusted Storage

A new security paradigm is emerging from the fusion of trusted hardware and AI storage requirements. Technologies like Hardware Security Modules (HSMs) and, more pivotally, Trusted Execution Environments (TEEs)—such as Intel SGX and AMD SEV—can create hardware-isolated, protected execution environments. These ensure that even the cloud provider or compromised system software cannot access the code and data within, a capability of paramount value for AI.

Consider a multi-party collaborative training scenario where several companies wish to jointly train a model without sharing their proprietary data. Traditionally, this would necessitate complex legal frameworks and technical isolation, a cumbersome process that cannot fully eliminate risk. In a TEE-enhanced storage architecture, each party's data remains encrypted in their own storage. The training process is executed entirely within the TEE: encrypted data is sent in, decrypted and used only internally, and only the encrypted final model is output. Throughout this process, raw data is never exposed, and even the intermediate model parameters are protected.

This architecture demands deep integration between the storage system and the TEE. The storage system must be semantically aware of data encryption states, intelligently route data to the TEE, and determine which operations must occur inside the trusted enclave. Simultaneously, critical components of the storage stack itself—such as the access control engine and audit log service—may need to run within the TEE to prevent software-level attacks and ensure end-to-end trust.

Defense Against AI-Specific Resilience Threats

AI systems face unique threats that traditional defenses are ill-equipped to handle, necessitating evolution at the storage layer.

• Data Poisoning Defense: Attackers can manipulate model behavior by contaminating training data. Storage systems mitigate this through granular data provenance tracking, anomaly pattern detection, and robust version control. By recording the source and complete processing history of each training sample, the system can trace aberrant model behavior back to specific, potentially poisoned data batches for review and rollback.

• Model Theft Protection: Attackers may attempt to reconstruct a proprietary model through exhaustive queries to an inference service. Storage and data access layers can defend against this by monitoring query patterns, detecting anomalous access frequencies, and enforcing strict rate limits. More advanced techniques involve implementing differential privacy mechanisms at the storage layer, automatically adding calibrated noise to query results to mathematically hinder model extraction.

• Membership Inference Attack Defense: Attackers might try to deduce whether a specific individual's data was part of the training set. Storage systems can counter this through strict training data lifecycle management and access control. This includes ensuring that raw training data cannot be directly accessed after the training phase concludes, allowing analytical access only through carefully audited and policy-controlled interfaces.
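As a sketch of the storage-layer differential-privacy mechanism mentioned above, the snippet below adds Laplace noise, calibrated to query sensitivity and a privacy budget epsilon, to an aggregate count query. The epsilon value and the query are placeholders, not a recommended policy.

```python
import math
import random

def laplace_noise(scale, rng):
    """Inverse-CDF sample from a Laplace(0, scale) distribution."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(records, predicate, epsilon=1.0, seed=0):
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # adding/removing one record moves a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, random.Random(seed))

records = [{"age": a} for a in (25, 31, 42, 57, 63)]
noisy = private_count(records, lambda r: r["age"] > 40)
print(round(noisy, 2))  # near the true count of 3, plus calibrated noise
```

Returning only the noisy value bounds what any single query, or a membership-inference probe built from many queries, can reveal about one record's presence in the set.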

VI. Conclusion: The Invisible Revolution

We are witnessing the most profound transformation in storage technology since the invention of disks. This transformation is not a gradual improvement, but a structural leap. It is not the breakthrough of a single technology, but a comprehensive reshaping of architecture, roles, media, intelligence, and resilience.

Future storage systems will no longer be merely places to store data. They will be intelligent carriers, collaborators in computation, and enforcers of resilience. They will understand the meaning of data, not just the bits. They will participate in, rather than merely facilitate, intelligent processes. And they will accelerate the flow of data rather than merely store it.

As the boundaries between storage and computation blur, the transformation of data into knowledge becomes more tangible, and the relationship between resilience and efficiency is redefined. What we see is not only technological progress but a change in the very way intelligence exists. Storage is becoming the foundation for AI's evolution towards true intelligence. On this path, every architectural innovation, protocol optimization, and resilience enhancement helps create richer, more flexible, and more reliable foundations for machine memory and reasoning.

This invisible revolution is quietly taking place in laboratories and data centers around the world, and its impact will extend far beyond the domain of technology. It will transform every area of human development, from scientific research to business innovation. The one certainty is that organizations that understand and embrace this transformation early will have a massive competitive advantage in the AI era. Storage is no longer merely a place to preserve data—it is the soil from which the future intelligence will grow.


 

References:

[1] Weimin Zheng, Academician of Chinese Academy of Engineering. WeChat official account of the Data Storage Committee of China Electronics Standardization Association—In-depth Interpretation: Why Do Large AI Models Need Future-proof AI Storage? https://mp.weixin.qq.com/s/zta0mObf3pSvVXPlgCYOBQ

[2] According to the 2025 Storage Power Development Report, large AI model training demands millisecond-level latency, TB-scale bandwidth, and EB-scale scalability from storage, driving the parallel development of all-flash architectures, AI data lakes, and intrinsic resilience technologies.

[3] CXL. CXL white paper: https://docs.wixstatic.com/ugd/0c1418_d9878707bbb7427786b70c3c91d5fbd1.pdf

[4] NVIDIA Technical Blog. Optimizing Vector Search for Indexing and Real-Time Retrieval with NVIDIA cuVS: https://developer.nvidia.cn/blog/optimizing-vector-search-for-indexing-and-real-time-retrieval-with-nvidia-cuvs/

[5] Springer Nature Link. Survey of vector database management systems: https://link.springer.com/article/10.1007/s00778-024-00864-x

[6] WeChat official account of the Data Storage Committee of China Electronics Standardization Association— Industry Insights. UCM Innovative Technologies Accelerate AI Inference Cost Reduction and Experience Upgrades: https://mp.weixin.qq.com/s/db-BASpb24PJsJ_-FRy8zw

[7] IBM. New content-aware capabilities help IBM Storage Scale improve AI responses: https://www.ibm.com/new/announcements/new-content-aware-capabilities-help-ibm-storage-scale-improve-ai-responses

[8] arXiv. Universal Checkpointing: A Flexible and Efficient Distributed Checkpointing System for Large-Scale DNN Training with Reconfigurable Parallelism: https://arxiv.org/abs/2406.18820

 

 
