Linux DMA-BUF Subsystem Set for Major Efficiency Boost: User-Space Read/Write Operations on the Horizon
Breaking News – A proposal to extend the Linux kernel's dma-buf subsystem with direct user-space read and write operations was unveiled today at the 2026 Linux Storage, Filesystem, Memory Management, and BPF Summit. The effort, led by Pavel Begunkov with assistance from Kanchan Joshi, could dramatically reduce overhead for device-to-device I/O and unlock new performance levels for storage and GPU workloads.
“This would be a game-changer for high-performance I/O, eliminating the need for kernel-mediated copies in many scenarios,” said Begunkov during the joint session. “We are looking at adding direct I/O support that bypasses traditional system call bottlenecks.”
The dma-buf subsystem currently lets drivers share memory buffers efficiently for device-to-device transfers, but user-space access is limited to mmap-style mappings. The new effort aims to expose full read and write operations directly from user space, building on the existing buffer-sharing framework.
Background
Dma-bufs have been a core part of the Linux kernel for years, used primarily by graphics, video, and networking drivers to move data between devices without extra CPU copies. However, the subsystem has never natively supported file-like read/write semantics from user space, forcing applications to rely on complex ioctl-based workflows or additional copies.
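For context, the access model described above is already part of the kernel's uAPI: an application maps the dma-buf file descriptor and brackets CPU access with DMA_BUF_IOCTL_SYNC. The sketch below shows that status-quo path, assuming the exporter supports CPU mmap; how the fd is obtained (for example, via a DRM driver's PRIME export) is driver-specific and not shown.

```c
/*
 * Minimal sketch of today's user-space access path: map a dma-buf fd
 * and bracket CPU access with DMA_BUF_IOCTL_SYNC.  The fd is assumed
 * to come from an exporter that supports CPU mmap.
 */
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/dma-buf.h>

static int read_buffer_cpu(int dmabuf_fd, size_t len, void *out)
{
	struct dma_buf_sync sync = { 0 };
	void *map;

	map = mmap(NULL, len, PROT_READ, MAP_SHARED, dmabuf_fd, 0);
	if (map == MAP_FAILED)
		return -1;

	/* Tell the exporter the CPU is about to read the buffer. */
	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

	/* This is the kind of extra copy the proposal aims to avoid. */
	memcpy(out, map, len);

	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

	munmap(map, len);
	return 0;
}
```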
At the summit, Begunkov and Joshi outlined a design that extends the dma-buf file descriptor to accept standard POSIX read/write syscalls. This would allow user-space applications to treat shared memory buffers as regular files, simplifying programming models for storage stacks and co-processors. “Think of it as unifying the buffer management path,” Joshi explained during the Q&A. “Drivers can now expose regions for both device DMA and user-space I/O without rewriting their core logic.”
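If the interface lands as outlined in the session, consuming a shared buffer could look like ordinary POSIX I/O. The snippet below is purely illustrative of that idea; the actual semantics (offsets, blocking behaviour, synchronization rules) remain undecided and nothing here is merged.

```c
/*
 * Hypothetical sketch of the proposed interface: a plain pread() against
 * a dma-buf fd.  This does not work on current kernels; it illustrates
 * the programming model described at the summit.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static int drain_dmabuf(int dmabuf_fd, size_t len)
{
	char *buf = malloc(len);
	ssize_t ret;

	if (!buf)
		return -1;

	/* Proposed: treat the dma-buf fd like a regular file. */
	ret = pread(dmabuf_fd, buf, len, 0);
	if (ret < 0)
		perror("pread on dma-buf fd");

	free(buf);
	return ret < 0 ? -1 : 0;
}
```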
The proposal builds on recent work in the io_uring subsystem, which already provides asynchronous I/O capabilities. By integrating with io_uring, dma-buf read/write operations could achieve zero-copy transfers in many cases, reducing latency for high-throughput workloads.
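One plausible way to picture the io_uring integration is an ordinary read request submitted against a dma-buf fd. In the sketch below, the liburing calls are existing API; submitting them against a dma-buf fd is the speculative part and would only work once the proposal is merged.

```c
/*
 * Speculative sketch: queue an asynchronous read from a dma-buf fd
 * through io_uring, using standard liburing calls.
 */
#include <liburing.h>

static int queue_dmabuf_read(int dmabuf_fd, void *dst, unsigned len)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int ret;

	ret = io_uring_queue_init(8, &ring, 0);
	if (ret < 0)
		return ret;

	sqe = io_uring_get_sqe(&ring);
	/* Asynchronous read from the dma-buf fd into dst at offset 0. */
	io_uring_prep_read(sqe, dmabuf_fd, dst, len, 0);

	io_uring_submit(&ring);
	ret = io_uring_wait_cqe(&ring, &cqe);
	if (ret == 0) {
		ret = cqe->res;
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	return ret;
}
```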
What This Means
If merged, the changes would directly affect storage drivers (NVMe, CXL-attached memory) as well as graphics and media stacks built on APIs such as Vulkan and VA-API. Developers could bypass the VFS layer for inter-device data movement, saving CPU cycles and memory bandwidth. “This is especially relevant for disaggregated memory and smart NIC scenarios,” said a kernel developer familiar with the discussion, speaking on condition of anonymity.
Early prototypes demonstrate up to a 40% reduction in I/O latency for datacenter workloads that stream data between storage and accelerators. Challenges remain, however: memory integrity, cache coherency, and security boundaries must all be managed carefully. The session concluded with a call for broader community testing on ARM and x86 platforms.
Review the full session notes for technical details. The upstreaming process is expected to begin during the 6.13 kernel cycle, with an estimated target of Linux 6.16 for stabilization. “We welcome patches and review now,” Begunkov urged. “The sooner we land this, the sooner the ecosystem can experiment.”