That's cool but I think the proper solution is to write a Linux kernel module that can reserve GPU RAM via DRM to create ramdisks, not create a userspace filesystem using OpenCL.
That would give proper caching, direct mmap support if desired, a reliable, correct and concurrent filesystem (as opposed to this author's "all of the FUSE callbacks share a mutex to ensure that only one thread is mutating the file system at a time"), etc.
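To make the concurrency criticism concrete: even staying in userspace, the single global mutex could be replaced by per-file locks. A minimal hypothetical sketch (names are mine, not the author's code):

```python
import threading

class LockTable:
    """Hypothetical sketch of per-file locking, as an alternative to one
    global mutex around every FUSE callback. Not vramfs code."""
    def __init__(self):
        self._meta = threading.Lock()  # guards the lock table itself
        self._locks = {}               # path -> threading.Lock

    def lock_for(self, path):
        # Only the table lookup is globally serialized,
        # not the actual file operation.
        with self._meta:
            if path not in self._locks:
                self._locks[path] = threading.Lock()
            return self._locks[path]

table = LockTable()
with table.lock_for("/a"):
    # A writer to a different file is not blocked by the lock on /a:
    got_b = table.lock_for("/b").acquire(blocking=False)
if got_b:
    table.lock_for("/b").release()
```

A real filesystem also has to lock directory and allocation metadata, of course, but the point stands that per-callback serialization is a design shortcut, not a requirement.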
I'd *HIGHLY* recommend this video to anyone here. It is exactly the kind of fun, silly computer science stuff where you also learn a shit ton. His channel is full of this stuff.
"Don't ask why, ask why not" is essentially the motto of his channel, and it is the best. It leads to lots of innovation, and I think we should all encourage more of this kind of thing.
So that's a Gen 2 CPU, with DDR3 RAM and a PCIe 3.0 GPU.
On a modern system, with a recent kernel+FUSE, I expect the results would be much better.
But we also now have the phram kernel module, which lets you create a block device that bypasses FUSE entirely, so phram should give even better performance than vramfs.
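For reference, a phram setup looks roughly like this. The physical address and size below are placeholders (find your GPU's large prefetchable BAR with `lspci -v`), and since phram registers an MTD device, mtdblock supplies the block-device layer:

```shell
# Addresses are placeholders -- use your GPU's BAR address from lspci -v
modprobe phram phram=vram,0xd0000000,0x10000000   # name,start,length
modprobe mtdblock                                 # exposes /dev/mtdblock0
mkfs.ext2 /dev/mtdblock0                          # ext2: no journal churn
mount /dev/mtdblock0 /mnt/vram
```

This requires root and depends on the GPU actually exposing its VRAM through a BAR (resizable BAR helps here), so treat it as a sketch rather than a recipe.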
It is not precious if you don't run LLMs or play games. For many people like myself, the video card is idle most of the time.
Using its RAM to speed up compilation or similar is not a bad idea.
What is the overhead on a FUSE filesystem compared to being implemented in the kernel? Could something like eBPF be used to make a faster FUSE-like filesystem driver?
> What is the overhead on a FUSE filesystem compared to being implemented in the kernel?
The overhead is quite high, because of the additional context switching and copying of data between user and kernel space.
> Could something like eBPF be used to make a faster FUSE-like filesystem driver?
eBPF can't really address any of the problems I noted above. To improve performance, one would need to change how the interface between the kernel and the userspace part of a FUSE filesystem works to make it more efficient.
That said, FUSE support for io_uring, which was merged recently in Linux 6.14, has potential there; see:
There is considerable overhead from the userspace <> kernel <> userspace switches; you can see something similar with WireGuard if you compare the performance of its Go client vs. the kernel driver.
Some FUSE drivers can avoid the overhead by letting the kernel know that the backing resource of a FUSE filesystem can be handled by the kernel directly (e.g. for a FUSE-based overlay FS where the backing storage is XFS or something), but that probably isn't applicable here.
If you're in kernel space though I don't think you'd have access to OpenCL so easily, you'd need to reimplement it based on kernel primitives.
> What is the overhead on a FUSE filesystem compared to being implemented in the kernel?
It depends on your use case.
If you serve most of your requests from kernel caches, then FUSE doesn't add any overhead, because cached requests never reach the userspace daemon. That was the case for me, when I had a FUSE service running to serve every commit from every branch (across all of history) simultaneously as directories, directly from the data in a .git folder.
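For workloads like that it helps to explicitly turn up the kernel-side caching; with a standard libfuse filesystem the mount options look something like this (`myfs` is a placeholder binary):

```shell
# kernel_cache keeps page-cache contents across open();
# entry/attr timeouts let lookups be answered without hitting the daemon
myfs /mnt/repo -o kernel_cache,entry_timeout=60,attr_timeout=60
```

Long timeouts are safe here because git objects are immutable, which is exactly why this use case caches so well.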
Somewhat related, there is NVIDIA GPUDirect Storage[0], which provides an API for efficient “file transfer” between the GPU and the local filesystem. I've always wanted to give it a try but haven't yet.
> I have 192GB of CPU VRAM in my desktop and that was cheap to obtain.
How? Or what's "cheap" here? (Because I wouldn't call 192G of just regular RAM that's plugged into the motherboard cheap, I think everything else is more expensive, and if there's some hack here that I haven't caught I very much would like to know about it)
Which is pretty cheap compared to the cost of my whole build and whatever other things I've spent on. Cheap is relative, but I'm just saying that if you're going to spend $3000+ on a build, and you love to work with massive datasets, VMs, and things, $500 for a metric fuckton of RAM so that your system is never, ever swapping, is a very worthwhile thing to spend on.
192GB worth of GPU will cost you about $40000, for reference, and will be less performant if your goal is just a vramfs for CPU tasks.
* Beware that using 4 DDR5 slots will cut your memory bandwidth in half on consumer motherboards and CPUs. But I willingly made that tradeoff. Maybe at some point I'll upgrade to a server motherboard and CPU.
Couple of reasons.
1. You can use VRAM when you don't have massive amounts of RAM for a ramdisk (or /dev/shm)
2. Depending on the implementation, you might get faster random seeks/writes than normal RAM.
3. You could presumably run certain GPU kernels directly on the vramfs.
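For comparison with point 1, the ordinary system-RAM ramdisk is a one-liner (the size is an example):

```shell
# tmpfs lives in RAM (and can spill to swap under memory pressure)
mount -t tmpfs -o size=8G tmpfs /mnt/ramdisk
```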
Hands down the latter. Good M.2 drives can generally get pretty close to saturating the bus, and you can fit literally a thousand times more stuff on 4 NVMe drives than you can on any old GPU.
It has been tried in each generation of motherboard design, but in an era where GPUs had a custom motherboard slot that normal cards could not occupy, it made a sort of sense. And I know there have been times when the northbridge could not drive as many PCIe devices at full speed as the motherboard had slots, so even leaving the slot intended for GPUs empty, or populating it with a daughter card, might be leaving performance on the table. But I suspect a riser card would fit handily into a 16x slot without blocking more than one or two 2x slots.
Why? VRAM has to be powered as long as you're scanning out of it, and any competent design is going to support powering down most of the GPU while keeping RAM alive; otherwise an idle desktop is going to suck way more power than necessary.
GPUs will drop memory clocks dynamically, with at least one supported clock speed that's intended to be just fast enough to support scanning out the framebuffer. I haven't seen any indication that anybody is dynamically offlining VRAM capacity.
You can validate this yourself: if you have access to an A100/H100, allocate a 30 GB tensor and do nothing. You'll see nvidia-smi's reported wattage go up by a watt or so.
> Warning: Multiple users have reported this to cause system freezes, even with the fix in #Complete system freeze under high memory pressure. Other GPU management processes or libraries may be swapped out, leading to nonrecoverable page faults.
and in general you have to be really careful swapping to anything that uses a driver that could itself be swapped (which FUSE is especially prone to, but IIRC even ZFS and NFS did(?) have caveats with swap).
OTOH that same page documents a way to swap to vram without going through userspace, so don't take this as opposition to the general idea:)