So I had a micro PC that was running one of my core services and it only supports NVMe drives. Unfortunately, this little guy cooked itself and I’m not in a position to replace the drive. The system is still good and is fairly powerful, so I want to be able to reuse it.
I’m thinking I want to set up some kind of netboot appliance on another server so I can boot the system without ever having a local disk. One thing I want to do is run some Docker images (specifically Frigate), but I won’t be able to write anything to persistent storage locally. NFS shares are common in my setup.
Is it even possible to make a ‘gold image’ of a docker host and have it netboot? I expect that memory limitations (16GB) will be my main issue, but I’m just trying to think of how to bring this system back into use. I have two NAS appliances that I can use for backend long term storage (where I keep my docker files and non-database files anyway), so it shouldn’t be too difficult to have some kind of easily editable storage solution. I don’t want to use USB drives as persistent storage due to lifespan concerns from using them in production environments.
I’d be pretty wary of using any system that “cooked” an NVMe. That’s not the sign of an actually healthy system.
Was the failure just heat damage?
I’m actually not 100% sure what killed the drive. It could have been an issue with the drive wearing out, but my services didn’t write much locally and it wasn’t super old, so I assume it’s a heat issue with a fanless micro system. I try to write everything important to my NASs so I don’t have to worry about random hardware failures, but this one didn’t have backups configured before it failed. Other than the drive issue, it’s been solid for 1.5-2 years of near constant uptime.
Unless you are writing petabytes, the NVMe did not just “wear” out. You probably shouldn’t do anything until you’ve figured out what caused this failure.
Yeah, I didn’t think that was a realistic possibility. Given that it was a bitty fanless NUC-style system, I’m leaning more toward heat death, as I originally surmised.
E: though another person suggested a frigate misconfig could have worn the drive out early
Consumer SSDs generally only have a 200-600TBW rating, not petabytes. It’s pretty easy to wear one out in a few years installed in a server.
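To put rough numbers on that, here is a minimal sketch of the arithmetic. The 200 and 600 TBW ratings come from the range above; the daily write volumes are made-up illustrations, not measurements from anyone's system in this thread.

```python
def years_until_tbw(tbw_rating_tb: float, writes_gb_per_day: float) -> float:
    """Years until cumulative host writes reach the drive's rated TBW."""
    if writes_gb_per_day <= 0:
        raise ValueError("writes_gb_per_day must be positive")
    # Drive vendors use decimal units: 1 TB = 1000 GB.
    tb_per_year = writes_gb_per_day * 365 / 1000
    return tbw_rating_tb / tb_per_year


# Hypothetical workloads: a quiet host vs. one constantly writing camera cache to disk.
for rating in (200, 600):
    for daily_gb in (50, 500):
        years = years_until_tbw(rating, daily_gb)
        print(f"{rating} TBW at {daily_gb} GB/day: {years:.1f} years")
```

At 50 GB/day even a 200 TBW drive lasts over a decade on paper, while at 500 GB/day the same drive reaches its rating in a little over a year. TBW is a warranty figure rather than a hard failure point, so treat the result as a ballpark.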
Is the drive totally dead? Curious what SMART would report.
My gut feeling is that it’s probably cheaper to buy a replacement m.2 than the hours of time to get netboot working but it could be a fun project!
I might be able to hook it up to a USB NVMe reader, but when I initially tried I barely got any recognition of the drive from the OS. My primary system is Windows, so I might get more info from one of my Linux systems; I just haven’t had the fucks to give to the dead drive. As for a replacement drive, funds are scarce and time/learning is (comparatively) free. Someone else suggested kubernetes, so I might look into that to see if it can accomplish what I’m looking for.
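If the drive does show up on a Linux box, smartmontools can dump its health log as JSON (smartctl 7.0 and later accept `--json`). Below is a rough sketch of pulling out the wear-related fields. The field names are the ones smartctl uses for the NVMe health log as far as I know; the sample values are invented, and the device path in the usage note is an assumption.

```python
import json


def summarize_nvme_health(smartctl_json: str) -> dict:
    """Extract wear and temperature fields from `smartctl --json -a` output."""
    report = json.loads(smartctl_json)
    log = report.get("nvme_smart_health_information_log", {})
    units_written = log.get("data_units_written")
    # Per the NVMe spec, one "data unit" is 1000 * 512 bytes.
    tb_written = None if units_written is None else units_written * 512_000 / 1e12
    return {
        "percentage_used": log.get("percentage_used"),
        "tb_written": tb_written,
        "temperature_c": log.get("temperature"),
        "critical_warning": log.get("critical_warning"),
    }


# Invented sample standing in for real smartctl output.
sample = json.dumps({
    "nvme_smart_health_information_log": {
        "critical_warning": 0,
        "temperature": 71,
        "percentage_used": 40,
        "data_units_written": 390_625_000,
    }
})
print(summarize_nvme_health(sample))
```

On a real system you would feed it something like the output of `smartctl --json -a /dev/nvme0`. Drives behind a USB bridge often need an extra device-type option, and a drive that barely enumerates may not return a log at all.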
Modern mini PCs often place the NVMe drive near other components that run hot, and that’s what kills NVMe drives, since they need to be cooled too. You can try adding thermal pads and small heatsinks here and there and try to isolate the components from each other, but many mini PCs have this flaw nowadays.
Yeah, pretty much what I guessed. The drive came with a cooling pad but it didn’t do much at all
Kind of, but probably not. I started writing this and was like “totally it could be stateless”. Docker runs stateless, and I believe when it starts it is still stateless (or at least could be mounted on a ramdrive) - but then I started thinking, and what about the images? They have to be downloaded and run somewhere, and that’s going to eat RAM quickly. So I amend to: you don’t need it to be stateful; you could have an image like you talked about that is loaded every time (that’s essentially what kubernetes does), but you will still need space somewhere as a scratch drive: a place where Docker will put images and temporary filesystems while it’s running.
For state, check out docker’s volume backings here: https://docs.docker.com/engine/storage/volumes/. You could use nfs to another server as an example for your volumes. Your volumes would never need to be on your “app server”, but instead could be loaded via nfs from your storage server.
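As a rough illustration of the NFS-backed volume idea, this sketch only builds the `docker volume create` command line, following the pattern in the Docker volume docs linked above (the `local` driver with `type=nfs`, `o=addr=...` and `device=:/path` options). The volume name, server address, export path and the `nfsvers=4` option are placeholders I made up; swap in your own.

```python
import shlex


def nfs_volume_command(name: str, server: str, export_path: str,
                       nfs_opts: str = "rw,nfsvers=4") -> list:
    """Build a `docker volume create` argv for an NFS-backed named volume."""
    if not export_path.startswith("/"):
        raise ValueError("export_path must be an absolute path on the NFS server")
    return [
        "docker", "volume", "create",
        "--driver", "local",
        "--opt", "type=nfs",
        "--opt", f"o=addr={server},{nfs_opts}",
        "--opt", f"device=:{export_path}",
        name,
    ]


# Hypothetical names: a volume for Frigate recordings exported by a NAS.
cmd = nfs_volume_command("frigate_media", "192.168.1.50", "/export/frigate")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list passed to `subprocess.run(cmd, check=True)` on the Docker host. The mount only happens when a container first uses the volume, so a wrong address or export path shows up at container start rather than at volume creation.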
This is all nearing into kubernetes territory though. If you’re thinking about netboot and automatically starting containers, and handling stateless volumes and storing volumes in a way that are synced with a storage server… it might be time for kubernetes.
I guess you can also use NFS/iSCSI for images too?
Correct, I run docker on a compute host that has no local storage. The host’s disks are on iSCSI LUNs.
That’s really good to know. Do you ever have issues writing database files on those disks? Database files on nfs mounts have been the bane of my existence.
FWIW I run only very small databases e.g., sqlite ones shipped with applications, but haven’t had any problems in about a year now, and nothing that wasn’t recoverable from backup.
So I amend to: you don’t need it to be stateful; you could have an image like you talked about that is loaded every time (that’s essentially what kubernetes does), but you will still need space somewhere as a scratch drive: a place where Docker will put images and temporary filesystems while it’s running.
Putting the image somewhere is easy. I’ve got TBs of space available on my NAS drives, especially right now with not acquiring any additional linux ISOs.
For state, check out docker’s volume backings here: https://docs.docker.com/engine/storage/volumes/. You could use nfs to another server as an example for your volumes. Your volumes would never need to be on your “app server”, but instead could be loaded via nfs from your storage server.
I’ll check that out. If that allows me to actually write databases to disk on the nfs backing volume, that would be amazing. That’s the biggest issue I run into (regularly).
This is all nearing into kubernetes territory though. If you’re thinking about netboot and automatically starting containers, and handling stateless volumes and storing volumes in a way that are synced with a storage server… it might be time for kubernetes.
I don’t think I’ve ever looked into kubernetes. I’ll have to look into that at some point… Any good beginner resources?
Personally I suggest k3s, setting up a test cluster and playing with it. For volume management I use longhorn. It’s a HUGE learning curve, but it’s officially something companies will shell out big money for too if you’re willing to learn it. Soup to nuts from setting up test cluster and playing with it all the way to all of my services running was about 4 months of tinkering for me- but I’ll never go back
Yeah, a PXE boot should work, but you’d need a ton of RAM (I’d double to 32GB for Frigate). Drives are cheap, I’d just get one and not deal with network booting at all.
Exactly. Hell, for 50 bucks you can get a decent SSD. Just grab something and have all of your drives hosted via NFS; then you aren’t hacking Docker to run in RAM all the time and wasting your RAM hosting stuff it doesn’t need to.
Hell, for 50 bucks you can get a decent SSD.
If only it were that easy, I would have already thrown a spare 2.5" into the system, but it’s only got a single nvme slot for local storage.
You can get an NVME drive for <$50, in fact I saw a 128GB one online for ~$15 from a reliable brand (Patriot).
That’s actually doable. Thanks for that friend.
You mention frigate specifically. Were you running this on the system when the drive failed, or is this a future endeavour?
I bring this up because I also use frigate, and for some time I was running with a misconfigured docker compose that drove my SSD wearout to 40% in a matter of months.
Make sure that the tmpfs is configured per the frigate documentation and example config. If misconfigured like mine was, all of that IO is on disk. I believe the ramdisk is used for temp storage of camera streams, until an event occurs and the corresponding clip is committed to disk.
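For anyone wanting to sanity-check their own compose file, here is a small sketch that inspects an already-parsed service definition (a plain dict, as you would get from loading the YAML) and reports whether a tmpfs is mounted at `/tmp/cache`, which is the cache path the Frigate example compose file uses. The sample service below is a cut-down illustration with placeholder paths, not a complete Frigate config.

```python
def has_tmpfs_cache(service: dict, target: str = "/tmp/cache") -> bool:
    """Return True if the compose service mounts a tmpfs at `target`."""
    # Long-form volume syntax: {"type": "tmpfs", "target": "/tmp/cache", ...}
    for vol in service.get("volumes", []):
        if isinstance(vol, dict) and vol.get("type") == "tmpfs" \
                and vol.get("target") == target:
            return True
    # Service-level `tmpfs:` key, given as a string or a list of strings.
    tmpfs = service.get("tmpfs", [])
    if isinstance(tmpfs, str):
        tmpfs = [tmpfs]
    return any(entry.split(":", 1)[0] == target for entry in tmpfs)


# Cut-down example service; host paths are placeholders.
frigate_service = {
    "image": "ghcr.io/blakeblackshear/frigate:stable",
    "shm_size": "512mb",
    "volumes": [
        "/path/to/your/config:/config",
        "/path/to/your/storage:/media/frigate",
        {"type": "tmpfs", "target": "/tmp/cache", "tmpfs": {"size": 1000000000}},
    ],
}
print(has_tmpfs_cache(frigate_service))
```

If the check comes back false, the cache directory lives on whatever backs the container's writable layer, which on a single-disk host is the SSD.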
Good luck!
Interesting, it was running on this system, so it may actually have been wear that killed the drive. I’ll have to look into that config and see if it’s worth getting a new nvme to throw into the cook box.
Thanks for that info!
How about running the OS from a USB stick? Put all images you want on it and mount NAS drives at boot.
I’m leery about using a USB for long term persistent OS storage due to lifespan issues I’ve seen when just running a hypervisor from one. A ‘real’ usermode OS is probably going to have a worse lifespan than what I was seeing at work.
I have a raspberry pi running from a microsd (which uses the same kind of tech as a usb stick) for over 5 years with dietpi.
But considering that you think you chewed through an nvme somehow, you may be right.
You could boot over the network and use the NAS for storage, but that’s going to be a lot of work to get running properly, and it’ll be pretty slow too.
Honestly, if you want to run a read-only service from it, it could work, but anything more than a light, immutable host is going to be unpleasant.
Realistically, I just want to have a system that can act as the hardware end point for a coral processor to do image recognition. I don’t need to write a lot on demand, and what was being written previously was all to the NAS (other than the app’s database)
That could work, then! You’d have to set up the boot image or reconfigure it each time (maybe cloud-init and/or ansible), but as a mostly compute node it could work.
My ideal is something more like a netboot-able image that I can modify/recreate and have it pull on next boot. But those options aren’t a bad thought either. I’d just need to have the bootable image configured with the info needed to bootstrap it. I’ve got another VM that’s got a different automation platform running (Powershell Universal), but it would give me an excuse to learn another well known automation platform.
I wouldn’t do this. If you are spending the time to do netboot you might as well get a proper boot drive.
Run a livecd of whatever Linux distro you like, and get a USB drive. Store persistent files on the USB stick.
I don’t want to use a USB for storage, because those aren’t going to have a great lifespan in my experience. I’ve used them as the install media for something like ESX, but I’d rather not run a ‘real’ OS from a disk because I wasn’t impressed with overall lifespan on some of the systems we managed at work.
Then get a proper drive. Maybe a USB adapter?
It’s possible, but I recommend more RAM because you won’t have any swap.
But adding network boot is a hassle. I recommend buying another drive; it’ll be cheaper than RAM, especially since you don’t need much storage.
You can host docker volumes over NFS, but the actual container images need to exist on a filesystem that supports overlay (which NFS does not) unless you want things to be slow as shit. And I really do mean miserably slow. A container image shared over NFS will take forever to spin up because it has to duplicate the entire container filesystem instead of using overlays, and then it’ll blow up your disk usage by copying all these files around instead of overlaying them. It’s truly unusable.
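A quick way to check which side of that line a host is on is to look at what filesystem backs Docker's data directory (`/var/lib/docker` by default). This is a minimal sketch that parses text in the `/proc/mounts` format and picks the longest matching mount point; the sample mount table is invented, and the parser ignores the octal escapes the kernel uses for spaces in paths.

```python
from typing import Optional


def fs_type_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the mount that contains `path`."""
    target = path.rstrip("/") + "/"
    best_type, best_len = None, -1
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Longest matching mount point wins; later entries override earlier ones.
        if target.startswith(prefix) and len(prefix) >= best_len:
            best_type, best_len = fs_type, len(prefix)
    return best_type


# Invented mount table for a diskless host with Docker's data root on NFS.
SAMPLE_MOUNTS = """\
rootfs / tmpfs rw 0 0
nas:/export/docker /var/lib/docker nfs4 rw,relatime 0 0
"""

fs = fs_type_for("/var/lib/docker", SAMPLE_MOUNTS)
if fs is not None and fs.startswith("nfs"):
    print(f"/var/lib/docker is on {fs}: expect the slow, copy-heavy behaviour described above")
else:
    print(f"/var/lib/docker is on {fs}")
```

On the real host you would pass `open("/proc/mounts").read()` instead of the sample. A block device over iSCSI formatted as ext4 or XFS shows up as that filesystem here, which is why the iSCSI approach mentioned elsewhere in the thread avoids this problem.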
I’ve done something extremely similar with a custom NixOS iso for my docker VMs to make versioning and backups easier (golden image live disk with SSH+Docker+Dockge shared between all VMs + local persistent storage specific to each VM).
You can configure frigate via OCI container with custom config, as well as NFS mounts, SSH server, etc and then have a read-only live disk that boots up, mounts NFS share, and then starts up frigate.
Do you have any info on the custom setup? Sounds like a fun project/learning experience.
And do you mean OCI like oracle cloud?
https://git.mlaga97.space/mlaga97/persistent-live-docker-flake has a builder for a live disk that will mount /dev/sda as ext4 to /persistent, and then start up dockge and whatever containers are present from the previous boot automagically.
OCI as in Open Container Initiative.
I can’t speak to the netboot part personally but I’ve had Docker data folders mounted via an NFS share for a while now, and while it worked fine, I’ve just in the last week or so swapped them all back to local storage for performance reasons (typically anything involving a SQLite database), so depending on what services you’re running via Docker, check that your network speeds aren’t going to be a bottleneck for it. (My home network is only 1G for reference so might not be a problem for you).
Yeah, NFS bind mounts aren’t an issue. The issue I run into is database lock errors when I try to write a database file to the NFS share. I’ve got 1G networking as well and haven’t seen issues accessing regular files from my containers via the bind mounts.
I do it with k3s right now on fedora. I like it personally.
A nice thing if you use k8s is that setting up persistent network storage with something like Longhorn is an option too.
Do you have any resources on that kind of setup? I appreciate that constructive advice!