@Jip-Hop
Last active August 27, 2024 16:40
Persistent Debian 'jail' on TrueNAS SCALE to install software (docker-compose, portainer, podman, etc.) with full access to all files via bind mounts. Without modifying the host OS at all thanks to systemd-nspawn!
@Jip-Hop
Author

Jip-Hop commented Jan 25, 2023

@qudiqudi Sorry I don't know what you mean with regards to host path verification and nfs/smb. Can you elaborate?

@sprint1849 Why do you want to do VLAN in there? Can't you just use host networking and use docker to make separate networks for your docker apps? That's what I do, with docker compose (Portainer). Are you saying this doesn't work when you're using the macvlan option of systemd-nspawn?

Thanks for testing!

@qudiqudi

Sorry I don't know what you mean with regards to host path verification and nfs/smb. Can you elaborate?

Sure, it's essentially a measure to prevent data loss and permission issues, and it was implemented in TrueNAS SCALE 22.12 Bluefin. Although it was present even back on CORE with iocage jails, it only came to light with Bluefin, because iX now forces you to activate host path verification by default and doesn't give support to users who have it disabled.
It basically ensures that a mount path is only used by one app (Kubernetes) at a time. This means the same path cannot also be used by an SMB or NFS share. For Plex users, for example, this is very unfortunate: you can no longer access your Plex library via SMB. There are ways to circumvent this, but it's always a compromise.
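
To illustrate the kind of check being described, here is a rough sketch in Python. It is not how TrueNAS implements host path verification; the function names and example paths are hypothetical, and treating a parent/child directory pair as a conflict is an assumption made for the illustration.

```python
from pathlib import PurePosixPath


def paths_overlap(a: str, b: str) -> bool:
    """Return True if one path equals or contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def find_conflicts(app_paths, share_paths):
    """List (app_path, share_path) pairs that touch the same directory tree."""
    return [(a, s) for a in app_paths for s in share_paths if paths_overlap(a, s)]


# Hypothetical example paths, not taken from a real system.
apps = ["/mnt/tank/media", "/mnt/tank/appdata/plex"]
shares = ["/mnt/tank/media/movies", "/mnt/tank/documents"]
print(find_conflicts(apps, shares))
```

In this made-up example the media directory is flagged because an SMB/NFS share lives inside a path that an app also mounts, which is the Plex situation described above.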

@Ixian

Ixian commented Jan 25, 2023

Sorry I don't know what you mean with regards to host path verification and nfs/smb. Can you elaborate?

Sure, it's essentially a measure to prevent data loss and permission issues, and it was implemented in TrueNAS SCALE 22.12 Bluefin. Although it was present even back on CORE with iocage jails, it only came to light with Bluefin, because iX now forces you to activate host path verification by default and doesn't give support to users who have it disabled. It basically ensures that a mount path is only used by one app (Kubernetes) at a time. This means the same path cannot also be used by an SMB or NFS share. For Plex users, for example, this is very unfortunate: you can no longer access your Plex library via SMB. There are ways to circumvent this, but it's always a compromise.

This isn't a problem with either the "native" docker workaround or this solution here, which leverages a systemd-nspawn jail. I just mount my entire data directory into the jail, and the containers running inside it can access everything I allow them to just fine.

@qudiqudi

This isn't a problem with either the "native" docker workaround or this solution here, which leverages a systemd-nspawn jail. I just mount my entire data directory into the jail, and the containers running inside it can access everything I allow them to just fine.

The question is not whether they are allowed to but rather if it carries any risks to the data due to permission issues. Are ACLs on the host datasets overwritten by jails that access the same data?

@sprint1849

sprint1849 commented Jan 26, 2023

@qudiqudi Sorry I don't know what you mean with regards to host path verification and nfs/smb. Can you elaborate?

@sprint1849 Why do you want to do VLAN in there? Can't you just use host networking and use docker to make separate networks for your docker apps? That's what I do, with docker compose (Portainer). Are you saying this doesn't work when you're using the macvlan option of systemd-nspawn?

Thanks for testing!

Hi, thanks for the suggestion. I followed it and switched to host networking, and from there I VLAN my apps inside the jail using the host's VLAN interface. It's working great. The other thing I mentioned, which doesn't work, is this setup:
host VLAN interface entered in the script's macvlan option

Before I use this in production, I have some observations to share, and I need your advice.

In this setup, I notice that the docker daemon is not started in the jail if the pool is unselected in the host's Apps settings. So I wonder if the jail still depends on the host's docker?

How safe is this jail thing? Sorry, I don't understand the system very well. It seems quite powerful when using the jail shell.

Stability? I guess this will be answered by us testers.

Will this survive future updates, especially when TrueNAS moves to containerd?

I like that this implementation doesn't use a VM and can bind datasets. Kubernetes is too complicated for me. I just want apps segregated through VLANs, and that is easily achievable with your script. Thanks to you, using docker with Portainer is possible on TrueNAS SCALE without much hacking.

Edit:
I'd like to ask if it's possible to implement a VLAN per jail in your script, where you create a VLAN in the TrueNAS network settings and attach jails to it?

@Jip-Hop
Author

Jip-Hop commented Jan 28, 2023

@Ixian Do you also see these warnings when running docker info inside the jail (when using host networking)?

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

And were you able to confirm that this method of mounting the nvidia drivers from the host works? 😄

I started over with a new version of the script. I think it's turning out quite nice. 🙂 Will post as a proper git repo with readme soon.
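
For anyone wanting to check the two settings behind those docker info warnings, here is a minimal sketch in Python. It assumes the values are exposed under /proc/sys/net/bridge, which on Linux normally only exists once the br_netfilter kernel module is loaded; the function name is made up for this example, and whether the files are visible from inside a jail depends on how it was started.

```python
from pathlib import Path


def bridge_nf_status(proc_root="/proc/sys/net/bridge"):
    """Map each bridge-nf-call setting to True/False, or None if it is missing."""
    status = {}
    for name in ("bridge-nf-call-iptables", "bridge-nf-call-ip6tables"):
        path = Path(proc_root) / name
        try:
            status[name] = path.read_text().strip() == "1"
        except OSError:
            # Missing or unreadable: often br_netfilter is simply not loaded.
            status[name] = None
    return status


if __name__ == "__main__":
    for name, enabled in bridge_nf_status().items():
        print(f"{name}: {enabled}")
```

Running it both on the host and inside the jail should show whether the two environments see the same values.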

@qudiqudi

Looking forward to that git. This has huge potential. Works pretty well for me so far.

@Jip-Hop
Author

Jip-Hop commented Jan 28, 2023

Here's the new script. Looking forward to your feedback!

@Jip-Hop
Author

Jip-Hop commented Jan 29, 2023

@sprint1849

In this setup, I notice that the docker daemon is not started in the jail if the pool is unselected in the host's Apps settings. So I wonder if the jail still depends on the host's docker?

It doesn't depend on the host docker. But you may be running into this issue.

Please check out the new script. It works a bit differently than the one in this gist, but has better documentation. Should also fix the above issue.

How safe is this jail thing? Sorry, I don't understand the system very well. It seems quite powerful when using the jail shell.

The jail has its own root filesystem and you can install packages in it without conflicting with packages on the host. But the jail also shares resources (the kernel, bind mounts, networking, etc.) with the host, so it is not fully isolated. So I can't really answer this question without knowing what you want it to be safe from. 😄

Will this survive future updates especially when truenas moves to containerd?

Yes! That's kind of the point of jailmaker. 🙂

Edit:
I'd like to ask if it's possible to implement a VLAN per jail in your script, where you create a VLAN in the TrueNAS network settings and attach jails to it?

I don't think there's a need to implement anything as jailmaker allows you to use additional systemd-nspawn options. Have a look at the Networking Options. Perhaps my Advanced Networking notes are helpful to you too.

@Jip-Hop
Author

Jip-Hop commented Jan 29, 2023

@qudiqudi

The question is not whether they are allowed to but rather if it carries any risks to the data due to permission issues. Are ACLs on the host datasets overwritten by jails that access the same data?

Sorry I don't know the answer to that. I don't use ACLs on the files I mount inside the jail. Perhaps you could try it out and report your findings?

@Ixian

Ixian commented Jan 29, 2023

@Ixian Do you also see these warnings when running docker info inside the jail (when using host networking)?

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

And were you able to confirm that this method of mounting the nvidia drivers from the host works? 😄

I started over with a new version of the script. I think it's turning out quite nice. 🙂 Will post as a proper git repo with readme soon.

No, I was not getting those warnings.

Going to try your new script in a bit :)

@Jip-Hop
Author

Jip-Hop commented Feb 4, 2023

Jailmaker is now written in Python instead of bash.

@qudiqudi

qudiqudi commented Feb 5, 2023

@Jip-Hop just switched to the Python version. Works very well! Thanks!
Why did you decide against the option to automatically install docker in the install wizard? Or are you planning on putting it back in?

@Jip-Hop
Author

Jip-Hop commented Feb 5, 2023

Thanks for letting me know :) Have you tried what happens with ACLs?

Jailmaker initially installed just debian bullseye, so it was easy for me to implement the official steps to install docker. But now you may choose from different distros. In order for me to automate installing docker on those distros I'd basically have to write code which does whatever get.docker.com does, or I could run the get.docker.com script from Jailmaker. But docker doesn't recommend installing with the get.docker.com script and they also advise against 'blindly' downloading and running that script (which would happen if Jailmaker runs it).

So I decided to keep Jailmaker small. Less code makes it easier for people to decide if they can trust it. And it's less maintenance for me :) Installing docker is still just a few commands away. But by not automating it with Jailmaker everyone can decide for themselves to follow the official install guide or use the install script.

I think Jailmaker should stick to whatever is specific about TrueNAS and systemd-nspawn and leave whatever is generic to the user.

How did you install docker inside the jail?

@Jip-Hop
Author

Jip-Hop commented Feb 24, 2023

@Ixian have you tried to use the latest script which mounts the nvidia driver files from the host? Still curious if you can confirm this approach works well.

@Ixian

Ixian commented Feb 24, 2023

@Ixian have you tried to use the latest script which mounts the nvidia driver files from the host? Still curious if you can confirm this approach works well.

I’m still using the older one but I will take a look this weekend; been meaning to update.

@Talung

Talung commented Feb 28, 2023

Quick question about this approach. Is it possible to run docker in the jail and still run the iX apps as normal, or, like the last approach, are the two mutually exclusive?

I'm looking to migrate to Jailmaker, but would still like to try Kubernetes where possible.

@Jip-Hop
Author

Jip-Hop commented Feb 28, 2023

Jailmaker should not interfere with the IX kubernetes Apps :)

@Talung

Talung commented Feb 28, 2023

Jailmaker should not interfere with the IX kubernetes Apps :)

Excellent news. Best of both worlds. Hopefully they will leave Jailmaker alone :) I will try to transition to this soon to add to your testing data. Thanks :)

@qudiqudi

@Talung you can even use both in conjunction. I have every service inside jails reverse proxied through the TrueCharts Traefik app via the "external service" app. I plan to ditch k3s altogether, though it works very well.

@Talung

Talung commented Feb 28, 2023

Hi there,
Noob question about bind mounts. I have been reading through the previous comments to find out how to bind some folders into the new machine. I have docker set up and it's working with the "hello world" test.

My jail is called "dockerjail"; otherwise I pretty much followed all the instructions, including the docker install recommendations.

With systemd-nspawn, which I am assuming you run in a TrueNAS shell, I am unsure of the -D option. I keep getting "doesn't look like an OS root directory (os-release file is missing). Refusing." Should this point to the actual jail rootfs folder, i.e. /mnt/pond/jailmaker/jails/dockerjail/rootfs/?

Also, once this is mounted, is it something that needs to be done again on every subsequent reboot?

systemd-nspawn --capability=all -b -D "????/mnt/pond/jailmaker/jails/dockerjail/rootfs????" --system-call-filter='add_key keyctl bpf' --bind=/mnt/pond/appdata:/mnt/lake/media:/mnt/lake/cloud

Thanks

@Jip-Hop
Author

Jip-Hop commented Feb 28, 2023

Hi. You should not need to run systemd-nspawn yourself. The jlmkr.py script takes care of that. You should create a jail with jlmkr.py create. Then also specify the bind mounts when the wizard asks you to. Then run jlmkr.py start dockerjail on each reboot. Hope that makes sense.

@Talung

Talung commented Feb 28, 2023

Ah, OK. I must have missed that part of the setup, or maybe I skipped it just to get it working for testing. Is there a way to update it now, or should I just make a new one? The thing is, I would probably run into situations where I want to add more mounts at a later stage.

I have everything else set up with no issues, so the script and documentation were fine in that respect. Thanks.

EDIT: Went through the install again and saw what was happening. It is way easier than I thought: you just need to modify the config file and add systemd_nspawn_user_args=--bind='/mnt/pond/appdata/' for the various folders, or use --bind-ro= for read-only mounts.

Added this part for anybody else that runs into the same issue. :)
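
As a rough illustration of that config line, the Python sketch below builds one from a list of directories. Note that systemd-nspawn's --bind takes a single source[:destination[:options]] value, so each directory gets its own flag rather than one colon-joined list. The helper name is made up, the choice of which folder is read-only is arbitrary, and putting several space-separated flags on one systemd_nspawn_user_args line is an assumption about how the config file is parsed.

```python
import shlex


def nspawn_bind_args(mounts):
    """Build --bind / --bind-ro flags from (path, read_only) pairs."""
    flags = []
    for path, read_only in mounts:
        option = "--bind-ro" if read_only else "--bind"
        # shlex.quote only adds quotes when the path needs them (e.g. spaces).
        flags.append(f"{option}={shlex.quote(path)}")
    return " ".join(flags)


# Paths from the comments above; which ones are read-only is a made-up choice.
mounts = [
    ("/mnt/pond/appdata", False),
    ("/mnt/lake/media", True),
    ("/mnt/lake/cloud", False),
]
print("systemd_nspawn_user_args=" + nspawn_bind_args(mounts))
```

Paths that themselves contain a colon would need extra escaping for systemd-nspawn, which this sketch does not handle.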

@Jip-Hop
Author

Jip-Hop commented Feb 28, 2023

Exactly! Glad you got it solved :)

@Talung

Talung commented Mar 1, 2023

Quick update for you. Everything is working great. I've been able to move all the native docker stuff into the jail without really needing to change much. Even the "internal networking" I set up (i.e. the "networks" in Portainer) went through just fine. I haven't needed to use Traefik or anything like that; it works as it did before. I haven't tried any nvidia passthrough yet and have noticed that you have a recent update on that, so I will look at that at a later stage.

I did notice that there is no daemon.json file in the container and am assuming I just need to add in my copy from "enable-docker.sh" in there. I haven't done that yet.

Also, I have noticed all the images are stored with the jail, which means if I recreate the jail I would need to re-download all the images again. All my data is outside of it, so not really an issue tbh, except during development stages.

Just wanted to add my thanks to you for doing this. 👍

@Jip-Hop
Author

Jip-Hop commented Mar 1, 2023

Good to hear! And convenient that you have a nvidia GPU :) Indeed still in the process of getting that working properly.

I don't know what you have in the daemon.json. I don't think you need to do anything with it when running docker inside the jail. I know I haven't touched that file myself since switching.

I think you can also bind mount the directory where docker stores the images to a directory on the host. That way, when you recreate a jail, all the previously downloaded images are still there. But since you also need to reinstall docker in this fresh jail, perhaps that won't work the way I think/hope it does.

@Talung

Talung commented Mar 1, 2023

Essentially my last daemon.json was

{
  "data-root": "/mnt/pond/dockerset",
  "storage-driver": "overlay2",
  "exec-opts": [
    "native.cgroupdriver=cgroupfs"
  ],
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

But, as you mention with the bind mounts, if I add that dockerset, then it would make the jail extremely thin. Meaning the only thing I would need to do when creating it is a small script to install docker and setup the various files. That should be easy enough. Will be trying that soon with the nvidia changes and seeing if I can get that working. Got an old GTX 1070 in there.

Btw, I take it the quick/correct way to remove a jail is just to remove the folder after it is stopped?

EDIT: Interestingly enough this is what TrueNAS has changed the daemon.json file to:

{
  "data-root": "/mnt/pond/ix-applications/docker",
  "exec-opts": ["native.cgroupdriver=cgroupfs"],
  "iptables": false,
  "bridge": "none",
  "storage-driver": "overlay2"
}

No mention of nvidia in there.
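
To make the difference between the two files easier to see, here is a small Python sketch that compares their top-level keys. The two JSON documents are copied from this comment; the helper name is made up for the example and it only looks at the first level of keys.

```python
import json

old_config = json.loads("""
{
  "data-root": "/mnt/pond/dockerset",
  "storage-driver": "overlay2",
  "exec-opts": ["native.cgroupdriver=cgroupfs"],
  "runtimes": {
    "nvidia": {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []}
  }
}
""")

truenas_config = json.loads("""
{
  "data-root": "/mnt/pond/ix-applications/docker",
  "exec-opts": ["native.cgroupdriver=cgroupfs"],
  "iptables": false,
  "bridge": "none",
  "storage-driver": "overlay2"
}
""")


def diff_keys(a, b):
    """Return (only_in_a, only_in_b, changed) for top-level keys."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_a, only_b, changed


only_old, only_new, changed = diff_keys(old_config, truenas_config)
print("only in old file:", only_old)
print("only in TrueNAS file:", only_new)
print("different values:", changed)
```

Besides the missing nvidia runtime, this also highlights that the TrueNAS version disables docker's iptables handling and default bridge, and points data-root at a different dataset.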

@Jip-Hop
Author

Jip-Hop commented Mar 1, 2023

Yes, you can just stop the jail and then remove the corresponding directory from the jails directory. I'd add a sleep of 1 second in between those 2 commands. Noticed the rm would fail because some files were not unmounted yet immediately after stopping the jail with machinectl stop.
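
A minimal Python sketch of that stop, wait, remove sequence might look like the following. The function name and its parameters are hypothetical and this is not part of jlmkr.py; it permanently deletes the jail directory, so treat it as an illustration and double-check the path before using anything like it.

```python
import shutil
import subprocess
import time
from pathlib import Path


def remove_jail(name, jails_dir, run=subprocess.run, sleep=time.sleep):
    """Stop a jail, wait briefly for unmounts, then delete its directory."""
    jail_path = Path(jails_dir) / name
    if not jail_path.is_dir():
        raise FileNotFoundError(f"no such jail directory: {jail_path}")
    # check=False: the jail may already be stopped, which makes machinectl fail.
    run(["machinectl", "stop", name], check=False)
    sleep(1)  # short pause so mounts are released before removing files
    shutil.rmtree(jail_path)
    return jail_path


# Example call (needs root; the path is the one mentioned earlier in the thread):
# remove_jail("dockerjail", "/mnt/pond/jailmaker/jails")
```

The run and sleep parameters exist only so the function can be tried out without actually calling machinectl.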

@ZestyChicken

2 questions for you guys - can you pull your TrueNAS Scale users into the jail and work with consistent users between the jail and Scale host?

For example, I have an apps user/group in Scale which has ownership of my media files (set up for TrueCharts). It looks like there is a flag -u that may do that but the language is confusing.

Also, if I wanted to expose ports 80 & 443 for Traefik, I know I can create an alias for my primary IP address, but would this IP need to be attached to the jail via host or bridge networking? I set up a bridge but was unable to deploy Traefik, presumably because it was conflicting with the TrueNAS GUI.

@qudiqudi

2 questions for you guys - can you pull your TrueNAS Scale users into the jail and work with consistent users between the jail and Scale host?

You have to translate your TN users into unique Users/Groups inside the jail. You can map them to the same UID/GID but I would advise against it to prevent permission issues.
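
To see where host and jail accounts share a numeric ID under different names, something like this Python sketch could help. It assumes the jail runs without user namespacing, so ownership on bind-mounted files is decided by the numeric UID alone; the function names and the sample data are made up, and only the standard /etc/passwd field layout is relied on.

```python
def parse_passwd(text):
    """Parse passwd-format text into {uid: username}."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        users[int(fields[2])] = fields[0]
    return users


def uid_collisions(host_passwd, jail_passwd):
    """Return {uid: (host_name, jail_name)} where one UID has two different names."""
    host, jail = parse_passwd(host_passwd), parse_passwd(jail_passwd)
    return {
        uid: (host[uid], jail[uid])
        for uid in sorted(host.keys() & jail.keys())
        if host[uid] != jail[uid]
    }


# Made-up sample data in /etc/passwd format.
host_sample = "root:x:0:0:root:/root:/bin/bash\nmedia:x:3000:3000::/nonexistent:/usr/sbin/nologin\n"
jail_sample = "root:x:0:0:root:/root:/bin/bash\nbackup2:x:3000:3000::/home/backup2:/bin/bash\n"
print(uid_collisions(host_sample, jail_sample))
```

In practice the two inputs would be the contents of /etc/passwd on the host and inside the jail; the same idea applies to /etc/group for GIDs.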

Also, if I wanted to expose port 80 & 443 for traefik, I know I can create an alias for my primary ip address but would this IP need to be attached to the jail via host or bridge networking?

There is a networking section of the readme.md, read it carefully, as it gives concise yet direct instructions on how to do that. If you want to reverse proxy via traefik, your jail needs an IP address from your DHCP server. This is possible via a network bridge on the TN host.
