This particular scenario assumes that we want the local root user (which is typically used by Linux system services such as Docker) on various kerberized NFSv4 clients to be able to access and create files on a kerberized share. We also want files and directories created by the local root user of one client to be operable on by the local root users of other clients. The typical use case would be a cluster of Docker or Kubernetes nodes that all need access to the same data.
Following these suggestions REQUIRES that all your Linux machines are joined to Active Directory as described in this article and that your NFSv4 configuration is done in accordance with this one. The resulting configuration will allow the local root user on your client machines to automatically and transparently authenticate against AD using the machine account credentials stored in your krb5.keytab. DO NOT run kinit manually as local root at any point or you will likely screw up your authentication context.
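If you want to sanity-check that the machine account credentials are actually in place, listing the system keytab (assuming the default /etc/krb5.keytab location) should show the machine account principal:

# list principals stored in the system keytab; expect entries
# resembling ROCKYTEST$@YOUR.REALM for the machine account
klist -k /etc/krb5.keytab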
Put the AD machine accounts of the clients we want to grant access to the data (rockytest and docker01 in this example) into an AD group; let's call it NFS_docker. Then we do the following server-side:
chmod 770 /nfs-export
chgrp NFS_docker /nfs-export
chmod g+s /nfs-export
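To confirm that the permissions and the setgid bit took effect, check the directory itself; the output should resemble the following (owner and timestamp will differ, and the group name may be displayed lowercase depending on how it is resolved):

ls -ld /nfs-export
# drwxrws--- 2 root nfs_docker 6 Jul  1 23:00 /nfs-export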
At this point, new files and directories created under /nfs-export on the server will inherit the NFS_docker group. Note that right after setup, while the group showed up correctly on newly created files server-side immediately, clients that already had the NFS share mounted displayed group nobody (Rocky client) and group 4294967294 (Ubuntu client) for a few minutes. In the few minutes I spent googling the issue, the problem resolved itself without any intervention, meaning SSSD had done its magic in resolving the GIDs from AD.
Now we need to solve the issue of umasks. Not only do the default umasks of the Red Hat and Debian distribution families differ, neither family allows other group members to modify files created by one another by default. While in theory you could enforce umasks client-side, we are doing this for local root users, and enforcing 770 directory and 660 file permissions on everything created by root is a massive security disaster waiting to happen, so we are going to do the enforcement by creating a default ACL server-side:
setfacl -d -m u::rwx /nfs-export
setfacl -d -m g::rwx /nfs-export
setfacl -d -m o::- /nfs-export
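You can verify the resulting default ACL server-side with getfacl; among other lines, the output should include the default entries we just set:

getfacl /nfs-export
# expect to see, among other lines:
# default:user::rwx
# default:group::rwx
# default:other::---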
After the change, new files and directories created by local root of kerberized clients on the share look like this:
drwxrws--- 2 rockytest$ nfs_docker 6 Jul 1 23:30 testdir1
drwxrws--- 2 docker01$ nfs_docker 6 Jul 1 23:31 testdir2
-rw-rw---- 1 rockytest$ nfs_docker 0 Jul 1 23:30 testfile1
-rw-rw---- 1 docker01$ nfs_docker 0 Jul 1 23:31 testfile2
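A quick way to confirm that cross-client access actually works: as local root on docker01, append to a file created by rockytest (the /mnt/nfs mount path here is hypothetical; use wherever the share is mounted on your client):

# run as local root on docker01; testfile1 was created by rockytest
echo 'written from docker01' >> /mnt/nfs/testfile1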
Now we have NFS client machines belonging to a mutual AD group that are capable of operating on each other's data. At this point, instead of declaring the NFSv4 mount in every client's fstab, we will use autofs to configure on-demand share access. This eliminates the risk of a stalled mount process during boot due to a lack of networking or a Kerberos ticket, and reduces unneeded strain on the network by only keeping an NFS connection up while it is actually being used for something.
First we install the autofs package for our distribution on the client, then we reference a direct automount map file in the autofs master configuration file /etc/auto.master:
/- /etc/auto.clientmount
Then we create the aforementioned /etc/auto.clientmount map file with the following configuration:
/path/to/local/mountpoint -fstype=nfs4 nfs.server.fqdn:/nfs-export/
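If your environment needs explicit NFS mount options (for example, forcing a specific Kerberos security flavor), they can be appended to the map entry. A hypothetical variant assuming sec=krb5:

# same direct map entry, with an explicit security flavor added
/path/to/local/mountpoint -fstype=nfs4,sec=krb5 nfs.server.fqdn:/nfs-export/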
The “clientmount” map name is an arbitrary example and you can name it however you want. Unmount and disable any previously existing mounts created manually or via fstab on the clients, then enable and start the service (or restart it if it was already running):
systemctl enable autofs
systemctl start autofs
At this point, you might try running “df” or “ls /path/to/mount” and wonder why nothing is showing up. This is by design: neither command uses the filesystem deeply enough for autofs to activate the defined mount. Try to cd into the mount path or create a new file in it, and suddenly the mount appears. Depending on your distribution, the mount will automatically unmount itself after 5-10 minutes of inactivity.
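If you would rather control that idle timeout yourself than rely on distribution defaults, it can typically be set in /etc/autofs.conf (a sketch assuming the stock configuration layout; the value is in seconds):

# /etc/autofs.conf
[ autofs ]
# unmount automounted filesystems after 10 minutes of inactivity
timeout = 600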
If you are having trouble, need to debug, and journalctl isn't being too helpful:
Stop the autofs daemon:
systemctl stop autofs
Run automount in the foreground with verbose information:
automount -f -v
Open another terminal and try accessing the mount path and watch the first terminal for errors.
Notes:
When using Docker or Kubernetes on top of autofs, special consideration must be given to your container volume mount configuration, or you risk running into the “Too many levels of symbolic links” issue that seems well-documented online. Docker needs “:shared” to be included in the volume mount configuration (see the sketch below), and there are various solutions for Kubernetes as well. You could, obviously, take another approach and skip autofs altogether, keeping your NFS storage permanently mounted on all nodes and scripting some sort of delay into the NFS mount process to avoid potential boot stalls.
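For Docker specifically, the propagation flag goes into the bind mount specification. A minimal sketch using the hypothetical mount path from earlier:

# ':shared' sets bind propagation so the container picks up mounts
# that autofs creates under the path after the container starts
docker run --rm -v /path/to/local/mountpoint:/data:shared alpine ls /data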
ACLs in modern distributions (RHEL 8.x, Ubuntu 20.04) seem to mostly “just work”. Contrary to most guides found online, there is apparently no longer any need to deliberately enable ACL support via filesystem mount options, neither on the server nor on the clients.
Utilities provided by the nfs4-acl-tools package, such as nfs4_getfacl and nfs4_setfacl, will ONLY work from the client side. We are using setfacl directly on the NFS server.
You REALLY don’t want to kinit manually as local root on the clients once everything is running, or otherwise screw with the contents of root’s KEYRING:persistent:%{uid} ccache.
Not really sure why, but after everything is configured and working, running “klist” as local root on RHEL-based clients will show output similar to the following:
Ticket cache: KCM:0:53219
Default principal: ROCKYTEST$@SYSTEMS.DANCE

Valid starting       Expires              Service principal
01/01/1970 02:00:00  01/01/1970 02:00:00  Encrypted/Credentials/v1@X-GSSPROXY:
While running “klist” on an Ubuntu client will result in:
klist: Credentials cache keyring 'persistent:0:0' not found
Yet, both clients have access.