Restarting TrueNAS and Portainer probably isn’t a great title for this blog, but it will pop up as a reminder for me to simply restart the NFS service after I’ve restarted TrueNAS, and hopefully avoid a few wasted hours trying to work out the problem!
The reason for restarting TrueNAS was simple. I was running 2 x 480GB SSDs in a mirrored pool for my jails and databases. It reduces fragmentation on the main pool as there are fewer block changes, and running from SSD gives a bit of a performance boost for general operation.
To get the full benefit, it is recommended to keep the Used Space below 50% and I’d started to bump into that limit. The drives were used when I configured the pool, and while it’s worked perfectly fine, I’ve been considering an upgrade to 2 x 1TB drives for a little while.
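If you would rather be warned before hitting that limit, a small shell helper can do it. This is a minimal sketch, not something from my setup: the function name check_capacity and the default 50% threshold are my own choices, and it assumes the tab-separated output of zpool list -H -o name,capacity.

```shell
#!/bin/sh
# Sketch: warn about any pool at or above a used-space threshold (percent).
# Reads "name<TAB>capacity%" lines on stdin, as printed by:
#   zpool list -H -o name,capacity
# check_capacity is a hypothetical helper name, not a ZFS command.
check_capacity() {
    threshold="${1:-50}"
    while read -r name cap; do
        used="${cap%\%}"              # strip the trailing % sign
        case "$used" in
            ''|*[!0-9]*) continue ;;  # skip anything that is not a plain number
        esac
        if [ "$used" -ge "$threshold" ]; then
            echo "WARNING: $name is ${used}% full (limit ${threshold}%)"
        fi
    done
}

# Usage on the TrueNAS host:
#   zpool list -H -o name,capacity | check_capacity 50
```

Run from cron or by hand, it prints nothing while every pool is below the threshold.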
Step forward Amazon and their Spring Sale, with some Samsung storage reduced by 53%, so I managed to pick up two 860 EVO 1TB drives for less than the usual price of one! This would take me back to around 25% used and give me some room to play around with moving one or two of the virtual machine zvols across to SSD to see if this also improves performance.
So my aim this afternoon was to replace the two drives and get everything back up and running. I would tackle the zvol issue at a later date, or so I thought. Replacing the drives was an incredibly simple process, and one I even managed without switching off my TrueNAS box.
My experience of ‘replacing’ drives in the past has been for failing drives, where you take the drive offline, power down and switch the two drives over, reboot and then replace the drive in the WUI. The manual on TrueNAS has been redone from the FreeNAS version, and it’s really poor in comparison. It didn’t even cover this process, so I double-checked on the forum first. Someone suggested doing the ‘replace’ without taking the old drive offline, which I hadn’t even considered.
I found a MOLEX-SATA power cable and 2 SATA data cables in the garage, and my motherboard had 2 spare SATA ports, so it was simply a case of connecting the drives up and clicking replace. I actually did them one at a time, although I could have done them in parallel. This approach did leave me with excess cables and nowhere to mount the drives, so once everything had resilvered (only about 15 minutes per drive) I decided to shut down to remove the old drives and rearrange the new ones.
So this is where we get to the crux of the blog. I restarted TrueNAS and initially, everything looked fine. I’d accidentally powered off my switch (which lives on top of my TrueNAS box) so the WUI didn’t come back online and I needed to restart my Raspberry Pi (that lives on top of the switch!) that’s running Pi-hole as a full DNS server, along with NGINX Proxy Manager.
Once that was done, my jails were all accessible externally and it looked like all my VMs had restarted fine. It was only when I checked Portainer that I realised (or should that be thought?) I had a bigger problem. I couldn’t access Portainer and a number of Docker containers, but Bitwarden and Mattermost were running. I should have known the problem at this point, as I’ve been here before, but I’d forgotten, so I spent the next few hours trying different things.
One of those was moving the zvol disk across onto the SSD pool, using a pretty simple zfs snapshot and replication:
zfs snapshot pool/dataset/zvol@migrate
zfs send pool/dataset/zvol@migrate | zfs recv new_pool/dataset/zvol
This moved the zvol across to the SSD pool and all I needed to do to use it was change the virtual disk location in the VM settings. I’m not sure why I thought this might fix the problem, as I was just taking an exact copy of the virtual disk, but I tried it anyway, and surprise surprise, it still didn’t work!
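For anyone wanting to repeat the move, the two commands can be wrapped in a small function. This is only a sketch under a few assumptions: migrate_zvol and the DRY_RUN variable are names made up for illustration, the VM should be shut down first so the snapshot is consistent, and the parent dataset on the destination pool (new_pool/dataset here) must already exist for zfs recv to succeed. By default it only prints what it would run.

```shell
#!/bin/sh
# Sketch: copy a zvol to another pool with snapshot + send/recv.
# migrate_zvol and DRY_RUN are hypothetical names used for illustration.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
migrate_zvol() {
    src="$1"                 # e.g. pool/dataset/zvol
    dst="$2"                 # e.g. new_pool/dataset/zvol
    snap="${3:-migrate}"     # snapshot name, defaults to "migrate"

    if [ -z "$src" ] || [ -z "$dst" ]; then
        echo "usage: migrate_zvol SOURCE DEST [SNAPNAME]" >&2
        return 2
    fi

    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "zfs snapshot ${src}@${snap}"
        echo "zfs send ${src}@${snap} | zfs recv ${dst}"
        return 0
    fi

    # Take the snapshot, then stream it into the destination pool.
    zfs snapshot "${src}@${snap}" || return 1
    zfs send "${src}@${snap}" | zfs recv "${dst}"
}

# Preview:  migrate_zvol pool/dataset/zvol new_pool/dataset/zvol
# Execute:  DRY_RUN=0 migrate_zvol pool/dataset/zvol new_pool/dataset/zvol
```

Afterwards, zfs list -t volume should show the zvol on both pools, and the original stays untouched as a fallback.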
It did give me some comfort for playing around trying to fix things though, as I knew I still had the old virtual disk on the main pool if I completely screwed things up. For some reason I thought the problem was NGINX related and the service wasn’t starting in the Ubuntu VM, but after installing it and checking the configuration, I think I worked out it was never running there in the first place!
I updated the Ubuntu packages (still running 18.04), which also didn’t fix the issue, but did mean I have a more up to date system. Frustration was setting in with my lack of Docker knowledge and my need to have Portainer running to try and diagnose any problems. Not good when Portainer won’t start!
I can’t remember now what flashed the solution back into my brain, but when it did I felt so stupid. I’ve been here a couple of times before, and after restarting TrueNAS, I’ve needed to restart a couple of services to get everything working again. The one causing the problem here was the NFS service, as I’m using that within the VM to access the Docker dataset on TrueNAS, which stores all the container configuration outside the VM.
For some reason, that’s not writable after a restart, but simply clicking the slider to restart the service got everything back up and running, including Portainer. The other service is SMB, although my network shares were all working fine, so I didn’t need to restart that one!
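A quick way to spot this next time, before losing hours to it, is to test from inside the VM whether the NFS mount is actually writable. The following is a minimal sketch: nfs_writable is a made-up helper name and /mnt/docker is a placeholder for wherever the Docker dataset is mounted in the VM.

```shell
#!/bin/sh
# Sketch: check whether a directory (e.g. an NFS mount) accepts writes.
# nfs_writable is a hypothetical helper; it creates and removes a probe file.
nfs_writable() {
    dir="$1"
    probe="$dir/.write_probe.$$"     # $$ keeps the probe name unique per shell
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 0                     # write succeeded
    fi
    return 1                         # missing, read-only or otherwise unwritable
}

# Usage inside the Docker VM (mount path is an assumption):
#   nfs_writable /mnt/docker || echo "NFS share not writable: restart NFS on TrueNAS"
```

If the check fails after a TrueNAS reboot, restarting the NFS service is the first thing to try.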
That’s quite a long blog to simply remind me to restart the NFS service, but given I’d managed to achieve quite a few other things, I thought it might be of interest. My Docker VM is now running from the SSD pool, or at least the disk is. The Ubuntu packages are also up to date. I might consider moving the Docker dataset across to the SSD pool too, as this should help with performance, although that can wait for another day/blog.