Summary
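With AWSE deployments, the /etc/fstab file needs to be corrected to mount volumes by UUID rather than by kernel device name. This prevents two mount points from ending up on the same disk after a reboot, which causes unexpectedly high disk usage.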
Reported Issue
The AWSE deployment uses a kernel device name to mount /mnt/c1. This can cause problems on reboot: the /var/dremio_ebs directory can end up on the same device as /mnt/c1, and users can unexpectedly run short of disk space.
Our AWSE /etc/fstab typically looks like this:
UUID=e6c06bf4-70a3-4524-84fa-35484afc0d19 / xfs defaults,noatime 1 1
/dev/nvme1n1 /mnt/c1 ext4 defaults,nofail 0 0
UUID=1c75c74a-96ce-4110-b308-d9864423d5c9 /var/dremio_ebs ext4 defaults,nofail 0 0
10.10.10.126:/ /var/dremio_efs nfs nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport 0 0
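As a quick check, the following minimal sketch lists fstab entries whose first field is a kernel device name instead of a UUID= or LABEL= reference. The function name find_device_name_mounts and the device-name patterns it matches (/dev/nvme*, /dev/xvd*, /dev/sd*) are assumptions for illustration only. They are not part of any Dremio or AWS tooling.

```shell
# Hypothetical helper: print "device mountpoint" for each fstab entry that
# mounts by kernel device name rather than by UUID= or LABEL=.
# Assumption: device names of interest start with /dev/nvme, /dev/xvd or /dev/sd.
# Usage: find_device_name_mounts [fstab-path]   (defaults to /etc/fstab)
find_device_name_mounts() {
  local fstab="${1:-/etc/fstab}"
  # NF > 0 skips blank lines; $1 !~ /^#/ skips comment lines.
  awk 'NF > 0 && $1 !~ /^#/ && $1 ~ /^\/dev\/(nvme|xvd|sd)/ { print $1, $2 }' "$fstab"
}
```

Run against the fstab shown above, it prints /dev/nvme1n1 /mnt/c1, which is the entry this article corrects.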
The mount points look like this in the lsblk output:
$ lsblk
NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme1n1       259:0    0 279.4G  0 disk /mnt/c1
nvme0n1       259:1    0    50G  0 disk
├─nvme0n1p1   259:2    0    50G  0 part /
└─nvme0n1p128 259:3    0     1M  0 part
nvme2n1       259:4    0    50G  0 disk /var/dremio_ebs
And from mount:
/dev/nvme1n1 on /mnt/c1 type ext4 (rw,relatime,data=ordered)
/dev/nvme2n1 on /var/dremio_ebs type ext4 (rw,relatime,data=ordered)
If the instance is rebooted, /var/dremio_ebs can end up mounted on the same device as /mnt/c1:
/dev/nvme1n1 on /var/dremio_ebs type ext4 (rw,relatime,data=ordered)
/dev/nvme1n1 on /mnt/c1 type ext4 (rw,relatime,data=ordered)
lsblk then shows:
$ lsblk
NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme1n1       259:0    0 279.4G  0 disk /mnt/c1
nvme0n1       259:1    0     8G  0 disk
├─nvme0n1p1   259:2    0     8G  0 part /
└─nvme0n1p128 259:3    0     1M  0 part
Relevant Versions
All versions of Dremio AWSE deployments.
Troubleshooting Steps
1. Log in to the coordinator node for the AWSE cluster.
2. Run lsblk and confirm that the /var/dremio_ebs volume is no longer listed under /dev/nvme2n1.
3. Run mount and confirm that /var/dremio_ebs is now mounted from /dev/nvme1n1, as described in the Reported Issue section:
/dev/nvme1n1 on /var/dremio_ebs type ext4 (rw,relatime,data=ordered)
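The lsblk and mount checks above can also be combined into one pass. The following minimal sketch reads "source target" pairs, for example from findmnt -rn -o SOURCE,TARGET (raw output with no header). It prints every /dev device that backs more than one mount point, which is the symptom described in this article. The function name find_shared_devices is hypothetical.

```shell
# Hypothetical helper: report block devices that back more than one mount point.
# Input on stdin: one "SOURCE TARGET" pair per line.
# Example: findmnt -rn -o SOURCE,TARGET | find_shared_devices
find_shared_devices() {
  awk '
    # Only consider real block devices, e.g. /dev/nvme1n1.
    $1 ~ /^\/dev\// { targets[$1] = targets[$1] " " $2; count[$1]++ }
    # Print "device: target1 target2 ..." for devices mounted more than once.
    END { for (d in count) if (count[d] > 1) print d ":" targets[d] }
  '
}
```

On a healthy node this prints nothing. In the post-reboot state shown above, it would report /dev/nvme1n1 together with both /var/dremio_ebs and /mnt/c1. Bind mounts can also share a source device, so any hit should be reviewed rather than treated as an automatic failure.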
Cause
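The /mnt/c1 entry in /etc/fstab refers to the kernel device name /dev/nvme1n1. NVMe device names are assigned at boot and are not guaranteed to stay the same across reboots, so the entry can resolve to a different disk than intended. This issue is currently being tracked by Dremio Engineering and, at the time of writing, is still pending a fix.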
Steps to Resolve
Run sudo blkid to identify the UUID of /dev/nvme1n1, then edit /etc/fstab so that the /mnt/c1 entry references that UUID instead of the device name.
From this:
UUID=e6c06bf4-70a3-4524-84fa-35484afc0d19 / xfs defaults,noatime 1 1
/dev/nvme1n1 /mnt/c1 ext4 defaults,nofail 0 0
UUID=1c75c74a-96ce-4110-b308-d9864423d5c9 /var/dremio_ebs ext4 defaults,nofail 0 0
10.10.10.126:/ /var/dremio_efs nfs nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport 0 0
To this:
UUID=e6c06bf4-70a3-4524-84fa-35484afc0d19 / xfs defaults,noatime 1 1
UUID=fd8e8e9d-8501-42cb-90b5-b8690e365806 /mnt/c1 ext4 defaults,nofail 0 0
UUID=1c75c74a-96ce-4110-b308-d9864423d5c9 /var/dremio_ebs ext4 defaults,nofail 0 0
10.10.10.126:/ /var/dremio_efs nfs nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport 0 0
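If you prefer to script the edit, the following minimal sketch swaps a device name for UUID=<uuid> in field 1 of an fstab file and keeps a backup first. The function name replace_device_with_uuid is hypothetical. The sketch assumes the UUID is looked up separately, for example with sudo blkid -s UUID -o value /dev/nvme1n1. Review the result by hand before rebooting.

```shell
# Hypothetical helper: replace a kernel device name with UUID=<uuid> in an fstab file.
# Usage (as root): replace_device_with_uuid /etc/fstab /dev/nvme1n1 "$(blkid -s UUID -o value /dev/nvme1n1)"
replace_device_with_uuid() {
  local fstab="$1" device="$2" uuid="$3"
  # Refuse to write "UUID=" with an empty value (e.g. if the blkid lookup failed).
  [ -n "$uuid" ] || return 1
  # Keep a copy of the original file, preserving mode and ownership.
  cp -p "$fstab" "$fstab.bak" || return 1
  # Rewrite field 1 only on lines whose first field is exactly the device.
  # Note: awk rejoins a modified line with single spaces; other lines are untouched.
  awk -v dev="$device" -v uuid="$uuid" '
    $1 == dev { $1 = "UUID=" uuid }
    { print }
  ' "$fstab.bak" > "$fstab"
}
```

Redirecting into the existing file keeps its permissions, and the .bak copy allows a rollback if the result looks wrong. On systems whose util-linux provides it, sudo findmnt --verify can check the edited file for errors before the next reboot.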
Tips & Tricks
The following commands are useful for identifying device and volume information for the AWSE disk mounts:
lsblk
sudo blkid
mount
cat /etc/fstab
Additional Resources
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-using-volumes.html