Summary
Slow coordinator startup time following an upgrade to v25
Reported Issue
After upgrading from Dremio v24.x (or earlier) to v25.x, the initial startup takes a very long time while data sources are migrated. The server.log displays migration messages such as:
2024-07-18 17:59:56,729 [main] INFO c.dremio.exec.catalog.PluginsManager - Successfully migrate the source [s3_1]. Took 192 milliseconds.
2024-07-18 18:09:21,236 [main] INFO c.dremio.exec.catalog.PluginsManager - Successfully migrate the source [s3_2]. Took 564505 milliseconds.
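The first source migrates quickly, but subsequent sources migrate very slowly. In the example above, the first source took 192 milliseconds and the second took 564505 milliseconds (roughly 9.4 minutes).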
Relevant Versions
Dremio v25.x when upgrading from a v24.x or earlier release
Troubleshooting Steps
If the initial startup of the coordinator process is slow following migration, review the coordinator server.log for instances of the INFO messages referenced in the "Reported Issue" section above.
If the second and subsequent sources take significantly longer to migrate than the first source, review the operating system entropy levels by running the following command:
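As a rough aid for that log review, the sketch below lists each migrated source with its duration, slowest first. The function name migration_times is hypothetical, and the pattern assumes the log lines have exactly the format shown in the "Reported Issue" section.

```shell
# Hypothetical helper: print "<milliseconds> <source>" for each migrated
# source found in the given log file, sorted with the slowest first.
# Assumes the PluginsManager message format shown in "Reported Issue".
migration_times() {
  grep 'Successfully migrate the source' "$1" |
    sed -E 's/.*source \[([^]]+)\]\. Took ([0-9]+) milliseconds.*/\2 \1/' |
    sort -rn
}
```

Call it with the path to the coordinator log, for example migration_times server.log. The log location depends on your installation.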
$ watch -n 1 cat /proc/sys/kernel/random/entropy_avail
Typically, if you are hitting this problem, entropy levels will be in the single or low double digits.
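For a one-off reading instead of a continuous watch, a minimal sketch such as the following can label the current value. The function name entropy_status and the threshold of 100 are illustrative assumptions, not documented limits. The optional file argument exists only so that the check can be run against a saved reading.

```shell
# Hypothetical helper: read an entropy value and label it "low" or "ok".
# The cut-off of 100 is an assumed, illustrative threshold.
entropy_status() {
  file="${1:-/proc/sys/kernel/random/entropy_avail}"
  value=$(cat "$file")
  if [ "$value" -lt 100 ]; then
    echo "low ($value)"
  else
    echo "ok ($value)"
  fi
}
```

Running entropy_status with no argument reads the live kernel value on Linux.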
Cause
Source migration encrypts any secret or password associated with a source. This encryption depends on operating system entropy. If entropy levels are low or depleted, the thread performing the encryption is blocked until entropy levels are restored.
This problem has been observed on Red Hat/CentOS v7 and v8, but it can affect other distributions and versions depending on the kernel level in use.
Later Linux kernels introduced changes to maintain constant levels of entropy and eliminate problems such as this. Typically, if your kernel includes these changes, the entropy level will remain constant at 256.
Steps to Resolve
To overcome this, you have three options:
1. Install the "rng-tools" package and then run "rngd" to keep entropy levels high:
Red Hat / CentOS
$ sudo yum install -y rng-tools
$ sudo systemctl enable rngd.service
$ sudo systemctl start rngd.service
Ubuntu / Debian
$ sudo apt install rng-tools
$ sudo rngd -r /dev/urandom
2. Upgrade to Dremio v25.0.8, which introduced changes (DX-92906) that make the encryption process non-blocking so that sources do not migrate slowly when entropy levels are low.
3. Upgrade the operating system / kernel version. This problem has not been observed on RHEL 9 or Ubuntu 22.04.
Additional Resources
Entropy: https://en.wikipedia.org/wiki/Entropy_(computing)
Random Number Generator Enhancements (Linux): https://www.zx2c4.com/projects/linux-rng-5.17-5.18/
Entropy in RHEL based cloud instances: https://developers.redhat.com/blog/2017/10/05/entropy-rhel-based-cloud-instances#methods_to_improve_entropy_in_cloud_instances