Summary
When restoring a project to a new environment in AWSE, the build appears to hang while starting services after you select the project to restore.
Reported Issue
When restoring a project in AWSE, after the initial environment build has completed, the initializing phase appears to hang in the AWSE console while starting services.
The instance shows as up in the EC2 console, and you can SSH to the remote node. The server.log is paused at the message:
2022-07-21 13:21:44,558 [main] INFO c.d.datastore.LocalKVStoreProvider - Starting LocalKVStoreProvider
Relevant Versions
All AWSE versions
Troubleshooting Steps
As per "Reported Issue"
Cause
Dremio is in fact starting up. The RocksDB KVStore is initialising and replaying the write-ahead log (WAL) files from the previous project to restore metadata. Startup does not proceed past the LocalKVStoreProvider message until this replay completes.
Steps to Resolve
To confirm this, check the following:
1. Under /var/dremio_ebs/db/catalog/, count the number of .log files. These are the WAL files to be replayed, so the count gives a rough indication of how long this phase will take.
2. Run tail on the LOG file in the same location. This is the RocksDB system log, and it shows the actions taking place.
3. Run lsof against the Dremio PID. The WAL file currently being replayed should appear in the output.
After the WAL logs finish replaying, startup continues.
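The three checks above can be wrapped in small shell helpers. This is a minimal sketch, not an official Dremio tool: the function names and the CATALOG_DIR variable are hypothetical, and it assumes bash, find, tail and lsof are available on the node.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and CATALOG_DIR are illustrative assumptions.
# The default path is the catalog directory named in the steps above.
CATALOG_DIR="${CATALOG_DIR:-/var/dremio_ebs/db/catalog}"

# Step 1: count the WAL (.log) files waiting in the catalog directory.
# The RocksDB system log is named LOG (upper case), so it is not counted.
count_wal_files() {
  local dir="${1:-$CATALOG_DIR}"
  find "$dir" -maxdepth 1 -type f -name '*.log' | wc -l | tr -d ' '
}

# Step 2: follow the RocksDB system log to watch replay activity.
tail_rocksdb_log() {
  local dir="${1:-$CATALOG_DIR}"
  tail -f "$dir/LOG"
}

# Step 3: list the .log files under the catalog directory that the
# given process currently has open.
open_wal_files() {
  local pid="$1"
  local dir="${2:-$CATALOG_DIR}"
  lsof -p "$pid" | grep -F "$dir" | grep '\.log$'
}
```

After sourcing the file, run count_wal_files periodically to see whether the count changes, and pass the Dremio PID to open_wal_files. How you find the PID depends on your setup; something like pgrep -f dremio may work, but that is an assumption about how the process is named on your node.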
Recommendations
N/A
Additional Resources
https://github.com/facebook/rocksdb/wiki/Write-Ahead-Log