Overview
When restoring a project to a new environment in AWSE, the build appears to hang while starting services after a project has been selected for restore.
Applies To
All AWSE-hosted environments where a project is being restored
Scenario
When restoring a project in AWSE, after the initial environment build has completed, the initializing phase appears to hang in the AWSE console while starting services.
The instance shows as up in the EC2 console and you are able to SSH to the remote node. The server.log is paused at the message:
2022-07-21 13:21:44,558 [main] INFO c.d.datastore.LocalKVStoreProvider - Starting LocalKVStoreProvider
Corrective Action
Dremio is in fact starting up. As the log message indicates, the RocksDB KVStore is initializing and replaying the write-ahead log (WAL) files from the previous project to restore metadata.
To confirm this, check the following:
1. Under /var/dremio_ebs/db/catalog/, count the number of .log files. These are the WAL files to be replayed, and the count gives some indication of how long this phase will take.
2. Run tail on the LOG file in the same location. This is the RocksDB system log, and it shows the actions taking place.
3. Run lsof against the Dremio PID. You should see the WAL file currently being replayed in the output.
After the WAL files finish replaying, startup will continue.
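The three checks above can be sketched as small shell functions. This is a minimal sketch, not an official Dremio tool: the function names are hypothetical, the catalog path is the one given in this article, and the Dremio PID must be supplied by you (looking it up with pgrep is an assumption about how the process appears on your node).

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical helpers, not part of Dremio.
# The default path is the catalog directory named in this article.
CATALOG_DIR="${CATALOG_DIR:-/var/dremio_ebs/db/catalog}"

# Check 1: count the WAL files (*.log) directly under the catalog directory.
# The RocksDB system log is named LOG (no extension), so it is not counted.
count_wal_files() {
  find "$1" -maxdepth 1 -type f -name '*.log' | wc -l | tr -d ' '
}

# Check 2: print the last N lines (default 20) of the RocksDB system log.
tail_rocksdb_log() {
  tail -n "${2:-20}" "$1/LOG"
}

# Check 3: list the .log files that the given PID currently has open.
# Returns non-zero if no matching file is open.
open_wal_files() {
  lsof -p "$1" 2>/dev/null | grep '\.log$'
}

# Example usage on the remote node (the PID lookup is an assumption):
#   count_wal_files "$CATALOG_DIR"
#   tail_rocksdb_log "$CATALOG_DIR" 50
#   open_wal_files "$(pgrep -f dremio | head -n 1)"
```

Running count_wal_files a few minutes apart and watching which file open_wal_files reports gives a rough sense of whether the replay is still moving forward.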
Further Reading
https://github.com/facebook/rocksdb/wiki/Write-Ahead-Log