Summary
Dremio uses a central key-value store (KV store) to maintain data about your catalog, including the state of sources, spaces, folders, virtual datasets (VDS), and physical datasets (PDS). In Dremio software, RocksDB implements this KV store. The RocksDB files are stored under the directory {paths.local}/db/catalog. If files in this directory are missing or cannot be accessed, Dremio will not start, and you will likely need to restore from a backup.
Reported Issue
If you have lost .sst files, Dremio will cycle through startup attempts, but services such as the web UI will never come online. You will see a message similar to the following, although the .sst file numbers will differ:
2023-05-18 04:14:17,113 [main] INFO c.d.datastore.LocalKVStoreProvider - Starting LocalKVStoreProvider
2023-05-18 04:14:17,303 [main] INFO c.d.datastore.LocalKVStoreProvider - Stopping LocalKVStoreProvider
2023-05-18 04:14:17,307 [main] ERROR ROOT - Dremio is exiting. Failure while starting services.
org.rocksdb.RocksDBException: Can't access /000215.sst: IO error: while stat a file for size: /var/lib/dremio/db/catalog/000215.sst: No such file or directory
    at org.rocksdb.RocksDB.open(Native Method)
    at org.rocksdb.RocksDB.open(RocksDB.java:286)
    at com.dremio.datastore.ByteStoreManager.openDB(ByteStoreManager.java:295)
    at com.dremio.datastore.ByteStoreManager.start(ByteStoreManager.java:232)
    at com.dremio.datastore.CoreStoreProviderImpl.start(CoreStoreProviderImpl.java:148)
    at com.dremio.datastore.LocalKVStoreProvider.start(LocalKVStoreProvider.java:159)
    at com.dremio.dac.cmd.upgrade.Upgrade.run(Upgrade.java:184)
    at com.dremio.dac.daemon.DremioDaemon.main(DremioDaemon.java:141)
    Suppressed: java.lang.IllegalStateException: #start was not invoked, so metadataManager is not available
        at com.google.common.base.Preconditions.checkState(Preconditions.java:508)
        at com.dremio.datastore.ByteStoreManager.getMetadataManager(ByteStoreManager.java:439)
        at com.dremio.datastore.ByteStoreManager.close(ByteStoreManager.java:446)
        at com.dremio.common.AutoCloseables.close(AutoCloseables.java:139)
        at com.dremio.common.AutoCloseables.close(AutoCloseables.java:76)
        at com.dremio.datastore.CoreStoreProviderImpl.close(CoreStoreProviderImpl.java:235)
        at com.dremio.datastore.LocalKVStoreProvider.close(LocalKVStoreProvider.java:205)
        at com.dremio.dac.cmd.upgrade.Upgrade.run(Upgrade.java:187)
        ... 1 common frames omitted
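If the log is long, you can pull the missing file paths out of it instead of reading it by eye. The following is a minimal Python sketch, assuming the RocksDBException message has the same shape as the excerpt above (other Dremio or RocksDB versions may word it differently). The function names are hypothetical helpers, not part of Dremio.

```python
import re
from pathlib import Path

# Matches the absolute path in messages shaped like the sample above, e.g.
#   "... while stat a file for size: /var/lib/dremio/db/catalog/000215.sst: No such file or directory"
# Assumption: the wording may differ in other Dremio/RocksDB versions.
MISSING_SST = re.compile(r"(\S+\.sst): No such file or directory")


def missing_sst_paths(log_text: str) -> list[str]:
    """Return unique .sst paths reported as missing, in order of first appearance."""
    seen: dict[str, None] = {}  # dict keeps insertion order and drops duplicates
    for line in log_text.splitlines():
        if "RocksDBException" not in line:
            continue
        for match in MISSING_SST.finditer(line):
            seen.setdefault(match.group(1), None)
    return list(seen)


def missing_sst_paths_in_file(log_file: str) -> list[str]:
    """Read a Dremio log file (for example server.log) and extract the missing paths."""
    return missing_sst_paths(Path(log_file).read_text(errors="replace"))


if __name__ == "__main__":
    sample = (
        "org.rocksdb.RocksDBException: Can't access /000215.sst: IO error: while stat a file "
        "for size: /var/lib/dremio/db/catalog/000215.sst: No such file or directory"
    )
    print(missing_sst_paths(sample))
```

The regex anchors on the "No such file or directory" suffix, so the shorter "/000215.sst" fragment earlier in the same message is not reported twice.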
Relevant Versions
All versions of Dremio software.
Troubleshooting Steps
Check the {paths.local}/db/catalog directory for the presence of the .sst files that are reported as missing.
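A minimal Python sketch of this check follows. It assumes you already know the reported file names (for example 000215.sst from the log above); `check_sst_files` is a hypothetical helper. Because the error can also come from files that exist but are inaccessible, it distinguishes "missing" from "unreadable". Note that `os.access` tests the user running the script, so run it as the user that runs Dremio.

```python
import os
from pathlib import Path


def check_sst_files(catalog_dir: str, reported: list[str]) -> dict[str, str]:
    """Map each reported .sst file to "present", "missing", or "unreadable".

    `reported` may contain bare names ("000215.sst") or full paths; only the
    final path component is looked up inside catalog_dir.
    """
    directory = Path(catalog_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"catalog directory not found: {directory}")
    status: dict[str, str] = {}
    for entry in reported:
        name = Path(entry).name
        candidate = directory / name
        if not candidate.is_file():
            status[name] = "missing"
        elif not os.access(candidate, os.R_OK):
            status[name] = "unreadable"  # exists, but this user cannot open it
        else:
            status[name] = "present"
    return status
```

For example, `check_sst_files("/var/lib/dremio/db/catalog", ["000215.sst"])` would return `{"000215.sst": "missing"}` on the system that produced the log above, assuming `{paths.local}` is `/var/lib/dremio`.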
Cause
RocksDB creates numerous files of several types to maintain its state between restarts. You can see these under {paths.local}/db/catalog (paths.local is configured in dremio.conf). For example:
$ pwd
/var/lib/dremio/db/catalog
$ ls -ltr
total 4124
-rw-r--r--. 1 dremio dremio 0 Feb 24 18:28 LOCK
-rw-r--r--. 1 dremio dremio 37 Feb 24 18:28 IDENTITY
-rw-r--r--. 1 dremio dremio 1710 Feb 24 18:51 000096.sst
-rw-r--r--. 1 dremio dremio 1578 Feb 24 18:51 000097.sst
-rw-r--r--. 1 dremio dremio 1040 Feb 24 18:51 000098.sst
-rw-r--r--. 1 dremio dremio 1499 Feb 24 18:51 000099.sst
-rw-r--r--. 1 dremio dremio 14606 Feb 24 18:51 000100.sst
-rw-r--r--. 1 dremio dremio 1000 Feb 24 18:51 000101.sst
-rw-r--r--. 1 dremio dremio 1061 Feb 24 18:51 000102.sst
-rw-r--r--. 1 dremio dremio 2948 Feb 24 18:51 000103.sst
-rw-r--r--. 1 dremio dremio 1113 Feb 24 18:51 000106.sst
-rw-r--r--. 1 dremio dremio 1031 Feb 24 18:51 000107.sst
-rw-r--r--. 1 dremio dremio 11331 Feb 24 18:51 000108.sst
-rw-r--r--. 1 dremio dremio 48141 Feb 24 18:51 000110.sst
-rw-r--r--. 1 dremio dremio 1029 Feb 24 18:51 000111.sst
-rw-r--r--. 1 dremio dremio 1249 Feb 24 18:51 000113.sst
-rw-r--r--. 1 dremio dremio 1415 Feb 24 18:51 000114.sst
-rw-r--r--. 1 dremio dremio 146072 Feb 24 18:51 LOG.old.1677264778827386
-rw-r--r--. 1 dremio dremio 2248 Feb 24 18:52 000122.sst
-rw-r--r--. 1 dremio dremio 1494 Mar 30 17:27 000143.sst
-rw-r--r--. 1 dremio dremio 1555275 Mar 30 17:27 000144.sst
-rw-r--r--. 1 dremio dremio 1014 Mar 30 17:27 000145.sst
-rw-r--r--. 1 dremio dremio 3506 Mar 30 17:27 000146.sst
-rw-r--r--. 1 dremio dremio 1238 Mar 30 17:27 000149.sst
-rw-r--r--. 1 dremio dremio 4553 Mar 30 17:27 000150.sst
-rw-r--r--. 1 dremio dremio 4551 Mar 30 17:27 000152.sst
-rw-r--r--. 1 dremio dremio 1284 Mar 30 17:27 000154.sst
-rw-r--r--. 1 dremio dremio 212990 Mar 30 17:27 LOG.old.1684381433143357
-rw-r--r--. 1 dremio dremio 535157 May 18 03:43 000175.sst
-rw-r--r--. 1 dremio dremio 178462 May 18 03:43 LOG.old.1684381458266871
-rw-r--r--. 1 dremio dremio 49824 May 18 03:44 000182.sst
-rw-r--r--. 1 dremio dremio 10073 May 18 03:44 000193.sst
-rw-r--r--. 1 dremio dremio 1513 May 18 03:44 000194.sst
-rw-r--r--. 1 dremio dremio 2182 May 18 03:44 000195.sst
-rw-r--r--. 1 dremio dremio 178112 May 18 03:44 LOG.old.1684381483437281
-rw-r--r--. 1 dremio dremio 2529 May 18 03:44 000210.sst
-rw-r--r--. 1 dremio dremio 1011 May 18 03:44 000212.sst
-rw-r--r--. 1 dremio dremio 10071 May 18 03:44 000213.sst
-rw-r--r--. 1 dremio dremio 2191 May 18 03:44 000214.sst
-rw-r--r--. 1 dremio dremio 1470 May 18 03:44 000215.sst
-rw-r--r--. 1 dremio dremio 178235 May 18 03:44 LOG.old.1684381920535560
-rw-r--r--. 1 dremio dremio 116943 May 18 03:51 OPTIONS-000220
-rw-r--r--. 1 dremio dremio 1469 May 18 03:51 000223.sst
-rw-r--r--. 1 dremio dremio 16 May 18 03:52 CURRENT
-rw-r--r--. 1 dremio dremio 116943 May 18 03:52 OPTIONS-000228
-rw-r--r--. 1 dremio dremio 0 May 18 04:12 000229.log
-rw-r--r--. 1 dremio dremio 1088 May 18 04:12 000230.sst
-rw-r--r--. 1 dremio dremio 10081 May 18 04:12 000233.sst
-rw-r--r--. 1 dremio dremio 1518 May 18 04:12 000232.sst
-rw-r--r--. 1 dremio dremio 2169 May 18 04:12 000234.sst
-rw-r--r--. 1 dremio dremio 1527 May 18 04:12 000235.sst
drwxr-xr-x. 2 dremio dremio 24 May 18 04:12 archive
-rw-r--r--. 1 dremio dremio 7806 May 18 04:12 MANIFEST-000225
-rw-r--r--. 1 dremio dremio 180790 May 18 04:12 LOG
These include:
- #.sst: Sorted Sequence Table (SST) files, which store the actual key-value pairs.
- #.log: write-ahead logs (WALs), which record key-value writes to RocksDB so they can be replayed if the system fails. These files are eventually removed or archived once all of the writes they contain are stored in an SST file.
- MANIFEST-#: a transactional log that snapshots the state of RocksDB for recovery between restarts.
- CURRENT: a pointer to the latest MANIFEST file.
- LOG and LOG.old.#: human-readable text logs that record events in RocksDB.
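To see which of these file types a catalog directory contains, you can group the directory entries by name pattern. The following Python sketch is an illustration under stated assumptions: the patterns are inferred from the listing above, `classify`, `summarize`, and `current_manifest` are hypothetical helpers, and it also recognizes OPTIONS-# files (RocksDB's persisted configuration), which appear in the listing but not in the list. It relies on CURRENT holding the name of the active MANIFEST followed by a newline, which is consistent with the 16-byte CURRENT file next to MANIFEST-000225 above.

```python
import re
from pathlib import Path

# Name patterns for the file types described above. Anything else
# (LOCK, IDENTITY, the archive directory, ...) is reported as "other".
PATTERNS = [
    ("sst", re.compile(r"^\d+\.sst$")),
    ("wal", re.compile(r"^\d+\.log$")),
    ("manifest", re.compile(r"^MANIFEST-\d+$")),
    ("current", re.compile(r"^CURRENT$")),
    ("info_log", re.compile(r"^LOG(\.old\.\d+)?$")),
    ("options", re.compile(r"^OPTIONS-\d+$")),
]


def classify(name: str) -> str:
    """Return the file type for a single directory entry name."""
    for kind, pattern in PATTERNS:
        if pattern.match(name):
            return kind
    return "other"


def summarize(catalog_dir: str) -> dict[str, list[str]]:
    """Group the entries of catalog_dir by file type, sorted by name."""
    groups: dict[str, list[str]] = {}
    for entry in sorted(Path(catalog_dir).iterdir()):
        groups.setdefault(classify(entry.name), []).append(entry.name)
    return groups


def current_manifest(catalog_dir: str) -> str:
    """Return the MANIFEST file name recorded in CURRENT, e.g. "MANIFEST-000225"."""
    return (Path(catalog_dir) / "CURRENT").read_text().strip()
```

A quick sanity check is whether `current_manifest(d)` names a file that actually exists in the same directory; if it does not, RocksDB has nothing to start from.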
RocksDB holds recent writes in an in-memory data structure called a memtable while Dremio is running, and it frequently flushes the memtable to disk as new SST files. When Dremio starts, RocksDB opens the SST files recorded in the current MANIFEST and replays any outstanding WAL records to rebuild the memtable. In a typical catalog directory, you will see many SST files, and they take up the majority of the storage. These files contain the data that makes up your Dremio catalog; if you have a large catalog, you will see many large SST files.
The MANIFEST records which files make up the database, not the key-value data itself, and the WALs only hold a limited window of recent writes. This is why, if you lose SST files, you likely cannot reconstitute your Dremio catalog.
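The reasoning can be illustrated with a toy Python model. `ToyLSM` is hypothetical and greatly simplified (real RocksDB has multiple memtables, levels, compaction, and archived WALs); it only mimics the order of events described above: a write goes to the WAL and the memtable, a flush moves the memtable into a new SST, and the WAL entries for flushed writes are then discarded.

```python
from typing import Optional


class ToyLSM:
    """A deliberately simplified model of the memtable / WAL / SST life cycle.

    This is not RocksDB's implementation; it only shows why flushed data
    cannot be recovered from the WAL once its SST file is gone.
    """

    def __init__(self) -> None:
        self.wal: list[tuple[str, str]] = []       # writes not yet flushed to an SST
        self.memtable: dict[str, str] = {}         # recent writes held in memory
        self.ssts: dict[int, dict[str, str]] = {}  # file number -> sorted key-value pairs
        self.next_file = 1

    def put(self, key: str, value: str) -> None:
        self.wal.append((key, value))  # 1. log the write first
        self.memtable[key] = value     # 2. then apply it in memory

    def flush(self) -> None:
        """Write the memtable to a new SST; its WAL entries are then discarded."""
        if not self.memtable:
            return
        self.ssts[self.next_file] = dict(sorted(self.memtable.items()))
        self.next_file += 1
        self.memtable.clear()
        self.wal.clear()  # the flushed writes now exist only in the SST

    def restart(self) -> None:
        """Simulate a restart: memory is lost and only unflushed WAL entries are replayed."""
        self.memtable = {}
        for key, value in self.wal:
            self.memtable[key] = value

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for number in sorted(self.ssts, reverse=True):  # newest SST wins
            if key in self.ssts[number]:
                return self.ssts[number][key]
        return None
```

For example, after `put("source.s3", "v1")` and `flush()`, deleting `ssts[1]` to mimic a lost .sst file makes `get("source.s3")` return `None` even after `restart()`, because the remaining WAL entries only describe writes made after the flush.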
Steps to Resolve
If you do find you have lost SST files, your best option is to restore from a backup. See "Best Practices" and "Recommendations" below for more information.
RocksDB has a command-line tool, ldb, that can repair a database. The repair "does best effort recovery to recover as much data as possible after a disaster without compromising consistency", but "it does not guarantee bringing the database to a time consistent state." Even if the repair succeeds, you may still have lost data.
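If you do attempt a repair, one cautious approach is to run it against a copy so the original files stay untouched. The following Python sketch assumes Dremio is stopped, that an `ldb` binary built from a RocksDB release compatible with the one embedded in your Dremio version is on the PATH, and that your build accepts `ldb --db=<dir> repair` (confirm with `ldb --help`). `repair_copy` and the `catalog-repair` folder name are hypothetical. By default it only builds the command (`dry_run=True`).

```python
import shutil
import subprocess
from pathlib import Path


def repair_copy(catalog_dir: str, work_dir: str, ldb: str = "ldb", dry_run: bool = True) -> list[str]:
    """Copy the catalog into work_dir and build (optionally run) an ldb repair on the copy.

    Assumptions: Dremio is stopped, and `ldb --db=<dir> repair` is valid for
    your ldb build. The original catalog_dir is never modified.
    """
    source = Path(catalog_dir)
    target = Path(work_dir) / "catalog-repair"
    shutil.copytree(source, target)  # raises if target already exists
    command = [ldb, f"--db={target}", "repair"]
    if not dry_run:
        subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return command
```

Working on a copy keeps the restore-from-backup path open if the repair leaves the catalog in an inconsistent state.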
Best Practices
Make regular backups of your Dremio catalog using our backup utility, and periodically verify that you can restore from these backups.
You can also make backups by "cold copying" the {paths.local}/db directory with cp -R while Dremio is stopped. If you back up this way, you should also periodically check that you can restore from these copies: point paths.db in dremio.conf to the copy and verify that Dremio starts.
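A cold copy can also be scripted with a basic integrity check. The following Python sketch is an alternative to `cp -R` under stated assumptions: Dremio is already stopped (the function does not check this), the catalog lives in a `catalog` subdirectory as shown above, and `cold_copy` and the `db-<timestamp>` folder name are hypothetical. After copying, it confirms that every file has the same size in the copy and that the MANIFEST named in CURRENT is present.

```python
import shutil
from datetime import datetime
from pathlib import Path


def cold_copy(db_dir: str, backup_root: str) -> Path:
    """Copy db_dir into a timestamped folder under backup_root and verify the copy.

    Assumption: Dremio is stopped; this function does not check that.
    """
    source = Path(db_dir)
    target = Path(backup_root) / f"db-{datetime.now():%Y%m%d-%H%M%S}"
    shutil.copytree(source, target)  # raises if target already exists

    # Every regular file in the source must exist in the copy with the same size.
    for path in source.rglob("*"):
        if path.is_file():
            copied = target / path.relative_to(source)
            if not copied.is_file() or copied.stat().st_size != path.stat().st_size:
                raise RuntimeError(f"copy mismatch for {path}")

    # The MANIFEST named in catalog/CURRENT must be present in the copy.
    current = target / "catalog" / "CURRENT"
    if current.is_file():
        manifest = current.read_text().strip()
        if not (target / "catalog" / manifest).is_file():
            raise RuntimeError(f"{manifest} referenced by CURRENT is missing from the copy")
    return target
```

These checks only catch an incomplete copy; starting Dremio against the copy remains the real test of a backup.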
Recommendations
To avoid this situation, ensure that the storage attached to the Dremio coordinator host for {paths.local}/db is reliable. If you are using NAS with NFS, follow the requirements and recommendations in our documentation.
If you know ahead of time that this storage service will be temporarily unavailable, you should plan on stopping the Dremio application before the outage occurs.
If your storage supports snapshots, consider enabling them.
Never delete SST files or any of the other files that make up the Dremio catalog. More generally, avoid manipulating the files in {paths.local}/db. These should be left for Dremio to maintain, whether through routine operation or our administrative utilities.
Additional Resources
Information about RocksDB implementation and operation can be found on the project's GitHub repository - https://github.com/facebook/rocksdb
RocksDB Repairer - https://github.com/facebook/rocksdb/wiki/RocksDB-Repairer