Overview
The default logging configuration for Dremio software writes INFO-level messages from the Source Metadata Manager service to a file called metadata_refresh.log. Often we will want to see more detailed logging about metadata refresh, including the times when individual tables are refreshed. To accomplish this, we can enable DEBUG-level logging for the Source Metadata Manager and Metadata Synchronizer modules.
Applies To
All versions of Dremio software.
Details
The default INFO messaging in metadata_refresh.log tells us when each source is background refreshed and a summary of how many tables were affected by the refresh. For example, here’s an entry for Dremio’s “Sample” source:
2023-06-27 05:59:23,662 [metadata-refresh-modifiable-scheduler-18] INFO c.d.e.catalog.SourceMetadataManager - Source 'Samples' refreshed details in 4 seconds. Details:
Shallow probed 0 datasets: 0 added, 0 unchanged, 0 deleted
Deep probed 3 queried datasets: 3 changed, 0 unchanged, 0 deleted, 0 unreadable
If we want to know which tables were changed here, we have to enable the DEBUG logging for the Metadata Synchronizer by adding this logger to logback.xml:
<logger name="com.dremio.exec.catalog.MetadataSynchronizer" additivity="false">
<level value="debug" />
<if condition='isDefined("dremio.log.path")'>
<then>
<appender-ref ref="metadata_refresh"/>
</then>
</if>
</logger>
The log will then tells us the specific tables being refreshed, when that occurred and how long it took. For example:
2023-06-27 06:08:19,280 [metadata-refresh-modifiable-scheduler-16] DEBUG c.d.e.catalog.MetadataSynchronizer - Source 'Samples' sync setup (3 datasets)
2023-06-27 06:08:19,280 [metadata-refresh-modifiable-scheduler-16] DEBUG c.d.e.catalog.MetadataSynchronizer - Source 'Samples' sync started
2023-06-27 06:08:19,280 [metadata-refresh-modifiable-scheduler-16] DEBUG c.d.e.catalog.MetadataSynchronizer - Source 'Samples' syncing datasets
2023-06-27 06:08:21,255 [metadata-refresh-modifiable-scheduler-16] DEBUG c.d.e.catalog.MetadataSynchronizer - Dataset 'Samples."samples.dremio.com"."NYC-taxi-trips"' sync took 1380 milliseconds
2023-06-27 06:08:23,118 [metadata-refresh-modifiable-scheduler-16] DEBUG c.d.e.catalog.MetadataSynchronizer - Dataset 'Samples."samples.dremio.com".tpcds_sf1000.catalog_sales' sync took 1264 milliseconds
2023-06-27 06:08:23,456 [metadata-refresh-modifiable-scheduler-16] DEBUG c.d.e.catalog.MetadataSynchronizer - Dataset 'Samples."samples.dremio.com"."SF_incidents2016.json"' sync took 136 milliseconds
2023-06-27 06:08:23,457 [metadata-refresh-modifiable-scheduler-16] DEBUG c.d.e.catalog.MetadataSynchronizer - Source 'Samples' recursively deleting orphan folders
2023-06-27 06:08:23,457 [metadata-refresh-modifiable-scheduler-16] DEBUG c.d.e.catalog.MetadataSynchronizer - Source 'Samples' sync ended. Took 4176 milliseconds
2023-06-27 06:08:23,457 [metadata-refresh-modifiable-scheduler-16] INFO c.d.e.catalog.SourceMetadataManager - Source 'Samples' refreshed details in 4 seconds. Details:
Shallow probed 0 datasets: 0 added, 0 unchanged, 0 deleted
Deep probed 3 queried datasets: 3 changed, 0 unchanged, 0 deleted, 0 unreadable
We will also want to capture detailed information about what the Source Metadata Manager is doing as well. To do this, locate the logger for this service — which should already be in logback.xml — and change the log level from info to debug. The resulting logger should look like this:
<logger name="com.dremio.exec.catalog.SourceMetadataManager" additivity="false">
<level value="debug"/>
<if condition='isDefined("dremio.log.path")'>
<then>
<appender-ref ref="metadata_refresh"/>
</then>
</if>
</logger>
One particularly useful outcome of this debug logging, is that it records when an individual table is refreshed through a user-issued SQL command. For example, I could refresh the NYC taxi dataset:
ALTER PDS Samples."samples.dremio.com"."NYC-taxi-trips" REFRESH METADATA;
With the Source Metadata Manager set to DEBUG-level, this is logged in metadata_refresh.log like so:
2023-06-27 06:25:36,742 [1b657f1e-a799-b879-e9be-06bfd9837400/0:foreman-planning] DEBUG c.d.e.catalog.SourceMetadataManager - Dataset 'Samples."samples.dremio.com"."NYC-taxi-trips"' is being synced (exists: true, isExtended: true)
Please note that, unless you’ve enabled dynamic logging, you will have to restart the Dremio coordinator to apply these changes. Therefore, plan for a short amount of downtime when adding this (or any) debug logging.
Further Reading
How to enable dynamic logging in Dremio