Summary
This article describes an issue in which queries against datasets in S3 fail because the S3 connection pool is exhausted.
Reported Issue
The query profile for the failed job or the server.log shows the following error:
ConnectionPoolTimeoutException: Timeout waiting for connection from pool
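To confirm how often the error occurs, the log can be searched for the exception text. The following is a minimal sketch, not a Dremio tool: the function name find_pool_timeouts and the default path server.log are assumptions for illustration, so pass the real location of your server.log as the first argument.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Error text exactly as reported in the query profile or server.log.
ERROR_MARKER = "ConnectionPoolTimeoutException: Timeout waiting for connection from pool"


def find_pool_timeouts(log_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for every line containing the pool timeout error."""
    return [
        (number, line.strip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if ERROR_MARKER in line
    ]


if __name__ == "__main__":
    # Hypothetical default path; the real server.log location depends on your deployment.
    log_path = Path(sys.argv[1] if len(sys.argv) > 1 else "server.log")
    if log_path.is_file():
        hits = find_pool_timeouts(log_path.read_text(errors="replace"))
        print(f"{len(hits)} pool timeout line(s) found in {log_path}")
        for number, line in hits:
            print(f"{number}: {line}")
    else:
        print(f"Log file not found: {log_path}")
```

A large or growing number of matches during busy periods suggests the pool limit is being reached regularly rather than by a single query.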
Relevant Versions, Tools and Integrations
This can affect all Dremio releases.
Steps to Resolve
Dremio uses the Apache Hadoop-AWS module to create client connections to S3 (and S3-compatible) data sources.
Two settings are important for tuning these connections:
- fs.s3a.connection.maximum - Maximum number of HTTP connections to S3 (default 1000).
- fs.s3a.threads.max - Threads in the AWS transfer manager (default 1000). Increasing the number of threads means that more memory will be used.
To test increased values for these properties without affecting your current S3 source, create a new S3 source, promote the underlying PDSs referenced by your VDSs, and apply the experimental fs.s3a.connection.maximum setting on the new source only.
To increase the parameters for an existing source, use either the "Advanced Options" of the S3 source or the core-site.xml file.
If you use "Advanced Options", also use the support key store.plugin.keep_metadata_on_replace so that the source can be edited without losing metadata (including permissions).
The steps are:
1. Under "Settings" -> "Support", add the support key store.plugin.keep_metadata_on_replace, enable it, then click "Save";
2. Go to your S3 source and add the following Connection Property:
Name: fs.s3a.connection.maximum
Value: 2000
3. Then save the changes on the S3 source (you will still see a popup message that this is a "metadata impacting change", but it will not make any changes to the datasets in that source);
4. Go back to "Settings" -> "Support" and disable/reset the key store.plugin.keep_metadata_on_replace and save this change.
Remember to check the core-site.xml file and make sure the same properties are not defined there.
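The check for core-site.xml can be sketched as a short script. This is an illustrative sketch only: the function name find_s3a_overrides and the default file path are assumptions, and the script relies solely on the standard Hadoop configuration layout of property elements with name and value children.

```python
from __future__ import annotations

import sys
import xml.etree.ElementTree as ET
from pathlib import Path

# The two properties discussed in this article.
S3A_POOL_PROPERTIES = ("fs.s3a.connection.maximum", "fs.s3a.threads.max")


def find_s3a_overrides(core_site_xml: str | bytes) -> dict[str, str]:
    """Return the S3A pool properties (name -> value) defined in core-site.xml content."""
    root = ET.fromstring(core_site_xml)
    found: dict[str, str] = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in S3A_POOL_PROPERTIES:
            found[name] = (prop.findtext("value") or "").strip()
    return found


if __name__ == "__main__":
    # Hypothetical default path; pass the real core-site.xml location as the first argument.
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "core-site.xml")
    if path.is_file():
        overrides = find_s3a_overrides(path.read_bytes())
        if overrides:
            for name, value in overrides.items():
                print(f"{name} is also defined in {path} with value {value}")
        else:
            print(f"No S3A pool properties defined in {path}")
    else:
        print(f"File not found: {path}")
```

If the script reports a property that is also set under "Advanced Options", remove one of the two definitions so that the setting is defined in a single place.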