This page explains how to troubleshoot Dremio Cloud engines that fail to come online, queries that are stuck on "running," and queries that time out with the message "Scaling failed. Replica creation time exceeded."
Dremio runs a diagnostic service on each executor that it spins up. The results of these diagnostics, including any connectivity errors, can be viewed in the System Log of the corresponding EC2 instance in your AWS account.
How to view EC2 System Logs
- Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
- In the left navigation pane, choose "Instances", and select the instance that Dremio Cloud spins up.
- Choose "Actions" -> "Monitor and troubleshoot" -> "Get system log."
- View the System Log
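The console steps above can also be scripted. The following is a minimal sketch, assuming the third-party boto3 library is installed and AWS credentials and a region are configured. The function name `fetch_system_log` and the instance ID in the usage comment are hypothetical, and this is not part of Dremio's tooling.

```python
def fetch_system_log(instance_id, client=None):
    """Return the EC2 system (console) log for one instance as text."""
    if client is None:
        # Assumption: boto3 is installed and credentials/region are configured.
        import boto3
        client = boto3.client("ec2")
    response = client.get_console_output(InstanceId=instance_id)
    # The "Output" key can be absent if no log is available yet,
    # for example shortly after the instance launches.
    return response.get("Output", "")


# Usage (hypothetical instance ID; use the executor instance shown in the console):
# print(fetch_system_log("i-0123456789abcdef0"))
```

The optional `client` argument lets the caller pass a client for a specific region or profile instead of relying on the default configuration.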
Failed diagnostic tests appear in the System Log similar to the following examples.
Dremio Gateway connectivity test
<< BEGIN >>
* connect to xxx.xx.xxx.xx port 443 failed: Connection timed out
* Failed to connect to aw.dremio.cloud port 443
S3 connectivity test
diagnostics.sh: << BEGIN >>
connect to xx.xx.xxx.xxx port 443 failed: Connection timed out
* Failed to connect to s3.us-west-2.amazonaws.com port 443 after 298676 ms: Couldn't connect to server
* Closing connection 0
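When the System Log is long, it can help to pull out only the failed connections. The sketch below assumes the failures are reported in the same "Failed to connect to HOST port N" form as the excerpts above; the function name `find_failed_connections` is hypothetical, and other log formats would need a different pattern.

```python
import re

# Matches lines such as "Failed to connect to aw.dremio.cloud port 443".
_FAILED = re.compile(r"Failed to connect to (\S+) port (\d+)")


def find_failed_connections(log_text):
    """Return unique (host, port) pairs for failed connections, in log order."""
    failures = []
    for match in _FAILED.finditer(log_text):
        pair = (match.group(1), int(match.group(2)))
        if pair not in failures:
            failures.append(pair)
    return failures
```

A Dremio hostname in the result points to the gateway test, and an `amazonaws.com` S3 hostname points to the S3 test.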
If the Dremio Gateway connectivity test fails, check that your security groups allow the required traffic. The error message indicates which port could not be reached:
- Port 45678 for the security group attached to the VPC
- Port 443 inbound for the security group attached to the VPCE (for PrivateLink connections)
If the S3 connectivity test fails, check that you have access to S3 within your VPC via one of these methods:
- S3 Gateway Endpoints
- NAT Gateway
- Internet Gateway
- AWS Site-to-Site VPN
- AWS Direct Connect
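After changing a security group or adding one of the S3 access methods above, the fix can be checked with a simple TCP probe. This is a minimal sketch using only the Python standard library; the function name `can_connect` is hypothetical, and the result is only meaningful when the script runs on a host in the same VPC and subnet as the executors.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


# Usage, with the hostnames from the log excerpts (the S3 region may differ):
# for host in ("aw.dremio.cloud", "s3.us-west-2.amazonaws.com"):
#     print(host, can_connect(host, 443))
```

A `False` result does not say which layer is blocking the traffic, so the security group rules and route tables still need to be reviewed.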
Related articles
- Dremio Cloud - "Scaling up the engine replicas failed, please check the engine scaling events page for more details"
- Dremio Cloud - S3 Source Connection Troubleshooting