Summary
This article explains how to troubleshoot Dremio Cloud engines that fail to come online, queries that are stuck in the "running" state, and queries that time out with the "Scaling failed. Replica creation time exceeded" message.
Reported Issue
Dremio Cloud engines are failing to come online, queries are stuck on "running," or the query times out with the "Scaling failed. Replica creation time exceeded" message.
Relevant Versions
Dremio Cloud
Troubleshooting Steps
1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
2. In the left navigation pane, choose "Instances", and select the EC2 instance that Dremio Cloud launched for the engine.
3. Choose "Actions" > "Monitor and troubleshoot" > "Get system log".
4. Review the system log for errors related to Dremio Gateway connectivity or S3 connectivity.
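The same system log can also be retrieved from the command line with `aws ec2 get-console-output --instance-id <instance-id> --query Output --output text`. A long log is easier to review if you first filter it for likely failures. The following is a minimal sketch of such a filter. The keyword patterns are assumptions for illustration only, because the exact wording of Dremio Gateway and S3 errors is not documented in this article; adjust them to match what you actually see in your log.

```python
import re

# Hypothetical patterns: the exact Dremio Gateway / S3 error wording is an
# assumption, so tune these to the messages in your own system log.
CATEGORY_PATTERNS = {
    "gateway": re.compile(r"gateway|45678", re.IGNORECASE),
    "s3": re.compile(r"\bs3\b|amazonaws\.com", re.IGNORECASE),
}
# Words that suggest a line reports a failure rather than normal progress.
ERROR_HINT = re.compile(r"error|fail|timed? ?out|refused|unreachable", re.IGNORECASE)


def find_suspect_lines(log_text):
    """Return {category: [lines]} for log lines that look like errors in that category."""
    hits = {name: [] for name in CATEGORY_PATTERNS}
    for line in log_text.splitlines():
        if not ERROR_HINT.search(line):
            continue  # skip lines that do not look like failures
        for name, pattern in CATEGORY_PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.strip())
    return hits


if __name__ == "__main__":
    sample = (
        "boot ok\n"
        "ERROR: could not reach Dremio Gateway on port 45678\n"
        "failed to download from s3.us-west-2.amazonaws.com: timed out\n"
    )
    for category, lines in find_suspect_lines(sample).items():
        print(category, lines)
```

Lines that match neither category are ignored, so an empty result means only that these particular keywords did not appear, not that the log is clean.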
Cause
These issues may be caused by:
- Incorrect security group configuration, such as missing required ports
- Lack of access to S3 from within the VPC
Steps to Resolve
For Dremio Gateway connectivity issues:
- Check that the security groups have the required ports open: 45678 in the VPC security group and 443 inbound in the VPCE security group.
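The rules for a security group can be listed with `aws ec2 describe-security-groups --group-ids <sg-id>`, which returns inbound rules under `IpPermissions` and outbound rules under `IpPermissionsEgress`, each entry carrying `IpProtocol`, `FromPort`, and `ToPort`. The following is a minimal sketch, assuming rules in that shape, that reports whether a given TCP port is open in a rule list. It deliberately ignores the rule sources (CIDR ranges or referenced security groups), so it only confirms that a port is open at all, not that it is open to the right peers.

```python
# IpProtocol may be a name ("tcp") or an IANA protocol number ("6").
PROTOCOL_NUMBERS = {"tcp": "6", "udp": "17"}


def port_open(ip_permissions, port, protocol="tcp"):
    """Return True if any rule in an EC2 IpPermissions-style list allows `port`."""
    accepted = {protocol, PROTOCOL_NUMBERS.get(protocol)}
    for rule in ip_permissions:
        proto = str(rule.get("IpProtocol"))
        if proto == "-1":
            return True  # "-1" means all protocols and all ports
        if proto not in accepted:
            continue
        from_port, to_port = rule.get("FromPort"), rule.get("ToPort")
        if from_port is None or to_port is None:
            continue
        if from_port <= port <= to_port:
            return True
    return False


if __name__ == "__main__":
    # Example rule lists shaped like describe-security-groups output.
    vpc_sg_rules = [{"IpProtocol": "tcp", "FromPort": 45678, "ToPort": 45678}]
    vpce_sg_inbound = [{"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80}]
    print("45678 open in VPC SG:", port_open(vpc_sg_rules, 45678))
    print("443 inbound open in VPCE SG:", port_open(vpce_sg_inbound, 443))
```

Run the check against `IpPermissions` for inbound rules; the same function works on `IpPermissionsEgress` if you also need to confirm outbound access.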
For S3 connectivity issues, ensure that the VPC has access to S3 through one of the following methods:
- S3 Gateway Endpoints
- NAT Gateway
- Internet Gateway
- AWS Site to Site VPN
- AWS Direct Connect
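One way to see which of these paths a subnet actually uses is to inspect its route table, for example with `aws ec2 describe-route-tables --filters Name=association.subnet-id,Values=<subnet-id>`. The following is a minimal, heuristic sketch that classifies entries from the returned `Routes` list by target. It assumes that any `vpce-` route target is the S3 gateway endpoint (confirm that its prefix list is the S3 one, since DynamoDB gateway endpoints use the same route form) and that Site-to-Site VPN and Direct Connect traffic leaves through a virtual private gateway (`vgw-`); transit gateway setups and instance public IPs are not checked.

```python
def s3_paths(routes):
    """Classify route-table entries that could carry traffic from the subnet to S3.

    Heuristic only: route target prefixes are used to guess the path type.
    """
    paths = []
    for route in routes:
        if route.get("State") == "blackhole":
            continue  # the route's target no longer exists, so it carries no traffic
        gateway = route.get("GatewayId", "")
        if gateway.startswith("vpce-"):
            paths.append("S3 gateway endpoint")  # assumes the endpoint is for S3
        elif route.get("NatGatewayId"):
            paths.append("NAT gateway")
        elif gateway.startswith("igw-") and route.get("DestinationCidrBlock") == "0.0.0.0/0":
            paths.append("Internet gateway")
        elif gateway.startswith("vgw-"):
            paths.append("Virtual private gateway (Site-to-Site VPN or Direct Connect)")
    return paths


if __name__ == "__main__":
    # Example routes shaped like describe-route-tables output (IDs are placeholders).
    routes = [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
        {"DestinationPrefixListId": "pl-0example", "GatewayId": "vpce-0example", "State": "active"},
    ]
    print(s3_paths(routes) or "No route that could reach S3")
```

An empty result for the engine's subnet is a strong hint that the engine cannot reach S3 and that one of the methods listed above needs to be configured.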
Additional Resources
- Dremio Cloud - "Scaling up the engine replicas failed, please check the engine scaling events page for more details": https://support.dremio.com/hc/en-us/articles/8440596487323
- Dremio Cloud - S3 Source Connection Troubleshooting: https://support.dremio.com/hc/en-us/articles/8785640941211