I’m running a query in Atlas Data Federation, and it is much slower than expected. According to the logs, it opens thousands of files to satisfy the query, even though the time range in the query filter and the S3 path partitions mean it should only need to open 120. The filter is a time range on a field that is partitioned in the S3 path, yet files whose paths indicate dates outside that range are still being opened. In simpler tests, Data Federation seems able to push the range portion of the query (e.g. b < 5) down to the storage layer when the queried field is an int, but not when its type is epoch_millis or isodate (time types). Why is this?
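As a rough illustration of the pruning expected here, the sketch below lists the partitions a time-range filter should touch. It is only a sketch under assumptions: the bucket name, the hourly `YYYY/MM/DD/HH` layout, one file per partition, and the five-day range are all hypothetical, not the actual configuration or data.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical layout: one file per hourly partition under this prefix.
PREFIX_FORMAT = "s3://example-bucket/events/%Y/%m/%d/%H/"


def expected_partitions(start, end):
    """Return the hourly prefixes that overlap the half-open range [start, end)."""
    # Floor the start to the top of its hour so a partially covered
    # first hour is still included.
    current = start.replace(minute=0, second=0, microsecond=0)
    prefixes = []
    while current < end:
        prefixes.append(current.strftime(PREFIX_FORMAT))
        current += timedelta(hours=1)
    return prefixes


# Hypothetical five-day filter range: 5 days x 24 hours.
start = datetime(2023, 1, 1, tzinfo=timezone.utc)
end = datetime(2023, 1, 6, tzinfo=timezone.utc)
prefixes = expected_partitions(start, end)
print(len(prefixes))
```

If the range filter is pushed down to the storage layer, only prefixes like these should be opened; anything outside the list is a file the query should not need to read.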
Hello,
We may be able to help if you can provide some more information:
- The S3 path you defined, including its path partitions
- A sample of the data in the files (please change any sensitive info)
- The query you’re trying to run
We can take a look and see if anything jumps out.
Thanks,
Irwin!