I have a project that involves the following processing steps:
- EventBridge scheduler triggers a Lambda function which will run several Athena queries
- Athena queries produce data on S3
- The data is processed and the result is stored on S3 (long-running, with high memory usage)
I can easily handle steps 1 and 2 with the Serverless Framework, but I have doubts about step 3. Since it may be a long-running task, I can’t execute it in a Lambda function (RAM could also be a constraint).
I’ve read about using Step Functions and Fargate to run Docker containers on ECS, but I have two concerns:
- I would like to keep both the code and the architecture versioned in git (as I’ve always done for serverless projects), but every example/tutorial I’ve read about Fargate seems to require setting up a cluster through the AWS console or CLI beforehand. Doesn’t that miss the point of Fargate offering a truly serverless container service?
- I need all the compute resources to be transient, meaning that once the job is completed all resources such as EC2 instances, ECS containers etc. must be released. But if I’ve understood Fargate correctly, the ECS cluster will still exist even after the containers are removed. Is that so?
Generally speaking, how do you approach problems like this, which involve both serverless components and long-running resources such as an EC2 instance or a Fargate container?
Is it possible to define/orchestrate/version all these resources with the Serverless Framework alone, or do we need to combine different tools?
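
To make the question more concrete, here is a rough, untested sketch of the kind of Step Functions definition I have in mind for steps 2 and 3. It is written in Python only to build the JSON; in a real project I would expect the definition to live in the versioned infrastructure config. The function name `build_definition`, the state names, and every ARN, bucket and subnet below are hypothetical placeholders. It also assumes the state machine calls Athena directly instead of going through the Lambda from step 1, and that the Fargate task needs a public IP to pull its image.

```python
import json


def build_definition(query, athena_output, cluster_arn, task_definition_arn, subnets):
    """Return an Amazon States Language definition as a plain dict.

    All arguments are placeholders supplied by the caller; nothing here
    talks to AWS, it only assembles the JSON structure.
    """
    return {
        "Comment": "Sketch: Athena query followed by a long-running Fargate task",
        "StartAt": "RunAthenaQuery",
        "States": {
            # Step 2: run the query and wait for it to finish (.sync)
            "RunAthenaQuery": {
                "Type": "Task",
                "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
                "Parameters": {
                    "QueryString": query,
                    "ResultConfiguration": {"OutputLocation": athena_output},
                },
                "Next": "ProcessOnFargate",
            },
            # Step 3: start a one-off Fargate task and wait for it to exit
            "ProcessOnFargate": {
                "Type": "Task",
                "Resource": "arn:aws:states:::ecs:runTask.sync",
                "Parameters": {
                    "LaunchType": "FARGATE",
                    "Cluster": cluster_arn,
                    "TaskDefinition": task_definition_arn,
                    "NetworkConfiguration": {
                        "AwsvpcConfiguration": {
                            "Subnets": subnets,
                            # Assumption: no NAT gateway, so a public IP is needed
                            "AssignPublicIp": "ENABLED",
                        }
                    },
                },
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only
    definition = build_definition(
        query="SELECT 1",
        athena_output="s3://my-example-bucket/athena-results/",
        cluster_arn="arn:aws:ecs:eu-west-1:123456789012:cluster/example",
        task_definition_arn="arn:aws:ecs:eu-west-1:123456789012:task-definition/example:1",
        subnets=["subnet-00000000"],
    )
    print(json.dumps(definition, indent=2))
```

If something along these lines is reasonable, my remaining question is where the cluster and task definition referenced here should be declared so that they stay versioned with the rest of the project.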