I have an app with an asynchronous 'post-registration' step.
This step involves paging through thousands of records (fetched in batches of 250 at a time) from an external API and saving them to a database.
I iterated through many solutions: Step Functions, then manual orchestration of Lambdas iterating over pages and publishing messages to SNS and SQS, finally consumed by another Lambda on the other side.
Then I found Kinesis Streams and was simply able to put my 250-record batches onto a stream each time and have a Lambda consume batches of size 1 or higher if I wanted - this works great.
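For context, here's roughly what the producer side looks like today - a minimal sketch assuming boto3, a hypothetical stream name, and that each record has an `id` field usable as a partition key:

```python
import json

import boto3

kinesis = boto3.client("kinesis")

def publish_batch(records, stream_name="registration-records"):
    """Put one page of up to 250 records onto the stream.

    PutRecords accepts up to 500 records per call, so a 250-record
    page fits in a single request.
    """
    entries = [
        {
            "Data": json.dumps(record).encode("utf-8"),
            # The partition key decides which shard a record lands on;
            # with a single shard its value hardly matters.
            "PartitionKey": str(record["id"]),
        }
        for record in records
    ]
    response = kinesis.put_records(StreamName=stream_name, Records=entries)
    # PutRecords can partially fail, so check before assuming success.
    if response["FailedRecordCount"] > 0:
        raise RuntimeError(f"{response['FailedRecordCount']} records failed")
```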
But I'm thinking it's an underutilization of Kinesis Streams, since the work isn't exactly happening every hour of the day - and Kinesis Streams are charged per shard hour.
Now, I was just learning about Kinesis Firehose, which gives me another interesting idea, perhaps a more cost-efficient one, since it is billed on the amount of data rather than time. I also don't have to worry about partitions, etc.
So now I'm thinking of doing the following:
- The post-registration task invokes my GetPageOfDataFromExternalApiLambda
- GetPageOfDataFromExternalApiLambda retrieves a page of 250 items, sends them to Firehose, then invokes itself again to get the next page, repeating until there are no more pages (see the producer sketch after this list)
- As my dedicated S3 bucket fills up with records, I stream S3 events to an S3EventLambda
- My S3EventLambda looks at the event and stores the data in the database; afterwards it can also delete the file from S3 (see the consumer sketch below)
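To make the idea concrete, here's a rough sketch of what GetPageOfDataFromExternalApiLambda could look like - `fetch_page` is a placeholder for the external API call, and the delivery stream name is hypothetical. One thing to note: Firehose concatenates records into a single S3 object, so appending a newline delimiter on the producer side makes them easy to split apart later:

```python
import json

import boto3

firehose = boto3.client("firehose")
lambda_client = boto3.client("lambda")

PAGE_SIZE = 250

def fetch_page(page, size):
    """Placeholder: call the external API and return a list of dicts."""
    raise NotImplementedError

def handler(event, context):
    page = event.get("page", 0)
    records = fetch_page(page, PAGE_SIZE)

    if not records:
        return  # no more pages, so the chain of invocations stops here

    # One page fits comfortably in a single PutRecordBatch call
    # (the limit is 500 records / 4 MiB per call). The trailing newline
    # lets the consumer split records back apart after Firehose
    # concatenates them into one S3 object.
    firehose.put_record_batch(
        DeliveryStreamName="registration-delivery-stream",
        Records=[{"Data": (json.dumps(r) + "\n").encode("utf-8")} for r in records],
    )

    # Asynchronously invoke this same function for the next page.
    lambda_client.invoke(
        FunctionName=context.function_name,
        InvocationType="Event",
        Payload=json.dumps({"page": page + 1}),
    )
```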
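And a matching sketch of S3EventLambda, where `save_to_db` is a placeholder for whatever persistence layer the app uses:

```python
import json
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")

def save_to_db(item):
    """Placeholder: write one record to the database."""
    raise NotImplementedError

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # S3 event keys arrive URL-encoded, hence the unquote.
        key = unquote_plus(record["s3"]["object"]["key"])

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        # Split on the newline delimiter the producer appended.
        for line in body.decode("utf-8").splitlines():
            if line:
                save_to_db(json.loads(line))

        # Clean up the file once everything is stored.
        s3.delete_object(Bucket=bucket, Key=key)
```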
Does this seem like a reasonable solution?