I’m been working on this design for several weeks and would really appreciate further insights.
Basically, I would like to track Create, Update and Delete on dynamoDB tables with the following requirements:
- User is able to query the changes for the past 15 mths
- User will need to make request, if need to query > 15 mths
- These records are kept for 10 years for compliance after which it will be deleted
I came up with this design:
dynamoDB tbl --(stream)--> lambda function --(save to)--> RDS Monthly --(cron job run)--> lambda --(archive records > 15 mths to json and put into)-->Glacier--(query)-->Athena
Any flaws with this design? Will this be the most cost effective implementation?
lambda function --(save to)--> RDS:
- I need to do a cost calculation, I will either save to RDS or back to dynamoDB whichever is cheaper
- I was thinking of saving each change record to S3 as a json file, and use lifecycle to archive to Glacier after 15 mths, and use Athena to do user query. But the front-end app to query via Athena seems a bit difficult to implement. Also, it seems a bit costly as there is additional cost for lifecycle put request from S3 to Glacier.
Glacier--(query)-->Athena, I’m not very familiar with Athena, but is there an effective way to name my json file and place them in certain folder structure so that it’s easier for Athena to build the query?