Tracking changes of records in DynamoDB tables

Hi,

I’m been working on this design for several weeks and would really appreciate further insights.

Basically, I would like to track Create, Update and Delete on dynamoDB tables with the following requirements:

  1. User is able to query the changes for the past 15 mths
  2. User will need to make request, if need to query > 15 mths
  3. These records are kept for 10 years for compliance after which it will be deleted

I came up with this design:

dynamoDB tbl --(stream)--> lambda function --(save to)--> RDS

Monthly --(cron job run)--> lambda --(archive records > 15 mths to json and put into)-->Glacier--(query)-->Athena

Any flaws with this design? Will this be the most cost effective implementation?

In lambda function --(save to)--> RDS:

  1. I need to do a cost calculation, I will either save to RDS or back to dynamoDB whichever is cheaper
  2. I was thinking of saving each change record to S3 as a json file, and use lifecycle to archive to Glacier after 15 mths, and use Athena to do user query. But the front-end app to query via Athena seems a bit difficult to implement. Also, it seems a bit costly as there is additional cost for lifecycle put request from S3 to Glacier.

In Glacier--(query)-->Athena, I’m not very familiar with Athena, but is there an effective way to name my json file and place them in certain folder structure so that it’s easier for Athena to build the query?

Thanks.

One thing I’d change is the monthly cron. I’d change this to a daily process. Reason being is you’ll be placing quite a bit of load on your RDS, 1 day a month. Do this in small batches daily and you can then scale the system accordingly.

One thing to keep in mind is that streams generated by DynamoDB is NOT a SQS stream, therefore if the lambda fails to write the records to RDS the RDS will be out of sync. If you want it to be bullet proof and have the ability to playback failures, using a lambda to write to Kinesis then into RDS or RedShift is a better option, you’ll also have the ability format the records with an additional lambda.

Glacier is a safe option to use, but I’d only really opt to go that route if you have Terrabytes of data. Otherwise just keep it in S3, also be aware of the time/cost of retrieving data from Glacier.

As for saving records to RDS I’d go

DynamoDB -> DynamoDB Stream -> Lambda -> Kinesis Firehose -> RDS

Otherwise the solution seems sound