I would like to ask for your opinion about the architecture for the following case. I’m writing a simple application, but I want to do it properly and avoid anti-patterns. I’ve written a few simple projects using the Serverless Framework and I’m aware of the “a Lambda should not invoke another Lambda directly” anti-pattern.
So my application should:
1. Get a list of items (from DynamoDB).
2. For each item, perform the same task (a set of steps that use some REST endpoints which are already provided, etc.).
3. Update the item when the task in step 2 has been done, e.g. a counter, timestamp, etc.
Sample item can be {name: 'foo bar', counter: 14}.
I could use Step Functions for that, but I think it’s a little too much for such a simple setup. So I want a solution with simple lambdas in JavaScript / Node.js.
My concerns:
The first and third steps access DynamoDB. Should I create a separate function that deals with the database and use it both for getting the list of all items (step 1) and for updating each item (step 3)?
If this is the correct way, how should I use that function from the other lambdas? Via API Gateway REST? I don’t want to expose the “DynamoDB” function to the internet.
Parallel processing is not important for me. Can I publish each item loaded in step 1 to SNS and have that SNS topic set up as the trigger for the step 2 function? Ideally I would use SQS, but a lambda cannot be triggered by SQS, and I don’t want to use cron (CloudWatch Events) to check whether there is something in the queue.
When using SNS I get loose coupling between the functions, and they don’t invoke each other directly.
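To make the fan-out concrete, here is a minimal sketch of how step 1 could publish each item to SNS. The topic ARN and names are my own placeholders, and the actual aws-sdk calls are left in comments so the snippet stays self-contained:

```javascript
// Build the SNS publish parameters for one item (pure, so it is easy to test).
function buildPublishParams(item, topicArn) {
  return {
    TopicArn: topicArn,
    Message: JSON.stringify(item)
  };
}

// In the real handler you would publish each loaded item, e.g.:
// const AWS = require('aws-sdk');
// const sns = new AWS.SNS();
// await Promise.all(items.map(function (item) {
//   return sns.publish(buildPublishParams(item, process.env.TOPIC_ARN)).promise();
// }));

// Example with the sample item from above:
const params = buildPublishParams(
  { name: 'foo bar', counter: 14 },
  'arn:aws:sns:us-east-1:123456789012:items-topic'
);
console.log(params.Message); // {"name":"foo bar","counter":14}
```

The step 2 function then receives each item independently as its own SNS-triggered invocation.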
What do you think? I’m very curious about your opinion.
I don’t consider that to be an anti-pattern at all. I use it a lot, and it works well.
I’d say it’s both more reliable and more secure than an HTTP request between functions. For example, your internal services can be lambdas in a VPC, with no API Gateway link to the outside world, and still be able to access other internal resources which are not on the internet.
For each lambda, you can control which other lambdas it’s permitted to call by iamRoleStatements in the calling function’s serverless.yml:
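For example (a hedged sketch; the runtime, region, account id, and function name are placeholders):

```yaml
provider:
  name: aws
  runtime: nodejs8.10
  iamRoleStatements:
    # Allow this service's functions to invoke one specific target lambda.
    - Effect: Allow
      Action:
        - lambda:InvokeFunction
      Resource: arn:aws:lambda:us-east-1:123456789012:function:my-service-dev-target
```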
If you don’t want to couple lambdas directly, then I guess you’re looking at something like SQS in between, but that makes things more complex because you can’t get synchronous replies.
Could you elaborate a bit more on why loose coupling is an issue for you?
Also, I don’t understand what you mean about them invoking themselves directly.
I have been experimenting with this type of SNS event-driven design, and so far, I am quite happy with this configuration.
Using SNS for all internal communication adds little boilerplate and avoids the issue of unnecessary or insecure traffic going out to the public internet and back in.
This setup also gives you the benefit of having several subscribers on a particular topic, if you need notifications or different processors.
Coupling lambdas together could get messy. As long as the flow of events/state transitions stays the same, you can introduce new features by simply modifying a single function or introducing a new one if necessary.
I think the defining point of your design is whether it is important for you to process all those items independently and rely on the built-in mechanisms for deciding that a task has failed or whether it should be retried.
And how often new data might be getting published.
Depending on how you answer those two questions, you could also consider Kinesis.
Essentially, think a bit more about the type of workload you are expecting to be processing, and there might be a service that could align better to your specific needs.
// AWS SDK for JavaScript (v2): invoking one lambda from another
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

const params = {
  FunctionName: 'STRING_VALUE', /* required */
  ClientContext: 'STRING_VALUE',
  InvocationType: 'Event', /* 'Event' | 'RequestResponse' | 'DryRun' */
  LogType: 'None', /* 'None' | 'Tail' */
  Payload: Buffer.from('...') /* or 'STRING_VALUE'; strings will be Base-64 encoded on your behalf */,
  Qualifier: 'STRING_VALUE'
};
lambda.invoke(params, function(err, data) {
  if (err) console.log(err, err.stack); // an error occurred
  else console.log(data);               // successful response
});
This is an anti-pattern, as you admit.
Anyway, I end up with two functions:
The first function is triggered by a CloudWatch Event (cron); it loads the data from DynamoDB, performs some lookup for each item from the db, and publishes the result for each item as an SNS message.
The second function performs the work on the payload received from the first function via the SNS message (loose coupling) and saves the results to DynamoDB.
As you can see, I dropped the idea of having a separate function for accessing the data.
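A rough sketch of that second function, assuming the shape of the sample item above (the table name and attribute names are my assumptions, and the DocumentClient call is left in comments so the snippet stays self-contained):

```javascript
// Extract the item that the first function published to the topic.
function parseSnsRecord(record) {
  return JSON.parse(record.Sns.Message);
}

// Build the DynamoDB update for one processed item: bump the counter and
// record a timestamp. The attribute is aliased via ExpressionAttributeNames
// to be safe against DynamoDB's reserved-word list.
function buildUpdateParams(tableName, item, timestamp) {
  return {
    TableName: tableName,
    Key: { name: item.name },
    UpdateExpression: 'SET #c = #c + :inc, updatedAt = :ts',
    ExpressionAttributeNames: { '#c': 'counter' },
    ExpressionAttributeValues: { ':inc': 1, ':ts': timestamp }
  };
}

// Handler shape:
// const AWS = require('aws-sdk');
// const db = new AWS.DynamoDB.DocumentClient();
// exports.handler = async function (event) {
//   for (const record of event.Records) {
//     const item = parseSnsRecord(record);
//     // ...perform the per-item task against the REST endpoints...
//     await db.update(buildUpdateParams('ItemsTable', item, Date.now())).promise();
//   }
// };
```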
OK, I finally get you.
In that case I think we are discussing fairly similar approaches.
Having a separate function to get the data sounds like overkill indeed.
It is difficult to make many more recommendations without knowing more about the type of workloads that you are dealing with.
Something I would maybe add to the mix: if you don’t need individual fail/retry policies for each individual item, you could consider using batched reads and writes in Dynamo to avoid the overhead of processing every message individually.
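For illustration, batched writes could look roughly like this (a sketch: the table name is a placeholder, BatchWriteItem accepts at most 25 items per request, and the aws-sdk call is left in comments so the snippet stays self-contained):

```javascript
// Split an array into chunks of at most `size` (BatchWriteItem allows 25).
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Build the DocumentClient batchWrite parameters for one chunk of items.
function buildBatchWriteParams(tableName, items) {
  return {
    RequestItems: {
      [tableName]: items.map(function (item) {
        return { PutRequest: { Item: item } };
      })
    }
  };
}

// Usage:
// const AWS = require('aws-sdk');
// const db = new AWS.DynamoDB.DocumentClient();
// for (const batch of chunk(allItems, 25)) {
//   const data = await db.batchWrite(buildBatchWriteParams('ItemsTable', batch)).promise();
//   // NB: a production version should also retry data.UnprocessedItems.
// }
```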