Batch processing in a serverless architecture question

This may not be the correct forum for these questions, but perhaps someone may have some ideas.

I am working on an ‘email integration’ project where I will most likely have to deliver thousands of emails based on certain events.

So for example some events are:

  • single events, e.g. SendNewUserRegistrationEmail
  • some events are periodic events based on time
  • and some could be classified as batch events, e.g. SendImportantMessageToAllUsers

I have used AWS Lambda recently on another project and am really loving the serverless element of things, i.e. not having to worry about a ‘running’ server or scaling/bottleneck issues. I am hoping that I can apply a serverless architecture to my current project using AWS infrastructure. I am also hoping to keep things as simple and uncomplicated as possible.

For my SendNewUserRegistrationEmail use case I would assume this would be pretty straightforward:

  • A source posts a userId to an AWS Lambda ‘SendNewUserRegistrationEmail’ endpoint
  • The Lambda function looks up the user details by the userId, builds an HTML email message with the user details and calls SES to deliver the email
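A minimal sketch of those two steps, assuming made-up field names, a made-up verified sender address, and an injected SES client (in a real Lambda you would pass in `new AWS.SES()` from the aws-sdk package):

```javascript
// Build the SES SendEmail parameters from a user record (pure, testable).
// The sender address and the user field names are assumptions.
function buildWelcomeEmailParams(user) {
  return {
    Source: 'noreply@example.com', // assumed SES-verified sender
    Destination: { ToAddresses: [user.email] },
    Message: {
      Subject: { Data: 'Welcome, ' + user.name + '!' },
      Body: { Html: { Data: '<p>Hi ' + user.name + ', thanks for registering.</p>' } },
    },
  };
}

// Lambda handler: look up the user by id, build the message, deliver via SES.
// getUserById and ses are injected dependencies (hypothetical names).
async function handler(event, { getUserById, ses }) {
  const user = await getUserById(event.userId);
  return ses.sendEmail(buildWelcomeEmailParams(user)).promise();
}
```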

For time-based events I guess I can just use Lambda ‘Scheduled Events’. I assume this is easy to configure with the Serverless Framework?
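From the docs I believe the serverless.yml configuration would look something like this (function and handler names below are made up):

```yaml
functions:
  sendDigestEmails:
    handler: handler.sendDigestEmails   # assumed handler path
    events:
      - schedule: rate(1 hour)          # or e.g. cron(0 9 * * ? *) for 9am UTC daily
```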

But the thing that is bothering me most is how I am going to handle batch events like SendImportantMessageToAllUsers. Assume that although each email will have more or less the same content, it might have some content specific to that user. Obviously a Lambda function cannot do any batch processing on its own due to timeout constraints, so I am trying to understand how this use case could be fulfilled in a serverless situation. Would anyone have any suggestions?

  • Would I need some kind of source to generate an individual event for each email recipient, e.g. a Kinesis event for each email, and have each of these Kinesis events consumed by an individual Lambda function which would build the email and call SES?

  • And how could I generate all these events inside a serverless architecture from one simple trigger?

hope someone might be able to help

thanks a lot

1 Like

I wouldn’t call SES directly from the API that registers new users. It’s better to decouple the two using SNS. i.e. API GW -> Lambda -> SNS -> Lambda -> SES. This way your API won’t block or fail if there’s a temporary issue with SES.

The biggest problem with bulk sending is going to be the SES rate limit. I’d be inclined to break it up into a two step process. Have one Lambda generate the messages and put them onto an SQS queue. Once it finishes populating the queue I would invoke another Lambda to read messages from the queue and deliver them via SES.

The first Lambda needs to break sending into smaller batches and keep an eye on context.getRemainingTimeInMillis() to make sure you don’t exceed the maximum time. If you are getting close to time then you can invoke the Lambda again telling it where to continue.

The second Lambda pulls messages from SQS and sends them via SES until the queue is empty. It also needs to keep an eye on context.getRemainingTimeInMillis() so it can invoke itself again if it gets close to time.
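The first Lambda's "watch the clock and re-invoke" loop could be sketched like this. The 10-second safety margin and names like `sendBatch` and `reinvoke` are my own assumptions; the dependencies are injected so the control flow can be tested without AWS:

```javascript
const SAFETY_MARGIN_MS = 10000; // stop well before the Lambda timeout (assumed margin)

// Pure helper: is there enough time left to process another batch?
function hasTimeLeft(remainingMs) {
  return remainingMs > SAFETY_MARGIN_MS;
}

// Enqueue recipients in batches; if time runs short, re-invoke this same
// Lambda with a cursor telling the next invocation where to continue.
async function bulkEnqueueHandler(event, context, { sendBatch, reinvoke }) {
  let cursor = event.cursor || 0;
  while (cursor < event.recipients.length) {
    if (!hasTimeLeft(context.getRemainingTimeInMillis())) {
      // Hand the remaining work to a fresh invocation of this function.
      await reinvoke({ ...event, cursor });
      return { continuedAt: cursor };
    }
    // sendBatch would build messages and put them on SQS, returning the
    // next cursor position (hypothetical helper).
    cursor = await sendBatch(event.recipients, cursor);
  }
  return { done: true };
}
```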

1 Like

Hi @buggy,
I’m interested in this too 🙂

In this case there is a Lambda handler for API GW, let’s say registration_handler.js, which includes a method for registering users.
Inside this method there is a “sendARegistrationMessageToSNS” function.
What should this function look like? Is it a mySNSInstance.publish(data, function(err, result)) call, or does it invoke another Lambda as an event type, which receives the message and performs the publish?

Thanks in advance!

The way I’ve been trying to architect this is to have a core API that triggers generic events. In this case I’d have an SNS topic for user_created. Once the API has validated the input and saved the data I would publish a message to the user_created topic and have the API return success. This keeps the code in the API to a minimum so it’s fast.

The reason I’m using generic events instead of verbs/instructions (user_created versus send_welcome) is so that I can easily add, replace and remove functionality without needing to change existing code. I then build other Lambda functions to receive messages from the user_created topic and process them accordingly. There might be one to send a welcome email, one to add the user to my mailing list, etc.

In terms of the actual implementation there are two approaches.

  1. You could create a Lambda that listens to the DynamoDB stream of a table and publishes the events as they come off the stream.

  2. You could have the Lambda processing the API request directly call mySNSInstance.publish().

Currently I’m using option 2. In your example that would mean having sendARegistrationMessageToSNS() inside registration_handler.js and it would call mySNSInstance.publish(). It’s my preference because it gives me more control over when the event is published, at the expense of having to publish it myself.
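To sketch what that looks like (the topic ARN and payload shape here are made up, and I'm injecting the SNS client so the message-building part can be tested offline; in a real Lambda you'd pass `new AWS.SNS()` from the aws-sdk package):

```javascript
const TOPIC_ARN = 'arn:aws:sns:us-east-1:123456789012:user_created'; // assumed ARN

// Build the SNS publish parameters for a generic user_created event (pure, testable).
function buildUserCreatedMessage(userId) {
  return {
    TopicArn: TOPIC_ARN,
    Message: JSON.stringify({ event: 'user_created', userId }),
  };
}

// Registration Lambda: validate/save the user, publish the event, return.
// The API never blocks on SES; downstream subscribers handle the email.
async function registrationHandler(event, { saveUser, sns }) {
  await saveUser(event.user);
  await sns.publish(buildUserCreatedMessage(event.user.id)).promise();
  return { statusCode: 200 };
}
```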

PS: My general rule is that if you have one Lambda directly invoking another Lambda you’re probably doing it wrong.

1 Like

so would this be an appropriate process…

  1. when some part of my system, e.g. the website backend, registers a new user and saves to the db, it then makes a POST call to my API Gateway endpoint, e.g. ‘amazon.xxx/userCreated?userId=123’

  2. API Gateway forwards this to a Lambda function; the Lambda function pulls all the data for userId=123 from a database and puts a message on an SQS ‘user_created’ queue

  3. another Lambda is configured to read messages off this SQS queue. It uses all the info in the message to call some external email API (i.e. passing the email address and some other user info that the email system will build an email from). If the email API call returns a ‘success’ then the Lambda deletes the message from the queue, hopefully before the ‘visibility timeout’ expires. But if there was an error, then no delete is issued and I guess the message will be picked up and tried again later.
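Step 3 could be sketched roughly like this (queue URL, payload shape and the email API are all assumptions; the clients are injected so the delete-on-success logic can be tested without AWS):

```javascript
const QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/user_created'; // assumed

// Process one SQS message: call the email API, delete only on success.
async function processMessage(message, { sendEmailViaApi, sqs }) {
  const payload = JSON.parse(message.Body);
  const result = await sendEmailViaApi(payload.email, payload.userInfo);
  if (result.success) {
    // Delete before the visibility timeout expires; otherwise the message
    // becomes visible again and would be retried on a later poll.
    await sqs.deleteMessage({
      QueueUrl: QUEUE_URL,
      ReceiptHandle: message.ReceiptHandle,
    }).promise();
    return 'deleted';
  }
  // On failure, do nothing: the message reappears after the visibility
  // timeout and gets picked up and tried again later.
  return 'retained';
}
```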

I have a few questions:

  1. does above sound like a decent approach?
  2. Doesn’t SQS have a four-day limit for messages, so how do I handle things when mail delivery fails for four days?
  3. Is it possible to have a serverless consumer for the SQS queue, or is some long-running process required? I am trying to keep everything serverless

thanks

1 Like

@walshe Use SNS instead of SQS. You can configure SNS to invoke a Lambda when a message is ready. It will also handle retries. With SQS you need a long-running process that polls the queue. You can then use a dead letter queue for failures that you monitor for alerting.

@buggy hmm, I was reading this article https://cloudonaut.io/integrate-sqs-and-lambda-serverless-architecture-for-asynchronous-workloads/ which talks about SQS. SQS seems to be the best way to deal with fault tolerance.

here’s my summary of the article applied to my scenario:

1 - some event happens
2 - post a message to AWS Api Gateway passing relevant parameters e.g. an entity id
3 - API Gateway forwards to a Lambda Node.js function
4 - The Lambda function looks up any info for the entity id if necessary and then places a message on an SQS ‘user_registered’ queue with a payload containing relevant info
5 - An Events Rule will trigger a dedicated Lambda ‘Consumer’ every minute.
6 - The Lambda Consumer will read as many messages off the SQS queue as it can and for each message will call a ‘worker’ Lambda to handle the message
7 - The ‘worker’ Lambda calls 3rd party email api
8 - A successful API call will mean that the worker should delete the message from the SQS queue. But a failure will mean it won’t, and so it should just do nothing. Does ‘doing nothing’ mean that after the visibility period expires the message goes back on the SQS queue?
9 - if the message is put back on the regular SQS queue <max + 1> times, it automatically goes onto the preconfigured dead letter queue to avoid poison pill message situations

would love your thoughts
thanks

1 Like

Personally - this screams massive fail. I’m not saying there aren’t reasons to use SQS but if you need SQS + regular polling (especially every minute) then you should probably be using SNS.

Use SNS + a dead letter queue.

interesting, didn’t know SNS could be used with DLQs

Technically the dead letter queue is on the Lambda, not SQS. http://docs.aws.amazon.com/lambda/latest/dg/dlq.html

What’s kind of interesting (I haven’t tried this yet) is that you can use an SNS topic for the dead letter queue. This should allow you to take advantage of SNS fan-out so that you receive a notification when something fails, and you could then put the message into SQS for a later retry.
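If I remember right, the Serverless Framework exposes the Lambda dead-letter target via onError in serverless.yml (function name and topic ARN below are made up, and I believe only SNS topic ARNs are supported as the target, not SQS):

```yaml
functions:
  deliverEmail:
    handler: handler.deliverEmail                               # assumed handler path
    onError: arn:aws:sns:us-east-1:123456789012:email-failures  # SNS topic as Lambda DLQ
```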

1 Like

@buggy So I had this idea like you said:

  1. event triggered
  2. posts a message to AWS API Gateway passing relevant parameters (this may be the id of a relevant entity or maybe a collection of data)
  3. API Gateway forwards to a Lambda Node.js function
  4. The Lambda function looks up any info for the entity id if necessary and then publishes a message to an SNS ‘note_posted’ topic with a payload containing relevant info
  5. The SNS topic this time delivers its payload to a Lambda subscriber
  6. The Lambda subscriber takes the payload from SNS and makes a call to the email API. Depending on the result of the API call, the Lambda will finish with success or failure
  7. A success will mean that the message from SNS will not need to be retried. However, a failure code will mean it will be automatically retried in accordance with the configured retry policies
  8. After a configured number of retries and failures, the message will be sent to an SQS or SNS dead letter queue (DLQ). This DLQ is subscribed to by another ‘Failure’ Lambda processor which will perform the necessary reporting of this failure

However,
I just read that SNS can end up delivering a message twice to the same subscriber, which is not so good as I could end up sending duplicate emails. I could try to implement some sort of throttle in the last part that sends the email, i.e. record something in a database, but it probably wouldn’t get saved fast enough to be read by the processor of the duplicate message.

From the AWS docs:

Q: How many times will a subscriber receive each message?

Although most of the time each message will be delivered to your application exactly once, the distributed nature of Amazon SNS and transient network conditions could result in occasional, duplicate messages at the subscriber end. Developers should design their applications such that processing a message more than once does not create any errors or inconsistencies.

1 Like

When SNS invokes your Lambda it may send multiple messages at once. I don’t see how SNS could know which messages were successfully processed if your Lambda fails, so I assume it needs to send all of them again. I would write your Lambda so that it’s idempotent or has a way to de-duplicate messages. If you don’t, you run the risk of processing the first couple of messages multiple times while the last message doesn’t get processed.

@buggy from what I understand the Lambda processes one message at a time from SNS, but the problem is that it can end up receiving the same message twice from SNS.

I wonder if I could just get my Lambda to insert a row into DynamoDB with an identifier for the message, then if a duplicate gets sent just make sure that there is still only one row in DynamoDB for that message. Meanwhile I think I could use DynamoDB Streams to have any insert into DynamoDB trigger a Lambda; this way the Lambda would not process any duplicates, since DynamoDB would have no duplicate rows.

too complex maybe?

1 Like

@walshe I’m in the exact same position. Could you share how you resolved it?
I’m evaluating DynamoDB with a table of “seen ID” keys, using the new TTL feature available in DynamoDB.
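A minimal sketch of that "seen IDs" idea. A Map stands in for the DynamoDB table so the logic runs offline; with DynamoDB this would be a putItem with ConditionExpression 'attribute_not_exists(messageId)' plus a TTL attribute. All names below are made up:

```javascript
// Returns a claimMessage(messageId) function that succeeds exactly once per
// id, mimicking a DynamoDB conditional write against a "seen IDs" table.
function makeDeduper(store = new Map()) {
  return function claimMessage(messageId) {
    if (store.has(messageId)) return false; // duplicate delivery: skip it
    store.set(messageId, Date.now());       // with DynamoDB, also set a TTL attribute
    return true;
  };
}

// SNS-triggered Lambda: only process records whose MessageId is claimed,
// so a duplicate delivery does not send a duplicate email.
async function snsHandler(event, { claimMessage, sendEmail }) {
  for (const record of event.Records) {
    if (!claimMessage(record.Sns.MessageId)) continue; // already handled
    await sendEmail(JSON.parse(record.Sns.Message));
  }
}
```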

Long-running AWS Lambda? Use Step Functions!

https://www.linkedin.com/pulse/large-file-processing-csv-using-aws-lambda-step-functions-nacho-coll

1 Like