How I ended up with a $1,365.67 AWS Lambda bill last month

I drank the Serverless/Lambda Kool-Aid. I read up on it at the suggestion of a friend, and everything you read is rainbows and unicorns.

So I began moving a realtime analytics app I had running on 10 EC2 servers over to it.

The app basically polls a number of different sources for hundreds of social media profiles, matching the data to provide a unified dashboard.

The initial structure I came up with was as follows:

  1. Fast Workers - Which do most of the polling, checking profiles (some as often as every 5 minutes) for updates.
  2. Slow Workers - Which handle data processing queues.
  3. External APIs - Endpoints for returning data etc.
  4. Internal APIs - Endpoints for standardized functions that all the other functions access.

This is the promise of Serverless computing, right? Write functions, deploy, everything scales, live a happy life.

The devil, my friends, is in the fine print. The REAL costs for Serverless don’t lie in each invocation; they’re in the cost per GB-second.

WTF is a gigabyte-second, you ask? Well, it’s kind of confusing. I too glazed over this cost, because in most cases it’s presented as an afterthought, and the numbers are SO small it’s hard to even imagine them adding up. From what I understand, it’s the amount of memory (in GB) your function is allocated multiplied by the number of seconds it runs.

From the Amazon website: $0.00001667 FOR EVERY GB-SECOND USED THEREAFTER?

Sounds REALLY small right?

So for example if your app has many worker functions like mine does (32 currently), they’re all running every 5-15 minutes, and each run lasts 30-90 seconds (to avoid timeouts), well, that can add up VERY quickly.
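
To make that concrete, here’s a quick back-of-the-envelope calculation in Python. The schedule, duration, and memory numbers are illustrative assumptions, not my exact configuration:

```python
# Back-of-the-envelope GB-second estimate. Every number below is an
# illustrative assumption (not my real schedule/memory settings).

GB_SECOND_PRICE = 0.00001667     # $ per GB-second, the rate quoted above
FREE_TIER_GB_SECONDS = 400_000   # monthly Lambda free tier

num_functions = 32               # worker functions
runs_per_hour = 6                # e.g. triggered every 10 minutes
avg_duration_s = 60              # each run lasts ~60 seconds
memory_gb = 1024 / 1024          # 1024 MB allocated = 1 GB

monthly_runs = num_functions * runs_per_hour * 24 * 30
gb_seconds = monthly_runs * avg_duration_s * memory_gb
billable = max(gb_seconds - FREE_TIER_GB_SECONDS, 0)

print(f"{gb_seconds:,.0f} GB-seconds -> ${billable * GB_SECOND_PRICE:,.2f}/month")
# ~8.3 million GB-seconds -> roughly $130/month in duration charges alone,
# and that's with fairly conservative assumptions.
```

Crank the memory allocation up or the polling interval down and that number climbs toward four figures fast.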

I’m still in the process of refactoring my app to work around this showstopper. Here’s what I did immediately:

  1. Run functions less often. Many of the functions that were running every 2 or 5 minutes I changed to 15-minute intervals.
  2. Shortened unnecessarily long runtimes on some functions. This one has been trickier, though, as I’m still trying to figure out how to balance processing the queues without the function timing out prematurely (one pattern I’m experimenting with is sketched after this list).
  3. Moved away from my centralized internal API structure. All of my worker functions were calling the internal APIs, which I believe dramatically increased costs, since every worker run also meant billed execution time on the internal API functions (I haven’t confirmed this yet).
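
For number 2, the idea I’m experimenting with is to check how much time is left before the Lambda timeout and stop pulling work early, letting the next scheduled run pick up the rest. This is just a rough sketch with a placeholder queue URL and a stubbed-out process() step:

```python
# Rough sketch: drain an SQS queue but bail out before the Lambda timeout.
# The queue URL and process() are placeholders, not my real code.

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/111111111111/example-queue"  # placeholder
SAFETY_MARGIN_MS = 10_000  # leave 10s of headroom before the timeout


def process(msg):
    pass  # stand-in for the real per-message work


def handler(event, context):
    processed = 0
    # context.get_remaining_time_in_millis() says how long we have left
    while context.get_remaining_time_in_millis() > SAFETY_MARGIN_MS:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                                   MaxNumberOfMessages=10,
                                   WaitTimeSeconds=1)
        messages = resp.get("Messages", [])
        if not messages:
            break  # queue is empty -- stop burning GB-seconds and exit
        for msg in messages:
            process(msg)
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=msg["ReceiptHandle"])
            processed += 1
    return {"processed": processed}
```

The nice side effect is that the function exits as soon as the queue is empty instead of idling until the timeout.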

Ultimately it may be more cost-efficient to move most of my “worker” functions back to EC2 instances, and just scale those up as my data processing needs increase.

Maybe Serverless/Lambda is not the right fit for real-time apps??

Just wanted to share a word of caution if you’re considering moving from traditional servers to Serverless. While it’s very convenient to write simple functions and deploy, and the scalability is nice, there are a couple of “gotchas” that are forcing me to consider moving back to a more traditional server structure.

The other annoyance I had last month was that I hit the “200 resource limit”. What this meant is that I had to break up my app (which is growing by a few functions a week) into various “microservices”.

It’s shocking to me that there aren’t more best practices around building truly scalable serverless apps.

As a one-man bootstrapped startup, all this may turn out to be more hassle than it’s worth. Still figuring it out.

Anyways - if you made it this far thanks for reading my rant :).

And if you have any suggestions for refactoring to reduce costs, or how you structure your Serverless app with many functions I’d love to hear about it!

Have you looked at AWS Batch? That may be closer to your use case, at least for the data processing. You could also try writing lighter-weight functions for endpoints.

One quick thing to look at, if you haven’t already: if your Lambda execution times are way below 100ms, consider decreasing the size of the functions a notch. Since you get billed per 100ms, if you’re using a 1024 MB size and execution finishes way below 100ms, using smaller Lambdas that still finish under 100ms reduces your GB-second total.

By the same token, if your Lambdas are taking a lot MORE than 100ms to run, look at tweaking the size again. You may have a sweet spot at a larger size that reduces execution time enough that you end up with a lower GB-second total overall.

In case you are unfamiliar with what I am talking about, the memorySize configuration parameter can scale the size of your Lambda from 128 MB all the way up to 3008 MB, with CPU (and cost) scaling linearly along the way.
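
If you want to experiment quickly, here’s a rough sketch of stepping a function’s memory down with boto3 (the function name is a placeholder); then watch the REPORT lines in that function’s CloudWatch Logs to see whether duration still lands under the same 100ms boundary:

```python
# Rough sketch: drop a function's MemorySize and verify the change.
# The function name is a placeholder -- check billed duration afterwards
# in the REPORT lines of the function's CloudWatch Logs.

import boto3

lam = boto3.client("lambda")
FUNCTION_NAME = "fast-worker-example"  # placeholder

cfg = lam.get_function_configuration(FunctionName=FUNCTION_NAME)
print("current MemorySize:", cfg["MemorySize"], "MB")

# Valid sizes go in 64 MB steps from 128 MB up to 3008 MB
lam.update_function_configuration(FunctionName=FUNCTION_NAME, MemorySize=256)
lam.get_waiter("function_updated").wait(FunctionName=FUNCTION_NAME)
print("MemorySize is now 256 MB")
```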

Thanks, yes - I took them all down to 256 MB, which makes a huge difference.