API Gateway - round trip from browser about 150ms
Cloudfront proxying to direct lambda - round trip from browser about 120ms
Directly calling lambda - round trip from browser about 60ms
Now this is only one function I am testing, and I only clicked on it about ten times, but what I am observing here is that each layer of AWS the call goes through adds about 30ms. I suspect this has to do with HTTPS and the encrypt/decrypt between each box. For example, API Gateway goes through three boxes: Cloudfront, API Gateway, Lambda. So that is six encryption points.
On the other hand, 60ms to go 650 miles, hit lambda, do a database read, and return 30 results is pretty good considering the speed of light round trip is 8ms of that. And this is not a do-nothing db read. Everything is running on a Cognito user ID and the last 30 results are returned for that specific user.
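As a rough check on the speed-of-light figure, here is a back-of-envelope sketch. The 650-mile distance comes from the post; the 2/3 factor for light in fiber is an assumption, and routing, queuing, and TLS are ignored entirely.

```python
# Back-of-envelope propagation delay for a 650-mile hop.
MILES_TO_KM = 1.609344
C_VACUUM_KM_PER_S = 299_792.458
FIBER_FACTOR = 2 / 3  # assumption: light in fiber travels at roughly 2/3 of c


def round_trip_ms(one_way_miles: float, speed_factor: float = 1.0) -> float:
    """Round-trip propagation time in ms, ignoring routing and queuing."""
    distance_km = 2 * one_way_miles * MILES_TO_KM
    return distance_km / (C_VACUUM_KM_PER_S * speed_factor) * 1000


vacuum_ms = round_trip_ms(650)
fiber_ms = round_trip_ms(650, FIBER_FACTOR)
print(f"vacuum: {vacuum_ms:.1f} ms, fiber: {fiber_ms:.1f} ms")
```

This gives roughly 7ms in a vacuum and a bit over 10ms in fiber, so the 8ms quoted above sits between the two.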
I’m bugging the AWS Go SDK people to make some things more efficient. If they do what I’m asking it should take another 10ms off from my lambda run time.
Edit: I do see considerable variability in these numbers. It is likely the impact of other traffic getting in the way.
Edit2: Direct lambda calls support CORS. The OPTIONS request comes back with access-control-max-age: 172800, which causes the CORS status to be cached for two days.
Holy smokes. Go is freakin’ fast!
Thanks for posting this @jonsmirl
My do-nothing Go lambda that just returns “Hello world” runs in 0.31ms. This is on a 512MB lambda node. That is just the lambda time, not from the browser.
For my 18ms function that does the DynamoDB read, the fastest round trip I have observed is 49ms. I switched over to the do-nothing function and I’ve observed 40ms round trips when making direct calls to lambda.
When I say directly hitting lambda - I am using AWS Sig4 to sign direct calls to https://lambda.us-east-1.amazonaws.com
Edit: the surprising part to me was that Cloudfront was so much slower. Apparently the public Internet pipes between Boston and Virginia are pretty good. Probably almost the same speed as the Cloudfront private pipes. But… Cloudfront is an https termination so they have to decrypt and re-encrypt my https traffic. That appears to take on the order of 30 ms which adds far more delay than the dedicated AWS pipes can remove. Also note that API calls can’t be cached so Cloudfront caching does nothing. I do have my web app cached in Cloudfront and that is a clear benefit.
Costwise: API Gateway is the most expensive option. Cloudfront and Direct Lambda appear to be a wash. The data transfer rate is higher for Direct Lambda, but Direct Lambda does not have the $1/million HTTPS request surcharge Cloudfront has.
Love your post, @jonsmirl Thanks for posting.
For people that are worried about managing versions of an API while making direct lambda calls, lambda supports aliases. Your app does not need to be pointed at $LATEST of a lambda function.
You can add alias suffixes to your lambda ARNs.
There are commands that you can use to do something like map PROD to V21 of your function. Then deploy a new version V22 and when you are happy with it, set PROD equal to V22. To rollback set PROD back to V21. The app never changes.
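To make the alias idea concrete, here is a minimal sketch of how an alias-qualified ARN is formed and how the PROD pointer moves. The account ID and function name are hypothetical, and the dictionary only models the alias mapping; it does not call AWS.

```python
def qualified_arn(function_arn: str, alias: str) -> str:
    """Append an alias (or version number) to an unqualified function ARN."""
    parts = function_arn.split(":")
    # An unqualified Lambda function ARN has 7 colon-separated fields:
    # arn:aws:lambda:<region>:<account-id>:function:<name>
    if len(parts) != 7 or parts[2] != "lambda" or parts[5] != "function":
        raise ValueError(f"not an unqualified Lambda function ARN: {function_arn}")
    return f"{function_arn}:{alias}"


# Hypothetical account ID and function name, for illustration only.
base = "arn:aws:lambda:us-east-1:123456789012:function:getReadings"
prod_arn = qualified_arn(base, "PROD")  # the app always calls this ARN

# In-memory model of the alias pointer (not an AWS call).
aliases = {"PROD": "21"}  # PROD points at version 21
aliases["PROD"] = "22"    # promote version 22 once you are happy with it
aliases["PROD"] = "21"    # roll back; prod_arn never changes
print(prod_arn, "->", aliases["PROD"])
```

The point is that the client only ever holds the `:PROD` ARN, so promotion and rollback happen entirely on the AWS side.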
I checked out the cold start time this morning. With cold start the lambda and database results should have fallen out of the cache overnight. This is round trip from the browser.
Cold start response: 639ms
Subsequent call: 58ms
How do you call the lambda directly from browser? without going thru apigw?
You can call every AWS service directly; it is what mobile apps do. For example, the URL for lambda is: https://lambda.us-east-1.amazonaws.com
Then you look up the API in the documentation, for example lambda invoke.
Most AWS API calls are POST. This one is:
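For reference, the Lambda Invoke API is documented as `POST /2015-03-31/functions/FunctionName/invocations`, with an optional `Qualifier` query parameter for a version or alias. Here is a minimal sketch of building that request line; the function name `getReadings` is hypothetical, and the Sig4 signing that the request still needs is not shown.

```python
from typing import Optional, Tuple
from urllib.parse import quote, urlencode

API_VERSION = "2015-03-31"  # version segment in the Lambda Invoke path


def invoke_request(region: str, function_name: str,
                   qualifier: Optional[str] = None) -> Tuple[str, str]:
    """Return (HTTP method, URL) for a direct Lambda Invoke call."""
    url = (
        f"https://lambda.{region}.amazonaws.com/{API_VERSION}"
        f"/functions/{quote(function_name, safe='')}/invocations"
    )
    if qualifier:
        # Qualifier selects a version or alias such as PROD.
        url += "?" + urlencode({"Qualifier": qualifier})
    return "POST", url


method, url = invoke_request("us-east-1", "getReadings", "PROD")
print(method, url)
```

The JSON payload for the function goes in the POST body.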
Your browser needs AWS credentials to make the call. I am using aws-amplify to get those credentials:
Finally you need to make the request. amplify has a routine in it to make the signed request. These signed requests are just like calling API Gateway using IAM security.
There is an example on this page that makes a direct lambda call
this is the key bit:
I’ve been playing with cost reduction even more. For small functions Go will always finish way under the 100ms slice with 512MB. For most of my functions I am able to push the memory down to 128MB and they still finish in under 100ms. Since the CPU is slower for 128MB they run longer, maybe even double, going from 18ms to 36ms. But now they cost 1/4 as much to run.
My top problem now is lambda charges relating to latency. I call into AWS IOT from lambda using the SDK. The latency on those calls is terrible. The Publish() call, which I do a lot of, will randomly vary between 100ms and 3000ms to complete. Since I always make that call with the same parameters, that latency is some mess inside of AWS. This is annoying since it adds 3000ms to my lambda bill on every lambda call while the lambda twiddles its thumbs waiting for AWS IOT to respond. That is why I wanted to drop the memory down from 512 to 128. That cuts my wait time charge 75%.
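The arithmetic behind those two savings can be sketched as follows. The 100ms slice and the 18ms/36ms/3000ms figures come from the posts above; the per-GB-second price is an illustrative assumption, so check current Lambda pricing and billing granularity before relying on the absolute numbers.

```python
import math

PRICE_PER_GB_SECOND = 0.00001667  # assumption: illustrative rate only
BILLING_SLICE_MS = 100            # the 100ms billing increment described above


def billed_ms(duration_ms: float) -> int:
    """Round a run time up to the next billing slice."""
    return math.ceil(duration_ms / BILLING_SLICE_MS) * BILLING_SLICE_MS


def invocation_cost(duration_ms: float, memory_mb: int) -> float:
    """Duration charge for one invocation (request fee not included)."""
    gb_seconds = (memory_mb / 1024) * (billed_ms(duration_ms) / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND


fast = invocation_cost(18, 512)        # 18ms at 512MB: one slice
slow = invocation_cost(36, 128)        # 36ms at 128MB: still one slice
wait_512 = invocation_cost(3000, 512)  # stuck waiting on a slow Publish()
wait_128 = invocation_cost(3000, 128)
print(f"small function: {slow / fast:.2f}x, slow wait: {wait_128 / wait_512:.2f}x")
```

Both ratios come out to one quarter: doubling the run time costs nothing while it stays inside the same slice, and time spent waiting on a remote call is billed in proportion to memory.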
I turned on AWS X-Ray for some of my lambda functions. X-Ray is very useful. It tells me how long my calls sit in the queue before lambda acts on them, and then it tells me how long AWS SDK calls to remote services take. Some AWS SDK calls execute locally so they don’t need special measurement plumbing.
What I’ve observed so far…
DynamoDB is lightning fast. Typical response time is 5-6ms.
IOT services are pig slow. Typical response time 25-60ms.
All of my IOT devices have a human owner who has a Cognito ID. When I get a message from a device I do iot.ListPrincipalThings() immediately followed by iot.ListThingPrincipals(). That uses AWS iot to track which CognitoID owns the device. Those are trivially simple calls, one string for input, one string for output. Just a simple DB lookup. But those calls take 50ms each to run. So the pair of them is running over 100ms.
I’m going to rewrite my stuff now to keep my own database of who owns each device in DynamoDB where I can access it in a single 5ms call.
You’ve done a great job examining and logging your findings. These are quite helpful, so thank you.
I can just offer a few observations, which might be useful.
If you are trying to optimize lambda response for speed, especially with API Gateway involved, it’s pretty much an impossible task due to latency between the various services. I can spin up an EC2 endpoint on something like a LAMP stack and get 5ms response time (end to end) with proper caching. Lambda can return a response pretty fast as well, but it could still be running, especially if you did some async operation.
Honestly I don’t think lambda is the best solution for APIs that require an immediate or near-real-time response. Unless you can use caching, but then the underlying solution doesn’t matter as much.
To offer some tips to speed things up…
On Dynamo increase your read/write capacity units.
If you are planning to query by something other than primary key, you’re going to need an index. Be mindful.
Use one file per function. Don’t do handler.save or handler.get in one file. This tends to grow unnecessarily big quickly.
Be mindful of cold starts; they can add as much as a second to your response time because the container needs to come “alive” again. Unless your API is hit nonstop, consider using a Serverless plugin to keep your lambdas warm.
Remember that billing is done in 100ms increments, so you are not going to see any savings if your function runs for 5ms vs 85ms.
More memory = faster running function. You need to find a balance for each use case.
If you see strange outliers in terms of response time or timeouts, make sure you’re exiting correctly and not just leaving some process to accidentally run in the background until things timeout.
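On the keep-warm tip, a handler usually needs to recognize the warm-up ping and return early so the ping does no real work. A minimal sketch, assuming the scheduler sends an event with a `source` marker; the marker value and the `do_work` helper are assumptions for illustration, so check what your plugin actually sends.

```python
# Assumption: the warm-up scheduler tags its events with this source value.
WARMUP_SOURCE = "serverless-plugin-warmup"


def do_work(event):
    """Hypothetical stand-in for the real business logic."""
    return f"handled request for {event.get('user', 'unknown')}"


def handler(event, context):
    # Short-circuit warm-up pings so they never touch the database.
    if isinstance(event, dict) and event.get("source") == WARMUP_SOURCE:
        return {"warmed": True}
    return {"statusCode": 200, "body": do_work(event)}


print(handler({"source": WARMUP_SOURCE}, None))
print(handler({"user": "u1"}, None))
```

Returning immediately also keeps each ping inside a single billing slice.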
My use case involves several hundred thousand sensors uploading tiny bits of data to store in a database. But the sensor traffic is variable and it depends on what the sensor is seeing. The sensor may go quiet for hours and then burst up 200 data points in five minutes. I use IOT because IOT is very cheap for tracking whether the sensor is online or offline. If the sensor goes offline I get an IOT message. Scaling is also important since the sensors wake up in a somewhat coordinated fashion. Occasionally thousands will try and transmit at the same time.
Now it is obvious to me that iot:xxx calls are extremely slow. ListPrincipalThings and ListThingPrincipals are used as a pair to determine which customer owns the IOT device. I thought those simple calls would run instantly, but X-Ray shows that they do not run quickly, and running the pair takes over 100ms. These two calls force my trivial lambda from one 100ms slice into two slices.
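The slice arithmetic works out like this. The 50ms per IOT call and the 5ms DynamoDB lookup are the figures reported earlier in the thread; the 20ms for the handler's own work is an assumption for illustration.

```python
import math

SLICE_MS = 100
HANDLER_OWN_MS = 20   # assumption: the handler's own work, excluding lookups
IOT_LOOKUP_MS = 50    # per call, as reported from X-Ray
DYNAMO_LOOKUP_MS = 5  # typical DynamoDB response time reported earlier


def billed_slices(total_ms: float) -> int:
    """Number of 100ms slices a run of total_ms is billed for."""
    return math.ceil(total_ms / SLICE_MS)


with_iot = HANDLER_OWN_MS + 2 * IOT_LOOKUP_MS      # two IOT lookups in a row
with_dynamo = HANDLER_OWN_MS + DYNAMO_LOOKUP_MS    # one DynamoDB lookup
print(billed_slices(with_iot), "slices vs", billed_slices(with_dynamo), "slice")
```

Under these numbers the pair of IOT calls doubles the duration charge, while the single DynamoDB lookup leaves plenty of room inside one slice.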
Now that I know where the problem is I am going to complain to the IOT team to fix this. Meanwhile I will rework my upload strategy to not use these calls.
@jonsmirl, regarding the JIT comment. I can understand it is true for cold start, but is it still true for warm start?
Another important side effect: you can get away with far less memory when using Go. That will let you set your memory slice down to the minimum size (i.e. 1/4 the price).