I’m testing the serverless example using Python and DynamoDB. I deployed it with no alterations, and I’m seeing response times upwards of 1500ms (often over 4000ms!) to create/list the ‘Todos’ on more than 1 in 10 requests. I understand that Lambda occasionally has cold-start times for a function, but the volume of high-latency requests is absurd.
To see if this was specific to DynamoDB response times, I even tested with a simple ‘hello world’ that returns a JSON response without touching the DB – same issue.
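For context, the no-DB control test was roughly a handler like the sketch below – paraphrased, not the exact code from the example:

import json

def hello(event, context):
    # Bare handler: no DynamoDB call, just a static JSON body in the
    # API Gateway Lambda-proxy response format.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": "hello world"}),
    }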
Is this typical in production, or is there some optimization I’m missing that didn’t ship with the example? As it stands, I’m not sure how using serverless for a production application is tenable.
What region are you in? The cold start times are usually very low (even for Python) and the round trip to DynamoDB is generally only a few milliseconds. Maybe there is a network issue or something else happening, but this is not typical by any means. Once a function is “warm”, it typically stays warm for several minutes. Response times should be almost instantaneous with a warm function.
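If you want to rule cold starts in or out, a common trick is to hit the function on a schedule and have the handler short-circuit those pings. Here’s a rough sketch – the “warmer” flag and the scheduled event are assumptions on my part, not something the example ships with:

import json

def handler(event, context):
    # Short-circuit scheduled warm-up pings so they return immediately
    # and never touch DynamoDB. Assumes a CloudWatch scheduled event
    # that sends a payload like {"warmer": true}.
    if isinstance(event, dict) and event.get("warmer"):
        return {"warmed": True}

    # ...normal request handling would continue here...
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "ok"}),
    }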
I deployed to us-east-1 for the initial test – could region play that large a factor for a simple function? Happy to test in us-west-1 or us-west-2 if you think it’d make a difference.
I tried testing on several networks today and noticed the same behavior, with request times in the 600–1500ms range every seven or eight requests. I also tried making requests in Postman, Chrome, and via the Python shell just to see if there were any weird interactions… but the same behavior persists.
I threw together a quick and dirty script to show what I mean… it runs 1000 requests against a simple List endpoint connected to DynamoDB. It’s equivalent to the Python API DynamoDB example; I just changed the table name/endpoint.
Here is the code:
import timeit
import requests

url = "https://378cqxe0e4.execute-api.us-east-1.amazonaws.com/dev/configurations"
num_runs = 1000
total = 0
long_runs = []  # durations (seconds) of requests slower than 300ms
highest = 0     # slowest single request seen (seconds)

for x in range(0, num_runs):
    start = timeit.default_timer()
    response = requests.get(url)
    stop = timeit.default_timer()
    time = stop - start
    total += round(time, 5)
    if time > 0.3:
        long_runs.append(round(time, 5))
    if time > highest:
        highest = round(time, 5)

print("total time to run: %sms" % str(total * 1000))
print("average time to run: %sms" % str((total * 1000) / num_runs))
print("num runs over 300ms: %d" % len(long_runs))
print("highest run time: %dms" % (highest * 1000))
And here are the run results:
total time to run: 224860.56ms
average time to run: 224.86056ms
num runs over 300ms: 87
highest run time: 3066ms
There’s an unexpectedly high number of long-running requests for such a simple function, 87 out of 1000. Thoughts?
Assuming your URL endpoint is correct, these are the results for 100 requests with a concurrency of 1 from the other side of the planet (Sydney, Australia), using the Apache Benchmark tool.
Percentage of the requests served within a certain time (ms)
50% 504
66% 530
75% 562
80% 593
90% 899
95% 933
98% 976
99% 1075
100% 1075 (longest request)
I did hit the URL once before running the test to make sure it was warm. It’s nothing like the 3,000ms response time you’re seeing.
INFO Max requests: 100
INFO Concurrency level: 1
INFO Agent: none
INFO Completed requests: 100
INFO Total errors: 0
INFO Total time: 13.91271278 s
INFO Requests per second: 7
INFO Mean latency: 139 ms
INFO Percentage of the requests served within a certain time
INFO 50% 132 ms
INFO 90% 158 ms
INFO 95% 177 ms
INFO 99% 436 ms
INFO 100% 436 ms (longest request)
Thanks for looking into it Jeremy and buggy. If you wouldn’t mind, could one of you gentlemen share the code you’re using to test? I just re-ran my own test this morning on the same network (Denver, CO) and got the following results:
number requests: 3000
average time to run: 165.4704ms
num runs over 300ms: 127
highest run time: 1837ms
A much lower average time to run, and a far more acceptable number of long-running requests. The results seem to be improved from last night… is there a ‘buffer time’ during which the code is deployed/loaded within the AWS infrastructure?
INFO Max requests: 1000
INFO Completed requests: 1000
INFO Total time: 307.64351615 s
INFO Mean latency: 307.6 ms
INFO Percentage of the requests served within a certain time
INFO 50% 180 ms
INFO 90% 696 ms
INFO 95% 977 ms
INFO 99% 1104 ms
INFO 100% 1844 ms (longest request)
And for sanity’s sake, here’s the AWS dashboard showing > 4s invocations yesterday:
For what it’s worth, I did some testing in the other US regions and got stellar results, e.g. 95% below 180ms in us-west-2. It seems to be an issue solely with us-east-1, which is obviously the default region for a lot of services… maybe it’s oversaturated or they’re experiencing internal networking issues.
There are a lot of places where latency is added. Start by eliminating the network between you and the API Gateway endpoint by using ping. If that’s OK, then look at your CloudWatch logs and/or dashboards for slow Lambda invocations. Also consider enabling X-Ray.
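Instrumenting the handler with the X-Ray SDK looks roughly like the sketch below. It’s an assumption-heavy example: it assumes the aws-xray-sdk package is bundled with the function, active tracing is enabled on the Lambda, and the table name is a placeholder. The point is that the DynamoDB call then shows up as its own subsegment in the trace:

import json

import boto3
from aws_xray_sdk.core import patch_all

patch_all()  # patch boto3 (and other supported libraries) to record subsegments

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("todos")  # placeholder table name

def list_todos(event, context):
    # The table.scan() call will appear as a DynamoDB subsegment in X-Ray,
    # so you can see exactly how much of the latency it accounts for.
    result = table.scan()
    return {"statusCode": 200, "body": json.dumps(result["Items"], default=str)}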
If your Lambda is accessing external resources like DynamoDB, make sure you have sufficient read capacity provisioned. Is it possible your first few requests are fast because they’re drawing on burst capacity, but later requests can’t? You mentioned doing 1,000 requests; I only performed 100.
Are you prewarming the Lambda or does your test include a cold start?
Are you using concurrency > 1? If so, is it possible additional cold starts are causing the problem, or that this is exposing a lack of provisioning in other resources (DynamoDB)?
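If you want to double-check what the table actually has provisioned, a quick boto3 describe_table call will show it – a sketch with a placeholder table name and region:

import boto3

# Placeholder table name and region; substitute your own.
client = boto3.client("dynamodb", region_name="us-east-1")
table = client.describe_table(TableName="todos")["Table"]
throughput = table.get("ProvisionedThroughput", {})
print("Read capacity units: ", throughput.get("ReadCapacityUnits"))
print("Write capacity units:", throughput.get("WriteCapacityUnits"))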