Moving a Python script to AWS Lambda: how to shift from asynchronous to serverless

aws
lambda

#1

Hi everyone, this is my first post in the forum.

I have a Python script I’ve been running at home. It queries ~200 web pages to check for updates in any of my various work groups. The system only sends hourly emails, and I want notifications at 10-minute intervals, so I wrote a polling script.

It uses aiohttp and asyncio heavily: it does the initial session login and sets up credentials, then farms out all ~200 requests and harvests the results into a list of groups with activity (if any).
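The fan-out-and-harvest step described above can be sketched with asyncio alone. This is a minimal sketch, not the original script: `check_group` is a hypothetical stand-in that sleeps instead of making a real aiohttp request with the logged-in session, and the concurrency limit of 20 is an arbitrary assumption.

```python
import asyncio

async def check_group(group_id):
    # Hypothetical stand-in for one aiohttp request made with the logged-in session.
    await asyncio.sleep(0.01)  # simulate network latency
    return group_id, group_id % 3 == 0  # pretend every third group has activity

async def find_active_groups(group_ids, limit=20):
    # Cap the number of requests in flight at once.
    semaphore = asyncio.Semaphore(limit)

    async def bounded(group_id):
        async with semaphore:
            return await check_group(group_id)

    # gather() runs the checks concurrently and returns results in input order.
    results = await asyncio.gather(*(bounded(g) for g in group_ids))
    return [group_id for group_id, active in results if active]
```

On Python 3.7+ this can be driven with `asyncio.run(find_active_groups(range(200)))`; on the python3.6 runtime, the `loop.run_until_complete(...)` pattern from the original script does the same job.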

When I ported this to Lambda, I had to cut out all the asyncio code and use plain synchronous requests. The Lambda function timed out after 6s… so I’m not doing this right.

Should I be thinking about it like this:

  1. having the initial trigger check for a valid login session, and create one if there isn’t one
  2. looping over all active teams and triggering a separate Lambda function to check each group
  3. having some kind of harvesting function that is triggered once all the groups are finished
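Step 2 could be sketched with boto3’s asynchronous Lambda invocation (`InvocationType="Event"`), which queues the event and returns without waiting for the worker. This is a sketch under assumptions: the worker name `check-group-worker` and the `{"group_id": ...}` payload shape are hypothetical, and the client is passed in (in real use, `boto3.client("lambda")`) so the function can be exercised without AWS.

```python
import json

def fan_out(lambda_client, group_ids, worker_name="check-group-worker"):
    """Invoke one worker Lambda per group without waiting for results.

    worker_name and the payload format are hypothetical placeholders.
    """
    count = 0
    for group_id in group_ids:
        lambda_client.invoke(
            FunctionName=worker_name,
            InvocationType="Event",  # async: returns once the event is queued
            Payload=json.dumps({"group_id": group_id}).encode("utf-8"),
        )
        count += 1
    return count
```

This covers only the fan-out; nothing here tells the harvesting step (3) when all workers have finished, so that part still needs its own mechanism.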

Is this the serverless way to architect this?

How would step 3 get triggered?

Thanks for helping out! Cheers, jas…


#2

Questions:
1.) How long is the overall execution time when you run your script at home?
2.) What is the error message when it fails on AWS Lambda?
3.) Do you use any non-standard Python libraries, for example requests?
4.) Which runtime did you choose?
5.) How do you trigger your Lambda function: on a schedule? Via API Gateway?

Some notes:
1.) The asyncio code should run on AWS Lambda, because it’s plain Python. But of course you have to use a Python 3 runtime: python3.6.
2.) The timeout property for the function should be set appropriately; otherwise the function can time out before it finishes. The timeout can be set to at most 5 minutes for a Lambda function.
3.) If you use non-standard Python libraries, did you perhaps forget to deploy them?
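Related to note 2.): besides raising the timeout, a handler can ask the Lambda context object how much time is left via `get_remaining_time_in_millis()` and stop cleanly before it is killed. A minimal sketch, assuming a hypothetical `check_page` helper, an event shaped like `{"urls": [...]}`, and an arbitrary 5-second safety margin:

```python
import time

SAFETY_MARGIN_MS = 5000  # hypothetical: leave 5 s to return partial results

def check_page(url):
    # Hypothetical placeholder for the real (blocking) page check.
    time.sleep(0.01)
    return False

def handler(event, context):
    urls = event.get("urls", [])
    checked, active = 0, []
    for url in urls:
        # Stop before Lambda hits its timeout so partial work isn't lost.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            break
        if check_page(url):
            active.append(url)
        checked += 1
    return {"checked": checked, "total": len(urls), "active": active}
```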

Just my two cents…


#3

Hey Franky! Thanks for grabbing ahold of this. Here are some answers:

  1. around 60s, most of which is waiting for 700 or so HTTP requests to come back
  2. I think the error is that the execution time is too long (6s??)
  3. Yes, here are the imports for the synchronous version:

import json
import requests
import boto3
import itertools
from pprint import pprint
from datetime import datetime, timedelta
from bs4 import BeautifulSoup

  4. python3.6
  5. right now I just hit the Test button

Thanks for these notes. Here are my replies:

  1. my serverless.yml has:

provider:
  name: aws
  runtime: python3.6

so I think the runtime is fine. The real problem was that I had no idea how to write the loop that handles all the asyncio stuff. The standalone script has the following:

import asyncio

loop = asyncio.get_event_loop()
future = asyncio.ensure_future(main())
loop.run_until_complete(future)

I have no idea how to encode that loop in a Lambda handler. I’m very new to Lambda.

  2. I have not set the timeout property, so I’m guessing the default is 6s, since that’s the error I get.
  3. I’m using sls deploy at the moment, and together with Docker it uploads a 6 MB .zip file each time (why is it so big???), which makes it really hard for me to test over my unstable rural internet connection.

Thanks for your time, jas…


#4

Add the timeout property to your function in serverless.yml:

timeout: 120 # Sets the timeout to 120 seconds, maximum possible value is 300 = 5 minutes

Then your Lambda can run for up to 120 seconds, which should be enough.
See the example in the Serverless docs.


#5

Thanks, yes, that lets the synchronous version run to completion.

I’m curious about getting the asynchronous version running. I will try adding the aiohttp and event-loop code back in, even though I’m not sure how it will all work with Lambda.


#6

I think there should be no problem running the asyncio code if you select the python3.6 runtime, because asyncio is part of the standard library.
Just make sure all your external (non-standard) libraries are asyncio-compatible.
AFAIK the requests library is blocking and therefore not asyncio-compatible.
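If you want to keep using requests but still get concurrency, one common workaround (swapped in here, not something suggested in this thread) is to push each blocking call onto a thread with `loop.run_in_executor`. A sketch assuming a hypothetical `blocking_fetch` that stands in for `requests.get(url).text`:

```python
import asyncio
import time

def blocking_fetch(url):
    # Hypothetical stand-in for requests.get(url).text; any blocking call works the same way.
    time.sleep(0.2)
    return "body of " + url

async def fetch_all(urls):
    # Inside a coroutine this returns the running loop (works on python3.6 too).
    loop = asyncio.get_event_loop()
    # Each blocking call runs in the loop's default ThreadPoolExecutor.
    futures = [loop.run_in_executor(None, blocking_fetch, url) for url in urls]
    return await asyncio.gather(*futures)
```

With four 0.2 s calls this should finish in roughly 0.2 s rather than 0.8 s, because the calls overlap in the thread pool. Switching to aiohttp avoids the threads entirely, since it is natively async.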

However, if your total execution time comes close to the 300-second limit, you have to rethink the design. A possible architecture is:

  • A Lambda function that first puts all the jobs into an Amazon SQS queue or Amazon SNS topic.
  • A worker Lambda function that receives one job from the queue/topic and runs it.

This can be scaled up by allowing parallel executions of the worker Lambda function.
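The queue-based architecture above might look roughly like this with SQS. A sketch, assuming the queue is configured as the worker’s event source; `enqueue_jobs`, `worker_handler`, `process_group`, and the message body format are hypothetical names, and the SQS client (in real use `boto3.client("sqs")`) is passed in so the code can be tried without AWS. `send_message_batch` accepts at most 10 entries per call, hence the chunking.

```python
import json

BATCH_LIMIT = 10  # SQS send_message_batch accepts at most 10 entries

def enqueue_jobs(sqs_client, queue_url, group_ids):
    """Producer: send one message per group, in batches of up to 10."""
    group_ids = list(group_ids)
    sent = 0
    for start in range(0, len(group_ids), BATCH_LIMIT):
        chunk = group_ids[start:start + BATCH_LIMIT]
        # Ids only need to be unique within a single batch.
        entries = [
            {"Id": str(i), "MessageBody": json.dumps({"group_id": g})}
            for i, g in enumerate(chunk)
        ]
        sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        sent += len(entries)
    return sent

def process_group(group_id):
    # Hypothetical placeholder for the real per-group check.
    return {"group_id": group_id, "active": False}

def worker_handler(event, context):
    """Worker: an SQS-triggered Lambda receives messages in event["Records"]."""
    results = []
    for record in event["Records"]:
        job = json.loads(record["body"])
        results.append(process_group(job["group_id"]))
    return results
```

In production you would also inspect the `Failed` list that `send_message_batch` returns, since a batch can partially fail.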

Just my two cents…


#7

Hi Franky,

Thanks for helping.

How should the event loop code be written? In my old async code I had these lines at the end:

loop = asyncio.get_event_loop()
future = asyncio.ensure_future(main())
loop.run_until_complete(future)

but now my AWS handler needs event and context to be passed in. I’m very new to Python and even newer to AWS. I’ve been googling and reading for days, but I’m still pretty clueless.


#8

I tried using the following:

async def run_task_checks():
    # my normal code
    ...

async def hello(event, context):
    loop = asyncio.get_event_loop()
    future = asyncio.ensure_future(run_task_checks())
    loop.run_until_complete(future)

But I get the error “RuntimeWarning: coroutine ‘hello’ was never awaited” and I’m not sure how to proceed. Any help is greatly appreciated.


#9

OK, I think I got it. It should be:

def hello(event, context):

and not

async def hello(event,context):

Seems to be working now! Thanks for holding my hand…
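For reference, the fix from #9 put together: Lambda calls the handler as a plain function, so an `async def` handler just returns a coroutine object that nobody awaits, which is exactly the warning from #8. A minimal sketch, where `run_task_checks` is a placeholder for the real aiohttp code and its return value is hypothetical:

```python
import asyncio

async def run_task_checks():
    # Placeholder for "my normal code": the aiohttp login and the page checks.
    await asyncio.sleep(0)
    return {"active_groups": []}  # hypothetical result shape

def hello(event, context):
    # Lambda calls the handler synchronously, so it must be a plain def.
    # A fresh loop per invocation works on python3.6 and newer versions alike.
    loop = asyncio.new_event_loop()
    # Make it current for libraries that look it up via get_event_loop().
    asyncio.set_event_loop(loop)
    try:
        return loop.run_until_complete(run_task_checks())
    finally:
        loop.close()
```

On Python 3.7+ runtimes, `return asyncio.run(run_task_checks())` does the same thing in one line; `asyncio.run` does not exist on python3.6.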