I have a Python script I’ve been running at home. It queries ~200 web pages to check for updates in any of my various work groups. The system only provides hourly emails - I want notifications at 10 minute intervals - so I wrote a polling script.
It uses aiohttp and asyncio pretty heavily to do the initial session login and set up credentials, and then farms out and harvests all the 200 requests into a list of groups with activity - if any.
When I ported this to Lambda - I had to cut out all the asyncio and use straight synchronous requests. The Lambda function timed out after 6s… So I’m not doing this right.
Should I be thinking about this by:
having the initial trigger check for a valid login session, and create one if there isn’t one
creating a loop over all active teams that triggers a different Lambda function to check each group
adding some kind of harvesting function that is triggered once all the groups are finished
Questions:
1.) How long is the overall execution time when you run your script at home?
2.) What is the error message when it fails on AWS Lambda?
3.) Do you use any non-standard python libraries, for example requests?
4.) Which runtime did you choose?
5.) How did you trigger your lambda function? schedule? api gateway?
Some notes:
1.) The asyncio stuff should run on AWS Lambda, because it’s plain Python. But of course you have to use runtime: python3.6
2.) The timeout property for the function should be set appropriately - otherwise timeout can occur before the function finishes. Timeout can be set up to 5 minutes for a lambda function.
3.) If you use non-standard Python libraries, maybe you forgot to deploy them?
Hey Franky! Thanks for grabbing ahold of this. Here’s some answers:
1.) Around 60s - most of that is waiting for 700 or so HTTP requests to come back.
2.) The error is that the execution time is too long - 6s?? - I think.
3.) Yes, here are the imports for the synchronous version:
import json
import requests
import boto3
import itertools
from pprint import pprint
from datetime import datetime, timedelta
from bs4 import BeautifulSoup
4.) python3.6
5.) Right now I just hit the test button.
Thanks for these. Here’s my replies:
my serverless.yml has:
provider:
  name: aws
  runtime: python3.6
so I think the runtime is fine. The real problem was I had no idea how to write the loop that handles all the asyncio stuff. The standalone script has the following:
I have no idea how to encode that loop in a lambda handler. I’m very new to Lambda
I have not set the timeout property - so I’m guessing the default is 6s - since that’s the error I get.
I’m using sls deploy at the moment and together with docker it’s uploading a 6 MB .zip file each time (why is it so big???) - which makes it really hard for me to test over my unstable rural internet connection.
Thanks - yes, that does let the synchronous version run to completion, thank you.
I’m curious about getting the asynchronous version running. I will attempt to add in the aiohttp and the event loop grabbing code even though I’m not sure how it will all work with Lambda
I think there should be no problem running the asyncio stuff if you select the python3.6 runtime, because it’s native Python.
Just make sure all your external (non-standard) libraries are asyncio-compatible.
AFAIK the requests library is blocking, so it is not compatible with asyncio.
However if you come close to the 300sec limit with the total execution time of your lambda, you have to rethink. A possible architecture may be:
A lambda function that initially feeds all the jobs into an Amazon SQS queue or Amazon SNS topic.
A worker lambda function then grabs one job from the queue/topic and runs it.
This can be scaled up if you allow parallel execution of the worker lambda function.
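A rough sketch of that feeder/worker split, in case it helps. It assumes the SQS variant with the queue wired up as the trigger of the worker function. The names enqueue_groups and worker_handler, the queue URL, and the {"group_id": ...} message format are all made up for illustration. The SQS client is passed in so the code can be tried without AWS access; in a real function it would be boto3.client("sqs").

```python
import json


def enqueue_groups(sqs_client, queue_url, group_ids):
    """Feeder: put one message per group on the queue.

    send_message_batch accepts at most 10 entries per call, so the
    group ids are sent in chunks of 10.
    """
    queued = 0
    for start in range(0, len(group_ids), 10):
        batch = group_ids[start:start + 10]
        entries = [
            # "Id" only has to be unique within a single batch.
            {"Id": str(i), "MessageBody": json.dumps({"group_id": g})}
            for i, g in enumerate(batch)
        ]
        # A real version should also inspect the response's "Failed" list.
        sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        queued += len(batch)
    return queued


def worker_handler(event, context):
    """Worker: Lambda passes the SQS messages in event["Records"]."""
    checked = []
    for record in event["Records"]:
        job = json.loads(record["body"])
        # The actual page check for job["group_id"] would go here.
        checked.append(job["group_id"])
    return {"checked": checked}
```

The feeder would be the function you trigger on a schedule, and the worker runs once per batch of messages. How the results get harvested afterwards is left open here.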
but now my AWS handler needs event and context to be passed. I’m very new to python and even newer to AWS. I’ve been googling and reading for days but I’m still pretty clueless.
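For what it’s worth, here is a minimal sketch of how an asyncio loop can sit inside a handler that takes event and context. It is not your script: check_group is a hypothetical stand-in that just sleeps where the aiohttp request would go, and the "group_ids" key in the event is an assumption. The handler itself stays a plain synchronous function, which is what Lambda calls; the event loop is created and run inside it.

```python
import asyncio


async def check_group(group_id):
    """Hypothetical stand-in for one page check.

    A real version would await an aiohttp request here.
    """
    await asyncio.sleep(0.01)  # simulates waiting on the network
    return {"group": group_id, "active": group_id % 2 == 0}


async def check_all(group_ids):
    # Start every check concurrently and wait until all have finished.
    results = await asyncio.gather(*(check_group(g) for g in group_ids))
    return [r["group"] for r in results if r["active"]]


def handler(event, context):
    # Lambda calls this function with the trigger payload (event) and
    # runtime information (context); neither has to be used.
    group_ids = event.get("group_ids", [])
    loop = asyncio.new_event_loop()
    try:
        active = loop.run_until_complete(check_all(group_ids))
    finally:
        loop.close()
    return {"active_groups": active}
```

Calling handler({"group_ids": [1, 2, 3, 4]}, None) locally runs the same code path as the test button would, so the loop can be tried out before deploying.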