Hey everyone!
I’m building a small app using serverless. I’m writing some web scraping in an AWS Lambda, exposed through an API Gateway, in order to retrieve some data.
The problem is that my Lambda times out (I raised the timeout to 30s just to be sure), and I have no clue how to debug this in the context of serverless.
Here is my Lambda:
import json

import requests
from lambda_decorators import cors_headers


@cors_headers
def handler(event, context):
    print("Was called with params : " + str(event))
    teamId = event["queryStringParameters"]['team_id']
    print(teamId)
    body = callNbaStatsApi(teamId)
    return {
        'statusCode': 200,
        # The Lambda proxy integration expects 'body' to be a string, not a dict
        'body': json.dumps(body)
    }


def callNbaStatsApi(teamId):
    print("Start callNbaStatsApi")
    nbaStatsUrl = 'https://stats.nba.com/stats/teamdashboardbygeneralsplits?DateFrom=&DateTo=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlusMinus=N&Rank=N&Season={0}&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&Split=general&TeamID={1}&VsConference=&VsDivision='
    # Browser-like headers sent along with the request
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0',
        'x-nba-stats-origin': 'stats',
        'Referer': 'https://stats.nba.com/',
    }
    url = nbaStatsUrl.format('2019-20', teamId)
    print(url)
    r = requests.get(url, headers=headers)
    print(r)
    print("End callNbaStatsApi")
    return r.json()


if __name__ == "__main__":
    # Local test: prints the data for one team
    print(callNbaStatsApi('1610612749'))
I noticed two things:
- When I run the Python script on my own laptop, the request works and prints the NBA data.
- When the request goes to http://www.google.com, it does work inside the AWS Lambda environment.
Given all this, I suppose something makes https://stats.nba.com unreachable from the Lambda, but I don’t know what.
Also, in case it matters, I have no VPC configuration on my Lambda.
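One thing that might help narrow this down is a small probe with a short timeout, so the failure shows up in the logs instead of as a Lambda timeout. This is only a rough sketch using the standard library: `probe` and `classify` are names made up for illustration, and the 5-second timeout is an arbitrary value chosen to stay well under the Lambda timeout. A `timeout` for stats.nba.com next to an `ok` for google.com would suggest that the request leaves the Lambda but that host never answers, rather than a general networking problem.

```python
import socket
import time
import urllib.error
import urllib.request


def classify(exc):
    """Map an exception raised by urlopen to a coarse failure stage."""
    if isinstance(exc, urllib.error.HTTPError):
        # The server answered, but with an error status (e.g. 403).
        return "http_error"
    # urllib wraps most connection failures in URLError; unwrap the cause.
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(reason, socket.timeout):
        return "timeout"
    if isinstance(reason, socket.gaierror):
        return "dns_error"
    return "connection_error"


def probe(url, headers=None, timeout=5.0):
    """Try a GET and return (stage, detail, elapsed_seconds)."""
    request = urllib.request.Request(url, headers=headers or {})
    started = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            stage, detail = "ok", str(response.status)
    except OSError as exc:
        # HTTPError, URLError and socket.timeout are all OSError subclasses.
        stage, detail = classify(exc), repr(exc)
    return stage, detail, round(time.monotonic() - started, 2)
```

Calling `print(probe(url, headers))` at the top of `handler`, once for http://www.google.com and once for the stats.nba.com URL, should give two comparable lines in the CloudWatch logs.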
The real question here is simply: how would you go about debugging such behavior?
Thank you for your time