Python Lambda logging duplication workaround

aws
#1

We’ve been building some complex Python code on the Serverless framework w/AWS Lambda. Recently we spent a month resolving a Lambda malfunction with Amazon support. There is a very unexpected behavior. Because the “logger” object is global to the interpreter in Python, log handlers can hang around and multiply for multiple executions of a Lambda because Amazon reuses the same environment and leaves all global variables in place – even if you try to NULL out the object you’ll find that it stays. We stored some log metadata in our handler and found that it persisted from one execution to the next, so that we had data from triggers (message UUIDs) hanging around, and also a single log message would show up multiple times.

Here is what solved it:

import logging
from logstash_formatter import LogstashFormatterV1
logger = logging.getLogger(function_name)
logger.handlers = [] # <== SOLUTION HERE
handler = logging.StreamHandler(sys.stdout)
log_fmt = "" # omitted for simplicity
formatter = LogstashFormatterV1(log_fmt)
handler.setFormatter(formatter)
logger.addHandler(handler)

This is going to be common for anyone doing log shipping to ElasticSearch with CloudWatch Logs, etc.
I’d like to see this make it into documentation so others don’t suffer as we did. I’m not sure where to direct a pull request or what would be appropriate.

3 Likes

#2

This AWS Lambda function scheme was driving me crazy… many thanks for those tips!

I did not know what to do to overcome this misfunctioning!

0 Likes

#3

Thanks for sharing, @sventech

Is it same to the problem I discovered recently?

2 weeks ago I enabled Custom Access Logging in api gateway’s stage (for example, dev), I found the logs are collected across in a lot of log steams (UUID) in same log group. They are not always the latest logs saved in latest log stream, in each log stream, the logs can be 2 weeks old to present.

0 Likes

#4

@sventech the solution you suggested doesn’t work for us. Can you confirm that the solving line
logger.handlers = [] # <== SOLUTION HERE

Should actually be:
logging.getLogger().handlers = []

Which effectively resets the handlers of the root logger.

Thx

0 Likes

#5

Yes, I was doing the standard format

logger = logging.getLogger()

logger.handlers = []

1 Like

#6

Excellent.

Just for future knowledge or anyone else coming across this issue, here is the cause:

Lambda sets up its own form of root logger with a handler attached.
Python’s logging module creates loggers in an hierarchical manner such that each logger is a descendant of the root logger.
With this given, every logger created in the Lambda code will be a descendant of that pre-set root logger.
Python loggers also behave such that a logger would first handle the log message with its attached log handlers, and then propagate the message up the logging hierarchy to its parent. In turn each ancestor up the chain would also issue the message to its handlers and then pass the call up to its parent and so on.

The solution for this can be either deleting the root logger’s handlers altogether
OR
setting the logger’s propagate attribute to False, such that it won’t propagate the log message to its parent and prevent the ancestor’s handlers from being called.

That is:

import logging
import sys

logger = logging.getLogger("my_module")
handler = logging.StreamHandler(sys.stdout)
formatter = logging.Formatter("(%(name)s) %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.propagate = False

def lambda_handler(event, context):
    logger.warn("Hello from Lambda")
    
    return "Hello from Lambda"
1 Like

#7

Hi Bill, that is interesting but I think unrelated – would be more about Amazon infrastructure log handling vs. Python interpreter in Lambda.

1 Like