Using Serverless and Docker to import Lambda Dependencies

Hi there.

I’m pretty new to this, so bear with me. I’m looking to load some dependencies and packages into a Lambda build, namely pandas, numpy, and sklearn. The goal is to use Lambda to do further automated feature engineering/preparation after I query data from Athena — the kind of work SQL isn’t capable of. And yes, I realize the limitations of Lambda, but I don’t anticipate any issues with runtime or usage in this application.

So, I was going through Stephane Maarek’s class on Lambda, which made heavy use of Serverless and used Docker to pull in those dependencies while building the stack. The class didn’t go into great detail on the nuts and bolts of what was happening, or generalize much on how it could be adapted for a different project. His example was the thumbnail-resize one, and in that case I believe he used Serverless with Docker, with Pillow as the installed dependency.

I’d like to do something extremely similar in concept: have a file landing in S3 trigger a Lambda function that runs a transformation on it and sends the result to a different folder or bucket. Instead of Pillow, I’d like to install numpy, pandas, and sklearn.

  • In terms of the packages/dependencies, is there anything else I need to change in the files, or is it really as simple as swapping out the packages I want?
  • Does Docker need to be running on my personal machine for the function to work/use those dependencies, or once the stack is created is it good to go, independent of my machine?
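For reference, the dependency swap I have in mind is just this sort of requirements.txt (package list is mine; note that sklearn installs under the name scikit-learn):

```
pandas
numpy
scikit-learn
```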

Again, sorry if this is a pretty basic question; I’m just trying to open the black box and find the easiest, most efficient way to set up this function.

Sorry, here is an additional picture. I was just curious whether I would simply swap out Pillow for the dependencies I’d want to use in my Lambda function.

You don’t need any of this. You need to write a proper Dockerfile containing the stuff needed to run your app. And you need Docker on your machine.

Have you seen Container Image Support for AWS Lambda?

So no, with Docker you don’t need the serverless-python-requirements plugin (though it’s fantastic in other scenarios).
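With container image support, your serverless.yml points at an image built from the Dockerfile instead of a handler/runtime pair. A sketch of the relevant bits (service, image, and bucket names are placeholders, not from your project):

```yaml
service: my-feature-engineering    # placeholder name

provider:
  name: aws
  ecr:
    images:
      appimage:
        path: ./                   # directory containing the Dockerfile below

functions:
  process:
    image:
      name: appimage               # refers to provider.ecr.images above
    events:
      - s3:
          bucket: my-input-bucket  # placeholder bucket
          event: s3:ObjectCreated:*
```

On `serverless deploy`, the framework builds the image locally (so yes, Docker must be running for the deploy), pushes it to ECR, and wires the function to it; after that, your machine is out of the loop.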

So, your Dockerfile should be something like:

# you have to use an AWS Lambda base image
FROM public.ecr.aws/lambda/python:3.8

# WORKDIR /app # DO NOT USE IT WITH LAMBDA!!!

# avoid writing .pyc files
ENV PYTHONDONTWRITEBYTECODE=1

COPY _your_app_folder_ ./_your_app_folder_
# or pyproject.toml if using poetry, but change the RUN below accordingly
COPY requirements.txt .
# contains your AWS credentials, in case you need to access AWS services like S3
COPY .env .

RUN yum update -y \
    && yum install -y --setopt install_weak_deps=false python3-pip \
    && pip3 --no-cache-dir install --upgrade pip \
    && pip3 install -r requirements.txt \
    # the next two steps only apply if your app is packaged (setup.py/pyproject.toml)
    && pip3 install build \
    && python3 -m build \
    && pip3 install dist/*.whl \
    && yum -y upgrade \
    && yum -y clean all \
    && rm -rf /var/cache/yum/*

COPY _your_handler_.py .

CMD ["_your_handler_._call_function_"]
# where `_call_function_()` is the function interfacing with Lambda and calling your main app —
# handler.s3_thumbnail_generator in your case, I guess.
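For your pandas use case, `_your_handler_.py` might look roughly like this. A sketch only: the bucket layout, the `OUTPUT_BUCKET` environment variable, and the `add_features` transformation are placeholders for your own logic, not anything from the thread.

```python
import io
import os

import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Placeholder feature engineering; swap in your own transformations."""
    return df.assign(row_mean=df.select_dtypes("number").mean(axis=1))


def handler(event, context):
    """Triggered by S3 ObjectCreated events; writes the transformed CSV elsewhere."""
    # boto3 ships with the Lambda runtime; imported here at call time so
    # add_features can be unit-tested locally without boto3 installed
    import boto3

    s3 = boto3.client("s3")
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    df = pd.read_csv(io.BytesIO(body))

    out = io.StringIO()
    add_features(df).to_csv(out, index=False)

    # OUTPUT_BUCKET is an assumed environment variable, set in your stack config
    s3.put_object(
        Bucket=os.environ["OUTPUT_BUCKET"],
        Key=f"processed/{key}",
        Body=out.getvalue(),
    )
```

With this layout the CMD in the Dockerfile above would be `CMD ["_your_handler_.handler"]`.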

Of course, you could use the traditional way as taught: include your deps in requirements.txt, etc.
Then, if your artifact stays under 250 MB uncompressed, you should be fine. But the Python modules you want to use take a lot of space, so you may need to read Serverless Framework: Plugins carefully.
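If you do go that traditional route, the plugin config is where you fight the size limit. A sketch of the relevant serverless.yml bits (these are real serverless-python-requirements options, but whether you need all of them depends on your deps):

```yaml
plugins:
  - serverless-python-requirements

custom:
  pythonRequirements:
    dockerizePip: true  # build deps inside a Lambda-like container (needs Docker locally)
    zip: true           # ship requirements as a zip, unzipped at cold start
    slim: true          # strip .so symbols, drop tests and __pycache__
```

Even slimmed, pandas + numpy + scikit-learn together push close to the limit, which is exactly why the container-image route (10 GB image cap) is attractive here.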