Aws-python function dependencies load

Hi!

I am creating an aws-python function with some external dependencies, such as bson, pymongo, and scikit-learn. The dependencies are managed with virtualenv.

When executing serverless deploy, these are added to the build zip with the usual file structure:
venv/lib/python2.7/site-packages/..installed packages../

I saw that in this other question Brett explained that he added them at the build root with a script. But even with this zip structure it still won’t work. I am getting this error:

Unable to import module 'handler': No module named pymongo

I think it is related to the way I am loading the dependencies; I assume the normal way won’t work in this case. I tried walking through the libs and loading them with the native ctypes module:

import ctypes
import os

# Load every shared library found under lib/, skipping static archives
for d, dirs, files in os.walk('lib'):
    for f in files:
        if f.endswith('.a'):
            continue
        ctypes.cdll.LoadLibrary(os.path.join(d, f))

I also tried to fix the dependency path with:

import os
import sys
here = os.path.dirname(os.path.realpath(__file__))
sys.path.append(os.path.join(here, "venv"))

None of them worked… Any suggestions or ideas?

thanks!

Hi @mpradilla, try putting site-packages on your path.

import sys
import os
here = os.path.dirname(os.path.realpath(__file__))

# Import installed packages (in site-packages)
site_pkgs = os.path.join(here, "..", "venv", "lib", "python2.7", "site-packages")
sys.path.append(site_pkgs)

If this doesn’t resolve it let me know.

Thanks for your response @el0ck!

I did what you suggested with some minor differences (moving the installed packages to /lib and ignoring venv), and it worked. This way the dependencies can be loaded. Now I am facing a more complicated issue related to the compilation of the external dependencies (this issue).

The solution to this kind of trouble made me doubt my whole approach to the problem. I probably don’t need a lambda function, but a compute engine instance to perform my job. That way I would also avoid the trouble of having different development and deployment environments with lambda.

Thank you again for your time! This is something that definitely needs to be included in the Hello World of the serverless aws-python template.

Don’t give up @mpradilla!

This is completely achievable on lambda. It is not necessary to manually zip your package first. Let me describe the way I use lambda to run python code with external python modules that leverage precompiled libraries:


External Modules

Firstly, when serverless zips your lambda, it follows symlinks. As such, one can symlink the virtual environment’s site-packages into the function directory. That way it doesn’t clutter the lib/ directory, which can be used solely for your precompiled libraries. In your case, that would be numpy’s multiarray.so.

File structure

▾ lib/
    some_lib.so
  event.json
  handler.py
  serverless.yml
  site-packages -> /Users/jimbo/miniconda2/envs/testenv/lib/python2.7/site-packages
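
The site-packages link in the listing above could be created by hand or with a few lines of Python. This is only a minimal sketch: the helper name link_site_packages is made up for illustration, and it assumes a platform where os.symlink is available (Linux or macOS).

```python
import os


def link_site_packages(function_dir, site_packages):
    """Create <function_dir>/site-packages as a symlink to site_packages.

    Hypothetical helper, not part of serverless itself.
    """
    link = os.path.join(function_dir, "site-packages")
    # Drop a stale link first so the function can be re-pointed at another env.
    if os.path.islink(link):
        os.remove(link)
    os.symlink(os.path.abspath(site_packages), link)
    return link
```

Calling link_site_packages(".", "/Users/jimbo/miniconda2/envs/testenv/lib/python2.7/site-packages") from the function directory should reproduce the layout shown above.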

For local testing, you run the code OUTSIDE your virtual environment to ensure the manual imports (below) are working:

import sys
import os
here = os.path.dirname(os.path.realpath(__file__))

# Import installed packages (in site-packages)
site_pkgs = os.path.join(here, "site-packages")
sys.path.append(site_pkgs)

# Now Import your environment packages
import numpy as np
...

Compiled code

A simple case

As you have found, the lambda execution environment is 64-bit Linux. You are not developing on that platform, so your multiarray.so is built for a different one, hence your ‘invalid ELF header’ error.

Your problem is solved by developing on a linux-64 machine. That way, when you install scikit-learn and its dependency numpy, you get shared object files that lambda can use.

It is worth noting for other readers that this error may show up in the CloudWatch logs as simply an ImportError:

ImportError: No module named numpy.
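
Since the log message can hide the real cause, it may help to check locally which platform a shared object was built for before deploying. The sketch below only inspects the file's magic bytes; the function name binary_format is an assumption for illustration, and an "elf64" result does not by itself prove the file will load on lambda (CPU type and libc version are not checked).

```python
# Magic numbers for Mach-O files (what macOS toolchains produce).
MACHO_MAGICS = (
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",  # 32-bit, both byte orders
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",  # 64-bit, both byte orders
    b"\xca\xfe\xba\xbe",  # universal binary (Java class files share this magic)
)


def binary_format(path):
    """Classify a compiled file by its first bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(5)
    if head[:4] == b"\x7fELF":
        # Byte 4 (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects.
        return {b"\x01": "elf32", b"\x02": "elf64"}.get(head[4:5], "elf-unknown")
    if head[:4] in MACHO_MAGICS:
        return "mach-o"
    return "unknown"
```

A .so installed on a mac would be expected to report "mach-o", which is the situation behind the ‘invalid ELF header’ error.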

So far you know all this; however, at some point you may run into the following case.

Non site-package Libraries

Sometimes you may require libraries from other directories that are on your local LD_LIBRARY_PATH but don’t get zipped up with the lambda function.

Example

I have lambda code that requires the intel math kernel libraries. Here are my problems:

  1. Libraries are not in site-packages
  2. LD_LIBRARY_PATH is read when the python interpreter starts so changing it with os.environ in code is too late
  3. Some library files are large. AWS lambda zip files have limits and run faster (and cheaper) when they are small.

The solution

I need to include only the libraries I need, not entire lib directories. The directory where I put the libraries must already be on the LD_LIBRARY_PATH when my lambda starts. This restricts the directory to the lib/ beside my handler.
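
To confirm that the lib/ directory really is on the loader path at runtime, one could log a check like the following from inside the handler. This is a sketch under assumptions: the helper name on_library_path is made up, and it derives nothing about where lambda unpacks the function; you pass it the directory to test.

```python
import os


def on_library_path(directory, environ=None):
    """Return True if directory is listed in LD_LIBRARY_PATH.

    environ defaults to the real process environment; a plain dict can be
    passed instead for testing.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("LD_LIBRARY_PATH", "")
    entries = [p for p in raw.split(os.pathsep) if p]
    target = os.path.realpath(directory)
    return any(os.path.realpath(p) == target for p in entries)
```

In a handler this could be used as on_library_path(os.path.join(here, "lib")), with here computed as in the import snippet earlier in this post.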

I run the function in the lambda execution environment, and each time it errors out on a missing library I cherry-pick that library and add it to lib/. This ensures I ship only the bare minimum of libraries I need.
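
One possible way to script that loop is sketched below. It is not necessarily how it was done here: the helper names are hypothetical, and it assumes the failure appears in the logs in the usual Linux loader form "<name>: cannot open shared object file: No such file or directory", and that you know which local directories to search for the linux-64 copy of the library.

```python
import os
import re
import shutil

# Matches the library name in the Linux loader's "cannot open" message.
MISSING_LIB = re.compile(r"([^\s:]+): cannot open shared object file")


def missing_library(error_text):
    """Return the library name from a loader error, or None if absent."""
    match = MISSING_LIB.search(error_text)
    return match.group(1) if match else None


def cherry_pick(name, search_dirs, dest="lib"):
    """Copy the first file called name found in search_dirs into dest."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            if not os.path.isdir(dest):
                os.makedirs(dest)
            # copy2 follows symlinks, so a versioned link is copied as a file.
            shutil.copy2(candidate, dest)
            return os.path.join(dest, name)
    return None
```

Each round would be: invoke the function, feed the logged error to missing_library, call cherry_pick with the result, redeploy, and repeat until the import succeeds.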

Now that this is in the service repository I can continue to develop on my mac because the linux-64 binaries are deployed.

How do you do the cherry picking of the libraries?
