Don't give up @mpradilla!
This is completely achievable on lambda. It is not necessary to manually zip your package first. Let me describe the way I use lambda to run python code with external python modules that leverage precompiled libraries:
External Modules
Firstly, when serverless zips your lambda, it follows symlinks. As such, one can symlink the virtual environment's site-packages into the function directory. That way it doesn't clutter the lib/ directory, which can be used solely for your precompiled libraries; in your case, numpy's multiarray.so.
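If you prefer to script the symlink, a minimal sketch looks like this (a plain ln -s works just as well; the conda path is only the example from the file structure below):
import os

# Illustrative paths -- point these at your own environment and function directory
venv_site_packages = "/Users/jimbo/miniconda2/envs/testenv/lib/python2.7/site-packages"
function_dir = os.path.dirname(os.path.realpath(__file__))

link_path = os.path.join(function_dir, "site-packages")
if not os.path.islink(link_path):
    # serverless follows this symlink when it zips the function
    os.symlink(venv_site_packages, link_path)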
File structure
▾ lib/
    some_lib.so
  event.json
  handler.py
  serverless.yml
  site-packages -> /Users/jimbo/miniconda2/envs/testenv/lib/python2.7/site-packages
For local testing, you run the code OUTSIDE your virtual environment to ensure the manual imports (below) are working:
import sys
import os
here = os.path.dirname(os.path.realpath(__file__))
# Import installed packages (in site-packages)
site_pkgs = os.path.join(here, "site-packages")
sys.path.append(site_pkgs)
# Now import your environment packages
import numpy as np
...
Compiled code
A simple case
As you have found, the lambda execution environment is a 64-bit Linux architecture. You are not developing on this, so your multiarray.so is different, hence your 'invalid ELF header' error.
Your problem is solved by developing on a linux-64 machine. Thus when you install scikit-learn and its dependency numpy, you will get the shared object files that can be used by lambda.
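As a sanity check before deploying, you can inspect the header of a compiled module yourself. This is a minimal sketch of my own (the multiarray.so path is illustrative); a Mach-O file built on a Mac will fail this check, while a linux-64 build will pass:
def looks_like_elf64(path):
    # ELF magic is 0x7f 'E' 'L' 'F'; the fifth byte is 2 for 64-bit objects.
    # This does not verify the target OS, but a Mac-built Mach-O file will fail it.
    with open(path, "rb") as f:
        header = f.read(5)
    return header[:4] == b"\x7fELF" and header[4:5] == b"\x02"

print(looks_like_elf64("site-packages/numpy/core/multiarray.so"))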
It is worth noting for other readers that this error may show up in the CloudWatch logs as simply an ImportError: No module named numpy.
So far you know all this; however, at some point you may run into the following case.
Non-site-packages Libraries
Sometimes you may require libraries from other directories that are on your local LD_LIBRARY_PATH but don't get zipped up with the lambda function.
Example
I have lambda code that requires the Intel Math Kernel Library (MKL). Here are my problems:
- Libraries are not in site-packages
- LD_LIBRARY_PATH is read when the python interpreter starts, so changing it with os.environ in code is too late (see the sketch after this list)
- Some library files are large. AWS lambda zip files have limits and run faster (and cheaper) when they are small.
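To make the second point concrete, here is a small sketch of my own; the extra_libs path and the libmkl_rt.so preload are illustrative, not part of the actual deployment:
import ctypes
import os

# Too late: the dynamic linker already read LD_LIBRARY_PATH when this python
# process started, so appending to it here does not help the loader resolve
# shared objects.
os.environ["LD_LIBRARY_PATH"] = os.environ.get("LD_LIBRARY_PATH", "") + ":/var/task/extra_libs"

# One possible workaround (not the approach taken below) is to dlopen the
# library explicitly, by absolute path, before importing the module that needs it:
ctypes.CDLL("/var/task/lib/libmkl_rt.so")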
The solution
I need to include only the libraries I need, not entire lib directories. The directory where I put the libraries must already be on the LD_LIBRARY_PATH when my lambda starts. This restricts the directory to the lib/ beside my handler.
I run the function in the lambda execution environment and cherry-pick the libraries as they error out, adding them to lib/, thus ensuring I have the bare minimum libraries I need.
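For reference, that cherry-picking loop can be scripted with a helper along these lines (a minimal sketch; it simply attempts to load each bundled library and prints anything that fails to resolve):
import ctypes
import glob
import os

here = os.path.dirname(os.path.realpath(__file__))
for so_path in sorted(glob.glob(os.path.join(here, "lib", "*.so*"))):
    try:
        ctypes.CDLL(so_path)
        print("loaded  " + os.path.basename(so_path))
    except OSError as exc:
        print("missing dependency for %s: %s" % (os.path.basename(so_path), exc))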
Now that this is in the service repository, I can continue to develop on my Mac because the linux-64 binaries are deployed.