Very long delay when doing sls remove of Lambda in a VPC


#1

I’ve been using Serverless for about a year now, but I’m building my first app that runs inside a VPC so it can access an RDS database. When I run sls remove, CloudFormation hangs at deleting the Lambda function for a long time (40 minutes) with a message saying:

CloudFormation is waiting for NetworkInterfaces associated with the Lambda Function to be cleaned up..

Is this normal? My non-VPC Serverless projects are removed in under a minute. I’ve tested this several times with the same result. I’m using sls version 1.19.0.


#2

Yes, this is the “new normal”.

VPC-based Lambda functions used to be removed immediately, but the ENI that had been (automatically) allocated so that the function could communicate inside the VPC would be orphaned unless you had waited a certain (unspecified) amount of time after the function last had traffic. This resulted in lots of orphaned ENIs, and it was easy to quickly reach the default soft limit for ENIs in an account.

The recent change in behaviour (which I didn’t see mentioned anywhere officially, but had heard of others experiencing) means that your stack clean-up will take as long as it takes to clean up the associated resources (in this case the ENI).

I would hope that this gets faster in the future, but I don’t think AWS will commit to any timeline or durations (it’s just not their style).

Just to be clear, this has nothing to do with Serverless, and everything to do with VPC-based Lambda functions.


#3

That’s a bummer. I was pretty sure this was an AWS problem since the delay was happening during the CloudFormation stack delete. Thanks for confirming. I guess we will just have to work around this.


#4

I found a StackOverflow answer https://stackoverflow.com/questions/35990747/lambda-creating-eni-everytime-it-is-invoked-hitting-limit and an AWS Developer Forum answer https://forums.aws.amazon.com/message.jspa?messageID=734756 that seem to be relevant.
They both say it’s due to the Lambda execution policy lacking the ec2:DeleteNetworkInterface permission.

But my lambdas have the following permissions and the issue still randomly happens:

- ec2:CreateNetworkInterface
- ec2:DescribeNetworkInterfaces
- ec2:DeleteNetworkInterface
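
For anyone who wants to double-check their own execution role, here is a minimal Python sketch that reports which of these three actions a policy document does not allow. The function name missing_eni_actions and the sample policy are made up for illustration, and the matching is simplified (it only looks at Allow statements and ignores Deny, NotAction, Resource and Condition), so treat it as a rough check rather than a real IAM evaluation.

```python
from fnmatch import fnmatchcase

# The three permissions listed above.
REQUIRED_ENI_ACTIONS = (
    "ec2:CreateNetworkInterface",
    "ec2:DescribeNetworkInterfaces",
    "ec2:DeleteNetworkInterface",
)


def missing_eni_actions(policy_document):
    """Return the required ENI actions not covered by any Allow statement.

    Simplified on purpose: Deny statements, NotAction, Resource and
    Condition are ignored.
    """
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    allowed_patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be a bare string
            actions = [actions]
        allowed_patterns.extend(action.lower() for action in actions)

    # Action names are compared case-insensitively; "*" and "?" act as wildcards.
    return [
        required
        for required in REQUIRED_ENI_ACTIONS
        if not any(fnmatchcase(required.lower(), p) for p in allowed_patterns)
    ]


if __name__ == "__main__":
    # Hypothetical policy that forgot the delete permission.
    sample_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:CreateNetworkInterface",
                    "ec2:DescribeNetworkInterfaces",
                ],
                "Resource": "*",
            }
        ],
    }
    print(missing_eni_actions(sample_policy))
```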

I mitigated the problem by moving all Lambdas that don’t require VPC access out of the VPC.


#5

I recently had an e-mail exchange with Chris Munns, who is the Senior Developer Advocate for the AWS Lambda team, and he confirmed there definitely is a 40-minute delay when cleaning up ENIs on Lambda functions in VPCs. He also told me “the 40 minute time issue is being worked on”. So for now we have to live with it, but it appears they are working on getting rid of that delay. He indicated they have helped customers with workarounds, so if you have a support plan that might be an option.

Unfortunately our project requires using a VPC because the Lambda functions need access to an RDS database.


#6

Ironically, Terraform has fixed this issue in their implementation https://github.com/hashicorp/terraform/issues/5767, while CloudFormation, in its dumbness, has no way of doing it :man_facepalming:.

I wonder if splitting the stack as suggested by https://forums.aws.amazon.com/message.jspa?messageID=734756#jive-message-734756 would help.
It doesn’t make it any easier to delete the whole stack, but it should allow redeploying the stack containing the Lambdas, adding and removing functions at any time.


#7

Yeah, I read up on how Terraform handles it. I believe what they do is tear down the whole VPC subnet and recreate it, which has the side effect of deleting the Lambda ENIs.


#8

From the PR associated with the issue in Terraform, they are not deleting the subnet but only the ENIs attached to a Lambda function, identified by a name that matches a predefined rule :slight_smile:

Have a look at : https://github.com/hashicorp/terraform/pull/8033/files
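
To make the idea concrete, here is a rough Python sketch of that kind of cleanup (not the Terraform code itself). It assumes Lambda-created ENIs carry a description starting with "AWS Lambda VPC ENI", which is my assumption about the matching rule; the helper names are made up. The client is expected to be a boto3 EC2 client, and only the describe_network_interfaces, detach_network_interface and delete_network_interface calls are used. AWS may refuse to detach or delete an ENI that Lambda still manages, so this is best-effort.

```python
# Assumed description prefix for ENIs created by Lambda; adjust if yours differ.
LAMBDA_ENI_PREFIX = "AWS Lambda VPC ENI"


def select_lambda_enis(network_interfaces, prefix=LAMBDA_ENI_PREFIX):
    """Keep only the ENIs whose description starts with the Lambda prefix."""
    return [
        eni
        for eni in network_interfaces
        if eni.get("Description", "").startswith(prefix)
    ]


def delete_lambda_enis(ec2_client, subnet_id):
    """Detach and delete Lambda ENIs in one subnet; return the deleted IDs.

    ec2_client is expected to behave like boto3.client("ec2").
    Pagination and waiting for the detach to finish are left out of this
    sketch, so the delete call may fail and need a retry in practice.
    """
    response = ec2_client.describe_network_interfaces(
        Filters=[{"Name": "subnet-id", "Values": [subnet_id]}]
    )
    deleted = []
    for eni in select_lambda_enis(response["NetworkInterfaces"]):
        attachment = eni.get("Attachment")
        if attachment:
            # An attached ENI has to be detached before it can be deleted.
            ec2_client.detach_network_interface(
                AttachmentId=attachment["AttachmentId"], Force=True
            )
        ec2_client.delete_network_interface(
            NetworkInterfaceId=eni["NetworkInterfaceId"]
        )
        deleted.append(eni["NetworkInterfaceId"])
    return deleted
```

Usage would be something like delete_lambda_enis(boto3.client("ec2"), "subnet-...") with your own subnet ID, run before sls remove.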

I think Serverless could do the exact same thing: that would save a lot of time for all the devs who are required to run their Lambdas inside these fuck**g VPCs to access RDS


#9

@cblin

Also very interested in this. We opened a ticket here https://github.com/serverless/serverless/issues/5008