Very long delay when doing sls remove of Lambda in a VPC

I’ve been using Serverless for about a year now, but this is the first app I’m building that runs inside a VPC so it can access an RDS database. When I run sls remove, CloudFormation hangs for a long time (40 minutes) while deleting the Lambda function, with a message saying:

CloudFormation is waiting for NetworkInterfaces associated with the Lambda Function to be cleaned up..

Is this normal? My non-VPC Serverless projects are removed in under a minute. I’ve tested this several times with the same result. I’m using sls version 1.19.0.


Yes, this is the “new normal”.

VPC-based Lambda functions used to be removed immediately, but the ENI that had been (automatically) allocated so the function could communicate inside the VPC would be orphaned unless you waited a certain (unspecified) amount of time after the function last received traffic. This resulted in lots of orphaned ENIs, and it was easy to quickly hit the default soft limit for ENIs in an account.

The recent change in behaviour (which I didn’t see mentioned anywhere officially, but have heard others report) means that your stack clean-up will take as long as it takes to clean up the associated resources (in this case the ENI).

I would hope that this gets faster in the future, but I don’t think AWS will commit to any timeline or durations (it’s just not their style).

Just to be clear, this has nothing to do with Serverless, and everything to do with VPC-based Lambda functions.

That’s a bummer. I was pretty sure this was an AWS problem, since the delay was happening during the CloudFormation stack delete. Thanks for confirming. I guess we will just have to work around this.

I found a Stack Overflow answer https://stackoverflow.com/questions/35990747/lambda-creating-eni-everytime-it-is-invoked-hitting-limit and an AWS Developer Forums answer https://forums.aws.amazon.com/message.jspa?messageID=734756 that seem to be relevant.
They both say it’s due to the Lambda execution policy lacking the ec2:DeleteNetworkInterface permission.

But my Lambdas have the following permissions and the issue still happens randomly:

- ec2:CreateNetworkInterface
- ec2:DescribeNetworkInterfaces
- ec2:DeleteNetworkInterface

I mitigated the problem by moving all Lambdas that don’t require VPC access out of the VPC.
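
In case it helps anyone, here is a minimal sketch of what that looks like in serverless.yml (the function names, handlers and IDs below are just placeholders): define vpc only on the functions that actually need to reach RDS, so every other function stays outside the VPC.

functions:
  queryDatabase:                     # needs RDS, so it gets VPC config
    handler: handler.queryDatabase
    vpc:
      securityGroupIds:
        - sg-00000000                # placeholder security group ID
      subnetIds:
        - subnet-00000000            # placeholder subnet IDs
        - subnet-11111111
  sendEmail:
    handler: handler.sendEmail       # no vpc block, so no ENIs and no removal delay

Only the functions with a vpc block carry the ENI clean-up cost when the stack is removed.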

I recently had an e-mail exchange with Chris Munns, the Senior Developer Advocate for the AWS Lambda team, and he confirmed there definitely is a 40-minute delay when cleaning up ENIs on Lambda functions in VPCs. He also told me “the 40 minute time issue is being worked on”. So for now we have to live with it, but it appears they are working on getting rid of that delay. He indicated they have helped customers with workarounds, so if you have support that might be an option.

Unfortunately our project requires using a VPC because the Lambda functions need access to an RDS database.

Ironically, Terraform has fixed this issue in their implementation (https://github.com/hashicorp/terraform/issues/5767), while CloudFormation, in its dumbness, has no way of doing it :man_facepalming:.

I wonder if splitting the stack as suggested by https://forums.aws.amazon.com/message.jspa?messageID=734756#jive-message-734756 would help.
It doesn’t make it any easier to delete the whole stack, but it should allow redeploying the stack containing the Lambdas, and adding and removing functions at any time.
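
If anyone tries that route, a rough sketch of what it could look like (the stack and output names below are made up): keep the VPC, subnets and security groups in a separate, long-lived stack that exports their IDs as outputs, and have the Serverless service read them with the ${cf:...} variable syntax, so the function stack never owns the networking resources itself.

provider:
  name: aws
  vpc:
    securityGroupIds:
      - ${cf:shared-network.LambdaSecurityGroupId}   # hypothetical stack and output names
    subnetIds:
      - ${cf:shared-network.PrivateSubnetAId}
      - ${cf:shared-network.PrivateSubnetBId}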

Yeah, I read up on how Terraform handles it. I believe what they do is tear down the whole VPC subnet and recreate it, which has the side effect of deleting the Lambda ENIs.

From the PR associated with the issue in Terraform, they are not deleting the subnet, only the ENIs attached to a Lambda function whose name matches a predefined rule :slight_smile:

Have a look at: https://github.com/hashicorp/terraform/pull/8033/files

I think Serverless could do the exact same thing: that would save a lot of time for all the devs who are forced to put their Lambdas inside these fuck**g VPCs to access RDS.

@cblin

Also very interested in this. We opened a ticket here: https://github.com/serverless/serverless/issues/5008

This is also very painful for our team. We’re using GitLab CI to deploy our Serverless functions, and if we don’t coordinate our deploys to our integration environment we sometimes run into this problem. The CI job will either run for hours or time out. As a workaround I was hoping we could somehow disable the removal of API methods and force them to be cleaned up manually later, or have Serverless fail to deploy without a force flag if removing an API method would occur.

The same for me, I have:

iamRoleStatements:   # under provider in serverless.yml
    - Effect: "Allow"
      Action:
        - "ec2:CreateNetworkInterface"
        - "ec2:DescribeNetworkInterfaces"
        - "ec2:DeleteNetworkInterface"
      Resource: "*"

But it still takes ~40 minutes to delete every time.

As a follow-up, it seems that the recent overhaul of VPC networking for Lambda should fix this. They have started rolling out the changes, and a handful of regions should already see the improvement, but they apparently plan to have the full rollout completed across all regions some time in December.

This should be out in all regions now, but I’m still seeing this issue.

Seconded. I’m still seeing it. Even with really small stacks, if there’s a Lambda in a VPC, it takes FOREVER to delete.

Still seeing this issue on 2020-06-24.

affirmative … 2020-08-25

still seeing the issue on 14 Dec, 2020

Still seeing the issue 08.06.2021.

And now this issue is why we are dropping Serverless altogether in 11/2021.

Way to go dumbs!

Ignore more things.

Nobody has 40 mins to wait for no good reason.

We never see this any more with updated Serverless.

And remove is rare enough that it wasn’t too big a deal anyway.