Teaching an old dog new tricks: moving to serverless microservices

Is it too late to teach an old dog a new trick?

I’ve always been the guy with a monolithic application/codebase.

For more than 10 years now, I’ve gotten good at caching, writing good SQL queries, using Redis to store frequently read data, and using background jobs for updates, edits, long-running tasks, etc.

Then again, I haven’t had to deal with more than 100 million rows in a db before.

Now I’ve used up all my tricks and, alas, seen no significant improvement.

So I’m coming to you smart DevOps people for advice.

Until recently, my current app had everything on, you guessed it, one server: a Rails app, my DB, some scripts in /lib, Redis, and Sidekiq.

I read up on Docker, and now Sidekiq, Redis, my app, and my DB each run in their own container.

Now I’d like to split out some functions of the app itself. This is where I need your help.

For example:

Script A: I have a Ruby script (a Ruby class) that crawls the web for backlink data. Script A only gathers links and removes links that have already been parsed.

Script B: Another script that takes the links Script A collected and retrieves the information I want from each link.
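
The "removes links that have been parsed already" step in Script A could be sketched roughly like this. It is only an illustration: the `LinkQueue` class and its `fresh` method are hypothetical names, and it keeps the seen links in memory, whereas the real scripts would presumably track them somewhere persistent.

```ruby
require "set"

# Hypothetical stand-in for Script A's bookkeeping: remembers which links have
# already been handed off so that each one reaches Script B only once.
class LinkQueue
  def initialize
    @seen = Set.new
  end

  # Returns only the links not seen before and records them as seen.
  # Set#add? returns nil when the element is already present, so duplicates
  # (within this batch or from earlier batches) are filtered out.
  def fresh(links)
    links.map(&:strip).reject(&:empty?).select { |link| @seen.add?(link) }
  end
end

queue = LinkQueue.new
first_batch  = queue.fresh(["https://a.example", "https://b.example", "https://a.example"])
second_batch = queue.fresh(["https://b.example", "https://c.example"])
puts first_batch.inspect
puts second_batch.inspect
```

Only the links returned by `fresh` would be passed on to Script B.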

Let’s stop at that for now, so it doesn’t get confusing.

I’d like to run these scripts on their own servers, connecting to my app via APIs.

Both scripts are fewer than 1,000 lines of code and are written in Ruby.

Requirements

Both scripts require Selenium, Watir, and headless Chrome to run.
The server must be able to autoscale as needed.
If the server is rate-limited or stuck in a link black hole, I should be able to spin up a new server (host) with a different WAN IP and the full contents of the script in as little time as possible.
The server should be affordable, as these are long-running tasks. The script does not require a database, as it only sends the important data back to my app via API.
In short, both scripts should be on their own server (host), and it should be easy to spin up a new one in no time.

Has anyone worked on a similar use case before? How do you suggest I go about it? Can a serverless platform serve this use case?

Many thanks in advance.

@tomiwaAdey It’s probably not too late to learn new tricks, but you need to stop thinking in terms of servers and long-running scripts.

For example: you have a Lambda triggered from a DynamoDB stream that reads the contents of a URL and places it into S3 (one invocation, one URL). Placing the content into S3 triggers a second Lambda that processes the file (doing whatever Script B does) and also adds new URLs to DynamoDB, which triggers the first Lambda to grab additional content. Using DynamoDB rather than Kinesis Streams or SNS helps keep track of which URLs you’ve previously processed, and DynamoDB streams will autoscale, unlike Kinesis Streams.
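
A rough Ruby sketch of what that first Lambda might look like, to make the flow concrete. Several things here are assumptions rather than anything from the setup described: the table is assumed to store each link in a string attribute named `url`, the `new_urls` and `handle_stream` names are made up, and the fetch and store steps are passed in as callables so the sketch runs without the AWS SDK. In a real function they would wrap an HTTP client and an S3 put, and the entry point would be wired to Lambda's `handler(event:, context:)` convention.

```ruby
# Pull the URLs out of a DynamoDB stream event. Only INSERT records are
# considered, so updates to existing items do not trigger a refetch.
# Assumes each item keeps its link in a string attribute named "url".
def new_urls(event)
  event.fetch("Records", []).filter_map do |record|
    next unless record["eventName"] == "INSERT"

    record.dig("dynamodb", "NewImage", "url", "S")
  end
end

# One stream batch in, one stored object per newly inserted URL.
# `fetch` and `store` are injected callables (hypothetical); returns the
# number of URLs processed.
def handle_stream(event:, fetch:, store:)
  urls = new_urls(event)
  urls.each { |url| store.call(url, fetch.call(url)) }
  urls.size
end

# Example run with fake fetch/store steps standing in for HTTP and S3.
sample_event = {
  "Records" => [
    { "eventName" => "INSERT",
      "dynamodb" => { "NewImage" => { "url" => { "S" => "https://example.com/a" } } } },
    { "eventName" => "MODIFY",
      "dynamodb" => { "NewImage" => { "url" => { "S" => "https://example.com/b" } } } }
  ]
}
bucket = {}
processed = handle_stream(
  event: sample_event,
  fetch: ->(url) { "contents of #{url}" },
  store: ->(key, body) { bucket[key] = body }
)
puts "processed #{processed} URL(s): #{bucket.keys.inspect}"
```

The second Lambda (the Script B equivalent) would follow the same shape, triggered by the S3 put instead of the stream.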