Is it too late to teach an old dog a new trick?
I’ve always been the guy with a monolithic application/codebase.
Over more than 10 years, I’ve gotten good at caching, writing good SQL queries, using Redis to store frequently read data, and using background jobs for updates, edits, long-running tasks, etc.
Then again, I hadn’t had to deal with more than 100 million rows in a database before.
Not until now. I’ve used up all my tricks and, alas, seen no significant improvement.
So I’m coming to you smart DevOps people for advice.
For my current app, until recently I had a Rails app, my DB, some scripts in /lib, Redis, and Sidekiq all on, you guessed it, one server.
I read up on Docker, and now Sidekiq, Redis, my app, and my DB each run in their own container.
Now I’d like to split out some functions of the app itself. This is where I need your help.
Script A: a Ruby script (a Ruby class) that crawls the web for backlink data. It only gathers links and drops any that have already been parsed.
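To make the "removes links that have been parsed already" part concrete, here is a rough sketch of what that filtering boils down to. `LinkFrontier` and `filter_new` are made-up names for illustration, not the actual class, and the in-memory `Set` is an assumption: on a throwaway host the list of already-parsed links would presumably have to come from the app.

```ruby
require "set"
require "uri"

# Hypothetical helper: remembers which links were already parsed so that
# only new ones are handed on to the next stage.
class LinkFrontier
  def initialize(already_parsed = [])
    @seen = Set.new(already_parsed.filter_map { |url| normalize(url) })
  end

  # Returns the links not seen before (normalized) and remembers them.
  def filter_new(links)
    links.each_with_object([]) do |link, fresh|
      key = normalize(link)
      next if key.nil?
      # Set#add? returns nil when the key is already present.
      fresh << key if @seen.add?(key)
    end
  end

  private

  # Drop the fragment and lowercase the host so trivially different
  # spellings of the same page collapse to one key. Returns nil for
  # anything that is not an http(s) URL.
  def normalize(url)
    uri = URI.parse(url.to_s.strip)
    return nil unless uri.is_a?(URI::HTTP) && uri.host
    uri.fragment = nil
    uri.host = uri.host.downcase
    uri.to_s
  rescue URI::InvalidURIError
    nil
  end
end
```

Usage would look like `LinkFrontier.new(parsed_urls).filter_new(found_urls)`, where `parsed_urls` and `found_urls` are plain arrays of strings.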
Script B: another script that takes the links Script A collected and retrieves the information I want from each one.
Let’s stop at that for now, so it doesn’t get confusing.
I’d like to run these scripts on their own servers and have them talk to my app via APIs.
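A minimal sketch of the kind of API call I have in mind, using only Ruby's standard library. The endpoint URL, the bearer token, and the JSON shape are all placeholders I made up for illustration; the app's real API isn't designed yet.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical endpoint and token, read from the environment so a fresh
# host only needs two variables set.
API_URL   = ENV.fetch("APP_API_URL", "https://app.example.com/api/backlinks")
API_TOKEN = ENV.fetch("APP_API_TOKEN", "changeme")

# Build the POST request on its own so it can be inspected without
# touching the network.
def build_report_request(url, data, endpoint: API_URL, token: API_TOKEN)
  request = Net::HTTP::Post.new(URI.parse(endpoint))
  request["Content-Type"] = "application/json"
  request["Authorization"] = "Bearer #{token}"
  request.body = JSON.generate(url: url, data: data)
  request
end

# Send one result back to the app; returns true on any 2xx response.
def report(url, data, endpoint: API_URL, token: API_TOKEN)
  uri = URI.parse(endpoint)
  request = build_report_request(url, data, endpoint: endpoint, token: token)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  response.is_a?(Net::HTTPSuccess)
end
```

The point is that the script keeps no state of its own: everything it learns goes out through `report`, so a host can be thrown away at any time.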
Both scripts are less than 1000 lines of code and are written in Ruby.
Both scripts require Selenium, Watir, and headless Chrome to run.
The server must be able to autoscale as needed.
If a server is rate-limited or stuck in a link black hole, I should be able to spin up a new server (host) with a different WAN IP and the full script on it in as little time as possible.
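One way this could work from the script's side is a small watchdog that decides when the current host should give up. This is only a sketch: `CrawlWatchdog`, its thresholds, and the idea of counting pages per site to spot a black hole are my assumptions, and how a page is judged "blocked" (a CAPTCHA page, an error page, and so on) is left to the caller.

```ruby
# Hypothetical watchdog: tells the crawler when this host looks
# rate-limited or trapped, so it can exit and be replaced.
class CrawlWatchdog
  def initialize(max_blocked: 5, max_per_site: 500)
    @max_blocked = max_blocked      # consecutive blocked pages tolerated
    @max_per_site = max_per_site    # pages from one site before it looks like a trap
    @blocked_streak = 0
    @pages_per_site = Hash.new(0)
  end

  # Record one fetched page. `blocked` is whatever the caller decides
  # counts as being refused by the site.
  def record(site, blocked:)
    @pages_per_site[site] += 1
    @blocked_streak = blocked ? @blocked_streak + 1 : 0
  end

  # True after too many blocked pages in a row.
  def rate_limited?
    @blocked_streak >= @max_blocked
  end

  # True once a single site has produced more pages than the limit.
  def blackhole?(site)
    @pages_per_site[site] > @max_per_site
  end
end
```

When `rate_limited?` returns true the script could simply exit with a non-zero status and let whatever runs it start a replacement; whether that replacement comes up with a different WAN IP depends on the hosting platform, which is part of what I'm asking about.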
The servers should be affordable, as these are long-running tasks. The scripts do not need a database, as they only send the important data back to my app via API.
In short, both scripts should be on their own server (host) and it should be easy to spin up a new one in no time.
Has anyone worked on a similar use case before? How do you suggest I go about it? Can a serverless platform serve this use case?
Many thanks in advance.