Unique item from large data set of usernames

jordmax12 · August 20, 2022, 1:15am

So I’m just curious to see what people can think of in terms of a solution for the following problem:

Say we have a list of 1M+ random usernames that we’ve generated. We are grabbing these from an API at 12K per day. We have a serverless architecture set up and want to assign each user a randomly generated username, but the API we use limits us to 12K per day. So the idea is to have a cron Lambda run to collect 12K per day until we launch, which should give us a lot of usernames to work with, and every day we get 12K more.

So the question now becomes: where can I store these usernames so that I can grab them and ensure they’re only grabbed once, since I may have x asynchronous workers all requesting them at the same time for x items?

The obvious solution to me is to use SQS. Since I’ll have a batch of records that each need a username, I can poll the SQS queue and request x messages (the batch size from a DDB stream). This way I can ensure messages won’t be sent to multiple consumers, and I can assign a random username.
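A minimal sketch of the SQS approach described above, assuming a boto3-style SQS client is passed in (for example `boto3.client("sqs")`). The function name `take_usernames` and the queue URL are hypothetical. Two things to keep in mind: a single `receive_message` call returns at most 10 messages, so a larger batch needs several calls, and a received message is only hidden from other consumers for its visibility timeout, so it has to be deleted once it has been taken. Standard queues are also at-least-once, so an occasional duplicate delivery is possible.

```python
def take_usernames(sqs, queue_url, count):
    """Pull up to `count` usernames off the queue, deleting each one taken.

    `sqs` is assumed to be a boto3-style SQS client; each message body is
    assumed to hold exactly one username.
    """
    usernames = []
    while len(usernames) < count:
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            # SQS caps a single receive at 10 messages.
            MaxNumberOfMessages=min(10, count - len(usernames)),
            WaitTimeSeconds=1,  # long polling
        )
        messages = resp.get("Messages", [])
        if not messages:
            break  # nothing returned; the caller decides how to handle a shortfall
        for msg in messages:
            usernames.append(msg["Body"])
            # Delete so the message does not reappear after the visibility timeout.
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    return usernames
```

Deleting right after receiving keeps the sketch short; a worker that crashes before saving the assignment would lose that username, so a real handler might delete only after the user record is written.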

My problem is that this seems overengineered. I’m trying to think of a simpler solution, and am all ears for other people’s thoughts and opinions.

buggy · August 20, 2022, 7:34am

Obvious question - Why do you need to pre-generate random usernames versus generating them when you need them?

I would keep them in DynamoDB. Look for optimistic locking examples. The same principle can be used to make sure that no other person has been allocated that username.

jordmax12 · August 20, 2022, 1:27pm

So we have an API that we are building now, on which we are expecting huge amounts of traffic in the future. We basically want to assign a randomly generated username to each user, but we want it to come from a certain category. Say superhero for now.

We found an API that generates random usernames based on a category. However, this API only allows 12.5K calls per day. So the idea we had was to use the time we have before we go live, make the 12.5K API calls every day, and just store the results on our side.

The issue, though, is how do I read from one table to update another and ensure that I am only ever assigning a username to one person, when we have serverless asynchronous workloads and can have x Lambdas requesting a username at once?

buggy · August 21, 2022, 2:08pm

You’re looking for conditional writes. Put the usernames in your DynamoDB table, then update the item, setting a new attribute like UserID to the ID of the user you’re assigning the username to. Add an attribute_not_exists() condition expression so the update fails if the UserID attribute has already been written (i.e. someone got to it between you reading and claiming the username).
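A sketch of that conditional write, assuming a boto3-style low-level DynamoDB client is passed in (for example `boto3.client("dynamodb")`). The table name, the `Username` partition key, and the function name `claim_username` are assumptions made for illustration; only `UserID` comes from the post above.

```python
def claim_username(dynamodb, table_name, username, user_id):
    """Return True if this call claimed `username`, False if it was not available.

    `dynamodb` is assumed to be a boto3-style low-level DynamoDB client, and
    the table is assumed to use `Username` (string) as its partition key.
    """
    try:
        dynamodb.update_item(
            TableName=table_name,
            Key={"Username": {"S": username}},
            UpdateExpression="SET UserID = :uid",
            # attribute_exists stops update_item from creating a brand-new item;
            # attribute_not_exists makes the claim fail if someone got there first.
            ConditionExpression="attribute_exists(Username) AND attribute_not_exists(UserID)",
            ExpressionAttributeValues={":uid": {"S": user_id}},
        )
        return True
    except dynamodb.exceptions.ConditionalCheckFailedException:
        return False
```

If two Lambdas race for the same username, DynamoDB evaluates the condition atomically per item, so only one call returns True and the other can simply try a different username.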

jordmax12 · August 21, 2022, 5:22pm

This would work if I had a way to pick a random item from dynamo.

Which I actually had an idea for: basically assigning incremental IDs as the hash key, so I can pick a random one by just generating a random number. The only issue I saw with this is that performance will degrade over time as more usernames get assigned, and you’d have to keep querying (or scanning, if you’re using a filter expression to determine whether it’s assigned, but the filter is only applied AFTER the query is executed, meaning you may not get any results and would have to keep querying to find one).
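The random-ID idea combined with a conditional claim can be sketched as below. `try_claim` is a hypothetical callable that performs the conditional write for one numeric ID and returns whether it succeeded, and `max_id` is assumed to be the highest ID written so far. The second function puts a number on the degradation: if a fraction f of the IDs is already taken, each attempt succeeds with probability 1 - f, so the mean number of attempts is 1 / (1 - f).

```python
import random


def claim_random_username(try_claim, max_id, user_id, max_attempts=25, rng=random):
    """Try random IDs in [1, max_id] until one is claimed; return it, or None.

    `try_claim(candidate_id, user_id)` is a hypothetical callable that does the
    conditional write and returns True only if this caller won the ID.
    """
    for _ in range(max_attempts):
        candidate = rng.randint(1, max_id)
        if try_claim(candidate, user_id):
            return candidate
    return None  # pool exhausted or very unlucky; caller needs a fallback


def expected_attempts(fraction_taken):
    """Mean number of tries when each try independently lands on a free ID."""
    return 1.0 / (1.0 - fraction_taken)
```

With half the pool assigned this averages 2 attempts per user, and with 90% assigned about 10, so the approach stays cheap as long as the daily top-up keeps the pool mostly free.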

I was also thinking of a stream that would delete them from the table once we made the assignment in the users table, but then we would have no way to pick a random item from DDB, since once we delete them those IDs no longer exist and you’d have to account for that when generating a random number to look up.

Appreciate the insight. I originally wasn’t thinking of recording the assignment on the username table rather than the user table (we can set up event-driven stuff to also put the username in the user table if we want). But I’m just unsure how I can find an item in DDB that doesn’t have a flag in one query, without the filtering being done after the query has already executed (post-filter, not pre-filter).
