Hi all
Looking for some suggestions.
I have a .json.gz file inbound, made up of a single array of 1M records/documents.
I need to decompose the array into its individual documents. I was looking at awswrangler, but that seems to:
#1 unpack into a pandas DataFrame, so I lose my JSON structure, and
#2 I don't see how awswrangler can work with the .gz file inline/natively.
Suggestions please.
G
So the solution ended up as a Lambda-based function in Python:
inline unzip (gz), readline, and post to a Confluent Kafka cluster.
I achieved 7,000 tps (5,500 on my M1 Mac).
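The flow is roughly the sketch below: stream-decompress the .gz, read it line by line, and produce each document to Kafka with confluent-kafka-python. It assumes one document per line inside the array; the bootstrap servers, topic name, and file name are placeholders.
```python
import gzip
import json

from confluent_kafka import Producer  # pip install confluent-kafka

# Placeholder values; adjust to your environment.
BOOTSTRAP = "localhost:9092"
TOPIC = "json_records"
SOURCE_FILE = "records.json.gz"

producer = Producer({"bootstrap.servers": BOOTSTRAP})

def delivery(err, msg):
    # Called once per message; report failures instead of silently dropping them.
    if err is not None:
        print(f"delivery failed: {err}")

with gzip.open(SOURCE_FILE, "rt", encoding="utf-8") as fh:
    for line in fh:                      # inline unzip + readline
        line = line.strip().rstrip(",")  # tolerate array-style trailing commas
        if line in ("", "[", "]"):       # skip the array brackets / blank lines
            continue
        record = json.loads(line)        # validate/parse the document
        producer.produce(TOPIC, value=json.dumps(record), callback=delivery)
        producer.poll(0)                 # serve delivery callbacks

producer.flush()                         # wait for all in-flight messages
```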
Busy doing a Golang version; initial numbers look like it will be north of 10,000 tps.
Sitting with an error though when trying to build the Golang binary; see the other thread here.
G
I have a 1-million-document JSON collection. It's 5 GB. It's growing, and it's only 20 days in, so it'll be much bigger by summer next year.
I need to run analytics on some key/value pairs to establish time heatmaps of feature usage and the effects of app updates on usage rates.
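For the heatmap itself, a minimal pandas sketch along these lines could work on a manageable sample of the collection; the field names `feature` and `ts` (an ISO timestamp) are assumptions:
```python
import gzip
import json

import pandas as pd  # pip install pandas

# Assumed layout: the file is a single JSON array, and each document carries
# a "feature" key plus an ISO-format "ts" timestamp.
with gzip.open("records.json.gz", "rt", encoding="utf-8") as fh:
    df = pd.DataFrame(json.load(fh))

df["ts"] = pd.to_datetime(df["ts"])
df["date"] = df["ts"].dt.date
df["hour"] = df["ts"].dt.hour

# Usage counts per date x hour-of-day: the raw material for a time heatmap
# (rows = dates, columns = hours).
heatmap = df.pivot_table(index="date", columns="hour",
                         values="feature", aggfunc="count", fill_value=0)
print(heatmap.head())
```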
hi hi.
Might be able to help.
I've got Python code that will read the compressed json.gz file.
My code then JSON'ifies each record and posts it onto Kafka (option 1) or into MongoDB (option 2). From Kafka you can sink it into Apache Flink for real-time analysis and aggregation, or do similar in Mongo.
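For the Mongo option, a minimal sketch with pymongo looks like this (it assumes the same one-document-per-line .json.gz; the connection URI, database, and collection names are placeholders), inserting in batches for throughput:
```python
import gzip
import json

from pymongo import MongoClient  # pip install pymongo

# Placeholder values; adjust to your environment.
MONGO_URI = "mongodb://localhost:27017"
SOURCE_FILE = "records.json.gz"

client = MongoClient(MONGO_URI)
collection = client["analytics"]["events"]   # db / collection names are assumptions

batch = []
with gzip.open(SOURCE_FILE, "rt", encoding="utf-8") as fh:
    for line in fh:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        batch.append(json.loads(line))
        if len(batch) >= 1000:               # batch the inserts for throughput
            collection.insert_many(batch)
            batch = []
if batch:
    collection.insert_many(batch)            # flush the final partial batch
```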
You can find my blogs about Kafka and Mongo at: https://medium.com/@georgelza/list/the-rabbit-hole-0df8e3155e33
Ping me for the Python code that consumes the json.gz file; it's two code stacks, one to Kafka and the other to Mongo.
G
PS: I was running the code locally on a laptop, reading/posting at 9,000+/second, so a million records is small/easily doable. I loaded just over 400M records into a local datastore.
G