Hi all
Looking for some suggestions.
I have a .json.gz file inbound, made up of a single array of 1M records/documents.
I need to decompose the array into its individual documents. I was looking at awswrangler, but that seems to:
#1 unpack into a pandas DataFrame, so I lose my JSON structure, and
#2 I don't see how awswrangler can work with the .gz file inline/natively.
Suggestions please.
G
So the solution ended up as a Lambda-based function in Python:
inline unzip (gz), readline, and post to a Confluent Kafka cluster.
I achieved 7,000 tps (5,500 on my M1 Mac).
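The flow is roughly the sketch below: stream-decompress the .gz, read it line by line, and produce each document to Kafka with confluent-kafka-python. It assumes one document per line inside the array; the bootstrap servers, topic name, and file name are placeholders.
```python
import gzip
import json

from confluent_kafka import Producer  # pip install confluent-kafka

# Placeholder values; adjust to your environment.
BOOTSTRAP = "localhost:9092"
TOPIC = "json_records"
SOURCE_FILE = "records.json.gz"

producer = Producer({"bootstrap.servers": BOOTSTRAP})

def delivery(err, msg):
    # Called once per message; report failures instead of silently dropping them.
    if err is not None:
        print(f"delivery failed: {err}")

with gzip.open(SOURCE_FILE, "rt", encoding="utf-8") as fh:
    for line in fh:                      # inline unzip + readline
        line = line.strip().rstrip(",")  # tolerate array-style trailing commas
        if line in ("", "[", "]"):       # skip the array brackets / blank lines
            continue
        record = json.loads(line)        # validate/parse the document
        producer.produce(TOPIC, value=json.dumps(record), callback=delivery)
        producer.poll(0)                 # serve delivery callbacks

producer.flush()                         # wait for all in-flight messages
```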
Busy doing a Golang version; initial numbers look like it will be north of 10,000 tps.
Sitting with an error though when trying to build the Golang binary; see the other thread here.
G
I have a 1-million-document JSON collection. It's 5 GB. It's growing, and it's only 20 days in, so it'll be much bigger by summer next year.
I need to run analytics on some key/value pairs to establish time heatmaps of feature usage and the effects of app updates on usage rates.
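For the heatmap itself, a minimal pandas sketch along these lines could work on a manageable sample of the collection; the field names `feature` and `ts` (an ISO timestamp) are assumptions:
```python
import gzip
import json

import pandas as pd  # pip install pandas

# Assumed layout: the file is a single JSON array, and each document carries
# a "feature" key plus an ISO-format "ts" timestamp.
with gzip.open("records.json.gz", "rt", encoding="utf-8") as fh:
    df = pd.DataFrame(json.load(fh))

df["ts"] = pd.to_datetime(df["ts"])
df["date"] = df["ts"].dt.date
df["hour"] = df["ts"].dt.hour

# Usage counts per date x hour-of-day: the raw material for a time heatmap
# (rows = dates, columns = hours).
heatmap = df.pivot_table(index="date", columns="hour",
                         values="feature", aggfunc="count", fill_value=0)
print(heatmap.head())
```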
hi hi.
Might be able to help.
I've got Python code that will read the compressed json.gz file.
My code then JSON'ifies each record and posts it onto Kafka (option 1) or into MongoDB (option 2). From Kafka you can sink it into Apache Flink for real-time analysis and aggregation, or do similar in Mongo.
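For the Mongo option, a minimal sketch with pymongo looks like this (it assumes the same one-document-per-line .json.gz; the connection URI, database, and collection names are placeholders), inserting in batches for throughput:
```python
import gzip
import json

from pymongo import MongoClient  # pip install pymongo

# Placeholder values; adjust to your environment.
MONGO_URI = "mongodb://localhost:27017"
SOURCE_FILE = "records.json.gz"

client = MongoClient(MONGO_URI)
collection = client["analytics"]["events"]   # db / collection names are assumptions

batch = []
with gzip.open(SOURCE_FILE, "rt", encoding="utf-8") as fh:
    for line in fh:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        batch.append(json.loads(line))
        if len(batch) >= 1000:               # batch the inserts for throughput
            collection.insert_many(batch)
            batch = []
if batch:
    collection.insert_many(batch)            # flush the final partial batch
```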
You can find my blogs about Kafka and Mongo at: https://medium.com/@georgelza/list/the-rabbit-hole-0df8e3155e33
Ping me for the Python code that consumes the json.gz file; it's two code stacks, one to Kafka and the other to Mongo.
G
PS: I was running the code locally on a laptop, reading/posting at 9,000+/second, so a million records is small/easily doable. I loaded just over 400M records into a local datastore.
G