Dmitrii Kovanikov's Twitter Thread

Here’s a real task from my job. I have a 100GB binary file. Produced daily. I can’t grep it. But I can decode it. However, I can’t store the decoded version either. It’s too big. How do I efficiently query it? Decoding piped to grep takes 2 minutes. I want 2 seconds.

Structured data. Think of binary encoded records. But each record could be different (~30 kinds of records) 2. Should be possible, as long as you can calculate the offset properly. 3. Patterns could be different but usually it’s something like “give me all records that

I do have a big machine. The problem is that this is just one service and we have 540 such services. It’s a microservice communism. It’s not my machine, it’s “our” machine.

The question is how do you build this index

Anyone already built this?

Several terabytes after decoding to JSON

Replied nearby

Apply to Bloomberg

It’s actually incrementally updated throughout the day

Now, I just need to decode my query in binary…

Your mom sent me her photo

Binary format. The data is important.

My job is constantly finding answers to riddles

1. Binary

Share this thread

Read on Twitter

Navigate thread