Here’s a real task from my job. I have a 100GB binary file. Produced daily. I can’t grep it. But I can decode it. However, I can’t store the decoded version either. It’s too big. How do I efficiently query it? Decoding piped to grep takes 2 minutes. I want 2 seconds.
Structured data. Think of binary encoded records. But each record could be different (~30 kinds of records) 2. Should be possible, as long as you can calculate the offset properly. 3. Patterns could be different but usually it’s something like “give me all records that
I do have a big machine. The problem is that this is just one service and we have 540 such services. It’s a microservice communism. It’s not my machine, it’s “our” machine.
The question is how do you build this index
Anyone already built this?
Several terabytes after decoding to JSON
Replied nearby
Apply to Bloomberg
It’s actually incrementally updated throughout the day
Now, I just need to decode my query in binary…
Your mom sent me her photo
Binary format. The data is important.
My job is constantly finding answers to riddles
1. Binary
