Questions Leak Search Engine
There are many leak search engines around, like IntelX and StealSeek, all ingesting billions of records. How do they ingest that data, and where do they store it for such fast retrieval? Even locally, going through log files with maybe 700 million records takes a good while with tools like ripgrep. What do you guys use for searching through log data locally?
Also, if anyone understands the tech behind these leak search engines, please explain it — I'd love to learn. From what I've seen, they must be using some kind of chunking with inverted indexes on fields like domain, email, and IP. Judging by the network traffic when you look up a domain, IntelX seems to divide files into smaller chunks and store each one in a bucket.
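If that guess is right, the core idea is simple: extract the searchable fields (emails, IPs, domains) from each line at ingest time, and map each token to the records containing it. A minimal sketch in Python — the regexes and function names here are illustrative, not anything IntelX actually uses:

```python
import re
from collections import defaultdict

# Illustrative patterns for the fields discussed above (emails and IPv4s).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def build_index(lines):
    """Map each extracted token to the set of line numbers containing it."""
    index = defaultdict(set)
    for lineno, line in enumerate(lines):
        for token in EMAIL_RE.findall(line) + IP_RE.findall(line):
            index[token.lower()].add(lineno)
    return index

def lookup(index, token):
    """Return matching line numbers without rescanning the raw logs."""
    return sorted(index.get(token.lower(), ()))

logs = [
    "user=alice@example.com ip=10.0.0.5 action=login",
    "user=bob@example.com ip=10.0.0.6 action=login",
    "user=alice@example.com ip=10.0.0.7 action=logout",
]
idx = build_index(logs)
print(lookup(idx, "alice@example.com"))  # -> [0, 2]
```

The expensive full scan happens once at ingest; every later lookup is a hash-table hit. At the billions-of-records scale these services claim, the same index would presumably be sharded across machines by token hash, which would also explain the per-chunk buckets visible in the traffic.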
Using something like Rust, we can parse through large logs in under a minute with tokenization, but ingestion is painfully slow even for a single txt log from Alien.
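For the local-search question: one common alternative to rescanning with ripgrep is to pay the indexing cost once and query an inverted index after that, e.g. with SQLite's FTS5. A small sketch — this assumes your Python's sqlite3 build ships the FTS5 extension, which most modern builds do:

```python
import sqlite3

# In-memory DB for the demo; point connect() at a file for persistence.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE logs USING fts5(line)")

sample = [
    ("admin@corp.example logged in from 192.168.1.10",),
    ("guest@corp.example failed login",),
    ("admin@corp.example changed password",),
]
con.executemany("INSERT INTO logs(line) VALUES (?)", sample)

# MATCH hits the FTS index instead of scanning every row. The default
# tokenizer splits "admin@corp.example" into admin/corp/example, so we
# quote it as a phrase to require the tokens in sequence.
rows = con.execute(
    "SELECT line FROM logs WHERE logs MATCH ?",
    ('"admin@corp.example"',),
).fetchall()
for (line,) in rows:
    print(line)
```

For a one-off grep this is overkill, but if you query the same 700M-line dump repeatedly, amortizing one slow ingest over many millisecond lookups is usually the win. The same trade-off is presumably what the hosted leak search engines make at scale.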