I had a similar experience, but testing by running `strings` on the Steam Deck r...

jandrese · on March 11, 2025

I have tried this on a couple of different machines. On one machine it gives ridiculous answers like you found. On the other it at least works as expected, although it's kinda useless since it doesn't print the matched lines.

On the working machine it reported using SSE4.2 acceleration, while the broken one used AVX2 acceleration. However, the machine using SSE4.2 didn't see nearly as much speedup as the AVX2 machine: regular system grep on the SSE4.2 machine took 0.186 seconds to do the search, while krep needed 0.154 seconds. The biggest test file I had handy was only 123MB, though, so maybe the lead will grow with a larger file?

burntsushi · on March 11, 2025

That's probably because pcmpestri is trash for substring search. There is a good reason why ripgrep doesn't use it. :-)

I looked for an authoritative source for why pcmpestri is trash, and I couldn't find anything I was happy linking to other than Agner Fog's instruction tables: https://www.agner.org/optimize/instruction_tables.pdf You can see that the throughput and latency for pcmpestri is just awful.

And yes, not having any code to print the matching lines means that the only code path in krep is just counting things. If that's all your tool is doing, you can totally beat ripgrep or any other tool that is more applicable to generalized use cases. It's why the `memchr` crate (what ripgrep uses for single substring search) has a specialized routine for counting occurrences of bytes (which ripgrep uses for line counting): https://github.com/BurntSushi/memchr/blob/746182171d2e886006...

Because it's faster to do that than it is to reuse the generalized `memchr` API for finding the location of matching bytes.
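
To make the difference concrete, here is a rough std-only sketch (not the `memchr` crate's actual code, which is vectorized). The function names `count_via_find` and `count_direct` are made up for illustration. The first reuses a "find the next match" primitive in a loop, the way a generalized API would be used; the second never produces a position at all and just tallies comparisons.

```rust
/// Generalized route: repeatedly ask "where is the next match?" and
/// count how many times we get an answer. Each hit restarts the search.
fn count_via_find(haystack: &[u8], needle: u8) -> usize {
    let mut count = 0;
    let mut pos = 0;
    while let Some(i) = haystack[pos..].iter().position(|&b| b == needle) {
        count += 1;
        // Skip past the match we just found.
        pos += i + 1;
    }
    count
}

/// Specialized route: no positions are ever reported, so the loop is a
/// straight pass over the bytes that sums up the comparison results.
fn count_direct(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}
```

Both return the same number. The second shape is the one that is easier to make fast, because there is no need to stop and report a location on every hit.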

And counting matches in a multi-threaded context is way easier than actually managing the printing of matches in the same order that you get them.
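
As a hedged illustration of why counting parallelizes so easily (this is a sketch, not krep's or ripgrep's code, and `parallel_count` is a hypothetical name): split the haystack into chunks, count in each chunk independently, and add up the results. Addition doesn't care which thread finishes first, so no ordering or buffering of output is needed.

```rust
use std::thread;

/// Count occurrences of `needle` in one chunk.
fn count_byte(chunk: &[u8], needle: u8) -> usize {
    chunk.iter().filter(|&&b| b == needle).count()
}

/// Split `haystack` into roughly `threads` chunks, count each chunk on
/// its own scoped thread, and sum the per-chunk counts.
fn parallel_count(haystack: &[u8], needle: u8, threads: usize) -> usize {
    let threads = threads.max(1);
    // Ceiling division; `.max(1)` because `chunks(0)` would panic.
    let chunk_len = ((haystack.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = haystack
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || count_byte(chunk, needle)))
            .collect();
        // The sum is order-independent, so join order doesn't matter.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<usize>()
    })
}
```

This only works so cleanly because a single byte can't straddle a chunk boundary; a multi-byte needle would need extra handling at the seams. Printing matches in file order is harder still, since each thread's output has to be held back until everything before it has been emitted.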

krep isn't big. You can skim its source code in a few minutes and get a good idea of how it works.