hex: binary: binarycontent:

Post by **Stamimail** » Fri Jul 08, 2022 2:49 pm

viewtopic.php?f=12&t=11747

Can you expand on the explanation there?
Why use binarycontent: rather than binary:, and why do we need to add hex:?
Don't these functions search what we see in a hex editor?

Post by **void** » Sat Jul 09, 2022 8:29 am

binarycontent:

binarycontent: is a search function.

The file content is treated as binary.
Everything will not try to load the content with an iFilter.
Everything will not try to load the content as text/plain.

The search is treated as text.
Everything will try matching the binary content as UTF-8, ANSI, UTF-16, UTF-16 with a byte offset of 1, UTF-16BE and UTF-16BE with a byte offset of 1.
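
To make the list of encodings concrete, here is a minimal Python sketch (not Everything's code) of the byte patterns a single search text turns into under each encoding. The function name candidate_patterns is hypothetical, and cp1252 is only an assumed stand-in for "ANSI", which depends on the system code page.

```python
def candidate_patterns(text: str) -> dict:
    """Encode `text` once per encoding; cp1252 stands in for "ANSI" here."""
    codecs = {
        "UTF-8": "utf-8",
        "ANSI (cp1252 assumed)": "cp1252",
        "UTF-16LE": "utf-16-le",
        "UTF-16BE": "utf-16-be",
    }
    return {label: text.encode(codec) for label, codec in codecs.items()}


# U+00E9 (e with acute accent) encodes differently in every one of them.
for label, pattern in candidate_patterns("\u00e9").items():
    print(label, pattern.hex())
```

A text search over binary content has to consider each of these patterns, which is why the same search text can match files that look quite different in a hex editor.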

binary:

binary: is a search modifier.

The file content is treated as binary.
Everything will not try to load the content with an iFilter.
Everything will not try to load the content as text/plain.

The search is also treated as binary.
Both the content and the search are treated as binary byte streams.
No special search is performed other than a simple byte comparison test.

Adding the regex: modifier also treats the regex pattern as binary, so \xff matches the byte value 255.
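
As a rough analogy only (this is Python's regex engine, not Everything's), a pattern compiled from bytes is matched against raw byte values, so \xff stands for the single byte 255. The sample data is made up for illustration.

```python
import re

# Made-up sample data: two ASCII bytes, the byte 255, then two more ASCII bytes.
data = b"ab" + bytes([255]) + b"cd"

# A bytes pattern is compared against raw byte values, not decoded text.
match = re.search(rb"\xff", data)

print(match.start())        # position of the 0xFF byte in the data
print(data[match.start()])  # the byte value found at that position
```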

binary: and hex: are the same, except in how they handle the characters 0-9, a-f and A-F.
With binary:, 0-9, a-f and A-F are treated as ASCII character codes.
With hex:, two consecutive 0-9, a-f or A-F characters are converted to a byte value.

Using binary: might be preferable to hex: when you are trying to find the ASCII characters "abc" in a file:
binary:content:abc
With hex: you would have to search for:
hex:content:616263
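
The difference between the two readings can be sketched in a few lines of Python. This only illustrates the conversion described above, not Everything's parser, and the helper names binary_term and hex_term are hypothetical.

```python
def binary_term(term: str) -> bytes:
    """binary:-style reading: every character is its own ASCII byte."""
    return term.encode("ascii")


def hex_term(term: str) -> bytes:
    """hex:-style reading: each pair of hex digits becomes one byte."""
    return bytes.fromhex(term)


# Both example searches above end up looking for the same three bytes.
assert binary_term("abc") == hex_term("616263") == b"abc"

# The same digits read the binary: way are six ASCII bytes instead of three.
print(binary_term("616263"))
print(hex_term("616263"))
```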

hex:binarycontent: is the same as hex:content:
These will both match the text you would see in a hex editor.

Post by **NotNull** » Sat Jul 09, 2022 12:03 pm

Thanks for the explanation!

There is one thing I don't understand though:

void wrote: ↑Sat Jul 09, 2022 8:29 am hex:binarycontent: is the same as hex:content:
These will both match the text you would see in a hex editor.

void wrote: ↑Sat Jul 09, 2022 8:29 am binary:

binary: is a search modifier.

The file content is treated as binary.
Everything will not try to load the content with an iFilter.

The search is also treated as binary.
Both the content and the search are treated as binary byte streams.
No special search is performed other than a simple byte comparison test.
[...]
binary: and hex: are the same, except how they handle characters 0-9, a-f and A-F.

This works as expected:

Code:

test.pdf    startwith:hex:binarycontent:25504446

as each PDF file starts with %PDF (hex: 25504446)
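
For comparison, the same check done outside Everything could look like this minimal Python sketch. The helper name starts_with_hex and the throwaway sample file are assumptions made for illustration; 504B0304 is the ZIP signature, used here only as a non-matching prefix.

```python
import os
import tempfile


def starts_with_hex(path: str, hex_digits: str) -> bool:
    """Return True if the file at `path` begins with the bytes given as hex digits."""
    prefix = bytes.fromhex(hex_digits)
    with open(path, "rb") as handle:
        return handle.read(len(prefix)) == prefix


# Build a throwaway file with a PDF-style header to try the check on.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.7\n")
    sample_path = tmp.name

is_pdf = starts_with_hex(sample_path, "25504446")  # %PDF
is_zip = starts_with_hex(sample_path, "504B0304")  # PK\x03\x04
os.remove(sample_path)
print(is_pdf, is_zip)
```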

But hex: doesn't seem to make the content: function treat the content as a byte stream. I needed this to make it work as expected:

Code:

test.pdf   startwith:hex:fromdisk:content:25504446

(PDF files are not content-indexed by Everything on this system, although that should not matter, I think ...)

void wrote: ↑Sat Jul 09, 2022 8:29 am Everything will try matching the binary content as UTF-8, ANSI, UTF-16, UTF-16 with a byte offset of 1, UTF-16BE and UTF-16BE with an offset of 1.

Not sure, but this seems more likely to me:
Everything will try matching the binary content as UTF-8, ANSI, UTF-16, UTF-16 with a byte offset of 0, UTF-16BE and UTF-16LE with an offset of 2.

Differences: offset (FEFF/FFFE) and UTF16-LE instead of BE

Post by **void** » Sat Jul 09, 2022 12:24 pm

NotNull wrote: ↑Sat Jul 09, 2022 12:03 pm test.pdf startwith:hex:fromdisk:content:25504446

If you search for:
test.pdf startwith:hex:content:25504446

What are the search ops from the Everything debug console?

For example:

Code:

FILE TERM START 0000000032a2a6f8 M 000000000018dc70 N 000000000018dd90
0000000032a2a6f8 20e01104 M 0000000032a2a838 N 000000000018dd90 OP 163 c:\PDFs\
0000000032a2a838 20e01100 M 0000000032a2a978 N 000000000018dd90 OP 205 pdf
0000000032a2a978 20e01140 M 000000000018dc70 N 000000000018dd90 OP 558 %PDF

NotNull wrote: ↑Sat Jul 09, 2022 12:03 pm Not sure, but this seems more likely to me:
Everything will try matching the binary content as UTF-8, ANSI, UTF-16, UTF-16 with a byte offset of 0, UTF-16BE and UTF-16LE with an offset of 2.

Differences: offset (FEFF/FFFE) and UTF16-LE instead of BE

Everything will try both UTF-16LE and UTF-16BE.

If there's no match, Everything will try both UTF-16LE and UTF-16BE again with a 1 byte offset.
This is because UTF-16 text in the content might not be aligned to two bytes.

Consider the following files (shown in hex):

Code:

00680065006C006C006F // UTF16BE text: hello
680065006C006C006F00 // UTF16LE text: hello
FF00680065006C006C006F // UTF16BE text: hello with junk first byte 0xff
FF680065006C006C006F00 // UTF16LE text: hello with junk first byte 0xff

Everything will find "hello" in all the above files.

If your search text is all ASCII characters, Everything will do the search in one pass.
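
The four example files can be run through a small Python sketch of the idea. It is not Everything's algorithm: it simply decodes the content in both byte orders at offsets 0 and 1 and reports every combination that reveals the text, whereas Everything only retries with the offset when the first attempt fails. The function name find_utf16 is hypothetical.

```python
def find_utf16(content: bytes, text: str) -> list:
    """Report which UTF-16 byte order and byte offset reveal `text` in `content`."""
    hits = []
    for offset in (0, 1):  # aligned pass first, then shifted by one byte
        for codec in ("utf-16-le", "utf-16-be"):
            # errors="replace" keeps decoding going past a dangling odd byte.
            decoded = content[offset:].decode(codec, errors="replace")
            if text in decoded:
                hits.append(f"{codec} at offset {offset}")
    return hits


samples = [
    "00680065006C006C006F",    # UTF-16BE
    "680065006C006C006F00",    # UTF-16LE
    "FF00680065006C006C006F",  # UTF-16BE after a junk byte
    "FF680065006C006C006F00",  # UTF-16LE after a junk byte
]
for hex_digits in samples:
    print(hex_digits, find_utf16(bytes.fromhex(hex_digits), "hello"))
```

Each sample is found exactly once: the first two at offset 0 and the last two only at offset 1, which is the unaligned case the second pass exists for.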

Post by **NotNull** » Sat Jul 09, 2022 12:39 pm

void wrote: ↑Sat Jul 09, 2022 12:24 pm What are the search ops from the Everything debug console?

Code:

search 'ext:pdf menu   startwith:hex:content:25504446' filter '' sort 10 ascending 0
parse flags 00000000 type 20c00100
TERM pdf
parse flags 00000000 type 20c00100
TERM menu
parse flags 00080008 type 20c00100
TERM %PDF
FOLDER TERM START 0000000000afe280 M 0000000000afe160 N 0000000000afe280
000000000c2f9578 20e00100 M 000000000c2f97f8 N 0000000000afe280 OP 5 menu
000000000c2f97f8 20e00140 M 0000000000afe160 N 0000000000afe280 OP 378 %PDF
FILE TERM START 000000000c2fc638 M 0000000000afe160 N 0000000000afe280
000000000c2fc638 20e00100 M 000000000c2f9578 N 0000000000afe280 OP 205 pdf
000000000c2f9578 20e00100 M 000000000c2f97f8 N 0000000000afe280 OP 5 menu
000000000c2f97f8 20e00140 M 0000000000afe160 N 0000000000afe280 OP 378 %PDF
found 0 files with 2 threads in 0.013394 seconds
found 0 folders with 0 threads in 0.000002 seconds

void wrote: ↑Sat Jul 09, 2022 12:24 pm If there's no match, Everything will try both UTF-16LE and UTF-16BE again with a 1 byte offset.

Got it. Thanks!

Post by **void** » Sun Jul 10, 2022 2:41 am

OP code 378 (Binary content search) is unexpected.

What version of Everything are you using?

Post by **NotNull** » Sat Jul 16, 2022 11:02 pm

1315a x64.

Will take a closer look tomorrow (I saw more unexpected results).

Post by **void** » Wed Aug 31, 2022 5:21 am

Everything 1.5.0.1316a fixes an issue with binarycontent: and hex: not using the correct search op code.

voidtools forum
