Finding corrupt office files with Everything

Discussion related to "Everything" 1.5 Alpha.
Joelmo
Posts: 11
Joined: Sun May 09, 2021 3:27 pm

Post by Joelmo »

Hi all,

I discovered just now that you can use Everything to find corrupt Microsoft Office files.

Try searching Google for "find corrupt docx files" and you will see that 90 percent of the
websites found are commercial.

I got this far by using "Add columns" to add the "first 4 bytes" column in Everything 1.5.0.1315a.

That showed me that the hex code of an undamaged docx file starts with 504B0304.
But there are thousands of files, and it takes a long time to pick out every single file that
does not start with 504B0304.

Is there a way to search only for files whose hex code starts with 504B0304?

Thanks a lot
Joelmo
void
Developer
Posts: 16672
Joined: Fri Oct 16, 2009 11:31 pm

Re: Finding corrupt office files with Everything

Post by void »

Please try the following search:

*.docx hex:startwith:binarycontent:504B0304



hex: = treats the search as a hex string.
startwith: = match only the start of the file content.
binarycontent: = Search file content and treat the content as a byte stream.
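
For readers who want to cross-check a single file outside Everything, the same idea (compare the first bytes of a file with a hex string) can be sketched in a few lines of Python. This is only an illustration, not how Everything implements the search; the function name `starts_with_hex` is made up for this example.

```python
def starts_with_hex(path, hex_signature):
    """Return True if the file at `path` begins with the bytes given as a hex string.

    `hex_signature` is a string such as "504B0304"; upper or lower case both work.
    """
    signature = bytes.fromhex(hex_signature)
    with open(path, "rb") as handle:
        # Read only as many bytes as the signature is long.
        return handle.read(len(signature)) == signature
```

Calling `starts_with_hex("report.docx", "504B0304")` on a hypothetical file `report.docx` would then play the role of the search above for that one file.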

Searching binary content



-or-


Please try adding the File Signature column.
docx files are zip files and will display application/zip if valid.
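
Because docx files are zip files, a rough equivalent of this check can be done with Python's standard `zipfile` module. The sketch below is an assumption-laden illustration: `zipfile.is_zipfile` decides based on the archive's end record rather than on the leading bytes, so it will not necessarily agree with Everything's File Signature column in every case. The names `OOXML_EXTENSIONS` and `non_zip_office_files` are invented for this example.

```python
import zipfile
from pathlib import Path

# Extensions assumed to be ZIP-based Office formats for this sketch.
OOXML_EXTENSIONS = {".docx", ".xlsx", ".xlsm"}


def non_zip_office_files(folder):
    """Return Office-named files under `folder` that Python does not recognise as ZIP."""
    suspects = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in OOXML_EXTENSIONS:
            if not zipfile.is_zipfile(str(path)):
                suspects.append(path)
    return suspects
```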
Joelmo
Posts: 11
Joined: Sun May 09, 2021 3:27 pm

Re: Finding corrupt office files with Everything

Post by Joelmo »

Oh wow !!!!!!

I tried all these:

*.docx hex:startwith:binarycontent:504B0304
*.doc hex:startwith:binarycontent:D0CF11E0
*.pub hex:startwith:binarycontent:D0CF11E0
*.xls hex:startwith:binarycontent:D0CF11E0
*.xlsx hex:startwith:binarycontent:504B0304
*.rtf hex:startwith:binarycontent:7B5C7274
*.pdf hex:startwith:binarycontent:25504446
*.jpg hex:startwith:binarycontent:FFD8FFE0
*.exe hex:startwith:binarycontent:4D5A9000

... that's really awesome, it worked perfectly!

And then of course I had the idea ;-) to search not
only for intact docx files whose hex begins with
504B0304, but also for damaged docx files that do
not start with 504B0304, using:

*.docx hex:no-start-with:binarycontent:504B0304

But this didn't work: it again showed only files
whose hex starts with 504B0304.
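
As a cross-check outside Everything, the signature list from this post and the "does not start with" idea can be sketched in Python. This is a minimal sketch under assumptions: the table `EXPECTED_SIGNATURES` and the function `files_with_unexpected_start` are hypothetical names, and for .exe only the two-byte MZ marker (4D5A) is compared.

```python
from pathlib import Path

# Expected leading bytes per extension, taken from the searches in this post.
EXPECTED_SIGNATURES = {
    ".docx": "504B0304",
    ".xlsx": "504B0304",
    ".doc": "D0CF11E0",
    ".xls": "D0CF11E0",
    ".pub": "D0CF11E0",
    ".rtf": "7B5C7274",
    ".pdf": "25504446",
    ".jpg": "FFD8FFE0",  # JFIF variant; other JPEG variants differ after FFD8
    ".exe": "4D5A",      # only the MZ marker is checked in this sketch
}


def files_with_unexpected_start(folder):
    """Return (path, actual_hex) pairs for files whose first bytes do not match the table."""
    mismatches = []
    for path in sorted(Path(folder).rglob("*")):
        expected_hex = EXPECTED_SIGNATURES.get(path.suffix.lower())
        if expected_hex is None or not path.is_file():
            continue  # extension not in the table, or not a regular file
        expected = bytes.fromhex(expected_hex)
        with open(path, "rb") as handle:
            actual = handle.read(len(expected))
        if actual != expected:
            mismatches.append((path, actual.hex().upper()))
    return mismatches
```

The returned hex string shows what the file actually starts with, which can help tell an empty file from one that has a different container format.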
void
Developer
Posts: 16672
Joined: Fri Oct 16, 2009 11:31 pm

Re: Finding corrupt office files with Everything

Post by void »

Please try the following search:

*.docx !hex:startwith:binarycontent:504B0304

! = NOT
Joelmo
Posts: 11
Joined: Sun May 09, 2021 3:27 pm

Re: Finding corrupt office files with Everything

Post by Joelmo »

xxxxxxxxxxxxxxxxx
you are a genius
xxxxxxxxxxxxxxxxx
w64bit
Posts: 252
Joined: Wed Jan 09, 2013 9:06 am

Re: Finding corrupt office files with Everything

Post by w64bit »

I noticed that when using:
*.docx !hex:startwith:binarycontent:504B0304
all docx files are displayed first, and then the "good" files are removed from the list, leaving only the "bad" files.

I think the list should start out empty and show only the "bad" files as they are found.
tuska
Posts: 1052
Joined: Thu Jul 13, 2017 9:14 am

Re: Finding corrupt office files with Everything

Post by tuska »

w64bit wrote: Wed Jul 06, 2022 10:17 am ... remaining only "bad" files.
Attention:
They are not only "bad" files - there may also be password protected files(!) and read-only files* among them!
* e.g. C:\Program Files\Microsoft Office\root\vfs\Windows\SHELLNEW\WORD.DOCX
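
To illustrate why a non-ZIP start does not automatically mean "corrupt": as far as I know, password-protected Office documents are normally stored in an OLE compound file wrapper, which starts with D0CF11E0 (the same signature used earlier in this thread for *.doc). A rough Python sketch that separates these cases follows; the function name and the category labels are made up for this example, and files that cannot be opened are reported separately instead of being counted as damaged.

```python
ZIP_LOCAL_HEADER = bytes.fromhex("504B0304")
OLE_COMPOUND_FILE = bytes.fromhex("D0CF11E0")


def classify_ooxml_candidate(path):
    """Classify a .docx/.xlsx-style file by its first four bytes."""
    try:
        with open(path, "rb") as handle:
            head = handle.read(4)
    except OSError:
        # Missing file, access denied, locked by another process, and so on.
        return "unreadable"
    if head == ZIP_LOCAL_HEADER:
        return "zip"
    if head == OLE_COMPOUND_FILE:
        return "ole-container"  # possibly password-protected, not necessarily damaged
    if not head:
        return "empty"
    return "unknown"
```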

I use these search queries for this purpose:


*.docx !hex:startwith:binarycontent:504B0304 !<ext:lnk;docx.lnk>

*.xlsx !hex:startwith:binarycontent:504B0304 !<ext:lnk;xlsx.lnk>
*.xlsm !hex:startwith:binarycontent:504B0304 !<ext:lnk;xlsm.lnk>
Thanks to Joelmo!
void
Developer
Posts: 16672
Joined: Fri Oct 16, 2009 11:31 pm

Re: Finding corrupt office files with Everything

Post by void »

w64bit wrote: I think the list should start out empty and show only the "bad" files as they are found.
I will look into showing an empty list first and adding files as they are found when using !.
Thanks for the suggestion.

tuska wrote: Attention: They are not only "bad" files - there may also be password protected files(!) among them!
Another option is to search for:
ext:docx;xlsx !file-signature:application/zip
tuska
Posts: 1052
Joined: Thu Jul 13, 2017 9:14 am

Re: Finding corrupt office files with Everything

Post by tuska »

void wrote: Wed Jul 06, 2022 11:07 am
Attention: They are not only "bad" files - there may also be password protected files(!) among them!
Another option is to search for:


ext:docx;xlsx !file-signature:application/zip
This query is much better! :)
Thanks a lot!
Joelmo
Posts: 11
Joined: Sun May 09, 2021 3:27 pm

Re: Finding corrupt office files with Everything

Post by Joelmo »

I have noticed that some docx files that I converted from doc files do not show any content, although they have the correct hex start 504B0304 when searching with *.docx hex:startwith:binarycontent:504B0304
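
A correct 504B0304 start only says that the file begins like a ZIP archive, not that the document inside is usable. One way to look a little deeper, sketched here in Python under assumptions, is to open the archive, verify the stored checksums, and confirm that the main document part exists and parses as XML. The location `word/document.xml` is the conventional one for Word files and is assumed here; `inspect_docx` and its status strings are hypothetical, and this check will not catch every kind of damage.

```python
import zipfile
import xml.etree.ElementTree as ET

# Conventional location of the main document part (an assumption for this sketch).
MAIN_PART = "word/document.xml"


def inspect_docx(path):
    """Return a short status string describing how far `path` gets as a docx package."""
    if not zipfile.is_zipfile(path):
        return "not a zip archive"
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first member with a bad CRC, or None.
            damaged_member = archive.testzip()
            if damaged_member is not None:
                return "damaged member: " + damaged_member
            if MAIN_PART not in archive.namelist():
                return "main document part missing"
            ET.fromstring(archive.read(MAIN_PART))
    except zipfile.BadZipFile:
        return "unreadable zip structure"
    except ET.ParseError:
        return "main document part is not well-formed XML"
    return "ok"
```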

Meanwhile I found out that all these corrupt docx files have the same number 10775 in the column "Sibling count".

When searching with *.docx hex:startwith:binarycontent:504B0304 in the column "Sibling Count" the following numbers were displayed:

Sibling count   Preview shows
47765           intact docx files
10775           damaged/corrupt Word files
3097            intact docx files
822             intact docx files
451             intact docx files
84              intact docx files
45              intact docx files
17              intact docx files
12              intact docx files
10              intact docx files
8               intact docx files
6               intact docx files
5               intact docx files
3               intact docx files
2               intact docx files

Can anybody please explain what is behind this?
void
Developer
Posts: 16672
Joined: Fri Oct 16, 2009 11:31 pm

Re: Finding corrupt office files with Everything

Post by void »

Sibling count is the number of files/folders in the same folder (not counting the file itself).

Joelmo wrote: Meanwhile I found out that all these corrupt docx files have the same number 10775 in the column "Sibling count".
This means all the corrupt docx files are in the same location.
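
The definition above can be reproduced roughly in Python, which also shows why files with the same sibling count tend to share a folder. This is a sketch only: `sibling_count` and `group_by_folder` are invented names, and `os.listdir` counts every entry it can see, which may differ from what Everything indexes.

```python
import os
from collections import defaultdict


def sibling_count(path):
    """Number of entries in the same folder as `path`, not counting `path` itself."""
    parent = os.path.dirname(os.path.abspath(path))
    return len(os.listdir(parent)) - 1


def group_by_folder(paths):
    """Group paths by their parent folder, to see where suspicious files cluster."""
    groups = defaultdict(list)
    for path in paths:
        groups[os.path.dirname(os.path.abspath(path))].append(path)
    return dict(groups)
```

Grouping the corrupt files by folder is a more direct way to find their common location than comparing sibling counts, since two different folders could happen to hold the same number of entries.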
void
Developer
Posts: 16672
Joined: Fri Oct 16, 2009 11:31 pm

Re: Finding corrupt office files with Everything

Post by void »

Everything 1.5.0.1316a will now add results from !content: searches as they are found.



Fixed an issue with endwith: and wildcards.
Search for the following will now work as expected:


hex:endwith:wildcards:binarycontent:504b0506??????????????????
Everything 1.5.0.1316a also fixes an issue with binarycontent: and hex: not using the correct search op code.
Everything 1.5.0.1316a also fixes an issue with content offset and maxsize parameters.
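
For context on the endwith: search above: 50 4B 05 06 is the signature of the ZIP end-of-central-directory record, whose fixed part is 22 bytes long (4 signature bytes plus 18 more). In an archive without a trailing comment, that signature therefore sits exactly 22 bytes before the end of the file. The Python sketch below checks this; it rests on my reading that the wildcard run in the query stands for those 18 trailing bytes, which is not confirmed in the thread, and `ends_with_bare_eocd` is a hypothetical name.

```python
import os

EOCD_SIGNATURE = bytes.fromhex("504B0506")
EOCD_FIXED_SIZE = 22  # 4 signature bytes + 18 further bytes, no archive comment


def ends_with_bare_eocd(path):
    """Return True if the last 22 bytes of the file start with the EOCD signature."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        if size < EOCD_FIXED_SIZE:
            return False  # too short to hold an end record at all
        handle.seek(size - EOCD_FIXED_SIZE)
        tail = handle.read(EOCD_FIXED_SIZE)
    return tail.startswith(EOCD_SIGNATURE)
```

A file that passes the start check but fails this one may be truncated, although an archive with a trailing comment would also fail it.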