Finding corrupt office files with Everything
Hi all,
I discovered just now that you can use Everything to find corrupt Microsoft Office files.
Try searching Google for "find corrupt docx files" and you will see that 90 percent of the websites found are commercial.
I got this far by using "Add columns" to add the "first 4 bytes" column in Everything 1.5.0.1315a.
This way I found out that the hex code of an undamaged docx file starts with 504B0304.
But there are thousands of files, so it takes a lot of time to find every single file that does not start with 504B0304.
Is there a way to search only for files whose hex code starts with 504B0304?
Thanks a lot
Joelmo
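For anyone who wants to reproduce the "first 4 bytes" check outside Everything, here is a minimal Python sketch. The function name first_bytes_hex is hypothetical and not part of Everything; it just reads the first bytes of a file and formats them as uppercase hex, the same form as 504B0304. For reference, bytes.fromhex("504B0304") is b"PK\x03\x04", the ZIP local file header signature.

```python
def first_bytes_hex(path, n=4):
    """Return the first n bytes of a file as an uppercase hex string, e.g. '504B0304'."""
    with open(path, "rb") as f:
        return f.read(n).hex().upper()
```

For example, first_bytes_hex("report.docx") should return "504B0304" for an intact docx file.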
I discovered just now that you can use Everything to find corrupt Microsoft Office files.
Try searching Google for "find corrupt docx files" and you will see that 90 percent of the
websites found are commercial.
I made it so far by "Add columns" with "first 4 bytes" in Everything 1.5.0.1315a.
In this way I found out, that the hex code of a not damaged docx file starts with 504B0304.
But there are thousends …. and then it needs a lot of time to find every single file that
starts not with 504B0304.
Is there a way to search only for files whose hex code starts with 504B0304 ?
Thanks a lot
Joelmo
Re: Finding corrupt office files with Everything
Please try the following search:
*.docx hex:startwith:binarycontent:504B0304
hex: = treats the search as a hex string.
startwith: = match only the start of the file content.
binarycontent: = Search file content and treat the content as a byte stream.
See also: Searching binary content
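To make the three modifiers concrete, here is a rough Python sketch of the per-file test. It assumes the search simply compares the file's leading bytes with the decoded hex term; it is an illustration, not Everything's implementation, and starts_with_hex is a hypothetical name.

```python
def starts_with_hex(path, hex_prefix):
    """Approximate hex:startwith:binarycontent:<hex_prefix> for a single file.

    binarycontent: -> read the file as raw bytes, no text decoding
    hex:           -> the search term is a hex string, converted to bytes
    startwith:     -> compare only against the beginning of the content
    """
    prefix = bytes.fromhex(hex_prefix)  # "504B0304" -> b"PK\x03\x04"
    with open(path, "rb") as f:
        return f.read(len(prefix)) == prefix
```

Note that bytes.fromhex raises ValueError for an odd number of hex digits, so hex terms should always have an even number of digits.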
-or-
Please try adding the File Signature column.
docx files are zip files, so valid ones will display application/zip.
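Python's standard zipfile module allows a deeper check than the signature alone: is_zipfile() looks for a valid ZIP end-of-central-directory record, and testzip() reads every member and verifies it. This is a different technique from a signature column, shown here only as a sketch; zip_status is a hypothetical helper name.

```python
import zipfile
import zlib


def zip_status(path):
    """Return "ok", "not a zip" or "damaged: ..." for a .docx/.xlsx file."""
    # is_zipfile() looks for a valid end-of-central-directory record.
    if not zipfile.is_zipfile(path):
        return "not a zip"
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads every member; it returns the first name whose
            # data or CRC-32 check fails, or None if all members are fine.
            bad = zf.testzip()
    except (zipfile.BadZipFile, zlib.error, EOFError) as exc:
        return f"damaged: {exc}"
    return "ok" if bad is None else f"damaged: {bad}"
```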
*.docx hex:startwith:binarycontent:504B0304
hex: = treats the search as a hex string.
startwith: = match only the start of the file content.
binarycontent: = Search file content and treat the content as a byte stream.
Searching binary content
-or-
Please try adding the File Signature column.
docx files are zip files and will display application/zip if valid.
Re: Finding corrupt office files with Everything
Oh wow !!!!!!
I tried all these:
*.docx hex:startwith:binarycontent:504B0304
*.doc hex:startwith:binarycontent:D0CF11E0
*.pub hex:startwith:binarycontent:D0CF11E0
*.xls hex:startwith:binarycontent:D0CF11E0
*.xlsx hex:startwith:binarycontent:504B0304
*.rtf hex:startwith:binarycontent:7B5C7274
*.pdf hex:startwith:binarycontent:25504446
*.jpg hex:startwith:binarycontent:FFD8FFE0
*.exe hex:startwith:binarycontent:4D5A9000
...that's really awesome, it worked perfectly!!!!
Then, of course, I had the idea to search not only for intact docx files whose hex begins with 504B0304, but also for damaged docx files that do not start with 504B0304, using:
*.docx hex:no-start-with:binarycontent:504B0304
But this didn't work; it again showed only files whose hex starts with 504B0304.
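The list of queries in this post amounts to a small table of file signatures. A hypothetical Python equivalent (SIGNATURES and signature_ok are illustrative names, not anything Everything provides) that checks one file against that table; the .exe entry uses the eight-digit 4D5A9000, since a hex term needs an even number of digits.

```python
from pathlib import Path

# Extension -> expected leading bytes (hex), based on the queries in this post.
SIGNATURES = {
    ".docx": "504B0304",  # ZIP package ("PK\x03\x04")
    ".xlsx": "504B0304",
    ".doc": "D0CF11E0",   # OLE2 compound file
    ".xls": "D0CF11E0",
    ".pub": "D0CF11E0",
    ".rtf": "7B5C7274",   # "{\rt"
    ".pdf": "25504446",   # "%PDF"
    ".jpg": "FFD8FFE0",   # JFIF JPEGs; other valid JPEGs (e.g. Exif, FFD8FFE1) differ
    ".exe": "4D5A9000",   # "MZ" followed by 90 00
}


def signature_ok(path):
    """True/False if the file's extension is in SIGNATURES, None otherwise."""
    expected = SIGNATURES.get(Path(path).suffix.lower())
    if expected is None:
        return None
    prefix = bytes.fromhex(expected)
    with open(path, "rb") as f:
        return f.read(len(prefix)) == prefix
```

A False result is the case the no-start-with attempt was meant to find.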
I tried all these:
*.docx hex:startwith:binarycontent:504B0304
*.doc hex:startwith:binarycontent:D0CF11E0
*.pub hex:startwith:binarycontent:D0CF11E0
*.xls hex:startwith:binarycontent:D0CF11E0
*.xlsx hex:startwith:binarycontent:504B0304
*.rtf hex:startwith:binarycontent:7B5C7274
*.pdf hex:startwith:binarycontent:25504446
*.jpg hex:startwith:binarycontent:FFD8FFE0
*.exe hex:startwith:binarycontent:4D5A90000
.... that's really awesome … it worked perfekt … !!!! ….
And then of course I had the idea to search not
only for intact docx files were hex begins with
504B0304 but to search for damaged docx files which
start not with 504B0304 too with:
*.docx hex:no-start-with:binarycontent:504B0304
But this didn`t worked … it showed again only files
hex starting with 504B0304
Re: Finding corrupt office files with Everything
Please try the following search:
*.docx !hex:startwith:binarycontent:504B0304
! = NOT
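Put together, the *.docx filter and the ! operator just keep the files for which the start-with test fails. A rough Python sketch over one folder tree, assuming a plain recursive walk (docx_not_starting_with_zip is a hypothetical name; Everything searches its own index instead):

```python
from pathlib import Path

ZIP_SIG = bytes.fromhex("504B0304")


def starts_with(path, prefix):
    with open(path, "rb") as f:
        return f.read(len(prefix)) == prefix


def docx_not_starting_with_zip(root):
    """Approximate *.docx !hex:startwith:binarycontent:504B0304 under one folder.

    *.docx -> files whose name ends in .docx (case-insensitive here)
    !      -> keep only the files for which the start-with test is False
    """
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".docx"
        and not starts_with(p, ZIP_SIG)
    )
```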
*.docx !hex:startwith:binarycontent:504B0304
! = NOT
Re: Finding corrupt office files with Everything
xxxxxxxxxxxxxxxxx
you are a genius
xxxxxxxxxxxxxxxxx
you are a genius
xxxxxxxxxxxxxxxxx
Re: Finding corrupt office files with Everything
I noticed that when using:
*.docx !hex:startwith:binarycontent:504B0304
all docx files are displayed first, and then the "good" files are removed from the list, leaving only the "bad" files.
I think the list should start empty and display only the "bad" files as they are found.
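The difference between the current behaviour and the suggestion can be modelled in a few lines of Python. This only illustrates the two display strategies with hypothetical function names; it says nothing about how Everything works internally.

```python
def filter_after(candidates, is_bad):
    """Current behaviour: show every candidate, then drop the good ones."""
    shown = list(candidates)                # all *.docx files appear first...
    return [c for c in shown if is_bad(c)]  # ...then the "good" ones are removed


def stream_bad(candidates, is_bad):
    """Suggested behaviour: start empty and add each bad file as soon as it is found."""
    for c in candidates:
        if is_bad(c):
            yield c
```

Both end with the same result; only the generator version lets a results list start empty and grow as files are checked.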
*.docx !hex:startwith:binarycontent:504B0304
first are displayed all docx files and after that the "good" files are removed from the list, remaining only "bad" files.
I think that the list should be empty and to display from the beginning only the "bad" files found.
Re: Finding corrupt office files with Everything
Attention:
These are not only "bad" files: there may also be password-protected files(!) and read-only files* among them!
* e.g. C:\Program Files\Microsoft Office\root\vfs\Windows\SHELLNEW\WORD.DOCX
I use these search queries for this purpose:
Code: Select all
*.docx !hex:startwith:binarycontent:504B0304 !<ext:lnk;docx.lnk>
*.xlsx !hex:startwith:binarycontent:504B0304 !<ext:lnk;xlsx.lnk>
*.xlsm !hex:startwith:binarycontent:504B0304 !<ext:lnk;xlsm.lnk>
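A likely reason password-protected files show up: when Office encrypts a .docx/.xlsx, it stores the encrypted package inside an OLE compound file (the same D0CF11E0 container used by .doc), so the file no longer starts with 504B0304. A small Python sketch that separates that case from genuinely unexpected content (classify_office_file is a hypothetical helper; "ole" only means "compound file", not proof of encryption):

```python
ZIP_SIG = bytes.fromhex("504B0304")          # ordinary .docx/.xlsx/.xlsm package
OLE_SIG = bytes.fromhex("D0CF11E0A1B11AE1")  # full 8-byte OLE compound file header


def classify_office_file(path):
    """Return "zip", "ole" or "other" based on the first 8 bytes of the file."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(ZIP_SIG):
        return "zip"
    if head == OLE_SIG:
        return "ole"    # e.g. a password-protected (encrypted) OOXML document
    return "other"      # empty, truncated or otherwise not a normal package
```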
Re: Finding corrupt office files with Everything
I will look into showing an empty list first and adding files as they are found when using ! (NOT) with content searches.
Thanks for the suggestion.
Another option is to search for:
ext:docx;xlsx !file-signature:application/zip
Re: Finding corrupt office files with Everything
void wrote (Wed Jul 06, 2022 11:07 am): "Another option is to search for: ext:docx;xlsx !file-signature:application/zip"
This query is much better!
Thanks a lot!
Re: Finding corrupt office files with Everything
I have noticed that some docx files that I converted from doc files do not show any content, although they start with the correct hex 504B0304 when searching with *.docx hex:startwith:binarycontent:504B0304.
Meanwhile I found out that all these corrupt docx files have the same number, 10775, in the "Sibling Count" column.
When searching with *.docx hex:startwith:binarycontent:504B0304, the "Sibling Count" column displayed the following numbers:
47765: intact docx files in the preview
10775: damaged/corrupt Word files in the preview
3097: intact docx files in the preview
822: intact docx files in the preview
451: intact docx files in the preview
84: intact docx files in the preview
45: intact docx files in the preview
17: intact docx files in the preview
12: intact docx files in the preview
10: intact docx files in the preview
8: intact docx files in the preview
6: intact docx files in the preview
5: intact docx files in the preview
3: intact docx files in the preview
2: intact docx files in the preview
Can anybody please explain what is behind this?
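A matching 504B0304 header only tells you the file starts like a ZIP; it says nothing about what is inside. A minimal Python sketch of a deeper check (has_main_document is a hypothetical helper), assuming the main document part lives at word/document.xml, which is where Word normally puts it; strictly, its location is given by _rels/.rels.

```python
import zipfile


def has_main_document(path):
    """True if the file opens as a ZIP and contains the usual docx core parts."""
    try:
        with zipfile.ZipFile(path) as zf:
            names = set(zf.namelist())
    except zipfile.BadZipFile:
        return False
    # [Content_Types].xml is required in every package; word/document.xml is
    # the usual (assumed) location of the main document part.
    return "[Content_Types].xml" in names and "word/document.xml" in names
```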
Meanwhile I found out that all these corrupt docx files have the same number 10775 in the column "Sibling count".
When searching with *.docx hex:startwith:binarycontent:504B0304 in the column "Sibling Count" the following numbers were displayed:
47765 in the preview intact docx files
10775 in the preview damaged/corrupt word files
3097 in the preview intact docx files
822 in the preview intact docx files
451 in the preview intact docx files
84 in the preview intact docx files
45 in the preview intact docx files
17 in the preview intact docx files
12 in the preview intact docx files
10 in the preview intact docx files
8 in the preview intact docx files
6 in the preview intact docx files
5 in the preview intact docx files
3 in the preview intact docx files
2 in the preview intact docx files
Can please anybody explain to me what is behind ?
Re: Finding corrupt office files with Everything
Sibling count is the number of files/folders in the same folder (not counting itself).
This means all the corrupt docx files are in the same location.
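The definition above is easy to reproduce in Python for a single file. This sketch (sibling_count is a hypothetical name) counts everything os.listdir returns; Everything counts what it has indexed, so the numbers may differ if some files are excluded from the index.

```python
import os


def sibling_count(path):
    """Number of other files/folders in the same folder, not counting path itself."""
    parent = os.path.dirname(os.path.abspath(path))
    return len(os.listdir(parent)) - 1
```

Files that share the same unusual sibling count, like the 10775 above, are very likely sitting in the same folder.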
Re: Finding corrupt office files with Everything
Everything 1.5.0.1316a will now add results from !content: searches as they are found.
Fixed an issue with endwith: and wildcards.
Search for the following will now work as expected:
Everything 1.5.0.1316a also fixes an issue with binarycontent: and hex: not using the correct search op code.
Everything 1.5.0.1316a also fixes an issue with content offset and maxsize parameters.
Fixed an issue with endwith: and wildcards.
Searching for the following will now work as expected:
Code: Select all
hex:endwith:wildcards:binarycontent:504b0506??????????????????
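The 504b0506 in that query is the ZIP end-of-central-directory signature; the record is 22 bytes long (4-byte signature plus 18 bytes of fixed fields) when the archive has no comment. Assuming each ? wildcard in hex mode matches one byte, the query checks that a file ends with such a record. A rough Python equivalent (ends_with_eocd is a hypothetical name, not Everything's implementation):

```python
EOCD_SIG = bytes.fromhex("504B0506")  # "PK\x05\x06", end of central directory
EOCD_LEN = 22  # 4-byte signature + 18 bytes of fixed fields


def ends_with_eocd(path):
    """True if the last 22 bytes are a ZIP end-of-central-directory record
    with an empty archive comment."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to end of file to learn its size
        size = f.tell()
        if size < EOCD_LEN:
            return False
        f.seek(size - EOCD_LEN)
        tail = f.read(EOCD_LEN)
    # Bytes 20-21 of the record hold the comment length, which must be 0 here.
    return tail[:4] == EOCD_SIG and tail[20:22] == b"\x00\x00"
```

A file that starts with 504B0304 but fails this test may be truncated, or may simply carry an archive comment.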
Everything 1.5.0.1316a also fixes an issue with content offset and maxsize parameters.