Search for DOCX using Verdana font
I tried to search for DOCX files which are using Verdana font
ext:docx content:Verdana
but not all files are found.
Is there anything I can do to find them?
Re: Search for DOCX using Verdana font
Font names are not part of the content.
Maybe a binary search can help here.
Re: Search for DOCX using Verdana font
Word .docx files are compressed very much like .zip files.
If a .docx file, say wd.docx, is renamed to wd.zip, then this search will find fonts:
wd.zip content:acme
(Most (all?) Word documents contain some default fonts, such as Times New Roman.)
Re: Search for DOCX using Verdana font
froggie, this is a good one. Thanks.
Now, how can I search inside docx files (without renaming them to zip) by using the zip iFilter instead of the docx iFilter?
Something like:
ext:docx ifilter:zip content:Verdana
Re: Search for DOCX using Verdana font
Very, very much
When you open a docx file in a text or hex editor, you'll see that the first two characters are PK.
Those are the initials of Phil Katz, the developer of the original PKzip and "founder" of the zip format.
All zip files start with this PK identifier, although I'm not certain about self-extracting zip files.
(end of Useless Fact)
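The PK check described above can be sketched in a few lines of Python. This is only an illustration: the helper names `starts_with_pk` and `build_sample_archive` are made up for this post, and the sample archive is built in memory rather than taken from a real .docx. For files that may carry something in front of the zip data (such as a self-extracting stub), the standard library's `zipfile.is_zipfile` is probably the safer test, since it looks for the zip record at the end of the file rather than at the start.

```python
import io
import zipfile

# Signature of a zip "local file header": the letters PK followed by 0x03 0x04.
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def starts_with_pk(data: bytes) -> bool:
    """Return True if the bytes begin with the zip local-file-header signature."""
    return data[:4] == ZIP_LOCAL_HEADER


def build_sample_archive() -> bytes:
    """Build a tiny in-memory archive laid out like a .docx (hypothetical content)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", "<w:document/>")
    return buf.getvalue()


if __name__ == "__main__":
    sample = build_sample_archive()
    print(starts_with_pk(sample))
    print(zipfile.is_zipfile(io.BytesIO(sample)))
```

To try it on a real document, read the first bytes with `open(path, "rb").read(4)` and pass them to `starts_with_pk`.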
I am surprised the font names could be found in the docx file.
Inside the docx "zipfile" are many XML files, among them a font declaration file (fonttable.xml), but reading a zip using the content: function should not read through all the files in the zip. At least that is how I thought this works.
I would like to be wrong here though, as that gives loads of new search opportunities!
An alternative would be to extract this 'fonttable.xml' from the docx file and parse it (with a script).
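A minimal sketch of such a script, in Python, using only the standard library. It assumes the font table sits at `word/fontTable.xml` inside the archive and that each font is a `w:font` element with a `w:name` attribute, which is how WordprocessingML documents normally store it. The function name `fonts_declared` is made up for this example. Note that the font table lists the fonts the document declares; treat a hit as "probably used", not as proof that visible text is set in that font.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from typing import List

# WordprocessingML main namespace (the "w:" prefix in the document XML).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
FONT_TABLE = "word/fontTable.xml"  # assumed location of the font table part


def fonts_declared(docx) -> List[str]:
    """Return the font names declared in a .docx (path or file-like object)."""
    with zipfile.ZipFile(docx) as zf:
        if FONT_TABLE not in zf.namelist():
            return []  # no font table part in this archive
        root = ET.fromstring(zf.read(FONT_TABLE))
    names = []
    for font in root.iter("{%s}font" % W_NS):
        name = font.get("{%s}name" % W_NS)
        if name:
            names.append(name)
    return names


if __name__ == "__main__":
    # Usage: python list_fonts.py a.docx b.docx ...
    for path in sys.argv[1:]:
        fonts = fonts_declared(path)
        marker = "Verdana" if "Verdana" in fonts else "-"
        print(path, marker, ", ".join(fonts), sep="\t")
```

The script prints one line per file, so the output can be filtered for "Verdana" without renaming anything to .zip.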
Re: Search for DOCX using Verdana font
I was aware of the "PK", but was (and am) unsure of what modifications Microsoft might have made, thus my comment.
Nevertheless, content searches for fonts and text in documents (renamed to zip files) work for me.
Maybe it won't be necessary to rename the files:
Re: Is it possible to search for content within an archive file
Post by void » Sat Jul 15, 2023 9:29 pm
I will trial a change in the next alpha update to treat any file (with a zip footer) as a zip file.
I'll report back here once this is ready for testing.
Last edited by froggie on Sat Jul 22, 2023 12:41 am, edited 1 time in total.
Re: Search for DOCX using Verdana font
Everything can't do this yet.
I will consider a zipcontent: search function.
Thank you for the suggestions.
Re: Search for DOCX using Verdana font
Everything will only read zip filenames, not zip content. I will trial a change in the next alpha update to treat any file (with a zip footer) as a zip file.
Re: Search for DOCX using Verdana font
@void: Maybe I am missing something, but when there is an XML structure inside a zip (and only then), Everything seems to read the content. Is it treating the zip as a directory? Is something else strange going on? Other content within a zip is not found, as expected, but XML seems to be. I created several examples and they all behave like this:
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<w:document mc:Ignorable="w14 w15 wp14" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
  <w:body>
    <w:p w:rsidRDefault="001F53A3" w:rsidR="00E65461">
      <w:r>
        <w:t>Greatbigword</w:t>
      </w:r>
      <w:bookmarkStart w:name="_GoBack" w:id="0"/>
      <w:bookmarkEnd w:id="0"/>
    </w:p>
    <w:sectPr w:rsidR="00E65461" w:rsidSect="007A34C0">
      <w:pgSz w:w="12240" w:h="15840"/>
      <w:pgMar w:gutter="0" w:footer="720" w:header="720" w:left="1440" w:bottom="1440" w:right="1440" w:top="1440"/>
      <w:cols w:space="720"/>
      <w:docGrid w:linePitch="360"/>
    </w:sectPr>
  </w:body>
</w:document>
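One way to narrow this down is to check whether a word such as "Greatbigword" is present in the raw bytes of the archive, or only after an entry has been decompressed. The Python sketch below does both. The function name `find_term` is made up for this example, and it only says where the bytes are; it says nothing about how Everything itself reads the file.

```python
import io
import zipfile
from typing import List, Tuple


def find_term(archive_bytes: bytes, term: str) -> Tuple[bool, List[str]]:
    """Look for term in the raw archive bytes and in each decompressed entry.

    Returns (raw_hit, entries): raw_hit is True when the UTF-8 bytes of term
    occur in the archive as stored on disk; entries lists the names of the
    archive members whose decompressed data contains them.
    """
    needle = term.encode("utf-8")
    raw_hit = needle in archive_bytes
    entries = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            if needle in zf.read(info):
                entries.append(info.filename)
    return raw_hit, entries


if __name__ == "__main__":
    import sys
    # Usage: python find_term.py file.docx Greatbigword
    with open(sys.argv[1], "rb") as fh:
        print(find_term(fh.read(), sys.argv[2]))
```

If `raw_hit` is False but an entry is listed, the text only exists in compressed form, so a plain binary search of the file could not have found it and something must be decompressing the entry first.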
Re: Search for DOCX using Verdana font
Everything will use the system-wide docx/zip iFilter to search docx/zip content.
Typically there's no zip iFilter so Everything will fall back to a binary content search.
Everything will look for the following text encodings when performing a binary content search:
ASCII
ANSI
UTF-8
UTF-16 LE
UTF-16 LE (with an offset of 1)
UTF-16 BE
UTF-16 BE (with an offset of 1)
My guess is the font name is stored as raw text in your zip file somewhere.
To check the content Everything reads, include the following in your search:
new .zip regex:dotall:content:^(.*)$ addcol:regmatch1
The read content is shown in the regmatch1 column.
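To illustrate the idea of a binary content search over several encodings, here is a rough Python sketch. It is not Everything's implementation: the function name `binary_contains` is made up, and "ANSI" is assumed to mean Windows-1252 (`cp1252`), although the real ANSI code page depends on the system locale.

```python
from typing import List

# Encodings to try, loosely following the list above.
# "cp1252" stands in for ANSI (an assumption; the real code page varies).
ENCODINGS = ("ascii", "cp1252", "utf-8", "utf-16-le", "utf-16-be")


def binary_contains(data: bytes, term: str) -> List[str]:
    """Return the encodings under which the encoded term occurs in data."""
    hits = []
    for enc in ENCODINGS:
        try:
            needle = term.encode(enc)
        except UnicodeEncodeError:
            continue  # term cannot be represented in this encoding
        if needle in data:  # byte-level search, any alignment
            hits.append(enc)
    return hits


if __name__ == "__main__":
    # UTF-16 LE text that starts at an odd byte offset is still found,
    # because the search works on bytes rather than on 2-byte units.
    blob = b"\xff" + "Verdana".encode("utf-16-le")
    print(binary_contains(blob, "Verdana"))
```

Because the search here is a plain byte substring match, text at an odd offset needs no special handling; the separate "offset of 1" entries presumably matter for a search that decodes the file in two-byte units.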
Re: Search for DOCX using Verdana font
All of the Microsoft Word XML from the WordDoc.zip is in the regmatch1 column (as far as I can expand the width of the column).
So Everything can match whatever is found in the XML (including text in documents).
Using the Nirsoft filter listing tool, there is a "Microsoft Office Open XML Format Filter" (offfiltx.dll) which processes XML and only XML in ZIP files.
So this is all working the way it does because of the filter.
Now if I only knew exactly which release of Office it came from.
Thank you @Void.
Last edited by froggie on Sat Jul 22, 2023 1:52 pm, edited 1 time in total.
Re: Search for DOCX using Verdana font
Neither was I (due to Microsoft's infamous "Embrace and Extend and Extinguish" policy). Just a fun fact.
You could check the version of offfiltx.dll. Typically these are in line with the Office versions.
8.0 = Office 97
9.0 = Office 2000
10.0 = Office XP
11.0 = Office 2003
12.0 = Office 2007
14.0 = Office 2010
15.0 = Office 2013
16.0 = Office 2016
(lost track after that)
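The list above is easy to turn into a small lookup, sketched here in Python. The function name `office_release` is made up for this example, and the version string has to be read from the DLL's file properties by other means (for instance the Details tab in Explorer). Releases after Office 2016 are not covered by the table.

```python
from typing import Optional

# Major file version -> Office release, as listed above.
OFFICE_BY_MAJOR = {
    8: "Office 97",
    9: "Office 2000",
    10: "Office XP",
    11: "Office 2003",
    12: "Office 2007",
    14: "Office 2010",
    15: "Office 2013",
    16: "Office 2016",
}


def office_release(file_version: str) -> Optional[str]:
    """Map a file version string such as '14.0.7015.1000' to an Office release."""
    major_text = file_version.strip().split(".")[0]
    if not major_text.isdigit():
        return None  # not a dotted numeric version
    return OFFICE_BY_MAJOR.get(int(major_text))


if __name__ == "__main__":
    print(office_release("14.0.7015.1000"))
```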
Re: Search for DOCX using Verdana font
Good idea.
Offfiltx.dll is installed in two different places - one from Office 2010 and one from Office 2013. On another system, there are two from Office 2016 (x86 & x64) installed in yet different locations.
.docx came out with Office 2007, so that is the earliest release I would expect to (perhaps) include this iFilter.
(I originally left out the "l". The file name is offfiltx.dll )