Content Searching

void
Developer
Posts: 16672
Joined: Fri Oct 16, 2009 11:31 pm

Content Searching

Post by void »

Warning: content searching is extremely slow.
File content is not indexed.

Please combine content: functions with other filters for the best performance.

Content search functions:
content:<text> - Search file content using the associated iFilter. If no iFilter exists, UTF-8 content is used.
ansicontent:<text> - File contents are treated as ANSI text.
utf8content:<text> - File contents are treated as UTF-8 text.
utf16content:<text> - File contents are treated as UTF-16 (Unicode) text.
utf16becontent:<text> - File contents are treated as UTF-16 (Big Endian) text.

Example: find emails modified this week that contain the text "bananas":

*.eml dm:thisweek content:bananas
Note: the content: function requires Everything 1.4 or later.
Please note: Everything will not display any results until all content has been scanned.
Adding results one at a time that match the content search is on my TODO list.
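
The advice above to combine content: with other filters can be illustrated outside of Everything. The following is only a rough Python sketch of the idea, not how Everything is implemented: the function name `search` and its parameters are hypothetical, and "modified this week" is approximated as "modified in the last 7 days". Cheap metadata checks (name, modification time) run first so that file content is read only for the few files that survive them.

```python
import time
from pathlib import Path
from typing import Iterator


def search(root: Path, suffix: str, max_age_days: float, needle: str) -> Iterator[Path]:
    """Yield files under root that pass the cheap filters and contain needle."""
    cutoff = time.time() - max_age_days * 86400
    needle_bytes = needle.encode("utf-8")
    for path in root.rglob("*" + suffix):
        # Cheap metadata checks first: no file is opened until they pass.
        if not path.is_file() or path.stat().st_mtime < cutoff:
            continue
        # Expensive step last: read the whole file and scan its bytes.
        if needle_bytes in path.read_bytes():
            yield path
```

Without the name and date filters, every file under the root would have to be opened and read, which is why an unfiltered content search is so slow.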
frozst
Posts: 7
Joined: Fri Feb 19, 2016 11:24 pm

Re: Content Searching

Post by frozst »

It would be great if you could actually index the content of some files but I guess that then the db would be massive...

Thanks!
ccs86
Posts: 2
Joined: Mon Mar 14, 2016 3:25 pm

Re: Content Searching

Post by ccs86 »

Hi there, loving your program!

On content searching, it seems that Everything is excluding certain file types from content searching. Is there a setting where I can control that?

Specifically, it is not searching content of .LIB files, and I have a bunch which contain plain text.

Thanks!
therube
Posts: 4955
Joined: Thu Sep 03, 2009 6:48 pm

Re: Content Searching

Post by therube »

That's odd.
Never would have expected that.
(Other file extensions also seem to be affected, at the least, .exe. Suspect that it is excluding "binary" file types, by extension, but still I would not have expected that.)
horst.epp
Posts: 1443
Joined: Fri Apr 04, 2014 3:24 pm

Re: Content Searching

Post by horst.epp »

frozst wrote:It would be great if you could actually index the content of some files but I guess that then the db would be massive...

Thanks!
Windows indexing does this and works fine for most types of files using IFilters.
boxxybabe
Posts: 1
Joined: Thu May 12, 2016 10:37 am

Re: Content Searching

Post by boxxybabe »

Hi,

thanks for the content search feature!
When I use a saved search, the search criteria are not shown in the search field. When I then enter content:<text> in the search field, Everything searches all files instead of only the files matched by the saved search. Would it be possible to have an option to search the content of only those files that were initially found by the file search criteria? I assume this would be really easy to implement: just combine both search criteria automatically when adding the content search.

Thanks,

B
void
Developer
Posts: 16672
Joined: Fri Oct 16, 2009 11:31 pm

Re: Content Searching

Post by void »

A separate content field is on my TODO list.
For now, please add the content: search to the end of your search, eg:

foo.bar content:qux
adamantine
Posts: 214
Joined: Mon Jan 09, 2012 10:56 am

Re: Content Searching

Post by adamantine »

void, I have 27000 CUE files and 500 TXT files (they are not big at all and they do not change substantially)

1) could you recommend me the exact function (so that content search was faster)?:
content / ansicontent / utf8content / utf16content / utf16becontent

2) is it possible to implement some simple content-indexing system so that content search was much faster or even instantaneous (only for small txt-files like cue/txt)?

(i use 1.4.0.703 + xp sp 2)
void
Developer
Posts: 16672
Joined: Fri Oct 16, 2009 11:31 pm

Re: Content Searching

Post by void »

1) could you recommend me the exact function (so that content search was faster)?:
content / ansicontent / utf8content / utf16content / utf16becontent
If the CUE files are mostly ASCII text then utf8content: would be the fastest; otherwise use content:.

utf8content: will load the entire file into memory and scan it directly (fast).
content: will load the associated iFilter and the iFilter will search the file as a stream for your contents (slow).
ansicontent:, utf16content: and utf16becontent: will convert the contents to UTF-8 in memory before the comparison (slow).

content: will use utf8content: when no associated iFilter is present, which will be the case for most file types on Windows XP.
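
To make the difference between the direct scan and the convert-then-compare path concrete, here is a small Python sketch. It is an illustration only and not Everything's code: the helper names are made up, and unlike Everything's default matching this comparison is case sensitive.

```python
def utf8_contains(data: bytes, needle: str) -> bool:
    """Scan the raw bytes for the UTF-8 form of needle; nothing is decoded."""
    return needle.encode("utf-8") in data


def converted_contains(data: bytes, needle: str, encoding: str) -> bool:
    """Decode the whole buffer first, then compare: an extra pass over the data."""
    text = data.decode(encoding, errors="replace")
    return needle in text


# The same CUE line stored in two different encodings (sample data).
ascii_cue = 'TITLE "bananas"\n'.encode("ascii")
utf16_cue = 'TITLE "bananas"\n'.encode("utf-16-le")

print(utf8_contains(ascii_cue, "bananas"))                    # direct scan finds it
print(utf8_contains(utf16_cue, "bananas"))                    # direct scan misses UTF-16 text
print(converted_contains(utf16_cue, "bananas", "utf-16-le"))  # found after conversion
```

ASCII text is byte-for-byte valid UTF-8, which is why the direct scan is enough for mostly-ASCII CUE files.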

As always, please check the performance yourself:
Open Everything in debug mode and try utf8content: and content: and check the db search time for both searches.
2) is it possible to implement some simple content-indexing system so that content search was much faster or even instantaneous (only for small txt-files like cue/txt)?
It's possible, and the Everything database is capable of it. Maybe in a future release..
27000 CUE files on a SSD would only take a few seconds to search.
Copying all the CUE/TXT files to a single text file and using a good text editor to search would also be faster.
otravers
Posts: 5
Joined: Tue May 31, 2016 6:30 pm

Re: Content Searching

Post by otravers »

Hi, may I suggest you look into Windows' native search index? It can apparently be tapped by third-party applications. I tried one app that provides full-text search that way, it's called DeskRule and it's pretty fast.
EchterStahlmann
Posts: 6
Joined: Fri Aug 25, 2017 4:52 pm

Re: Content Searching

Post by EchterStahlmann »

otravers wrote:Hi, may I suggest you look into Windows' native search index? It can apparently be tapped by third-party applications. I tried one app that provides full-text search that way, it's called DeskRule and it's pretty fast.
Yes, interesting question!!!

What's the nature of content searchers and why do they run slow (or how do you define slow)?
BDBill
Posts: 1
Joined: Mon Oct 23, 2017 4:42 pm

Re: Content Searching

Post by BDBill »

The drawback to Windows Search is that it does not index network drives. That is all I use Everything for; all of my data is on network drives.
rgbigel
Posts: 41
Joined: Sun Apr 17, 2011 4:00 pm

Re: Content Searching

Post by rgbigel »

I may be blind, but I can not find the Content: function in the help for search syntax.
I knew it was there (from the very bottom of my memory about reading in forum).
:roll:
tuska
Posts: 1052
Joined: Thu Jul 13, 2017 9:14 am

Re: Content Searching

Post by tuska »

Stamimail
Posts: 1122
Joined: Sat Aug 31, 2013 9:05 pm

Re: Content Searching

Post by Stamimail »

Is it possible to distinguish compiled (binary) files from files that are not?
Excluding all binary files would make it faster, wouldn't it?
Content: will retrieve only files with:
1. normal text.
2. common extensions with common metadata.

For searching the content of binaries, where the text is searched in the whole file, we could use another function.
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Re: Content Searching

Post by NotNull »

@Stamimail: thanks for digging up this thread! Learnt a lot about the mechanics of content searching!

Most filetypes have some 'signature' in the first few bytes of the file. For example: executables start with "MZ".
Search for PEID (PE identifier) for a list of them.
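
As a rough illustration of signature sniffing, the Python sketch below checks the first bytes of a file against a tiny table of well-known signatures. The table is deliberately incomplete and the function name `sniff` is made up for this example. A signature match is only a hint: a plain text file that happens to start with "MZ" would be misidentified.

```python
from typing import Optional

# A few well-known leading byte sequences ("magic numbers").
SIGNATURES = {
    b"MZ": "DOS/Windows executable",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"PK\x03\x04": "ZIP archive",
}


def sniff(header: bytes) -> Optional[str]:
    """Return a description if the header starts with a known signature."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


print(sniff(b"MZ\x90\x00\x03"))     # looks like an executable
print(sniff(b"just some text\n"))   # no known signature
```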
Stamimail
Posts: 1122
Joined: Sat Aug 31, 2013 9:05 pm

Re: Content Searching

Post by Stamimail »

I thought about "Content Searching" one more time.
I think that a program that wants to support "Content Searching", should have a Preview to look at the results.
I mean,
1. At first the user types in his query in SearchBox, then
2. Go to Results Pane (list of files), and select a file, then
3. Look at the Preview of each file, to find which file exactly he wants.

As long as Everything does not intend to do that, I suggest to put the focus of developing on the rest things.
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Re: Content Searching

Post by NotNull »

This is how I think 'things' work (if not: please let me know);

For the most part, that is already the case. For common filetypes (docx, xlsx, pdf, txt, etc.) preview as well as content searching both use Windows' iFilter interface to access the content. The iFilter for a certain filetype (let's say a .doc file) knows the layout of that filetype and is able to read the clean text (whereas opening this file in Notepad gives you a lot of 'nonsense' mixed through some text).

Windows Search and Windows Preview both use this iFilter way to read 'clean text' for indexing resp. previewing purposes.
As far as I can tell, that's what Everything does too.
Through iFilter it is also possible to read properties like metadata, tags, .... That is: if applications (or Windows itself) implemented this for that filetype.
But I don't know if Everything makes use of that (the iFilter interface to read metadata etc).

Things are different when there is no iFilter for certain filetypes. Here Everything can't call an API to read contents and falls back on raw reading the file.
ANSI, UTF-8 and UTF-16 (Big Endian/Little Endian) all have different 'codepages' (for lack of a better word) to describe individual characters. That's why you have to specify content:, utf8content:, etc.
These files can be searched in Everything, but there is no decent way to preview them. I seriously doubt if Windows Search indexes this content.
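
A short Python sketch shows why the encoding has to be known when reading a file raw. "ANSI" is assumed here to mean Windows code page 1252, which is only one of several possible ANSI code pages; the sample word is arbitrary.

```python
TEXT = "café"
ENCODINGS = ("cp1252", "utf-8", "utf-16-le", "utf-16-be")

# The same four characters produce a different byte sequence in each encoding.
encoded = {enc: TEXT.encode(enc) for enc in ENCODINGS}

for enc, raw in encoded.items():
    print(f"{enc:10} {raw.hex(' ')}")
```

A raw search has to look for the byte sequence that matches the file's encoding; searching for the UTF-8 bytes in a UTF-16 file finds nothing.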


Long story short: if you can preview a certain file, you can search its content.
Stamimail
Posts: 1122
Joined: Sat Aug 31, 2013 9:05 pm

Re: Content Searching

Post by Stamimail »

ANSI, UTF-8 and UTF-16 (Big Endian/Little Endian) all have different 'codepages' (for lack of a better word) to describe individual characters. That's why you have to specify content:, utf8content:, etc.
I guess Everything is not a dev tool. The simple user expects that Everything will parse correctly what the user can see in Microsoft Notepad. The simple user will want to use the default function for this.
Long story short: if you can preview a certain file, you can search its content.
There are two stages here:
1. Content Indexing
2. Preview

To be a good program of Content Search the program needs to have high quality in the two stages.
Let's say Everything will make it and will have high quality of "Content Indexing", but still, I'm not sure the quality of the Preview stage will be good enough.

What is high quality of Preview? I would say similar to Windows Preview but with the content search query highlighted in Preview.
void
Developer
Posts: 16672
Joined: Fri Oct 16, 2009 11:31 pm

Re: Content Searching

Post by void »

Everything attempts to use the associated IFilter; if one does not exist it will fall back to UTF-8 content, whereas Windows would not search any further.

IFilters do not search file meta data.
I think this is part of the issue you are describing, Windows Explorer searches for this information without the need to specify a content: search function, whereas in Everything you will need to be specific with your search, do you want file content? meta data? id3 data?
Eventually I'll add meta data searching support too, such as meta:"The Prodigy"

This comes back to the "should content: search id3 tags" issue too.. My current plan is to add an id3: search function which searches all tags for the specified text. I haven't decided yet whether to make "Everything" "smart" and search id3 tags when using the content: search functions. I can see this being useful for some file types, but it will be impossible to support all file types.
The simple user expects that Everything will parse correctly what the user can see in Microsoft Notepad. The simple user will want to use the default function for this.
This is what should be happening: there is an IFilter for most text file formats, so Everything should be reading the correct ANSI/UTF-8/Unicode content.
The most important file type, .txt, has an IFilter that supports ANSI/UTF-8/Unicode content.
For other file types I can only guess what the content might be, and most of the time UTF-8 will be fine for ASCII searches.
If you know the content type, use one of the following content search functions:
  • utf8content:
  • ansicontent:
  • utf16content:
  • utf16becontent:
I doubt I will make "Everything" content aware; there are too many file formats, and using IFilters should be more than sufficient.

As for highlighting content searches in the preview pane, I would have to implement my own preview handlers. I do have on my TODO list to implement my own text preview handler, so it is doable with basic text documents, I'll have to look into anything beyond that..

What are your thoughts on content: searching / id3 searching / meta data searching?
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Re: Content Searching

Post by NotNull »

Stamimail wrote: I guess Everything is not a dev tool. The simple user expects that Everything will parse correctly what the user can see in Microsoft Notepad. The simple user will want to use the default function for this.
Not only 'simple users'. I guess 99% of the users, including myself.
My opinion: the last thing you want is a complicated, 'bureaucratic' syntax like Windows Search has. It makes it effectively unusable without its manual at hand. Read further how that could be avoided (again: my opinion)
What is high quality of Preview? I would say similar to Windows Preview but with the content search query highlighted in Preview.
That's a very nice idea! Will involve a lot of extra code in Everything I guess, as this is now largely 'outsourced' to Windows itself (and Windows' preview host (prevhost.exe) doesn't have that feature)

FWIW: If you have certain (text) filetypes where you want to search their contents with an iFilter, you can add that filetype in the registry.
That way you can use the content: function instead of utf16content: /..
Note: this has to be configured separate from the preview iFilter.
Let me know when interested.

void wrote: IFilters do not search file meta data.
Isn't that what the iFilter::GetValue() is all about?
As for highlighting content searches in the preview pane, I would have to implement my own preview handlers. I do have on my TODO list to implement my own text preview handler, so it is doable with basic text documents, I'll have to look into anything beyond that..
For inspiration purposes: https://www.codeproject.com/Articles/13 ... sg=2959782

What are your thoughts on content: searching / id3 searching / meta data searching?
Don't know if this was aimed at @Stamimail or in general, but my opinion:
(Assuming the GetValue() function can indeed read the metadata/properties/tags, or that there is another easy way to read tags)

I would like the content: function to search text as well as (let's call them) tags. Just to keep it simple.
If you want specifically search in tags, you can use the tag: function.
Couple of examples (from simple to more advanced):
tag:"The Prodigy" (searches through all values)
tag:Artist="The Prodigy" (searches for this specific key/value pair)
tag:year<2000
tag:"The Prodigy";"Chemical Brothers" (OR; just like ext:exe;dll)
tag:"The Prodigy"+2000 (AND) (if one of the tags includes a "+", you would have to "" the tag).
tag:artist="The Prodigy"+year=1997


There might be an issue with localization. If I had a Dutch OS, it would say "Jaar" instead of "Year" for example. So just stick with the English version.
Stamimail
Posts: 1122
Joined: Sat Aug 31, 2013 9:05 pm

Re: Content Searching

Post by Stamimail »

I would say 2 things:

1. Everything should let the user to create and combine custom "Content Searching" like:
content1: <== utf8content:+ansicontent:
content2: <== utf8content:+ansicontent:+utf16becontent:
content3: <== utf8content:+utf16becontent:

I guess it's already possible with Filter functions, but I think you need to find a way to make the syntax and the combining clearer and simpler, so that you don't need to teach users how to do it (the principle of associativity).

2. I think the main and the real problem is the Preview stage.
I think users simply will look for alternative to Everything to get the high quality of Preview they are looking for.

I don't think it is the job of Everything to make those "iFilters", especially since Everything has a lot to do on the TODO list. Leave it to v1.6...
Stamimail wrote:I suggest to put the focus of developing on the rest things.
void
Developer
Posts: 16672
Joined: Fri Oct 16, 2009 11:31 pm

Re: Content Searching

Post by void »

My opinion: the last thing you want is a complicated, 'bureaucratic' syntax like Windows Search has. It makes it effectively unusable without its manual at hand. Read further how that could be avoided (again: my opinion)
I'm thinking about having two functions for searching content:
1) a simple one and
2) a specific method one, for example:
width: would search any image or video for the specified width, while pngwidth: or jpgwidth: or mkvwidth: would search using the specific method for searching for width. Similarly, content: could use IFilter, UTF-8 content, UTF-16, ansi content and any other text content it could try..

*.png pngwidth: would be faster than *.png width: as it wouldn't need to check png files for a jpeg header, or mkv header etc... (this is how it currently works for Everything 1.3)

I can make the optimization where Everything can attempt to read the png header first when a filename with .png was specified.. as a png file without a png header is not going to occur often.
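
The saving from a format-specific function can be sketched in Python. A PNG file begins with a fixed 8-byte signature followed by the IHDR chunk, whose first two fields are the width and height as big-endian 32-bit integers, so 24 bytes are enough to answer a width query. The function name `png_size` is made up for this sketch, and this is not Everything's code.

```python
import struct
from typing import Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) from the first 24 bytes of a PNG, else None."""
    # Layout: 8 signature + 4 chunk length + 4 chunk type + 4 width + 4 height.
    if len(header) < 24 or not header.startswith(PNG_SIGNATURE):
        return None
    if header[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", header[16:24])
    return width, height


# A hand-built header for a 1920x1080 image (sample data, not a full PNG).
sample = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 1920, 1080)
print(png_size(sample))
```

A generic width: function would have to try a parser like this for every supported format until one accepts the header; trying the PNG parser first when the name ends in .png usually succeeds on the first attempt.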

While Everything is rather relaxed about search syntax, it does have a lot of search functions. I'm not sure at this stage if I should avoid adding search functions just to keep the list small. I think this will come down to how the help is presented, perhaps with a separate list of common function names and a complete list, as most users will not care for pngwidth:, but might find width: useful.

When content indexing is implemented, I will consider an option so you don't need to specify content: or width: etc.. , simply typing in 1920 will show all images with a width of 1920. Maybe a list of check boxes so Everything can automatically search content without the specific search function.
Same could be applied with size: and date modified information..

This takes away a lot of control from the user, I would rather have the user type in size:14kb rather than just 14kb because that could be misrepresented. I will still give them the option to automatically search size:<search> width:<search> etc, however it would be disabled by default and would only work for content that is indexed.
What is high quality of Preview? I would say similar to Windows Preview but with the content search query highlighted in Preview.

That's a very nice idea! Will involve a lot of extra code in Everything I guess, as this is now largely 'outsourced' to Windows itself (and Windows' preview host (prevhost.exe) doesn't have that feature)
A good place to start would be to show a "quick preview" that is a text only preview with content: searches matches highlighted.
I'll consider adding something like this..

void wrote:IFilters do not search file meta data.
Isn't that what the iFilter::GetValue() is all about?
IFilter::GetValue gets the type of the text, such as is the text a date or currency or simple text etc..
It is more or less a hint for how to store the actual data when content indexing.
While an IFilter implementation could be used for reading meta data, it is mostly used to read the plain text of documents, such as Word or PDF files.
What are your thoughts on content: searching / id3 searching / meta data searching?
Don't know if this was aimed at @Stamimail or in general, but my opinion:
I'm happy to hear anyone's input on the subject.

My current thoughts are to make content: "smart" and attempt to understand the content, and have other content functions for very specific methods.
1. Everything should let the user to create and combine custom "Content Searching" like:
content1: <== utf8content:+ansicontent:
content2: <== utf8content:+ansicontent:+utf16becontent:
content3: <== utf8content:+utf16becontent:
I'll look into making content: not fall back to UTF-8 and make it "smart" by attempting to understand the content, this might involve searching for UTF-8, UTF-16, ansi and other codepages. However, this is only relevant for unknown file types, ie: the file does not have an IFilter association.
The question is, is using just the IFilter enough?
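
One way such a "smart" fallback for files without an IFilter could look is sketched below in Python. This is purely a guess at the idea and not a description of what Everything does or will do: the function name, the choice of cp1252 as the ANSI code page and the order of the candidates are all assumptions. It trusts a byte order mark if there is one, and otherwise looks for the search text encoded in each candidate encoding.

```python
import codecs

# Byte order marks that identify the encoding outright.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def smart_contains(data: bytes, needle: str, ansi: str = "cp1252") -> bool:
    """Search data for needle without being told the encoding."""
    # 1. A BOM settles the question: decode with that encoding only.
    for bom, enc in BOMS:
        if data.startswith(bom):
            return needle in data[len(bom):].decode(enc, errors="replace")
    # 2. No BOM: look for the needle's bytes in each candidate encoding.
    for enc in ("utf-8", ansi, "utf-16-le", "utf-16-be"):
        try:
            if needle.encode(enc) in data:
                return True
        except UnicodeEncodeError:
            continue  # needle cannot be written in this encoding
    return False
```

The raw byte comparison in step 2 can produce false positives (for example a UTF-16 match at an odd byte offset), and every extra candidate is another pass over the file, so this trades speed and precision for convenience.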

Like you say, perhaps present to the user a list of content types, and allow them to check them as needed. If it can be avoided, and the user can search all content types with content:, that would be the preferred option.
2. I think the main and the real problem is the Preview stage.
I think users simply will look for alternative to Everything to get the high quality of Preview they are looking for.

I don't think it is the job of Everything to make those "iFilters", especially since Everything has a lot to do on the TODO list. Leave it to v1.6...
I do have "Implement my own text preview handler" on my TODO list, I will look at rendering and highlighting Word and PDF and other files via IFilter in plain text. This would have to be a separate "Preview" pane, one for formatted text and another for plain text.
IFilters do not return any formatting, so Everything would have to implement its own Word / PDF renderer, not something I'm looking to do anytime soon.

Previewing and highlighting plain text might be a good start..
Stamimail
Posts: 1122
Joined: Sat Aug 31, 2013 9:05 pm

Re: Content Searching

Post by Stamimail »

pngwidth: or jpgwidth: or mkvwidth:
consider
png.width: or jpg.width: or mkv.width:

I'm not sure at this stage if I should avoid adding search functions just to keep the list small. I think this will come down to how the help is presented
You can keep the list small while still having a lot of functions, if you keep the display and the learning curve reasonable. For example, show 10 list items and an Advanced option, which opens 10 sub-list items and an Advanced2 option.

A good place to start would be to show a "quick preview" that is a text only preview with content: searches matches highlighted.
I'll consider adding something like this..
You will need to research what exists on the market (DocFetcher, Google Desktop...) and use those apps for a while to learn which approach is better, and maybe find an open-source project or people who can help. I guess it's a new world to learn.
Keep in mind LTR/RTL support ;-)
Stamimail
Posts: 1122
Joined: Sat Aug 31, 2013 9:05 pm

Re: Content Searching

Post by Stamimail »

Image
_______________________
When you need to read a mixture of LTR & RTL paragraphs, one solution is to have an option to flip the (web page) direction:
Image
Stamimail
Posts: 1122
Joined: Sat Aug 31, 2013 9:05 pm

Re: Content Searching

Post by Stamimail »

I tested briefly:

Google Desktop
Lookeen
Searchmonkey
FileSearchy
FileLocator

Unfortunately, none of them works properly with RTL text yet. :(

Anyway, I think the research here is important, as long as you are thinking about how to design Content Searching (and maybe Basic Editing).
void
Developer
Posts: 16672
Joined: Fri Oct 16, 2009 11:31 pm

Re: Content Searching

Post by void »

consider
png.width: or jpg.width: or mkv.width:
I've added pngwidth:/png-width:/png.width:
However, I'm currently thinking these need to be replaced with an ispng: function, which would verify the result is a valid png file.
For example:
pic: !ext:png ispng:

This would be useful for all other formats too, eg: ext:mkv !ismkv: or ext:zip !iszip: to detect for corrupt files..

Thanks for the file content search suggestions, I'll consider them for when I add my own text preview handler to Everything.
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Re: Content Searching

Post by NotNull »

Why not a more general picture.width , video.width or audio.length?
There are a lot of file extensions for audio, video and pictures, which would lead to twice as many width/height functions. Or thrice as many if you decide to include the ispng:-like functions.
void
Developer
Posts: 16672
Joined: Fri Oct 16, 2009 11:31 pm

Re: Content Searching

Post by void »

width: will always be the generic width for pictures and videos.
For now I have been using pic: width:800 or video: width:320 to specify what type of width:
I'll consider adding picture-width:/pic-width:/video-width:/vid-width: etc.
Thanks for the suggestion.
There are a lot of file extensions for audio, video and pictures, which would lead to twice as many width/height functions. Or thrice as many if you decide to include the ispng:-like functions.
Yes, I've added support for all the most common picture formats to Everything (webp,png,bmp,jpg,gif,ico,tga,pcx,psd,tiff), and I've added some video format support (still more to add), and I've realized having png-width:, tga-width:, psd-width:, png-height:, tga-height:, psd-height: etc... is too much.
It will be up to the user to specify the desired extension, for example:
ext:png width:800

So far, is-png:, is-tga:, is-psd: is an improvement..
There is only one function for each picture/video format type.
It also provides a simple and fast way to find corrupt files or files with an incorrect extension.
For example:
ext:png !is-png:

Note that the is-png: functions will only check the container for errors, not the data itself.
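
For readers without the newer builds, the effect of a search like ext:png !is-png: can be approximated with a short Python script. This sketch is an approximation under stated assumptions: the function name `mismatched_pngs` is made up, it only compares the 8-byte PNG signature rather than checking the whole container, and the *.png pattern is case sensitive on some platforms.

```python
from pathlib import Path
from typing import List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def mismatched_pngs(root: Path) -> List[Path]:
    """List *.png files under root whose content does not start like a PNG."""
    bad = []
    for path in sorted(root.rglob("*.png")):
        if not path.is_file():
            continue
        with path.open("rb") as f:
            # Only the first 8 bytes are read, so large files stay cheap.
            if f.read(len(PNG_SIGNATURE)) != PNG_SIGNATURE:
                bad.append(path)
    return bad
```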
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Re: Content Searching

Post by NotNull »

The is-png:, is-jpg: functions could also be combined in something like true-type: /has-valid-extension:
(I suggest a better name though ;))

That way you could search for:
ext:png;jpg !truetype:
void
Developer
Posts: 16672
Joined: Fri Oct 16, 2009 11:31 pm

Re: Content Searching

Post by void »

A true-type: or has-valid-extension: search function is a good idea. I can see one issue though, is that it implies Everything knows the filetype.
For example, if I search for:
ext:xz !truetype:
Everything does not know xz is a compressed format (for now, I may add it later..), so does this mean Everything should add it as a result or not?

I could add an is-image: search function which would combine all is-jpg: is-png: is-gif: etc..
It is a bit vague, but might make the following search useful:
pic: !is-image:
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Re: Content Searching

Post by NotNull »

void wrote: Fri Dec 07, 2018 11:32 pm A true-type: or has-valid-extension: search function is a good idea. I can see one issue though, is that it implies Everything knows the filetype.
For example, if I search for:
ext:xz !truetype:
Everything does not know xz is a compressed format (for now, I may add it later..), so does this mean Everything should add it as a result or not?

I could add an is-image: search function which would combine all is-jpg: is-png: is-gif: etc..
It is a bit vague, but might make the following search useful:
pic: !is-image:
Good points!
I'll think about it a little more (should have done that the first time ..)
void
Developer
Posts: 16672
Joined: Fri Oct 16, 2009 11:31 pm

Re: Content Searching

Post by void »

Content searching is evolving rapidly, I've added an optional Content Type column which lists the file's mime-type based on file content (not extension).

It is interesting to find files with odd extensions that are really image/jpeg or working out what the content was for an orphaned file such as FILE0000.CHK..

This also means you can do searches like: content-type:image/png (same as ispng:)
However, it will only work for content types that Everything understands, so I will have to add as many types as I can.

Currently all of Everything's supported content-types do a container check for basic file content validation. I'm not sure if I'll add this for all future types, or change it so it just looks for a basic file signature..

Currently, I find it useful to check if a file container is intact.. eg: half-downloaded jpeg files will not show image/jpeg as the file content type, but there is a good chance the file signature would still indicate image/jpeg..
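
The difference between a signature check and a container check can be sketched in Python. PNG is used here instead of JPEG only because its chunk layout is simple to walk; the function names are made up and this is not Everything's code. A PNG is a signature followed by chunks of the form length, type, data, CRC, ending with an IEND chunk. The container check below walks the chunks and verifies each CRC, but never decodes the image data itself.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def has_png_signature(data: bytes) -> bool:
    """Signature check: only the first 8 bytes are examined."""
    return data.startswith(PNG_SIGNATURE)


def png_container_ok(data: bytes) -> bool:
    """Container check: every chunk must fit, pass its CRC, and IEND must exist."""
    if not has_png_signature(data):
        return False
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        end = pos + 12 + length
        if end > len(data):
            return False  # chunk runs past the end of the file: truncated
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[end - 4:end])
        if zlib.crc32(ctype + body) != crc:
            return False
        if ctype == b"IEND":
            return True
        pos = end
    return False  # ran out of data before reaching IEND


def chunk(ctype: bytes, body: bytes) -> bytes:
    """Build one PNG chunk (helper for the sample data below)."""
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", zlib.crc32(ctype + body))


# Minimal container: the IDAT payload is empty, so this is not a viewable image.
png = (PNG_SIGNATURE
       + chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
       + chunk(b"IDAT", b"")
       + chunk(b"IEND", b""))
truncated = png[:-10]  # simulates a half-downloaded file

print(has_png_signature(truncated), png_container_ok(truncated))
```

The truncated sample still passes the signature check but fails the container check, which mirrors the half-downloaded JPEG case described above.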