[Suggestion] Hybrid content indexing

Discussion related to "Everything" 1.5 Alpha.
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm


Post by NotNull »

More a wild thought than an actual suggestion ....

We see quite a few people on these forums who try to content-index too many documents. That made me think ...

- Read a document
- Grab all words
- De-duplicate that list
- Add that to the Everything database instead of the complete document
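The four steps above could look roughly like this. It is a minimal Python sketch, not how Everything actually stores anything: the tokenizer (runs of `\w` word characters, case-folded) and the names `extract_words` and `build_word_index` are hypothetical, and the "database" is just an in-memory dict from file path to its set of distinct words.

```python
import re
from pathlib import Path

# Hypothetical tokenizer: runs of word characters, case-folded.
WORD_RE = re.compile(r"\w+")


def extract_words(text):
    """Grab all words from a document's text and de-duplicate them."""
    return {w.casefold() for w in WORD_RE.findall(text)}


def build_word_index(paths):
    """Map each file path to the set of distinct words it contains.

    Stands in for 'add the word list to the database instead of the
    complete document'; a real index would be persisted, not kept in memory.
    """
    index = {}
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        index[str(path)] = extract_words(text)
    return index
```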


That way, a content search (e.g. content:"some text") could exclude irrelevant documents in real time (those that do not contain "some" or do not contain "text") and examine the remaining documents in detail by reading the files from disk.
A search like content:holiday would not even need to go to disk as all needed information is already in the database.
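The search side could then work like this. Again a hypothetical sketch, assuming an index shaped like {path: set of words}: the word sets rule out documents cheaply, a single-word query such as holiday is answered from the index alone, and only multi-word phrases cause the surviving files to be read from disk. The name `content_search` and the case-insensitive matching are assumptions for illustration.

```python
import re
from pathlib import Path

WORD_RE = re.compile(r"\w+")


def extract_words(text):
    """Distinct, case-folded words of a text (same tokenizer as the index)."""
    return {w.casefold() for w in WORD_RE.findall(text)}


def content_search(query, index):
    """Hypothetical hybrid search over an index of {path: set_of_words}."""
    query_words = extract_words(query)
    # Step 1: cheap exclusion using only the stored word sets.
    candidates = [p for p, words in index.items() if query_words <= words]
    # Step 2: a query that is exactly one word needs no disk access.
    if WORD_RE.fullmatch(query.strip()):
        return candidates
    # Step 3: read the surviving files and confirm the exact phrase.
    needle = query.casefold()
    matches = []
    for p in candidates:
        text = Path(p).read_text(encoding="utf-8", errors="replace")
        if needle in text.casefold():
            matches.append(p)
    return matches
```

A file containing both "text" and "some", but not the phrase "some text", passes step 1 and is only rejected in step 3.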

This would greatly reduce the size of the content database (I think) while still giving almost real-time results.
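Whether the saving is "great" depends on how repetitive the text is. A rough way to estimate it for one document, assuming the word list is stored as newline-separated UTF-8 with no other overhead (a real database format would differ; `size_estimate` is a hypothetical helper):

```python
import re

WORD_RE = re.compile(r"\w+")


def size_estimate(text):
    """Return (raw text bytes, de-duplicated word list bytes), both UTF-8."""
    raw = len(text.encode("utf-8"))
    unique = sorted({w.casefold() for w in WORD_RE.findall(text)})
    words_only = len("\n".join(unique).encode("utf-8"))
    return raw, words_only


print(size_estimate("the cat and the hat and the cat " * 100))  # (3200, 15)
```

Long, repetitive documents shrink a lot; short documents made of mostly unique words may barely shrink at all.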
Just a thought ..


On second thought: offline files could cause problems.
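One possible way to handle offline files, sketched under the assumption that an unreadable file should be reported as a possible match rather than silently dropped (`verify_candidates` is a hypothetical name): the stored word list can still rule a file out, but only files that can actually be opened can be confirmed.

```python
from pathlib import Path


def verify_candidates(needle, candidates):
    """Split index candidates into (confirmed, unverified) lists.

    A candidate that cannot be read (e.g. unplugged drive, disconnected
    share) goes into 'unverified' instead of being discarded.
    """
    confirmed, unverified = [], []
    needle = needle.casefold()
    for p in candidates:
        try:
            text = Path(p).read_text(encoding="utf-8", errors="replace")
        except OSError:
            unverified.append(p)  # offline or otherwise unreachable
            continue
        if needle in text.casefold():
            confirmed.append(p)
    return confirmed, unverified
```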
void
Developer
Posts: 16672
Joined: Fri Oct 16, 2009 11:31 pm

Re: [Suggestion] Hybrid content indexing

Post by void »

It's a good idea.

It is a feature that does interest me.
It could make the content index smaller, at the cost of losing full-text searching.

Thank you for the suggestion.



Currently, Everything will just index the text content as is.
Generally, this is pretty small; 1 MB of text is a lot of text.



Windows indexing does a pretty good job of content indexing.
Users can search the system index with si:



Good NVMe SSDs are so fast these days that searching the raw files for content with no indexing is perfectly fine.
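For comparison, searching "with no indexing" boils down to walking the folders and reading every file. A minimal sketch, assuming plain-text files and a case-sensitive, byte-for-byte match of the UTF-8 encoded search text; it will not find text inside compressed formats such as PDF or DOCX, which is why extractors like iFilters are needed for those. `scan_for_text` is a hypothetical name.

```python
import os


def scan_for_text(root, needle, chunk_size=1 << 20):
    """Return sorted paths of files under root whose raw bytes contain needle."""
    pattern = needle.encode("utf-8")
    # Keep this many bytes from the previous buffer so a match that
    # straddles a chunk boundary is still found.
    overlap = max(len(pattern) - 1, 0)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    tail = b""
                    while chunk := f.read(chunk_size):
                        buf = tail + chunk
                        if pattern in buf:
                            hits.append(path)
                            break
                        tail = buf[-overlap:] if overlap else b""
            except OSError:
                continue  # unreadable file: skip it
    return sorted(hits)
```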



I'm also considering removing content indexing from the UI and making it a hidden feature.
A lot of shell extensions are buggy (PDFs particularly), so providing support will be difficult.
horst.epp
Posts: 1443
Joined: Fri Apr 04, 2014 3:24 pm

Re: [Suggestion] Hybrid content indexing

Post by horst.epp »

void wrote: Mon Jan 30, 2023 1:59 am ...
I'm also considering removing content indexing from the UI and making it a hidden feature.
A lot of shell extensions are buggy (PDFs particularly), so providing support will be difficult.
There is free iFilter software that runs without problems on many OS versions.
I started with Windows 7 and now I'm on Windows 11.
I'm using the PDFlib TET PDF IFilter:
https://www.pdflib.com/de/download/tet-pdf-ifilter/

______________________________________________________
Windows 11 Home x64 Version 22H2 (OS Build 22621.1194)
Everything 1.5.0.1335a (x64)
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Re: [Suggestion] Hybrid content indexing

Post by NotNull »

void wrote: Mon Jan 30, 2023 1:59 am Windows indexing does a pretty good job of content indexing.
[...]
A lot of shell extensions are buggy (PDFs particularly).
Now I wonder what happens when these two are combined ..
(I do not have any PDF content-indexed)


horst.epp wrote: Mon Jan 30, 2023 9:48 am There is free iFilter software that runs without problems on many OS versions.
But how can Everything be sure that an iFilter (or even a particular iFilter version) is 'good' before it starts content indexing?