[Suggestion] Hybrid content indexing

Post by **NotNull** » Sun Jan 29, 2023 10:50 pm

More a wild thought than an actual suggestion ....

We see quite a few people on these forums that try to content-index too many documents. That made me think ...

- Read a document
- Grab all words
- De-duplicate that list
- Add that to the Everything database instead of the complete document

That way, a content-search (i.e. content:"some text") can in real-time exclude irrelevant documents (those that do not contain "some" or do not contain "text") and go into detail for the remaining documents by reading the files from disk.
A search like content:holiday would not even need to go to disk as all needed information is already in the database.
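
To make the idea concrete, here is a rough sketch in Python of how such a hybrid index could work. This is not how Everything works internally; the names `unique_words`, `build_index` and `content_search` are made up for illustration, and it assumes whole-word, case-insensitive matching (a substring search like "holi" would still need the full text).

```python
import re
from typing import Callable, Dict, Iterable, List, Set

# Assumption: a "word" is any run of letters, digits or underscores.
WORD_RE = re.compile(r"\w+")


def unique_words(text: str) -> Set[str]:
    """Return the lower-cased, de-duplicated words of a text."""
    return set(WORD_RE.findall(text.lower()))


def build_index(paths: Iterable[str],
                read_text: Callable[[str], str]) -> Dict[str, Set[str]]:
    """Store only the set of unique words per document, not the full text."""
    return {path: unique_words(read_text(path)) for path in paths}


def content_search(index: Dict[str, Set[str]],
                   query: str,
                   read_text: Callable[[str], str]) -> List[str]:
    """Filter on the word sets first, then verify phrases on disk."""
    needed = unique_words(query)
    if not needed:
        return []

    # Step 1: exclude every document that is missing one of the words.
    # This uses only the index, so no file is opened here.
    candidates = [path for path, words in index.items() if needed <= words]

    # A query that is exactly one word is fully answered by the index.
    if query.strip().lower() in needed:
        return candidates

    # Step 2: for a phrase, read only the remaining candidates and
    # check that the words really appear next to each other.
    phrase = query.lower()
    return [path for path in candidates if phrase in read_text(path).lower()]
```

`read_text` is passed in as a parameter so the sketch does not depend on a particular way of extracting text (plain file read, iFilter, ...). With it, a search for `holiday` never touches the disk, while a search for `some text` only opens the files that contain both words.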

This would reduce the size of the content-database greatly (I think) and still give almost real-time results.
Just a thought ..

On second thought: off-line files could give problems.

Post by **void** » Mon Jan 30, 2023 1:59 am

It's a good idea.

It is a feature that does interest me.
It may make the content indexing smaller with the loss of full text searching.

Thank you for the suggestion.

Currently, Everything will just index the text content as is.
Generally, this is pretty small.. 1MB of text is a lot of text.

Windows indexing does a pretty good job of content indexing.
Users can search the system index with si:

Good NVMe SSDs are so fast these days, searching the raw files for content with no indexing is perfectly fine.

I'm also considering removing content indexing from the UI and making it a hidden feature.
A lot of shell extensions are buggy (PDFs particularly), providing support will be difficult.

Post by **horst.epp** » Mon Jan 30, 2023 9:48 am

void wrote: ↑Mon Jan 30, 2023 1:59 am ...
I'm also considering removing content indexing from the UI and making it a hidden feature.
A lot of shell extensions are buggy (PDFs particularly), providing support will be difficult.

There is free iFilter software which runs without problems on many OS versions.
I started with Windows 7 and now I'm on Windows 11
Using PDFlib TET PDF IFilter
https://www.pdflib.com/de/download/tet-pdf-ifilter/

______________________________________________________
Windows 11 Home x64 Version 22H2 (OS Build 22621.1194)
Everything 1.5.0.1335a (x64)

Post by **NotNull** » Mon Jan 30, 2023 11:27 am

void wrote: ↑Mon Jan 30, 2023 1:59 am Windows indexing does a pretty good job of content indexing.
[...]
A lot of shell extensions are buggy (PDFs particularly).

Now I wonder what happens if these two get combined ..
(I do not have any PDF content-indexed)

horst.epp wrote: ↑Mon Jan 30, 2023 9:48 am There is free iFilter software which runs without problems on many OS versions.

But how can Everything be sure if it is a 'good' iFilter (or even 'good' iFilter version) before starting content-indexing?
