More a wild thought than an actual suggestion ....
We see quite a few people on these forums that try to content-index too many documents. That made me think ...
- Read a document
- Grab all words
- De-duplicate that list
- Add that to the Everything database instead of the complete document
That way, a content search (e.g. content:"some text") can exclude irrelevant documents in real time (those that do not contain "some" or do not contain "text") and then check the remaining documents in detail by reading the files from disk.
A search like content:holiday would not even need to go to disk as all needed information is already in the database.
This would reduce the size of the content-database greatly (I think) and still give almost real-time results.
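To make the idea a bit more concrete, here is a minimal Python sketch of the steps above. It is not how Everything works internally. The names (HybridIndex, words_of, read_text) are made up for illustration, and it assumes whole-word, case-insensitive matching, which is stricter than a substring search.

```python
import re

WORD_RE = re.compile(r"\w+")


def words_of(text):
    """Return the de-duplicated, lower-cased set of words in text."""
    return {w.lower() for w in WORD_RE.findall(text)}


class HybridIndex:
    """Hypothetical index: stores only the unique words of each document."""

    def __init__(self):
        self.words_by_path = {}

    def add(self, path, text):
        # Read a document, grab all words, de-duplicate, store only that set.
        self.words_by_path[path] = words_of(text)

    def candidates(self, query):
        # In-memory pre-filter: keep documents that contain every query word.
        needed = words_of(query)
        return [p for p, ws in self.words_by_path.items() if needed <= ws]

    def search(self, query, read_text):
        # read_text(path) is a caller-supplied function that returns the
        # document text (for example, by reading the file from disk).
        single_word = WORD_RE.fullmatch(query) is not None
        hits = []
        for path in self.candidates(query):
            if single_word:
                # The stored word set already answers a single-word query.
                hits.append(path)
            elif query.lower() in read_text(path).lower():
                # Multi-word query: confirm the exact phrase in the real text.
                hits.append(path)
        return hits
```

With this layout only the documents that survive the word filter are ever read, and a single-word search never touches the disk. The price is that substring matches (searching "holi" to find "holiday") would need the full text again.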
Just a thought ..
On second thought: offline files could cause problems.
[Suggestion] Hybrid content indexing
Re: [Suggestion] Hybrid content indexing
It's a good idea.
It is a feature that does interest me.
It may make the content indexing smaller with the loss of full text searching.
Thank you for the suggestion.
Currently, Everything will just index the text content as is.
Generally, this is pretty small: 1 MB of text is a lot of text.
Windows indexing does a pretty good job of content indexing.
Users can search the system index with si:
Good NVMe SSDs are so fast these days, searching the raw files for content with no indexing is perfectly fine.
I'm also considering removing content indexing from the UI and making it a hidden feature.
A lot of shell extensions are buggy (PDFs particularly), providing support will be difficult.
Re: [Suggestion] Hybrid content indexing
void wrote: ↑Mon Jan 30, 2023 1:59 am
...
I'm also considering removing content indexing from the UI and making it a hidden feature.
A lot of shell extensions are buggy (PDFs particularly), providing support will be difficult.
There is free iFilter software that runs without problems on many OS versions.
I started with Windows 7 and now I'm on Windows 11
Using PDFlib TET PDF IFilter
https://www.pdflib.com/de/download/tet-pdf-ifilter/
______________________________________________________
Windows 11 Home x64 Version 22H2 (OS Build 22621.1194)
Everything 1.5.0.1335a (x64)
Re: [Suggestion] Hybrid content indexing
void wrote: ↑Mon Jan 30, 2023 1:59 am
Windows indexing does a pretty good job of content indexing.
[...]
A lot of shell extensions are buggy (PDFs particularly).
Now I wonder what happens if these two get combined ...
(I do not have any PDF content-indexed)
But how can Everything be sure if it is a 'good' iFilter (or even 'good' iFilter version) before starting content-indexing?