Suggestion: content indexing check

Discussion related to "Everything" 1.5 Alpha.
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Suggestion: content indexing check

Post by NotNull »

Content indexing can use a lot of RAM if not configured correctly.

Suggestion: calculate the size of all relevant files (query: c:\tools;x:\folder ext:pdf;txt) and report this:
"content indexing will use approximately x GB RAM. Your system has y GB RAM available."

(or similar)


It will never be 100% accurate (docx is zipped, for example), but it could help prevent overcommitting memory.
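A rough sketch of the proposed estimate in Python. It assumes raw file size is an acceptable proxy for the amount of text that ends up in the index, which does not hold for zipped formats such as docx. The names estimate_content_size and format_report are hypothetical, and walking the folders only approximates what the Everything query above would return.

```python
import os


def estimate_content_size(folders, extensions):
    """Sum the on-disk size of files under `folders` with a matching extension."""
    # Raw size is only a proxy for indexed text size: compressed or binary
    # formats (docx, pdf) can be far off in either direction.
    wanted = {e.lower().lstrip(".") for e in extensions}
    total = 0
    for folder in folders:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                ext = os.path.splitext(name)[1].lower().lstrip(".")
                if ext in wanted:
                    try:
                        total += os.path.getsize(os.path.join(root, name))
                    except OSError:
                        pass  # file vanished or is unreadable; skip it
    return total


def format_report(estimated_bytes, available_ram_bytes):
    """Build the message suggested above, in GB with one decimal place."""
    gb = 1024 ** 3
    return (f"Content indexing will use approximately {estimated_bytes / gb:.1f} GB RAM. "
            f"Your system has {available_ram_bytes / gb:.1f} GB RAM available.")
```

For example, format_report(estimate_content_size([r"C:\tools", r"X:\folder"], ["pdf", "txt"]), ram) would produce the suggested message, where ram is obtained separately.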
void
Developer
Posts: 16672
Joined: Fri Oct 16, 2009 11:31 pm

Re: Suggestion: content indexing check

Post by void »

It's tricky to determine as Everything is multi-threaded.

Each thread could potentially allocate gigabytes of memory for content searching.

However, Everything will avoid using more than 50% of total memory.
If more than 50% of total memory has been allocated by all active content searches, a new content search will block until memory is released from another content search thread.
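The blocking behaviour described above can be pictured as a shared memory budget guarded by a condition variable. This is a minimal Python sketch under that assumption, not Everything's actual implementation; MemoryBudget and its methods are hypothetical names, and max_percent stands in for content_multithreaded_max_memory_percent.

```python
import threading


class MemoryBudget:
    """Caps the total memory reserved by concurrent content-search threads."""

    def __init__(self, total_memory_bytes, max_percent=50):
        self.limit = total_memory_bytes * max_percent // 100
        self.in_use = 0
        self._cond = threading.Condition()

    def acquire(self, nbytes):
        """Block until `nbytes` fits within the limit, then reserve it."""
        with self._cond:
            # Let an oversized request through when nothing else is reserved;
            # otherwise a file larger than the whole limit would block forever.
            self._cond.wait_for(
                lambda: self.in_use == 0 or self.in_use + nbytes <= self.limit)
            self.in_use += nbytes

    def release(self, nbytes):
        """Return `nbytes` to the budget and wake any waiting threads."""
        with self._cond:
            self.in_use -= nbytes
            self._cond.notify_all()
```

Each search thread would call acquire() with its expected allocation before reading a file and release() in a finally block afterwards.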

To customize this 50% limit, set the ini setting: content_multithreaded_max_memory_percent

To customize the maximum number of content search threads, set the ini setting: content_max_threads

To limit the size of files to content search, include the following in your search: size:<100mb



I may have to reduce the default value of content_multithreaded_max_memory_percent, as there is probably a lot of memory overhead with the PDF iFilter, and you might actually see allocations exceeding 100% of total memory.
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Re: Suggestion: content indexing check

Post by NotNull »

What I meant was not about content searching/indexing itself, but about the end result. Content indexing 10GB of plain-text files will eventually add roughly 10GB to the size of the database, and thus to the size of the database in RAM (ball-park numbers, of course).
When the system has 8GB RAM installed, you can predict that this is going to be an issue.

A query like c:\folder ext:txt could quickly indicate the expected growth in database size.
That check could be done when OK/Apply is pressed.
void
Developer
Posts: 16672
Joined: Fri Oct 16, 2009 11:31 pm

Re: Suggestion: content indexing check

Post by void »

Sorry, I misread content indexing as content searching.

Showing the total indexed size in the Content options page would be useful.
I'll put this on my TODO list.

This size would grow as content is indexed, so you could keep an eye on the total size while content indexing takes place in the background.
If this size exceeds 100% of your total RAM, a warning icon and text could be shown.

Thanks for the suggestion.

For now, Everything will include the total indexed content size in Tools -> Options -> Debug -> Statistics -> File data size.
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Re: Suggestion: content indexing check

Post by NotNull »

That could be useful too.

What I meant was as a preemptive measure:
  1. Suppose you have 8GB RAM in your system.
  2. And you configured
    - Include only folders: = C:\folder
    - Include only files: = *.txt
  3. And press the Apply button to activate this configuration
  4. Before starting the content indexing, do the following search query in Everything:
    c:\folder ext:txt
  5. That query reveals that there are 10GB worth of textfiles in C:\folder
  6. Issue a warning message: "Content indexing using these settings will require approximately 10GB of RAM. Your system has 8GB installed (3GB available)."

That will not help when someone copies 30GB of text files to C:\folder after configuration, but it might prevent a lot of initial mistakes.
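Steps 4 to 6 could look roughly like this when OK/Apply is pressed. This is a hypothetical Python sketch: apply_time_warning is an illustrative name, the estimate is assumed to come from a query such as c:\folder ext:txt, and it warns only when the estimate exceeds installed RAM, matching the example in the list.

```python
GB = 1024 ** 3


def apply_time_warning(estimated_bytes, installed_ram, available_ram):
    """Return a warning string if the estimate exceeds installed RAM, else None."""
    if estimated_bytes <= installed_ram:
        return None

    def gb(n):
        # Whole gigabytes, e.g. "10GB", as in the suggested message.
        return f"{n / GB:.0f}GB"

    return (f"Content indexing using these settings will require approximately "
            f"{gb(estimated_bytes)} of RAM. Your system has {gb(installed_ram)} "
            f"installed ({gb(available_ram)} available).")
```

Comparing against installed rather than available RAM avoids warnings that come and go with whatever else happens to be running; a stricter variant could compare against available RAM instead.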
froggie
Posts: 300
Joined: Wed Jun 12, 2013 10:43 pm

Re: Suggestion: content indexing check

Post by froggie »

I always thought that RAM referred to real (hardware) memory, whereas just "memory" referred to virtual memory.
NotNull wrote:
> Suppose you have 8GB RAM in your system.
That looks like a real memory specification.
Real memory does not limit Everything's capacity; virtual memory does. On a 64-bit Windows system you can set virtual memory as high as you have disk space available for paging, or the system might do so itself if you have enough free space on your disks.
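On Windows, the difference between physical RAM and the page-file-backed commit limit is visible through the Win32 GlobalMemoryStatusEx function. A small Python/ctypes sketch, assuming Windows (memory_status is a hypothetical helper name): ullTotalPhys is installed physical memory, while ullTotalPageFile is the current committed memory limit for the system or the current process, whichever is smaller, and it includes the page file.

```python
import ctypes
import sys


class MEMORYSTATUSEX(ctypes.Structure):
    # Layout of the Win32 MEMORYSTATUSEX structure (64 bytes).
    _fields_ = [
        ("dwLength", ctypes.c_uint32),
        ("dwMemoryLoad", ctypes.c_uint32),
        ("ullTotalPhys", ctypes.c_uint64),
        ("ullAvailPhys", ctypes.c_uint64),
        ("ullTotalPageFile", ctypes.c_uint64),
        ("ullAvailPageFile", ctypes.c_uint64),
        ("ullTotalVirtual", ctypes.c_uint64),
        ("ullAvailVirtual", ctypes.c_uint64),
        ("ullAvailExtendedVirtual", ctypes.c_uint64),
    ]


def memory_status():
    """Return physical RAM and commit-limit figures in bytes (Windows only)."""
    if sys.platform != "win32":
        raise OSError("GlobalMemoryStatusEx is only available on Windows")
    status = MEMORYSTATUSEX()
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)  # required before the call
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
        raise ctypes.WinError()
    return {
        "total_phys": status.ullTotalPhys,        # installed RAM
        "avail_phys": status.ullAvailPhys,        # RAM currently free
        "commit_limit": status.ullTotalPageFile,  # RAM + page file limit
        "commit_avail": status.ullAvailPageFile,  # commit still available
    }
```

A database larger than total_phys but smaller than commit_limit can still be allocated, which matches the 406GB example, but it will be paged.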

I have a 16GB RAM system, but I just built a 406GB Everything database by indexing lots of content, with 450 GB allocated to Windows paging.

@Void:
1. On disk it is not 406GB as shown by the statistics screen; it is smaller. Is the DB compressed (that option is no longer in the UI)?
2. If there is not enough space to write the DB when Everything exits, it just goes away with no indication that it failed.
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Re: Suggestion: content indexing check

Post by NotNull »

If Everything's database no longer fits in RAM, it will have to use virtual memory. Swapping pages to disk and reading them back into RAM is relatively slow.

Note:
Due to the nature of the Everything database (that is, based on what I think I know about its inner workings), that might also cause a lot of writes to disk, to the point that it could wear out an SSD quickly.
void
Developer
Posts: 16672
Joined: Fri Oct 16, 2009 11:31 pm

Re: Suggestion: content indexing check

Post by void »

NotNull wrote:
> What I meant was as a preemptive measure:
As you said, it's tricky to know the size for PDF/DOC files without reading the content.

What about showing a warning in the status bar when the total indexed content size exceeds your total RAM size?
froggie wrote:
> I have a 16GB RAM system, but I just built a 406GB Everything database by indexing lots of content, with 450 GB allocated to Windows paging.
I do not recommend using Everything to index more content than 50% of your total RAM.
As soon as Everything starts paging, you will have horrible system performance issues.
You would be better off content searching directly from disk.
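The thresholds in this reply (stay under about 50% of total RAM, warn once indexed content exceeds total RAM) could drive a status-bar indicator along these lines. A hypothetical Python sketch; the function name and the three status labels are illustrative only.

```python
def indexed_content_status(indexed_bytes, total_ram_bytes):
    """Classify the indexed content size against total RAM."""
    ratio = indexed_bytes / total_ram_bytes
    if ratio <= 0.5:
        return "ok"        # within the recommended 50% of total RAM
    if ratio <= 1.0:
        return "caution"   # above the recommendation, still fits in RAM
    return "warning"       # exceeds total RAM: paging is likely
```

Since the indexed size grows while content indexing runs in the background, the status would be re-evaluated as the total changes rather than computed once.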

Everything is designed to index at most a few hundred MB of text, for when you really want instant content searching.
Content searching directly on SSDs is very fast.
Content searching directly on NVMe SSDs is extremely fast.
froggie wrote:
> 1. On disk it is not 406GB as shown by the statistics screen; it is smaller. Is the DB compressed (that option is no longer in the UI)?
This option was removed because when enabled it would just hurt saving and loading performance for minimal compression on disk.
Everything 1.5 does use some compression when storing to disk, so the DB size on disk should be smaller than the memory used by Everything.
froggie wrote:
> 2. If there is not enough space to write the DB when Everything exits, it just goes away with no indication that it failed.
I'll consider showing an error when this occurs.
The next time you start Everything it will detect the incomplete database and rebuild a fresh one.
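One common way to get both behaviours (an error when the save fails, and detection of an incomplete file on the next start) is to write to a temporary file, swap it over the old database in one step, and validate a header and trailer when loading. This is a generic Python sketch of that pattern under stated assumptions; it is not Everything's on-disk format, and MAGIC, FOOTER, save_db and load_db are hypothetical.

```python
import contextlib
import os
import tempfile

MAGIC = b"DBV1"   # hypothetical header marker
FOOTER = b"END!"  # hypothetical trailer, written last


def save_db(path, payload):
    """Write to a temp file next to `path`, then replace the old file.

    If the disk fills up mid-write, the exception propagates (so it can be
    reported to the user) and the previous database file is left untouched.
    """
    folder = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=folder, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(MAGIC + len(payload).to_bytes(8, "little") + payload + FOOTER)
            f.flush()
            os.fsync(f.fileno())
        # Single-call swap: atomic on POSIX, best effort on Windows.
        os.replace(tmp, path)
    except BaseException:
        with contextlib.suppress(OSError):
            os.remove(tmp)  # clean up the partial temp file
        raise


def load_db(path):
    """Return the payload, or None if the file is missing or incomplete."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return None
    if len(data) < 16 or not data.startswith(MAGIC) or not data.endswith(FOOTER):
        return None  # truncated or foreign file: caller rebuilds
    size = int.from_bytes(data[4:12], "little")
    if len(data) != 16 + size:
        return None
    return data[12:12 + size]
```

A None result from load_db corresponds to "detect the incomplete database and rebuild a fresh one".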