I have a few shares indexed on our SAN which total about 12M files and 500K folders. I need to re-scan them daily to update the index.
With the shares added under Indexes -> Folders, re-indexing takes about 16h (and frequently crashes midway). Change monitoring is OFF.
Indexing the same shares to EFU (everything -create-file-list) takes about 4h.
That's 4x faster.
I analyzed network accesses during both scans, and the root cause seems clear:
- EFU scanning does a depth-first scan of the tree
- Folder scan does a breadth-first scan of the tree
Breadth-first scanning is extremely inefficient on modern storage because it does not take advantage of folder caching. Imagine a very small folder set and a disk with a very small cache, enough to hold only 5 folders:
Depth first:
read \\folder1
read \\folder1\sub1 (cache hit on \\folder1)
read \\folder1\sub1\sub2 (cache hit on \\folder1, cache hit on sub1)
read \\folder1\sub1\sub2\sub3 (cache hit on \\folder1, cache hit on sub1, cache hit on sub2)
read \\folder1\sub4 (cache hit on \\folder1)
read \\folder2
read \\folder2\sub5 (cache hit on \\folder2)
Breadth first:
read \\folder1
read \\folder2
read \\folder3
read \\folder4
read \\folder5
read \\folder6 (replaces folder1 in cache)
read \\folder7 (replaces folder2 in cache)
(...)
read \\folder1\sub1 (no cache hit, folder1 is no longer in cache)
read \\folder2\sub2 (no cache hit, folder2 is no longer in cache)
read \\folder3\sub3 (no cache hit, folder3 is no longer in cache)
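To make the two listings above concrete, here is a rough Python simulation of the effect. It is only a sketch under my own assumptions, not how Everything or the SAN actually works: the tree is synthetic (7 top-level folders, 3 levels, 2 subfolders each), the cache is a plain LRU that holds 5 folders, and a "hit" only means the immediate parent folder was still cached when the subfolder was read. The names build_tree and scan are made up for this example.

```python
from collections import OrderedDict, deque


def build_tree(top_level=7, depth=3, fanout=2):
    """Build a synthetic tree: {folder_path: [child_paths]} plus the root list."""
    tree = {}

    def add(path, level):
        children = []
        if level < depth:
            children = [f"{path}\\sub{i}" for i in range(1, fanout + 1)]
        tree[path] = children
        for child in children:
            add(child, level + 1)

    roots = [f"\\\\folder{i}" for i in range(1, top_level + 1)]
    for root in roots:
        add(root, 1)
    return tree, roots


def scan(tree, roots, depth_first, cache_size=5):
    """Walk the tree and count parent-folder cache hits and misses.

    The cache is a simple LRU (an assumption, real storage caches differ).
    The only difference between the two scan orders is which end of the
    pending list the next folder is taken from.
    """
    def ordered(items):
        # For the stack (depth-first) push in reverse so folders are
        # still visited in listing order.
        return reversed(items) if depth_first else items

    cache = OrderedDict()
    hits = misses = 0
    pending = deque((root, None) for root in ordered(roots))
    while pending:
        path, parent = pending.pop() if depth_first else pending.popleft()
        if parent is not None:
            if parent in cache:
                hits += 1
                cache.move_to_end(parent)  # mark parent as recently used
            else:
                misses += 1
        cache[path] = True  # the folder just read is now cached
        while len(cache) > cache_size:
            cache.popitem(last=False)  # evict the least recently used folder
        pending.extend((child, path) for child in ordered(tree[path]))
    return hits, misses


tree, roots = build_tree()
print("depth-first   (hits, misses):", scan(tree, roots, depth_first=True))
print("breadth-first (hits, misses):", scan(tree, roots, depth_first=False))
```

With these toy parameters there are 42 subfolder reads. The depth-first run finds the parent in cache for all 42, and the breadth-first run for none of them. Real numbers will depend on the actual cache size and policy, but the direction of the effect is the same as in the listings above.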
So I think there's an easy fix for folder scanning speed: just switch to depth-first.
Thanks!