Search for duplicates with regex: fileexists: issue

Discussion related to "Everything" 1.5 Alpha.
programmablesoda
Posts: 18
Joined: Wed Jan 22, 2020 7:37 am

Post by programmablesoda »

Background: The goal is not just to find files that have a duplicate somewhere, but to confirm that each one has a duplicate in a specific drive/directory, so that I don't delete every copy without leaving at least one.

example: test.txt exists in:
c:\temp
c:\temp\backup
But doesn't exist in:
c:\archive
Deleting all duplicates from C:\temp will delete both copies! I want to delete files only if they also exist in another, specified location (such as c:\archive).
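
To make the goal concrete, here is a minimal Python sketch of the rule (this is not an Everything feature): a file is a deletion candidate only if a file with the same name and size exists under the folder that is being kept. The names deletable_files and demo are made up for illustration, and name plus size is only a rough stand-in for a real content comparison such as MD5.

```python
import tempfile
from pathlib import Path


def deletable_files(source_root: Path, keep_root: Path) -> list[Path]:
    """Files under source_root whose (name, size) also appears under keep_root."""
    kept = {
        (p.name.lower(), p.stat().st_size)
        for p in keep_root.rglob("*")
        if p.is_file()
    }
    return sorted(
        p
        for p in source_root.rglob("*")
        if p.is_file() and (p.name.lower(), p.stat().st_size) in kept
    )


def demo() -> tuple[list[str], list[str]]:
    """Recreate the test.txt example inside a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        temp = Path(tmp) / "temp"
        archive = Path(tmp) / "archive"
        (temp / "backup").mkdir(parents=True)
        archive.mkdir()
        (temp / "test.txt").write_text("data")
        (temp / "backup" / "test.txt").write_text("data")

        # No copy in the archive yet: nothing should be offered for deletion.
        before = [p.relative_to(temp).as_posix() for p in deletable_files(temp, archive)]

        # Once the archive holds a copy, both files under temp become candidates.
        (archive / "test.txt").write_text("data")
        after = [p.relative_to(temp).as_posix() for p in deletable_files(temp, archive)]
    return before, after
```

With the archive empty, neither copy under temp is reported, which is the safety property I am after.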

I can't figure out why searching for duplicates gives different results once I add regex: and fileexists:, as in the commands below:

Command 1:
<J:\200009.Archives.SCSI.24gb | "J:\Backup\not_found\200009.Archives.SCSI.24gb">
shows 19,870 files & directories (shows files & dirs from both directories)

Command 2:
<J:\200009.Archives.SCSI.24gb | "J:\Backup\not_found\200009.Archives.SCSI.24gb"> file: dupe-size:
shows 17,734 files (shows files from both directories)

Command 3:
Right click and Find MD5 duplicates and/or Right click and Find Size duplicates
shows 17,734 files (shows files from both directories)

Command 4:
<J:\200009.Archives.SCSI.24gb | "J:\Backup\not_found\200009.Archives.SCSI.24gb"> size-dupe: file: regex:J:\\200009.Archives.SCSI.24gb\\(.*) fileexists:J:\\Backup\\not_found\\200009.Archives.SCSI.24gb\\\1
shows 8861 files (shows files from just one of the directories; 17734 / 2 = 8867)

Command 5:
Right click and Find Size duplicates
shows 7454 files - Why is this less than 8861?

Command 6:
Right click and show duplicates of MD5
shows 6356 files - Why is this less than 8861? (yes, all files have MD5 pre-calculated)

I have used FreeFileSync to verify that the directories are the same (9928 files/directories).
I have the logfile and an XLS with the export of the 4 sets of files.
void
Developer
Posts: 16676
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search for duplicates with regex: fileexists: issue

Post by void »

The unexpected results are most likely due to the dupe-size: search.

dupe-size: will find files with the same size as another file in the ENTIRE index (not the current results)

Instead of dupe-size:, please try the following search with Everything 1.5:
sort:size finddupes:

finddupes: will find duplicates within the current results based on the current sort order.
Use sort:size to specify the sort order. (in this case, size)
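
As a rough illustration of the difference described above, the Python sketch below models the two behaviours on a tiny made-up index. The function names and sample paths are hypothetical, and it only mirrors the description in this post, not how Everything implements either search.

```python
from collections import Counter

# Hypothetical index of (full path, size) pairs.
INDEX = [
    (r"J:\A\one.bin", 100),
    (r"J:\A\two.bin", 200),
    (r"J:\B\one.bin", 100),
    (r"J:\Elsewhere\other.bin", 200),
]


def dupes_against_index(results, index):
    """Keep results whose size is shared with another file anywhere in the index."""
    counts = Counter(size for _, size in index)
    return [(path, size) for path, size in results if counts[size] > 1]


def dupes_within_results(results):
    """Sort by size, then keep only sizes that occur more than once in the results."""
    counts = Counter(size for _, size in results)
    ordered = sorted(results, key=lambda item: item[1])
    return [(path, size) for path, size in ordered if counts[size] > 1]


# The "current results": everything except the unrelated folder.
results = [item for item in INDEX if "Elsewhere" not in item[0]]

index_wide = dupes_against_index(results, INDEX)
results_only = dupes_within_results(results)
```

Here two.bin survives the index-wide check because its size matches a file outside the results, but it is dropped when duplicates are only looked for within the results.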

Does this improve your results?

Note:
finddupes: or Right click column header -> Find property Duplicates will remove results that are not duplicated from the result list.
DUPE is shown in the status bar to indicate results have been removed and only duplicate results are shown.
Double click the DUPE text in the status bar to restore all results.
programmablesoda
Posts: 18
Joined: Wed Jan 22, 2020 7:37 am

Re: Search for duplicates with regex: fileexists: issue

Post by programmablesoda »

Thanks. I tried that and it still showed the 7454 files.
So I deleted them.
When I search the two directories, there are clearly still duplicates.
It must be something about how I built the regex/fileexists.
<J:\200009.Archives.SCSI.24gb | "J:\Backup\not_found\200009.Archives.SCSI.24gb"> file: regex:J:\\Backup\\not_found\\200009.Archives.SCSI.24gb\\(.*) fileexists:J:\\200009.Archives.SCSI.24gb\\\1 sort:size finddupes:
shows zero files

but
<J:\200009.Archives.SCSI.24gb | "J:\Backup\not_found\200009.Archives.SCSI.24gb"> finddupes: files:
shows 10260 files

and there are clearly duplicates in both paths.
Attachment: duplicates left over.png (255.25 KiB)
void
Developer
Posts: 16676
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search for duplicates with regex: fileexists: issue

Post by void »

It's the regex search:
regex:J:\\Backup\\not_found\\200009.Archives.SCSI.24gb\\(.*)

Only results matching the above regex search will be shown.
Are you expecting results to also be shown from J:\200009.Archives.SCSI.24gb?


Please try the following search:

file: <regex:J:\\Backup\\not_found\\200009.Archives.SCSI.24gb\\(.*) fileexists:J:\\200009.Archives.SCSI.24gb\\\1> | <regex:J:\\200009.Archives.SCSI.24gb\\(.*) fileexists:J:\\Backup\\not_found\\200009.Archives.SCSI.24gb\\\1> sort:size finddupes:


Read this as:
  • find files only
  • AND
    • find files in J:\Backup\not_found\200009.Archives.SCSI.24gb where the same filename exists in J:\200009.Archives.SCSI.24gb
    • OR
    • find files in J:\200009.Archives.SCSI.24gb where the same filename exists in J:\Backup\not_found\200009.Archives.SCSI.24gb
  • AND
  • sort by size and show only duplicated files.
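
The capture-and-check step in the list above can be sketched in Python as follows. This is only an approximation under stated assumptions: the roots and paths are hypothetical, the existence check is done against the list itself rather than the disk, and the regex is anchored at the start of the path for simplicity.

```python
import re

# Hypothetical roots and file list standing in for the two folders.
ARCHIVE = r"J:\Archive"
BACKUP = r"J:\Backup\Archive"
PATHS = [
    r"J:\Archive\a.txt",
    r"J:\Archive\sub\b.txt",
    r"J:\Archive\only_here.txt",
    r"J:\Backup\Archive\a.txt",
    r"J:\Backup\Archive\sub\b.txt",
]


def exists_in_other(paths, root, other_root):
    """Paths under root whose relative part also exists under other_root."""
    # Capture everything after "root\" (the role of the (.*) group in the search).
    pattern = re.compile(re.escape(root) + r"\\(.*)", re.IGNORECASE)
    known = {p.lower() for p in paths}
    hits = []
    for path in paths:
        match = pattern.match(path)
        # Rebuild the path under the other root from the captured part
        # and check that it is present (the role of fileexists:).
        if match and (other_root + "\\" + match.group(1)).lower() in known:
            hits.append(path)
    return hits


one_direction = exists_in_other(PATHS, BACKUP, ARCHIVE)
# Running the check in both directions corresponds to the OR of the two groups.
both = one_direction + exists_in_other(PATHS, ARCHIVE, BACKUP)
```

A single direction only ever returns files from one folder, which is why the earlier single regex: search could not show results from both paths.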


file-exists: