Recursive dupes
Is there a way to do recursive duplicate searching? For example, I want to find all duplicates by comparing custom columns across 2 folders like this:
C:\folder1\subfolder1-1\subfolder1-2
D:\folder1\testing folders\subfolder1-1\subfolder1-2
Let's say folder1\ in C:\ and D:\ matches, but subfolder1-1\ in C:\folder1\ and D:\folder1\testing folders\ doesn't match. How can I ensure that I don't delete folder1\ on either volume without knowing that everything below it doesn't match?
Re: Recursive dupes
What we need is an easy way to produce a column that shows relative paths below the starting directories of A and B that are being compared for dupes. I cannot think of an easy means to do this yet.
Everything does have a Fullpath column, and also a way to turn on full paths under the Name column, but no convenient way to snip off the D:\Backup2005\ and the Z:\Backup2009\ prefixes so that the remaining relative paths can be name-duped against each other. You would have to perform some string manipulation for each scenario in order to design such a col1 or regmatch1 column yourself.
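(As an aside, that per-scenario string manipulation is easy to sketch outside Everything. A minimal Python sketch, assuming Windows-style paths; `relative_below` is a made-up name, and `ntpath` is used so the example behaves the same on any platform:)

```python
import ntpath

def relative_below(full_path, root):
    """Snip a root prefix (such as a backup drive folder) off a full
    path and return the relative remainder, or None if the path is
    not under that root."""
    full = ntpath.normcase(ntpath.normpath(full_path))
    base = ntpath.normcase(ntpath.normpath(root))
    if not full.startswith(base + "\\"):
        return None
    return full[len(base) + 1:]

# Both backups reduce to the same relative path, so they could be
# name-duped against each other.
print(relative_below(r"D:\Backup2005\docs\a.txt", r"D:\Backup2005"))  # docs\a.txt
print(relative_below(r"Z:\Backup2009\docs\a.txt", r"Z:\Backup2009"))  # docs\a.txt
```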
Re: Recursive dupes
I think a better approach would be an if-then-else statement used in the search syntax...
raccoon wrote: ↑Sat Feb 11, 2023 3:25 am
Another idea I have is to use something like ViceVersa Pro does: it's like a folder comparison. It lacks an organized view of dates and times, but it places a red dot on either side of a folder compare to show which is older/newer. This only works for files, as their company is reluctant to release updates. Maybe you could incorporate some of their approach to comparing folders/files? Their website is: https://www.tgrmn.com/?camp=goog_cmt&gc ... gJB1fD_BwE
Another prospect is to have a folder tree view together with custom columns? You already have a folders view to begin with.
Re: Recursive dupes
How large are the folders?
Would calculating the sha256 sum of a folder be useful?
Calculate the sha256 sum of just content or filenames and content? or an option to choose?
I have a compare folders feature in the works (has been on my TODO list for a long time)
Check Tools -> Compare Folders for an idea of how this might work..
The plan is to be able to compare two folders from either the index, a folder from disk or a file list.
Re: Recursive dupes
The folders range from 1 GB up to 22 GB. I'm not certain what sha256 is? Is it like a unique identifier of each file and folder, like spectrometry?
void wrote: ↑Fri Feb 17, 2023 4:55 am
Yes, I saw the compare folders window and was shocked that I couldn't do anything with it yet, lol. I saw it in the 1338 update. Something like a blend of ViceVersa Pro and Everything would make the best use of compare folders.
Re: Recursive dupes
SHA256 is a hash algorithm.
Calculating the folder sha256 involves calculating the sha256 hash value from all the file content inside this folder/subfolders.
The final folder sha256 hash will be unique.
You can compare it against another folder.
If the hashes match, the folders contain the same data.
If the hashes differ, the folders contain different data. (could be missing data, modified data or added data)
Calculating the hash will take some time.
A rough guess is around 100MB/s
7zip can calculate the sha256 hash of a folder (file content, or file content and filenames)
It might be useful to add these properties to Everything.
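(The folder-hash idea void describes can be sketched in a few lines of Python. `folder_sha256` is a hypothetical name, and the sorted walk is my own assumption to keep the digest reproducible; 7zip may combine names and content differently:)

```python
import hashlib
import os

def folder_sha256(root, include_names=True):
    """Fold the content of every file under root (and optionally the
    relative file names) into a single sha256 digest.  Sorting the
    walk makes the result deterministic across runs."""
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                      # visit subfolders in a fixed order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if include_names:
                h.update(os.path.relpath(path, root).encode("utf-8"))
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)          # stream content, 1 MiB at a time
    return h.hexdigest()
```

If two trees produce the same digest they hold the same data; any missing, modified, or added file changes the result.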
Re: Recursive dupes
100 MB/s is more than enough for my purposes. Usually my folders are less than 7 or 8 GB in total size. There are only a few exceptions for creating large archive zip files, which can be 20-30 GB in total size.
void wrote: ↑Fri Feb 17, 2023 7:19 am
I do agree that calculating a sha256 hash over file and folder contents, or over times and file/folder names, would be a more effective way to identify duplicates than long search-syntax expressions for finding duplicates or similarities.
Re: Recursive dupes
If I've not misread, I think your ask isn't to detect whether two folders are identical, but to identify which files make the two folders non-identical. Or more specifically, which files ARE identical, so they can be deleted to save space and the remaining dissimilar files can be given attention to figure out why they differ.
At least that's usually my objective in detecting duplicates. Dismiss the duplicates and scrutinize the remaining files for quality or modernity.
Re: Recursive dupes
Again I'll point to viewtopic.php?p=53680#p53680.
IMO, you use the tool that is correct for the job.
So sometimes you use 1, next time you use the other, & sometimes you mix & match.
Since Everything is great at finding everything, once you've found your wanted files or file locations, take that information & plug it into a different tool.
If you want to "sync" or "update" a sync or update tool is what you need.
So use a sync tool, be it ViceVersa or FreeFileSync or...
If you want to compare directories, or directory trees (& then maybe update 1 to the other or whatnot),
IMO Salamander's Directory Compare feature works very well, in particular with 2 particular directories,
very easily pointing out differences (based on various criteria).
It can also handle trees (subdirectories). It will identify "different" branches, but at that point, that is
all that you know, they're different, so it is a bit limited in that respect. You'd have to actually traverse
into those directories (again running a Directory Compare) to know what specifically is different.
(And that is not to say dealing with trees is lacking, it just may not be the correct tool to use - depending
on a particular situation.)
Re: Recursive dupes
If you're looking for "equality" (duplicates), then IMO you don't need a "crypto" hash (like md5, sha1, sha256...); other, faster hashes exist that are not crypto but will give equally relevant results.
sha256 hash ... 100 Mb/s is more than enough
"100 Mb/s".
"I/O" & cache matter.
IOW, your hash program can only hash as quickly as it can read (be sent) data.
So if your "sending" is slow, the hash you use (as far as speed is concerned) is relatively unimportant.
So if you're sending at 30Mb/s & your slowest hash reads at 100Mb/s, it doesn't matter.
But if you're sending at 20000Mb/s, then you will most certainly see a difference between faster & slower hashes.
Cache matters. (And if you have lots of cash you can purchase a very large cache.)
If you're reading from "cache" (of various sorts), that can be exponentially faster than reading "cold".
Don't recall, maybe it was a 4GB file or something? But... speed was along the lines of...
Code: Select all
IF a file is cached (& you use --use_cache), hash can COMPUTE the hash (2.2 sec)
quicker than sha1 can VERIFY the hash (13.7 sec)
if file is not cached, you're limited by "BUS" & everything;
compute & check will both take (2 min 30 sec)
IMO, sha256 is overkill - depending on needs.
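(The crypto vs. non-crypto point can be illustrated with Python's standard library: `zlib.crc32` is a fast non-crypto checksum that works as a first-pass dupe filter, while sha256 is slower but a match is effectively conclusive. A minimal sketch:)

```python
import hashlib
import zlib

data_a = b"x" * (1 << 20)                    # 1 MiB buffer
data_b = b"x" * ((1 << 20) - 1) + b"y"       # same size, last byte differs

# Fast non-crypto checksum: cheap enough to rule candidates in or out,
# but a crc32 match should be confirmed with a byte compare or a
# stronger hash before deleting anything, since collisions can occur.
assert zlib.crc32(data_a) != zlib.crc32(data_b)

# Crypto hash: slower to compute, but a matching digest is
# effectively proof of identical content.
assert hashlib.sha256(data_a).digest() != hashlib.sha256(data_b).digest()
```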
Re: Recursive dupes
Uh... I'm lost for words, but I think directory compare would suffice for my purpose.
Re: Recursive dupes
I probably said this before, but I think the current setup of Everything is not well suited for such queries.
In Everything you can search for a file by specifying part of that file's name, specifying a date range and a wealth of other attributes/properties that identify this file. Everything is very good at that.
Relations with other files/folders is basically outside Everything's comfort-zone.
Compare it to people:
I'm looking for a male person, between 18 and 80 years old, brown hair and shoe size 12.
That's easy (if you're Everything).
Everything already has implemented functions for close relationships, like parent, child and brother/sister. All require specific functions. That list is quite long already.
I bet that in the future someone will want to find a male person, between 18 and 80 years old, brown hair and shoe size 12, who has an uncle with a dog named Tarzan.
The list of (complicated) relation-functions will explode. It will be ugly (it already is, imo).
IF this relationship feature becomes part of Everything, a separate interface is more fitting here, with 3 main input fields:
1. Main object properties (use Everything search syntax)
2. Relationship(s) (haven't thought this through; maybe clicking a tree in a GUI or selecting relationship from a list)
3. Relation properties (use Everything search syntax or "same xyz property as main object")
(there can be more than 1 relation, so relation1 + relation1 properties; relation2 + relation2 properties)
(and maybe a SQL syntax for complex queries. or even better: SQLplus query syntax)
Then a lot of those child- parent- sibling- functions can be removed.
Anyway, that is my opinion. Back on topic ....
That looked like a bit of fun to have (thanks!).
I just wrote a filter that should be able to get this done.
In my 2 minutes of testing, this worked perfectly. Ergo: this is perfect
Create a new Filter:
Code: Select all
Name : Compare=
Search : regex:#quote:#regex-escape:<#element:<search:,;,1>>(\\.*$)#quote: | regex:#quote:#regex-escape:<#element:<search:,;,2>>(\\.*$)#quote: -add-column:regexmatch1
Macro : comp<search>
How to use:
- Enable the Compare= filter
- Use the following search syntax:
"c:\some\folder";"x:\another\path"
Note the ; that separates the two folders. Don't use spaces before or after it.
The extra Regular Expression Match 1 column will show the part of the paths that comes after "c:\some\folder" or "x:\another\path".
- Right-click the Regular Expression Match 1 column header
- Select Find Regular Expression Match 1 duplicates
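(What the filter captures can be mimicked in Python against a handful of sample paths, which may help check expectations before running it in Everything. `find_relative_dupes` is a hypothetical helper using the same idea: escape each root literally, then capture everything after it:)

```python
import re

def find_relative_dupes(paths, roots):
    """Group full paths by the part below any of the given roots --
    the same value the filter exposes as Regular Expression Match 1 --
    and keep only relative paths seen more than once."""
    patterns = [re.compile(re.escape(r) + r"(\\.*$)", re.IGNORECASE)
                for r in roots]
    groups = {}
    for p in paths:
        for pat in patterns:
            m = pat.match(p)
            if m:
                groups.setdefault(m.group(1).lower(), []).append(p)
                break
    return {rel: ps for rel, ps in groups.items() if len(ps) > 1}

paths = [r"c:\some\folder\a\x.txt",
         r"x:\another\path\a\x.txt",
         r"c:\some\folder\only-here.txt"]
dupes = find_relative_dupes(paths, [r"c:\some\folder", r"x:\another\path"])
# dupes now maps \a\x.txt to both full paths; only-here.txt drops out
```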
Re: Recursive dupes
I will give this a test comparing two folder structures. Does it only compare file and folder names, or does it compare only the properties under the regex column?
Re: Recursive dupes
This Everything Filter does not summon dark forces and does not kill puppies, so you could just try and see for yourself. Takes 1 minute ...
On the other hand: this is just a demo to find out what the possibilities are.
It will likely give false results when a folder name contains a semicolon (;), although I did not test this.
Re: Recursive dupes
How can I use the normal files: and find-dupes: or dcdupe: etc. functions together with this regex kind of syntax? Also, when you say Macro: comp<search>, do I have to substitute something in place of <search>, or is that actually part of the command?
It seems like I cannot add custom columns together with regex columns; is there a reason for this?
Re: Recursive dupes
The comp: macro was added for 2 reasons:
- to get a parameter to process (parameter = "c:\folder1";"x:\folder2")
- to add extra search options, like you are asking now.
So you can do things like:
file: comp:"c:\folder1";"x:\folder2"
(I virtually don't know anything about dupe-functions, so can't help you there)
BTW:
I wanted to call the macro comp= as it compares the same subpaths, but didn't know if that would give issues. Maybe you can test?
(and I had in mind a comp> and comp< macro to search for files that are not "on the other side". Maybe that will come someday ..)
Re: Recursive dupes
NotNull wrote: ↑Sat Feb 18, 2023 7:27 pm
When I compare 2 folders, I am only getting the results of the 1st folder, not together with the second.
Re: Recursive dupes
What does your search look like? What is the active filter?
Re: Recursive dupes
My active search filter is compare=
Re: Recursive dupes
and your search query?
(replace actual paths with something similar if privacy requires so)
Re: Recursive dupes
Will this work for 3 paths or 4? How many paths can comp: take?
Re: Recursive dupes
The following is for 3 folders; you can expand to as many as you like. I think you will get the pattern on closer inspection...
Search for "c:\folder1";"c:\folder2";"c:\folder3" and activate the compare filter.
Code: Select all
regex:#quote:#regex-escape:<#element:<search:,;,1>>(\\.*$)#quote: | regex:#quote:#regex-escape:<#element:<search:,;,2>>(\\.*$)#quote: | regex:#quote:#regex-escape:<#element:<search:,;,3>>(\\.*$)#quote: -add-column:regexmatch1
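(Since each extra folder just adds one more OR-ed regex term with the next element number, the Search line for any folder count can be generated mechanically. A small Python sketch; `build_compare_search` is my own name for it:)

```python
def build_compare_search(n):
    """Build the filter's Search line for n root folders: one regex
    term per folder (element numbers 1..n), OR-ed together with '|',
    plus the column that exposes the captured relative path."""
    term = ("regex:#quote:#regex-escape:"
            "<#element:<search:,;,{i}>>(\\\\.*$)#quote:")
    terms = [term.format(i=i) for i in range(1, n + 1)]
    return " | ".join(terms) + " -add-column:regexmatch1"

# build_compare_search(2) reproduces the two-folder filter earlier
# in the thread; build_compare_search(3) reproduces the line above.
print(build_compare_search(3))
```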
Re: Recursive dupes
Thank you for suggesting this. I understand that the element number has changed and everything else has remained the same.
NotNull wrote: ↑Fri Mar 17, 2023 8:01 pm
Re: Recursive dupes
Correct. Things might get clearer when the line is split up:
Code: Select all
regex:#quote:#regex-escape:<#element:<search:,;,1>>(\\.*$)#quote:
|
regex:#quote:#regex-escape:<#element:<search:,;,2>>(\\.*$)#quote:
|
regex:#quote:#regex-escape:<#element:<search:,;,3>>(\\.*$)#quote:
-add-column:regexmatch1
Re: Recursive dupes
Funny you typed it out like that, because I did that with Notepad just before I typed my last post lol.
NotNull wrote: ↑Fri Mar 17, 2023 9:30 pm
Re: Recursive dupes
anmac1789 wrote: ↑Fri Mar 17, 2023 10:05 pm
"c:\folder1";"c:\folder2";"c:\folder3"
Can I type it out like this, or should I prefix the 3 paths with comp:"path to folder1";"path to folder2";"path to folder3"?
Re: Recursive dupes
Your method will work too as it is "just" another way to activate the filter.
So you can use the following search "c:\folder1";"c:\folder2";"c:\folder3" and enable the Compare= filter from the list
- or-
Use the following search: comp:"c:\folder1";"c:\folder2";"c:\folder3"
Re: Recursive dupes
TIP:
If you create the 3-folder compare filter, you can use it on 2 folders too by adding a non-existing dummy folder:
comp:"c:\folder1";"c:\folder2";"dsfdsfdvdfvfdvbdd"
Or by adding one of the folders twice:
comp:"c:\folder1";"c:\folder2";"c:\folder1"