voidhash, the ramble

therube
Posts: 4953
Joined: Thu Sep 03, 2009 6:48 pm

voidhash, the ramble

Post by therube »

voidhash, the ramble (since i'm not likely to have time, i'll just throw it out there...)


---

timer.exe (from igor's, 7bench) vhash .
T:\XXX\+ok.xxx-old-o-XXX\DOWN_XXX

so it runs the hashes in parallel (or at least to some extent)?
(or at least it generates/writes one hash, goes to the next hash, generates/writes...)

oh, it automatically recurses into an entire directory tree* (see below)
- in that case, an option is needed to not recurse

Kernel Time = 10.732 = 7%
User Time = 87.688 = 62%
Process Time = 98.421 = 69% Virtual Memory = 1028 MB
Global Time = 141.319 = 100% Physical Memory = 1030 MB

---

timer fchash --non_stop --sha1 * > 0sha1

Total: 1115files, 11501.6MiB, 86.3sec, 133.2MiB/s


Kernel Time = 0.873 = 1%
User Time = 27.159 = 31%
Process Time = 28.033 = 32% Virtual Memory = 10 MB
Global Time = 86.338 = 100% Physical Memory = 13 MB

--

ah, you've got to disregard (entirely) the above results,
as voidhash did not complete all of the files

T:\XXX\+ok.xxx-old-o-XXX\DOWN_XXX>wc -l ..*
651 ..md5
637 ..sfv
638 ..sha1
647 ..sha256
2573 total

T:\XXX\+ok.xxx-old-o-XXX\DOWN_XXX>wc -l 0hash 0sha1
1113 0hash
1125 0sha1
2238 total

---

fchash, 7200 HDD (local SATA):
TOSHIBA HDWN180

sha1: Total: 1115files, 11501.6MiB, 86.3sec, 133.2MiB/s
xxh3: Total: 1110files, 11501.3MiB, 53.8sec, 213.9MiB/s

---

for the 638 files that vhash did hash, the hashes agreed between vhash & fchash
(had to do a little tweaking of fchash's results as they are "non-standard")

wonder if timer interfered with vhash ?
- nope

-0 was with timer, !-0 was without

T:\XXX\+ok.xxx-old-o-XXX\DOWN_XXX>wc -l ..*
653 ..md5
651 ..md5-0
642 ..sfv
637 ..sfv-0
641 ..sha1
638 ..sha1-0
649 ..sha256
647 ..sha256-0
5158 total

btw: 25% cpu (100% of 1 core), 1 GB of RAM; i/o is variable, max looked to be 118 MB

heh, oh, that particular directory had (at least) a 254-char file name-part
(which again is why i'm using fchash)
& if i rename it (with Everything [next largest name-part is 219-char]...)

T:\XXX\+ok.xxx-old-o-XXX\DOWN_XXX>wc -l ..*
1121 ..md5
651 ..md5-0
653 ..md5-2
1121 ..sfv
637 ..sfv-0
642 ..sfv-2
1121 ..sha1
638 ..sha1-0
641 ..sha1-2
1121 ..sha256
647 ..sha256-0
649 ..sha256-2
9642 total


- that's better


T:\XXX\+ok.xxx-old-o-XXX\DOWN_XXX>timer vhash .
voidhash 1.0 (c) voidtools 2022
parse .
parse ....
parse .\X...


Kernel Time = 14.695 = 8%
User Time = 144.004 = 78%
Process Time = 158.699 = 86% Virtual Memory = 1028 MB
Global Time = 182.643 = 100% Physical Memory = 1030 MB


given you're computing 4 (slower at that) hashes per file, it would
seem you're quite efficient.

(sha1) hashes agree (as expected) between vhash & fchash
Salamander verified the same (*but* it had to be dealing with cached data)
(so... after all that, do you mind if i get back to actually doing something with my hash'd data ;-), heh.)


---


T:\XXX\CE\cel\X>timer vhash .
voidhash 1.0 (c) voidtools 2022
parse .
parse ....
parse .\Y...
parse .\Y\X...


Kernel Time = 5.413 = 5%
User Time = 67.704 = 73%
Process Time = 73.117 = 79% Virtual Memory = 1028 MB
Global Time = 92.210 = 100% Physical Memory = 1030 MB



T:\XXX\CE\cel\X>timer hash --non_stop --recur * > 0sha1

Total: 263files, 5430.0MiB, 25.5sec, 213.3MiB/s


Kernel Time = 0.390 = 1%
User Time = 0.717 = 2%
Process Time = 1.107 = 4% Virtual Memory = 10 MB
Global Time = 25.479 = 100% Physical Memory = 12 MB

T:\XXX\CE\cel\X>timer hash --non_stop --recur --sha1 * > 0sha1

using "timer.exe" affects cache usage (or not)
i.e. (after having already run fchash on said directory):
> timer fchash --use_cache * > 0hash
- --use_cache should use the cached data
but instead of it being any quicker, it was still running 25 seconds
vs. without "timer.exe":
> fchash --use_cache * > 0hash
Total: 268files, 5430.1MiB, 2.3sec, 2320.6MiB/s
2.3 vs 25 seconds, so a 10x difference
(without --use_cache, he invalidates the cache - which is probably a good thing to do.
wonder how he does that? as a "clearcache.exe" could be very useful for various [generally benchmark] purposes)

now... why did you NOT recurse, when before you did ?
oh, you DON'T actually recurse, you just /enumerate/ ("parse") subdirectories,
but you don't actually do anything with them

T:\XXX\CE\cel\X\Y\X>timer vhash .
voidhash 1.0 (c) voidtools 2022
parse .
parse ....
parse .\Y...


Kernel Time = 0.312 = 136%
User Time = 0.000 = 0%
Process Time = 0.312 = 136% Virtual Memory = 1028 MB
Global Time = 0.228 = 100% Physical Memory = 1030 MB

well, that blows that comparison out of the water (unless i do NOT --recurse, which probably makes sense, for now)



X: (only)
hash --sha1 --non_stop * > 0hash.Xonly
Total: 75files, 1378.1MiB, 6.4sec, 213.9MiB/s
so 6 sec vs. 92 sec for vhash ... ???
yep. odd, but does look to be correct.

Y: (only)
timer vhash .
Global Time = 52.609 = 100% Physical Memory = 1030 MB

timer hash --non_stop --sha1 * > 0sha1
Total: 198files, 4052.0MiB, 19.1sec, 211.9MiB/s
Global Time = 19.147 = 100% Physical Memory = 12 MB

so 19 sec vs 52 sec


hash --non_stop --use_cache * > 0hash
Total: 199files, 4052.1MiB, 20.3sec, 199.6MiB/s

so effective hash speed will also depend on the data it's hashing,
'cause with these particular directories/files, there is no speed diff between sha1 & xxh128

(this all assumes i am doing things "correctly" & "accurately" ;-))
therube
Posts: 4953
Joined: Thu Sep 03, 2009 6:48 pm

Re: voidhash, the ramble

Post by therube »

from the other thread, i take it vhash is /supposed/ to recurse?
it does not, that i can see?
canonicalize path argument. (convert . to an absolute path)
i have no problem with ..<name>.
matter of fact, considering it's odd, & is apt to place the file name (alphabetically) near or at the top...


FcHash (mentioned above) is part of (a stand-alone part of) FastCopy.
It does md5, sha1, sha256, sha512, & two variants of xxhash.
Its output is "non-standard", but OK; it doesn't create "sidecars" (but you can always sed it).
It does do recursion & LFN. (There are some issues with LFN, but they're edge cases [well, maybe not for me ;-)].)
therube
Posts: 4953
Joined: Thu Sep 03, 2009 6:48 pm

Re: voidhash, the ramble

Post by therube »

> DIR


Directory of W:\X\CE\cel\X\Y\X

01/31/2022 09:57 PM 0 0hash
01/31/2022 09:55 PM 2 Åmål


-------


> fchash --non_stop * > 0hash


W:\X\CE\cel\X\Y\X\ :
Can't open "0hash" (32)
xxh3 <7d596ce5fcabaf622a2300bbd7ea6e9a>: +àm+Ñl

Total: 1files, 0.0MiB, 0.0sec, 0.0MiB/s


-------


then, i want to delete the file (based on the data in 0hash)
for which i do something *like*:
%s/............/del "/
%s/$/"/

which i then write to a batch file which ends up something like:

000.bat:
del "+àm+Ñl"

which is fine, except "+àm+Ñl" is not seen by DEL on my end,
rather what is seen is "Åmål"
so the DEL (attempts to delete) a "non-existent" file

-------

Windows (7) en-US
not sure where the issue lies?
maybe i need to do a chcp first (which i'm not really familiar with) ?
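fwiw, a quick Python sketch of what's (probably) going on: the tool writes the name as UTF-8 bytes, but the cmd window decodes console output with the OEM codepage (437 on en-US Windows), hence the mangled name (the ├ likely got flattened to + somewhere along the way):

```python
# "Åmål" encoded as UTF-8 bytes, then (mis)read as OEM codepage 437
# (the default console codepage on en-US Windows):
utf8_bytes = "Åmål".encode("utf-8")        # b'\xc3\x85m\xc3\xa5l'
garbled = utf8_bytes.decode("cp437")
print(garbled)                             # ├àm├Ñl - compare "+àm+Ñl" above

# chcp 65001 switches the console to UTF-8, where the bytes round-trip:
assert utf8_bytes.decode("utf-8") == "Åmål"
```

so a `chcp 65001` before running the batch file may indeed be the fix.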

-------

Microsoft Windows [Version 6.1.7601]

>777 a 0 *

7-Zip 21.07 (x64) : Copyright (c) 1999-2021 Igor Pavlov : 2021-12-26

Scanning the drive:
7 files, 2082 bytes (3 KiB)

Creating archive: 0.7z

Add new data to archive: 7 files, 2082 bytes (3 KiB)


Files read from disk: 7
Archive size: 1232 bytes (2 KiB)
Everything is Ok


---


>777 l 0.7z

7-Zip 21.07 (x64) : Copyright (c) 1999-2021 Igor Pavlov : 2021-12-26

Scanning the drive for archives:
1 file, 1232 bytes (2 KiB)

Listing archive: 0.7z

--
Path = 0.7z
Type = 7z
Physical Size = 1232
Headers Size = 259
Method = LZMA2:12
Solid = +
Blocks = 1

Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2022-01-31 22:09:53 ....A 126 973 ..md5
2022-01-31 22:09:53 ....A 54 ..sfv
2022-01-31 22:09:53 ....A 150 ..sha1
2022-01-31 22:09:53 ....A 222 ..sha256
2022-01-31 22:09:05 ....A 986 0fchash2.txt
2022-01-31 21:57:14 ....A 542 0hash
2022-01-31 21:55:55 ....A 2 +mÕl
------------------- ----- ------------ ------------ ------------------------
2022-01-31 22:09:53 2082 973 7 files


---


heh, so 7-zip sees the name differently yet again
- 7-zip sees the name "correctly"
(they're all "correct", depending how you look at things,
but 7-zip's is "most" correct [IMO])

(wonder how the board will render all of this ;-))
(& while i mention "fchash", the same also applies to voidhash, md5: 49f68a5c8493ec2c0bf489821c21fc3b +àm+Ñl)
void
Developer
Posts: 16665
Joined: Fri Oct 16, 2009 11:31 pm

Re: voidhash, the ramble

Post by void »

Thank you for the feedback therube,

There's plenty of room for improvements to the hashing speed.
At the moment voidhash uses a single thread.


The large memory usage is not required.
I was just using a massive buffer as I run voidhash on a network share.
It is currently set to 1GB and only needs to be about 8MB.
I found many small 8MB reads would really hammer my file server so this will need to be an option.


The recursion issue might have something to do with a relative paths argument.
I will investigate.


FYI: voidhash stores hash sidecar files as UTF-8.
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: voidhash, the ramble

Post by raccoon »

FYI: voidhash stores hash sidecar files as UTF-8.
What does that mean precisely? I never understood the nuance of that statement.

Does the file have a BOM header, or is it an ASCII text file with individual UTF-8 characters appearing if/when needed?
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Re: voidhash, the ramble

Post by NotNull »

raccoon wrote: Wed Feb 02, 2022 3:33 pm Does the file have a BOM header, or is it an ASCII text file with individual UTF-8 characters appearing if/when needed?
What I "know" about it:

When you have a gazillion unicode characters to represent (let's say 16 777 216 or 2^24), you could use 24 bits to encode each character [1].
This way, "abc" would require 3 * 24 bits = 9 bytes.

That is an enormous waste of 'empty' bits, as most of the time the regular 128 lower-ASCII characters (a-z, 0-9, etc) are used.
And they require just one single byte (7 bits to be precise) to be stored.

UTF-8 stores all characters in a variable number of bytes. ASCII characters in 1 byte and some random smiley in 3 bytes. That way an average text will consume far less space on disk.

In UTF-8 the first couple of bits of each new character indicate how many bytes are going to be used for that character. That way it is also known where the next character starts.


Example: the 128 ASCII characters fit in 1 byte and all start with a 0.
So if the first bit is a 0, it *has* to be a "1-byte character" and therefore the next character starts 1 byte further down the line.

The 'length-signatures' for multi-byte characters: a first byte starting with bits 110 means a 2-byte character, 1110 a 3-byte character, 11110 a 4-byte character, and every continuation byte starts with 10. So a 3-byte character (say a smiley) starts with 1110xxxx, and thus the next character starts 3 bytes later.


In reality, a unicode character can use up to 4 bytes (32 bits).



So, UTF-8 does not use a 'header' (Byte Order Mark) to indicate its encoding. The encoding is implicit and can in some cases cause wrong interpretation of the encoding. The program reading the UTF-8 file - let's say Notepad - makes an educated guess whether it is indeed an UTF-8 encoded file or maybe ANSI or an entirely different encoding. Luckily, this educated guess turns out to be pretty reliable.

UTF-8 BOM encoded files are UTF-8 encoded files, but with a 'header' that indicates that it is indeed an UTF-8 file. Programs don't have to make their educated guess in that case. On the other hand: BOM-less UTF-8 has more widespread support.


end of ramble/memory dump :)

[1] It is actually some sort of pointer to a unicode character, but for simplicity ..
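The above is easy to check in Python; the byte counts (1 for ASCII, up to 4 for emoji) and the length-signature bits of the first byte are directly visible:

```python
# Each character's UTF-8 length, plus the bit pattern of its first byte:
# 0xxxxxxx = 1 byte, 110xxxxx = 2, 1110xxxx = 3, 11110xxx = 4,
# and every continuation byte starts with 10.
for ch in "aé€😀":
    encoded = ch.encode("utf-8")
    print(ch, len(encoded), "bytes, first byte:", f"{encoded[0]:08b}")

# "abc" takes 3 bytes in UTF-8, not the 9 a fixed 24-bit scheme would need:
assert len("abc".encode("utf-8")) == 3
```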
void
Developer
Posts: 16665
Joined: Fri Oct 16, 2009 11:31 pm

Re: voidhash, the ramble

Post by void »

What does that mean precisely? I never understood the nuance of that statement.
It should really read as:

voidhash stores hash sidecar files with UTF-8 encoding.



There's no BOM.
ASCII characters will take a single byte.
non-ASCII characters will use a special encoding that can be 2-4 bytes long.

.sfv, .md5, .sha1 and .sha256 do not appear to have an encoding standard.
UTF-8 will most likely offer the best compatibility.



void
Developer
Posts: 16665
Joined: Fri Oct 16, 2009 11:31 pm

Re: voidhash, the ramble

Post by void »

voidhash-1.0.1.2
  • canonicalize path argument. (convert . to an absolute path)
  • remove trailing \ in path argument.
  • disable recursion option -nosubfolders
  • convert forward slashes (/) to backslashes (\).
  • added hash algorithm switches. (if none specified use all, eg: -sha256 -sha1 -md5 -sfv)
  • added multiple thread support. (one thread per algorithm)
  • added timing information.
  • added long filename support.
  • added sha512 support.
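The "one thread per algorithm" idea can be pictured with a small sketch (hypothetical Python, not voidhash's actual code): a single reader feeds each file block to one worker per hash algorithm, so the file is read from disk only once while the slower hashes run in parallel.

```python
import hashlib
import queue
import threading

def multi_hash(blocks, algorithms=("md5", "sha1", "sha256")):
    """Hash one stream of blocks with several algorithms, one thread each."""
    queues = {a: queue.Queue(maxsize=4) for a in algorithms}
    digests = {}

    def worker(algo):
        h = hashlib.new(algo)
        while True:
            block = queues[algo].get()
            if block is None:          # end-of-file sentinel
                break
            h.update(block)
        digests[algo] = h.hexdigest()

    threads = [threading.Thread(target=worker, args=(a,)) for a in algorithms]
    for t in threads:
        t.start()
    for block in blocks:               # the data is read only once...
        for q in queues.values():
            q.put(block)               # ...and fanned out to every hasher
    for q in queues.values():
        q.put(None)
    for t in threads:
        t.join()
    return digests

print(multi_hash([b"hello ", b"world"])["md5"])
```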
therube
Posts: 4953
Joined: Thu Sep 03, 2009 6:48 pm

Re: voidhash, the ramble

Post by therube »

(more rambling...)


hashes
- once you have these file hash histories, what do you do with them?

void writes individual hashes into each individual directory
- at which point you can re-check said individual directories,
which is fine for /that/ directory

but how do you handle directory trees (with individual hashes in each)?

what i did above [now below], was to generate hashes for an entire tree,
stored in a single file. doing that on both source & destination,
i can compare one against the other & very quickly & easily confirm
that they are the same

(granted, FcHash does not use a "standard" layout, but comparing
against another FcHash output is easy [using Salamander, WinMerge,
ExamDiff...]. anything else, you'd have to parse the output..)

(xxhash itself can check its output files; --check, but it does
not handle LFN nor does it do directory recursion [trees])
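for the source-vs-destination comparison step, something like this is all it really takes (a hypothetical sketch; the "<hash> <relative path>" line format here is an assumption, not FcHash's actual layout):

```python
# Compare two whole-tree hash listings, e.g. one generated on the source
# and one on the destination. Line format "<hash> <relative path>" is an
# assumption for illustration.
def load(path):
    entries = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                digest, _, name = line.rstrip("\n").partition(" ")
                entries[name] = digest
    return entries

def compare(src, dst):
    missing = sorted(src.keys() - dst.keys())    # in source, not in dest
    extra = sorted(dst.keys() - src.keys())      # in dest, not in source
    changed = sorted(n for n in src.keys() & dst.keys() if src[n] != dst[n])
    return missing, extra, changed
```

(a straight diff of the two files works too, as long as both sides list files in the same order)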
therube
Posts: 4953
Joined: Thu Sep 03, 2009 6:48 pm

Re: voidhash, the ramble

Post by therube »

hashtree:
i/o - R+O: 180 MB W: 2 kB (C: drive, with a | tee)
hash --xxh --recur --non_stop . > 0hash (R: drive, NO tee)
i/o, looks to be the same, so | tee, or not doesn't really matter...

Total: 101393files, 507767.8MiB, 2880.0sec, 176.3MiB/s
Total: 101394files, 507767.9MiB, 2889.1sec, 175.8MiB/s

SAME EXACT TIME !
(& this is both on M: & R: (a "SMR" drive))


---


FastCopy:

M: to R:

TotalRead = 507,767 MiB
TotalWrite = 507,767 MiB
TotalFiles = 101,439 (8,312)
TotalTime = 48:41
TransRate = 184.7 MB/s
FileRate = 35.2 files/s



hashtree.bat:
FcHash.exe --xxh --recurs --non_stop . 2>&1 | tee 0hashtree


NO.tee:
FcHash.exe --xxh --recurs --non_stop . > 0hashtree



M: (tosh, 6TB, 7200)
- Total: 101394files, 507767.9MiB, 2889.1sec, 175.8MiB/s
- (process hacker): i/o: R+O: 180 MB W: 2 kB (M: drive, with a | tee)

R: (wdc, 6TB, 5400 + "SMR", no less [connected externally, via eSATA])
[^--- for my needs, i'm not seeing ANY issues with this "SMR"]
- <
- (process hacker): i/o, looks to be the same, so '| tee', or not is irrelevant (R: drive, NO tee)
SAME EXACT TIME with 2>&1 (displaying to console) | tee 0hashtree (& tee'ing output to 0hashtree) !


NOW, run this by me... ???
TotalTime = 48:41, TransRate = 184.7 MB/s
Total: 2889.1sec, 175.8MiB/s
SO... the hash (generation, after the fact)
- took exactly the same amount of time, as did the actual copy ???


& that means just what... ?
that "bus" throughput is the limiting factor here?
& that is slower than the max capabilities of the copy/hash?
- or said the other way around, the copy/hash were max'ing out the bus i/o?


is that what is happening?
am i reading this correctly?


("bus" is what i seemed to run into, before [noted elsewhere... perhaps])
& if that is the case, when does "faster", more efficient come into play?
- when the bus i/o ("width") is > what a copy/hash can throw at it;
in those cases, a faster copy / hash algo will copy/compute quicker...

note that this is between physically different volumes
(internal HDD & external via eSATA)

- what happens when you're dealing with a single drive; for the copy/hash?

(i /believe/ FastCopy does, by default, compute a hash <xxHash3 selected> as it
[^--- raccoon, confirm ?]
runs ??? [all below checkboxes are unchecked. last option is, "if not
verifying, record the src hash value in the file log", & no hashes are
in the log, so it must have been computing them on the fly... & if that
is the case, why did it not tee them to a file? - because it seems it is
generating them, at the least, for the source files...)
[no method to check .log against destination "after the fact". suppose
you'd have to manually parse the log...?]


FC, mainly 'cause it can handle LFN (& also maintains [file &] directory
dates). otherwise, depending on the situation: no LFN, Salamander is great,
or FreeFileSync (does LFN, but does not maintain dir dates). (Salamander
is great for dir trees. FC or FFS for more diverse copy situations, with
FFS having a [generally] better UI.)


now, when "bus" is not in the loop, & there is enough "bandwidth", i.e.,
checking individual files on a single drive, a faster hash (xxhash) is
certainly faster than a slower hash (in most all cases, & at the least,
as fast as the next "slower" hash)
void
Developer
Posts: 16665
Joined: Fri Oct 16, 2009 11:31 pm

Re: voidhash, the ramble

Post by void »

once you have these file hash histories, what do you do with them?
I'm personally using them to verify the integrity of files.

Hashes might be useful to find and re-download the original data in case of data loss.


void writes individual hashes into each individual directory
- at which point you can check again said individual directories,
which is fine for /that/ directory

but how do you handle directory trees (with individual hashes in each)?
voidhash doesn't handle directory trees.
Each folder gets its own parent-name.hash file.
Each hash file only stores hashes for files in the current folder.

voidhash will parse subfolders.

NOW, run this by me... ???
TotalTime = 48:41, TransRate = 184.7 MB/s
Total: 2889.1sec, 175.8MiB/s
SO... the hash (generation, after the fact)
- took exactly the same amount of time, as did the actual copy ???
Seems possible.
I get around 150MB/s hash rates (@ 4GHz).
It will come down to single-thread CPU speed and IO bandwidth.


& that means just what... ?
that "bus" throughput is the limiting factor here?
& that is slower then the max capibilties of the copy/hash?
- or said the other way around, the copy/hash were max'ing out the bus i/o?
It will usually come down to the sha256 hash rate,
sha256 being the slowest algorithm.
voidhash will use a thread for each algorithm.

on a 4GHz CPU I would expect around 150MB/s for the hash rate.
Then you are limited by IO speed.


- what happens when you're dealing with a single drive; for the copy/hash?
You could run multiple voidhash processes for improved performance if you have spare IO bandwidth.
You would have to specify different directories. Otherwise, you would end up hashing the same content.

voidhash uses 4 threads by default (one for each algorithm).
You would need 4 logical CPUs available for each voidhash process to see any performance increase.

I recommend only one process to avoid HDD thrashing.
therube
Posts: 4953
Joined: Thu Sep 03, 2009 6:48 pm

Re: voidhash, the ramble

Post by therube »

(best i can tell...)


In order to "correctly" benchmark,
YOU MUST CLEAR the CACHE

How? Good question.
Best I can figure is to use FcHash.exe, even momentarily (a few seconds), on your wanted file:
> FcHash.exe <filename>, wait 1000 wait 2000 wait 3000, ^C
Doing that invalidates the *cache*, so subsequent tests
will not be using RAM-cached data, but actually will read from disk.

(FcHash does that by default. all others seem to read from RAM, if available)

So... it seems we ARE speed limited by the "bus", I/O, throughput.
IOW, on "slow" devices, your choice of hash is (relatively) unimportant,
as all will return the SAME throughput.

Whereas (program) "benchmarks" show definite differences between different
hash algorithms.


Input file(s) were all ~2 GB (or was it ~2. GB or ~2.0 GB ;-))
(below, references to "hash" are actually to "FcHash")


-------


E:\Users\RUBEN>xxh -b

Code: Select all

xxh 0.8.1 by Yann Collet
compiled as 64-bit x86_64 autoVec little endian with GCC 11.2.0
Sample of 100 KB...
 1#XXH32                         :     102400 ->    71088 it/s ( 6942.2 MB/s)
 3#XXH64                         :     102400 ->   142163 it/s (13883.1 MB/s)
 5#XXH3_64b                      :     102400 ->   183911 it/s (17960.1 MB/s)
11#XXH128                        :     102400 ->   183733 it/s (17942.7 MB/s)
xxHash, 4 (or more?) different hash methods
- xxh32 is substantially "slower" than the others
- xxh64 is 2x xxh32 & close to xxh128 / xxh3
- xxh128 & xxh3 are essentially the same (on my end)


E:\Users\RUBEN>rhash -B <hash>
RHash v1.4.3 benchmarking...

Code: Select all

	E:\Users\RUBEN>rhash -B --md5
	MD5       2048 MiB total in 3.198 sec,  640.400 MBps,  CPB=4.90
	
	E:\Users\RUBEN>rhash -B --sha1
	SHA1      2048 MiB total in 3.927 sec,  521.518 MBps,  CPB=6.00
	
	E:\Users\RUBEN>rhash -B --sha256
	SHA-256   2048 MiB total in 9.789 sec,  209.214 MBps, CPB=15.02
	
	E:\Users\RUBEN>rhash -B --sha512
	SHA-512   1024 MiB total in 3.049 sec,  335.848 MBps,  CPB=9.34
	
	E:\Users\RUBEN>rhash -B --edonr512
	EDON-R512 2048 MiB total in 1.317 sec, 1555.049 MBps,  CPB=2.02
sha256 is oddly slow, or inefficient?
oh, --sha512 was only using 256 MB x 4 (1024 MB) inputs, where the others were at 512 MB x 4 (2048 MB) ?
& look at edonr-512: FAST, compared to the other rhash runs
& look at edonr-512: PALTRY, compared to xxHash (now the "benches" won't be comparable, but the MB/s are)



Win7 x64, i5-3570k, 16GB RAM, spinners

ALL the benchmark MB/s are SUBSTANTIALLY higher than any DRIVE i have:
xxhash theoretical 7000-18000 MB/s vs. (my) maximum disk of 200 MB/s

i do not have SSD to test on (here)


now, if a file is cached in RAM, that is a different matter...

Code: Select all

T:\>hash --xxh     "As1.88GG"
Total: 1files, 1932.1MiB, 9.7sec, 198.8MiB/s (initial run, not cached)

T:\>hash --xxh     "As1.88GG"
Total: 1files, 1932.1MiB, 0.9sec, 2213.2MiB/s

T:\>hash --xxh3    "As1.88GG"
Total: 1files, 1932.1MiB, 0.9sec, 2213.2MiB/s

T:\>rhash --edonr512   "As1.88GG" --speed
Calculated in 2.243 sec, 861.39 MBps
Total 2.269 sec, 851.52 MBps
Again note the MB/s (& sec, 9.7 vs 0.9 vs 2.2): not cached, cached, then the "fastest" hash (perhaps) available via rhash.


-------


"LFN", in this case with a name part len=246 (so not even a real "LFN"). [DIR fails this also.]

Some utilities can handle "LFN", some not. And "LFN" could apply anywhere: name and/or name+path parts.
All utilities run from command line, C:\> prog.exe <LFN>.

Code: Select all

fsum  - fail
exf   - fail
xxh   - fail
7-zip - OK
hash  - OK
rhash - OK

TIMETHIS - fail
(I didn't check, but i'm sure voidhash - OK)
Results may differ if you use some other method, like a batch file, SendTo, d&d,...
- or when initiated from Everything (as Everything may send a SFN in some cases).
(IOW, using Everything may allow access to a file in particular other utilities where you
might not be able to otherwise [except perhaps by manually providing the program the 8.3].)


Aside from "LFN", "unicode" may also be an issue.
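fwiw, the usual reason tools fail here: the classic Win32 file APIs reject paths longer than MAX_PATH (260 chars) unless handed an extended-length \\?\ path, which each tool has to opt into. a rough illustration (the helper name here is made up):

```python
# Sketch: the Win32 MAX_PATH limit, and the \\?\ extended-length prefix
# that tools must use to get past it (hypothetical helper).
MAX_PATH = 260

def extended_length(path):
    """Prefix an absolute Windows path so Win32 APIs accept > MAX_PATH."""
    if path.startswith("\\\\?\\"):
        return path                       # already extended-length
    if path.startswith("\\\\"):
        return "\\\\?\\UNC" + path[1:]    # UNC share: \\?\UNC\server\share
    return "\\\\?\\" + path               # local drive: \\?\C:\...

print(extended_length(r"C:\very\long\path"))   # \\?\C:\very\long\path
```

tools that "fail" above presumably pass the raw path straight through; tools that are "OK" (7-zip, FcHash, rhash) presumably do something like this internally.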


-------


Now if your intent is to verify that a file on disk is "correct",
if you happen to be reading RAM cache, you are not reading "disk".
VerifyCopiedFiles:
If active, FreeFileSync will binary-compare source and target files after copying and report verification errors. Note that this may double file copy times and is no guarantee that data has not already been corrupted prior to copying. Additionally, corruption may be hidden by deceptively reading valid data from various buffers in the application and hardware stack:
https://freefilesync.org/manual.php?top ... opiedFiles

Does the CopyFile function verify that the data reached its final destination successfully?


-------


my C: & T: drives, both Toshibas, both 7200, certainly have different "speed"

I:, 5700 RPM, internal SATA, is 2x the speed of W:, but less than H:
H:, 5940 RPM, internal SATA, does rather respectably

W:, WD "3.0" spinner on USB 2.0 port, is a slug, 33 MB/s
K:, Kingston USB "3.2" flash drive on USB 2.0 port, is even more so, 18 MB/s (though it seems it will be reliable :-), so who cares)
therube
Posts: 4953
Joined: Thu Sep 03, 2009 6:48 pm

Re: voidhash, the ramble

Post by therube »

Input file(s) were all ~2 GB (or was it ~2. GB or ~2.0 GB ;-))
(below, references to "hash" are actually to "FcHash.exe")


-------


I: Hitachi SATA/600 256 KB Cache 5700 RPM, DDT-1.91GB


I:\>timethis hash --xxh ddt*
TimeThis : Command Line : hash --xxh ddt*
Total: 1files, 1958.3MiB, 27.4sec, 71.4MiB/s


I:>timethis rhash --sha1 ddt* --speed
Calculated in 27.241 sec, 71.89 MBps
Total 27.261 sec, 71.83 MBps
TimeThis : Elapsed Time : 00:00:27.326


I:\>timethis rhash --edonr512 ddt* --speed
Calculated in 27.231 sec, 71.91 MBps
Total 27.263 sec, 71.83 MBps
TimeThis : Elapsed Time : 00:00:27.326


-------


C-E-Y: Toshiba 7200, SW-1.91GB


Total: 1files, 1960.7MiB, 15.8sec, 124.0MiB/s
TimeThis : Command Line : hash --xxh sweety*
TimeThis : Elapsed Time : 00:00:15.885


Total 15.908 sec, 123.25 MBps
TimeThis : Command Line : rhash --edonr512 sweety* --speed
TimeThis : Elapsed Time : 00:00:15.975


Total 16.403 sec, 119.53 MBps
TimeThis : Command Line : rhash --sha1 sweety* --speed
TimeThis : Elapsed Time : 00:00:16.462


Total: 1files, 1960.7MiB, 16.3sec, 120.4MiB/s
TimeThis : Command Line : hash --xxh3 sweety*
TimeThis : Elapsed Time : 00:00:16.343


Total 18.707 sec, 104.81 MBps
TimeThis : Command Line : rhash --md5 sweety* --speed
TimeThis : Elapsed Time : 00:00:18.768


-------


W: Western Digital USB 3.0 (connected to USB 2.0 port) SATA/600 5400 RPM, La-1.93GB


Total 61.433 sec, 32.20 MBps
TimeThis : Command Line : rhash --md5 "La
TimeThis : Elapsed Time : 00:01:01.506


Total: 1files, 1978.1MiB, 59.6sec, 33.2MiB/s
TimeThis : Command Line : hash --xxh "La
TimeThis : Elapsed Time : 00:00:59.689


Total 61.348 sec, 32.24 MBps
TimeThis : Command Line : rhash --sha1 "La
TimeThis : Elapsed Time : 00:01:01.411


Total: 1files, 1978.1MiB, 59.7sec, 33.2MiB/s
TimeThis : Command Line : hash --xxh3 "La
TimeThis : Elapsed Time : 00:00:59.809


Total 61.499 sec, 32.16 MBps
TimeThis : Command Line : rhash --edonr512 "La
TimeThis : Elapsed Time : 00:01:01.551



*IF* run from RAM cache, 2 sec vs. 60 sec

Total 2.405 sec, 822.49 MBps
TimeThis : Command Line : rhash --edonr512 "La
TimeThis : Elapsed Time : 00:00:02.475

Total 4.869 sec, 406.26 MBps
TimeThis : Command Line : rhash --sha1 "La
TimeThis : Elapsed Time : 00:00:04.933

Total 4.155 sec, 476.08 MBps
TimeThis : Command Line : rhash --md5 "La
TimeThis : Elapsed Time : 00:00:04.216

Total 3.969 sec, 498.39 MBps
TimeThis : Command Line : rhash --blake2b "La
TimeThis : Elapsed Time : 00:00:04.033

Total 5.632 sec, 351.22 MBps
TimeThis : Command Line : rhash --blake2s "La
TimeThis : Elapsed Time : 00:00:05.697

Total 2.220 sec, 891.04 MBps
TimeThis : Command Line : rhash --crc32 "La
TimeThis : Elapsed Time : 00:00:02.384


-------


W: Western Digital external USB 3.0 (though plugged in to USB 2.0 port) SATA/600 5400 RPM


Total 60.823 sec, 32.24 MBps
TimeThis : Command Line : rhash --md5 sweety* --sp
TimeThis : Elapsed Time : 00:01:00.889


Total: 1files, 1960.7MiB, 59.6sec, 32.9MiB/s
TimeThis : Command Line : hash --xxh sweety*
TimeThis : Elapsed Time : 00:00:59.646


Total 60.834 sec, 32.23 MBps
TimeThis : Command Line : rhash --sha1 sweety* --speed
TimeThis : Elapsed Time : 00:01:00.898


Total: 1files, 1960.7MiB, 59.1sec, 33.2MiB/s
TimeThis : Command Line : hash --xxh3 sweety*
TimeThis : Elapsed Time : 00:00:59.190


Total 60.514 sec, 32.40 MBps
TimeThis : Command Line : rhash --edonr512 sweety*
TimeThis : Elapsed Time : 00:01:02.772


-------


T: Toshiba SATA/300 | SATA/600 7200 RPM


T:\>hash --xxh "As1.88GG"
Total: 1files, 1932.1MiB, 9.7sec, 198.5MiB/s

T:\>hash --xxh3 "As1.88GG"
Total: 1files, 1932.1MiB, 9.9sec, 195.0MiB/s

T:\>rhash --md5 "As1.88GG" --speed
Calculated in 9.744 sec, 198.29 MBps
Total 9.763 sec, 197.90 MBps

T:\>rhash --sha1 "As1.88GG" --speed
Calculated in 9.717 sec, 198.84 MBps
Total 9.738 sec, 198.41 MBps

T:\>rhash --edonr512 "As1.88GG" --speed
Calculated in 9.717 sec, 198.84 MBps
Total 9.741 sec, 198.35 MBps

T:\>hash --md5 "As1.88GG"
Total: 1files, 1932.1MiB, 9.9sec, 194.7MiB/s

T:\>hash --sha1 "As1.88GG"
Total: 1files, 1932.1MiB, 9.7sec, 198.8MiB/s

T:\>hash --sha256 "As1.88GG"
Total: 1files, 1932.1MiB, 9.7sec, 199.1MiB/s

T:\>hash --xxh512 "As1.88GG"
Total: 1files, 1932.1MiB, 9.7sec, 198.5MiB/s


-------


H: Hitachi (my disappearing drive) 5940 RPM, MUM-1.79GB


H:\III>timethis rhash --md5 mum* --speed
TimeThis : Command Line : rhash --md5 mum* --speed
Calculated in 19.151 sec, 96.23 MBps
Total 19.163 sec, 96.17 MBps
TimeThis : Elapsed Time : 00:00:19.220


H:\III>timethis hash --xxh mum*
TimeThis : Command Line : hash --xxh mum*
Total: 1files, 1842.9MiB, 19.7sec, 93.7MiB/s
TimeThis : Elapsed Time : 00:00:19.718


H:\III>timethis rhash --sha1 mum* --speed
TimeThis : Command Line : rhash --sha1 mum* --speed
Calculated in 19.150 sec, 96.23 MBps
Total 19.172 sec, 96.12 MBps
TimeThis : Elapsed Time : 00:00:19.232


H:\III>timethis hash --xxh3 mum*
TimeThis : Command Line : hash --xxh3 mum*
Total: 1files, 1842.9MiB, 19.3sec, 95.3MiB/s
TimeThis : Elapsed Time : 00:00:19.413


H:\III>timethis rhash --edonr512 mum* --speed
TimeThis : Command Line : rhash --edonr512 mum* --speed
Calculated in 19.153 sec, 96.22 MBps
Total 19.185 sec, 96.06 MBps
TimeThis : Elapsed Time : 00:00:19.252

