Would there be a way to figure out the text file encoding that has been used?

Post by **eswul62** » Sun Jul 28, 2024 11:18 am

Right now I need to open each and every text file so as to see what encoding has been used.
UTF-8 no BOM, UTF-8 with BOM, Western 1252...

Does anyone here know of a tool that can list the text encoding without having to open each and every file?
I searched for one in vain.

Thanks.

Post by **void** » Sun Jul 28, 2024 11:23 am

Use the Byte Order Mark column and/or the Character Encoding column in Everything 1.5.

*.txt addcolumn:byte-order-mark;character-encoding

Post by **eswul62** » Mon Jul 29, 2024 10:01 am

Wow!
Super. Obviously I wasn't aware of this.

What encodings are supported?

E.g. a text file that is Western-1252 encoded shows up as ANSI.
See: https://en.wikipedia.org/wiki/Windows-1252
(first line)

I have noticed that special characters (like é, ë, etc.) in video subtitles commonly show up as Ã© (é) or Ã« (ë).
It seems to be some sort of encoding problem.
see: https://www.i18nqa.com/debug/utf8-debug.html

When such UTF-8 encoded text files are re-encoded to Western 1252, the issue is gone.
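
For illustration, the garbling described above can be reproduced with a short Python sketch. It assumes the cause is UTF-8 bytes being read as Windows-1252 (Python codec name `cp1252`); the function names `mojibake` and `repair` are made up for this example and are not part of any tool mentioned in this thread.

```python
def mojibake(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode those bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Reverse the mistake: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    for ch in ("é", "ë"):
        bad = mojibake(ch)
        # Show the original, the garbled form, and the repaired form side by side.
        print(ch, "->", bad, "->", repair(bad))
```

The round trip only works while every byte involved has a Windows-1252 mapping; a few byte values (such as 0x81) do not, and Python raises an error for those.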

I guess I should consider ANSI as Western 1252.

Thanks!

Post by **void** » Mon Jul 29, 2024 10:12 am

ANSI is your system code page.

The system code page can be viewed/set under Start menu -> Region and language -> Administrative -> Language for non-Unicode programs.

What encodings are supported?

UTF-8 with BOM.
UTF-16 (LE) with BOM (Unicode)
UTF-16 (BE) with BOM (Unicode Big Endian)
UTF-8 without BOM if all text is valid UTF-8
UTF-16 (LE) without BOM if text contains a NULL byte and IsTextUnicode reports Unicode.
Anything else is ANSI.
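
As a rough sketch only, the rules listed above could be approximated in Python as follows. This is not Everything's actual code. The function names are hypothetical, the order of the checks is an assumption, and `looks_like_utf16_le` is a simple stand-in for the Windows `IsTextUnicode` call, which uses its own statistical tests. The NUL-byte check runs before the UTF-8 check here because NUL bytes are also valid UTF-8.

```python
import codecs


def looks_like_utf16_le(data: bytes) -> bool:
    """Hypothetical stand-in for IsTextUnicode: does the data decode strictly as UTF-16 LE?"""
    try:
        data.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


def guess_encoding(data: bytes) -> str:
    """Classify raw file content using the rules from the list above (check order assumed)."""
    # 1. Byte order marks are the strongest signal.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16 (LE) with BOM"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16 (BE) with BOM"
    # 2. No BOM, but a NUL byte plus a plausible UTF-16 LE decoding.
    if b"\x00" in data and looks_like_utf16_le(data):
        return "UTF-16 (LE) without BOM"
    # 3. No BOM, and every byte sequence is valid UTF-8.
    try:
        data.decode("utf-8")
        return "UTF-8 without BOM"
    except UnicodeDecodeError:
        pass
    # 4. Everything else is treated as the system code page.
    return "ANSI"


if __name__ == "__main__":
    print(guess_encoding("café".encode("utf-8")))
    print(guess_encoding("café".encode("cp1252")))
```

The sketch follows the list exactly as posted, so pure ASCII content is reported as UTF-8 without BOM.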

Post by **eswul62** » Mon Jul 29, 2024 11:19 am

Okay, many thanks indeed.
It is set to English (UK), which I believe is 1252.
Anyway, thanks again.

Post by **void** » Thu Nov 28, 2024 9:12 am

Everything 1.5.0.1384a will now treat content that is all ASCII as ANSI text (instead of UTF-8).

voidtools forum
