Would there be a way to figure out the text file encoding that has been used?

eswul62
Posts: 132
Joined: Wed Jul 31, 2013 6:07 am

Post by eswul62 »

Right now I need to open each and every text file to see what encoding has been used:
UTF-8 no BOM, UTF-8 with BOM, Western 1252...

Does anyone here know of a tool that can list the text encoding without having to open each and every file?
I searched for one in vain.


Thanks.
void
Developer
Posts: 17152
Joined: Fri Oct 16, 2009 11:31 pm

Re: Would there be a way to figure out the text file encoding that has been used?

Post by void »

Use the Byte Order Mark column and/or the Character Encoding column in Everything 1.5:

*.txt addcolumn:byte-order-mark;character-encoding
eswul62

Re: Would there be a way to figure out the text file encoding that has been used?

Post by eswul62 »

Wow!
Super. Obviously I wasn't aware of this.

What encodings are supported?

For example, a text file that is Western-1252 encoded shows up as ANSI.
See: https://en.wikipedia.org/wiki/Windows-1252
(first line)

I have noticed that special characters (like é, ë, etc.) in video subtitles are commonly displayed as Ã© (instead of é) or Ã« (instead of ë).
It has something to do with an encoding problem.
See: https://www.i18nqa.com/debug/utf8-debug.html

When such UTF-8 encoded text files are re-encoded as Western 1252, the issue is gone.

I guess I should consider ANSI as Western 1252.
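
The garbling described on the linked debug page can be reproduced in a few lines of Python. This is only a sketch under the assumption that the subtitle file is UTF-8 on disk and the player reads it as Western 1252 (Python's "cp1252" codec):

```python
# Assumption: file is stored as UTF-8, but the reader decodes it as Windows-1252.
original = "\u00e9 \u00eb"                 # the characters é and ë

utf8_bytes = original.encode("utf-8")      # two bytes per accented character
misread = utf8_bytes.decode("cp1252")      # each byte is shown as its own character
print(misread)

# Reversing the wrong decode recovers the intended text.
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired)

# A file that really is Western 1252 reads back correctly in such a player.
as_1252 = original.encode("cp1252").decode("cp1252")
print(as_1252 == original)
```

Each accented character takes two bytes in UTF-8, and a reader that assumes Western 1252 shows those two bytes as two separate characters.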

Thanks!
void

Re: Would there be a way to figure out the text file encoding that has been used?

Post by void »

ANSI is your system code page.

The system code page can be viewed/set under Start menu -> Region and language -> Administrative -> Language for non-Unicode programs.
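
The same value can also be read programmatically. A small sketch, assuming Python on Windows; the helper name system_ansi_code_page is made up for illustration, and on other platforms it simply returns None:

```python
import sys


def system_ansi_code_page():
    """Return the active ANSI code page number on Windows, else None."""
    if sys.platform != "win32":
        return None
    import ctypes
    # GetACP is the Win32 function that returns the current ANSI code page id.
    return ctypes.windll.kernel32.GetACP()


print(system_ansi_code_page())
```

On a system set to a Western European language this would typically report 1252.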


What encodings are supported?
UTF-8 with BOM
UTF-16 (LE) with BOM (Unicode)
UTF-16 (BE) with BOM (Unicode Big Endian)
UTF-8 without BOM if all text is valid UTF-8
UTF-16 (LE) without BOM if text contains a NULL byte and IsTextUnicode reports Unicode.
Anything else is ANSI.
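
The rules listed above can be sketched roughly in Python. This is only an illustration, not Everything's actual code: the function name guess_encoding is made up, the checks are assumed to run in the order of the list, and the IsTextUnicode call (a Windows API) is replaced by a much cruder "does it decode as UTF-16 LE" test.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Classify raw file bytes following the order of the list above."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16 (LE) with BOM"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16 (BE) with BOM"
    try:
        # A NUL byte is itself valid UTF-8, so the order of checks matters;
        # this sketch tests UTF-8 first because the list does.
        data.decode("utf-8")
        return "UTF-8 without BOM"
    except UnicodeDecodeError:
        pass
    if b"\x00" in data:
        # Assumption: stand-in for IsTextUnicode, which is not called here.
        try:
            data.decode("utf-16-le")
            return "UTF-16 (LE) without BOM"
        except UnicodeDecodeError:
            pass
    return "ANSI"
```

To check a file, pass its raw bytes, for example guess_encoding(open(path, "rb").read()), where path is whatever file you want to inspect.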
eswul62

Re: Would there be a way to figure out the text file encoding that has been used?

Post by eswul62 »

Okay, many thanks indeed.
It is set to English (UK), which I believe is 1252.
Anyway, thanks again.
void

Re: Would there be a way to figure out the text file encoding that has been used?

Post by void »

Everything 1.5.0.1384a will now treat content that is all ASCII as ANSI text (instead of UTF-8).
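
One way to see why all-ASCII content is ambiguous: it decodes to the same text as UTF-8 and as Western 1252, so the bytes alone cannot say which was intended. A short Python sketch; the helper name is_all_ascii is made up, and "cp1252" is just one example of an ANSI code page:

```python
def is_all_ascii(data: bytes) -> bool:
    """True when every byte is below 0x80."""
    return data.isascii()


sample = b"plain text"
accented = "\u00e9".encode("utf-8")   # é stored as UTF-8

print(is_all_ascii(sample))
print(is_all_ascii(accented))
# Same bytes, same text under both encodings.
print(sample.decode("utf-8") == sample.decode("cp1252"))
```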