Search by file name language type

Ismale.d
Posts: 11
Joined: Tue Nov 02, 2021 11:20 am

Search by file name language type

Post by Ismale.d »

As per the title, can this be done?
E.g. if the file name contains Chinese, or both Chinese and English?

Thanks!
void
Developer
Posts: 16665
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search by file name language type

Post by void »

Please try the following searches:
Chinese (Han):
regex:[\p{Han}]

English (Latin):
regex:[\p{Latin}]

Chinese and English:
regex:[\p{Han}] regex:[\p{Latin}]

Chinese and English ignoring the extension:
regex:[\p{Han}].*\.[^.]*$ regex:[\p{Latin}].*\.[^.]*$



The following scripts are also supported:
regex:[\p{Adlam}]
regex:[\p{Ahom}]
regex:[\p{Anatolian_Hieroglyphs}]
regex:[\p{Arabic}]
regex:[\p{Armenian}]
regex:[\p{Avestan}]
regex:[\p{Balinese}]
regex:[\p{Bamum}]
regex:[\p{Bassa_Vah}]
regex:[\p{Batak}]
regex:[\p{Bengali}]
regex:[\p{Bhaiksuki}]
regex:[\p{Bopomofo}]
regex:[\p{Brahmi}]
regex:[\p{Braille}]
regex:[\p{Buginese}]
regex:[\p{Buhid}]
regex:[\p{Canadian_Aboriginal}]
regex:[\p{Carian}]
regex:[\p{Caucasian_Albanian}]
regex:[\p{Chakma}]
regex:[\p{Cham}]
regex:[\p{Cherokee}]
regex:[\p{Chorasmian}]
regex:[\p{Common}]
regex:[\p{Coptic}]
regex:[\p{Cuneiform}]
regex:[\p{Cypriot}]
regex:[\p{Cypro_Minoan}]
regex:[\p{Cyrillic}]
regex:[\p{Deseret}]
regex:[\p{Devanagari}]
regex:[\p{Dives_Akuru}]
regex:[\p{Dogra}]
regex:[\p{Duployan}]
regex:[\p{Egyptian_Hieroglyphs}]
regex:[\p{Elbasan}]
regex:[\p{Elymaic}]
regex:[\p{Ethiopic}]
regex:[\p{Georgian}]
regex:[\p{Glagolitic}]
regex:[\p{Gothic}]
regex:[\p{Grantha}]
regex:[\p{Greek}]
regex:[\p{Gujarati}]
regex:[\p{Gunjala_Gondi}]
regex:[\p{Gurmukhi}]
regex:[\p{Han}]
regex:[\p{Hangul}]
regex:[\p{Hanifi_Rohingya}]
regex:[\p{Hanunoo}]
regex:[\p{Hatran}]
regex:[\p{Hebrew}]
regex:[\p{Hiragana}]
regex:[\p{Imperial_Aramaic}]
regex:[\p{Inherited}]
regex:[\p{Inscriptional_Pahlavi}]
regex:[\p{Inscriptional_Parthian}]
regex:[\p{Javanese}]
regex:[\p{Kaithi}]
regex:[\p{Kannada}]
regex:[\p{Katakana}]
regex:[\p{Kayah_Li}]
regex:[\p{Kharoshthi}]
regex:[\p{Khitan_Small_Script}]
regex:[\p{Khmer}]
regex:[\p{Khojki}]
regex:[\p{Khudawadi}]
regex:[\p{Lao}]
regex:[\p{Latin}]
regex:[\p{Lepcha}]
regex:[\p{Limbu}]
regex:[\p{Linear_A}]
regex:[\p{Linear_B}]
regex:[\p{Lisu}]
regex:[\p{Lycian}]
regex:[\p{Lydian}]
regex:[\p{Mahajani}]
regex:[\p{Makasar}]
regex:[\p{Malayalam}]
regex:[\p{Mandaic}]
regex:[\p{Manichaean}]
regex:[\p{Marchen}]
regex:[\p{Masaram_Gondi}]
regex:[\p{Medefaidrin}]
regex:[\p{Meetei_Mayek}]
regex:[\p{Mende_Kikakui}]
regex:[\p{Meroitic_Cursive}]
regex:[\p{Meroitic_Hieroglyphs}]
regex:[\p{Miao}]
regex:[\p{Modi}]
regex:[\p{Mongolian}]
regex:[\p{Mro}]
regex:[\p{Multani}]
regex:[\p{Myanmar}]
regex:[\p{Nabataean}]
regex:[\p{Nandinagari}]
regex:[\p{New_Tai_Lue}]
regex:[\p{Newa}]
regex:[\p{Nko}]
regex:[\p{Nushu}]
regex:[\p{Nyakeng_Puachue_Hmong}]
regex:[\p{Ogham}]
regex:[\p{Ol_Chiki}]
regex:[\p{Old_Hungarian}]
regex:[\p{Old_Italic}]
regex:[\p{Old_North_Arabian}]
regex:[\p{Old_Permic}]
regex:[\p{Old_Persian}]
regex:[\p{Old_Sogdian}]
regex:[\p{Old_South_Arabian}]
regex:[\p{Old_Turkic}]
regex:[\p{Old_Uyghur}]
regex:[\p{Oriya}]
regex:[\p{Osage}]
regex:[\p{Osmanya}]
regex:[\p{Pahawh_Hmong}]
regex:[\p{Palmyrene}]
regex:[\p{Pau_Cin_Hau}]
regex:[\p{Phags_Pa}]
regex:[\p{Phoenician}]
regex:[\p{Psalter_Pahlavi}]
regex:[\p{Rejang}]
regex:[\p{Runic}]
regex:[\p{Samaritan}]
regex:[\p{Saurashtra}]
regex:[\p{Sharada}]
regex:[\p{Shavian}]
regex:[\p{Siddham}]
regex:[\p{SignWriting}]
regex:[\p{Sinhala}]
regex:[\p{Sogdian}]
regex:[\p{Sora_Sompeng}]
regex:[\p{Soyombo}]
regex:[\p{Sundanese}]
regex:[\p{Syloti_Nagri}]
regex:[\p{Syriac}]
regex:[\p{Tagalog}]
regex:[\p{Tagbanwa}]
regex:[\p{Tai_Le}]
regex:[\p{Tai_Tham}]
regex:[\p{Tai_Viet}]
regex:[\p{Takri}]
regex:[\p{Tamil}]
regex:[\p{Tangsa}]
regex:[\p{Tangut}]
regex:[\p{Telugu}]
regex:[\p{Thaana}]
regex:[\p{Thai}]
regex:[\p{Tibetan}]
regex:[\p{Tifinagh}]
regex:[\p{Tirhuta}]
regex:[\p{Toto}]
regex:[\p{Ugaritic}]
regex:[\p{Unknown}]
regex:[\p{Vai}]
regex:[\p{Vithkuqi}]
regex:[\p{Wancho}]
regex:[\p{Warang_Citi}]
regex:[\p{Yezidi}]
regex:[\p{Yi}]
regex:[\p{Zanabazar_Square}]

PCRE Unicode character properties
Ismale.d
Posts: 11
Joined: Tue Nov 02, 2021 11:20 am

Re: Search by file name language type

Post by Ismale.d »

Hi, thanks for the reply. However, can you give me more hints on what term I should use to search for the syntax for, say, Japanese?

The pcre2pattern spec is too technical for me, and I have tried searching for "pre2 unicode language list", "pre2 unicode Japanese", etc., and I can't find anything that works.
void
Developer
Posts: 16665
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search by file name language type

Post by void »

To search for Hiragana OR Katakana:
regex:[\p{Hiragana}\p{Katakana}]

To search for Hiragana OR Katakana OR Han:
regex:[\p{Hiragana}\p{Katakana}\p{Han}]



Using Unicode ranges might be better.

For example, Kanji:
regex:[\u4E00-\u9FFF]

https://stackoverflow.com/questions/19899554/unicode-range-for-japanese
http://www.rikai.com/library/kanjitables/kanji_codes.unicode.shtml
Ismale.d
Posts: 11
Joined: Tue Nov 02, 2021 11:20 am

Re: Search by file name language type

Post by Ismale.d »

Wow, this is pretty new to me. Thanks for the help and the references! I couldn't have understood it otherwise :)
Ismale.d
Posts: 11
Joined: Tue Nov 02, 2021 11:20 am

Re: Search by file name language type

Post by Ismale.d »

void wrote: Wed Feb 22, 2023 11:40 pm To search for Hiragana OR Katakana:
regex:[\p{Hiragana}] | regex:[\p{Katakana}]

To search for Hiragana OR Katakana OR Han:
regex:[\p{Hiragana}] | regex:[\p{Katakana}] | regex:[\p{Han}]



Using unicode ranges might be better.

For example, Kanji:
regex:[\u4E00-\u9FFF]

https://stackoverflow.com/questions/19899554/unicode-range-for-japanese
http://www.rikai.com/library/kanjitables/kanji_codes.unicode.shtml
Oh, I played around with it a bit, and the range syntax doesn't actually work. Maybe an additional symbol is needed? I also tried ranges from other languages, including English, and none of them work. (AC00, D743; U+0000, U+007F)
void
Developer
Posts: 16665
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search by file name language type

Post by void »

The PCRE syntax is:
\x{hhh..} character with hex code hhh..

Use a - inside [ and ] to specify a range.

Please try the following:

regex:[\p{Hiragana}\p{Katakana}\x{4E00}-\x{9FFF}]

PCRE Non-printing characters
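
As a point of comparison, the `\uXXXX` form is what Python's `re` module accepts, which may be where the earlier range syntax came from. Below is a minimal Python sketch of the same "Hiragana OR Katakana OR Kanji" test. It assumes the Hiragana block (U+3040 to U+309F) and the Katakana block (U+30A0 to U+30FF) as stand-ins for `\p{Hiragana}` and `\p{Katakana}`, which is close but not identical to the script properties; the name `contains_japanese` is hypothetical.

```python
import re

# Hiragana block, Katakana block, and the CJK Unified Ideographs range
# used in the search above. Python's re understands \uXXXX escapes.
JAPANESE_CHARS = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")


def contains_japanese(name):
    """Return True if name contains at least one character in the ranges above."""
    return JAPANESE_CHARS.search(name) is not None


if __name__ == "__main__":
    for name in ["ひらがな.txt", "カタカナ", "漢字", "readme.txt"]:
        print(name, contains_japanese(name))
```

Note that U+4E00 to U+9FFF is shared with Chinese, so a name written only in Chinese characters also passes this test.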
Ismale.d
Posts: 11
Joined: Tue Nov 02, 2021 11:20 am

Re: Search by file name language type

Post by Ismale.d »

Works perfectly! I really appreciate the help!
samiaziz
Posts: 4
Joined: Tue Jan 23, 2024 3:16 pm

Re: Search by file name language type

Post by samiaziz »

That is very useful. However, what can I do to search for filenames written in a given language (like Korean) exclusively without any characters from another language?
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Re: Search by file name language type

Post by NotNull »

The Korean alphabet is called Hangul (says Internet ..)

With that:

!regex:[^\p{Hangul}]
Explanation:
regex:[^\p{Hangul}] = show all files/folders that have non-Korean characters anywhere in their name.
!regex:... = show all files/folders except the ones found above, meaning only files with Korean characters exclusively.

samiaziz wrote: Tue Jan 23, 2024 3:39 pm what can I do to search for filenames written in a given language (like Korean) exclusively without any characters from another language?
Note that the search query above will not list files with a "normal" extension, like .txt, .jpg, .zip as those are non-Korean characters. Same goes for files with numbers (0...9) in them.
So I don't know how practical this will be, but this is what you asked :D
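
The double negation in that query can be sketched in Python as follows. This is an illustration only: it treats a character as Hangul when its Unicode name starts with "HANGUL", which approximates `\p{Hangul}` rather than reproducing it, and the names `is_hangul` and `only_hangul` are invented for the example.

```python
import unicodedata


def is_hangul(ch):
    """Approximate \\p{Hangul}: the character's Unicode name starts with HANGUL."""
    return unicodedata.name(ch, "").startswith("HANGUL")


def only_hangul(name):
    """Mirror !regex:[^\\p{Hangul}]: reject names with any non-Hangul character."""
    has_other = any(not is_hangul(ch) for ch in name)  # regex:[^\p{Hangul}]
    return not has_other                               # the leading !


if __name__ == "__main__":
    for name in ["한국어", "한국어.txt", "한국어2"]:
        print(name, only_hangul(name))
```

The second and third names fail for the reason given above: the dot, the Latin extension, and the digit all count as non-Hangul characters.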


Regular Expressions Syntax
void
Developer
Posts: 16665
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search by file name language type

Post by void »

Please consider the following search to ignore the extension:

regex:^[\p{Hangul}]+\.[a-z]+$
samiaziz
Posts: 4
Joined: Tue Jan 23, 2024 3:16 pm

Re: Search by file name language type

Post by samiaziz »

void wrote: Wed Jan 24, 2024 2:55 am Please consider the following search to ignore the extension:

regex:^[\p{Hangul}]+\.[a-z]+$
Thanks a lot. That is exactly what I was looking for.

The following search gives the same result of ignoring the extension:

regex:stem:^[\p{Hangul}]+$
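
A rough Python equivalent of matching on the stem only, for comparison. It assumes the same name-based approximation of `\p{Hangul}` (Unicode names starting with "HANGUL"), and `stem_is_hangul` is a hypothetical helper name.

```python
import os
import unicodedata


def stem_is_hangul(filename):
    """True if the name without its extension is one or more Hangul characters."""
    stem, _ext = os.path.splitext(filename)  # "한국어.mp4" -> "한국어"
    return bool(stem) and all(
        unicodedata.name(ch, "").startswith("HANGUL") for ch in stem
    )


if __name__ == "__main__":
    for name in ["한국어.txt", "한국어.mp4", "한국어 korean.txt", "notes.txt"]:
        print(name, stem_is_hangul(name))
```

Working on the stem sidesteps the question of what an extension may contain. A pattern that spells the extension out as `[a-z]+` would not cover one containing a digit, such as `.mp4`, so the two approaches can differ on such files.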
samiaziz
Posts: 4
Joined: Tue Jan 23, 2024 3:16 pm

Re: Search by file name language type

Post by samiaziz »

NotNull wrote: Tue Jan 23, 2024 4:28 pm
So I don't know how practical this will be, but this is what you asked :D
Thank you for your response,

I have some downloaded files whose names are only in a foreign language, and I want to add a translation into my language to the names of these files without removing the original names.

When I search for the Korean language in the file name for example, the search result lists:
  • file names in Korean only,
  • and file names in Korean and other languages (which I have already changed).
I simply want to exclude the second category of files from the search results by searching file names in Korean only.
NotNull
Posts: 5458
Joined: Wed May 24, 2017 9:22 pm

Re: Search by file name language type

Post by NotNull »

I understand. Thanks for explaining. What I meant was that usually the file extension is *not* in Korean, so that would skip lots of files that still might be of interest. But you mentioned:
exclusively without any characters from another language?
And the name of any .txt file contains characters from another language, namely t, x and t.

But now I get that you wanted the filename *without extension* to be Korean-only.

Anyway .. problem solved :D