Search by file name language type
As per the title, can this be done?
E.g. if it contains Chinese, or Chinese and English?
Thanks!
Re: Search by file name language type
Please try the following searches:
Chinese (Han):
regex:[\p{Han}]
English (Latin):
regex:[\p{Latin}]
Chinese and English:
regex:[\p{Han}] regex:[\p{Latin}]
Chinese and English ignoring the extension:
regex:[\p{Han}].*\.[^.]*$ regex:[\p{Latin}].*\.[^.]*$
The following scripts are also supported:
regex:[\p{Adlam}]
regex:[\p{Ahom}]
regex:[\p{Anatolian_Hieroglyphs}]
regex:[\p{Arabic}]
regex:[\p{Armenian}]
regex:[\p{Avestan}]
regex:[\p{Balinese}]
regex:[\p{Bamum}]
regex:[\p{Bassa_Vah}]
regex:[\p{Batak}]
regex:[\p{Bengali}]
regex:[\p{Bhaiksuki}]
regex:[\p{Bopomofo}]
regex:[\p{Brahmi}]
regex:[\p{Braille}]
regex:[\p{Buginese}]
regex:[\p{Buhid}]
regex:[\p{Canadian_Aboriginal}]
regex:[\p{Carian}]
regex:[\p{Caucasian_Albanian}]
regex:[\p{Chakma}]
regex:[\p{Cham}]
regex:[\p{Cherokee}]
regex:[\p{Chorasmian}]
regex:[\p{Common}]
regex:[\p{Coptic}]
regex:[\p{Cuneiform}]
regex:[\p{Cypriot}]
regex:[\p{Cypro_Minoan}]
regex:[\p{Cyrillic}]
regex:[\p{Deseret}]
regex:[\p{Devanagari}]
regex:[\p{Dives_Akuru}]
regex:[\p{Dogra}]
regex:[\p{Duployan}]
regex:[\p{Egyptian_Hieroglyphs}]
regex:[\p{Elbasan}]
regex:[\p{Elymaic}]
regex:[\p{Ethiopic}]
regex:[\p{Georgian}]
regex:[\p{Glagolitic}]
regex:[\p{Gothic}]
regex:[\p{Grantha}]
regex:[\p{Greek}]
regex:[\p{Gujarati}]
regex:[\p{Gunjala_Gondi}]
regex:[\p{Gurmukhi}]
regex:[\p{Han}]
regex:[\p{Hangul}]
regex:[\p{Hanifi_Rohingya}]
regex:[\p{Hanunoo}]
regex:[\p{Hatran}]
regex:[\p{Hebrew}]
regex:[\p{Hiragana}]
regex:[\p{Imperial_Aramaic}]
regex:[\p{Inherited}]
regex:[\p{Inscriptional_Pahlavi}]
regex:[\p{Inscriptional_Parthian}]
regex:[\p{Javanese}]
regex:[\p{Kaithi}]
regex:[\p{Kannada}]
regex:[\p{Katakana}]
regex:[\p{Kayah_Li}]
regex:[\p{Kharoshthi}]
regex:[\p{Khitan_Small_Script}]
regex:[\p{Khmer}]
regex:[\p{Khojki}]
regex:[\p{Khudawadi}]
regex:[\p{Lao}]
regex:[\p{Latin}]
regex:[\p{Lepcha}]
regex:[\p{Limbu}]
regex:[\p{Linear_A}]
regex:[\p{Linear_B}]
regex:[\p{Lisu}]
regex:[\p{Lycian}]
regex:[\p{Lydian}]
regex:[\p{Mahajani}]
regex:[\p{Makasar}]
regex:[\p{Malayalam}]
regex:[\p{Mandaic}]
regex:[\p{Manichaean}]
regex:[\p{Marchen}]
regex:[\p{Masaram_Gondi}]
regex:[\p{Medefaidrin}]
regex:[\p{Meetei_Mayek}]
regex:[\p{Mende_Kikakui}]
regex:[\p{Meroitic_Cursive}]
regex:[\p{Meroitic_Hieroglyphs}]
regex:[\p{Miao}]
regex:[\p{Modi}]
regex:[\p{Mongolian}]
regex:[\p{Mro}]
regex:[\p{Multani}]
regex:[\p{Myanmar}]
regex:[\p{Nabataean}]
regex:[\p{Nandinagari}]
regex:[\p{New_Tai_Lue}]
regex:[\p{Newa}]
regex:[\p{Nko}]
regex:[\p{Nushu}]
regex:[\p{Nyakeng_Puachue_Hmong}]
regex:[\p{Ogham}]
regex:[\p{Ol_Chiki}]
regex:[\p{Old_Hungarian}]
regex:[\p{Old_Italic}]
regex:[\p{Old_North_Arabian}]
regex:[\p{Old_Permic}]
regex:[\p{Old_Persian}]
regex:[\p{Old_Sogdian}]
regex:[\p{Old_South_Arabian}]
regex:[\p{Old_Turkic}]
regex:[\p{Old_Uyghur}]
regex:[\p{Oriya}]
regex:[\p{Osage}]
regex:[\p{Osmanya}]
regex:[\p{Pahawh_Hmong}]
regex:[\p{Palmyrene}]
regex:[\p{Pau_Cin_Hau}]
regex:[\p{Phags_Pa}]
regex:[\p{Phoenician}]
regex:[\p{Psalter_Pahlavi}]
regex:[\p{Rejang}]
regex:[\p{Runic}]
regex:[\p{Samaritan}]
regex:[\p{Saurashtra}]
regex:[\p{Sharada}]
regex:[\p{Shavian}]
regex:[\p{Siddham}]
regex:[\p{SignWriting}]
regex:[\p{Sinhala}]
regex:[\p{Sogdian}]
regex:[\p{Sora_Sompeng}]
regex:[\p{Soyombo}]
regex:[\p{Sundanese}]
regex:[\p{Syloti_Nagri}]
regex:[\p{Syriac}]
regex:[\p{Tagalog}]
regex:[\p{Tagbanwa}]
regex:[\p{Tai_Le}]
regex:[\p{Tai_Tham}]
regex:[\p{Tai_Viet}]
regex:[\p{Takri}]
regex:[\p{Tamil}]
regex:[\p{Tangsa}]
regex:[\p{Tangut}]
regex:[\p{Telugu}]
regex:[\p{Thaana}]
regex:[\p{Thai}]
regex:[\p{Tibetan}]
regex:[\p{Tifinagh}]
regex:[\p{Tirhuta}]
regex:[\p{Toto}]
regex:[\p{Ugaritic}]
regex:[\p{Unknown}]
regex:[\p{Vai}]
regex:[\p{Vithkuqi}]
regex:[\p{Wancho}]
regex:[\p{Warang_Citi}]
regex:[\p{Yezidi}]
regex:[\p{Yi}]
regex:[\p{Zanabazar_Square}]
PCRE Unicode character properties
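For readers who want to check the logic of the combined searches outside Everything, here is a minimal Python sketch. It assumes that two space-separated regex: terms must both match, as in the "Chinese and English" search above. Python's built-in re module has no \p{...} support, so the HAN and LATIN ranges below are rough stand-ins chosen for illustration (the real \p{Han} and \p{Latin} properties cover more code points), and has_han_and_latin is a hypothetical helper name.

```python
import re

# Approximate stand-ins for \p{Han} and \p{Latin} (assumption: common code
# points only; the real script properties are broader).
HAN = r"[\u4E00-\u9FFF]"
LATIN = r"[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u024F]"

# Same tail as in the "ignoring the extension" search: the matched character
# must be followed by the last dot of the name.
BEFORE_EXTENSION = r".*\.[^.]*$"


def has_han_and_latin(filename: str, ignore_extension: bool = False) -> bool:
    """Return True only if both the Han pattern and the Latin pattern match."""
    tail = BEFORE_EXTENSION if ignore_extension else ""
    return all(re.search(cls + tail, filename) for cls in (HAN, LATIN))


for name in ["报告.txt", "报告 final.txt", "report.txt"]:
    print(name, has_han_and_latin(name), has_han_and_latin(name, ignore_extension=True))
```

The first name only counts as "Chinese and English" because of its .txt extension, which is why the extension-ignoring variant rejects it.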
Re: Search by file name language type
Hi, thanks for the reply. However, can you give me more hints on what terms I should use to search for the syntax for, say, Japanese?
The pcre2pattern spec is too technical for me. I have tried searching for "pre2 unicode language list", "pre2 unicode Japanese", etc., and I can't find anything that works.
Re: Search by file name language type
To search for Hiragana OR Katakana:
regex:[\p{Hiragana}\p{Katakana}]
To search for Hiragana OR Katakana OR Han:
regex:[\p{Hiragana}\p{Katakana}\p{Han}]
Using Unicode ranges might be better.
For example, Kanji:
regex:[\u4E00-\u9FFF]
https://stackoverflow.com/questions/19899554/unicode-range-for-japanese
http://www.rikai.com/library/kanjitables/kanji_codes.unicode.shtml
Re: Search by file name language type
Wow, this is pretty new to me. Thanks for the help and the references! I couldn't have understood it otherwise.
Re: Search by file name language type
Oh, I played around with it a bit, and the range syntax actually doesn't work. Maybe an additional symbol is needed? I also tried ranges for other languages, especially English, and none of them worked. (AC00, D743; U+0000, U+007F)
void wrote: ↑Wed Feb 22, 2023 11:40 pm
To search for Hiragana OR Katakana:
regex:[\p{Hiragana}] | regex:[\p{Katakana}]
To search for Hiragana OR Katakana OR Han:
regex:[\p{Hiragana}] | regex:[\p{Katakana}] | regex:[\p{Han}]
Using unicode ranges might be better.
For example, Kanji:
regex:[\u4E00-\u9FFF]
https://stackoverflow.com/questions/19899554/unicode-range-for-japanese
http://www.rikai.com/library/kanjitables/kanji_codes.unicode.shtml
Re: Search by file name language type
The PCRE syntax is:
\x{hhh..} character with hex code hhh..
Use a - inside [ and ] to specify a range.
Please try the following:
regex:[\p{Hiragana}\p{Katakana}\x{4E00}-\x{9FFF}]
PCRE Non-printing characters
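The same character-class idea can be tried outside Everything. The sketch below is in Python, whose re module writes a code point as \uXXXX rather than PCRE's \x{XXXX} and has no \p{...} classes, so the Hiragana (U+3040 to U+309F) and Katakana (U+30A0 to U+30FF) block ranges are used here as approximate stand-ins for \p{Hiragana} and \p{Katakana}. The function name contains_japanese is made up for this example.

```python
import re

# One character class with three ranges, mirroring
# [\p{Hiragana}\p{Katakana}\x{4E00}-\x{9FFF}] from the post above.
# The first two ranges are Unicode blocks, an approximation of the
# script properties.
JAPANESE = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")


def contains_japanese(filename: str) -> bool:
    """Return True if any character falls in one of the three ranges."""
    return JAPANESE.search(filename) is not None


for name in ["ひらがな.txt", "カタカナ.mp3", "漢字.doc", "notes.txt"]:
    print(name, contains_japanese(name))
```

Because the Kanji range is shared with Chinese, a name written only in Chinese characters also matches this class.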
Re: Search by file name language type
Works perfectly! Really appreciate the help!
Re: Search by file name language type
That is very useful. However, what can I do to search for filenames written in a given language (like Korean) exclusively without any characters from another language?
Re: Search by file name language type
The Korean alphabet is called Hangul (says Internet ..)
With that:
!regex:[^\p{Hangul}]
Explanation:
regex:[^\p{Hangul}] = Show all files/folders that have non-Korean characters anywhere in their name.
!regex:... = Show all files/folders except the ones found above, meaning only files whose names consist exclusively of Korean characters.
Note that the search query above will not list files with a "normal" extension, like .txt, .jpg, .zip, as those are non-Korean characters. The same goes for files with numbers (0...9) in them.
So I don't know how practical this will be, but this is what you asked for.
Regular Expressions Syntax
Re: Search by file name language type
Please consider the following search to ignore the extension:
regex:^[\p{Hangul}]+\.[a-z]+$
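To see how the two Korean-only searches in this thread differ, here is a small Python sketch. Python's re module lacks \p{Hangul}, so the HANGUL ranges below (Hangul Jamo, Hangul Compatibility Jamo and Hangul Syllables) are an assumed approximation, and both function names are invented for illustration.

```python
import re

# Approximate stand-in for \p{Hangul} (assumption: three common blocks only).
HANGUL = r"\u1100-\u11FF\u3130-\u318F\uAC00-\uD7A3"

# Emulates regex:[^\p{Hangul}] (finds any non-Hangul character).
NON_HANGUL = re.compile("[^" + HANGUL + "]")

# Emulates regex:^[\p{Hangul}]+\.[a-z]+$ (Hangul name, lowercase a-z extension).
HANGUL_STEM = re.compile("^[" + HANGUL + r"]+\.[a-z]+$")


def is_hangul_only(filename: str) -> bool:
    """Negated search: True when no non-Hangul character is found."""
    return NON_HANGUL.search(filename) is None


def has_hangul_only_stem(filename: str) -> bool:
    """True when everything before the extension is Hangul."""
    return HANGUL_STEM.search(filename) is not None


for name in ["한국어", "한국어.txt", "한국어 translated.txt"]:
    print(name, is_hangul_only(name), has_hangul_only_stem(name))
```

The strict negated form rejects 한국어.txt because of its extension, while the extension-aware pattern accepts it and still rejects names that already have a translation added.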
Re: Search by file name language type
Thank you for your response,
I have some downloaded files with a name in a foreign language only and I want to add a translation to my language to the name of these files without removing the original names.
When I search for the Korean language in the file name for example, the search result lists:
- file names in Korean only,
- and file names in Korean and other languages (which I have already changed).
Re: Search by file name language type
I understand. Thanks for explaining. What I meant was that usually the file extension is *not* in Korean, so that would skip lots of files that still might be of interest. But you mentioned:
"exclusively without any characters from another language?"
And any .txt file contains characters from another language, namely t, x and t.
But now I get that you wanted the filename *without extension* to be Korean-only.
Anyway .. problem solved