Correct regex usage

Post by **javanius** » Wed May 17, 2023 2:02 pm

The regex I generated with ChatGPT did not work correctly in Everything.
My goal is this:
* The file names contain Turkish characters.
* A file name can be a single word or several words.
* Words in file names are separated by spaces.
* The first character of each word is not checked; only the characters after it are.
* I want a regex that matches an uppercase C among the checked characters.

Examples:
"İstanbulda Geçen Bir Roman.docx" - Will not match, because there is no uppercase C.
"İzmirde Bir GeCe.docx" - Will match, because an uppercase C appears after the first character of a word.
"Geçen Cuma Günü Gelen Mutluluk.docx" - Will not match, because the only uppercase C is the first character of a word.
"TerCih.docx" - Will match. Although it is a single word, it contains an uppercase C after the first character.

Can you please write a regex that satisfies all these rules?

I apologize for my English. I translated with DeepL.
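A minimal sketch of the rules above in plain Python, without any regex, which can serve as a reference when checking candidate patterns. The function name `has_inner_capital_c` is hypothetical, and the sketch assumes that the extension should be ignored (as the replies below eventually do) and that words are separated by whitespace.

```python
import os


def has_inner_capital_c(filename: str) -> bool:
    """True if an uppercase 'C' appears in a word of the stem at any
    position other than the word's first character (hypothetical helper)."""
    stem, _ext = os.path.splitext(filename)  # assumption: ignore the extension
    for word in stem.split():                # words are separated by spaces
        if "C" in word[1:]:                  # skip each word's first character
            return True
    return False


examples = {
    "İstanbulda Geçen Bir Roman.docx": False,
    "İzmirde Bir GeCe.docx": True,
    "Geçen Cuma Günü Gelen Mutluluk.docx": False,
    "TerCih.docx": True,
}

for name, expected in examples.items():
    print(f"{name}: {has_inner_capital_c(name)} (expected {expected})")
```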

Post by **NotNull** » Wed May 17, 2023 2:37 pm

Regular expressions in Everything are case-insensitive by default. Add the case: search modifier to make the expression case-sensitive.

If I understood correctly, this should work:

Code:

case:regex:"\w+C"

(not tested)

EDIT: I forgot to take the file extension into account:

Code:

stem:case:regex:"\w+C"

(still not tested)
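To see why case: and stem: matter, here is a small sketch that imitates both modifiers with Python's `re` module. These are stand-ins, not Everything itself: `re.IGNORECASE` plays the role of Everything's default case-insensitive matching, and `os.path.splitext` plays the role of stem:. Python 3's `\w` includes Turkish letters such as ç and ı; whether Everything's engine treats them the same way is an assumption here.

```python
import os
import re

# Stand-ins (assumptions, not Everything's own engine):
#   re.IGNORECASE            ~ regex: without case: (case-insensitive)
#   no flag                  ~ regex: combined with case:
#   os.path.splitext(...)[0] ~ the stem: modifier
CASE_INSENSITIVE = re.compile(r"\w+C", re.IGNORECASE)
CASE_SENSITIVE = re.compile(r"\w+C")


def stem(filename):
    return os.path.splitext(filename)[0]


name = "Geçen Cuma Günü Gelen Mutluluk.docx"
print(CASE_INSENSITIVE.search(name) is not None)  # True: "doc" matches, since c counts as C
print(CASE_SENSITIVE.search(name) is not None)    # False: the only C starts the word "Cuma"

print(CASE_SENSITIVE.search("aaaaa.doCx").group())  # doC: the match is in the extension
print(CASE_SENSITIVE.search(stem("aaaaa.doCx")))    # None: the stem "aaaaa" has no C
```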

(Can you post the ChatGPT answer? That would give people an even better understanding of what you are after.)

Post by **javanius** » Wed May 17, 2023 3:03 pm

I can't believe it!
I've been trying for minutes and I couldn't get any results from AI applications.
But the code you wrote solved my problem. Thank you very, very much, NotNull.
What I can't believe is that the code you gave both works and is so short...

Look at the code ChatGPT gave me:
\b(?:[a-zA-Zçğıöşü]*[Cc]|[a-zA-Zçğıöşü]+[Cc][^\s]*)(?!\S)

And look at Bing's code (which is said to use GPT-4!):
^[a-zA-Z]+(\s+[a-zA-Z]+)*C[a-zA-Z]*\.[a-zA-Z]+$

Note: Every time I told them the code didn't work, they apologized and gave me new code, which didn't work either. I tried so many alternatives!
Unfortunately, this is what happens when you don't know regex.
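A rough way to see why the AI-generated patterns failed is to run them against the four example filenames from the question. This sketch uses Python's `re` module as a stand-in for Everything's engine (case-sensitive, as with case:) and compares both patterns with `\w+C` applied to the stem. In ChatGPT's pattern the `[Cc]` class also accepts a lowercase c, so the "c" in ".docx" satisfies it and every file matches. Bing's pattern only allows `[a-zA-Z]`, so a name starting with a Turkish letter such as İ can never match.

```python
import os
import re

# Patterns copied verbatim from the post above, plus the stem-based \w+C.
CHATGPT = re.compile(r"\b(?:[a-zA-Zçğıöşü]*[Cc]|[a-zA-Zçğıöşü]+[Cc][^\s]*)(?!\S)")
BING = re.compile(r"^[a-zA-Z]+(\s+[a-zA-Z]+)*C[a-zA-Z]*\.[a-zA-Z]+$")
STEM_W_C = re.compile(r"\w+C")

# The four example filenames from the question and the desired outcome.
EXAMPLES = [
    ("İstanbulda Geçen Bir Roman.docx", False),
    ("İzmirde Bir GeCe.docx", True),
    ("Geçen Cuma Günü Gelen Mutluluk.docx", False),
    ("TerCih.docx", True),
]


def results(pattern, use_stem=False):
    """Return one True/False per example: does the pattern find a match?"""
    out = []
    for name, _wanted in EXAMPLES:
        # os.path.splitext is only a stand-in for Everything's stem:
        text = os.path.splitext(name)[0] if use_stem else name
        out.append(pattern.search(text) is not None)
    return out


expected = [wanted for _name, wanted in EXAMPLES]
print("expected  ", expected)
print("ChatGPT   ", results(CHATGPT))                 # [True, True, True, True]
print("Bing      ", results(BING))                    # [False, False, False, True]
print("stem \\w+C", results(STEM_W_C, use_stem=True))  # [False, True, False, True]
```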

Post by **NotNull** » Wed May 17, 2023 3:09 pm

That is good news! You're welcome.

Note that I edited my post after I realized the first version would also match aaaaa.doCx. The edited version only matches against the aaaaa part (the stem of the filename).

Post by **javanius** » Wed May 17, 2023 3:15 pm

case:regex:"\w+D"
This code also matched on the file extension.
For example, it found this file: "Merhaba Nasılsın.pDf"

stem:case:regex:"\w+D"
When I typed this, the file "Merhaba Nasılsın.pDf" did not show up.
Why is that? Did I make a mistake somewhere?

Note: The version of Everything I use: 1.5.0.1346a (x64)

Post by **NotNull** » Wed May 17, 2023 3:37 pm

No mistake; that is the expected behaviour.

The entire filename is "Merhaba Nasılsın.pDf" (Hello to you too, btw).

Without stem:, the regex processes the entire filename, WITH the extension.
As there is a D in the extension, there is a match and the file is reported.

With stem:, Everything strips the extension from the filename and feeds the remaining part -- "Merhaba Nasılsın" -- to the regex engine.
There is no D in the remaining part, so the regex engine finds no match and the file is not reported.
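The difference can be imitated outside Everything. In this sketch `os.path.splitext` stands in for stem: (an assumption about how stem: splits the name, not Everything's code) and Python's `re` stands in for the regex engine. The last line shows what stem:case:regex:"\w+D" would be expected to find for a name with a D in its stem.

```python
import os
import re

PATTERN = re.compile(r"\w+D")  # case-sensitive, like case:regex:"\w+D"


def search(filename, use_stem):
    # use_stem=True imitates stem: by handing only the part before the last dot
    # to the regex (os.path.splitext is a stand-in, not Everything's code).
    text = os.path.splitext(filename)[0] if use_stem else filename
    return PATTERN.search(text)


name = "Merhaba Nasılsın.pDf"
print(search(name, use_stem=False).group())  # pD: the only match is in the extension
print(search(name, use_stem=True))           # None: "Merhaba Nasılsın" has no D

# Expected result for a name whose D is in the stem:
print(search("Merhaba SevDa.docx", use_stem=True).group())  # SevD
```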

Post by **javanius** » Wed May 17, 2023 3:53 pm

Now I understand.

However, this time I ran into the following problem.
stem:case:regex:"\w+D"
The following file should have been found by this search, but it was not.

"Merhaba SevDa.docx"

As far as I understand from your explanation, stem: strips docx, the extension of this file, so the regex should look for the letter D in "Merhaba SevDa", the stem of the filename.
It should have found it, but it didn't.

Post by **NotNull** » Wed May 17, 2023 4:15 pm

This time I tested it.

You are right: adding stem: does not work as intended. It should, so this is something that needs to be fixed.

In the meantime...

An alternative is to let the regex take care of it:

Code:

case:regex:\w+D.*\.[^.]*$

The ".*\.[^.]*$" part basically says: this is the extension, and I don't care what it looks like.
(So the regex matching concentrates on the stem part of the filename.)

Works here
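Why this works: `.*\.[^.]*$` can only succeed if some dot after the D is followed by nothing but non-dot characters up to the end, i.e. the last dot. So the D has to sit before the last dot, which is the stem. A small Python check of that reasoning, with Python's `re` as a stand-in for Everything's engine ("SevDa" is a hypothetical name without an extension):

```python
import re

# NotNull's workaround, applied to the whole filename (no stem: involved).
# Python's re is used here only as a stand-in for Everything's regex engine.
WORKAROUND = re.compile(r"\w+D.*\.[^.]*$")

for name in ["Merhaba SevDa.docx", "Merhaba Nasılsın.pDf", "SevDa"]:
    print(name, WORKAROUND.search(name) is not None)
# Merhaba SevDa.docx True (the D sits before the last dot)
# Merhaba Nasılsın.pDf False (the only D comes after the last dot)
# SevDa False (no dot at all, so \. can never match)
```

One side effect of this design: a file without any extension can never match, because the pattern requires a dot.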

Post by **javanius** » Wed May 17, 2023 5:02 pm

You are the best.
case:regex:\w+D.*\.[^.]*$
It worked.

With your permission, I replaced "\w" with "[^\s]",
because I also wanted it to find the following file, and [^\s] does that:

"Bu .Dosyayı Bulsun.txt"

So it should also match the capital D right after the dot.
I noticed that "\w" does not match the dot character.

It looks like I have a lot to learn about regex.
Thank you for your help.
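The difference between the two variants on the example file, sketched with Python's `re` (again only a stand-in for Everything's engine). Note that `[^\s]` is equivalent to `\S`, and that it also lets the D follow other punctuation such as "-" or "(", which may or may not be wanted.

```python
import re

# Python's re as a stand-in for Everything's engine; both searches are case-sensitive.
WORD_VERSION = re.compile(r"\w+D.*\.[^.]*$")         # the original workaround
NONSPACE_VERSION = re.compile(r"[^\s]+D.*\.[^.]*$")  # [^\s] is equivalent to \S

name = "Bu .Dosyayı Bulsun.txt"
print(WORD_VERSION.search(name) is not None)      # False: "." is not a \w character
print(NONSPACE_VERSION.search(name) is not None)  # True: "." is a non-space character
```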

Post by **NotNull** » Wed May 17, 2023 6:05 pm

Hey, if it works, it works! (and you definitely don't need my permission ;D )

voidtools forum
