frustrations with regular expressions

Off-topic posts of interest to the "Everything" community.
Debugger
Posts: 633
Joined: Thu Jan 26, 2017 11:56 am

Post by Debugger »

I have been frustrated with regular expressions, and I would like to explain why difficulties can arise and what the limitations and possibilities are when using them.
Why do regexes have limitations?

Regular expressions are a powerful tool, but what they can do is limited by the rules of each regex engine. Some of these limitations come from how regexes are designed to process text: their job is to find patterns, but some operations, such as comparing text across different lines or using group references (backreferences) inside a lookahead, can be difficult or unsupported in some engines.


I want to compare filenames while ignoring the numbering at the beginning, and also detect duplicates.
Problem:

A lookahead containing a backreference, e.g. (?=\r?\n\1), does not match in my tests. I suspect the engine cannot refer to a captured group from inside a lookahead (which only checks whether a match follows, without consuming it). This makes it difficult to find duplicates the way I am trying to.

I tested on regex101, and it keeps showing ‘no match’.

^(?:\d{2}\.)?(.*)
^(?:\d{2}\.)?(.*)(?=\r?\n\1)
^(?:\d{2}\.)?(.*?)(?=\r?\n\1)
^\d{2}\.(.*)

01.Q爱(DJ谋 Electro Remix)王.mp3
12.Q爱(DJ谋 Electro Remix)王.mp3

0 matches are found in 0 lines. Cannot find ^(?:\d{2}\.)?(.*)(?=\r?\n\1) above the current position.
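
For comparison, here is a minimal sketch using Python's re module (an assumption; the engine behind the message above may behave differently). In Python, a backreference inside a lookahead is supported, so the likely reason the sample fails is that the next line begins with a different number (12.), which means \1 cannot match directly after the line break. The `adjusted` pattern is a hypothetical variant that also allows the optional two-digit prefix inside the lookahead.

```python
import re

# The two sample filenames from above, one per line.
text = (
    "01.Q爱(DJ谋 Electro Remix)王.mp3\n"
    "12.Q爱(DJ谋 Electro Remix)王.mp3"
)

# Original pattern: the lookahead requires the NEXT line to start
# with the captured name, but it actually starts with "12.".
original = re.compile(r"^(?:\d{2}\.)?(.*)(?=\r?\n\1)", re.MULTILINE)

# Hypothetical variant: skip an optional "NN." prefix on the next
# line before comparing it with the captured name.
adjusted = re.compile(
    r"^(?:\d{2}\.)?(.*)(?=\r?\n(?:\d{2}\.)?\1$)", re.MULTILINE
)

original_hits = [m.group(1) for m in original.finditer(text)]
adjusted_hits = [m.group(1) for m in adjusted.finditer(text)]

print(original_hits)
print(adjusted_hits)
```

Note that this only compares each line with the line directly below it, so the list would have to be sorted by the name without its prefix for every duplicate to be found.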
dedupeit
Posts: 46
Joined: Thu Jul 28, 2022 9:52 pm

Re: frustrations with regular expressions

Post by dedupeit »

If I understand correctly, you want to compare filenames while excluding any digits at the start of the filename?

If so, you can use:

regex:\d*(.*) add-column:1 dupe:1
That captures all of the filename minus any leading digits, puts the result in a column, and then searches for duplicates based on that column.
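
The same idea can be sketched outside of Everything. The following Python example is only an illustration of the capture-then-group approach, not how Everything implements it; the helper name `find_duplicates` and the third sample filename are hypothetical.

```python
import re
from collections import defaultdict

def find_duplicates(names):
    """Group filenames by what remains after any leading digits."""
    groups = defaultdict(list)
    for name in names:
        # Same pattern as in the search: skip leading digits,
        # capture the rest of the name as the comparison key.
        key = re.match(r"\d*(.*)", name).group(1)
        groups[key].append(name)
    # Keep only keys shared by more than one filename.
    return {key: files for key, files in groups.items() if len(files) > 1}

names = [
    "01.Q爱(DJ谋 Electro Remix)王.mp3",
    "12.Q爱(DJ谋 Electro Remix)王.mp3",
    "03.other.mp3",  # hypothetical non-duplicate for contrast
]
dupes = find_duplicates(names)
print(dupes)
```

Because the pattern only skips digits, the captured key still starts with the dot that follows the number; that does not affect the comparison as long as every name uses the same separator.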