With its default settings, at least, Everything appears to conflate hyphens [-], en dashes [–], and em dashes [—]. Whether I search for – or — or #x2013: or #x2014:, my results list includes files that have only hyphens in them.
I'm pretty sure I have some filenames with en dashes in them, and possibly some with em dashes. I'd like to be able to find them without having to wade through 300,000+ "hyphens-only" files.
Is there a configuration setting or search operator that would allow me to override Everything's default dash conflation and search only for the specific, literal type of dash I've entered?
Hopefully, I haven't overlooked something obvious ... but I wouldn't count on it!
Thanks for any help anyone is able to provide.
How to search for en dash or em dash?
Re: How to search for en dash or em dash?
You can enable the Match Diacritics filter (Menu: Search > Match Diacritics) [1] and search for:
Code:
– | —
to find all file and folder names that contain an en dash and/or an em dash.
Alternatively, search for
Code:
diacritics:— | diacritics:–
(No need to enable the Match Diacritics filter in this case.)
[1] Don't forget to change the filter back to Everything afterwards; otherwise all your following searches will be "diacritics-sensitive".
Note:
I hope the forum software doesn't mess up the dashes. But I suspect you get the point.
Re: How to search for en dash or em dash?
regex:\x{2013}
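For anyone curious what a code-point search like the one above is actually matching, here is a minimal Python sketch. It is not Everything syntax, just an illustration of the same idea: U+2013 is the en dash, U+2014 is the em dash, and the ordinary hyphen (U+002D) is deliberately left out. The function name and sample filenames are made up for the example.

```python
import re

# U+2013 = en dash, U+2014 = em dash. The ASCII hyphen-minus (U+002D)
# is intentionally not part of the character class.
DASH_PATTERN = re.compile("[\u2013\u2014]")

def has_real_dash(name: str) -> bool:
    """Return True if the name contains an en dash or an em dash."""
    return DASH_PATTERN.search(name) is not None

# Hypothetical filenames, written with escapes so the dashes survive copy/paste.
names = ["notes-2020.txt", "pages 10\u201320.pdf", "intro \u2014 draft.doc"]
matches = [n for n in names if has_real_dash(n)]
print(matches)
```

Only the second and third names are kept; the hyphen-only name is skipped.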
Re: How to search for en dash or em dash?
And another suggestion if you want to replace the dashes with regular hyphens:
Everything has a multi-rename feature that you can use to replace patterns with something else in multiple filenames at once.
- Use one of the searches from above
- Select the files where you want to replace the dashes with hyphens (or something else)
- Menu: File > Rename
- Enable Regex
- Old Format: –|— (don't use spaces here)
- New Format: - (I used __ here for clarity)
- A preview of the new names is shown in the New Filenames box
- If all looks good, press the OK button
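The same replace-and-preview idea can be sketched outside Everything. The snippet below is only an illustration in Python, with a made-up function name and sample filenames. It builds a list of old/new name pairs and does not rename anything on disk.

```python
import re

DASHES = re.compile("[\u2013\u2014]")  # en dash or em dash

def preview_renames(names, replacement="-"):
    """Return (old, new) pairs for names that would change; nothing is renamed."""
    pairs = []
    for old in names:
        new = DASHES.sub(replacement, old)
        if new != old:
            pairs.append((old, new))
    return pairs

# Hypothetical filenames for the dry run.
sample = ["2019\u20132020 report.txt", "plain-name.txt", "a \u2014 b.txt"]
for old, new in preview_renames(sample):
    print(f"{old!r} -> {new!r}")
```

Names that contain only hyphens are left out of the preview because they would not change.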
Re: How to search for en dash or em dash?
Wow — lots of feedback, all of it prompt and useful! Thank you all!
I think the "Match diacritics" Search menu setting is going to serve me best, most of the time.
I can see how conflating all types of dashes might be a good default for most users, but I happened to come across a similarly useful conflation — possibly a more useful conflation — that wasn't an Everything default:
By default, Everything doesn't seem to find results with either single-character ligatures or their equivalent two-character sequence. Examples:
- æ and ae
- œ and oe
- ß and ss
- ĳ [Unicode hexadecimal character 0133] and ij
Ditto for œ, mutatis mutandis.
If you search for ß, you only get results containing ß, regardless of whether Match diacritics is enabled or not.
Ditto for ĳ, mutatis mutandis.
And if you search for a two-character sequence, you don't get results containing the equivalent ligature.
This is important because some filenames and content may contain the "proper" ligatures, and others may have been typed by people who used the quick-and-dirty two-character substitute. (There may also be different conventions in different countries, e.g., Switzerland and Liechtenstein, where ß has fallen out of use and is generally replaced with ss.) Typically, users will want to find both (and only both).
Is there a general setting anywhere that would enable "ligature conflation"? Or is it necessary to use special "or" search syntax each time? And if there isn't a general setting, do you think it's worth proposing one as a new feature?
I realize this is off-topic, but my curiosity got piqued and the respondents to this thread seem fairly knowledgeable, so I thought I'd bring it up on the off chance someone has an answer.
Again, thank you all for the very helpful feedback. It fixed my problem! I appreciate it!
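To make the "ligature conflation" idea concrete, here is a rough Python sketch. It says nothing about how Everything works internally. It assumes the simplest possible approach: fold both the search term and the filename to the two-character form before comparing. The table and the function names are illustrative only.

```python
# Illustrative folding table: each ligature maps to its two-character form.
LIGATURES = {
    "\u00e6": "ae",  # æ
    "\u0153": "oe",  # œ
    "\u00df": "ss",  # ß
    "\u0133": "ij",  # ĳ
}

def fold(text: str) -> str:
    """Lowercase the text and expand the ligatures listed above."""
    text = text.lower()
    for ligature, pair in LIGATURES.items():
        text = text.replace(ligature, pair)
    return text

def matches(query: str, name: str) -> bool:
    """Substring match after folding both sides, so æ and ae find each other."""
    return fold(query) in fold(name)

print(matches("ae", "\u00c1g\u00e6tis byrjun.mp3"))     # ae finds the æ name
print(matches("\u00df", "Miss Moon.mp3"))               # ß finds the ss name
```

Because both sides are folded, a search for either form finds both forms, while unrelated pairs (oe versus æ, say) still do not match.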
Re: How to search for en dash or em dash?
Future version, 1.5:
æ + Match Diacritics, finds, Ágætis byrjun.mp3
æ + (no match), finds both: Ágætis byrjun.mp3 & also Antrum Sibyllae.mp3
ß + Match Diacritics, finds Arne Zank - Ich weiß es nicht.mp3
ß + (no match), finds both: Ich weiß es nicht.mp3 & also Miss Moon.mp3
The others, not sure about offhand.
Re: How to search for en dash or em dash?
œ and æ are 'composed' characters (probably not the right term for it), like ä and ò.
You can search for, for example, halos to find hælos and HÆLOS.
In the upcoming major upgrade, Everything 1.5 (currently in development), the mechanism to find those characters will change. Then, in addition to what therube already mentioned, you can use:
- ss to find ß
- oe to find œ
- etc.
Quick tip:
I use this to find all non-standard ASCII characters:
Code:
regex:"[^ -~]"
P.S.: I liked reading your posts!
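A short note on why the quick-tip regex works: in ASCII, the space is 0x20 and the tilde is 0x7E, so the range from space to tilde covers exactly the printable ASCII characters, and the leading ^ negates it. The Python sketch below uses the same character class to show which characters in a name fall outside that range. The function name and sample names are invented for the example.

```python
import re

# Space (0x20) through tilde (0x7E) is the printable ASCII range;
# the leading ^ matches any single character outside it.
NON_PRINTABLE_ASCII = re.compile(r"[^ -~]")

def odd_characters(name: str) -> list:
    """Return the distinct characters in name that are not printable ASCII."""
    return sorted(set(NON_PRINTABLE_ASCII.findall(name)))

print(odd_characters("plain-name.txt"))
print(odd_characters("caf\u00e9 \u2013 men\u00fc.txt"))
```

Note that the class also catches control characters such as tabs, not just accented letters and typographic dashes.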
Re: How to search for en dash or em dash?
Very cool news that more sophisticated digraph/ligature/diacritic support will be coming with version 1.5!
This is actually a somewhat more involved subject than I initially assumed. For example:
- The uppercase version of ĳ [U+0133] is Ĳ [U+0132] in Dutch but a standard capital Y [U+0059] in Afrikaans.
- In German, ä, ö and ü were derived from ae, oe, and ue and are still replaced with the original two characters in a pinch — but not necessarily in other languages.
- The Icelandic eth (Ð and ð) and thorn (Þ and þ) are usually transliterated as D and d and TH/Th and th on keyboards and in character sets that don't support them.
- In French, there is at least one word that can be correctly spelled using either ñ or ny (cañon / canyon). (Would it be worth including a substitution for that one word? Probably not.)
- Successfully searching for any type of quotation mark in a French filename or document would require returning ", “, ”, «, and ». And if you're searching for a word or phrase inside quotation marks, you would have to ignore any kind of space following « or preceding ». Normally, you're supposed to use narrow no-break spaces inside French quotation marks [« word »] or, failing that, regular no-break spaces [« word »], but most typists use breaking regular spaces [« word »] or no spaces at all [«word»]. (Would it be worth providing substitutions for quotes? Absolutely.)
I haven't thought this through, but I wonder whether it would ultimately be useful to add language-specific "character substitution tables," distinct from the global "Match diacritics" setting. I'm not a coder, but I have a hunch it wouldn't be conceptually or technically difficult; it would just be tedious work fleshing out the contents for each language. And for the end user, it could be as easy as simply enabling or disabling common character substitutions for a given language. But as I said, I haven't really thought it through.
Thanks once again to everyone who chimed in. I'm grateful for such a responsive and supportive forum!
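As a rough illustration of the language-specific substitution tables floated above, here is a small Python sketch. Everything in it is an assumption made for the example: the language codes, the table contents (taken from the bullet points in this post), and the function name. It is not a description of any existing Everything feature.

```python
# Hypothetical per-language tables; the user enables the ones they want.
SUBSTITUTIONS = {
    "de": {"\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue", "\u00df": "ss"},
    "is": {"\u00f0": "d", "\u00fe": "th"},   # eth and thorn
    "nl": {"\u0133": "ij"},                  # ij ligature
}

def fold_for(languages, text: str) -> str:
    """Lowercase text, then apply the tables of the enabled languages in order."""
    text = text.lower()
    for lang in languages:
        for char, replacement in SUBSTITUTIONS.get(lang, {}).items():
            text = text.replace(char, replacement)
    return text

print(fold_for(["de"], "M\u00fcller"))   # German table enabled
print(fold_for([], "M\u00fcller"))       # no table enabled
```

With the German table enabled, the ü is expanded to ue; with no table enabled, only the case changes. Keeping the tables separate per language is what lets the same character be treated differently depending on what the user has switched on.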