Search in .htm (.html) files

AE_AE
Posts: 2
Joined: Mon Nov 11, 2019 6:56 pm

Search in .htm (.html) files

Post by AE_AE »

Hello!

Thanks very much for this superb desktop search utility for Windows. "Everything" is a very useful, helpful and easy-to-use search engine. I search for files by filename and by content many times every day, and that is how I found a bug.

My English is poor, but I will try to put my thoughts into words clearly.

fact_01: BUG.
Everything_1.4.1.935 DOESN'T find English or Russian words (letters) in .htm and .html files, irrespective of character encoding (UTF-8 with BOM or UTF-8 without BOM), when they are in the bookmark Location part of the code: <A HREF=" " >.

fact_02: Normal. Everything_1.4.1.935 DOES find English and Russian words (letters) in .htm and .html files, irrespective of character encoding (UTF-8 with BOM or UTF-8 without BOM), when they are in the bookmark Description part of the code (<DD>) or the bookmark Name part (> </A>).

For example:

Two .html files (Everything_bookmarks_UTF8_BOM.html and Everything_bookmarks_UTF8_no_BOM.html) are attached. They are bookmarks exported from a browser and contain 2 hyperlinks:

Вопросы и ответы - voidtools
https://www.voidtools.com/ru-ru/faq/
Очень быстрый поиск с программой Everything / Хабр
https://habr.com/ru/post/42354/



<DL><p>
<DT><A HREF="https://www.voidtools.com/ru-ru/faq/" ADD_DATE="1573496435" LAST_MODIFIED="1573496435" ICON_URI="https://www.voidtools.com/favicon.ico" >Вопросы и ответы - voidtools</A>
<DT><A HREF="https://habr.com/ru/post/42354/" ADD_DATE="1573496463" LAST_MODIFIED="1573496463" ICON_URI="https://habr.com/images/favicon-16x16.png" >Очень быстрый поиск с программой Everything / Хабр</A>
<DD>Начну немного «издалека». Дело в том, что я (и думаю не я один) — очень люблю маленькие но функциональные программы. Я встречал несколько таких приложений, которые иначе чем шедеврами софтостроения...

</DL>

Using Everything_1.4.1.935

01. You CAN find these words: "Вопросы", "voidtools", "Everything", "Хабр", "встречал", "люблю", because they are in the <DD> or > </A> part of the code.

02. You CANNOT find these words: "voidtools.com", "ru-ru", "habr", "post/42354/", "habr.com", "ww.void", because they are in the <A HREF=" " > part of the code.

Please fix this BUG in a future version of Everything (if possible).

Thanks very much.
Attachments
Everything_.htm_Search.zip
Demonstrative Example
(5.52 KiB) Downloaded 361 times
Last edited by AE_AE on Sun Mar 22, 2020 4:11 am, edited 1 time in total.
NotNull
Posts: 5517
Joined: Wed May 24, 2017 9:22 pm

Re: Search in .htm (.html) files

Post by NotNull »

If I understand the content: function correctly (I don't use it very often), it searches the extracted text of a document (minus formatting and layout), the same text that Windows Search uses for indexing (see iFilter).

If you want to search in the raw text, you can use some other Everything functions:
ansicontent:
utf8content:
utf16content:
utf16becontent:

In your case, replace content: with utf8content: to also search in, for example, HREF attributes.
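
To illustrate the difference, here is a minimal Python sketch. It is not Everything's or any iFilter's actual code, only an approximation of the idea: one search looks at the text left over after the markup is stripped, the other looks at the raw UTF-8 bytes of the file. The names TextOnly, extracted_text, found_in_text and found_in_raw are made up for this example; the sample line is shortened from the bookmark file above.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects only the text between tags; attribute values are dropped."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extracted_text(html: str) -> str:
    """Return the document text without tags or attributes."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


def found_in_text(raw: bytes, needle: str) -> bool:
    """Rough stand-in for a content:-style search (extracted text only)."""
    html = raw.decode("utf-8-sig")  # accepts UTF-8 with or without BOM
    return needle in extracted_text(html)


def found_in_raw(raw: bytes, needle: str) -> bool:
    """Rough stand-in for a utf8content:-style search (raw bytes)."""
    return needle.encode("utf-8") in raw


SAMPLE = (
    '<DT><A HREF="https://www.voidtools.com/ru-ru/faq/" '
    'ADD_DATE="1573496435" >Вопросы и ответы - voidtools</A>'
).encode("utf-8")

for word in ("voidtools", "Вопросы", "voidtools.com", "ru-ru"):
    print(word, found_in_text(SAMPLE, word), found_in_raw(SAMPLE, word))
```

In this sketch "voidtools" and "Вопросы" are found both ways, while "voidtools.com" and "ru-ru" are found only in the raw bytes, because they occur only inside the HREF attribute.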
AE_AE
Posts: 2
Joined: Mon Nov 11, 2019 6:56 pm

Re: Search in .htm (.html) files

Post by AE_AE »

Hello, NotNull!

Thanks very much for your quick answer. You helped me.

Now I can find any text in .htm (.html) files with the query: .HTM utf8content:

I have also begun to use your advice in other cases.

For example, Everything_1.4.1.935 DOESN'T find Russian words (letters) in .txt files when the character encoding is UTF-8 without BOM.

Now I can find any text in .txt files encoded as UTF-8 without BOM with the query: .txt utf8content:
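
One possible explanation, given here as an assumption and not as confirmed Everything behaviour: without a BOM, a reader has to guess the encoding, and if it falls back to a legacy code page, the Cyrillic bytes decode to different characters and the word no longer matches. A minimal Python sketch of that effect follows; the function naive_decode and the fallback code page cp1251 are illustrative choices only.

```python
import codecs


def naive_decode(raw: bytes, fallback: str = "cp1251") -> str:
    """Decode as UTF-8 only when a BOM is present; otherwise assume a
    legacy code page (hypothetical behaviour, for illustration only)."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    return raw.decode(fallback, errors="replace")


word = "Хабр"
with_bom = codecs.BOM_UTF8 + word.encode("utf-8")
without_bom = word.encode("utf-8")

# Text search after the naive decoding step
print(word in naive_decode(with_bom))      # True
print(word in naive_decode(without_bom))   # False: bytes were misread as cp1251

# Byte-level search, similar in spirit to utf8content:
print(word.encode("utf-8") in without_bom)  # True
```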
NotNull
Posts: 5517
Joined: Wed May 24, 2017 9:22 pm

Re: Search in .htm (.html) files

Post by NotNull »

:thumbsup:

Glad that I could help.

FYI: it is under consideration for a next major version of Everything to bypass this iFilter behaviour for certain text-based files like xml, html and json.
That way you don't have to use the utf8content: function to find your text, but you can use the "normal" content: function instead.