Re: Re: [code] [textadept] Find in Files performance

From: johannes <jodak932.att.gmail.com>
Date: Sun, 21 Oct 2018 18:31:02 +0200

Hi Mitchell,

the problem I see with external tools like ack or grep is that they
always have to be installed somewhere on the system, and they may
differ between operating systems. As far as I've seen, Geany uses GNU
grep, and there's a plugin for Gedit that uses it too. Other editors,
Sublime for example, use (according to Wikipedia) the Oniguruma
engine. That is even faster than grep, at least on my system with
these editors. But such engines (e.g. Oniguruma, PCRE, Boost) are
really heavy, and I don't think they are suitable for Textadept. One
reason they're heavy seems to be that they support modern regex
syntax, which makes them complicated. There's an interesting
discussion that I've skimmed through:
https://www.reddit.com/r/programming/comments/7j3433/regex_was_taking_5_days_to_run_so_i_built_a_tool/dr3kxt1/
There are also smaller engines (ca. 500 LOC), e.g.:
https://github.com/kokke/tiny-regex-c
https://github.com/omtinez/tiny-rex/tree/master/src
They mainly support classical regex, but from what I've seen they
would not fit perfectly either and would have to be adjusted. I'm
thinking of an engine that takes the whole document's text as one
single string, together with a regular expression, and returns the
begin and end of each match. I know the document's text would also
have to be read in first with an extra C function.
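
To make the idea concrete, here is a minimal sketch of such an
interface. It is not a proposed implementation: the names span_t and
find_all are hypothetical, and the matcher is a variant of the
well-known tiny backtracking matcher that supports only literal
characters, '.', 'c*' (greedy) and a trailing '$'. Note that '.' also
matches newlines here and that empty matches are skipped.

```c
#include <assert.h>
#include <stddef.h>

/* One match as byte offsets into the searched text; end is exclusive.
 * (Hypothetical type, for illustration only.) */
typedef struct { size_t begin, end; } span_t;

static const char *match_here(const char *re, const char *s);

/* Matches c* followed by the rest of re, longest run of c first. */
static const char *match_star(int c, const char *re, const char *s) {
  const char *t = s;
  while (*t != '\0' && (*t == c || c == '.')) t++;
  for (;;) {
    const char *e = match_here(re, t);
    if (e != NULL) return e;
    if (t == s) return NULL;
    t--; /* give back one character and retry */
  }
}

/* Returns a pointer just past the match if re matches at s, else NULL. */
static const char *match_here(const char *re, const char *s) {
  if (re[0] == '\0') return s;
  if (re[1] == '*') return match_star(re[0], re + 2, s);
  if (re[0] == '$' && re[1] == '\0') return *s == '\0' ? s : NULL;
  if (*s != '\0' && (re[0] == '.' || re[0] == *s))
    return match_here(re + 1, s + 1);
  return NULL;
}

/* Scans the whole text once and stores up to max non-overlapping,
 * non-empty matches in out. Returns the number of matches stored. */
size_t find_all(const char *re, const char *text, span_t *out, size_t max) {
  assert(re != NULL && text != NULL && out != NULL);
  size_t n = 0;
  const char *s = text;
  while (n < max) {
    const char *e = match_here(re, s);
    if (e != NULL && e > s) {
      out[n].begin = (size_t)(s - text);
      out[n].end = (size_t)(e - text);
      n++;
      s = e; /* continue right after this match */
    } else {
      if (*s == '\0') break;
      s++;
    }
  }
  return n;
}
```

A caller would pass the document text and a fixed-size array, e.g.
find_all("fo*", text, matches, 64), and could then map the returned
offsets to line numbers. A real engine would additionally need
character classes, anchors and proper error reporting.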
Another option would be to use the C++11 standard library engine, but
it has weak points, as we know (e.g. no matching of newlines). Its
performance also seems not to be the best:
https://github.com/fish-shell/fish-shell/issues/4304
So if you are interested, I could attempt to write a small one in C.
Provided the performance is good and there are no bugs, you could
compile it together with your C code.
I really don't know how difficult this will get, but I've read a few
things in the last few days that make me think it should be possible.
Before I really start, though, I'd like to know your opinion.
It's not only 'find_in_files' that could benefit. In my module, the
function 'goto_related_keyline' (similar to 'goto_definition') also
works with regex search. If it takes the library folders into account
in addition to the project folder, it currently gets slow.
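
As a rough illustration of searching several files without a
temporary editor buffer, the sketch below reads each file once into a
single NUL-terminated string and scans it in memory. The function
names read_whole_file and count_hits are hypothetical, and plain
strstr() stands in for a regex engine, so this only finds literal
text and stops at the first NUL byte in a file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads the file at path into a freshly allocated, NUL-terminated
 * buffer. Returns NULL on failure; the caller frees the result.
 * If len_out is not NULL it receives the number of bytes read. */
char *read_whole_file(const char *path, size_t *len_out) {
  assert(path != NULL);
  FILE *f = fopen(path, "rb");
  if (f == NULL) return NULL;
  size_t cap = 4096, len = 0;
  char *buf = malloc(cap);
  if (buf == NULL) { fclose(f); return NULL; }
  for (;;) {
    if (len + 1 >= cap) { /* grow, keeping room for the NUL */
      char *nbuf = realloc(buf, cap * 2);
      if (nbuf == NULL) { free(buf); fclose(f); return NULL; }
      buf = nbuf;
      cap *= 2;
    }
    size_t got = fread(buf + len, 1, cap - len - 1, f);
    if (got == 0) break;
    len += got;
  }
  int failed = ferror(f);
  fclose(f);
  if (failed) { free(buf); return NULL; }
  buf[len] = '\0';
  if (len_out != NULL) *len_out = len;
  return buf;
}

/* Counts non-overlapping literal occurrences of needle across all
 * given files. Files that cannot be read are skipped. */
size_t count_hits(const char *const *paths, size_t npaths,
                  const char *needle) {
  assert(paths != NULL && needle != NULL && needle[0] != '\0');
  size_t hits = 0, nlen = strlen(needle);
  for (size_t i = 0; i < npaths; i++) {
    char *text = read_whole_file(paths[i], NULL);
    if (text == NULL) continue;
    for (const char *p = text; (p = strstr(p, needle)) != NULL; p += nlen)
      hits++;
    free(text);
  }
  return hits;
}
```

The list of paths would come from walking the project folder and the
library folders; that part is left out here.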

Cheers,
Johannes

On 20.10.2018 at 20:33, Mitchell wrote:
> Hi Johannes,
>
> On Fri, 19 Oct 2018, johannes wrote:
>
>> Hi Mitchell,
>>
>> the performance of the 'find_in_files' function isn't the best. I
>> think that's because the find mechanism is mainly taken from
>> Scintilla, via SCI_SEARCHINTARGET. For this, the file content must be
>> loaded into a temporary buffer every time and then
>> SCI_TARGETWHOLEDOCUMENT is called, which makes the whole process slow
>> (correct me if I'm wrong). Do you think there is a possibility to
>> speed it up via the existing Scintilla toolchain? I haven't seen one
>> so far.
>> In comparison, other editors perform orders of magnitude faster at
>> in-files searching. But they all use external regex engines, or call
>> other processes like GNU grep for this task.
>> A list of many regex engines and their use in text editors can be
>> found here:
>> https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines
>> Do you think it would be theoretically possible to include some
>> external regex library and then refactor 'find_in_files' a little? Or
>> maybe it would be good enough to implement a function which uses the
>> regex library of the C++11 standard library.
>> For other tasks in my module I could use better regex search as well.
>
> I am open to suggestions. I don't know of a way to speed up "find in
> files" via the existing Scintilla toolchain. Perhaps the only
> effective way is to call upon an external tool, as most editors do.
> For the longest time I've just been using 'ack', but recently I've
> been forcing myself to use Textadept's built-in feature with some
> specially crafted filters. It's fast enough for me, even with large
> projects.
>
> Cheers,
> Mitchell

Received on Sun 21 Oct 2018 - 12:31:02 EDT
