Re: Re: [code] [textadept] Find in Files performance

From: Mitchell <m.att.foicica.com>
Date: Mon, 22 Oct 2018 09:43:41 -0400 (EDT)

Hi Johannes,

On Sun, 21 Oct 2018, johannes wrote:

> Hi Mitchell,
>
> the problem I see with external tools like ack or grep is that they
> always need to be installed somewhere on the system.

I agree completely.

> And they may be
> different on different operating systems. As far as I've seen, Geany uses GNU grep, and there's a
> Gedit plugin that uses it too. But other editors, for example
> Sublime Text, use (according to Wikipedia) the Oniguruma engine. That's
> even faster than grep, at least on my system with these editors. But such
> engines (e.g. Oniguruma, PCRE, Boost) are really heavy, and I think they are
> not suitable for Textadept. It seems that one reason they're heavy is
> that they support modern regex features, which makes them complicated. There's
> an interesting discussion that I've skimmed through:
> https://www.reddit.com/r/programming/comments/7j3433/regex_was_taking_5_days_to_run_so_i_built_a_tool/dr3kxt1/
> There are also smaller (ca. 500 LOC) engines, e.g.:
> https://github.com/kokke/tiny-regex-c
> https://github.com/omtinez/tiny-rex/tree/master/src
> They mainly support classical regex, but from what I've seen they wouldn't
> fit perfectly either and would have to be adjusted. I'm thinking of an engine that
> takes the whole document's text as one single string as input, together with
> a regular expression, and returns the begin and end of each match. I know the
> document's text would also have to be read in first with an extra C function.
> Another option would be to use the C++11 standard library engine (std::regex), but it has
> known weak points (e.g. no matching of newlines). Also, its performance
> seems not to be the best:
> https://github.com/fish-shell/fish-shell/issues/4304
> So if you're interested, I could try to write a small one in C.
> Provided the performance is good and there are no bugs, you could compile it
> together with your C code.
> I really don't know how difficult this will get, but I've read a few things in
> the last few days that make me think it should be possible. But before I really
> start, I want to know your opinion.
> It's not only 'find_in_files' that could benefit; so could the function
> 'goto_related_keyline' in my module (similar to 'goto_definition'), which works
> with regex search. If it also takes the library folders into account in addition
> to the project folder, it is currently slow.

I am wary of adding an additional dependency to Textadept, even a small regex engine. I see the likely end result as a 3rd-party module that can be 'require'd and that overrides Textadept's default "find in files" searches (similar to the module on the wiki that replaces the default regex find with Lua patterns).

Cheers,
Mitchell

Received on Mon 22 Oct 2018 - 09:43:41 EDT

This archive was generated by hypermail 2.2.0 : Tue 23 Oct 2018 - 06:43:08 EDT