Re: [code] [scintillua] More lexer improvements from the vis editor community

From: Mitchell <>
Date: Sun, 19 Nov 2017 13:56:35 -0500 (EST)

Hi Marc,

On Sat, 25 Feb 2017, Mitchell wrote:

> Hi Marc,
> On Sat, 25 Feb 2017, Marc André Tanner wrote:
>> On Wed, Feb 22, 2017 at 10:18:46AM -0500, Mitchell wrote:
>>> On Wed, 22 Feb 2017, Marc André Tanner wrote:
>> [snip]
>>>> I noticed that you removed some lexers. Any particular reason for that?
>>>> Did they have specific problems I should be aware of? I know that at
>>>> least one of my users cares about APL, so I will most likely add that
>>>> back in my repo.
>>> No reason other than I figured it's not worth the effort of refactoring
>>> them if they're not likely to be used. I'm surprised to hear of a user
>>> of APL.
>> Ok, fair enough. I assume he would be willing to update the more obscure
>> lexers himself (he contributed the APL, Faust, Man, Protobuf, Pure and Spin
>> lexers).
> Now that you mention it, deleting APL was a mistake. For some reason, I
> thought I authored it... Like I said in a previous thread, it's been a long
> week :(
>> [snip]
>>>> Also is there a place where I can read upon the motivation / goals of
>>>> the refactoring? I'm not sure I agree with some of the changes (e.g.
>>>> word_match taking a string rather than a table?). But then you have
>>>> much more experience in Lua, I'm sure there is a good reason for it.
>>> No, it's all in my head :) Feel free to ask questions. I'm not even 100%
>>> sure this was/is a good idea. It's an experiment right now that I might
>>> end up throwing away.
>>> First I'd like to point out that one of my goals is to keep compatibility
>>> with legacy lexers.
>> I'm not sure I like that in the long term. If we manage to come up
>> with a clear improvement, we should spend a one-time effort to convert
>> all existing lexers to the new mechanism, then deprecate the old one
>> and eventually remove it, unless the maintenance effort for both schemes
>> is negligible. Though, I understand you probably provide backward
>> compatibility guarantees for Textadept?
> Sure, in the long term that is reasonable. In the short term I'd want to
> maintain backwards compatibility for people who are using old, custom
> lexers of their own that are not in the repository.
>> [snip]
>>> Since compatibility is important, you can keep the table form of
>>> `word_match()` if you want. I personally don't like the idea of creating
>>> giant tables of keywords and having them stick around in memory. That's
>>> why I've moved to a single string.
>> I understand (and support) the motivation. However, does it actually
>> achieve that?
>> I meant to look at the actual implementation (can't seem to find your
>> branch right now?) and at what LPeg does under the hood, but haven't yet
>> had the time, so the following high-level reasoning might be wrong.
>> You start out with one large string, which has better memory
>> characteristics than many little ones (mostly due to the associated
>> metadata). However, you then split it, creating many tiny strings anyway.
>> At this point you are consuming more memory (the same split strings plus
>> the additional long one) than in the old scheme. Now it depends on whether
>> you keep references to them, thus preventing the GC from collecting them.
>> At first glance this seems to be the case, because `word_match()` captures
>> the local variable `word_list`, which uses the strings as table indices.
>> I haven't analyzed how Lua's short-string optimization (keywords will
>> typically be shorter than 40 bytes) and string interning play into that.
> I have not committed a branch yet, but I've attached my working draft for
> your reference until I do (if I continue with this endeavor). It is a
> drop-in replacement if you want to play around with it.
> The key is that the giant string passed to `word_match()` is an argument,
> not a captured local. Thus, after the lexer finishes loading, the GC throws
> it away and keeps only the little strings in memory. You are correct that
> Lua interns short strings, so in the original implementation only the giant
> table itself takes up extra memory. I suppose the gain is not as large as I
> thought.
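
To make the trade-off concrete, here is a minimal stand-in for the draft's
`word_match()` (a sketch, not the attached draft itself; the real signature
may differ). The big string is an argument, so nothing references it once
the call returns and the GC can collect it; only the small interned keyword
strings survive as keys of the captured `word_list` table:

  local function word_match(words)
    local word_list = {}
    if type(words) == 'string' then
      -- New form: one big string, split into words on load.
      for word in words:gmatch('%S+') do word_list[word] = true end
    else
      -- Legacy table form, kept for compatibility.
      for _, word in ipairs(words) do word_list[word] = true end
    end
    -- Only word_list is captured below; the 'words' argument is collectible.
    return function(word) return word_list[word] ~= nil end
  end

  local is_keyword = word_match('and break do else end')  -- new string form
  print(is_keyword('do'))  --> true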
>>> Some other motivations are I don't like the idea of "magic fields" (e.g.
>>> `_rules`, `_tokenstyles`). I think the object-style of `:add_rule()` and
>>> `:add_style()` is better practice.
>> I can see the benefits in that, although the `_rules` approach had the
>> nice effect that all rule references were grouped together, making it
>> easier to spot mistakes in rule ordering.
> Yes, you are correct.
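
For comparison, the two declaration styles look roughly like this (a sketch
following the conventions discussed above, not an actual repository lexer):

  local lexer = require('lexer')
  local token = lexer.token
  local ws = token(lexer.WHITESPACE, lexer.space^1)
  local keyword = token(lexer.KEYWORD, lexer.word_match{'if', 'else'})

  -- Legacy "magic field" style: rule ordering is visible in one table.
  local M = {_NAME = 'example'}
  M._rules = {
    {'whitespace', ws},
    {'keyword', keyword},
  }

  -- New object style: ordering follows the sequence of method calls.
  local lex = lexer.new('example')
  lex:add_rule('whitespace', ws)
  lex:add_rule('keyword', keyword)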
> Anyway, I ran into some issues converting old lexers to the new format, so
> I'm currently rethinking my approach (and whether it is still worth doing).
> I also like the idea of retaining comments in tables of words passed to
> `word_match()` for documentation purposes. Giant strings cannot carry such
> comments.
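
That documentation benefit looks like this (a sketch using the table form):

  -- Comments can annotate groups of words in the table form:
  local keywords = word_match{
    -- control flow
    'if', 'else', 'while', 'for',
    -- declarations
    'local', 'function',
  }
  -- A single space-separated string has no natural place for such notes.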
> Thanks very much for your feedback. It's quite helpful. I think I got a bit
> overzealous and should take a more methodical approach moving forward.

I wanted to follow up: I have gone ahead and converted all but a half-dozen lexers to the new object-oriented lexer format, documented the changes, and committed them to hg. We can now start analyzing memory consumption, etc., for any further improvements.
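
For anyone following along, a converted lexer in the new format looks
roughly like this (a simplified sketch, not one of the actual repository
lexers; see the committed documentation for the real API):

  local lexer = require('lexer')
  local token, word_match = lexer.token, lexer.word_match

  local lex = lexer.new('example')

  -- Rules are added in order via method calls instead of a _rules table.
  lex:add_rule('whitespace', token(lexer.WHITESPACE, lexer.space^1))
  lex:add_rule('keyword', token(lexer.KEYWORD, word_match('if else while')))
  lex:add_rule('comment', token(lexer.COMMENT, '#' * lexer.nonnewline^0))

  return lex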

