Re: [code] Re: Debugging language modules

From: Mitchell <>
Date: Mon, 28 Mar 2016 22:23:38 -0400 (EDT)

Hi Arnel,

On Tue, 29 Mar 2016, Arnel wrote:

> On Mon, 28 Mar 2016 09:15:46 -0400 (EDT), Mitchell <> wrote:
>>>>> - Is there a better way to debug lexer modules? The Racket lexer I'm working on
>>>>> was based off the Scheme lexer file provided with TA. I've read somewhere in
>>>>> the API manual that troubleshooting lexers can be tricky and it's recommended
>>>>> to run TA in the terminal to get the error messages. I tried this but I didn't
>>>>> get any. Those who have written lexers for other languages before - any
>>>>> pointers? Anything on seeing what's actually captured by the LPEG expressions
>>>>> would be great.
>>>> If you don't see any error messages in the terminal by default, then that
>>>> means your lexer is well formed and is processing text just fine. However,
>>>> that doesn't mean your lexer is processing text as you'd expect! Robert
>>>> already mentioned using Scintillua as a library (which is an idea I hadn't
>>>> thought of!). Normally I just use:
>>>> P(function(input, index)
>>>>   _G.print(input, index)
>>>>   return index
>>>> end)
>>>> and put that in a pattern I'm debugging. The "return index" line ensures
>>>> that the debug function "matches" so that text matching can continue.
>>> Could you elaborate further how I can add this to any pattern? Say I have
>>> something like:
>>> local keywords = token(l.KEYWORD, word_match({
>>> '#%app', '#%datum', '#%declare', '#%expression', '#%module-begin',
>>> -- ...
>>> }, '!#%*+-./:=>?_'))
>>> How do I use that to print the captured text (or index as it were)?
>>> (I tried the example given in Scintillua for using it as a library, but for
>>> some reason I'm getting an error message about not seeing 'lpeg' even though
>>> I've installed it via luarocks, so I thought I'd try this instead.)
>> local keywords = token(l.KEYWORD, word_match({
>> '#%app', '#%datum', '#%declare', '#%expression', '#%module-begin',
>> -- ...
>> }, '!#%*+-./:=>?_')) * P(function(input,index)
>> print('keyword end:', index)
>> return index
>> end)
>> That will print the index of the end of a matched keyword.
>> As for your issue with LPeg, it may help to check where LuaRocks installed
>> lpeg (`luarocks list`) and open a Lua interactive session (`lua`) and type
>> `=package.cpath`. If lpeg isn't in the cpath, then you may be running the
>> incorrect version of Lua. For example on my Ubuntu machine, I have Lua
>> 5.0, Lua 5.1, and Lua 5.2 installed, but only Lua 5.1 can see LuaRocks
>> modules by default. Since `lua` points to Lua 5.2, I have to use `lua5.1`
>> in order to 'require' lpeg.
> Finally figured out what was causing my Racket lexer not to work.
> The lists of built-in keywords and function names were originally generated
> by a Racket script given by someone from the Racket IRC channel. I suspect
> it was a slightly modified version of the one provided with Emacs's
> "racket-mode".
> Anyway, it turned out a couple of the function names contained Unicode
> characters (U+2200 and U+018E), and the original script for some reason
> provided their names as a pair of "u'new-/c'" strings. When I commented out
> those two strings, the lexer started working properly.
> There's a small number of functions in Racket which use Unicode characters
> (including the lambda symbol). Should I leave all of them out for now? I'm not
> completely sure if they will work properly on Windows (unless they install a
> font with support for Unicode symbols like DejaVu Sans Mono, maybe).

If you use those UTF-8 characters directly in your keyword strings (no
strange u'xxxx', U+xxxx, or \uXXXX notation), then they may work just fine
as keywords. You are correct that you'll need a font to display such
characters, but the lexer should be able to handle the bytes themselves.
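As a quick sanity check, here is a minimal sketch in plain Lua (no Scintillua or LPeg required; the keyword set is hypothetical) showing that multi-byte UTF-8 keywords are ordinary byte strings as far as Lua is concerned:

```lua
-- Hypothetical keyword set; to Lua these are plain byte strings, so a
-- lexer can match them byte-for-byte. Only *displaying* the glyphs
-- requires a suitable font (e.g. DejaVu Sans Mono).
local keywords = { ["∀"] = true, ["λ"] = true, ["lambda"] = true }

print(keywords["∀"])  -- true: table lookup works on the raw UTF-8 bytes
print(#"∀")           -- 3: U+2200 encodes as three bytes in UTF-8
```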


Received on Mon 28 Mar 2016 - 22:23:38 EDT

This archive was generated by hypermail 2.2.0 : Tue 29 Mar 2016 - 06:54:38 EDT