Re: [code] Regular expression \t does not behave as expected.

From: Mitchell <m.att.foicica.com>
Date: Wed, 1 Nov 2017 20:22:52 -0400 (EDT)

Hi Danny,

On Wed, 1 Nov 2017, Danny MacMillan wrote:

> It seems that \t sometimes matches a literal t rather than a literal tab
> character (maybe?)
>
> I have a TSV file as such:
>
> DEMO_UPK_KNOW Procedure ADJUST_BLUESTONE_AU
> DEMO_UPK_KNOW Procedure ADJUST_MENTOR_PACKAGE
> DEMO_UPK_KNOW Procedure ADJUST_MENTOR_THREAD
> DEMO_UPK_KNOW Procedure ADJUST_NOTE
> DEMO_UPK_KNOW Procedure GETAUIDBYAUCODEANDAUSID
> DEMO_UPK_KNOW Procedure GETEXTERNALAUCHILDREN
> DEMO_UPK_KNOW Procedure GETEXTERNALTITLEAULIST
> DEMO_UPK_KNOW Procedure GETTITLEAULIST
> DEMO_UPK_KNOW Procedure GETUPKMAPAUS
> DEMO_UPK_KNOW Procedure RESETALLLOTRACKING
>
> If I do a search for the following regex in a file with the above contents,
> it finds nothing.
>
> ^([^\t]+)\t([^\t]+)\t([^\t]+)$
>
> The real file is much larger than the above example. It will eventually find
> a row - the PROJWBS row below.
>
> EAI_P6_SANDBOX_DASH Synonym PROJECT
> EAI_P6_SANDBOX_DASH Synonym PROJWBS
> EAI_P6_SANDBOX_DASH Synonym TASK
>
> My initial surmise was that the previous row ending with "T" accounted for
> this. But I don't believe this is so, or at least I think there must be
> something else wrong perhaps in addition to this. The next match in the file
> is composed of all but the first and last lines in the below (the match spans
> 5 lines, which should not happen with the ^ and $ in there).
>
> EAI_P6_SANDBOX_DASH Synonym TASKACTV
> EAI_P6_SANDBOX_DASH View EAI_GREEN_UP
> EAI_P6_SANDBOX_DASH View EAI_GREEN_UP_SCHED_VARIANCE
> EAI_P6_SANDBOX_DASH View EAI_GREEN_UP2
> EAI_P6_SANDBOX_DASH View EAI_SCHEDULE_VARIANCE_VIEW
> EAI_P6_SANDBOX_DASH View EAI_SCHEDULE_VARIANCE_VIEW_AVG
> EAI_P6_SANDBOX_DASH View INSPECTION_SUMMARY
>
> HOWEVER!!! The behaviour of "Find Next" and the behaviour of "Find Prev" are
> different. Find next will find the 5 middle lines as a single match. If I
> find next past this block, then find prev, it will find each of those 5 lines
> as its own match, which is what the behaviour should be. Unfortunately
> neither find next nor find prev are finding everything they should.

This looks like a known bug[1] in TRE, the library Textadept uses for its regex searches. If I change the last capture group of your pattern from ([^\t]+) to (.+), giving:

   ^([^\t]+)\t([^\t]+)\t(.+)$

Textadept appears to match things correctly.
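
For comparison, here is a minimal sketch of what both patterns are expected to match on a few of your rows. It uses Python's re module, not TRE, so it only illustrates the intended behaviour and says nothing about the bug itself. The names `rows` and `matched_rows` are made up for this example, and the rows are rebuilt with real tab characters on the assumption that the original file is tab-separated.

```python
import re

# A few rows from the report, rebuilt with real tab separators (assumption:
# the original file uses exactly one tab between fields).
rows = [
    ("DEMO_UPK_KNOW", "Procedure", "ADJUST_NOTE"),
    ("EAI_P6_SANDBOX_DASH", "Synonym", "PROJECT"),
    ("EAI_P6_SANDBOX_DASH", "Synonym", "PROJWBS"),
    ("EAI_P6_SANDBOX_DASH", "View", "EAI_GREEN_UP"),
]
text = "\n".join("\t".join(row) for row in rows)

# re.MULTILINE makes ^ and $ anchor at line boundaries, as in an editor search.
original = re.compile(r"^([^\t]+)\t([^\t]+)\t([^\t]+)$", re.MULTILINE)
workaround = re.compile(r"^([^\t]+)\t([^\t]+)\t(.+)$", re.MULTILINE)


def matched_rows(pattern, text):
    """Return the captured fields of every match, in document order."""
    return [m.groups() for m in pattern.finditer(text)]


original_rows = matched_rows(original, text)
workaround_rows = matched_rows(workaround, text)

for fields in original_rows:
    print(fields)
```

In this engine both patterns yield one match per line with the same three fields, which is the result the search in Textadept should also produce.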

When I have some time I may attempt to investigate this, but I'm not terribly confident I'll be able to fix the issue.

Cheers,
Mitchell

[1]: https://github.com/laurikari/tre/issues/20
