Re: [code] Enabling ta to edit large files

From: Peter Rolf <indiego.att.gmx.net>
Date: Mon, 25 Jun 2018 14:30:28 +0200

Hi Nicholas,

On 2018-06-08 at 15:56, Nicholas Ochiel wrote:
> I attempted to open a 40MB log file and a compiled react app (one
> large js file) and noticed that ta couldn't open/edit these files in a
> reasonable amount of time and without thrashing the cpu the whole time
> the buffer remained open. I notice the "large file" issue has been
> mentioned before but a more detailed technical breakdown of why this
> happens has never been provided as far as I can tell.
>
> Recently, vscode and atom both fixed their problems in this regard by
> implementing piece-table styled data structures for buffers:
> - http://blog.atom.io/2017/10/12/atoms-new-buffer-implementation.html
> - https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation
> - vis (C/Lua) also uses a "piece chain"
> https://github.com/martanne/vis/wiki/Text-management-using-a-piece-chain
> and opens large files instantly.
>
> I'd like to ask:
> - Please could a more technical description of the ta/scintilla
> buffer data structure and its performance characteristics be provided
> so that the problem can be understood by plebeians such as myself?
>
> - What is the best solution to enable ta to perform as well as the
> above mentioned editors? Is there a reason why it shouldn't?
>
> - If a solution has already been proposed/considered, would anyone be
> willing to provide mentorship for me to implement the solution? If
> one hasn't been discussed, please could I be pointed in the right
> direction on
> 1) where in the codebase of ta/scintilla to focus my attention.
> 2) The best approach to profiling large file performance.
>
> - Would such a patch be accepted?
> --
> Sincerely,
> Nicholas Ochiel
>

You should try

  buffer.wrap_mode = WRAP_NONE
  buffer.idle_styling = IDLESTYLING_NONE

and see if it has an impact.
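
To avoid typing this for every big file, the two settings could be
applied automatically from ~/.textadept/init.lua. This is only a sketch
under some assumptions: the 1 MB threshold and the name LARGE_FILE_BYTES
are made up for illustration, and it assumes the freshly opened file is
the current buffer when events.FILE_OPENED fires. Note that in init.lua
the constants need the 'buffer.' prefix.

```lua
-- Sketch for ~/.textadept/init.lua; the threshold is an arbitrary assumption.
local LARGE_FILE_BYTES = 1024 * 1024 -- hypothetical cut-off: 1 MB

events.connect(events.FILE_OPENED, function(filename)
  -- buffer.length is the size of the current buffer in bytes.
  if buffer.length < LARGE_FILE_BYTES then return end
  -- Turn off word wrap and idle styling, as suggested above.
  buffer.wrap_mode = buffer.WRAP_NONE
  buffer.idle_styling = buffer.IDLESTYLING_NONE
end)
```

Smaller files keep whatever defaults are configured elsewhere.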

Here are numbers from my own experience with a synthetic test file for
'Elastic Tabstop (ETS)' (4.5MB, a monolithic 'text' block of 36635 lines
with 17 'tabstops' per line; all lines are wrapped). Results from this
morning:

~55s, wrap_mode = WRAP_WHITESPACE ; idle_styling = IDLESTYLING_ALL
~52s, wrap_mode = WRAP_WHITESPACE ; idle_styling = IDLESTYLING_NONE
~23s, wrap_mode = WRAP_NONE ; idle_styling = IDLESTYLING_ALL
~ 1s, wrap_mode = WRAP_NONE ; idle_styling = IDLESTYLING_NONE

All times were stopped by hand (average of three runs), and Textadept
was started with only the test file open.

A good part of the time goes into calculating all the columns (>600K)
in ETS, but the larger share is taken by the style calculation needed
for word wrapping. Hm, that makes me wonder: according to the Scintilla
documentation [1], the 'idle_styling' setting should have no effect
while word wrapping is active, yet it still costs about 3s here. The
default 'idle_styling' setting with word wrapping deactivated gives the
best results here (my new standard).

[1] https://www.scintilla.org/ScintillaDoc.html#SCI_SETIDLESTYLING
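
To make the split between the two settings easier to see, the four
timings from the table above can be subtracted pairwise. This is plain
Lua arithmetic on the hand-stopped numbers (variable names are mine), so
treat the results as rough estimates rather than a profile.

```lua
-- Timings in seconds, copied from the table above.
local wrap_idle, wrap_only = 55, 52 -- WRAP_WHITESPACE with IDLESTYLING_ALL / _NONE
local idle_only, neither = 23, 1    -- WRAP_NONE with IDLESTYLING_ALL / _NONE

-- Extra time attributable to word wrapping, for each idle_styling setting.
print('wrap cost, idle styling on: ', wrap_idle - idle_only) -- 32
print('wrap cost, idle styling off:', wrap_only - neither)   -- 51
-- Extra time attributable to idle styling, for each wrap setting.
print('idle cost, wrap on: ', wrap_idle - wrap_only) -- 3
print('idle cost, wrap off:', idle_only - neither)   -- 22
```

The ~1s baseline also skips the ETS initialization, so the differences
taken against it include that work as well.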

With both parameters set to 'NONE', the automatic 'initialization' of
ETS is no longer triggered at start-up, which explains the instant load.
The 'style_at[]' table is still filled, but filling it takes a lot more
time.

tl;dr
If I really need word wrapping (and the file allows it), I activate it
by hand.

Hope this helps,

Peter

@Mitchell: some outdated default values in the documentation

buffer.idle_styling: default is IDLESTYLING_ALL (IDLESTYLING_NONE for
CURSES)
buffer.indentation_guides: default is IV_LOOKBOTH (IV_NONE for CURSES)

Received on Mon 25 Jun 2018 - 08:30:28 EDT