Re: Encodings

From: Alex <alex.bep....at.gmail.com>
Date: Sat, 28 Feb 2009 05:59:25 -0800 (PST)

> I think that at least reading the BOM is a good start and gets us
> parity with Scite. It is the good ole 80/20 rule, right - reading the
> BOM gets results with minimal effort.

I do not see the 80 bit in your proposal, Vais. :) Yes, it offers a
simple solution to detecting BOMs. But as I said, in my experience
they are rare. Personally, I see the 80 bit in distinguishing between
Unicode and the platform encoding.
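
As a rough illustration of that distinction, here is a minimal sketch in
Python. It is not code from Textadept or SciTE, and the names
guess_encoding and platform_encoding are hypothetical. It checks for a
BOM first, then tests whether the bytes are valid UTF-8, and otherwise
assumes the platform encoding:

```python
import codecs
import locale

# BOM signatures. UTF-32 LE must be tested before UTF-16 LE because
# the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(data, platform_encoding=None):
    """Return a label for the likely encoding of the bytes in `data`.

    Hypothetical helper: the returned strings are labels only, and the
    caller still has to strip any BOM before decoding.
    """
    # Step 1: an explicit BOM settles the question.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # Step 2: no BOM. Text that decodes cleanly as UTF-8 is treated as
    # UTF-8 (pure ASCII also lands here).
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Step 3: otherwise assume the platform encoding.
        return platform_encoding or locale.getpreferredencoding(False)
```

Step 2 is a heuristic, not a proof: text in a legacy 8-bit encoding
rarely happens to form valid UTF-8 multi-byte sequences, but it can.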

- Alex

On Feb 27, 6:28 am, vais <vsalik....at.gmail.com> wrote:
> Mitchell, this is just a proof of concept. I imagine you would call it
> from the open_helper function, right after you check to see whether
> the text is binary (contains null char) and set the buffer's encoding
> accordingly.
>
> I think that at least reading the BOM is a good start and gets us
> parity with Scite. It is the good ole 80/20 rule, right - reading the
> BOM gets results with minimal effort. Heuristic analysis of text to
> determine encoding is a whole different beast that I personally have
> no interest in - I never had any problems with Scite's encoding
> detection, and if TA does the same, I am happy.
>
> Vais
>
> On Feb 26, 8:52 pm, mitchell <mforal.n....at.gmail.com> wrote:
>
> > Vais,
>
> > > Without further ado, here is a very lame Lua implementation of Unicode
> > > encoding detection for textadept that actually gets the job done (I
> > > put it into file_io.lua and call it from open_helper, right after the
> > > null byte detection routine used to detect binary files):
> > > <snip>
>
> > Forgive me, but how do you use this function? Or is it just a proof of
> > concept in detecting encoding?
>
Received on Sat 28 Feb 2009 - 08:59:25 EST
