STEDT & ED, regular expressions now available

Date: August 06, 1987 15:32
From: CHARM::MORRIS
To: @sys$mail:engineer
Announcing the latest version of STEDT V6.01.

This is an EDT/TPU-like editor for the ST. It is also available on the VAX.

I have added regular expression (RE) searching and substitution to STEDT (ED on
the VAX). This is a really powerful way to do search and replace.
^X PF3 (instead of Gold PF3) lets you enter a regular expression. PF3 will then
search for a match to that regular expression, just like the normal search.
As RE's can have unexpected match results, the string actually matched is
printed on the bottom line enclosed in "< >".
Currently RE searches are forward only.

Another feature: select a range and hit ^X PF3, and the selected range is
copied to the regular expression. This lets you keep a file of commonly used
RE's, include it in the top window, select the RE you want and then do ^X PF3.
This does not search for the first match until you do PF3. It also lets you
edit the RE until it works properly, which is very handy.
NOTE that an RE must fit on one line.

Lastly, the substitute command (Gold Enter) works slightly differently when
used with RE's. The matched string can be substituted into the text with '&'
and sub-matches can be substituted with \1 \2 etc.
This is explained under REGULAR EXPRESSION SUBSTITUTION below.

An example is if you wanted to append ':' to every label in an assembly file.

The RE is

^([A-Za-z][A-Za-z0-9_\.]*)		! matches a string whose first character
					! is alpha, and has 0 or more alphanum's
					! _ or . (ie a valid label)

the substitute string would be
\1:					! this will print out the label followed
					! by a colon.

Or, if you wanted to do the same only to labels not already terminated with
a ':', the RE is
^([A-Za-z][A-Za-z0-9_]*)([ 	])	! anything terminated with space or tab
and the substitute string is
\1:\2					! substitute label part : then separator
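
As a rough illustration of the two label examples above, here is a minimal
sketch using Python's re module as a stand-in for the STEDT RE engine. That is
an assumption made for illustration only: Python needs re.MULTILINE so that ^
matches at the start of each line, the tab is written as \t, and the sample
assembly text is made up.

```python
import re

# Made-up assembly fragment (hypothetical sample, not taken from STEDT).
source = "start\tmove.w\t#1,d0\nloop:\tadd.w\td0,d1\n\trts\n"

# First example: append ':' to every label at the start of a line.
every_label = re.compile(r"^([A-Za-z][A-Za-z0-9_\.]*)", re.MULTILINE)
all_colons = every_label.sub(r"\1:", source)

# Second example: only labels followed directly by a space or tab,
# so a label that already ends in ':' is left alone.
bare_label = re.compile(r"^([A-Za-z][A-Za-z0-9_]*)([ \t])", re.MULTILINE)
only_bare = bare_label.sub(r"\1:\2", source)

print(all_colons)
print(only_bare)
```

With this sample the first RE turns "loop:" into "loop::", which is why the
second RE insists on a space or tab straight after the label.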


The documentation on RE's that follows can also be found in:

CHARM$USERDISK:[MORRIS.CPM68K.EMACS]STEDTDOC.TXT

The ST program is in:

CHARM$USERDISK:[MORRIS.CPM68K.EMACS]STEDT.PRG

and the VAX version:

ED:== $CHARM$USERDISK:[MORRIS.CPM68K.EMACS]ED.EXE

REGULAR EXPRESSION SYNTAX
=========================
A  regular  expression  is zero or more branches, separated by |.
It matches anything that matches one of the branches.  

A  branch  is  zero  or  more pieces, concatenated.  It matches a
match for the first, followed by a match for the second, etc.  

A piece is an atom possibly followed by *, +, or ?.
An atom followed by * matches a sequence of 0 or more matches of the atom.
An atom followed by + matches a sequence of 1 or more matches of the atom.
An atom followed by ? matches a match of the atom, or the null string.
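
A minimal sketch of branches and pieces, using Python's re module as a
stand-in (an assumption; the STEDT engine may differ in details). The helper
whole_match is a hypothetical name introduced only for this example.

```python
import re

def whole_match(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# Branches: the RE matches anything that matches one of the alternatives.
either = [whole_match(r"cat|dog", s) for s in ("cat", "dog", "cow")]

# Pieces: an atom followed by *, + or ?.
star = [whole_match(r"ab*", s) for s in ("a", "ab", "abbb")]   # 0 or more b
plus = [whole_match(r"ab+", s) for s in ("a", "ab", "abbb")]   # 1 or more b
maybe = [whole_match(r"ab?", s) for s in ("a", "ab", "abb")]   # b or nothing

print(either, star, plus, maybe)
```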

An  atom is a regular expression in parentheses (matching a match
for  the  regular  expression), a range (see below), .  (matching
any  single  character),  ^  (matching  the  null  string  at the
beginning  of  the  input  string), $ (matching the null string at
the  end of the input string), a \ followed by a single character
(matching  that  character),  or a single character with no other
significance (matching that character).  
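
The atoms can be tried out with a short sketch like the one below. It uses
Python's re module as a stand-in for the STEDT engine (an assumption), and the
sample line is made up.

```python
import re

line = "cost: $5.00 (approx.)"   # made-up sample text

# ^ and $ match the null string at the beginning and end of the input.
starts_with_cost = re.search(r"^cost", line) is not None
ends_with_paren = re.search(r"\)$", line) is not None

# A \ followed by a character matches that character literally.
price = re.search(r"\$5\.00", line).group()

# An unescaped . matches any single character.
dot_any = re.search(r"5.00", "5x00") is not None

# A parenthesized RE is itself an atom, so it can take *, + or ?.
repeated = re.search(r"(ab)+", "xababy").group()

print(starts_with_cost, ends_with_paren, price, dot_any, repeated)
```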

A  range is a sequence of characters enclosed in [].  It normally
matches  any single character from the sequence.  If the sequence
begins  with ^, it matches any single character not from the rest
of   the  sequence.   If  two  characters  in  the  sequence  are
separated  by  -,  this  is  shorthand for the full list of ASCII
characters  between  them (e.g. [0-9] matches any decimal digit).
To  include  a  literal  ]  in  the  sequence,  make it the first
character  (following  a  possible  ^).   To include a literal -,
make it the first or last character.  
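
The range rules can be checked with a small sketch. Python's re module is used
as a stand-in here (an assumption); it happens to follow the same conventions
for a leading ] and a leading or trailing -.

```python
import re

digits = re.findall(r"[0-9]", "a1b22")        # any decimal digit
non_digits = re.findall(r"[^0-9]", "a1b22")   # anything but a digit

# A literal ] goes first in the sequence (after a possible ^).
bracket = re.findall(r"[]x]", "a]x[")
not_bracket = re.findall(r"[^]x]", "a]x[")

# A literal - goes first or last in the sequence.
dash_last = re.findall(r"[a-]", "a-b")
dash_first = re.findall(r"[-a]", "a-b")

print(digits, non_digits, bracket, not_bracket, dash_last, dash_first)
```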

AMBIGUITY
=========
If  a  regular  expression could match two different parts of the
input  string,  it  will match the one which begins earliest.  If
both  begin  in  the  same  place but match different lengths, or
match  the  same  length in different ways, life gets messier, as
follows.  

In   general,  the  possibilities  in  a  list  of  branches  are
considered  in  left-to-right  order, the possibilities for *, +,
and   ?  are  considered  longest-first,  nested  constructs  are
considered  from  the  outermost  in, and concatenated constructs
are  considered leftmost-first.  The match that will be chosen is
the  one  that  uses the earliest possibility in the first choice
that  has to be made.  If there is more than one choice, the next
will  be  made  in the same manner (earliest possibility) subject
to the decision on the first choice.  And so forth.  

For  example,  '(ab|a)b*c'  could match 'abc' in one of two ways.
The  first choice is between 'ab' and 'a'; since 'ab' is earlier,
and  does  lead  to  a  successful  overall  match, it is chosen.
Since  the  'b'  is  already  spoken for, the 'b*' must match its
last  possibility,  the  empty  string, since it must respect the
earlier choice.  
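
The example above can be observed with a short sketch. Python's re module is a
stand-in here (an assumption), and the extra parentheses around b* are added
only so that its share of the match can be inspected.

```python
import re

# Same RE as in the text, with b* wrapped in a group for inspection.
m = re.search(r"(ab|a)(b*)c", "abc")

first_choice = m.group(1)   # the branch that was taken
b_star = m.group(2)         # what was left over for b*
whole = m.group(0)

print(repr(first_choice), repr(b_star), repr(whole))
```

The 'ab' branch is taken, so b* is left with the empty string.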

In   the  particular  case where no |'s are present and there  is
only  one  *,  +,  or  ?,  the  net  effect is that  the  longest
possible   match   will  be  chosen.   So  'ab*', presented  with
'xabbbby',   will  match  'abbbb'.   Note  that if 'ab*' is tried
against   'xabyabbbz',  it will match 'ab' just after 'x', due to
the   begins-earliest   rule.   (In effect, the decision on where
to  start   the  match  is  the  first  choice  to be made, hence
subsequent  choices    must    respect    it    even    if   this
leads  them  to less-preferred alternatives.)  
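
Both 'ab*' cases above can be reproduced with a minimal sketch, again using
Python's re module as a stand-in for the STEDT engine (an assumption).

```python
import re

# Longest possible match from the earliest starting point.
longest = re.search(r"ab*", "xabbbby").group()

# The begins-earliest rule wins over the longer match further on.
m = re.search(r"ab*", "xabyabbbz")
earliest = m.group()
earliest_start = m.start()

print(longest, earliest, earliest_start)
```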

REGULAR EXPRESSION SUBSTITUTION
===============================
Substitutions are made according to the  most  recent  RE search.
Each  instance  of  '&'  in  the  paste buffer is replaced by the
string that matched the whole  regular  expression. Each instance
of  '\n',  where  n  is  a digit, is replaced by the substring that
matched the nth parenthesized expression within the regular
expression, with parenthesized expressions numbered in
left-to-right order of their opening parentheses.

To get a literal '&' or '\n' into the substituted text, prefix it with '\';
to get a literal '\' preceding '&' or '\n', prefix it with another '\'.
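
The substitution rules above can be sketched as a small function. This is only
an illustration under assumptions: expand is a hypothetical name, Python's re
module stands in for the STEDT engine, and a '\n' that refers to a missing or
unmatched sub-expression is assumed to expand to nothing.

```python
import re

DIGITS = set("0123456789")

def expand(template: str, match: re.Match) -> str:
    """Expand '&' and '\\n' in template using the given match."""
    out = []
    i = 0
    while i < len(template):
        c = template[i]
        nxt = template[i + 1] if i + 1 < len(template) else ""
        if c == "&":
            out.append(match.group(0))        # whole matched string
            i += 1
        elif c == "\\" and nxt in DIGITS:
            n = int(nxt)
            if n <= match.re.groups:          # skip missing groups
                out.append(match.group(n) or "")
            i += 2
        elif c == "\\" and nxt in ("\\", "&"):
            out.append(nxt)                   # escaped literal
            i += 2
        else:
            out.append(c)                     # ordinary character
            i += 1
    return "".join(out)

m = re.search(r"^([A-Za-z][A-Za-z0-9_]*)([ \t])", "loop\tmove.w d0,d1")
print(repr(expand(r"\1:\2", m)))
print(repr(expand("<&>", m)))
```

Here expand(r"\&", m) yields a literal '&', and expand(r"\\1", m) yields a
backslash followed by '1' rather than the first sub-match.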