atari email archive

a collection of messages sent at Atari from 1983 to 1992.

On building a spellchecker in the 80's

I am reviewing contraction and possessive recognition techniques and all of the suggestions I have received have some flaw in them.

Good spellcheckers are hard to build in the 80's.


Webster, A dictionary program

(1 / 8)


     There is now available a spelling checker program. This program takes an
English text file and outputs the file with all the misspelled words and
their line numbers at the end.
     There are over 47000 words in the dictionary. This is a lot but not
enough to guarantee that your favorite words are in there. So, when you
find correctly spelled words marked as not in the dictionary, please send
them to me and I will update the dictionary.
     To use this program put the following command in your login.com file:

      $	WEB*STER :== $DOC:WEBSTR 'P1

     Typing WEBSTER file.ext will run the program and process the mentioned
file. The default input extension is .MEM. Webster will create a file of the
same name with the extension .CRF

     Please forward any problems and comments to me. I will update the
dictionary every couple of weeks. If your favorite words do not show up
after you send them to me please be patient.

oops

(2 / 8)


I have been informed that the 'p1 in the command line of my previous
message is incorrect and should be removed. The correct line should be:
  $ web*ster :== $ doc:webstr
This should work fine. If there are any problems let me know.

Webster, Contractions or possessives

(3 / 8)


     It has come to my attention that contractions (can't, didn't, etc) don't
work in the webster program. The Dictionary has these words and the program
will recognize them if I define the apostrophe (') as an alphabetic character.
I have done this and contractions now work. There is, however, a price for
this feature. The dictionary does not have all of the possible possessive
forms of words. Since the program is able to recognize the word "didn't",
it will not find the word "witch's" in the dictionary. I use contractions
more than I use possessives. If there is a problem with this I would like
to hear about it.
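
A minimal sketch of the trade-off described in this message, assuming the checker splits text into runs of alphabetic characters and looks each run up whole. The tiny DICTIONARY set and the unknown_words helper are hypothetical stand-ins, not Webster's actual code or word list.

```python
import re

# Hypothetical stand-in for Webster's 47,000-word dictionary.
DICTIONARY = {"the", "witch", "didn't", "fly", "broom", "was", "missing"}

# The apostrophe sits inside the character class, so it counts as a letter
# and stays attached to the word during tokenizing.
WORD = re.compile(r"[A-Za-z']+")

def unknown_words(text):
    """Return the words in text that are not in DICTIONARY, in order."""
    return [w for w in WORD.findall(text.lower()) if w not in DICTIONARY]

print(unknown_words("The witch didn't fly. The witch's broom was missing."))
```

With the apostrophe treated as a letter, "didn't" is matched whole, but "witch's" is also looked up whole and is reported because only "witch" is listed.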

Webster, How about this?

(4 / 8)


Webster can now handle almost all cases of contractions and possessives.
If it can't find a word in the dictionary it scans the word and
truncates the word at any apostrophe that may be there. For example:
"witch's" will become "witch". The dictionary is consulted again with the
new word. This leaves only one problem: a misspelling like
"can'tj". The 'tj is removed and the word "can" is in the dictionary,
so no mention will be made of the word in the listing. If you don't like
this feature I will set it back to not handling possessives at all. 
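
The fallback described in this message might look roughly like the sketch below. It assumes a plain set lookup. The names DICTIONARY and is_accepted are illustrative only and do not come from the original program.

```python
# Hypothetical stand-in for the real dictionary.
DICTIONARY = {"can", "can't", "witch"}

def is_accepted(word):
    """Look the word up whole, then retry with everything from the
    first apostrophe onward cut off."""
    word = word.lower()
    if word in DICTIONARY:
        return True
    root, apostrophe, _ = word.partition("'")
    # Retry only if the word actually contained an apostrophe.
    return bool(apostrophe) and root in DICTIONARY

for w in ["can't", "witch's", "can'tj", "wich's"]:
    print(w, is_accepted(w))
```

Here "can'tj" is accepted because its root "can" is in the dictionary, which is the flaw the message points out.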

Webster, The great debate

(5 / 8)


I guess I was unclear. Webster will only strip after the apostrophe if
the word is not found in the dictionary. Words like can't and didn't
will be found in the dictionary so they will not be flagged as wrong.
     The only inaccuracy is the "can'tj" case mentioned earlier as the
word "can'tj" is not in the dictionary.
     I am reviewing contraction and possessive recognition techniques
and all of the suggestions I have received have some flaw in them. Therefore,
I am regrouping and will release a new version over the weekend.

Webster, Contractions and possessives the final word.

(6 / 8)


     This is the final word on contractions vs. possessives. Webster
first looks up a word in the dictionary. This includes any apostrophes
that may be in the word. Most of the common contractions are listed
in the dictionary and will be found this way. If the word is not in
the dictionary Webster checks to see if the word ends in S'. If it
does, the ' is removed and a dictionary lookup is done on the root word.
If the root is not found in the dictionary the original word with the
apostrophe is added to the misspelled list. If the word ends in 'S
the 'S is removed and a spelling check is done on the root word. Once
again if the root is not in the dictionary the whole word is placed in
the misspelled list.
     This system will catch almost all cases of contractions and
possessives. There is one exception, however. A singular word ending
in S is, by the rules of usage, made into a possessive by adding 'S.
So, the possessive of abacus is abacus's. If you have this correct
in your text Webster will not complain. However, a common mistake for
this case is to treat the word as if it were plural (abacus'). This
is incorrect but Webster will not flag it as such.
     This is not a major problem. I have found only a very few nouns
that end in S. It is difficult to say words like abacus's so people
usually put these words in a prepositional phrase (of the abacus).
The awkwardness of these words has caused most of them to fall into
disuse. Spending 10 minutes looking in the dictionary only uncovered
a few (abacus, lotus, mass, and marquis). I am sure there are more but
not many.
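
The lookup order described in this message can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: DICTIONARY is a small hypothetical word set, is_accepted is an invented name, and the comparison is done in lower case.

```python
# Hypothetical stand-in for the real dictionary.
DICTIONARY = {"abacus", "can't", "witch", "witches"}

def is_accepted(word):
    """Apply the three checks in the order the message gives them."""
    w = word.lower()
    if w in DICTIONARY:            # 1. whole word, apostrophes included
        return True
    if w.endswith("s'"):           # 2. ends in S': drop only the apostrophe
        return w[:-1] in DICTIONARY
    if w.endswith("'s"):           # 3. ends in 'S: drop the 'S
        return w[:-2] in DICTIONARY
    return False                   # otherwise the whole word is reported

for w in ["can't", "witch's", "witches'", "can'tj", "abacus's", "abacus'"]:
    print(w, is_accepted(w))
```

Unlike the earlier truncation approach, "can'tj" is now rejected. The one known gap remains: "abacus'" passes the S' check because its root "abacus" is in the dictionary.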

Webstr, New words

(7 / 8)


The file [MAHAR]NUWORDS.DOC lists the new words that have been added
to Webster's dictionary.

webster

(8 / 8)


Webster is no longer in the disk area DOC:. It is now in the area
DOK:. You should change your login.com file to reflect this change.

Jan 26, 1984