Date : Mon, 27 May 1985 14:46:26 GMT
From : Mark Mallett <mem%sii.uucp@BRL.ARPA>
Subject: TOPS20 style parsing
Greetings.
About a year ago, I wrote a set of C routines to implement TOPS-20 style
parsing on my CP/M system. I have used these routines for a number of
programs, such as a reminder program and software which composes my
bulletin board system here in NH. With that, I am now convinced that
the routines work (at least to the point where I can use them), and have
put them together as a kit. What follows is the document for the
library.
If you have comments, please send me mail. If there is enough interest,
I shall post the sources. I hope that if this happens, someone will tell
me whether the net.micro.cpm or the net.sources area is appropriate; I am
sending this message off to both.
Cheers,
Mark Mallett
decvax!sii!mem or ittvax!sii!mem
C O M N D
A TOPS-20 style command parsing library
for personal computers
Documentation and source code Copyright (C) 1985 by Mark
E. Mallett; permission is granted to distribute this document
and the code indiscriminately. Please leave credits in place,
and add your own as appropriate.
This Document
This document contains the following sections:
o Document overview (this here section)
o Introduction and history
o Functional overview
o How to write programs using the subroutine library
o How to make the library work on your system
Introduction and History
This document describes the COMND subroutine package for C
programmers. COMND is a subroutine library to effect
consistent parsing of user input, and in general is well suited
for verb-argument style command interfaces. The library
provides a consistent user interface as well as a program
interface which, I believe, could well remain unchanged if the
parsing library were re-written to support different interface
requirements (such as menu interaction).
The COMND interface is based on the TOPS-20 model.
TOPS-20 is an operating system which is/was used by Digital
Equipment Corporation on their DECSYSTEM-20 computers. TOPS-20 was
based on TENEX, written by BBN (I think). TOPS-20
COMND is much more robust and consistent than the library which
this document describes; this library being intended for small
computer applications, it provides the most commonly used
functions.
This library was written on a Z-80 system running Digital
Research Corporation's CP/M operating system version 3.0
(CPM+). I have also compiled and tried it on a VAX 11/780
running VMS. It is completely written in the C language, and
contains only a few operating system specific elements.
The COMND JSYS section of the TOPS-20 Monitor Calls manual
is probably a good thing to read.
Please note: while there are a few unimplemented sections
of this library, I felt that it was nevertheless worthwhile to
submit it to the public domain since it is usable for almost all
general command parsing and since the call interface is well
defined. I have used this library extensively since sometime
in 1984.
Functional Overview
The COMND subroutine library provides a command-oriented
user interface which is consistent at the programmer level and
at the user level. At the program level, it gives an
algorithmically controlled parsing flow, where a call to the
library exists for each field or choice of fields to be
parsed.
At the user level, the interface provides:
o Command prompting.
o Consistent command line editing. The user may use editing
keys to erase the last character or word, and to echo the
current input line and prompt.
o Input abbreviation and defaulting. The user may type
abbreviations of keywords, or may type nothing to have
defaults applied.
o Incremental help. By pressing a known key (usually a
question mark), the user can find out what choices s/he
has.
o Guide strings. Parenthesized guide words are shown at the
user's option.
o Command completion. Where the subroutine library can judge
what the successful completion of a portion of user input
will be, the user can elect to have this input completed and
shown automatically.
Using the COMND Library
While you read this part of the document, you might want
to look at the sample program named TEST.C which has been
included with this package. It is an over-commented guide to
the use of the COMND library.
Any module which makes use of this library shall include
the definition file named "comnd.h". This file contains
definitions which are necessary to the caller-library
interface. Mnemonics (structures and constants) mentioned in
relation to this interface are defined in this file.
The philosophy of parsing with the COMND library is that a
command line is typed, the program inspects it, then the
program acts on the directions given in that line. This
process is repeated until the program finishes. The COMND
library assists the user in typing the command line and the
program in inspecting it. Acting on it is left up to the
calling program.
The typing and parsing of fields in the command line go
essentially hand-in-hand with this library. The single
subroutine COMND() is used to effect all parsing. This routine
is called for each element of the input line to be parsed.
Parsing is done according to a current parse state, which is
maintained in a parameter block passed between caller and
library. The state block contains the following sort of
information (described in detail later):
o What to use for a prompt string.
o Addresses of scratch buffers for user input and atom
storage.
o How much the user has entered.
o How much of the line the program has parsed.
An important thing to note is that the indexes (how much
entered and parsed) are both variable. The program begins
parsing of the input line upon a break signal by the user (such
as the typing of a carriage return, question mark, etc). The
user may then resume typing and erase characters back to a
point BEFORE that already parsed. It is very important that
the program does not take any action on what has been parsed
until the line has been completely processed, otherwise that
action could be undesired.
Since the user may back up the command input to a point
before that already processed by the application program, a
mechanism must be provided to back up the program to the correct
point. Rather than going to the point backed up to, the COMND
library expects the application program to return to the
beginning of the line, and start again. The user's input has
remained in the command line buffer, and the library will take
care of buffering the rest of the input when that parse point
is again reached. However, this means that there must be a
method of communicating to the calling program that this
"reparse" is necessary. Actually there are two methods
provided, as follows:
o Each call to the command parsing routine COMND() yields a
result code. The result may indicate that a reparse has to
take place. The program shall then back up to the point
where the parse of the line began, and start again.
o The application program may specify the address of a setjmp
buffer which identifies the reparse point. (Note: setjmp is
a facility provided as part of most standard C libraries.
It allows you to mark a point in the procedure flow [call
frame, registers, and whatever else is involved in a
context], and return to that point from another part of the
program as if control had never proceeded. If you are
unfamiliar with this facility, you might want to find a
description in your C manual.) It is up to the caller to
set up the setjmp environment at the reparse point.
In either case, the reparse point (the point at which the parse
will be restarted if necessary) is the point at which the first
element of the command line is parsed. This is after the
initialization call which starts every parse.
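The setjmp method can be sketched roughly as follows. This is a
toy under stated assumptions, not code from the library: the
names reparse_point, parse_field() and parse_line() are
hypothetical, and parse_field() merely imitates the library by
doing one longjmp() as if the user had erased back into the
first field.

```c
#include <setjmp.h>

static jmp_buf reparse_point;     /* would be named by CSB_RSB */
static int reparse_pending = 1;   /* toy: ask for one reparse */

/* Hypothetical stand-in for parsing one field. The first time field 2
 * is reached it behaves as if the user had erased back into field 1,
 * which the library would signal with longjmp(). */
static void parse_field(int field)
{
    if (field == 2 && reparse_pending) {
        reparse_pending = 0;
        longjmp(reparse_point, 1);
    }
}

/* Parse one two-field line; returns how many times the parse was
 * started from the reparse point. */
int parse_line(void)
{
    volatile int starts = 0;      /* volatile: must survive longjmp */

    /* (the initialization call would be made here) */
    setjmp(reparse_point);        /* the reparse point */
    starts++;
    /* (reset anything earlier fields may have changed) */

    parse_field(1);
    parse_field(2);
    return starts;
}
```

The first call to parse_line() starts the parse twice, because
of the simulated reparse; a later call starts it only once.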
Every call to the COMND() subroutine involves two arguments: a
command state block, which keeps track of the parse state,
and a command function block, which describes what sort of
thing to parse next. The command state block is given a
structure called "CSBs", and a typedef called "CSB". Each
element of the structure is named with a form "CSB_xxx", where
"xxx" is representative of the element's purpose. The
following are the elements of the command state block, in the
order that they appear in the structure.
o CSB_PFL is a BYTE. This contains flags which are set by the
caller to indicate specifics of the command processing.
These flags are:
o _CFNEC: Do not echo user input.
o _CFRAI: Convert lowercase input to uppercase.
o CSB_RFL, a BYTE value, contains flags which are kept by the
library in the performance of the parse. Generally, these
flags are of no interest to the caller since their
information can be gleaned from the result code of the
COMND() call. However, they are:
o _CFNOP: No parse. Nothing matched, i.e., an error
occurred.
o _CFESC: Field terminated by escape.
o _CFEOC: Field terminated by CR.
o _CFRPT: Reparse required.
o _CFSWT: Switch ended with colon.
o _CFPFE: Previous field terminated with escape.
o CSB_RSB is the address of a setjmp buffer describing the
environment at the reparse point. If this value is
non-NULL, then if a reparse is required, a longjmp()
operation is performed using this setjmp buffer.
o CSB_INP is the address of the input-character routine to
use. If this value is non-NULL, then this routine is called
to get each character of input. No line editing or special
interactive characters are recognized in this mode, since it
is assumed that this will be used for file input. Note
especially: this facility is not yet implemented, however
the definition is provided for future expansion. Thou shalt
always leave this NULL, or write the facility thyself.
o CSB_OUT is the inverse correspondent to the previous element
(CSB_INP). It is the address of a routine to process output
from the command library. Please see the warning in the
CSB_INP description about not being implemented.
o CSB_PMT is the address of the prompt string to use for
command parsing. The command library takes care of
prompting, so make sure this is filled in.
o CSB_BUF is the address of the buffer to put user input into
as s/he is typing it in.
o CSB_BSZ, an int, is the number of bytes which can be stored
in CSB_BUF; i.e., it is the buffer size.
o CSB_ABF is the address of an atom buffer. Some (if not all)
parsing functions involve extracting some number of
characters from the input buffer and interpreting or simply
returning this extracted string. This buffer is necessary
for those operations. It should probably be as large as the
input buffer (CSB_BUF), but it is really up to you.
o CSB_ASZ, an int, is the number of characters which can be
stored in CSB_ABF; i.e., it is the size of that buffer.
** Note ** CSB elements from here to the end do not have to
be initialized by the calling program. They are used to
store state information and are initialized as required by
the library.
o CSB_PRS, an int, contains the parse index. This is the
point in the command buffer up to which parsing has been
achieved.
o CSB_FLN, an int, is the filled length of the command
buffer. This is the number of characters which have been
typed by the user.
o CSB_RCD, an int, is a result code of the parse. This is the
same value which is returned as the result of the COMND()
procedure call.
o CSB_RVL is a union which is used to contain either an int or
a long value. The names of the union elements are: _INT
for int, _ADR for address (note that a typecast should be
used for proper address assignment). This element contains
the value returned by those parse functions which return a
single value. For example, if an integer is parsed, its value
is returned here.
o CSB_CFB is the address of a command function block for which
a parse was successful. This is significant in cases where
there are alternative possible interpretations of the next
command element.
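As a rough picture of the block just described, here is a sketch
of a CSB and its setup. The real definitions are in "comnd.h";
everything here is an assumption reconstructed from the list
above: the definition of BYTE, the pointer and function types,
the buffer sizes, and the helper csb_setup(), which is
hypothetical.

```c
#include <setjmp.h>
#include <stddef.h>

typedef unsigned char BYTE;       /* assumed definition of BYTE */
struct CFBs;                      /* command function block */

typedef struct CSBs {
    BYTE     CSB_PFL;             /* flags set by the caller */
    BYTE     CSB_RFL;             /* flags kept by the library */
    jmp_buf *CSB_RSB;             /* reparse setjmp buffer, or NULL */
    int    (*CSB_INP)(void);      /* input routine (leave NULL) */
    int    (*CSB_OUT)(int);       /* output routine (leave NULL) */
    char    *CSB_PMT;             /* prompt string */
    char    *CSB_BUF;             /* user input buffer */
    int      CSB_BSZ;             /* size of CSB_BUF */
    char    *CSB_ABF;             /* atom buffer */
    int      CSB_ASZ;             /* size of CSB_ABF */
    /* From here on, the library does the initializing. */
    int      CSB_PRS;             /* parse index */
    int      CSB_FLN;             /* filled length */
    int      CSB_RCD;             /* result code */
    union { int _INT; long _ADR; } CSB_RVL;   /* returned value */
    struct CFBs *CSB_CFB;         /* CFB which matched */
} CSB;

static char cmdbuf[128];          /* sizes are arbitrary choices */
static char atmbuf[128];

/* Hypothetical helper: fill in the caller-supplied elements. */
void csb_setup(CSB *csb, char *prompt)
{
    csb->CSB_PFL = 0;             /* echo on, no case raising */
    csb->CSB_RSB = NULL;          /* use result codes for reparse */
    csb->CSB_INP = NULL;
    csb->CSB_OUT = NULL;
    csb->CSB_PMT = prompt;
    csb->CSB_BUF = cmdbuf;
    csb->CSB_BSZ = (int)sizeof cmdbuf;
    csb->CSB_ABF = atmbuf;
    csb->CSB_ASZ = (int)sizeof atmbuf;
}
```

A caller would do something like csb_setup(&csb, "TEST> ") once,
before the first initialization call.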
The parse of each element in a command line involves, as well
as the Command State Block just described, a Command Function
Block which identifies the sort of thing to be parsed. This
block is defined in a structure named "CFBs", which has a
corresponding typedef named "CFB". Elements of the CFB, named
"CFB_xxx", are as follows (in the order they appear in the
structure):
o CFB_FNC, a BYTE, is the function code. This defines the
function to be performed. The function codes are listed,
and their actions described, a little later.
o CFB_FLG, a BYTE, contains flags which the caller specifies
to the library. These are very significant, and in most
cases affect the presentation to the user. The flag bits
are:
o _CFHPP: A help string has been supplied and should be
given when the user types the help character ("?").
o _CFDPP: A default string has been supplied, and shall be
used if the user does not type anything at this point
(typing nothing means typing a return or requesting
command completion). Note that this flag (and the
default string) is ONLY significant for the CFB passed in
the call to the COMND() routine, and not for any others
referenced as alternatives by that CFB.
o _CFSDH: The default help message should be suppressed if
the user types the help character ("?"). This is
normally used in conjunction with the _CFHPP flag.
However, if this flag is present and the _CFHPP is not
selected, then the help operation is inhibited, and the
help character becomes insignificant (just like any other
character).
o _CFCC: A character characteristic table has been
provided. A CC table identifies which characters may be
part of the element being recognized. Not all functions
support this table (for example, it does not make sense
to re-specify which characters may compose decimal
numbers). This table also specifies which characters are
break characters, causing the parser to "wake up" the
calling program when one of them is typed. If this bit
is not set (as is usually the case), a default table is
associated according to the function code.
o _CFDTD: For parsing date and time, specifies that the
date should be parsed.
o _CFDTT: For parsing date and time, specifies that the
time should be parsed.
o CFB_CFB is the address of another CFB which may be invoked
if the user input does not satisfy this CFB. CFBs may be
chained in this manner at will. Recognize, however, that
the ORDER of the chain plays an important part in how input
is handled, particularly in disambiguation of input. Note
also that only the first CFB of the chain is used for
specifying a default string and CC table (for command
wake-up).
CFB chaining is a very important part of parsing with this
library.
o CFB_DAT is defined as a long, since it is used to contain
address or int values. It should be referenced via
typecast. It is not defined as a union because it is
inconvenient or impossible to initialize unions at compile
time with most (all?) C compilers, and initialization of
these blocks at runtime is not desirable. This element
contains data used in parsing of a field in the command
line. For instance, in parsing an integer, the caller
specifies the default radix of the integer here.
o CFB_HLP is the address of a caller-supplied help string.
This is only significant if the flag bit _CFHPP is set in
the CFB_FLG byte.
o CFB_DEF is the address of a caller-supplied default string.
This is only significant if the flag bit _CFDPP is set in
the CFB_FLG byte, and only for the first CFB in the CFB
chain.
o CFB_CC is the address of a character characteristics table.
This is only significant if the flag bit _CFCC is set in the
CFB_FLG byte. This is the address of a 16-word table, each
word containing 16 bits which are interpreted as 8 2-bit
characteristic entries. The most significant bits
correspond to the lower ASCII values, etc. The 2-bit binary
value has the following meaning, per character:
o 00: Character may not be part of the element being
parsed.
o 01: Character may be part of the element only if it is
not the first character of that element.
o 02: Character may be part of the element.
o 03: Character may not be part of the element;
furthermore, when it is typed, it will cause parsing to
begin immediately (a wake-up character).
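The layout of the CC table can be illustrated with a pair of
small routines. This is only a sketch: cc_set() and cc_lookup()
are hypothetical names, the table is taken to be unsigned short,
and I am assuming that word 0 covers ASCII codes 0 through 7,
word 1 covers 8 through 15, and so on, with the lowest code of
each group in the two most significant bits.

```c
/* Store a 2-bit characteristic (0..3) for ASCII character c. */
void cc_set(unsigned short table[16], int c, unsigned value)
{
    int shift;

    c &= 0x7F;                      /* 7-bit ASCII only */
    shift = 14 - 2 * (c & 7);       /* lowest code in the top bits */
    table[c >> 3] = (unsigned short)
        ((table[c >> 3] & ~(3u << shift)) | ((value & 3u) << shift));
}

/* Fetch the 2-bit characteristic (0..3) for ASCII character c. */
unsigned cc_lookup(const unsigned short table[16], int c)
{
    c &= 0x7F;
    return (table[c >> 3] >> (14 - 2 * (c & 7))) & 3u;
}
```

With this layout, marking the NUL character as a wake-up
character (value 3) sets the top two bits of the first word.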
The function code in the CFB_FNC element of the command
function block specifies the operation to be performed on
behalf of that function block. Functions are described now.
CFB function _CMINI: Initialize
Every parse of a command line must begin with an
initialization call. This tells the command library to reset
its indexes, that the user must be prompted, etc. Do not chain
any other CFBs to this one; if any are chained, they are
ignored.
The reparse point is the point right after this call. If
the setjmp method is used, then the setjmp environment should
be defined here. After the reparse point, any variables etc
which may be the victims of parsing side-effects should be
initialized.
CFB function _CMKEY: Keyword parse
_CMKEY parses a keyword from a given list. The CFB_DAT
element of the function block should point to a table of string
pointers, ending with a NULL pointer. The user may type any
unique abbreviation of a keyword, and may use completion to
fill out the rest of a known match. The address of the pointer
to the matching string is returned in the CSB_RVL element of
the command state block. The value is returned this way so
that the index can be easily calculated, and because it is
consistent with the general keyword parsing mechanism
(_CMGSK).
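To make the table format and the index calculation concrete,
here is a sketch. The table contents and the matcher kwd_match()
are hypothetical; the matcher only imitates unique-abbreviation
matching. It is case-sensitive and lets an exact match win over
longer keywords, which may or may not be what the library does.

```c
#include <stddef.h>
#include <string.h>

/* A keyword table in the form described: string pointers, ending
 * with a NULL pointer. */
char *kwd_table[] = { "DELETE", "DIRECTORY", "EXIT", "HELP", NULL };

/* Return the address of the table entry which `typed` uniquely
 * abbreviates, or NULL if nothing matches or the abbreviation is
 * ambiguous. */
char **kwd_match(char **table, const char *typed)
{
    size_t n = strlen(typed);
    char **hit = NULL;
    char **p;
    int matches = 0;

    for (p = table; *p != NULL; p++) {
        if (strncmp(*p, typed, n) != 0)
            continue;               /* not a prefix of this keyword */
        if ((*p)[n] == '\0')
            return p;               /* exact match */
        hit = p;
        matches++;
    }
    return matches == 1 ? hit : NULL;
}
```

Because the address of the table entry is what comes back, the
keyword's index is just the pointer difference, for example
(int)(kwd_match(kwd_table, "E") - kwd_table).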
The incremental help associated with keyword parsing is
somewhat special. The default help string is "Keyword, one
of:" followed by a list of keywords which match anything
already typed. If a help string has been supplied (indicated
by _CFHPP) and no suppression of the default help is specified,
then the initial part ("Keyword, ") is replaced with the
supplied help string and the help is otherwise the same. If a
help string has been supplied and the default has been
suppressed, then the given help string is presented unaltered.
CFB function _CMNUM: number
This parses a number. The caller supplies a radix in the
CFB_DAT element of the function block. The number parsed is
returned (as an int) in the CSB_RVL element of the state
block.
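What a parse with a caller-supplied radix amounts to can be
sketched with the standard strtol() routine. This is not the
library's code: num_parse() and its return values are
hypothetical, and the accepted radix range (2 through 36) is
simply what strtol() supports, not necessarily what COMND
accepts.

```c
#include <stdlib.h>

/* Parse all of `text` as a number in the given radix.
 * Returns 0 and stores the value on success, -1 for a radix
 * which strtol() cannot handle, -2 if the text does not parse. */
int num_parse(const char *text, int radix, int *value)
{
    char *end;
    long v;

    if (radix < 2 || radix > 36)
        return -1;                  /* compare _CRBAS */
    v = strtol(text, &end, radix);
    if (end == text || *end != '\0')
        return -2;                  /* compare _CRNOP */
    *value = (int)v;
    return 0;
}
```

For instance, "777" parsed with a radix of 8 yields 511, while
"9" with a radix of 8 does not parse at all.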
CFB function _CMNOI: guide word string
This function parses a guide word string (noise words).
Guide words appear in parentheses between significant parts of
the command line. They do not have to be
typed, but if they are, they must match what is expected. If
the previous field ended with command completion, then the
guide words are shown automatically by the parser.
An interesting use of guide word strings is to provide
alternate sets with the command chaining feature. The parse
(and program) flow can be altered depending on which string was
matched.
CFB function _CMCFM: confirmation
A confirmation is a carriage return. The caller should
parse a confirmation as the last thing before processing what
was parsed. Since carriage return is by default a wake-up
character, requiring a confirmation will (if you don't change
this wake-up attribute) require that the parse be completed
with no extra characters typed. A parse with this function
code returns only a status.
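A confirmation is often chained as an alternative to some other
field, so that the field becomes optional. The sketch below
shows such a chain and how a program might tell which block
matched. It is an illustration only: the CFB layout is
reconstructed from the description earlier in this document, the
function code values and the type of CFB_CC are assumptions, and
toy_accepts() and chain_match() are hypothetical stand-ins for
the matching which COMND() does on real input.

```c
#include <ctype.h>
#include <stddef.h>

typedef unsigned char BYTE;           /* assumed definition */
enum { _CMNUM = 1, _CMCFM = 2 };      /* values are assumptions */

typedef struct CFBs {
    BYTE  CFB_FNC;                    /* function code */
    BYTE  CFB_FLG;                    /* flags */
    struct CFBs *CFB_CFB;             /* next alternative, or NULL */
    long  CFB_DAT;                    /* function-specific data */
    char *CFB_HLP;                    /* help string */
    char *CFB_DEF;                    /* default string */
    unsigned short *CFB_CC;           /* CC table (type assumed) */
} CFB;

/* "A decimal number, or else just a carriage return." */
CFB cfm_cfb = { _CMCFM, 0, NULL,     0L,  NULL, NULL, NULL };
CFB num_cfb = { _CMNUM, 0, &cfm_cfb, 10L, NULL, NULL, NULL };

/* Toy matching: a confirmation accepts empty text, a number
 * accepts a non-empty run of decimal digits. */
static int toy_accepts(const CFB *cfb, const char *text)
{
    const char *p;

    if (cfb->CFB_FNC == _CMCFM)
        return *text == '\0';
    if (cfb->CFB_FNC != _CMNUM || *text == '\0')
        return 0;
    for (p = text; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return 0;
    return 1;
}

/* Try each CFB of the chain in order; return the first which
 * accepts the text (compare CSB_CFB), or NULL if none does. */
CFB *chain_match(CFB *first, const char *text)
{
    CFB *cfb;

    for (cfb = first; cfb != NULL; cfb = cfb->CFB_CFB)
        if (toy_accepts(cfb, text))
            return cfb;
    return NULL;
}
```

The caller compares the returned address with &num_cfb and
&cfm_cfb to decide which way the parse went, in the same way it
would inspect CSB_CFB after a successful COMND() call.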
CFB function _CMGSK: General storage keyword
This call provides for parsing of one of a set of keywords
which are not arranged in a table. Often, keywords are
actually stored in a file or in a linked list. The caller
fills in the CFB_DAT element of the command function block with
the address of a structure named CGKs (typedef CGK), which
contains the following elements:
o CGK_BAS: A base address to give to the fetch routine. Does
not matter what this is, as long as the fetch routine
understands it.
o CFK_CFR: The address of a keyword fetch routine. The
routine is called with the CGK_BAS value, and the address of
the pointer to the previous keyword. It is expected to
return the address of the pointer to the next keyword, or
to the first one if the passed value for the previous
pointer is NULL.
When this function completes successfully, it returns
the address of the pointer to the string in the CSB_RVL
element in the command state block. Please see the
description of the _CMKEY function code for a description of
help and other processing.
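A fetch routine for keywords kept in a linked list might look
like the sketch below. The node structure and the name
kw_fetch() are hypothetical, the base is taken to be a void
pointer to the list head, and returning NULL at the end of the
list is my assumption; the text above does not say how the end
is signalled.

```c
#include <stddef.h>

/* Hypothetical keyword storage. The keyword pointer is the first
 * member, so the address of that pointer is also the address of
 * the node which holds it. */
struct kwnode {
    char *name;
    struct kwnode *next;
};

/* base: the list head. prev: the address of the pointer to the
 * previous keyword, or NULL to ask for the first one. Returns the
 * address of the pointer to the next keyword, or NULL when the
 * list is exhausted. */
char **kw_fetch(void *base, char **prev)
{
    struct kwnode *node;

    if (prev == NULL)
        node = (struct kwnode *)base;
    else
        node = ((struct kwnode *)prev)->next;
    return node != NULL ? &node->name : NULL;
}
```

Calling kw_fetch(head, NULL) and then feeding each result back
in as `prev` walks the whole list, one keyword at a time.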
CFB function _CMSWI: Parse a switch.
This is functionally equivalent to _CMKEY, and exists to
fill a need for switch parsing. Basically it is a placeholder
for an unimplemented function.
CFB function _CMTXT: Rest of line
This function parses the text to the end of the line.
Note that this does not parse the trailing break character
(i.e. the carriage return). The text is returned in the atom
buffer which is defined (by the caller) by the CSB_ABF and
CSB_ASZ elements of the command state block.
CFB function _CMTOK: token
This function will parse an exact match of a particular
token. A token is a string of characters, whose address is
supplied by the caller in the CFB_DAT element of the command
function block. This function is mainly useful for parsing
such things as commas and other separators, especially where it
is one of several alternative parse functions. It returns no
value other than its status.
CFB function _CMUQS: unquoted string
This function parses an unquoted string, consisting of any
characters other than spaces, tabs, slashes, or commas. This
set may of course be changed by supplying a CC table. The
unquoted string is returned in the atom buffer associated with
the command state block.
CFB function _CMDAT: parse date/time
This function parses a date and/or time. The caller
specifies, via flag bits in the CFB_FLG byte of the command
function block (as identified above) which of date, time, or
both, are to be parsed. The date and time are returned as the
first two ints in the atom buffer which is associated with the
command state block. Note that both date and time are
returned, regardless of which were requested.
Note further that this routine is not fully implemented as
of this writing.
Calling the COMND library
All that you need to know to use the above information is
how to call the command library. Basically, there is one
support routine: COMND(). It is used like this:
status = COMND (csbp, cfbp);
Here, "csbp" is the address of the command state block,
and "cfbp" is the address of the command function block. The
COMND() routine returns an int status value, which is one of
the following:
o _CROK: The call succeeded; a requested function was
performed. The address of the matching function block is
returned in the CSB_CFB element of the command state block,
and other information is returned as described above.
o _CRNOP: The call did not succeed; nothing matched.
o _CRRPT: The call did not succeed because the user took back
some of what had already been parsed. In other words, a
reparse is required, and your program must back up to the
reparse point. Note that if you specify a setjmp buffer
address in the CSB_RSB element of the command state block,
you will never see this value because the COMND library will
execute a longjmp() operation using that setjmp buffer.
o _CRIFC: The call failed because you provided an invalid
function code in the command function block (or in one which
is chained to it). You have made a programming error.
o _CRBOF: Buffer overflow. The atom buffer is too small to
contain the parsed field.
o _CRBAS: Invalid radix for number parse.
o _CRAGN: You should not see this code. It is reserved for a
support-mode call to the subroutine library.
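One way a caller might sort these status values into actions is
sketched below. The numeric values of the codes are assumptions
(the real ones come from "comnd.h"), and the action names and
on_status() are hypothetical.

```c
/* Status codes; the values here are assumptions. */
enum { _CROK, _CRNOP, _CRRPT, _CRIFC, _CRBOF, _CRBAS, _CRAGN };

/* What the caller does next. */
enum action {
    ACT_CONTINUE,     /* go on to the next field */
    ACT_REPARSE,      /* go back to the reparse point */
    ACT_REPORT,       /* complain to the user, start a new line */
    ACT_BUG           /* programming error in the caller */
};

enum action on_status(int status)
{
    switch (status) {
    case _CROK:
        return ACT_CONTINUE;
    case _CRRPT:
        return ACT_REPARSE;
    case _CRNOP:          /* nothing matched */
    case _CRBOF:          /* field too long for the atom buffer */
        return ACT_REPORT;
    default:              /* _CRIFC, _CRBAS, _CRAGN, anything else */
        return ACT_BUG;
    }
}
```

_CRBAS is treated as a programming error here because the radix
is supplied by the caller in CFB_DAT, not typed by the user.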
Installing the COMND library
This part of the document describes the modules which come
with the COMND library kit, and what you might have to look at
if the code does not instantly work on your system (which will
probably be the case if your system is not the same kind as the
one which you got it from).
The files which come in the COMND kit are as follows:
o COMND.R - Source for this document, in a form suitable for
the public domain formatting program called "roff4".
o COMND.DOC - This document.
o MEM.H - A file of my (Mark Mallett) definitions which are
used by the code in the command subroutine library.
o COMND.H - Command library interface definitions.
o COMNDI.H - Command library implementation definitions.
o COMND.C - Primary module of the COMND library. Contains
user input buffering and various library support routines.
o CMDPF1.C - First module of parse function processing
routines.
o CMDPF2.C - Second module of parse function processing
routines.
o CMDPFD.C - Contains the date/time parse function routines.
This is included in a separate module so that it can be
replaced with a stub, since few programs (that I have
written, anyway) use this function, and it does take up a
bit of code.
o CMDPSD.C - A stub for the date/time parsing functions. This
can be linked with programs which do not actually use the
date/time parse function.
o CMDOSS.CPM - Operating system specific code which works for
CP/M. This is provided as a model for the routines which
you will have to write for your system.
o CMDDTM.CPM - Date/time support routines for version 3.0 of
CP/M. This is a module containing routines to get the date
and time from the operating system, and to encode/decode
these values to and from internal form. This is provided as
a model; you will probably have to rewrite them for your
system.