Tokenising BBC BASIC code
=========================
J.G.Harston, 25-Nov-1997

BBC BASIC programs are tokenised, that is, BASIC keywords are stored as one
or two byte values. This results in programs that execute faster and are
more compact. A tokenised line can easily be detokenised, or expanded, as
there is a one-to-one mapping between token values and the expanded string.
For example, code similar to the following would expand a tokenised line:

    quote%=FALSE
    REPEAT
      IF ?addr%<128 OR quote% THEN VDU ?addr% ELSE PRINT token$(?addr%);
      IF ?addr%=34 quote%=NOT quote%
      addr%=addr%+1
    UNTIL ?addr%=13

Tokenising, however, is more fiddly. Keywords can be abbreviated on entry,
and characters are only tokenised in certain parts of the line. For
instance, in the following line:

    ON NOON GOTO 1,2

the first 'ON' is the token ON, but the second 'ON' is part of the variable
'NOON'. The second 'ON' must be left untokenised.

EVAL tokenises the supplied string and evaluates it as an expression.
Usefully, the tokenised string can be retrieved from where BASIC has stored
it.

In 6502 BASIC:
    A%=EVAL("0:"+A$)
    token$=$((!4 AND &FFFF)-LENA$-1)

In Z80 BASIC:
    A%=EVAL("0:"+A$)
    token$=$(string_buffer)

In 32000 BASIC:
    A%=EVAL("0:"+A$)
    token$=$(!&1B2+2)

In PDP11 BASIC:
    A%=EVAL("0:"+A$)
    token$=$(^@%-510)

In ARM BASIC:
    SYS "XOS_GenerateError",0,STRING$(255,"*") TO ,A%
    A%!-36=0:B%=EVAL("0:"+A$)
    token$=$(A%-14+4*(A%!-36<>0))

In DOS BASIC:
    B%=EVAL("0:"+A$)
    token$=$&102

In Windows BASIC:
    B%=EVAL("0:"+A$)
    token$=$(!332+2)

By preceding the code you want to tokenise with "0:" you can safely pass it
to EVAL without provoking a Syntax error. You can then extract the tokenised
code from memory, so long as you do it immediately after calling EVAL.

In later versions of ARM BASIC the stack has an extra word on it and the
string is stored lower in memory. In Z80 BASIC the string buffer is in a
different location in different versions.
When machine code is entered with CALL or USR, IX is set pointing to the
string buffer, and this can be used to find it. The Z80 function below does
this with a five-byte routine, PUSH IX:POP HL:EXX:RET, which it calls once
through USR.

These methods can be written as functions as follows:

    DEFFNTokenise_65(A$):LOCAL A%,B%
    A%=(!4AND&FFFF)-LENA$-1
    B%=EVAL("0:"+A$):=$A%
    :
    DEFFNTokenise_Z80(A$):LOCAL A%,P%:Tokenise_Z80%=Tokenise_Z80%
    IF Tokenise_Z80%=0:DIM A% 4:!A%=&D9E1E5DD:A%?4=&C9:Tokenise_Z80%=USRA%
    A%=EVAL("0:"+A$):=$(Tokenise_Z80%-254)
    :
    DEFFNTokenise_32(A$):LOCAL A%
    A%=EVAL("0:"+A$):=$(!&1B2+2)
    :
    DEFFNTokenise_PDP(A$):LOCAL A%
    A%=EVAL("0:"+A$):=$(^@%-510)
    :
    DEFFNTokenise_ARM(A$):LOCAL A%,B%
    SYS "XOS_GenerateError",0,STRING$(255,"*") TO ,A%
    A%!-36=0:B%=EVAL("0:"+A$):=$(A%-14+4*(A%!-36<>0))
    :
    DEFFNTokenise_DOS(A$):LOCAL A%
    A%=EVAL("0:"+A$):=$&102
    :
    DEFFNTokenise_Win(A$):LOCAL A%
    WHILELEFT$(A$,1)=" ":A$=MID$(A$,2):ENDWHILE
    A%=EVAL("0:"+A$):=$(!332+2)

These functions are used in full in the 'Tokenise' BASIC library at
http://mdfs.net/System/Library/BLib.

References
----------
Richard Russell, "Using the tokeniser", yahoogroups.com/group/bb4w
message 86

History
-------
28-Apr-2009: Updated for ARM BASIC 1.30+, Z80 and DOS.