Storing Acorn/BBC metadata on other systems
===========================================
 File: Docs.Comp.BBC.Filing.Metadata - Update 0.10
 Author: J.G.Harston - Date 26-07-2012

Introduction
============
Acorn/BBC filing systems store metadata (data about the data) with the file
data. To be useful, every filing system on every platform needs at least two
pieces of metadata: the file name and the file length. Acorn filing systems
have additional metadata, the minimum being a file's load address and
execution address. Some filing systems and file storage systems hold further
metadata beyond these.

Metadata
========
Associated with any stored object is the following object metadata:

                            OSFILE block
    filename                XY+0..1
  The standard metadata:
    object type             A            1 byte
    load address            XY+2..5      4 bytes
    execution address       XY+6..9      4 bytes
    object length           XY+10..13    4 bytes
    access byte             XY+14        1 byte
    modification date       XY+15..16    2 bytes
  Additional metadata:
    modification time                    3 bytes
    creation date                        2 bytes
    creation time                        3 bytes
    user account number                  2 bytes
    auxiliary account number             2 bytes

These are typically abbreviated to:

  name type load exec length access mdate mtime cdate ctime acc aux

When dealing with just the OSFILE metadata the access byte and modification
date are often dealt with as a single attributes item, attr.

The most common methods of storing Acorn/BBC metadata on other systems are
in ZIP files and in INF files.

ZIP files
=========
ZIP files are an archive file format designed to be usable across different
platforms and extendable to store additional file metadata.
Objects stored in a ZIP file have a header with the object's metadata, and
the "Acorn" field stores Acorn metadata:

  ZIP header  Contents                          Size     Acorn name
  0           file header id "PK",&03,&04       4 bytes
  4           version needed to extract         2 bytes
  6           general purpose bit flag          2 bytes
  8           compression method                2 bytes
  10          last modified time in DOS format  2 bytes  mtime
  12          last modified date in DOS format  2 bytes  mdate
  14          crc-32                            4 bytes
  18          compressed size (l)               4 bytes
  22          uncompressed size                 4 bytes  length
  26          filename length (n)               2 bytes
  28          extra field length (e)            2 bytes
  30          filename                          n bytes  name+type
  30+n        extra field                       e bytes
   30+n+0      extra header id "AC"             2 bytes
   30+n+2      extra header sublength           2 bytes
   30+n+4      Acorn header id "ARC0"           4 bytes
   30+n+8      load address                     4 bytes  load
   30+n+12     execution address                4 bytes  exec
   30+n+16     attributes                       4 bytes  access/attr
   30+n+20     &00000000                        4 bytes
   30+n+24     creation time                    2 bytes  ctime
   30+n+26     creation date                    2 bytes  cdate
   30+n+28     main account number              2 bytes  acc
   30+n+30     auxiliary account number         2 bytes  aux
  30+n+e      data                              l bytes
  30+n+e+l    ...

Times and dates are stored in DOS format. The filename is stored in Unix
format: directories are separated by '/'s. Directory entries have '/' at the
end of their filename and compressed and uncompressed sizes of zero.

ZIP filenames and BBC filenames can be converted bidirectionally by swapping
the following characters:

   /  <->  .
   ?  <->  #
   $  <->  <
   ^  <->  >
   =  <->  @
   &  <->  +
   %  <->  ;

As ZIP files are extracted onto DOS/Windows platforms with no transformation
other than using '\' for the DOS/Windows directory separator, this is also
the mapping to use when converting between BBC and DOS/Windows filenames.
When extracting a filename with spaces in it the space character is usually
converted to '_', but note that this creates a non-bidirectional character
mapping.

The extra field holds only the information covered by the extra header
sublength. On extraction, only that data present in the extra field should
be written to the extracted files or directories.
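As an illustration of reading only what the extra field actually contains,
here is a minimal Python sketch. It assumes the offsets in the table above
and the little-endian byte order that ZIP uses for its own header fields.
The names parse_acorn_extra and ACORN_FIELDS are invented for this example
and do not come from any of the applications listed in this document.

```python
import struct

# (name, offset within the "AC" block data, struct format), from the table above.
# Offsets are relative to the Acorn header id "ARC0" at 30+n+4.
ACORN_FIELDS = [
    ("load", 4, "<I"),
    ("exec", 8, "<I"),
    ("attr", 12, "<I"),
    ("ctime", 20, "<H"),
    ("cdate", 22, "<H"),
    ("acc", 24, "<H"),
    ("aux", 26, "<H"),
]

def parse_acorn_extra(extra: bytes) -> dict:
    """Return the Acorn fields present in a ZIP extra field, or {} if none."""
    pos = 0
    while pos + 4 <= len(extra):
        block_id = extra[pos:pos + 2]
        (sublen,) = struct.unpack_from("<H", extra, pos + 2)
        data = extra[pos + 4:pos + 4 + sublen]
        pos += 4 + sublen
        if block_id != b"AC" or data[:4] != b"ARC0":
            continue  # some other extra block, skip it
        found = {}
        for name, offset, fmt in ACORN_FIELDS:
            # Only fields that fit inside the stated sublength are read.
            if offset + struct.calcsize(fmt) <= len(data):
                (found[name],) = struct.unpack_from(fmt, data, offset)
        return found
    return {}
```

A field missing from the returned dictionary was not stored in the archive,
so the caller can leave it unset or substitute a default.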
If there is no extra field, or a field is missing from it, the missing
metadata should be ignored or a suitable default used:

  If omitted:           Use:
  load address          0
  exec address          load address
  access byte           &33
  creation time         modification time
  creation date         modification date
  user account          do not set
  auxiliary account     user account

  Application   Supported metadata
  -------------------------------
  Archive       type load exec length access mdate
  BBCZip        type load exec length access mdate mtime cdate ctime acc aux
  DSUnzip       type load exec length access mdate
  InfoZip       type load exec length access
  Spark         type load exec length access
  SprowUnZip    type load exec length access
  ZipToInf      type load exec length access mdate mtime cdate ctime acc aux

INF Files
=========
INF files store Acorn/BBC metadata on filing systems that otherwise cannot
store it. An INF file is a text file with the same name as the data file,
with a '.inf' or equivalent filename extension. It contains several
space-separated fields with the file's filename followed by the file's
metadata in upper case hexadecimal in the following order:

  name load exec length access mdate mtime cdate ctime acc aux

for example:

  MyFileName FFFF1900 FFFF8023 00001273 33 7B23 123106 7B20 112708 0100 0040

The most common INF files just have the first four fields:

  MyFileName FFFF1900 FFFF8023 00001273

or just the first three fields:

  MyFileName FFFF1900 FFFF8023

The date and time fields are in filing system format. The type of the object
is determined by the type of the object the INF file refers to.

When output, the INF line should be rigidly formatted: the filename
left-justified and padded with spaces to 11 characters, wider only if the
filename is longer than ten characters, exactly one space between the data
fields, and the data fields with the full number of digits - 8 digits for
the addresses, 2 digits for the access byte, 4 digits for dates, 6 digits
for times. The line should end with CHR$13+CHR$10, a DOS end-of-line
sequence. When reading, extra spaces and extra fields must be ignored
silently.
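The output rules above can be sketched in Python as follows. This is only an
illustration: format_inf_line and INF_WIDTHS are names made up for the
example, the name column is assumed to mean "left-justified in ten columns
plus one separating space", and the widths for the length and account
fields are taken from the example line rather than from the rule itself.

```python
# Hex digits per field, in INF order (field names are this document's abbreviations).
INF_WIDTHS = [("load", 8), ("exec", 8), ("length", 8), ("access", 2),
              ("mdate", 4), ("mtime", 6), ("cdate", 4), ("ctime", 6),
              ("acc", 4), ("aux", 4)]

def format_inf_line(name, values):
    """Format one INF line; values holds integers in INF field order.

    Passing fewer than ten values drops fields from the right-hand end.
    """
    fields = [f"{value:0{width}X}"
              for (_, width), value in zip(INF_WIDTHS, values)]
    # Assumption: name padded to 10 columns, then one space before the data.
    return name.ljust(10) + " " + " ".join(fields) + "\r\n"

line = format_inf_line("FRED", [0xFFFF1900, 0xFFFF8023])
```

Names longer than ten characters simply push the data fields to the right,
still separated from the name by a single space.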
Fields may have leading zeros omitted. If a field is omitted, the
application must ignore it; the reader must not report an error. As a
special case, the access field can be a string starting with "L" which
should be converted to "19" for Locked on DFS.

Any fields can be progressively dropped from the right-hand end of the
line, for example:

  MyFileName FFFF1900 FFFF8023
  MyFileName FFFF1900 FFFF8023 1273
  MyFileName FFFF1900 FFFF8023 1273 33 7B23 123106

On extraction, only that data present in the INF file should be written to
the extracted files or directories. Omitted fields are handled as follows:

  If omitted:           Use:
  load address          0
  exec address          load address
  access byte           &33
  modification date     do not set
  modification time     do not set
  creation date         modification date
  creation time         modification time
  user account          do not set
  auxiliary account     user account

INF files sometimes have two additional fields at the end of the line, CRC=
and BOOT=, for instance:

  FRED       FFFF1900 FFFF8023 CRC=1234 BOOT=1
  JIM        FFFF1900 FFFF8023 BOOT=2
  SHEILA     FFFF1900 FFFF8023 CRC=3456

On writing, they must be appended to the end of the line. If both fields are
written they must be in the order CRC= BOOT=. On reading they should be
assumed to be at the end of the line if they are present.
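The reading rules above can be sketched in Python as follows. This is a
minimal illustration under stated assumptions, not a reference
implementation: parse_inf_line, FIELD_NAMES and FALLBACKS are names invented
here, the line is assumed to be non-empty, and the CRC= and BOOT= values are
kept as uninterpreted strings because their encoding is not specified above.

```python
FIELD_NAMES = ["load", "exec", "length", "access", "mdate",
               "mtime", "cdate", "ctime", "acc", "aux"]
# "If omitted, use" entries that copy another field (from the table above).
FALLBACKS = {"exec": "load", "cdate": "mdate", "ctime": "mtime", "aux": "acc"}

def parse_inf_line(line):
    """Parse one INF line into a dict; omitted fields with no default are left out."""
    tokens = line.split()            # split() also swallows extra spaces and CR/LF
    meta = {"name": tokens[0]}
    values = []
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if sep and key in ("CRC", "BOOT"):
            meta[key.lower()] = value
        else:
            values.append(token)
    for key, token in zip(FIELD_NAMES, values):   # zip ignores extra fields
        if key == "access" and token.startswith("L"):
            meta[key] = 0x19                      # "L..." means Locked on DFS
        else:
            meta[key] = int(token, 16)            # leading zeros are optional
    meta.setdefault("load", 0)
    meta.setdefault("access", 0x33)
    for key, source in FALLBACKS.items():
        if key not in meta and source in meta:
            meta[key] = meta[source]
    return meta

info = parse_inf_line("MyFileName FFFF1900 FFFF8023 1273 33 7B23 123106")
```

Fields whose default is "do not set" are simply absent from the result, so
the caller can test for them with "in" before writing them to the file.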
Being space-separated fields, INF files can be parsed with the same code
used to parse a command line, for example, using the CmdLine library:

  A$=inf$
  REM Remove any BOOT= and CRC= items from the end of the line
  opt$="":A%=INSTR(A$,"BOOT="):IF A%:opt$=MID$(A$,A%+5):A$=LEFT$(A$,A%-1)
  crc$="":A%=INSTR(A$,"CRC=") :IF A%:crc$=MID$(A$,A%+4):A$=LEFT$(A$,A%-1)
  REM Read the fields in order, substituting defaults for empty ones
  name$  =FNcl("",0)
  load$  =FNcl("",0)
  exec$  =FNcl("",0):IF exec$="":exec$=load$
  length$=FNcl("",0)
  attr$  =FNcl("",0):IF LEFT$(attr$,1)="L":attr$="19" ELSE IF attr$="":attr$="33"
  mdate$ =FNcl("",0)
  mtime$ =FNcl("",0)
  cdate$ =FNcl("",0):IF cdate$="":cdate$=mdate$
  ctime$ =FNcl("",0):IF ctime$="":ctime$=mtime$
  acc$   =FNcl("",0)
  aux$   =FNcl("",0):IF aux$="":aux$=acc$

  Application   Supported metadata
  -------------------------------
  ZipToInf      load exec length access mdate mtime cdate ctime acc aux
  SJFiler       load exec length access mdate mtime cdate ctime acc aux
  SoftMDFS      load exec length access mdate mtime cdate ctime acc aux
  MkImg         load exec
  SerialTube    load exec

Other INF-type files
--------------------
SoftMDFS stores all the metadata for all directory entries in a single
INF-style metadata file. This is a text file usually called "!!Metadata" or
CHR$160+"Metadata" (a filename unreadable by NFS). It contains one line for
each directory entry, in no guaranteed order. Each line is the filename, a
TAB character, then an unbroken string of hex bytes representing eight words
of metadata, with a LF end-of-line character. For example:

  ROMS FFFF0900FFFF091A000001FA0000003300000000000000000000000000000000
  VERS FFFF0E23FFFF0E230000001F0000001900000000000000000000000000000000

Other archive files
===================
There are some other archive file formats encountered on Acorn systems. The
most common are Black archives, used with Andrew Black's Archive program,
and GetBack archives, used with Acorn's Archive/GetBack programs originally
written for archiving and restoring file server data.

Black Archive format
--------------------
  0000   00 00
  0002   40 FF hh mm ll      filelength-17
  0007   40 00 00 00 00
  000C   40 00 00 00 00
  Repeated for each entry:
  0      00 nn cc cc ...     reversed_filename
  2+nn   40 hh mm mm ll      length
  7+nn   40 hh mm mm ll      load address
  12+nn  40 hh mm mm ll      exec address
  17+nn  40 hh mm mm ll      attrs
  22+nn  file data

Header data is stored big-endian, high byte to low byte, as read and written
by Acorn BBC BASIC's PRINT# and INPUT#.

GetBack Archive format
----------------------
  Repeated for each file:
  0      filename,
  n      filetype - 1=file, 2=directory
  n+1    disk number - starts at 1
  n+2    load address
  n+6    exec address
  n+10   length - should be ignored if filetype=2
  n+14   access byte
  n+15   modification date in filing system format
  n+17   modification time hh, mm (no seconds)
  n+19   bytes of file data

  Application   Supported metadata
  -------------------------------
  GetBack       type load exec access mdate mtime
  Black         type load exec access mdate

ZX Spectrum
===========
ZX Spectrum files can be saved on BBC filesystems, and BBC files can be
saved on Spectrum filesystems. BBC metadata can be interchanged with ZX
Spectrum metadata in the following way:

  BBC                      Spectrum
  Load Addr b0-b15    <->  Start Addr (autorun line, array name, load address)
  Exec Addr b0-b15    <->  Parameter Addr (VARS-PROG, execution address)
  Length              <->  Length
  Load/Exec b16-b17   <->  File type

The Spectrum filetype encodes b16-b17 of the BBC load/exec addresses by
taking b16-b17 of the BBC load address and adding 4 times the difference
between b16-b17 of the BBC load address and b16-b17 of the execution
address. The load/exec addresses are recreated by using b0-b1 of the
Spectrum filetype to form b16-b17 of the BBC load address; the filetype
divided by 4 is then subtracted from b0-b1 to form b16-b17 of the BBC
execution address.
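The filetype encoding can be sketched in Python as follows. The function
names are invented for this example, and one point is an assumption: the
text does not say how a negative difference is handled, so the sketch
wraps the difference and the recreated execution bits modulo 4.

```python
def bbc_to_spectrum_type(load, exec_addr):
    """Fold b16-b17 of the BBC load and exec addresses into a Spectrum filetype."""
    load_hi = (load >> 16) & 3
    exec_hi = (exec_addr >> 16) & 3
    # Assumption: the load/exec difference wraps modulo 4.
    return load_hi + 4 * ((load_hi - exec_hi) & 3)

def spectrum_type_to_bbc(filetype, start, param):
    """Recreate the BBC load and exec addresses (b0-b17) from Spectrum metadata."""
    load_hi = filetype & 3
    exec_hi = (load_hi - (filetype >> 2)) & 3
    load = (load_hi << 16) | (start & 0xFFFF)
    exec_addr = (exec_hi << 16) | (param & 0xFFFF)
    return load, exec_addr

code_type = bbc_to_spectrum_type(0x00038000, 0x00038000)   # same high bits
```

With the wrap-around, every combination of the two high-bit pairs survives a
round trip through the filetype.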
As a result, the usual Spectrum filetypes 0, 1, 2 and 3 correspond to load
and execution addresses with the same high bytes, as here:

  LoadAddr    ExecAddr
  &0000xxxx   &0000xxxx  ->  Type 0 (BASIC)
  &0001xxxx   &0001xxxx  ->  Type 1 (NumArray)
  &0002xxxx   &0002xxxx  ->  Type 2 (CharArray)
  &0003xxxx   &0003xxxx  ->  Type 3 (Code)

Note that Interface 1 headers hold the metadata in a slightly different
order to tape headers, as shown here.

Tape Header                           Interface 1 Header
-----------                           ------------------
Type 00 - BASIC
 00 01 -  - 0A 0B 0C 0D  0E 0F  10     00 01 02 03  04 05  06 07  08
+--+--- -- ---+--+--+---+--+---+--+   +--+--+--+---+--+---+--+---+--+
|00| Filename | LEN | LOAD | EXEC |   |00| LEN |(LOAD)| EXEC | LOAD |
+--+--- -- ---+--+--+---+--+---+--+   +--+--+--+---+--+---+--+---+--+

Type 01 - Number Array
 00 01 -  - 0A 0B 0C 0D  0E 0F  10     00 01 02 03  04 05  06 07  08
+--+--- -- ---+--+--+---+--+---+--+   +--+--+--+---+--+---+--+---+--+
|01| Filename | LEN | LOAD | EXEC |   |01| LEN | EXEC | LOAD |(LOAD)|
+--+--- -- ---+--+--+---+--+---+--+   +--+--+--+---+--+---+--+---+--+

Type 02 - Character Array
 00 01 -  - 0A 0B 0C 0D  0E 0F  10     00 01 02 03  04 05  06 07  08
+--+--- -- ---+--+--+---+--+---+--+   +--+--+--+---+--+---+--+---+--+
|02| Filename | LEN | LOAD | EXEC |   |02| LEN | EXEC | LOAD |(LOAD)|
+--+--- -- ---+--+--+---+--+---+--+   +--+--+--+---+--+---+--+---+--+

Type 03 - CODE
 00 01 -  - 0A 0B 0C 0D  0E 0F  10     00 01 02 03  04 05  06 07  08
+--+--- -- ---+--+--+---+--+---+--+   +--+--+--+---+--+---+--+---+--+
|03| Filename | LEN | LOAD | EXEC |   |03| LEN | LOAD | EXEC |(LOAD)|
+--+--- -- ---+--+--+---+--+---+--+   +--+--+--+---+--+---+--+---+--+

Time and Date formats
=====================
DOS format dates and times are stored as:

  date  b0-b4    day of month  1-31
        b5-b8    month         1-12
        b9-b15   year-1980     0-127
  time  b0-b4    seconds/2     0-29
        b5-b10   minutes       0-59
        b11-b15  hours         0-23

Filing system format dates and times are stored as:

  date  b0-b4    day of month  0-31
        b5-b7    (year-1981) DIV 16
        b8-b11   month         1-12
        b12-b15  (year-1981) MOD 16
  time  b0-b7    hours         0-23
        b8-b15   minutes       0-59
        b16-b23  seconds       0-59

References
==========
 * INF file format  - mdfs.net/Docs/Comp/BBC/FileFormat/INFfile
 * ZIP Extra Field  - mdfs.net/Docs/Comp/Archiving/Zip/ExtraField
 * ZIP file format  - mdfs.net/Docs/Comp/Archiving/Zip/Format
 * Archive          - mdfs.net/Apps/Archivers/Archiver
 * Archive/GetBack  - mdfs.net/Apps/Archivers/Acorn
 * BBCZip           - mdfs.net/Apps/Archivers/BBCZip
 * CmdLine library  - mdfs.net/blib
 * InfoZip          - mdfs.net/Apps/Archivers/InfoZip
 * MkImg            - mdfs.net/Apps/DiskTools
 * Sainty Unzip     - mdfs.net/Apps/Archivers/Sainty
 * SerialTube       - mdfs.net/Software/Tube/Serial
 * Spectrum FileMap - mdfs.net/Software/Spectrum/Docs/FileMap
 * SJFiler          - mdfs.net/Apps/Networking/MDFS
 * SoftMDFS         - mdfs.net/Apps/Networking/FServers
 * SparkFS          - mdfs.net/Apps/Archivers
 * Sprow UnZip      - mdfs.net/Apps/Archivers
 * ZipToInf         - mdfs.net/Apps/Archivers/ZipTools
 * RISC OS DOSFS sources