2026-02-21 18:51:40 -06:00


# basic2c
A BASIC-to-C transpiler. Translates BASIC source code into equivalent C source
code with an embedded runtime library.
## Build
```
cc -Wall -o basic2c basic2c.c -lm
```
## Usage
```
basic2c [--release|-r] input.bas [output.c]
```
- If `output.c` is omitted, C code is written to stdout.
- `--release` (or `-r`) selects the release runtime (see [Runtime Modes](#runtime-modes)).
Compile the generated C:
```
cc -Wall -o program output.c -lm
```
## Architecture
The transpiler is a single-file C program with three phases:
1. **Lexer** — tokenizes BASIC source (case-insensitive keywords)
2. **Parser** — recursive descent, builds an AST
3. **Codegen** — walks the AST, emits C source with a small runtime library
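As an illustration of the lexer's case-insensitive keyword handling, here is a minimal C sketch. It folds each identifier character to upper case and looks it up in a keyword table. The `TokenKind` enum, the keyword table, and `lookup_keyword` are hypothetical names and are not taken from `basic2c.c`.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds; a real lexer defines many more. */
typedef enum { TK_IDENT, TK_PRINT, TK_FOR, TK_NEXT, TK_IF, TK_THEN } TokenKind;

static const struct { const char *word; TokenKind kind; } keywords[] = {
    { "PRINT", TK_PRINT }, { "FOR", TK_FOR }, { "NEXT", TK_NEXT },
    { "IF", TK_IF },       { "THEN", TK_THEN },
};

/* Compare an identifier with an upper-case keyword, ignoring case. */
static int equals_ignore_case(const char *ident, const char *upper_kw) {
    while (*ident && *upper_kw) {
        if (toupper((unsigned char)*ident) != *upper_kw) return 0;
        ident++;
        upper_kw++;
    }
    return *ident == '\0' && *upper_kw == '\0';
}

/* Return the keyword kind for ident, or TK_IDENT if it is not a keyword. */
TokenKind lookup_keyword(const char *ident) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (equals_ignore_case(ident, keywords[i].word)) return keywords[i].kind;
    return TK_IDENT;
}
```

With this lookup, `print`, `Print`, and `PRINT` all produce the same token. An identifier such as `printer` stays an ordinary identifier.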
## Data Types
| BASIC Type | C Type | Suffix | Notes |
|----------------|------------|--------|------------------------------|
| `BYTE` | `uint8_t` | | Unsigned 8-bit |
| `INTEGER` | `int16_t` | `%` | Signed 16-bit |
| `LONG` | `int32_t` | | Signed 32-bit |
| `FLOAT` | `float` | `!` | Single precision |
| `DOUBLE` | `double` | `#` | Double precision (default numeric) |
| `STRING` | `char*` | `$` | Dynamic, heap-allocated |
Type suffixes on variable names are recognized: `name$` is STRING, `count%` is
INTEGER, `total#` is DOUBLE, `rate!` is FLOAT. Variables without a suffix or
explicit type declaration default to DOUBLE.
Numeric types follow a promotion hierarchy: BYTE < INTEGER < LONG < FLOAT <
DOUBLE. Mixed-type expressions promote to the higher-ranked type.
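The suffix rules and promotion order can be modeled with an ordered enum. In this model, promoting a mixed numeric expression is simply taking the larger of the two ranks. This is a sketch only. `BasicType`, `type_from_suffix`, and `promote` are hypothetical names, and explicit `DIM ... AS` declarations are not modeled.

```c
#include <string.h>

/* Ordered so that a larger value is a higher-ranked numeric type.
   T_STRING is listed last but is not part of numeric promotion. */
typedef enum { T_BYTE, T_INTEGER, T_LONG, T_FLOAT, T_DOUBLE, T_STRING } BasicType;

/* Infer a variable's type from its name suffix; no suffix means DOUBLE. */
BasicType type_from_suffix(const char *name) {
    size_t n = strlen(name);
    if (n == 0) return T_DOUBLE;
    switch (name[n - 1]) {
    case '$': return T_STRING;
    case '%': return T_INTEGER;
    case '!': return T_FLOAT;
    case '#': return T_DOUBLE;
    default:  return T_DOUBLE;
    }
}

/* Result type of a mixed numeric expression (numeric operands only):
   the higher-ranked of the two operand types. */
BasicType promote(BasicType a, BasicType b) {
    return a > b ? a : b;
}
```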
## Variables and Arrays
### Declaration
```basic
DIM x AS DOUBLE
DIM name AS STRING
DIM count AS INTEGER
```
Variables can also be used without a declaration; they are implicitly declared
based on their type suffix, or as DOUBLE if they have none.
### Arrays
```basic
DIM arr(10) AS INTEGER ' 1D array, indices 0..10
DIM matrix(3, 4) AS DOUBLE ' 2D array, indices 0..3 x 0..4
DIM cube(2, 3, 4) AS INTEGER ' 3D array
```
Arrays are zero-based. The dimension value is the upper bound (inclusive), so
`DIM arr(10)` allocates 11 elements (0 through 10).
### REDIM
```basic
REDIM arr(20) AS INTEGER ' Resize array (contents reset to zero)
REDIM matrix(5, 5) AS DOUBLE ' Resize multidimensional array
```
`REDIM` frees the previous allocation and creates a new zero-initialized array.
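The inclusive-upper-bound rule and REDIM's free-and-reallocate behavior can be sketched in C for a 2D DOUBLE array. `Array2D`, `array_dim`, `array_at`, and `array_redim` are hypothetical names. The row-major layout is an assumption; basic2c's generated code may organize storage differently.

```c
#include <stdlib.h>

/* Hypothetical 2D DOUBLE array; ub1 and ub2 are inclusive upper bounds,
   as in DIM m(ub1, ub2). */
typedef struct {
    double *data;
    int ub1, ub2;
} Array2D;

/* DIM m(ub1, ub2): (ub1 + 1) * (ub2 + 1) zero-initialized elements. */
int array_dim(Array2D *a, int ub1, int ub2) {
    a->data = calloc((size_t)(ub1 + 1) * (size_t)(ub2 + 1), sizeof(double));
    a->ub1 = ub1;
    a->ub2 = ub2;
    return a->data != NULL;
}

/* Row-major element address for m(i, j). */
double *array_at(Array2D *a, int i, int j) {
    return &a->data[(size_t)i * (size_t)(a->ub2 + 1) + (size_t)j];
}

/* REDIM: discard the old storage and allocate a fresh zeroed block. */
int array_redim(Array2D *a, int ub1, int ub2) {
    free(a->data);
    return array_dim(a, ub1, ub2);
}
```

`DIM matrix(3, 4)` therefore holds 4 × 5 = 20 doubles. After a REDIM, every element reads as zero again.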
## Operators
### Arithmetic
| Operator | Description |
|----------|----------------------|
| `+` | Addition |
| `-` | Subtraction / unary negation |
| `*` | Multiplication |
| `/` | Division |
| `\` | Integer division |
| `MOD` | Modulo |
| `^` | Exponentiation |
### Comparison
| Operator | Description |
|----------|----------------------|
| `=` | Equal |
| `<>` | Not equal |
| `<` | Less than |
| `>` | Greater than |
| `<=` | Less than or equal |
| `>=` | Greater than or equal|
### Bitwise / Logical
| Operator | Description |
|----------|----------------------|
| `AND` | Bitwise AND |
| `OR` | Bitwise OR |
| `NOT` | Bitwise NOT |
| `XOR` | Bitwise XOR |
These operators work as both bitwise and logical operators. When used with
comparisons (which return 0 or 1), they behave logically: `x > 5 AND y < 10`.
When used with integers, they operate on individual bits: `15 AND 9` gives `9`.
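Because comparisons yield 0 or 1, one plausible lowering maps `AND`, `OR`, `XOR`, and `NOT` directly onto C's bitwise operators. This sketch assumes that lowering; the function names are hypothetical.

```c
/* BASIC bitwise/logical operators mapped onto C bitwise operators
   (an assumed lowering, not necessarily what basic2c emits). */
long basic_and(long a, long b) { return a & b; }
long basic_or(long a, long b)  { return a | b; }
long basic_xor(long a, long b) { return a ^ b; }
long basic_not(long a)         { return ~a; }

/* x > 5 AND y < 10: each comparison yields 0 or 1, so the bitwise
   AND of the two results behaves like a logical AND. */
long in_range(double x, double y) {
    return basic_and(x > 5, y < 10);
}
```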
### String
| Operator | Description |
|----------|----------------------|
| `+` | Concatenation (when operands are strings) |
| `&` | Concatenation (explicit) |
## Control Flow
### IF / THEN / ELSE
Single-line:
```basic
IF x > 0 THEN PRINT "positive" ELSE PRINT "non-positive"
```
Multi-line:
```basic
IF x > 0 THEN
PRINT "positive"
ELSEIF x = 0 THEN
PRINT "zero"
ELSE
PRINT "negative"
END IF
```
### FOR / NEXT
```basic
FOR i = 1 TO 10
PRINT i
NEXT i
FOR i = 10 TO 0 STEP -2
PRINT i
NEXT i
```
### WHILE / WEND
```basic
WHILE x > 0
x = x - 1
WEND
```
### DO / LOOP
```basic
DO
x = x + 1
LOOP UNTIL x >= 10
DO WHILE x < 100
x = x * 2
LOOP
```
### SELECT CASE
```basic
SELECT CASE grade
CASE 90 TO 100
PRINT "A"
CASE 80 TO 89
PRINT "B"
CASE 70 TO 79
PRINT "C"
CASE IS < 60
PRINT "F"
CASE ELSE
PRINT "D"
END SELECT
```
CASE values support single values (`CASE 1`), comma-separated values
(`CASE 1, 2, 3`), ranges (`CASE 5 TO 10`), comparisons (`CASE IS > 100`),
and a default (`CASE ELSE`). Works with both numeric and string expressions.
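One plausible C lowering of the `SELECT CASE grade` example above is an if/else chain. Each CASE is tested in order, the first match wins, and `CASE ELSE` becomes the final `else`. This is a hand-written sketch, not actual basic2c output, and `grade_letter` is a hypothetical name.

```c
/* The SELECT CASE grade example lowered to an if/else chain.
   The selector is evaluated once (it is passed in as grade). */
char grade_letter(double grade) {
    if (grade >= 90 && grade <= 100) return 'A';        /* CASE 90 TO 100 */
    else if (grade >= 80 && grade <= 89) return 'B';    /* CASE 80 TO 89 */
    else if (grade >= 70 && grade <= 79) return 'C';    /* CASE 70 TO 79 */
    else if (grade < 60) return 'F';                    /* CASE IS < 60 */
    else return 'D';                                    /* CASE ELSE */
}
```

`grade` has no suffix, so it is a DOUBLE. A value such as 89.5 falls between the `80 TO 89` and `90 TO 100` ranges and reaches `CASE ELSE`.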
### EXIT
```basic
EXIT FOR
EXIT WHILE
EXIT DO
EXIT SUB
EXIT FUNCTION
```
### CONTINUE
```basic
CONTINUE FOR
CONTINUE WHILE
CONTINUE DO
```
Skips the rest of the current loop iteration and jumps to the next iteration.
### GOTO
```basic
GOTO 100 ' Jump to line number
GOTO myLabel ' Jump to named label
```
### GOSUB / RETURN
```basic
GOSUB 200
GOSUB myRoutine
' ...
200 PRINT "in subroutine"
RETURN
myRoutine:
PRINT "named routine"
RETURN
```
GOSUB uses a compile-time dispatch mechanism: each GOSUB site gets a unique
return-point ID, and RETURN uses a switch statement to jump back.
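The dispatch scheme just described can be sketched in plain C with `goto`. Each GOSUB site pushes its own ID and jumps to the routine. RETURN pops the ID and switches to the matching label. `gosub_demo`, `GOSUB_DEPTH`, and the trace codes are hypothetical, and the code basic2c actually emits may differ. The underflow check mirrors the debug-mode detection described under Runtime Modes.

```c
#include <stddef.h>

#define GOSUB_DEPTH 64   /* hypothetical maximum GOSUB nesting depth */

/* Runs "GOSUB routine" twice and records what executed.
   Trace codes: 0 = inside routine, 1 = after site 1, 2 = after site 2.
   Returns the number of trace entries, or -1 on a stack error. */
int gosub_demo(int *trace, size_t cap) {
    int stack[GOSUB_DEPTH];   /* pending return-point IDs */
    int sp = 0;
    int n = 0;

    if (cap < 4) return -1;

    stack[sp++] = 1;          /* GOSUB site 1 pushes its return-point ID */
    goto routine;
ret_1:
    trace[n++] = 1;
    stack[sp++] = 2;          /* GOSUB site 2 */
    goto routine;
ret_2:
    trace[n++] = 2;
    return n;

routine:
    trace[n++] = 0;
    if (sp == 0) return -1;   /* RETURN without GOSUB: stack underflow */
    switch (stack[--sp]) {    /* RETURN: jump back to the matching site */
    case 1: goto ret_1;
    case 2: goto ret_2;
    }
    return -1;
}
```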
### ON GOTO / ON GOSUB
```basic
ON choice GOTO label1, label2, label3
ON choice GOSUB routine1, routine2, routine3
```
Branches to the Nth label based on the expression value (1-based). If the
value is out of range, execution continues at the next statement.
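`ON ... GOTO` maps naturally onto a C `switch` whose `default` case falls through to the next statement. In this sketch `on_goto_demo` is hypothetical. It takes an integer selector, so how basic2c converts a non-integer selector is not modeled.

```c
/* ON choice GOTO label1, label2, label3 lowered to a switch.
   Returns the label reached (1..3), or 0 when choice is out of range
   and execution continues with the next statement. */
int on_goto_demo(int choice) {
    switch (choice) {
    case 1: goto label1;
    case 2: goto label2;
    case 3: goto label3;
    default: break;       /* out of range: fall through */
    }
    return 0;             /* "next statement" */
label1: return 1;
label2: return 2;
label3: return 3;
}
```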
### Labels
Both classic line numbers and named labels are supported:
```basic
10 PRINT "line 10"
20 GOTO 10
myLabel:
PRINT "named label"
GOTO myLabel
```
## Constants
```basic
CONST PI = 3.14159
CONST MAX_SIZE = 100
CONST GREETING$ = "Hello"
```
Constants are evaluated at compile time and substituted directly into
expressions. They cannot be reassigned.
## SWAP
```basic
SWAP a, b
SWAP s1$, s2$
```
Exchanges the values of two variables of the same type.
## Procedures
### SUB
```basic
SUB greet(name AS STRING)
PRINT "Hello, "; name
END SUB
CALL greet("World")
greet "World" ' CALL keyword is optional
```
### FUNCTION
```basic
FUNCTION square(x AS DOUBLE) AS DOUBLE
square = x * x
END FUNCTION
PRINT square(5)
```
Functions return values by assigning to the function name or using `RETURN expr`.
### Parameter Passing
```basic
SUB increment(BYREF x AS INTEGER)
x = x + 1
END SUB
SUB display(BYVAL x AS INTEGER)
PRINT x
END SUB
```
- `BYREF` (default) passes a pointer; changes affect the caller's variable
- `BYVAL` passes a copy; changes are local to the procedure
### LOCAL and STATIC
```basic
SUB counter()
STATIC count AS INTEGER
LOCAL temp AS INTEGER
count = count + 1
temp = count
PRINT temp
END SUB
```
- `LOCAL` declares a variable scoped to the procedure
- `STATIC` declares a variable that persists across calls
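The parameter and storage rules above correspond closely to C constructs. This sketch shows them side by side:

- BYREF: a pointer to the caller's variable.
- BYVAL: a copied parameter.
- STATIC: a `static` local.
- LOCAL: an ordinary automatic variable.

The function names are hypothetical, and `bump_copy` stands in for a BYVAL procedure so the effect can be observed.

```c
#include <stdint.h>

/* SUB increment(BYREF x AS INTEGER): the caller's variable is passed by address. */
void increment(int16_t *x) {
    *x = (int16_t)(*x + 1);
}

/* A BYVAL INTEGER parameter: the callee modifies only its own copy. */
int16_t bump_copy(int16_t x) {
    x = (int16_t)(x + 1);
    return x;
}

/* SUB counter(): STATIC count persists across calls; LOCAL temp does not. */
int16_t counter(void) {
    static int16_t count = 0;   /* zero-initialized once, kept between calls */
    int16_t temp;               /* fresh on every call */
    count = (int16_t)(count + 1);
    temp = count;
    return temp;
}
```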
## User-Defined Types
### TYPE / END TYPE
```basic
TYPE PersonRecord
firstName AS STRING * 20
lastName AS STRING * 30
age AS INTEGER
salary AS DOUBLE
END TYPE
DIM person AS PersonRecord
person.firstName = "John"
person.lastName = "Doe"
person.age = 30
person.salary = 55000.50
```
String fields in TYPE definitions require a fixed length (`STRING * N`). Dynamic
strings (`AS STRING` without a length) are not permitted in TYPE definitions
because struct copy would produce dangling pointers.
Supported field types: `BYTE`, `INTEGER`, `LONG`, `FLOAT`, `DOUBLE`,
`STRING * N`, and other user-defined types (nesting).
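The reason for the fixed-length rule shows up in the C struct a TYPE could map to. A `STRING * N` field becomes an inline `char` array, so struct assignment copies the characters themselves. A `char *` field would copy only the pointer: both structs would then share one heap buffer, and freeing it through either one leaves the other dangling. In this sketch the field widths follow the `PersonRecord` example. NUL-termination via `snprintf` is an assumption; basic2c's actual padding convention may differ.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* PersonRecord as a C struct: STRING * N fields are inline arrays. */
typedef struct {
    char firstName[20];
    char lastName[30];
    int16_t age;
    double salary;
} PersonRecord;

/* Shows that b = a copies field contents: changing b afterwards leaves
   a untouched. Returns 1 if both records hold their own names. */
int copy_is_independent(void) {
    PersonRecord a, b;
    memset(&a, 0, sizeof a);
    snprintf(a.firstName, sizeof a.firstName, "%s", "John");
    b = a;                                        /* whole-struct copy */
    snprintf(b.firstName, sizeof b.firstName, "%s", "Jane");
    return strcmp(a.firstName, "John") == 0 && strcmp(b.firstName, "Jane") == 0;
}
```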
### Nested UDTs
```basic
TYPE Vec2
x AS DOUBLE
y AS DOUBLE
END TYPE
TYPE Circle
center AS Vec2
radius AS DOUBLE
END TYPE
DIM c AS Circle
c.center.x = 10.0
c.center.y = 20.0
c.radius = 5.0
```
Nesting depth is unlimited. Chained dot-access works for both reads and writes.
### UDT Arrays
```basic
DIM points(10) AS Vec2
points(0).x = 1.5
points(0).y = 2.5
```
### UDT Assignment
Whole-struct copy via assignment:
```basic
DIM a AS Vec2
DIM b AS Vec2
a.x = 1.0
a.y = 2.0
b = a ' Copies all fields
```
Sub-struct copy also works:
```basic
DIM saved AS Vec2
saved = c.center ' Copy nested struct out
c.center = saved ' Copy nested struct in
```
Array element copy:
```basic
DIM circles(5) AS Circle
circles(0) = circles(2)        ' Copies element 2 into element 0
```
### SIZEOF
```basic
DIM sz AS LONG
sz = SIZEOF(PersonRecord)
```
Returns the byte size of a user-defined type. Used primarily with random-access
file I/O to specify record length.
## Built-in Functions
### String Functions
| Function | Description |
|-----------------------|------------------------------------------------|
| `LEN(s$)` | Length of string |
| `MID$(s$, start, len)` | Substring (1-based start position) |
| `LEFT$(s$, n)` | First n characters |
| `RIGHT$(s$, n)` | Last n characters |
| `CHR$(n)` | Character from ASCII code |
| `ASC(s$)` | ASCII code of first character |
| `STR$(n)` | Convert number to string |
| `VAL(s$)` | Convert string to number |
| `UCASE$(s$)` | Convert to uppercase |
| `LCASE$(s$)` | Convert to lowercase |
| `INSTR(haystack$, needle$)` | Find substring position (1-based, 0 if not found) |
| `STRING$(n, char$)` | Repeat a character n times |
| `LTRIM$(s$)` | Remove leading spaces |
| `RTRIM$(s$)` | Remove trailing spaces |
| `TRIM$(s$)` | Remove leading and trailing spaces |
| `SPACE$(n)` | String of n spaces |
| `HEX$(n)` | Hexadecimal string representation |
| `OCT$(n)` | Octal string representation |
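The 1-based convention for `INSTR` translates into a small C helper around `strstr`. Position 0 is reserved for "not found". `basic_instr` is a hypothetical name and is not necessarily the runtime helper basic2c emits.

```c
#include <string.h>

/* INSTR(haystack$, needle$): 1-based position of the first match, 0 if none. */
long basic_instr(const char *haystack, const char *needle) {
    const char *hit = strstr(haystack, needle);
    return hit ? (long)(hit - haystack) + 1 : 0;
}
```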
### MID$ Assignment
```basic
DIM s AS STRING
s = "Hello World"
MID$(s, 7, 5) = "BASIC" ' s is now "Hello BASIC"
```
Replaces characters in a string starting at a 1-based position. The length
parameter limits how many characters are replaced.
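One way to implement the in-place replacement is sketched below. `mid_assign` is a hypothetical name. The source states only that the length parameter limits the replacement. In addition, this sketch assumes two classic-BASIC behaviors: it never writes past the end of the target, so the string's length is unchanged, and it treats an out-of-range start as a no-op.

```c
#include <string.h>

/* MID$(s, start, n) = repl: overwrite s in place from 1-based position
   start. At most n characters are replaced, never more than repl
   supplies, and never past the end of s. */
void mid_assign(char *s, size_t start, size_t n, const char *repl) {
    size_t slen = strlen(s);
    size_t rlen = strlen(repl);
    if (start < 1 || start > slen) return;   /* assumption: no-op */
    size_t room = slen - (start - 1);        /* characters from start to end */
    if (n > rlen) n = rlen;
    if (n > room) n = room;
    memcpy(s + start - 1, repl, n);
}
```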
### Math Functions
| Function | Description |
|------------|------------------------------------------|
| `ABS(n)` | Absolute value |
| `INT(n)` | Truncate to integer |
| `SQR(n)` | Square root |
| `SIN(n)` | Sine (radians) |
| `COS(n)` | Cosine (radians) |
| `TAN(n)` | Tangent (radians) |
| `ATN(n)` | Arctangent (returns radians) |
| `LOG(n)` | Natural logarithm |
| `EXP(n)` | e raised to the power n |
| `SGN(n)` | Sign: -1, 0, or 1 |
| `RND` | Random number between 0 and 1 |
Numeric expressions also support `^` for exponentiation (emitted as `pow()`).
### Print Formatting Functions
| Function | Description |
|------------|------------------------------------------|
| `TAB(n)` | Output spaces to reach column n |
| `SPC(n)` | Output exactly n spaces |
These functions are used within PRINT statements:
```basic
PRINT "Name"; TAB(20); "Value"
PRINT "A"; SPC(5); "B"          ' Outputs "A     B" (five spaces)
```
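The difference between `TAB` and `SPC` comes down to tracking the current output column. This is a minimal sketch with a hypothetical `Printer` state and helper names. Columns are 1-based, and `col` is where the next character will appear. For `TAB` past the current column, the sketch does nothing; QBasic instead moves to column n on the next line, and basic2c's behavior in that case is not stated above.

```c
#include <stdio.h>

/* Hypothetical output state: col is the 1-based column of the next character. */
typedef struct { FILE *out; int col; } Printer;

static void put_text(Printer *p, const char *s) {
    for (; *s; s++) {
        fputc(*s, p->out);
        p->col = (*s == '\n') ? 1 : p->col + 1;
    }
}

/* SPC(n): emit exactly n spaces. */
static void do_spc(Printer *p, int n) {
    while (n-- > 0) put_text(p, " ");
}

/* TAB(n): pad with spaces until the next character lands in column n.
   Assumption: does nothing if the cursor is already past column n. */
static void do_tab(Printer *p, int n) {
    while (p->col < n) put_text(p, " ");
}
```

For `PRINT "Name"; TAB(20); "Value"`, the sketch writes 15 spaces after `Name`, so `Value` starts in column 20.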
`RND` can be called with or without parentheses, and accepts an optional argument
(which is ignored) for compatibility with other BASIC dialects. Use `RANDOMIZE`
to seed the random number generator:
```basic
RANDOMIZE ' Seed from system clock
RANDOMIZE 12345 ' Seed with specific value
x = RND ' Random double 0..1
x = RND(1) ' Same as RND (argument ignored)
```
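The `RANDOMIZE`/`RND` behavior can be expressed with the C standard library's `srand` and `rand`. The generated runtime may use a different generator, so treat this as a sketch with hypothetical function names. Dividing by `RAND_MAX + 1.0` keeps the result in the half-open range [0, 1). Re-seeding with the same value replays the same sequence.

```c
#include <stdlib.h>
#include <time.h>

/* RANDOMIZE with no argument: seed from the system clock. */
void basic_randomize_clock(void) { srand((unsigned)time(NULL)); }

/* RANDOMIZE seed: the same seed reproduces the same sequence. */
void basic_randomize(unsigned seed) { srand(seed); }

/* RND: a double in [0, 1). */
double basic_rnd(void) { return rand() / (RAND_MAX + 1.0); }
```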
### Array Functions
| Function | Description |
|---------------|----------------------------------------------|
| `LBOUND(arr)` | Lower bound of array (always 0) |
| `UBOUND(arr)` | Upper bound of array |
### I/O Functions
| Function | Description |
|--------------|--------------------------------------------------|
| `EOF(n)` | Returns true (-1) if at end of file n |
| `LOF(n)` | Returns byte length of file n |
| `FREEFILE()` | Returns the next available file number |
## Console I/O
### PRINT
```basic
PRINT "Hello, World!"
PRINT "x = "; x
PRINT x; " "; y ' Semicolon prints items with no separator
PRINT x, y ' Comma advances to next tab stop
PRINT "no newline"; ' Trailing semicolon suppresses final newline
? "shortcut" ' ? is a shortcut for PRINT
```
The `?` character can be used as a shortcut for `PRINT`, for compatibility with
classic BASIC dialects and interactive use.
### PRINT USING
```basic
PRINT USING "###.##"; 123.456 ' Outputs: 123.46
PRINT USING "$$#,###.##"; 1234.56 ' Outputs: $1,234.56
PRINT USING "+###.##"; -45.6 ' Outputs: -45.60
PRINT USING "**###.##"; 9.99 ' Outputs: ****9.99
PRINT USING "!"; "Hello" ' Outputs: H
PRINT USING "&"; "World" ' Outputs: World
PRINT USING "\    \"; "Testing" ' Outputs: Testin (6 chars)
```
Format specifiers for numbers:
| Format | Description |
|--------|-------------|
| `#` | Digit placeholder |
| `.` | Decimal point position |
| `,` | Thousands separator: a comma left of the decimal point groups output digits by thousands |
| `+` | Show sign (+ or -) at start |
| `-` | Trailing minus for negative numbers |
| `$$` | Floating dollar sign |
| `**` | Fill leading spaces with asterisks |
Format specifiers for strings:
| Format | Description |
|--------|-------------|
| `!` | First character only |
| `&` | Entire string |
| `\ \` | Fixed width (spaces between backslashes + 2) |
Multiple values can be formatted with one format string:
```basic
PRINT USING "### + ### = ###"; 10; 20; 30
' Outputs:  10 +  20 =  30
```
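For the simplest numeric fields, made only of `#` and at most one `.`, the formatting reduces to a `printf` width and precision. The sketch below covers only that subset. It ignores `,`, `$$`, `**`, `+`, and `-`. If the value does not fit the field, it prefixes `%`, which is classic BASIC's overflow marker and an assumption here. `using_number` is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Format value using a field such as "###.##": width is the field length,
   precision is the number of characters after the '.'. */
void using_number(char *out, size_t cap, const char *field, double value) {
    int width = (int)strlen(field);
    const char *dot = strchr(field, '.');
    int prec = dot ? (int)strlen(dot + 1) : 0;
    int n = snprintf(out, cap, "%*.*f", width, prec, value);
    if (n > width && (size_t)n + 1 < cap) {   /* overflow: mark with '%' */
        memmove(out + 1, out, (size_t)n + 1); /* shift, including the NUL */
        out[0] = '%';
    }
}
```

With this sketch, `"###.##"` formats 123.456 as `123.46` and 5 as `  5.00`. Each value is right-aligned in a field as wide as the format.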
### INPUT
```basic
INPUT "Enter name: "; name$
INPUT x
```
### LINE INPUT
```basic
LINE INPUT "Enter text: "; line$
```
Reads an entire line including commas and spaces.
## File I/O
### Sequential Files
```basic
' Write
OPEN "data.txt" FOR OUTPUT AS #1
PRINT #1, "Hello"
PRINT #1, 42
CLOSE #1
' Read
OPEN "data.txt" FOR INPUT AS #1
LINE INPUT #1, text$
INPUT #1, value
CLOSE #1
' Append
OPEN "log.txt" FOR APPEND AS #1
PRINT #1, "new entry"
CLOSE #1
```
### WRITE #
```basic
WRITE #1, name$, age, salary
```
Outputs CSV-style: strings are quoted, values are comma-separated, terminated
with a newline.
### Binary Files
```basic
OPEN "file.dat" FOR BINARY AS #1
```
### Random-Access Files
```basic
TYPE Record
name AS STRING * 20
value AS DOUBLE
END TYPE
DIM rec AS Record
rec.name = "test"
rec.value = 3.14
OPEN "data.dat" FOR RANDOM AS #1 LEN = SIZEOF(Record)
PUT #1, 1, rec ' Write record at position 1 (1-based)
GET #1, 1, rec ' Read record at position 1
CLOSE #1
```
Random-access uses `GET` and `PUT` with 1-based record numbers. The `LEN`
clause specifies record size in bytes. Records can be read and written in any
order.
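Random access reduces to a seek to `(recno - 1) * reclen` followed by a single `fwrite` or `fread` of one record. This sketch uses hypothetical helper names and a `Record` struct modeled on the example. Seeking before every transfer also satisfies C's rule that an update stream must be repositioned when switching between writing and reading. Note that `sizeof` includes any padding the compiler inserts, so such files are portable only between builds with the same struct layout.

```c
#include <stdio.h>
#include <string.h>

/* Record layout modeled on the TYPE Record example. */
typedef struct {
    char name[20];
    double value;
} Record;

/* PUT #f, recno, rec: write one record at the 1-based position recno. */
int put_record(FILE *f, long recno, const void *rec, size_t reclen) {
    if (recno < 1 || fseek(f, (recno - 1) * (long)reclen, SEEK_SET) != 0) return 0;
    return fwrite(rec, reclen, 1, f) == 1;
}

/* GET #f, recno, rec: read one record from the 1-based position recno. */
int get_record(FILE *f, long recno, void *rec, size_t reclen) {
    if (recno < 1 || fseek(f, (recno - 1) * (long)reclen, SEEK_SET) != 0) return 0;
    return fread(rec, reclen, 1, f) == 1;
}
```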
### File Modes
| Mode | C Mode | Description |
|----------|--------|--------------------------------------|
| `INPUT` | `"r"` | Read sequential text |
| `OUTPUT` | `"w"` | Write sequential text (truncates) |
| `APPEND` | `"a"` | Append sequential text |
| `BINARY` | `"rb"` | Binary read |
| `RANDOM` | `"r+b"`| Random access (creates if not found) |
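The table's modes map directly onto `fopen` mode strings, with one wrinkle. `"r+b"` on its own fails when the file does not exist. This sketch therefore falls back to `"w+b"` to create the file for RANDOM. That fallback is an assumption about how "creates if not found" is achieved, and `basic_open` is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Map a BASIC OPEN mode name to fopen; returns NULL on failure. */
FILE *basic_open(const char *path, const char *mode) {
    if (strcmp(mode, "INPUT") == 0)  return fopen(path, "r");
    if (strcmp(mode, "OUTPUT") == 0) return fopen(path, "w");
    if (strcmp(mode, "APPEND") == 0) return fopen(path, "a");
    if (strcmp(mode, "BINARY") == 0) return fopen(path, "rb");
    if (strcmp(mode, "RANDOM") == 0) {
        FILE *f = fopen(path, "r+b");         /* keep existing contents */
        return f ? f : fopen(path, "w+b");    /* assumed: create if missing */
    }
    return NULL;
}
```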
## DATA / READ / RESTORE
```basic
DATA 10, 20, 30, "hello"
DIM x AS INTEGER
DIM s AS STRING
READ x ' x = 10
READ x ' x = 20
READ x ' x = 30
READ s ' s = "hello"
RESTORE ' Reset read pointer to beginning
READ x ' x = 10 again
```
`DATA` statements define a pool of literal values. `READ` consumes them in
order. `RESTORE` resets the read pointer (optionally to a specific line number).
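The DATA pool can be modeled as a static table of tagged values plus a read index. This sketch models the example above. `DataItem`, `read_num`, `read_str`, and `restore` are hypothetical names. `RESTORE` to a specific line number, which would need a line-to-index map, is not modeled.

```c
#include <stddef.h>

/* One DATA item: either a number or a string literal. */
typedef struct { int is_string; double num; const char *str; } DataItem;

/* DATA 10, 20, 30, "hello" */
static const DataItem data_pool[] = {
    { 0, 10, NULL }, { 0, 20, NULL }, { 0, 30, NULL }, { 1, 0, "hello" },
};
static size_t data_ptr = 0;   /* index of the next item READ will consume */

#define DATA_COUNT (sizeof data_pool / sizeof data_pool[0])

/* READ into a numeric variable; 0 if the pool is exhausted or the item is a string. */
int read_num(double *out) {
    if (data_ptr >= DATA_COUNT || data_pool[data_ptr].is_string) return 0;
    *out = data_pool[data_ptr++].num;
    return 1;
}

/* READ into a string variable; 0 if the pool is exhausted or the item is numeric. */
int read_str(const char **out) {
    if (data_ptr >= DATA_COUNT || !data_pool[data_ptr].is_string) return 0;
    *out = data_pool[data_ptr++].str;
    return 1;
}

/* RESTORE with no line number: rewind to the first DATA item. */
void restore(void) { data_ptr = 0; }
```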
## Comments
```basic
' This is a comment
REM This is also a comment
x = 5 ' Inline comment
```
## $INCLUDE Metacommand
```basic
'$INCLUDE: 'helpers.bas'
```
The `$INCLUDE` metacommand inserts the contents of another file at the point
of the directive, before lexing and parsing. Because the directive is written
as a comment, BASIC tools that do not recognize `$INCLUDE` treat the line as an
ordinary comment and ignore it.
### Syntax
The filename is enclosed in single quotes after `'$INCLUDE:`. The keyword is
case-insensitive. Any amount of whitespace may appear between the colon and the
opening quote.
### Nested Includes
Included files may themselves contain `$INCLUDE` directives:
```basic
' main.bas
'$INCLUDE: 'math_lib.bas'
'$INCLUDE: 'string_lib.bas'
```
```basic
' math_lib.bas — can include further files
'$INCLUDE: 'constants.bas'
FUNCTION Square(x AS DOUBLE) AS DOUBLE
Square = x * x
END FUNCTION
```
### Path Resolution
Filenames are resolved relative to the **including file's directory**, not the
working directory. If `src/main.bas` includes `'lib/util.bas'`, the transpiler
looks for `src/lib/util.bas`.
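Resolving relative to the including file amounts to keeping everything up to that file's last `/` and appending the requested name. `resolve_include` is a hypothetical helper. It assumes `/` separators and passes absolute names through unchanged, which is an assumption rather than documented behavior.

```c
#include <stdio.h>
#include <string.h>

/* Resolve an $INCLUDE name relative to the including file's directory.
   Writes the path to out; returns 0 if it does not fit in cap bytes. */
int resolve_include(char *out, size_t cap, const char *including, const char *name) {
    const char *slash = strrchr(including, '/');
    int n;
    if (name[0] == '/' || !slash)   /* absolute name, or includer in the working dir */
        n = snprintf(out, cap, "%s", name);
    else                            /* keep "dir/" from the includer, then append name */
        n = snprintf(out, cap, "%.*s%s", (int)(slash - including + 1), including, name);
    return n >= 0 && (size_t)n < cap;
}
```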
### Error Reporting
When `$INCLUDE` is used, error messages show the originating file and line:
```
Error (math_lib.bas:12): undeclared variable 'q'
```
Without includes, the format is the same but shows the input filename:
```
Error (main.bas:5): type mismatch
```
### Circular Include Detection
If file A includes file B which includes file A, the transpiler reports a fatal
error rather than looping infinitely:
```
Error: Circular include detected: main.bas
```
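Cycle detection only needs the stack of files currently being expanded. If a file about to be entered is already on that stack, the include chain has looped back on itself. Including the same file twice in sequence is still allowed. In this sketch the names are hypothetical, and `MAX_INCLUDE_DEPTH` mirrors the documented nesting limit; whether the main file counts toward that limit is an assumption. Paths are compared as plain strings here, whereas a real implementation might canonicalize them first.

```c
#include <string.h>

#define MAX_INCLUDE_DEPTH 16   /* mirrors the documented include nesting limit */

/* Files currently being expanded, outermost first. */
static const char *include_stack[MAX_INCLUDE_DEPTH];
static int include_depth = 0;

/* Call before expanding path. Returns 1 on success, 0 if path is already
   being expanded (a circular include), -1 if nesting is too deep. */
int include_enter(const char *path) {
    for (int i = 0; i < include_depth; i++)
        if (strcmp(include_stack[i], path) == 0) return 0;
    if (include_depth == MAX_INCLUDE_DEPTH) return -1;
    include_stack[include_depth++] = path;
    return 1;
}

/* Call after path has been fully expanded. */
void include_leave(void) {
    if (include_depth > 0) include_depth--;
}
```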
## Extensible Functions
The transpiler supports two mechanisms for defining additional functions:
### Built-in Functions (builtins.def)
The `builtins.def` file is compiled into basic2c and provides functions that are
always available. To add permanent built-in functions, edit `builtins.def` and
recompile basic2c.
Default built-ins include:
**Math functions:**
| Function | Description |
|----------|-------------|
| `SQR(n)` | Square root |
| `SIN(n)` | Sine (radians) |
| `COS(n)` | Cosine (radians) |
| `TAN(n)` | Tangent (radians) |
| `ATN(n)` | Arctangent (returns radians) |
| `LOG(n)` | Natural logarithm |
| `EXP(n)` | e raised to power n |
| `SGN(n)` | Sign: -1, 0, or 1 |
| `RND()` | Random number 0 to 1 |
| `CEIL(n)` | Round up to integer |
| `FLOOR(n)` | Round down to integer |
| `ROUND(n)` | Round to nearest integer |
| `FIX(n)` | Truncate toward zero |
| `FRAC(n)` | Fractional part |
| `HYPOT(x, y)` | Hypotenuse (sqrt(x² + y²)) |
| `MAX(a, b)` | Maximum of two values |
| `MIN(a, b)` | Minimum of two values |
**String functions:**
| Function | Description |
|----------|-------------|
| `CHR$(n)` | Character from ASCII code |
| `STR$(n)` | Convert number to string |
| `UCASE$(s)` | Convert to uppercase |
| `LCASE$(s)` | Convert to lowercase |
| `LTRIM$(s)` | Remove leading spaces |
| `RTRIM$(s)` | Remove trailing spaces |
| `TRIM$(s)` | Remove leading and trailing spaces |
| `SPACE$(n)` | String of n spaces |
| `HEX$(n)` | Hexadecimal representation |
| `OCT$(n)` | Octal representation |
| `TAB(n)` | Spaces to reach column n |
| `SPC(n)` | Output n spaces |
| `ENVIRON$(name)` | Get environment variable |
**System:**
| Function | Description |
|----------|-------------|
| `TIMER()` | Seconds since program start |
### External Functions (functions.def)
The `functions.def` file is loaded at runtime from two locations (both if present):
1. The directory containing the `basic2c` binary (global extensions)
2. The directory containing the input `.bas` file (project-specific)
Functions from the input file's directory are loaded second, allowing project-specific
definitions to supplement or override earlier ones.
### Definition Format
Both `builtins.def` and `functions.def` use the same format:
```
# Comment lines start with #
# Format: name : type : c_template
SQUARE : double : ((%) * (%))
CUBE : double : ((%) * (%) * (%))
```
Each line defines:
- **name**: the BASIC function name (case-insensitive)
- **type**: return type: `byte`, `integer`, `long`, `float`, `double`, or `string`
- **c_template**: C code with argument placeholders
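A definition line can be split into its three fields in place, skipping blank and `#` comment lines. This sketch is a hypothetical parser. It treats only the first two colons as separators, an assumption that lets a template contain `:` itself (for example, a C conditional expression).

```c
#include <ctype.h>
#include <string.h>

/* Trim leading and trailing whitespace in place. */
static char *trim(char *s) {
    while (isspace((unsigned char)*s)) s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1])) *--end = '\0';
    return s;
}

/* Split "name : type : c_template" in place. Returns 1 for a definition,
   0 for a blank line, a comment line, or a line without two colons. */
int parse_def_line(char *line, char **name, char **type, char **tmpl) {
    char *s = trim(line);
    if (*s == '\0' || *s == '#') return 0;
    char *c1 = strchr(s, ':');
    if (!c1) return 0;
    char *c2 = strchr(c1 + 1, ':');
    if (!c2) return 0;
    *c1 = '\0';
    *c2 = '\0';
    *name = trim(s);
    *type = trim(c1 + 1);
    *tmpl = trim(c2 + 1);
    return 1;
}
```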
### Argument Placeholders
- `%` or `%1`: first argument
- `%2`: second argument
- `%3`: third argument (and so on)
Arguments are substituted directly, so use parentheses in templates to ensure
correct precedence: `((%) * (%2))` not `% * %2`.
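Placeholder substitution is purely textual, and a sketch makes the precedence warning concrete. `expand_template` is a hypothetical helper. It reads every digit after a `%` as part of the index (so `%10` means argument 10) and treats a bare `%` as the first argument.

```c
#include <ctype.h>
#include <string.h>

/* Expand a c_template by pasting argument text verbatim.
   Returns 0 if an index is out of range or out is too small. */
int expand_template(char *out, size_t cap, const char *tmpl,
                    const char *const *args, int nargs) {
    size_t len = 0;
    if (cap == 0) return 0;
    for (const char *p = tmpl; *p; ) {
        const char *piece;
        size_t plen;
        if (*p == '%') {
            int idx = 0;
            p++;
            while (isdigit((unsigned char)*p)) idx = idx * 10 + (*p++ - '0');
            if (idx == 0) idx = 1;            /* bare % is the first argument */
            if (idx > nargs) return 0;
            piece = args[idx - 1];
            plen = strlen(piece);
        } else {
            piece = p++;                      /* copy one literal character */
            plen = 1;
        }
        if (len + plen >= cap) return 0;
        memcpy(out + len, piece, plen);
        len += plen;
    }
    out[len] = '\0';
    return 1;
}
```

With arguments `a + 1` and `b`, the template `((%) * (%2))` yields `((a + 1) * (b))`. The unparenthesized `% * %2` yields `a + 1 * b`, which C evaluates as `a + (1 * b)`.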
### Usage
```basic
PRINT CEIL(3.7) ' Outputs: 4
PRINT MAX(5, 10) ' Outputs: 10
t = TIMER() ' Get elapsed time
PRINT ENVIRON$("HOME") ' Print home directory
```
Extensible functions require parentheses, even with no arguments: `TIMER()` not `TIMER`.
## Runtime Modes
The transpiler supports two runtime modes selected at transpile time:
### Debug Mode (default)
The debug runtime includes error checking and diagnostics:
- NULL guards on string function arguments
- `malloc`/`calloc` failure checks with error messages
- File number bounds checking
- `fopen` failure reporting with filename
- GOSUB stack overflow/underflow detection
- All errors print to stderr and call `exit(1)`
### Release Mode (`--release` or `-r`)
The release runtime strips all diagnostic checks for minimal generated code:
- No NULL guards on string functions
- No malloc failure checks
- No file number bounds checking
- No GOSUB stack overflow/underflow checks
- ~8% fewer lines of generated C code
Functional guards and core string management are preserved in release mode to prevent crashes:
- `EOF()` returns true (-1) for NULL file handles (enables file existence checks)
- `LOF()` returns 0 for NULL file handles
- `CLOSE` is a no-op for NULL file handles
- `LINE INPUT` is a no-op for NULL file handles
- Temp string pool management (`_bfree_temps`, `_btmp`)
- String variable management (`_bstr_assign`)
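The NULL-handle guards kept in release mode can be sketched as follows. Because `EOF()` reports true for a handle that failed to open, a program can test file existence without crashing. The peek-with-`ungetc` approach and the names `basic_eof` and `basic_lof` are assumptions for illustration.

```c
#include <stdio.h>

/* EOF(n): -1 (true) at end of file, and also for a NULL handle. */
int basic_eof(FILE *f) {
    if (!f) return -1;
    int c = fgetc(f);          /* peek one character ... */
    if (c == EOF) return -1;
    ungetc(c, f);              /* ... and push it back */
    return 0;
}

/* LOF(n): byte length of the file, or 0 for a NULL handle.
   The current position is saved and restored. */
long basic_lof(FILE *f) {
    if (!f) return 0;
    long pos = ftell(f);
    if (pos < 0 || fseek(f, 0, SEEK_END) != 0) return 0;
    long size = ftell(f);
    fseek(f, pos, SEEK_SET);
    return size < 0 ? 0 : size;
}
```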
## Limits
| Resource | Maximum |
|------------------------|---------|
| Token length | 4096 |
| Identifier length | 128 |
| Parameters per procedure | 32 |
| Symbol table entries | 2048 |
| GOSUB return sites | 512 |
| Line number labels | 4096 |
| AST nodes | 65536 |
| Arguments per call | 64 |
| User-defined types | 64 |
| Fields per type | 32 |
| Constants | 256 |
| Include nesting depth | 16 |
| Included files | 64 |
| Total source lines | 65536 |
## Example
```basic
TYPE Item
name AS STRING * 20
price AS DOUBLE
END TYPE
DIM items(2) AS Item
items(0).name = "Widget"
items(0).price = 9.99
items(1).name = "Gadget"
items(1).price = 24.95
items(2).name = "Doohickey"
items(2).price = 4.50
DIM i AS INTEGER
DIM total AS DOUBLE
total = 0
FOR i = 0 TO 2
PRINT items(i).name; " $"; items(i).price
total = total + items(i).price
NEXT i
PRINT "Total: $"; total
```
Transpile and run:
```
./basic2c example.bas example.c
cc -Wall -o example example.c -lm
./example
```