# basic2c

A BASIC-to-C transpiler. Translates BASIC source code into equivalent C source code with an embedded runtime library.

## Build

```
cc -Wall -o basic2c basic2c.c -lm
```

## Usage

```
basic2c [--release|-r] input.bas [output.c]
```

- If `output.c` is omitted, C code is written to stdout.
- `--release` (or `-r`) selects the release runtime (see [Runtime Modes](#runtime-modes)).

Compile the generated C:

```
cc -Wall -o program output.c -lm
```

## Architecture

The transpiler is a single-file C program with three phases:

1. **Lexer** — tokenizes BASIC source (case-insensitive keywords)
2. **Parser** — recursive descent, builds an AST
3. **Codegen** — walks the AST, emits C source with a small runtime library

## Data Types

| BASIC Type | C Type    | Suffix | Notes                              |
|------------|-----------|--------|------------------------------------|
| `BYTE`     | `uint8_t` |        | Unsigned 8-bit                     |
| `INTEGER`  | `int16_t` | `%`    | Signed 16-bit                      |
| `LONG`     | `int32_t` |        | Signed 32-bit                      |
| `FLOAT`    | `float`   | `!`    | Single precision                   |
| `DOUBLE`   | `double`  | `#`    | Double precision (default numeric) |
| `STRING`   | `char*`   | `$`    | Dynamic, heap-allocated            |

Type suffixes on variable names are recognized: `name$` is STRING, `count%` is INTEGER, `total#` is DOUBLE, `rate!` is FLOAT. Variables without a suffix or explicit type declaration default to DOUBLE.

Numeric types follow a promotion hierarchy: BYTE < INTEGER < LONG < FLOAT < DOUBLE. Mixed-type expressions promote to the higher-ranked type.

## Variables and Arrays

### Declaration

```basic
DIM x AS DOUBLE
DIM name AS STRING
DIM count AS INTEGER
```

Variables can also be used without declaration — they are implicitly declared based on their type suffix or as DOUBLE by default.

### Arrays

```basic
DIM arr(10) AS INTEGER          ' 1D array, indices 0..10
DIM matrix(3, 4) AS DOUBLE      ' 2D array, indices 0..3 x 0..4
DIM cube(2, 3, 4) AS INTEGER    ' 3D array
```

Arrays are zero-based.
The dimension value is the upper bound (inclusive), so `DIM arr(10)` allocates 11 elements (0 through 10).

### REDIM

```basic
REDIM arr(20) AS INTEGER        ' Resize array (contents reset to zero)
REDIM matrix(5, 5) AS DOUBLE    ' Resize multidimensional array
```

`REDIM` frees the previous allocation and creates a new zero-initialized array.

## Operators

### Arithmetic

| Operator | Description                  |
|----------|------------------------------|
| `+`      | Addition                     |
| `-`      | Subtraction / unary negation |
| `*`      | Multiplication               |
| `/`      | Division                     |
| `\`      | Integer division             |
| `MOD`    | Modulo                       |
| `^`      | Exponentiation               |

### Comparison

| Operator | Description           |
|----------|-----------------------|
| `=`      | Equal                 |
| `<>`     | Not equal             |
| `<`      | Less than             |
| `>`      | Greater than          |
| `<=`     | Less than or equal    |
| `>=`     | Greater than or equal |

### Bitwise / Logical

| Operator | Description |
|----------|-------------|
| `AND`    | Bitwise AND |
| `OR`     | Bitwise OR  |
| `NOT`    | Bitwise NOT |
| `XOR`    | Bitwise XOR |

These operators work as both bitwise and logical operators. When used with comparisons (which return 0 or 1), they behave logically: `x > 5 AND y < 10`. When used with integers, they operate on individual bits: `15 AND 9` gives `9`.

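
The dual bitwise/logical behaviour described above can be reproduced with plain C integer operators. The sketch below is illustrative only: the `basic_*` and `in_range` helper names are hypothetical, and it assumes the generated code maps `AND`, `OR`, `XOR`, and `NOT` onto C's `&`, `|`, `^`, and `~`, which this README does not confirm.

```c
/* Hypothetical helpers; assumes the BASIC operators map to C bitwise ops. */
int basic_and(int a, int b) { return a & b; }
int basic_or(int a, int b)  { return a | b; }
int basic_xor(int a, int b) { return a ^ b; }
int basic_not(int a)        { return ~a; }

/* Equivalent of: x > 5 AND y < 10
 * C comparisons yield 0 or 1, so the bitwise AND acts as a logical AND. */
int in_range(int x, int y)
{
    return basic_and(x > 5, y < 10);
}
```

Here `basic_and(15, 9)` evaluates to `9` (binary `1111 & 1001`) and `in_range(7, 3)` evaluates to `1`. Under the same assumption, `basic_not(1)` is `-2`, which is still non-zero, so applying a bitwise `NOT` to a 0/1 comparison result deserves care.
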
### String

| Operator | Description                               |
|----------|-------------------------------------------|
| `+`      | Concatenation (when operands are strings) |
| `&`      | Concatenation (explicit)                  |

## Control Flow

### IF / THEN / ELSE

Single-line:

```basic
IF x > 0 THEN PRINT "positive" ELSE PRINT "non-positive"
```

Multi-line:

```basic
IF x > 0 THEN
    PRINT "positive"
ELSEIF x = 0 THEN
    PRINT "zero"
ELSE
    PRINT "negative"
END IF
```

### FOR / NEXT

```basic
FOR i = 1 TO 10
    PRINT i
NEXT i

FOR i = 10 TO 0 STEP -2
    PRINT i
NEXT i
```

### WHILE / WEND

```basic
WHILE x > 0
    x = x - 1
WEND
```

### DO / LOOP

```basic
DO
    x = x + 1
LOOP UNTIL x >= 10

DO WHILE x < 100
    x = x * 2
LOOP
```

### SELECT CASE

```basic
SELECT CASE grade
    CASE 90 TO 100
        PRINT "A"
    CASE 80 TO 89
        PRINT "B"
    CASE 70 TO 79
        PRINT "C"
    CASE IS < 60
        PRINT "F"
    CASE ELSE
        PRINT "D"
END SELECT
```

CASE values support single values (`CASE 1`), comma-separated values (`CASE 1, 2, 3`), ranges (`CASE 5 TO 10`), comparisons (`CASE IS > 100`), and a default (`CASE ELSE`). Works with both numeric and string expressions.

### EXIT

```basic
EXIT FOR
EXIT WHILE
EXIT DO
EXIT SUB
EXIT FUNCTION
```

### CONTINUE

```basic
CONTINUE FOR
CONTINUE WHILE
CONTINUE DO
```

Skips the rest of the current loop iteration and jumps to the next iteration.

### GOTO

```basic
GOTO 100        ' Jump to line number
GOTO myLabel    ' Jump to named label
```

### GOSUB / RETURN

```basic
GOSUB 200
GOSUB myRoutine
' ...
200 PRINT "in subroutine"
RETURN
myRoutine:
PRINT "named routine"
RETURN
```

GOSUB uses a compile-time dispatch mechanism — each GOSUB site gets a unique return-point ID, and RETURN uses a switch statement to jump back.

### ON GOTO / ON GOSUB

```basic
ON choice GOTO label1, label2, label3
ON choice GOSUB routine1, routine2, routine3
```

Branches to the Nth label based on the expression value (1-based). If the value is out of range, execution continues at the next statement.

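
The GOSUB dispatch described above (a unique return-point ID per call site, and a `switch` on RETURN) can be sketched in plain C. This is a hand-written illustration under assumptions, not the transpiler's actual output: the return stack, `GOSUB_STACK_MAX`, the label names, and `run_gosub_demo` are all hypothetical.

```c
#define GOSUB_STACK_MAX 16   /* hypothetical depth, not the transpiler's limit */

/* Hand-translated equivalent of:
 *         GOSUB 200
 *         count = count + 10
 *         GOSUB 200
 *         END
 *     200 count = count + 1
 *         RETURN
 * Returns the final value of count.
 */
int run_gosub_demo(void)
{
    int stack[GOSUB_STACK_MAX];  /* return-point IDs */
    int sp = 0;
    int count = 0;

    /* First GOSUB site: return-point ID 1 */
    stack[sp++] = 1;
    goto line_200;
ret_1:
    count += 10;

    /* Second GOSUB site: return-point ID 2 */
    stack[sp++] = 2;
    goto line_200;
ret_2:
    return count;                /* END */

line_200:
    count += 1;
    /* RETURN: pop the most recent ID and jump back to that site */
    switch (stack[--sp]) {
    case 1: goto ret_1;
    case 2: goto ret_2;
    }
    return -1;                   /* only reached if the stack is inconsistent */
}
```

Because every return target is known when the C is generated, a sketch like this needs no computed `goto` or function pointers; the `switch` covers exactly the call sites that exist in the program.
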
### Labels

Both classic line numbers and named labels are supported:

```basic
10 PRINT "line 10"
20 GOTO 10

myLabel:
PRINT "named label"
GOTO myLabel
```

## Constants

```basic
CONST PI = 3.14159
CONST MAX_SIZE = 100
CONST GREETING$ = "Hello"
```

Constants are evaluated at compile time and substituted directly into expressions. They cannot be reassigned.

## SWAP

```basic
SWAP a, b
SWAP s1$, s2$
```

Exchanges the values of two variables of the same type.

## Procedures

### SUB

```basic
SUB greet(name AS STRING)
    PRINT "Hello, "; name
END SUB

CALL greet("World")
greet "World"          ' CALL keyword is optional
```

### FUNCTION

```basic
FUNCTION square(x AS DOUBLE) AS DOUBLE
    square = x * x
END FUNCTION

PRINT square(5)
```

Functions return values by assigning to the function name or using `RETURN expr`.

### Parameter Passing

```basic
SUB increment(BYREF x AS INTEGER)
    x = x + 1
END SUB

SUB display(BYVAL x AS INTEGER)
    PRINT x
END SUB
```

- `BYREF` (default) — passes a pointer; changes affect the caller's variable
- `BYVAL` — passes a copy; changes are local to the procedure

### LOCAL and STATIC

```basic
SUB counter()
    STATIC count AS INTEGER
    LOCAL temp AS INTEGER
    count = count + 1
    temp = count
    PRINT temp
END SUB
```

- `LOCAL` — declares a variable scoped to the procedure
- `STATIC` — declares a variable that persists across calls

## User-Defined Types

### TYPE / END TYPE

```basic
TYPE PersonRecord
    firstName AS STRING * 20
    lastName AS STRING * 30
    age AS INTEGER
    salary AS DOUBLE
END TYPE

DIM person AS PersonRecord
person.firstName = "John"
person.lastName = "Doe"
person.age = 30
person.salary = 55000.50
```

String fields in TYPE definitions require a fixed length (`STRING * N`). Dynamic strings (`AS STRING` without a length) are not permitted in TYPE definitions because struct copy would produce dangling pointers.

Supported field types: `BYTE`, `INTEGER`, `LONG`, `FLOAT`, `DOUBLE`, `STRING * N`, and other user-defined types (nesting).

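
The fixed-length requirement is easier to see in C terms: a struct whose string field is an inline `char` array is copied byte for byte, so the copy owns its own text, whereas a `char*` field would leave both structs pointing at the same heap block. The sketch below is an assumption-laden illustration, not the generated code. In particular, the layout of `STRING * 20` as 20 characters plus a NUL terminator, and the names `Item`, `item_set_name`, and `copy_is_independent`, are hypothetical.

```c
#include <string.h>

typedef struct {
    char name[21];   /* assumed layout for STRING * 20: 20 chars + NUL */
    double price;
} Item;

/* Store at most 20 characters; the result is always NUL-terminated. */
void item_set_name(Item *it, const char *s)
{
    strncpy(it->name, s, sizeof it->name - 1);
    it->name[sizeof it->name - 1] = '\0';
}

/* Returns 1 if a whole-struct copy stays independent of its source. */
int copy_is_independent(void)
{
    Item a, b;

    item_set_name(&a, "Widget");
    a.price = 9.99;

    b = a;                        /* plain struct assignment copies the array */
    item_set_name(&a, "Gadget");  /* later changes to a must not leak into b */

    return strcmp(b.name, "Widget") == 0 && b.price == 9.99;
}
```

With an inline array there is nothing to free and nothing shared after `b = a`, which is the property a dynamically allocated string field would not have.
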
### Nested UDTs

```basic
TYPE Vec2
    x AS DOUBLE
    y AS DOUBLE
END TYPE

TYPE Circle
    center AS Vec2
    radius AS DOUBLE
END TYPE

DIM c AS Circle
c.center.x = 10.0
c.center.y = 20.0
c.radius = 5.0
```

Nesting depth is unlimited. Chained dot-access works for both reads and writes.

### UDT Arrays

```basic
DIM points(10) AS Vec2
points(0).x = 1.5
points(0).y = 2.5
```

### UDT Assignment

Whole-struct copy via assignment:

```basic
DIM a AS Vec2
DIM b AS Vec2
a.x = 1.0
a.y = 2.0
b = a               ' Copies all fields
```

Sub-struct copy also works:

```basic
DIM saved AS Vec2
saved = c.center    ' Copy nested struct out
c.center = saved    ' Copy nested struct in
```

Array element copy:

```basic
circles(0) = circles(2)
```

### SIZEOF

```basic
DIM sz AS LONG
sz = SIZEOF(PersonRecord)
```

Returns the byte size of a user-defined type. Used primarily with random-access file I/O to specify record length.

## Built-in Functions

### String Functions

| Function                    | Description                                       |
|-----------------------------|---------------------------------------------------|
| `LEN(s$)`                   | Length of string                                  |
| `MID$(s$, start, len)`      | Substring (1-based start position)                |
| `LEFT$(s$, n)`              | First n characters                                |
| `RIGHT$(s$, n)`             | Last n characters                                 |
| `CHR$(n)`                   | Character from ASCII code                         |
| `ASC(s$)`                   | ASCII code of first character                     |
| `STR$(n)`                   | Convert number to string                          |
| `VAL(s$)`                   | Convert string to number                          |
| `UCASE$(s$)`                | Convert to uppercase                              |
| `LCASE$(s$)`                | Convert to lowercase                              |
| `INSTR(haystack$, needle$)` | Find substring position (1-based, 0 if not found) |
| `STRING$(n, char$)`         | Repeat a character n times                        |
| `LTRIM$(s$)`                | Remove leading spaces                             |
| `RTRIM$(s$)`                | Remove trailing spaces                            |
| `TRIM$(s$)`                 | Remove leading and trailing spaces                |
| `SPACE$(n)`                 | String of n spaces                                |
| `HEX$(n)`                   | Hexadecimal string representation                 |
| `OCT$(n)`                   | Octal string representation                       |

### MID$ Assignment

```basic
DIM s AS STRING
s = "Hello World"
MID$(s, 7, 5) = "BASIC"    ' s is now "Hello BASIC"
```

Replaces characters in a
string starting at a 1-based position. The length parameter limits how many characters are replaced.

### Math Functions

| Function | Description                   |
|----------|-------------------------------|
| `ABS(n)` | Absolute value                |
| `INT(n)` | Truncate to integer           |
| `SQR(n)` | Square root                   |
| `SIN(n)` | Sine (radians)                |
| `COS(n)` | Cosine (radians)              |
| `TAN(n)` | Tangent (radians)             |
| `ATN(n)` | Arctangent (returns radians)  |
| `LOG(n)` | Natural logarithm             |
| `EXP(n)` | e raised to the power n       |
| `SGN(n)` | Sign: -1, 0, or 1             |
| `RND`    | Random number between 0 and 1 |

Numeric expressions also support `^` for exponentiation (emitted as `pow()`).

`RND` can be called with or without parentheses, and accepts an optional argument (which is ignored) for compatibility with other BASIC dialects. Use `RANDOMIZE` to seed the random number generator:

```basic
RANDOMIZE           ' Seed from system clock
RANDOMIZE 12345     ' Seed with specific value
x = RND             ' Random double 0..1
x = RND(1)          ' Same as RND (argument ignored)
```

### Print Formatting Functions

| Function | Description                     |
|----------|---------------------------------|
| `TAB(n)` | Output spaces to reach column n |
| `SPC(n)` | Output exactly n spaces         |

These functions are used within PRINT statements:

```basic
PRINT "Name"; TAB(20); "Value"
PRINT "A"; SPC(5); "B"          ' Outputs "A     B"
```

### Array Functions

| Function      | Description                     |
|---------------|---------------------------------|
| `LBOUND(arr)` | Lower bound of array (always 0) |
| `UBOUND(arr)` | Upper bound of array            |

### I/O Functions

| Function     | Description                            |
|--------------|----------------------------------------|
| `EOF(n)`     | Returns true (-1) if at end of file n  |
| `LOF(n)`     | Returns byte length of file n          |
| `FREEFILE()` | Returns the next available file number |

## Console I/O

### PRINT

```basic
PRINT "Hello, World!"
PRINT "x = "; x
PRINT x; " "; y         ' Semicolon suppresses newline between items
PRINT x, y              ' Comma advances to next tab stop
PRINT "no newline";     ' Trailing semicolon suppresses final newline
? "shortcut"            ' ? is a shortcut for PRINT
```

The `?` character can be used as a shortcut for `PRINT`, for compatibility with classic BASIC dialects and interactive use.

### PRINT USING

```basic
PRINT USING "###.##"; 123.456          ' Outputs: 123.46
PRINT USING "$$#,###.##"; 1234.56      ' Outputs: $1,234.56
PRINT USING "+###.##"; -45.6           ' Outputs: -45.60
PRINT USING "**###.##"; 9.99           ' Outputs: ****9.99
PRINT USING "!"; "Hello"               ' Outputs: H
PRINT USING "&"; "World"               ' Outputs: World
PRINT USING "\    \"; "Testing"        ' Outputs: Testin (6 chars)
```

Format specifiers for numbers:

| Format | Description                                 |
|--------|---------------------------------------------|
| `#`    | Digit placeholder                           |
| `.`    | Decimal point position                      |
| `,`    | Thousands separator (in format, not output) |
| `+`    | Show sign (+ or -) at start                 |
| `-`    | Trailing minus for negative numbers         |
| `$$`   | Floating dollar sign                        |
| `**`   | Fill leading spaces with asterisks          |

Format specifiers for strings:

| Format | Description                                  |
|--------|----------------------------------------------|
| `!`    | First character only                         |
| `&`    | Entire string                                |
| `\ \`  | Fixed width (spaces between backslashes + 2) |

Multiple values can be formatted with one format string:

```basic
PRINT USING "### + ### = ###"; 10; 20; 30    ' Outputs: 10 + 20 = 30
```

### INPUT

```basic
INPUT "Enter name: "; name$
INPUT x
```

### LINE INPUT

```basic
LINE INPUT "Enter text: "; line$
```

Reads an entire line including commas and spaces.

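
Reading a whole line, as `LINE INPUT` does, means taking everything up to the newline without splitting on commas or spaces. A minimal C sketch of that behaviour follows. It is not the runtime's actual implementation: `read_line` and `line_input_demo` are hypothetical names, and the growth strategy is just one reasonable choice.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one line from fp, drop the trailing newline,
 * and return it as a malloc'd string. Returns NULL at end of file or on
 * allocation failure. The caller frees the result. */
char *read_line(FILE *fp)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    int ch;

    if (!buf)
        return NULL;
    while ((ch = fgetc(fp)) != EOF && ch != '\n') {
        if (len + 1 >= cap) {            /* keep room for the terminator */
            char *grown = realloc(buf, cap * 2);
            if (!grown) {
                free(buf);
                return NULL;
            }
            buf = grown;
            cap *= 2;
        }
        buf[len++] = (char)ch;
    }
    if (ch == EOF && len == 0) {         /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}

/* Writes a line to a temporary file, reads it back, and returns 1 if the
 * commas and repeated spaces come back unchanged. */
int line_input_demo(void)
{
    FILE *fp = tmpfile();
    char *line;
    int ok;

    if (!fp)
        return 0;
    fputs("Smith, John  and  co\nnext\n", fp);
    rewind(fp);
    line = read_line(fp);
    ok = line != NULL && strcmp(line, "Smith, John  and  co") == 0;
    free(line);
    fclose(fp);
    return ok;
}
```

By contrast, a comma-aware reader such as `INPUT` would be expected to stop at the first comma in that text.
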
## File I/O

### Sequential Files

```basic
' Write
OPEN "data.txt" FOR OUTPUT AS #1
PRINT #1, "Hello"
PRINT #1, 42
CLOSE #1

' Read
OPEN "data.txt" FOR INPUT AS #1
LINE INPUT #1, text$
INPUT #1, value
CLOSE #1

' Append
OPEN "log.txt" FOR APPEND AS #1
PRINT #1, "new entry"
CLOSE #1
```

### WRITE #

```basic
WRITE #1, name$, age, salary
```

Outputs CSV-style: strings are quoted, values are comma-separated, terminated with a newline.

### Binary Files

```basic
OPEN "file.dat" FOR BINARY AS #1
```

### Random-Access Files

```basic
TYPE Record
    name AS STRING * 20
    value AS DOUBLE
END TYPE

DIM rec AS Record
rec.name = "test"
rec.value = 3.14

OPEN "data.dat" FOR RANDOM AS #1 LEN = SIZEOF(Record)
PUT #1, 1, rec      ' Write record at position 1 (1-based)
GET #1, 1, rec      ' Read record at position 1
CLOSE #1
```

Random-access uses `GET` and `PUT` with 1-based record numbers. The `LEN` clause specifies record size in bytes. Records can be read and written in any order.

### File Modes

| Mode     | C Mode  | Description                          |
|----------|---------|--------------------------------------|
| `INPUT`  | `"r"`   | Read sequential text                 |
| `OUTPUT` | `"w"`   | Write sequential text (truncates)    |
| `APPEND` | `"a"`   | Append sequential text               |
| `BINARY` | `"rb"`  | Binary read                          |
| `RANDOM` | `"r+b"` | Random access (creates if not found) |

## DATA / READ / RESTORE

```basic
DATA 10, 20, 30, "hello"

DIM x AS INTEGER
DIM s AS STRING
READ x      ' x = 10
READ x      ' x = 20
READ x      ' x = 30
READ s      ' s = "hello"
RESTORE     ' Reset read pointer to beginning
READ x      ' x = 10 again
```

`DATA` statements define a pool of literal values. `READ` consumes them in order. `RESTORE` resets the read pointer (optionally to a specific line number).

## Comments

```basic
' This is a comment
REM This is also a comment
x = 5 ' Inline comment
```

## $INCLUDE Metacommand

```basic
'$INCLUDE: 'helpers.bas'
```

The `$INCLUDE` metacommand inserts the contents of another file at the point of the directive, before lexing and parsing.
The directive is placed inside a comment (the leading `'` makes it invisible to editors that don't understand it).

### Syntax

The filename is enclosed in single quotes after `'$INCLUDE:`. The keyword is case-insensitive. Any amount of whitespace may appear between the colon and the opening quote.

### Nested Includes

Included files may themselves contain `$INCLUDE` directives:

```basic
' main.bas
'$INCLUDE: 'math_lib.bas'
'$INCLUDE: 'string_lib.bas'
```

```basic
' math_lib.bas — can include further files
'$INCLUDE: 'constants.bas'

FUNCTION Square(x AS DOUBLE) AS DOUBLE
    Square = x * x
END FUNCTION
```

### Path Resolution

Filenames are resolved relative to the **including file's directory**, not the working directory. If `src/main.bas` includes `'lib/util.bas'`, the transpiler looks for `src/lib/util.bas`.

### Error Reporting

When `$INCLUDE` is used, error messages show the originating file and line:

```
Error (math_lib.bas:12): undeclared variable 'q'
```

Without includes, the format is the same but shows the input filename:

```
Error (main.bas:5): type mismatch
```

### Circular Include Detection

If file A includes file B which includes file A, the transpiler reports a fatal error rather than looping infinitely:

```
Error: Circular include detected: main.bas
```

## Extensible Functions

The transpiler supports two mechanisms for defining additional functions:

### Built-in Functions (builtins.def)

The `builtins.def` file is compiled into basic2c and provides functions that are always available. To add permanent built-in functions, edit `builtins.def` and recompile basic2c.

Default built-ins include:

**Math functions:**

| Function      | Description                |
|---------------|----------------------------|
| `SQR(n)`      | Square root                |
| `SIN(n)`      | Sine (radians)             |
| `COS(n)`      | Cosine (radians)           |
| `TAN(n)`      | Tangent (radians)          |
| `ATN(n)`      | Arctangent (returns radians) |
| `LOG(n)`      | Natural logarithm          |
| `EXP(n)`      | e raised to power n        |
| `SGN(n)`      | Sign: -1, 0, or 1          |
| `RND()`       | Random number 0 to 1       |
| `CEIL(n)`     | Round up to integer        |
| `FLOOR(n)`    | Round down to integer      |
| `ROUND(n)`    | Round to nearest integer   |
| `FIX(n)`      | Truncate toward zero       |
| `FRAC(n)`     | Fractional part            |
| `HYPOT(x, y)` | Hypotenuse (sqrt(x² + y²)) |
| `MAX(a, b)`   | Maximum of two values      |
| `MIN(a, b)`   | Minimum of two values      |

**String functions:**

| Function         | Description                        |
|------------------|------------------------------------|
| `CHR$(n)`        | Character from ASCII code          |
| `STR$(n)`        | Convert number to string           |
| `UCASE$(s)`      | Convert to uppercase               |
| `LCASE$(s)`      | Convert to lowercase               |
| `LTRIM$(s)`      | Remove leading spaces              |
| `RTRIM$(s)`      | Remove trailing spaces             |
| `TRIM$(s)`       | Remove leading and trailing spaces |
| `SPACE$(n)`      | String of n spaces                 |
| `HEX$(n)`        | Hexadecimal representation         |
| `OCT$(n)`        | Octal representation               |
| `TAB(n)`         | Spaces to reach column n           |
| `SPC(n)`         | Output n spaces                    |
| `ENVIRON$(name)` | Get environment variable           |

**System:**

| Function  | Description                 |
|-----------|-----------------------------|
| `TIMER()` | Seconds since program start |

### External Functions (functions.def)

Unlike `builtins.def`, the `functions.def` file is not compiled in. It is read each time `basic2c` runs, from two locations (both if present):

1. The directory containing the `basic2c` binary (global extensions)
2. The directory containing the input `.bas` file (project-specific)

Functions from the input file's directory are loaded second, allowing project-specific definitions to supplement or override earlier ones.

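
The "loaded second, so it can override" rule amounts to a table where a later definition with the same name replaces the earlier one. The sketch below shows that idea in C. It is an assumption about how such a table could behave, not the transpiler's actual data structure: `FuncDef`, `MAX_DEFS`, `define_function`, and `lookup_function` are hypothetical names, and the case-insensitive matching simply mirrors the fact that BASIC function names are case-insensitive.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_DEFS 64              /* hypothetical capacity */

typedef struct {
    const char *name;            /* BASIC function name */
    const char *c_template;      /* C template text */
} FuncDef;

static FuncDef defs[MAX_DEFS];
static int def_count = 0;

/* Case-insensitive name comparison; returns 1 on a match. */
static int same_name(const char *a, const char *b)
{
    while (*a && *b) {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;             /* both must end together */
}

/* Add a definition, replacing any earlier one with the same name.
 * Returns 1 on success, 0 if the table is full. */
int define_function(const char *name, const char *c_template)
{
    int i;

    for (i = 0; i < def_count; i++) {
        if (same_name(defs[i].name, name)) {
            defs[i].c_template = c_template;   /* later file wins */
            return 1;
        }
    }
    if (def_count == MAX_DEFS)
        return 0;
    defs[def_count].name = name;
    defs[def_count].c_template = c_template;
    def_count++;
    return 1;
}

/* Returns the current template for name, or NULL if undefined. */
const char *lookup_function(const char *name)
{
    int i;

    for (i = 0; i < def_count; i++)
        if (same_name(defs[i].name, name))
            return defs[i].c_template;
    return NULL;
}
```

Loading the global file first and the project file second then gives the documented effect: names unique to either file are all available, and a name defined in both resolves to the project's version.
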
### Definition Format

Both `builtins.def` and `functions.def` use the same format:

```
# Comment lines start with #
# Format: name : type : c_template
SQUARE : double : ((%) * (%))
CUBE : double : ((%) * (%) * (%))
```

Each line defines:

- **name** — The BASIC function name (case-insensitive)
- **type** — Return type: `byte`, `integer`, `long`, `float`, `double`, or `string`
- **c_template** — C code with argument placeholders

### Argument Placeholders

- `%` or `%1` — First argument
- `%2` — Second argument
- `%3` — Third argument (and so on)

Arguments are substituted directly, so use parentheses in templates to ensure correct precedence: `((%) * (%2))` not `% * %2`.

### Usage

```basic
PRINT CEIL(3.7)          ' Outputs: 4
PRINT MAX(5, 10)         ' Outputs: 10
t = TIMER()              ' Get elapsed time
PRINT ENVIRON$("HOME")   ' Print home directory
```

Extensible functions require parentheses, even with no arguments: `TIMER()` not `TIMER`.

## Runtime Modes

The transpiler supports two runtime modes selected at transpile time:

### Debug Mode (default)

The debug runtime includes error checking and diagnostics:

- NULL guards on string function arguments
- `malloc`/`calloc` failure checks with error messages
- File number bounds checking
- `fopen` failure reporting with filename
- GOSUB stack overflow/underflow detection
- All errors print to stderr and call `exit(1)`

### Release Mode (`--release` or `-r`)

The release runtime strips all diagnostic checks for minimal generated code:

- No NULL guards on string functions
- No malloc failure checks
- No file number bounds checking
- No GOSUB stack overflow/underflow checks
- ~8% fewer lines of generated C code

Functional guards are preserved in release mode to prevent crashes:

- `EOF()` returns true (-1) for NULL file handles (enables file existence checks)
- `LOF()` returns 0 for NULL file handles
- `CLOSE` is a no-op for NULL file handles
- `LINE INPUT` is a no-op for NULL file handles
- Temp string pool management (`_bfree_temps`, `_btmp`)
- String
variable management (`_bstr_assign`)

## Limits

| Resource                 | Maximum |
|--------------------------|---------|
| Token length             | 4096    |
| Identifier length        | 128     |
| Parameters per procedure | 32      |
| Symbol table entries     | 2048    |
| GOSUB return sites       | 512     |
| Line number labels       | 4096    |
| AST nodes                | 65536   |
| Arguments per call       | 64      |
| User-defined types       | 64      |
| Fields per type          | 32      |
| Constants                | 256     |
| Include nesting depth    | 16      |
| Included files           | 64      |
| Total source lines       | 65536   |

## Example

```basic
TYPE Item
    name AS STRING * 20
    price AS DOUBLE
END TYPE

DIM items(2) AS Item
items(0).name = "Widget"
items(0).price = 9.99
items(1).name = "Gadget"
items(1).price = 24.95
items(2).name = "Doohickey"
items(2).price = 4.50

DIM i AS INTEGER
DIM total AS DOUBLE
total = 0

FOR i = 0 TO 2
    PRINT items(i).name; " $"; items(i).price
    total = total + items(i).price
NEXT i

PRINT "Total: $"; total
```

Transpile and run:

```
./basic2c example.bas example.c
cc -Wall -o example example.c -lm
./example
```