# Security -- Diffie-Hellman Key Exchange and XTEA-CTR Cipher

Cryptographic library providing Diffie-Hellman key exchange, XTEA
symmetric encryption in CTR mode, and a DRBG-based pseudo-random number
generator. Optimized for 486-class DOS hardware running under DJGPP/DPMI.

This library has no dependencies on the serial stack and can be used
independently for any application requiring key exchange, encryption,
or random number generation.

## Components

### 1. XTEA Cipher (CTR Mode)

XTEA (eXtended Tiny Encryption Algorithm) is a 64-bit block cipher with a
128-bit key and 32 Feistel rounds. In CTR (counter) mode, it operates as
a stream cipher: an incrementing counter is encrypted with the key to
produce a keystream, which is XOR'd with the plaintext. Because XOR is
its own inverse, the same operation encrypts and decrypts.

**Why XTEA instead of AES or DES:**

XTEA requires zero lookup tables, no key schedule, and compiles to
approximately 20 instructions per round (shifts, adds, and XORs only).
This makes it ideal for a 486, where the on-chip cache is tiny (8KB,
shared between code and data) and a table-driven AES implementation
(about 4KB of lookup tables) would thrash it. DES is similarly
table-heavy and has a complex key schedule. XTEA has no library
dependencies -- the entire cipher fits in about a dozen lines of C. At
32 rounds, XTEA provides 128-bit key strength with negligible per-byte
overhead even on the slowest target hardware.

**CTR mode properties:**

- Encrypt and decrypt are the same function (XOR is symmetric)
- No padding required -- operates on arbitrary-length data
- Random access possible (set the counter to any value)
- CRITICAL: the same counter value must never be reused with the same key.
  Reuse reveals the XOR of two plaintexts. The secLink layer prevents this
  by deriving separate TX/RX cipher keys for each direction.

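To make the counter-reuse warning concrete, the sketch below shows what an eavesdropper can compute from two ciphertexts that were produced with the same keystream. It is illustrative only: `xorWithKeystream` and `xorBuffers` are hypothetical helper names, not part of the library API.

```c
#include <stddef.h>
#include <stdint.h>

// XOR a buffer with a keystream in place. Once the keystream has been
// generated, this is all that CTR-mode encryption or decryption does.
static void xorWithKeystream(uint8_t *data, const uint8_t *keystream, size_t len) {
    for (size_t i = 0; i < len; i++) {
        data[i] ^= keystream[i];
    }
}

// XOR two equal-length ciphertexts together. If both were produced with
// the same keystream, the keystream cancels and out[] holds p1 XOR p2.
static void xorBuffers(uint8_t *out, const uint8_t *c1, const uint8_t *c2, size_t len) {
    for (size_t i = 0; i < len; i++) {
        out[i] = c1[i] ^ c2[i];
    }
}
```

Applying `xorBuffers` to two such ciphertexts yields `p1 ^ p2` without any knowledge of the key, which is usually enough to recover both plaintexts when either one is partly predictable.
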
**XTEA block cipher internals:**

The Feistel network uses the golden-ratio constant (delta = 0x9E3779B9)
as a round key mixer. Each round combines the two 32-bit halves using
shifts, additions, and XORs. The delta ensures each round uses a
different effective subkey, preventing slide attacks. No S-boxes or lookup
tables are involved anywhere in the computation.

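The round structure described above matches the widely published XTEA reference routine. The following is a minimal sketch of that routine with a CTR wrapper around it. The function names, the little-endian keystream byte order, and the two-word counter layout are assumptions made for illustration and may differ from what `security.c` actually does.

```c
#include <stddef.h>
#include <stdint.h>

#define XTEA_DELTA  0x9E3779B9u
#define XTEA_ROUNDS 32

// Encrypt one 64-bit block (v[0], v[1]) in place with a 128-bit key.
static void xteaEncryptBlock(uint32_t v[2], const uint32_t key[4]) {
    uint32_t v0 = v[0], v1 = v[1], sum = 0;
    for (int i = 0; i < XTEA_ROUNDS; i++) {
        v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
        sum += XTEA_DELTA;
        v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
    }
    v[0] = v0;
    v[1] = v1;
}

// CTR mode: encrypt the counter to obtain 8 keystream bytes, XOR them into
// the data, then increment the 64-bit counter (low word first, with carry).
static void xteaCtrCrypt(const uint32_t key[4], uint32_t counter[2],
                         uint8_t *data, size_t len) {
    size_t pos = 0;
    while (pos < len) {
        uint32_t block[2] = { counter[0], counter[1] };
        xteaEncryptBlock(block, key);
        for (int i = 0; i < 8 && pos < len; i++, pos++) {
            // Assumed byte order: little-endian bytes of block[0], then block[1].
            data[pos] ^= (uint8_t)(block[i / 4] >> (8 * (i % 4)));
        }
        if (++counter[0] == 0) {
            counter[1]++;
        }
    }
}
```

Calling `xteaCtrCrypt` twice on the same buffer, starting from the same counter value each time, returns the original data. This is the property the cipher API relies on for decryption.
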
### 2. Diffie-Hellman Key Exchange (1024-bit)

Uses the RFC 2409 Group 2 safe prime (1024-bit MODP group) with a
generator of 2. Private exponents are 256 bits for fast computation on
486-class hardware.

**Why 1024-bit DH with 256-bit private exponents:**

RFC 2409 Group 2 provides a well-audited, interoperable safe prime.
256-bit private exponents (versus full 1024-bit) reduce the modular
exponentiation from approximately 1024 squarings + 512 multiplies to
approximately 256 squarings + 128 multiplies (half the exponent bits are
1 on average), roughly a fourfold saving. This keeps key generation on a
486 under a second rather than several seconds. The short exponent is not
the weak point: generic attacks on a 256-bit exponent (Pollard's rho and
kangaroo methods) require approximately 2^128 operations, matching XTEA's
key strength. The 1024-bit modulus is the limiting factor; it is commonly
rated at roughly 80 bits of security.

**Key validation:**

`secDhComputeSecret()` validates that the remote public key is in the
range [2, p-2] to prevent small-subgroup attacks. Keys of 0, 1, or p-1
would produce trivially guessable shared secrets.

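The exchange and the range check can be illustrated with a toy group (p = 23, g = 5) that is small enough to verify by hand. This is a sketch with hypothetical helper names using 32-bit arithmetic; the library performs the same steps on 1024-bit values.

```c
#include <stdint.h>

// Right-to-left binary modular exponentiation for small (32-bit) moduli.
// Intermediate products are held in 64 bits so they cannot overflow.
static uint32_t modExpSmall(uint32_t base, uint32_t exp, uint32_t mod) {
    uint64_t result = 1 % mod;
    uint64_t b = base % mod;
    while (exp > 0) {
        if (exp & 1) {
            result = (result * b) % mod;
        }
        b = (b * b) % mod;
        exp >>= 1;
    }
    return (uint32_t)result;
}

// Accept a peer public value only if it lies in [2, p-2].
// This rejects 0, 1, and p-1, whose powers are trivially predictable.
static int dhPublicKeyValid(uint32_t pub, uint32_t p) {
    return pub >= 2 && pub <= p - 2;
}
```

With private values 6 and 15, the public values are `modExpSmall(5, 6, 23)` = 8 and `modExpSmall(5, 15, 23)` = 19, and both sides arrive at the shared secret 2. A peer value of 22 (p-1) is rejected because its powers are only ever 1 or 22.
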
**Key derivation:**

The 128-byte shared secret is reduced to a symmetric key via XOR-folding:
each byte of the secret is XOR'd into the output key at position
`i % keyLen`. For a 16-byte XTEA key, each output byte is the XOR of
8 secret bytes, providing thorough mixing. A proper KDF (HKDF, etc.)
would be more rigorous but adds complexity and code size for marginal
benefit in this use case.

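The folding step is short enough to show in full. The following is a sketch of the technique as described, not the library's source; `xorFold` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

// XOR-fold secret[0..secretLen) down to key[0..keyLen):
// secret byte i is XOR'd into key[i % keyLen].
static void xorFold(const uint8_t *secret, size_t secretLen,
                    uint8_t *key, size_t keyLen) {
    for (size_t i = 0; i < keyLen; i++) {
        key[i] = 0;
    }
    for (size_t i = 0; i < secretLen; i++) {
        key[i % keyLen] ^= secret[i];
    }
}
```

With a 128-byte secret and a 16-byte key, `key[0]` ends up as `secret[0] ^ secret[16] ^ ... ^ secret[112]`, and so on for the other fifteen positions.
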
### 3. Pseudo-Random Number Generator

XTEA-CTR based DRBG (Deterministic Random Bit Generator). The RNG
encrypts a monotonically increasing 64-bit counter with a 128-bit XTEA
key, producing 8 bytes of pseudorandom output per block. The counter
never repeats (64-bit space is sufficient for any practical session
length), so the output is a pseudorandom stream as long as the key has
sufficient entropy.

**Hardware entropy sources:**

- PIT (Programmable Interval Timer) -- runs at 1.193182 MHz. Its LSBs
  change rapidly and provide approximately 10 bits of entropy per read,
  depending on timing jitter. Two readings with intervening code execution
  provide additional jitter.
- BIOS tick count -- 18.2 Hz timer at real-mode address 0040:006C (linear
  address 0x46C). Adds a few more bits of entropy.

Total from hardware: roughly 20 bits of real entropy per call to
`secRngGatherEntropy()`. This is not enough on its own for
cryptographic use but is sufficient to seed the DRBG when supplemented
by user interaction timing (keyboard, mouse jitter).

**Seeding and mixing:**

The seed function (`secRngSeed()`) XOR-folds the entropy into the XTEA
key, derives the initial counter from the key bits, and then generates and
discards 64 bytes to advance past any weak initial output. This discard
step is a common precaution -- it ensures the first bytes the caller
receives do not leak information about the seed material.

Additional entropy can be stirred in at any time via `secRngAddEntropy()`
without resetting the RNG state. This function XOR-folds new entropy into
the key and then re-mixes by encrypting the key with itself, diffusing
the new entropy across all key bits.

Auto-seeding: if `secRngBytes()` is called before `secRngSeed()`, it
automatically gathers hardware entropy and seeds itself as a safety net.

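The seed and generate steps described above can be sketched as follows. Everything here is an illustration under stated assumptions: the struct and function names are hypothetical, the way the initial counter is derived from the key is a guess (the text only says it comes from "the key bits"), and `secRngAddEntropy()`-style re-mixing is omitted for brevity.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t key[4];      // 128-bit XTEA key (the secret state)
    uint32_t counter[2];  // 64-bit block counter
} DrbgSketchT;

// Standard XTEA block encryption, 32 iterations.
static void xteaEncryptBlock(uint32_t v[2], const uint32_t key[4]) {
    uint32_t v0 = v[0], v1 = v[1], sum = 0;
    for (int i = 0; i < 32; i++) {
        v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
        sum += 0x9E3779B9u;
        v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
    }
    v[0] = v0;
    v[1] = v1;
}

// Produce len bytes: encrypt the counter, emit up to 8 bytes, increment.
static void drbgBytes(DrbgSketchT *r, uint8_t *buf, size_t len) {
    size_t pos = 0;
    while (pos < len) {
        uint32_t block[2] = { r->counter[0], r->counter[1] };
        xteaEncryptBlock(block, r->key);
        if (++r->counter[0] == 0) {
            r->counter[1]++;
        }
        for (int i = 0; i < 8 && pos < len; i++, pos++) {
            buf[pos] = (uint8_t)(block[i / 4] >> (8 * (i % 4)));
        }
    }
}

// Seed: XOR-fold entropy into the key, derive a counter, discard 64 bytes.
static void drbgSeed(DrbgSketchT *r, const uint8_t *entropy, size_t len) {
    uint8_t folded[16] = {0};
    uint8_t discard[64];
    for (size_t i = 0; i < len; i++) {
        folded[i % 16] ^= entropy[i];
    }
    for (int w = 0; w < 4; w++) {
        r->key[w] = (uint32_t)folded[4 * w]
                  | ((uint32_t)folded[4 * w + 1] << 8)
                  | ((uint32_t)folded[4 * w + 2] << 16)
                  | ((uint32_t)folded[4 * w + 3] << 24);
    }
    // Assumed derivation: mix the key words pairwise to get a start value.
    r->counter[0] = r->key[0] ^ r->key[2];
    r->counter[1] = r->key[1] ^ r->key[3];
    drbgBytes(r, discard, sizeof(discard));
}
```

Because the generator is deterministic, two instances seeded with identical entropy produce identical output. All unpredictability comes from the seed, which is why the roughly 20 bits available from hardware alone are not sufficient.
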
## BigNum Arithmetic

All modular arithmetic uses a 1024-bit big number type (`BigNumT`)
stored as 32 x `uint32_t` words in little-endian order. Operations:

| Function       | Description                                          |
|----------------|------------------------------------------------------|
| `bnAdd`        | Add two bignums, return carry                        |
| `bnSub`        | Subtract two bignums, return borrow                  |
| `bnCmp`        | Compare two bignums (-1, 0, +1)                      |
| `bnBit`        | Test a single bit by index                           |
| `bnBitLength`  | Find the highest set bit position                    |
| `bnShiftLeft1` | Left-shift by 1, return carry                        |
| `bnClear`      | Zero all words                                       |
| `bnSet`        | Set to a 32-bit value (clear upper words)            |
| `bnCopy`       | Copy from source to destination                      |
| `bnFromBytes`  | Convert big-endian byte array to little-endian words |
| `bnToBytes`    | Convert little-endian words to big-endian byte array |
| `bnMontMul`    | Montgomery multiplication (CIOS variant)             |
| `bnModExp`     | Modular exponentiation via Montgomery multiply       |

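Two of these operations are sketched below to show the word layout in practice. The `Sketch` suffix marks these as illustrations: the real `BigNumT` layout and the exact signatures of `bnAdd` and `bnFromBytes` are not shown in this document, so the struct and parameter lists here are assumptions.

```c
#include <stdint.h>

#define BN_WORDS 32
#define BN_BYTES (BN_WORDS * 4)

// 1024-bit number; w[0] is the least significant word.
typedef struct {
    uint32_t w[BN_WORDS];
} BigNumSketchT;

// r = a + b, returning the carry out of the top word (0 or 1).
// A 64-bit accumulator holds each word sum plus the incoming carry.
static uint32_t bnAddSketch(BigNumSketchT *r, const BigNumSketchT *a,
                            const BigNumSketchT *b) {
    uint64_t carry = 0;
    for (int i = 0; i < BN_WORDS; i++) {
        uint64_t sum = (uint64_t)a->w[i] + b->w[i] + carry;
        r->w[i] = (uint32_t)sum;
        carry = sum >> 32;
    }
    return (uint32_t)carry;
}

// Load a 128-byte big-endian array: the last four bytes become w[0].
static void bnFromBytesSketch(BigNumSketchT *r, const uint8_t bytes[BN_BYTES]) {
    for (int i = 0; i < BN_WORDS; i++) {
        const uint8_t *p = bytes + BN_BYTES - 4 * (i + 1);
        r->w[i] = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
                | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    }
}
```

Keeping the least significant word at index 0 lets carry propagation run in a simple ascending loop, while the byte conversion handles the big-endian wire format at the boundary.
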
## Montgomery Multiplication

The CIOS (Coarsely Integrated Operand Scanning) variant computes
`a * b * R^(-1) mod m` in a single pass without explicit division by the
modulus. This replaces the expensive modular reduction step (division by a
1024-bit number) with cheaper additions and right-shifts.

For each of the 32 outer iterations (one per word of operand `a`):

1. Accumulate `a[i] * b` into the temporary product `t`
2. Compute the Montgomery reduction factor `u = t[0] * m0inv mod 2^32`
3. Add `u * mod` to `t` and shift right by 32 bits (implicit division)

After all iterations, the result is in the range [0, 2m), so a single
conditional subtraction brings it into [0, m).

**Montgomery constants** (computed once, lazily on first DH use):

- `R^2 mod p` -- computed via 2048 iterations of shift-left-1 with
  conditional subtraction. This is the Montgomery domain conversion
  factor.
- `-p[0]^(-1) mod 2^32` -- computed via Newton's method (5 iterations,
  doubling precision each step: 1->2->4->8->16->32 correct bits). This
  is the Montgomery reduction constant.

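The three CIOS steps and both constants can be sketched for an arbitrary word count `n`, which makes the routine testable on small moduli. The function names and the `n` parameter are assumptions for illustration; the library's `bnMontMul` works on a fixed 32 words, and this sketch makes no attempt at constant-time behavior.

```c
#include <stdint.h>

#define MONT_MAX_WORDS 32

// If (top:x) >= m, subtract m from x in place.
static void condSubtract(uint32_t *x, uint32_t top, const uint32_t *m, int n) {
    int ge = (top != 0);
    if (!ge) {
        ge = 1;  // equal also counts as >=
        for (int j = n - 1; j >= 0; j--) {
            if (x[j] != m[j]) { ge = x[j] > m[j]; break; }
        }
    }
    if (ge) {
        uint64_t borrow = 0;
        for (int j = 0; j < n; j++) {
            uint64_t d = (uint64_t)x[j] - m[j] - borrow;
            x[j] = (uint32_t)d;
            borrow = (d >> 32) & 1;
        }
    }
}

// -m0^(-1) mod 2^32 by Newton iteration; m0 must be odd.
static uint32_t montM0Inv(uint32_t m0) {
    uint32_t x = 1;              // correct to 1 bit because m0 is odd
    for (int i = 0; i < 5; i++) {
        x *= 2u - m0 * x;        // each step doubles the correct bits
    }
    return 0u - x;
}

// R^2 mod m with R = 2^(32n): start at 1 and double 2*32*n times.
static void montR2(uint32_t *r2, const uint32_t *m, int n) {
    for (int j = 0; j < n; j++) r2[j] = 0;
    r2[0] = 1;
    for (int k = 0; k < 2 * 32 * n; k++) {
        uint32_t carry = 0;
        for (int j = 0; j < n; j++) {
            uint32_t next = r2[j] >> 31;
            r2[j] = (r2[j] << 1) | carry;
            carry = next;
        }
        condSubtract(r2, carry, m, n);
    }
}

// r = a * b * R^(-1) mod m. Requires a, b < m, m odd, n <= MONT_MAX_WORDS.
static void montMul(uint32_t *r, const uint32_t *a, const uint32_t *b,
                    const uint32_t *m, uint32_t m0inv, int n) {
    uint32_t t[MONT_MAX_WORDS + 2] = {0};
    for (int i = 0; i < n; i++) {
        uint64_t carry = 0, s;
        for (int j = 0; j < n; j++) {            // step 1: t += a[i] * b
            s = (uint64_t)a[i] * b[j] + t[j] + carry;
            t[j] = (uint32_t)s;
            carry = s >> 32;
        }
        s = (uint64_t)t[n] + carry;
        t[n] = (uint32_t)s;
        t[n + 1] = (uint32_t)(s >> 32);
        uint32_t u = t[0] * m0inv;               // step 2: reduction factor
        s = (uint64_t)u * m[0] + t[0];           // step 3: low word becomes 0
        carry = s >> 32;
        for (int j = 1; j < n; j++) {            // add u*m, shifted down a word
            s = (uint64_t)u * m[j] + t[j] + carry;
            t[j - 1] = (uint32_t)s;
            carry = s >> 32;
        }
        s = (uint64_t)t[n] + carry;
        t[n - 1] = (uint32_t)s;
        t[n] = t[n + 1] + (uint32_t)(s >> 32);
    }
    condSubtract(t, t[n], m, n);                 // bring [0, 2m) into [0, m)
    for (int j = 0; j < n; j++) r[j] = t[j];
}
```

To multiply ordinary values, convert each operand into the Montgomery domain with `montMul(x, R2)`, multiply there, then convert back with `montMul(result, 1)`. For example, with the two-word modulus 2^61 - 1, `montR2` yields 64 and the product of 2^32 and 2^31 comes back as 4.
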
**Modular exponentiation** uses left-to-right binary square-and-multiply
scanning. For a 256-bit private exponent, this requires approximately 256
squarings plus approximately 128 multiplies (half the bits are 1 on average),
where each operation is a Montgomery multiplication on 32-word numbers.

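The operation counts quoted above follow directly from the scanning order. The sketch below runs left-to-right square-and-multiply on small integers and counts each kind of step; it uses plain `%` reduction rather than Montgomery multiplication, and the names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t squarings;
    uint32_t multiplies;
} ModExpCountsT;

// Left-to-right binary exponentiation: scan the exponent from its highest
// set bit down to bit 0, squaring every step and multiplying on 1 bits.
static uint32_t modExpLeftToRight(uint32_t base, uint32_t exp, uint32_t mod,
                                  ModExpCountsT *counts) {
    uint64_t result = 1 % mod;
    uint64_t b = base % mod;
    int top = 31;

    counts->squarings = 0;
    counts->multiplies = 0;

    // Skip leading zero bits so the count reflects the exponent's length.
    while (top >= 0 && !((exp >> top) & 1)) {
        top--;
    }
    for (int bit = top; bit >= 0; bit--) {
        result = (result * result) % mod;
        counts->squarings++;
        if ((exp >> bit) & 1) {
            result = (result * b) % mod;
            counts->multiplies++;
        }
    }
    return (uint32_t)result;
}
```

A k-bit exponent costs k squarings plus one multiply per set bit, so a random 256-bit exponent averages about 256 + 128 operations. As a small example, the 3-bit exponent 6 (binary 110) takes 3 squarings and 2 multiplies.
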
## Secure Zeroing

Key material (private keys, shared secrets, cipher contexts) is erased
using a volatile-pointer loop:

```c
#include <stdint.h>

// Zero a buffer through a volatile pointer so the stores cannot be elided.
static void secureZero(void *ptr, int len) {
    volatile uint8_t *p = (volatile uint8_t *)ptr;
    for (int i = 0; i < len; i++) {
        p[i] = 0;
    }
}
```

The `volatile` qualifier prevents the compiler from optimizing away the
zeroing as a dead store. Without it, the compiler may see that the
buffer is about to be freed and remove the stores entirely. This is
critical for preventing sensitive key material from lingering in freed
memory where a later `malloc` could expose it.

## Performance

At serial port speeds, XTEA-CTR encryption overhead is minimal:

| Speed (bps) | Blocks/sec | CPU Cycles/sec | % of 33 MHz 486 |
|-------------|------------|----------------|-----------------|
| 9600        | 120        | ~240K          | < 1%            |
| 57600       | 720        | ~1.4M          | ~4%             |
| 115200      | 1440       | ~2.9M          | ~9%             |

The figures correspond to 10 bits on the wire per byte (as with 8N1
framing) and roughly 2,000 cycles per 8-byte block.

DH key exchange takes approximately 0.3 seconds at 66 MHz or 0.6 seconds
at 33 MHz (256-bit private exponent, 1024-bit modulus, Montgomery
multiplication).

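The table's arithmetic can be reproduced for other line speeds or clock rates. The helpers below are a sketch with hypothetical names; the 10 bits per byte and the cycles-per-block input are assumptions inferred from the table rows, not measured constants.

```c
#include <stdint.h>

// 8-byte XTEA blocks per second at a given line speed, assuming 10 bits
// on the wire per data byte (start bit + 8 data bits + stop bit).
static uint32_t xteaBlocksPerSecond(uint32_t baud) {
    return baud / 10u / 8u;
}

// Percentage of CPU time spent encrypting at full line rate.
static double xteaCpuLoadPercent(uint32_t baud, uint32_t cyclesPerBlock,
                                 uint32_t cpuHz) {
    double cyclesPerSecond = (double)xteaBlocksPerSecond(baud) * cyclesPerBlock;
    return 100.0 * cyclesPerSecond / (double)cpuHz;
}
```

For instance, `xteaCpuLoadPercent(115200, 2000, 33000000)` works out to 1440 blocks x 2000 cycles = 2.88M cycles per second, or about 8.7% of a 33 MHz CPU, in line with the last table row.
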
## API Reference

### Constants

| Name                | Value | Description                       |
|---------------------|-------|-----------------------------------|
| `SEC_DH_KEY_SIZE`   | 128   | DH public key size in bytes       |
| `SEC_XTEA_KEY_SIZE` | 16    | XTEA key size in bytes            |
| `SEC_SUCCESS`       | 0     | Success                           |
| `SEC_ERR_PARAM`     | -1    | Invalid parameter or NULL pointer |
| `SEC_ERR_NOT_READY` | -2    | Keys not yet generated/derived    |
| `SEC_ERR_ALLOC`     | -3    | Memory allocation failed          |

### Types

```c
typedef struct SecDhS     SecDhT;      // Opaque DH context
typedef struct SecCipherS SecCipherT;  // Opaque cipher context
```

### RNG Functions

```c
int secRngGatherEntropy(uint8_t *buf, int len);
```

Reads hardware entropy from the PIT counter and BIOS tick count. Returns
the number of bytes written (up to 8). Provides roughly 20 bits of true
entropy -- not sufficient alone, but enough to seed the DRBG when
supplemented by user interaction timing.

```c
void secRngSeed(const uint8_t *entropy, int len);
```

Initializes the DRBG with the given entropy. XOR-folds the input into
the XTEA key, derives the initial counter, and generates and discards 64
bytes to advance past weak initial output.

```c
void secRngAddEntropy(const uint8_t *data, int len);
```

Mixes additional entropy into the running RNG state without resetting it.
XOR-folds data into the key and re-mixes by encrypting the key with
itself. Use this to stir in keyboard timing, mouse jitter, or other
runtime entropy sources.

```c
void secRngBytes(uint8_t *buf, int len);
```

Generates `len` pseudorandom bytes. Auto-seeds from hardware entropy if
not previously seeded. Produces 8 bytes per XTEA block encryption of the
internal counter.

### Diffie-Hellman Functions

```c
SecDhT *secDhCreate(void);
```

Allocates a new DH context. Returns `NULL` on allocation failure. The
context must be destroyed with `secDhDestroy()` when no longer needed.

```c
int secDhGenerateKeys(SecDhT *dh);
```

Generates a 256-bit random private key and computes the corresponding
1024-bit public key (`g^private mod p`). Lazily initializes Montgomery
constants on first call. The RNG should be seeded before calling this.

```c
int secDhGetPublicKey(SecDhT *dh, uint8_t *buf, int *len);
```

Exports the public key as a big-endian byte array into `buf`. On entry,
`*len` must be at least `SEC_DH_KEY_SIZE` (128). On return, `*len` is
set to 128.

```c
int secDhComputeSecret(SecDhT *dh, const uint8_t *remotePub, int len);
```

Computes the shared secret from the remote side's public key
(`remote^private mod p`). Validates the remote key is in range [2, p-2].
Both sides compute this independently and arrive at the same value.

```c
int secDhDeriveKey(SecDhT *dh, uint8_t *key, int keyLen);
```

Derives a symmetric key by XOR-folding the 128-byte shared secret down
to `keyLen` bytes. Each output byte is the XOR of `128/keyLen` input
bytes.

```c
void secDhDestroy(SecDhT *dh);
```

Securely zeroes the entire DH context (private key, shared secret, public
key) and frees the memory. Must be called to prevent key material from
lingering in the heap.

### Cipher Functions

```c
SecCipherT *secCipherCreate(const uint8_t *key);
```

Creates an XTEA-CTR cipher context with the given 16-byte key. The
internal counter starts at zero. Returns `NULL` on allocation failure or
NULL key.

```c
void secCipherCrypt(SecCipherT *c, uint8_t *data, int len);
```

Encrypts or decrypts `data` in place. CTR mode is symmetric -- the same
function handles both directions. The internal counter advances by one
for every 8 bytes processed (one XTEA block). The counter must never
repeat with the same key; callers are responsible for ensuring this
(secLink handles it by using a separate cipher instance, with its own
derived key, for each direction).

```c
void secCipherSetNonce(SecCipherT *c, uint32_t nonceLo, uint32_t nonceHi);
```

Sets the 64-bit nonce/counter to a specific value. Both the nonce
(baseline) and the running counter are set to the same value. Call this
before encrypting if you need a deterministic starting point.

```c
void secCipherDestroy(SecCipherT *c);
```

Securely zeroes the cipher context (key and counter state) and frees the
memory.

## Usage Examples

### Full Key Exchange

```c
#include "security.h"
#include <string.h>

// Seed the RNG, using only the bytes that were actually gathered
uint8_t entropy[16];
int entropyLen = secRngGatherEntropy(entropy, sizeof(entropy));
secRngSeed(entropy, entropyLen);

// Create DH context and generate keys
// (production code should check for NULL and compare results to SEC_SUCCESS)
SecDhT *dh = secDhCreate();
secDhGenerateKeys(dh);

// Export public key to send to remote
uint8_t myPub[SEC_DH_KEY_SIZE];
uint8_t remotePub[SEC_DH_KEY_SIZE];
int pubLen = SEC_DH_KEY_SIZE;
secDhGetPublicKey(dh, myPub, &pubLen);
// ... send myPub to remote, receive remotePub ...

// Compute shared secret and derive a 16-byte XTEA key
secDhComputeSecret(dh, remotePub, SEC_DH_KEY_SIZE);

uint8_t key[SEC_XTEA_KEY_SIZE];
secDhDeriveKey(dh, key, SEC_XTEA_KEY_SIZE);
secDhDestroy(dh); // private key no longer needed

// Create cipher and encrypt
SecCipherT *cipher = secCipherCreate(key);
uint8_t message[] = "Secret message";
secCipherCrypt(cipher, message, sizeof(message));
// message is now encrypted

// Decrypt (reset counter first, then apply same operation)
secCipherSetNonce(cipher, 0, 0);
secCipherCrypt(cipher, message, sizeof(message));
// message is now plaintext again

secCipherDestroy(cipher);
```

### Standalone Encryption (Without DH)

```c
// XTEA-CTR can be used independently of Diffie-Hellman
uint8_t key[SEC_XTEA_KEY_SIZE] = { /* your key */ };
SecCipherT *c = secCipherCreate(key);

uint8_t data[1024];
// ... fill data ...
secCipherCrypt(c, data, sizeof(data)); // encrypt in place

secCipherDestroy(c);
```

### Random Number Generation

```c
// Seed from hardware, using only the bytes that were actually gathered
uint8_t hwEntropy[16];
int hwLen = secRngGatherEntropy(hwEntropy, sizeof(hwEntropy));
secRngSeed(hwEntropy, hwLen);

// Stir in user-derived entropy (keyboard timing, etc.)
uint8_t userEntropy[4];
// ... gather from timing events ...
secRngAddEntropy(userEntropy, sizeof(userEntropy));

// Generate random bytes
uint8_t randomBuf[32];
secRngBytes(randomBuf, sizeof(randomBuf));
```

## Building

```
make        # builds ../lib/libsecurity.a
make clean  # removes objects and library
```

Cross-compiled with the DJGPP toolchain targeting i486+ CPUs. Compiler
flags: `-O2 -Wall -Wextra -march=i486 -mtune=i586`.

Objects are placed in `../obj/security/`, the library in `../lib/`.

No external dependencies -- the library is self-contained. It uses only
DJGPP's `<pc.h>`, `<sys/farptr.h>`, and `<go32.h>` for hardware entropy
collection (PIT and BIOS tick count access).

## Files

- `security.h` -- Public API header (types, constants, function prototypes)
- `security.c` -- Complete implementation (bignum, Montgomery, DH, XTEA, RNG)
- `Makefile` -- DJGPP cross-compilation build rules

## Used By

- `seclink/` -- Secure serial link (DH handshake, cipher creation, RNG seeding)