Linux hacking part 4: Measuring cache hit and cache miss times in linux.
﷽
Hello, cybersecurity enthusiasts and white hat hackers!
I continue my series of posts about Linux hacking and linux malware. This is a short but interesting post about one important concept that will be demonstrated in this and the next posts in this series.
a few words about levels of cache memory
A Central Processing Unit (CPU) may have several levels of cache memory to store frequently accessed data and instructions. The cache memory is typically organized into three levels:
L1 cache - fastest but smallest; usually split into separate instruction and data caches.
L2 cache - slower but larger; typically unified (holds both data and instructions).
L3 cache - slowest but largest; unified and usually shared between cores.
The `lscpu` command provides detailed information about CPU architecture, features, and cache sizes. `lscpu` shows the L1, L2, and L3 cache sizes, which are crucial to processor performance and caching:
lscpu
As you can see, the output provides information about the size of the L1, L2, and L3 cache, CPU architecture, and total CPUs.
cache hit and cache miss
Understanding how the CPU cache works is crucial for performance optimization in systems programming, reverse engineering, and security research. The difference between a cache hit and a cache miss can impact the speed of execution significantly. In this blog post, we’ll explore a simple C program that measures the time taken to access cached and non-cached memory locations.
This technique is often used in security research, particularly in side-channel attacks where attackers measure access times to infer sensitive data. However, here we focus on educational and performance optimization purposes.
practical example
Before diving into the code, let’s understand what we’re trying to measure:
Cache Hit - when data is already present in the CPU cache, access is very fast.
Cache Miss - when data needs to be fetched from RAM, which takes significantly longer.
In our example, we’ll:
- Access a memory location that is already in the cache and measure the access time.
- Flush the cache to ensure that a different memory location is not cached.
- Access the new memory location and measure the time taken.
This comparison will help us see the difference between cache hits and cache misses in real-time.
First of all, include necessary headers:
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
These headers provide functions for input/output, integer types, intrinsic functions for CPU instructions, memory management, and basic system utilities.
Then define a large array for testing:
#define ARRAY_SIZE (1024 * 1024) // 1 MB array
uint8_t array[ARRAY_SIZE];
We define a 1MB array to work with, ensuring we can access different parts and observe cache behavior.
At the next step, let’s measure memory access time:
uint64_t measure_access_time(volatile uint8_t *address) {
    uint64_t start, end;
    unsigned int aux; // auxiliary variable for __rdtscp

    _mm_mfence(); // memory fence to prevent instruction reordering
    start = __rdtscp(&aux); // start timing

    (void)*address; // access memory

    _mm_mfence(); // memory fence to prevent instruction reordering
    end = __rdtscp(&aux); // end timing

    return end - start;
}
This function measures the time taken to access a specific memory address using `__rdtscp`, an x86 instruction that reads the processor's timestamp counter.
`_mm_mfence()` - ensures that the memory operations are executed in order.
`__rdtscp(&aux)` - reads the timestamp before and after the memory access.
The difference between `end` and `start` gives the cycle count taken for the memory access.
Then, simulate a cache flush:
void flush_cache() {
    // flush the cache by accessing a large amount of data
    uint8_t *dummy = (uint8_t *)malloc(ARRAY_SIZE);
    for (int i = 0; i < ARRAY_SIZE; i++) {
        dummy[i] = i;
    }
    free(dummy);
}
Why do we need this? Since CPU caches automatically manage which data stays resident, we force eviction by allocating and filling a large dummy array. This makes it likely that previously cached data has been evicted (note that a 1 MB buffer may not evict everything from a larger L3 cache, but it is enough for this demonstration).
Finally, we need main execution logic:
int main() {
    uint64_t cached_time, uncached_time;

    // access a cached memory location
    volatile uint8_t *cached_addr = &array[0];
    *cached_addr = 42; // load into cache
    cached_time = measure_access_time(cached_addr);

    // flush the cache and access a different memory location
    flush_cache();
    volatile uint8_t *uncached_addr = &array[ARRAY_SIZE/2];
    uncached_time = measure_access_time(uncached_addr);

    printf("cache hit time: %lu cycles\n", cached_time);
    printf("cache miss time: %lu cycles\n", uncached_time);
    return 0;
}
What is going on here?
- we first access an element (`array[0]`) to ensure it is loaded into the cache.
- we measure and store the access time as `cached_time`.
- we then flush the cache using `flush_cache()`.
- after cache flushing, we access a different memory location (`array[ARRAY_SIZE/2]`).
- we measure the time taken to access the uncached memory and store it as `uncached_time`.
- finally, we print both values.
So the full source code of this example (`hack.c`) looks like this:
/*
 * hack.c
 * measuring cache hit and cache miss times
 * author @cocomelonc
 * https://cocomelonc.github.io/linux/2025/02/01/linux-hacking-4.html
 */
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#define ARRAY_SIZE (1024 * 1024) // 1 MB array

uint8_t array[ARRAY_SIZE];

uint64_t measure_access_time(volatile uint8_t *address) {
    uint64_t start, end;
    unsigned int aux; // auxiliary variable for __rdtscp

    _mm_mfence(); // memory fence to prevent instruction reordering
    start = __rdtscp(&aux); // start timing

    (void)*address; // access memory

    _mm_mfence(); // memory fence to prevent instruction reordering
    end = __rdtscp(&aux); // end timing

    return end - start;
}

void flush_cache() {
    // flush the cache by accessing a large amount of data
    uint8_t *dummy = (uint8_t *)malloc(ARRAY_SIZE);
    for (int i = 0; i < ARRAY_SIZE; i++) {
        dummy[i] = i;
    }
    free(dummy);
}

int main() {
    uint64_t cached_time, uncached_time;

    // access an element that is already in the cache (CACHE HIT)
    volatile uint8_t *cached_addr = &array[0];
    *cached_addr = 42; // load into cache
    cached_time = measure_access_time(cached_addr);

    // flush the cache and access a different element (CACHE MISS)
    flush_cache();
    volatile uint8_t *uncached_addr = &array[ARRAY_SIZE/2];
    uncached_time = measure_access_time(uncached_addr);

    printf("cache hit time: %lu cycles\n", cached_time);
    printf("cache miss time: %lu cycles\n", uncached_time);
    return 0;
}
demo
Let’s see everything in action. Compile it:
gcc -o hack hack.c
When you run the program:
./hack
you should see something like this:
or this:
The actual values may vary based on your CPU, cache size, and system state. However, cache misses should always take significantly longer than cache hits.
This technique is particularly relevant in side-channel attacks, where timing differences can reveal sensitive information.
If you’re interested in learning more about low-level performance analysis and side-channel attacks, stay tuned for future posts!
I hope this post with its practical example is useful for malware researchers, Linux programmers, and everyone interested in Linux kernel programming and attacking techniques.
Linux malware development 1: intro to kernel hacking. Simple C example
Linux malware development 2: find process ID by name. Simple C example
source code in github
This is a practical case for educational purposes only.
Thanks for your time, happy hacking and good bye!
PS. All drawings and screenshots are mine