Disclaimer: Unlike the previous posts which talked about an experience of mine, this is just one of my fun (it’s a subjective word) experiments. Note the idea of doing this stemmed from me wanting to write about it, and not the other way around, so if you’re reading this, I succeeded.

Disclaimer 2: The title might be a little clickbaity, but I promise it’s pretty cool, so read on!

Introduction

While I’m writing this, I have no idea how I’m going to do what I want to do; I just know it is possible theoretically, so I’m going to work my way through the problem and hopefully take you all through the process of it all.

What I want to do: When you declare a string in your program char str[] = "ThisBlogIsPrettyCool", I know it gets stored somewhere in my memory. I also know that the memory is technically available to every other process on my PC, so theoretically, it should be possible to change it from outside that program, with that program never knowing about it, right?

My environment

  • Pop!_OS 20.04
    • Linux pop-os 5.4.0-7634-generic #38~1591219791~20.04~6b1c5de-Ubuntu SMP Thu Jun 4 02:56:10 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
  • gcc version 9.3.0
  • Python 3.8

But any version of Linux and Python 3.x should work in theory; let me know if it doesn’t. Note: To run this, you need admin (sudo) access on your machine.

Prerequisites

  • Very basic C programming
  • Basic Python programming

Some Concepts

Key words

  • Main Memory: Fancy word for the RAM
  • Memory Address: A number that uniquely identifies where your data is stored in memory.
  • Process: Fancy word used for a program (this definition is limited to the scope of this article).
  • PID / Process ID: A unique number that identifies each process running on your system.

Virtual Memory

In computing, virtual memory is a memory management technique that abstracts the physical storage that you have. It maps the memory addresses used by a program (called virtual addresses) to actual physical addresses in memory.

This allows the computer to do all sorts of fancy stuff like:

  • It can now show each process a contiguous block of memory, which can be mapped to non-contiguous blocks in the actual memory.
  • Your virtual memory can now be spread across multiple RAM sticks, or in some cases even a small fraction of your hard drive, with no extra work, because your program won’t know where its data is actually getting stored.
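To make the first point concrete, here is a toy sketch of address translation. The page table contents and the translate() helper are made up for illustration, and a real kernel and MMU do this in hardware with multi-level tables, so treat it as a mental model only.

```python
PAGE_SIZE = 4096  # bytes per page; 4 KiB is a common size on x86-64 Linux

# Hypothetical page table: virtual page number -> physical frame number.
# Virtual pages 0, 1 and 2 sit next to each other; their frames do not.
page_table = {0: 7, 1: 2, 2: 9}

def translate(virtual_address):
    """Map a virtual address to a physical one using the toy page table."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table[page]  # a KeyError here stands in for a page fault
    return frame * PAGE_SIZE + offset

for vaddr in (0, 4096, 8192 + 5):
    print(f"virtual {vaddr:#x} -> physical {translate(vaddr):#x}")
```

The program only ever sees the neat virtual addresses on the left; where the bytes physically live is the operating system's business.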

Now, some things to remember:

  • Each process has its own virtual memory.
  • The virtual memory is divided into distinct sections meant to store different things. Depending on your operating system, these sections may differ, but generically, it is the one shown below.

Now, to anyone going Heh?, we don’t need a lot for this article, just the following points:

  • The text segment basically stores the “code” of your program.
  • The stack and the heap are on opposite ends of the virtual memory; the stack grows downwards (yeah, seems a little counterintuitive, but that’s the way it is), while the heap grows upwards.
  • The heap is where all the dynamically allocated memory is stored, i.e., memory that is assigned to a program at runtime (basically, malloc() calls). To everyone who didn’t understand the previous line: this is the part of the virtual memory we’ll be hacking!
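If you'd like to see these sections as numbers, here is a small sketch that picks the named regions out of a process's memory map. The named_regions() helper is my own invention for illustration, and the two sample lines are taken from the /proc/32612/maps output that appears later in this post.

```python
def named_regions(maps_text):
    """Return {name: (low, high)} for bracketed regions such as [heap] and [stack]."""
    regions = {}
    for line in maps_text.splitlines():
        fields = line.split()
        # A maps line has at least five fields; the sixth (the name) is optional.
        if len(fields) < 6 or not fields[5].startswith("["):
            continue
        low, high = (int(part, 16) for part in fields[0].split("-"))
        regions[fields[5]] = (low, high)
    return regions

SAMPLE = """\
558bfee29000-558bfee4a000 rw-p 00000000 00:00 0                          [heap]
7ffeceb35000-7ffeceb56000 rw-p 00000000 00:00 0                          [stack]
"""

regions = named_regions(SAMPLE)
for name, (low, high) in regions.items():
    print(f"{name:8} {low:#x} - {high:#x}")

# The heap sits at lower addresses than the stack.
print(regions["[heap]"][1] < regions["[stack]"][0])
```

On a Linux machine you can feed it open("/proc/self/maps").read() instead of SAMPLE to look at the Python interpreter's own layout.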

The C program

We want a very basic C program that will create a string and store it in the heap.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
    char s[] = "ThisIsAGoodStArt";
    char *ptr = malloc(sizeof(char) * 17); // 16 characters + the terminating '\0'
    strcpy(ptr, s);
    printf("%s", ptr);

    return 0;
}

Running this gives us what we’d expect: just the string, nothing else.

ThisIsAGoodStArt

Okay, now we need a few other things too.

  • We know that each process has its own virtual memory, so we need to find out what the process ID of this process is.
  • As soon as this program ends, the OS reclaims its memory and the string is gone, which does us no good. So we need to keep this program running for as long as we want.
  • We also want the location of the string in the virtual memory. We probably won’t use it, but it’ll be a good thing to have.

So let’s make those changes.

// Don't worry about these header files.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main() {

    char s[] = "ThisIsAGoodStArt";
    char *ptr = malloc(sizeof(char) * 17);
    int i = 0;

    printf("The process id is: %d\n", (int) getpid());

    strcpy(ptr, s);

    while(++i) {
        printf("#%d - %s : %p\n", i, ptr, ptr);
        sleep(1);
    }

    return 0;
}

After which we get the following output that runs forever:

The process id is: 32612
#1 - ThisIsAGoodStArt : 0x558bfee292a0
#2 - ThisIsAGoodStArt : 0x558bfee292a0
#3 - ThisIsAGoodStArt : 0x558bfee292a0
#4 - ThisIsAGoodStArt : 0x558bfee292a0
#5 - ThisIsAGoodStArt : 0x558bfee292a0
#6 - ThisIsAGoodStArt : 0x558bfee292a0
...

Now, if you try and run this, you *will* end up getting different numbers (seriously, getting the exact same output is extremely unlikely): the PID is handed out by the OS, and the address is typically randomized on every run (address space layout randomization, or ASLR). In fact, you will probably get different numbers every time you run it.

Now we know that the process ID is 32612, and the string in our memory starts at 0x558bfee292a0 (this is a base 16 number).

Cool. So far so good.

/proc - This is seriously cool stuff.

The /proc directory on a Linux system is, in my opinion, the coolest directory to mess around with. It’s a trove of information about all the processes running on your computer.

Nerd Talk: /proc isn’t a regular directory, but a virtual file system. It doesn’t contain real files but runtime information about your entire system. Again, how cool is that?

The /proc directory has a lot of folders, each corresponding to an individual process. Here’s mine:

~ ls /proc
1     1094   13    1408  1590  166    177    1834   20     2192   1095  2401   26752  281    28904  3125  3834  48    54    60   acpi         iomem        mtrr           tty
10    32612   1300  1440  1592  167    1773   18369  202    21921  2347  2405   26963  282    28955  3165  39    4856  5402  622  asound       ioports      net            uptime
1055  1097   1302  1445  1594  16715  1776   18399  2056   21927  2348  2425   27     283    29     3171  4     50    559   63   buddyinfo    irq          pagetypeinfo   version
1056  1098   1305  1462  1595  1675   178    18408  21     21928  2349  2426   27064  284    29025  3188  40    504   56    630  bus          kallsyms     partitions     version_signature
1059  11     1307  1467  1596  168    1797   1842   21191  22     2350  2431   2708   28488  291    32    4046  5040  560   64   cgroups      kcore        pressure       vmallocinfo
1060  1107   1309  1476  1597  169    18     1844   21211  2201   2353  2444   275    28515  2912   3201  41    5046  566   65   cmdline      keys         sched_debug    vmstat
1064  1116   1315  1484  1599  17     1802   1847   21224  2205   2355  2455   27554  2854   2970   3207  42    507   567   66   consoles     key-users    schedstat      zoneinfo
1067  11535  1330  15    16    170    18026  185    2134   2219   2358  24835  276    28547  29956  326   429   5078  568   67   cpuinfo      kmsg         scsi
1072  11537  1336  1532  1600  17090  1805   18680  21495  222    2359  24837  27606  2859   3      33    430   5094  569   68   crypto       kpagecgroup  self
1073  11614  1342  1535  1601  17121  18082  18681  2151   2231   2372  24923  277    28598  30     330   44    51    57    848  devices      kpagecount   slabinfo
1075  1167   1346  1538  1602  17241  18104  187    2156   22468  2373  2503   27721  2860   30162  331   45    512   570   862  diskstats    kpageflags   softirqs
1077  1177   1352  1540  1603  173    18123  18742  2157   2266   2374  2524   27800  2863   30197  34    4542  5175  571   863  dma          loadavg      stat
1081  1184   1356  1554  1611  1735   1814   19237  2159   2283   2375  2598   27858  2864   30212  3489  4568  519   572   864  driver       locks        swaps
1085  12     1366  1558  162   174    18155  1931   2162   2286   2379  26     27867  28753  30230  35    46    52    573   9    execdomains  mdstat       sys
1088  1216   1373  1561  163   1752   1818   19672  2165   2295   2386  2639   279    2880   30237  36    4661  53    576   949  fb           meminfo      sysrq-trigger
1090  1267   1385  1576  164   1753   182    19747  21786  23     2390  26467  28     2883   30421  369   47    5352  58    950  filesystems  misc         sysvipc
1091  1288   1390  1588  165   1764   1823   198    21860  2303   2396  26468  280    28858  3071   38    475   536   59    976  fs           modules      thread-self
1092  1289   14    1589  1657  1767   1827   2      2187   2318   24    26658  2802   2889   30862  3816  4781  5367  6     977  interrupts   mounts       timer_list

Each numbered entry is a folder, and its name corresponds to the PID of a process. We know the process ID of our process is 32612, so let’s peek inside that folder.

32612 ls
arch_status      environ    mountinfo      personality   statm
attr             exe        mounts         projid_map    status
autogroup        fd         mountstats     root          syscall
auxv             fdinfo     net            sched         task
cgroup           gid_map    ns             schedstat     timers
clear_refs       io         numa_maps      sessionid     timerslack_ns
cmdline          limits     oom_adj        setgroups     uid_map
comm             loginuid   oom_score      smaps         wchan
coredump_filter  map_files  oom_score_adj  smaps_rollup
cpuset           maps       pagemap        stack
cwd              mem        patch_state    stat
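The same listing can be done from Python. This is only a sketch: pid_entries() is a helper name I made up, and the sample list is a handful of entries copied from my ls /proc output above.

```python
import os

def pid_entries(names):
    """Keep only the names that look like PIDs (all digits), sorted numerically."""
    return sorted(int(name) for name in names if name.isdecimal())

sample = ["1", "10", "32612", "acpi", "cpuinfo", "self", "uptime"]
print(pid_entries(sample))

# On Linux, do the same against the real /proc directory.
if os.path.isdir("/proc"):
    pids = pid_entries(os.listdir("/proc"))
    print(f"{len(pids)} processes visible; this script is PID {os.getpid()}")
```

Everything that isn't a number (cpuinfo, uptime and friends) describes the system as a whole rather than a single process.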

For our project we only need to look at two files:

  • /proc/32612/maps
  • /proc/32612/mem

/proc/pid/maps

From man proc (Basically a manual page)

  /proc/[pid]/maps
        A  file containing the currently mapped memory regions and their
        access permissions.  See mmap(2) for  some  further  information
        about memory mappings.

Okay, fancy stuff aside. Let’s take a look at what my maps file looks like.

~ cat /proc/32612/maps
558bfe5ca000-558bfe5cb000 r--p 00000000 08:04 1705261                    /home/yash/git_projects/blog/a.out
558bfe5cb000-558bfe5cc000 r-xp 00001000 08:04 1705261                    /home/yash/git_projects/blog/a.out
558bfe5cc000-558bfe5cd000 r--p 00002000 08:04 1705261                    /home/yash/git_projects/blog/a.out
558bfe5cd000-558bfe5ce000 r--p 00002000 08:04 1705261                    /home/yash/git_projects/blog/a.out
558bfe5ce000-558bfe5cf000 rw-p 00003000 08:04 1705261                    /home/yash/git_projects/blog/a.out
558bfee29000-558bfee4a000 rw-p 00000000 00:00 0                          [heap]
7f87d213b000-7f87d2160000 r--p 00000000 103:05 4628                      /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f87d2160000-7f87d22d8000 r-xp 00025000 103:05 4628                      /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f87d22d8000-7f87d2322000 r--p 0019d000 103:05 4628                      /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f87d2322000-7f87d2323000 ---p 001e7000 103:05 4628                      /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f87d2323000-7f87d2326000 r--p 001e7000 103:05 4628                      /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f87d2326000-7f87d2329000 rw-p 001ea000 103:05 4628                      /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f87d2329000-7f87d232f000 rw-p 00000000 00:00 0 
7f87d2346000-7f87d2347000 r--p 00000000 103:05 4236                      /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f87d2347000-7f87d236a000 r-xp 00001000 103:05 4236                      /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f87d236a000-7f87d2372000 r--p 00024000 103:05 4236                      /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f87d2373000-7f87d2374000 r--p 0002c000 103:05 4236                      /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f87d2374000-7f87d2375000 rw-p 0002d000 103:05 4236                      /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f87d2375000-7f87d2376000 rw-p 00000000 00:00 0 
7ffeceb35000-7ffeceb56000 rw-p 00000000 00:00 0                          [stack]
7ffecebd1000-7ffecebd4000 r--p 00000000 00:00 0                          [vvar]
7ffecebd4000-7ffecebd5000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0                  [vsyscall]

Okay, breathe in.

We see the [heap], that sounds familiar. This line:

558bfee29000-558bfee4a000 rw-p 00000000 00:00 0                          [heap]
  • 558bfee29000-558bfee4a000: This is the address range of the heap for our program. Going back to our program’s output, we see that our string is stored at 0x558bfee292a0. Since 0x558bfee29000 < 0x558bfee292a0 < 0x558bfee4a000, we now have solid proof that our string is in the heap. Good.
  • The rw signifies that our program can read and write to this section (Duh!).
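The comparison above is easy to check in Python. A minimal sketch, where parse_maps_line() is a hypothetical helper of mine and the line and address are the ones from my run:

```python
def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into the pieces we care about."""
    fields = line.split()
    low, high = (int(part, 16) for part in fields[0].split("-"))
    name = fields[5] if len(fields) > 5 else ""  # the name column is optional
    return {"low": low, "high": high, "perms": fields[1], "name": name}

heap = parse_maps_line(
    "558bfee29000-558bfee4a000 rw-p 00000000 00:00 0                          [heap]"
)
address = int("0x558bfee292a0", 16)  # the pointer our C program printed

inside = heap["low"] <= address < heap["high"]
offset = address - heap["low"]        # how far into the heap the string sits
heap_size = heap["high"] - heap["low"]

print("string is in the heap:", inside)
print("offset into the heap: ", offset, "bytes")
print("heap size:            ", heap_size, "bytes")
```

So the string lives only a few hundred bytes from the start of the heap, which is what you'd expect for the first malloc() call in a tiny program.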

/proc/pid/mem

From man proc

  /proc/[pid]/mem
      This file can be used to access the pages of a process's  memory
      through open(2), read(2), and lseek(2).

For people who haven’t had their minds blown yet: the man page tells us /proc/pid/mem gives us access to the process’s virtual memory like it’s any other file on your PC, no special mumbo jumbo.
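Here is a small sketch of what "like any other file" means in practice. The read_at() helper is my own naming, and I'm assuming a Linux box where a process is allowed to read its own /proc/self/mem (some sandboxes block this, hence the try/except). The same function first reads from a regular file, then from this script's own memory.

```python
import ctypes
import os
import tempfile

def read_at(path, offset, length):
    """Seek to `offset` in the file at `path` and read up to `length` bytes."""
    # buffering=0 so that a single read of `length` bytes is issued.
    with open(path, "rb", buffering=0) as f:
        f.seek(offset)
        return f.read(length)

# 1) An ordinary file on disk.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"....ThisIsAGoodStArt....")
ordinary = read_at(tmp.name, 4, 16)
os.remove(tmp.name)
print("regular file:", ordinary)

# 2) This script's own memory (Linux only), using the very same function.
buf = ctypes.create_string_buffer(b"ThisIsAGoodStArt")
if os.path.exists("/proc/self/mem"):
    try:
        print("own memory:  ", read_at("/proc/self/mem", ctypes.addressof(buf), 16))
    except OSError as exc:
        print("could not read /proc/self/mem:", exc)
```

The only difference between the two calls is what the offset means: a byte position in the file for the first, a virtual memory address for the second.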

Isn’t that really really cool? (Rhetorical)

So, what do we have to do? Write a Python script which:

  • Locates the addresses of the heap from /proc/pid/maps,
  • Finds where our string is located in the heap, and
  • Overwrites it. Simple!

For anyone who wants to give this a shot on their own, stop reading here.

The funnest part of them all

Writing the code. Now this is a fairly simple exercise and barely needs any explanation. Reading the code once is more than enough for you to understand it, else just scroll down to see the magic. This code is decently commented, but if you have any doubts / suggestions, please leave a comment below!

#!/usr/bin/env python3
"""
No error handling done.
Run it as:
python3 name.py <pid of your c code> <string to replace> <replacement string>

NOTE: YOU MIGHT HAVE TO RUN THIS WITH SUDO
"""
from sys import argv

_, pid, initial_string, new_string = argv[:4]

maps_filename = "/proc/{}/maps".format(pid)
mem_filename = "/proc/{}/mem".format(pid)

maps_file = open(maps_filename, 'r')

maps_file_line = maps_file.readline()

while maps_file_line:
    temp = maps_file_line.split()
    if temp[-1] != "[heap]": # If the line isn't describing the heap, move on.
        maps_file_line = maps_file.readline()
    else:
        print("* Found the heap")
        addr_range, perm, offset, dev, inode, path = temp
        print("* Address range: ", addr_range)
        print("* Permissions: ", perm)

        if 'r' not in perm or 'w' not in perm: # The heap mapping must be readable and writable.
            print("The heap isn't mapped as readable and writable, giving up.")
            maps_file.close()
            exit(1)

        low, high = addr_range.split("-") # Getting the addresses of my heap
        low = int(low, 16) # Getting it from Base 16
        high = int(high, 16) # Getting it from Base 16

        print("The heap starts at: {}".format(low))
        print("The heap ends at: {}".format(high))

        mem_file = open(mem_filename, 'rb+')

        # Now we want to seek to the start of our heap
        # The start address of the heap is given to us by low
        mem_file.seek(low)

        # Now, we need to read our heap
        # So from low, we read the size of the heap
        # Which is given by (high - low)
        heap = mem_file.read(high - low)

        # Now let's find our string
        # Because our heap is stored in binary, we'll convert our string to binary
        try:
            start_index = heap.index(bytes(initial_string, "ASCII"))
        except ValueError:
            print("Did not find {} in heap, are you sure that's what you want?".format(initial_string))
            maps_file.close()
            mem_file.close()
            exit(1)

        print("* Found {}".format(initial_string))

        print("* Writing {} in the heap".format(new_string))

        # We want to start writing our new string where
        # the old string is stored, which is at (low + start_index)
        mem_file.seek(low + start_index)
        mem_file.write(bytes(new_string, "ASCII"))

        maps_file.close()
        mem_file.close()

        break

The magic

So I run my program as

~ sudo python3 script.py 32612 ThisIsAGoodStArt ThisIsAmazing
* Found the heap
* Address range:  558bfee29000-558bfee4a000
* Permissions:  rw-p
The heap starts at: 94059765075968
The heap ends at: 94059765211136
* Found ThisIsAGoodStArt
* Writing ThisIsAmazing in the heap

And back where my C code was executing, I see the magic.

...
#2456 - ThisIsAGoodStArt : 0x558bfee292a0
#2457 - ThisIsAGoodStArt : 0x558bfee292a0
#2458 - ThisIsAGoodStArt : 0x558bfee292a0
#2459 - ThisIsAGoodStArt : 0x558bfee292a0
#2460 - ThisIsAGoodStArt : 0x558bfee292a0
#2461 - ThisIsAGoodStArt : 0x558bfee292a0
#2462 - ThisIsAmazingArt : 0x558bfee292a0 <-- CHECK THIS OUT
#2463 - ThisIsAmazingArt : 0x558bfee292a0
#2464 - ThisIsAmazingArt : 0x558bfee292a0
#2465 - ThisIsAmazingArt : 0x558bfee292a0

WOHOOOOOOOOOOO!

Yes, 2465 means I had it running for 2400+ seconds (~40 minutes), which is how long it took me to smooth out my code.

The string now reads ThisIsAmazingArt (which it is, undeniably 😉) because we merely overwrote the first 13 bytes of the string in the heap with our content and didn’t replace the whole thing, which is why Art is still sitting there. For better results, try using strings of the same length.
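If your replacement is shorter, one way around the leftover Art is to pad it with NUL bytes, since C treats the first '\0' as the end of a string. The sketch below plays this out on a bytearray standing in for the heap; build_patch() is a hypothetical helper and not part of my script above.

```python
def build_patch(old, new):
    """Return `new` padded with NUL bytes to exactly len(old) bytes."""
    if len(new) > len(old):
        # Writing more bytes than the original would trample whatever follows it.
        raise ValueError("replacement is longer than the original string")
    return new + b"\x00" * (len(old) - len(new))

# Stand-in for a slice of the heap: the string plus some surrounding NULs.
heap = bytearray(b"\x00\x00ThisIsAGoodStArt\x00\x00")
old, new = b"ThisIsAGoodStArt", b"ThisIsAmazing"

start = heap.index(old)
heap[start:start + len(old)] = build_patch(old, new)

# What a C "%s" would print: everything up to the first NUL.
printed = bytes(heap[start:]).split(b"\x00", 1)[0]
print(printed)
```

In the real script, that would mean writing build_patch(...) instead of the bare new string in the final mem_file.write() call.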

Serious Talk Time

Now, even though this is very fun for a side project, it has a few very serious implications, and there are things you should steer away from.

  • Just like our C program stored a dumb string in memory, a lot of important things usually live in program memory too (passwords, sensitive details, etc.), and they are usually accessible using the above method. So refrain from running programs with sudo, or Run as Administrator on Windows, unless you really trust them (Yes, Mr A, I’m looking at you too).
  • A few of you with some development experience might argue that passwords are hashed and not stored in plain form. Well yes, they are. But that only prevents a malicious program from knowing what your password is; it doesn’t stop it from replacing the hash in memory with a different one (which happens to be the same size xD).

If any of you find any bugs, or have any suggestions/feedback, please feel free to reach out to me on LinkedIn or via email.

Also, the source code for this project is available on GitHub.

This is all for you, for now!