Disclaimer: Unlike the previous posts which talked about an experience of mine, this is just one of my fun (it’s a subjective word) experiments. Note the idea of doing this stemmed from me wanting to write about it, and not the other way around, so if you’re reading this, I succeeded.
Disclaimer 2: The title might be a little clickbaity, but I promise it’s pretty cool, so read on!
Introduction
While I’m writing this, I have no idea how I’m going to do what I want to do; I just know it is possible theoretically, so I’m going to work my way through the problem and hopefully take you all through the process of it all.
What I want to do: When you declare a string in your program, say char str[] = "ThisBlogIsPrettyCool", I know it gets stored somewhere in my memory. I also know that the memory is technically available to every other process on my PC, so theoretically, it should be possible to change it from outside that program, with that program never knowing about it, right?
My environment
- Pop!_OS 20.04
- Linux pop-os 5.4.0-7634-generic #38~1591219791~20.04~6b1c5de-Ubuntu SMP Thu Jun 4 02:56:10 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
- gcc version 9.3.0
- Python 3.8
But any version of Linux and Python 3.x should work in theory; let me know if it doesn’t. Note: To run this, you need admin (sudo) access on your machine.
Prerequisites
- Very basic C programming
- Basic Python programming
Some Concepts
Key words
- Main Memory: Fancy word for the RAM
- Memory Address: A number that uniquely identifies where your data is stored in a storage device.
- Process: Fancy word used for a program (this definition is limited to the scope of this article).
- PID / Process ID: A unique number that identifies each process running in your system.
Virtual Memory
In computing, virtual memory is a memory management technique that abstracts the physical storage that you have. It maps the memory addresses used by a program (called virtual addresses) onto the actual physical memory addresses used by storage devices.
This allows the computer to do all sorts of fancy stuff like:
- It can now show each process a contiguous block of memory, which can be mapped to non-contiguous blocks in the actual memory.
- You can now have a virtual memory spread across multiple RAM sticks or in some cases even a small fraction of your hard drive, with no extra work, because your program won’t know where it’s actually getting stored.
Now, some things to remember:
- Each process has its own virtual memory.
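- The virtual memory is divided into distinct sections meant to store different things. Depending on your operating system, these sections may differ, but generically, the layout runs from the text segment at the low addresses, through the data segments and the heap, up to the stack at the high addresses.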
Now, to anyone going Heh?, we don’t need a lot for this article, just the following points:
- The text segment basically stores the “code” of your program.
- The stack and the heap are on opposite ends of the virtual memory; the stack grows downwards (yeah, seems a little counterintuitive, but that’s the way it is), while the heap grows upwards.
- The heap is where all the dynamically allocated memory is stored, i.e., memory that is assigned to a program during runtime (basically, malloc() calls). To everyone who didn’t understand the previous line, this is the part of the virtual memory we’ll be hacking!
The C program
We want a very basic C program that will create a string and store it in the heap.
Running this gives us what we’d expect: just the string, nothing else.
ThisIsAGoodStArt
Okay, now we need a few other things too.
- We know that each process has its own virtual memory, so we need to find out what the process ID of this process is.
- As soon as this program ends, the string is removed from the memory by the OS, which does us no good. So we need to keep this program running for as long as we want.
- We also want to get the location of the string in the virtual memory. We probably won’t use it, but it’ll be a good thing to have.
So let’s make those changes.
After which we get the following output that runs forever:
The process id is: 32612
#1 - ThisIsAGoodStArt : 0x558bfee292a0
#2 - ThisIsAGoodStArt : 0x558bfee292a0
#3 - ThisIsAGoodStArt : 0x558bfee292a0
#4 - ThisIsAGoodStArt : 0x558bfee292a0
#5 - ThisIsAGoodStArt : 0x558bfee292a0
#6 - ThisIsAGoodStArt : 0x558bfee292a0
...
Now, if you try and run this, you *will* end up getting different numbers (seriously, it is practically impossible for you to get the exact same output). In fact, you will probably get different numbers every time you run it.
Now we know that the process ID is 32612, and the string in our memory starts at 0x558bfee292a0 (this is a base 16 number).
Cool. So far so good.
/proc - This is seriously cool stuff.
The /proc directory in a Linux system is, according to me, the coolest directory to mess around with. It’s a trove of information about all the processes running on your computer.
Nerd Talk: /proc isn’t a regular directory, but a virtual file system. It doesn’t contain real files but runtime information about your entire system. Again, how cool is that?
The /proc directory has a lot of folders, each corresponding to an individual process. Here’s mine:
➜ ~ ls /proc
1 1094 13 1408 1590 166 177 1834 20 2192 1095 2401 26752 281 28904 3125 3834 48 54 60 acpi iomem mtrr tty
10 32612 1300 1440 1592 167 1773 18369 202 21921 2347 2405 26963 282 28955 3165 39 4856 5402 622 asound ioports net uptime
1055 1097 1302 1445 1594 16715 1776 18399 2056 21927 2348 2425 27 283 29 3171 4 50 559 63 buddyinfo irq pagetypeinfo version
1056 1098 1305 1462 1595 1675 178 18408 21 21928 2349 2426 27064 284 29025 3188 40 504 56 630 bus kallsyms partitions version_signature
1059 11 1307 1467 1596 168 1797 1842 21191 22 2350 2431 2708 28488 291 32 4046 5040 560 64 cgroups kcore pressure vmallocinfo
1060 1107 1309 1476 1597 169 18 1844 21211 2201 2353 2444 275 28515 2912 3201 41 5046 566 65 cmdline keys sched_debug vmstat
1064 1116 1315 1484 1599 17 1802 1847 21224 2205 2355 2455 27554 2854 2970 3207 42 507 567 66 consoles key-users schedstat zoneinfo
1067 11535 1330 15 16 170 18026 185 2134 2219 2358 24835 276 28547 29956 326 429 5078 568 67 cpuinfo kmsg scsi
1072 11537 1336 1532 1600 17090 1805 18680 21495 222 2359 24837 27606 2859 3 33 430 5094 569 68 crypto kpagecgroup self
1073 11614 1342 1535 1601 17121 18082 18681 2151 2231 2372 24923 277 28598 30 330 44 51 57 848 devices kpagecount slabinfo
1075 1167 1346 1538 1602 17241 18104 187 2156 22468 2373 2503 27721 2860 30162 331 45 512 570 862 diskstats kpageflags softirqs
1077 1177 1352 1540 1603 173 18123 18742 2157 2266 2374 2524 27800 2863 30197 34 4542 5175 571 863 dma loadavg stat
1081 1184 1356 1554 1611 1735 1814 19237 2159 2283 2375 2598 27858 2864 30212 3489 4568 519 572 864 driver locks swaps
1085 12 1366 1558 162 174 18155 1931 2162 2286 2379 26 27867 28753 30230 35 46 52 573 9 execdomains mdstat sys
1088 1216 1373 1561 163 1752 1818 19672 2165 2295 2386 2639 279 2880 30237 36 4661 53 576 949 fb meminfo sysrq-trigger
1090 1267 1385 1576 164 1753 182 19747 21786 23 2390 26467 28 2883 30421 369 47 5352 58 950 filesystems misc sysvipc
1091 1288 1390 1588 165 1764 1823 198 21860 2303 2396 26468 280 28858 3071 38 475 536 59 976 fs modules thread-self
1092 1289 14 1589 1657 1767 1827 2 2187 2318 24 26658 2802 2889 30862 3816 4781 5367 6 977 interrupts mounts timer_list
Each numeric entry (shown in dark blue in my terminal) is a folder whose name is the PID of a process. Now we know the process ID of our process is 32612, so let’s peek inside that folder.
➜ 32612 ls
arch_status environ mountinfo personality statm
attr exe mounts projid_map status
autogroup fd mountstats root syscall
auxv fdinfo net sched task
cgroup gid_map ns schedstat timers
clear_refs io numa_maps sessionid timerslack_ns
cmdline limits oom_adj setgroups uid_map
comm loginuid oom_score smaps wchan
coredump_filter map_files oom_score_adj smaps_rollup
cpuset maps pagemap stack
cwd mem patch_state stat
For our project we only need to look at two files:
- /proc/32612/maps
- /proc/32612/mem
/proc/pid/maps
From man proc (basically, the manual page):
/proc/[pid]/maps
A file containing the currently mapped memory regions and their
access permissions. See mmap(2) for some further information
about memory mappings.
Okay, fancy stuff aside. Let’s take a look at what my maps file looks like.
➜ ~ cat /proc/32612/maps
558bfe5ca000-558bfe5cb000 r--p 00000000 08:04 1705261 /home/yash/git_projects/blog/a.out
558bfe5cb000-558bfe5cc000 r-xp 00001000 08:04 1705261 /home/yash/git_projects/blog/a.out
558bfe5cc000-558bfe5cd000 r--p 00002000 08:04 1705261 /home/yash/git_projects/blog/a.out
558bfe5cd000-558bfe5ce000 r--p 00002000 08:04 1705261 /home/yash/git_projects/blog/a.out
558bfe5ce000-558bfe5cf000 rw-p 00003000 08:04 1705261 /home/yash/git_projects/blog/a.out
558bfee29000-558bfee4a000 rw-p 00000000 00:00 0 [heap]
7f87d213b000-7f87d2160000 r--p 00000000 103:05 4628 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f87d2160000-7f87d22d8000 r-xp 00025000 103:05 4628 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f87d22d8000-7f87d2322000 r--p 0019d000 103:05 4628 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f87d2322000-7f87d2323000 ---p 001e7000 103:05 4628 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f87d2323000-7f87d2326000 r--p 001e7000 103:05 4628 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f87d2326000-7f87d2329000 rw-p 001ea000 103:05 4628 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f87d2329000-7f87d232f000 rw-p 00000000 00:00 0
7f87d2346000-7f87d2347000 r--p 00000000 103:05 4236 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f87d2347000-7f87d236a000 r-xp 00001000 103:05 4236 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f87d236a000-7f87d2372000 r--p 00024000 103:05 4236 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f87d2373000-7f87d2374000 r--p 0002c000 103:05 4236 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f87d2374000-7f87d2375000 rw-p 0002d000 103:05 4236 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f87d2375000-7f87d2376000 rw-p 00000000 00:00 0
7ffeceb35000-7ffeceb56000 rw-p 00000000 00:00 0 [stack]
7ffecebd1000-7ffecebd4000 r--p 00000000 00:00 0 [vvar]
7ffecebd4000-7ffecebd5000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]
Okay, breathe in.
We see the [heap], that sounds familiar. This line:
558bfee29000-558bfee4a000 rw-p 00000000 00:00 0 [heap]
- 558bfee29000-558bfee4a000: This is the address range of the heap for our program. Now going back to our code, we see that our string is stored at 0x558bfee292a0. Now 0x558bfee29000 < 0x558bfee292a0 < 0x558bfee4a000, which means we now have substantial proof that our string is somewhere in the heap. Good.
- The rw signifies that our program can read and write to this section (Duh!).
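For anyone who would rather let Python do the squinting, here is a minimal sketch of that check. The names Region and parse_maps_line are made up for this example, and the field order (address range, permissions, offset, device, inode, optional path) is simply read off the output above.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int   # first address of the mapping
    end: int     # first address *after* the mapping
    perms: str   # e.g. "rw-p"
    name: str    # e.g. "[heap]", or "" for anonymous mappings


def parse_maps_line(line: str) -> Region:
    """Split one /proc/<pid>/maps line into range, permissions and name."""
    fields = line.split()
    start_hex, end_hex = fields[0].split("-")
    # The path column is optional, and a path may itself contain spaces.
    name = " ".join(fields[5:])
    return Region(int(start_hex, 16), int(end_hex, 16), fields[1], name)


heap = parse_maps_line("558bfee29000-558bfee4a000 rw-p 00000000 00:00 0 [heap]")
string_address = 0x558bfee292a0  # the address our C program printed

print("heap range (decimal):", heap.start, heap.end)
print("string is inside the heap:", heap.start <= string_address < heap.end)
print("heap is readable and writable:", heap.perms.startswith("rw"))
```

The addresses in the maps file are hexadecimal without the 0x prefix, which is why int(..., 16) is used to turn them into ordinary integers.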
/proc/pid/mem
From man proc
/proc/[pid]/mem
This file can be used to access the pages of a process's memory
through open(2), read(2), and lseek(2).
For people who haven’t had a mindblow yet: The man page tells us /proc/pid/mem allows us access to the process’s virtual memory like it’s any other file on your PC, no special mumbo jumbo.
Isn’t that really really cool? (Rhetorical)
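To show what "like any other file" means in practice, here is a small sketch of the open, seek, read pattern the man page mentions. The helper read_at is a hypothetical name, and the demo deliberately uses a temporary file as a stand-in so it runs without sudo. With /proc/<pid>/mem the offset would be a virtual address taken from the maps file, and you would need permission to read that process.

```python
import os
import tempfile


def read_at(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset`, using plain open/seek/read."""
    with open(path, "rb") as f:
        f.seek(offset)          # for /proc/<pid>/mem this is a virtual address
        return f.read(length)


# Stand-in for /proc/<pid>/mem: an ordinary file with our string at offset 4.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"....ThisIsAGoodStArt....")
demo_path = tmp.name

result = read_at(demo_path, 4, 16)
print(result)
os.remove(demo_path)
```

Nothing in read_at is specific to process memory, which is exactly the point: the kernel exposes the memory through the same file interface.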
So, what do we have to do? Write a Python script which:
- Locates the addresses of the heap from /proc/pid/maps,
- Finds where our string is located in the heap, and
- Overwrites it. Simple!
For anyone who wants to give this a shot on their own, stop reading here.
The funnest part of them all
Writing the code. Now this is a fairly simple exercise and barely needs any explanation. Reading the code once is more than enough for you to understand it, else just scroll down to see the magic. This code is decently commented, but if you have any doubts / suggestions, please leave a comment below!
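A minimal sketch of a script along these lines could look like the following. Treat it as one possible version under a few assumptions: the function names (find_heap, overwrite_in_region, main) and the messages are illustrative, the target is assumed to have a [heap] mapping, and only the first occurrence of the string is replaced.

```python
import sys


def find_heap(maps_text):
    """Return (start, end, perms) of the [heap] mapping, or None if absent."""
    for line in maps_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "[heap]":
            start_hex, end_hex = fields[0].split("-")
            return int(start_hex, 16), int(end_hex, 16), fields[1]
    return None


def overwrite_in_region(mem_path, start, end, search, replacement):
    """Overwrite the first `search` found in [start, end).

    Returns the address that was written to, or None if nothing was found.
    """
    # Unbuffered, so every seek/read/write goes straight to the file.
    with open(mem_path, "r+b", buffering=0) as mem:
        mem.seek(start)
        data = mem.read(end - start)
        index = data.find(search)
        if index == -1:
            return None
        mem.seek(start + index)
        mem.write(replacement)
        return start + index


def main(pid, search, replacement):
    with open(f"/proc/{pid}/maps") as maps:
        heap = find_heap(maps.read())
    if heap is None:
        print("No [heap] mapping found")
        return 1
    start, end, perms = heap
    if not perms.startswith("rw"):
        print("The heap is not readable and writable:", perms)
        return 1
    address = overwrite_in_region(
        f"/proc/{pid}/mem", start, end, search.encode(), replacement.encode()
    )
    if address is None:
        print("Could not find", search, "in the heap")
        return 1
    print("Wrote", replacement, "at", hex(address))
    return 0


if __name__ == "__main__":
    if len(sys.argv) == 4:
        sys.exit(main(sys.argv[1], sys.argv[2], sys.argv[3]))
    print("usage: sudo python3 script.py <pid> <search> <replacement>")
```

Saved as script.py, it would be run as sudo python3 script.py <pid> <search> <replacement>. The two helpers take the maps text and the mem path as arguments, so they can be tried on a fake maps string and an ordinary file before pointing them at a live process.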
The magic
So I run my program as
➜ ~ sudo python3 script.py 32612 ThisIsAGoodStArt ThisIsAmazing
* Found the heap
* Address range: 558bfee29000-558bfee4a000
* Permissions: rw-p
The heap starts at: 94059765075968
The heap ends at: 94059765211136
* Found ThisIsAGoodStArt
* Writing ThisIsAmazing in the heap
And back where my C code was executing, I see the magic.
...
#2456 - ThisIsAGoodStArt : 0x558bfee292a0
#2457 - ThisIsAGoodStArt : 0x558bfee292a0
#2458 - ThisIsAGoodStArt : 0x558bfee292a0
#2459 - ThisIsAGoodStArt : 0x558bfee292a0
#2460 - ThisIsAGoodStArt : 0x558bfee292a0
#2461 - ThisIsAGoodStArt : 0x558bfee292a0
#2462 - ThisIsAmazingArt : 0x558bfee292a0 <-- CHECK THIS OUT
#2463 - ThisIsAmazingArt : 0x558bfee292a0
#2464 - ThisIsAmazingArt : 0x558bfee292a0
#2465 - ThisIsAmazingArt : 0x558bfee292a0
WOHOOOOOOOOOOO!
Yes, 2465 means I had it running for 2400+ seconds (~40 minutes), which is how much time it took me to smooth out my code.
The string now reads and prints ThisIsAmazingArt (which it is, undeniably 😉) because we merely overwrote a part of the string in the heap with our content, and didn’t replace the whole string, which is why Art is still stored in the heap. For better results, try using strings of the same length.
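The leftover Art is easy to reproduce without touching any real process. The sketch below models the heap bytes as a Python bytearray. The helpers overwrite and as_c_string are hypothetical names for this illustration, and as_c_string mimics how C reads a string, stopping at the first NUL byte.

```python
def overwrite(buffer: bytearray, offset: int, data: bytes) -> None:
    """Write `data` over `buffer` starting at `offset`, without resizing it."""
    buffer[offset:offset + len(data)] = data


def as_c_string(buffer: bytearray) -> bytes:
    """Return the bytes C would treat as the string: everything before the first NUL."""
    return bytes(buffer).split(b"\0", 1)[0]


heap = bytearray(b"ThisIsAGoodStArt\0")   # 16 characters plus the terminating NUL

overwrite(heap, 0, b"ThisIsAmazing")      # 13 bytes, so the last 3 survive
partial = as_c_string(heap)

overwrite(heap, 0, b"ThisIsAmazing\0")    # also write a terminating NUL
terminated = as_c_string(heap)

print(partial, terminated)   # b'ThisIsAmazingArt' b'ThisIsAmazing'
```

So besides using strings of the same length, writing a NUL byte right after a shorter replacement should also make the C program print just the new string.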
Serious Talk Time
Now, even though this is very fun for a side project, it has a few very serious implications and things you should steer away from.
- Just as our C program stored a dumb string in memory, a lot of important things are usually stored in program memory too (passwords, sensitive details, etc.), and all of them are usually accessible using the above method. So refrain from running programs with sudo, or Run as Administrator on Windows, unless you really trust them (Yes, Mr A, I’m looking at you too).
- A few of you reading this with some experience in development might argue that passwords are hashed and not stored in plain form. Well yes, they are. That just prevents the malicious program from knowing what your password is; it doesn’t stop it from replacing the hash in memory with a different hash (which happens to be of the same size xD).
If any of you find any bugs, or have any suggestions/feedback, please feel free to reach out to me on LinkedIn or via email.
Also, the source code for this project is available on GitHub.
This is all for you, for now!