229 points by ekr____ 5 days ago | 88 comments
bluetomcat 2 days ago
lines = realloc(lines, (num_lines + 1) * sizeof(char *));
In case it cannot service the reallocation and returns NULL, it will overwrite "lines" with NULL, but the memory that "lines" referred to is still there and needs to be either freed or used. The proper way to call it would be:
tmp = realloc(lines, (num_lines + 1) * sizeof(char *));
if (tmp == NULL) {
    free(lines);
    lines = NULL;
    // ... possibly exit the program (without a memory leak)
} else {
    lines = tmp;
}
returningfory2 2 days ago
dataflow 1 day ago
Really, there's no excuse whatsoever for not having a separate function that takes the pointer by reference & performs the reallocation and potential termination inside itself, and using that instead of calling realloc directly.
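A minimal sketch of that kind of wrapper, assuming the program is happy to terminate on allocation failure (grow_lines is an illustrative name, not something from the article):

#include <stdio.h>
#include <stdlib.h>

/* Grow an array of char* in place, or terminate the program.
 * Because the pointer is passed by reference, the caller can
 * never accidentally lose the original block on failure. */
static void grow_lines(char ***lines, size_t new_count) {
    char **tmp = realloc(*lines, new_count * sizeof **lines);
    if (tmp == NULL) {
        free(*lines);                  /* nothing leaks... */
        fprintf(stderr, "out of memory\n");
        exit(1);                       /* ...and we stop right here */
    }
    *lines = tmp;
}

The call site then becomes a single grow_lines(&lines, num_lines + 1), with no NULL check to forget.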
returningfory2 21 hours ago
Talking about "making future changes" seems to be missing the point of what the author is doing. They're not committing code to the Linux kernel. They're writing a beginner's article about memory management.
dataflow 21 hours ago
I realize, and that's what makes it even worse. First impressions have a heck of a lot stronger effect than 10th impressions. Beginners need to learn the right way in the beginning, not the wrong way.
When did "safety first" stop being a thing? This is like skipping any mention of goggles when teaching chemistry or woodworking for "pedagogical reasons". You're supposed to first teach your students the best way to do things, then you can teach them how to play fast and loose if it's warranted. Not the other way around!
returningfory2 20 hours ago
And no, you're not supposed to teach your students the best way to do things at the start. That's not how teaching works. You start with the simpler (but still correct) way, and then work towards the best way. This is why introductions to Rust are full of clone calls. The best Rust code minimizes the number of clones. But when you're introducing people to something, you don't necessarily do the optimal thing first because that disrupts the learning process.
dataflow 20 hours ago
And hence we circle back to what I just wrote above: you're confusing the code with the program that it compiles to. Because the code isn't there solely for the purpose of being compiled into a program, it's also serving as a stepping stone for other things (learning, modification, whatever). https://news.ycombinator.com/item?id=42733611
If it helps to phrase it differently: the code might be "compile-safe", but not "modification-safe" or "learning-safe".
returningfory2 17 hours ago
dataflow 17 hours ago
tptacek 1 day ago
Since that's essentially what EKR is doing here (albeit manually), I don't think this observation about losing the original `lines` pointer is all that meaningful.
dundarious 19 hours ago
It helps in the vast majority of cases, where sensible memory bounds are known or can be inferred. It means that all system memory allocation errors* can be dealt with up front with proper error handling (including perhaps running in a more restrictive mode with less memory), and then all application memory allocation errors (running out of space in the arena) can auto-abort() as before (and be treated as bugs). The other huge benefit is that there is no free() logic for incremental allocations within the arena; you just munmap/VirtualFree the arena in its entirety when done.
Of course, there are cases where there are no sensible memory bounds (in space or perhaps in time) and where this method is not appropriate without significant modification.
*modulo Linux's overcommit... which is a huge caveat
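A minimal sketch of that arena idea, assuming POSIX mmap with MAP_ANONYMOUS (the names and the 16-byte alignment choice are illustrative):

#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

typedef struct {
    unsigned char *base;
    size_t         used;
    size_t         cap;
} arena_t;

/* Reserve the whole arena up front; this is the one place where a
 * system-level allocation failure has to be handled. */
static int arena_init(arena_t *a, size_t cap) {
    void *p = mmap(NULL, cap, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->used = 0;
    a->cap = cap;
    return 0;
}

/* Running out of arena space is treated as a bug: abort(). */
static void *arena_alloc(arena_t *a, size_t n) {
    size_t aligned = (n + 15) & ~(size_t)15;   /* keep 16-byte alignment */
    if (a->cap - a->used < aligned)
        abort();
    void *p = a->base + a->used;
    a->used += aligned;
    return p;
}

/* No per-object free(): the whole arena goes away at once. */
static void arena_destroy(arena_t *a) {
    munmap(a->base, a->cap);
}

On Windows the mmap/munmap pair would be VirtualAlloc/VirtualFree, but the shape is the same.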
tptacek 17 hours ago
dundarious 11 hours ago
So I agree, given malloc and friends are a combination of OS and application-level allocators, they should auto-abort(). I don't focus on malloc and friends though, because I'm not a fan of using the Rube Goldberg machine of "general purpose" allocators in most non-trivial situations. They're complicated hierarchies of size-based pools, and free lists, and locks, and on and on.
ekr____ 2 days ago
Thanks for the flag. As you have probably noticed, I just abort the program a few lines below on realloc failure, so this doesn't leak so much as crash. However, this is a nice example of how fiddly C memory management is.
witrak 18 hours ago
To say that "this is a nice example of how fiddly C memory management is" in the discussion is a bit too little - perhaps intended readers of the article would prefer an explicit warning there, just to be aware that they shouldn't forget to abort the program as you do.
lionkor 2 days ago
bluetomcat 2 days ago
josephg 1 day ago
In theory, sure. But vanishingly little software actually deals with OOM gracefully. What do you do? Almost any interaction with the user may result in more memory allocations in turn - which presumably may also fail. It’s hard to even test OOM on modern systems because of OS disk page caching.
Honestly, panicking on OOM is a totally reasonable default for most modern application software. In languages like Rust, this behaviour is baked in.
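In C that default is usually just a tiny wrapper, something like this sketch (xmalloc is a conventional but illustrative name):

#include <stdio.h>
#include <stdlib.h>

/* Allocate or die: callers never see NULL, so there is no
 * error-handling path to get wrong. */
static void *xmalloc(size_t n) {
    void *p = malloc(n);
    if (p == NULL) {
        fprintf(stderr, "out of memory (%zu bytes)\n", n);
        abort();
    }
    return p;
}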
tptacek 1 day ago
PhilipRoman 2 days ago
I would say this is pointless on many modern systems unless you also disable overcommit, since otherwise any memory access can result in a crash, which is impossible to check for explicitly.
lionkor 1 day ago
kevin_thibedeau 1 day ago
josephg 1 day ago
Maybe not in embedded work - but in that case you might want to preallocate memory anyway.
samsquire 2 days ago
In my spare time, working with C as a hobby, I am usually in "vertical mode", which is different from how I would work (carefully) at work: just getting things done end-to-end as fast as possible, without being careful at every step that we have no memory errors. I am just trying to get something working end-to-end, so I do not actually worry about memory management when writing C - I let the operating system handle freeing memory. In my hobby time I am trying to get the algorithm working.
And since I wrote everything in Python or Javascript initially, I am usually porting from Python to C.
If I were using Rust, it would force me to be careful in the same way, due to the borrow checker.
I am curious: we have reference counting and we have profile-guided optimisation.
Could reference counting be compiled into a debug/profiled build to detect in which regions of time things get freed (there is a happens-before relation with dropping out of scope that reference counting has to respect), and so work out where to insert frees? (We write timing metadata from the RC build that encapsulates the happens-before relationships.)
Then we could recompile with a happens-before relation file that records where things can safely be freed.
EDIT: Any discussion about those stack diagrams and alignment should include a link to this wikipedia page;
jvanderbot 2 days ago
One horrible but fun thing a former professor of mine pointed out: If your program isn't going to live long, then you never have to deallocate memory. Once it exits, the OS will happily clean it up for you.
This works in C or perhaps lazy GC languages, but for stateful objects where destructors do meaningful work, like in C++, this is dangerous. This is one of the reasons I hate C++ so much: Unintended side effects that you have to trigger.
> Could "reference counting" be compiled into a debug/profiled build and then detect which regions of time we free things in before or after (there is a happens before relation with dropping out of scopes that reference counting needs to run) to detect where to insert frees?
This is what Rust does, kinda.
C++ also does this with "stack" allocated objects - it "frees" (calls destructor and cleans up) when they go out of scope. And in C++, heap allocated data (if you're using a smart pointer) will automatically deallocate when the last reference drops, but this is not done at compile time.
Those are the only two memory management models I'm familiar with enough to comment on.
MarkSweep 2 days ago
https://devblogs.microsoft.com/oldnewthing/20180228-00/?p=98...
> This sparked an interesting memory for me. I was once working with a customer who was producing on-board software for a missile. In my analysis of the code, I pointed out that they had a number of problems with storage leaks. Imagine my surprise when the customer's chief software engineer said "Of course it leaks". He went on to point out that they had calculated the amount of memory the application would leak in the total possible flight time for the missile and then doubled that number. They added this much additional memory to the hardware to "support" the leaks. Since the missile will explode when it hits its target or at the end of its flight, the ultimate in garbage collection is performed without programmer intervention.
jvanderbot 2 days ago
Have you heard the related story about the patriot missile system?
https://www.cs.unc.edu/~smp/COMP205/LECTURES/ERROR/lec23/nod...
Not a GC issue, but fun software bug.
gpderetta 1 day ago
But of course that would never happen, would it?
pjmlp 2 days ago
jvanderbot 2 days ago
close(x) is not memory management - not at the user level. This should be done.
free(p) has no O/S side effects like this in C - this can be not-done if you don't malloc all your memory.
You can get away with not de-allocating program memory, but (as mentioned) that has nothing to do with freeing OS / kernel / networking resources in C.
PhilipRoman 2 days ago
pjmlp 1 day ago
This is a very non-portable assumption, even if we constrain it to only UNIX/POSIX flavours.
PhilipRoman 1 day ago
Consequences of Process Termination
Process termination caused by any reason shall have the following consequences:
[..] All of the file descriptors, directory streams, conversion descriptors, and message catalog descriptors open in the calling process shall be closed.
[..] Each attached shared-memory segment is detached and the value of shm_nattch (see shmget()) in the data structure associated with its shared memory ID shall be decremented by 1.
For each semaphore for which the calling process has set a semadj value (see semop()), that value shall be added to the semval of the specified semaphore.
[..] If the process is a controlling process, the controlling terminal associated with the session shall be disassociated from the session, allowing it to be acquired by a new controlling process.
[..] All open named semaphores in the calling process shall be closed as if by appropriate calls to sem_close().
Any memory locks established by the process via calls to mlockall() or mlock() shall be removed. If locked pages in the address space of the calling process are also mapped into the address spaces of other processes and are locked by those processes, the locks established by the other processes shall be unaffected by the call by this process to _Exit() or _exit().
Memory mappings that were created in the process shall be unmapped before the process is destroyed.
Any blocks of typed memory that were mapped in the calling process shall be unmapped, as if munmap() was implicitly called to unmap them.
All open message queue descriptors in the calling process shall be closed as if by appropriate calls to mq_close().
caspper69 2 days ago
There are tools that will tell you they're missing, however. Read up on Valgrind and ASAN.
In C, non-global variables go out of scope when the function they are created in ends. So if you malloc() in a fn, free() at the end.
If you're doing everything with globals in a short-running program, let the OS do it if that suits you (makes me feel dirty).
This whole problem doesn't get crazy until your program gets more complicated: once you have a lot of pointers among objects with different lifetimes, or you decide to add some concurrency (or parallelism), or when you have a lot of cooks in the kitchen.
In the applications you say you are writing, just ask yourself if you're going to use a variable again. If not, and it is using dynamically-allocated memory, free() it.
Don't psych yourself out, it's just C.
And yes, there are ref-counting libraries for C. But I wouldn't want to write my program twice, once to use the ref-counting library in debug mode and another to use malloc/free in release mode. That sounds exhausting for all but the most trivial programs.
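To illustrate the "malloc() in a fn, free() at the end" discipline above, a minimal sketch (the function and its strings are invented for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Builds a greeting in a temporary buffer and frees it on every
 * path out of the function, so the allocation's lifetime is the
 * function's lifetime. */
static int print_greeting(const char *name) {
    size_t len = strlen(name) + sizeof "Hello, !";  /* sizeof includes the NUL */
    char *buf = malloc(len);
    if (buf == NULL)
        return -1;                  /* nothing else allocated yet, so just bail */

    snprintf(buf, len, "Hello, %s!", name);
    puts(buf);

    free(buf);                      /* lifetime ends with the function */
    return 0;
}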
SkiFire13 2 days ago
Profile-guided optimizations can only gather information about what is most probable; they can't tell you what will definitely happen. For freeing, however, you usually want that certainty, because not freeing results in a memory leak (and freeing too early results in a use-after-free, which you definitely want to avoid, so the analysis needs to be conservative!). In the end this can only be an _optimization_ (just like profile-guided _optimizations_ are just optimizations!) on top of a workflow that is ok with leaking everything.
mgaunard 2 days ago
You could make every object its own allocated entity, but then you're losing most of the benefits of using C, which is the ability to control memory layout of objects.
pjmlp 2 days ago
gizmo686 1 day ago
I've written hardware device drivers in pure C where you need to peek and poke at specific bits on the memory bus. I defined a struct that matched the exact memory layout that the hardware specifies, then cast an integer to a pointer to that struct type. At which point I could interact with the hardware by directly reading/writing fields of the struct (most of which were not even byte aligned).
It is not quite that simple, as you also have to deal with bypassing the cache, memory barriers, possibly virtual memory, and finding the errata that clarifies the originally published register address was completely wrong. But I don't think any of that is what people mean when they say "memory layout".
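A compressed sketch of that pattern; the device, register address and bit layout here are invented for illustration, bit-field layout is implementation-defined (so real code pins down the compiler/ABI), and the barrier/cache handling mentioned above is omitted:

#include <stdint.h>

/* Hypothetical control/status block of a memory-mapped device. */
struct uart_regs {
    uint32_t data;
    uint32_t status;                   /* bit 0: ready */
    struct {
        uint32_t enable   : 1;
        uint32_t loopback : 1;
        uint32_t baud_sel : 3;
        uint32_t          : 27;        /* reserved */
    } ctrl;
};

/* Cast a fixed bus address to a pointer to the register struct. */
#define UART0 ((volatile struct uart_regs *)0x40001000u)

static void uart_enable(void) {
    UART0->ctrl.enable = 1;            /* sets one bit inside the ctrl word */
    while ((UART0->status & 0x1u) == 0)
        ;                              /* spin until the device reports ready */
}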
pjmlp 1 day ago
You are aware that some of those casting tricks are UB, right?
gizmo686 1 day ago
If you want to control register layout, then C is not going to help you, but that is not typically what is meant by "memory layout".
And if you want to control cache usage ... Some architectures do expose some black magic which you would need to go to assembly to access. But for the most part controlling cache involves understanding how the cache works, then controlling the memory layout and accesses to work well with the cache.
AdieuToLogic 1 day ago
char *strdup(const char *str) {
    size_t len = strlen(str);
    char *retval = malloc(len);
    if (!retval) {
        return NULL;
    }
    strcpy(retval, str);
    return retval;
}
Has a very common defect. The malloc call does not reserve enough space for the NUL byte required for successful use of strcpy, thus introducing heap corruption. Also, assuming a NULL pointer is bitwise equal to 0 is not portable.
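For comparison, a corrected sketch of the same function (still hand-rolled; my_strdup is just a name to avoid clashing with the POSIX one):

#include <stdlib.h>
#include <string.h>

char *my_strdup(const char *str) {
    size_t len = strlen(str) + 1;      /* +1 for the terminating NUL */
    char *retval = malloc(len);
    if (retval == NULL) {
        return NULL;
    }
    memcpy(retval, str, len);          /* copies the NUL as well */
    return retval;
}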
msarnoff 1 day ago
See the C FAQ questions 5-3 and 5-10, et al. https://c-faq.com/null/
writebetterc 1 day ago
Here's a much easier way to write the program:
1. Dump whole file into buffer as one string
2. Find newlines in buffer, replace with NULs. This also lets you find each line and save them in another buffer
3. Sort the buffer of all the lines you found
4. qsort the buffer
5. Print everything
6. Free both buffers
Or, as a C program: https://godbolt.org/z/38nq1MorM
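A compressed sketch of those steps, reading from stdin (allocation-failure checks are trimmed for brevity; the thread above covers what real code should do with them):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int cmp(const void *a, const void *b) {
    return strcmp(*(char *const *)a, *(char *const *)b);
}

int main(void) {
    /* 1. Dump the whole of stdin into one buffer. */
    size_t cap = 1 << 16, len = 0, n;
    char *buf = malloc(cap);
    while ((n = fread(buf + len, 1, cap - len, stdin)) > 0) {
        len += n;
        if (len == cap)
            buf = realloc(buf, cap *= 2);
    }
    buf[len] = '\0';                       /* terminate a possible last line */

    /* 2. Replace newlines with NULs, remembering where each line starts. */
    size_t nlines = 0, lcap = 128;
    char **lines = malloc(lcap * sizeof *lines);
    for (char *p = buf; p < buf + len; ) {
        if (nlines == lcap)
            lines = realloc(lines, (lcap *= 2) * sizeof *lines);
        lines[nlines++] = p;
        char *nl = memchr(p, '\n', buf + len - p);
        if (nl == NULL)
            break;
        *nl = '\0';
        p = nl + 1;
    }

    /* 3-4. Sort the array of line pointers. */
    qsort(lines, nlines, sizeof *lines, cmp);

    /* 5. Print and 6. free both buffers. */
    for (size_t i = 0; i < nlines; i++)
        puts(lines[i]);
    free(lines);
    free(buf);
    return 0;
}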
commandlinefan 14 hours ago
... unless the file is too big to fit into memory?