210 points by transpute 1 day ago | 68 comments
tuukkah 1 day ago
> Pithy statements such as “only use exceptions for exceptional cases” or “Let it crash”, while catchy, do not do justice to the complexities that programmers need to deal with when deciding how to define, propagate, handle and reason about errors.
I think the author is on to something here w.r.t. how current programming languages and conventions gloss over what good code might need.
gavinhoward 24 hours ago
That makes me sad because that is what my systems programming language uses.
For context, conditions and restarts come from the Lisp world. They do not unwind the stack. Instead, they run an error handler on error, and if it handles the error, you have the option of restarting the operation that failed.
I have implemented them in C; they are great for systems programming.
gavinhoward 22 hours ago
The difference is that callers (even indirect callers) can register handlers to handle those errors without unwinding the stack.
Maybe the error handler makes the operation return a default value. Maybe it deallocates a cache to free up memory in a memory-constrained situation. Maybe it simply ignores EOF or a FILE DOES NOT EXIST error where those do not matter.
I also have a special error handler for parsing. If the error is a parse error, it prints the parse error in standard format, with source file, line, etc. Otherwise, it passes the error on to the next error handler in the chain.
This helps me modularize my code by keeping the parse error concerns separate from error handling for everything else.
typesanitizer 15 hours ago
After feedback on Lobste.rs, I plan on adding a section on conditions and restarts, hopefully sometime later today if time permits. :)
I'd be happy to add some information about Yao as well, alongside Common Lisp. It would be helpful for me to have some more details about Yao before writing about it, so I have some follow-up questions below. Please feel free to link me to existing writing; I may be misremembering details, but I don't think the answers to these have been covered anywhere.
I looked at the docs on the master branch as of mid-Jan 2025 (could you confirm whether these are up to date?), particularly design.md, and I noticed these points:
> That means that Yao will have unsafe code, like Rust's unsafe. However, unlike Rust, Yao's way of doing unsafe code will be harder to use,
So Yao has a delineation between safe and unsafe code, correct? Does "safe" in Yao have the same (or stronger) set of guarantees as Rust (i.e. no memory unsafety if all necessary invariants are upheld by unsafe code, and boundary of safe-unsafe code)?
> Yao's memory management will be like C++'s RAII [..]
Does Yao guarantee memory safety in the absence of unsafe blocks, and does the condition+restart system in Yao fall under the safe subset of the language? If so, I'm curious how lifetimes/ownership/regions are represented at runtime (if they are), and how they interact with restarts. Specifically:
1. Are the types of restart functions passed down to functions that are called?
2. If the type information is not passed, and conditions+restarts are part of the safe subset of Yao, then how is resumption logic checked for type safety and lifetime safety? Can resumptions cause run-time type errors and/or memory unsafety, e.g. by escaping a value beyond its intended lifetime?
---
For reading the docs, I used this archive.org link (I saw you're aware of the Gitea instance being down in another comment): https://web.archive.org/web/20250114231213/https://git.yzena...
gavinhoward 10 hours ago
Sorry, I was AFK for several hours.
Thank you for the offer, but I don't think Yao should be featured yet.
Edit: I guess I'll answer your questions anyway.
> So Yao has a delineation between safe and unsafe code, correct?
Correct. However, Yao's "unsafe" will actually be separate files, written directly in an LLVM-like assembly (Yvm in the repo). That's how it will be harder to use.
> Does "safe" in Yao have the same (or stronger) set of guarantees as Rust (i.e. no memory safety if all necessary invariants are upheld by unsafe code, and boundary of safe-unsafe code)?
Stronger.
First, Yvm (Yao's assembly/unsafe) is made specifically for structured languages, and it will still do bounds checks by default. Yes, there will be ways of not doing bounds checks, of course, but even in "unsafe," bounds checks will exist.
Second, Yao and Yvm are both explicitly designed for better formal verification. [1] This includes user-defined formal properties.
> Does Yao guarantee memory safety in the absence of unsafe blocks?
Yes.
> does the condition+restart system in Yao fall under the safe subset of the language?
It's still being implemented (which is why Yao should not be featured), but it will be part of the safe subset. That is a guarantee; I will completely redesign Yao if I cannot fit conditions and restarts into the safe subset. But I am confident that they will work as-is, because I implemented a prototype in C that is safer than C itself.
> 1. Are the types of restart functions passed down to functions that are called?
Not directly. My C code uses, and Yao will use, what I call "context stacks," an idea that comes from Jonathan Blow's Jai.
These are more useful than just restart functions, but there is explicitly one context stack for restart functions, per thread. Registering a restart function means pushing it onto the context stack.
Then, when an error happens, the context stack is walked backwards until a restart function handles the error. If no function handles it, the context stack for the parent thread is walked (starting at the point where the child thread was created), and so on until some function handles it.
I push a default restart function at the root, so errors will always be handled.
> 2. If the type information is not passed, and conditions+restarts are part of the safe subset of Yao, then how is resumption logic checked for type safety and lifetime safety? Can resumptions cause run-time type errors and/or memory unsafety, e.g. by escaping a value beyond its intended lifetime?
This is one of the craziest parts of Yao: it will have the capability to be generic over types at runtime. In addition, Yao has something like Go interfaces or Rust traits. What it has is more powerful, though.
The end result is that errors will actually be interfaces, and everything will be type checked at compile time while remaining memory safe.
I hope this answers your questions.
[1]: https://gavinhoward.com/2024/05/what-rust-got-wrong-on-forma...
paddy_m 18 hours ago
I don't see how they could be used in most cases where you want a program to run without a programmer intervening. Could you list some more use cases?
gavinhoward 17 hours ago
A classic case would be restarting memory allocation after failure and after the error handler freed cache memory.
Another case would be retrying to open a file when the process has too many files open already. The error handler may close a few that are not critical, then have the restart fire.
Another case is sending something over TCP. Perhaps you try to send something, and it gives you an error. Unbeknownst to you, the message was already sent, but you wait a second, or until other connections quiet down, then restart and try again, and it succeeds. The other end gets a duplicate, but no matter; it's TCP.
Another case is DNS. Say you need to get the IP address for some hostname, and you connect to your first default DNS server. However, it happens to be run by your local incompetent sysadmins, and it happens to be down. Your error handler may choose a different, maybe public, DNS server, like Cloudflare or Google, and then restart.
If you think, 'Oh, well, I could program those in without restarts,' you are correct, but doing so couples concerns.
Take the DNS example: if you put that extra error handling logic in the code that actually tries resolving things, then how do you change error handling when you need to?
Let's make the example even more detailed: perhaps you have a fleet of servers, a whole data center. Most of those servers could use a public DNS if they needed to, but perhaps your head node must NEVER use a public DNS for security reasons. The typical way to implement that would mean having an `if` statement for acting differently based on whatever condition would indicate head node or not. That is coupling the DNS resolution with error handling.
But if you have conditions and restarts, then you simply register a different DNS error handler at startup based on whether it's the head node or not. Or the error handler itself could contain the `if` statement. Either way decouples DNS resolution from the error handling.
I hope all of that helps.
demurgos 1 day ago
Regarding context handling, one issue I struggle with is where the context should be attached: by the caller or the callee? In particular, with intermediate libraries, I would like to avoid needless work attaching the context if the caller already handles it (some sort of opt-out).
Usually I just attach the context in the callee, as it's the simplest, but I'd like some systematic/reusable approach that would control this. It would be nice if it fit in the type system too. I use a bunch of languages (Rust, TypeScript, Kotlin, Python) and it feels like a larger design issue.
To give a more concrete example: in Node, `fs.readFileSync(path)` includes the path in the error when it fails: the callee attaches the context to the error. In Rust, `std::fs::read(path)` does not attach the path to the error: the caller is responsible for the context. I would like some lightweight way to control whether the file read should include the path. The ancestor-scanning example from the article is a case where caller context is good IMO, but usually callee context is better for unrecoverable errors. Since it's contextual, having control would be nice.
jandrewrogers 23 hours ago
Expanding on your caller/callee example, sometimes both are far removed from the path, e.g. they only have a file descriptor. There are plenty of ways to resolve that file descriptor into a path, but one that consistently has zero overhead, except when it is required to contextualize an error, is not trivial. Being able to resolve outside context implies the management and transmission of context in cases where it is not needed.
I am also coming around to the idea that the error handling context needs to be at least partly co-designed with the logging APIs and internals to have a little more flexibility around when and where things happen. In most software neither of these is really aware of the other.
Galanwe 1 day ago
So in a laps of 15 years (2010-1025), they hand picked 20 bugs from 5 open source filesystem projects (198 total), and extrapolated this result.
That is not science.
typesanitizer 1 day ago
> So in a laps of 15 years (2010-1025),
The paper was published in 2014, so the period is 2010-2014, not 2010-2025.
> they hand picked 20 bugs from 5 open source filesystem projects (198 total)
The bugs were randomly chosen; "hand picked" would imply that the authors investigated the contents of the bug reports deeply before deciding whether to include them (which would certainly fall under "bad science"). The paper states the following:
> We studied 198 randomly sampled, real world failures reported on five popular distributed data-analytic and storage systems, including HDFS, a distributed file system [27]; Hadoop MapReduce, a distributed data-analytic framework [28]; HBase and Cassandra, two NoSQL distributed databases [2, 3]; and Redis, an in-memory key-value store supporting master/slave replication [54]
So only 1 out of 5 projects is a file system.
> and extrapolated this result. That is not science.
The authors also provide a measure of statistical confidence in the 'Limitations section'.
> (3) Size of our sample set. Modern statistics suggests that a random sample set of size 30 or more is large enough to represent the entire population [57]. More rigorously, under standard assumptions, the Central Limit Theorem predicts a 6.9% margin of error at the 95% confidence level for our 198 random samples. Obviously, one can study more samples to further reduce the margin of error
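For what it's worth, the 6.9% figure checks out as the standard worst-case margin of error for a proportion (p = 0.5, z = 1.96 at the 95% confidence level):

ME = z * sqrt(p(1-p)/n) = 1.96 * sqrt(0.25/198) ≈ 1.96 * 0.0355 ≈ 0.0696 ≈ 6.9%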
Do you believe that this is insufficient or that the reasoning in this section is wrong?
Galanwe 1 day ago
Oh indeed, my bad.
> The bugs were randomly chosen; "hand picked" would imply that the authors investigated the contents of the bug reports deeply before deciding whether to include them
No, no, they were not randomly chosen; they even have a whole paragraph explaining how they randomly picked _from a pool of manually selected bugs_. Their selection criteria ranged from "being serious" to "having a lot of comments", "they can understand the patch", or "the patch is not from the reporter".
> The authors also provide a measure of statistical confidence in the 'Limitations section'.
This is a measure of confidence of their random sampling being representative of the hand picked bugs...
> under standard assumptions, the Central Limit Theorem predicts a 6.9% margin of error at the 95% confidence level
I would love to see them prove the normality assumption of bug root cause distribution.
Also, the whole categorization they do seems purely qualitative.
This paper seems to lack rigor to me.
gtirloni 1 day ago
I have been guilty of this kind of comment myself, and I have come to realize they do nothing to further my point. I would suggest reading the guidelines; they address this well: https://news.ycombinator.com/newsguidelines.html
Galanwe 1 day ago
Thanks for the considerate feedback.
epr 1 day ago
So, analyzing databases, we picked Java, Java, Java, Java, and one in C. This does not seem very random. I suppose this may provide insight into failure modes in Java codebases in particular, but I'm not sure I'd be in a hurry to generalize.
perching_aix 1 day ago
Instead of going for rhetorical KOs, how about remaining constructive? You're well aware that this is what they meant, you could have just written it plain out if you disagree.