220 points by chmaynard 9 months ago | 50 comments
Hendrikto 9 months ago
Sounds promising. Just like EEVDF, this both simplifies and improves the status quo. Does not get better than that.
amelius 9 months ago
Why isn't the level of preemption a property of the specific event, rather than of some global mode? Some events need to be handled with less latency than others.
btilly 9 months ago
To stand ready to reliably respond to any one kind of event with low latency, every CPU intensive program must suffer a performance penalty all the time. And this is true no matter how rare those events may be.
zeusk 9 months ago
xedrac 9 months ago
btilly 9 months ago
zeusk 9 months ago
Someone 9 months ago
Not necessarily. The CPU can do it in hardware. As a simple example, the 6502 had separate “interrupt request” (IRQ) and “non-maskable interrupt” (NMI) pins, supporting two interrupt levels. The former could be disabled; the latter could not.
A programmable interrupt controller (https://en.wikipedia.org/wiki/Programmable_interrupt_control...) also could ‘know’ that it need not immediately handle some interrupts.
themulticaster 9 months ago
By the way, NMIs still exist on x86 to this day, but AFAIK they're only used for serious machine-level issues and watchdog timeouts.
wizzwizz4 9 months ago
refulgentis 9 months ago
Generally, any given software can be done in hardware.
Specifically, we could attach small custom coprocessors to everything for the Linux kernel, and Linux could require them to do any sort of multitasking.
In practice, software allows us to customize these things and upgrade them and change them without tightly coupling us to a specific kernel and hardware design.
btilly 9 months ago
This doesn't mean that moving logic into hardware can't be a win. It often is. But we should also expect that what has tended to wind up in software, will continue to do so in the future. And that includes complex decisions about the priority of interrupts.
wizzwizz4 9 months ago
sroussey 9 months ago
Wait, what? I’ve been out of compiler design for a couple decades, but that definitely used to be a thing.
namibj 9 months ago
wizzwizz4 9 months ago
amluto 9 months ago
RandomThoughts3 9 months ago
There are two different notions which are easy to get confused about here: when a process can be preempted and when a process will actually be preempted.
A potential preemption point is a property of the scheduler and is what is being discussed with the global mode here. More preemption points obviously mean more chances for processes to be preempted at inconvenient times, but they also mean more chances to prioritise properly.
What you call the level of preemption, which is to say the priority given by the scheduler, absolutely is a property of the process and can definitely be set. The default Linux scheduler will indeed do its best to allocate more time slices to, and preempt less often, processes which have higher priority.
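For what it's worth, here is a minimal userspace sketch of setting that per-process priority through the standard nice/setpriority interface; the nice value of -5 is just an example, and raising priority (negative nice) typically needs CAP_SYS_NICE or root:

```c
/* Sketch: lower the nice value of the current process so the default
 * scheduler gives it larger slices and preempts it less often.
 * The value -5 is arbitrary, for illustration only. */
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    /* PRIO_PROCESS + who=0 means "this process" */
    if (setpriority(PRIO_PROCESS, 0, -5) != 0) {
        perror("setpriority");
        return 1;
    }
    printf("nice is now %d\n", getpriority(PRIO_PROCESS, 0));
    return 0;
}
```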
biorach 9 months ago
jabl 9 months ago
> SCHED_IDLE, SCHED_BATCH and SCHED_NORMAL/OTHER get the lazy thing, FIFO, RR and DEADLINE get the traditional Full behaviour.
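A hedged sketch of where a task lands in that split: requesting SCHED_FIFO for the calling thread, which per the quote would keep the traditional full-preemption behaviour, while anything left in SCHED_OTHER/BATCH/IDLE would get the lazy mode. The priority value is arbitrary and the call needs CAP_SYS_NICE or root:

```c
/* Sketch: move the calling thread into the SCHED_FIFO realtime class. */
#include <sched.h>
#include <stdio.h>

int main(void) {
    struct sched_param sp = { .sched_priority = 10 };  /* example value */

    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        perror("sched_setscheduler");  /* usually EPERM without CAP_SYS_NICE */
        return 1;
    }
    printf("now running under SCHED_FIFO\n");
    return 0;
}
```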
acters 9 months ago
Though when it comes to gaming, there is a delicate balance: game performance should be prioritized, but not to the point where it locks up the system and prevents multitasking.
Either way, considering this is mostly for idle tasks, there is little need to automate it beyond giving users a simple command they can use in scripts to toggle the various behaviors.
biorach 9 months ago
withinboredom 9 months ago
vvanders 9 months ago
Even for non-rendering systems, those still usually run at game tick-rates, since running them full-tilt can starve adjacent cores depending on false sharing, cache misses, bus bandwidth limits and the like.
I can't think of a single title I worked on that did what you describe, embedded stuff for sure but that's a whole different class that is likely not even running a kernel.
ahoka 9 months ago
Tomte 9 months ago
harry8 9 months ago
Do no syscalls. Timer tick. Kernel takes over and does whatever as well.
NO_HZ_FULL, isolated CPU cores, interrupts routed to some other core, and you can spin using 100% CPU forever on a core. Do games do anything like this?
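A minimal sketch of the userspace half of that setup: pinning the busy thread to one core with sched_setaffinity and spinning. The NO_HZ_FULL/isolcpus side is boot-time kernel configuration and isn't shown; CPU 3 is just an example:

```c
/* Sketch: pin the calling thread to a core assumed to be isolated at
 * boot (e.g. via isolcpus=/nohz_full=), then busy-loop on it. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(3, &set);                       /* example isolated core */

    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    for (;;) {
        /* busy-poll work loop: with the core isolated and tickless,
         * this can run without timer interrupts for long stretches */
    }
}
```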
biorach 9 months ago
I haven't heard of it being done with PC games. I doubt the environment would be predictable enough. On consoles tho..?
vvanders 9 months ago
From what I recall we mostly did it for predictability so that things that may go long wouldn't interrupt deadline sensitive things(audio, physics, etc).
biorach 9 months ago
chainingsolid 9 months ago
acters 9 months ago
But yeah thanks for making that distinction. Forgot to touch on the differences
ajross 9 months ago
How do you know which thread is needed to "handle" this particular "event" though? I mean, maybe you're about to start a high priority video with low latency requirements[1]. And due to a design mess your video player needs to contact some random auth server to get a DRM cookie for the stream.
How does the KERNEL know that the auth server is on the critical path for the backup camera? That's a human-space design issue, not a scheduler algorithm.
[1] A backup camera in a vehicle, say.
AtlasBarfed 9 months ago
I guess it's nice to keep Linux relevant to older single CPU architectures, especially with regards to embedded systems.
But if Linux is going to be targeted primarily at modern CPU architectures, couldn't it basically assume that there is a single CPU available to evaluate priority, and leave the CPU-intensive tasks bound to other cores?
I mean, this has to be what high/low core designs are for, outside of mobile efficiency.
kbolino 9 months ago
edit: That having been said, I may be misinterpreting what you described; there's a comment in another thread by @zeusk which says to me that more or less this (single core used/reserved for making priority decisions) is already the case on many multi-core systems anyway, thanks to IPI (inter-processor interrupts). So, presumably, the prioritization core handles the preemption interrupts, then runs decision logic on what threads actually need to be preempted, and sends those decisions out to the respective core(s) using IPI, which causes the kernel code on those cores to unconditionally preempt the running thread.
However, I'd wonder still about the risk of memory barriers or locks starving out the kernel scheduler in this kind of architecture. Maybe the CPU can arbitrate the priority for these in hardware? Or maybe the kernel scheduler always runs for a small portion of every time slice, but only takes action if an interrupt handler has set a flag?
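That last guess is roughly how the flag-based approach tends to work: interrupt handlers only mark the running task as needing a reschedule, and the actual switch happens later at a safe point (the lazy-preemption work being discussed essentially adds a second, lazier variant of that flag checked at fewer points). A toy model, with purely illustrative names rather than real kernel symbols:

```c
/* Toy model of flag-based preemption: the interrupt handler sets a
 * flag; the context switch happens when the interrupted code reaches
 * a safe preemption point. Names are illustrative, not kernel APIs. */
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool need_resched_flag;

/* Called from interrupt context: cheap, no locks, no context switch. */
void timer_interrupt_handler(void) {
    atomic_store(&need_resched_flag, true);
}

/* Called at safe preemption points on the same CPU. */
void maybe_preempt(void (*schedule)(void)) {
    if (atomic_exchange(&need_resched_flag, false))
        schedule();  /* pick the next task only if someone asked for it */
}
```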
yndoendo 9 months ago
Software PLCs will bind to a core which is not exposed to the OS environment, so a dual-core machine shows up as single-core and a quad-core as tri-core.
weinzierl 9 months ago
Is this about kernel tasks, user tasks or both?
GrayShade 9 months ago
fguerraz 9 months ago
temac 9 months ago
simfoo 9 months ago
biorach 9 months ago
> There is also, of course, the need for extensive performance testing; Mike Galbraith has made an early start on that work, showing that throughput with lazy preemption falls just short of that with PREEMPT_VOLUNTARY.
spockz 9 months ago
hifromwork 9 months ago
hamilyon2 9 months ago
If one wanted to drastically simplify the scheduler, for example for some scientific application which doesn't care about preemption at all, could it be done in a clean, modular way? And would there be any benefit?
p_l 9 months ago
The standard way is to set interrupt masks so they don't go to the "work" CPUs, and to use cpusets so that only a specific cgroup is allowed to execute on a given cpuset.
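A rough sketch of that setup done by writing the usual proc/cgroup files directly. It assumes root, cgroup v2 mounted at /sys/fs/cgroup with the cpuset controller enabled, an already-created "work" cgroup, a 4-CPU box where CPUs 2-3 are the "work" CPUs, and a placeholder PID; existing IRQs would still need their per-IRQ /proc/irq/N/smp_affinity masks adjusted individually:

```c
/* Sketch: route new IRQs away from the work CPUs and confine a task
 * to them via a cgroup v2 cpuset. All paths, masks and the PID are
 * examples under the assumptions stated above. */
#include <stdio.h>

static int write_str(const char *path, const char *val) {
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return -1; }
    fprintf(f, "%s", val);
    return fclose(f);
}

int main(void) {
    /* Keep newly requested interrupts on CPUs 0-1 (hex mask 0x3). */
    write_str("/proc/irq/default_smp_affinity", "3");

    /* Confine the "work" cgroup to CPUs 2-3 and move a task into it
     * (1234 is a placeholder PID). */
    write_str("/sys/fs/cgroup/work/cpuset.cpus", "2-3");
    write_str("/sys/fs/cgroup/work/cgroup.procs", "1234");
    return 0;
}
```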
toast0 9 months ago
Whatever the scheduler does should be pretty low impact, because the runlist will be very short. If your application doesn't do much I/O, you won't get many interrupts either. If you can run a tickless kernel (is that still a thing, or is it normal now?), you might not get any interrupts for large periods.
marcosdumay 9 months ago
But the reason for drastically simplifying it would be to avoid bugs; there isn't much performance to gain compared to a well-set default one (there are plenty of settings, though). And there haven't been many bugs there. With most naive simplifications you will lose performance, not gain it.
If you are running a non-interactive system, the easiest change to make is to increase the size of the process time quantum.
kevin_thibedeau 9 months ago
AvaSayes 9 months ago
you gotta balance precision (targeted leads) with efficiency so everything runs smoothly overall.
lincpa 9 months ago