@yalue
Last active March 2, 2022 14:29
Notes I made while porting LITMUS^RT to Linux 5.16.
To compare Joshua's changes vs. the baseline 5.4 he used:
https://github.com/JoshuaJB/litmus-rt/compare/b90344f7d6000deba0709d75225f30cbf79ec2f0...55ce62849f94dd9c9fc0a5397b4f8cf40b40a324
General notes:
- The "state" field in task_struct is now __state and should be accessed using
the READ_ONCE and WRITE_ONCE accessors, or other macros from sched.h.
- The "lock" field in "struct rq" is now __lock, and should be
acquired/released using the raw_spin_rq_lock/unlock functions.
- The list of scheduler classes is built differently. It's no longer a linked
list; the classes are now laid out in priority order by the linker.
- The pick_next_task and select_task_rq scheduler class functions now take
fewer arguments. Fortunately, LITMUS didn't use any of the ones that were
removed.
- The get/put_online_cpus functions have been replaced with cpus_read_lock and
cpus_read_unlock.
- The stop_cpus function is no longer exported (it's been made static). I tried
replacing it with stop_machine_cpuslocked(), but that caused the machine to
hang permanently when switching plugins. So, instead, I just made
stop_cpus() non-static again.
- The __call_single_data struct no longer has a "flags" field, which we
didn't appear to rely on anyway.
- The proc_create function no longer takes a file_operations struct, but a new
proc_ops struct. Fortunately the fields are basically the same, with
slightly different names.
- Scheduler classes now use a DEFINE_SCHED_CLASS macro, which places each
class in the linker section used when building the list of classes.
- Made KERNEL_IMAGE_SIZE be the same regardless of CONFIG_RANDOMIZE_BASE, at
least on x86. See the comment in arch/x86/include/asm/page_64_types.h.
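As an illustration of the first note above, here is a minimal userspace sketch in C of the accessor pattern for the renamed __state field. It is not kernel code: the READ_ONCE and WRITE_ONCE macros below are simplified stand-ins (the real ones do more type checking), they rely on the GCC/Clang __typeof__ extension, and struct mock_task and the two mock_ helpers are hypothetical names made up for this example. The unsigned int type of __state is my assumption about the 5.16 definition.

```c
/* Simplified userspace stand-ins for the kernel's READ_ONCE/WRITE_ONCE.
 * The volatile cast forces exactly one load or store of the field. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Same values as the kernel's task state constants. */
#define TASK_RUNNING        0x0000
#define TASK_INTERRUPTIBLE  0x0001

/* Hypothetical cut-down task_struct holding only the renamed field. */
struct mock_task {
    unsigned int __state;   /* assumption: unsigned int, formerly "state" */
};

/* Hypothetical helper: read the state through the accessor instead of
 * touching t->__state directly. */
static int mock_task_is_running(struct mock_task *t)
{
    return READ_ONCE(t->__state) == TASK_RUNNING;
}

/* Hypothetical helper: write the state through the accessor. */
static void mock_set_task_state(struct mock_task *t, unsigned int new_state)
{
    WRITE_ONCE(t->__state, new_state);
}
```

In the port itself, any LITMUS code that used to read or assign p->state directly would go through accessors of this kind (or the existing macros in sched.h) instead.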
Necessary config settings:
Not all of these config changes may be necessary, but to keep the scheduling
logic simpler, I changed a few default options:
- Disable CONFIG_NUMA_BALANCING. This may or may not work, but it enables
several code paths in kernel/sched/core.c that I wasn't sure would work
correctly.
- Disable CONFIG_SCHED_CORE. "Core scheduling" is relatively new, and I have
no idea how it plays with LITMUS stuff. In any case, an RT scheduler ought to
be making core-assignment decisions itself, and disabling this avoids a lot
of complexity in kernel/sched/core.c.
- As with older LITMUS versions, disable "Group scheduling for SCHED_OTHER".
If you're editing the .config directly, this means you should disable
CONFIG_FAIR_GROUP_SCHED.
- Disable CONFIG_SCHED_AUTOGROUP as well. ("Automatic process group
scheduling" if you're in menuconfig.)
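If you want a build to fail loudly when one of the options above is still enabled, a compile-time guard is one way to do it. The sketch below is hypothetical and not part of the actual port. It assumes it would be included from some LITMUS source file in a kernel build, where the CONFIG_* macros are defined by the generated config header. The helper name litmus_config_guard_passed is made up for this example. Since not all of these options are known to be strictly required, a hard error may be stricter than necessary.

```c
/*
 * Hypothetical build-time guard, not part of the actual port. In a plain
 * userspace build none of the CONFIG_* macros are defined, so this
 * compiles cleanly; in a kernel build each enabled option stops the build.
 */
#ifdef CONFIG_NUMA_BALANCING
#error "LITMUS^RT notes: disable CONFIG_NUMA_BALANCING"
#endif

#ifdef CONFIG_SCHED_CORE
#error "LITMUS^RT notes: disable CONFIG_SCHED_CORE"
#endif

#ifdef CONFIG_FAIR_GROUP_SCHED
#error "LITMUS^RT notes: disable CONFIG_FAIR_GROUP_SCHED"
#endif

#ifdef CONFIG_SCHED_AUTOGROUP
#error "LITMUS^RT notes: disable CONFIG_SCHED_AUTOGROUP"
#endif

/* Reaching this point means none of the four options were enabled. */
static inline int litmus_config_guard_passed(void)
{
    return 1;
}
```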
Here is the list of changed files:
- Makefile: Done
- arch/arm/Kconfig: Done. I no longer use the ARCH_CALLS_IRQ_ENTER... define,
so I no longer include it here.
- arch/arm64/Kconfig: Done. See note for arch/arm/Kconfig.
- arch/x86/Kconfig: Done. See note for arch/arm/Kconfig.
- arch/x86/kernel/Makefile: Done
- fs/exec.c: Done
- fs/inode.c: Done
- fs/select.c: Done
- include/linux/fs.h: Done
- include/linux/hardirq.h: Done, context changed a bit, but probably fine.
- include/linux/hrtimer.h: Done
- include/linux/sched.h: Done
- include/uapi/linux/sched.h: Done
- kernel/exit.c: Done
- kernel/fork.c: Done
- kernel/locking/rwsem.c: Done; changed a little in rwsem_try_write_lock;
probably fine.
- kernel/printk/printk.c: Done; moved the diversion into the printk_sprint
function, which feels like a major hack but should achieve the same behavior
as before. As of the work related to https://lwn.net/Articles/779550/,
printk should no longer require locks. Perhaps TRACE should wrap printk
directly in the future?
- kernel/sched/Makefile: Done
- kernel/sched/core.c: Done
  - The "preempt" argument to __schedule(..) has been replaced with a
    "sched_mode" argument, which is a bit more flexible. However, I'm pretty
    sure that as long as it's nonzero something was "preempted."
  - The sched_state_ipi() call is now in include/linux/sched.h, as
    scheduler_ipi is now defined there.
  - Next up: figure out when/where to call ft_irq_fired after sched_state_ipi
    - I don't think ft_irq_fired is needed any more after recent refactoring?
      See Linux kernel commit 90b5363acd47.
  - The modification to ttwu_queue(...) was refactored at some point into a
    separate ttwu_queue_wakelist(...) function that changes depending on
    whether CONFIG_SMP is enabled. The is_realtime check is therefore now in
    the SMP version of this function and should behave the same as before.
  - There's a lot of changed stuff prior to the "goto litmus_out_activate"
    line; I can't tell at a glance if any behavior changed that's important
    to us. However, I found that a crash would occur later on if a task woke
    from a self-suspension in this function, and was able to fix it by not
    skipping anything here.
  - There's no longer a call to balance_callback(rq) in schedule_tail(). Does
    this indicate a balance callback got moved somewhere else that I should
    be aware of?
  - There's some different logic in __schedule() prior to pick_next_task().
  - The interval between TS_SCHED2_START and TS_SCHED2_END no longer includes
    balance_callback(rq). However, __balance_callbacks(rq) is called if
    prev == next a few lines earlier. Is that OK?
  - I changed a few places to call litmus_policy(...) instead of checking
    (p->policy == SCHED_LITMUS).
- kernel/sched/deadline.c: Done
- kernel/sched/rt.c: Done
- kernel/sched/sched.h: Done
  - sched_class_highest is now determined by some linker ordering in
    include/asm-generic/vmlinux.lds.h, so I modified that file to insert
    litmus_sched_class. It is still above stop_sched_class; as much as I'd
    like to get rid of that oddity, attempting to do so caused the kernel to
    hang on boot. No idea why, but keeping litmus_sched_class at the top
    boots properly.
- kernel/sched/stop_task.c: Done
  - Behavior of pick_next_task_stop changed slightly; set_next_task_stop(..)
    is now only called if pick_task_stop returns non-NULL. Based on the
    existing LITMUS comment at this location, I made it so
    sched_state_task_picked is also only called if pick_task_stop returns
    non-NULL. Is this correct, or should it always be called instead?
- kernel/time/hrtimer.c: Done
- mm/page-writeback.c: Done
- mm/page_alloc.c: Done
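To make the class-ordering and pick_next_task_stop notes above a bit more concrete, here is a minimal userspace sketch in C. It is a mock, not kernel code: every mock_ name is hypothetical, tasks are plain integer ids, and a static array stands in for the linker-defined layout. The array is ordered lowest priority first and walked from the end, which is my understanding of how 5.16 arranges the classes; treat that as an assumption. The stop class only runs the stand-in for sched_state_task_picked when it actually returns a task, matching the choice described in the kernel/sched/stop_task.c entry.

```c
#include <stddef.h>

/* All mock_* names are hypothetical; none of this is kernel code. */
struct mock_class {
    const char *name;
    int (*pick_next_task)(void);    /* returns a task id, or 0 for "none" */
};

static int mock_stop_task_id;       /* 0 unless a stopper task is runnable */
static int mock_picked_hook_calls;  /* how often the hook stand-in ran */

/* Stand-in for sched_state_task_picked(). */
static void mock_sched_state_task_picked(void)
{
    mock_picked_hook_calls++;
}

static int mock_pick_none(void) { return 0; }
static int mock_pick_fair(void) { return 100; }
static int mock_pick_idle(void) { return 1; }

/* The hook only runs when the stop class has a task to return. */
static int mock_pick_stop(void)
{
    int id = mock_stop_task_id;

    if (id)
        mock_sched_state_task_picked();
    return id;
}

/* Lowest priority first, standing in for the linker-defined layout. */
static const struct mock_class mock_classes[] = {
    { "idle",   mock_pick_idle },
    { "fair",   mock_pick_fair },
    { "rt",     mock_pick_none },
    { "dl",     mock_pick_none },
    { "stop",   mock_pick_stop },
    { "litmus", mock_pick_none },   /* kept above stop, as in the port */
};

#define MOCK_NR_CLASSES (sizeof(mock_classes) / sizeof(mock_classes[0]))

/* Walk from the highest class (last entry) down to the lowest and return
 * the first task id offered; *picked_from names the class that offered it. */
static int mock_pick_next_task(const char **picked_from)
{
    size_t i = MOCK_NR_CLASSES;

    while (i-- > 0) {
        int id = mock_classes[i].pick_next_task();

        if (id) {
            *picked_from = mock_classes[i].name;
            return id;
        }
    }
    *picked_from = NULL;
    return 0;
}
```

With no LITMUS or stopper task runnable, the walk falls through to the fair class and the hook is never called. Setting mock_stop_task_id to a nonzero value makes the stop class win and the hook run exactly once per pick.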