Last active: March 2, 2022 14:29
Notes I made while porting LITMUS^RT to Linux 5.16.

To compare Joshua's changes vs. the baseline 5.4 he used:
https://github.com/JoshuaJB/litmus-rt/compare/b90344f7d6000deba0709d75225f30cbf79ec2f0...55ce62849f94dd9c9fc0a5397b4f8cf40b40a324

General notes:
- The "state" field in task_struct is now __state and should be accessed using
  the READ_ONCE and WRITE_ONCE accessors, or other macros from sched.h.
- The "lock" field in "struct rq" is now __lock, and should be
  acquired/released using the raw_spin_rq_lock/unlock functions.
- The list of scheduler classes is no longer a linked list. The classes are
  now placed and ordered by the linker script.
- The pick_next_task and select_task_rq scheduler class functions now take
  fewer arguments. Fortunately, LITMUS didn't use any of the ones that were
  removed.
- The get/put_online_cpus functions have been replaced with cpus_read_lock and
  cpus_read_unlock.
- The stop_cpus function is no longer exported (it's been made static). I tried
  replacing it with stop_machine_cpuslocked(), but that caused the machine to
  permanently hang when switching plugins. So, instead, I just made
  stop_cpus() non-static again.
- The __call_single_data struct no longer has a "flags" field, which we
  didn't appear to rely on anyway.
- The proc_create function now takes a new proc_ops struct instead of a
  file_operations struct. Fortunately the fields are basically the same, with
  slightly different names.
- Scheduler classes are now defined with a DEFINE_SCHED_CLASS macro, which
  places each class in the linker section used when building the list of
  classes.
- I made KERNEL_IMAGE_SIZE the same regardless of CONFIG_RANDOMIZE_BASE, at
  least on x86. See the comment in arch/x86/include/asm/page_64_types.h.
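
The scheduler-class change in the notes above can be pictured with a small userspace sketch. It is not kernel code: struct mock_sched_class, mock_classes and the mock_* helpers are hypothetical names made up for illustration, and an ordinary array stands in for the region the linker script builds. The lowest-priority-first order is an assumption about kernels of this era and should be checked against include/asm-generic/vmlinux.lds.h. The litmus entry sits above stop, as these notes describe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct sched_class. */
struct mock_sched_class {
	const char *name;
};

/*
 * Stand-in for the contiguous region built by the linker script.
 * Assumed order: lowest priority first, with litmus above stop.
 */
static const struct mock_sched_class mock_classes[] = {
	{ "idle" }, { "fair" }, { "rt" }, { "dl" }, { "stop" }, { "litmus" },
};

#define MOCK_NUM_CLASSES (sizeof(mock_classes) / sizeof(mock_classes[0]))

/* The highest-priority class is simply the last element of the region. */
static const struct mock_sched_class *mock_class_highest(void)
{
	return &mock_classes[MOCK_NUM_CLASSES - 1];
}

/* Look a class up by name; returns NULL if it is not in the region. */
static const struct mock_sched_class *mock_class_find(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < MOCK_NUM_CLASSES; i++) {
		if (strcmp(mock_classes[i].name, name) == 0)
			return &mock_classes[i];
	}
	return NULL;
}

/* With a contiguous layout, priority comparison is pointer comparison. */
static int mock_class_above(const struct mock_sched_class *a,
			    const struct mock_sched_class *b)
{
	return a > b;
}
```

With this layout there is no "next" pointer to maintain. Adding a class means adding one entry at the right position, which is roughly what inserting litmus_sched_class into the linker script amounts to.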

Necessary config settings:
Not all of these config changes may be necessary, but to keep the scheduling
logic simpler, I changed a few default options:
- Disable CONFIG_NUMA_BALANCING. Leaving it enabled may or may not work, but
  it enables several code paths in kernel/sched/core.c that I wasn't sure
  would work correctly.
- Disable CONFIG_SCHED_CORE. "Core scheduling" is relatively new, and I have
  no idea how it plays with LITMUS stuff. In any case, an RT scheduler ought
  to be making core-assignment decisions itself, and disabling this avoids a
  lot of complexity in kernel/sched/core.c.
- As with older LITMUS versions, disable "Group scheduling for SCHED_OTHER".
  If you're editing the .config directly, this means you should disable
  CONFIG_FAIR_GROUP_SCHED.
- Disable CONFIG_SCHED_AUTOGROUP as well. ("Automatic process group
  scheduling" if you're in menuconfig.)
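
For anyone editing the .config by hand, the four options above would look roughly like this once disabled. This is only the relevant fragment, written in the usual "is not set" form that kernel configs use for disabled options. The rest of the config is not shown, and whether each option is visible at all depends on the architecture and on other settings.

```
# CONFIG_NUMA_BALANCING is not set
# CONFIG_SCHED_CORE is not set
# CONFIG_FAIR_GROUP_SCHED is not set
# CONFIG_SCHED_AUTOGROUP is not set
```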

Here is the list of changed files:
- Makefile: Done
- arch/arm/Kconfig: Done. I no longer use the ARCH_CALLS_IRQ_ENTER... define,
  so I no longer include it here.
- arch/arm64/Kconfig: Done. See note for arch/arm/Kconfig.
- arch/x86/Kconfig: Done. See note for arch/arm/Kconfig.
- arch/x86/kernel/Makefile: Done
- fs/exec.c: Done
- fs/inode.c: Done
- fs/select.c: Done
- include/linux/fs.h: Done
- include/linux/hardirq.h: Done; context changed a bit, but probably fine.
- include/linux/hrtimer.h: Done
- include/linux/sched.h: Done
- include/uapi/linux/sched.h: Done
- kernel/exit.c: Done
- kernel/fork.c: Done
- kernel/locking/rwsem.c: Done; changed a little in rwsem_try_write_lock;
  probably fine.
- kernel/printk/printk.c: Done; moved the diversion into the printk_sprint
  function, which feels like a major hack but should achieve the same
  behavior as before. As of the work related to
  https://lwn.net/Articles/779550/, printk should no longer require locks.
  Perhaps TRACE should wrap printk directly in the future?
- kernel/sched/Makefile: Done
- kernel/sched/core.c: Done
  - The "preempt" argument to __schedule(..) has been replaced with a
    "sched_mode" argument, which is a bit more flexible. However, I'm pretty
    sure that as long as it's nonzero something was "preempted."
  - The sched_state_ipi() call is now in include/linux/sched.h, as
    scheduler_ipi is now defined there.
  - Next up: figure out when/where to call ft_irq_fired after sched_state_ipi
    - I don't think ft_irq_fired is needed any more after recent refactoring?
      See Linux kernel commit 90b5363acd47.
  - The modification to ttwu_queue(...) was refactored at some point into a
    separate ttwu_queue_wakelist(...) function that changes depending on
    whether CONFIG_SMP is enabled. The is_realtime check is therefore now in
    the SMP version of this function and should behave the same as before.
  - There's a lot of changed stuff prior to the "goto litmus_out_activate"
    line; I can't tell at a glance if any behavior changed that's important
    to us. However, I found that a crash would occur later on if a task woke
    from a self-suspension in this function, and was able to fix it by not
    skipping anything here.
  - There's no longer a call to balance_callback(rq) in schedule_tail(). Does
    this indicate a balance callback got moved somewhere else that I should
    be aware of?
  - There's some different logic in __schedule() prior to pick_next_task().
  - The interval between TS_SCHED2_START and TS_SCHED2_END no longer includes
    balance_callback(rq). However, __balance_callbacks(rq) is called if
    prev == next a few lines earlier. Is that OK?
  - I changed a few places to call litmus_policy(...) instead of checking
    (p->policy == SCHED_LITMUS).
- kernel/sched/deadline.c: Done
- kernel/sched/rt.c: Done
- kernel/sched/sched.h: Done
  - sched_class_highest is now determined by the linker section ordering in
    include/asm-generic/vmlinux.lds.h, so I modified that file to insert
    litmus_sched_class. It is still above stop_sched_class; as much as I'd
    like to get rid of that oddity, attempting to do so caused the kernel to
    hang on boot. No idea why, but keeping litmus_sched_class at the top
    boots properly.
- kernel/sched/stop_task.c: Done
  - Behavior of pick_next_task_stop changed slightly; set_next_task_stop(..)
    is now only called if pick_task_stop returns non-NULL. Based on the
    existing LITMUS comment at this location, I made it so
    sched_state_task_picked is also only called if pick_task_stop returns
    non-NULL. Is this correct, or should it always be called instead?
- kernel/time/hrtimer.c: Done
- mm/page-writeback.c: Done
- mm/page_alloc.c: Done
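
On the "sched_mode" note under kernel/sched/core.c: the sketch below is a userspace illustration of why "nonzero means preempted" looks plausible, and where it might not hold. The SM_* values are copied by hand from how kernel/sched/core.c appears to define them in kernels of this era, so treat them as an assumption to verify. The helpers sm_mask_preempt and counts_as_preemption are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the SM_* constants in kernel/sched/core.c. */
#define SM_NONE        0x0u
#define SM_PREEMPT     0x1u
#define SM_RTLOCK_WAIT 0x2u

/*
 * Hypothetical helper: the preempt mask for a given build. Without
 * PREEMPT_RT every bit is set, so any nonzero mode matches; with
 * PREEMPT_RT only SM_PREEMPT matches.
 */
static unsigned int sm_mask_preempt(bool preempt_rt)
{
	return preempt_rt ? SM_PREEMPT : ~0u;
}

/* Hypothetical helper mirroring a "sched_mode & mask" style check. */
static bool counts_as_preemption(unsigned int sched_mode, bool preempt_rt)
{
	assert(sched_mode <= SM_RTLOCK_WAIT);
	return (sched_mode & sm_mask_preempt(preempt_rt)) != 0;
}
```

If those definitions are right, the nonzero test matches the old "preempt" flag on non-RT builds. On a PREEMPT_RT build, SM_RTLOCK_WAIT would be nonzero without being treated as a preemption, so LITMUS code that tests the value directly may want to compare against SM_PREEMPT instead.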