Scheduling¶

CPU Scheduling¶

Definition: The decisions made by the OS to figure out which ready processes/threads should run and for how long.
The policy is the scheduling strategy. 怎么选择下一个要执行的进程
The mechanism is the dispatcher. 该机制是调度器怎么快速切换到下一进程

CPU-I/O Burst Style¶

Most processes alternate between CPU and I/O activities
I/O-bound process: most waiting I/O, many short CPU bursts. 大部分操作都I/O-bound
CPU-bound process: 渲染、计算...
Histogram of CPU-burst Times

The CPU Scheduler¶

Whenever the CPU becomes idle, a ready process must be selected for execution

当 CPU 空闲时，从所有的 ready processes 中选一个继续跑。

Non-preemptive scheduling: a process holds the CPU until it is willing to give it up.

非抢占式，它一直占着 CPU，直到它自己放弃。

Preemptive scheduling: a process can be preempted even though it could have happily continued executing.

抢占式，CPU 决定每个进程能跑多久，可以强制中止正在跑的进程。 OS一般抢占式

Scheduling Decision Points: scheduling decisions can occur

在非抢占式的情况中第二种情况不会发生，无论在哪种中第六种都不会发生

A process goes from RUNNING to WAITING

e.g. waiting for I/O to complete
A process goes from RUNNING to READY

e.g. when an interrupt occurs (such as a timer going off)
A process goes from WAITING to READY

e.g. an I/O operation has completed
A process goes from RUNNING to TERMINATED
A process goes from NEW to READY
A process goes from READY to WAITING

Scheduling Objectives¶

Maximize CPU Utilization
Maximize Throughput
Minimize Turnarround Time 周转时间，指进程从创建到完成的时间。
Minimize Waiting Time
Minimize Response Time 响应时间，指进程从创建到第一次响应被接受的时间。

Scheduling Mechanisms¶

Dispatcher¶

Dispatcher module gives control of the CPU to the process selected by the scheduler.

switching to kernel mode

kernel_entry, 用户态的信息存在 pt_regs 中。

switching context

上下文存在 PCB 中。

switching to user mode
jumping to the proper location in the user program to restart that program

Dispatch latency – time it takes for the dispatcher to stop one process and start another to run. 这是 pure overhead，因为 CPU 没有做实际的工作。

Scheduling Algorithms¶

The algorithm cannot be overly complicated so that they can be fast

First-Come, First-Served Scheduling (FCFS)¶

Waiting time = start time – arrival time
Turnaround time = finish time – arrival time
Convoy effect - short process behind long process 慢车在快车后面，所有车都在后面等着。

Shortest-Job-First(SJF) Scheduling¶

Use these lengths to schedule the process with the shortest time.

注意分为抢占式和非抢占式的！

有多段的执行，等待时间我们要计算这个进程在执行结束前，有多少时间没有被执行，即 25-10=15。

理论最优，但由于不知道执行一个进程需要多久，因此没有价值
Predicting CPU burst durations: 据之前的时间，预测一个进程的下一次执行时间：\(\tau_{n+1}=\alpha t_n+(1-\alpha)\tau_n\)

Round-Robin Scheduling¶

RR Scheduling is preemptive and designed for time-sharing.

给进程一个固定时间片，用完了就跑到 ready queue 末尾排队。

Ready Queue is a FIFO. Whenever a process changes its state to READY it is placed at the end of the FIFO.

Scheduling:

Pick the first process from the ready queue
Set a timer to interrupt the process after 1 quantum
Dispatch the process

No starvation, so better response time

在 SJF 中，如果不停的有时间短的进程进来，那么长进程就可能永远无法执行，称为 starvation。

The wait time is bounded.
Trade-off
Short quantum: great response/interactivity but high overhead
Long quantum: poor response/interactivity, but low overhead
In practice, %CPU time spent on switching is very low
time quantum: 10ms to 100ms
context-switching time: 10μs

Priority Scheduling¶

优先级高的先被调度，优先级低的后被调度。（No convention: low number can mean low or high priority）

Priorities can be internal.

e.g. in SJF it’s the predicted burst time, the number of open files.

Priorities can be external.

e.g. set by users to specify relative importance of jobs.

Processes with the same priority run round-robin

starvation: 优先级低的可能永远无法执行
solution: Priority aging: Increase the priority of a process as it ages

Multilevel Queue Scheduling¶

Scheduling within queues
每个queue有自己的调度policy
如high-priority使用RR，low-priority使用FCFS
Scheduling between queues
抢占式 or time-slicing among queues

Multilevel Feedback Queues¶

Processes can move among the queues.
当process的characteristic变化时改变queue，同时是一个good way to implement priority aging

有三层队列，第一、二层是 Round-Robin。来了一个进程先放到第一个队列里准备执行，如果没执行完就放到第二个队列里，如果还没执行完就放到第三个队列里 FCFS。如果最开始在 Q0 就执行完了，很可能是 I/O bound 的进程，我们把它的优先级设的很高；否则可能是 CPU-bound 我们就降低它的优先级。

非 CPU-intensive 的进程应该尽快得到 CPU，因为它们可能是交互式进程。

可配置性很高。包括queue的数量，每个queue的调度算法，queue之间的调度算法，提升或降级某个process的方法

不同system之间最好的调度算法可能不一样

What’s a Good Scheduling Algorithm?¶

Few analytical/theoretical results are available
Another option: Simulation
Finally: Implementation

Thread Scheduling¶

process-contention scope (PCS)

每个进程分到时间片一样，然后进程内部再对线程进行调度。

system-contention scope (SCS)

所有线程进行调度。

现在主流 CPU 都是以线程为粒度进行调度的。

Multiple-Processor Scheduling¶

Multi-processor may be any one of the following architectures:

Multi-core CPUs
Multi-threaded cores

Multithreaded Multicore System¶

Symmetric multiprocessing (SMP) is where each processor is self scheduling
All threads may be in a common ready queue (a)
Each processor may have its own private queue of threads (b)

现在大部分是这种架构。

CPU 中计算单元很快，但是内存访问是很慢的，需要 stall。为了利用这段 stall 的时间，我们就多用一个 thread，在这个 thread stall 时执行另一个 thread。（hyperthreading）

Chip-multithreading (CMT) assigns each core multiple hardware threads. (Intel refers to this as hyperthreading.)

hyperthreading 属于硬件线程，由硬件来调度，不同于 OS 里的 thread。

Two levels of scheduling:
The operating system deciding which software thread to run on a logical CPU
How each core decides which hardware thread to run on the physical core.

Multiple-Processor Scheduling¶

Load Balancing¶

Load balancing attempts to keep workload evenly distributed
Push migration – periodic task checks load on each processor, and if found pushes task from overloaded CPU to other CPUs.

core 上工作太多，要推给其他的 core。

Pull migration – idle processors pulls waiting task from busy processor.

core 上工作太少，就从其他的 core 上拉一些任务过来。

Processor Affinity 相关性¶

有的进程我们想要在一个 core 上跑。

Soft affinity – the operating system attempts to keep a thread running on the same processor, but no guarantees.
Hard affinity – allows a process to specify a set of processors it may run on.

Linux Scheduling¶

Nice command
数越小，优先级越高
ps -e -o uid,pid,ppid,pri,ni,cmd

Round-Robin + priority.
第一个红框 \(O(N)\) 找 counter 最大的进程，如果 counter 不为 0 就执行，否则说明所有的进程都已经跑完自己的时间片了，重新赋值时间片，按照优先级赋值。（当时数越大，说明优先级越高，后来相反了）

0.11

1.2 使用circular queue实现

2.2 Scheduling classes, Priorities within classes

每次找进程都要 \(O(N)\)，后来改为了 \(O(1)\) 的算法（Linux 2.6）

active task: its time slice hasn’t been fully consumed
expired task: has used all of its time slice

The kernel keeps two arrays of round-robin queues

One for active tasks: one Round Robin queue per priority level
One for expired tasks: one Round Robin queue per priority level

每个优先级都对应一个数组，每个数组里有一个 Round Robin 队列。

struct prio_array {
    int nr_active; // total num of tasks
    unsigned long bitmap[5]; // priority bitmap
    struct list_head queue[MAX_PRIO]; // the queues
}

The bitmap contains one bit for each priority level.
bitmap 存哪个优先级里还有元素，最开始所有位都是 0，如果有优先级里有进程，就把对应的位设为 1。找优先级最高的就是从左往右遍历，找到第一个 1 的位。x86 上正好有一个指令 bsfl（bit scan forward - from right to left）可以直接找到对应的位，然后再从对应的 task_list 取出一个进程。

prio_array.head_queue[bsfl(bitmap)].task_struct

一个任务执行完它的时间片后，就从 active array 移到 expired array。当 active array 为空时，就把 expired array 和 active array 交换。

问题在于：优先级数量受限制；而且 policy 和 mechanism 紧密绑定，难以维护，所以后来没有继续使用。

CFS: Completely Fair Scheduler

Developed by the developer of \(O(1)\), with ideas from others
Main idea: keep track of how fairly the CPU has been allocated to tasks, and “fix” the unfairness
For each task, the kernel keeps track of its virtual time
The sum of the time intervals during which the task was given the CPU since the task started
Could be much smaller than the time since the task started
Goal of the scheduler: give the CPU to the task with the smallest virtual time. i.e., to the task that’s the least "happy"

Scheduling¶

CPU Scheduling¶

CPU-I/O Burst Style¶

The CPU Scheduler¶

Scheduling Objectives¶

Scheduling Mechanisms¶

Dispatcher¶

Scheduling Algorithms¶

First-Come, First-Served Scheduling (FCFS)¶

Shortest-Job-First(SJF) Scheduling¶

Round-Robin Scheduling¶

Priority Scheduling¶

Multilevel Queue Scheduling¶

Multilevel Feedback Queues¶

What’s a Good Scheduling Algorithm?¶

Thread Scheduling¶

Multiple-Processor Scheduling¶

Multithreaded Multicore System¶

Multiple-Processor Scheduling¶

Load Balancing¶

Processor Affinity 相关性¶

Linux Scheduling¶

评论