YJL: BFS (Brain Fuck Scheduler) Test

A post caught my interesting, which is about the huge boost on x264 encoding. We will all see the change when the kernel 2.6.32 comes, it uses BFS¹, a new process scheduler. The current kernel uses CFS (Completely Fair Scheduler). Phoronix has a benchmark report for BFS.

I decided to do a little test. Because I have no background, so the test is just a rough numbers. Here is my system:

Core 2 Duo T5600 1.83GHz
Gentoo amd64
Kernel 2.6.31-gentoo-r4 (It’s kernel 2.6.31 with genpatch)
Kernel 2.6.32-rc5 (Just for reference)
kernel config file from my 2.6.30 configuration
BFS v304²

Contents

1 Simple concurrent process test
2 Compilation Test
3 Conclusion

1 Simple concurrent process test

First, I have no idea how should I test. I just want to get some number to read. My test program is written in Python (So you knew I really have no idea), because Python’s threading could only use one core, therefore I used processing to utilize my two cores. The test code shows as follows:

#!/usr/bin/env python

from time import sleep
from time import time
import os

from processing import activeChildren
from processing import Process


MAX_COUNT = 1000000
MAX_PROCESS = 32
PASSES = 5


def counter():

  c = 0
  while c < MAX_COUNT:
    c += 1


def do_test(PROCESS_COUNT):

  processes = []
  for i in range(PROCESS_COUNT):
    p = Process(target=counter)
    p.setDaemon(True)
    processes.append(p)

  t_start = time()
  for p in processes:
    p.start()
  while activeChildren():
    sleep(0.001)
  t_end = time()
  return t_end - t_start


def main():

  avg_times = []
  for concurr in range(1, MAX_PROCESS + 1):
    print 'Running concurrent %3d processes...' % concurr,
    acc_time = 0.0
    for i in range(PASSES):
      acc_time += do_test(concurr)
    avg_time = acc_time / PASSES
    avg_times.append(avg_time)
    print '%9f seconds' % avg_time

  fname = 'test-c%d-p%d-%s' % (MAX_COUNT, MAX_PROCESS, os.uname()[2])
  f = open(fname, 'w')
  data = (os.uname()[2], avg_times)
  f.write(repr(data))
  f.close()
  print
  print 'Data written to %s.' % fname


if __name__ == '__main__':
  main()

As you can see, the created process doesn’t do actual things. It just keeps counting up. Each test would run five times and get an average of elapsed time.

https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-KYR4pBMB2USJuQvH0bl6JjJOZpbUibkWsRSbnTN-uAVwfUhxzKG53ke-7K6PM8pykEHn64Ss3sYuwwKK29-s-pU4emt4yerEdW-XHhY5HlQZoMdD25G2OQuRA5-y042IW2PBb6e7R2w/s1600/p.png

https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqEjIowwfj7JoEJpwdYzgrlY-LyKRwEulMqz7dbOcNnB2E0BSVL5tdEfeGgUMsg3qzp25X06-j39AztQ14VMN5Vovrqv4Pa6pneGcFDzATkA-NRl-2KKAZSGb1OvHtBeg3NDUs-L_MJG0/s1600/cost.png

#Proc  2.6.31   (Cost)  2.6.31-bfs   (Cost)  2.6.32-rc5   (Cost)     Impr%
  1  0.156739( 0.156739)  0.151051( 0.151051)  0.156921( 0.156921)   3.63%
  2  0.220114( 0.110057)  0.152029( 0.076014)  0.155770( 0.077885)  30.93%
  3  0.311475( 0.103825)  0.230545( 0.076848)  0.238812( 0.079604)  25.98%
  4  0.318061( 0.079515)  0.316369( 0.079092)  0.307760( 0.076940)   0.53%
  5  0.451945( 0.090389)  0.379189( 0.075838)  0.390504( 0.078101)  16.10%
  6  0.493751( 0.082292)  0.462939( 0.077157)  0.466906( 0.077818)   6.24%
  7  0.617738( 0.088248)  0.534748( 0.076393)  0.542779( 0.077540)  13.43%
  8  0.639118( 0.079890)  0.614359( 0.076795)  0.619499( 0.077437)   3.87%
  9  0.759511( 0.084390)  0.685477( 0.076164)  0.698372( 0.077597)   9.75%
 10  0.782722( 0.078272)  0.766383( 0.076638)  0.775433( 0.077543)   2.09%
 11  0.883538( 0.080322)  0.834171( 0.075834)  0.852352( 0.077487)   5.59%
 12  0.942532( 0.078544)  0.913356( 0.076113)  0.930670( 0.077556)   3.10%
 13  1.056529( 0.081271)  0.998580( 0.076814)  1.014098( 0.078008)   5.48%
 14  1.096089( 0.078292)  1.062566( 0.075898)  1.092146( 0.078010)   3.06%
 15  1.207977( 0.080532)  1.136702( 0.075780)  1.172611( 0.078174)   5.90%
 16  1.276357( 0.079772)  1.215091( 0.075943)  1.253329( 0.078333)   4.80%
 17  1.360708( 0.080042)  1.289036( 0.075826)  1.323230( 0.077837)   5.27%
 18  1.415339( 0.078630)  1.372739( 0.076263)  1.408926( 0.078274)   3.01%
 19  1.507686( 0.079352)  1.442193( 0.075905)  1.486329( 0.078228)   4.34%
 20  1.617503( 0.080875)  1.510492( 0.075525)  1.562270( 0.078114)   6.62%
 21  1.652496( 0.078690)  1.592773( 0.075846)  1.644975( 0.078332)   3.61%
 22  1.737832( 0.078992)  1.680954( 0.076407)  1.735154( 0.078871)   3.27%
 23  1.797144( 0.078137)  1.738987( 0.075608)  1.833433( 0.079714)   3.24%
 24  1.873404( 0.078058)  1.814224( 0.075593)  1.880322( 0.078347)   3.16%
 25  1.959188( 0.078368)  1.886172( 0.075447)  1.958433( 0.078337)   3.73%
 26  2.044073( 0.078618)  1.967344( 0.075667)  2.041252( 0.078510)   3.75%
 27  2.094033( 0.077557)  2.032489( 0.075277)  2.110076( 0.078151)   2.94%
 28  2.186407( 0.078086)  2.112829( 0.075458)  2.197447( 0.078480)   3.37%
 29  2.256274( 0.077803)  2.187584( 0.075434)  2.267832( 0.078201)   3.04%
 30  2.341825( 0.078061)  2.263803( 0.075460)  2.344355( 0.078145)   3.33%
 31  2.413136( 0.077843)  2.338420( 0.075433)  2.446170( 0.078909)   3.10%
 32  2.500092( 0.078128)  2.429601( 0.075925)  2.522627( 0.078832)   2.82%

Note that Improvement% is 2.6.31-bfs improvement over 2.6.31, cost means the time to run 1000000 count per process,

2 Compilation Test

For a real use test, I think compilation is a good one to do. So I chose e2fsprog 1.41.9 to test, here are the result:

https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFmn9F8SGqwa7_6W2M8uANNlPRPIlIESsVFcnRIeyIr7_tFhszKgUWiibLuz6iAe8xPl0EGlsm0VAXJrxhJPOYHceTAvI_5NgsG9wsBs-tU5cgHyTQFu1XKSUERbGfb6oX0Y90WSl7iGs/s1600/compilation.png

#Job 2.6.31  2.6.31-bfs 2.6.32-rc5  Impr%
  1 36.232000 35.235000 35.746000   2.75%
  2 23.511000 20.168000 20.327000  14.22%
  3 21.740000 20.885000 20.902000   3.93%
  4 21.700000 21.308000 21.397000   1.81%
  5 21.954000 21.433000 21.741000   2.37%

Note that Improvement% is 2.6.31-bfs improvement over 2.6.31.

3 Conclusion

In both tests, all got improved. With the number of concurrent running processes and the number of jobs are the same as the number of cores, the improvements have reached peaks.

One thing more interesting is, with CFS, the performance is better when you use 3 to 4 jobs, especially 4 jobs, it has best time. And the cost comes to minimum.

With BFS, the cost time reaches minimum just 2 jobs. It begins slightly reducing improvement after 2 jobs, the number of cores.

Also BFS has resulted smaller kernel build:

Size    Kernel
1991232 kernel-x86_64-2.6.31-gentoo-r4
1974720 kernel-x86_64-2.6.31-gentoo-r4-bfs

It seems that CFS reaches the best performance at 2*N processes and BFS reaches the best performance at N processes, where N is the number of cores. And BFS always got the better performance in these tests.

CFS doesn’t stay steady after 2*N processes, the curve finally becomes steady after 24 processes. But BFS is already steady at N processes.

BFS does look good.

(results plotting python codes: 1, 2, 3)

[1]	`http://ck.kolivas.org/patches/bfs/sched-BFS.txt` is gone.

[2]	`http://ck.kolivas.org/patches/bfs/old/2.6.31-sched-bfs-304.patch` is gone

2 comments:

AlyssaSeptember 11, 2011 at 6:01 AM
How would you know what scheduler is used by the kernel? Is it only determined by the kernel's name? No command to be typed in the command prompt?
livibetterSeptember 11, 2011 at 6:40 AM
What you see the "-bfs" is the local version what you can give when you configure your kernel, so, no, the name of kernel does mean anything except the version number.

To be honest, I didn't follow after I posted this, I thought it might get into trunk but it didn't seem so. The author still puts on patches upto 3.0.

So, if you need to ask the question, then it's clear you are not using custom kernel, therefore you don't patch it. There is no way any major distro would apply such patch since it's an important component. You might want to read little more on Wikipedia.

I am pretty sure your kernel still uses the CFS.

Note: Only a member of this blog may post a comment.