Playing with Solaris processor sets
The idea behind processor sets has been around for a decade or so in the HPC arena. You’ve got certain jobs that require a certain amount of CPU resources, or have a particular IO profile, so you want to dedicate some CPUs just to them. Solaris has had processor controls since the dark days of 2.6.
*Note:* I’m going to be freely talking about CPUs as the processing unit. This is all on T2000s, and I know they’re not *real* CPUs – call them hardware threads or thread processing units if you like – but for simplicity this document will just call them CPUs and be done with it.
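If you’re curious how Solaris groups them, psrinfo with the -p and -v flags should show the virtual processors hanging off each physical processor – I’ll spare you the output here as it varies from box to box:

bash-3.00# psrinfo -pv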
The actual management of processor sets is very straightforward, and I’ll be playing about with them on one of my favourite bits of kit – the Sun T2000.
First of all we use the psrinfo command to view the status of our processors:
bash-3.00# psrinfo
0   on-line   since 11/21/2006 20:24:57
1   on-line   since 11/21/2006 20:24:58
2   on-line   since 11/21/2006 20:24:58
3   on-line   since 11/21/2006 20:24:58
4   on-line   since 11/21/2006 20:24:58
5   on-line   since 11/21/2006 20:24:58
6   on-line   since 11/21/2006 20:24:58
7   on-line   since 11/21/2006 20:24:58
8   on-line   since 11/21/2006 20:24:58
9   on-line   since 11/21/2006 20:24:58
10  on-line   since 11/21/2006 20:24:58
11  on-line   since 11/21/2006 20:24:58
12  on-line   since 11/21/2006 20:24:58
13  on-line   since 11/21/2006 20:24:58
14  on-line   since 11/21/2006 20:24:58
15  on-line   since 11/21/2006 20:24:58
16  on-line   since 11/21/2006 20:24:58
17  on-line   since 11/21/2006 20:24:58
18  on-line   since 11/21/2006 20:24:58
19  on-line   since 11/21/2006 20:24:58
20  on-line   since 11/21/2006 20:24:58
21  on-line   since 11/21/2006 20:24:58
22  on-line   since 11/21/2006 20:24:58
23  on-line   since 11/21/2006 20:24:58
24  on-line   since 11/21/2006 20:24:58
25  on-line   since 11/21/2006 20:24:58
26  on-line   since 11/21/2006 20:24:58
27  on-line   since 11/21/2006 20:24:58
28  on-line   since 11/21/2006 20:24:58
29  on-line   since 11/21/2006 20:24:58
30  on-line   since 11/21/2006 20:24:58
31  on-line   since 11/21/2006 20:24:58
Let’s do a quick network performance test with iperf to see what sort of throughput we can get when all processing units are able to process network IO:
bash-3.00# ./iperf --client np1unx0006 --time 60 --dualtest
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to np1unx0006, TCP port 5001
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
[ 5] local 192.168.105.62 port 37438 connected with 192.168.105.59 port 5001
[ 4] local 192.168.105.62 port 5001 connected with 192.168.105.59 port 63459
[ 5] 0.0-60.0 sec  3.77 GBytes  540 Mbits/sec
[ 4] 0.0-60.0 sec  3.62 GBytes  518 Mbits/sec
At the same time, let’s have a look with mpstat to get an idea of what the processors are dealing with while this is going on.
The important column here is intr, which shows the number of interrupts each CPU is handling. We also need to keep an eye on the number of system calls each CPU is fielding (syscl), and on the context switches and involuntary context switches (csw and icsw respectively), to make sure jobs are completing before the scheduler kicks them off the CPU.
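For the record, the mpstat output below comes from simply watching mpstat on an interval while the test runs – something like this, which prints per-CPU statistics every five seconds (the exact interval is up to you):

bash-3.00# mpstat 5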
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 96 0 248 3481 0 7126 4 34 439 0 6041 2 25 0 74
1 122 0 171 1332 0 2796 2 24 340 0 2369 2 14 0 85
2 79 0 216 646 0 1472 0 18 226 0 202 0 5 0 95
3 30 0 143 356 0 829 0 16 137 0 23 0 2 0 98
4 47 0 260 618 0 1514 0 18 163 0 74 0 3 0 97
5 56 0 257 714 0 1662 1 19 234 0 311 1 6 0 94
6 67 0 466 1085 0 2593 1 19 588 0 1234 0 17 0 82
7 26 0 268 894 0 2031 0 18 202 0 136 0 4 0 96
8 241 0 341 993 0 2286 0 22 258 0 358 1 7 0 91
9 190 0 292 1431 0 3102 1 21 257 0 1551 1 9 0 90
10 114 0 336 1155 0 2580 0 18 286 0 429 0 6 0 94
11 28 0 283 837 0 1883 1 18 551 0 1283 1 15 0 84
12 0 0 1 2 0 3 0 1 0 0 0 0 0 0 100
13 0 0 1 3 0 4 0 1 1 0 1 0 0 0 100
14 3 0 2 5 0 9 0 1 4 0 3 0 0 0 100
15 0 0 0 9 0 0 8 0 4 0 534955 75 25 0 0
16 64 0 423 1299 110 2418 0 18 286 0 59 0 5 0 95
17 89 0 454 1473 0 3233 0 19 319 0 793 1 7 0 92
18 46 0 397 960 1 2217 0 18 290 0 39 0 4 0 96
19 79 0 321 1048 2 2340 2 19 494 0 2073 2 15 0 83
20 79 0 205 852 1 1773 1 21 313 0 1493 1 14 0 85
21 27 0 19965 41259 41036 635 15 28 2862 0 415 0 47 0 53
22 65 0 129 1069 0 2274 1 21 139 0 1053 1 7 0 92
23 62 0 134 681 0 1446 1 20 370 0 931 1 14 0 85
24 115 0 260 799 0 1986 0 22 212 0 313 0 4 0 95
25 113 0 273 962 1 2225 1 22 266 0 684 1 7 0 93
26 73 0 312 1241 0 2862 0 23 271 0 663 0 6 0 94
27 115 0 270 862 0 2017 0 22 201 0 209 1 5 0 95
28 179 0 225 689 0 1548 0 17 213 0 302 1 5 0 94
29 42 0 224 656 0 1507 0 15 163 0 134 0 3 0 97
30 40 0 298 774 0 1821 1 14 459 0 1316 1 17 0 83
31 27 0 227 649 0 1544 1 15 644 0 1418 1 18 0 82
From this we can see we’re getting fairly decent throughput over GigE, and that interrupt and system call activity is spread across all the CPUs – although one CPU (21 in this run) is clearly fielding the bulk of the device interrupts.
Now let’s create a processor set, and stick half our CPUs in it.
The command is psrset with the -c option to create a set. As this is the first processor set it will be given the ID 1 – the next would be 2, and so on.
Remember we can get the IDs of our CPUs from the psrinfo command.
bash-3.00# psrset -c 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
created processor set 1
processor 0: was not assigned, now 1
processor 1: was not assigned, now 1
processor 2: was not assigned, now 1
processor 3: was not assigned, now 1
processor 4: was not assigned, now 1
processor 5: was not assigned, now 1
processor 6: was not assigned, now 1
processor 7: was not assigned, now 1
processor 8: was not assigned, now 1
processor 9: was not assigned, now 1
processor 10: was not assigned, now 1
processor 11: was not assigned, now 1
processor 12: was not assigned, now 1
processor 13: was not assigned, now 1
processor 14: was not assigned, now 1
processor 15: was not assigned, now 1
Now that we’ve assigned half our CPUs to processor set 1, we want to disable interrupt handling for them. We could use the psradm command to do it on a per-CPU basis, but it’s much easier to just apply the setting to the entire processor set.
bash-3.00# psrset -f 1
The -f option disables interrupt handling, and the 1 is the processor set we want to apply this to.
We can check the effect by calling psrinfo again:
bash-3.00# psrinfo
0   no-intr   since 12/19/2006 18:15:15
1   no-intr   since 12/19/2006 18:15:15
2   no-intr   since 12/19/2006 18:15:15
3   no-intr   since 12/19/2006 18:15:15
4   no-intr   since 12/19/2006 18:15:15
5   no-intr   since 12/19/2006 18:15:15
6   no-intr   since 12/19/2006 18:15:15
7   no-intr   since 12/19/2006 18:15:15
8   no-intr   since 12/19/2006 18:15:15
9   no-intr   since 12/19/2006 18:15:15
10  no-intr   since 12/19/2006 18:15:15
11  no-intr   since 12/19/2006 18:15:15
12  no-intr   since 12/19/2006 18:15:15
13  no-intr   since 12/19/2006 18:15:15
14  no-intr   since 12/19/2006 18:15:15
15  no-intr   since 12/19/2006 18:15:15
16  on-line   since 11/21/2006 20:24:58
17  on-line   since 11/21/2006 20:24:58
18  on-line   since 11/21/2006 20:24:58
19  on-line   since 11/21/2006 20:24:58
20  on-line   since 11/21/2006 20:24:58
21  on-line   since 11/21/2006 20:24:58
22  on-line   since 11/21/2006 20:24:58
23  on-line   since 11/21/2006 20:24:58
24  on-line   since 11/21/2006 20:24:58
25  on-line   since 11/21/2006 20:24:58
26  on-line   since 11/21/2006 20:24:58
27  on-line   since 11/21/2006 20:24:58
28  on-line   since 11/21/2006 20:24:58
29  on-line   since 11/21/2006 20:24:58
30  on-line   since 11/21/2006 20:24:58
31  on-line   since 11/21/2006 20:24:58
Rock on! psrinfo clearly shows that half our CPUs will no longer handle interrupts. Let’s kick off another iperf throughput test and see what happens:
bash-3.00# ./iperf --client np1unx0006 --time 60 --dualtest
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to np1unx0006, TCP port 5001
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.105.62 port 37419 connected with 192.168.105.59 port 5001
[ 5] local 192.168.105.62 port 5001 connected with 192.168.105.59 port 63457
[ 4] 0.0-60.0 sec  3.36 GBytes  481 Mbits/sec
[ 5] 0.0-60.0 sec  3.05 GBytes  436 Mbits/sec
Looking at mpstat we can clearly see the effects:
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
2 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
3 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
4 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
5 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
6 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
7 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
8 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
9 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
10 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
11 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
12 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
13 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
14 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
15 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
16 336 0 276 1854 112 3372 12 49 844 0 135008 24 37 0 39
17 261 0 115 2236 1 4594 6 51 1050 0 27470 7 37 0 56
18 104 0 105 1699 3 3506 9 36 1071 0 103500 17 38 0 44
19 106 0 148 855 1 1800 2 28 286 0 10881 2 8 0 89
20 290 0 19102 42492 42200 838 28 42 2676 0 74629 17 60 0 23
21 256 0 801 1952 0 4397 5 39 1272 0 2196 2 29 0 68
22 209 0 475 1191 0 2663 2 38 552 0 776 1 12 0 87
23 260 0 500 1134 4 2540 2 38 597 0 13071 4 13 0 84
24 455 0 752 2213 1 5038 4 41 916 0 10316 4 20 0 77
25 500 0 803 2485 0 5499 4 45 1352 0 17171 5 31 0 64
26 654 0 683 1773 0 4009 5 45 933 0 2119 8 19 0 73
27 503 0 516 1812 0 3952 5 45 748 0 21682 6 16 0 79
28 552 0 860 2332 0 5093 7 40 1065 0 12217 16 21 0 63
29 480 0 688 2292 0 4996 4 47 924 0 1395 3 17 0 80
30 663 0 476 1553 0 3357 5 45 658 0 2680 9 16 0 75
31 485 0 445 1520 0 3297 4 47 716 0 1167 2 16 0 82
We can see the non-interrupt-handling CPUs in processor set 1 are totally idle – they’re just sitting there, twiddling their thumbs, and laughing at the other 16 CPUs working their socks off. That’s really the point of a processor set: only processes explicitly bound to the set will run on its CPUs, and now they won’t field interrupts either.
Involuntary context switches aren’t causing us an issue either, so even with a reduced number of CPUs handling the interrupts, they’re still managing to deal with the load.
Now let’s see what happens when we execute the single-threaded iperf process inside processor set 1. We can control this by using the psrset command to launch our app.
bash-3.00# psrset -e 1 ./iperf --client np1unx0006 --time 60 --dualtest
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to np1unx0006, TCP port 5001
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.105.62 port 37419 connected with 192.168.105.59 port 5001
[ 5] local 192.168.105.62 port 5001 connected with 192.168.105.59 port 63457
[ 4] 0.0-60.0 sec  3.36 GBytes  481 Mbits/sec
[ 5] 0.0-60.0 sec  3.05 GBytes  436 Mbits/sec
And mpstat should give us an idea of what’s happening:
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 0 7751 0 16048 3 0 179 0 15235 3 42 0 55
1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
2 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
3 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
4 0 0 0 201 0 403 6 0 2478 0 8254 3 81 0 16
5 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
6 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
7 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
8 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
9 0 0 1 3 0 4 0 0 2 0 0 0 0 0 100
10 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
11 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
12 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
13 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
14 0 0 0 1 0 0 0 0 0 0 0 0 0 0 100
15 0 0 0 8 0 0 7 0 4 0 532794 75 25 0 0
16 313 0 942 1346 114 2533 1 20 459 0 419 1 13 0 86
17 306 0 450 687 1 1529 0 13 297 0 558 1 13 0 86
18 399 0 467 653 3 1442 1 14 255 0 3559 13 13 0 74
19 221 0 326 509 1 1164 0 11 196 0 401 1 7 0 92
20 646 0 10825 47201 47171 90 7 18 2405 0 1210 4 55 0 40
21 156 0 673 757 0 1769 0 15 346 0 201 1 9 0 91
22 220 0 959 1055 0 2467 0 15 488 0 397 2 12 0 86
23 204 0 844 791 2 1844 0 14 443 0 223 1 12 0 88
24 341 0 718 1033 1 2439 1 13 401 0 2571 10 14 0 76
25 205 0 570 804 0 1945 0 13 314 0 376 1 9 0 90
26 262 0 369 584 0 1379 0 12 204 0 422 1 7 0 92
27 199 0 348 519 0 1226 0 13 180 0 533 2 6 0 92
28 356 0 434 726 0 1730 0 17 247 0 515 2 9 0 89
29 393 0 267 428 0 1043 1 18 197 0 999 3 11 0 86
30 367 0 491 812 0 1829 0 17 298 0 449 1 10 0 89
31 302 0 424 675 0 1538 0 15 245 0 339 1 8 0 91
Well, that’s broken things. How come the processors in the set are now handling interrupts?
It looks like executing the binary inside the processor set still generates interrupts – but these are unlikely to be network I/O: ithr is still zero on the CPUs in the set, so they’re not running device interrupt threads, and the bulk of the device interrupts are still landing outside the set (on CPU 20 in this run). Check out the number of syscalls being generated! It’s likely an artefact of my poor choice of application – iperf generates a huge number of interrupts and can really cane your ethernet interfaces.
We could use dtrace to have a real poke around, but I think that should be the topic for another day.
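If you do fancy a quick poke before then, something along these lines should get you started – a per-CPU count of system calls from dtrace, plus intrstat to watch where the device interrupts are actually landing. Treat these as a sketch rather than a proper investigation:

bash-3.00# dtrace -n 'syscall:::entry { @[cpu] = count(); }'
bash-3.00# intrstat 5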
Now we’ve finished playing around, we need to re-enable interrupt handling on those CPUs. As the -f flag to psrset disabled interrupt handling, -n is the option we need to re-enable it on a processor set.
bash-3.00# psrset -n 1
Now that the CPUs are handling interrupts again, we need to delete the processor set. We do this by passing the psrset command the -d option, along with the processor set number:
bash-3.00# psrset -d 1
removed processor set 1
Finally let’s run psrinfo and double check the state of our CPUs:
bash-3.00# psrinfo
0   on-line   since 12/19/2006 18:23:42
1   on-line   since 12/19/2006 18:23:42
2   on-line   since 12/19/2006 18:23:42
3   on-line   since 12/19/2006 18:23:42
4   on-line   since 12/19/2006 18:23:42
5   on-line   since 12/19/2006 18:23:42
6   on-line   since 12/19/2006 18:23:42
7   on-line   since 12/19/2006 18:23:42
8   on-line   since 12/19/2006 18:23:42
9   on-line   since 12/19/2006 18:23:42
10  on-line   since 12/19/2006 18:23:42
11  on-line   since 12/19/2006 18:23:42
12  on-line   since 12/19/2006 18:23:42
13  on-line   since 12/19/2006 18:23:42
14  on-line   since 12/19/2006 18:23:42
15  on-line   since 12/19/2006 18:23:42
16  on-line   since 11/21/2006 20:24:58
17  on-line   since 11/21/2006 20:24:58
18  on-line   since 11/21/2006 20:24:58
19  on-line   since 11/21/2006 20:24:58
20  on-line   since 11/21/2006 20:24:58
21  on-line   since 11/21/2006 20:24:58
22  on-line   since 11/21/2006 20:24:58
23  on-line   since 11/21/2006 20:24:58
24  on-line   since 11/21/2006 20:24:58
25  on-line   since 11/21/2006 20:24:58
26  on-line   since 11/21/2006 20:24:58
27  on-line   since 11/21/2006 20:24:58
28  on-line   since 11/21/2006 20:24:58
29  on-line   since 11/21/2006 20:24:58
30  on-line   since 11/21/2006 20:24:58
31  on-line   since 11/21/2006 20:24:58
Solaris processor sets are the easiest to use of all the resource controls built into the OS. We can peg things like zones, individual applications, or even specific processes to their own processor sets to control and manage resource usage. This gives us some really fine-grained control over how the system is used, and with a machine like the T2000 it allows us to really scale performance.
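As a final example, if you already had a long-running process you wanted to peg to the set we built above, psrset can bind it (and later query it) by process ID – the PID here is purely for illustration:

bash-3.00# psrset -b 1 12345
bash-3.00# psrset -q 12345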