1. 03 Aug, 2017 2 commits
  2. 26 Jul, 2017 1 commit
  3. 11 Jul, 2017 2 commits
    • Teach irqbalance about Intel CoD. · 7bc1244f
      Krister Johansen authored
      This originally surfaced as a bug in placing network interrupts.  In
      the case that this submitter observed, the NIC was in NUMA domain
      0, but each RSS interrupt was getting an affinity list for all CPUs in
      the domain.  The expected behavior is for a single CPU to be chosen when
      attempting to fan out NIC interrupts.  Due to other implementation
      details of interrupt placement, this effectively caused all interrupt
      mappings for this NIC to end up on CPU 0.
      
      The bug turns out to have been caused by Intel Cluster on Die breaking
      an assumption in irqbalance about the design of the component hierarchy.
      The CoD topology allows a CPU package to belong to more than one NUMA
      node, which is not expected.
      
      The RCA was that when the second NUMA node was wired up to the existing
      physical package, it overwrote the mappings that were placed there by
      the first.
      
      This patch attempts to solve that problem by permitting a package to
      have multiple NUMA nodes.  The CPU component hierarchy is preserved, in
      case other parts of the code depend upon walking it.  When a CoD
      topology is detected, the NUMA node -> CPU component mapping is moved
      down a level, so that the nodes point to the first level where the
      affinity becomes distinct.  In practice, this has been observed to be
      the LLC.
      
      A quick illustration (now, with CoD, it looks like this):
      
                       +-----------+
                       | NUMA Node |
                       |     0     |
                       +-----------+
                             |
                             |        +-------+
                            \|/     / | CPU 0 |
                         +---------+  +-------+
                         | Cache 0 |
                         +---------+  +-------+
                         /          \ | CPU 1 |
            +-----------+             +-------+
            | Package 0 |
            +-----------+             +-------+
                        \           / | CPU 2 |
                         +---------+  +-------+
                         | Cache 1 |
                         +---------+
                             ^      \ +-------+
                             |        | CPU 3 |
                             |        +-------+
                       +-----------+
                       | NUMA Node |
                       |     1     |
                       +-----------+
      
      Whereas previously, only NUMA Node 1 would end up pointing to package 0.
      The topology should not be different on platforms that do not enable
      CoD.
      Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
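
      As a rough illustration of the data-structure change (the struct and
      function names below are hypothetical, not irqbalance's actual ones), a
      package object can carry a list of NUMA nodes instead of a single
      back-pointer, so wiring up the second node no longer overwrites the first:

      #include <stdio.h>

      #define MAX_NODES_PER_PACKAGE 8

      /* Hypothetical package object: remembers every NUMA node wired to it. */
      struct package {
          int id;
          int numa_nodes[MAX_NODES_PER_PACKAGE];
          int nr_numa_nodes;
      };

      /* Attach a NUMA node to a package without clobbering earlier mappings. */
      static int package_add_numa_node(struct package *pkg, int node)
      {
          for (int i = 0; i < pkg->nr_numa_nodes; i++)
              if (pkg->numa_nodes[i] == node)
                  return 0;                 /* already present */
          if (pkg->nr_numa_nodes >= MAX_NODES_PER_PACKAGE)
              return -1;                    /* no room */
          pkg->numa_nodes[pkg->nr_numa_nodes++] = node;
          return 0;
      }

      int main(void)
      {
          struct package pkg0 = { .id = 0 };

          /* With Cluster on Die, nodes 0 and 1 both map to package 0. */
          package_add_numa_node(&pkg0, 0);
          package_add_numa_node(&pkg0, 1);

          for (int i = 0; i < pkg0.nr_numa_nodes; i++)
              printf("package %d <-> NUMA node %d\n", pkg0.id, pkg0.numa_nodes[i]);
          return 0;
      }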
    • Oneshot mode doesn't exit. · 9ea96c1f
      Krister Johansen authored
      Have the oneshot mode code call event loop exit routine, which causes
      irqbalance to correctly quit after one iteration.
      Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
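
      A minimal sketch of the fix's effect, assuming a glib main loop like the
      one irqbalance runs on (the option flag and callback names here are made
      up for the example): in one-shot mode the periodic callback quits the
      loop after its first pass instead of re-arming.

      #include <glib.h>

      static gboolean one_shot = TRUE;  /* would come from the --oneshot option */

      static gboolean scan_callback(gpointer data)
      {
          GMainLoop *loop = data;

          /* ... do one balancing pass here ... */

          if (one_shot) {
              g_main_loop_quit(loop);   /* leave the event loop */
              return FALSE;             /* and do not reschedule this timer */
          }
          return TRUE;
      }

      int main(void)
      {
          GMainLoop *loop = g_main_loop_new(NULL, FALSE);

          g_timeout_add_seconds(10, scan_callback, loop);
          g_main_loop_run(loop);
          g_main_loop_unref(loop);
          return 0;
      }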
  4. 15 Jan, 2017 1 commit
    • Fix struct msghdr initialization · d00f237d
      Timo Teräs authored
      musl defines struct msghdr with padding fields to be strictly
      POSIX compliant. The current code gives following warnings:
      
      irqbalance.c: In function 'sock_handle':
      irqbalance.c:333:42: warning: initialization makes integer from pointer without a cast [-Wint-conversion]
        struct msghdr msg = { NULL, 0, &iov, 1, NULL, 0, 0 };
                                                ^~~~
      irqbalance.c:333:42: note: (near initialization for 'msg.__pad1')
      irqbalance.c:333:9: warning: missing initializer for field '__pad2' of 'struct msghdr' [-Wmissing-field-initializers]
        struct msghdr msg = { NULL, 0, &iov, 1, NULL, 0, 0 };
               ^~~~~~
      In file included from /usr/include/sys/socket.h:20:0,
                       from /usr/include/fortify/sys/socket.h:20,
                       from irqbalance.c:34:
      /usr/include/bits/socket.h:7:28: note: '__pad2' declared here
        socklen_t msg_controllen, __pad2;
                                  ^~~~~~
      
      Fix this by not relying on field ordering.  Alternatively,
      designated initializers could be used, but as they are not
      used elsewhere in the code, I used explicit assignments.
      Signed-off-by: Timo Teräs <timo.teras@iki.fi>
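
      The portable pattern the commit describes looks roughly like this (a
      sketch of the approach, not the exact irqbalance hunk): zero the struct
      and assign fields by name, so no positional initializer can land on a
      padding field.

      #include <string.h>
      #include <sys/socket.h>
      #include <sys/uio.h>

      static ssize_t recv_request(int sock, char *buf, size_t len)
      {
          struct iovec iov;
          struct msghdr msg;

          iov.iov_base = buf;
          iov.iov_len = len;

          memset(&msg, 0, sizeof(msg));   /* every field, padding included, is 0 */
          msg.msg_iov = &iov;             /* assign by name, not by position */
          msg.msg_iovlen = 1;

          return recvmsg(sock, &msg, 0);
      }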
  5. 03 Jan, 2017 1 commit
    • Add ability for socket communication · d1993bcd
      Veronika Kabatova authored
      This will be used with the user interface and can also serve as an API for
      users to create their own scripts on top of.  The socket communication
      can be used for receiving data about IRQ-to-CPU assignments and setup,
      as well as setting some options during runtime.
      
      Socket address: irqbalance<PID>.sock
      
      Data to send to socket:
      stats: get the assignment tree of CPUs and IRQs
      setup: get values of sleep interval, banned IRQs and banned CPUs
      settings sleep <int>: set new sleep interval value
      settings cpus <cpu_number1> <cpu_number2> ... : ban listed CPUs from
                                                      IRQ handling (old values
                                                      are forgotten, not added to)
      settings ban irqs <irq1> <irq2> ... : ban listed IRQs from balancing (old
                                            values are forgotten, not added to)
      Signed-off-by: Veronika Kabatova <vkabatov@redhat.com>
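
      A hedged client sketch in C, showing how a script-like tool might send
      one of the commands above and read the reply.  It assumes the socket
      lives in the abstract UNIX namespace; adjust the address setup if it is
      a plain filesystem path.

      #include <stddef.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/socket.h>
      #include <sys/types.h>
      #include <sys/un.h>
      #include <unistd.h>

      /* Send one command (e.g. "stats" or "setup") and read the reply. */
      static int query_irqbalance(pid_t pid, const char *cmd, char *reply, size_t len)
      {
          struct sockaddr_un addr;
          char name[64];
          ssize_t n;
          int fd;

          fd = socket(AF_UNIX, SOCK_STREAM, 0);
          if (fd < 0)
              return -1;

          snprintf(name, sizeof(name), "irqbalance%d.sock", (int)pid);

          memset(&addr, 0, sizeof(addr));
          addr.sun_family = AF_UNIX;
          addr.sun_path[0] = '\0';        /* abstract namespace (an assumption) */
          strncpy(addr.sun_path + 1, name, sizeof(addr.sun_path) - 2);

          if (connect(fd, (struct sockaddr *)&addr,
                      offsetof(struct sockaddr_un, sun_path) + 1 + strlen(name)) < 0)
              goto fail;
          if (write(fd, cmd, strlen(cmd)) < 0)
              goto fail;
          n = read(fd, reply, len - 1);
          if (n < 0)
              goto fail;
          reply[n] = '\0';
          close(fd);
          return 0;
      fail:
          close(fd);
          return -1;
      }

      int main(int argc, char **argv)
      {
          char reply[8192];

          if (argc < 3)
              return 1;                   /* usage: ./client <pid> <command> */
          if (query_irqbalance((pid_t)atoi(argv[1]), argv[2], reply, sizeof(reply)))
              return 1;
          printf("%s\n", reply);
          return 0;
      }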
  6. 26 Apr, 2016 1 commit
  7. 27 Jul, 2015 1 commit
  8. 29 Jun, 2015 1 commit
  9. 19 Mar, 2015 4 commits
  10. 12 Mar, 2015 1 commit
    • parse isolcpus= from /proc/cmdline to set up banned_cpus · ca5a3f13
      Rik van Riel authored
      When the user specifies a range of CPUs to be isolated from system
      tasks with isolcpus= on the kernel command line, it would be nice
      if those CPUs could automatically be excluded from getting interrupts
      routed to them, as well.
      
      This patch does that by looking at /proc/cmdline.
      
      The environment variable IRQBALANCE_BANNED_CPUS will override the
      automatically detected banned_cpus.
      Signed-off-by: Rik van Riel <riel@redhat.com>
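
      A rough sketch of the /proc/cmdline scan (simplified: the real parser
      also has to turn the CPU list into a mask and let IRQBALANCE_BANNED_CPUS
      take precedence; get_isolcpus() is a name made up for this example).

      #include <stdio.h>
      #include <string.h>

      /* Copy the isolcpus= value (e.g. "2,3,6-9") into buf; -1 if absent. */
      static int get_isolcpus(char *buf, size_t len)
      {
          char cmdline[4096];
          char *tok, *save;
          FILE *f;

          f = fopen("/proc/cmdline", "r");
          if (!f)
              return -1;
          if (!fgets(cmdline, sizeof(cmdline), f)) {
              fclose(f);
              return -1;
          }
          fclose(f);

          for (tok = strtok_r(cmdline, " \n", &save); tok;
               tok = strtok_r(NULL, " \n", &save)) {
              if (!strncmp(tok, "isolcpus=", 9)) {
                  snprintf(buf, len, "%s", tok + 9);
                  return 0;
              }
          }
          return -1;
      }

      int main(void)
      {
          char isolated[256];

          if (!get_isolcpus(isolated, sizeof(isolated)))
              printf("isolated cpus: %s\n", isolated);
          return 0;
      }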
  11. 03 Mar, 2015 5 commits
  12. 10 Dec, 2014 2 commits
  13. 08 Sep, 2014 1 commit
  14. 20 May, 2014 1 commit
    • track hint policy on a per-irq basis · b6da319b
      Neil Horman authored
      Currently the hintpolicy for irqbalance is a global setting, applied equally to
      all irqs.  That's undesirable, however, as different devices may want to follow
      different policies.  Track the hint policy in each irq_info struct instead.
      This still just follows the global policy, but paves the way for overriding it
      through the policyscript option.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
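
      Illustrative only (the enum values and field names below are guesses,
      not the exact irqbalance definitions): the policy becomes a member of
      the per-IRQ record, seeded from the old global so behavior is unchanged
      until something, such as a policyscript, overrides it.

      #include <stdio.h>

      enum hint_policy {
          HINT_POLICY_IGNORE,
          HINT_POLICY_SUBSET,
          HINT_POLICY_EXACT,
      };

      /* Sketch of a per-IRQ record; the real irq_info has many more fields. */
      struct irq_info_sketch {
          int irq;
          enum hint_policy hint_policy;   /* previously a single global setting */
      };

      static enum hint_policy global_hint_policy = HINT_POLICY_IGNORE;

      /* New IRQs inherit the global policy; it can later be overridden per IRQ. */
      static void init_irq(struct irq_info_sketch *info, int irq)
      {
          info->irq = irq;
          info->hint_policy = global_hint_policy;
      }

      int main(void)
      {
          struct irq_info_sketch nic_rx;

          init_irq(&nic_rx, 42);
          printf("irq %d uses hint policy %d\n", nic_rx.irq, nic_rx.hint_policy);
          return 0;
      }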
  15. 16 May, 2014 1 commit
    • irqbalance: separate command line banned irqs from listed banned irqs · 7f072d94
      Neil Horman authored
      irqbalance was using one list for tracking banned irqs, but the list was being
      used for disparate purposes in different places.  It was tracking command line
      banned irqs and irqs that were banned via banscript and policyscript.  The
      former needs to be remembered across db rebuilds, while the latter needs to be
      rebuilt every time.  This patch separates the two into two lists, so that we
      don't stop banning command-line-specified irqs after the first db rebuild.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
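
      A sketch of the split (variable names are illustrative; glib lists are
      used here because irqbalance links against glib): one list survives
      database rebuilds, the other is discarded and repopulated on every
      rebuild.

      #include <glib.h>

      static GList *cl_banned_irqs;   /* banned on the command line: kept forever */
      static GList *banned_irqs;      /* banned by banscript/policyscript: transient */

      static void add_cl_banned_irq(int irq)
      {
          cl_banned_irqs = g_list_append(cl_banned_irqs, GINT_TO_POINTER(irq));
      }

      static void rebuild_irq_db(void)
      {
          /* Only the script-derived bans are thrown away here; the command
           * line bans in cl_banned_irqs are left untouched. */
          g_list_free(banned_irqs);
          banned_irqs = NULL;

          /* ... rescan sysfs, re-run the scripts, repopulate banned_irqs ... */
      }

      int main(void)
      {
          add_cl_banned_irq(19);
          rebuild_irq_db();             /* irq 19 stays banned after the rebuild */
          return 0;
      }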
  16. 27 Jan, 2014 2 commits
    • Set default deepestcache to 2 · d9a2cf22
      Neil Horman authored
      Been meaning to do this for a while.  By default having a deepestcache value of
      ULONG_MAX causes irqbalance to always find the deepest cache level in a system,
      which on some systems causes it to think that all cpus share a cache and, as
      such, that no balancing is needed.  Rectify this so that the deepest cache
      level defaults to 2.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
    • Adjust default hintpolicy value · d9138c78
      Neil Horman authored
      affinity_hint values are something of a holdover from prior to when irqbalance
      was re-written.  Prior to the rewrite, irqbalance could not determine with
      great accuracy the device to which an msi(x) irq was associated, the node it
      was local to, etc., and so was not able to balance it well.  Kernel affinity_hint
      values were created to work around that by just telling user space where to put
      an interrupt.  However, since the rewrite, irqbalance is perfectly capable of
      parsing all information about an irq out of sysfs, and can make superior policy
      decisions about balancing based on user input rather than kernel suggestions.
      As such, allow users to honor affinity_hint, but ignore it by default.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
  17. 26 Sep, 2013 3 commits
  18. 06 Sep, 2013 1 commit
    • Fix infinite rescan loop when new non-PCI IRQs appear · 3adba266
      Shawn Bohrer authored
      As reported in bug:
      http://code.google.com/p/irqbalance/issues/detail?id=58
      
      If a new non-PCI based IRQ arrives between scans (one not found in
      sysfs) then we would add the new IRQ to a new_irqs_list and set
      need_rescan=1.  The rescan would clear the interrupts_db, add all sysfs
      based IRQs, and finally add _just_ the newly found IRQ to the
      interrupts_db, but not any of the original non-PCI IRQs.  This means on
      the next parse_proc_interrupts() scan we would find the original non-PCI
      IRQs, add them to the new_irqs_list, set need_rescan=1 and the cycle
      would repeat between the newly found IRQs and the originals.
      
      The new_irq_list concept was added in fd24d8f3 "Improve rescan
      ability for newly allocated interrupts" with what appears to be the
      intention of ensuring we balance any newly found IRQs.  This change
      reverts most of that commit but attempts to keep the intention intact by
      calling force_rebalance_irq() for each non-PCI based IRQ found.  It also
      still sets need_rescan=1 if this is not the first pass through to
      hopefully catch and correctly classify any sysfs based IRQs.
      Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
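
      A simplified, self-contained sketch of the fixed flow (the helper names
      follow the commit text, but every signature and type here is assumed):
      an IRQ with no sysfs backing is registered immediately and queued for
      rebalancing, rather than parked on a side list that the next rescan
      would forget.

      #include <stdio.h>

      struct irq_info_sketch {
          int irq;
          int force_rebalance;
      };

      static int need_rescan;

      /* Stub: register an IRQ that has no sysfs/PCI backing as "misc". */
      static struct irq_info_sketch *add_misc_irq(int irq)
      {
          static struct irq_info_sketch info;

          info.irq = irq;
          info.force_rebalance = 0;
          return &info;
      }

      /* Stub standing in for force_rebalance_irq(): mark it for this cycle. */
      static void force_rebalance_irq(struct irq_info_sketch *info)
      {
          info->force_rebalance = 1;
      }

      /* Called when /proc/interrupts shows an IRQ the database doesn't know. */
      static void handle_new_irq(int irq, int first_pass)
      {
          struct irq_info_sketch *info = add_misc_irq(irq);

          force_rebalance_irq(info);    /* balance it on this pass... */
          if (!first_pass)
              need_rescan = 1;          /* ...and let a rescan reclassify it */
      }

      int main(void)
      {
          handle_new_irq(9, 0);
          printf("need_rescan=%d\n", need_rescan);
          return 0;
      }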
  19. 15 Aug, 2013 1 commit
  20. 05 Aug, 2013 1 commit
    • Correctly get cache info to build the right CPU tree · d0517e91
      Junchang Wang authored
      Most recent CPUs have L3 caches. irqbalance, however, only goes as deep as
      L2 when building up the CPU tree. This makes irqbalance behave unexpectedly
      on all of the machines I can access (including i7-2600 and E7-8850 CPUs
      from Intel and the 6164HE CPU from AMD).
      
      We fix this by (1) allowing irqbalance to search for the deepest available
      cache, and (2) adding a command line option, deepestcache, that prevents
      irqbalance from partitioning cache domains deeper than that level. The
      default value of 'deepestcache' is INT_MAX.
      Signed-off-by: Junchang Wang <junchang.wang@gmail.com>
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
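
      A sketch of the search described above (the sysfs paths follow the
      standard cpuN/cache/indexM layout; the variable mirrors the new
      deepestcache option, but find_deepest_cache() itself is made up):

      #include <limits.h>
      #include <stdio.h>

      static int deepest_cache = INT_MAX;   /* --deepestcache, per the commit */

      /* Return the deepest cache level for one CPU, capped at deepest_cache. */
      static int find_deepest_cache(int cpu)
      {
          int deepest = 0;

          for (int index = 0; ; index++) {
              char path[256];
              FILE *f;
              int level;

              snprintf(path, sizeof(path),
                       "/sys/devices/system/cpu/cpu%d/cache/index%d/level",
                       cpu, index);
              f = fopen(path, "r");
              if (!f)
                  break;                    /* no more cache indexes */
              if (fscanf(f, "%d", &level) == 1 &&
                  level > deepest && level <= deepest_cache)
                  deepest = level;          /* honor the partitioning limit */
              fclose(f);
          }
          return deepest;
      }

      int main(void)
      {
          printf("cpu0 deepest usable cache level: %d\n", find_deepest_cache(0));
          return 0;
      }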
  21. 29 Jul, 2013 1 commit
  22. 13 May, 2013 1 commit
  23. 18 Feb, 2013 1 commit
    • Compute load in nanoseconds · 6e217da6
      Shawn Bohrer authored
      When computing the load_slice per irq we take the topology object load
      divided by the interrupt count for the object.  Both of these values are
      integer values, which means if the interrupt count is larger than the load
      we get a load_slice of 0.  It seems likely that on modern processors
      interrupt durations will be at least several nanoseconds long, so if we
      compute load in nanoseconds it should be >= the interrupt count.
      
      The load is recomputed every SLEEP_INTERVAL, which is currently 10s, making
      the maximum possible load 10e9, which easily fits in a uint64_t.
      
      Note: corrected error checking on sysconf usage
      Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
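
      A sketch of the unit change, including the sysconf error check the note
      mentions (it assumes the per-object load starts out in clock ticks read
      from /proc/stat; ticks_to_ns() and load_slice() are illustrative names):

      #include <stdint.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>

      static uint64_t ticks_to_ns(uint64_t ticks)
      {
          long hz = sysconf(_SC_CLK_TCK);

          if (hz <= 0) {                    /* corrected sysconf error check */
              perror("sysconf(_SC_CLK_TCK)");
              exit(EXIT_FAILURE);
          }
          return ticks * 1000000000ULL / (uint64_t)hz;
      }

      /* Load over the sleep interval, in ns, divided across an object's IRQs. */
      static uint64_t load_slice(uint64_t load_ticks, uint64_t irq_count)
      {
          return irq_count ? ticks_to_ns(load_ticks) / irq_count : 0;
      }

      int main(void)
      {
          /* e.g. 250 ticks of interrupt time spread over 1000 interrupts */
          printf("load_slice = %llu ns\n",
                 (unsigned long long)load_slice(250, 1000));
          return 0;
      }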
  24. 29 Jan, 2013 2 commits
  25. 12 Nov, 2012 2 commits