    Teach irqbalance about Intel CoD. · 7bc1244f
    Krister Johansen authored
    This originally surfaced as a bug in placing network interrupts.  In
    the case this submitter observed, the NIC was in NUMA domain 0, but
    each RSS interrupt was given an affinity list containing every CPU in
    that domain.  The expected behavior when fanning out NIC interrupts is
    for a single CPU to be chosen per interrupt.  Because of other details
    of how interrupts are placed, this effectively caused all of the
    interrupt mappings for this NIC to end up on CPU 0.

    The bug turns out to have been caused by Intel Cluster on Die (CoD)
    breaking an assumption irqbalance makes about the design of the
    component hierarchy.  The CoD topology allows a CPU package to belong
    to more than one NUMA node, which irqbalance did not expect.

    The root cause was that when the second NUMA node was wired up to the
    existing physical package, it overwrote the mappings that the first
    node had placed there.

    This patch attempts to solve that problem by permitting a package to
    have multiple NUMA nodes.  The CPU component hierarchy is preserved, in
    case other parts of the code depend upon walking it.  When a CoD
    topology is detected, the NUMA node -> CPU component mapping is moved
    down a level, so that the nodes point to the first level where the
    affinity becomes distinct.  In practice, this level has been observed
    to be the last-level cache (LLC).

    A quick illustration of the resulting topology with CoD enabled:

                     +-----------+
                     | NUMA Node |
                     |     0     |
                     +-----------+
                           |        +-------+
                          \|/     / | CPU 0 |
                       +---------+  +-------+
                       | Cache 0 |
                       +---------+  +-------+
                       /          \ | CPU 1 |
          +-----------+             +-------+
          | Package 0 |
          +-----------+             +-------+
                      \           / | CPU 2 |
                       +---------+  +-------+
                       | Cache 1 |
                       +---------+
                           ^      \ +-------+
                           |        | CPU 3 |
                           |        +-------+
                     +-----------+
                     | NUMA Node |
                     |     1     |
                     +-----------+

    Previously, only NUMA Node 1 would have ended up pointing to Package 0,
    because its mapping overwrote Node 0's.  The topology should not be
    different on platforms that do not enable CoD.

    Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>