PCI Express Switch (PLX)
The heart of the PLX chip is how it manages data between the CPU and the PCIe slots. It does this through multiplexing, or the art of dealing with multiple signals that all want to travel through one point. Some motherboards already use multiplexing in their power delivery.
So what does the PLX chip do on a motherboard? Our best reasoning is that it acts as a data multiplexer with a buffer that applies a first in, first out (FIFO) policy to data for the connected GPUs. Let us take the simplest case, where the PLX chip is feeding two GPUs, both at 'x16'. Each GPU is connected to the PLX chip by 16 lanes, and the PLX chip in turn is connected to the CPU by 16 lanes.
The PLX chip, in hardware, allows the CPU and memory to access the physical addresses of both GPUs. Suppose data is sent only to the first GPU, at the bandwidth of 16 lanes. The PLX chip recognizes this and diverts all of the data to the first GPU. When the CPU then sends data from memory to the second GPU, the PLX switches all the lanes over to work with the second GPU.
Now let us take the situation where data needs to be sent to both GPUs concurrently (that is, at the same time). The CPU can only send this data to the PLX at the bandwidth of 16 lanes, perhaps weighted towards the master/first GPU, divided equally, or split in whatever proportion the PLX signals to the CPU at the hardware level. The PLX chip will then divert the correct proportion of lanes to each GPU. If one GPU requires less bandwidth, then more lanes are diverted to the other GPU.
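To make the proportional split concrete, here is a minimal Python sketch of the behavior as we understand it. It is our own illustrative model, not PLX's actual arbitration logic: the function name `allocate_lanes`, its parameters, and the idea of expressing each GPU's demand in "lane-equivalents" are all assumptions made for the example.

```python
def allocate_lanes(demands, upstream_lanes=16, port_lanes=16):
    """Split upstream bandwidth among GPUs in proportion to their demand.

    demands        -- bandwidth each GPU wants, in lane-equivalents (hypothetical unit)
    upstream_lanes -- lanes between the CPU and the switch
    port_lanes     -- lanes between the switch and each GPU

    Returns the effective lane-equivalents granted to each GPU.
    """
    # A GPU can never use more than its own x16 link can carry.
    capped = [min(max(d, 0), port_lanes) for d in demands]
    total = sum(capped)

    # If the CPU link is not oversubscribed, everyone gets what they asked for.
    if total <= upstream_lanes:
        return [float(c) for c in capped]

    # Otherwise scale every request down by the same factor (assumed policy).
    scale = upstream_lanes / total
    return [c * scale for c in capped]


print(allocate_lanes([16, 16]))  # both GPUs saturated -> [8.0, 8.0]
print(allocate_lanes([16, 0]))   # only the first GPU busy -> [16.0, 0.0]
```

The fractional results reflect effective bandwidth share rather than physical lanes; in this model the switch divides time on the upstream link rather than rewiring it.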
This ultimately means that in the two-card scenario, at peak throughput, we are still limited to x8/x8. However, when only one GPU needs the data, the PLX can assign all 16 lanes to that GPU. If the data is traveling upstream from the GPU to the CPU, the PLX can fill its buffer at full x16 speed from each GPU and, at the same time, send as much of that data as possible up to the CPU in a continuous stream at x16, rather than switching between the GPUs, which could add latency.
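The upstream case can be sketched as a small FIFO simulation. Again, this is a toy model under our own assumptions: time is divided into abstract "ticks", one lane moves one unit of data per tick, the buffer is unbounded, and the name `simulate_upstream` is hypothetical. The real buffer size and scheduling of the PLX chip are not something we can confirm.

```python
from collections import deque


def simulate_upstream(gpu_traffic, upstream_per_tick=16, port_per_tick=16):
    """Toy model of GPUs filling a FIFO buffer that drains to the CPU.

    gpu_traffic       -- list of ticks; each tick is a list of units sent per GPU
    upstream_per_tick -- units the CPU link can carry per tick (must be > 0)
    port_per_tick     -- units each GPU link can carry per tick

    Returns, for each tick, the list of (gpu_index, units) forwarded upstream.
    """
    fifo = deque()   # buffered (gpu_index, units) entries, oldest first
    delivered = []
    ticks = list(gpu_traffic)
    tick = 0

    # Keep running until all input is consumed and the buffer is empty.
    while tick < len(ticks) or fifo:
        # Each GPU writes into the buffer at up to its own link speed.
        if tick < len(ticks):
            for gpu, units in enumerate(ticks[tick]):
                units = min(units, port_per_tick)
                if units > 0:
                    fifo.append((gpu, units))

        # The upstream link drains the buffer in arrival order.
        budget = upstream_per_tick
        sent = []
        while fifo and budget > 0:
            gpu, units = fifo.popleft()
            take = min(units, budget)
            sent.append((gpu, take))
            budget -= take
            if units > take:
                # Put the unsent remainder back at the head of the queue.
                fifo.appendleft((gpu, units - take))

        delivered.append(sent)
        tick += 1

    return delivered


# Both GPUs burst 16 units in the same tick; the CPU link stays busy
# for two consecutive ticks while the buffer drains.
print(simulate_upstream([[16, 16]]))  # [[(0, 16)], [(1, 16)]]
```

In this model both GPUs finish their writes in a single tick, and the upstream link simply drains the buffer back to back, which is the continuous x16 stream described above.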
This is advantageous - without a PLX chip, the GPUs have a fixed lane count that is modified only by a simple switch when other cards are added. This means that in a normal x8/x8 setup, if data is needed by only one GPU, the bandwidth is still limited to those eight lanes at maximum.
With all this data movement (and bearing in mind that when data is going the other way, to memory, the PLX chip must have a buffer in order to prevent data loss), the PLX chip introduces latency to the process. This is a combination of the extra routing and the action of the PLX chip adjusting 'on-the-fly' as required. According to the PLX documentation, this latency is in the region of 100 nanoseconds and is combined with large packet memory.
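As a rough feel for when the extra latency is worth paying, the sketch below compares a single transfer over a fixed x8 link against a switched x16 link with 100 ns added. The per-lane figure of 0.985 bytes per nanosecond is our assumption (approximately a PCIe 3.0 lane after encoding overhead), as are the function names; packet and protocol overheads are ignored, so treat the result as an order-of-magnitude guide only.

```python
def transfer_time_ns(payload_bytes, lanes,
                     bytes_per_ns_per_lane=0.985, switch_latency_ns=0.0):
    """Time to move a payload over a link, plus any fixed switch latency.

    bytes_per_ns_per_lane is an assumed figure (roughly one PCIe 3.0 lane).
    """
    return switch_latency_ns + payload_bytes / (lanes * bytes_per_ns_per_lane)


def break_even_bytes(fixed_lanes=8, switched_lanes=16,
                     bytes_per_ns_per_lane=0.985, switch_latency_ns=100.0):
    """Payload size at which the switched link catches up with the fixed one.

    Solves size / (fixed * b) == latency + size / (switched * b) for size.
    """
    if switched_lanes <= fixed_lanes:
        raise ValueError("switched link must be wider than the fixed link")
    # Time saved per byte by using the wider link.
    gain_per_byte = (1.0 / (fixed_lanes * bytes_per_ns_per_lane)
                     - 1.0 / (switched_lanes * bytes_per_ns_per_lane))
    return switch_latency_ns / gain_per_byte


size = break_even_bytes()
print(f"break-even payload: {size:.0f} bytes")
print(f"fixed x8:     {transfer_time_ns(size, 8):.1f} ns")
print(f"switched x16: {transfer_time_ns(size, 16, switch_latency_ns=100.0):.1f} ns")
```

Under these assumptions the break-even point is around 1.6 KB: for a single transfer to one GPU, anything larger completes sooner through the switch at x16 despite the added latency, while smaller transfers would finish sooner on a direct x8 link.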