Inspur PCIe 10GbE NIC PXE Boot Failure

WHAT

On the Inspur SA5112M4, a 40G XL710 NIC installed in a PCIe expansion slot fails to PXE boot at power-on, with the following error:

PXE-PC8: !PXE structure was not found in UNDI driver code segment.


WHY

The Intel® Boot Agent Application Notes for BIOS Engineers (PDF) explains the PXE-PC8 error code as follows:


The Boot Agent could not locate the needed !PXE structure resource. When PXE tries to boot the system, it copies the data from PMM memory to conventional memory (below 640 KB). Then it tries to find the !PXE data structure that is embedded in the UNDI driver code segment. This error implies that the BIOS is either corrupting the PMM memory area after PXE saves itself in conventional memory, or the memory allocated by PMM is not writeable and therefore the PXE image doesn’t get saved correctly. This can also be caused by the memory BAR being set above the 4G boundary.
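
The last sentence of the note above points at memory BARs placed above the 4G boundary. If the same machine can boot Linux from local disk, one rough way to see where each PCI device's memory regions ended up is to read the `resource` file under `/sys/bus/pci/devices/<BDF>/`, where each line holds a start address, an end address, and flags in hex. The Python sketch below is an illustration only and not part of the Intel note. The function names `bars_above_4g` and `scan` are made up here, and the sketch assumes the kernel has not reassigned the regions after the BIOS set them up, which is not guaranteed.

```python
from pathlib import Path

FOUR_GIB = 1 << 32
# Flag bit the Linux kernel sets on memory (as opposed to I/O port) resources.
IORESOURCE_MEM = 0x00000200


def bars_above_4g(resource_text):
    """Return (index, start, end) for memory regions starting at or above 4 GiB.

    resource_text is the content of a sysfs PCI `resource` file:
    one "start end flags" triple of hex numbers per line.
    """
    hits = []
    for index, line in enumerate(resource_text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue  # not a resource line
        start, end, flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0:
            continue  # unused slot
        if flags & IORESOURCE_MEM and start >= FOUR_GIB:
            hits.append((index, start, end))
    return hits


def scan(sysfs_root="/sys/bus/pci/devices"):
    """Map each PCI address (BDF) to its memory regions above 4 GiB."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}  # not Linux, or sysfs is not mounted
    report = {}
    for dev in sorted(root.iterdir()):
        try:
            hits = bars_above_4g((dev / "resource").read_text())
        except OSError:
            continue  # device vanished or file is unreadable
        if hits:
            report[dev.name] = hits
    return report


if __name__ == "__main__":
    for bdf, hits in scan().items():
        for index, start, end in hits:
            print(f"{bdf} region {index}: {start:#x}-{end:#x}")
```

If the XL710's PCI address shows up in this output while Above 4G Decoding is enabled, that would be consistent with the explanation above. It is a hint rather than proof, since the `resource` file also lists the expansion ROM and other non-BAR regions.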

CN105843698 - A method for automatically adjusting BIOS option values

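To improve server performance, we use high-performance components such as high-performance GPUs and NICs. However, when such a component has more than 4G of memory (for example, the NVIDIA K40 GPU), its OPROM is mapped into the memory space above 4G, that is, the High Memory region. The Above 4G Decoding option in the BIOS then has to be enabled so that the PCIe device can decode in the space above 4G. But Above 4G Decoding is disabled by default, so the option has to be turned on before the BIOS code runs; otherwise the BIOS repeatedly retrains the PCIe link, which shows up as a repeated reset. The alternative is to enter the BIOS setup page first, find the option, change and save it, and only then install the GPU or other PCIe device, which is time-consuming and laborious and does not conform to production-line operating procedures.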

https://wiki.archlinux.org/title/improving_performance#Enabling_PCI_Resizable_BAR

The PCI specification allows larger Base Address Registers to be used for exposing PCI device memory to the PCI Controller. This can result in a performance increase for video cards. Having access to the full vRAM improves performance, but also enables optimizations in the graphics driver. The combination of resizable BAR, above 4G decoding and these driver optimizations are what AMD calls AMD Smart Access Memory, currently available on AMD Series 500 chipset motherboards.

HOW

Disabling the "Above 4G Decoding" option under "PCI Subsystem Settings" in the BIOS is enough to fix the problem.


REFERENCE

PXE boot error on a 10GbE NIC in Legacy boot mode, 2019-01-03

Intel® Boot Agent Application Notes for BIOS Engineers (PDF)