OpenAI unveils MRC — open networking protocol for 100K+ GPU clusters
OpenAI, AMD, Broadcom, Intel, Microsoft, and NVIDIA released MRC (Multipath Reliable Connection), an open protocol that spreads packets across hundreds of paths to cut recovery time after network failures from milliseconds to microseconds.
• Scales to 100,000+ GPUs using only two-tier Ethernet switches
• Recovers from failures in microseconds instead of milliseconds
• Eliminates the need for custom, closed-source networking hardware
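
To make the multipath idea concrete, here is a minimal toy sketch in Python of spraying packets across many paths and steering around a failed one. It is an illustration only, not the MRC specification: the `PathSprayer` class, its method names, the 256-path count, and the uniform random path choice are all hypothetical assumptions made for this example.

```python
import random


class PathSprayer:
    """Toy model (hypothetical, not MRC itself): spread packets over
    many paths and stop using any path that has been marked as failed."""

    def __init__(self, num_paths, seed=0):
        # Every path starts out healthy; the seed keeps runs repeatable.
        self.healthy = list(range(num_paths))
        self.rng = random.Random(seed)

    def mark_failed(self, path_id):
        # Take a path out of rotation; later packets use the remaining ones.
        if path_id in self.healthy:
            self.healthy.remove(path_id)

    def pick_path(self):
        # Assumed policy: uniform random choice among healthy paths.
        if not self.healthy:
            raise RuntimeError("no healthy paths left")
        return self.rng.choice(self.healthy)

    def send(self, num_packets):
        # Return a dict mapping path id to the number of packets assigned.
        counts = {}
        for _ in range(num_packets):
            path = self.pick_path()
            counts[path] = counts.get(path, 0) + 1
        return counts


if __name__ == "__main__":
    sprayer = PathSprayer(num_paths=256)
    before = sprayer.send(10_000)
    sprayer.mark_failed(7)  # simulate a link or switch failure on path 7
    after = sprayer.send(10_000)
    print(len(before), "paths used before the failure")
    print(7 in after)  # path 7 carries no traffic once it is marked failed
```

Because each packet picks its path independently, losing one path only shifts a small share of traffic onto the others rather than stalling the whole connection. A real protocol also has to detect the failure and handle retransmission and out-of-order delivery, none of which this sketch models.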