NVIDIA Hopper H200 GPU Continues To Dominate In Latest MLPerf 4.0 Results: Up To 3x Gain In GenAI With TensorRT-LLM

27.03.2024 - 15:33 / wccftech.com / Hassan Mujtaba

NVIDIA continues to push the AI envelope with its strong TensorRT-LLM suite, boosting the H200 GPUs to new heights in the latest MLPerf v4.0 results.

Blackwell Is Here But NVIDIA Continues Pushing Hopper H100 & H200 AI GPUs With New TensorRT-LLM Optimizations For Up To 3x Gain In MLPerf v4.0

Generative AI, or GenAI, is an emerging market, and every hardware manufacturer is trying to grab a slice of the cake. Despite their best efforts, NVIDIA has so far taken the bulk of the share, and there's no stopping the green giant: it has posted strong benchmarks and set new records in the MLPerf v4.0 inference results.


Tuning of TensorRT-LLM has been ongoing ever since the AI software suite was released last year. We saw a major increase in performance with the previous MLPerf v3.1 results, and now, with MLPerf v4.0, NVIDIA is supercharging Hopper's performance. Inference matters because it accounts for 40% of the data center revenue (generated last year). Inference workloads include LLMs (Large Language Models), visual content, and recommenders. As these models grow in size, they become more complex and demand both strong hardware and strong software.

That's where TensorRT-LLM comes in: a state-of-the-art inference compiler that is co-designed with the NVIDIA GPU architectures. Some features of TensorRT-LLM include:

In-Flight Sequence Batching (Optimizes GPU Utilization)
KV Cache Management (Higher GPU Memory Utilization)
Generalized Attention (XQA Kernel)
Multi-GPU Multi-Node (Tensor & Pipeline Parallel)
FP8 Quantization (Higher Perf & Fit Larger Models)
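
To give a feel for why in-flight batching helps GPU utilization, here is a toy simulation. It is only a sketch under simplifying assumptions (one token per request per decode step, a fixed number of batch slots, no prefill cost) and is not how TensorRT-LLM itself is implemented. The function names and the request lengths are made up for illustration.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decode steps when each batch must run until its longest request ends."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps


def in_flight_batching_steps(lengths, batch_size):
    """Decode steps when a finished request's slot is refilled right away."""
    waiting = deque(lengths)  # output lengths of requests not yet started
    active = []               # tokens still to generate per running request
    steps = 0
    while waiting or active:
        # Fill any free slots before the next decode step.
        while waiting and len(active) < batch_size:
            active.append(waiting.popleft())
        # One decode step: every running request emits one token;
        # requests that reach zero remaining tokens leave the batch.
        active = [r - 1 for r in active if r - 1 > 0]
        steps += 1
    return steps


# Hypothetical workload: one long request and three short ones, two slots.
requests = [8, 2, 2, 2]
print(static_batching_steps(requests, 2))     # 10
print(in_flight_batching_steps(requests, 2))  # 8
```

In this made-up workload the short requests no longer wait for the long one to finish, so the same work completes in fewer decode steps.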

Using the latest TensorRT-LLM optimizations, NVIDIA has managed to squeeze a 2.9x performance gain out of its Hopper GPUs (such as the H100) in MLPerf v4.0 versus MLPerf v3.1. In today's benchmark results, NVIDIA has set new performance records in MLPerf Llama 2 (70 Billion) with up to 31,712 tokens generated per second on the H200 (Preview) and 21,806 tokens generated per second on the H100.

It should be mentioned that the H200 GPU was benchmarked about a month ago, which is why it's listed in the preview state, but NVIDIA has stated that it is now shipping these GPUs to customers.

The NVIDIA H200 GPU delivers roughly 45% more performance in Llama 2 than the H100 thanks to its larger memory configuration of 141 GB HBM3E and faster bandwidth of up to 4.8 TB/s. Meanwhile, the H200 is a behemoth against Intel's Gaudi 2, the only other competitor solution submitted within the MLPerf v4.0 benchmarks, while the H100 also comes in with a massive 2.7x gain.
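
The 45% figure can be checked against the Llama 2 70B throughput numbers quoted earlier in the article. The short snippet below is only a sanity check of that arithmetic; the variable names are ours, and the two inputs are the tokens-per-second results reported above.

```python
# Llama 2 70B throughput reported in the article (tokens per second).
h200_tokens_per_s = 31_712  # H200 (Preview)
h100_tokens_per_s = 21_806  # H100

# Ratio of the two results and the corresponding percentage gain.
speedup = h200_tokens_per_s / h100_tokens_per_s
gain_percent = (speedup - 1) * 100

print(f"H200 vs H100: {speedup:.2f}x ({gain_percent:.1f}% faster)")
```

The ratio works out to about 1.45x, in line with the 45% gain stated in the article.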

In addition to these,


See full article on wccftech.com
