GPU Benchmarking

Within the PoA health check, the drill test runs standard benchmarks such as MLPerf to measure machine performance. The benchmark results quantify the machine's efficiency, chiefly as throughput and latency, and these measurements serve as an indicator of the machine's condition and operational reliability.

Workflow of the Drill Test

Sample drill test results on an NVIDIA A100:

MLPerf Results Summary:

Field	Value
SUT name	BERT SERVER
Scenario	Offline
Mode	PerformanceOnly
Samples per second	1532.17
Result	VALID
Min duration satisfied	Yes
Min queries satisfied	Yes
Early stopping satisfied	Yes
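
As a rough illustration of how a summary like this could feed a health verdict, the sketch below checks that the run is VALID, that every "satisfied" constraint is met, and that throughput clears a floor. The function name drill_passes and the throughput floor are hypothetical; this page does not state the criteria the PoA health check actually applies.

```python
# Summary fields copied from the table above.
SUMMARY = {
    "SUT name": "BERT SERVER",
    "Scenario": "Offline",
    "Mode": "PerformanceOnly",
    "Samples per second": "1532.17",
    "Result": "VALID",
    "Min duration satisfied": "Yes",
    "Min queries satisfied": "Yes",
    "Early stopping satisfied": "Yes",
}


def drill_passes(summary: dict, min_samples_per_second: float) -> bool:
    """Hypothetical verdict: valid run, all constraints met, throughput above a floor.

    min_samples_per_second is an illustrative threshold, not a value from the docs.
    """
    constraints_ok = all(
        value == "Yes"
        for key, value in summary.items()
        if key.endswith("satisfied")
    )
    throughput = float(summary["Samples per second"])
    return (
        summary["Result"] == "VALID"
        and constraints_ok
        and throughput >= min_samples_per_second
    )


if __name__ == "__main__":
    # 1000.0 is an arbitrary example floor chosen for this sketch.
    print(drill_passes(SUMMARY, min_samples_per_second=1000.0))
```

Keeping the floor as a parameter means the same check could be reused across GPU models, each with its own expected throughput.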

Additional Stats:

Metric	Value (ns)
Min latency	3,559,383,281
Max latency	1,292,280,950,807
Mean latency	788,846,755,872
50.00 percentile latency	840,201,049,914
90.00 percentile latency	1,234,598,190,171
95.00 percentile latency	1,268,998,116,410
97.00 percentile latency	1,280,065,956,777
99.00 percentile latency	1,289,280,826,440
99.90 percentile latency	1,292,043,266,934
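
The latencies above are reported in nanoseconds, which makes them hard to read at this scale. A minimal sketch of the conversion to seconds and minutes follows; the dictionary keys are shorthand introduced here, not MLPerf field names.

```python
# Latency values in nanoseconds, copied from the table above.
LATENCY_NS = {
    "min": 3_559_383_281,
    "max": 1_292_280_950_807,
    "mean": 788_846_755_872,
    "p50": 840_201_049_914,
    "p99": 1_289_280_826_440,
}


def ns_to_seconds(ns: int) -> float:
    """Convert a nanosecond count to seconds."""
    return ns / 1_000_000_000


LATENCY_S = {name: ns_to_seconds(ns) for name, ns in LATENCY_NS.items()}

if __name__ == "__main__":
    for name, seconds in LATENCY_S.items():
        print(f"{name:>4}: {seconds:10.2f} s ({seconds / 60:5.2f} min)")
```

Read this way, the maximum latency is about 1292 seconds, or roughly 21.5 minutes. In the Offline scenario with max_async_queries set to 1, the samples appear to be issued as one large query, so per-sample latencies likely span the whole run and the maximum approximates the total run time.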

Test Parameters Used:

Parameter	Value
samples_per_query	1,980,000
target_qps	3,000
target_latency (ns)	0
max_async_queries	1
min_duration (ms)	600,000
max_duration (ms)	0
min_query_count	1
max_query_count	0
qsl_rng_seed	13,281,865,557,512,327,830
sample_index_rng_seed	198,141,574,272,810,017
schedule_rng_seed	7,575,108,116,881,280,410
accuracy_log_rng_seed	0
accuracy_log_probability	0
accuracy_log_sampling_target	0
print_timestamps	0
performance_issue_unique	0
performance_issue_same	0
performance_issue_same_index	0
performance_sample_count	10,833
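
The parameters and results above can be cross-checked with simple arithmetic. The sketch below tests two relationships: that samples_per_query equals target_qps times min_duration with a 10% margin, and that samples_per_query divided by the reported throughput matches the maximum latency. The 10% margin is inferred from these numbers and is an assumption, not something this page states; the function names are introduced here for illustration.

```python
# Values copied from the tables above.
SAMPLES_PER_QUERY = 1_980_000
TARGET_QPS = 3_000
MIN_DURATION_MS = 600_000
SAMPLES_PER_SECOND = 1532.17
MAX_LATENCY_NS = 1_292_280_950_807

# Assumption: a 10% margin on top of target_qps * min_duration,
# expressed as an integer percentage to avoid floating-point rounding.
SLACK_PERCENT = 110


def expected_samples_per_query(target_qps: int, min_duration_ms: int,
                               slack_percent: int = SLACK_PERCENT) -> int:
    """Samples needed to fill min_duration at target_qps, plus the assumed margin."""
    # Divide by 1000 for ms -> s and by 100 for the percentage.
    return target_qps * min_duration_ms * slack_percent // (1000 * 100)


def expected_run_seconds(samples: int, samples_per_second: float) -> float:
    """Time to process `samples` at the measured throughput."""
    return samples / samples_per_second


derived_samples = expected_samples_per_query(TARGET_QPS, MIN_DURATION_MS)
derived_seconds = expected_run_seconds(SAMPLES_PER_QUERY, SAMPLES_PER_SECOND)
observed_seconds = MAX_LATENCY_NS / 1e9

if __name__ == "__main__":
    print(f"derived samples_per_query: {derived_samples}")
    print(f"derived run time:  {derived_seconds:.3f} s")
    print(f"observed max latency: {observed_seconds:.3f} s")
```

Both relationships hold for this run: the derived sample count equals the reported samples_per_query, and the derived run time agrees with the maximum latency to within the rounding of the reported throughput. A check of this kind can flag a results file whose fields do not belong to the same run.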