You are troubleshooting connectivity issues in your InfiniBand network. Which command should you use to test basic connectivity between InfiniBand nodes?
You're troubleshooting a Spectrum-X network and notice that the System Status LED on a switch has been blinking for more than 5 minutes. What is the most likely cause of this issue?
In which mode of the BlueField DPU does the Arm subsystem on the DPU control the NIC data path while still allowing access to the DPU OS from the host?
You are tasked with configuring multi-tenancy using partition keys (PKeys) for a high-performance storage fabric running on InfiniBand. Each tenant's GPU server must be able to access the shared storage system but must not be able to communicate with any other tenant's GPU server.
Which of the following partition key membership configurations would you implement to set up multi-tenancy in this environment?
Which of the following statements are true about AI workloads and adaptive routing?
Pick the 2 correct responses below.
You are concerned about potential security threats and unexpected downtime in your InfiniBand data center.
Which UFM platform uses analytics to detect security threats, operational issues, and predict network failures in InfiniBand data centers?
You are deploying a Kubernetes cluster for AI workloads using NVIDIA Spectrum-X switches. You need to automate the deployment and management of networking components in this environment.
Which NVIDIA tool is specifically designed to automate the deployment and management of networking components in a Kubernetes cluster with Spectrum-X switches?
You are automating the deployment of a Spectrum-X network using Ansible. You need to ensure that the playbooks can handle different switch models and configurations efficiently.
Which feature of the NVIDIA NVUE Collection helps simplify the automation by providing pre-built roles for common network configurations?
A leading AI research center is upgrading its infrastructure to support large language model projects. The team is debating whether to implement a dedicated storage fabric for their AI workloads.
Which of the following statements explain why a dedicated storage fabric is crucial for this AI network architecture?
Pick the 2 correct responses below.
In an AI cluster using NVIDIA GPUs, which configuration parameter in the NicClusterPolicy custom resource is crucial for enabling high-speed GPU-to-GPU communication across nodes?
You are investigating a performance issue in a Spectrum-X network and suspect there might be congestion problems.
Which component executes the congestion control algorithm in a Spectrum-X environment?
A major cloud provider is designing a new data center to support large-scale AI workloads, particularly for training large language models. They want to optimize their network architecture for maximum performance and efficiency.
Why is a rail-optimized topology considered a best practice for AI network architecture in this scenario?
Which of the following tools in Cumulus Linux are specifically useful for detecting microbursts and distinguishing them from regular network congestion?
Pick the 2 correct responses below.
You are designing a new AI data center for a research institution that requires high-performance computing for large-scale deep learning models. The institution wants to leverage NVIDIA's reference architectures for optimal performance.
Which NVIDIA reference architecture would be most suitable for this high-performance AI research environment?