NVIDIA Mellanox Unified Fabric Manager (UFM)

AI-powered Cyber Intelligence and Analytics Platforms

Overview

The NVIDIA® Mellanox® UFM® platforms revolutionize data center network management by combining enhanced real-time network telemetry with AI-powered cyber intelligence and analytics to support scale-out InfiniBand data centers.

UFM platforms empower research and industrial data center operators to efficiently provision, monitor, manage, troubleshoot and proactively maintain the modern data center fabric. This lets them achieve higher utilization of fabric resources and a competitive advantage while reducing OPEX. The feature set ranges from workload optimization and configuration checks to AI-based detection of network anomalies and predictive maintenance, which improve fabric performance. UFM platforms come in multiple solution tiers to meet the broadest range of modern scale-out data center requirements.

UFM Platforms Optimize Supercomputing OPEX

Key UFM Platform Highlights

NVIDIA MELLANOX UFM TELEMETRY

REAL-TIME MONITORING

Builds a rich database of real-time network telemetry, workloads, system configuration and more.
Platform Options: Software, Docker container or UFM Telemetry appliance

NVIDIA MELLANOX UFM ENTERPRISE

FABRIC VISIBILITY & CONTROL

Combines the benefits of UFM Telemetry with enhanced network monitoring & management.
Platform Options: Software, Docker container or UFM Enterprise appliance

NVIDIA MELLANOX UFM CYBER-AI

CYBER INTELLIGENCE & ANALYTICS

Extends the benefits of UFM Telemetry and UFM Enterprise with preventive maintenance at scale, lowering supercomputing OPEX.
Platform: requires a dedicated on-premise UFM Cyber-AI appliance

UFM TELEMETRY

REAL-TIME MONITORING

The UFM Telemetry platform provides network validation tools to monitor network performance and conditions. It captures rich real-time network telemetry, application workload usage, system configuration and more. It then streams this data to an on-premise or cloud-based database for further analysis.

Key Features:

  • Telemetry from switches, adapters and cables
  • System validation
  • Network performance tests
  • Streaming of telemetry information into on-premise or cloud-based database
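
The sketch below illustrates the kind of processing that streamed telemetry enables. It turns two samples of a port's transmit counter into an average throughput. It then appends the result to a local SQLite table, which stands in for the on-premise or cloud-based database. The sketch assumes the telemetry exposes the standard InfiniBand PortXmitData counter, which counts transmitted data in units of 4 octets. The `PortSample` record, the port label, and the table schema are hypothetical and are not part of any UFM interface.

```python
import sqlite3
from dataclasses import dataclass

# PortXmitData counts transmitted data in units of 4 octets (InfiniBand port counters).
OCTETS_PER_UNIT = 4


@dataclass
class PortSample:
    """One telemetry sample for a port (hypothetical record, not a UFM type)."""
    port: str
    timestamp_s: float
    xmit_data: int  # raw PortXmitData counter value


def throughput_gbps(prev: PortSample, curr: PortSample) -> float:
    """Average transmit throughput in Gb/s between two samples of the same port."""
    if prev.port != curr.port:
        raise ValueError("samples belong to different ports")
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta_units = curr.xmit_data - prev.xmit_data
    if delta_units < 0:
        raise ValueError("counter went backwards (reset or wrap); discard this pair")
    bits = delta_units * OCTETS_PER_UNIT * 8
    return bits / elapsed / 1e9


def store(conn: sqlite3.Connection, port: str, ts: float, gbps: float) -> None:
    """Append one derived metric to a table standing in for the telemetry database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS port_throughput (port TEXT, ts REAL, gbps REAL)"
    )
    conn.execute("INSERT INTO port_throughput VALUES (?, ?, ?)", (port, ts, gbps))
    conn.commit()


if __name__ == "__main__":
    # Hypothetical port label and counter values.
    a = PortSample("sw1/p1", 0.0, 0)
    b = PortSample("sw1/p1", 1.0, 2_500_000_000)
    rate = throughput_gbps(a, b)  # 2.5e9 units * 4 octets * 8 bits / 1 s = 80 Gb/s
    with sqlite3.connect(":memory:") as conn:
        store(conn, b.port, b.timestamp_s, rate)
        print(conn.execute("SELECT gbps FROM port_throughput").fetchone())
```

The sketch deliberately discards a pair when the counter goes backwards. A reset or wrap makes the delta meaningless, and correcting for a wrap would depend on whether the device reports 32-bit or 64-bit extended counters.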

UFM ENTERPRISE

FABRIC VISIBILITY & CONTROL

The mid-tier UFM Enterprise platform combines all the benefits of UFM Telemetry with enhanced network monitoring & management capabilities, workload optimizations and periodic configuration checks. It also performs automated network discovery and provisioning, traffic monitoring, and congestion discovery. UFM Enterprise enables job scheduler provisioning and integration with leading job schedulers, cloud and cluster managers, including Slurm and Platform LSF. UFM also enables network provisioning and integration with OpenStack, Azure Cloud and VMware.

Key Features:

  • UFM Telemetry inside
  • Automated network discovery and validation
  • Secure cable management
  • Congestion tracking to identify traffic bottlenecks
  • Problem identification and resolution
  • Global software updates
  • Job scheduler provisioning, integrated with Slurm and Platform LSF
  • Advanced reporting and comprehensive REST APIs
  • Rich web-based GUI
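
Congestion tracking of this kind can be approximated from port counters. The sketch below assumes access to the InfiniBand PortXmitWait counter. This counter advances for each tick in which a port had data to send but sent none. The sketch ranks ports by how fast that counter grew over a sampling interval. The input format, port names, and the `min_rate` threshold are hypothetical. This is a simple heuristic, not UFM's own congestion-detection logic.

```python
from typing import Dict, List, Tuple

# Hypothetical input: port name -> (PortXmitWait at start, PortXmitWait at end of interval).
CounterPair = Tuple[int, int]


def rank_bottlenecks(
    samples: Dict[str, CounterPair],
    interval_s: float,
    min_rate: float = 1_000.0,  # hypothetical threshold, in wait ticks per second
) -> List[Tuple[str, float]]:
    """Return (port, wait ticks per second) for ports at or above min_rate, worst first."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    flagged = []
    for port, (before, after) in samples.items():
        delta = after - before
        if delta < 0:
            continue  # counter was reset between samples; skip rather than guess
        rate = delta / interval_s
        if rate >= min_rate:
            flagged.append((port, rate))
    # Highest wait rate first: these are the most likely bottlenecks.
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged


if __name__ == "__main__":
    counters = {
        "leaf1/p3": (10_000, 610_000),
        "leaf2/p7": (5_000, 5_200),
        "spine1/p12": (0, 90_000),
    }
    for port, rate in rank_bottlenecks(counters, interval_s=10.0):
        print(f"{port}: {rate:.0f} wait ticks/s")
```

Tick length may differ between device types. The rates are therefore best compared across ports of the same kind rather than read as absolute wait times.
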

NVIDIA MELLANOX UFM CYBER-AI

CYBER INTELLIGENCE & ANALYTICS

The UFM Cyber-AI appliance extends the benefits of UFM Telemetry and UFM Enterprise with preventive maintenance at scale, lowering supercomputing OPEX.

Key Features:

  • UFM Telemetry and UFM Enterprise inside
  • Detects performance degradations
  • Detects usage profile changes over time
  • Detects abnormal cluster behavior
  • AI-powered correlation between seemingly unrelated phenomena
  • Alerts when preventive maintenance is needed
  • Continuous system data collection optimizes predictability

How UFM Cyber-AI Works

The Cyber-AI platform's advantages come from capturing rich telemetry over time and analyzing it with deep learning algorithms. Here’s how it works:

  • UFM learns the data center’s “heartbeat”: its operating modes, conditions, usage, and workload network signatures. It builds an enhanced database of telemetry information and discovers correlations between events.
  • UFM correlates changes in the heartbeat with indications of future performance degradation or abnormal use of the data center’s computing resources.
  • These changes and correlations trigger predictive analytics. They also raise alerts that flag abnormal system and application behavior, as well as potential system failures.
  • System administrators can then quickly detect and respond to potential security threats and efficiently address impending failures, saving OPEX and maintaining end-user SLAs.
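
UFM Cyber-AI builds its baseline with deep learning, which the sketch below does not attempt to reproduce. It substitutes a far simpler statistical technique: a rolling mean and standard deviation with a z-score threshold. The goal is only to illustrate learning a metric's normal “heartbeat” and flagging samples that deviate from it. The `HeartbeatBaseline` class, window size, threshold, and sample series are hypothetical.

```python
import math
from collections import deque
from typing import Deque, Iterable, List, Tuple


class HeartbeatBaseline:
    """Rolling-window baseline for one metric that flags samples far from recent history.

    A simple z-score detector for illustration only; not UFM's deep-learning model.
    """

    def __init__(self, window: int = 60, threshold: float = 4.0) -> None:
        if window < 2:
            raise ValueError("window must hold at least two samples")
        self.window: Deque[float] = deque(maxlen=window)
        self.threshold = threshold

    def update(self, value: float) -> bool:
        """Score value against the current baseline, then add it. True means anomalous."""
        anomalous = False
        if len(self.window) == self.window.maxlen:  # only score once the window is full
            mean = sum(self.window) / len(self.window)
            var = sum((x - mean) ** 2 for x in self.window) / (len(self.window) - 1)
            std = math.sqrt(var)
            if std > 0:
                anomalous = abs(value - mean) / std > self.threshold
            else:
                anomalous = value != mean  # flat history: any change is a deviation
        self.window.append(value)
        return anomalous


def find_anomalies(
    values: Iterable[float], window: int = 60, threshold: float = 4.0
) -> List[Tuple[int, float]]:
    """Return (index, value) for every sample the baseline flags."""
    baseline = HeartbeatBaseline(window, threshold)
    return [(i, v) for i, v in enumerate(values) if baseline.update(v)]


if __name__ == "__main__":
    # Hypothetical per-minute link utilization (%) with a periodic pattern and one spike.
    series = [50.0 + (i % 5) for i in range(120)]
    series[100] = 95.0
    print(find_anomalies(series, window=30))
```

Every sample, including flagged ones, is added to the window. A lasting shift in usage profile is therefore flagged at first and then gradually absorbed into the new baseline. Excluding flagged samples would keep the baseline stable, but it would keep raising alerts after a legitimate change.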

Integration with Existing Data Center Management Tools

UFM provides an open and extensible object model to describe data center infrastructure and conduct all relevant management actions. UFM’s API enables integration with leading job schedulers, cloud and cluster managers, including Slurm and Platform LSF. UFM also enables network provisioning and integration with OpenStack, Azure Cloud and VMware.

NVIDIA Mellanox Care – Monitoring & NOC Services

Regular performance analysis is essential to ensure that your Mellanox solution is aligned with your business objectives and the latest Mellanox technology. Our Monitoring and NOC Services continuously examine your solution for potential faults. They give you peace of mind by identifying and addressing issues before they become problems. The end result is increased ROI and lower system maintenance costs.

Monitoring and NOC Services
  • Remote NOC, network management, and monitoring services
  • Dedicated service engineer
  • Tier 1, 2, and 3 support
  • Ongoing fault and trouble management
  • Trouble reporting and management
  • Fault analysis and reporting
  • Performance monitoring – alarms and real-time alerts
  • Scalable, cost-effective service
