HPC Technical Lead

Clearance Level
None
Category
IT Infrastructure and Operations
Location
Remote, Working from the USA
Key Skills For Success

High Performance Computing (HPC)

Lustre

Portable Batch System (PBS)

Python Software Development

Red Hat Enterprise Linux (RHEL)

REQ#: RQ216274
Public Trust: None
Requisition Type: Regular
Your Impact

Own your opportunity to work alongside federal civilian agencies. Make an impact by providing services that help the government ensure the well-being and support of U.S. citizens.

Job Description

GDIT is looking for an HPC Technical Lead who will provide deep technical expertise and leadership for the WCOSS production environment, ensuring the stability, performance, and scalability of NOAA’s operational HPC systems. This role focuses on technical execution, troubleshooting, and guiding the team in implementing best practices for HPC operations.

Key Responsibilities

  • Technical Leadership
    • Serve as the primary technical authority for all HPC operational matters.
    • Lead root‑cause analysis and problem resolution for critical incidents across compute, storage, and interconnect components.
  • System Performance & Reliability
    • Monitor, analyze, and optimize performance across compute nodes, interconnects, and storage subsystems.
    • Conduct proactive health checks and performance tuning to ensure system readiness for 24×7 mission‑critical NOAA workloads.
  • Change & Configuration Management
    • Lead technical planning and readiness reviews for all system upgrades, patches, and enhancements.
    • Maintain configuration baselines and ensure compliance with security requirements (e.g., RMF/STIG).
  • Collaboration & Customer Technical Interface
    • Act as the primary technical point of contact for NWS/NOAA for detailed HPC discussions, system behavior, and operational issues.
    • Coordinate with NOAA scientific teams to understand modeling workload needs and optimize HPC resources accordingly.

Required Skills

  • 10+ years of hands‑on HPC systems administration and troubleshooting experience, including Cray, SGI, or comparable large‑scale systems.
  • Extensive experience supporting Federal HPC environments, demonstrating readiness for NOAA/NWS operational environments.
  • Deep Linux expertise, including SLES, RHEL, and CentOS across multiple HPC platforms.
  • Strong technical experience with HPC storage (e.g., Lustre), interconnects (e.g., InfiniBand), and performance tuning of large‑scale computing systems.
  • Proven leadership of HPC technical teams, including mentoring and directing system administrators and engineers supporting very large core‑count systems.
  • Demonstrated success performing root cause analysis, escalated troubleshooting, and incident recovery in production HPC environments.
  • Experience implementing STIG/RMF security controls across HPC systems and applying DoW‑grade configuration compliance.
  • Excellent communication skills, capable of translating complex technical issues to customers and stakeholders.

Preferred Qualifications

  • Prior experience supporting NOAA/NWS operational HPC systems, especially in real‑time weather or climate modeling environments.
  • Experience designing or improving HPC system architectures, monitoring frameworks, or performance analysis pipelines.
  • Experience presenting technical findings to scientific, engineering, or federal customer groups.

Work Requirements
Years of Experience

10+ years of related experience

* may vary based on technical training, certification(s), or degree

Certification

CompTIA Security+ CE

Travel Required

Less than 10%

Citizenship

U.S. Citizenship Required

Salary and Benefit Information

The likely salary range for this position is $182,750 - $247,250. This is not, however, a guarantee of compensation or salary. Rather, salary will be set based on experience, geographic location and possibly contractual requirements and could fall outside of this range.
View information about benefits and our total rewards program.

About Our Work

We are GDIT, a global technology and professional services company that delivers technology and mission services to every major agency across the U.S. government, defense, and intelligence community. Our 30,000 experts extract the power of technology to create immediate value and deliver solutions at the edge of innovation. We operate in more than 50 countries worldwide, offering leading capabilities in digital modernization, AI/ML, cloud, cyber, and application development. Together with our customers, we strive to create a safer, smarter world by harnessing the power of deep expertise and advanced technology.

Join our Talent Community to stay up to date on our career opportunities and events at gdit.com/tc.

Equal Opportunity Employer / Individuals with Disabilities / Protected Veterans