HPC System Engineer

Clearance Level: None
Category: Systems Engineering
Location: Bethesda, Maryland

REQ#: RQ57549

Travel Required: None
Public Trust: NACI (T1)
Requisition Type: Regular

GDIT supports the world’s largest supercomputer dedicated to life sciences and biomedical research. We are looking to expand our NIH onsite High Performance Computing (HPC) support with individuals who can also contribute to GDIT’s HPC Center of Excellence across programs in NIH, NOAA, NASA, Defense and Intelligence communities.

We are looking for an experienced individual with a strong Linux background and experience in configuration management, user account management, systems automation, and network monitoring to join our team and help administer the NIH Biowulf supercomputer. The position is full-time onsite at the NIH main campus in Bethesda, Maryland.

RESPONSIBILITIES & DUTIES:

This position supports the HPC systems administration team in operating and maintaining the 4,000-node Linux cluster for ~2,500 biomedical researchers. Specific responsibilities include:

  • Work with systems staff to enhance our Configuration Management infrastructure
  • Evaluate performance impacts of planned operating system changes
  • Update and expand existing systems monitoring capabilities
  • Develop automation tools for cluster administration
  • Maintain account management procedures to support the growing number of NIH researchers
  • Provide technical support to researchers using HPC resources, troubleshoot problems, and develop appropriate computational strategies
  • Consult and collaborate with scientist co-workers to determine best system configurations for applications

REQUIRED QUALIFICATIONS:

  • BS or equivalent and five years of experience
  • Minimum of five years of Red Hat or CentOS Linux system administration experience in an HPC environment
  • Minimum of three years of scripting experience with Bash, Perl, or Python
  • Prior experience with configuration management tools, such as Ansible, Chef, Puppet, or Cobbler
  • Ability to configure, deploy and manage a major system area, such as batch system, network, data storage, backup system, database system, or distributed computing
  • Ability to obtain an NIH Public Trust

PREFERRED QUALIFICATIONS:

  • Experience with batch systems, such as SLURM or PBS
  • Experience managing parallel and cluster file systems, such as NFS, GPFS, or Lustre
  • Network management experience, especially with InfiniBand
  • Experience presenting and/or teaching

ATTRIBUTES FOR SUCCESS:

  • Provide technical expertise to improve HPC cluster management, performance, and resiliency
  • Ability to work both independently and as part of the team; flexibility in dealing with assignments and in working on several projects simultaneously
  • Ability to communicate effectively with people of diverse backgrounds and levels of computer knowledge

SUMMARY:

  • The position is full-time onsite at the NIH main campus in Bethesda, Maryland.
  • Limited off-hours system maintenance activities will be planned in advance.
  • There are no travel requirements.
  • Applicants must be US citizens to meet the facility's moderate-level security requirement.

We are GDIT. The people supporting some of the most complex government, defense, and intelligence projects across the country. We deliver. Bringing the expertise needed to understand and advance critical missions. We transform. Shifting the ways clients invest in, integrate, and innovate technology solutions. We ensure today is safe and tomorrow is smarter. We are there. On the ground, beside our clients, in the lab, and everywhere in between. Offering the technology transformations, strategy, and mission services needed to get the job done.

GDIT is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status, or any other protected class.