Job Details
High-Performance Computing (HPC) Application Support Engineer
Description
Maritime Systems Division has an immediate opening for a High-Performance Computing (HPC) Application Support Engineer. This is an exciting opportunity to use your skills and experience in the development and integration of a critical HPC environment. As the HPC Application Support Engineer, you will work alongside our government customer to ensure the successful delivery of this vital capability.
Primary Responsibilities
- Manage, deploy, and support applications on Red Hat Enterprise Linux (RHEL)
- Work with users to customize applications and configure software development, integration, and production environments to specification
- Work with HPC vendors to identify hardware and software solutions to meet system requirements
- Monitor internally developed applications for their impact on system performance and resource utilization
- Tune applications to optimize performance and reliability of services across the HPC ecosystem
- Diagnose application problems quickly and effectively
- Automate administration procedures for routine and complex tasks
- Provide backup HPC system administration support
- Coordinate with vendors to resolve software problems
- Work with the team to define and implement best practices
Basic Qualifications
- Requires a BS with 4–8 years of prior relevant experience or a Master's with 2–6 years of prior relevant experience, and a minimum of 2 years of experience in Linux/UNIX systems administration
- An equivalent combination of education and experience will be considered
- Experience supporting internally developed applications in C, C++, Java, and Python
- Ability to identify requirements and to define, plan, and implement requisite solutions
- Ability to plan, organize, prioritize tasks, and complete assigned projects with minimal supervision
- This position requires an active Top Secret/SCI clearance.
- Certifications: Security+, RHCSA or RHCE
Preferred Qualifications
- Excellent interpersonal and communication skills and the ability to work as part of a team
- 5 years of experience supporting HPC applications and development environments on RHEL
- Experience troubleshooting application execution through resource managers such as PBS Pro and Slurm
- Experience with utilities such as Git, Bitbucket, and Confluence
- An understanding of code review, compilers, and debugging tools, including Intel Parallel Studio, GCC, GDB, and TotalView
- Experience supporting applications based on CUDA, OpenCL, Open MPI, OpenMP, and Intel MPI
- Experience using tools such as Nagios, Zabbix, and SNMP to monitor systems and metrics and to create dashboards
- Ability to develop and maintain programs and scripts that aid in the operation and automation of administrative tasks and workflows using Bash and Python