Senior Site Reliability Engineer
Discover a career that is challenging, impactful, and mission‑critical. Join our team as a Senior Site Reliability Engineer and help keep critical cloud systems reliable, resilient, and available. While you help us advance the mission, we’ll help you build your skills and advance your career.
HOW A SENIOR SITE RELIABILITY ENGINEER WILL MAKE AN IMPACT
• Ensure high availability, resilience, and performance of critical systems by designing monitoring, alerting, and automated remediation solutions.
• Analyze logs, system metrics, and performance data to identify bottlenecks and drive improvements to system health and scalability.
• Develop and maintain automation scripts, infrastructure‑as‑code configurations, and tooling to streamline deployments, scaling, and operational workflows.
• Lead initiatives to reduce manual toil and promote best‑in‑class reliability engineering practices across cloud environments.
• Participate in on‑call rotations, respond to incidents promptly, conduct root‑cause analyses, and drive systemic corrective actions.
• Collaborate with software engineering, product, and infrastructure teams to integrate reliability into the full development lifecycle.
• Provide architectural guidance on distributed systems, capacity planning, disaster recovery, and reliability patterns.
• Communicate technical risks, impacts, and reliability insights clearly to non‑technical stakeholders.
• Lead reliability‑focused projects and manage timelines, resources, and deliverables to achieve measurable improvements.
WHAT YOU’LL NEED TO SUCCEED
Education:
Bachelor’s degree in Computer Science, Engineering, IT, or a related field—or equivalent experience. Advanced degree preferred.
Experience:
• 5+ years of experience in site reliability engineering, systems engineering, or related technical roles.
• Demonstrated success designing, supporting, and improving reliability for distributed, cloud‑based systems.
Technical Skills:
• Strong software engineering background with proficiency in Python, Go, or similar languages.
• Deep understanding of distributed systems, cloud platforms (AWS, Azure, GCP), and container orchestration (Kubernetes).
• Experience with observability tools such as Prometheus, Grafana, OpenTelemetry, and log aggregation platforms.
• Skilled in defining SLIs, SLOs, and error budgets for measuring and maintaining service reliability.
Skills & Abilities:
• Excellent problem‑solving and analytical skills with a proactive and systematic approach to reliability issues.
• Strong communication skills for cross‑team collaboration and stakeholder engagement.
• Ability to lead reliability initiatives, mentor engineers, and drive alignment across technical and business teams.
Security Clearance Level:
Ability to obtain a Public Trust or higher, per FSA requirements.
Location:
Remote
GDIT IS YOUR PLACE
At GDIT, the mission is our purpose, and our people are at the center of everything we do.
● Growth: AI-powered career tool that identifies career steps and learning opportunities
● Support: An internal mobility team focused on helping you achieve your career goals
● Rewards: Comprehensive benefits and wellness packages, 401(k) with company match, competitive pay, and paid time off
● Community: Award-winning culture of innovation and a military-friendly workplace
OWN YOUR OPPORTUNITY
Explore an enterprise IT career at GDIT and you’ll find endless opportunities to grow alongside colleagues who share your desire to drive operations forward.
Years of Experience:
5+ years of related experience (may vary based on technical training, certifications, or degree)
Travel Required:
Less than 10%
The likely salary range for this position is $111,155 - $150,385. This is not, however, a guarantee of compensation or salary. Rather, salary will be set based on experience, geographic location and possibly contractual requirements and could fall outside of this range.
We are GDIT, a global technology and professional services company that delivers technology and mission services to every major agency across the U.S. government, defense, and intelligence community. Our 26,000 experts extract the power of technology to create immediate value and deliver solutions at the edge of innovation. We operate in more than 50 countries worldwide, offering leading capabilities in digital modernization, AI/ML, cloud, cyber, and application development. Together with our customers, we strive to create a safer, smarter world by harnessing the power of deep expertise and advanced technology.
Join our Talent Community to stay up to date on our career opportunities and events at gdit.com/tc.
Equal Opportunity Employer / Individuals with Disabilities / Protected Veterans