Job Details

Meta (formerly known as Facebook)

SiteOps Global Production Operations Lead Engineer

Computer and Mathematical



Huntsville, Alabama, United States

Meta is seeking a technical leader to collaborate with and guide Production Operations functions in our Data Center Site Operations team. The Production Operations team plays a key role at each of our data centers, assuring high reliability and availability of the server infrastructure required to meet the needs of more than 2 billion people actively engaged with Meta and our suite of applications. We partner closely with vendors and others at Meta, including infrastructure tooling & software development teams, product engineers & service owners, hardware design & manufacture, logistics & supply chain operations, quality & data analytics, project management, production & operations incident management, and maintenance management.

The Production Operations Lead Engineer will assure exceptional availability and reliability of our hyper-scale fleet of servers. We seek a Subject Matter Expert who can continue to drive innovation in this space, spanning people, processes, infrastructure, reliability, tooling, automation, cost and quality. Ensuring high availability of our servers requires effective spare parts management and the identification of improvements that ensure quality parts are on hand. We seek someone who can quickly understand and respond to the technical needs of subject matter experts, local Site leadership, and our Production Operations teams, in a rapidly evolving technical environment. The scope includes spare part management and planning. The successful candidate will gain alignment across these globally distributed teams and partner organizations, driving initiatives that deliver the most impact by prioritizing resources and focus areas.

Required Skills

SiteOps Global Production Operations Lead Engineer Responsibilities:

  • Responsible for exceptional uptime, quality, and reliability of Facebook’s global fleet of data center servers, assuring the Production Operations team meets or exceeds all operational targets
  • Organize and drive the needs and priorities of the Production Operations team in internal and partner forums, as the technical expert in this space
  • Build trusted relationships within the team, to understand the biggest challenges and opportunities, and to advocate effectively for the right initiatives
  • With partner organizations, collaboratively drive a roadmap that scales Site Operations, delivering high impact advances in tooling, hardware, and workflow
  • Drive a singular operations strategy, goals, and priorities for the global Production Operations function within Site Operations
  • Measure and benchmark the effectiveness of operational processes both internally and externally, setting performance targets and driving improvements as needed
  • Develop scaling strategies and plans, be forward thinking by understanding infrastructure growth, identifying scaling issues before they occur, and contributing to solutions
  • Liaise with site teams and logistics partners to identify improvement areas, and drive changes and innovation to increase parts availability
  • Work with site teams to address material shortages and develop best practices and alternative solutions for managing the associated repairs
  • Establish practices to collect failure data on parts and then use that data to drive improvements in repair flows
  • Understand the cost trade-offs for various parts and repair flows and make recommendations that ensure high availability while managing total cost of ownership
  • Present a single message, representing SiteOps, to our logistics and procurement partners on spare part strategy and improvement areas
  • Ensure robust, timely communications across a globally distributed team, and give the team clear visibility into progress and strategy
  • Develop close partnerships with Program Management, Tooling, Hardware Design, Data Analytics, Manufacturing, Sourcing, Logistics and other teams to deliver superb operational results and manage the performance of external vendors
  • 30% - 40% travel required

Minimum Qualification

Minimum Qualifications:

  • Decision-making and problem-solving skills
  • Interpersonal, partnership and communications skills
  • Proven experience as Engineering or Operations Director, or relevant Senior Technical, Operations or Engineering Lead role
  • Data Center Logistics experience
  • Organizational, technical, and leadership skills
  • BSc/BA in technical field or commensurate experience
  • Working knowledge of IT/Operations Infrastructure
  • Prioritization skills and proven experience leading tooling, systems, automation and process

Preferred Qualification

Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics. You may view our Equal Employment Opportunity notice [Register to View]. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. We may use your information to maintain the safety and security of Meta, its employees, and others as required or permitted by law. You may view [Register to View], [Register to View] notice, and [Register to View] by clicking on their corresponding links. Additionally, Meta participates in the [Register to View] in certain locations, as required by law.