Jobs

Division: AC-Computing
Luis W. Alvarez Postdoctoral Fellowship and Admiral Grace M. Hopper Postdoctoral Fellowship in Computing Sciences
The Computing Sciences Area (https://cs.lbl.gov/) at Lawrence Berkeley National Laboratory (https://www.lbl.gov/) is now accepting applications for two distinguished postdoctoral fellowships in Computing Sciences:
- Luis W. Alvarez Postdoctoral Fellowship, and
- Admiral Grace M. Hopper Postdoctoral Fellowship
Researchers in computer science, mathematics, data science, or any computational science discipline who have received their Ph.D. and have completed no more than three years of postdoctoral work are encouraged to apply. Only one (1) application is needed, and it will be considered for both postdoctoral fellowships.
The successful candidates will participate in research activities in computer science, mathematics, data science, or any computational science discipline of interest to the Computing Sciences Area and Berkeley Lab. Alvarez Fellows apply advances in computer science, mathematics, computational science, data science, machine learning or AI to computational modeling, simulations, and advanced data analytics for scientific discovery in materials science, biology, astronomy, environmental science, energy, particle physics, genomics, and other scientific domains. Hopper Fellows concentrate on the development and optimization of scientific and engineering applications leveraging high-speed network capability provided by the Energy Sciences Network or run on next-generation high performance computing and data systems hosted by the National Energy Research Scientific Computing Center at Berkeley Lab.
Since its founding in 2002, Berkeley Lab’s Luis W. Alvarez Postdoctoral Fellowship (go.lbl.gov/alvarez) has cultivated exceptional early career scientists who have gone on to make outstanding contributions to computer science, mathematics, data science, and computational sciences. The Admiral Grace Hopper Postdoctoral Fellowship (go.lbl.gov/hopper) was first awarded in 2015 with the goal of enabling early career scientists to make outstanding contributions in computer science and high-performance computing (HPC) research.
About Computing Sciences at Berkeley Lab:
Whether running extreme-scale simulations on a supercomputer or applying machine-learning or data analysis to massive datasets, scientists today rely on advances in and integration across applied mathematics, computer science, and computational science, as well as large-scale computing and networking facilities, to increase our understanding of ourselves, our planet, and our universe. Berkeley Lab's Computing Sciences Area researches, develops, and deploys new tools and technologies to meet these needs and to advance research in our core capabilities of applied mathematics, computer science, data science, and computational science. In addition to fundamental advances in our core capabilities, we impact such areas as astrophysics and cosmology, accelerator physics, chemical science and materials science, combustion, fusion energy, nuclear physics, biology, climate change, and HPC systems and network technology. Research areas in Computing Sciences include but are not limited to:
- Developing scientific applications and software technologies for extreme-scale and energy-efficient computing
- Developing mathematical modeling for complex scientific problems
- Designing algorithms to improve the performance of scientific applications
- Researching digital and post-digital computer architectures for science
- Developing and advancing extreme-scale scientific data management, analysis, and visualization
- Developing and advancing next-generation machine learning, AI, and data science approaches for science
- Advancing quantum computing and networking technologies, software, algorithms and applications
- Evaluating or developing new and promising HPC systems and networking technologies
- Researching methods to control and manage next-generation networks
- Managing scientific data and workflows in distributed environments
Qualifications:
- Requires a Ph.D. in computer science, mathematics, computational science, or related discipline.
- Candidates must have no more than three (3) years of postdoctoral researcher or similar experience.
- Expertise with advanced algorithms, software techniques, HPC systems, and/or networking in a related research field.
- Demonstrated creativity and the ability to perform independent research.
- Demonstrated excellence in a related research field.
- Ability to develop new cross-disciplinary partnerships that use advanced computational and/or mathematical techniques to produce unique lab capabilities.
- Excellent communication skills with the ability to facilitate communications and collaborations with internal and external stakeholders.
Additional Desired Qualifications:
- Knowledge of advanced computing and high-performance computing.
Application Process:
- As part of the application process, you must upload and submit the following materials with your online application.
- Cover letter
- CV, with publication list included
- Research Statement (no more than five (5) pages when printed on standard letter-size (8.5 in x 11 in) paper with 1-inch margins on all sides and a font size no smaller than 11 point; figures and references, if included, must fit within the five-page limit)
- Contact information (name, affiliation, and email address) of at least three (3) individuals who will be able to provide letters of reference.
- Application deadline: October 24, 2025.
* It is highly advisable that you have all the required application materials and information ready and available prior to completing and submitting your application. Your application will not be considered complete if any of the above information is missing.
Tentative Application Timeline:
The Computing Sciences Fellowship Selection Committee is made up of a diverse representation of scientists and engineers across Berkeley Lab’s Computing Sciences Area who will conduct a thorough review of all applications received.
- Application deadline: October 24, 2025
- Review and Selection: October 2025 - January 2026
- Decisions made: January/February 2026
Notes:
- The selected candidates will be offered either the Luis W. Alvarez Postdoctoral Fellowship or the Admiral Grace M. Hopper Postdoctoral Fellowship based on their research focus.
- This position is expected to pay between $137,916 and $142,053 annually. Actual salary will depend on the candidate's overall experience and expertise, including prior experience in postdoc roles.
- This position is represented by a union for collective bargaining purposes.
- This position may be subject to a background check. Having a conviction history will not automatically disqualify an applicant from being considered for employment. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position.
- If you apply to this role, your candidacy may also be considered for other relevant positions within Computing Sciences. However, if you see a separate opening that matches your interests and qualifications, you are encouraged to apply to it directly.
- Work may be performed on-site or hybrid. Work must be performed within the United States. Starting May 7, a REAL ID or other acceptable form of identification is required to access Berkeley Lab sites.
Want to learn more about working at Berkeley Lab? Please visit: careers.lbl.gov
How To Apply
Apply directly online at http://50.73.55.13/counter.php?id=307909 and follow the online instructions to complete the application process.
Equal Employment Opportunity Employer: The foundation of Berkeley Lab is our Stewardship Values: Team Science, Service, Trust, Innovation, and Respect; and we strive to build community with these shared values and commitments. Berkeley Lab is an Equal Opportunity Employer. We heartily welcome applications from all who could contribute to the Lab's mission of leading scientific discovery, excellence, and professionalism. In support of our rich global community, all qualified applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, protected veteran status, or other protected categories under State and Federal law.
Berkeley Lab is a University of California employer. It is the policy of the University of California to undertake affirmative action and anti-discrimination efforts, consistent with its obligations as a Federal and State contractor.
Misconduct Disclosure Requirement: As a condition of employment, the finalist will be required to disclose if they are subject to any final administrative or judicial decisions within the last seven years determining that they committed any misconduct, are currently being investigated for misconduct, left a position during an investigation for alleged misconduct, or have filed an appeal with a previous employer.

Division: AM-Applied Mathematics and Computational Research
A recently funded collaborative project between the ESnet division of Lawrence Berkeley National Laboratory (LBNL) and QuEra Computing is seeking a highly motivated Postdoctoral Researcher to explore a novel optical interconnect approach for distributed quantum computing with neutral atoms. This research aims to integrate atomic ensemble qubits into optical tweezer arrays of single atoms, enabling quantum networking between state-of-the-art quantum processors and paving the way for scalable quantum computers.
This project will involve theoretical and numerical analysis of Rydberg-mediated interactions of a single atom with a micro-ensemble of atoms that can be used to collectively emit and store single photons for high-rate, high-fidelity remote entanglement without the need for optical resonators. As a Postdoc on this project, you will play a central role in advancing this cutting-edge research and collaborate closely with researchers and engineers from both the ESnet and QuEra teams. Your work will directly influence the experimental and engineering directions needed to realize large-scale, fault-tolerant quantum computers.
Titled “Scalable Neutral-Atom Quantum Computing via Local-Area Quantum Communications Enabled by Atomic Ensemble Qubits,” this two-year DOE-funded initiative brings together leading experts from ESnet—developers of one of the largest quantum networking testbeds—and QuEra, a pioneering quantum computing startup. This government–industry collaboration offers a unique opportunity to help shape the future of data center–scale quantum computing by leveraging QuEra’s state-of-the-art technologies and the dynamic research environments of LBNL and the University of California, Berkeley.
What You Will Do:
- Develop theoretical frameworks for quantum light-matter interactions in the context of distributed quantum computing.
- Model and analyze Rydberg-mediated interactions for both single-atom and ensemble qubits.
- Analyze optical trapping and transport dynamics of single atoms and ensembles.
- Evaluate and benchmark atomic ensemble qubits as optical quantum memories and deterministic single-photon sources for high-rate, high-fidelity remote entanglement between quantum processors.
- Investigate the integration of ensemble qubits into multi-zone processor architectures to enable scalable, fault-tolerant quantum computing.
- Collaborate closely with experimentalists and engineers to guide the development and implementation of networked quantum processors.
Additional Responsibilities as needed:
- Assist in project coordination and contribute to the preparation of project deliverables.
- Disseminate research findings through publications in high-impact journals and presentations at national and international conferences.
- Travel occasionally for collaboration and/or research dissemination.
What is Required:
- Ph.D. and/or previous Postdoc studies in at least one of the following areas: quantum optics, quantum networking, quantum computing.
- Strong theoretical background in the trapping and coherent manipulation of neutral atoms for quantum information processing and quantum communication.
- Extensive experience in theoretical and numerical studies of quantum light-matter interfaces and Rydberg interactions with applications to single-atom and atomic-ensemble neutral-atom platforms.
- Proven expertise in the physical modeling and performance analysis of key components of neutral-atom systems, including (but not limited to) spin-wave quantum memories, single-photon sources, and Rydberg-based quantum gates.
- Solid understanding of fault-tolerance aspects of quantum information processing and their implications for scalable architectures.
Desired Qualifications:
- Familiarity with experimental implementations, including relevant hardware, tools, and control systems.
- Demonstrated ability to work effectively in collaborative, interdisciplinary teams composed of scientists, engineers, and students from diverse backgrounds.
- Strong written and oral communication skills, with experience presenting research findings in group settings and at national/international conferences.
Notes:
- This is a full-time, two-year postdoctoral appointment with the possibility of renewal based upon satisfactory job performance, continuing availability of funds, and ongoing operational needs. You must have less than three years of paid postdoctoral experience. Salary for postdoctoral positions depends on years of experience post-degree.
- This position is represented by a union for collective bargaining purposes.
- The monthly salary range for this position is $8,321-$10,418 and is expected to start at $8,321 or above. Postdoctoral positions are paid on a step schedule per union contract and salaries will be predetermined based on postdoctoral step rates. Each step represents one full year of completed post-Ph.D. postdoctoral experience.
- This position is subject to a background check. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position. Having a conviction history will not automatically disqualify an applicant from being considered for employment.
- Work may be performed on-site, hybrid, or as full-time telework. The primary location for this role is Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley, CA. Work must be performed within the United States. A REAL ID or other acceptable form of identification is required to access Berkeley Lab sites.
Want to learn more about working at Berkeley Lab? Please visit: careers.lbl.gov
How To Apply
Apply directly online at http://50.73.55.13/counter.php?id=309013 and follow the online instructions to complete the application process.
Equal Employment Opportunity Employer: The foundation of Berkeley Lab is our Stewardship Values: Team Science, Service, Trust, Innovation, and Respect; and we strive to build community with these shared values and commitments. Berkeley Lab is an Equal Opportunity Employer. We heartily welcome applications from all who could contribute to the Lab's mission of leading scientific discovery, excellence, and professionalism. In support of our rich global community, all qualified applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, protected veteran status, or other protected categories under State and Federal law.
Berkeley Lab is a University of California employer. It is the policy of the University of California to undertake affirmative action and anti-discrimination efforts, consistent with its obligations as a Federal and State contractor.
Misconduct Disclosure Requirement: As a condition of employment, the finalist will be required to disclose if they are subject to any final administrative or judicial decisions within the last seven years determining that they committed any misconduct, are currently being investigated for misconduct, left a position during an investigation for alleged misconduct, or have filed an appeal with a previous employer.

Division: SN-Scientific Networking
A recently funded collaborative project between the ESnet division of Lawrence Berkeley National Laboratory (LBNL) and QuEra Computing is seeking a highly motivated Postdoctoral Researcher to explore a novel optical interconnect approach for distributed quantum computing with neutral atoms. This research aims to integrate atomic ensemble qubits into optical tweezer arrays of single atoms, enabling quantum networking between state-of-the-art quantum processors and paving the way for scalable quantum computers.
This project will involve theoretical and numerical analysis of Rydberg-mediated interactions of a single atom with a micro-ensemble of atoms that can be used to collectively emit and store single photons for high-rate, high-fidelity remote entanglement without the need for optical resonators. As a Postdoc on this project, you will play a central role in advancing this cutting-edge research and collaborate closely with researchers and engineers from both the ESnet and QuEra teams. Your work will directly influence the experimental and engineering directions needed to realize large-scale, fault-tolerant quantum computers.
Titled “Scalable Neutral-Atom Quantum Computing via Local-Area Quantum Communications Enabled by Atomic Ensemble Qubits,” this two-year DOE-funded initiative brings together leading experts from ESnet—developers of one of the largest quantum networking testbeds—and QuEra, a pioneering quantum computing startup. This government–industry collaboration offers a unique opportunity to help shape the future of data center–scale quantum computing by leveraging QuEra’s state-of-the-art technologies and the dynamic research environments of LBNL and the University of California, Berkeley.
What You Will Do:
- Develop theoretical frameworks for quantum light-matter interactions in the context of distributed quantum computing.
- Model and analyze Rydberg-mediated interactions for both single-atom and ensemble qubits.
- Analyze optical trapping and transport dynamics of single atoms and ensembles.
- Evaluate and benchmark atomic ensemble qubits as optical quantum memories and deterministic single-photon sources for high-rate, high-fidelity remote entanglement between quantum processors.
- Investigate the integration of ensemble qubits into multi-zone processor architectures to enable scalable, fault-tolerant quantum computing.
- Collaborate closely with experimentalists and engineers to guide the development and implementation of networked quantum processors.
Additional Responsibilities as needed:
- Assist in project coordination and contribute to the preparation of project deliverables.
- Disseminate research findings through publications in high-impact journals and presentations at national and international conferences.
- Travel occasionally for collaboration and/or research dissemination.
What is Required:
- Ph.D. and/or previous Postdoc studies in at least one of the following areas: quantum optics, quantum networking, quantum computing.
- Strong theoretical background in the trapping and coherent manipulation of neutral atoms for quantum information processing and quantum communication.
- Extensive experience in theoretical and numerical studies of quantum light-matter interfaces and Rydberg interactions with applications to single-atom and atomic-ensemble neutral-atom platforms.
- Proven expertise in the physical modeling and performance analysis of key components of neutral-atom systems, including (but not limited to) spin-wave quantum memories, single-photon sources, and Rydberg-based quantum gates.
- Solid understanding of fault-tolerance aspects of quantum information processing and their implications for scalable architectures.
Desired Qualifications:
- Familiarity with experimental implementations, including relevant hardware, tools, and control systems.
- Demonstrated ability to work effectively in collaborative, interdisciplinary teams composed of scientists, engineers, and students from diverse backgrounds.
- Strong written and oral communication skills, with experience presenting research findings in group settings and at national/international conferences.
Notes:
- This is a full-time, two-year postdoctoral appointment with the possibility of renewal based upon satisfactory job performance, continuing availability of funds, and ongoing operational needs. You must have less than three years of paid postdoctoral experience. Salary for postdoctoral positions depends on years of experience post-degree.
- This position is represented by a union for collective bargaining purposes.
- The monthly salary range for this position is $8,321-$10,418 and is expected to start at $8,321 or above. Postdoctoral positions are paid on a step schedule per union contract and salaries will be predetermined based on postdoctoral step rates. Each step represents one full year of completed post-Ph.D. postdoctoral experience.
- This position is subject to a background check. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position. Having a conviction history will not automatically disqualify an applicant from being considered for employment.
- Work may be performed on-site, hybrid, or as full-time telework. The primary location for this role is Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley, CA. Work must be performed within the United States. A REAL ID or other acceptable form of identification is required to access Berkeley Lab sites.
Want to learn more about working at Berkeley Lab? Please visit: careers.lbl.gov
How To Apply
Apply directly online at http://50.73.55.13/counter.php?id=309012 and follow the online instructions to complete the application process.
Equal Employment Opportunity Employer: The foundation of Berkeley Lab is our Stewardship Values: Team Science, Service, Trust, Innovation, and Respect; and we strive to build community with these shared values and commitments. Berkeley Lab is an Equal Opportunity Employer. We heartily welcome applications from all who could contribute to the Lab's mission of leading scientific discovery, excellence, and professionalism. In support of our rich global community, all qualified applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, protected veteran status, or other protected categories under State and Federal law.
Berkeley Lab is a University of California employer. It is the policy of the University of California to undertake affirmative action and anti-discrimination efforts, consistent with its obligations as a Federal and State contractor.
Misconduct Disclosure Requirement: As a condition of employment, the finalist will be required to disclose if they are subject to any final administrative or judicial decisions within the last seven years determining that they committed any misconduct, are currently being investigated for misconduct, left a position during an investigation for alleged misconduct, or have filed an appeal with a previous employer.

We are looking for engineers to build tools that efficiently process large amounts of data, produce features, and perform calculations using HPC techniques, ML training/inference, GPUs, and custom FPGA accelerators. Our team includes 100+ researchers who are constantly developing new features and models to drive trading strategies. As our data and computational demands grow, we are expanding our infrastructure team to ensure our researchers can work efficiently and effectively in a fast-paced environment.
Responsibilities
• Design and develop scalable, high-performance tools for data processing and ML workflows that support trading research and model development.
• Identify and resolve computational bottlenecks in large-scale systems.
• Collaborate closely with quantitative researchers to understand their needs and translate them into robust, user-friendly tools.
• Build and maintain a research environment that is both powerful and easy to use, enabling rapid experimentation and deployment.

1. Field Application Engineer located in Austin, TX (onsite & remote from home). Serve as the East Coast regional FAE for 1st-tier & 2nd-tier customers, with immediate responsibility as local support for the Dell account, in addition to supporting all other East Coast customers in other regions as necessary.
2. Engage with key AC-DC & DC-DC power supply customers as on-site FAE support to perform first-level failure analysis.
3. Proactively identify customer technical issues to determine correct solution selection, quality issues, etc.
4. Support the resolution of field issues at customer sites, including first-level FA and FW/HW rework.
5. Work with Asian engineering teams on NPI programs, facilitating technical communications, on-site testing, and debugging.
6. Support all rework in the US & Mexico as required.
7. Travel within the US, Mexico & Asia as needed. Traveling with our commercial team, the Field Application Engineer will be the primary customer interface for all technical discussions.

1. Field Application Engineer located in Milpitas, CA (onsite). Serve as the West Coast regional FAE for 1st-tier & 2nd-tier customers, with immediate responsibility as local support for the Cisco account, in addition to supporting all other customers from the East Coast to the West Coast as necessary.
2. Engage with key AC-DC & DC-DC power supply customers as on-site FAE support to perform first-level failure analysis.
3. Proactively identify customer technical issues to determine correct solution selection, quality issues, etc.
4. Support the resolution of field issues at customer sites, including first-level FA and FW/HW rework.
5. Work with Asian engineering teams on NPI programs, facilitating technical communications, on-site testing, and debugging.
6. Support all rework in the US & Mexico as required.
7. Travel within the US, Mexico & Asia as needed. Traveling with our commercial team, the Field Application Engineer will be the primary customer interface for all technical discussions.

As a member of our Platform Development team, you will be instrumental in building and optimizing high-performance trading systems, research compute clusters, databases, support systems, and more. You will heavily utilize Linux and Windows internals while working on servers in our HPC environment.
What You'll Do:
Automate and Evolve: Contribute to our library of home-grown tools, written primarily in Python and Bash, to automate monitoring and maintenance, allowing you to focus on performance-related issues and projects.
Collaborate: Work closely with Strategy Developers, Quantitative Researchers, and trade-supporting application teams to translate complex problems into scalable solutions. Coordinate with IT infrastructure teams, including storage and networking, to identify and implement the best solutions.
Optimize: Tune operating systems and batch workflows for performance. Dive deep into root-cause analysis of systems issues. Integrate these solutions into our systems effectively and efficiently.
Comprehensive HPC Environment Management: Oversee all aspects of our HPC environment, including the scheduler, parallel filesystems, GPUs, and interconnects.
High-Throughput Storage: Implement and optimize high-performance storage solutions, including Lustre, VAST, and GPFS, to efficiently support and enhance cluster performance.
Capacity Planning and Design: Develop strategies to ensure optimal resource allocation and scalability, using analytics to forecast needs and design efficient, reliable systems.
Troubleshooting and Tuning: Utilize monitoring and diagnostic tools to quickly pinpoint failures, streamline troubleshooting processes, and ensure the timely recovery of disrupted workflows.

• Field Application Engineer located in Boston (onsite & remote from home). Serve as the East Coast regional FAE for 1st-tier & 2nd-tier customers, with immediate responsibility as local support for the Oracle account, in addition to supporting all other East Coast customers in other regions as necessary.
• Engage with key AC-DC & DC-DC power supply customers as on-site FAE support to perform first-level failure analysis.
• Proactively identify customer technical issues to determine correct solution selection, quality issues, etc.
• Support the resolution of field issues at customer sites, including first-level FA and FW/HW rework.
• Work with Asian engineering teams on NPI programs, facilitating technical communications, on-site testing, and debugging.
• Support all rework in the US & Mexico as required.
• Travel within the US, Mexico & Asia as needed.

1. Field Application Engineer located in Austin, TX (onsite). Serve as the East Coast-to-West Coast regional FAE for 1st-tier & 2nd-tier customers.
2. The Mechanical/Thermal FAE will leverage our vast internal organization of design, process, and product engineering to solve our customers' mechanical & thermal problems and help drive design wins and revenue growth.
3. Clearly communicate abstract concepts to customers with a wide range of technical and non-technical backgrounds.
4. Conduct thermal analysis and validate hydraulic-thermal models.
5. Determine customer requirements (both explicit and implicit) and accurately communicate them back to key engineering team members.
6. Drive product design development incorporating design rules, performance optimization and manufacturing for HVM (high volume manufacturing).
7. Multi-tasking of numerous customer opportunities with the ability to prioritize efforts.
8. The role includes pre- and post-sales support, troubleshooting, debug, qualification, and communicating failure analysis results on all types of rack/enclosure and liquid cooling solutions.
9. Identify the unique electrical, thermal & mechanical challenges facing each application and utilize market insight, technical prowess, and the ability to deviate from convention to accelerate our engagements.
10. Must be able to develop and deliver technical presentations, technical education, system hardware demonstrations, and application notes in support of Business Unit objectives.
11. Must understand customer design cycles to drive our product design-in opportunities.
12. Leverage extensive personal experience in thermal/mechanical/electrical design and material science to become a trusted partner for our customers.
13. Support all rework in the US & Mexico as required. Travel within the US, Mexico & Asia as needed.
14. Traveling with our commercial team, the Field Application Engineer will be the primary customer interface for all technical discussions.

• Technical Expertise: Stay up to date on the latest technologies relevant to the company's power products, rack enclosures, and liquid cooling solutions. Translate complex technical information into clear and concise messages.
• Design Collaboration: Collaborate with the design team to create visually appealing and informative technical materials.
• Industry Knowledge: Stay current on trends and innovations in the power products industry, including renewable energy sources and energy efficiency.
• Competitive Analysis: Research and analyze competitor offerings in the power products space, identifying key differentiators and value propositions for the company's products.

• Drive sales growth and consistently exceed revenue budgets.
• Conduct quarterly business reviews with account management, reviewing direction and roadmapping.
• Track monthly attainment against KPIs once established.
• Identify and develop opportunities for all LITEON business units.
• Analyze customer needs and collaborate with management to develop effective strategies.
• Build strong customer relationships and comprehensive understanding to influence hierarchy, product and business roadmaps, vendor selection processes, decision-making criteria, and competitive awareness.
• Foster a customer-centric mindset, focusing on long-term partnerships.

The goal is simple: design an AI agent that writes and optimizes kernels in the same way you do. You will collaborate with the training team to define robust evaluation, validation, and reward models that will be used to train LLMs in the art of GPU kernel engineering. You will also contribute to the AI agent architecture itself, defining the workflows that enable an LLM to discover and implement high performance GPU kernels.
**This job is based in either Gdansk or New York City.** Remote work will be considered for exceptional candidates.
Responsibilities:
- Explore and analyze performance bottlenecks in ML training and inference.
- Develop and optimize high-performance computing kernels in Triton, CUDA, and/or ROCm.
- Implement programming solutions in C/C++ and Python.
- Deep dive into GPU performance optimizations to maximize efficiency and speed.
- Collaborate with the team to extend and improve existing machine learning compilers or frameworks such as MLIR, PyTorch, TensorFlow, ONNX Runtime, TensorRT. (This is optional but beneficial.)
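The evaluation-and-reward idea described above can be sketched in plain Python. This is a minimal illustration, not the team's actual harness; `kernel_reward`, `candidate_fn`, the tolerance, and the timing scheme are all hypothetical:

```python
import time

def _best_time(fn, arg, repeats=5):
    """Best-of-N wall-clock time, to reduce scheduler and timing noise."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - t0)
    return best

def kernel_reward(candidate_fn, reference_fn, test_inputs, rel_tol=1e-5):
    """Reward for a generated kernel: speedup over the reference,
    gated on correctness (any diverging output scores 0.0)."""
    for x in test_inputs:
        got, want = candidate_fn(x), reference_fn(x)
        if abs(got - want) > rel_tol * max(1.0, abs(want)):
            return 0.0
    return _best_time(reference_fn, test_inputs[0]) / _best_time(candidate_fn, test_inputs[0])
```

A correct but slow candidate earns a reward near 1.0, a faster one earns proportionally more, and an incorrect one earns nothing, which is the kind of signal an RL-style training loop needs.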

WHAT YOU WILL EXPERIENCE IN THIS POSITION:
• Drive the DC power roadmap with a focus on innovative, future-ready products.
• Identify opportunities to lead the market with features such as dynamic load prioritization, predictive analytics, and intelligent protection.
• Translate customer needs into creative product requirements that go beyond incremental improvements.
• Partner with engineering to develop innovative solutions, validating concepts through pilots and customer co-innovation.
• Drive early adoption by positioning products as best-in-class innovations, not just alternatives.
• Establish feedback loops with key hyperscale and telecom partners to validate new insights.
• Continuously assess how shifts in GPU and accelerator platforms drive customer requirements, ensuring product roadmaps stay ahead of demand.
• Collaborate with leading vendors and ecosystem partners (e.g., compute, racks, power components) to strengthen product differentiation and interoperability.
• Collaborate with marketing to communicate product differentiation and innovation leadership.
YOU HAVE:
• Bachelor’s in Engineering, Business, or related field; MBA preferred.
• 7+ years of product management experience in datacenter, telecom, or power systems.
• Ability to launch innovative products that differentiate in crowded markets.
• Knowledge of AI/datacenter trends and DC/HVDC power evolution.
• Excellent communication and customer engagement skills, with executive presence.

NVIDIA is hiring senior software engineers in its Infrastructure, Planning and Process (IPP) team to accelerate AI adoption across engineering workflows within the company. IPP is a global organization that works with teams across NVIDIA, such as Graphics Processors, Mobile Processors, Deep Learning, Artificial Intelligence and Driverless Cars, to cater to their infrastructure and software development workflow needs. As a senior engineer on AI Workflow, you will design and implement tools and software solutions that leverage Large Language Models and agentic AI to automate end-to-end software engineering workflows and enhance the productivity of engineers across NVIDIA.
What you’ll be doing:
Design and implement AI-driven optimizations within software development workflows to enhance developer productivity, accelerate feedback loops, and improve release reliability.
Design, develop, and deploy AI agents to automate software development workflows and processes.
Continuously measure and report on the impact of AI interventions, demonstrating improvements in key metrics like cycle time, change failure rate, and mean time to recovery (MTTR).
Create and deploy predictive models to identify high-risk commits, forecast potential build failures, and flag changes that have a high probability of failures.
Conduct research on emerging technologies to recommend best practices and improvements
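A predictive model of the kind described in the list above can start as simple as a logistic score over a few commit features. The sketch below is purely illustrative; the feature names and weights are invented placeholders, not NVIDIA's model:

```python
import math

def commit_risk_score(files_changed, lines_changed, touches_ci_config, author_recent_failures):
    """Toy logistic risk score in (0, 1); higher means riskier.
    Weights are illustrative placeholders, not tuned values."""
    z = (0.04 * files_changed
         + 0.002 * lines_changed
         + 1.5 * (1 if touches_ci_config else 0)
         + 0.8 * author_recent_failures
         - 2.0)                      # bias term keeps small commits low-risk
    return 1.0 / (1.0 + math.exp(-z))
```

In practice the weights would be fit on historical commit and build-outcome data, and the score thresholded to flag changes for extra review or pre-merge testing.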
What we need to see:
BE (MS preferred) or equivalent experience in EE/CS with 10+ years of work experience.
Well-versed in Large Language Model (LLM), Machine Learning (ML), and agentic AI techniques.
Hands-on experience in using large language models (LLMs) and implementing AI for software engineering workflows.
Hands-on experience with Python/Java/Go, with extensive Python scripting experience.
Experience in working with SQL/NoSQL database systems such as MySQL, MongoDB or Elasticsearch.
Experience in full-stack development. Proficient in front-end (e.g., React, Angular, Vue.js, HTML, CSS, JavaScript), back-end (e.g., Node.js, Python/Django/Flask, Ruby on Rails, Java/Spring, .NET) development, database management (SQL/NoSQL), and deployment/hosting (e.g., AWS, Azure, GCP).
Experience with tools for CI/CD setup such as Jenkins, Gitlab CI, Packer, Terraform, Artifactory, Ansible, Chef or similar tools.
Good understanding of distributed systems, microservice architecture, and REST APIs.
Good to have knowledge of build tools like Make, Maven or Ant.
Ability to effectively work across organizational boundaries to enhance alignment and productivity between teams.
Ways to stand out from the crowd:
Proactively track AI tool and technology trends, build insights, and collaborate with development teams early to evangelize adoption of AI-driven workflows across NVIDIA.
Expertise in leveraging large language models (LLMs) and Agentic AI to automate complex workflows, with knowledge of retrieval-augmented generation(RAG) and fine-tuning LLMs on enterprise data.
Prior development of a large software project using service-oriented architecture operating under real-time constraints
We have some of the most forward-thinking and versatile people in the world working for us and, due to unprecedented growth, our best-in-class engineering teams are rapidly growing. We are building a team that will truly change the world. If you are passionate about new technologies, care about software quality, and want to be part of the future of transportation and AI, we would love for you to join us.
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.
You will also be eligible for equity and benefits.

WHAT YOU WILL EXPERIENCE IN THIS POSITION:
• Join a high-growth R&D Engineering team that is focused on growth and development within our High-Density Liquid Cooling business.
• Drive multi-functional product development teams as a Subject Matter Expert, applying specialized knowledge (e.g., material science, weld engineering) to design and develop products and manufacturing processes to enhance performance and repeatability while maintaining a clean internal flow path to prevent debris and corrosion in fluid systems.
• Ensure ASME compliance for piping and pressure vessels and promote standard adoption in new applications and seek out or develop additional standards as needed.
• Collaborate across Product Design, Manufacturing Engineering, and Quality teams to optimize manufacturability and meet internal/industry/customer standards both within our own facility as well as at suppliers. Assist in specifying equipment and refining processes.
• Establish product and test specifications and research the latest in industry advancements.
• Mentor junior engineers by sharing knowledge to strengthen technical depth of staff.
YOU HAVE:
• Bachelor’s degree in Mechanical Engineering or related field required. Master’s degree preferred.
• 10+ years of relevant engineering experience in related field required.
• Knowledge of ASME BPV design and applications, as well as AWS and ISO standards, ASME Section IX or equivalent certification is highly preferred.
• Knowledge of material compatibility and galvanic corrosion in fluid applications.
• Experience leading test design and supervision, developing detailed test plans and analyzing results to make confident, data-driven decisions.
• Ability to provide technical direction to multi-functional teams with open communication, identifying problems and establishing resolutions.
• Up to 20% domestic travel is required.

Sabre is seeking a skilled Tier I Linux Systems Administrator to support a mission-critical Department of Defense (DoD) program dedicated to high-performance computing operations. This role offers the opportunity to collaborate with senior engineers and system administrators to maintain and optimize advanced computing infrastructure. (This position is for potential upcoming work and does not represent an active opening at this time. Qualified candidates will be contacted once the position becomes available.)
Duties include but not limited to:
Perform general Systems Administrator functions supporting all HPC users, responding courteously to issues with HPC system operations and maintenance activities.
Support Senior Systems Administrators (SA) in optimizing the functionality and performance of hardware and software components.
Ensure availability, integrity, efficiency, and reliability of servers, workstations, and data storage systems (Lustre, Bright Cluster).
Support other System Administrators (SA) in creating, administering, and managing Linux user accounts.
Triage Tier 1 Service desk tickets and produce service ticket reports and performance metrics.
Support Linux SA and Engineering team in the installation and testing of HPC system components, including setup and configuration of:
Hardware and software
Computing and data storage servers
Networking equipment
Computing peripherals
Provide responsive backup management, handling of security issues, and setup and configuration of Linux servers and software.
Perform timely hardware replacement not covered by third-party vendor maintenance agreements.
Assist Network personnel in the configuration of networking equipment, ensuring network security, and managing firewalls.
Implement necessary measures to comply with all Program security requirements.
Adhere to all IT policies, processes, and procedures.
Manage changing priorities.
Location: The selected candidate will be required to work onsite in Arlington, VA

Sabre is seeking an HPC Data Storage Engineer to support a mission-critical Department of Defense (DoD) program dedicated to high-performance computing operations. In this role, you will be responsible for managing and optimizing data storage systems that underpin large-scale HPC clusters and networking infrastructure. Your contributions will directly enable advanced data-intensive research efforts that are essential to national defense.
Duties include but not limited to:
Manage where data is stored
Capacity planning and management, performance management, metrics reporting, storage quotas, policy enforcement, hardware management and tuning
Interface with HPC users to identify potential performance improvement strategies
Remotely assist a counterpart located at an alternate site as needed
Location: The selected candidate will be required to work onsite in Arlington, VA

Sabre is seeking an HPC Data Storage Engineer to support a mission-critical Department of Defense (DoD) program dedicated to high-performance computing operations. As an HPC Engineer, you will design, optimize, and maintain advanced high-performance computing environments that power large-scale data processing, simulation, and research operations. Your contributions will directly enable advanced data-intensive research efforts that are essential to national defense.
Duties include but not limited to:
Utilize a wide variety of skills in system and network monitoring; large-scale systems administration; scripting and automation; security compliance; network distributed services; storage and backups; and hardware and software problem diagnosis and resolution.
Diagnose and troubleshoot technical problems, often of a complex nature, associated with computer hardware and software interrelationships and dependencies.
Conduct needs analysis, planning, and scheduling the installation of a wide variety of new or modified hardware/software.
Develop functional and technical IT system requirements and specifications. Configure and optimize system tools and applications, to include job schedulers (Slurm and PBSPro) and system resources (GitLab, LUA/TCL modules, and system support applications).
Create and brief technical presentations to technical and non-technical stakeholders. Maintain detailed documentation of system configurations, procedures, and troubleshooting guides. Develop user-facing documentation
Location: The selected candidate will be required to work onsite in Dallas, TX

Sabre is seeking an HPC Data Storage Engineer to support a mission-critical Department of Defense (DoD) program dedicated to high-performance computing operations. In this role, you will be responsible for managing and optimizing data storage systems that underpin large-scale HPC clusters and networking infrastructure. Your contributions will directly enable advanced data-intensive research efforts that are essential to national defense.
Duties include but not limited to:
Manage where data is stored
Capacity planning and management, performance management, metrics reporting, storage quotas, policy enforcement, hardware management and tuning
Interface with HPC users to identify potential performance improvement strategies
Remotely assist a counterpart located at an alternate site as needed
Location: The selected candidate will be required to work onsite in Dallas, TX

Sabre is seeking a Kubernetes Engineer who will be responsible for standing up an enclave within a Red Hat environment utilizing Kubernetes for deploying and managing applications. The engineer will build and configure the Kubernetes environment and handle its management and administration throughout its lifecycle. Experience in Red Hat system administration is also required.
Duties include but not limited to:
Design, Management, and Administration of Kubernetes environment
Configure Kubernetes deployments in order to meet relevant STIGs and the JSIG
Manage Red Hat Linux systems in a software development or scientific computing environment
Recommend and implement Kubernetes cluster sizing, to include utilization, number of workloads, etc.
Work with users to recommend containerization best practices for their workloads
Assist users in deploying containerized workloads onto Kubernetes in a scalable manner
Deploy support system applications onto Kubernetes like Red Hat IDM
Deploy highly utilized software, such as GitLab and chat applications like Mattermost, onto Kubernetes in a highly available manner
Troubleshoot Kubernetes performance issues
Develop example Kubernetes deployment packages, as well as container build pipelines
Assist with software selection for Kubernetes and container relevant workloads
Assist with the administration of container artifact management
Recommend and implement best practices for Kubernetes storage utilization.
Location: The selected candidate will be required to work onsite in Arlington, VA

• Act as a project leader to develop a complete project plan including resources, costs, benefits and schedule; assume full responsibility for all aspects of the project to ensure it meets goals and schedule; communicate project status to partners as appropriate; and follow up to ensure project results are sustained.
• Work with operators, team leaders and value stream planners to optimize process center layouts and designs incorporating material flow and line balancing.
• Use lean tools, interpersonal skills and leadership abilities to lead and facilitate kaizen activities encouraging multi-functional groups to participate and meet common goals.
• Contribute to the development of the annual business unit plan based on value stream maps, metric deployment and management system assessment to drive waste reduction while improving employee safety.
• Lead and support capital expenditure requests and implementations using basic cost justification and business management principles. Projects could include capital equipment justification and capital equipment expenditure requests.
YOU HAVE:
• Bachelor of Science degree in Engineering or related degree.
• 5+ years of experience in manufacturing preferred.
• Experience in a manufacturing environment with one or more of the following systems: SolidWorks, AutoCAD, MS Office, ERP (JD Edwards preferred), PLM (Enovia preferred).
• Experience working with Automation Equipment preferred.

Job Description:
This requisition may be classified as on-desk or hybrid depending on location and role.
Job Family Definition:
Designs, develops and applies programs, methodologies and systems based on advanced analytic models (e.g., advanced statistics, operations research, computer science, process) to transform structured and unstructured data into meaningful and actionable information insights that drive decision making. Uses visualization techniques to translate analytic insights into understandable business stories (e.g., descriptive, inferential and predictive insights). Embeds analytics into client’s business processes and applications. Combines business acumen and scientific methods to solve business problems.
Management Level Definition:
Contributes to assignments of limited scope by applying technical concepts and theoretical knowledge acquired through specialized training, education, or previous experience. Acts as team member by providing information, analysis and recommendations in support of team efforts. Exercises independent judgment within defined parameters.
Responsibilities:
• Participates in the analysis and validation of data sets/solutions/user experience.
• Aids in the development, enhancement and maintenance of a client's metadata based on analytic objectives. May load data into the infrastructure and contributes to the creation of the hypothesis matrix. Prepares a portion of the data for the Exploratory Data Analysis (EDA) / hypotheses.
• Contributes to building models for the overall solution, validates results and performance. Contributes to the selection of the model that supports the overall solution.
• Supports the research, identification and delivery of data science solutions to problems.
• Supports visualization of the model's insights, user experience and configuration tools for the analytics model.

Sabre is seeking an HPC Data Storage Engineer to support a mission-critical Department of Defense (DoD) program dedicated to high-performance computing operations. As an HPC Engineer, you will design, optimize, and maintain advanced high-performance computing environments that power large-scale data processing, simulation, and research operations. Your contributions will directly enable advanced data-intensive research efforts that are essential to national defense.
Duties include but not limited to:
Apply comprehensive knowledge of High Performance Computing (HPC) systems, comprised of high-speed, multi-petabyte Lustre file systems, Red Hat Enterprise Linux (RHEL) servers, CPU/GPU compute nodes, and high performance storage arrays, using Ethernet, fiber, Omni-Path, and InfiniBand interconnections.
Provide functional and technical expertise in support of user-developed software and technical advice and leadership to other technical staff
Utilize a wide variety of skills in system and network monitoring; large-scale systems administration; scripting and automation; security compliance; network distributed services; storage and backups; and hardware and software problem diagnosis and resolution.
Diagnose and troubleshoot technical problems, often of a complex nature, associated with computer hardware and software interrelationships and dependencies.
Conduct needs analysis, planning, and scheduling the installation of a wide variety of new or modified hardware/software.
Develop functional and technical IT system requirements and specifications. Configure and optimize system tools and applications, to include job schedulers (Slurm and PBSPro) and system resources (GitLab, LUA/TCL modules, and system support applications).
Create and brief technical presentations to technical and non-technical stakeholders. Maintain detailed documentation of system configurations, procedures, and troubleshooting guides. Develop user facing documentation.
Location: The selected candidate will be required to work onsite in Arlington, VA

We are seeking a detail-oriented Product Configurator Specialist to support the development, implementation, and maintenance of system configuration tools that enable accurate, rules-based product selections for our sales and engineering teams working with our data solutions products. The ideal candidate has a strong foundation in computer science or programmatic thinking, with working knowledge of mechanical and electrical systems.
This role involves analyzing product structures, options, and dependencies to create robust configuration logic. You will build and/or coordinate with third parties to optimize configurator tools using Python, Excel, and SQL, ensuring compatibility across system options and accessories. You will also bridge technical design with user-friendly configuration tools to streamline product customization and ensure accurate order processing.
IN THIS ROLE YOU WILL BE RESPONSIBLE FOR:
• Develop and maintain product configurator logic to ensure valid system configurations.
• Translate product rules, constraints, and BOM logic into programmatic models.
• Work closely with product management, engineering, field application specialists and sales to capture configuration requirements.
• Maintain clean, structured data sets to support configuration integrity and performance.
• Assist with data integration between configuration tools and backend systems (e.g., ERP, PLM).
YOU HAVE:
• Bachelor's degree in Computer Science, Engineering, or a related field.
• 3+ years of experience in product configuration, ERP systems, or technical product management.
• Proficient in Python, SQL, and Excel (including advanced functions/macros).
• Experience in HVAC or data center industry preferred.
• Understanding of mechanical and electrical system components.
• Strong analytical and systems-thinking skills.
• Experience with data modeling or rule-based configuration systems is a plus.
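Rule-based configuration logic of the sort this role maintains often reduces to a table of predicates over a selection. The Python sketch below is a hypothetical illustration; the option names and rules are invented for the example, not actual product data:

```python
# Each rule is (predicate over the selection, message shown when violated).
# Option names ("cooling", "rack", "pdu_voltage", ...) are illustrative.
RULES = [
    (lambda c: not (c.get("cooling") == "liquid" and c.get("rack") == "open-frame"),
     "liquid cooling requires an enclosed rack"),
    (lambda c: c.get("pdu_voltage") in c.get("region_voltages", []),
     "selected PDU voltage is not available in the target region"),
]

def validate_config(selection):
    """Return the messages of all violated rules (empty list = valid config)."""
    return [msg for rule, msg in RULES if not rule(selection)]
```

The same rule table can drive both order validation and the configurator front end (greying out incompatible options), and it can be exported to Excel or SQL when downstream systems need the logic in tabular form.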

We are open to candidates outside of Minnesota for this position (US-based). Potential relocation package available.
WHAT YOU WILL EXPERIENCE IN THIS POSITION:
• Leadership and Management: Lead and manage the CAD Design team, offering support and technical expertise. You'll be responsible for interviewing, hiring, and onboarding new talent.
• Resource Coordination: Efficiently allocate resources across projects, coordinate with various functions to set priorities, and identify future staffing needs.
• Professional Development: Conduct regular one-on-one meetings to support professional growth, provide performance appraisals, and develop training materials.
YOU HAVE:
• Educational and Professional Expertise: A Bachelor’s degree in Mechanical Engineering or a related field, alongside 7+ years of experience leading CAD Designer teams.
• Technical Proficiency: Demonstrated expertise in project management, SolidWorks 3D modeling, and familiarity with tools like DFMEA/DFM.
• Communication and Collaboration: Strong communication skills to collaborate effectively across functions.

• Engage directly with a broad range of suppliers in multifaceted product categories supporting nVent’s mission to develop safer systems to protect critical customer equipment.
• Lead onsite evaluations of supplier operations from design control to raw materials through production and all supporting activities to delivery.
• Support global sourcing initiatives teaming with sourcing category leaders on supplier assessments, supplier transitions and supplier/product qualification.
• Evaluate and improve supplier capability to support new nVent products and programs.
• Lead supplier performance improvement initiatives, conduct supplier performance evaluations, coordinate supplier scorecards and participate in supplier business reviews.
• Monitor supplier performance to identify high impact areas of opportunity to drive supply base performance.
• Partner with sourcing category leaders to find opportunities for efficiency, improved quality, better lead times, higher delivery performance, and cost reduction.
• Conduct risk analysis and implement mitigation and avoidance actions, and develop recovery plans with critical suppliers to ensure supply chain resilience and rapid recovery when issues occur.
• Assist with the maintenance of our ISO 9001:2015 certified Quality Management System as it relates to Supply Management.
YOU HAVE:
• Bachelor’s degree in Industrial Technology, Engineering, Supply Management, Business, or related field required.
• Ideally 5+ years of proven experience in Supply Management, Supplier Development, Sourcing, Supplier Quality, Quality Engineering or related experience.
• Ability to travel up to 30-40% domestically.

This role will serve as the technical visionary shaping the next generation of power distribution architectures for hyperscale and AI-driven datacenters. This role will not only ensure technical excellence but also drive innovation in how power is delivered, managed, and scaled. From exploring HVDC adoption to developing connectorized rack power safety solutions, you will pioneer architectures that redefine what’s possible in datacenter energy efficiency and resiliency.
By anticipating the evolving needs of compute platforms and designing innovative solutions, you will play a pivotal role in strengthening nVent’s leadership in intelligent power for AI datacenters—empowering customers to achieve peak performance and sustainability.
WHAT YOU WILL EXPERIENCE IN THIS POSITION:
• Define and architect innovative rack-level power distribution topologies (AC, DC, HVDC).
• Investigate new technologies in solid-state protection, intelligent power switching and metering, and AI-enabled load balancing.
• Evaluate and integrate breakthrough approaches such as sidecar power shelves, in-rack busbars, and modular busway systems.
• Translate innovative ideas into manufacturable, standards-compliant solutions.
• Guide engineering teams in rapid prototyping and proof-of-concepts to validate new concepts.
• Anticipate industry shifts (AI-scale, 1MW+ racks, OCP-HPR standards) and propose next-gen architectures.
• Partner with market-leading technology vendors (compute, rack, power components) where appropriate to accelerate development and ensure interoperability.
• Represent nVent as a leader with a point of view in customer innovation forums and industry standards bodies.
YOU HAVE:
• Bachelor’s degree in Electrical Engineering or related field required.
• Ideally 10+ years in datacenter, power electronics, or telecom power systems.
• Experience in architecting innovative electrical systems and bringing new concepts to market.
• Strong collaborator with cross-functional teams.

Supplier Quality Management
• Be the primary liaison between the assigned supply base, program team, production and nVent customers with a key responsibility to resolve supplier quality issues.
• Implement daily supplier management actions to improve supplier quality performance, flagging issues as necessary.
• Establish supplier engagement & cadence.
• Visit supplier locations. Travel up to 30%.
• Participate in supplier assessments and internal/external audits as needed.
• Coach, mentor and conduct trainings for suppliers as needed.
Problem Solving & Corrective Actions
• Lead RCCA activities for supplier quality problems.
• Issue supplier-corrective actions.
• Evaluate whether corrective actions are thorough and effective.
• Evaluate what impact this may have on our overall relationship with the supplier.
• Communicate urgency/severity of concern with suppliers/customer.
• Validate the effectiveness of the actions taken by suppliers.
Process & Product Quality
• Establish strong relationships with the customer, the program team, engineering, supplier partnerships, sourcing and quality department to understand requirements and production risks and develop plans with each supplier to ensure the highest quality without disruption.
• Support APQP and PPAP activities for new product introductions and engineering changes.
• Support suppliers’ process capability studies, FMEA reviews, and control plan validations.
Continuous Improvement
• Lead supplier-quality-focused improvement activities to drive a zero-defect mentality.
• Support a collaborative environment with suppliers, operations, quality, program and sourcing teams, and other key functions to cultivate a zero-defect environment.
• Drive supplier development initiatives, including Lean and Six Sigma projects.
• Find opportunities for cost reduction and quality improvement across the supply base.
Compliance & Documentation
• Maintain supplier quality documentation, audit reports, and performance records.
• Ensure compliance with environmental, health, and safety standards where applicable.
YOU HAVE:
• Bachelor’s degree in Metallurgical Engineering, Industrial Engineering, Mechanical Engineering, Manufacturing/Automation Engineering, Chemical Engineering, Materials Engineering.
• Solid understanding of APQP, PPAP, SPC, Process Capability Studies, FMEA, Control Plans, and GD&T.
• Proficient in root cause analysis investigation (such as RCCA, 8D, and 3L5Y).
• 4+ years in Supplier Quality related functions.
• Quality Certifications (CQE, CMQ, CQA) and ISO 9001:2015 Internal Auditor trained a plus.
• Effective communication at all levels of the organization.
• Ability to coach, mentor and conduct trainings for suppliers.
• Good process management, planning and change-management skills.
• Some knowledge and experience in reliability.

As an HPC Validation and Performance Engineer at NMC², you will take ownership of the validation and optimization of our HPC CPU and GPU calc farms. This critical role will involve developing a validation and performance baselining framework, which ensures system readiness for AI/ML and HPC workloads across multiple architectures. Your role will be essential in providing continuous performance benchmarking, real-time observability, and long-term strategic readiness. You will drive the implementation of advanced tooling and frameworks, maintaining an infrastructure that is crucial to our cutting-edge research efforts. You will be accountable for providing data-driven performance metrics to support architectural design choices as we continue to globally scale our datacenter footprint. We are looking for someone with deep technical expertise in compute, storage or networking optimizations and performance engineering who can develop solutions that scale with our growing infrastructure. This role demands a forward-thinking engineer who can anticipate industry trends and adopt emerging architectures and strategies to keep NMC² at the forefront of innovation.
Responsibilities:
Architecting and implementing a validation framework to certify the readiness and utilization of GPU nodes across a large, distributed HPC environment.
Defining methodologies to continually assess performance and optimize infrastructure across AI/ML workloads
Developing and executing comprehensive performance testing using industry and customer specific benchmarks, ensuring optimal performance across HPC compute, storage and networking
Contributing to research reports that describe benchmarking findings, evaluating complete hardware performance and efficiency
Leading efforts to identify, debug, and resolve bottlenecks in system performance
Building robust, scalable tools for automated validation and testing, utilizing Python, Go, Kubernetes and CI/CD pipelines to streamline continuous validation and benchmarking processes
Implementing monitoring solutions using Prometheus, Grafana and other modern monitoring technologies to track performance metrics and real-time health of the cluster
Defining and implementing best practices for continuous performance validation, ensuring that the infrastructure remains reliable and efficient as new technologies emerge
Staying informed on industry trends and advancements to ensure long-term strategic alignment
Working cross-functionally with engineering, infrastructure and research teams to align validation efforts with the broader business objectives, ensuring that the platform meets evolving research demands
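The baselining idea behind the responsibilities above can be sketched in a few lines: ingest a per-node benchmark result, compare it against a stored baseline, and flag regressions beyond a tolerance. This is only a minimal illustration; the node names, baseline figures, and 5% tolerance are invented assumptions, not NMC² internals.

```python
# Minimal sketch of a performance-baselining check: compare fresh benchmark
# results against per-node-type baselines and flag regressions beyond a
# tolerance. All names and numbers below are illustrative assumptions.

BASELINE_GBPS = {          # expected memory bandwidth per node type (hypothetical)
    "gpu-a100": 1550.0,
    "gpu-h100": 2900.0,
}

def validate(results, tolerance=0.05):
    """Return (passed, failed) node lists given {node: (node_type, measured_gbps)}."""
    passed, failed = [], []
    for node, (node_type, measured) in results.items():
        floor = BASELINE_GBPS[node_type] * (1.0 - tolerance)
        (passed if measured >= floor else failed).append(node)
    return passed, failed

passed, failed = validate({
    "node01": ("gpu-a100", 1540.0),   # within 5% of baseline -> pass
    "node02": ("gpu-h100", 2500.0),   # more than 5% below baseline -> fail
})
```

A production framework would feed these pass/fail results into the monitoring stack (e.g., as Prometheus metrics) rather than returning lists, but the comparison logic is the core of node certification.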

The HPC Storage Engineering team manages our high-performance scalable storage in excess of 200 petabytes. We are responsible for the design, deployment, and operation of this storage platform, which supports the research workloads of our Researchers and is therefore a core component of our infrastructure.
Working alongside our global team, you will optimize and troubleshoot workflows daily with our Researchers and Software Engineers, as well as with our Networks team and external vendors, including DELL, VAST, HPE, and RUBRIK.
Responsibilities:
Design, implement, and manage multi-PB storage systems to support the compute environment.
Implement automated processes and procedures aligned with an Infrastructure-as-Code strategy for a consistent, repeatable storage product.
Leverage broad storage experience and expertise to act as a subject-matter expert for a wide array of projects.
Work with customers to streamline workflows for optimal storage usage.

In this role, you will leverage your expertise in networking and software engineering to automate the provisioning, configuration, and observability of complex network environments. You will collaborate closely with network engineers, platform engineers, and security teams to ensure that our networking stack is reliable, secure, and efficient.
This is a hands-on engineering role where you will design and implement automation solutions that support large-scale compute and research workflows, driving innovation in how modern networks are managed.
Key responsibilities
• Design, build, and maintain automation frameworks that simplify network provisioning and lifecycle management
• Develop software in Python and leverage tools such as Ansible, Terraform, and Jinja2 to implement Infrastructure-as-Code practices
• Integrate network automation into CI/CD pipelines, enabling testable, repeatable, and reliable deployments
• Build APIs and tooling to expose networking capabilities as self-service for engineering teams
• Implement observability workflows for network performance, availability, and telemetry using modern monitoring stacks
• Collaborate with Network Engineering, Security, and Platform teams to deliver resilient and scalable automation solutions
• Participate in on-call rotations and incident response, contributing to the reliability of production systems
• Contribute to architectural design discussions, ensuring automation is built for long-term maintainability and scalability

We are seeking a highly skilled Kubernetes Engineering Manager with a focus on HPC to join our Platform Engineering function in Dallas. Kubernetes underpins all facets of our Research platforms and HPC estate here at NMC². As the HPC Kubernetes Engineering Manager, you will take ownership of the strategic roadmap, design, and delivery of our Kubernetes platform. In addition, you will focus on continuous optimization and performance enhancement of our Kubernetes platform as Research demands grow. We are looking for a highly experienced technical manager who can lead the significant scaling-up of our existing compute platforms and who excels working on the bleeding edge of technology, pushing the boundaries of HPC compute performance and providing an innovative approach to solving the complex technical challenges that arise. The HPC Kubernetes Engineering Manager will collaborate closely with the Kubernetes Platform Management team to ensure a smooth transition of new engineering capabilities, with a strong focus on operational excellence in all aspects of design and implementation.
Responsibilities:
Strong leadership and strategic vision in the design, deployment, and scaling of a high-performance Kubernetes platform
Pro-active stakeholder engagement, ensuring the Kubernetes platform supports broader business outcomes and research demands
Confident communication and collaboration; you will help drive cross-functional engineering initiatives across the Technology and Research organizations
Vendor management experience, working closely with our key vendors, providing continuous feedback to influence roadmaps, and ensuring efficient and timely deployment, support, and maintenance of critical platforms
People leadership, managing and developing engineers and a high performing team across the UK and US
A deep understanding of emerging trends and technologies in the Kubernetes ecosystem, working closely with Architecture and Innovation teams to appraise and adopt them
Ensuring platforms are reliable, highly available and secure, managed with a DevOps mindset and Infrastructure-as-Code toolset
Budget control, capacity forecasting and management

The Cloud Platform Architect will define and guide the platform’s hybrid, fit-for-purpose multi-cloud strategy, ensuring secure, efficient, and scalable access to cloud resources. This role is responsible for designing multi-tenancy models (landing zones, access brokering) and hybrid cloud networking. This role will partner with the Cloud Engineering team to define the service roadmap, enabling automation and self-service provisioning through Infrastructure as Code (IaC). The architect will streamline cloud provider usage, integrate cloud identity with Active Directory, and ensure cloud security controls are consistently applied. They will also lead capability mapping and gap analysis across organizational units and customer-facing needs.
Responsibilities:
Provide architecture/design guidance for secure, efficient, and scalable cloud access across fit-for-purpose cloud providers such as AWS (primary), Azure, and hybrid environments.
Help enable and curate the Cloud Catalog for platform consumers; define the developer experience for consuming it.
Architect multi-tenancy models using landing zones, access brokering, and guardrails to securely isolate multiple customers.
Collaborate with the Cloud Engineering team to define service enablement roadmap (which cloud services to onboard, when, and how).
Rationalize and streamline cloud provider usage, reducing duplication and aligning with enterprise service catalog.
Define and enforce cloud security architecture: IAM, encryption, network segmentation, compliance-by-design.
Architect identity integration between enterprise Active Directory (AD) and cloud identity providers (AWS IAM, Azure AD, SSO).
Drive IaC automation and self-service provisioning using IaC frameworks and Service Catalogs.
Perform capability mapping and gap analysis across organizational units and customer-facing needs to inform roadmap and investments.
Provide cloud cost visibility, governance guardrails, budget actuals vs. projections, right-sizing guidance, and supporting blueprints.
Partner with Security, Platform, and Solution Architects to ensure consistency across cloud/on-prem platforms.
Maintain Architecture Decision Records (ADRs) for cloud platform choices and patterns.
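The right-sizing guidance mentioned in the responsibilities above reduces, at its simplest, to comparing observed utilization against a sizing ladder. The sketch below shows that core idea only; the instance names, size ladder, and 40% CPU threshold are hypothetical, and a real FinOps workflow would draw on provider metrics (e.g., CloudWatch) over a sustained window.

```python
# Illustrative right-sizing sketch: flag instances whose peak CPU utilization
# stays below a threshold as candidates for the next-smaller size. The sizing
# ladder, names, and threshold are invented for the example.

SIZE_LADDER = ["xlarge", "large", "medium", "small"]  # larger -> smaller

def rightsize(instances, cpu_threshold=40.0):
    """Return {instance: recommended_size} for under-utilized instances."""
    recs = {}
    for name, (size, peak_cpu) in instances.items():
        idx = SIZE_LADDER.index(size)
        if peak_cpu < cpu_threshold and idx + 1 < len(SIZE_LADDER):
            recs[name] = SIZE_LADDER[idx + 1]  # suggest one step down
    return recs

recs = rightsize({
    "app-01": ("xlarge", 12.0),   # well under threshold -> downsize candidate
    "db-01":  ("large", 85.0),    # busy -> leave as-is
})
```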

We are looking for an experienced and strategic UX Architect to join our Product team. This role is ideal for someone who wants to remain hands-on as an individual contributor while also mentoring less experienced team members.
As a UX Architect, you will be responsible for shaping the user experience across complex technical products and services. You will focus on information architecture, interaction design, and usability, ensuring our solutions are intuitive, accessible, and aligned with user needs. You will also play an advisory role, helping to elevate design practices across the team through mentorship and knowledge sharing.
Responsibilities
• Defining and documenting the overall information architecture and interaction models for our products and services.
• Developing user journeys, wireframes, and prototypes that communicate design concepts clearly.
• Collaborating closely with product managers, engineers, and designers to ensure cohesive and user-centered product design.
• Conducting and supporting user research and usability testing, translating insights into actionable design improvements.
• Establishing and advocating for UX best practices, accessibility standards, and design guidelines.
• Mentoring other team members, providing feedback, guidance, and support in their professional development.
• Contributing to the evolution of design systems and shared design resources.

We are seeking an experienced Solutions Architect to drive the design and delivery of scalable, resilient solutions within new data center halls. This role will bridge customer requirements, technology platforms, and infrastructure reference architectures to ensure the white space design aligns with business demand, HPC workloads, and emerging technology trends.
The ideal candidate will act as the glue between product, infrastructure, and operations teams, translating technical and business needs into integrated solutions that scale.
Responsibilities
• Own solution-level design and alignment across compute, storage, networking, and facilities systems within new data center builds.
• Partner with product teams, HPC customers, and operations to translate workload requirements into scalable data center white space solutions.
• Define reference architectures that incorporate compute/GPU density, storage ratios, and network integration strategies.
• Collaborate with infrastructure architects, digital architects, and network architects to deliver holistic designs.
• Create solution roadmaps that segment delivery into implementable phases while meeting near-term demand.
• Support proof-of-concept efforts and performance baseline testing to validate solution viability.
• Ensure designs meet compliance, resiliency, and sustainability requirements.
• Act as the customer-facing technical partner for complex solution discussions.

We are seeking a highly skilled Senior Kubernetes Engineer to join our HPC and Infrastructure function in Dallas. In this role, you will design, implement, and optimize GPU-accelerated container platforms at scale, enabling high-performance workloads (AI/ML, HPC, LLM training) across hybrid or on-prem environments. You will bring deep expertise in both the NVIDIA and Kubernetes ecosystems, including GPU scheduling, device plugins, and custom operators.
Responsibilities:
Architecting and operating Kubernetes clusters optimized for GPU workloads, leveraging the NVIDIA GPU Operator, Network Operator, and DCGM
Developing, deploying and maintaining custom Kubernetes operators and controllers to automate infrastructure services
Integrating NVIDIA device plugins, Multi-Instance GPU (MIG) and GPU sharing features into the scheduling layer
Optimizing GPU utilization and job placement through scheduler extensions, such as kube-scheduler plugins, Slurm and Volcano
Collaborating with HPC, ML and DevOps teams to ensure multi-tenant, high-throughput cluster performance
Driving observability and telemetry integrations using Prometheus, Grafana, DCGM Exporter and OpenTelemetry
Implementing secure multi-user and multi-namespace GPU isolation, with RBAC and policy enforcement, such as OPA or Gatekeeper
Maintaining CI/CD pipelines for Kubernetes infrastructure using GitOps, ArgoCD and FluxCD
Contributing to infrastructure-as-code, using Terraform, Helm, and Kustomize
Participating in performance tuning, incident response and production readiness reviews
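One small, concrete slice of the observability work above is parsing per-GPU utilization telemetry, of the kind `nvidia-smi --query-gpu=index,name,utilization.gpu --format=csv,noheader,nounits` emits, to surface idle GPUs. The sample output and the 5% idle threshold below are illustrative, not captured from a real cluster; in production this signal would typically come from DCGM Exporter via Prometheus instead.

```python
# Sketch: parse per-GPU utilization lines (index, name, utilization %) and
# report idle GPUs. SAMPLE mimics `nvidia-smi ... --format=csv,noheader,nounits`
# output and is invented for the example.
import csv
import io

SAMPLE = """\
0, NVIDIA A100-SXM4-80GB, 97
1, NVIDIA A100-SXM4-80GB, 0
"""

def idle_gpus(text, threshold=5):
    """Return indices of GPUs whose utilization % is below threshold."""
    idle = []
    for row in csv.reader(io.StringIO(text)):
        index, _name, util = (field.strip() for field in row)
        if int(util) < threshold:
            idle.append(int(index))
    return idle
```

Feeding a signal like this into the scheduler (or alerting on it) is what closes the loop between telemetry and the GPU-utilization optimization the role describes.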

The Calc Farm Platform Services team develops and manages a large high-performance compute (HPC) platform to enable the business to conduct complex research at scale. We are seeking a highly motivated person to join our team to help us continue to push the envelope running batch workloads on Kubernetes.
The ideal candidate will have an active interest in Kubernetes and batch computing, a broad range of experience with software engineering and development, as well as experience managing large-scale infrastructure and complex tooling environments.
The main focus will be on Armada - an exciting open source CNCF project built and maintained by the team - which we use to solve multi-cluster Kubernetes batch job scheduling at scale.
You’ll join an experienced team, working at the cutting-edge of ML workloads and at scale.
Key responsibilities of the role include:
• Designing and developing high-quality software solutions using procedural programming languages, with a focus on Golang
• Building and maintaining highly scalable, highly available and globally distributed systems to support large-scale research workloads
• Managing and optimizing data interactions across relational and non-relational databases, particularly PostgreSQL
• Developing and operating containerized applications within Kubernetes, ensuring effective orchestration and workload scheduling
• Supporting, tuning and troubleshooting Linux-based systems as part of our core compute platform
• Applying core networking knowledge to help debug, optimize and enhance platform connectivity and performance
• Independently diagnosing and resolving complex technical issues across infrastructure and software layers
• Applying solid software architecture principles, computer science fundamentals and data structure knowledge to guide design decisions and code quality
• Driving continuous improvement by contributing to CI/CD pipelines and engineering best practices
• Staying up to date with emerging technologies and approaches, and applying new knowledge across disciplines
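The multi-cluster batch-scheduling problem Armada addresses can be illustrated with a toy placement loop: put each queued job on the cluster with the most free capacity that can fit it. Armada itself is written in Go and is far more sophisticated (fair-share queues, preemption, gang scheduling); this Python sketch, with invented job and cluster data, shows only the core placement idea.

```python
# Toy multi-cluster placement: greedily assign each job to the cluster with
# the most free CPUs that can still fit it. Jobs that fit nowhere stay queued.
# All job/cluster names and capacities are invented for the example.

def place_jobs(jobs, clusters):
    """Assign jobs [(name, cpus)] to clusters {name: free_cpus}."""
    free = dict(clusters)
    placements = {}
    for name, cpus in jobs:
        candidates = [c for c, f in free.items() if f >= cpus]
        if not candidates:
            placements[name] = None          # nothing fits: job stays queued
            continue
        best = max(candidates, key=free.get)  # most headroom first
        free[best] -= cpus
        placements[name] = best
    return placements

placements = place_jobs(
    [("train-a", 64), ("train-b", 96), ("train-c", 512)],
    {"us-east": 128, "eu-west": 100},
)
```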

As the Manager of HPC Solutions Architecture, you will oversee a team of domain-specialist architects spanning compute, storage, security, Kubernetes, networking, and systems integration. This team engages directly with customers to design and deliver high-performance computing (HPC) solutions that are secure, scalable, resilient, and optimized for workload-specific needs.
You will guide your team through the entire customer lifecycle — from early engagement and requirements discovery, through solution design, proof-of-concepts, and deployment, to ongoing optimization and adoption. Your role will ensure that customers achieve measurable value at every stage while maintaining the flexibility to scale as their workloads evolve.
In addition to managing delivery, you will serve as a trusted advisor and strategic partner, building strong customer relationships and ensuring architectures align with both technical priorities and business objectives. You will also collaborate closely with product and engineering teams, turning field insights into reference architectures, reusable design patterns, and platform enhancements that strengthen the overall ecosystem.
This role offers the opportunity to shape the strategic direction of HPC solutions, influence product innovation, and establish best practices that drive customer success, platform scalability, and architectural excellence.
Responsibilities:
• Lead, mentor, and develop a high-performing team of Solutions Architects across compute, storage, Kubernetes, networking, security, and systems integration
• Act as the strategic link between customers, your team, and engineering, ensuring alignment between customer outcomes and platform evolution.
• Build and maintain trusted advisor relationships with customers, enabling them to maximize the value of HPC architectures.
• Oversee the creation of reference architectures and design blueprints to ensure repeatable, scalable solutions.
• Guide proof-of-concept initiatives, validating solution performance and accelerating customer confidence in adoption.
• Conduct technical design reviews and workload assessments, identifying opportunities to improve efficiency, resilience, and cost-effectiveness.
• Recommend strategic design choices across compute, storage, networking, data pipelines, and security to align with customer-specific workloads.
• Act as a trusted partner to product and engineering, providing field insights that shape platform roadmaps and features.
• Encourage prototyping of new architectural approaches for emerging HPC and AI/ML workloads, translating innovation into production-ready solutions.
• Ensure the team maintains high-quality documentation, reusable patterns, and best practices for consistent delivery.
• Stay current with emerging technologies (GPUs, accelerators, interconnects, distributed storage, orchestration frameworks) and guide clients in adoption.
• Represent the organization at client workshops, technical deep-dives, and industry events, occasionally requiring travel.
• Champion a culture of customer success, technical excellence, and continuous improvement across the Solutions Architecture team.

While this role does not have direct reports, it requires strong cross-functional leadership to align product management, engineering, operations, and executive stakeholders. The Program Manager will establish and drive program governance, optimize resource allocation, and ensure execution excellence at a program level, balancing priorities across multiple product development initiatives.
KEY RESPONSIBILITIES:
Program Strategy & Execution:
• Lead multiple concurrent NPI programs, ensuring alignment with business goals and delivering on time, within budget, and to quality standards.
• Drive program governance, execution frameworks, and risk mitigation to optimize efficiency, decision-making, and product launch success.
Cross-Functional Leadership & Stakeholder Engagement:
• Act as the primary accountability leader, influencing cross-functional teams and executives to drive alignment and resolve execution challenges.
• Provide executive-level updates, risk assessments, and trade-off recommendations to ensure visibility and stakeholder engagement.
Program Standardization & Best Practices:
• Develop and implement standardized methodologies, governance models, and execution frameworks to enhance consistency and scalability.
• Ensure adherence to stage-gate processes, quality assurance, and risk management strategies to mitigate delays and execution risks.
Risk Management & Resource Optimization:
• Identify and proactively mitigate risks, ensuring contingency plans are in place to maintain schedule and budget targets.
• Optimize resource allocation across programs, balancing personnel, budget, and capacity constraints to improve delivery efficiency.
Performance Monitoring & Business Impact:
• Track Key Performance Indicators for schedule adherence, budget performance, cost efficiency, and commercialization readiness, holding teams accountable for results.
• Use data-driven insights and dashboards to align program execution with business and market strategies, driving continuous improvement.
YOU HAVE:
• Bachelor’s degree in Engineering or a related technical field (PMP, PgMP, or equivalent certification preferred).
• Ideally 12+ years of experience in program/project management, with a strong focus on NPI or complex product development programs.
• Proven track record of leading multiple concurrent projects and driving program-level execution across cross-functional teams.
• Strong expertise in stage-gate processes, program governance, and risk management.
• Experience implementing and scaling PMO methodologies, tools, and standard processes.
• Exceptional stakeholder management, executive communication, and decision-making abilities.
• Experience leading program budgets, aligning program execution with business objectives, and optimizing resource allocation.
• Willingness to travel up to 20% for collaboration and program execution.


We are seeking an experienced Storage Engineer who enjoys being challenged, appreciates an open and collaborative organizational structure, and thrives in a fast-paced environment. In this role, you will manage our existing storage infrastructure and be responsible for the design, automation, and management of future storage solutions to support our large-scale research environment. This is an exciting opportunity to join a small team focused on HPC storage and help set the direction for storage solutions at HRT. You’ll have plenty of freedom to research, propose, and test new hardware and software to improve the performance and usability of our research environment.
Responsibilities
- Design and manage a variety of storage solutions
- Research and experiment with new storage solutions
- Troubleshoot complex storage, OS, and networking issues
- Build tools to improve the performance and monitoring of storage clusters

As a Computer Specialist I, II, III, or Senior, you will play a crucial role in optimizing and managing our high-performance computing infrastructure and supporting network to support complex scientific, engineering, and research applications at the Mississippi State University (MSU) High Performance Computing Collaboratory (HPC2). Must be a U.S. Citizen or Permanent Resident.
Area of Specialization:
Linux Operating Systems
Essential Duties and Responsibilities:
1. Install, configure, and manage Linux operating systems on physical and virtual servers and desktops and HPC clusters.
2. Plan and implement system upgrades, migrations, life cycle management, and change management processes.
3. Create, manage, and support virtual machines (VMs) and containers based on business needs.
4. Monitor and optimize system performance, including resource allocation, load balancing, and recommend hardware/software upgrades.
5. Perform monitoring and troubleshooting of system hardware, software, and operating systems to maintain systems and preserve data integrity.
6. Provide user and system support, education, and training including coordination with computing staff for integrating services such as file systems, printing, and other system resources.
7. Implement and maintain security best practices across the computing environments.
8. Develop and maintain the software infrastructure for the HPC environment, including user interfaces, utility scripts, and software stacks.
9. Develop and maintain automated processes for system management.
10. Maintain comprehensive documentation of system configurations, processes, and procedures.
11. Stay current with industry trends and technologies related to systems administration, virtualization, and HPC technologies.
12. Manage, train, and provide operational oversight to junior-level system administrators; collaborate with cross-functional IT teams to ensure seamless integration of systems and services.
13. Perform duties as assigned in a responsible manner.
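The automated system-management processes in duty 9 often start as small drift checks: compare what a host should have against what it actually has. The sketch below uses an invented package list; a real check would query rpm/dpkg or a configuration-management tool rather than a hard-coded set.

```python
# Sketch of a configuration-drift check: compare a host's installed packages
# against the desired set and report what is missing or unexpected. The
# DESIRED set and the installed sample are illustrative assumptions.

DESIRED = {"openssh-server", "slurm", "singularity"}

def drift(installed):
    """Return (missing, unexpected) package sets relative to DESIRED."""
    installed = set(installed)
    return DESIRED - installed, installed - DESIRED

missing, unexpected = drift({"openssh-server", "slurm", "telnet"})
```

Run periodically across a cluster (via cron, Ansible, or a monitoring agent), a report like this is the raw material for the automated remediation and documentation duties listed above.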

As a Computer Specialist I, II, III, or Senior, you will play a crucial role in optimizing and managing our high-performance computing infrastructure and supporting network to support complex scientific, engineering, and research applications at the Mississippi State University (MSU) High Performance Computing Collaboratory (HPC2).
Area of Specialization:
Windows Operating Systems
Essential Duties and Responsibilities:
1. Install, configure, and manage Windows operating systems on physical and virtual servers and desktops.
2. Plan and implement upgrades, migrations, life cycle management, and change management processes.
3. Create, manage, and support virtual machines (VMs) and virtual desktop infrastructure based on business needs.
4. Monitor and optimize system performance, including resource allocation, load balancing, and recommend hardware/software upgrades.
5. Perform basic troubleshooting of hardware, software, and operating systems, escalating complex issues to appropriate technical staff as needed.
6. Provide user and system support, including coordination with computing staff for integrating services such as file systems, printing, and other system resources.
7. Implement and maintain security best practices across the computing environments.
8. Package software and schedule automated, remote deployments.
9. Maintain comprehensive documentation of system configurations, processes, and procedures.
10. Stay current with industry trends and technologies related to systems administration and virtualization.
11. Manage, train, and provide operational oversight to junior-level system administrators; collaborate with cross-functional IT teams to ensure seamless integration of systems and services.
12. Perform other duties as assigned.
Supervisory Responsibility: This position will occasionally lead small teams and assist in the training, development, and evaluation of lower-level computer specialists. This position provides hands-on maintenance of advanced research computing infrastructure to support researchers.

Job Family Definition:
Designs, develops and applies programs, methodologies and systems based on advanced analytic models (e.g., advanced statistics, operations research, computer science, process) to transform structured and unstructured data into meaningful and actionable information insights that drive decision making. Uses visualization techniques to translate analytic insights into understandable business stories (e.g., descriptive, inferential and predictive insights). Embeds analytics into the client’s business processes and applications. Combines business acumen and scientific methods to solve business problems.
Management Level Definition:
Contributes to assignments of limited scope by applying technical concepts and theoretical knowledge acquired through specialized training, education, or previous experience. Acts as team member by providing information, analysis and recommendations in support of team efforts. Exercises independent judgment within defined parameters.
Responsibilities:
• Participates in the analysis and validation of data sets/solutions/user experience.
• Aids in the development, enhancement and maintenance of a client's metadata based on analytic objectives. May load data into the infrastructure and contributes to the creation of the hypothesis matrix. Prepares a portion of the data for the Exploratory Data Analysis (EDA) / hypotheses.
• Contributes to building models for the overall solution, validates results and performance. Contributes to the selection of the model that supports the overall solution.
• Supports the research, identification and delivery of data science solutions to problems.
• Supports visualization of the model's insights, user experience and configuration tools for the analytics model.

Job Family Definition:
The Cloud Developer builds from the ground up to meet the needs of mission-critical applications, and is always looking for innovative approaches to deliver end-to-end technical solutions to solve customer problems. Brings technical thinking to break down complex data and to engineer new ideas and methods for solving, prototyping, designing, and implementing cloud-based solutions. Collaborates with project managers and development partners to ensure effective and efficient delivery, deployment, operation, monitoring, and support of Cloud engagements. The Cloud Developer provides business value expertise to drive the development of innovative service offerings that enrich HPE's Cloud Services portfolio across multiple systems, platforms, and applications.
Responsibilities:
• Collaborate with cross-functional teams to design, develop, and implement cloud solutions tailored to meet business needs.
• Assist in the deployment and configuration of cloud infrastructure, platforms, and services.
• Contribute to the optimization and automation of the cloud deployment processes to improve efficiency and scalability.
• Perform testing and troubleshooting of cloud systems to ensure reliability, performance, and security.
• Stay up to date with industry trends and best practices in cloud computing to provide insights and recommendations for continuous improvement.
• Support ongoing cloud operations and provide technical assistance as needed.

Job Family Definition:
Designs, develops, troubleshoots and debugs software programs for software enhancements and new products. Develops software including operating systems, compilers, routers, networks, utilities, databases and Internet-related tools. Determines hardware compatibility and/or influences hardware design.
Management Level Definition:
Contributes to assignments of limited scope by applying technical concepts and theoretical knowledge acquired through specialized training, education, or previous experience. Acts as team member by providing information, analysis and recommendations in support of team efforts. Exercises independent judgment within defined parameters.
Responsibilities:
• Codes and programs enhancements, updates, and changes for portions and subsystems of systems software, including operating systems, compilers, networking, utilities, databases, and Internet-related tools
• Executes established test plans and protocols for assigned portions of code; identifies, logs, and debugs assigned issues.
• Develops understanding of and relationship with internal and outsourced development partners on software systems design and development.
• Participates as a member of a project team of other software systems engineers and internal and outsourced development partners to develop reliable, cost-effective and high-quality solutions for low to moderately complex products.

NVIDIA's CUDA-X libraries are essential, visible, and growing both inside and outside of NVIDIA. We need a self-starting product manager to continue growing these math libraries. Do you have a rare blend of technical, positioning, and communication skills? Are you passionate about groundbreaking technology? If so, we would love to learn more about you!
What You’ll Be Doing:
Leading: Take a leadership role in defining strategic go-to-market plans that include product launches, campaigns, and events for the CUDA-X libraries across key industries. This includes effective messaging, positioning, and market research.
Building: Bring ideas to life through content creation of sales/partner enablement assets such as presentation decks, blogs, whitepapers, webinars, demos, and more.
Presenting: Engage with various audiences (internal and external), delivering ideas clearly and confidently, using creative approaches to translate technology capabilities into messages that resonate.
Collaborating: Team up with product managers, product marketing teams, campaign marketing, developer relations, BD, sales, and PR teams to ensure alignment and execution of marketing plans.
Learning: Bring back industry news and direct developer/customer conversations, and share critical insights with internal teams.
What We Need To See:
Undergraduate degree or equivalent experience in computer science, mathematics, computer engineering, or relevant technical field
8+ years of combined experience in a product management or technical role at a technology company working with libraries, SDKs, and/or toolkits
Familiarity with linear algebra, numerical methods, and their application in AI, HPC, or scientific computing
Strong technical foundation in GPU computing, CUDA, or parallel programming models
Excellent communication and presentation skills, with the ability to work across engineering, marketing, and business teams
Proven ability to translate technical capabilities into developer and customer value
Ability to prioritize multiple projects and work independently with minimal direction
Ways To Stand Out From The Crowd:
Direct experience with CUDA-X math libraries
Prior experience with basic linear algebra, direct linear solvers and eigen solvers
Strong knowledge and understanding of the HPC and AI markets and their key players
Prior success in building developer ecosystems or working with ISVs in HPC/AI
Experience driving go-to-market strategy and customer engagements for technical products
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 168,000 USD - 258,750 USD.
You will also be eligible for equity and benefits.

As NVIDIA Product Managers, our goal is to enable developers to be successful on the NVIDIA platform and to push the boundaries of what is possible with their AI deployments! For inference, we are the champions inside NVIDIA for AI developers looking to accelerate their deployments on GPUs. We work directly with developers inside and outside of the company to identify key improvements, create roadmaps, and stay alert to the inference landscape. We also work with NVIDIA leaders to define a clear product strategy, and with marketing teams to build go-to-market plans. The Product Management organization at NVIDIA is a small, strong, and impactful group. We focus on enabling deep learning across all GPU use cases and providing extraordinary solutions for developers. We are seeking a rare blend of product skills, technical depth, and drive to make NVIDIA great for developers. Does that sound familiar? If so, we would love to hear from you!
What you'll be doing:
Architect developer-focused products that simplify high-performance inference and training deployment across diverse GPU architectures.
Define the multi-year strategy for kernel and communication libraries by analyzing performance bottlenecks in emerging AI workloads.
Collaborate with CUDA kernel engineers to design intuitive, high-level abstractions for memory and distributed execution.
Partner with open-source communities like Triton and FlashInfer to shape and drive ecosystem-wide roadmaps.
What we need to see:
5+ years of technical PM experience shipping developer products for GPU acceleration, with expertise in HPC optimization stacks.
Expert-level understanding of CUDA execution models and multi-GPU protocols, with a proven track record of translating hardware capabilities into software roadmaps.
BS or MS or equivalent experience in Computer Engineering or demonstrated expertise in parallel computing architectures.
Strong technical interpersonal skills with experience communicating complex optimizations to developers and researchers.
Ways to stand out from the crowd:
PhD or equivalent experience in Computer Engineering or a related technical field.
Contributed to performance-critical open-source projects like Triton, FlashAttention, or TVM with measurable adoption impact
Crafted GitHub-first developer tools with >1k stars or similar community engagement metrics
Published research on GPU kernel optimization, collective communication algorithms, or ML model serving architectures
Experience building cost-per-inference models incorporating hardware utilization, energy efficiency, and cluster scaling factors
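To make the cost-per-inference modeling mentioned above concrete, here is a minimal back-of-the-envelope sketch. All figures and parameter names are hypothetical placeholders for illustration, not NVIDIA data; a real model would also fold in energy efficiency and cluster scaling factors as the posting notes.

```python
# Hypothetical back-of-the-envelope cost-per-inference model.
# All inputs below are illustrative placeholders, not real NVIDIA figures.

def cost_per_million_tokens(
    gpu_hourly_usd: float,      # amortized hardware + energy cost per GPU-hour
    tokens_per_second: float,   # sustained serving throughput per GPU
    utilization: float = 0.7,   # fraction of time spent on useful work
) -> float:
    """Cost in USD to generate one million output tokens on one GPU."""
    effective_tps = tokens_per_second * utilization
    tokens_per_hour = effective_tps * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000

# Example: a $4/hr GPU serving 1,000 tok/s at 70% utilization.
print(round(cost_per_million_tokens(4.0, 1000.0), 2))  # → 1.59
```

Even a toy model like this makes the trade-offs visible: doubling utilization halves cost per token, which is why hardware utilization appears alongside energy efficiency in serious cost models.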
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 144,000 USD - 218,500 USD for Level 3, and 168,000 USD - 258,750 USD for Level 4.
You will also be eligible for equity and benefits.

This role operates at the hardware–software boundary of the most advanced computer systems in the world. As demand for compute grows while power and space plateau, the future of data centers depends on innovation across all layers of silicon and platform design. To elevate the quality, security, economics, and capabilities of accelerated computing platforms, we rely on close partnership with internal teams, customers, and vendors across the design cycle. This role reports into the Product Management team for NVIDIA Accelerated CPU Products (like Grace and Vera), helping orchestrate teamwork across these three areas:
What you'll be doing:
Core Software and Firmware Enablement
We're looking for someone to drive alignment between the teams responsible for architecting NVIDIA CPUs and those behind CUDA, GPU drivers, the Linux kernel and distros, BIOS, BMC, and IBV partners. You will drive enablement plans and customer and partner adoption of key features like Confidential Compute, MPAM, advanced power management, and virtualization, and ensure that components, documentation, and recipes are in place for our entire accelerated compute product portfolio to deliver an excellent out-of-box experience across multiple product lines and their customers.
Telemetry, Quality, and Diagnosability
Partnering with our HGX and DGX product teams, CPMs, engineers, and customers to understand active issues, communicate stages of resolution to our executive staff, and drive tactical and strategic (roadmap-based) closure of gaps. Manage expectations and internal reporting; act as a buffer between high-priority customers and deployments, their account teams, the product organization, and engineering.
Guiding component and system-level quality targets in partnership with platform and quality PMs. Help our teams understand and plan for comprehensive in-system telemetry and debugging in future products.
Drive the distribution and packaging of workarounds, patches and mitigations or experimental firmware as needed.
3rd Party Hardware Ecosystem
Driving partnership and alignment with key 3rd parties responsible for memory modules, PCIe add-in cards, and more. Ensuring the correct hardware, business, documentation, and software capabilities are present to integrate with NVIDIA platforms.
This is a high-impact role on a fast-moving and central NVIDIA product team, and requires exceptional:
Communication skills: the ability to express sophisticated ideas succinctly and at an audience-appropriate altitude.
Motivation: self-starting with the ability to drive frequent progress on multiple cross-team efforts.
Ability to Grapple with the Unknown: role definitions and product needs are very fluid, and paths to success will lack clear definition at times, especially in flat cultures like NVIDIA's. For the right person, this is an opportunity to experience incredible breadth and to “choose your own adventure.”
Technical and Architectural Capability: this role sits at the intersection of hardware and system software. EE / CS familiarity, preferably with a computer architecture background and working knowledge of the Linux kernel and boot process is required.
Curiosity and Intellectual Honesty: solving problems requires looking around corners, asking “why,” and the clarity and flexibility to occasionally reflect on and accept a change of paths.
Analytical Thinking: able to build objective measurements of schedules, economic trade-offs, and competitive insights.
Collaboration: we are all here to work on incredible tech that sparks joy and to take pride in getting stuff done. There is more than enough impact and scope for everyone to be successful; hero PMs help the product, their partners, and their colleagues succeed together.
What we need to see:
Bachelor's or Master’s degree in Computer Science or Computer Engineering (or equivalent experience).
10+ years of experience as a technical product or program manager in a multifaceted, fast-paced, high-tech environment.
Proven understanding of the datacenter hardware and server ecosystem. Familiarity with Arm CPUs is a plus.
Excellent project management and cross-functional leadership skills.
Strong communication and teamwork skills.
Ability to lead multiple projects and priorities in a fast-paced environment.
Want to sit at the heart of the world’s top AI platforms, wear many hats, solve strategic and technical problems, and build the future with a close-knit team of outstanding PMs? We’d love to hear from you.
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 168,000 USD - 258,750 USD for Level 4, and 208,000 USD - 327,750 USD for Level 5.

Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world!
What you'll be doing:
NVIDIA's Accelerated Computing team is a driving force behind the explosion of Machine Learning, Artificial Intelligence, and High-Performance Computing. We are looking for a highly capable individual with a consistent track record in technology and the skills for GPU product definition for the data center. We are a small, dynamic, and motivated team that defines the next generation of products for these high-growth markets.
Develop a deep understanding of datacenter workloads and applications, especially around Large Language Models, for training and inference
Find opportunities where we can uniquely address customer needs, and translate these into compelling GPU value propositions and product proposals
Work cross functionally with our partners, engineering, operations & field teams to develop comprehensive product go to market plans and product positioning
Work with our sales organization to develop effective sales collateral and tools
What we need to see:
7+ years of total experience in technology with previous product management, AI related engineering, design or development experience highly valued
BS or MS or equivalent experience in engineering, computer science, or another technical field. MBA a plus.
Demonstrated ability to fully contribute in one or more of the areas above within 3 months
Knowledge of, and the ability to explain, GPU, software, and computing architectures
Strong desire to learn, motivated to tackle complex problems and the ability to make sophisticated trade-offs
Ability to work closely with cloud partners
Ways to stand out from the crowd:
Strong background in developing or deploying large-scale GPU-based AI applications, like Large Language Models, for training and inference
Direct experience in building or leading cloud computing infrastructure and technologies
Exposure to bringing accelerated computing systems to market
Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family www.nvidiabenefits.com/
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 168,000 USD - 258,750 USD for Level 4, and 208,000 USD - 327,750 USD for Level 5.
You will also be eligible for equity and benefits.

What you'll be doing:
NVIDIA's Accelerated Computing team is a driving force behind the explosion of Machine Learning, Artificial Intelligence and High-Performance Computing. We are looking for a highly capable individual with a consistent track record in technology and the skills for GPU product definition for Data Center. We are a small, dynamic, and motivated team that defines the next generation of products for these high growth markets.
Guide the architecture of the next-generation of GPUs through an intuitive and comprehensive grasp of how GPU architecture affects performance for datacenter applications, especially Large Language Models (LLMs)
Drive the discovery of opportunities for innovation in GPU, system, and data-center architecture by analyzing the latest data center workload trends, Deep Learning (DL) research, analyst reports, competitive landscape, and token economics
Find opportunities where we can uniquely address customer needs, and translate these into compelling GPU value propositions and product proposals
Distill sophisticated analyses into clear recommendations for both technical and non-technical audiences
What we need to see:
5+ years of total experience in technology with previous product management, AI related engineering, design or development experience highly valued
BS or MS or equivalent experience in engineering, computer science, or another technical field. MBA a plus.
Deep understanding of fundamentals of GPU architecture, Machine Learning, Deep Learning, and LLM architecture with ability to articulate relationship between application performance and GPU and data center architecture
Ability to develop intuitive models on the economics of data center workloads including data center total cost of operation and token revenues
Demonstrated ability to fully contribute to above areas within 3 months
Strong desire to learn, motivated to tackle complex problems and the ability to make sophisticated trade-offs
Ways to stand out from the crowd:
2+ years direct experience in developing or deploying large scale GPU based AI applications, like LLMs, for training and inference
Ability to quickly develop intuitive, first-principles based models of Generative AI workload performance using GPU and system architecture (FLOPS, bandwidths, etc.)
Comfort and drive to constantly stay updated with the latest in deep learning research (academic papers) and industry news
Track record of managing multiple parallel efforts, collaborating with diverse teams, including performance engineers, hardware architects, and product managers
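The first-principles workload models described above (using FLOPS, bandwidths, etc.) often start from a simple roofline-style estimate. The sketch below is illustrative only; the hardware numbers are hypothetical placeholders, not a real GPU specification.

```python
# Roofline-style sketch: is a workload compute-bound or bandwidth-bound?
# Hardware numbers below are hypothetical placeholders, not a real GPU spec.

def attainable_tflops(arithmetic_intensity: float,
                      peak_tflops: float,
                      mem_bw_tb_s: float) -> float:
    """Attainable throughput = min(peak compute, bandwidth * intensity).

    arithmetic_intensity is in FLOPs per byte moved from memory.
    """
    return min(peak_tflops, mem_bw_tb_s * arithmetic_intensity)

PEAK_TFLOPS = 1000.0   # hypothetical peak dense FP16 throughput
MEM_BW_TB_S = 3.0      # hypothetical HBM bandwidth in TB/s

# Small-batch autoregressive decode reuses each weight byte only a
# couple of times (~2 FLOPs/byte), so it is heavily bandwidth-bound.
decode = attainable_tflops(2.0, PEAK_TFLOPS, MEM_BW_TB_S)      # 6.0
# Large-batch prefill has high reuse (say ~500 FLOPs/byte): compute-bound.
prefill = attainable_tflops(500.0, PEAK_TFLOPS, MEM_BW_TB_S)   # 1000.0
print(decode, prefill)
```

Two lines of arithmetic already reproduce a key intuition of the role: LLM decode performance tracks memory bandwidth while prefill tracks peak FLOPS, which shapes both GPU architecture proposals and token economics.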
Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family www.nvidiabenefits.com/
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 144,000 USD - 218,500 USD for Level 3, and 168,000 USD - 258,750 USD for Level 4.
You will also be eligible for equity and benefits.

What You Will Be Doing:
Lead Technical Engagement: Engage with senior technical leaders and research teams at AI model builders. Optimize their workflows by leveraging NVIDIA's complete stack for their end-to-end generative AI workflows. Serve as a primary technical point of contact.
Drive Integration: Accelerate the technical integration of NVIDIA's core generative AI technologies. This includes NVIDIA GPU architectures, DGX systems, high-performance networking (InfiniBand), CUDA-X libraries, NeMo frameworks, and inference libraries like TensorRT. Integrate these into the training and inference pipelines of large model builders.
Strengthen Partnerships: Support and strengthen technical implementation plans with partner AI engineering and researchers. Define clear technical objectives, performance breakthroughs, and timelines. Align these with their long-term model development goals and NVIDIA's AI strategy.
Influence Product Roadmaps: Represent the software needs of large model builders to internal NVIDIA product and engineering teams. Contribute to product roadmap decisions by synthesizing findings from large-scale model training and inference environments. Identify cross-industry patterns and advocate for improvements to NVIDIA's core technologies.
Maintain Strategic Relationships: Conduct regular cadence meetings. Document insights, track progress, and provide consistent internal reporting on the adoption and impact of NVIDIA technologies.
Showcase Best Practices: Share standard methodologies for crafting and optimizing highly scalable generative AI model development pipelines across all stages. Focus on the context of large model development.
Stay Updated: Keep current with the latest NVIDIA hardware, libraries, and system updates. Proactively share relevant insights and optimizations with partner model development teams.
What We Need To See:
B.S. degree or equivalent experience.
7+ years of experience in technical product or engineering roles. Focus areas include AI/ML, high-performance computing, or distributed systems. Emphasis on core technology integration and partner collaborations is key.
Extensive experience working with or developing platforms that facilitate large-scale AI/ML training and inference workloads. This includes distributed systems, data infrastructure, and groundbreaking GPU cluster technologies.
Hands-on knowledge of large model architectures (e.g., Transformers, Diffusion Models). Familiarity with core deep learning frameworks (e.g., PyTorch, JAX), and NVIDIA AI acceleration libraries (e.g., CUDA, cuDNN, NCCL, TensorRT, NeMo). Understand techniques for model customization, distributed training, and inference orchestration.
Strong understanding of compute infrastructure environments. This includes GPU cluster management, high-speed networking, parallel file systems, and deployment across on-premise and cloud infrastructures. Possess specific understanding of how large model builders operate at scale.
Proven ability to communicate and influence senior leadership across engineering and research leaders at partner organizations. Link NVIDIA technology capabilities to crucial AI model development and business value.
A track record of successfully navigating fast-paced environments and taking decisive action to achieve results, especially in AI research collaborations.
Skilled at connecting with engineers, researchers, executives, and multi-functional teams.
Ways to Stand Out From The Crowd:
Hands-on experience with large language models (LLMs), diffusion models, distributed training frameworks, and advanced optimization techniques. Ability to prototype quickly and integrate into model development pipelines.
Influence complex product and research decisions by nurturing positive relationships and understanding model builder needs.
Eager drive, strategic curiosity. Anticipate market trends in AI, shape NVIDIA's roadmap, and champion innovation. Understand the large model builder landscape.
Act as a technical advocate for NVIDIA GPU systems and software stack within assigned large model builder partners. Showcase its technical capabilities and strong value proposition.
Understanding of large-scale system performance optimization, container orchestration (e.g., Kubernetes), and Cloud Native technologies for AI workloads.
Join NVIDIA at a crucial time as we pioneer Generative AI growth. We are in the infancy stage of building and scaling our Generative AI business for large model development. This role offers a unique opportunity to join this rapid expansion. NVIDIA's hardware, systems, and software libraries are at the heart of this growth. They empower large model builders to revolutionize their operations with powerful AI capabilities. This is your chance to be a key member of a team that will shape the future of AI model development, working with the world's leading AI research labs and the most innovative technologies. Your contributions will directly impact the trajectory of our Generative AI success, making this an unparalleled opportunity for professional growth and significant impact.
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD.
You will also be eligible for equity and benefits.

This role will focus on AI inference at scale, ensuring that customers and partners understand how to best leverage the potential of NVIDIA’s accelerated computing solutions. You will collaborate closely with engineering, product teams, field teams, and ecosystem partners to develop compelling technical narratives, competitive positioning, and technical content that drives adoption.
What You’ll Be Doing:
Develop Technical Positioning & Messaging – Translate NVIDIA’s AI inference and accelerated computing technologies into clear, impactful messaging that resonates with hyperscale data centers and enterprise AI customers.
Build Foundational Content – Develop whitepapers, technical blogs, solution briefs, presentations, explainer videos, and demos that highlight NVIDIA’s AI inference capabilities.
Engage with Engineering & Product Teams – Work closely with internal teams to deeply understand product features, roadmaps, and competitive differentiators.
Conduct Competitive Analysis – Analyze competitors' hardware and software solutions, using data and customer feedback to inform NVIDIA’s positioning.
Support Sales & Partner Enablement – Develop training materials, sales enablement tools, and technical content to empower internal teams, partners, and customers.
Lead Go-To-Market Execution – Partner with campaign marketing, field operations, and partner marketing to ensure seamless product launches and market adoption.
Improve NVIDIA’s AI Thought Leadership – Contribute to industry-firsts, customer success stories, analyst briefings, and high-visibility speaking engagements.
What We Need to See:
A BS degree in Computer Science or an engineering-related field with a Master's in Business Administration, or equivalent experience in a product marketing role
7+ years of experience in product marketing, technical marketing, or customer-facing engineering roles. Must be passionate about AI/ML workloads at scale.
Technical Expertise – Deep understanding of modern data center architectures, accelerated computing, distributed inference, deep learning frameworks (PyTorch, TensorFlow, JAX), and inference-specific frameworks & optimizations (Triton Inference Server, TensorRT-LLM, vLLM, SGLang).
Market Awareness – Experience conducting technical competitive analysis and synthesizing key insights.
Collaboration & Influence – Proven ability to work cross-functionally across engineering, product management, sales, and marketing teams.
Strong Communication & Storytelling – Ability to translate sophisticated technical concepts into clear, compelling narratives for both technical and business audiences.
Ways to Stand Out from the crowd:
Experience working with hyperscale cloud providers or large-scale AI deployments.
Hands-on experience with AI inferencing workflows using NVIDIA or open-source serving frameworks running on accelerated computing in the data center.
Hands-on Technical Competence – Background in software development, cloud AI infrastructure, and technical writing is a plus.
Demonstrated ability to engage with executive leadership and external partners.
Published technical content or speaking experience at industry events.
NVIDIA is widely considered to be one of high technology's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. Our goal is to craft an environment where you can do your life's best work. If you're creative, self-motivated, and autonomous, we want to hear from you!
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 160,000 USD - 253,000 USD for Level 4, and 200,000 USD - 322,000 USD for Level 5.
You will also be eligible for equity and benefits.

What You'll be Doing:
Collaborating with internal and external deep learning engineers and researchers to build product-based training material and how-to technical content
Being the champion for AI among the NVIDIA developer community by interacting and answering questions about the product on Github and other forums
Improving product documentation to be clear and self-explanatory
Channeling customer usability feedback from the external community and partnering with internal teams to make NVIDIA AI platforms the easiest to use
Providing code guidelines to DL developers by implementing samples and proof of concept applications
Benchmarking and generating data for better positioning of NVIDIA's SW product
What We Need to See:
Bachelor’s degree in Computer Science, Computer Engineering, or similar field or equivalent experience
5+ years of meaningful work experience in software development, technical evangelism, technical marketing, developer marketing, or similar at a technology company
3+ years of experience with deep learning or machine learning
Strong knowledge of Python or C/C++, programming techniques, and software development
Strength in presenting to technical audiences and generating content for developers
Prior success in juggling multiple projects at a time
Ways to Stand Out from the Crowd:
Advanced knowledge of LLMs, modern AI software architecture and cloud APIs
Existing public facing technical content, forum contributions or open source projects
Familiarity with PyTorch, JAX, vLLM or other training & inference frameworks
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 128,000 USD - 201,250 USD for Level 3, and 160,000 USD - 253,000 USD for Level 4.
You will also be eligible for equity and benefits.

As a Developer Relations Manager for NVIDIA Metropolis, you will play a pivotal role in accelerating the adoption of AI-powered systems that sense, reason, and act. You’ll empower developers and cultivate partnerships across a landscape of intelligent environments. Your efforts will directly support NVIDIA’s vision for smarter, safer, and more efficient cities, manufacturing, and warehouses—leveraging video analytics and automation. Whether enabling robotics or any AI-infused infrastructure, you’ll help bring intelligence to the physical systems that are redefining our world.
What You’ll be doing:
Build and nurture relationships with ISVs (50+) and partners in industries where AI is redefining operations
Engage and support developers, partners, and industry leaders to foster innovation and adoption of AI-powered solutions
Provide technical guidance to help developers integrate NVIDIA’s latest AI technologies for real-time applications
Collaborate with engineering, product, and marketing teams to drive developer engagement and ecosystem growth
Accelerate early adoption of new products and support launches and go-to-market strategies
What We need to see:
10 years of experience in these technical areas
Bachelor’s degree in computer science, engineering, or a related field (or equivalent experience)
Proven success in developer relations, technical evangelism, or a similar technology role
Strong technical background in AI: training, simulation, inferencing, intelligent video analytics, and vision-language models (VLMs)
Excellent communication and relationship-building skills across the organization
Demonstrated leadership in driving partner engagement and achieving bold goals
Ways To Stand Out From The Crowd:
Hands-on experience with AI for physical systems, such as robotics, autonomous vehicles, or intelligent devices, or with NVIDIA platforms (Metropolis, Cosmos, Omniverse, Isaac, CUDA-X)
Experience in industrial automation, intelligent spaces, or AI-enabled physical environments
With highly competitive salaries and a comprehensive benefits package, NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us, and, due to outstanding growth, our special engineering teams are growing fast. If you're a creative and autonomous professional with a genuine passion for technology, we want to hear from you!
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.
You will also be eligible for equity and benefits.

Do you have the rare blend of both technical and marketing skills? We need hard-working and creative people who want to work on state-of-the-art technology and are passionate about supporting developers. If so, we would love to learn more about you.
What you’ll be doing:
Build product positions - Collaborate with business leaders across NVIDIA to understand and communicate the value of our products to developers. You will gather evidence, develop compelling product claims, and establish positioning points that highlight our strengths and address our competitors' weaknesses.
Introduce products - Develop and execute well-crafted marketing plans, ensuring consistent messaging across all materials. Collaborate with a diverse cross-functional team, including product management, technical marketing, engineering, campaign managers, and PR, to effectively implement these plans.
Foster awareness - Segment and target audiences, identify asset gaps, and collaborate with technical teams to build developer-centric marketing content. This includes generating deep technical blogs, webinars, tutorials, and more to showcase the outstanding features and capabilities.
Public engagement - Represent NVIDIA at trade shows, conferences, and customer meetings. Evangelize and nurture the use of our software development kits to grow the NVIDIA developer community.
What we need to see:
MS/PhD in Computer Science or Engineering or equivalent experience
10+ years of meaningful work experience in a technical marketing role related to deep learning software.
Technical expertise - Familiarity with popular large language models like DeepSeek, GPT-OSS, Gemma and Phi and an understanding of optimization techniques for accelerating training and inference workloads.
Frameworks ecosystem knowledge - Experience with compilers such as OAI Triton, XLA, MLIR, and frameworks like PyTorch, JAX, vLLM, sglang.
Programming skills - Proficiency in modern programming languages like Python.
Communication skills - Outstanding written and verbal communication and interpersonal skills, with a proven ability to articulate value propositions to both technical and non-technical audiences.
Project management - Demonstrated ability to prioritize projects, commit to getting things done, and work independently.
Entrepreneurial approach - A willingness to work on new products and technologies with an entrepreneurial spirit.
Writing samples - Please include samples of public-facing technical content you’ve built.
Ways to stand out from the crowd:
Product marketing experience - Experience in marketing accelerated computing software products for AI frameworks
NVIDIA ecosystem knowledge - Familiarity with NVIDIA GPUs and the CUDA parallel programming model
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.
You will also be eligible for equity and benefits.

What you will be doing:
Scoping, designing, and implementing high quality and performance numerical dense linear algebra software on GPUs.
Owning the execution of projects involving multiple engineers and sometimes teams.
Providing technical leadership and feedback to library engineers working with you on projects, and sometimes mentoring interns.
Working closely with product management and other internal and external customers to understand feature and performance requirements and contribute to the technical roadmaps of libraries.
Finding opportunities to improve library performance and reduce code maintenance overhead through re-architecting.
To succeed in these inherently sophisticated responsibilities, you will need to devise and explain complex solutions, exercise leadership, and coordinate with multiple teams to work towards your goals.
What we need to see:
PhD, Master’s, or Bachelor's degree in Computer Science, Applied Math, or related science or engineering field of study (or equivalent experience).
8+ years of experience designing, developing, testing, maintaining, and optimizing the performance of HPC software using C++.
Strong fundamentals in kernel generation and composable library design for linear algebra.
Leadership skills in driving software development projects.
Strong collaboration, communication, and documentation habits.
Experience with kernel generation; a JIT compilation focus or experience is desired.
Ways to stand out from the crowd:
Experience with parallel programming, ideally using CUDA, MPI, OpenMP, OpenACC, pthreads.
Good understanding of Machine Learning and Deep Learning technologies as well as knowledge of GPU (preferred) or CPU hardware architecture.
Experience with low level programming using assembly for performance optimization and operator fusion is a huge plus.
Experience with agile software development practices using project management tools such as JIRA.
A scripting language, preferably Python.
With a competitive salary package and benefits, NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. Are you a creative and autonomous GenAI Engineer, who loves challenges? Do you have a genuine passion for advancing the state of AI & machine learning across a variety of industries? If so, we want to hear from you.
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.
You will also be eligible for equity and benefits.

The Deep Learning Frameworks Team @ NVIDIA is responsible for building nvFuser, an advanced compiler that sits at the intersection of compiler technology and high-performance computing. You'll work closely with the PyTorch Core team and collaborate with Lightning-AI/Thunder, which integrates nvFuser to accelerate PyTorch workloads. We collaborate with hardware architects, framework maintainers, and optimization experts to create compiler infrastructure that advances GPU performance, evolving manual optimization techniques into systematic, automated compiler optimizations.
What you'll be doing
As an nvFuser engineer, you'll work on exciting challenges in compiler technology and performance optimization! You'll design algorithms that generate highly optimized code from deep learning programs and build GPU-aware CPU runtime systems that coordinate kernel execution for maximum performance. Working directly with NVIDIA's hardware engineers, you'll master the latest GPU architectures while collaborating with optimization specialists to develop innovative techniques for emerging AI workloads. From debugging performance bottlenecks in thousand-GPU distributed systems to influencing next-generation hardware design, we push the boundaries of what's possible in AI compilation.
What we need to see
MS or PhD in Computer Science, Computer Engineering, Electrical Engineering, or related field (or equivalent experience).
4+ years advanced C++ programming with large codebase development, template meta-programming, and performance-critical code.
Strong parallel programming experience with multi-threading, OpenMP, CUDA, MPI, NCCL, NVSHMEM, or other parallel computing technologies.
Demonstrated experience with low-level performance optimization and systematic bottleneck identification beyond basic profiling.
Performance analysis skills: experience analyzing high-level programs to identify performance bottlenecks and develop optimization strategies.
Collaborative problem-solving approach with adaptability in ambiguous situations, first-principles based thinking, and a sense of ownership.
Excellent verbal and written communication skills.
Ways to stand out from the crowd
Experience with HPC/Scientific Computing: CUDA optimization, GPU programming, numerical libraries (cuBLAS, NCCL), or distributed computing.
Compiler engineering background: LLVM, GCC, domain-specific language design, program analysis, or IR transformations and optimization passes.
Deep technical foundation in CPU/GPU architectures, numeric libraries, modular software design, or runtime systems.
Experience with large software projects, performance profiling, and demonstrated track record of rapid learning.
Expertise with distributed parallelism techniques, tensor operations, auto-tuning, or performance modeling.
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.
You will also be eligible for equity and benefits.

The hardware and software accelerated computing ecosystem is constantly evolving, including shifts towards hybrid backends, deep integration with high-level languages and ecosystems (such as Python, NumPy, JAX, MLIR…), and optimization at runtime for maximum flexibility and performance. Our Dx APIs allow developers to embed highly optimized mathematical operations in their applications for these and other scenarios. You will be part of a team designing, developing, and optimizing math libraries for the future. If you are passionate about designing modern HPC libraries and want to build software that will stand the test of time as it accelerates countless applications, we might have the dream job you have been waiting for!
What you'll be doing:
Design modern, flexible, and easy-to-use APIs for math libraries and lead design reviews with all collaborators.
Work closely with internal (e.g., engineering, Product Management) and external partners such as researchers to understand their use cases and requirements.
Become a domain expert by continuously surveying current trends in software systems.
What we need to see:
PhD or MSc degree in Computer Science, Applied Math, or a related science or engineering field is preferred (or equivalent experience).
3+ years of experience designing and developing software for high-performance computing and/or AI applications.
Advanced C++ skills, including modern design paradigms (e.g., template meta-programming, RAII).
Parallel programming experience with CUDA or OpenCL.
Strong collaboration, communication, and documentation habits.
Ways to stand out from the crowd:
Experience using graph compilers and/or Just In Time compilation workflows (e.g. XLA, LLVM, MLIR, Numba, NVRTC).
Programming skills with Python, and modern automation setups for both building software (e.g. cmake) as well as testing it (e.g. CI/CD, sanitizers).
Experience with CCCL, OpenMP, OpenACC, multi-threading, MPI, PGAS.
Strong background in numerical methods (e.g., FFT, numerical linear algebra).
Experience with scientific and deep learning libraries and frameworks such as PyTorch, JAX, MKL, MAGMA, PETSc, Kokkos, etc.
With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us, and, due to unprecedented growth, our engineering teams are growing rapidly. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you!
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 148,000 USD - 235,750 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.
You will also be eligible for equity and benefits.

The hardware and software accelerated computing ecosystem is constantly evolving, including shifts towards hybrid backends, deep integration with high-level languages and ecosystems (such as Python, NumPy, JAX, MLIR…), and optimization at runtime for maximum flexibility and performance. Our libraries follow the CUDA Everywhere approach, letting developers use highly optimized mathematical operations on all hardware available in the NVIDIA ecosystem. You will be part of a team designing, developing, and optimizing math libraries for the future. If you are passionate about designing modern HPC libraries and want to build software that will stand the test of time as it accelerates countless applications, we might have the dream job you have been waiting for!
What you'll be doing:
Design modern, flexible, and easy-to-use APIs and kernels for math libraries and lead design reviews with all collaborators.
Work closely with internal (e.g., Engineering, Product Management) and external partners such as researchers to understand their use cases and requirements.
Work with internal and external customers to deliver timely math libraries releases.
Become a domain expert by continuously surveying current trends in software systems.
What we need to see:
PhD or MSc degree in Computer Science, Applied Math, or a related science or engineering field is preferred (or equivalent experience).
12+ years of experience designing and developing software for high-performance computing and/or AI applications.
Advanced C++ skills, including modern design paradigms (e.g., template meta-programming, RAII).
Parallel programming experience with CUDA, OpenCL or vector programming on CPU (AVX, NEON or similar).
Strong collaboration, communication, and documentation habits.
Experience with ARM, RISC-V and/or x86_64 CPU architectures.
Ways to stand out from the crowd:
Strong background in numerical methods (e.g., FFT, numerical linear algebra).
Programming skills with Python, and modern automation setups for both building software (e.g. cmake) as well as testing it (e.g. CI/CD, sanitizers).
Experience with cross-compilation, setting up CPU/GPU/accelerator (cross-)compilation toolchains and bringing existing codes to new architectures.
Background with CCCL, OpenMP, OpenACC, multi-threading, MPI, PGAS.
Experience with scientific and deep learning libraries and frameworks such as PyTorch, JAX, MKL, MAGMA, PETSc, Kokkos, etc.
With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us, and, due to unprecedented growth, our engineering teams are growing rapidly. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you!
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 224,000 USD - 356,500 USD for Level 5, and 272,000 USD - 425,500 USD for Level 6.
You will also be eligible for equity and benefits.

In this role, you will work together with other developers on developing solutions that involve generalizations to sparse tensor computations, domain specific language (DSL) specifications of sparse storage formats, and on-demand code generation. Ideal candidates will not only have experience developing accelerated computing software, but also be motivated to advance the state-of-the-art in a variety of accelerated computing domains and DL frameworks like PyTorch. If this sounds exciting, we would love to meet you!
What you will be doing:
Design and develop a C++-based system to simplify and accelerate computing for unstructured sparsity in DL and HPC on NVIDIA GPUs
Enable the system in languages and frameworks that are more commonly used in DL, such as Python and PyTorch
Evaluate and improve the performance of the system on real-life applications
Realize opportunities to improve library quality, performance and maintainability by writing effective and well-tested code for production use
Work closely with product management and other internal and external partners to understand feature and performance requirements and contribute to technical roadmaps
What we need to see:
BS, MS or PhD degree in Computer Science, Applied Math, or related field (or equivalent experience)
6+ years of overall experience in developing, debugging and optimizing high-performance software using C++ and parallel programming; ideally for sparse linear algebra applications and using CUDA, MPI, OpenMP, or equivalent technologies
Experience with domain-specific language design and compiler optimizations, in particular sparse compilers (MLIR or TACO)
Excellent C++, Python, and CUDA programming skills
Strong collaboration, communication, and documentation habits and ideally experience with working in a globally distributed organization
Ways to stand out from the crowd:
Strong understanding of sparse computations, in particular sparsity in AI and HPC
Good understanding of LLMs, Deep Learning methods and frameworks
Experience with low-level GPU performance optimization
Understanding of numerical linear algebra methods like direct and iterative solvers
Experience adopting and advancing software development practices such as CI/CD systems, and project management tools such as JIRA.
NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing for science and engineering. More recently, GPU deep learning ignited modern AI — the next era of computing — with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company.” We're looking to grow our company and build our teams with the smartest people in the world! Join us at the forefront of technological advancement. NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and talented people in the world working for us. If you're creative, autonomous and love a challenge, we want to hear from you!
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.
You will also be eligible for equity and benefits.

What you will be doing:
Partner with system architects, product definition engineers, software/firmware engineers, HW/SW applications engineers and operations, in a multifaceted, dynamic, high-energy work environment to bring industry-defining products to market.
Recommend options and advocate for the needs of profiling tools in hardware development discussions. Explain and provide use cases for new hardware feature exposure in software development tools.
Design & build the premier multi-discipline GPU+CPU+networking profiling tool in the industry. You will have the opportunity to work with researchers and real-world developers who are doing important work to improve computers and computing systems.
Build software tools that enable developers across a spectrum of markets to optimize their workflows - from complex computer systems doing ongoing work in High Performance Computing (HPC), machine learning, deep learning, artificial intelligence, autonomous machines, professional visualization, and even entertainment, down to tiny embedded and automotive systems. Our tools span the gamut.
Your day-to-day work will center on coding, primarily in C/C++ with some Python. You'll work with customers and engineers across teams to explore problems, find solutions, write functional requirements docs and designs, drive execution, and deliver multi-functional software solutions.
What we need to see:
BS, MS, or PhD in EE, CE, CS, or Systems Engineering (or equivalent experience).
12+ years of experience in a related hardware/software position.
Excellent problem solving, collaborative, and interpersonal skills. Experience working with international teams preferred.
Experience creating monitoring, profiling, or optimization software tools for developers working on large scale systems.
Ways to stand out from the crowd:
Experience with multiple architectures (x86, ARM, Power) or multiple operating systems (Windows, Linux, macOS).
Proven track record of crafting engineering designs, driving them to consensus within teams, and bringing them to fruition.
Understanding of the intricacies of complex computer hardware and how that affects performance. Experience working collaboratively with chip design engineers a major plus.
With competitive salaries and a generous benefits package, NVIDIA is widely considered to be one of the technology world’s most desirable employers. Our diverse team of talented, capable, and professional people is our greatest asset! If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you!
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 224,000 USD - 356,500 USD for Level 5, and 272,000 USD - 425,500 USD for Level 6.
You will also be eligible for equity and benefits.

We are seeking a Machine Learning Compiler Engineer with deep expertise in compiler technologies to join our team. The ideal candidate will bring broad experience across the machine learning landscape—including reinforcement learning, genetic/evolutionary algorithms, predictive modeling, complex systems, and the analysis/manipulation of high-dimensional data—while also having strong foundations in compiler design and domain-specific languages.
What you'll be doing:
Become part of a team committed to advancing novel and inventive solutions in compilers and development tools, focusing on applied machine learning and artificial intelligence.
Work alongside committed team members in leading tech sectors, innovating in machine learning, system design, and more, and influencing global products.
What we need to see:
BS/MS/PhD in Computer Science or related field (or equivalent experience) with focus on machine learning and compiler development tools
8+ years of software engineering and ML experience (tools development preferred)
Strong knowledge of compilers, code generation, and GPU architecture
Proficiency in Python, C/C++, Julia, Lisp/Scheme
Solid mathematical and scientific foundation relevant to ML and compiler technologies
Ways to Stand Out From the Crowd
10+ years of hands-on experience working with Python, C/C++, Julia, Lisp/Scheme
Familiarity with reinforcement learning, genetic/evolutionary algorithms, predictive modeling, and complex systems
Expertise in developing and deploying AI/ML solutions to production environments and embedded systems
Hands-on experience building compilers or compiler components using the LLVM framework, including optimization passes, code generation, or frontend integration
With highly competitive salaries and a comprehensive benefits package, NVIDIA is widely considered to be one of the technology industry's most desirable employers. We have some of the most brilliant and hardworking people in the world working with us and our product lines are growing fast in some of the hottest state of the art fields such as Deep Learning, Artificial Intelligence, Autonomous Vehicles, Virtual Reality, etc. Our diverse team of talented, capable, and professional people are our greatest asset! If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you!
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.
You will also be eligible for equity and benefits.

What you'll be doing:
Provide leadership and strategic mentorship on the management of large-scale HPC systems including the deployment of compute, networking, and storage.
Develop and improve our ecosystem around GPU-accelerated computing including developing scalable automation solutions.
Build and nurture customer and cross-team relationships to consistently support the clusters and address changing user needs.
Support our researchers to run their workloads including performance analysis and optimizations.
Conduct root cause analysis and suggest corrective action. Proactively find and fix issues before they occur.
Build innovative tooling to accelerate researchers' velocity, troubleshooting, and software performance at scale.
What we need to see:
Bachelor’s degree in Computer Science, Electrical Engineering or related field or equivalent experience.
Minimum of 6 years of experience crafting and operating large scale compute infrastructure.
Experience with AI/HPC job schedulers and orchestrators, such as Slurm, K8s or LSF. Applied experience with AI/HPC workflows that use MPI and NCCL.
Proficient in using Linux, including CentOS/RHEL and/or Ubuntu distributions. A solid understanding of container technologies like Enroot, Docker, and Podman.
Proficiency in one scripting language (Python, Bash) and at least one compiled language (Golang, Rust, C, C++...).
Experience analyzing and tuning performance for a variety of AI/HPC workloads. Excellent problem-solving skills to analyze complex systems, identify bottlenecks, and implement scalable solutions.
Excellent communication and teamwork skills, with the ability to work effectively with diverse teams and individuals.
Passion for continual learning and staying ahead of new technologies and effective approaches in the HPC and AI/ML infrastructure fields.
Ways to stand out from the crowd:
Experience with NVIDIA GPUs, CUDA Programming, NCCL and MLPerf benchmarking.
Experience with Machine Learning and Deep Learning concepts, algorithms and models.
Familiarity with High-Speed Networking pertaining to HPC including InfiniBand, RDMA, RoCE and Amazon EFA.
Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC workload. Experience working with deep learning frameworks including PyTorch, MegatronLM and TensorFlow.
Familiarity with metrics collection and visualization at scale with Prometheus, OpenSearch and Grafana.
NVIDIA offers competitive salaries and benefits. Our experienced and talented employees contribute to our outstanding engineering team's rapid growth. If you're a tech enthusiast, apply now!
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.
You will also be eligible for equity and benefits.

CUDA defines a unified programming model across a range of system configurations and hardware capabilities. To accomplish this, the CUDA driver interacts with GPU hardware, kernel mode drivers, switches, and the operating system. As a member of our team, you will use your design abilities, coding expertise, and creativity to deliver the best compute platform in the world. You will craft elegant solutions to exciting problems and shape the future direction of CUDA as you collaborate with your peers across NVIDIA.
What you'll be doing:
You will evangelize, architect, and implement new CUDA features
You'll oversee and drive development efforts across multiple teams
Collaborate with members of hardware architecture teams
Help define forward-looking improvements to the CUDA APIs and programming model
Design and maintain performance and precision modeling
Write effective, maintainable, and well-tested code
Develop code for multiple operating systems
What we need to see:
Bachelor of Science or Master of Science degree in Computer Science, Electrical Engineering, or related field (or equivalent experience)
15+ years of relevant systems software development experience
Strong C programming skills
Experience designing, debugging, and maintaining complex software stacks
Experience with operating system interfaces for threads, process control, and virtual memory
Experience with HW/SW co-design, performance modeling using emulation/simulation, and creating SW programming model exposures for HW features
Strong interpersonal, verbal, and written communications skills with a capability to achieve objectives under tight deadlines
Strong collaborative and interpersonal skills, specifically a proven ability to effectively guide and influence within a dynamic matrix environment
Ways to stand out from the crowd:
You have an understanding of system level architecture, such as interconnects, memory hierarchy, interrupts, and memory-mapped IO
Experience designing and implementing drivers that program rich HW acceleration engines, and writing software verification test plans.
Knowledge of CPU, GPU architectures, memory coherence and consistency models
Some familiarity with kernel mode development
Some familiarity with C++
NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars. NVIDIA is looking for phenomenal people like you to help us accelerate the next wave of artificial intelligence. NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 272,000 USD - 425,500 USD.
You will also be eligible for equity and benefits.

CUDA defines a unified programming model across a range of system configurations and hardware capabilities. To accomplish this, the CUDA driver interacts with GPU hardware, kernel mode drivers, and the operating system. The Unified Memory kernel driver provides the kernel memory management that enables these advanced features.
What you'll be doing:
As a member of our team, you will apply your design skills, coding expertise, and creativity, collaborating with peers across NVIDIA to deliver the best compute platform in the world. You will craft sophisticated solutions to exciting problems, shaping the future direction of CUDA!
Architect and implement new features for new chips and new kernel capabilities
Coordinate with other teams to accomplish your work daily
Help define forward-looking improvements to the CUDA APIs and programming model
Write effective, maintainable, and well-tested kernel and userspace code
Develop code for multiple Linux operating systems
What we need to see:
BS or MS degree in Computer Science, Electrical Engineering or related field (or equivalent experience)
Strong C programming skills
10+ years of related development experience
Experience working with large codebases
Background with operating system interfaces for threads, process control, and virtual memory
Experience writing and debugging multithreaded programs
Good written communication
Ways to stand out from the crowd:
Understanding of system level architecture, such as interconnects, memory hierarchy, interrupts, and memory-mapped IO
Knowledge of memory coherence and consistency models
Experience with kernel mode development
Experience with Windows, Linux, or macOS driver development
Some familiarity with C++
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.
You will also be eligible for equity and benefits.

What you'll be doing:
Define and drive the overall test engineering strategy for our robotics products, establishing an engineering-focused approach to quality that encompasses the design of scalable test frameworks, tools, and methodologies, and lead product testing and engineering efforts across a full-stack robotics solution.
Deliver test automation development for “On-Robot Hardware & Software” and embedded software running on our NVIDIA Jetson AGX Thor platform encompassing sensor integration validation, real-time performance, and low-level control systems.
Architect and implement rigorous test plans to validate “AI Foundation models”, like Isaac GR00T, focused on performance, safety, and reliability in real-world scenarios. This involves testing model outputs, identifying edge cases, and ensuring robustness.
Build and maintain the test infrastructure within our “Simulation & Digital Twins” environments, Isaac Sim and Isaac Lab, developing automated tests verifying physics engine accuracy and the transferability of learned policies from simulation to the real robot.
Oversee test automation development for the full “Robotics Software Stack”, including Isaac ROS, and its integration with other robotic components.
Architect, build, and maintain the test infrastructure for a humanoid reference platform showcasing the power of NVIDIA's technology and develop testing methodologies bridging the "sim-to-real" gap, creating a continuous feedback loop between virtual and physical testing.
Lead the design of complex test scenarios in Isaac Sim/Lab that use generative AI to stress-test robot policies, ensuring that models trained in simulation (using Isaac GR00T) perform reliably on real physical hardware.
What we need to see:
B.S. in Mechanical/Electrical/Computer Engineering or a related field (or equivalent experience)
12+ years of overall aligned hardware engineering experience, with 3+ years of demonstrable QA experience involving robotics or hardware engineering
5+ years of experience leading a team
Proficiency with AI Tools such as Cursor, Github Copilot, Perplexity, and ChatGPT
Experience with ML and training/testing robotics models
Experience with fleet management, telemetry, and debugging, along with managing demos
Ability to engage closely with a range of cross-functional teams, including project management and hardware and software developers, and to routinely share statistical data reports with all customers, including Top-5.
Ability to translate organizational goals to QA deliverables
Strong problem-solving and analytical skills, paired with excellent communication skills
Ways to stand out from the crowd:
Incorporating Vision-Language Models (VLMs): designing, developing, and deploying robots that can understand and respond to both visual and linguistic inputs, enabling them to perform complex tasks through natural language commands and visual scene interpretation
Teleoperation: manipulation and setting up data collection labs
Fluent in scripting in Python
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 216,000 USD - 345,000 USD.
You will also be eligible for equity and benefits.

What you’ll be doing:
Develop and execute NVIDIA HGX/DGX/MGX platform test plans covering servers, OS, firmware, and the CUDA SW stack, working from design documents.
Install and test various operating systems, server firmware, and SW stacks.
Drive root cause analysis of reliability and validation test failures and achieve mitigation.
Build, develop, and debug server- and OS-level automation frameworks (front end and back end) and tests
Review partner and supplier test results and prescribe additional reliability testing on components, servers, and packaging as needed.
Work in an agile software development team with very high production quality standards.
Manage the bug lifecycle and collaborate across groups to drive solutions.
What we need to see:
Bachelor’s Degree (or equivalent experience) in a STEM (Science, Technology, Engineering, Math or Physics) field
5+ years of proven experience, or a master’s degree.
Proven experience with OS- and server-level automation, CI/CD processes, and DevOps using Python, shell, Ansible, Jenkins, C/C++, Java, and JavaScript
Strong server and Linux (Ubuntu, RedHat, CentOS, SuSE, Fedora, etc.) troubleshooting and debugging experience in bare-metal and KVM/VMware/Hyper-V environments.
Good knowledge of and hands-on experience with model testing, AI tools/frameworks (TensorFlow, PyTorch, Cursor, etc.), and NLP and LLM benchmarking
Experience using AI development tools for test plan creation, test case development, and test case automation
Strong experience with FW, BMC/OpenBMC, network protocols, internal/external enterprise storage devices, PCIe buses and devices, IO sub-devices, CPU and memory, ACPI, the UEFI spec, and Redfish - huge plus
Proven experience with GitHub/GitLab/Gerrit, PXE, SLURM, and Stack/Kubernetes/Docker - huge plus
Ways to stand out from the crowd:
Experience with AI-related tools, LLMs, and NLP.
Experience working with NVIDIA GPU hardware is a strong plus.
A solid understanding of virtualization in Linux (KVM, Docker orchestrated with Kubernetes) is good to have
A background in parallel programming, ideally CUDA/OpenCL, is a plus
With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us and, due to unprecedented growth, our exclusive engineering teams are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you.
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 136,000 USD - 212,750 USD for Level 3, and 168,000 USD - 264,500 USD for Level 4.
You will also be eligible for equity and benefits.

The Hypervisor and RTOS Team within NVIDIA DRIVE Software plays a critical role in NVIDIA's expansion into the world of artificial intelligence and autonomous vehicles. Our job is to facilitate the sharing and separation of system resources while achieving real-time, safety, and security requirements. We develop the Hypervisor and RTOS with a strong focus on the automotive quality, safety, and security needed for the real-time, highly available system-level components of world-class autonomous vehicles. We are making extensive use of formal methods to automate our workflow and increase the quality of our SW. We are now hiring for the position of Senior System Software Engineer for Hypervisor and RTOS.
What you’ll be doing:
Design and implement core RTOS features
Design and implement core virtualization features using the hardware-assisted virtualization capabilities of NVIDIA Tegra SoCs
Implement industry standard virtualization interfaces
Develop software that meets automotive safety and security standards, and apply formal methods (e.g., TLA+) to improve software integrity.
What we need to see:
BS, MS in CS/CE/EE or a related engineering field or equivalent experience
8+ years of experience
Proficiency in C, C++
Experience in the development of core RTOS and virtualization software features
Strong understanding of operating systems and computer architecture
Experience with the ARM 64-bit architecture
Clear, concise communication skills
Ways to stand out from the crowd:
Knowledge of Automotive quality standards, ASPICE, ISO 26262, ISO 21434
Hands-on experience with formal verification methods and tools, such as TLA+
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and talented people in the world working for us. If you're creative and autonomous, we want to hear from you.
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.
You will also be eligible for equity and benefits.

You should demonstrate the ability to excel in an environment with innovative and fast-paced development on the world's most powerful integrated software and hardware computing platform.
What you’ll be doing:
Core job duties include identifying vulnerabilities in our embedded firmware and critical system software, building proofs of concept, and collaborating with development teams to remediate them.
Candidates will invest in improving current tools and offensive practices for bug discovery and evaluation while supporting remediation efforts. We expect team members to apply modern tools to model new attack vectors on unreleased and emerging technology platforms.
The most impactful candidates can simulate real attacker behaviors, break systems by exploiting design assumptions, and effectively communicate their findings for action. The focus will be on increasing the resilience of the end products against all forms of attack through close collaboration with extended SW and HW offensive security teams.
Product targets span HPC data centers, consumer electronics, autonomous platforms, AI/cloud solutions, and a variety of embedded/IoT platforms, providing a rich and complex target space in which to exercise your skills.
What we need to see:
We'd like to see proven experience in offensive security research (CVEs, publications, patents, tools, bounties) with demonstrated responsible disclosure practices.
Strong skills in reverse engineering and automation (IDA, Ghidra), fuzzing (AFL, WinAFL, Syzkaller), and exploitation (ROP, memory corruption) are important to success, as is an understanding of modern embedded cryptography and common security issues.
Experience with ARM/x86/RISC-V assembly (including shellcode development) and low-level C programming, paired with an understanding of and experience with micro-architectural attacks (side channels, fault injection, etc.), is critical.
Demonstrated skill in secure code review of complex source projects, and exposure to code quality practices (SDL, threat modeling) that support development goals.
Candidates should be comfortable working collaboratively and remotely with others to accomplish complex team goals, enabling delivery of outstanding security for our products.
BS/BA degree or equivalent experience
12+ years in a security related field
Ways to stand out from the crowd:
Navigating complex platform concerns and the ability to analyze composed systems to identify high-risk components and establish testing targets and objectives.
Practical skills using Hex-Rays IDA Pro and plugin/loader development (or similar experience with Ghidra) are valuable
Leveraging innovative strategies and AI advancements to accelerate discovery and resolution of security risks.
Experience with enclave models such as NVIDIA CC, ARM TEE, Intel SGX/TDX, AMD SEV-SNP and other isolation technologies.
Development and integration of AI tooling and skills to accelerate and improve activities, and/or experience with offensive actions targeting AI model (LLM or other) components within those platforms.
NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 fueled the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI - the next era of computing. NVIDIA is a “learning machine” that constantly evolves by adapting to new opportunities that are hard to solve, that only we can pursue, and that matter to the world. This is our life’s work, to amplify creativity and intelligence. Make the choice to join us today!
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 224,000 USD - 356,500 USD for Level 5, and 272,000 USD - 425,500 USD for Level 6.
You will also be eligible for equity and benefits.

🚀 What You’ll Do
As the face of HPE Networking in the region, you’ll:
Lead and inspire a high-performing sales team to exceed ambitious revenue goals.
Own the strategy—from go-to-market planning to execution—for all HPE Networking solutions.
Build deep relationships with key Cloud Providers, understanding their business drivers and aligning them with cutting-edge HPE solutions.
Drive complex deals and lead negotiations that deliver real business value.
Collaborate cross-functionally with product leaders, architects, and technical experts to bring innovative solutions to life.

Hewlett Packard Enterprise is seeking a highly experienced and strategic WW VM Essentials Sales Director to lead our global sales organization. This high-impact leadership role will be instrumental in driving the success of HPE’s VM Essentials strategy, overseeing a team of more than 50 sales and presales professionals, and reporting directly to the WW Hybrid Cloud Ops Software Sales Leader.
Primary Responsibilities
Shape and execute the global GTM strategy to achieve revenue growth, market share expansion, and customer adoption worldwide.
Drive forecast accuracy, pipeline health, and profitable growth through disciplined business inspection.
Lead and manage a diverse global team of sales professionals, establishing clear operational rhythms and accountability aligned with HPE strategic objectives.
Build and sustain a high-performing sales organization through effective recruitment, enablement, and leadership development.
Foster strategic relationships with channel partners, distributors, and ecosystem stakeholders to expand reach and velocity through repeatable partner-led motions.
Monitor and analyze market trends, customer requirements, and competitive dynamics to identify new opportunities and inform strategic decisions.
Travel globally as required to engage with customers, partners, and regional sales teams.

Key Responsibilities
Global Sales‑Play Execution
Architect and operationalize sales plays globally—anchored to category value proposition and tailored for geos, verticals, customer personas, and GTM touchpoints.
Ensure regional field teams understand pipeline criteria, qualification metrics, performance scorecards, and execution checkpoints for each play.
Sales & Execution Alignment
Liaise with GTM leads, SEs, overlays, and known sellers to secure sales-ready alignment—pipeline buy-ins, stage-gating, and play compliance.
Drive field adoption via quarterly onboarding webinars, play clinics, and seller sandboxes (with sales engineering, demos, and close-won showcase walk-throughs).
Channel/MSP/GSI Engagement
Partner with Channel, MSP, and GSI leadership to co-develop regional-localized execution plans: partner messaging, demand campaigns, enablement sessions, deal registration, co-sell incentives, and published scorecards.
Track partner enrollment rates, partner pipeline metrics, and deal progression to finalize pipeline-to-order conversion levels.
Category & Marketing Synchronization
Work hand-in-hand with Category Management, Product Marketing, and Corporate Marketing to maintain updated playbooks, reference assets, support materials, and narrative consistency across all geos.
Ensure marketing launches, field campaigns, event blitzes, and local digital activation are fully integrated into field play cycles.
Execution Measurement & Governance
Design and maintain monthly global dashboards: play KPIs, pipeline progression, attach rates, average deal size, and vertical mix.
Lead quarterly business performance reviews (QBRs) with Geo leads and BU sponsors to evaluate progress vs. plans, diagnose GTM risks, and steer course corrections.
Cross-Region Execution Management
Identify regional execution barriers—lack of sales readiness, demand-gen lags, partner underperformance—and run targeted interventions (e.g., local incentives, tied marketing dollars, training intensives, or SDR pull-in).
Own risk logging and remediation tracking at global level.
Team Leadership and Enablement Coaching
Build and manage a team of ~6–8 execution specialists: play execution architects, regional ops managers, partner-liaison leads, and data/reporting analysts.
Develop a culture of performance accountability, peer coaching, and standardized play deliverables.

Company Overview
At Hewlett Packard Enterprise, everything we do is guided by our purpose – to advance the way people live and work, through engineering experiences that unlock your potential. Through this we seek to make a meaningful contribution to our customers, partners, employees and the communities we serve. It is part of our legacy, and it is our future. We believe technology’s greatest promise lies in its potential for positive change.
We have exciting aspirations and a clear strategy for the future. Our plans are to offer our full portfolio as-a-service and we are now boldly accelerating our organization as an edge-to-cloud platform as a service company. Our Hybrid Cloud and Intelligent Edge platforms, underpinned by our differentiated software solutions and combined with our innovative As-a-Service Consumption models provide our customers with control, security, real-time intelligence and flexibility. By providing the right mix of technology, people, and economics, we optimize our customers IT investments to power their Digital Transformations and help them create the differentiated experiences that unlock their full potential.
For more information on us, please visit www.hpe.com
Role
The North America Zerto Sales Leader is responsible for the strategic direction, execution, and performance of Zerto’s NA sales organization within Hewlett Packard Enterprise (HPE). This role leads a cross-functional leadership team and ensures alignment with HPE’s broader Storage & Data Services strategy to drive growth, customer success, and market expansion. The role reports to the Americas Storage Sales Leader.
Responsibilities
Strategic Leadership
Define and implement the North America sales strategy for Zerto, ensuring alignment with HPE’s hybrid cloud and data protection objectives.
Drive revenue growth, customer acquisition, and market penetration across all of North America.
Lead organizational transformation to integrate Zerto into regional sales motions and accelerate adoption.
Team Oversight
Manage and empower a high-performing leadership team, each responsible for a critical function:
Cyber Resilience Vault Sales – Lead go-to-market strategy and execution for cyber resilience solutions, integrating Zerto into HPE’s broader cyber recovery offerings.
Sales Engineering – Oversee global technical pre-sales support, ensuring solution alignment with customer needs and enabling deal success.
Alliances – Manage strategic partnerships to expand Zerto’s reach and co-sell opportunities.
Renewals – Drive customer retention and contract continuity through proactive lifecycle engagement.
Executive Support – Ensure operational efficiency and executive coordination through dedicated support.
Cross-Functional Collaboration
Partner with Product Management to ensure customer feedback and insights are incorporated into the product roadmap and long-term vision.
Collaborate with Marketing to embed Zerto into major HPE campaigns and drive global demand and pipeline generation.
Build strong relationships with Geo Storage Leaders to foster collaboration, solution awareness, and regional alignment.
Work closely with Finance to provide visibility into business challenges and ensure accurate forecasting and budget planning each quarter.
Navigate the complexity of the Zerto Managed Service Provider (MSP) business by coordinating with stakeholders across multiple functions and business units.
Executive Presence & Governance
Represent Zerto by leading messaging at key events such as HPE Discover, Tech Jam, and Sales Kick-offs, as well as during strategic customer and partner engagements.
Participate in weekly forecast calls to ensure governance, pipeline health, and quarterly performance accountability.
Identify and champion new solution opportunities that drive higher margins, larger deal sizes, and increased relevance for HPE in the data protection space.

In this high-impact, individual contributor role, you will guide the architectural strategy and technical product management for a subset of Juniper’s QFX switching portfolio, with a focus on Trident/Tomahawk scale-up systems and cutting-edge Neo-Cloud solutions. This is a senior technical leadership role—not a people-management position—where your deep expertise will directly shape Juniper’s competitiveness in the fast-evolving AI and hyperscale data center markets.
This role will be based full time in Sunnyvale, CA.
Primary Responsibilities
Lead system architecture and roadmap alignment for QFX products, driving technical differentiation and innovation.
Engage with customers and ecosystem partners at the design and architecture level to ensure Juniper products meet next-gen performance, scalability, and efficiency requirements.
Make critical decisions on technology choices, component selection, and architectural trade-offs for complex switch and router systems.
Collaborate closely with engineering on system design, hardware architecture, power/thermal modeling, and fabric topology.
Analyze market trends and emerging technologies to future-proof Juniper’s data center portfolio.
Communicate technical strategy and product value to customers and stakeholders.

About our cybersecurity team
Are you ready to make an impact at one of the world’s leading tech companies? At HPE, our Cybersecurity team is shaping the future of secure innovation. We’re looking for an experienced Director, Cybersecurity Transformation to join our Cybersecurity team. If you’re passionate about cybersecurity and ready for your next challenge, we’d love to hear from you.
About the Role
We are seeking a skilled and motivated leader to serve as Director, Cybersecurity Transformation. This role, reporting directly to the Vice President of the Office of the CISO, will be a key partner to senior leadership in strengthening HPE’s cybersecurity posture through continuous improvement, transformation, and the successful delivery of complex programs.
This leader will be responsible for managing a portfolio of large-scale cybersecurity transformation programs and inspiring cross-functional teams to achieve ambitious goals. The ideal candidate will excel at partnering across functions and business units, influencing senior executives, and leading teams to deliver measurable business value.
The ideal leader will bring extensive expertise in program management, leadership in technology-driven initiatives, and a modern approach that combines deep cybersecurity knowledge with agile methodologies and disciplined execution.
Success in this role requires meticulous organization, crisp communication and a methodical approach to measurable risk reduction. Impact is achieved through phased delivery—breaking down complex work into manageable increments that produce early wins and build momentum for lasting adoption.
Transformation Leadership
Drive Transformational Change: Lead the planning, execution, and successful delivery of complex, large-scale transformation programs
Champion Change: Foster a culture of continuous improvement, innovation, and agility by driving the adoption of new processes, tools, and mindsets across the organization.
Ensure Value Realization: Establish and oversee frameworks for tracking and measuring the impact of transformation initiatives, ensuring a clear return on investment.
Program Management Office (PMO) Leadership
Portfolio Oversight: Provide executive oversight for the full portfolio of cybersecurity programs, ensuring alignment with strategic objectives and driving data-driven prioritization and resource allocation.
Financial Management: Project and forecast budgets via rigorous financial planning and oversight.
Transparency: Implement a robust governance framework to ensure accountability and transparency, with regular portfolio reporting to executive leadership.
Team and Stakeholder Leadership
Lead a High-Performing Team: Recruit, mentor, and lead a world-class cybersecurity transformation and program delivery team. You will lead internal and external resources, including technical PMs.
Cultivate Global Relationships: Build and maintain strong global relationships with senior and executive management, as well as stakeholders across the organization, from the C-suite to frontline teams.
Drive Consensus and Resolution: Influence, negotiate, and build consensus to drive success, stepping in to mediate critical escalations and driving alignment.
Skills and Competencies
Inspiring Leadership: Ability to motivate teams to achieve ambitious goals, solve complex problems and to navigate organizational challenges.
Strategic and Business Acumen: Strong understanding of business principles, financial management, and the ability to align program execution with long-term business objectives and critical business success factors.
Exceptional Communication: Outstanding interpersonal and communication skills; able to articulate a clear vision and influence at all levels.
Cybersecurity Knowledge: Deep understanding of the cybersecurity landscape, key threats, technologies, and regulatory frameworks.
Change Leadership: Demonstrated ability to drive adoption of new ways of working and embed lasting cultural change.

Division: NE-NERSC
The National Energy Research Scientific Computing Center (NERSC) is inviting applications for the position of Storage Systems Group (SSG) Lead. NERSC’s mission is to accelerate scientific discovery through high performance computing and data analysis for the Department of Energy’s (DOE) Office of Science programs. NERSC is searching for a knowledgeable and inspired group leader for the Storage Systems Group who will be responsible for developing NERSC’s storage strategy based on NERSC’s systems roadmap, science workflows and user needs. They will provide vision and guidance to design, operate and simplify the storage environment for NERSC’s 11,000+ users.
The SSG is responsible for NERSC’s storage portfolio, including large scale high capacity parallel file systems and archival storage systems with an eye towards balancing performance, stability, and usability for NERSC’s users who operate in a wide variety of DOE mission areas and scientific domains. The SSG Lead provides technical leadership to a group of highly skilled storage engineers who collaborate with other teams at NERSC to deliver innovative solutions to complex problems and a technical vision for the future of NERSC storage platforms.
The NERSC storage environment that SSG is responsible for today is composed of multiple tiers:
• The NERSC hierarchical storage management system (presently High Performance Storage System (HPSS)) stores more than 450 PB of data for the scientific community and puts NERSC in the top 10 largest HPSS deployments globally.
• NERSC provides a large-scale parallel community file system (presently Storage Scale) with more than 150 PB of online storage to the user community on an RDMA over Converged Ethernet (RoCE) fabric.
• Home and common storage mounted via Storage Scale on several thousand nodes across NERSC.
In addition to the current environment, SSG will be responsible for the scratch and new quality-of-service storage systems (https://www.nersc.gov/news-and-events/news/doudna-storage-solutions) in NERSC’s latest GPU-based supercomputer, named Doudna (https://www.nersc.gov/what-we-do/computing-for-science/doudna-system), to be operationalized in 2027. Doudna will deliver a tenfold increase in computing power to NERSC users, along with new capabilities. The new Doudna environment will support larger and higher-resolution data sets coming from new sensors, detectors, sequencers, and telescopes from the scientific community, and these data sets will need to be managed, shared, and stored.
The Storage Systems Group Lead is responsible for understanding existing and emerging requirements, and for deploying storage solutions in collaboration with other NERSC teams to support NERSC’s broad user base of today and tomorrow. In doing so, the SSG Lead will drive the development and implementation of a holistic storage strategy to support changing scientific workflows and new technologies as part of Doudna and future NERSC system roadmaps. To accomplish this, the SSG Lead will be responsible for investigating new storage technologies and engaging with the vendor community on future roadmaps. The SSG Lead will work with the Data Center Department Head to provide guidance and priorities for the group based on NERSC’s strategic plan and its goals.
You will:
• Develop NERSC’s storage strategy based on NERSC’s systems roadmap, science workflows and user needs.
• Lead a team that procures, installs, manages, supports and monitors NERSC’s large scale storage systems, including providing 24x7 support.
• Ensure NERSC’s storage systems meet the needs of NERSC’s 11,000 users by providing high performing, available, and usable systems.
• Work independently and as part of the Storage Systems Group to diagnose and fix storage problems, help analyze storage system issues, and develop and implement workarounds and/or patches for software bugs.
• Provide effective line management to a group of approximately 10 Computer Systems Engineers by hiring excellent staff and working closely with SSG staff members. Ensure staff are meeting goals, provide both positive and constructive feedback to staff and ensure all staff have career growth opportunities.
• Provide technical leadership for implementation and deployment efforts for storage system improvements that enhance task automation, reliability, stability, usability, performance, and security.
• Continuously evaluate new storage technologies and make recommendations on future storage strategy and directions for the center, including both parallel and hierarchical storage, that would create new capabilities and enhance storage and HPC system performance and usability.
• Work closely with other teams at NERSC to enable large-scale simulation, data analysis and AI applications to run on NERSC supercomputing and storage systems.
• Provide budgetary input and oversight for NERSC’s storage systems.
• Lead or collaborate on efforts with other Department of Energy (DOE) labs on future storage technologies, multi-lab storage efforts, and other related topics.
• Present at conferences and give talks to promote NERSC to other national labs and HPC sites.
• Create and develop a vision and strategy for the group and be a key part of NERSC’s management team.
We are looking for:
• Bachelor’s degree in Computer Science, Engineering, Applied Mathematics, Computational Science (or related fields) and current applicable systems support and engineering experience, plus a minimum of 3 years of experience in a managerial role leading a complex computer systems, storage, or networking unit.
• Experience with storage technologies in a Linux environment, such as InfiniBand, RoCE, SAN/NAS, NFS, pNFS, hierarchical storage management systems (such as HPSS), Lustre, Storage Scale, VAST, and object stores.
• Prior experience with HPC applications, workflows and computational and storage systems.
• Experience in managing and supporting a 24/7 IT environment.
• Ability to mentor staff to increase their knowledge and skills.
• Deep and broad knowledge of storage technologies such as parallel filesystems (e.g., Storage Scale), hierarchical storage management (e.g., HPSS), distributed storage systems (e.g., VAST), and storage networking (e.g., InfiniBand or RoCE).
• Demonstrated ability to work independently as well as collaboratively in large projects, and contribute to an active intellectual environment.
• Ability to gather requirements from the scientific user community and turn requirements into system characteristics.
• Strong technical and collaboration skills needed to create and deploy innovative ways of allowing our diverse user base to effectively utilize the unique resources that NERSC provides.
• An understanding of how to balance technical solutions with user needs, along with initiative, tact, and good judgment in developing solutions to problems.
• Excellent written and verbal communication skills.
Desired skills/knowledge:
• A Master’s or PhD degree in related fields.
• Knowledge of object storage and non-volatile storage technologies.
• Experience administering and deploying storage systems at tens-of-petabytes (or greater) scale in an HPC environment.
We’re here for the same mission, to bring science solutions to the world. Join our team and YOU will play a supporting role in our goal to address global challenges! Have a high level of impact and work for an organization associated with 17 Nobel Prizes!
Why join Berkeley Lab?
We invest in our employees by offering a total rewards package you can count on:
• Exceptional health and retirement benefits (https://benefits.lbl.gov/), including pension or 401K-style plans
• Opportunities to grow in your career - check out our Tuition Assistance Program (https://hr.lbl.gov/service/development/berkeley-lab-tuition-assistance-program/)
• A culture where you’ll belong - we are invested in our teams!
• In addition to accruing vacation and sick time, we also have a Winter Holiday Shutdown (https://hr.lbl.gov/resource/berkeley-lab-holiday-calendar/) every year.
• Parental bonding leave (for both mothers and fathers)
• Pet insurance
Additional information:
• Application date: Priority consideration will be given to candidates who apply by December 15, 2025. Applications will be accepted until the job posting is removed.
• Appointment type: This is a (full-time/part-time) career appointment, exempt (monthly paid) from overtime pay.
• Salary range: The expected salary for this position is $203,496 - $248,736, which fits within the full salary range of $180,876 - $305,268. Placement depends upon the candidate’s skills, knowledge, and abilities, including education, certifications, and years of experience.
• Background check: This position is subject to a background check. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position. Having a conviction history will not automatically disqualify an applicant from being considered for employment.
• Work modality: This position requires substantial on-site presence, but is eligible for a flexible work mode, and hybrid schedules may be considered. Hybrid work is a combination of performing work on-site at Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley, CA and some telework. Individuals working a hybrid schedule must reside within 150 miles of Berkeley Lab. Work schedules are dependent on business needs. A REAL ID or other acceptable form of identification is required to access Berkeley Lab sites (for more information click here: https://securityandemergencyservices.lbl.gov/).
• Export Control: This position will involve access to hardware, commodities, and technical information subject to export control regulations including, but not limited to, the Export Administration Regulations ("EAR") and/or International Traffic in Arms Regulations ("ITAR"). Accordingly, any hiring decision may depend in part on Berkeley Lab’s ability to obtain or rely on federal government authorizations as required, if you are not a U.S. citizen, lawful permanent resident of the U.S. (“green card holder”), asylee, refugee, or other qualifying protected individual as defined by 8 U.S.C. 1324b(a)(3).
Want to learn more about working at Berkeley Lab? Please visit: careers.lbl.gov
How To Apply
Apply directly online and follow the on-line instructions to complete the application process.
Equal Employment Opportunity Employer: The foundation of Berkeley Lab is our Stewardship Values: Team Science, Service, Trust, Innovation, and Respect; and we strive to build community with these shared values and commitments. Berkeley Lab is an Equal Opportunity Employer. We heartily welcome applications from all who could contribute to the Lab's mission of leading scientific discovery, excellence, and professionalism. In support of our rich global community, all qualified applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, protected veteran status, or other protected categories under State and Federal law.
Berkeley Lab is a University of California employer. It is the policy of the University of California to undertake affirmative action and anti-discrimination efforts, consistent with its obligations as a Federal and State contractor.
Misconduct Disclosure Requirement: As a condition of employment, the finalist will be required to disclose if they are subject to any final administrative or judicial decisions within the last seven years determining that they committed any misconduct, are currently being investigated for misconduct, left a position during an investigation for alleged misconduct, or have filed an appeal with a previous employer.

Responsibilities:
You will be responsible for the support and maintenance of downstream, product-quality firmware for Arm Neoverse CSS-based platform solutions. You will work alongside a distributed firmware development team, contributing to the successful deployment of Arm Neoverse CPU and System IP-based infrastructure platforms. Your responsibilities will include defect management, feature back-porting, and defect fixing. You will work closely with the release team to coordinate downstream release updates for customers’ post-launch releases.
Are you looking for a unique opportunity to be part of a team transforming the computing infrastructure landscape? We would like to hear from you!

We’re hiring software engineers at various levels; production experience with device drivers, graphics/compute APIs, debugging tools, and user-level applications is highly desired.

The hardware engineer should have hands-on experience taking leading-edge digital silicon IP from idea to bring-up, working with DV and PD teams.

Perform detailed workload characterization to identify performance bottlenecks and propose architectural solutions. Collaborate, coordinate, and drive consensus across architects and IP teams. Conduct workload compaction to facilitate effective modeling.
Create profiling and visualization frameworks to analyze data at the right level of abstraction. Contribute to automation for streamlining production processes.
Stay up to date on the latest advancements in application development, workload characterization, and performance/power/thermal analysis.

Arm is establishing teams to develop new and best-in-class silicon platforms, addressing markets such as premium mobile, compute, IoT, AI/ML server, and automotive. Arm’s ambition is to demonstrate efficient performance by architecting, implementing, and fabricating pioneering silicon using the latest SoC process nodes and packaging technologies.
This is an exciting and unique initiative, where we are driving how the next generation of leading compute devices are built across the industry. Join Arm to be part of the solution.

This role requires close collaboration with hardware architects and leadership to define proof-of-concept features, provide benchmarking insights, and guide system requirements with a focus on performance. The ideal candidate has an advanced academic background in physics, mathematics, computer engineering, computer science, or a related field, with demonstrated experience in algorithms, modeling of physical phenomena, and programming for scientific computing.

You will join a wonderful team of Software Engineers who share a passion for wanting to stamp their mark on the future of computing. Our team plays a meaningful role in making the Arm platform successful for AI data centers, cloud network infrastructure, and networking appliances. If you are passionate about innovative technologies and improving software quality, then we would like to hear from you.

As a member of the team, you’ll lead product efforts across multiple layers of the AI infrastructure stack — from the underlying compute platforms to the software development stack to the AI workloads those platforms serve. We’re looking for deeply technical, independent product leaders who thrive in ambiguity and can move seamlessly between product, roadmap, & strategy definition, technical analyses, and customer engagement.
Responsibilities:
Own product definition and execution in partnership with engineering — including customer requirements, product roadmap, and long-term strategy
Drive technology, industry, and competitive analysis & research to inform product definitions and product strategy
Craft and deliver messaging internally and externally up to executives about your products, roadmap, and long-term strategy
Shape and influence a new PM team and function within a growing strategic business area


Key responsibilities
Team leadership and management: Leading, mentoring, and developing a team of software engineers, fostering a collaborative and innovative environment.
Technical direction and strategy: Defining the technical vision and roadmap for projects involving SHMEM and MPI, ensuring alignment with the organization's goals.
Project planning and execution: Overseeing the entire software development lifecycle, including planning, resource allocation, implementation, testing, and delivery.
Technical expertise: Providing guidance on architectural decisions, code quality, and integrating new technologies, specifically focusing on SHMEM and MPI in user space applications.
Collaboration and communication: Working closely with product managers, designers, quality assurance professionals, and other stakeholders to understand requirements and ensure smooth project execution.
SHMEM and MPI expertise: Deep understanding of SHMEM and MPI programming models, including one-sided communication, message passing, collective operations, and synchronization.
Performance optimization: Expertise in optimizing performance for parallel applications using SHMEM and MPI, including techniques for minimizing latency and maximizing bandwidth.

Designs, plans, develops and manages a product or portfolio of products throughout the solution portfolio lifecycle: from new product definition or enhancements to existing products; planning, design, forecasting, and production; to end of life.
Management Level Definition:
Contributions impact technical components of HPE products, solutions, or services regularly and sustainably. Applies advanced subject matter knowledge to solve complex business issues and is regarded as a subject matter expert. Provides expertise and partnership to functional and technical project teams and may participate in cross-functional initiatives. Exercises significant independent judgment to determine the best method for achieving objectives. May provide team leadership and mentoring to others.
Product Manager, Specialized Compute (AI, Edge, & Telco)
We are looking for a highly versatile and strategic Product Manager to join our team. The ideal candidate will be a true "utility player," capable of investigating new opportunities, building compelling business cases, and scoping out technical requirements for server designs across three distinct and rapidly evolving markets: Artificial Intelligence (AI) servers, Edge servers, and servers for the Telco marketplace.
You'll be the driving force behind our product strategy, constantly exploring what's next and translating market needs into tangible product plans.
Key Responsibilities
Investigate New Opportunities: Conduct in-depth market research and competitive analysis to identify emerging trends, customer pain points, and new product opportunities across AI, Edge, and Telco markets.
Build the Business Case: Develop and present data-driven business cases for new product initiatives, including market sizing, financial projections, and strategic alignment with company goals.
Scope Out Requirements: Work closely with engineering, sales, and customers to define and prioritize detailed product requirements and user stories, ensuring they are well-documented and understood by all stakeholders.
Cross-Market Strategy: Navigate the unique technical and business challenges of each market, adapting your approach to meet the diverse needs of AI, Edge, and Telco customers.
Stakeholder Collaboration: Act as the primary liaison between technical teams and business units, championing the product vision and ensuring alignment across the organization.

Join a tight-knit team where each individual’s contributions directly influence the success of the company and product. You'll have the opportunity to build a new kind of computer from the ground up and to solve groundbreaking challenges along the way. Work with people who love to build and who thrive in technically diverse environments where great ideas are prioritized.
Responsibilities
Architect high-speed (200Gbaud and beyond) electro-optical interfaces for Lightmatter’s next-generation co-packaged optics (CPO) solutions.
Design next-generation high-speed DACs, linear drivers and TIAs in the latest FinFET process nodes.
Actively collaborate across disciplines—with electronics, photonics, and mechanical engineering teams—to specify the requirements and solutions for circuit blocks, SoCs, debug, and validation.
Collaborate with the product team to develop our future technological and product roadmap in the context of industry trends.
Work closely with test and validation engineers to validate hardware against simulation prediction to ensure high performance and high yield.
Publish and present novel ideas and participate in premier technical conferences.

This highly visible role requires a unique combination of strategic acumen, technical fluency, and executive relationship management. You’ll work closely with cross-functional teams across engineering, product, operations, and marketing to align Lightmatter’s photonic computing technologies with the world’s most advanced infrastructure providers.
Responsibilities
Strategic Account Leadership
Own and drive the full lifecycle of customer engagement—from opportunity identification through deployment and revenue realization.
Define and lead account strategies that deliver growth, market influence, and long-term partnership value.
Build and maintain trusted executive relationships across procurement, engineering, architecture, and C-suite levels.
Act as the “voice of the customer” within Lightmatter, ensuring alignment of product priorities and strategic initiatives.
Lead cross-functional internal account teams and foster collaboration across the sales organization.
Technology & Product Alignment
Drive technical sales engagements that uncover customer challenges and align Lightmatter’s photonic computing platforms to address them.
Collaborate with Product and Engineering to influence roadmap priorities and secure customer-driven design wins.
Demonstrate a deep understanding of AI datacenter infrastructure, workload trends, and TCO analysis.
Use competitive insights to position Lightmatter’s products as differentiated enablers of performance, efficiency, and scalability.
Go-to-Market & Cross-Functional Collaboration
Partner with Marketing, Product, and Operations to develop joint go-to-market plans and customer engagement programs.
Work with ecosystem partners—including hyperscalers, ISVs, and system integrators—to expand adoption of Lightmatter technologies.
Contribute to regional and global partner strategies, ensuring unified customer experience and execution excellence.
Represent Lightmatter at major industry events, conferences, and customer innovation forums.
Revenue Growth & Operational Execution
Deliver quarterly and annual revenue targets with precision and discipline.
Negotiate and close complex, multi-year agreements in collaboration with Legal, Finance, and Executive leadership.
Provide data-driven insight into pipeline health, deployment timing, and revenue ramp.
Ensure seamless execution of account strategies—balancing long-term partnership development with near-term delivery milestones.

In this role, you will work closely with Tier 1 Semiconductor companies and Cloud Service Provider organizations that are defining the next generation of scale out, HPC, and machine learning systems. You will be working cross functionally within Lightmatter as you define an optimal technical solution. Additionally, you will be collaborating with customers who are domain experts in HPC and ML system architectures to refine these designs based on their application requirements. You should be comfortable supporting deep technical discussions focused on interconnect topologies and how to scale HPC/ML architectures in 1024+ node systems. Excellent interpersonal skills and the ability to work independently are essential to being successful in the role.
Responsibilities
Partner with Lightmatter sales leaders to develop strong relationships with key technical stakeholders, and help them optimize their system architecture by taking advantage of Lightmatter’s interconnect technology
Be a domain expert in high performance computing (HPC) and machine learning system architectures with a focus on high-performance interconnects
Leverage background in HPC/scientific computing algorithms to identify performance bottlenecks related to interconnect and optimize topologies to improve performance
Apply extensive knowledge in high-performance interconnect topologies and function as an expert and principal advisor
Define new and refine existing ideas and design concepts that address scaling HPC and ML system architectures using Lightmatter’s highly disruptive interconnect technology, Passage
Explain advanced concepts in HPC/ML system architecture and understand how those concepts relate to other disciplines.

Responsibilities:
Strategic Sales Leadership: Develop and execute comprehensive sales strategies to exceed annual sales goals and expand market share within key accounts.
Customer Relationship Management: Build and maintain strong, trusted advisor relationships with key influencers and decision-makers across all levels, from C-suite executives to individual engineers.
Solution Selling: Leverage a deep understanding of customer pain points and business objectives to effectively communicate the value of Lightmatter's solutions.
Act as the primary liaison between the customer and internal teams, including engineering, marketing, finance, and legal, to ensure seamless delivery, exceptional customer satisfaction, and strong cross-team collaboration.
Identify, manage, and close new business opportunities. Create accurate sales forecasts and report on pipeline activities to senior management.
Deliver product feedback to internal business units, ensuring the voice of the customer is represented in go-to-market strategies.
Lead negotiations and successfully close complex deals with large, corporate customers.

Join a cutting-edge team at the forefront of silicon photonics and high-speed transceiver design! As a member of the analog team, you’ll collaborate with our architects and engineers to develop innovative high-speed analog transceiver solutions for next-generation optical and wireline communication systems.
* We are currently hiring for multiple levels for this role. Your level and compensation will be determined by your experience, education, and location.
Responsibilities
Support micro-architecture development with chip architects by conducting feasibility studies
Collaborate with members of our design engineering teams (system, digital, analog, photonics) to define electrical requirements
Design analog/mixed-signal blocks with a focus on transceivers and broadband circuits interfacing with silicon photonics elements
Document design simulations and verifications towards formal design reviews
Drive block-level floorplan, mask design views, and their reviews
Run post-layout and mixed-signal top-level simulations to validate integration
Define production and bench-level test plans
Validate performance of the circuits in the lab

Responsibilities:
Serve as the technical voice of the product externally, driving positioning, messaging, and content that clearly communicates the value of our silicon photonics solutions.
Collaborate closely with product management, engineering, and sales teams to implement product strategy and ensure alignment with customer needs and market trends.
Analyze the evolving AI data center landscape to inform product direction and ensure competitiveness in bandwidth, power efficiency, and system scalability.
Translate complex technology into compelling customer-facing content, including whitepapers, technical presentations, solution briefs, application notes and more.
Lead and support product launches, ensuring clear articulation of product features and system-level benefits to both technical and business audiences.
Represent the company at industry conferences, tradeshows, and technical forums as a technology expert and thought leader and advocate for silicon photonics.

Key Responsibilities:
Define and implement test strategies for PIC characterization, validation, reliability, and manufacturing qualification.
Oversee the design, deployment, and scaling of automated test setups for optical/electrical validation, including probe stations, active alignment systems, and temperature-controlled environments.
Own and drive program-level milestones for test readiness, encompassing test plan definition, execution tracking, schedule risk mitigation, and cross-functional alignment.
Engage with laser and integrated photonics design teams to establish design-for-test criteria and ensure efficient testability of device characterization DOEs and product designs.
Collaborate cross-functionally with test, design, packaging, firmware, reliability, and systems teams to align test coverage with product and system-level requirements.
Drive smooth transition of test flows from R&D to pilot and high-volume manufacturing, including SPC implementation, yield monitoring, and process optimization.
Coordinate with external foundries, supply chain, and operations to align test capabilities with wafer/die delivery, production timelines, and sourcing constraints.
Establish test plans with external component suppliers and review test reports.
Manage relationships with capital equipment vendors, oversee tool procurement and calibration, and supervise outsourced test and reliability services as needed.
Manage and mentor a team of test engineers supporting laser and photonics test activities at wafer, die, and package levels.

Responsibilities:
Design, layout, and test innovative photonic devices
Contribute to and own large-scale photonic layouts for high-volume products
Actively collaborate across disciplines with analog, digital, and physical design teams to tape out products and characterization devices.
Actively collaborate with the product, sales, and architectural teams to define requirements for device characterization, debugging, and validation.
Provide thought leadership in the area of expertise and be perceived as a subject matter expert
Work on broader organizational projects that require a thorough understanding of photonics
Be accountable for results, which may impact the entire function
Contribute to Process Design Kits (PDK) development

Join a cutting-edge team at the forefront of silicon photonics and analog circuit design! As a member of the analog team, you’ll collaborate with our architects and engineers to develop innovative analog and mixed-signal (AMS) solutions for next-generation optical and wireline communication systems.
* We are currently hiring for multiple levels for this role. Your level and compensation will be determined by your experience, education, and location.
Responsibilities
Design analog/mixed-signal (AMS) circuits used in control loops, such as data converters, amplifiers, drivers, comparators, bandgap, and more
Support micro-architecture development with chip architects by conducting feasibility studies
Collaborate with members of our design engineering teams (system, digital, analog, photonics/laser) to define electrical requirements
Document design simulations and verifications towards formal design reviews
Drive block-level floorplan, mask design views, and their reviews
Run post-layout and mixed-signal top-level simulations to validate integration
Define production and bench-level test plans
Validate the performance of the circuits in the lab

As a Principal Product Line Manager, Optics/Photonics, you will be responsible for leading our product definition, rollout, and product introduction to drive design wins with the industry's top semiconductor and cloud vendors for our photonics-enabled solutions. Reporting directly to the Director, Product Line Management, you will work with engineering, sales, operations, finance, and the executive team to accelerate the adoption of Lightmatter products in segment-defining categories.
Responsibilities
Complete the detailed requirements definition (MRD/PRD), product line family/SKU strategy, and commercial market positioning for the company’s silicon photonics-based hardware offerings with a focus on Passage, Lightmatter’s programmable photonic chiplet interconnect. This also includes evaluation/development systems and other demonstration vehicles.
Drive industry partner relationships and the availability of partner hardware solution offerings including lasers, fibers, connectors, and other related optical components.
Manage the roll-out of the products to lead customers and product lifecycle management through the development process all the way to and through the production phases.
Along with sales, pursue and win key customers by producing collaterals and articulating key advantages and value propositions of Lightmatter products through detailed customer proposals.
Develop pricing guidelines, margin analysis, and forecasts establishing competitiveness and profitability of Lightmatter products in support of corporate financial goals and revenue planning.
Represent Lightmatter at key industry forums and speaking events as a corporate leader and subject matter expert.
Develop high-quality, high-value outbound marketing and product technical collaterals in conjunction with product marketing and technical documentation teams.
Brief top industry press, analysts, and investors relative to Lightmatter product strategies, competitive advantages, and other industry trends.

Join a tight-knit team where each individual’s contributions directly influence the success of the company and product! You'll have the opportunity to build a new kind of computer from the ground up and to solve groundbreaking challenges along the way. Work with people who love to build and who thrive in technically diverse environments where great ideas are prioritized.
Responsibilities:
Contribute to the bring-up and characterization of high-speed optical components, including modulators and photodetectors, ensuring they meet performance expectations.
Design and execute validation and debug plans aligned with product specs, including link-level electro-optic testing using industry-standard equipment like BERTs, DCAs, VNAs, and AWGs.
Collaborate with multidisciplinary teams to identify and resolve issues across optics, electronics, and packaging, leveraging both lab data and simulation tools such as HFSS, ADS, and Lumerical.
Analyze high-speed signal behavior using eye diagrams, bathtub curves, and equalization techniques (CTLE, DFE, FFE) to improve link quality and system robustness.
Configure and optimize DSP-based SerDes to interface with custom optical modules and ensure seamless system integration.
Automate test procedures and analyze results using Python or MATLAB to streamline validation workflows and support engineering decisions.

Responsibilities
Lead foundry process technology development, manufacturing strategies and specifications efforts for our highly integrated photonics-based AI platform and our photonics chiplet communication fabric
Establish silicon fabrication process coverage metrics, yield expectations, and monitoring mechanisms
Provide manufacturing process solutions for integrated digital/analog/photonic/laser devices
Work closely with Foundry suppliers to ensure manufacturing output specifications are being met
Collaborate with the Design-For-Manufacturing (DFM) team early in the design process
Develop silicon technology and capability roadmap to support Lightmatter product requirements.
Develop initial product fabrication qualification and execution plan
Establish metrics and methodology to characterize and ensure reliable product operation at customer sites over the expected product lifetime
Collaborate with the design and layout teams to coordinate product tapeout to the foundry.

Responsibilities:
Market & Business Development: Identify, cultivate, and expand new business opportunities in Taiwan and the broader APAC region. Build market intelligence to shape Lightmatter’s go-to-market approach and strengthen our competitive position.
Strategic Account Leadership: Develop and execute account strategies for Taiwan-based ASIC design houses, ensuring alignment with both customer priorities and Lightmatter’s business objectives.
Partnership & Ecosystem Building: Forge strong relationships with local partners, suppliers, and ecosystem stakeholders to accelerate the adoption of Lightmatter’s technologies.
Customer Relationship Management: Establish and maintain trusted advisor relationships with C-suite executives, procurement teams, and technical leaders, ensuring long-term engagement and loyalty.
Cross-Functional Liaison: Act as the primary point of contact between customers and internal teams (engineering, product, marketing, finance, legal), ensuring seamless collaboration and customer satisfaction.
Solution Positioning: Leverage deep knowledge of datacenter and AI/ML infrastructure to translate customer challenges into Lightmatter solutions, driving value for both sides.
Voice of Customer: Deliver insights and feedback to internal product and business units to guide strategy, roadmap, and product definition.

This is primarily a creative role where you need to think outside the box to push the boundaries of systems research with the power of silicon photonics at your fingertips. This role will work closely with our product, solutions architecture, and engineering teams to generate impactful performance analysis, create high-performance rack-scale designs, and publish cutting-edge ML systems research.
Responsibilities
Stay up to date on the latest advancements in Machine Learning research.
Build tools to model the performance of Passage-enabled systems when running both training and inference workloads for the latest AI models.
Collaborate with our product and solutions architecture teams to design compelling reference platforms for our customers, powered by Lightmatter technology.
Function as the Machine Learning expert in the room.
Publish academic papers at top industry conferences and journals. Help support the publication of whitepapers, blog posts, and other marketing collateral.

Responsibilities
Lead the optical and mechanical design of laser modules and detachable connectors.
Communicate design concepts, technical findings, and project status clearly.
Contribute to the optical design of lenses and FAU arrays, applying a strong understanding of isolator technologies.
Design packaging solutions for DFB lasers in Chip-on-Submount (CoS) configurations and for heterogeneously integrated III-V on Silicon wafers.
Ensure PIC designs are optimized for assembly processes such as wire bonding and flip-chip bonding.
Select critical components, including optical assemblies, electrical connectors and optical connectors.
Collaborate with suppliers and OSATs (Outsourced Semiconductor Assembly and Test) from prototype manufacturing through validation and into volume production.
Collaborate with cross-functional teams to ensure seamless integration and successful product delivery.

This onsite role, based in Arm’s Austin office, focuses on aligning specific technical requirements with product plans. You’ll collaborate with product, engineering, and architecture teams to support the delivery of compute products that meet real-world needs.
Key Responsibilities
Feature Planning & Coordination
Work with engineering, architecture, and commercial teams to define and track feature requirements.
Organize and manage incoming requests, ensuring documentation is accurate and up to date.
Support alignment of feature requests with business priorities and product timelines.
Requirements Documentation
Contribute to the creation and maintenance of product requirements for infrastructure-focused products.
Bring together insights from different markets—such as cloud, networking, AI, and HPC—into clear and structured documentation.
Project Coordination
Support the daily coordination of feature development with engineering and program teams.
Track progress and help communicate risks, delays, or changes to product leads.
Customer and Market Insights
Help organize feedback from partners, customer-facing teams, and the broader ecosystem.
Assist in preparing product updates and supporting materials for internal and external use.

You will engage with Arm's partners through a web portal and virtual meetings to tackle sophisticated performance debug and optimization problems revolving around Arm IP and Arm-based systems. You will help to debug hardware performance issues during the SoC bring-up stage. You will support Arm's performance debug tools to help partners understand how hardware features impact the workloads of interest, and how to tune hardware and software to achieve the highest level of performance. As your experience grows, you will develop and present training courses on Arm's SoCs, performance tooling, and processes to partner development teams. You will also collaborate closely with other product engineering groups, acting as the voice of the customer to motivate change and improve existing products by raising defects, reviewing documentation, and crafting software examples and knowledge articles to facilitate proactive learning.

The engineer will assist in the definition, design, and verification of our package designs. The candidate must be able to adapt quickly to loosely defined problems and changing requirements.
Layouts using best-known practices for DFM, DFA, Signal and Power Delivery Networks
Work with minimal supervision and approach challenges with enthusiasm and persistence
Bring forward ideas to improve overall team efficiency
Communicate and coordinate with external vendors as necessary

This role sits at the intersection of engineering, CAD, and program coordination, supporting teams that develop and maintain EDA tool environments for synthesis, verification, and design automation.
You’ll work closely with CAD engineers, design teams, and tool vendors to ensure that projects stay organized, documentation is clear, and technical updates are delivered smoothly.

Key Responsibilities:
Design, develop, and test software for our custom hardware platforms using C, C++, and Python.
Collaborate with hardware engineers to define software requirements and ensure seamless hardware-software integration.
Develop and maintain low-level drivers and firmware for various hardware components.
Develop and maintain the command-line interface (CLI) for our hardware platforms.
Participate in the entire software development lifecycle, from concept and design to testing and deployment.
Contribute to the improvement of our DevOps and CI/CD pipelines.
Troubleshoot and resolve software and hardware-related issues.
Write and maintain clear and comprehensive technical documentation.
Participate in code reviews to ensure code quality and adherence to best practices.
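By way of illustration only (the tool name, commands, and register map below are invented, not part of the actual product), a subcommand-style CLI for a hardware platform of the kind described above might be structured like this:

```python
# Hypothetical sketch of a hardware-platform CLI; the register file here is
# a stand-in for a real device interface (e.g. reads over SPI/I2C).
import argparse

# Fake register map used purely for illustration.
_REGISTERS = {"temp": 42, "status": 1}

def read_reg(name: str) -> int:
    """Return the current value of a named register."""
    return _REGISTERS[name]

def write_reg(name: str, value: int) -> None:
    """Write a value to a named register."""
    _REGISTERS[name] = value

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="hwctl")
    sub = parser.add_subparsers(dest="cmd", required=True)
    p_read = sub.add_parser("read", help="read a register")
    p_read.add_argument("reg")
    p_write = sub.add_parser("write", help="write a register")
    p_write.add_argument("reg")
    p_write.add_argument("value", type=int)
    return parser

def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    if args.cmd == "read":
        print(read_reg(args.reg))
    else:
        write_reg(args.reg, args.value)
    return 0
```

The subcommand structure keeps device access (`read_reg`/`write_reg`) separate from argument parsing, which makes the same functions reusable from test automation.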
Minimum Qualifications:
Bachelor's degree in Computer Science, Electrical Engineering, or a related field.
Approximately 5 years of professional software development experience.
Proficiency in C, C++, and Python.
Experience working in a Linux development environment.
Experience developing software for custom hardware platforms.
Understanding of hardware-software interaction, including low-level interfaces (e.g., SPI, I2C, UART).
Familiarity with DevOps principles and CI/CD tools (e.g., Jenkins, Git).
Strong problem-solving and debugging skills.
Excellent communication and teamwork skills.
Preferred Qualifications:
Experience with embedded systems and real-time operating systems (RTOS).
Experience with OpenBMC (Yocto) and/or Network Operating Systems (NOS).
Experience with CVE (Common Vulnerabilities and Exposures) analysis and resolution.
Experience with scripting languages for automation.
Knowledge of agile development methodologies.
Location: This is a remote position for employees residing within the United States
We offer a competitive compensation package that includes equity, cash, and incentives, along with health and retirement benefits. Our dynamic, flexible work environment provides the opportunity to collaborate with some of the most influential names in the semiconductor industry.
At Cornelis Networks your base salary is only one component of your comprehensive total rewards package. Your base pay will be determined by factors such as your skills, qualifications, experience, and location relative to the hiring range for the position. Depending on your role, you may also be eligible for performance-based incentives, including an annual bonus or sales incentives.
In addition to your base pay, you’ll have access to a broad range of benefits, including medical, dental, and vision coverage, as well as disability and life insurance, a dependent care flexible spending account, accidental injury insurance, and pet insurance. We also offer generous paid holidays, 401(k) with company match, and Open Time Off (OTO) for regular full-time exempt employees. Other paid time off benefits include sick time, bonding leave, and pregnancy disability leave.
Cornelis Networks does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. Cornelis Networks is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity or expression, pregnancy, age, national origin, disability status, genetic information, protected veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

Joining our Research and Development team, you will collaborate with experts responsible for the compute, storage, operating systems, and automation tools that enable our trading and research to run 24/7 across the globe. We design, grow, and operate infrastructure at a large scale, including triple-digit petabyte-scale storage and massive CPU and GPU clusters in globally distributed data centers. As such, this is a high-impact role with broad scope, from HPC/AI cluster design and performance tuning, to troubleshooting and automation for thousands of nodes.
Responsibilities
- Design, build, and optimize large-scale distributed GPU compute clusters
- Identify and resolve GPU workloads’ performance bottlenecks across compute, storage, and networking layers
- Collaborate with research and development teams to profile, benchmark, and fine-tune GPU-based workloads
- Automate system deployment, monitoring, and troubleshooting across thousands of nodes
- Collaborate with research and engineering teams to support evolving workloads
- Own critical infrastructure projects — from concept to implementation and support
- Test and deploy new hardware and software, and partner with vendors to resolve complex issues

We are seeking an ASIC Verification Engineer – Automation to own and evolve the tools, flows, automation, and AI capabilities across the entire DV lifecycle—from UVM testbench bring-up and coverage analytics to large-scale regressions, CI/CD, intelligent triage, and release pipelines. You will collaborate with software, hardware, RTL design, emulation, and post-silicon teams to deliver robust, reproducible, and data-driven DV at scale. Ideal candidates combine hands-on ASIC DV experience with a strong focus on automation, infrastructure, and AI-assisted workflows.
Key Responsibilities:
- Architect, implement, and maintain DV automation, regression infrastructure, and CI/CD pipelines.
- Build scalable, reliable pipelines for multi-simulator execution (VCS, Xcelium, Questa).
- Manage coverage collection/merge, results triage, flake detection, and auto-bisection.
- Apply AI/ML for DV acceleration, log/wave summarization, anomaly detection, failure clustering, and predictive test selection.
- Develop and maintain high-quality datasets, heuristics, and models for bug triage and assignment.
- Standardize tooling, environment setups, runbooks, and automated workflows.
- Integrate lint/CDC/formal flows, observability dashboards (Grafana/Prometheus/ELK), and performance metrics.
- Collaborate with DV and design engineers to ensure coverage closure, debuggability, and reproducibility.
- Mentor peers and document best practices for automation and AI-assisted DV.
Minimum Qualifications:
B.S. in EE, CE, CS, or related field.
5+ years (Mid-Level) / 8–10+ years (Senior) in ASIC DV and DV/EDA automation.
Proficiency in SystemVerilog, UVM, and RTL debugging; experience with at least one major simulator (VCS preferred) and coverage tools.
Strong scripting skills (Python, Shell, Tcl, Make/CMake) and Linux proficiency (containers included).
Hands-on Git/GitHub experience, including CI/CD, protected branches, and workflow automation.
Practical experience applying ML/AI to DV flows, triage, or operations; experience with Python ML/data stack.
Familiarity with job schedulers (SLURM/LSF/PBS/SGE) and license-aware scheduling.
Preferred Qualifications:
M.S. in EE, CE, CS, or related.
Domain expertise in high-speed networking: 50G/100G/400G Ethernet MAC/PCS, UDP/TCP/IP, RDMA/RoCE, IPsec.
Deep AI/ML experience for DV: LLMs for summarization, vector stores (FAISS/Milvus), prompt design, MLOps tools, and model serving at regression scale.
Experience with emulation/prototyping (Palladium, ZeBu, Protium, FPGA) integrated into DV flows.
Observability and metrics-driven operations experience; GitHub Enterprise administration; cloud or hybrid HPC for EDA/model workloads.
Location: This is a remote position for employees residing within the United States

The Performance Engineering Team plays a central role in enabling and optimizing performance across Arm’s compute systems. The team’s charter is to model, measure, and optimize performance at scale, ensuring Arm-based solutions achieve world-class efficiency and throughput for diverse workloads—from AI training and inference to scientific and data-intensive computing.

Key Responsibilities:
- Architect and Design: Lead the design of robust, scalable solutions for integrating Cornelis Networks' platform and fabric management software with Kubernetes.
- Develop Kubernetes Operators: Build and maintain custom Kubernetes Operators and Controllers in Go to manage the lifecycle of our software and hardware components within a cluster.
- Cloud-Native Integration: Develop solutions that allow for the seamless orchestration of our high-performance fabric services and platform management tools alongside other containerized workloads.
- Cluster Management: Work on extending Kubernetes for managing specialized hardware, scheduling, and networking requirements unique to HPC and AI workloads.
- Collaborate: Partner with the core platform, fabric, and hardware teams to ensure a cohesive and performant end-to-end solution.
- Upstream Contribution: Engage with the open-source community and contribute to relevant projects within the cloud-native ecosystem.
- Documentation and Best Practices: Author high-quality technical documentation and champion best practices for software development in a cloud-native environment.
- Leverage AI-powered tools to accelerate software development workflows, including intelligent code generation, refactoring, and performance optimization.
- Apply AI-driven techniques for automated code review, testing, and quality assurance to improve reliability and reduce development cycles.
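The operator pattern named in the responsibilities above reduces to a reconcile loop: compare desired state (the custom resource spec) against observed cluster state and emit the actions that converge them. A language-agnostic sketch of that loop (a production operator would be written in Go with Kubebuilder or the Operator SDK; the resource names below are invented for illustration):

```python
# Minimal reconcile-loop sketch: desired and observed state are dicts of
# {resource_name: spec}; reconcile() returns the actions a controller
# would take to converge observed toward desired.
def reconcile(desired: dict, observed: dict):
    """Diff desired vs. observed state into (verb, name, spec) actions."""
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name, spec))
        elif observed[name] != spec:
            actions.append(("update", name, spec))
    for name in observed:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions

def apply_actions(observed: dict, actions):
    """Apply actions to observed state, as the controller would."""
    for verb, name, spec in actions:
        if verb == "delete":
            observed.pop(name)
        else:
            observed[name] = spec
    return observed
```

The key property, which real controllers share, is idempotence: once observed state matches desired state, reconcile produces no actions.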
Minimum Qualifications:
Bachelor's or Master’s degree in Computer Science, Computer Engineering, or a related technical field.
5+ years of professional software development experience.
Proven experience in designing and developing solutions for Kubernetes, including building custom operators/controllers using tools like the Operator SDK or Kubebuilder.
Strong proficiency in Go. Experience with C++ or Python is also valuable.
Deep understanding of Kubernetes architecture, including the control plane, networking (CNI), and storage (CSI) interfaces.
Hands-on experience with container technologies such as Docker or containerd.
Demonstrable experience in integrating existing software platforms or services with Kubernetes.
Preferred Qualifications:
Experience with high-performance computing (HPC) or high-performance networking.
Familiarity with performance-sensitive environments and low-latency application requirements.
Experience with monitoring and observability stacks like Prometheus, Grafana, and Fluentd.
Knowledge of CI/CD principles and experience building deployment pipelines.
Contributions to open-source projects in the Kubernetes or cloud-native ecosystem.
Location: This is a remote position for employees residing within the United States

Key Responsibilities:
- Own end-to-end performance for distributed AI workloads (training + multi-node inference) across multi-node clusters and diverse fabrics (Omni-Path, Ethernet, InfiniBand).
- Benchmark, characterize, and tune open-source & industry workloads (e.g., Llama, Mixtral, diffusion, BERT/T5, MLPerf) on current and future compute, storage, and network hardware, including vLLM/TensorRT-LLM/Triton serving paths.
- Design and optimize distributed serving topologies (sharded/replicated, tensor/pipe parallel, MoE expert placement), continuous/adaptive batching, KV-cache sharding/offload (CPU/NVMe) & prefix caching, and token streaming with tight p99/p999 SLOs.
- Optimize inferencing: Validate RDMA/GPUDirect RDMA, congestion control, and collective/point-to-point tradeoffs during inference.
- Design experiment plans to isolate scaling bottlenecks (collectives, kernel hot spots, I/O, memory, topology) and deliver clear, actionable deltas with latency-SLO dashboards and queuing analysis.
- Build crisp proof points that compare Cornelis Omni-Path to competing interconnects; translate data into narratives for sales/marketing and lighthouse customers, including cost-per-token and tokens/sec-per-watt for serving.
- Instrument and visualize performance (Nsight Systems, ROCm/Omnitrace, VTune, perf, eBPF, RCCL/NCCL tracing, app timers) plus serving telemetry (Prometheus/Grafana, OpenTelemetry traces, concurrency/queue depth).
- Evangelize best practices through briefs, READMEs, and conference-level presentations on distributed inference patterns and anti-patterns.
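The latency-SLO dashboards and queuing analysis mentioned above rest on empirical tail percentiles. A minimal sketch using the nearest-rank convention (one common choice; other tools interpolate between samples):

```python
# Tail-latency summary for one serving run: nearest-rank percentiles plus
# a count of SLO violations. Units (milliseconds) are illustrative.
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of the
    distribution at or below it (p in (0, 100])."""
    s = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(s)))
    return s[rank - 1]

def slo_report(samples_ms, slo_ms):
    """Summarize tail latency and SLO violations for a latency sample."""
    return {
        "p50": percentile(samples_ms, 50),
        "p99": percentile(samples_ms, 99),
        "p999": percentile(samples_ms, 99.9),
        "violations": sum(1 for s in samples_ms if s > slo_ms),
    }
```

Tracking p99/p999 rather than the mean matters for serving because a small fraction of slow tokens dominates user-visible latency under continuous batching.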
Minimum Qualifications:
B.S. in CS/EE/CE/Math or related
5–7+ years running AI/ML at cluster scale.
Proven ability to set up, run, and analyze AI benchmarks; deep intuition for message passing, collectives, scaling efficiency, and bottleneck hunting for both training and low-latency serving.
Hands-on with distributed training beyond single-GPU (DP/TP/PP, ZeRO, FSDP, sharded optimizers) and distributed inference architectures (replicated vs sharded, tensor/KV parallel, MoE).
Practical experience across AI stacks & comms: PyTorch, DeepSpeed, Megatron-LM, PyTorch Lightning; RCCL/NCCL, MPI/Horovod; Triton Inference Server, vLLM, TensorRT-LLM, Ray Serve, KServe.
Comfortable with compilers (GCC/LLVM/Intel/OneAPI) and MPI stacks; Python + shell power user.
Familiarity with network architectures (Omni-Path/OPA, InfiniBand, Ethernet/RDMA/ROCE) and Linux systems at the performance-tuning level, including NIC offloads, CQ moderation, pacing, ECN/RED.
Excellent written and verbal communication; the ability to turn measurements into persuasive, SLO-driven narratives for inference.
Preferred Qualifications:
M.S. in CS/EE/CE/Math or related
Scheduler expertise (SLURM, PBS) and multi-tenant cluster ops.
Hands-on profiling & tracing of GPU/comm paths (Nsight Systems, Nsight Compute, ROCm tools/rocprof/roctracer/omnitrace, VTune, perf, PCP, eBPF).
Experience with NeMo, DeepSpeed, Megatron-LM, FSDP, and collective ops analysis (AllReduce/AllGather/ReduceScatter/Broadcast).
Background in HPC performance engineering or storage (BeeGFS, Lustre, NVMeoF) for data & checkpoint pipelines.
Location: This is a remote position for employees residing within the United States.

Key Responsibilities:
This role requires close collaboration with RTL, architecture, software/firmware, emulation, and post-silicon validation teams, engaging with and driving alignment across cross-functional groups. You will contribute to the development and execution of a DV strategy that aligns with the company’s objectives for first-pass silicon success. You will also be responsible for hiring, training, and supporting a team of ASIC engineers to ensure timely and cost-effective product development, including the people-management responsibilities that support organizational goals.
Minimum Qualifications:
15+ years’ experience in ASIC/SoC design verification.
5+ years’ experience leading a team of direct reports.
Expertise in design verification tools such as Synopsys VCS and Verdi.
Proven success in first-pass ASIC development.
Experience managing multiple projects and adjusting priorities with stakeholders.
Proficiency in developing UVM constrained random test benches.
Strong understanding of interpreting functional specifications and creating comprehensive test plans.
B.S. or M.S. in Computer Engineering, Electrical Engineering, or related technical field, or equivalent practical experience.
Preferred Qualifications:
Experience in building UVM environments from scratch.
Previous experience in a startup environment, demonstrating adaptability and a hands-on approach.
Expertise in working with networking System on Chips (SOCs).
Strong leadership in a dynamic, fast-paced development environment.
Location: This is a remote position for employees residing within the United States.

Key Technical Responsibilities:
- Define overall SOC level verification strategy, technical planning, direction
- Enable and drive the development of UVM environments to verify RTL at block, unit, and SoC levels
- Develop and execute functional tests according to verification test plans
- Instrument TB for functional and code coverage and drive to closure based on the coverage metrics
- Collaborate with cross-functional teams like design, software, emulation and silicon validation teams towards ensuring the highest design quality
Team Responsibilities:
- Day-to-day guidance and leadership of team members
- Driving results via mentoring, coaching, and counseling
- Education of team in the use of AI tools to enhance productivity and efficiencies
- Generation and enforcement of coding and verification guidelines
Minimum Qualifications:
10+ years of experience with the following:
Hands-on experience with writing code using UVM/System Verilog
Verification for complex SoCs that include multiple clock and reset domains, using VCS or equivalent simulation tools
Debugging fails to the line of RTL, closing out bug fixes, using Verdi or equivalent debug tools
Experience in ground up testbench development
Experience with revision control systems like Git or SVN etc.
B.S. Degree in Computer Engineering, Computer Science, or Electrical Engineering
Preferred Qualifications:
M.S. Degree in Computer Engineering, Computer Science, or Electrical Engineering
15+ years of relevant experience in networking hardware verification, with proven expertise in verifying 50G/100G/400G Ethernet MAC/PCS protocols, TCP/IP, RDMA/RoCE, and IPsec, and their application in high-speed data processing/networking
One or more scripting languages (TCL, Python, Perl, Shell-scripting)
Track record of first-pass success in ASIC and Systems
Location: This is a remote position for employees residing within the United States.

Key Responsibilities:
- Formulate and execute direct sales strategies to achieve annual sales targets.
- Identify and engage key FSI end customers and generate sales opportunities.
- Maintain, manage, and build the sales pipeline in a CRM tool and report progress weekly.
- Work closely with channel sales, partner sales and OEM teams to advance sales opportunities.
- Deliver presentations of company solutions to target end customers.
- Foster close collaboration with internal product, marketing, and sales teams to nurture the pipeline.
- Harness the technical partner ecosystem to extend field reach and enhance customer engagement.
- Offer customer and market feedback to internal product teams for continuous improvement.
Required Qualifications:
Bachelor’s degree or higher in a STEM-related field
Minimum of 7 years of success selling High Performance Computing (HPC) or Artificial Intelligence (AI) solutions, with proven track record of meeting or exceeding sales quotas
Proven success in a startup “hunter” sales environment, with a demonstrated history in sales environments where more than 50% of leads are self-generated.
Proven track record of prospecting, building new relationships, and closing multi-million-dollar partnerships with customers across the financial services industry.
In-depth understanding of the technical sales landscape for high-performance network interconnect technologies and how those technologies apply to financial risk modeling, Monte Carlo simulation, fraud detection, and other critical workloads in FSI
Established relationships with key ecosystem partners and customers within the financial services industry vertical.
Proficiency in Microsoft Office Suite, Salesforce CRM, and lead-generation tools.
Exemplary adherence to the highest ethical standards and integrity.
Desired Qualifications:
Self-motivated and proactive with desire and initiative to drive growth and achieve personal and company goals.
Comfortable operating within a dynamic high-growth environment.
Experience in solution selling, especially in a technically complex environment
Strong leadership acumen to coordinate cross-functional teams effectively.
Demonstrated ability to tackle intricate challenges through innovative thinking and collaboration.
Exceptional organizational, written & verbal communication and negotiation skills
Effective interpersonal skills with a knack for building lasting relationships.
Aptitude for crafting and executing creative, differentiated strategies for sales growth.
Location: This is a remote role within the United States but requires 50% travel to customer sites, events, trade shows, and conferences. Preference is for candidates residing on the East Coast.

Key Responsibilities:
- Design and implement advanced Ethernet protocols for next-generation Ethernet switch ASICs, focusing on RTL development.
- Develop microarchitecture specifications for Ethernet protocol blocks.
- Implement Ethernet protocols such as Priority Flow Control, TCP, UDP, RoCEv2, VLAN, ECMP, DCQCN, ECN, and Security in Transmit and Receive pipelines using Verilog/System Verilog.
- Collaborate with verification engineers to create block- and system-level test plans to ensure comprehensive design coverage.
- Define timing constraints for RTL blocks and work with Physical Design engineers to optimize timing closure.
- Support post-silicon validation, collaborating with hardware, firmware, and software teams to debug and resolve ASIC issues.
- Contribute to performance optimization and power-aware design strategies for Ethernet subsystems.
Minimum Qualifications:
B.S. or M.S. degree in Computer Engineering, Electrical Engineering, or related field.
10+ years of industry experience in digital design with proficiency in Verilog and System Verilog.
Experience in RTL design for Ethernet protocols relevant to adapters and switches.
Familiarity with timing closure and modern physical design methodologies.
Proven ability in system-level debug and root cause analysis of technical issues.
Strong verbal and written communication skills.
Preferred Qualifications:
Deep knowledge of Ethernet architecture and networking protocols (L2/L3/L4 layers).
Prior experience with Ethernet MAC integration and development of L2/L3/L4 protocols for ASICs, including system debug.
Expertise in multiple clock domain designs and asynchronous interfaces.
10+ years of experience with scripting languages such as TCL, Python, or Perl.
Familiarity with EDA tools like Design Compiler, Spyglass, or PrimeTime.
Location: This is a remote position for employees residing within the United States.

Key Responsibilities:
- Design and develop high-performance kernel drivers and user-space libraries for our networking hardware.
- Build and optimize networking protocols at L2 (Ethernet), L3 (IP), and L4 (TCP/UDP) layers, tailored for AI/ML workloads.
- Leverage DPDK (Data Plane Development Kit) to create exceptionally fast packet processing pipelines that bypass the kernel for maximum throughput and minimal latency.
- Conduct deep-dive performance analysis and software optimization across the entire stack, identifying and eliminating bottlenecks.
- Collaborate with the hardware team to influence ASIC design and ensure software/hardware co-design principles are met.
- Develop robust testing, validation, and debugging tools for our networking stack.
- Contribute to a culture of technical excellence, continuous improvement, and collaborative problem-solving.
Minimum Qualifications:
Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field.
Proven experience in low-level systems programming with C/C++.
Strong understanding of Linux kernel driver development and internal architecture.
Hands-on experience with DPDK or similar user-space networking frameworks (e.g., VPP, XDP).
Deep knowledge of networking fundamentals and L2, L3, and L4 protocols.
Demonstrated experience in software optimization, profiling, and performance tuning.
A self-motivated and proactive mindset with a strong sense of ownership and the ability to work effectively in a dynamic, fast-paced startup culture.
Excellent teamwork and communication skills.
Preferred Qualifications:
Experience working with Ethernet/Switch ASICs or network processor silicon (e.g., Broadcom, Marvell, NVIDIA, Intel).
Familiarity with RoCE (RDMA over Converged Ethernet) or other RDMA protocols.
Experience in developing software for high-performance network interface cards (NICs) or SmartNICs.
Understanding of the unique networking requirements of distributed AI/ML training workloads (e.g., NCCL, MPI).
Location: This is a remote position for employees residing within the United States.

Key Responsibilities
- Design, implement, and manage a Linux-based HPC environment with 200+ compute nodes to support development of complex Application-Specific Integrated Circuits (ASICs), including AI-driven logic design, simulation and design verification, synthesis, gate-level verification, and design emulation
- Oversee the administration of batch compute systems including SLURM or LSF for optimal workload management
- Manage and optimize NFS systems and clustered storage infrastructure to support engineering workflows
- Oversee observability systems (monitoring, logging, alerting) to ensure infrastructure performance and reliability and to drive continuous improvement in automation and root-cause analysis
- Drive adoption of "Infrastructure as Code" and automated workflows to reduce manual intervention
- Implement and enforce best practices for system availability, performance tuning, capacity planning, and lifecycle management
- Ensure high availability and performance of critical infrastructure services including VNC, NFS, license servers and GitHub
- Collaborate with engineering teams to understand compute requirements and optimize infrastructure accordingly
- Lead capacity planning and infrastructure expansion initiatives
- Manage resources responsible for on-premises hardware installation, maintenance, and monitoring.
- Drive adoption of AI within the infrastructure team and workflows
Minimum Qualifications
Bachelor's degree in Computer Science, Engineering, or related field (Master's preferred)
Minimum of 10 years of experience in Linux systems administration with a focus on HPC environments, including strong exposure to EDA tools (e.g., Synopsys, Cadence) and to ASIC and software development workflows.
Deep expertise with HPC workload managers (SLURM or LSF)
Strong knowledge of NFS and distributed storage systems
Experience implementing and managing monitoring solutions for large-scale computing environments
Proficiency with infrastructure automation tools and scripting languages (Python, Bash, etc.)
Strong troubleshooting and problem-solving skills and leadership abilities
Hands-on technical expertise to drive issue root-cause analysis and remediation
Preferred Qualifications
Experience with Ansible or similar tools for deploying applications or orchestration of workflows
Experience with CI/CD pipelines and DevOps practices
Familiarity with containerization technologies (Docker, Singularity)
Experience with performance tuning and optimization of HPC workloads
Experience with installation and maintenance of locally hosted LLMs for AI training/inference
Experience with cloud-based infrastructure
Location: This is a remote position for employees residing within the United States. Candidates residing in the Wayne, PA metro area are preferred.

Key Responsibilities
- Develop optimized C code for embedded processors
- Collaborate with architects and hardware engineers when designing software architecture
- Develop and implement standard processes for unit testing; collaborate on CI implementations
- Review design documents and pull requests from other team members
- Create and maintain design documents in collaboration with the engineering team
- Review and provide detailed feedback on firmware and software architecture specifications and design documentation
- Work with the Software Engineering Manager to identify tasks and build schedules
Minimum Qualifications:
Early career to 5 years of experience (we level based on capability)
Bachelor’s degree in Computer Science or a related discipline, or equivalent training and experience
Proficiency in C or C++ (able to read/write bitfields, manage buffers safely, and reason about endianness)
Familiarity with development tools and toolchains, including GCC and/or Clang, Make, CMake, Git, and bug tracking software (e.g., Jira, GitHub)
Familiarity with debugging tools used in embedded environments (e.g., logic analyzers, JTAG debuggers, and innovative logging methods for analysis and debugging)
Comfort working in Linux user space and with basic system tools; familiarity with sockets or message-based I/O
Understanding of binary protocols, including framing, headers, IDs, checksums/CRC, and simple state machines
Exposure to at least one low-speed or board-level interface (I²C/SMBus, SPI, UART) through coursework, labs, or projects
Ability to read technical standards/specifications and translate them into working code and tests
Growth mindset: eager to learn PLDM and MCTP deeply and deliver production-quality code
Preferred Qualifications:
Familiarity with DMTF standards such as MCTP (DSP0236/37/38/39) or PLDM (DSP0240/41/45/48; 0267/0257). Prior production experience is not required—interest and aptitude are sufficient.
Experience with Python for test harnesses, Wireshark dissectors, logic analyzer traces, or OpenBMC tooling (libmctp, libpldm, pldmtool)
Basic understanding of embedded development (RTOS) or Linux kernel subsystems is a plus
Experience with:
ARM or other RISC processors in an embedded environment
RISC-V processors
PCIe/VDM
Location: This is a remote position for employees residing within the United States.

Key Responsibilities:
- Develop microarchitecture specifications for packet processors and high-speed pipelined data path designs for host Ethernet adapters, emphasizing low-latency performance.
- Implement RTL designs using Verilog/System Verilog for high-speed data paths and packet processing logic.
- Collaborate with verification engineers to create block- and system-level test plans to ensure comprehensive design coverage.
- Define timing constraints for RTL blocks and work with Physical Design engineers to optimize timing closure.
- Support post-silicon validation, collaborating with hardware, firmware, and software teams to debug and resolve ASIC issues.
- Contribute to performance optimization and power-aware design strategies for Host Fabric Interface subsystems.
Minimum Qualifications:
B.S. or M.S. degree in Computer Engineering, Electrical Engineering, or related field.
7+ years of post-college experience in digital design with proficiency in Verilog and System Verilog.
Experience in RTL design for high-speed data paths or packet processing in ASICs.
Deep understanding of host Ethernet adapter architectures.
Familiarity with timing closure and modern physical design methodologies.
Proven ability in system-level debug and root cause analysis of technical issues.
Strong verbal and written communication skills.
Preferred Qualifications:
Knowledge of Ethernet architecture and networking protocols.
Prior experience with RTL development for Ethernet host adapters and system debug.
Expertise in multiple clock domain designs and asynchronous interfaces.
7+ years of experience with scripting languages such as TCL, Python, or Perl.
Familiarity with EDA tools like Design Compiler, Spyglass, or PrimeTime.
Location: This is a remote position for employees residing within the United States.

Key Responsibilities:
- Own end-to-end integration of PCIe IP into complex ASIC designs.
- Collaborate with IP vendors, architecture, verification, physical design, and software teams to deliver robust PCIe subsystems.
- Drive performance optimization efforts across the PCIe stack, from PHY tuning to DMA/transaction layer efficiency.
- Contribute to system architecture and microarchitecture discussions with a focus on IO and interconnect scalability.
- Lead silicon bring-up and validation of PCIe links in the lab; work closely with board and firmware teams.
- Debug functional and performance issues at RTL, gate-level, and silicon.
- Ensure compliance with PCIe specifications and participate in interoperability testing where needed.
- Provide mentorship to junior engineers and help define PCIe subsystem development best practices.
- Apply a strong understanding of high-bandwidth, low-latency connectivity for high-performance compute platforms.
Minimum Qualifications:
BS/MS in Electrical Engineering, Computer Engineering, or related field.
10+ years of industry experience in ASIC/SoC design with a focus on PCIe controller integration.
Proven experience in silicon bring-up and debug of high-speed interfaces.
Solid understanding of PCIe protocol stack (PHY, MAC, TLP, DLL), configuration space, and link training.
Hands-on experience with PCIe verification environments, performance tuning, and power-aware design.
Familiarity with PCIe compliance testing, simulation tools (e.g., VCS, Questa), and lab equipment (e.g., protocol analyzers, oscilloscopes).
Strong scripting (Python, Perl, TCL) and debugging skills.
Strong verbal and written communication skills.
Preferred Qualifications:
Experience with PCIe Gen5/Gen6 and advanced retimer or switch solutions.
Exposure to CXL, CCIX, or other cache-coherent interconnects.
Background in data center or AI/ML accelerator architectures.
Experience with emulation and prototyping platforms (e.g., ZeBu, Palladium, HAPS) for PCIe subsystem validation.
Location: This is a remote position for employees residing within the United States.

As a member of the Technical Support organization, you will work collaboratively with cross-functional teams to identify partner and end user issues and deliver solutions with world class customer service. Key to this role is the ability to build cross-functional relationships, effectively communicate with highly technical customers, apply critical thinking and problem-solving skills, and take initiative to drive customer satisfaction.
The ideal candidate enjoys having deep technical knowledge and applying their knowledge to solve challenging customer issues while working in a fast-paced environment. This role has a significant impact and contributes to the operational and financial success of the company.
Key Responsibilities:
- Providing professional, high-quality customer support for Cornelis Networks products.
- Taking technical ownership of user reported issues, driving diagnosis and resolution in a professional and timely manner.
- Managing incoming customer support issues (via emails, portal postings, or phone calls), escalating, when necessary, based on documented procedures, periodically supporting off-hours “on-call” support activities.
- Partnering cross-functionally across all levels of the corporation, working with other technical staff to identify and resolve issues.
- Supporting OEMs and/or system integrators with new cluster installation and acceptance, providing remote or on-site support as needed.
- Proactively ensuring customer readiness for updates; understanding changes and their impacts on customers; providing input for materials and documentation supporting these upgrades or early use of new capabilities.
- Actively participating and advocating for customers during the planning and sustaining periods of our product lifecycle.
Minimum Qualifications:
5+ years of experience programming with C, C++, Perl, Python, and/or Fortran, as well as knowledge of Linux scripting.
5+ years of experience in system troubleshooting and in network configuration and troubleshooting.
Strong analytical skills.
Related high performance networking experience (administration, support, etc.).
Experience with the installation, configuration, and administration of enterprise Linux servers in clustered environments.
Strong interpersonal, verbal and written communication skills.
A demonstrated ability to work across geographies, time zones, companies, and organizations.
B.S. Degree in Engineering, Computer Science, or a related discipline or 5+ years of equivalent experience in hardware or software development, validation, training, or technical support.
Preferred Qualifications:
Experience developing or troubleshooting Linux device drivers.
Knowledge of parallel programming, especially MPI and/or SHMEM, along with experience in troubleshooting in HPC/AI cluster environments.
Two years of experience with more than one operating system.
InfiniBand experience.
Cluster storage experience.
Ability to analyze, break down, and understand complex problems and to communicate solutions effectively.
Ability to manage customer expectations and balance customer requests with Cornelis Networks' business needs.
Experience in providing user support in a network or HPC environment.
Location: This is a remote position for employees residing within the United States; candidates residing in the Wayne, PA metro area are preferred. The ability to travel to Cornelis' headquarters for periodic meetings and training—as well as to partner and customer locations—is required. Travel is typically up to 25%, with peaks of up to 50% during product launches or key customer installation and acceptance periods.

The ideal candidate combines strong business development instincts with strategic relationship management skills to prospect new opportunities and secure sponsorship from influential stakeholders, with the long-term goal of expanding those engagements into multi-year technology and commercial partnerships. This role works closely with Cornelis’ product, sales, and marketing teams to align technology roadmaps with our differentiated capabilities—ensuring Cornelis solutions are embedded in the next generation of AI & HPC deployments.
Key Responsibilities:
- Hunt for new opportunities across AI & HPC customers by proactively identifying, engaging, and developing relationships with key influencers within each organization.
- Open doors at target accounts and establish Cornelis as a trusted partner by leveraging persistence, creativity, and strong customer engagement.
- Align technology roadmaps between Cornelis and cloud-native customers to influence infrastructure decisions in AI, HPC, and enterprise deployments.
- Represent Cornelis Networks at events, trade shows, executive forums, and industry conferences.
- Serve as the regional customer advocate internally, providing structured feedback to shape Cornelis’ product and market strategies.
- Operate with grit and resilience, navigating complex, matrixed partner organizations to build alignment from field sellers to executive decision-makers.
Required Qualifications:
10 years of experience in a related field.
Minimum of 7 years of experience in business development, sales hunting, or technology partnerships—preferably in cloud, HPC, or enterprise infrastructure.
Proven track record of prospecting, building new relationships, and closing multi-million-dollar partnerships with AI & HPC customers and reseller partners.
Strong knowledge of HPC, AI, and data center infrastructure, including Ethernet, InfiniBand, and interconnect technologies.
Executive presence, with experience engaging engineering, VP-, and C-level decision-makers.
Proficiency with Salesforce CRM, Microsoft Office Suite, and sales enablement platforms.
Demonstrated resilience and persistence in breaking into new accounts with disruptive technologies.
Candidate must be fluent in spoken and written English.
All applicants must have citizenship of a Gulf country, the right to work and live in the GCC, or a valid visa permitting them to do so.
Desired Qualifications:
Experience with AI & HPC customers and respective reseller partners.
Demonstrated ability to scale new opportunities into long-term partnerships with measurable revenue growth.
Familiarity with AI & HPC solution positioning, pricing models, and partner-led go-to-market strategies.
Entrepreneurial mindset with the ability to thrive in a high-growth, competitive market environment.
Collaborative, relationship-oriented style with excellent communication and influencing skills.
Location: This is a remote role within Saudi Arabia but requires 50% travel to customer sites, events, trade shows, and conferences.
At Cornelis Networks your base salary is only one component of your comprehensive total rewards package. Your base pay will be determined by factors such as your skills, qualifications, experience, and location relative to the hiring range for the position. Depending on your role, you may also be eligible for performance-based incentives, including an annual bonus or sales incentives.

The ideal candidate combines strong business development instincts with strategic relationship management skills to prospect new opportunities and secure sponsorship from influential stakeholders, with the long-term goal of expanding those engagements into multi-year technology and commercial partnerships. This role works closely with Cornelis’ product, sales, and marketing teams to align cloud technology roadmaps with our differentiated capabilities, ensuring Cornelis solutions are embedded in the next generation of AI cloud deployments.
Key Responsibilities:
- Hunt for new opportunities across cloud service providers and neo-cloud players by proactively identifying, engaging, and developing relationships with key influencers within each organization.
- Open doors at target accounts and establish Cornelis as a trusted partner by leveraging persistence, creativity, and strong customer engagement.
- Align technology roadmaps between Cornelis and cloud-native customers to influence infrastructure decisions in AI and enterprise cloud deployments.
- Represent Cornelis Networks at cloud-hosted events, trade shows, executive forums, and industry conferences.
- Serve as the cloud customer advocate internally, providing structured feedback to shape Cornelis’ product and market strategies.
- Operate with grit and resilience, navigating complex, matrixed partner organizations to build alignment from field sellers to executive decision-makers.
Minimum Qualifications:
Minimum of 7-10 years of experience in business development, sales hunting, or technology partnerships—preferably in cloud, HPC, or enterprise infrastructure.
Proven track record of prospecting, building new relationships, and closing multi-million-dollar partnerships with CSPs, hyperscalers, or OEMs.
Strong knowledge of HPC, AI, and data center infrastructure, including Ethernet, InfiniBand, and interconnect technologies.
Executive presence, with experience engaging engineering, VP-, and C-level decision-makers.
Proficiency with Salesforce CRM, Microsoft Office Suite, and sales enablement platforms.
Demonstrated resilience and persistence in breaking into new accounts with disruptive technologies.
Candidate must be fluent in spoken and written English.
All applicants must have citizenship of an EU country, the right to work and live in the EU, or a valid visa permitting them to do so.
Preferred Qualifications:
Experience with cloud service providers, hyperscalers, neo-cloud, or HPC cloud providers.
Demonstrated ability to scale new logos into long-term partnerships with measurable revenue growth.
Familiarity with cloud product positioning, pricing models, and partner-led go-to-market strategies.
Entrepreneurial mindset with the ability to thrive in a high-growth, competitive market environment.
Collaborative, relationship-oriented style with excellent communication and influencing skills.
Location: This is a remote role within the EU but requires 50% travel to customer sites, events, trade shows, and conferences. Candidates would ideally be based in Belgium.

As a member of the Technical Support organization, you will work collaboratively with cross-functional teams to identify partner and end user issues and deliver solutions with world class customer service. Key to this role is the ability to build cross-functional relationships, effectively communicate with highly technical customers, apply critical thinking and problem-solving skills, and take initiative to drive customer satisfaction.
The ideal candidate enjoys having deep technical knowledge and applying their knowledge to solve challenging customer issues while working in a fast-paced environment. This role has a significant impact and contributes to the operational and financial success of the company.
Key Responsibilities:
- Providing professional, high-quality customer support for Cornelis Networks products.
- Taking technical ownership of user reported issues, driving diagnosis and resolution in a professional and timely manner.
- Managing incoming customer support issues (via emails, portal postings, or phone calls), escalating when necessary based on documented procedures, and periodically supporting off-hours “on-call” activities.
- Partnering cross-functionally across all levels of the corporation, working with other technical staff to identify and resolve issues.
- Supporting OEMs and/or system integrators with new cluster installation and acceptance, providing remote or on-site support as needed.
- Proactively ensuring customer readiness for updates; understanding changes and their impacts on customers; providing input for materials and documentation supporting these upgrades or early use of new capabilities.
- Actively participating and advocating for customers during the planning and sustaining periods of our product lifecycle.
Minimum Qualifications:
5+ years’ experience programming in C, C++, Perl, Python, and/or Fortran, as well as knowledge of Linux scripting.
5+ years’ experience in system troubleshooting, network configuration, and network troubleshooting.
Strong analytical skills.
Related high performance networking experience (administration, support, etc.).
Experience with the installation, configuration, and administration of enterprise Linux servers in clustered environments.
Strong interpersonal, verbal and written communication skills.
A demonstrated ability to work across geographies, time zones, companies, and organizations.
B.S. Degree in Engineering, Computer Science, or a related discipline or 5+ years of equivalent experience in hardware or software development, validation, training, or technical support.
Preferred Qualifications:
Experience developing or troubleshooting Linux device drivers.
Knowledge of parallel programming, especially MPI and/or SHMEM, along with experience in troubleshooting in HPC/AI cluster environments.
Two years of experience with more than one operating system.
InfiniBand experience.
Cluster storage experience.
Ability to analyze, break down, and understand complex problems and to communicate solutions effectively.
Ability to manage customer expectations and balance customer requests with Cornelis Networks' business needs.
Experience in providing user support in a network or HPC environment.
Location: This is a remote position for employees residing within Germany, the United Kingdom, or Belgium. The ability to travel to Cornelis' headquarters for periodic meetings and training—as well as to partner and customer locations—is required. Travel is typically up to 25%, with peaks of up to 50% during product launches or key customer installation and acceptance periods.

Key Responsibilities
- Implement advanced Ethernet protocols for next-generation Ethernet ASICs.
- Write micro architecture specifications.
- Pre-silicon: Define timing constraints for the RTL block and work with Physical Design engineers for timing optimization.
- Pre-silicon: Work with the verification engineers to create block/system level test plans and ensure that the design is adequately covered.
- Post-silicon: Work with hardware/firmware/software engineers to resolve ASIC issues.
Minimum Qualifications
B.S. or M.S. degree in Computer or Electrical Engineering.
10+ years of post-college experience in digital design, with proficiency in Verilog and SystemVerilog.
Relevant experience in timing closure and familiarity with modern physical design methodologies.
Demonstrated strength with system level debug and determining root cause of technical issues.
Strong verbal and written communications skills.
Preferred Qualifications
Experience with Ethernet architecture and Networking protocols.
Familiarity and experience with the design of Ethernet protocols applicable to Ethernet adapters and switches.
Experience with multiple clock domain designs and asynchronous interfaces.
Job Location: This role may be performed remotely from Costa Rica via an EOR (Employer of Record).

SOLVING THE HARDEST PROBLEMS IN NATIONAL SECURITY
Protecting U.S. residents and visitors requires sophisticated technological solutions to counter increasingly complex, AI-enabled threats. As adversaries deploy autonomous systems, coordinate multi-domain operations, and leverage advanced AI capabilities, our capabilities must evolve to match this complexity. The PNNL national security mission employs our expert engineering talent—self-driven innovators who thrive in ambiguity—to architect and deliver next-generation AI systems, multi-intelligence fusion platforms, and large-scale analytics capabilities that form the backbone of our nation's security infrastructure.
THE CHALLENGE IS REAL. THE PROBLEMS ARE HARD. THE IMPACT IS IMMEDIATE
You'll tackle problems that don't exist elsewhere, from detecting and preventing smuggling of drugs and contraband to correlating petabytes of multi-modal intelligence data to solve national security problems. Every system you architect, every algorithm you optimize, and every deployment you lead directly strengthens our nation's security posture. This work demands relentless drive—the kind of engineer who sees obstacles as puzzles to solve and treats mission success as personal responsibility.
PNNL's advanced AI engineering initiatives—spanning agentic AI architectures, petabyte-scale data orchestration, and real-time multi-intelligence processing—are defining the future of national security technology. With your expertise in scalable system design, AI/ML engineering, and complex data architectures, you'll contribute to revolutionary capabilities that process multi-modal intelligence data at unprecedented scale and speed. We operate with startup agility and urgency, rapidly iterating and deploying solutions where failure isn't an option.
APPLYING YOUR ENGINEERING EXPERTISE TO CHALLENGES IMPACTING NATIONAL SECURITY
As a Lead Software Engineer on PNNL's Software Engineering and Architectures (SEA) team, you will architect and lead the development of mission-critical AI systems, design systems at scale in support of mission objectives, and drive technical strategy for some of the nation's most sophisticated intelligence and security applications. You'll take ownership from concept to deployment, mentor and develop staff, collaborate with research scientists and domain experts, and translate cutting-edge research into production-ready systems deployed in highly secure environments—all while maintaining the entrepreneurial mindset that turns breakthrough research into operational reality.
HANDS-ON IMPACT ACROSS DIVERSE, HIGH-STAKES CHALLENGES
System Architecture – Design and build distributed platforms in state-of-the-art secure facilities, architect high-availability microservices, conduct performance testing on enterprise-scale systems
Field Deployment – Deploy and optimize software systems in operational environments, troubleshoot mission-critical applications in real-world conditions, collaborate directly with system operators and infrastructure teams
Performance Engineering – Build ultra-high-performance APIs that handle millions of requests per second, architect systems that operate under extreme performance constraints, solve scalability challenges at national scale
Real-Time Systems – Build platforms that process streaming data with sub-millisecond latency, architect fault-tolerant systems that achieve 99.99% uptime, optimize performance for mission-critical operations
What You'll Architect & Build:
Software Platforms – Design scalable microservices architectures that coordinate across multiple domains, build robust API gateways for classified environments, architect service mesh implementations that handle the nation's most sensitive traffic
High-Performance Processing Systems – Lead distributed system design processing data from hundreds of sources simultaneously, architect real-time streaming platforms that handle terabytes per hour, design event-driven architectures that scale horizontally across data centers
Mission-Critical Infrastructure – Design container orchestration platforms that span multiple security domains, architect CI/CD pipelines processing millions of deployments, build monitoring and observability systems across secure enclaves
Technical Leadership with Impact – Mentor engineering teams building systems that push the boundaries of software architecture, drive technical decisions that influence platform strategy, lead cross-functional initiatives spanning multiple development teams
Technical Expertise Expected:
Programming Excellence – Expert-level Python, C#/.Net, with proficiency in JavaScript/TypeScript, [Java, C/C++, Rust]
Cloud & Infrastructure – AWS/Azure architecture, Kubernetes orchestration, Infrastructure as Code, [GCP, Multi-cloud, Edge computing]
Data & Storage – S3, Elasticsearch/OpenSearch, PostgreSQL, MongoDB, Redshift, Delta Lake, Vector stores
AI/ML Engineering – Agentic AI systems, LLMs, model deployment & orchestration, MLOps platforms, distributed training, model serving at scale, vector databases, graph analytics
Intelligence Systems – Data fusion, GEOINT processing, signals analysis, threat correlation
Data Engineering – Spark/Databricks, Kafka/streaming, data lake/mesh architectures, real-time analytics, distributed computing
Specialized Systems – Geospatial processing frameworks, time-series databases, distributed computing, secure enclaves
Leadership Tools – Agile/SAFe methodologies, GitLab Enterprise, advanced CI/CD, security-first DevOps
Impact & Growth:
Lead technical strategy for AI-powered national security systems while mentoring the next generation of engineers. Your architectural decisions will influence platforms processing intelligence data at national scale, directly contributing to our nation's security posture.
National Interest Project Examples:
Detect and prevent smuggling of drugs and contraband at ports of entry [Link]
Develop large data pipelines to thwart funding for terrorists, nuclear proliferators, drug cartels, and rogue leaders [Link]
Applying big data solutions to national security problems [Link]
Applying image classification for nuclear forensics analysis [Link]
Develop capabilities for scalable geospatial analytics [Link]
Use remotely sensed imagery to identify and monitor the progression of wildfires [Link]
Analyze the resiliency of the electric power grid to prevent large-scale outages [Link]
Optimize building efficiency using IoT and ICS data with automated demand-response markets [Link]
Model climate change and impacts to civilization [Link]
Hunt for the existence of dark matter to understand the nature of the universe [Link]
This position is based in Richland, WA, and requires an on-site presence Monday through Thursday, with Friday as required by business needs.

Protecting U.S. residents and visitors requires sophisticated technological solutions to counter increasingly complex, AI-enabled threats. As adversaries deploy autonomous systems, coordinate multi-domain operations, and leverage advanced AI capabilities, our capabilities must evolve to match this complexity. The PNNL national security mission employs our expert engineering talent—self-driven innovators who thrive in ambiguity—to architect and deliver next-generation AI systems, multi-intelligence fusion platforms, and large-scale analytics capabilities that form the backbone of our nation's security infrastructure.
THE CHALLENGE IS REAL. THE PROBLEMS ARE HARD. THE IMPACT IS IMMEDIATE
You'll tackle problems that don't exist elsewhere, from detecting and preventing smuggling of drugs and contraband to correlating petabytes of multi-modal intelligence data to solve national security problems. Every system you architect, every algorithm you optimize, and every deployment you lead directly strengthens our nation's security posture. This work demands relentless drive—the kind of engineer who sees obstacles as puzzles to solve and treats mission success as personal responsibility.
PNNL's advanced AI engineering initiatives—spanning agentic AI architectures, petabyte-scale data orchestration, and real-time multi-intelligence processing—are defining the future of national security technology. With your expertise in scalable system design, AI/ML engineering, and complex data architectures, you'll contribute to revolutionary capabilities that process multi-modal intelligence data at unprecedented scale and speed. We operate with startup agility and urgency, rapidly iterating and deploying solutions where failure isn't an option.
APPLYING YOUR ENGINEERING EXPERTISE TO CHALLENGES IMPACTING NATIONAL SECURITY
As a Lead Software Engineer on PNNL's Software Engineering and Architectures (SEA) team, you will architect and lead the development of mission-critical AI systems, design systems at scale in support of mission objectives, and drive technical strategy for some of the nation's most sophisticated intelligence and security applications. You'll take ownership from concept to deployment, mentor and develop staff, collaborate with research scientists and domain experts, and translate cutting-edge research into production-ready systems deployed in highly secure environments—all while maintaining the entrepreneurial mindset that turns breakthrough research into operational reality.
HANDS-ON IMPACT ACROSS DIVERSE, HIGH-STAKES CHALLENGES:
System Architecture – Design and build distributed platforms in state-of-the-art secure facilities, architect high-availability microservices, conduct performance testing on enterprise-scale systems
Field Deployment – Deploy and optimize software systems in operational environments, troubleshoot mission-critical applications in real-world conditions, collaborate directly with system operators and infrastructure teams
Performance Engineering – Build ultra-high-performance APIs that handle millions of requests per second, architect systems that operate under extreme performance constraints, solve scalability challenges at national scale
Real-Time Systems – Build platforms that process streaming data with sub-millisecond latency, architect fault-tolerant systems that achieve 99.99% uptime, optimize performance for mission-critical operations
What You'll Architect & Build:
Software Platforms – Design scalable microservices architectures that coordinate across multiple domains, build robust API gateways for classified environments, architect service mesh implementations that handle the nation's most sensitive traffic
High-Performance Processing Systems – Lead distributed system design processing data from hundreds of sources simultaneously, architect real-time streaming platforms that handle terabytes per hour, design event-driven architectures that scale horizontally across data centers
Mission-Critical Infrastructure – Design container orchestration platforms that span multiple security domains, architect CI/CD pipelines processing millions of deployments, build monitoring and observability systems across secure enclaves
Technical Leadership with Impact – Mentor engineering teams building systems that push the boundaries of software architecture, drive technical decisions that influence platform strategy, lead cross-functional initiatives spanning multiple development teams
Technical Expertise Expected:
Programming Excellence – Expert-level Python, C#/.Net, with proficiency in JavaScript/TypeScript, [Java, C/C++, Rust]
Cloud & Infrastructure – AWS/Azure architecture, Kubernetes orchestration, Infrastructure as Code, [GCP, Multi-cloud, Edge computing]
Data & Storage – S3, Elasticsearch/OpenSearch, PostgreSQL, MongoDB, Redshift, Delta Lake, Vector stores
AI/ML Engineering – Agentic AI systems, LLMs, model deployment & orchestration, MLOps platforms, distributed training, model serving at scale, vector databases, graph analytics
Intelligence Systems – Data fusion, GEOINT processing, signals analysis, threat correlation
Data Engineering – Spark/Databricks, Kafka/streaming, data lake/mesh architectures, real-time analytics, distributed computing
Specialized Systems – Geospatial processing frameworks, time-series databases, distributed computing, secure enclaves
Leadership Tools – Agile/SAFe methodologies, GitLab Enterprise, advanced CI/CD, security-first DevOps
Impact & Growth:
Lead technical strategy for AI-powered national security systems while mentoring the next generation of engineers. Your architectural decisions will influence platforms processing intelligence data at national scale, directly contributing to our nation's security posture.
National Interest Project Examples:
Detect and prevent smuggling of drugs and contraband at ports of entry [Link]
Develop large data pipelines to thwart funding for terrorists, nuclear proliferators, drug cartels, and rogue leaders [Link]
Applying big data solutions to national security problems [Link]
Applying image classification for nuclear forensics analysis [Link]
Develop capabilities for scalable geospatial analytics [Link]
Use remotely sensed imagery to identify and monitor the progression of wildfires [Link]
Analyze the resiliency of the electric power grid to prevent large-scale outages [Link]
Optimize building efficiency using IoT and ICS data with automated demand-response markets [Link]
Model climate change and impacts to civilization [Link]
Hunt for the existence of dark matter to understand the nature of the universe [Link]
This position is based in Seattle, WA, and requires an on-site presence Monday through Thursday, with Friday as required by business needs.

Protecting U.S. residents and visitors requires sophisticated technological solutions to counter increasingly complex, AI-enabled threats. As adversaries deploy autonomous systems, coordinate multi-domain operations, and leverage advanced AI capabilities, our capabilities must evolve to match this complexity. The PNNL national security mission employs our expert engineering talent—self-driven innovators who thrive in ambiguity—to architect and deliver next-generation AI systems, multi-intelligence fusion platforms, and large-scale analytics capabilities that form the backbone of our nation's security infrastructure.
THE CHALLENGE IS REAL. THE PROBLEMS ARE HARD. THE IMPACT IS IMMEDIATE
You'll tackle problems that don't exist elsewhere, from detecting and preventing smuggling of drugs and contraband to correlating petabytes of multi-modal intelligence data to solve national security problems. Every system you architect, every algorithm you optimize, and every deployment you lead directly strengthens our nation's security posture. This work demands relentless drive—the kind of engineer who sees obstacles as puzzles to solve and treats mission success as personal responsibility.
PNNL is leading the charge in advanced AI engineering, driving innovation in areas such as real-time multi-intelligence processing, petabyte-scale data orchestration, and agentic AI system architectures to support national security initiatives. We are looking for a mid-career Software Engineer to play a key role in advancing these groundbreaking technologies. In this position, you will leverage your experience in scalable system design, AI/ML engineering, and data architecture to develop revolutionary capabilities for processing multi-modal intelligence data with unprecedented speed and scale. You’ll work in a fast-paced, mission-driven environment that combines startup-style agility with the rigor required for high-impact national security applications.
APPLYING YOUR ENGINEERING EXPERTISE TO CHALLENGES IMPACTING NATIONAL SECURITY
As a Software Engineer on PNNL's Software Engineering and Architectures (SEA) team, you will architect and lead the development of mission-critical AI systems, design systems at scale in support of mission objectives, and drive technical strategy for some of the nation's most sophisticated intelligence and security applications. You'll take ownership from concept to deployment, mentor and develop staff, collaborate with research scientists and domain experts, and translate cutting-edge research into production-ready systems deployed in highly secure environments—all while maintaining the entrepreneurial mindset that turns breakthrough research into operational reality.
HANDS-ON IMPACT ACROSS DIVERSE, HIGH-STAKES CHALLENGES
System Architecture – Design and build distributed platforms in state-of-the-art secure facilities, architect high-availability microservices, conduct performance testing on enterprise-scale systems
Field Deployment – Deploy and optimize software systems in operational environments, troubleshoot mission-critical applications in real-world conditions, collaborate directly with system operators and infrastructure teams
Performance Engineering – Build ultra-high-performance APIs that handle millions of requests per second, architect systems that operate under extreme performance constraints, solve scalability challenges at national scale
Real-Time Systems – Build platforms that process streaming data with sub-millisecond latency, architect fault-tolerant systems that achieve 99.99% uptime, optimize performance for mission-critical operations
What You'll Architect & Build:
Software Platforms – Develop scalable microservices architectures that coordinate across multiple domains, build robust API gateways for classified environments, architect service mesh implementations that handle the nation's most sensitive traffic
High-Performance Processing Systems – Lead distributed system design processing data from hundreds of sources simultaneously, architect real-time streaming platforms that handle terabytes per hour, design event-driven architectures that scale horizontally across data centers
Mission-Critical Infrastructure – Develop container orchestration platforms that span multiple security domains, architect CI/CD pipelines processing millions of deployments, build monitoring and observability systems across secure enclaves
Technical Leadership with Impact – Mentor engineering teams building systems that push the boundaries of software architecture, drive technical decisions that influence platform strategy, lead cross-functional initiatives spanning multiple development teams
Technical Expertise Expected:
Programming Excellence – Advanced Python, C#/.Net, with proficiency in JavaScript/TypeScript, [Java, C/C++, Rust]
Cloud & Infrastructure – AWS/Azure architecture, Kubernetes orchestration, Infrastructure as Code, [GCP, Multi-cloud, Edge computing]
Data & Storage – S3, Elasticsearch/OpenSearch, PostgreSQL, MongoDB, Redshift, Delta Lake, Vector stores
AI/ML Engineering – Agentic AI systems, LLMs, model deployment & orchestration, MLOps platforms, distributed training, model serving at scale, vector databases, graph analytics
Intelligence Systems – Data fusion, GEOINT processing, signals analysis, threat correlation
Data Engineering – Spark/Databricks, Kafka/streaming, data lake/mesh architectures, real-time analytics, distributed computing
Specialized Systems – Geospatial processing frameworks, time-series databases, distributed computing, secure enclaves
Leadership Tools – Agile/SAFe methodologies, GitLab Enterprise, advanced CI/CD, security-first DevOps
National Interest Project Examples:
Detect and prevent smuggling of drugs and contraband at ports of entry [Link]
Develop large data pipelines to thwart funding for terrorists, nuclear proliferators, drug cartels, and rogue leaders [Link]
Apply big data solutions to national security problems [Link]
Apply image classification for nuclear forensics analysis [Link]
Develop capabilities for scalable geospatial analytics [Link]
Use remotely sensed imagery to identify and monitor the progression of wildfires [Link]
Analyze the resiliency of the electric power grid to prevent large-scale outages [Link]
Optimize building efficiency using IoT and ICS data with automated demand-response markets [Link]
Model climate change and its impacts on civilization [Link]
Hunt for the existence of dark matter to understand the nature of the universe [Link]
This position is based in Richland, WA and requires an onsite presence Monday through Thursday, with Friday as required by business needs.

Our Science & Technology directorates include National Security, Earth and Biological Sciences, Physical and Computational Sciences, and Energy and Environment. In addition, we have an Environmental Molecular Sciences Laboratory, a Department of Energy, Office of Science user facility housed on the PNNL campus.
The National Security Directorate (NSD) drives science-based, mission-focused solutions to take on complex, real-world threats to our nation and the world.
The AI and Data Analytics Division, part of the National Security Directorate, combines profound domain expertise and creative integration of advanced hardware and software to deliver computational solutions that address complex data and analytic challenges. Working in multidisciplinary teams, we connect foundational research to engineering to operations, providing the tools to innovate quickly and field results faster. Our strengths are integrated across the data analytics lifecycle, from data acquisition and management to analysis and decision support.
Responsibilities
SOLVING THE HARDEST PROBLEMS IN NATIONAL SECURITY
Protecting U.S. residents and visitors requires sophisticated technological solutions to counter increasingly complex, AI-enabled threats. As adversaries deploy autonomous systems, coordinate multi-domain operations, and leverage advanced AI capabilities, our capabilities must evolve to match this complexity. The PNNL national security mission employs our expert engineering talent—self-driven innovators who thrive in ambiguity—to architect and deliver next-generation AI systems, multi-intelligence fusion platforms, and large-scale analytics capabilities that form the backbone of our nation's security infrastructure.
THE CHALLENGE IS REAL. THE PROBLEMS ARE HARD. THE IMPACT IS IMMEDIATE
You'll tackle problems that don't exist elsewhere, from detecting and preventing the smuggling of drugs and contraband to correlating petabytes of multi-modal intelligence data to solve national security problems. Every system you architect, every algorithm you optimize, and every deployment you lead directly strengthens our nation's security posture. This work demands relentless drive—the kind of engineer who sees obstacles as puzzles to solve and treats mission success as personal responsibility.
PNNL is leading the charge in advanced AI engineering, driving innovation in areas such as real-time multi-intelligence processing, petabyte-scale data orchestration, and agentic AI system architectures to support national security initiatives. We are looking for a mid-career Software Engineer to play a key role in advancing these groundbreaking technologies. In this position, you will leverage your experience in scalable system design, AI/ML engineering, and data architecture to develop revolutionary capabilities for processing multi-modal intelligence data with unprecedented speed and scale. You’ll work in a fast-paced, mission-driven environment that combines startup-style agility with the rigor required for high-impact national security applications.
APPLYING YOUR ENGINEERING EXPERTISE TO CHALLENGES IMPACTING NATIONAL SECURITY
As a Software Engineer on PNNL's Software Engineering and Architectures (SEA) team, you will architect and lead the development of mission-critical AI systems, design systems at scale in support of mission objectives, and drive technical strategy for some of the nation's most sophisticated intelligence and security applications. You'll take ownership from concept to deployment, mentor and develop staff, collaborate with research scientists and domain experts, and translate cutting-edge research into production-ready systems deployed in highly secure environments—all while maintaining the entrepreneurial mindset that turns breakthrough research into operational reality.
HANDS-ON IMPACT ACROSS DIVERSE, HIGH-STAKES CHALLENGES
System Architecture – Design and build distributed platforms in state-of-the-art secure facilities, architect high-availability microservices, conduct performance testing on enterprise-scale systems
Field Deployment – Deploy and optimize software systems in operational environments, troubleshoot mission-critical applications in real-world conditions, collaborate directly with system operators and infrastructure teams
Performance Engineering – Build ultra-high-performance APIs that handle millions of requests per second, architect systems that operate under extreme performance constraints, solve scalability challenges at national scale
Real-Time Systems – Build platforms that process streaming data with sub-millisecond latency, architect fault-tolerant systems that achieve 99.99% uptime, optimize performance for mission-critical operations
What You'll Architect & Build:
Software Platforms – Develop scalable microservices architectures that coordinate across multiple domains, build robust API gateways for classified environments, architect service mesh implementations that handle the nation's most sensitive traffic
High-Performance Processing Systems – Lead distributed system design processing data from hundreds of sources simultaneously, architect real-time streaming platforms that handle terabytes per hour, design event-driven architectures that scale horizontally across data centers
Mission-Critical Infrastructure – Develop container orchestration platforms that span multiple security domains, architect CI/CD pipelines processing millions of deployments, build monitoring and observability systems across secure enclaves
Technical Leadership with Impact – Mentor engineering teams building systems that push the boundaries of software architecture, drive technical decisions that influence platform strategy, lead cross-functional initiatives spanning multiple development teams
Technical Expertise Expected:
Programming Excellence – Advanced Python, C#/.Net, with proficiency in JavaScript/TypeScript, [Java, C/C++, Rust]
Cloud & Infrastructure – AWS/Azure architecture, Kubernetes orchestration, Infrastructure as Code, [GCP, Multi-cloud, Edge computing]
Data & Storage – S3, Elasticsearch/OpenSearch, PostgreSQL, MongoDB, Redshift, Delta Lake, Vector stores
AI/ML Engineering – Agentic AI systems, LLMs, model deployment & orchestration, MLOps platforms, distributed training, model serving at scale, vector databases, graph analytics
Intelligence Systems – Data fusion, GEOINT processing, signals analysis, threat correlation
Data Engineering – Spark/Databricks, Kafka/streaming, data lake/mesh architectures, real-time analytics, distributed computing
Specialized Systems – Geospatial processing frameworks, time-series databases, distributed computing, secure enclaves
Leadership Tools – Agile/SAFe methodologies, GitLab Enterprise, advanced CI/CD, security-first DevOps
National Interest Project Examples:
Detect and prevent smuggling of drugs and contraband at ports of entry [Link]
Develop large data pipelines to thwart funding for terrorists, nuclear proliferators, drug cartels, and rogue leaders [Link]
Apply big data solutions to national security problems [Link]
Apply image classification for nuclear forensics analysis [Link]
Develop capabilities for scalable geospatial analytics [Link]
Use remotely sensed imagery to identify and monitor the progression of wildfires [Link]
Analyze the resiliency of the electric power grid to prevent large-scale outages [Link]
Optimize building efficiency using IoT and ICS data with automated demand-response markets [Link]
Model climate change and its impacts on civilization [Link]
Hunt for the existence of dark matter to understand the nature of the universe [Link]
This position is based in Seattle, WA, and requires an on-site presence Monday through Thursday, with Friday as required by business needs.

Our Science & Technology directorates include National Security, Earth and Biological Sciences, Physical and Computational Sciences, and Energy and Environment. In addition, we have an Environmental Molecular Sciences Laboratory, a Department of Energy, Office of Science user facility housed on the PNNL campus.
The National Security Directorate (NSD) drives science-based, mission-focused solutions to take on complex, real-world threats to our nation and the world.
The AI and Data Analytics Division, part of the National Security Directorate, combines profound domain expertise and creative integration of advanced hardware and software to deliver computational solutions that address complex data and analytic challenges. Working in multidisciplinary teams, we connect foundational research to engineering to operations, providing the tools to innovate quickly and field results faster. Our strengths are integrated across the data analytics lifecycle, from data acquisition and management to analysis and decision support.
Responsibilities
SOLVING THE HARDEST PROBLEMS IN NATIONAL SECURITY
Protecting U.S. residents and visitors requires sophisticated technological solutions to counter increasingly complex, AI-enabled threats. As adversaries deploy autonomous systems, coordinate multi-domain operations, and leverage advanced AI capabilities, our capabilities must evolve to match this complexity. The PNNL national security mission employs our expert engineering talent—self-driven innovators who thrive in ambiguity—to architect and deliver next-generation AI systems, multi-intelligence fusion platforms, and large-scale analytics capabilities that form the backbone of our nation's security infrastructure.
THE CHALLENGE IS REAL. THE PROBLEMS ARE HARD. THE IMPACT IS IMMEDIATE
You'll tackle problems that don't exist elsewhere, from detecting and preventing the smuggling of drugs and contraband to correlating petabytes of multi-modal intelligence data to solve national security problems. Every system, every algorithm, and every deployment you support directly strengthens our nation's security posture. This work demands relentless drive—the kind of engineer who sees obstacles as puzzles to solve and treats mission success as personal responsibility.
PNNL is at the forefront of advanced AI engineering, driving innovation in areas like real-time multi-intelligence processing, petabyte-scale data management, and next-generation AI architectures for national security applications. We are seeking a motivated Software Engineer to join our dynamic team and contribute to cutting-edge projects that shape the future of critical technology. In this role, you'll work alongside experienced engineers to design, build, and optimize scalable systems for processing multi-modal intelligence data at a remarkable scale and speed. This is a unique opportunity to grow your skills in AI/ML engineering, data orchestration, and system architecture while delivering impactful solutions in a fast-paced environment where agility and reliability are key.
APPLYING YOUR ENGINEERING EXPERTISE TO CHALLENGES IMPACTING NATIONAL SECURITY
As a Software Engineer on PNNL's Software Engineering and Architectures (SEA) team, you'll contribute to mission-critical AI systems, help build enterprise-scale data platforms, and gain hands-on experience with some of the nation's most sophisticated intelligence and security applications. You'll receive mentorship from senior engineers, collaborate with research scientists, and see your code deployed in production systems that serve national security missions.
What You'll Build & Learn:
AI Systems Development – Contribute to agentic AI implementations, work with LLM integrations, learn MLOps best practices on real-world applications
Intelligence Platform Development – Build components for multi-INT fusion systems, develop GEOINT processing tools, contribute to threat analysis platforms
Data Engineering – Develop data pipelines, build analytics workflows, contribute to cloud-native data architectures
Professional Growth – Receive mentorship from senior engineers, participate in code reviews, learn enterprise software development practices
Technical Skills (We'll Help You Grow):
Programming Foundation – Solid Python skills, experience with one additional language (C#/.Net, JavaScript, Java, or similar)
Platform Experience – Basic cloud experience (AWS, Azure, or GCP), familiarity with Linux environments
Development Practices – Understanding of version control (Git), exposure to CI/CD, basic Agile/Scrum experience
Data & Databases – Experience with SQL, basic understanding of data processing concepts
Eagerness to Learn – Interest in AI/ML technologies, geospatial systems, large-scale data processing
What We Offer:
Accelerated Learning – Work with cutting-edge AI and data technologies while receiving guidance from industry experts
Meaningful Impact – See your contributions deployed in systems that protect national security
Career Growth – Clear advancement path with mentorship and technical skill development opportunities
Cutting-Edge Technology – Hands-on experience with the latest AI, cloud, and data engineering technologies
National Interest Project Examples:
Detect and prevent smuggling of drugs and contraband at ports of entry [Link]
Develop large data pipelines to thwart funding for terrorists, nuclear proliferators, drug cartels, and rogue leaders [Link]
Apply big data solutions to national security problems [Link]
Apply image classification for nuclear forensics analysis [Link]
Develop capabilities for scalable geospatial analytics [Link]
Use remotely sensed imagery to identify and monitor the progression of wildfires [Link]
Analyze the resiliency of the electric power grid to prevent large-scale outages [Link]
Optimize building efficiency using IoT and ICS data with automated demand-response markets [Link]
Model climate change and its impacts on civilization [Link]
Hunt for the existence of dark matter to understand the nature of the universe [Link]
This position is based in Richland, WA and requires an onsite presence Monday through Thursday, with Friday as required by business needs.

Responsibilities
SOLVING THE HARDEST PROBLEMS IN NATIONAL SECURITY
Protecting U.S. residents and visitors requires sophisticated technological solutions to counter increasingly complex, AI-enabled threats. As adversaries deploy autonomous systems, coordinate multi-domain operations, and leverage advanced AI capabilities, our capabilities must evolve to match this complexity. The PNNL national security mission employs our expert engineering talent—self-driven innovators who thrive in ambiguity—to architect and deliver next-generation AI systems, multi-intelligence fusion platforms, and large-scale analytics capabilities that form the backbone of our nation's security infrastructure.
THE CHALLENGE IS REAL. THE PROBLEMS ARE HARD. THE IMPACT IS IMMEDIATE
You'll tackle problems that don't exist elsewhere, from detecting and preventing the smuggling of drugs and contraband to correlating petabytes of multi-modal intelligence data to solve national security problems. Every system, every algorithm, and every deployment you support directly strengthens our nation's security posture. This work demands relentless drive—the kind of engineer who sees obstacles as puzzles to solve and treats mission success as personal responsibility.
PNNL is at the forefront of advanced AI engineering, driving innovation in areas like real-time multi-intelligence processing, petabyte-scale data management, and next-generation AI architectures for national security applications. We are seeking a motivated Software Engineer to join our dynamic team and contribute to cutting-edge projects that shape the future of critical technology. In this role, you'll work alongside experienced engineers to design, build, and optimize scalable systems for processing multi-modal intelligence data at a remarkable scale and speed. This is a unique opportunity to grow your skills in AI/ML engineering, data orchestration, and system architecture while delivering impactful solutions in a fast-paced environment where agility and reliability are key.
APPLYING YOUR ENGINEERING EXPERTISE TO CHALLENGES IMPACTING NATIONAL SECURITY
As a Software Engineer on PNNL's Software Engineering and Architectures (SEA) team, you'll contribute to mission-critical AI systems, help build enterprise-scale data platforms, and gain hands-on experience with some of the nation's most sophisticated intelligence and security applications. You'll receive mentorship from senior engineers, collaborate with research scientists, and see your code deployed in production systems that serve national security missions.
What You'll Build & Learn:
AI Systems Development – Contribute to agentic AI implementations, work with LLM integrations, learn MLOps best practices on real-world applications
Intelligence Platform Development – Build components for multi-INT fusion systems, develop GEOINT processing tools, contribute to threat analysis platforms
Data Engineering – Develop data pipelines, build analytics workflows, contribute to cloud-native data architectures
Professional Growth – Receive mentorship from senior engineers, participate in code reviews, learn enterprise software development practices
Technical Skills (We'll Help You Grow):
Programming Foundation – Solid Python skills, experience with one additional language (C#/.Net, JavaScript, Java, or similar)
Platform Experience – Basic cloud experience (AWS, Azure, or GCP), familiarity with Linux environments
Development Practices – Understanding of version control (Git), exposure to CI/CD, basic Agile/Scrum experience
Data & Databases – Experience with SQL, basic understanding of data processing concepts
Eagerness to Learn – Interest in AI/ML technologies, geospatial systems, large-scale data processing
What We Offer:
Accelerated Learning – Work with cutting-edge AI and data technologies while receiving guidance from industry experts
Meaningful Impact – See your contributions deployed in systems that protect national security
Career Growth – Clear advancement path with mentorship and technical skill development opportunities
Cutting-Edge Technology – Hands-on experience with the latest AI, cloud, and data engineering technologies
National Interest Project Examples:
Detect and prevent smuggling of drugs and contraband at ports of entry [Link]
Develop large data pipelines to thwart funding for terrorists, nuclear proliferators, drug cartels, and rogue leaders [Link]
Apply big data solutions to national security problems [Link]
Apply image classification for nuclear forensics analysis [Link]
Develop capabilities for scalable geospatial analytics [Link]
Use remotely sensed imagery to identify and monitor the progression of wildfires [Link]
Analyze the resiliency of the electric power grid to prevent large-scale outages [Link]
Optimize building efficiency using IoT and ICS data with automated demand-response markets [Link]
Model climate change and its impacts on civilization [Link]
Hunt for the existence of dark matter to understand the nature of the universe [Link]
This position is based in Seattle, WA, and requires an on-site presence Monday through Thursday, with Friday as required by business needs.

Key Responsibilities in this role:
Identify mission challenges and methodically formulate engineering solutions
Demonstrate software engineering excellence and deliver high-quality results at scale
Develop software using a high-level programming language such as Python
Apply strong design principles and innovative problem-solving skills to complex technical challenges
Stay current on advancements in data management and database technologies
Take initiative in setting and pursuing personal goals and professional growth
Mentor and support the development of junior scientists and engineers
Communicate effectively, both verbally and in writing, and collaborate successfully in a team environment
How We Work:
Mission-Critical Development:
Design, develop, test, and deploy software applications that directly contribute to the nation’s mission objectives.
Collaborate with Government and Industry partners to understand mission needs, requirements, and translate them into robust and scalable software solutions.
Innovative Problem Solving:
Tackle complex technical challenges with creativity and innovation.
Propose and implement solutions that go beyond conventional approaches, ensuring the software we deliver is both efficient and impactful.
End-to-End Ownership:
Take ownership of the entire software development lifecycle, from initial concept to deployment.
Ensure the reliability, security, and performance of mission-critical software systems.
Collaborative Teamwork:
Work in an agile environment with multidisciplinary teams, including software engineers, cloud engineers, machine learning engineers, data scientists/domain experts, UX/UI designers, front-end developers, scrum masters, product owners, and most importantly, users.
Participate in code reviews and knowledge-sharing sessions to enhance the overall team skill set.
Continuous Learning:
Stay up-to-date with industry trends and emerging technologies to bring innovative ideas to the table.
Actively participate in training and development opportunities to enhance your skills and contribute to the team's growth.
Have opportunities to attend conferences and engage in industry events.
Technologies We Use:
Programming & Scripting – Python, C#/.Net, Bash, PowerShell, JavaScript, TypeScript, [Java, C/C++]
Platforms – AWS, Azure, On-Prem, [IoT/ICS, GCP]
Development/Lifecycle – Agile/Scrum/Kanban/others, Gitlab/Atlassian, CI/CD & DevSecOps
Paradigms – Rapid prototypes with ad hoc CSP-managed services, well-architected solutions using Infrastructure as Code, compliant deployments in highly regulated environments
Data & Storage – S3, Athena, PostgreSQL, Elasticsearch/OpenSearch, DynamoDB, Redshift, MongoDB, Databricks, Apache Spark
Data Complexities:
Volume – Large: terabytes to petabytes of data
Variety – Images, audio, text, IoT, RF, GPS, edge sensors
Velocity – From sub-second streams to lower-frequency feeds
National Interest Project Examples:
Detect and prevent smuggling of drugs and contraband at ports of entry [Link]
Develop large data pipelines to thwart funding for terrorists, nuclear proliferators, drug cartels, and rogue leaders [Link]
Apply big data solutions to national security problems [Link]
Develop capabilities for scalable geospatial analytics [Link]
Use remotely sensed imagery to identify and monitor the progression of wildfires [Link]
Analyze the resiliency of the electric power grid to prevent large-scale outages [Link]
Optimize building efficiency using IoT and ICS data with automated demand-response markets [Link]
Model climate change and its impacts on civilization [Link]
Hunt for the existence of dark matter to understand the nature of the universe [Link]
This position is based in either Richland, WA or Seattle, WA and requires an onsite presence.

The ideal candidate will have demonstrated expertise in developing, deploying, and scaling agentic AI systems—systems characterized by autonomy, adaptability, reasoning, and collaboration—and successfully applying them to complex domains such as national security, scientific discovery, energy sustainability, or climate analysis.
This position requires onsite work in either Richland, WA or Seattle, WA.

Key Responsibilities:
Conduct cutting-edge research to develop new AI techniques and algorithms for RF signal processing, including but not limited to automatic modulation recognition, signal detection and classification, and adaptive communication systems.
Apply machine learning and deep learning methodologies to improve the analysis, interpretation, and management of RF signals in noisy and contested environments.
Collaborate with cross-disciplinary teams to integrate AI technologies into existing and next-generation RF communication systems, ensuring they meet stringent security and reliability standards.
Design, implement, and iterate on novel deep learning methods to directly address National Security problems.
Recreate state-of-the-art architectures, training techniques, and digital signal processing blocks from journal publications with minimal examples and/or access to author code.
Stay abreast of advancements in AI, machine learning, and RF technologies to identify opportunities for innovation and improvement in national security applications.
This position is based in Richland, WA and requires an onsite presence.

Responsibilities Include:
Leading technical direction of high-impact research projects that apply advanced AI methods to solve complex national security challenges across diverse data types and operational contexts.
Leading development of compelling, competitive research proposals that align with evolving priorities.
Developing and adapting AI/machine learning (ML) techniques for varied data modalities, such as text, images, tabular data, and sensor readings, to support real-world applications in national security.
Building and leading interdisciplinary teams, fostering collaboration across technical domains.
Reproducing, evaluating, and extending state-of-the-art AI methods, ensuring scientific rigor and mission relevance.
Maintaining awareness of emerging trends in AI and national security to shape future research directions and identify new opportunities for impact.
This position is based in Richland, WA and requires an onsite presence.

As a nationally recognized leader in data science, the Chief Data Scientist will set strategic vision, direct complex research programs, and operationalize data-driven solutions to address critical national security challenges. This individual will guide technical direction, mentor interdisciplinary teams, and collaborate closely with software engineers to transition cutting-edge research into field-ready capabilities.
In addition to leading research efforts, the Chief Data Scientist will serve as a key interface with sponsors and stakeholders, leveraging domain expertise to align PNNL’s research agenda with evolving mission requirements. The chosen candidate will have the unique opportunity to drive real-world impact at mission tempo while supporting PNNL's mission to tackle grand challenges through science, technology, and innovation.
The successful candidate will work with talented multi-disciplinary teams on a variety of the nation’s most complex science and engineering problems, delivering advanced AI solutions into critical sponsor missions. At PNNL, we foster a collaborative, adventurous environment focused on lifelong learning and creative problem-solving, while striving for leadership in interdisciplinary, data-driven innovation.
Key Responsibilities:
Strategic Leadership: Define and execute a strategic vision for large, multi-year GEOINT data science programs, advancing geospatial analytics and operational AI capabilities.
Team Leadership: Lead and mentor diverse teams of data scientists and software engineers by guiding analytic methodologies, research approaches, and project implementation. This includes developing advanced machine learning (ML) models, designing end-to-end workflows, and performing model testing and validation.
Operational Integration: Collaborate with software engineering teams to integrate analytic capabilities into scalable, field-ready systems.
Program Oversight: Serve as principal investigator on highly visible programs, ensuring technical excellence, meeting research milestones, and fostering sponsor relationships.
Development of New Opportunities: Identify and lead proposal efforts to shape future investments in GEOINT analytics and AI research.
Technical Representation: Represent PNNL's leadership in GEOINT research by publishing and presenting in high-impact venues and participating in national technical exchanges.
Collaboration: Establish and foster partnerships across PNNL, other DOE national laboratories, academia, and industry to accelerate innovation and deliver mission-aligned solutions.
Sponsor Engagement: Engage directly with government sponsors, ensuring research outcomes meet operational objectives and requirements.
Mentorship: Mentor and develop future technical leaders, helping to build a world-class GEOINT data science workforce.
Mission-Specific Responsibilities:
Sponsor Engagement: Serve as the primary interface with sponsors and develop a deep understanding of their mission priorities.
Operational Awareness: Gain insight into sponsors’ operational environments and technical requirements to enable effective deployment of new capabilities.
Strategy Development: Support the ideation and execution of strategic initiatives by identifying key stakeholders and outlining actionable plans.
Role of PNNL: Effectively communicate and leverage PNNL’s role in the larger mission landscape, addressing technical solutions, relevant use cases, and critical challenges.
This position is based in either Richland, WA or Seattle, WA and requires an onsite presence.

Key Responsibilities:
Design and execute machine learning (ML) experiments that follow current best practices within the community, including large-scale experiments on HPC (e.g., experiments that take a week)
Evaluate AI systems across multiple axes, including not only standard performance metrics but also generalization to out-of-distribution settings
Write high-quality research code
Assess empirical results to guide future research and experimental steps
Conduct work in secure environments and execute work from an operational security perspective
This position is based in either Richland, WA or Seattle, WA and requires an onsite presence.

Key Responsibilities:
Refactor, modularize, and optimize research code for maintainability and scalability
Collaborate with researchers to understand algorithmic intent and with engineers to ensure integration into broader systems
Develop tools and APIs to enable integration into mission-relevant environments and workflows
Write clear, well-documented code and participate in code reviews
Conduct work in secure environments and execute work from an operational security perspective
This position is based in either Richland, WA or Seattle, WA and requires an onsite presence.

Key Responsibilities:
Lead refactors, modularization, and optimization of research code for maintainability and scalability
Drive collaboration with researchers to understand algorithmic intent and with engineers to ensure integration into broader systems
Mentor junior staff on best practices to integrate research into broader systems
Design and develop tools and APIs to enable integrations into mission-relevant environments and workflows
Write clear, well-documented code and participate in code reviews
Conduct work in secure environments and approach all tasks from an operational security perspective
This position is based in either Richland, WA or Seattle, WA and requires an onsite presence.

Key Responsibilities:
Lead the design and execution of machine learning (ML) experiments that follow current community best practices, including large-scale experiments on HPC (e.g., experiments that run for a week)
Evaluate AI systems across multiple axes, including not only standard performance metrics but also generalization to out-of-distribution settings
Lead the development of high-quality research code and institute best practices to enable research integration
Assess empirical results to determine future research and experimental steps
Conduct work in secure environments and approach all tasks from an operational security perspective
This position is based in either Richland, WA or Seattle, WA and requires an onsite presence.

Ideal candidates will have a strong background in one or more of the following areas:
Quantum error correction, detection, and mitigation.
Deployment of quantum algorithms and protocols on near-term hardware.
Machine learning for quantum computing and simulation of quantum systems.
Emerging quantum technologies such as neutral atom systems and erasure qubits.
Benchmarking, verification, and validation toolkit development for quantum computing.
Quantum resource estimation for NISQ and fault-tolerant quantum computing (FTQC).
Quantum algorithm development, optimization, and circuit transpilation for NISQ and FTQC.
Responsibilities will include:
Conduct innovative research to advance the state of the art in one or more target areas, working in close collaboration with staff scientists.
Mentor and collaborate with interns at PNNL, providing both scientific guidance and academic support.
Publish research findings in leading computer architecture and physics venues such as ISCA, ASPLOS, MICRO, HPCA, PRX Quantum, Quantum, and npj Quantum Information.
Present research progress and results during weekly team meetings and other technical forums.
Collaborate with interns, postdocs, and staff scientists to help expand and strengthen PNNL’s quantum ecosystem and research capabilities.
The assignment duration is two years.

We are making a bold leap into the future of artificial intelligence with a $45 million investment in an NVIDIA DGX SuperPOD. This investment underscores our commitment to providing cutting-edge research and supercomputing capabilities to faculty and staff across all Texas A&M System members. As a Senior High Performance Computing (HPC) Engineer, you will provide technical expertise and consultation for the design and deployment of HPC systems. Get in on the ground floor with a team that is shaping the next generation of innovation.
This position is security-sensitive and requires U.S. citizenship.
Opportunities to Contribute
• Manage large-scale HPC cluster operations, including OS upgrades, firmware patching, and performance tuning.
• Oversee networking, security, and infrastructure for HPC systems.
• Lead the development of specialized HPC computing clouds and scalable storage systems.
• Collaborate with stakeholders to develop service-based solutions.
• Serve as a strategic technical resource across departments.
• Lead enterprise-wide HPC projects using established project management protocols.
• Mentor junior system administrators and enforce performance standards.
What you need to know
Salary: $125-136K
Location: In-person role in College Station, Texas
Schedule: This role may require working outside of standard office hours, including evenings, weekends, and holidays, to support the demands of technology services and ensure the seamless operation of essential systems.
Citizenship: Must be a United States citizen, permanent resident, or a person granted asylum or refugee status in accordance with 15 CFR Part 762; 22 CFR §§ 122.5, 123.22, and 123.26; and 31 CFR § 501.601.

Responsibilities:
Contribute to and research a variety of topics in close collaboration with staff scientists
Present research progress and results in weekly team meetings
Engage with interns, postdocs, and staff scientists to help develop the quantum ecosystem's capabilities
The internship is three months long, subject to extension based on performance and project needs. Interns can engage in cutting-edge research and contribute to impactful projects alongside our world-class team. This position will collaborate with a team in New York City.

Division: NE-NERSC
Lawrence Berkeley National Lab’s (LBNL, https://www.lbl.gov/) NERSC Division has an opening for a DevOps Engineer to join the team.
In this exciting role, you will serve as a DevOps-oriented System Administrator/Software Engineer (Computer Systems Engineer 3/4) at the National Energy Research Scientific Computing Center (NERSC) to help architect, deploy, configure, and operate large scale, leading-edge high-performance computing (HPC) systems. You will work collaboratively to develop and operate large-scale compute and storage platforms to support NERSC's mission of accelerating scientific discovery through high-performance computing and data analysis. Working with teams at NERSC, other national laboratories, HPC vendors and open-source communities you will develop innovative solutions that enable science as well as improve the state of HPC practice on an international stage. Your focus will be to improve and operate NERSC's largest HPC resources, Perlmutter and Doudna, and to work with the rest of the HPC community to develop and maintain world class system software.
The selected candidate(s) will be hired at the Computer Systems Engineer 3 or 4 (CSE3 or CSE4) level, depending on their skills and experience.
What You Will Do if hired at a Level 3:
• Participate in team-oriented agile development and management process for HPC systems using languages like Go, Rust and Python
• Develop and maintain APIs to securely expose system functionality to end users
• Automate common tasks and processes to continuously improve HPC systems management
• Analyze system issues and develop solutions to improve end-user experience
• Be part of a team that installs, tests, maintains and manages HPC systems
• Assist with technology evaluation of systems and system architecture
• Work with vendors to prioritize, develop and enhance their technologies in order to better meet the needs of our users
• Be part of team providing on-call rotation for 24x7 HPC system support
• Work on and resolve complex issues where analysis of situations or data requires an in-depth evaluation of variable factors.
• Exercise judgment in selecting methods, techniques and evaluation criteria for obtaining results.
• Determine methods and procedures on new assignments and may coordinate activities of other personnel.
• Network with key contacts outside own area of expertise.
Additional Responsibilities if hired at a Level 4:
• Provide leadership and technical guidance to group members, and members of other groups at NERSC.
• Recommend and lead implementation and deployment efforts for system improvements that enhance reliability, stability, usability, performance and security.
• Identify and evaluate emerging HPC technologies and explore new features that would create new capabilities and enhance system performance and usability.
• Participate in working/user/advocacy groups and represent NERSC and its interests to the broader HPC community.
• Work at a higher level of independence while carrying out work assignments.
• Work on and solve significant and unique issues where analysis of situations or data requires an in-depth evaluation of variable factors.
What is Required to be hired at a Level 3:
• Typically requires a minimum of 8 years of related experience with a Bachelor’s degree; or 6 years and a Master’s degree; or equivalent experience.
• Minimum of 2 years of experience with systems programming in a Linux environment or management of large-scale Linux-based systems in a high-performance computing, cloud computing, or hyper-scale environment.
• Experience with the C, Bourne shell, and Python 3 programming languages.
Additional Requirements to be hired at a Level 4:
• Typically requires a minimum of 12 years of related experience with a Bachelor’s degree; or 8 years and a Master’s degree; or equivalent experience.
• Demonstrated excellent systems programming skills and strong knowledge of Linux internals.
• Demonstrated ability to work independently as well as collaboratively in large projects, and contribute to an active and respectful intellectual environment.
• Excellent oral and written communication skills.
• Ability to resolve complex issues in creative and effective ways and derive technical solutions in a collaborative environment to meet end user requirements or needs.
• Ability to network and collaborate with key contacts outside own area of expertise.
• Ability to work on and resolve significant and unique issues where analysis of situations or data requires an evaluation of intangibles.
• Ability to exercise independent judgment in methods, techniques and evaluation criteria for obtaining results.
Desired Qualifications:
• Development of Kubernetes microservices, using technologies like Helm or Loftsman for deployment.
• Operation of Kubernetes and etcd.
• Infrastructure-as-code solutions such as Argo, Terraform, Ansible, Puppet, or Salt.
• The Rust or Go programming language.
• GitLab or GitHub continuous integration and project management.
• Agile processes, Scrum.
• Linux kernel interfaces, cgroups, eBPF.
• Installation, configuration, monitoring, and tuning of workload management systems such as Slurm, PBS Pro, or Grid Engine.
• Monitoring solutions such as Grafana, Prometheus, or LDMS.
• HPC systems administration.
• HPC applications analysis, MPI.
• Specialized networking (InfiniBand, Slingshot, or other high-speed networks).
• Lustre, Spectrum Scale (GPFS), or other parallel file systems.
Notes:
• This is a full-time career appointment, exempt (monthly paid) from overtime pay.
• This position will involve access to hardware, commodities, and technical information subject to export control regulations including, but not limited to, the Export Administration Regulations ("EAR") and/or International Traffic in Arms Regulations ("ITAR"). Accordingly, any hiring decision may depend in part on Berkeley Lab’s ability to obtain or rely on federal government authorizations as required, if you are not a U.S. citizen, lawful permanent resident of the U.S. (“green card holder”), asylee, refugee, or other qualifying protected individual as defined by 8 U.S.C. 1324b(a)(3).
• This position will be hired at a level commensurate with the business needs and the skills, knowledge, and abilities of the successful candidate.
• Level 3: The full salary range of this position is between $129,948.00 - $219,276.00 per year and is expected to pay between a targeted range of $146,184.00 - $178,668.00 per year depending upon candidates' full skills, knowledge, and abilities, including education, certifications, and years of experience.
• Level 4: The full salary range of this position is between $147,984.00 - $249,732.00 per year and is expected to pay between a targeted range of $166,476.00 - $203,484.00 per year depending upon candidates' full skills, knowledge, and abilities, including education, certifications, and years of experience.
• This position is subject to a background check. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position. Having a conviction history will not automatically disqualify an applicant from being considered for employment.
• This position requires substantial on-site presence, but is eligible for a flexible work mode, and hybrid schedules may be considered. Hybrid work is a combination of performing work on-site at Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley, CA and some telework. Individuals working a hybrid schedule must reside within 150 miles of Berkeley Lab. Work schedules are dependent on business needs. In rare cases, full-time telework or remote work modes may be considered.
Want to learn more about working at Berkeley Lab? Please visit: careers.lbl.gov
How To Apply
Apply directly online at http://50.73.55.13/counter.php?id=311989 and follow the online instructions to complete the application process.
Equal Employment Opportunity Employer: The foundation of Berkeley Lab is our Stewardship Values: Team Science, Service, Trust, Innovation, and Respect; and we strive to build community with these shared values and commitments. Berkeley Lab is an Equal Opportunity Employer. We heartily welcome applications from all who could contribute to the Lab's mission of leading scientific discovery, excellence, and professionalism. In support of our rich global community, all qualified applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, protected veteran status, or other protected categories under State and Federal law.
Berkeley Lab is a University of California employer. It is the policy of the University of California to undertake affirmative action and anti-discrimination efforts, consistent with its obligations as a Federal and State contractor.
Misconduct Disclosure Requirement: As a condition of employment, the finalist will be required to disclose if they are subject to any final administrative or judicial decisions within the last seven years determining that they committed any misconduct, are currently being investigated for misconduct, left a position during an investigation for alleged misconduct, or have filed an appeal with a previous employer.

Responsibilities:
Performing reliability/characterization tests
Setting up tests and writing test automation scripts
Performing test data analysis and presenting results to key stakeholders

We are hiring an Intern for our Photonics Design team at Lightmatter! We are a photonic computer company redefining what computers and human beings are capable of by building the engines that will power discoveries and drive progress in a sustainable way. With modern human progress relying heavily on computers, the world has hit a dead end with traditional transistors, and the prospect of constantly building data centers is an environmental nightmare. Lightmatter has created a solution in photonic computing: using photons instead of electrons to take advantage of their higher bandwidth. We build chips for artificial intelligence computing. Our architecture leverages the unique properties of light to enable fast and efficient inference and training engines.
Responsibilities
Design high-performance passive and active silicon photonics devices with commercial software such as Lumerical, Tidy3D, and Cadence
Develop automated inverse design of silicon photonic components.
Assess fabrication variability and operational performance of Si photonic components
Develop and implement testing and analysis methodology for new and existing silicon photonics devices, including design-for-test, design-for-manufacturing, and design-of-experiments.
Design and execute model creation for silicon photonics chips and subsystems based on measurement data and close the loop with design and simulations.

Responsibilities
Participate in electro-optic testing of photonics chips/devices, leveraging hands-on laboratory experience and working with various optical instruments.
Conduct tests and characterization of silicon photonics components, including micro-rings, optical phase shifters, and Mach-Zehnder interferometers.
Operate a range of optical equipment such as lasers, electro-optic modulators, semiconductor optical amplifiers, PIN receivers, optical spectrum analyzers, optical vector analyzers, and optical filters.
Utilize semiconductor test equipment, including vector network analyzers, spectrum analyzers, source-meter units, arbitrary waveform generators, and high-speed oscilloscopes.
Design and execute testing protocols for silicon photonics chips and subsystems.
Develop and implement automated testing and system interfacing through Python scripting.
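Automated testing and system interfacing through Python scripting, as described in the last bullet, typically means driving instruments programmatically while logging results. A minimal, hypothetical sketch of such a test script is shown below; `FakeModulator` and `sweep_bias` are illustrative names, and the stub stands in for a real instrument driver (e.g., one built on a VISA/SCPI library):

```python
# Hypothetical sketch of a scripted electro-optic bias sweep.
# The "instrument" here is a pure-software stub modeling an ideal
# Mach-Zehnder modulator; a real script would talk to hardware.
import math
from dataclasses import dataclass


@dataclass
class SweepResult:
    bias_v: float    # applied bias voltage (V)
    power_mw: float  # measured optical power (mW)


class FakeModulator:
    """Stub MZM: transmission follows the ideal transfer function."""

    def __init__(self, v_pi: float = 3.0):
        self.v_pi = v_pi
        self._bias = 0.0

    def set_bias(self, volts: float) -> None:
        self._bias = volts

    def read_power_mw(self) -> float:
        # Ideal MZM: P = cos^2(pi * V / (2 * V_pi))
        return math.cos(math.pi * self._bias / (2 * self.v_pi)) ** 2


def sweep_bias(dev, start: float, stop: float, points: int) -> list[SweepResult]:
    """Step the bias voltage and record optical power at each point."""
    step = (stop - start) / (points - 1)
    results = []
    for i in range(points):
        v = start + i * step
        dev.set_bias(v)
        results.append(SweepResult(bias_v=v, power_mw=dev.read_power_mw()))
    return results


mzm = FakeModulator(v_pi=3.0)
data = sweep_bias(mzm, 0.0, 3.0, points=7)
# Locate the transmission null, expected near V_pi = 3.0 V for this stub
null = min(data, key=lambda r: r.power_mw)
print(f"null at {null.bias_v:.2f} V, power {null.power_mw:.4f} mW")
```

In a real setup only `FakeModulator` changes: the same sweep logic would call into whatever driver the lab's instruments expose, which is what makes scripted sweeps easy to reuse across devices.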

Responsibilities:
Work on wafer probers to support test data collection from photonic devices
Develop software to automate the photonic test system and develop test methods for photonic device testing
Analyze data and develop data analysis algorithms
Work with partner teams to assist with reliability testing and PIC design validation testing

Key responsibilities:
Developing and maintaining SKILL scripts
Integrating Cadence with other tools to automate simulation setup and result analysis
Documenting the function of the scripts and the process for using them
Interacting with Package team members, such as package design and layout engineers

Lightmatter is seeking a PhD Intern to join our Machine Learning team. This internship offers a unique opportunity to contribute to the invention of novel hardware systems utilizing Lightmatter's cutting-edge technology and to model their performance on both training and inference workloads.
This is a highly creative and impactful role where you'll be encouraged to think outside the box, pushing the boundaries of ML systems research by leveraging the power of silicon photonics. You'll work closely with our product, solutions architecture, and engineering teams, contributing to:
Performance analysis of Lightmatter-enabled systems.
The design of high-performance, rack-scale platforms.
The generation of cutting-edge ML systems research.
Responsibilities
Research and stay current on the latest advancements in Machine Learning systems and models, particularly Large Language Models (LLMs).
Develop and utilize tools to model the performance of Lightmatter-enabled systems running various training and inference workloads.
Collaborate on the design of compelling reference platform systems powered by Lightmatter technology.
Work towards publishing academic papers at top industry conferences or journals, and assist in creating whitepapers, blog posts, and other technical collateral.

We are hiring an Intern for our Physical Design team at Lightmatter!
Responsibilities
Deliver block-level physical designs for active projects
Produce physical design flow enhancements
Explore new physical design capabilities
Run experiments on block-level designs and gather performance data
Learn the Physical Design Flow at Lightmatter
Participate in Physical Design activities directly related to our next generation products
In conjunction with the Physical Design team - choose an aspect of the PD flow to improve or explore. This could be a new flow capability or feature, or a new method of processing or visualizing existing data

As a full-time Systems Engineer, you’ll work on projects that impact our live trading and research and allow them to run worldwide 24/7. A typical day could include scaling and keeping our HPC environment running smoothly or designing and building services to improve HRT’s research environment, but responsibilities are broad! Your day could also consist of hardware and software installation and configuration, remote administration, monitoring, new hardware and software testing, automation and tool development, and performance tuning.
Responsibilities
- Plan/execute technical projects to make our Linux trading systems (hardware, network, OS, etc.) faster, more adaptable, and more maintainable
- Manage and optimize large scale distributed compute/HPC clusters
- Automate/troubleshoot a broad range of technical infrastructure
- Troubleshoot hardware and software issues
- Operating system installation and upgrading
- Performance testing and tuning
- Scripting to automate repetitive tasks
- Working with vendors to resolve issues

We are looking for a Network Engineer with deep experience in designing, deploying, and scaling networks for large-scale HPC environments. You will be building the next generation of infrastructure required by a premier algorithmic trading firm operating at the forefront of technology and finance.
This position requires deep technical skills, along with the product leadership and vision needed for the future of cloud-scale networking. You’ll have broad responsibilities and freedom to analyze and solve problems, and your solutions will have an immediate impact on our operations worldwide. We’re looking for someone who loves technology and is excited about tackling big picture problems. You will work on a wide variety of projects to enhance the performance and capacity of the network and contribute to both the management and the technical nitty-gritty of our computing architecture. Our people and our compute capabilities are our two most important differentiators, and that is where we are investing the most.
Responsibilities
- Network design, product selection, routing, configuration and troubleshooting
- Work closely with our HPC Systems team, as well as other teams, to scale our research environment
- Meet with external parties for project related items as well as future endeavors
- Overall capacity planning for the data center networks
- Develop scripts and processes to increase efficiency
- Collaborate with others in the network team to solve cross discipline problems
- Provide detailed documentation
- Mentor others as a senior resource

FPGAs and ASICs are critical pieces of our technology stack. We are looking for talented hardware developers to architect and design complex systems on a highly collaborative global team. In this role, you'll identify efficient ways to perform on-the-fly transformations of market data and implement models with complex data structures in RTL. Deep knowledge of SystemVerilog, FPGA internals and/or ASIC primitives, computer architecture, and vendor tool suites are essential to succeeding in this role. Expertise in networking protocols, CPU design, and/or machine learning accelerators is a big plus. No financial experience is necessary.
Responsibilities
- Collaborate with a cross-functional team to develop and deploy custom FPGA and/or ASIC solutions for a wide range of trading applications
- Investigate new technologies and tools
- Contribute to a nimble hardware development tech stack

These high performance designs require even higher performance verification. We are looking for experienced Design Verification (DV) engineers who are skilled at writing testbenches and building verification environments to exercise complex HDL. Our ideal candidate is not only an ace tester, but a practicing toolsmith. You know the EDA landscape and want to be part of a team actively working to rethink, redesign, and surpass the status quo. For example, members of our team are active maintainers of popular open source projects such as Slang, Verilator, and Cocotb.
FPGA and ASIC verification is part of an innovative, growing team at HRT which is integral to the success of our trading. You can expect to always be challenged by the ever-changing financial markets as you work to ensure correctness and robustness of our critical hardware in an extremely fast-paced, real-time environment. No financial experience is necessary.
Responsibilities
- Creating testbenches and tests for our hardware platform, leveraging a hybrid open-source/proprietary, highly flexible environment
- Writing detailed verification plans
- Quickly root-causing RTL bugs
- Collaborating directly with designers for rapid bringup of new projects and debugging of existing designs
- Managing test suites and continuous integration infrastructure
- Developing and improving open-source and internal tools

Job responsibilities
- Executes creative software solutions, design, development, and technical troubleshooting with the ability to think beyond routine or conventional approaches to build solutions or break down technical problems
- Develops secure high-quality production code, and reviews and debugs code written by others
- Identifies opportunities to eliminate or automate remediation of recurring issues to improve overall operational stability of software applications and systems
- Leads evaluation sessions with external vendors, startups, and internal teams to drive outcomes-oriented probing of architectural designs, technical credentials, and applicability for use within existing systems and information architecture
- Leads communities of practice across Software Engineering to drive awareness and use of new and leading-edge technologies
- Adds to team culture of diversity, opportunity, inclusion, and respect

Job responsibilities
- Creates complex and scalable coding frameworks using appropriate software design principles
- Develops secure and high-quality production code, and reviews and debugs code written by others
- Advises cross-functional teams on technological matters within your domain of expertise
- Serves as the function’s go-to subject matter expert
- Contributes to the development of technical methods in specialized fields in line with the latest product development methodologies
- Creates durable, reusable software frameworks that are leveraged across teams and functions
- Influences leaders and senior stakeholders across business, product, and technology teams
- Champions the firm’s culture of diversity, opportunity, inclusion, and respect
- Uses telemetry to create measurable frameworks for deciding among hardware and software options
- Publishes and supports reusable patterns to optimize training and inference of ML models on various architectures
- Supports the developer community in learning lessons from the high-performance computing (HPC) domain

What you could be doing as a Graduate Electrical Engineer
• Help design, review, and bring up complex server and rack-level boards, including high-speed signaling and power delivery.
• Support component selection, SI/PI analysis, and system-level integration to ensure quality and reliability.
• Collaborate with mechanical, thermal, and manufacturing teams to achieve optimized, scalable, and testable designs.
• Perform lab validation and debug using oscilloscopes, logic analyzers, and other electrical test tools.
• Document and share design updates, validation results, and process improvements.

The 2026 summer internships are 11 weeks in duration. Applicants must be available to work 40 hours per week from May 18, 2026 through July 31, 2026 for participation in the program. NCAR has 3 unpaid holidays for interns during the summer internship (Memorial Day, Juneteenth, Independence Day).
Program requirements, beyond working on projects, include attending appropriate technical seminars, attending skills-enhancing workshops, preparing a poster, and giving an oral presentation of results at the end of the summer.
For Summer 2026, we are planning for an in-person internship program where interns come on-site to our Boulder, Colorado campuses and meetings may be held virtually over Zoom or Google Meet.

NCAR’s Computational and Information Systems Laboratory (CISL) is a leader in supercomputing and data services necessary for the advancement of atmospheric and geospace science. CISL’s mission is to remain at the forefront of ensuring that research universities, NSF NCAR, and the larger atmospheric, oceanographic, and related research communities have access to the computational resources they need for their research. To fulfill the need for a stronger workforce at the intersection of High Performance Computing (HPC) and geoscience problems, CISL engages in education and outreach activities to inspire and attract a skilled future workforce.
The project qualifications describe the ideal skill set we look for in candidates. We encourage you to apply even if you do not possess all of the listed qualifications.

This individual is responsible for overseeing the daily operations of the IT Help Desk and Multimedia Services team, ensuring exceptional user support, efficient service delivery, and the effective use of multimedia technologies.
As an Assistant Director of EIT, this role also contributes significantly to the broader strategic planning, project management, and operational efficiency of the entire Enterprise IT department, acting as a key advisor to the CIO and serving on the IT Senior Leadership Team.
-----------------------------
Help Desk Management (50%)
● Service Delivery & Operations:
Lead, manage, and mentor a team of Help Desk technicians, fostering a culture of customer service excellence, continuous improvement, and problem-solving.
Oversee the day-to-day operations of the IT Help Desk, ensuring timely and effective resolution of technical issues for all staff.
Develop, implement, and refine Help Desk policies, procedures, and best practices to optimize efficiency and user satisfaction.
Manage the ticketing system (e.g., ServiceNow, Jira Service Management) to ensure proper prioritization, escalation, tracking, and reporting of all incidents and service requests.
Provide ticket triage support to all of UCAR, UCP, and NSF NCAR, while directly providing ticket resolution support to a subset of those groups.
Partner closely with NSF NCAR Research IT to effectively hand off tickets for last-mile ticket resolution within NSF NCAR.
● Performance & Quality Assurance:
Establish and monitor key performance indicators (KPIs) and service level agreements (SLAs) for Help Desk operations, driving continuous improvement in response times, resolution rates, and customer satisfaction.
Conduct regular quality assurance reviews of Help Desk interactions and resolutions, providing constructive feedback and training to staff.
Implement user feedback mechanisms to continuously assess and improve service quality.
● Knowledge Management:
Develop and maintain a comprehensive knowledge base for common IT issues and solutions, empowering users with self-service options and improving technician efficiency.
Ensure all documentation for IT systems and processes relevant to end-users is current and accessible.
Multimedia Services Management (25%)
At this time, the Enterprise Multimedia Team has a Lead with subject matter expertise in multimedia services, audiovisual technologies, and event, conference, and meeting support. Because this Lead reports to the Help Desk and Multimedia Manager, the HDMM Manager requires less direct expertise in multimedia delivery. The HDMM Manager is nonetheless accountable for multimedia service delivery, including:
● Operations & Support:
Oversee the delivery of multimedia services, including audio-visual support for meetings, conferences, and events (both in-person and virtual).
Manage the setup, operation, and maintenance of conference room technology, presentation systems, video conferencing solutions, and digital signage.
Provide expert guidance and support for live streaming, recording, and editing of multimedia content.
Work closely with administrators and IT staff in UCAR, UCP, and NSF NCAR, to provide counseling, advice, and design support for rooms outside the purview of Enterprise Multimedia Services.
● Technology Management:
Evaluate, recommend, and implement new multimedia technologies and solutions to enhance collaboration and communication capabilities.
Ensure the reliability, security, and performance of all multimedia systems and equipment.
Manage inventory and lifecycle of multimedia hardware and software.
● Training & Consultation:
Train users on multimedia equipment and best practices for effective presentations and virtual meetings.
Consult with departments and project teams on multimedia requirements for various initiatives.
Assistant Director of Enterprise IT (25%)
● Strategic Planning & Development:
Collaborate with the CIO and the IT Senior Leadership Team on strategic planning, goal setting, and the development of long-term technology roadmaps.
Identify opportunities for technology innovation and process improvement across the EIT department.
Assist in the development and management of the EIT budget, ensuring efficient allocation of resources.
● Project Leadership & Management:
Lead or co-lead key cross-functional IT projects from inception to completion, ensuring adherence to timelines, budgets, and scope.
Represent EIT in various organizational committees and working groups.
Coordinate efforts between different EIT teams (e.g., Network, Systems, Applications) to ensure integrated and effective service delivery.
● Operational Oversight & Improvement:
Support the CIO in ensuring the overall operational efficiency, reliability, and security of enterprise IT systems and services.
Contribute to disaster recovery and business continuity planning for IT services.
● Vendor Management & Procurement:
Assist in managing relationships with IT vendors and service providers, evaluating contracts, and overseeing service delivery.
Participate in the procurement process for new IT hardware, software, and services.
Other Desired/Required Skills/Experience
Understands how to effectively communicate the importance of customer service to the team, and how to ensure that customers receive a good experience.
Understands the relationship between organizational activities and the IT capabilities that can support them, and communicates that relationship effectively to both organizational and IT leadership.
Independently evaluates software, hardware, and cloud solutions for complex problems. Recommends and/or supports the recommendation of full solutions to the Enterprise IT leadership team and organizational leadership teams including systems to be purchased, strategies to follow, personnel required, and plans for reaching goals. Presents proposed solutions to appropriate governance bodies. Builds support where necessary. Manages projects through to completion.
Determines best application of resources to achieve results based on organizational and Enterprise IT strategies and priorities.
Evaluates strategies and system deployments.
Solves complex problems common to heterogeneous computing environments including mobile devices, customers, cloud services, and virtual environments.
Recommends best courses of action to maintain productive, secure computing environments that enable the organizational mission to be carried out effectively. Assumes responsibility for the integrity of design decisions, taking into account economic, political, cultural, and technical considerations.

Eligible Majors
• Electrical Engineering
• Computer Engineering
• Computer Science
• Data Science / Data Analytics
• Other related Science or Engineering Discipline
________________________________________
Key Responsibilities
Interns may be assigned one or more of the following responsibilities based on project needs and individual strengths:
• Validation & Debugging: Support validation of engineering workflows and assist in debugging software or hardware systems.
• AI Tools & Workflows: Contribute to the evaluation and implementation of AI-assisted workflow improvements.
• Process Improvements: Identify and recommend enhancements to engineering processes and team efficiency.
• Automation for CAD/Design: Develop scripts or tools to automate CAD and design-related tasks.
• Technical Documentation: Create and maintain clear, concise documentation for tools, processes, and project deliverables.
• Data Analysis: Analyze engineering and operational data to support decision-making and performance optimization.

Eligible Majors
• Electrical Engineering
• Computer Engineering
• Computer Science
• Data Science / Data Analytics
• Mechanical Engineering
• Other related Science or Engineering Discipline
Solidigm is seeking passionate and driven graduate-level interns to join our Engineering teams. This internship offers a unique opportunity to work on impactful projects across multiple domains including firmware development, technology pathfinding, hardware validation, strategic planning, reliability engineering, and AI-driven innovation.
Key Focus Areas:
Interns will be matched to projects based on their skills and interests across the following tracks:
🔧 Firmware Engineering
• Contribute to firmware development and validation for next-gen SSD products.
• Work on code coverage improvements, debugging, and tool enhancements.
• Collaborate in agile teams to groom and execute backlog tasks.
🧪 Technology Development & Validation
• Support system-level workload studies and performance benchmarking.
• Develop and execute test plans for media policy and backend test flows.
• Assist in thermal mechanical validation and lab automation.
🛠️ Hardware Engineering
• Participate in high-speed I/O characterization and NAND testing.
• Build and enhance test programs using advanced equipment (e.g., T5835).
• Support immersion cooling capability development and lab setup.
📊 Strategy & Data Analytics
• Build AI dashboards and visualizations using PowerBI and Snowflake.
• Analyze customer/product data and competitive insights using ML/AI techniques.
• Collaborate cross-functionally with Global Ops, IT, and Strategy teams.
🤖 AI & Automation
• Develop AI-powered tools to streamline workflows and improve code quality.
• Explore vector indexing and inference benchmarking using GPU/NVME platforms.
• Create intelligent dashboards for predictive metrics and decision-making.

This is a great opportunity to get in at a very early stage.

This is a Software Engineering role; core responsibilities include coding and applied engineering work. You will be expected to write high-quality, efficient, and maintainable code and to contribute to innovative software solutions. In this role, you will work on cutting-edge projects, collaborate with cross-functional teams, and help push the boundaries of scalable computing. We offer a dynamic and innovative work environment that encourages creativity and experimentation, as well as opportunities for professional growth and development. If you are passionate about driving innovation and advancing the way people connect, we encourage you to apply for this exciting opportunity.

Lead multi-disciplinary teams to develop solutions for large-scale training systems. Assess trade-offs of various solutions and make pragmatic decisions.
Ensure timely milestone delivery with teamwork and close collaboration
Responsible for the overall performance of the communication system, including performance benchmarking, monitoring and troubleshooting production issues
Define the technical vision and drive a multi-year roadmap to make progress toward the related objectives
Work with cross functional teams and provide guidance on the AI network architecture including topologies, transport, congestion control techniques

Manage engineers responsible for designing, modeling, developing, testing, deploying, and operating AI/HPC networks at scale
Provide continual feedback that is actionable, coach in career development and conduct performance reviews
Help define and drive a technical roadmap to meet organizational objectives
Be a hands-on manager with technical experience in networking, systems, hardware and software
Operate in a rapidly evolving environment, adapt quickly to new information, prioritize as needed
Proactively identify resource needs, participate in recruiting efforts, and hire to grow the team

At a high level, the team aims to enable Meta-wide ML products and innovations to leverage our large-scale GPU training and inference fleet through an observable, reliable, and high-performance distributed AI/GPU communication stack. One of the team's current focuses is building customized features, software benchmarks, performance tuners, and software stacks around NCCL and PyTorch to improve full-stack distributed ML reliability and performance (e.g., large-scale GenAI/LLM training) from the trainer down to the inter-GPU and network communication layer. We are seeking engineers to work in the space of GenAI/LLM scaling reliability and performance.
Software Engineer, SystemML - Scaling / Performance Responsibilities
Enabling reliable and highly scalable distributed ML training on Meta's large-scale GPU training infra with a focus on GenAI/LLM scaling

At Groq, we are building a custom cloud from the ground up - one data center at a time. Our Compute Storage team owns the systems that turn racks of bare metal into production-ready Kubernetes clusters powering the next generation of AI workloads.
We are looking for a Sr. Staff Linux Systems Engineer to help us scale this effort. This role focuses on creating a reliable, performant and secure foundation for the Groq Cloud. You will work with your infrastructure peers to enable and optimize compute nodes and storage clusters that form Groq Cloud. We're looking for someone passionate about infrastructure who enjoys debugging close to the metal. If you're eager to grow your skills in deploying, scaling, and optimizing bare metal to support complex distributed HPC in the expanding inference market – we would love to talk.
Responsibilities & opportunities in this role:
Kernel and OS level enablement and optimization for compute nodes (GPU, LPU) and storage clusters.
Work with infrastructure peers to define optimal health standards for all production servers, including certified OS, Kernel, BIOS/FW versions.
Strengthen security posture by improving system-level CVE response.
Debug and resolve systems level performance and reliability issues in the fleet.
Work with vendors to debug and resolve BIOS/FW issues.
Support design and deployment of large GPU clusters.
Lead cross-functional collaboration with data center operations, networking, and platform teams to ensure infrastructure is fully integrated and production-ready.
Follow best practices and standards for infrastructure-as-code and configuration management using Git, Flux, Terraform, and related tools.
Set technical direction and maintain high-quality system documentation, operational runbooks, and internal tooling that improve the resilience, repeatability, and observability of the infrastructure stack.
Ideal candidates have/are:
Experience with Linux OS management in large virtualized environments.
Deep Kernel knowledge with experience working with the upstream community to resolve bugs.
Experience deploying large GPU clusters with network fabric.
Familiarity with infrastructure-as-code and Git-based workflows (e.g., Terraform, Flux, Kustomize).
Ability to write and maintain basic tooling in Go, Python, or Bash.
Understanding of networking fundamentals (IPAM, VLANs, DHCP, DNS).
Working knowledge of storage concepts (block vs object, NFS, RAID, etc.).
Strong sense of ownership and a willingness to dive into hardware, firmware, or low-level provisioning issues.
Nice to Have:
Exposure to Talos Linux.
Experience with maintaining a production Kubernetes environment.
Hardware SKU definition and lifecycle management.
Attributes of a Groqster:
Humility - Egos are checked at the door
Collaborative & Team Savvy - We make up the smartest person in the room, together
Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
Curious & Innovative - Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness - no limit thinking, fueling informed risk taking

Ensure the reliability, scalability, and performance of Groq’s observability tools and services for provisioning and managing the full lifecycle of Groq hardware, software, and networking systems at massive scale.
Responsibilities & opportunities in this role:
Build and maintain comprehensive observability systems at massive scale. Obsess about running high quality production systems with excellent uptime that engineers can trust.
Constantly iterate on, maintain, update, automate, and dogfood your own systems. Put in place great monitoring of your own systems that can be used as best practices by the rest of the organization.
Instrument Kubernetes clusters, applications, and datacenter infrastructure components such as switches, PDUs, environmental sensors, cameras, chillers, etc.
Familiarity with and strong opinions on signals:
Effective canonical logging and cost control
Tracing expertise including context propagation, tail sampling strategies, attribute enrichment, querying
Metrics derived from a variety of systems such as hosts, kube-state-metrics, kubelet, IPMI, SNMP
Be a teacher: advise teams on instrumenting their applications in a variety of languages (Rust, C++, TypeScript, GoLang), implementing sensible SLO and alerting strategies, as well as on-call best practices.
Be a student: Groq is uniquely vertically integrated. You will be challenged with tasks in unfamiliar domains, and constantly expand your knowledge of technologies ranging from networking to FPGA design.
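The "sensible SLO and alerting strategies" mentioned above are often implemented as multiwindow error-budget burn-rate alerts, which page only when both a short and a long window are burning budget quickly. A minimal sketch, with the window sizes and thresholds as illustrative assumptions only (not Groq's actual policy):

```python
def burn_rate(error_ratio, slo_target):
    """Burn rate = observed error ratio / allowed error-budget ratio.

    A burn rate of 1.0 consumes exactly the budget over the SLO period.
    """
    budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% availability SLO
    return error_ratio / budget

def should_page(err_1h, err_6h, slo_target=0.999, fast=14.4, slow=6.0):
    """Multiwindow check: page only when both the 1h and 6h windows
    burn fast, which filters out brief error blips."""
    return (burn_rate(err_1h, slo_target) >= fast
            and burn_rate(err_6h, slo_target) >= slow)
```

In a Prometheus/AlertManager setup the same logic would be expressed as recording rules over request/error counters rather than Python, but the arithmetic is identical.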
Ideal Candidates have/are:
4+ years of experience in observability as a core responsibility of previous roles
Deep understanding of cloud-native technologies and infrastructure-as-code tooling such as Terraform and Flux
Have instrumented large Kubernetes clusters and built operators
Expertise in standing up and running monitoring, observability, and alerting systems — OpenTelemetry Tracing and Collector, Grafana/Prometheus, PagerDuty, AlertManager, IPMI, SNMP, etc.
Strong analytical and problem-solving skills with a focus on root cause analysis and mitigation
Excellent communication and teamwork skills with the ability to collaborate effectively across engineering teams
Attributes of a Groqster:
Humility – Egos are checked at the door
Collaborative & Team Savvy – We make up the smartest person in the room, together
Growth & Giver Mindset – Learn it all versus know it all, we share knowledge generously
Curious & Innovative – Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness – No-limit thinking, fueling informed risk taking

At Groq, we are building a custom cloud from the ground up - one data center at a time. Our Compute Storage team owns the systems that turn racks of bare metal into production-ready Kubernetes clusters powering the next generation of AI workloads.
We are looking for a Staff Infrastructure Engineer to help us scale this effort. This is a hands-on role focused on fully automating deployment and lifecycle management of the Groq Cloud server fleet. You will work closely with DC, network and platform teams to define and develop tools and automation that enable seamless deployment and management of Groq compute nodes and storage clusters. We're looking for someone passionate about infrastructure who enjoys debugging close to the metal. If you're eager to grow your skills in deploying, scaling, and optimizing bare metal to support complex distributed HPC in the expanding inference market – we would love to talk.
Responsibilities & Opportunities in this Role:
Develop robust, scalable automation solutions (Go, Python, Bash) to streamline and standardize deployment workflows across global data center environments.
Be part of large cross-functional collaboration with data center operations, networking, and platform teams, ensuring infrastructure is fully integrated and production-ready.
Develop automation to ensure all production machines and clusters consistently meet optimal health standards in a timely manner.
Define best practices and standards for infrastructure-as-code and configuration management using Git, Flux, Terraform, and related tools.
Set technical direction and maintain high-quality system documentation, operational runbooks, and internal tooling that improve the resilience, repeatability, and observability of the infrastructure stack.
Ideal candidates have/are:
Experience with deploying and supporting Linux / Kubernetes systems at scale.
Familiarity with infrastructure-as-code and Git-based workflows (e.g., Terraform, Flux, Kustomize).
Ability to write and maintain basic tooling in common modern languages such as Go and Python.
Understanding of networking fundamentals (IPAM, VLANs, DHCP, DNS).
Working knowledge of storage concepts (block vs object, NFS, RAID, etc.).
Strong sense of ownership and a willingness to work through ambiguity.
Nice to Have:
Experience provisioning physical machines in a data center environment.
Exposure to Talos Linux, Kubernetes bootstrapping, or Kubernetes platform engineering.
Previous collaboration with facilities, hardware, or network teams in an operational role.
Attributes of a Groqster:
Humility - Egos are checked at the door
Collaborative & Team Savvy - We make up the smartest person in the room, together
Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
Curious & Innovative - Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness - no limit thinking, fueling informed risk taking

Your mission is to deliver practical, high-impact solutions that enable Groq engineers to move fast and focus on innovation. You will design and implement systems that simplify complexity, remove bottlenecks, and accelerate progress across the organization. By emphasizing speed, reliability, and automation, you’ll help teams get things done quickly while maintaining the highest standards of quality and efficiency.
Responsibilities & opportunities in this role:
Accelerate build and release systems to reduce wait times, increase reliability, and integrate seamlessly with tools like GitHub, Buildkite, and Nix.
Design and optimize CI/CD pipelines that deliver rapid, reliable feedback and enable frequent, high-quality releases across languages such as Haskell, C++, Rust, Go, and Python.
Automate quality and security practices including linting, static analysis, dependency management, and secure delivery of software.
Develop and support hardware-integrated test environments for Groq silicon, ensuring production-like reliability, scalability, and usability.
Drive automation and self-service across the engineering lifecycle with modern infrastructure and tooling.
Champion best practices and culture around speed, reliability, and continuous improvement in software delivery.
Ideal candidates have/are:
Strong background in software engineering with focus on infrastructure, build systems, and automation.
Proven experience designing and implementing CI/CD pipelines, build systems, and other engineering infrastructure.
Proficiency in Go and Python, with exposure to Haskell, C++, and Rust in large, fast-moving codebases.
Hands-on expertise with GitHub, Buildkite, Nix, plus modern source control and dependency management practices.
Experience collaborating across diverse teams (Compilers, Hardware, Cloud) to improve processes and accelerate delivery.
Passion for developer productivity and testing, enabling teams to deliver high-quality software quickly and reliably.
Experience with large monorepos is a strong asset, particularly in scaling tooling and workflows across diverse teams.
Attributes of a Groqster:
Humility – Egos are checked at the door
Collaborative & Team Savvy – We make up the smartest person in the room, together
Growth & Giver Mindset – Learn it all versus know it all, we share knowledge generously
Curious & Innovative – Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness – No-limit thinking, fueling informed risk taking

Design, build, and operate large-scale cloud systems to deliver the fastest inference engine in the world.
Responsibilities & opportunities in this role:
Infrastructure Development: Design and automate cloud infrastructure using Terraform to support a wide variety of needs.
Global Footprint: Implement global load balancing with traffic steering, failover, and data locality requirements.
System Optimization: Diagnose and resolve performance issues, eliminating latency to ensure efficiency across all systems.
Dynamic Adaptability: Navigate all layers of the software stack, from TCP packet inspection at the edge to the Linux kernel scheduler, adapting swiftly to changing priorities and needs.
Attention to Detail: Uphold a high standard for code quality and system performance, taking pride in delivering impeccable work.
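Global load balancing with traffic steering, failover, and data locality reduces to a simple policy at its core: route to the lowest-latency healthy region that the request's data-locality rules permit. A minimal sketch (region names and latency figures are purely illustrative, not Groq's topology):

```python
def pick_region(regions, health, allowed, latency_ms):
    """Pick the lowest-latency healthy region permitted by data
    locality; return None (fail closed) rather than violate locality.

    regions:    candidate region names
    health:     region -> bool, from health checks
    allowed:    set of regions the request's data may be served from
    latency_ms: region -> measured client latency
    """
    candidates = [r for r in regions if health.get(r) and r in allowed]
    if not candidates:
        return None  # no compliant healthy region: fail closed
    return min(candidates, key=lambda r: latency_ms[r])
```

Real traffic steering layers this policy with weighted load spreading, capacity awareness, and health-check hysteresis, but the failover behavior (unhealthy regions drop out of the candidate set automatically) follows the same shape.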
Ideal candidates have/can:
Experience: a proven track-record developing and operating global infrastructure at scale on at least one major cloud provider.
Automation Mindset: boost team velocity by building the right tools (Golang, Python, Bash) and integrating AI into everyday workflows.
Ownership: be self-directed and accountable; deliver projects end-to-end.
Curiosity: dive deep into complex systems to perform RCA on functionality, reliability, and latency issues.
Attributes of a Groqster:
Humility - Egos are checked at the door
Collaborative & Team Savvy - We make up the smartest person in the room, together
Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
Curious & Innovative - Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness - no limit thinking, fueling informed risk taking

Join the team that builds and operates Groq’s real-time, distributed inference system delivering large-scale inference for LLMs and next-gen AI applications at ultra-low latency. As a Low-Level Production Engineer, your mission is to ensure reliability, fault tolerance, and operational excellence in Groq’s LPU-powered infrastructure. You’ll work deep in the stack—bridging distributed runtime systems with the hardware—to keep Groq systems fast, stable, and production-ready at scale.
Responsibilities & opportunities in this role:
Production Reliability: Operate and harden Groq’s distributed runtime across thousands of LPUs, ensuring uptime and resilience under dynamic global workloads.
Low-Level Debugging: Diagnose and resolve hardware-software integration issues in live environments, from datacenter level events to single component failures.
Observability & Diagnostics: Build tools and infrastructure to improve real-time system monitoring, fault detection, and SLO tracking.
Automation & Scale: Automate deployment workflows, failover systems, and operational playbooks to reduce overhead and accelerate reliability improvements.
Performance & Optimization: Profile and tune production systems for throughput, latency, and determinism—every cycle counts.
Cross-Functional Collaboration: Partner with compiler, hardware, infra, and data center teams to deliver robust, fault-tolerant production systems.
Ideal candidates have/are:
Proven experience in production engineering across the stack and operating large-scale distributed systems.
Deep knowledge of computer architecture, operating systems, and hardware-software interfaces.
Skilled in low-level systems programming (C/C++ or Rust), with scripting fluency (Python, Bash, or Go).
Comfortable debugging complex issues close to the metal—kernels, firmware, or hardware-aware code paths.
Strong background in automation, CI/CD, and building reliable systems that scale.
Thrive across environments—from kernel internals to distributed runtimes to data center operations.
Communicate clearly, make pragmatic decisions, and take ownership of long-term outcomes.
Nice to have:
Experience operating high-performance, real-time systems at scale (ML inference, HPC, or similar).
Familiarity with GPUs, FPGAs, or ASICs in production environments.
Prior exposure to ML frameworks (e.g., PyTorch) or compiler tooling (e.g., MLIR).
Track record of delivering complex production systems in high-impact environments.
Attributes of a Groqster:
Humility – Egos are checked at the door
Collaborative & Team Savvy – We make up the smartest person in the room, together
Growth & Giver Mindset – Learn it all versus know it all, we share knowledge generously
Curious & Innovative – Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness – No-limit thinking, fueling informed risk taking

Own the end‑to‑end development of low‑level firmware that brings Groq’s AI‑accelerator hardware to life. Drive architectural decisions, mentor a growing team of firmware engineers, and champion best‑in‑class processes that accelerate time‑to‑market while raising the overall quality and reliability of our products.
Responsibilities & opportunities in this role:
Technical Leadership
Serve as the primary technical authority for firmware across the product stack (bootloader, drivers, RTOS, application‑level services). Provide vision, set standards, and make trade‑off decisions that balance performance, power, security, and maintainability.
Team Enablement
Lift up the team by conducting regular design reviews, pair‑programming sessions, and “firmware brown‑bag” tech talks. Mentor junior and mid‑level engineers; create growth paths that move engineers toward senior‑staff or principal levels.
Architecture & Design
Translate Product Requirement Documents (PRDs) into detailed firmware specifications, architecture diagrams, and interface contracts.
Define modular, reusable firmware frameworks that can be leveraged across multiple Groq products.
Leverage deep Embedded Linux and RTOS expertise, including crafting and maintaining Device Tree blobs to describe firmware‑hardware configuration for custom board integration.
Design and document the firmware‑hardware interface, ensuring seamless integration with the device tree and RTOS layers.
Performance & Reliability
Lead systematic profiling, optimization, and validation of latency‑critical paths (e.g., LPU DMA, interrupt latency, power‑state transitions). Implement robust error‑handling, watchdog, and safety mechanisms to guarantee >99.99% uptime in production.
Cross‑Functional Collaboration
Work hand‑in‑hand with hardware, silicon, system‑software, and AI‑software teams to co‑design interfaces (PCIe, DDR, high‑speed SerDes, I²C, SPI, etc.). Drive integration‑test strategies and resolve cross‑domain bugs quickly.
Continuous Improvement
Identify and implement process improvements (CI/CD pipelines for firmware, automated regression testing, static analysis, code‑review standards). Champion a culture of data‑driven decision making that yields measurable quality gains.
Bring‑up & Debug
Lead bring‑up activities for new LPU silicon, including bootloader development, early‑stage peripheral bring‑up, and post‑silicon validation. Perform hands‑on debugging in the lab using oscilloscopes, logic analyzers, JTAG/SWD, and in‑system trace tools.
Security & Compliance
Integrate secure boot, firmware encryption, attestation, and other security primitives. Support product certification (e.g., FCC, CE) and GTM readiness activities.
Ideal candidates have/are:
B.S. in Computer Engineering, Electrical Engineering, Computer Science, or a related field.
10+ years of professional firmware development experience on complex, high‑performance SoC/ASIC platforms (preferably AI/ML accelerators).
Deep knowledge of C/C++ (C11 or later), assembly, and low‑level hardware interaction (memory‑mapped I/O, interrupt handling, DMA, bootloader design).
Proven experience with real‑time operating systems (FreeRTOS, Zephyr, VxWorks, ThreadX) and/or bare‑metal firmware for latency‑critical workloads.
Proficiency with high‑speed interfaces (PCIe Gen3/4, DDR4/5, SerDes, Ethernet), and lower‑speed buses (SPI, I²C, UART, CAN).
Nice to have:
AI/ML Firmware – Prior work on firmware for AI/ML inference engines, tensor accelerators, or similar workloads.
Datacenter Exposure – Understanding of server‑grade power, cooling, and reliability requirements.

Perform detailed circuit design and analysis and component selection for cost-optimized solutions and ensure successful product development with aggressive product cycles.
Responsibilities & opportunities in this role:
Product‑Driven Design Leadership – Translate Product Requirement Documents (PRDs) into detailed board functional specifications, schematic capture, and PCB layout, ensuring every design decision aligns with the overall product strategy.
High‑Speed Interface Engineering – Architect and validate SerDes, PCIe, DDR, and other high‑bandwidth links, pushing the limits of signal integrity and timing closure.
Power Delivery & Sub‑Section Design – Design robust power rails (POLs, VRMs, etc.) that deliver clean, low‑noise supply to the GroqChip LPU while meeting stringent power budgets.
Low‑Speed Interface & Firmware Integration – Implement SPI, I²C, CAN‑bus, and other control buses, and interface them with CPLDs/FPGAs using Verilog to support board‑level control and diagnostics.
ASIC Bring‑Up & Validation – Lead the design of test boards for ASIC bring‑up and post‑silicon validation, debugging hardware and firmware issues in a lab environment.
Chassis & System Integration – Contribute to chassis design and multi‑system implementations, ensuring mechanical, thermal, and electrical compatibility across the product family.
Debug & Certification Support – Own end‑to‑end debugging of chip, board, and system‑level issues; support product certification and go‑to‑market activities.
Cross‑Functional Collaboration – Work hand‑in‑hand with software, firmware, and product teams to deliver a cohesive, high‑performance silicon solution.
Ideal candidates have/are:
Deep Hardware Design Expertise – 10+ years of proven experience in schematic capture, constraint management, and PCB layout for high‑speed, high‑density systems.
Signal Integrity & Power Delivery Mastery – Hands‑on experience with SerDes, PCIe, DDR, and power sub‑sections (POLs, VRMs, etc.).
Firmware & FPGA Proficiency – Comfortable writing Verilog for CPLDs/FPGAs and integrating firmware with hardware.
Lab‑Ready Debugging Skills – Proficient with oscilloscopes, logic analyzers, power analyzers, and other measurement tools to isolate and resolve complex hardware faults.
Team‑Oriented, Self‑Starter Mindset – Thrive in a collaborative environment, prioritize tasks under tight deadlines, and drive projects to completion with minimal oversight.
Attributes of a Groqster:
Humility - Egos are checked at the door
Collaborative & Team Savvy - We make up the smartest person in the room, together
Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
Curious & Innovative - Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness - no limit thinking, fueling informed risk taking

The Staff Systems Engineer will play a critical role in shaping the future of Groq's hardware platforms, enabling state-of-the-art AI/ML workloads, and driving the development of the most advanced AI accelerator on the market. This role requires a unique blend of technical expertise, leadership skills, and collaboration abilities. The Staff Systems Engineer will guide design and architecture decisions, ensuring that they align with organizational objectives. The role will also involve collaborating with cross-functional teams to ensure seamless delivery of products, features, and enhancements.
Responsibilities & opportunities in this role:
Technical Leadership: Lead hardware‑systems design projects – guide design and architecture decisions that align with Groq’s strategic objectives.
Cross‑Functional Collaboration: Partner with Silicon Engineering, Data Center Operations, Cloud Infrastructure, Product Management, and Manufacturing Operations to ensure seamless delivery of products, features, and enhancements.
Project Management: Track defects/bugs throughout the project lifecycle and drive timely issue resolution.
Interconnect Solution Definition: Electro‑optical: backplanes, bulk‑power distribution, optical links, internal & external system interconnects. Thermo‑mechanical: macro‑system cooling solutions, integration, and interconnect. Coordinate design teams to achieve common, leveraged results.
Full‑Stack System Expertise: End‑to‑end troubleshooting and issue resolution. Deep understanding of design expectations and validation methods—from proof‑of‑concept to final product.
Team Enablement: Help teams isolate problems to the subsystem level and guide the entire group toward resolution when needed.
Technical Proficiency:
Electrical & power issues
Mechanical & thermal issues
Low‑level firmware & application‑layer software
Operating systems, performance testing, and tuning for optimization
Ideal candidates have/are:
8+ years of experience leading platform teams to delivery success
A Bachelor's or Master's degree in Electrical Engineering, Mechanical Engineering, or a related field
Proficient in electrical and power issues, mechanical and thermal issues, low-level firmware, application layer software, operating systems, performance testing, and tuning for optimizations.
Problem-Solving & Debugging – Strong problem-solving and debugging skills, with the ability to troubleshoot and resolve complex hardware and firmware issues.
Attributes of a Groqster:
Humility - Egos are checked at the door
Collaborative & Team Savvy - We make up the smartest person in the room, together
Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
Curious & Innovative - Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness - no limit thinking, fueling informed risk taking

As Senior Staff Compiler Engineer, you will be responsible for defining and developing compiler optimizations for our state-of-the-art compiler, targeting Groq's revolutionary LPU, the Language Processing Unit.
In this role you will drive the future of Groq's LPU compiler technology. You will be in charge of architecting new passes, developing innovative scheduling techniques, and creating new front-end language dialects to support the rapidly evolving ML space. You will also benchmark and monitor key performance metrics to ensure that the compiler produces efficient mappings of neural network graphs to the Groq LPU.
Ideal candidates have experience with LLVM and MLIR; knowledge of functional programming languages is an asset. Knowledge of ML frameworks such as TensorFlow and PyTorch, and of portable graph models such as ONNX, is also desired.
Responsibilities & opportunities in this role:
Compiler Architecture & Optimization: Lead the design, development, and maintenance of Groq’s optimizing compiler, building new passes and techniques that push the performance envelope on the LPU.
IR Expansion & ML Enablement: Extend Groq’s intermediate representation dialects to capture emerging ML constructs, portable graph models (e.g., ONNX), and evolving deep learning frameworks.
Performance & Benchmarking: Benchmark compiler outputs, diagnose inefficiencies, and drive enhancements to maximize quality-of-results on LPU hardware.
Cross-Disciplinary Collaboration: Partner with hardware architects and software leads to co-design compiler and system improvements that deliver measurable acceleration gains.
Leadership & Mentorship: Mentor junior engineers, review contributions, and guide large-scale, multi-geo compiler projects to completion.
Innovation & Impact: Publish novel compilation techniques and contribute thought leadership to top-tier ML, compiler, and computer architecture conferences.
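As a toy illustration of what an optimization pass does (purely illustrative: the tiny IR, its node types, and the `fold` function below are hypothetical; the real compiler builds on LLVM/MLIR infrastructure), a constant-folding pass over an expression tree might look like:

```python
# Toy constant-folding pass over a tiny expression IR. Hypothetical
# node types for illustration only -- not Groq's actual IR dialects.
from dataclasses import dataclass

@dataclass
class Const:
    value: int

@dataclass
class Add:
    lhs: object
    rhs: object

@dataclass
class Mul:
    lhs: object
    rhs: object

def fold(node):
    """Bottom-up pass: replace constant subexpressions with Const nodes."""
    if isinstance(node, Const):
        return node
    lhs, rhs = fold(node.lhs), fold(node.rhs)
    if isinstance(lhs, Const) and isinstance(rhs, Const):
        op = (lambda a, b: a + b) if isinstance(node, Add) else (lambda a, b: a * b)
        return Const(op(lhs.value, rhs.value))
    return type(node)(lhs, rhs)  # rebuild node with folded children

expr = Mul(Add(Const(2), Const(3)), Const(4))  # (2 + 3) * 4
print(fold(expr))  # Const(value=20)
```

Production passes operate on SSA-form IR with dataflow analysis rather than a bare tree, but the rewrite-and-rebuild shape is the same.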
Ideal candidates have/are:
8+ years of experience in computer science/engineering or a related field
5+ years of direct experience with C/C++ and LLVM or compiler frameworks
Knowledge of spatial architectures such as FPGA or CGRAs an asset
Knowledge of functional programming an asset
Experience with ML frameworks such as TensorFlow or PyTorch desired
Knowledge of ML IR representations such as ONNX, and of deep learning
Additionally nice to have:
Strong initiative and personal drive, able to self-motivate and drive projects to closure
Keen attention to detail and high levels of conscientiousness
Strong written and oral communication; ability to write clear and concise technical documentation
Team first attitude, no egos
Leadership skills and ability to motivate peers
Optimistic outlook; coaching and mentoring ability
Attributes of a Groqster:
Humility - Egos are checked at the door
Collaborative & Team Savvy - We make up the smartest person in the room, together
Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
Curious & Innovative - Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness - no limit thinking, fueling informed risk taking

At PennEngineering, we innovate and collaborate to make the world a better place. You can contribute to work that matters with a company where diversity, equity and belonging are shared values. We’re committed to fostering an environment for every employee that’s welcoming, respectful and inclusive, with great opportunity for professional growth. Find your future with us.
PennEngineering is seeking a motivated, enthusiastic individual to be a part of our Data and AI Rotational Program as an Associate Data Engineer located at our Global Headquarters in Danboro, PA.
As the Associate Data Engineer, you will work closely with our Information Services team and our Data & AI Special Ops Committee to rotate through three (3) areas of our Data Group: Data Engineering, Data Visualization, and Data Science and Modeling. Your role will be to learn our approach to data in each group, with the objective of understanding how best to support the design, development, and implementation of AI-driven solutions that optimize and enhance our business operations.
This rotational experience offers a unique opportunity to gain hands-on experience with data, apply AI techniques to real-world problems, and contribute to projects that improve efficiency, accuracy, and decision-making across departments, ultimately driving innovative customer solutions. You will also gain experience working with our offshore development team. Upon graduating from the program, you will work as a Data Scientist, Data Engineer, or Visualization Engineer.
Please note this role will be primarily onsite in Danboro, PA.
Join us as we build the future in Manufacturing and Engineering!
Perks and Benefits:
• Medical & Employer Paid: Dental and Vision
• 401k and Employer Match
• Paid time off and holidays
• Tuition reimbursement
• Parental Leave
• Paid On the Job Training
• Performance incentive bonuses
• Community Volunteering
• Talent Referral Bonus Program
• Employee Centric Culture
• Company Provided Technology (laptop, phone, monitors for office and home environment)
• Onsite Gym
WHAT YOU WILL DO:
• Data Handling/Exploration: Collect, preprocess, and analyze large datasets to identify trends and patterns
• Model Development: Assist in developing machine learning models using relevant algorithms and techniques for regression, classification, time series, NLP, and clustering applications. In partnership with business teams perform customer behavior analysis, prospect valuation, and process optimization.
• Support AI Tool Development: Assist in the design, development, and implementation of AI-based solutions to enhance various company processes
• Collaboration: Work closely with cross functional teams to understand business problems/requirements and integrate AI solutions into existing workflows.
• Research: Stay updated with the latest advancements in AI and machine learning and suggest innovative techniques to improve tool performance.
• Documentation: Help maintain comprehensive documentation of AI/ML models, processes, and results for future reference and continuous improvement.
• Risk assessment: Contribute to organizational strategies for identifying and managing the risks of deploying AI and data applications for developers and end-users, especially pertaining to user experience and human factors

This role is ideal for a technologist who is fluent in the evolving AI stack, understands the interplay between model design and system-level hardware demands, and can translate technical and market signals into clear, actionable strategy for Arm. The successful candidate will work closely with internal stakeholders across engineering, product, and commercial teams—as well as engage with external ecosystem partners, researchers, and innovators.


Texas A&M University is hiring an Enterprise IT Architect for Artificial Intelligence. We are making a bold leap into the future with a $45 million investment in an NVIDIA DGX SuperPOD. This investment underscores our commitment to all Texas A&M System faculty, researchers, and graduate assistants. As an IT Enterprise Architect, you will provide technical expertise and consultation to faculty and researchers on how to leverage the NVIDIA SuperPOD to accomplish their research goals. You will be involved with design and program development, including planning, directing, and evaluating AI technology infrastructure and solutions. Just imagine the impact your contributions will have for your clients and beyond!
This position is security sensitive requiring U.S. Citizenship.
While technical skills are necessary, your consultative skills and curiosity for solving complex problems are what will set you apart. Ideally, you will have worked with faculty and/or researchers during your career. We are looking for someone who is adaptable, works well with ambiguity, and is motivated by helping others succeed!
What you need to know
Salary: $170,000 commensurate with education and experience
Location: In-person role in College Station, Texas
Schedule: This role may require working outside of standard office hours, including evenings, weekends, and holidays, to support the demands of technology services and ensure the seamless operation of essential systems.
Citizenship: Must be a United States citizen, permanent resident, or a person granted asylum or refugee status in accordance with 15 CFR, Part 762; 22 CFR §§122.5, 123.22 and 123.26; and 31 CFR § 501.601

We are seeking a highly skilled Machine Learning Engineer to join our advanced model development team. This role focuses on pre-training, continued training, and post-training of models, with a particular emphasis on draft model optimization for speculative decoding and quantization-aware training (QAT). The ideal candidate has deep experience with training methodologies, open-weight models, and performance-tuning for inference.
Responsibilities & opportunities in this role:
Lead pre-training and post-training efforts for draft models tailored to speculative decoding architectures.
Conduct continued training and post-training of open-weight models for non-draft (standard) inference scenarios.
Implement and optimize quantization-aware training pipelines to enable low-precision inference with minimal accuracy loss.
Collaborate with model architecture, inference, and systems teams to evaluate model readiness across training and deployment stages.
Develop tooling and evaluation metrics for training effectiveness, draft model fidelity, and speculative hit-rate optimization.
Contribute to experimental designs for novel training regimes and speculative decoding strategies.
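The draft/target interplay behind speculative decoding can be sketched in a few lines. This is a deliberately simplified illustration, not Groq's implementation: the `target_next`/`draft_next` "models" below are arbitrary deterministic functions standing in for neural LMs, and the token rule is made up.

```python
# Toy sketch of draft-model speculation and target verification.
def target_next(prefix):
    # Hypothetical large "target" model: deterministic next-token rule.
    return (sum(prefix) + 1) % 5

def draft_next(prefix):
    # Hypothetical cheap "draft" model: agrees except when target says 3.
    t = target_next(prefix)
    return 0 if t == 3 else t

def speculative_step(prefix, k=4):
    """Draft proposes k tokens; target accepts the longest agreeing
    prefix, then supplies one corrected token. Returns the emitted
    tokens, the number of draft hits, and k (for hit-rate tracking)."""
    proposed, p = [], list(prefix)
    for _ in range(k):
        tok = draft_next(p)
        proposed.append(tok)
        p.append(tok)
    accepted, p = [], list(prefix)
    for tok in proposed:
        if tok != target_next(p):
            break  # first disagreement: discard the rest of the draft
        accepted.append(tok)
        p.append(tok)
    accepted.append(target_next(p))  # target's corrected/next token
    return accepted, len(accepted) - 1, k

tokens, hits, k = speculative_step([1, 2], k=4)
print(tokens, f"draft hit rate: {hits}/{k}")  # [4, 3] draft hit rate: 1/4
```

Training a draft model to raise that hit rate, without inflating its cost, is exactly the draft-model optimization this role targets.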
Ideal candidates have/are:
5+ years of experience in machine learning, with a strong focus on model training.
Proven experience with transformer-based architectures (e.g., LLaMA, Mistral, Gemma).
Deep understanding of speculative decoding and draft model usage.
Hands-on experience with quantization-aware training, including PyTorch QAT workflows or similar frameworks.
Familiarity with open-weight foundation models and continued/pre-training techniques.
Proficient in Python and ML frameworks such as PyTorch, JAX, or TensorFlow.
Preferred Qualifications:
Experience optimizing models for fast inference and sampling in production environments.
Exposure to distributed training, low-level kernel optimizations, and inference-time system constraints.
Publications or contributions to open-source ML projects.
Attributes of a Groqster:
Humility - Egos are checked at the door
Collaborative & Team Savvy - We make up the smartest person in the room, together
Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
Curious & Innovative - Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness - no limit thinking, fueling informed risk taking

Design and build the real-time data infrastructure that powers GroqCloud’s global revenue engine, processing hundreds of billions of events each day, sustaining millions of writes per second, and enabling a multi-billion-dollar business to operate in real time. Drive the intelligence layer that fuels global billing, analytics, and real-time business operations at worldwide scale.
Responsibilities & opportunities in this role:
Architect high-performance data pipelines to ingest, process, and transform millions of structured and semi-structured events daily.
Build distributed, fault-tolerant frameworks for streaming data from diverse sources.
Create data services and APIs that make usage and billing data easily accessible across the platform.
Develop lightweight tools and dashboards to monitor and visualize data ingestion, throughput, and system health.
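The usage-and-billing aggregation described above can be sketched as a toy windowed roll-up. Illustrative only: the event schema (`customer`, `ts`, `tokens`) is hypothetical, and a production pipeline would stream through Kafka/Flink-style infrastructure rather than an in-memory list.

```python
# Minimal sketch of windowed usage aggregation for billing events.
from collections import defaultdict

def aggregate_usage(events, window_s=60):
    """Roll raw usage events up into per-(customer, window) token totals."""
    totals = defaultdict(int)
    for ev in events:
        window = ev["ts"] // window_s * window_s  # window start time
        totals[(ev["customer"], window)] += ev["tokens"]
    return dict(totals)

events = [
    {"customer": "acme", "ts": 5,  "tokens": 120},
    {"customer": "acme", "ts": 42, "tokens": 80},
    {"customer": "acme", "ts": 65, "tokens": 50},
    {"customer": "beta", "ts": 10, "tokens": 200},
]
print(aggregate_usage(events))
# {('acme', 0): 200, ('acme', 60): 50, ('beta', 0): 200}
```

At millions of writes per second the same roll-up runs incrementally over partitioned streams, with exactly-once semantics and late-event handling layered on top.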
Ideal candidates have/are:
Strong background in real-time data processing, distributed systems, and analytics infrastructure.
Hands-on experience with streaming technologies such as Kafka, Flink, Spark Streaming, or Redpanda and real-time analytics databases such as Clickhouse, Druid, or Pinot.
Deep understanding of serialization, buffering, and data flow optimization in high-throughput systems.
Bonus points:
Experience deploying and managing workloads on Kubernetes.
A passion for systems performance, profiling, and low-latency optimization.
Familiarity with gRPC and RESTful API design.
Attributes of a Groqster:
Humility – Egos are checked at the door
Collaborative & Team Savvy – We make up the smartest person in the room, together
Growth & Giver Mindset – Learn it all versus know it all, we share knowledge generously
Curious & Innovative – Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness – No-limit thinking, fueling informed risk taking

Perform performance modeling of Groq systems on state-of-the-art AI/ML workloads, identify bottlenecks early, and guide future hardware development of the most advanced AI accelerator on the market.
Responsibilities & opportunities in this role:
Develop and maintain performance models for multiple generations of Groq hardware on the latest AI/ML workloads (LLMs, CNNs, LSTMs, etc.)
Analyze AI/ML algorithms to understand their compute, networking and memory requirements, and map them effectively onto the underlying hardware architecture
Lead a matrixed team to enable SW/HW co-optimization across chip, system and software teams
Identify performance bottlenecks and help drive next generation chip architecture through a solid understanding of Groq's software and hardware
Work with silicon and system integration engineers to evaluate the costs & benefits of new technologies on Groq systems
Provide what-if scenarios and continuous guidance directly to the CEO and senior leadership
Develop the Design Space Exploration (DSE) tool for performance analysis and exploration of both chip and system across various workloads
Define custom hardware solutions for high profile customers
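A first step in the workload analysis described above is often a roofline estimate. The sketch below is illustrative only: the peak-FLOP/s and bandwidth figures are made-up placeholders, not Groq LPU specifications.

```python
# Back-of-envelope roofline estimate for a dense matmul on a
# hypothetical accelerator (all hardware numbers are invented).
def matmul_roofline(m, n, k, peak_flops, mem_bw, bytes_per_elem=2):
    """Return (arithmetic intensity, estimated time, binding resource)."""
    flops = 2 * m * n * k                               # multiply-accumulates
    traffic = bytes_per_elem * (m * k + k * n + m * n)  # A, B, C moved once
    intensity = flops / traffic                         # FLOPs per byte
    t_compute = flops / peak_flops
    t_memory = traffic / mem_bw
    bound = "compute" if t_compute >= t_memory else "memory"
    return intensity, max(t_compute, t_memory), bound

# Large square matmul: high arithmetic intensity, so compute-bound.
i, t, bound = matmul_roofline(4096, 4096, 4096,
                              peak_flops=400e12, mem_bw=2e12)
print(f"intensity={i:.0f} FLOP/B, est. time={t * 1e3:.2f} ms, {bound}-bound")

# Skinny matmul (batch 1, as in LLM decode): low intensity, memory-bound.
print(matmul_roofline(1, 4096, 4096, peak_flops=400e12, mem_bw=2e12)[2])
```

Real performance models add network links, SRAM hierarchies, and per-operator schedules on top of this kind of first-order bound.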
Ideal candidates have/are:
Computer science, mathematics, ECE or equivalent background and/or experience in this domain
Strong fundamentals in computer architecture, with deep knowledge and experience of working on domain specific AI architectures, is highly preferred
In-depth understanding of latest AI/ML algorithms and their hardware implications
Ability to analyze and simplify complex hardware designs into simple abstracted timing models
Past experience modeling AI/ML workloads and creating the necessary tools for performance optimization. Experience modeling LLM performance is beneficial but not required
Proficient in programming languages such as C/C++ and Python
Experience with cycle-accurate simulators for benchmarking analysis
Experience with developing ASIC microarchitecture design is a plus
Experience with understanding and simulating RTL (SystemVerilog) designs is a plus
Attributes of a Groqster:
Humility - Egos are checked at the door
Collaborative & Team Savvy - We make up the smartest person in the room, together
Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
Curious & Innovative - Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness - no limit thinking, fueling informed risk taking

Mission:
You will join the hardware team with the goal of supporting novel application areas and AI models beyond current use cases. Responsibilities include researching the evolving landscape of AI applications and models, analyzing underlying model architectures, and building implementations on Groq. Further responsibilities include analyzing mappings to existing and future hardware, modeling performance, and working cross-functionally with the hardware design team on novel hardware features (e.g., functional units, numeric modes, interconnect, system integration) to unlock novel application areas for Groq. There will be opportunities to participate in a wider range of R&D activities, either internally or externally with key Groq partners.
Responsibilities & opportunities in this role:
AI application and model research
Performance modeling
Cross-functional work with hardware and software teams
Next generation hardware architecture development
Support internal and outward-facing R&D
Ideal candidates have/are:
Strong foundation in computer science
Experience with AI models and applications
Knowledge of LLMs and other Gen AI applications
Strong foundation in computer architecture and computer arithmetic
Python and common ML frameworks such as PyTorch & TensorFlow
Experience with performance analysis / modelling
Problem solving mindset
Nice to Have:
Experience with scientific computing & HPC
Experience in optimizing applications on specialized accelerators (GPU, FPGA, or other custom accelerators).
Experience with compiler tools and MLIR.
Experience in delivering complex projects in a fast-moving environment.
Attributes of a Groqster:
Humility - Egos are checked at the door
Collaborative & Team Savvy - We make up the smartest person in the room, together
Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
Curious & Innovative - Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness - no limit thinking, fueling informed risk taking

The Hardware Infrastructure team in Groq is responsible for architecting and supporting a world class ASIC development and verification environment that optimizes the productivity of our hardware engineering team.
Responsibilities & opportunities in this role:
Architect, develop, and deploy an innovative framework that enables automation of the Groq silicon design flow.
Collaborate with the silicon design team to migrate the Groq hardware code base into a form that facilitates dynamic scaling and feature management.
Engage with software teams to define and deliver configuration-specific collateral that streamlines co-development.
Investigate new and novel ways to accelerate verification and physical design of the generated designs, leveraging AI/ML approaches where possible.
Investigate opportunities to more closely align hardware, software and production flows.
Ideal candidates have/are:
Bachelor’s or Master’s degree in Computer or Electrical Engineering
A solid background in ASIC/FPGA hardware (5+ years in design, verification, or EDA/CAD experience).
Industry-proven software design experience (hardware modeling, applications development, compiler development).
Strong scripting skills (Python, Perl, etc.).
Previous experience with meta-programming approaches would be an asset.
Experience with physical design methodologies and flows would be an asset.
Strong team player with excellent verbal and written communication skills.
Attributes of a Groqster:
Humility - Egos are checked at the door
Collaborative & Team Savvy - We make up the smartest person in the room, together
Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
Curious & Innovative - Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness - no limit thinking, fueling informed risk taking

While the initial emphasis is on server and networking Infrastructure, this role spans multiple business lines — ensuring that the subsystem technologies defined within CT are reusable and adopted across Automotive, IoT/EdgeAI, and Client applications.
You will work closely with engineering, architecture, product, and partner teams to translate market needs into technical direction, maintain subsystem roadmaps, and validate them through engagement with internal and external partners.
Responsibilities:
Develop and maintain technical requirements and multi-year technology roadmaps for interconnect, memory, and system IP components.
Define long-term evolution plans for Arm’s next-generation interconnect, ensuring alignment with Infrastructure and cross-market needs.
Partner with Product Management and Business Lines to align subsystem technology development with solution-level roadmaps.
Collaborate with IP and implementation engineering teams to ensure technology definitions are practical and implementation-ready for partner use.
Engage with customers and ecosystem partners to validate roadmap assumptions and collect technical feedback.
Identify market and competitive trends that inform the direction of interconnect and system IP evolution.
Coordinate with program and technical leads to align dependencies, schedules, and breakthroughs across CE, CT, SE, and business lines.
Contribute to technology planning processes and improvements across the Technology Management function.
Required Skills and Experience:
Proven experience in interconnect, memory, or SoC subsystem design and integration.
Strong understanding of system IP (Coherent Mesh fabric, NoC, MMU, GIC, memory controllers, PCIe, Debug), plus an understanding of SoC boot flow, D2D, and security.
Experience developing or supporting SoC-level design and integration flows.
Proven ability to define requirements and lead technical trade-offs across IPs.
Clear and concise communicator able to interface effectively with architects and engineering teams.
“Nice To Have” Skills and Experience:
Experience developing or maintaining roadmaps for interconnect or memory subsystem IP.
Knowledge of coherency protocols, LPDDR, and on-chip fabric standards.
Familiarity with performance analysis or verification methodologies for large subsystems.
Cross-domain experience applying server/networking tech to Automotive or EdgeAI systems.
In Return:
You will develop the roadmap for Arm’s core interconnect and control subsystems, ensuring they are strategically aligned and technically validated across markets. While the initial focus is infrastructure, you will work across lines of business and with customers to ensure these foundational technologies are robust, widely adopted, and reusable across Arm’s diverse product portfolio. Your ownership of requirement specs and roadmap rigor will ensure subsystem coherence across product generations — enabling Arm to scale from IP to complete system solutions.
Our 10x mindset guides how we engineer, collaborate, and grow. Understand what it means and how to reflect 10x in your work: https://careers.arm.com/en/10x-mindset

This position will focus on formal verification of Groq’s next generation hardware. In this role, you will work alongside experienced engineers to verify complex digital designs using formal verification techniques and tools. You will be involved in tasks ranging from setting up formal verification environments and running automated checks to debugging formal proofs and supporting the integration of formal verification tools into the overall verification flow.
Responsibilities & opportunities in this role:
Verify hardware features of the Language Processing Unit (LPU).
Cross-functional Collaboration: Partner with architecture/RTL teams to specify properties, resolve deep design issues, and influence micro-architecture decisions.
Formal verification execution:
Apply formal verification to rigorously verify critical design properties, ensure compliance with specifications, and minimize spec ambiguities.
Debug findings and collaborate with stakeholders efficiently.
Support silicon bring-up and debug using formal methods where it applies.
Methodology Leadership: Develop and implement advanced formal verification environments and methodologies for complex ASIC designs, including automated flows for scalability and efficiency.
Mentorship: Train and coach junior engineers on formal techniques and best practices; help with methodology/FAQ documentation.
Innovation: Contribute to developing future verification strategies for validating future accelerator chips and hardware architectures for ML workloads.
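The property-checking workflow above can be illustrated with a toy exhaustive check. Real LPU formal work uses SystemVerilog Assertions with tools such as JasperGold or VC Formal; the Python below, with its hypothetical 3-bit counter, is only a conceptual stand-in.

```python
# Toy exhaustive property check on a tiny state machine -- a stand-in
# for what a formal tool proves symbolically over all reachable states.
def next_state(count):
    """3-bit saturating counter: increments, then holds at 7."""
    return count if count == 7 else count + 1

def check_property(prop, states=range(8)):
    """Check 'prop' on every state; return the failing states.
    A non-empty list plays the role of a counterexample set."""
    return [s for s in states if not prop(s)]

# Safety properties of the saturating counter: both should hold.
monotone = lambda s: next_state(s) >= s   # never decreases
bounded = lambda s: next_state(s) <= 7    # never overflows

print("monotone violations:", check_property(monotone))  # []
print("bounded violations:", check_property(bounded))    # []

# A buggy wrapping counter violates monotonicity at 7 (counterexample).
wrap = lambda s: (s + 1) % 8
print("wrap violations:", check_property(lambda s: wrap(s) >= s))  # [7]
```

In SVA terms, each lambda corresponds to an assertion like `assert property (@(posedge clk) count_next >= count);`, and the formal tool either proves it or returns a concrete failing trace.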
Ideal candidates have/are:
BS degree in electrical engineering or related fields, or equivalent practical experience; advanced degrees (MS or PhD) are a plus
8+ years in ASIC verification with 5+ years focused on formal verification methods
Mastery of SystemVerilog Assertions (SVA) and formal property verification
Proficient with at least one industry-standard formal verification tool (JasperGold, VC Formal, etc.)
Strong analytical skills and attention to detail when debugging complex issues
Good scripting skills for flow automation (Tcl, Python, etc.)
Good written and oral communication skills
Must be authorized to work in the United States or Canada
Proven success in full-cycle formal sign-off for complex compute blocks
Expertise in formal apps: sequential equivalence checks, datapath, connectivity, etc.
Deep understanding of LPU or GPU architecture/design
Attributes of a Groqster:
Humility - Egos are checked at the door
Collaborative & Team Savvy - We make up the smartest person in the room, together
Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
Curious & Innovative - Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness - no limit thinking, fueling informed risk taking

We are seeking an experienced engineer to drive the next-generation infrastructure, methodology, and automation that will accelerate our ASIC design cycle. In this role, you will establish best-in-class flows, apply advanced AI/ML techniques to complex design and sign-off challenges, and develop robust tooling that reduces time-to-tapeout while enhancing quality of results (QoR).
Responsibilities & opportunities in this role:
Own ASIC methodology end-to-end: Define and evolve Architectural Modeling → RTL → Design Verification → gates → P&R → sign-off flows.
Lead AI-driven methodology across the full ASIC lifecycle: Define the strategy, roadmap, and guardrails for applying AI/ML from architectural investigation through post-silicon, with measurable positive impacts on schedule, cost, and QoR.
Automate for speed and reproducibility: Build robust pipelines for build/test/run, artifact tracking, and results triage; make one-click “from RTL to sign-off” flows standard.
Scale compute and data: Design farm orchestration for thousands of concurrent jobs, manage license usage, cache, and distributed storage for large design datasets.
Make results visible: Create dashboards and alerts for QoR trends, regressions, and SLA adherence; enable data-driven design decisions.
Partner across teams: Collaborate with RTL, DV, PD, DFT, verification, and CAD to land new methods, support tapeout, and drive continuous improvement.
Collaborate with software engineering to elevate hardware methodology: Partner with platform, SRE, and tooling teams to apply software best practices for robust, scalable, and developer-friendly ASIC design workflows.
Ideal candidates have/are:
7+ years building and maintaining ASIC flows in production (from RTL through sign-off) with proven tapeout experience.
Deep knowledge of at least two EDA domains: simulation, hardware emulation, logic synthesis, P&R, STA, formal/equiv, CDC/RDC, DFT/ATPG, physical verification, or power sign-off.
Strong coding skills in Python and one of Tcl/C++; rigorous software practices (version control, code review, testing).
Experience building CI/CD style automation for hardware (e.g., Jenkins/GitHub Actions, BuildKite, Bazel, containers).
Comfortable with Linux at scale and job schedulers (e.g., LSF/Slurm/Kubernetes).
Ability to translate engineer pain points into reliable tools with excellent UX and documentation.
Preferred qualifications
Hands-on experience with Synopsys (VCS/Verdi), Cadence (Genus/Innovus/Tempus/Jasper) EDA tools.
Practical ML experience (PyTorch/JAX/scikit-learn) and MLOps basics (feature/log pipelines, experiment tracking).
Data engineering for EDA: metrics schemas, results warehousing (PostgreSQL/BigQuery), dashboarding (Grafana), and anomaly detection.
Cloud HPC (AWS/GCP/Azure) for bursty EDA workloads; cost and license optimization.
Power intent and low-power methodology (UPF/CPF), multi-corner multi-mode closure experience.
Prior success introducing new methodology across multiple projects/teams.
Attributes of a Groqster:
Humility - Egos are checked at the door
Collaborative & Team Savvy - We make up the smartest person in the room, together
Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
Curious & Innovative - Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness - no limit thinking, fueling informed risk taking

As a Staff DFT Engineer, you will be responsible for enabling Groq’s next‑generation LPU by designing, integrating, and validating cutting‑edge DFT solutions.
Responsibilities & opportunities in this role:
Build & Own End‑to‑End DFT Flow: Design a production‑ready DFT flow from scratch, select tools from an industry‑leading EDA partner, and create reusable templates, scripts, and best‑practice guidelines.
RTL DFT DRC & Optimization: Run DFT DRC on RTL blocks, propose fixes, and deliver RTL that meets timing, area, and power targets.
Timing Closure & Sign‑off: Write DFT‑specific timing constraints, work with physical design to close timing in test mode, and sign off test‑mode constraints.
Test Vector Development: Generate ATPG, IJTAG, and MBIST vectors; run RTL and gate‑level simulations to achieve high coverage and verify functional path exercise.
Post‑Silicon Bring‑up: Partner with silicon validation teams to run patterns on silicon, troubleshoot ATE issues, and streamline validation cycles.
Automation & Workflow Improvement: Identify bottlenecks, develop Tcl/Python automation, and integrate new tools to reduce manual effort and boost productivity.
Documentation & Knowledge Transfer: Maintain version‑controlled DFT documentation, create a “DFT playbook,” and lead knowledge‑sharing sessions.
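The automation responsibility above (Tcl/Python scripting to reduce manual effort) often starts as small log-crunching utilities. A minimal Python sketch of that pattern; the log-line format here is invented for illustration and does not match any specific EDA tool's output:

```python
# Hypothetical sketch only: the log-line format below is an assumption
# for illustration, not any specific EDA tool's report format.
import re

COVERAGE_RE = re.compile(r"Block\s+(\S+):\s+test coverage\s+([\d.]+)%")

def parse_coverage(log_text: str) -> dict:
    """Extract per-block test-coverage percentages from a log dump."""
    return {m.group(1): float(m.group(2)) for m in COVERAGE_RE.finditer(log_text)}

def blocks_below_target(coverage: dict, target: float = 99.0) -> list:
    """Blocks that have not yet met the coverage target, sorted by name."""
    return sorted(block for block, pct in coverage.items() if pct < target)

if __name__ == "__main__":
    sample = (
        "Block core_a: test coverage 99.21%\n"
        "Block core_b: test coverage 97.85%\n"
    )
    print(blocks_below_target(parse_coverage(sample)))  # ['core_b']
```

The same skeleton (parse tool output, compare against targets, flag gaps) generalizes to ATPG, MBIST, and timing reports.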
Ideal candidates have/are:
Bachelor's or Master's degree in Electrical Engineering or Computer Engineering.
8+ years of industry experience with Design for Test (DFT) for high-performance ASICs.
Proven hands‑on experience with industry‑leading DFT tools, techniques and methodologies for large SOC/ASIC designs.
Expert in core DFT techniques: ATPG, scan compression, MBIST, IJTAG, IEEE 1500, Logic BIST, test-pattern translation, and yield learning.
Experience with UDFMs such as Cell Aware and Small Delay Defect.
Experience in ATPG Streaming Scan Network (SSN) implementation.
Strong timing‑analysis, constraint sign‑off, and gate‑level simulation skills; adept at debugging DFT patterns on real silicon and ATE.
Strong knowledge of RTL to GDS methodologies and formal equivalence.
Excellent coding skills in Tcl and Python.
Strong interpersonal and organizational skills.
Ability and desire to work effectively as part of a team.
Attributes of a Groqster:
Humility - Egos are checked at the door
Collaborative & Team Savvy - We make up the smartest person in the room, together
Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
Curious & Innovative - Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness - no limit thinking, fueling informed risk taking

Test and validate silicon and systems for the most advanced AI accelerator on the market.
Responsibilities & opportunities in this role:
Lead new silicon bring-up: create the validation plan and perform system validation across all aspects
Validation of HSIO and LSIO IPs
Drive ATE to system correlation efforts during NPI stage
Perform scripting and test data processing to extract meaningful signals
Develop and integrate software test applications for effective product stress and SLT screening.
Collaborate with software teams to evaluate system performance and HW/SW interaction under various conditions.
Enable automation to extract key validation and characterization data collected across PVT
Root cause analysis and RMA processing
Mentor Junior Engineers when the project need arises
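The data-extraction responsibility above (pulling key validation and characterization signals across PVT) can be sketched as a small aggregation step. A hypothetical Python sketch; the CSV columns (voltage, temp, margin_ps) are invented for illustration:

```python
# Hypothetical sketch only: the CSV columns (voltage, temp, margin_ps) are
# invented for illustration; real characterization data will look different.
import csv
import io
from collections import defaultdict

def summarize_by_corner(csv_text: str) -> dict:
    """Group margin measurements by (voltage, temperature) corner and
    report the min and mean margin per corner."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        corner = f"{row['voltage']}V/{row['temp']}C"
        groups[corner].append(float(row["margin_ps"]))
    return {
        corner: {"min": min(vals), "mean": sum(vals) / len(vals)}
        for corner, vals in groups.items()
    }
```

Reporting the minimum alongside the mean matters here: the worst-case corner, not the average, is what gates sign-off.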
Ideal candidates have/are:
Experience in post-silicon electrical validation of server processors
Proven driver and leader of a full system validation from end to end (silicon out to production start) with attention to detail and a passion for root causing issues
Silicon validation experience, preferably in the areas of HSIO, LSIO, logic, and memory. BE or ME graduate
Experience in system marginality validation
Good understanding of lab equipment and measurement techniques for high-speed interfaces, including high-speed scopes, probes, spectrum analyzers, and BERTs.
Strong understanding of Electrical Engineering fundamentals
Strong understanding of Firmware and able to debug and create new test cases
Knowledge of board and package design, signal integrity and power integrity a plus
Software proficiency for python test scripting, data handling and reporting
Laboratory experience, including hands-on use of equipment: oscilloscope, logic analyzer, etc.
Excellent problem-solving skills, good communication skills and ability to work cooperatively in a team environment
Debug issues with SOC IP and boards as needed.
Attributes of a Groqster:
Humility - Egos are checked at the door
Collaborative & Team Savvy - We make up the smartest person in the room, together
Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
Curious & Innovative - Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness - no limit thinking, fueling informed risk taking

You will own, build, and manage RMA and failure analysis (FA) debug and root-cause analysis for existing and new Groq AI/ML products, conducting tests, FA debug, and root-cause investigations.
Responsibilities & opportunities in this role:
Conduct and lead debug and root-cause analysis of field RMAs. Collaborate with Systems Engineers, Hardware Engineers, Software Engineers and Operations Engineers as required.
Scale Root Cause Failure Analysis capabilities within your organization.
Create Failure Analysis result reports that align with standard 8D or similar processes
Develop and optimize RMA testing strategy to improve timeliness and effectiveness of characterization process
Analyze RMA, Failure Analysis, and Repair data. Identify trends and raise quality alerts when necessary. Drive resolution, containment, and mitigation plans for such quality alerts.
Oversee hardware quality performance, monitoring field quality data and associated metrics including RMA Rates, MTBF, and Reliability Ratio.
Manage operational performance of Failure Analysis at contract manufacturer(s), ensuring partner(s) achieve key performance indicators, including FA cycle times, fault duplication rates, and fault isolation rates.
Drive learnings from RMA/FA back into Manufacturing, Engineering, and Support teams.
Oversee the set-up of new products into Failure Analysis operations.
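The field-quality metrics named above (RMA rate, MTBF) reduce to simple ratios over shipment and failure records. A minimal Python sketch; the function signatures are invented for illustration:

```python
# Hypothetical sketch only: inputs would come from shipment and RMA records;
# these function signatures are an illustration, not Groq's actual tooling.
def rma_rate(units_returned: int, units_shipped: int) -> float:
    """Fraction of shipped units returned as RMAs over a period."""
    if units_shipped <= 0:
        raise ValueError("units_shipped must be positive")
    return units_returned / units_shipped

def mtbf_hours(total_operating_hours: float, failures: int) -> float:
    """Mean time between failures: cumulative operating hours per failure."""
    if failures <= 0:
        raise ValueError("failures must be positive")
    return total_operating_hours / failures
```

For example, 5 returns against 1,000 shipped units is a 0.5% RMA rate, and 4 failures over 1,000,000 cumulative operating hours is an MTBF of 250,000 hours.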
Ideal candidates have/are:
BS/MS in Electrical Engineering, Physics or a related degree
7+ years of hands-on systems test and/or validation engineering experience
Proven hands-on management and leadership experience
Competence using lab equipment such as oscilloscopes, logic analyzers, power analyzers, etc.
Deeply cognizant of the differences between system test and ATE test
Experience with enabling reliability tests such as HTOL and quality tests such as Burn In.
Working knowledge of failure analysis techniques and tools such as FIB, SEM, TDR, VNA, and CSAM
Working knowledge of fault isolation techniques such as OBIRCH, DLS/LADA, LVP, and LVI
Proficiency with high speed interfaces (Serdes, PCIe, DDR)
Experience testing power sub-sections (e.g., POLs, VRMs)
Familiarity with lower speed interfaces like SPI, I2C, CAN bus, etc.
Proficiency in Python, Perl, C++, or other languages on UNIX/Linux
Experience in Failure Analysis for one (or more) of the following:
Microprocessors, complex SOC devices, AI Systems, Servers, Network Systems
Excellent knowledge of PCB card and system-level test and debug
Able to manage factory floor partners (CMs) for RMA/FA activities
Attributes of a Groqster:
Humility - Egos are checked at the door
Collaborative & Team Savvy - We make up the smartest person in the room, together
Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
Curious & Innovative - Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness - no limit thinking, fueling informed risk taking

As a member of the Groq Backbone team, your mission is to scale GroqCloud’s global footprint—delivering a consistent, high-performance experience with maximum availability to users everywhere.
Responsibilities & opportunities in this role:
Backbone Architecture & Design
Global Network Collaboration: Partner with regional network teams to harmonize routing, peering, and policy across all sites. Serve as the primary liaison for external service providers (ISP, dark‑fiber carriers) and coordinate joint troubleshooting.
Transport Optimization & Performance: Monitor link utilization, packet loss, jitter, and latency using telemetry (NetFlow, sFlow, OpenTelemetry).
Network Automation & Configuration Management: Build and maintain GitOps‑driven pipelines (Buildkite, Terraform, Python, Kubernetes) for backbone configuration and lifecycle management.
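The transport-optimization responsibility above (monitoring link utilization, loss, jitter, and latency) can be sketched as a threshold check over per-link aggregates. A hypothetical Python sketch; the record shape is invented for illustration, not a NetFlow/sFlow schema, and the thresholds are placeholders:

```python
# Hypothetical sketch only: the per-link record shape below is invented for
# illustration (not a NetFlow/sFlow schema); thresholds are placeholders.
def unhealthy_links(samples, max_loss_pct=0.1, max_latency_ms=80.0):
    """Names of links whose packet loss or latency breaches a threshold."""
    return sorted(
        s["link"]
        for s in samples
        if s["loss_pct"] > max_loss_pct or s["latency_ms"] > max_latency_ms
    )

if __name__ == "__main__":
    telemetry = [
        {"link": "sfo-dfw", "loss_pct": 0.05, "latency_ms": 42.0},
        {"link": "dfw-iad", "loss_pct": 0.40, "latency_ms": 35.0},
        {"link": "iad-ams", "loss_pct": 0.02, "latency_ms": 95.0},
    ]
    print(unhealthy_links(telemetry))  # ['dfw-iad', 'iad-ams']
```

In practice this kind of check would run continuously against the telemetry pipeline and feed alerting rather than a one-shot script.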
Ideal candidates have/are:
Backbone Expertise – 7+ years designing, deploying, and operating inter‑DC transport networks (ISIS, MPLS, SR, TE, L3VPN, Anycast, dark fiber, wave and optical platforms).
Routing Proficiency – Deep knowledge of BGP, ISIS, SR, TE; experience with route reflectors, full internet routing tables on global networks, IXs and BGP communities.
Automation & Scripting – Strong track record automating networks, especially with Terraform/CDKTF, Python, and vendor APIs for configuration management.
Cloud-Native Application Development & Delivery – Testing and packaging automation as cloud-native RESTful Python applications (FastAPI)
Monitoring & Telemetry – Familiarity with NetFlow, sFlow, Prometheus ecosystem, Grafana, and OpenTelemetry for end‑to‑end visibility.
Security Knowledge – Understanding of IPsec, MACsec, ACLs, and DDoS mitigation in a global network context.
Soft Skills – Excellent analytical, communication, and collaboration abilities; comfortable leading cross‑geo teams and mentoring others.
Attributes of a Groqster:
Humility – Egos are checked at the door
Collaborative & Team Savvy – We make up the smartest person in the room, together
Growth & Giver Mindset – Learn it all versus know it all, we share knowledge generously
Curious & Innovative – Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness – No-limit thinking, fueling informed risk taking

As a Datacenter Network Engineer at Groq, you will design, build, and deploy Groq’s global datacenter network fabric to meet the highest standards of availability, efficiency, and automation. Your work will ensure a robust, scalable, and future-ready infrastructure that directly supports Groq’s mission to deliver fast, cost-efficient inference services.
Responsibilities & opportunities in this role:
Network Fabric Architecture and Design: Design end‑to‑end network topologies for new and existing data centers, covering IPv4/IPv6, BGP, OSPF, MPLS, and high‑availability fabric. Provide a scalable, fault‑tolerant foundation that keeps inference pipelines running 24/7.
Hardware Deployment & Optimization: Deploy, configure, and tune Cisco, Juniper, and Arista switches, routers, and firewalls at massive scale. Ensure optimal throughput, low jitter, and predictable performance for ML workloads.
Documentation and Bill‑of‑Materials (BOM) Management: Create and maintain accurate BOMs for all fabric components (switches, line cards, transceivers, cables, racks). Validate BOMs against design specifications and update them as hardware revisions or new models are introduced.
Cross-Functional Collaboration: Partner with Data Center Engineering teams (power, cabling, floor layouts) to understand traffic profiles and future scaling needs.
Ideal candidates have/are:
Hardware Proficiency – Hands‑on experience with Cisco Nexus, Juniper QFX, and Arista 7500/7800 series switches, including line‑card and transceiver management.
GPU Clusters – Experience with RoCEv2 fabrics for GPU clusters.
BOM & Procurement Skills – Experience creating and maintaining BOMs, coordinating with procurement, and managing inventory.
Automation & Scripting – Strong background in Ansible, Terraform, Python, and vendor APIs for configuration management.
Monitoring & Telemetry – Familiarity with Prometheus and OpenTelemetry for port‑level visibility.
Soft Skills – Excellent analytical, communication, and collaboration abilities; comfortable mentoring and documenting complex designs.
Bonus – Experience with ML traffic patterns, SD‑WAN, or contributions to open‑source fabric projects.
Attributes of a Groqster:
Humility – Egos are checked at the door
Collaborative & Team Savvy – We make up the smartest person in the room, together
Growth & Giver Mindset – Learn it all versus know it all, we share knowledge generously
Curious & Innovative – Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness – No-limit thinking, fueling informed risk taking

Installation, configuration, and maintenance of data center infrastructure, including servers, storage systems, and network devices.
As a Data Center Technician, you will serve as the Directly Responsible Individual (DRI) for daily operations within the data center. You will lead hands-on installation, maintenance, and troubleshooting of compute and network infrastructure critical to Groq’s high-performance AI workloads.
Responsibilities & opportunities in this role:
Hardware Operations:
Receive, unpack, and move servers and other equipment to the data center floor.
Install, cable, and maintain servers, network switches, and power distribution units (PDUs) in racks.
Perform hardware-level bring-up and testing using Linux command-line tools.
Ensure proper accountability for equipment and assets through inventory management.
Troubleshooting & Support:
Troubleshoot and resolve complex technical issues related to rack and node failures.
Run scripts to debug and repair rack cabling and other hardware problems.
Create, update, and resolve tickets in Groq's ticketing system to document all work.
Participate in an on-call rotation to provide 24/7 support for data center operations.
Infrastructure & Collaboration:
Execute final test sign-offs for newly built racks.
Collaborate with other engineering teams to design and implement data center upgrades and expansions.
Develop and maintain technical documentation, including diagrams and procedures, to ensure operational consistency.
Ensure compliance with data center standards, policies, and procedures.
Ideal candidates have/are:
2+ years of experience in data center operations or a related field
Strong knowledge of data center infrastructure, including servers, storage systems, and network devices
Experience with data center management software, such as DCIM or BMS
Strong problem-solving and analytical skills
Excellent communication and teamwork skills
Ability to work in a fast-paced environment and prioritize tasks effectively
Strong attention to detail and ability to maintain accurate records
Experience with scripting languages, such as Python or Bash
Familiarity with virtualization technologies, such as Kubernetes
Advanced fiber optic cabling skills
Intermediate Linux skills
Intrinsic curiosity and drive to stay up-to-date with the latest technologies and trends in data center infrastructure and operations
Familiarity with MacBooks, Slack, and Google Docs
Bachelor's degree in Computer Science, Information Technology, or related field, or equivalent experience
Ability to travel up to 50% of the time
Attributes of a Groqster:
Humility – Egos are checked at the door
Collaborative & Team Savvy – We make up the smartest person in the room, together
Growth & Giver Mindset – Learn it all versus know it all, we share knowledge generously
Curious & Innovative – Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness – No-limit thinking, fueling informed risk taking

You will be responsible for setting up, maintaining, and operating lab equipment used for testing and validating high-performance hardware systems. You will collaborate with design engineers, validation teams, and manufacturing to ensure our products meet the highest quality and reliability standards before release.
Responsibilities & opportunities in this role:
Hardware Testing & Validation
Execute test plans for electronic components, PCBs, and system-level assemblies.
Perform functional, performance, and environmental testing using oscilloscopes, logic analyzers, power supplies, and other instrumentation.
Collect and analyze test data to identify root causes of failures or performance deviations.
Technical Support
Support HW design, bring-up and validation activities by preparing workstations, test setups and prototypes.
Assist engineers with hardware builds, soldering, rework, and board-level troubleshooting as needed.
Implement and improve processes for hardware prototype tracking and test documentation.
Lab Operations & Maintenance
Maintain and calibrate test equipment, ensuring the lab remains organized and compliant with safety standards.
Manage hardware inventory, including prototypes, test boards, and lab assets.
Support setup and teardown of test benches, racks, and measurement systems.
Generate detailed test reports and communicate results to cross-functional teams.
Cross-Functional Collaboration
Work closely with hardware design, validation, and manufacturing engineers to improve test coverage and debug hardware issues.
Provide feedback to design teams to enhance product reliability and manufacturability.
Ideal candidates have/are:
Bachelor’s degree in Electrical Engineering, Computer Engineering, or a related technical field.
5+ years of experience in a hardware test, validation, or engineering lab management role.
Strong hands-on experience with lab instrumentation (oscilloscopes, spectrum analyzers, multimeters, etc.).
Solid understanding of electronics fundamentals (digital, analog, and power systems).
Excellent problem-solving and analytical skills.
Preferred:
Familiarity with hardware validation workflows for ASICs, FPGAs, or embedded systems.
Experience working in a production or prototype hardware lab environment.
Knowledge of thermal, mechanical, or environmental testing.
Familiarity with safety and ESD standards in lab environments.
Experience with high-speed digital interfaces (PCIe, DDR, Ethernet, etc.).
Familiarity with data acquisition systems or measurement automation frameworks.
Strong organizational and documentation skills.
Excellent communication and teamwork abilities.
Attention to detail and a proactive attitude toward troubleshooting and process improvement.
Ability to work independently and manage multiple test activities concurrently.
Attributes of a Groqster:
Humility - Egos are checked at the door
Collaborative & Team Savvy - We make up the smartest person in the room, together
Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
Curious & Innovative - Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness - no limit thinking, fueling informed risk taking

Groq delivers fast, efficient AI inference. Our LPU-based system powers GroqCloud™, giving businesses and developers the speed and scale they need. From our Bay Area roots to our growing global presence, we are on a mission to make high performance AI compute more accessible and affordable. When real-time AI is within reach, anything is possible. Build fast.
Attributes of a Groqster:
Humility - Egos are checked at the door
Collaborative & Team Savvy - We make up the smartest person in the room, together
Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously
Curious & Innovative - Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness - no limit thinking, fueling informed risk taking


Winter 2026 (January - April) Internship - full-time
Hybrid (Palo Alto, CA)
Mission:
We’re a small, fast team behind OpenBench (open, reproducible LLM evals). We turn model behavior into measurable progress, then upstream it. You’ll work alongside people, not for people: low ceremony, quick feedback, lots of ownership. You won’t be siloed; you’ll jump across evals, post-training, infra, and (when useful) product/GTM.
Responsibilities & opportunities in this role:
Build and reimplement evals (accuracy, robustness, safety, latency) end-to-end.
Run tight SFT/DPO/RLHF-style loops; track deltas and ship models for customers.
Red-team models; turn quirks into metrics and provide feedback to the inference team
Own scoped projects: design → implement → document → upstream.
Write research papers on evals you build.
Pitch improvements across the company when you see them, then ship.
Ideal candidates have/are:
Founding Engineer (grinder)
You unblock yourself, learn fast, and ship relentlessly - scrappy first, then clean and reproducible.
Signals: productionized side projects, CI’d repos, tools other people actually use.
Researcher (loves data and pushing the frontier)
You reason clearly about eval design, failure modes, and data quality; you run ablations and write tight analyses.
Signals: careful experiments, thoughtful write-ups, PRs to open-source projects.
Must-haves
Agentic, kind, gritty.
Hands-on with evals, post-training, or applied AI (not just theory).
Comfort getting a bit hacky while keeping results reproducible.
Why Join Us
Purposeful Hiring: You’re not here by accident, and neither is anyone else. Every teammate is handpicked with intention because who we build with matters.
Builders Wanted: You’re not just riding the rocket ship, you’re building it. Your work directly shapes the trajectory of our company.
Mission-Driven Work: We’re here to make a real impact. Our mission fuels everything we do.
Tackling Hard Problems: If easy isn’t your thing, you’re in the right place. We solve some of the most complex and exciting challenges in our space.
Excellence Is The Standard: High performance isn’t just encouraged, it’s the baseline. And it’s contagious.
If this sounds like you, we’d love to hear from you!

Winter 2026 (January - April) Internship - full-time - hybrid
Mission:
Leverage industry experience to lay the groundwork for a silicon validation and characterization platform, with a strong focus on quality, providing the best possible devices to the datacenter and helping drive future design improvements.
Responsibilities & opportunities in this role:
Partner with senior silicon validation engineers to help develop an E2E system validation test plan for the SoC, including characterization
Collaborate with the SW and HW teams to characterize tests and hardware
Focus on automation of tests for validation and characterization
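The automation focus above can be sketched as a corner sweep: run a test at every combination of conditions and collect pass/fail results. A hypothetical Python sketch; `run_test` is a stand-in for a real measurement hook driving lab equipment, and the corner values are placeholders:

```python
# Hypothetical sketch only: run_test stands in for a real measurement hook
# driving lab equipment; voltage/temperature corner values are placeholders.
import itertools

def sweep(voltages, temps, run_test):
    """Run run_test at every (voltage, temp) corner; collect pass/fail."""
    return {(v, t): run_test(v, t) for v, t in itertools.product(voltages, temps)}

def failing_corners(results):
    """Corners where the test did not pass, sorted."""
    return sorted(corner for corner, passed in results.items() if not passed)
```

Separating the sweep from the measurement hook keeps the automation reusable: the same driver works whether `run_test` toggles a bench supply or replays a recorded dataset.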
Ideal candidates have/are:
Courses/lab work in core Electrical Engineering (DSP, VLSI, circuit design, microcontrollers)
Strong fundamentals in Signal Integrity basics and High Speed I/O characteristics
Strong communication skills
Keen interest and experience in programming (Python/C++)
Basic understanding of Electrical Engineering concepts and strong fundamentals in Physics
Basic understanding of the Linux OS and use of Linux commands
Must be authorized to work in the United States or Canada
Why Join Us
Purposeful Hiring: You’re not here by accident, and neither is anyone else. Every teammate is handpicked with intention because who we build with matters.
Builders Wanted: You’re not just riding the rocket ship, you’re building it. Your work directly shapes the trajectory of our company.
Mission-Driven Work: We’re here to make a real impact. Our mission fuels everything we do.
Tackling Hard Problems: If easy isn’t your thing, you’re in the right place. We solve some of the most complex and exciting challenges in our space.
Excellence Is The Standard: High performance isn’t just encouraged, it’s the baseline. And it’s contagious.
If this sounds like you, we’d love to hear from you!

Position manages the administration, support, operational, and developmental functions for Research Imaging Division facilities’ specialized computing systems and resources. Coordinates with Senior Director of IT regarding overall Department and University policies and procedures, standards, processes, and projects.
j.kacich@wustl.edu
Primary Duties & Responsibilities:
Design, deploy, configure and operate high performance computing systems.
Design, deploy, configure and operate high performance storage systems.
Design, deploy, configure and operate physical servers and associated virtualization layer.
Oversee selection, employment, development, control and discharge of employees and the establishment of personnel policies and practices in conjunction with the department’s HR leadership.
Develop and manage user support programs, including active helpdesk for these specialized resources, training sessions and written documentation.
Interface with vendors to select, purchase and maintain hardware and software solutions.
Develop and maintain service level agreements and supporting system for monitoring and tracking metrics.
Interface with department, school, and university information technology groups to enable integrated operations with network, computing, and identity management.
Maintain hardware installations in university data centers.
Develop and maintain comprehensive cybersecurity plan consistent with and meeting or exceeding the minimum standards of the university including threat detection, intrusion monitoring and system/data backups.
Provide reports to advisory board, department leadership and other key stakeholders.
Perform other duties as assigned.
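The SLA duty above typically includes tracking an availability metric over each reporting period. A minimal Python sketch; the interface is invented for illustration:

```python
# Hypothetical sketch only: the interface is invented for illustration;
# a real SLA tracker would pull outage windows from monitoring records.
def availability_pct(period_minutes: int, outage_minutes: float) -> float:
    """Percentage of the reporting period the service was up."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    return 100.0 * (period_minutes - outage_minutes) / period_minutes
```

For example, 43.2 minutes of outage in a 30-day month (43,200 minutes) corresponds to 99.9% availability, a common SLA target.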
Education:
Bachelor’s degree; a combination of education and/or experience may substitute for the minimum education.
Certifications:
No specific certification is required for this position.
Work Experience:
Management/Supervisory (10 Years)
Skills:
Atlassian, Ceph (Software), Docker (Software), File Management System, File Storage, Git, GitLab, High-Performance Computing (HPC) Systems, Interpersonal Communication, Iptables, Linux Bash, Network File System (NFS), Oracle Grid Engine, Oral Communications, Portable Batch System (PBS), PostgreSQL, Proxy Servers, Puppet (Software), Python (Programming Language), Scripting Languages, Slurm Workload Manager, Virtual Infrastructure, VMware vSphere, Written Communication
WashU offers rewarding opportunities in various fields at all levels, with positions in engineering, nursing and health care, research, administration, technology, security and more. We seek people from diverse backgrounds to join us in a supportive environment that encourages boldness, inclusion and creativity.
