Director, AIML & Scientific Computing Optimization
Company: GlaxoSmithKline
Location: Cambridge
Posted on: November 6, 2024
Job Description:
Site Name: Cambridge MA, UK - London
Posted Date: Oct 25 2024
The Onyx Research Data Tech organization is GSK's Research data
ecosystem, which has the capability to bring together, analyze, and
power the exploration of data at scale. We partner with scientists
across GSK to define and understand their challenges and develop
tailored solutions that meet their needs. The goal is to ensure
scientists have the right data and insights when they need them, to
give them a better starting point for medical discovery and to
accelerate it. Ultimately, this helps us get ahead of disease in more
predictive and powerful ways.
Onyx is a full-stack shop consisting
of product and portfolio leadership, data engineering,
infrastructure and DevOps, data / metadata / knowledge platforms,
and AI/ML and analysis platforms, all geared toward:
- Building a next-generation, metadata- and automation-driven
data experience for GSK's scientists, engineers, and
decision-makers, increasing productivity and reducing time spent on
"data mechanics".
- Providing best-in-class AI/ML and data analysis environments to
accelerate our predictive capabilities and attract top-tier
talent.
- Aggressively engineering our data at scale, as one unified
asset, to unlock the value of our unique collection of data and
predictions in real time.
Our AIML & Scientific Computing Optimization team is focused on
optimizing first-in-class Compute and AIML platforms that accelerate
application development, scale up computational experiments, and
integrate all computation with project metadata, logs, experiment
configuration, and performance tracking over abstractions that
encompass Cloud and High-Performance Computing. These
metadata-forward, CI/CD-driven platforms represent and enable the
entire application and analysis lifecycle, including interactive
development and exploration (notebooks), large-scale batch
processing, observability, and production application deployments.
The optimization team's focus is on maximizing the scale and
performance of all aspects of the platforms.
A Director of AIML & Scientific Computing Optimization is
a deeply technical leader. They consistently deliver major compute
and AIML platform features and solutions with cross-organizational
impact and value. They are recognized as an expert in software
engineering, scientific computing, and/or AIML, with a deep
understanding of performance and optimization, within the Onyx
team, across R&D Digital & Tech, and even externally. They have
strong technical knowledge of, and can work closely with, underlying
platform dependencies such as DevOps, Infrastructure, and Cloud, and
can enable collaborations and help drive requirements across other
Onyx engineering teams that result in improved performance and a
better user experience.
This role is responsible for
building and leading a team of world-class software engineers
focused on optimizing best-in-class Compute and AIML Platforms for
scale, quality, and cost. The Director of AIML & Scientific
Computing Optimization will support the Sr. Director of Computing,
Analysis, and AI/ML Platforms in building a strong culture of
accountability and ownership in their team, as well as instilling
best-in-class engineering practices (e.g., testing, code reviews,
DevOps-forward ways of working). They work in close partnership
with the AI/ML Platform team, the Compute Platform team, Data and
Infrastructure Engineering, Product Management, Portfolio
Management, and other engineering functions to ensure close
alignment with customers and with engineering teams both upstream
and downstream of their work.
Key Responsibilities:
- Build, lead, develop, and retain world-class software
engineers
- Serve as a top expert for the optimization team, and contribute
technical expertise to teams in closely aligned technical areas
such as DevOps, Cloud and Infrastructure
- Lead design of major optimization software components of the
Compute and AIML Platforms, contribute to development of production
code and participate in both design reviews and PR reviews
- Accountable for delivery of scalable solutions to the Compute
and AIML Platforms that support the entire application lifecycle
(interactive development and explorations/analysis, scalable batch
processing, application deployment) with particular focus on
performance at scale
- Partner with both AIML and Compute platform teams as well as
scientific users to help optimize and scale scientific workflows by
utilizing a deep understanding of both software and the underlying
infrastructure (networking, storage, GPU architectures, etc.)
- Direct scrum team leads, and contribute technical expertise to
teams in closely aligned technical areas
- Able to design an innovative strategy and ways of working that
create a better environment for end users, and to construct a
coordinated, stepwise plan that brings others along the change
curve
- Standard-bearer for proper ways of working and engineering
discipline, including CI/CD best practices, who proactively
spearheads improvement within their engineering area
- Serve as a technical thought leader and champion: e.g., speak
at industry events, promote GSK as an attractive place to build a
career and thrive as a Platform engineer, act as a key knowledge
holder for the Onyx organization.
Why You?
Basic Qualifications:
- Bachelor's, Master's or PhD degree in Computer Science,
Software Engineering, or related discipline.
- 8+ years of experience using specialized knowledge in cloud
computing, scalable parallel computing paradigms, software
engineering, and CI/CD, with a Bachelor's degree.
- 6+ years of experience using specialized knowledge in cloud
computing, scalable parallel computing paradigms, software
engineering, and CI/CD, with a Master's degree.
- 4+ years of experience using specialized knowledge in cloud
computing, scalable parallel computing paradigms, software
engineering, and CI/CD, with a PhD.
- At least 2 years of experience with recruiting, managing, and
developing engineers or other deeply technical
contributors
Preferred Qualifications:
- Deep experience using at least one interpreted and one compiled
common industry programming language: e.g., Python, C/C++, Scala,
Java, including toolchains for documentation, testing, and
operations / observability
- Hands-on experience with application performance tuning and
optimization, including in parallel and distributed computing
paradigms and libraries such as MPI, OpenMP, and Gloo, with a
deep understanding of the underlying systems (hardware, networks,
storage) and their impact on application performance
- Expert understanding of AIML training optimization, including
distributed multi-node training best practices and associated tools
and libraries as well as hands-on practical experience in
accelerating training jobs
- Understanding of ML model deployment strategies, including
agent systems as well as scalable LLM inference systems
deployed in multi-GPU, multi-node environments.
- Deep expertise in modern software development tools / ways of
working (e.g. git/GitHub, DevOps tools, metrics / monitoring,
etc.)
- Cloud experience (e.g., AWS, Google Cloud, Azure), including
infrastructure-as-code and relevant tools and libraries such as
Terraform, Ansible, and Packer
- Experience with CI/CD implementations using git and a common
CI/CD stack (e.g., Azure DevOps, CloudBuild, Jenkins, CircleCI,
GitLab)
- Experience with Docker, Kubernetes, and the larger CNCF
ecosystem including experience with application deployment tools
such as Helm
- Experience with low-level application build tools (make,
CMake) and understanding of optimization at the build and compile
level
- Demonstrated excellence with agile software development
environments using tools like Jira and Confluence
- Deep familiarity with the tools, techniques, and optimizations of
the high-performance application space, including engagement with
the open-source community (and potentially contributions to such
tools)
- Experience with establishing software engineering ways of
working and best practices for a team (whether informally or as
formal SOPs, etc.)
- Experience recruiting top engineering talent
- Experience with agile planning and execution processes for
software delivery