The WRDM Machine Learning (ML) Research Hub is seeking machine learning data engineers with a background in software engineering, strong technical problem-solving skills, and experience building scalable data pipelines and infrastructure for training, validating, and deploying ML solutions into production for broad use.
The successful candidate will work with ML research scientists across WRDM to enable their proprietary data and external datasets to be leveraged for ML modeling. This will be accomplished by designing and implementing end-to-end data workflows for large-scale data ingestion, processing, tagging, and publishing, with an eye towards improving ML model performance over time.
Qualifications
- Formal training in Computer Science, Statistics, Applied Mathematics, Chemistry, Physics, a life science discipline, related technical discipline, or relevant practical experience
- 2+ years’ programming experience in Python, Java, Scala, C++, or SQL
- 2+ years’ experience in software design, development, and algorithm-related solutions for production-grade systems using machine learning
- 2+ years’ experience managing code in multi-developer teams, following industry best practices
- Deep knowledge of one or more scientific data types (e.g. biomedical images, biomedical text, large-scale multidimensional ‘omics, large- or small-molecule therapeutics, clinical or Real World Data)
Preferred Qualifications
- MS/PhD + 2 years of relevant research experience
- Experience with high performance computing (HPC) environments (SLURM/LSF/SGE schedulers)
- Familiarity with cloud computing infrastructure including Amazon Web Services (AWS) and distributed computing libraries (e.g. Spark, Hive, Impala, Kafka, etc.)
- Experience with containerization and orchestration tools (e.g. Docker, Singularity, Airflow, Luigi, Kubernetes, etc.)
- Experience with workflow languages (CWL, WDL, Nextflow, etc.) and data publishing/consumption from modern data warehousing systems/MPP databases (Redshift, Snowflake, BigQuery)
- Experience with CI/CD and automation tools (Terraform, CloudFormation, Jenkins, Ansible, etc.)
- Passion and curiosity for data and proven ability to take ideas from prototype to production
Technologies They Use:
Python (NumPy, Pandas, Dask, PyTorch, TensorFlow, scikit-learn, RDKit, Weights & Biases, etc.), Java, C++, Slurm-based on-premises compute clusters, Google Cloud Platform, AWS, Docker, Singularity, Kubernetes
Pfizer requires all U.S. new hires to be fully vaccinated for COVID-19 prior to the first date of employment. As required by applicable law, Pfizer will consider requests for Reasonable Accommodations.
More Information
- Salary Offer: 0 ~ $3000
- Experience Level: Junior
- Total Years Experience: 0-5