Open Data Science job portal

Lead Data Engineer, Life Science

Join Medidata to help shape the future of clinical studies with data-driven products and services. Their data platform is growing and constantly improving, and they are looking for a motivated and experienced lead data engineer.

Key Responsibilities

  • Be the technical lead engineer on a team of data engineers responsible for data aggregation, transformation, modeling, and delivery to both clients and internal data science teams
  • Full-stack design, development, and operation of core data capabilities like a data lake, data warehouse, data marts, and data pipelines
  • Contribute to the team’s roadmap and project planning process, partnering with stakeholders to develop business objectives and translate them into action
  • Full accountability for one or more data assets
  • Work with data architects to develop data flows and align to platform integration standards
  • Build data flows for data acquisition, aggregation, and modeling, using both batch and streaming paradigms
  • Consolidate/join datasets to create easily consumable, consistent, holistic information
  • Empower other data teams, data scientists and data analysts to be as self-sufficient as possible by building core capabilities as services and developing reusable library code
  • Ensure the efficiency, quality, and resiliency of the core data platform

Qualifications

  • Undergraduate or graduate degree in a technical or scientific field, such as Computer Science, Engineering, Mathematics, or similar
  • 5+ years of professional experience as a data engineer, software engineer, data analyst, data scientist, or related role
  • Analytically minded and detail-oriented: you actually like working with data, looking for patterns and outliers, establishing data models, and finding the best answers to business & technology problems
  • Expertise in data engineering languages such as Java, Scala, Python, and SQL
  • Data modeling and data governance experience; you’ve designed and implemented data marts, data warehouses or other large-scale data management systems
  • Experience building ETL and data pipelines, both with traditional ETL tools such as Pentaho, SSIS, and Talend and with code-oriented systems such as Spark, Airflow, or similar
  • Cloud-oriented with a strong understanding of SaaS models
  • Experience operating in a secure networking environment and working with separate production support and SRE teams is a plus
  • Excellent technical documentation and writing skills
  • You have a bias toward automation and an Agile/Lean mindset, and you embrace DevOps culture
  • Familiarity with streaming/messaging technologies such as Kafka, Kinesis, or Spark Streaming
  • Familiarity with data visualization tools such as Tableau, Business Objects, QuickSight, Power BI, or similar
  • Great customer focus and strong technical troubleshooting skills
  • Proficiency in statistics and data science is a nice-to-have, and interest in learning these is even better
  • Experience with clinical trial data is not required, but an interest in learning and understanding it is a must
  • Experience with Hadoop/Spark and graph/RDF/ontologies is a plus

Medidata is making a real difference in the lives of patients everywhere by accelerating critical drug and medical device development, enabling life-saving drugs and medical devices to get to market faster. Their products sit at the convergence of the Technology and Life Sciences industries, one of the most exciting areas for global innovation. Nine of the top 10 best-selling drugs in 2017 were developed on the Medidata platform.
Medidata’s solutions have powered over 14,000 clinical trials, giving them the largest collection of clinical trial data in the world. With this asset, they pioneer innovative, advanced applications and intelligent data analytics that bring an unmatched level of quality and efficiency to clinical trials, enabling treatments to reach waiting patients sooner.

Here at the Open Data Science Conference we gather the attendees, presenters, and companies that are working on shaping the present and future of AI and data science. ODSC hosts one of the largest gatherings of professional data scientists with major conferences in the USA, Europe, and Asia.