Open Data Science job portal

Data Engineer

Job Description


Seeking an experienced Data Engineer to design, develop and optimize ETL/data pipelines that support a variety of machine learning, predictive analytics, systems and BI solutions, in line with the organization's goal to digitize and optimize clinical trials. This individual will work within ECD's Information Management Office (IMO).

The role requires cross-functional interaction with Data Management Leads, Predictive Analytics Analysts, Artificial Intelligence Scientists and Information Technology teams across multiple projects to implement data solutions in ECD's data lake and data warehouse, called gCORE. A great candidate can translate the unique needs and requirements of a diverse set of stakeholders across both data lake and data warehouse use cases, and is eager to solve complex data challenges by selecting the best-fit solution. The candidate must be self-motivated, passionate about data management and analytics, and able to extrapolate customer needs with minimal direction.

Responsibilities:


  • Understand the current state data landscape, use cases and existing data lake and data warehouse setup
  • Work with Business Analysts, Data Analysts, Data Scientists and AI Engineers to identify infrastructure and data roadmap needs and propose the appropriate strategy in partnership with other IMO engineers
  • Assemble large, complex data sets in the format fit for each use case
  • Architect, develop and optimize ETL pipelines using Python, Spark, EMR, Docker and Airflow
  • Develop and optimize big data pipelines for data scientists (requires a basic understanding of data science concepts and ML)
  • Write generic Python/PySpark modules for processing data from various data sources (XML, Parquet, CSV, relational)
  • Perform hands-on physical and logical database design and modeling for data warehousing (currently using AWS Redshift)
  • Perform hands-on infrastructure design of ECD's AWS data lake and data warehouse environment (gCORE), including continuous exploration and recommendation of new technologies and best practices
  • Research and recommend new innovative methods and systems to manage data for business improvement
  • Participate in internal governance to drive the data quality business cycle and roadmap

Qualifications:


  • Bachelor's or Master's degree in computer science or software engineering
  • 5+ years of programming experience (including functional programming); must be advanced in Python
  • 3+ years of experience designing, building and maintaining production data pipelines and/or data warehouses
  • Demonstrable experience working with different database types, including columnar, SQL and graph-based data stores, and the ability to select the right tool for the job
  • Experience building and optimizing big data pipelines using Spark
  • Experience with AWS cloud services: S3, EC2, EMR, RDS, Redshift, Lambda, EKS
  • Solid understanding of how to design robust data workflows including optimization and user experience
  • Strong analytical and problem-solving skills
  • Excellent oral and written communication skills
  • Able to work in teams and collaborate with others to clarify requirements
  • Strong coordination and project management skills to handle complex projects
  • Experience developing and working with XML, JSON, and external web services

Preferred Qualifications:

  • Clinical drug development domain knowledge
  • Experience working with clinical and biomedical data types (clinical patient data, omics, imaging, etc.)
  • Competencies in applied statistics to solve business needs
  • Knowledge of industry data standards used in drug development, particularly in clinical development

Seniority Level

Mid-Senior level

Industry


  • Pharmaceuticals
  • Medical Device
  • Biotechnology

Employment Type


Job Functions

  • Science
  • Engineering


Here at the Open Data Science Conference we gather the attendees, presenters, and companies that are working on shaping the present and future of AI and data science. ODSC hosts one of the largest gatherings of professional data scientists with major conferences in the USA, Europe, and Asia.
