Open Data Science job portal

Data Science Intern

Health Catalyst's mission is to be the catalyst for massive, measurable, data-informed healthcare improvement through:

  • Data: integrate data in a flexible, open & scalable platform to power healthcare’s digital transformation
  • Analytics: deliver analytic applications & services that generate insight on how to measurably improve
  • Expertise: provide clinical, financial & operational experts who enable & accelerate improvement
  • Engagement: attract, develop and retain world-class team members by being a best place to work

The Data Science Intern will be responsible for applying machine learning algorithms to Natural Language Processing (NLP) of raw clinical text. NLP activities will include text de-identification and extraction of molecular and genomic results from laboratory tests across a diverse set of clients and diseases. The intern will support precision medicine product development and pre-sales activities under the direction of the Lead Data Scientist.

Duties & Responsibilities

  • Development and iteration of a de-identification process using publicly available machine learning training sets and regular expression search techniques.
  • Curation and labeling of molecular data from next-generation sequencing (NGS), immunohistochemistry (IHC), fluorescence in situ hybridization (FISH), and other biomarker testing.
  • Development of a machine learning algorithm and pipeline for extracting molecular information and unstructured clinical data elements from raw clinical text.
  • Development of application features possibly including adverse health risk prediction, clinical trial matching, and similar patient cohort matching.

Required Background & Skills

  • 2+ years of Python and SQL Server scripting and pipeline development
  • Experience with Natural Language Processing (NLP) tools and corpora from the Natural Language Toolkit (NLTK)
  • Experience developing and implementing supervised and unsupervised learning with classification, regression, and clustering algorithms from the scikit-learn toolkit

Preferred Background & Skills

  • Familiarity with Genomics and various molecular data including Next Generation Sequencing
  • Exposure to Healthcare industry and clinical data
  • Familiarity with the HIPAA Privacy Rule as it applies to Protected Health Information (PHI), and with off-the-shelf de-identification tools such as NeuroNER and Philter

Education


  • Bachelor of Science in Computer Science, Bioinformatics, Engineering, or Statistics. Biology or Biotechnology degree applicants considered with relevant experience.
  • Pursuing a Master of Science in Computer Science, Bioinformatics, Data Science, Genomics, or Health Informatics or a Master of Public Health

The above statements describe the general nature and level of work being performed in this job function. They are not intended to be an exhaustive list of all duties, and indeed additional responsibilities may be assigned by Health Catalyst.

Health Catalyst appreciates the opportunity to benefit from the diverse backgrounds and experiences of others. Because of its deep commitment to respecting every individual, Health Catalyst is an equal opportunity employer.

Here at the Open Data Science Conference we gather the attendees, presenters, and companies that are working on shaping the present and future of AI and data science. ODSC hosts one of the largest gatherings of professional data scientists with major conferences in the USA, Europe, and Asia.
