Lead Life Sciences Data Scientist

A report published in 2014 by the Tufts Center for the Study of Drug Development determined that the cost of developing a prescription drug and securing market approval is $2.6B, which, correcting for inflation, was a 145% increase over the 2003 estimate. A significant part of this cost entails finding and analyzing reams of data: technical data, drug trial data, documentation and more.

Our client is a 2500-person, global, not-for-profit technology and services provider and rights broker for some of the world's most sought-after information and content. Its big data search-analysis-rights management platform for the life sciences field not only helps to drive down costs for drug discovery by making the process more efficient, but also enables the discovery of new insights and information, providing immeasurable value for pharmaceutical, research and other life sciences companies working to cure life-threatening and life-altering diseases, both common and rare.

Overview of the Role

Reporting to the CTO/VP Engineering, work closely with industry-leading life science companies to solve interesting and complex real-world problems that truly make a difference in the world. Bringing to the process an understanding of complex data structures, identify and understand customers' analytical and business challenges, and work collaboratively with the Engineering team to develop innovative and efficient solutions to meet those needs. Up to 25% travel to client sites.


Apply a broad array of capabilities spanning machine learning, statistics, mathematics, modeling, simulation, visualization, pathway/network analysis, NLP and graph analysis

Digest complex data structures, perform statistical data reduction, summarization, normalization and multivariate data analyses; and report findings

Develop intelligent software components for our applications and solutions that we build for our clients

Evangelize machine learning throughout the company, sharing your knowledge and enthusiasm with others

Qualifications Required

  • Proven computational biology or data science experience in the life sciences
  • Broad understanding of the biotech/pharmaceutical industry and, in particular, its scientific, translational and clinical platforms/domains
  • Good knowledge of and experience with machine learning algorithms, data processing and visualization
  • Experience with Big Data and, in particular, large multivariate data sets
  • Familiarity with Big Data frameworks such as Apache Hadoop or Apache Spark
  • Good knowledge of and experience with Linux, Java, Python or Scala

Qualifications Preferred

  • Knowledge of information systems, technology, and databases serving and analyzing biological, chemical, genomic and/or medical information
  • Experience with a deep-learning framework (e.g. TensorFlow, CNTK, Theano)
  • Experience with any one of the following: C/C++/C#, F, F90, HPF, R, Octave
  • Experience with D3 and derivative libraries
  • Excellent communication and presentation skills

Education & Work Experience

  • PhD or Master's degree in a STEM discipline