Summer of Coding to Fight Cancer: Google SoC 2007

Established at Northwestern University in 1974, the Robert H. Lurie Comprehensive Cancer Center is committed to being a national leader in the battle to overcome cancer. The Bioinformatics Core Facility at the Robert H. Lurie Comprehensive Cancer Center consists of five faculty-level bioinformaticists and a diverse group of experienced computer programmers. The Core develops state-of-the-art algorithms for pathway analysis, microarray analysis, clinical trials, and clinical informatics. We offer Google SoC 2007 student interns a broad choice of projects and a general perspective on information technology in science and healthcare; prior coursework in biology is not necessary for the projects.

We have selected the following students for the summer of 2007 projects:

Google Gadget Gateway to PubMed
by Jared Flatow, mentored by Pan Du
Google Gateway to Genomics Literature
by Adrian Schoenig, mentored by Warren A Kibbe

SoC 2007 Project Highlight

G3P: Google Gadget Gateway to PubMed

Using Web 2.0 technology, the G3P gadget can be embedded in your dashboard or blog. It also runs on the iPhone!

G3P can be used by researchers for a daily update of the literature, or by consumers who are monitoring the frontiers of clinical treatments.

This application was developed by Jared Flatow, an Electrical and Computer Engineering student at Rice University, as part of the Robert H. Lurie Comprehensive Cancer Center's Summer of Coding to Fight Cancer 2007 program.

Potential Projects


G3L: Google Gateway to Genomics Literature

This is a GUI/visualization project that provides an alternative to screen scrolling, similar to Newsmap but for scientific data. The project needs the following skills:
1) Algorithm development: treemap arrangements
2) Spider: HTML retrieval and parsing
3) Ajax or Flash: interactive user interface design
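
For applicants unfamiliar with treemap arrangements, here is a minimal sketch of the simplest variant, a one-level "slice-and-dice" layout, in which each item receives a strip of the available rectangle proportional to its weight. The function name, the topic labels, and the article counts are hypothetical and chosen only for illustration; this is not the algorithm used by Newsmap or the layout the project will necessarily adopt.

```python
def slice_and_dice(items, x, y, width, height, vertical=True):
    """Lay out (label, weight) pairs as strips inside one rectangle.

    Returns a list of (label, x, y, width, height) tuples whose areas
    are proportional to the weights.
    """
    total = sum(weight for _, weight in items)
    rects = []
    offset = 0.0
    for label, weight in items:
        if vertical:
            # Cut the box into side-by-side vertical strips.
            w = width * weight / total
            rects.append((label, x + offset, y, w, height))
            offset += w
        else:
            # Cut the box into stacked horizontal strips.
            h = height * weight / total
            rects.append((label, x, y + offset, width, h))
            offset += h
    return rects


# Hypothetical article counts per topic (illustrative numbers only).
topics = [("topic A", 50), ("topic B", 30), ("topic C", 20)]
layout = slice_and_dice(topics, 0, 0, 100, 60)
for rect in layout:
    print(rect)
```

A nested treemap can be built by calling the same function again inside each strip with the opposite orientation.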

Text Data Mining: GenQuad

Analyzing medical literature using Natural Language Processing (NLP) and text data mining methods.
We will create a "GenQuad" for each gene in the human genome. It is a defined project, but student-initiated proposals are welcome. The project needs the following skills:
1) Programming in Python and Java
2) Familiarity with XML
3) Web-enabled database and query
4) Familiarity with N-gram Markov model and HMM algorithms for language parsing
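
As a rough illustration of the N-gram Markov models mentioned in the skill list, the sketch below estimates bigram (N = 2) transition probabilities from a tiny corpus. The function names and the toy sentences are hypothetical, tokenization is plain whitespace splitting, and nothing here describes how GenQuad itself will be built.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the sentence start and end.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    return counts


def transition_probability(counts, prev, curr):
    """Maximum-likelihood estimate of P(curr | prev), 0.0 if prev is unseen."""
    following = counts.get(prev, Counter())
    total = sum(following.values())
    return following[curr] / total if total else 0.0


# Toy corpus (made-up sentences, for illustration only).
corpus = ["the gene regulates repair", "the gene binds protein"]
model = train_bigram_model(corpus)
print(transition_probability(model, "gene", "regulates"))
```

A real system would add smoothing for unseen word pairs and a proper tokenizer for biomedical text.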

PDF Document Management System:

The project needs the following skills:
1) XML
2) Java
3) Adobe XPAAJ
4) Web services

Proteomics and Metabolomics Data Processing System:

We will expand our MassSpecWavelet package in Bioconductor to accommodate metabolomics data (both mass spectrometry and NMR). The project needs the following skills:
1) Programming in R/Bioconductor
2) Statistical data analysis
3) Knowledge of MATLAB
4) Familiarity with digital signal processing techniques

Our projects have a broad impact in healthcare and are directly relevant to providing cancer patients better treatment, with the goal of reducing the pain and suffering due to cancer.

Submit your own project ideas!

We have listed some proposed projects above; we also have unlisted microarray-related projects. If you are interested in working with us on a project that is not listed, please do not hesitate to send us a proposal. We love well-thought-out and interesting "blue sky" proposals. Send us your ideas - we may well accept them!

Good luck with your Google SoC 2007 project!


How can I apply for a Google SoC 2007 internship?
You must apply through the Google website.

What is the Deadline?
March 26, 2007

Who's eligible to participate?

How do payments work?

For questions about the projects, you are welcome to contact Drs. Warren Kibbe and Simon Lin before your formal application.