My research interests are broadly in algorithm design and analysis, and I take inspiration from biological problems. Often this leads not only to an interesting algorithmic result but also to a useful biological tool (see Software).
I am currently a Lane Fellow in the Computational Biology Department at Carnegie Mellon University working with Carl Kingsford.
I was previously a PhD student in the Computer Science Department at the University of Arizona working with John Kececioglu and a student in the CS Department at the University of Central Florida working with Shaojie Zhang.
In the past my work has focused mainly on multiple sequence alignment problems. Most recently I worked on improving the accuracy of protein multiple sequence alignments. Multiple sequence alignment is a fundamental step in bioinformatics, but the problem is NP-complete. Because of the importance of the result and the complexity of the problem, many algorithms exist to find high-quality alignments in practice. Each of these algorithms has a large number of tunable parameters that can greatly affect the quality of the computed alignment. Most users rely on the default parameter choices, which produce the best alignments on average but poor alignments for some inputs. We developed a process called parameter advising, which selects a parameter choice that produces a high-quality alignment for the given input. It works in three steps: candidate alignments are produced using each of the parameter choices in an advising set; the accuracy of each candidate alignment is estimated using an advising estimator; and the candidate alignment with the highest estimated accuracy is returned to the user. To estimate alignment accuracy we developed Facet (Feature-based accuracy estimator), which is a linear combination of efficiently computable feature functions. We have found that learning an optimal advisor (selecting both the estimator coefficients and the set of parameter choices) is NP-complete. We extended this result to show that finding the estimator coefficients or the advising set independently is also NP-complete. In practice, we have methods to find close-to-optimal advisors. We are working on ways to improve the accuracy of these parameter advisors.
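The advising loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Facet implementation: the two feature functions, the `toy_aligner`, the `"pad"` parameter, and the coefficient values are all hypothetical stand-ins chosen to keep the example self-contained. In practice the aligner would be a real alignment tool, and the coefficients would be learned from benchmark alignments.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Alignment = List[str]  # rows of equal length; '-' marks a gap
Params = Dict[str, str]
Feature = Callable[[Alignment], float]


def gap_free_fraction(aln: Alignment) -> float:
    """Hypothetical feature: fraction of columns containing no gap."""
    cols = list(zip(*aln))
    return sum("-" not in c for c in cols) / len(cols) if cols else 0.0


def column_identity(aln: Alignment) -> float:
    """Hypothetical feature: fraction of gap-free columns with one residue."""
    cols = list(zip(*aln))
    same = sum(len(set(c)) == 1 and c[0] != "-" for c in cols)
    return same / len(cols) if cols else 0.0


def estimate_accuracy(aln: Alignment,
                      features: Sequence[Feature],
                      coefficients: Sequence[float]) -> float:
    """Estimator as a linear combination of feature values."""
    return sum(c * f(aln) for c, f in zip(coefficients, features))


def advise(sequences: Sequence[str],
           aligner: Callable[[Sequence[str], Params], Alignment],
           advising_set: Sequence[Params],
           features: Sequence[Feature],
           coefficients: Sequence[float]) -> Tuple[Params, Alignment, float]:
    """Align under every parameter choice; keep the best-scoring candidate."""
    if not advising_set:
        raise ValueError("advising set must not be empty")
    best = None
    for params in advising_set:
        candidate = aligner(sequences, params)
        score = estimate_accuracy(candidate, features, coefficients)
        if best is None or score > best[2]:
            best = (params, candidate, score)
    return best


def toy_aligner(sequences: Sequence[str], params: Params) -> Alignment:
    """Stand-in for a real aligner: pads short sequences with gaps."""
    width = max(len(s) for s in sequences)
    if params["pad"] == "left":
        return [s.rjust(width, "-") for s in sequences]
    return [s.ljust(width, "-") for s in sequences]


if __name__ == "__main__":
    choice, alignment, score = advise(
        ["ACGT", "CGT"],
        toy_aligner,
        [{"pad": "right"}, {"pad": "left"}],
        [gap_free_fraction, column_identity],
        [0.5, 0.5],
    )
    print(choice, alignment, score)
```

The cost of advising grows linearly with the size of the advising set, since each parameter choice requires one run of the aligner. That is why the choice of a small but effective set matters as much as the estimator itself.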
I have also worked on reducing the memory consumption of secondary-structure-conscious RNA multiple sequence alignment (see PMFastR) and on high-throughput phylogeny filtering (see SiClE).
I have been invited to give a talk at the Cold Spring Harbor Laboratory Biological Data Science meeting (#biodata18) November 7-10. My talk is preliminarily titled “Building an automated bioinformatician—More accurate, large-scale genomic discovery using parameter advising”.
I will be giving a talk at the StringBio Workshop at UCF at the end of October. The workshop runs from the 25th to the 27th. I am scheduled to talk on the 26th in the afternoon. I am planning to talk about multiple sequence alignment accuracy estimation using Facet in the context of parameter advising. My slides are available here.
My colleague Guillaume Marçais is also speaking at the meeting, likely about methods related to minimizer ordering (some of which is joint work with me, Carl Kingsford, and others).
I am part of the organizing team for the Workshop on the Future of Algorithms in Biology, an NSF-funded conference being held here at CMU on September 28 and 29. The slate of talks includes 15 speakers from a wide range of disciplines as well as shorter lightning talks, posters (including my own) and a panel discussion. More information is available on the FAB 2018 website.
My work on parameter advising for transcript assembly has been accepted for an oral presentation at the European Student Council Symposium in Athens, Greece. In addition, I will be attending ECCB in September to present a poster at the main meeting.
The Education and Internships Committee poster that I presented in the Education COSI Track at ISMB received the F1000 Outstanding Presentation Prize. This is a great honor for the council, and I hope it will improve the visibility of a vital program for our group. The poster focused on our recent publication in PLOS Computational Biology, which I mentioned previously and which describes the process of the ISCB-SC Internships Program. The paper and poster are linked from internships.iscbsc.org. Thanks to the entire committee for making it such a great program.