My research interests are broadly in algorithm design and analysis, and I take inspiration from biological problems. Often this leads not only to an interesting algorithmic result but also to a useful biological tool (see Software).
I am currently a Lane Fellow in the Computational Biology Department at Carnegie Mellon University working with Carl Kingsford.
I was previously a PhD student in the Computer Science Department at the University of Arizona working with John Kececioglu, and before that a student in the CS Department at the University of Central Florida working with Shaojie Zhang.
In the past my work has focused mainly on multiple sequence alignment problems. Most recently I worked on improving the accuracy of protein multiple sequence alignments. Multiple sequence alignment is a fundamental step in bioinformatics, but the problem is NP-complete. Because of the importance of the result and the complexity of the problem, many algorithms exist to find high-quality alignments in practice. Each of these algorithms has a large number of tunable parameters that can greatly affect the quality of the computed alignment. Most users rely on the default parameter choices, which produce the best alignments on average but produce poor alignments for some inputs. We developed a process called parameter advising, which selects a parameter choice that produces a high-quality alignment for the input at hand. To accomplish this, candidate alignments are produced using each of the parameter choices in an advising set; the accuracy of each candidate alignment is then estimated using an advising estimator; and the candidate alignment with the highest estimated accuracy is selected for the user. To estimate alignment accuracy we developed Facet (Feature-based accuracy estimator), which is a linear combination of efficiently computable feature functions. We have found that learning an optimal advisor (selecting both the estimator coefficients and the set of parameter choices) is NP-complete. We extended this result to show that finding the estimator coefficients or the advising set independently is also NP-complete. In practice, we have methods to find close-to-optimal advisors. We are working on ways to improve the accuracy of these parameter advisors.
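The advising loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Facet implementation: the names advise, linear_estimator, toy_aligner, and the two feature functions are hypothetical, the "aligner" merely pads sequences with gaps, and the coefficients are made up rather than learned.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types for illustration only.
Alignment = List[str]              # alignment rows, gaps written as '-'
ParameterChoice = Dict[str, bool]  # one setting of the aligner's parameters
Feature = Callable[[Alignment], float]


def linear_estimator(features: Sequence[Feature],
                     coefficients: Sequence[float]) -> Feature:
    """Build an estimator that is a linear combination of feature functions."""
    if len(features) != len(coefficients):
        raise ValueError("need exactly one coefficient per feature")

    def estimate(alignment: Alignment) -> float:
        return sum(c * f(alignment) for c, f in zip(coefficients, features))

    return estimate


def advise(sequences: Sequence[str],
           advising_set: Sequence[ParameterChoice],
           aligner: Callable[[Sequence[str], ParameterChoice], Alignment],
           estimator: Feature) -> Tuple[ParameterChoice, Alignment, float]:
    """Align with every parameter choice and keep the best-scoring candidate."""
    if not advising_set:
        raise ValueError("advising set must not be empty")
    best = None
    for params in advising_set:
        candidate = aligner(sequences, params)
        score = estimator(candidate)
        if best is None or score > best[2]:
            best = (params, candidate, score)
    return best


# --- Toy stand-ins (assumptions, not real aligners or Facet features) ---

def toy_aligner(sequences: Sequence[str], params: ParameterChoice) -> Alignment:
    """Pad every sequence to equal length, on the left or on the right."""
    width = max(len(s) for s in sequences)
    if params["pad_left"]:
        return [s.rjust(width, "-") for s in sequences]
    return [s.ljust(width, "-") for s in sequences]


def fraction_identical_columns(alignment: Alignment) -> float:
    """Fraction of columns whose rows all hold the same non-gap character."""
    columns = list(zip(*alignment))
    same = sum(1 for col in columns if len(set(col)) == 1 and col[0] != "-")
    return same / len(columns)


def fraction_non_gap(alignment: Alignment) -> float:
    """Fraction of alignment cells that are not gaps."""
    total = sum(len(row) for row in alignment)
    gaps = sum(row.count("-") for row in alignment)
    return (total - gaps) / total


# Made-up coefficients; in the work described above they would be learned.
estimator = linear_estimator(
    [fraction_identical_columns, fraction_non_gap], [0.8, 0.2])
advising_set = [{"pad_left": False}, {"pad_left": True}]
best_params, best_alignment, best_score = advise(
    ["ACGT", "CGT"], advising_set, toy_aligner, estimator)
```

In this toy run the left-padded candidate wins because three of its four columns are identical, while the right-padded candidate has none. A real advisor would call an actual aligner for each parameter choice and use features that correlate with true accuracy.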
I have also worked on reducing the memory consumption of secondary-structure-conscious RNA multiple sequence alignment (see PMFastR) and on high-throughput phylogeny filtering (see SiClE).
Soon after joining the Kingsford group, I began talking with Guillaume Marçais about his work on minimizer schemes. This year I contributed to the most recent publication in this line of work, which describes asymptotic bounds on the densities of these schemes. This work was accepted for presentation at ISMB 2018 in Chicago, IL. A preprint of the manuscript is on bioRxiv (see Publications).
The book based on my dissertation, written with my PhD advisor John Kececioglu, is now available! It is part of Springer’s Computational Biology series. The book contains a previously unpublished chapter and updated results that were not in my dissertation.
The ISCB-SC Education and Internships committee does great work, including running the internships program. Each year the EIC matches students in developing countries with willing labs in Europe or Australia. With the introduction of the Anna Tramontano fellowship, they will start to have the financial support needed to expand this program. The paper we published in PLOS Computational Biology describes the program, but the committee is happy to answer any questions. Also, if you have room in your laboratory for an intern, the program is always looking for willing PIs.
Student Council Symposium-Africa 2017
Every other year the ISCB-SC runs four Student Council Symposia: US (co-located with ISMB), Europe (co-located with ECCB), Latin America (co-located with ISCB-LA), and most recently Africa (co-located with ISCB-Africa). I had the pleasure of helping to organize this year’s SCS-Africa meeting. The symposium was a great success, and details about the talks can be found in the conference highlights that we published in F1000Research.
I have recently converted my CV into LaTeX. This is something I had been meaning to do for a while but had not found the time for. My new CV can be found here, as compared to the old formatting. Because I put a lot of time into customizing it, I thought it would be helpful for others to have the source as well, so I have posted it on GitHub for anyone who wants to make the change themselves. This is an adaptation of Steve Tjoa’s original CV template.
I have included a customized bst file that reformats my name in the publications list(s); this can be adapted within the cv.tex file without having to edit the bst file yourself. The original template I used allows for multiple bibliographies, so adding sections that are not currently there is quite simple. Using the unsrt bibliography style lets me order the publications however I want; I currently have them in decreasing order of publication year.
During the IGERT problems course in 2012, our project was to analyze and annotate a newly discovered and sequenced snake fungal pathogen, Ophidiomyces ophiodiicola. At the time we just called it the “snake fungus”. Recently, thanks to the hard work of Manna Ohkura and her PhD advisor Marc Orbach, we published our results in ASM’s Genome Announcements. The full list of co-authors and a link to the open access paper can be found on the Publications page.