My research focuses on improving the accuracy of protein multiple sequence alignments. Multiple sequence alignment is a fundamental step in bioinformatics, but the problem is NP-complete. Because of the importance of the result and the complexity of the problem, many algorithms exist to find high-quality alignments in practice. Each of these algorithms has a large number of tunable parameters that can greatly affect the quality of the computed alignment. Most users rely on the default parameter choices, which produce the best alignments on average but produce poor alignments for some inputs. We developed a process called parameter advising, which selects a parameter choice that produces a high-quality alignment for the input. To accomplish this, candidate alignments are produced using each of the parameter choices in an advising set. The accuracy of each candidate alignment is then estimated using an advising estimator, and the candidate with the highest estimated accuracy is selected for the user. To estimate alignment accuracy we developed Facet (Feature-based accuracy estimator), which is a linear combination of efficiently computable feature functions. We have found that learning an optimal advisor (selecting both the estimator coefficients and the set of parameter choices) is NP-complete. We extended this result to show that finding the estimator coefficients or the advising set independently is also NP-complete. In practice, we have methods to find close-to-optimal advisors. We are working on ways to improve the accuracy of these parameter advisors.
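For readers who prefer code, the advising step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the candidate alignments are assumed to have been computed already (one per parameter choice in the advising set), the two feature functions are toy stand-ins rather than Facet's actual features, and the coefficients and parameter-choice names are made up for the example.

```python
from typing import Callable, Dict, List, Sequence

Alignment = List[str]                      # rows of equal length, '-' marks a gap
Feature = Callable[[Alignment], float]     # maps an alignment to a value in [0, 1]


def gap_free_fraction(aln: Alignment) -> float:
    """Toy feature (assumption): fraction of columns containing no gaps."""
    columns = list(zip(*aln))
    return sum('-' not in col for col in columns) / len(columns)


def identity_fraction(aln: Alignment) -> float:
    """Toy feature (assumption): fraction of columns where all rows share one residue."""
    columns = list(zip(*aln))
    same = sum(len(set(col)) == 1 and col[0] != '-' for col in columns)
    return same / len(columns)


def estimate_accuracy(aln: Alignment,
                      features: Sequence[Feature],
                      coefficients: Sequence[float]) -> float:
    """Linear combination of feature values, in the spirit of a Facet-style estimator."""
    return sum(c * f(aln) for c, f in zip(coefficients, features))


def advise(candidates: Dict[str, Alignment],
           features: Sequence[Feature],
           coefficients: Sequence[float]) -> str:
    """Return the parameter choice whose candidate has the highest estimated accuracy."""
    return max(candidates,
               key=lambda name: estimate_accuracy(candidates[name], features, coefficients))


# Hypothetical candidates, keyed by an invented parameter-choice name.
candidates = {
    "default": ["AC-GT", "ACG-T"],
    "alternative": ["ACGT", "ACGT"],
}
features = [gap_free_fraction, identity_fraction]
coefficients = [0.5, 0.5]                  # made-up weights, not learned values

print(advise(candidates, features, coefficients))
```

Learning an advisor then amounts to choosing the coefficients and the set of parameter choices, which is the part shown above to be NP-complete.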
I am currently funded as a graduate research assistant under NSF grant number IIS-1217886, and was previously funded as a GRA under the NSF IGERT Grant in Comparative Genomics DGE-0654435.
The paper describing a software package I created with Jennifer H. Wisecaver (now at Vanderbilt University) while we were both PhD students at the University of Arizona has been accepted for publication in the journal PeerJ. Even as a preprint on arXiv it was cited multiple times (including by papers in Nature Communications and PNAS).
The application SiClE (for Sister Clade Extractor) was created to perform high-throughput phylogenetic analysis. Given a tree and a search term, it first determines whether the search term is monophyletic in the tree and then identifies the two sister clades. It has been used successfully as an initial filtering step to investigate horizontal gene transfer at high-throughput scale. The program is open source and freely available under a Creative Commons License at http://eebweb.arizona.edu/sicle/.
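To give a feel for the kind of check described above, here is a simplified Python sketch. It is not SiClE's code and does not reproduce its input format or how it reports the two sister clades. It assumes a rooted tree written as nested tuples with unique leaf names, treats a leaf as matching when its name contains the search term, and returns the sibling subtrees of the matching clade. The function names and the example tree are all hypothetical.

```python
from typing import List, Optional, Tuple, Union

Tree = Union[str, tuple]   # a leaf name, or a tuple of child subtrees


def leaves(tree: Tree) -> List[str]:
    """List the leaf names under a subtree."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def find_clade(tree: Tree, term: str,
               parent: Optional[Tree] = None) -> Optional[Tuple[Tree, Optional[Tree]]]:
    """Return (clade, parent) if the leaves matching `term` form one clade, else None."""
    names = leaves(tree)
    hits = [n for n in names if term in n]
    if not hits:
        return None                        # the term does not occur in this subtree
    if len(hits) == len(names):
        return tree, parent                # every leaf here matches: this is the clade
    # Otherwise all matches must sit inside a single child subtree.
    with_hits = [c for c in tree if any(term in n for n in leaves(c))]
    if len(with_hits) != 1:
        return None                        # matches are split: not monophyletic
    return find_clade(with_hits[0], term, tree)


def sister_clades(tree: Tree, term: str) -> Optional[List[Tree]]:
    """Sibling subtrees of the matching clade, or None if absent or not monophyletic."""
    found = find_clade(tree, term)
    if found is None:
        return None
    clade, parent = found
    if parent is None:
        return []                          # the clade is the whole tree
    return [c for c in parent if c is not clade]


# Hypothetical example tree.
tree = ((("Homo_sapiens_1", "Homo_sapiens_2"), "Pan_troglodytes"),
        ("Mus_musculus", "Rattus_norvegicus"))
print(sister_clades(tree, "Homo"))
```

In this sketch a result of None covers both a term that is absent from the tree and one that is not monophyletic; a real tool would likely want to distinguish the two.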
Read the paper at https://peerj.com/articles/2359/.
I will be giving a talk titled “Boosting alignment accuracy through adaptive local realignment” at ISMB 2016 in session TP021 on Sunday, 10 July at 2:00 PM in the Northern Hemisphere E1/E2 room. This will be a late-breaking research talk, so there is no associated publication yet, but information about using adaptive realignment can be found on the Facet website at http://facet.cs.arizona.edu/realignment.html. I will also be presenting a poster on the same work; during the Monday poster session you can find me at poster number N22. A preprint is also available on bioRxiv (DOI 10.1101/063131).
I have accepted a position to be a Lane Fellow in the Computational Biology Department in the School of Computer Science at Carnegie Mellon University working with Carl Kingsford. I will be starting in Pittsburgh in September 2016.
My paper titled “Predicting core columns of protein alignments improves parameter advising” was accepted to WABI 2016, and I will be presenting it at the meeting in Aarhus, Denmark in August. The slides from my WABI 2016 talk can be viewed online here.
I recently converted my website to WordPress. Even though I used to work in web development myself, I no longer have the time for, or interest in, maintaining a custom-built website. I am hoping that a content management system will make it easier to post updates when new publications and events happen.