Ohloh profile for Pavel Senin

HackyStat

Since early 2007 I have been working with the Collaborative Software Development Laboratory hackers.

While working on Hackystat-7 I implemented some telemetry reduction functions, added a JDepend sensor, and extended the plotting functionality.

Currently we are working on Hackystat-8, and I am focusing on research questions concerning software project comparison and a pattern discovery framework. The original idea that prompted this work was blogged by Philip Johnson here. I started by following this idea with 3D-animated plots and the Euclidean metric, then moved on to comparing telemetry streams using Fourier-based decomposition of time series. Later, Dynamic Time Warping (DTW) was implemented and embedded into the Hackystat Project Browser:

DTW application to Software Measurements

Nevertheless, none of these methods (Euclidean distance in multidimensional space, FFT-based decomposition, and DTW) performed well enough for the outlined goals.
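
For reference, the difference between the Euclidean and DTW comparisons of two streams can be sketched as follows. This is an illustrative Python sketch only, not the Hackystat code: the function names and the sample values are made up, and the absolute difference is assumed as the point-wise cost.

```python
import math


def euclidean_distance(a, b):
    """Point-by-point distance; both streams must have the same length."""
    if len(a) != len(b):
        raise ValueError("streams must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping with |x - y| as the local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i points of a
    # with the first j points of b
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


if __name__ == "__main__":
    # Two hypothetical streams with the same spike, shifted by one day.
    dev1 = [0, 0, 6, 0, 0, 0, 0]
    dev2 = [0, 0, 0, 6, 0, 0, 0]
    print(euclidean_distance(dev1, dev2))  # penalizes the shift
    print(dtw_distance(dev1, dev2))        # warps the shift away
```

The Euclidean metric treats a one-day shift as a large difference, while DTW aligns the two spikes and reports the streams as identical.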

In early 2009 I moved towards SAX decomposition of telemetry streams. Once the JMotif library was finished, I developed a time-series indexing mechanism that uses MySQL for index storage. The current version of Hackystat Trajectory automatically builds a SAX motif index of the given telemetry streams and allows visualization of the motifs found.

Hackystat Trajectory Browser screenshot.

This depicts two DevTime telemetry streams from two independent developers, with a similar pattern highlighted. The pattern spans 7 days of measurements and corresponds to the string motif “aaacaaa”. Both developers were “idling” for the first three days, then worked more than 6 hours during a single day, and then were “idling” again for the next three days.
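
The idea of a motif index can be sketched in a few lines of Python. This is a simplified illustration and not the Hackystat Trajectory code: an in-memory dictionary stands in for the MySQL storage, and the stream names, SAX strings, and function names below are hypothetical.

```python
from collections import defaultdict


def build_motif_index(streams, window):
    """Map every SAX substring of length `window` to its (stream, offset) hits."""
    index = defaultdict(list)
    for name, word in streams.items():
        for offset in range(len(word) - window + 1):
            index[word[offset:offset + window]].append((name, offset))
    return index


def shared_motifs(index):
    """Keep only the motifs that occur in more than one stream."""
    return {
        motif: hits
        for motif, hits in index.items()
        if len({name for name, _ in hits}) > 1
    }


if __name__ == "__main__":
    # Hypothetical SAX strings for two developers, one letter per day.
    streams = {"dev1": "baaacaaab", "dev2": "aaaacaaaa"}
    index = build_motif_index(streams, window=7)
    for motif, hits in shared_motifs(index).items():
        print(motif, hits)
```

With these sample strings the only 7-letter word found in both streams is “aaacaaa”, at offset 1 in each.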

Currently I am working on the algorithm and code for automated discovery of "Association Rules" in software development telemetry streams using SAX-based indices.

PAA and SAX implementations, Java library

I have implemented the Piecewise Aggregate Approximation (PAA) and Symbolic Aggregate Approximation (SAX) algorithms in Java and created a stand-alone library, which I am using in the Hackystat-Trajectory package. Using these two methods I approximate Hackystat telemetry streams (time series) and convert them into strings for further analysis of software development activities. The code and binary distribution are located at Google Project Hosting: http://code.google.com/p/jmotif/

The sample plot shows the PAA approximation of a 14-point time series by 9 points, followed by its conversion into a 9-letter string using SAX with a 7-letter alphabet; see http://code.google.com/p/jmotif/ for further explanation.
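
The two steps can be sketched in Python as follows. This is a minimal illustration of the general PAA and SAX technique, not the JMotif API (JMotif is written in Java): the function names are made up, the series is assumed to be z-normalized first, and the breakpoints are assumed to be the equiprobable cut points of the standard normal distribution.

```python
import bisect
import math
from statistics import NormalDist


def znorm(series):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    if std < 1e-12:  # flat series: nothing to normalize
        return [0.0] * len(series)
    return [(x - mean) / std for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation, also for lengths not divisible
    by `segments`: each point is conceptually repeated `segments` times
    and the expanded series is averaged in chunks of len(series)."""
    n = len(series)
    return [
        sum(series[k // segments] for k in range(i * n, (i + 1) * n)) / n
        for i in range(segments)
    ]


def sax_word(series, segments, alphabet_size):
    """Convert a time series into a SAX string of `segments` letters."""
    # Cut points that split the standard normal into equiprobable regions.
    cuts = [NormalDist().inv_cdf(i / alphabet_size)
            for i in range(1, alphabet_size)]
    reduced = paa(znorm(series), segments)
    return "".join(chr(ord("a") + bisect.bisect_right(cuts, v))
                   for v in reduced)


if __name__ == "__main__":
    # Hypothetical 14-point series reduced to 9 letters, 7-letter alphabet.
    series = [2, 3, 5, 4, 6, 9, 8, 7, 5, 4, 2, 1, 1, 3]
    print(sax_word(series, segments=9, alphabet_size=7))
```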

Fragment Recruitment Plot

Fragment recruitment pipeline

During my internship at LANL I designed and developed a fragment recruitment pipeline that allows statistical analysis and visualization of recruitment. It was used for binning of the GOS metagenome, and the results were published in a PLoS ONE article in April 2009:

Assembling the marine metagenome, one cell at a time
Tanja Woyke, Gary Xie, Alex Copeland, Jose M. Gonzalez, Cliff Han, Hajnalka Kiss, Jimmy Saw, Pavel Senin, Chi Yang, Sourav Chatterji, Jan-Fang Cheng, Jonathan A. Eisen, Michael E. Sieracki and Ramunas Stepanauskas
PLoS ONE, April 23, 2009.

Currently I am developing a public web service and GUI that will let users perform recruitment the "easy way".

The sample plot shows GOS sample recruits for Synechococcus sp. WH8102.

Fragment Recruitment Plot

Genome assembly and annotation pipeline

While working at ASGPB I designed and developed a genome assembly and annotation pipeline, which I used to assemble and annotate some genomes. This work led to the following publications:

Complete genome sequence of the extremely acidophilic methanotroph isolate V4, "Methylacidiphilum infernorum", a representative of the bacterial phylum Verrucomicrobia
Shaobin Hou, Kira S. Makarova, Jimmy H. W Saw, Pavel Senin, Benjamin V. Ly, Zhemin Zhou, Yan Ren, Jianmei Wang, Michael Y. Galperin, Marina V. Omelchenko, Yuri I. Wolf, Natalya Yutin, Eugene V. Koonin, Matthew B. Stott, Bruce W. Mountain, Michelle A. Crowe, Angela V. Smirnova, Peter F. Dunfield, Lu Feng, Lei Wang and Maqsudul Alam
Biology Direct 2008, 3:26, 1 July 2008
The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus)
Ray Ming, Shaobin Hou, Yun Feng, Qingyi Yu, Alexandre Dionne-Laporte, Jimmy H. Saw, Pavel Senin, Wei Wang, Benjamin V. Ly, Kanako L. T. Lewis, Steven L. Salzberg, Lu Feng, Meghan R. Jones, Rachel L. Skelton, Jan E. Murray, Cuixia Chen, Wubin Qian, Junguo Shen, Peng Du, Moriah Eustice, Eric Tong, Haibao Tang, Eric Lyons, Robert E. Paull, Todd P. Michael, Kerr Wall, Danny W. Rice, Henrik Albert, Ming-Li Wang, Yun J. Zhu, Michael Schatz, Niranjan Nagarajan, Ricelle A. Acob, Peizhu Guan, Andrea Blas, Ching Man Wai, Christine M. Ackerman, Yan Ren, Chao Liu, Jianmei Wang, Jianping Wang, Jong-Kuk Na, Eugene V. Shakirov, Brian Haas, Jyothi Thimmapuram, David Nelson, Xiyin Wang, John E. Bowers, Andrea R. Gschwend, Arthur L. Delcher, Ratnesh Singh, Jon Y. Suzuki, Savarni Tripathi, Kabi Neupane, Hairong Wei, Beth Irikura, Maya Paidi, Ning Jiang, Wenli Zhang, Gernot Presting, Aaron Windsor, Rafael Navajas-Perez, Manuel J. Torres, F. Alex Feltus, Brad Porter, Yingjun Li, A. Max Burroughs, Ming-Cheng Luo, Lei Liu, David A. Christopher, Stephen M. Mount, Paul H. Moore, Tak Sugimura, Jiming Jiang, Mary A. Schuler, Vikki Friedman, Thomas Mitchell-Olds, Dorothy E. Shippen, Claude W. dePamphilis, Jeffrey D. Palmer, Michael Freeling, Andrew H. Paterson, Dennis Gonsalves, Lei Wang and Maqsudul Alam
Nature 452, 991-996 (24 April 2008)
Methane oxidation by an extremely acidophilic bacterium of the phylum Verrucomicrobia
Peter F. Dunfield, Anton Yuryev, Pavel Senin, Angela V. Smirnova, Matthew B. Stott, Shaobin Hou, Binh Ly, Jimmy H. Saw, Zhemin Zhou, Yan Ren, Jianmei Wang, Bruce W. Mountain, Michelle A. Crowe, Tina M. Weatherby, Paul L. E. Bodelier, Werner Liesack, Lu Feng, Lei Wang & Maqsudul Alam
Nature 450, 879-882 (6 December 2007)

Bioinformatics utilities

I've developed some bioinformatics utilities that are useful for data processing and visualization. Part of the code is old, undocumented, and not well designed, but some people have found it very helpful.

Scaffold (contig) gene structure rendering with tooltips.
Java-based. Requires JavaScript support in the browser.


Scaffold 1

HINT:(mouse over genes to see tooltips :))

iClouds

My capstone MS project.

Software for simulating interstellar grain chemistry.

From the Bioastronomy-2007 abstract:

Chemical reactions on interstellar dust grains play a crucial role in interstellar chemistry by promoting the formation of organic products. While many of the reaction rates are poorly understood and molecular formation routes are difficult to isolate in the laboratory, computer simulations of these reactions allows us to better understand the nature and evolution of interstellar molecules in molecular clouds. The work presented here is part of the CASS 2006 initiative which aimed to involve graduate students from diverse regions of science into the field of Astrobiology. This particular collaboration resulted in the development of computational software that implements a MCMC stochastic model of the surface chemistry. While the current model only simulates grain-surface chemistry, we show the capabilities and vast potential of coupling the grain-surface with a full-scale interstellar gas chemistry model. We have improved upon the original IDL model by implementing a Java based product that provides numerous options for set-up and tuning of simulations along with real-time visualization of the model results.

The project source code is available from the SVN repository at Google Code: http://code.google.com/p/iclouds/

The thesis text in PDF and/or PS format.

Sample plot based on the computational experiment:

The distribution of deuterium amongst the D-bearing species as a function of density, for D-to-H ratio of 0.01. Left panel shows the chemistry that results from the O3-OH cascade and right panel shows the grain surface composition that occurs when the cascade is switched off.

TreeXplorer

The TreeXplorer software aims to handle large phylogenetic trees with various zooming features (conventional, hyperbolic, and semantic zooming), along with the application of motif (Teiresias and Gemoda) and nullomer search algorithms to study species relations. I am planning to support intersection and union selection for motifs and nullomers. Some of the features on the ToDo list are on-the-fly tree reshuffling, manual composition of species clusters, extraction of subset names and actual sequences… http://bioutil-senin.googlecode.com/svn/trunk/treeXplorer/
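
As an illustration of the nullomer idea (short sequences that never occur in a given set of sequences), here is a brute-force Python sketch. It is not the TreeXplorer code: the function names are made up, only the forward strand is scanned (no reverse complement), and full enumeration of all k-mers is practical only for small k.

```python
from itertools import product


def observed_kmers(sequences, k):
    """Collect every k-letter substring that occurs in any sequence."""
    seen = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            seen.add(seq[i:i + k])
    return seen


def nullomers(sequences, k, alphabet="ACGT"):
    """Return, in sorted order, all k-mers over `alphabet` absent from the input."""
    seen = observed_kmers(sequences, k)
    candidates = ("".join(p) for p in product(alphabet, repeat=k))
    return sorted(kmer for kmer in candidates if kmer not in seen)


if __name__ == "__main__":
    # Hypothetical toy input; real genomes need a far more compact index.
    print(nullomers(["ACGTAC", "ttgaca"], k=2))
```

Nullomer sets computed per species could then be intersected or united with ordinary set operations.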

GapFinder

The GapFinder utility was designed to help with BES selection for the re-sequencing effort while closing gaps in a WGS genome assembly. The software has an intuitive GUI that allows loading assembly information from the Arachne WGA output, running the search algorithm, and saving the resulting optimal BES selection list. The software can be checked out from the Subversion repository located at http://bioutil-senin.googlecode.com/svn/trunk/gapfinder/

RJImage

The RJImage (Pattern Classification class) project was aimed at gaining experience with the Reversible Jump Markov Chain Monte Carlo and Simulated Annealing algorithms. As a result, a Java-based GUI utility for grayscale image segmentation was developed. The model design and software implementation are based on the original work by Z. Kato, "Bayesian color image segmentation using reversible jump Markov chain Monte Carlo", Probability, Networks and Algorithms, February 28, 1999, CWI, Amsterdam, and on some of his previous work. The software can be checked out from the Subversion repository located at http://rjimage.googlecode.com/svn/trunk/. The implemented segmentation method wraps the samplers in simulated annealing.