Most of my code is open source, and Ohloh collects some statistics about it.
In early 2007, while writing up my capstone project, I joined the hackers at the Collaborative Software Development Laboratory and the Hackystat project. For its seventh version I implemented telemetry reduction functions, added a JDepend sensor, and explored some analytics. Later I helped with the Hackystat-8 release, and since then I have worked on a Hackystat extension for mining software processes, called Software Trajectory Analysis.
In short, it digests the artifacts left behind by the software development process (CVS/SVN/Git change records, bug reports/issues, social interactions) and builds multidimensional time series out of them. These, in turn, are mined for interesting facts by applying approximation, aggregation, and indexing.
Here is an overview of this process:
![Overview of the Software Trajectory Analysis process]()
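As a rough illustration of the first step (turning development artifacts into multidimensional time series), here is a minimal Python sketch. It is not the Hackystat code: the `daily_series` function, the event tuples, and the metric names are all hypothetical and only show the idea of summing events per day and per metric.

```python
from collections import defaultdict
from datetime import date


def daily_series(events, start, end):
    """Sum (day, metric, value) events into one dense daily series per metric."""
    n_days = (end - start).days + 1
    series = defaultdict(lambda: [0.0] * n_days)
    for day, metric, value in events:
        if start <= day <= end:
            series[metric][(day - start).days] += value
    return dict(series)


# Hypothetical events, as they might be extracted from a VCS log and an issue tracker.
events = [
    (date(2010, 3, 1), "lines_added", 120),
    (date(2010, 3, 1), "commits", 1),
    (date(2010, 3, 3), "lines_added", 30),
    (date(2010, 3, 3), "commits", 2),
    (date(2010, 3, 3), "issues_opened", 1),
]

series = daily_series(events, date(2010, 3, 1), date(2010, 3, 4))
for metric, values in sorted(series.items()):
    print(metric, values)
```

Each metric becomes one dimension of the series; days without events stay at zero, so all dimensions share the same time axis.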
Applied to real-world data, this approach yielded some interesting results, which are described in the following papers:
Recognizing recurrent development behaviors corresponding to Android OS release life-cycle
Pavel Senin, SERP 2012, July 2012, Las Vegas, NV, USA.
Software Trajectory Analysis: An empirically based method for automated software process discovery
Pavel Senin, ESEM/IDoESE 2010, September 2010, Bolzano-Bozen, Italy.
I have implemented the Piecewise Aggregate Approximation (PAA) and Symbolic Aggregate Approximation (SAX) algorithms in Java as a stand-alone library, which I use in the Hackystat-Trajectory package. The code and binary distribution are hosted at Google Project Hosting: http://code.google.com/p/jmotif/
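For readers unfamiliar with the two algorithms, here is a minimal Python sketch of PAA and SAX. It is an illustration only and does not mirror the jmotif API: the function names are made up, and the PAA variant uses integer chunk boundaries, which is a simplification when the series length is not divisible by the number of segments.

```python
import bisect
import math
from statistics import NormalDist


def znorm(series, eps=1e-8):
    """Rescale to zero mean and unit variance; flat series become all zeros."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std < eps:
        return [0.0] * n
    return [(x - mean) / std for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each of `segments` chunks."""
    n = len(series)
    if not 1 <= segments <= n:
        raise ValueError("segments must be between 1 and len(series)")
    reduced = []
    for i in range(segments):
        lo, hi = i * n // segments, (i + 1) * n // segments
        reduced.append(sum(series[lo:hi]) / (hi - lo))
    return reduced


def sax(series, segments, alphabet_size):
    """SAX word: z-normalize, reduce with PAA, then map each mean to a letter."""
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 2 and 26")
    # Breakpoints split the standard normal distribution into equiprobable regions.
    cuts = [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    reduced = paa(znorm(series), segments)
    return "".join(chr(ord("a") + bisect.bisect_left(cuts, v)) for v in reduced)


word = sax([1, 2, 3, 4, 5, 6, 7, 8], segments=4, alphabet_size=4)
print(word)  # abcd
```

A steadily rising series maps to a rising word, so similar shapes end up with similar (or identical) words regardless of their original scale.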
The package finds recurrent patterns (motifs) and surprising patterns (discords) in temporal data. Two examples of finding abnormal events in telemetry series are shown below.
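To make the notion of a discord concrete, here is a brute-force Python sketch on a synthetic series. It is not how the library is implemented (the source does not say which search strategy is used); it only shows the definition: the discord is the subsequence whose nearest non-overlapping neighbor is farthest away. The function names and the test series are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_discord(series, window):
    """Return (position, distance) of the subsequence farthest from its nearest neighbor."""
    n = len(series) - window + 1
    best_pos, best_dist = -1, -1.0
    for i in range(n):
        nearest = math.inf
        for j in range(n):
            if abs(i - j) >= window:  # skip overlapping (trivial) matches
                d = euclidean(series[i:i + window], series[j:j + window])
                nearest = min(nearest, d)
        if nearest != math.inf and nearest > best_dist:
            best_pos, best_dist = i, nearest
    return best_pos, best_dist


# A periodic series with a single injected anomaly at index 14.
series = [0, 1, 0, -1] * 8
series[14] = 5

pos, dist = find_discord(series, window=4)
print(pos, round(dist, 3))
```

The search is quadratic in the series length, so it is only practical for short series; every clean window has an exact repeat elsewhere, while the windows covering index 14 do not.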
During my internship at LANL I designed and developed a fragment recruitment pipeline that supports statistical analysis and visualization of recruitment. It was used for binning the GOS metagenome, and the results were published in a PLoS ONE article in April 2009:
Assembling the marine metagenome, one cell at a time
Tanja Woyke, Gary Xie, Alex Copeland, Jose M. Gonzalez, Cliff Han, Hajnalka Kiss, Jimmy Saw, Pavel Senin, Chi Yang, Sourav Chatterji, Jan-Fang Cheng, Jonathan A. Eisen, Michael E. Sieracki and Ramunas Stepanauskas
PLoS ONE, April 23, 2009.
A modified version of the algorithm was used for read recruitment and identification in a study of human dental plaque microbiota:
Community and gene composition of a human dental plaque microbiota obtained by metagenomic sequencing
G. Xie, P.S.G. Chain, C.-C. Lo, K.-L. Liu, J. Gans, J. Merritt, F. Qi
Molecular Oral Microbiology, Volume 25, Issue 6, 2010
While working at ASGPB I designed and developed a genome assembly and annotation pipeline, which I used to assemble and annotate several genomes. This work led to the following publications:
Complete genome sequence of the extremely acidophilic methanotroph isolate V4, "Methylacidiphilum infernorum", a representative of the bacterial phylum Verrucomicrobia
Shaobin Hou, Kira S. Makarova, Jimmy H. W. Saw, Pavel Senin, Benjamin V. Ly, Zhemin Zhou, Yan Ren, Jianmei Wang, Michael Y. Galperin, Marina V. Omelchenko, Yuri I. Wolf, Natalya Yutin, Eugene V. Koonin, Matthew B. Stott, Bruce W. Mountain, Michelle A. Crowe, Angela V. Smirnova, Peter F. Dunfield, Lu Feng, Lei Wang and Maqsudul Alam
Biology Direct 2008, 3:26, 1 July 2008
The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus)
More information about transgenic papaya and the papaya ringspot virus (PRSV) can be found here
Methane oxidation by an extremely acidophilic bacterium of the phylum Verrucomicrobia
Peter F. Dunfield, Anton Yuryev, Pavel Senin, Angela V. Smirnova, Matthew B. Stott, Shaobin Hou, Binh Ly, Jimmy H. Saw, Zhemin Zhou, Yan Ren, Jianmei Wang, Bruce W. Mountain, Michelle A. Crowe, Tina M. Weatherby, Paul L. E. Bodelier, Werner Liesack, Lu Feng, Lei Wang & Maqsudul Alam
Nature 450, 879-882 (6 December 2007)
I've developed some bioinformatics utilities that are useful for data processing and visualization. Part of the code is old, undocumented, and not well designed, but some people have found it very helpful.
Chemical reactions on interstellar dust grains play a crucial role in interstellar chemistry by promoting the formation of organic products. While many of the reaction rates are poorly understood and molecular formation routes are difficult to isolate in the laboratory, computer simulations of these reactions allow us to better understand the nature and evolution of interstellar molecules in molecular clouds. The work presented here is part of the CASS 2006 initiative, which aimed to bring graduate students from diverse areas of science into the field of Astrobiology. This particular collaboration resulted in the development of computational software that implements an MCMC stochastic model of the surface chemistry. While the current model only simulates grain-surface chemistry, we show the capabilities and vast potential of coupling the grain-surface model with a full-scale interstellar gas chemistry model. We have improved upon the original IDL model by implementing a Java-based product that provides numerous options for set-up and tuning of simulations, along with real-time visualization of the model results.
The project source code is available from the SVN repository at Google Code: http://code.google.com/p/iclouds/
The thesis text is available in PDF and/or PS format.
The TreeXplorer software aims to handle large phylogenetic trees through various zooming features (conventional, hyperbolic, and semantic zooming), and applies motif (Teiresias and Gemoda) and nullomer search algorithms to study relations between species. I am planning to support intersection and union selection for motifs and nullomers. Features on the to-do list include on-the-fly tree reshuffling, manual composition of species clusters, and extraction of subset names and actual sequences… http://bioutil-senin.googlecode.com/svn/trunk/treeXplorer/
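For context, a nullomer is a shortest sequence that does not occur in a given genome. The Python sketch below shows a naive search, together with the kind of intersection selection mentioned above. It is not TreeXplorer code: the function names and the toy sequences are hypothetical, and the exhaustive enumeration is only practical for small `k`.

```python
from itertools import product


def absent_kmers(sequence, k, alphabet="ACGT"):
    """All k-mers over `alphabet` that never occur in `sequence`."""
    seen = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    candidates = ("".join(p) for p in product(alphabet, repeat=k))
    return sorted(kmer for kmer in candidates if kmer not in seen)


def nullomers(sequence, max_k=6):
    """Shortest absent k-mers: returns (k, list of k-mers), or (None, []) if none up to max_k."""
    for k in range(1, max_k + 1):
        missing = absent_kmers(sequence, k)
        if missing:
            return k, missing
    return None, []


# Toy sequences standing in for two species.
k, missing = nullomers("ACGTACGGTCA")
print(k, missing)

# Intersection selection: 2-mers absent from both toy sequences.
shared = set(absent_kmers("ACGTACGGTCA", 2)) & set(absent_kmers("TTGACCAGT", 2))
print(sorted(shared))
```

Union selection works the same way with `|` in place of `&`.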
The GapFinder utility was designed to help with BES selection for the re-sequencing effort when closing gaps in a WGS genome assembly. It has an intuitive GUI for loading assembly information from the Arachne WGA output, running the search algorithm, and saving the resulting optimal BES selection list. The software can be checked out from the Subversion repository at http://bioutil-senin.googlecode.com/svn/trunk/gapfinder/
The RJImage project (for a Pattern Classification class) was aimed at gaining experience with Reversible Jump Markov Chain Monte Carlo and Simulated Annealing algorithms. The result is a Java-based GUI utility for grayscale image segmentation. The model design and software implementation are based on the original work by Z. Kato, "Bayesian color image segmentation using reversible jump Markov chain Monte Carlo", Probability, Networks and Algorithms, February 28, 1999, CWI, Amsterdam, and on some of his previous work. The implemented segmentation method wraps the samplers in simulated annealing. The software can be checked out from the Subversion repository at http://rjimage.googlecode.com/svn/trunk/
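The idea of wrapping a sampler in simulated annealing can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the RJImage implementation and not Kato's algorithm: the number of classes and their mean intensities are fixed in advance (so there is no reversible jump), the sampler is a plain Metropolis step, and all names and parameter values are assumptions chosen for the example.

```python
import math
import random


def segment(image, means, sigma=10.0, beta=5.0, t0=4.0, cooling=0.95, sweeps=50, seed=0):
    """Label each pixel with a class index using Metropolis sampling under a cooling schedule."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    k = len(means)
    # Start from the nearest class mean for every pixel.
    labels = [[min(range(k), key=lambda c: abs(pixel - means[c])) for pixel in row]
              for row in image]

    def energy(y, x, c):
        # Gaussian data term plus a Potts penalty for each disagreeing 4-neighbor.
        data = (image[y][x] - means[c]) ** 2 / (2 * sigma ** 2)
        disagree = 0
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width and labels[ny][nx] != c:
                disagree += 1
        return data + beta * disagree

    temperature = t0
    for _ in range(sweeps):
        for y in range(height):
            for x in range(width):
                old, new = labels[y][x], rng.randrange(k)
                if new == old:
                    continue
                delta = energy(y, x, new) - energy(y, x, old)
                # Always accept improvements; accept worse moves less often as it cools.
                if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                    labels[y][x] = new
        temperature *= cooling
    return labels


# A 6x6 image: dark left half, bright right half, one noisy pixel in the dark half.
image = [[50, 50, 50, 200, 200, 200] for _ in range(6)]
image[2][1] = 130

labels = segment(image, means=[50, 200])
for row in labels:
    print(row)
```

The noisy pixel starts out closer to the bright class, but the neighborhood penalty pulls it back into the dark region as the temperature drops.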