Computer science has many areas, including architecture, operating systems and networking, computational theory and complexity, compilers, programming languages, data structures and algorithms, databases, computer vision, graphics, multimedia, security, artificial intelligence, software engineering, human-computer interfaces, parallel and distributed systems, and biomedical informatics. Relevant mathematics includes probability and statistics, stochastic processes, discrete mathematics, abstract mathematics, number theory, operations research, numerical methods, and more. See my bookshelf.

I'm most interested in compiler theory and implementation at this time.

I'm generally interested in algorithms, compiler theory, programming languages, artificial intelligence, machine learning, data mining and information retrieval, and operating systems. I am also interested in bioinformatics and medical informatics, since I have a relevant background in biology (a BS in biology and an MS in biochemistry).

Here is some of the research and development work that I have done or am doing now.


Research

Compiler theory and implementation

General statement: My advisor is Dr. David Pager. My dissertation title is Measuring and Extending LR(1) Parser Generation. The defense committee consists of Professors David Pager, Yingfei Dong, David Chin, Dennis Streveler and Scott Robertson. My PhD research focuses on compiler theory, specifically LR(1) parser generation algorithms and their extension to LR(k). These relate to compiler generation and potentially to natural language processing. LALR(1) parser generators such as Yacc and Bison have been widely used in industry for a long time. LL parser generators such as ANTLR have gained popularity since the 1990s. But LR(1) parser generation, despite having the greatest recognition power of these approaches, has not been widely adopted because of its high computation cost. There are, however, existing algorithms that reduce this cost and make LR(1) parser generation practical. My work is to study these algorithms and to work on their efficiency, extension and implementation.

  • (Fall 2006 ~ ) Hyacc. This work implements an LR(1) parser generator. It is like yacc/bison, except that yacc and bison are LALR(1) parser generators (Bison also supports the GLR algorithm). An LR(1) parser generator is more powerful than an LALR(1) parser generator. But ever since Knuth's 1965 paper "On the translation of languages from left to right" introduced the LR(k) parsing algorithm, the general perception has been that the algorithm takes too much space and time. This is still the case today for some implementations. My implementation should have a competitive edge because of the algorithms and optimizations it uses. Once the parser generator is done, there is much related extension work to do. As a demonstration of the parser generator, I will use it to implement a compiler. Hyacc version 0.95 was released to the open source community on 1/25/2008. See Sourceforge.net: Hyacc. A notice was sent to the comp.compilers newsgroup on 2/3/2008. There are also other related issues to address.
  • As of Spring 2009, Hyacc has been extended into an LR(0)/LALR(1)/LR(1)/LR(k) parser generator. LR(k) support partially works. Hyacc version 0.95 was released on sourceforge.net on April 8, 2009.
  • (2007) The Latex2gDPS compiler. This compiler was created using Hyacc. It is essentially a source-to-source translator that translates LaTeX source code to gDPS source code (gDPS stands for general Dynamic Programming Specification, designed by Holger).

Bioinformatics and machine learning

General statement: I have been working with Dr. Guylaine Poisson on some bioinformatics projects, mostly to create bioinformatics tools with a web interface. I also worked on an experimental project with Dr. Susan Still on a reinforcement learning algorithm. Since machine learning is used frequently in bioinformatics, I have grouped the two together here.

  • (Spring 2007) FragAnchor Database and BLAST tool for the Integr8 Eukaryota genomes. In this individual directed research project with Dr. Poisson, the Integr8 genomes were downloaded into a local MySQL database, the NN/HMM application (see below) was applied to obtain predicted GPI anchors, and the results were stored in the database. Next, a BLAST tool was downloaded from NCBI and installed. A Perl CGI web interface was then deployed to allow users to search and BLAST their proteins against the database. See web interface.
  • (Fall 2006) A Java Applet/Application that demonstrates the reinforcement learning (Q-learning) algorithm. This is a directed research project with Dr. Still. Its purpose is to simulate reinforcement learning. More ...
  • (Fall 2005) Automation and web interface for the GPI-anchor prediction NN/HMM system. The NN/HMM tandem system was developed by Dr. Poisson et al. My role was to develop a Perl CGI that glues together the NN and HMM applications and provides a web interface for its automation. See web interface. This work was also incorporated into the annotation pipeline of the Barton Bioinformatics research group in February 2008.
  • (Spring 2005) NADPH Oxidase membrane spanning region determination survey. This is a computational biology course project. More ...
  • (Summer 2001) Perl tools that help with processing data from genome sequencing experiments. Summary (PDF).
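
As a rough illustration of the Q-learning update that the Fall 2006 project above demonstrates, here is a minimal Python sketch. It is not the applet's code. The one-dimensional corridor environment, the function names q_learning and greedy_policy, and all parameter values are assumptions made up for this example.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1. Action 0 moves left, action 1 moves right.
    Entering the rightmost state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy choice; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            reward = 1.0 if s_next == goal else 0.0
            # Q-learning update: move Q(s, a) toward
            # reward + gamma * max over a' of Q(s_next, a').
            best_next = 0.0 if s_next == goal else max(q[s_next])
            q[s][a] += alpha * (reward + gamma * best_next - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Best action (0 = left, 1 = right) for each non-goal state."""
    return [0 if left > right else 1 for left, right in q[:-1]]
```

Calling greedy_policy(q_learning()) should pick "right" in every non-goal state, since moving right is the only way to reach the reward.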

Development

Some of the projects I worked on are included here.

An open source Data Structure & Algorithm library in C

Code generation - RAD tools

  • (Fall 2003 ~ Spring 2005) ASP web application generator. This was a personal project. It also helped with my RA work by saving a lot of time on web site development. There are two versions: one in ASP (5,000 lines of code) and one in VB.NET (35,000 lines of code). More ...
  • (Summer ~ Fall 2005) RA work. Sharepoint-based site generator. More ...

Spell checking

  • (Summer ~ Fall 2007) This is part of my current RA work. The engine is developed as a COM server in plain C. I was able to use some basic data structures (AVL tree, heap, hash table) and algorithms (Levenshtein dynamic programming, Soundex, etc.) in it. So far I have three versions: a console version that runs under both Windows and Solaris, a DLL COM version and an EXE COM version. The COM server can be used in any COM-enabled environment, e.g., C/C++, VB, VC, .NET, Delphi, VBScript, ASP. The current spell checker includes clients in C, VBScript, C#, ASP and Ajax. The primary client is in C#. Some literature research on spell checking
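
The Levenshtein dynamic programming mentioned above can be sketched as follows. This is a minimal Python illustration, not the C engine's code. The function names levenshtein and suggest, the max_distance threshold and the word list in the usage note are assumptions for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, keeping only two rows of the DP table."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def suggest(word, dictionary, max_distance=2):
    """Dictionary words within max_distance edits, closest first."""
    scored = sorted((levenshtein(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]
```

For example, suggest("speling", ["spelling", "spewing", "apple"]) returns ["spelling", "spewing"]. A linear scan like this is only reasonable for small word lists. A real engine would narrow the candidates first, for instance with a hash table or a phonetic key such as Soundex.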

Web crawling and information retrieval

  • (Fall 2007 ~ ) This is also needed for part of my RA job. The work was mainly in multithreaded C# with recursive descent parsing of HTML. A short memo was written on this topic.
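
To give an idea of what recursive descent parsing of HTML looks like for a crawler, here is a small Python sketch that parses a simplified, well-nested subset of HTML and collects link targets. It is not the C# code from the project. The grammar, the function names and the dictionary-based tree representation are assumptions for illustration. It does not handle void elements such as <br>, comments, scripts or unquoted attributes.

```python
import re

# One token is either a tag (<name ...> or </name>) or a run of text.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>|([^<]+)")
ATTR = re.compile(r'([A-Za-z-]+)\s*=\s*"([^"]*)"')


def tokenize(html):
    tokens = []
    for m in TOKEN.finditer(html):
        closing, name, attrs, text = m.groups()
        if text is not None:
            if text.strip():
                tokens.append(("text", text.strip(), {}))
        elif closing:
            tokens.append(("close", name.lower(), {}))
        else:
            tokens.append(("open", name.lower(), dict(ATTR.findall(attrs))))
    return tokens


def parse_nodes(tokens, pos):
    """nodes := (text | element)*, stopping at any closing tag."""
    nodes = []
    while pos < len(tokens) and tokens[pos][0] != "close":
        if tokens[pos][0] == "text":
            nodes.append(tokens[pos][1])
            pos += 1
        else:
            element, pos = parse_element(tokens, pos)
            nodes.append(element)
    return nodes, pos


def parse_element(tokens, pos):
    """element := open-tag nodes close-tag"""
    _, name, attrs = tokens[pos]
    children, pos = parse_nodes(tokens, pos + 1)
    # Consume the closing tag only if it matches this element.
    if pos < len(tokens) and tokens[pos][:2] == ("close", name):
        pos += 1
    return {"tag": name, "attrs": attrs, "children": children}, pos


def collect_links(nodes):
    """Walk the parsed tree and return every href found on an <a> element."""
    links = []
    for node in nodes:
        if isinstance(node, dict):
            if node["tag"] == "a" and "href" in node["attrs"]:
                links.append(node["attrs"]["href"])
            links.extend(collect_links(node["children"]))
    return links


def extract_links(html):
    nodes, _ = parse_nodes(tokenize(html), 0)
    return collect_links(nodes)
```

Each grammar rule maps to one function, which is the defining trait of recursive descent. A crawler would feed the extracted links back into its queue of pages to visit.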

Web application development

  • (2000 ~ ) These are mostly for my RA job. There were at least five production web applications in ASP/MS SQL/IIS/Windows, one in PHP/MySQL/Apache/Linux, and miscellaneous small projects in Perl CGI, ASP.NET, etc. Newer concepts such as Web 2.0 (tags, RSS), Ajax and web services were sometimes used.

My technology blog

  • Here is my technology blog, started in Summer 2008. There is not much content yet. Currently it includes material such as Linux (Suse 10.3) setup notes, C# screen capture, simulating mouse and keyboard events in C#, and converting PDF to image formats in C#. Source code is available and ready for use.

Misc. readings and literature review



Last updated: 5/28/2009