
Computer science spans many areas: architecture, operating systems
and networking, computational theory and complexity,
compilers, programming languages, data
structures and algorithms, databases, computer vision, graphics,
multimedia, security, artificial intelligence, software engineering,
human-computer interfaces, parallel and distributed systems, and biomedical informatics.
Relevant mathematics includes probability and statistics, stochastic processes,
discrete math, abstract math, number theory,
operations research, numerical methods, and more. See my bookshelf.
At this time I'm most interested in compiler theory and implementation.
More generally, I'm interested in algorithms, compiler theory,
programming languages, artificial intelligence, machine learning,
data mining and information retrieval,
and operating systems. I am also interested in bioinformatics and medical
informatics, since I have a background in biology (a BS in biology
and an MS in biochemistry).
Here is some of the research and
development work that I have done or am doing now.
Research
Compiler theory and implementation
General statement:
My advisor is Dr. David Pager.
My dissertation title is Measuring and Extending LR(1) Parser Generation.
The defense committee consists of these professors: David Pager,
Yingfei Dong,
David Chin,
Dennis Streveler and
Scott Robertson.
My PhD research focuses on compiler theory, specifically
LR(1) parser generation algorithms and their extension to LR(k). These
relate to compiler generation and potentially to natural language processing.
LALR(1) parser generators such as Yacc and Bison have been widely used in industry
for a long time. LL parser generators such as ANTLR have gained popularity
since the 1990s. But LR(1) parser generation, despite having the greatest recognition
power of the three, has not been widely adopted because of its high computational cost.
There are, however, existing algorithms that reduce this
cost and make LR(1) parser generation practical. My work is to study
these algorithms and to work on their efficiency, extension and implementation.
- (Fall 2006 ~ ) Hyacc. This work implements an LR(1) parser generator. It is similar to yacc/bison,
except that yacc and bison are LALR(1) parser generators (Bison also supports the
GLR algorithm). An LR(1) parser generator is more powerful than an LALR(1) parser
generator. But ever since Knuth's 1965 paper "On the translation of
languages from left to right" introduced the LR(k) parsing algorithm,
the general perception has been that the algorithm takes too much space
and time. This is still the case today for some implementations.
My implementation aims to be competitive thanks to the
algorithms and optimizations it uses.
After the parser generator is done, there is much related
extension work to do.
As a demonstration of the parser generator, I will use it
to implement a compiler.
Hyacc version 0.95 was released into the open source community on 1/25/2008.
See Sourceforge.net: Hyacc.
A notice was sent to the comp.compilers newsgroup on 2/3/2008.
There are also other related issues to address.
- As of Spring 2009, Hyacc has been extended into an LR(0)/LALR(1)/LR(1)/LR(k) parser generator. LR(k) partially works. Hyacc version 0.95 was released to sourceforge.net on April 8, 2009.
- (2007) The Latex2gDPS compiler. This compiler was created using Hyacc. It is essentially a source-to-source translator that translates LaTeX source code into gDPS source code. (gDPS stands for general Dynamic Programming Specification, designed by Holger.)
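To illustrate the claim above that LR(1) is more powerful than LALR(1), here is a minimal Python sketch. It builds the canonical LR(1) states for a small textbook-style grammar, then merges states with identical cores to obtain LALR(1) states, and reports conflicts. The grammar, the function names, and the merge-by-core approach are all illustrative assumptions; this is the plain canonical construction, not the reduced-cost algorithms Hyacc implements, and it assumes a grammar without empty productions.

```python
from collections import defaultdict

# Illustrative grammar (an assumption, not taken from Hyacc): LR(1) but not LALR(1).
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("a", "A", "d")),
    ("S", ("b", "B", "d")),
    ("S", ("a", "B", "e")),
    ("S", ("b", "A", "e")),
    ("A", ("c",)),
    ("B", ("c",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST sets, assuming no production has an empty right-hand side."""
    first = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


FIRST = first_sets()


def first_of(symbols, lookahead):
    """Terminals that can start `symbols` followed by `lookahead`."""
    if not symbols:
        return {lookahead}
    head = symbols[0]
    return FIRST[head] if head in NONTERMINALS else {head}


def closure(items):
    """Close a set of LR(1) items (production index, dot position, lookahead)."""
    result, work = set(items), list(items)
    while work:
        prod, dot, la = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot]:
                    for t in first_of(rhs[dot + 1:], la):
                        if (i, 0, t) not in result:
                            result.add((i, 0, t))
                            work.append((i, 0, t))
    return frozenset(result)


def goto(state, symbol):
    """Advance the dot over `symbol` in every item that allows it."""
    moved = {(p, d + 1, la) for p, d, la in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


def lr1_states():
    """Canonical collection of LR(1) states."""
    start = closure({(0, 0, "$")})
    states, work = {start}, [start]
    while work:
        state = work.pop()
        for sym in {GRAMMAR[p][1][d] for p, d, _ in state if d < len(GRAMMAR[p][1])}:
            nxt = goto(state, sym)
            if nxt not in states:
                states.add(nxt)
                work.append(nxt)
    return states


def merge_by_core(states):
    """LALR(1)-style merge: union states whose items agree once lookaheads are dropped."""
    merged = defaultdict(set)
    for state in states:
        merged[frozenset((p, d) for p, d, _ in state)] |= state
    return [frozenset(s) for s in merged.values()]


def conflicts(state):
    """Shift/reduce and reduce/reduce conflicts in one state, keyed by lookahead."""
    reduces, shifts, found = defaultdict(set), set(), set()
    for p, d, la in state:
        rhs = GRAMMAR[p][1]
        if d == len(rhs):
            reduces[la].add(p)
        elif rhs[d] not in NONTERMINALS:
            shifts.add(rhs[d])
    for la, prods in reduces.items():
        if len(prods) > 1:
            found.add(("reduce/reduce", la))
        if la in shifts:
            found.add(("shift/reduce", la))
    return found
```

For this grammar the LR(1) states are conflict-free, while merging by core joins the two states reached after "a c" and "b c" and produces reduce/reduce conflicts on "d" and "e". The price of LR(1) is the larger number of states, which is the cost the paragraph above refers to.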
Bioinformatics and machine learning
General statement:
I have been working with Dr. Guylaine Poisson on some bioinformatics
projects, mostly to create bioinformatics tools with a web interface.
I also worked on an experimental project with Dr. Susan Still on a
reinforcement learning algorithm.
Since machine learning is used frequently in
bioinformatics, I put these two together.
- (Spring 2007) FragAnchor Database and BLAST tool for the Integr8
Eukaryota genomes. This individual directed research project with Dr. Poisson
downloaded the Integr8 genomes into
a local MySQL database, then applied the NN/HMM application (see below) to obtain
predicted GPI anchors and stored the results in the database. Next, a BLAST tool was
downloaded from NCBI and installed. A Perl CGI web interface was then
deployed to allow users to search and BLAST their proteins against the database.
See web interface.
- (Fall 2006) A Java Applet/Application that demonstrates the reinforcement
learning (Q-learning) algorithm. This is a directed research project
with Dr. Still. Its purpose is to simulate reinforcement learning.
More ...
- (Fall 2005) Automation and web interface for the GPI-anchor prediction NN/HMM
system. The NN/HMM tandem system was developed by Dr. Poisson et al. My role
was to develop a Perl CGI that glues together the NN and HMM applications
and provides a web interface for its automation.
See web interface. This work was also incorporated into the annotation pipeline of the Barton Bioinformatics research group in February 2008.
- (Spring 2005) NADPH Oxidase membrane spanning region determination
survey. This is a computational biology course project.
More ...
- (Summer 2001) Perl tools that help process genome sequencing experiment
data. Summary (PDF).
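For readers unfamiliar with the Q-learning algorithm mentioned in the Fall 2006 project above, here is a minimal tabular sketch in Python. It is not the Java applet's code: the five-state line world, the reward scheme, the function names, and the parameter values are all assumptions chosen for illustration.

```python
import random

N_STATES = 5            # hypothetical world: states 0..4 on a line
LEFT, RIGHT = 0, 1
GOAL = N_STATES - 1     # reaching state 4 ends the episode


def step(state, action):
    """Assumed environment: move one cell; reward 1.0 only on reaching the goal."""
    nxt = max(0, state - 1) if action == LEFT else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


def q_update(q, state, action, reward, nxt, alpha, gamma):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = reward + gamma * max(q[nxt])
    q[state][action] += alpha * (target - q[state][action])


def train(episodes=200, alpha=0.5, gamma=0.9, seed=0):
    """Learn Q-values while acting uniformly at random (Q-learning is off-policy)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.choice((LEFT, RIGHT))
            nxt, reward = step(state, action)
            q_update(q, state, action, reward, nxt, alpha, gamma)
            state = nxt
    return q
```

Calling `train()` returns a table in which, for the state next to the goal, moving right is valued higher than moving left. A practical agent would usually replace the purely random behaviour with an epsilon-greedy policy.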
Development
Some of the projects I worked on are included here.
An open source Data Structure & Algorithm library in C
- (Summer 2007 ~ ) This started out to meet my own needs when writing
programs for research, so that I could write code once and reuse it.
I now plan to expand its coverage whenever I have the chance.
This library uses the BSD license, so the code can be
used in both open source and commercial software as long as the
contribution of the author (me) is acknowledged. This library is
provided "AS IS" and without any warranty.
So far the implementations are based on at least seven algorithm books.
Please report bugs and comments to chenx at hawaii dot edu.
Click here for listing or here for Catalog.
- One fun project included with the stat module is a C program that automates the WordChallenge game on Facebook using brute force. An un-optimized version: source, DOS executable. A version optimized for speed is here: source, DOS executable.
Code generation - RAD tools
- (Fall 2003 ~ Spring 2005) ASP web application generator. This was a personal-interest project. It also helped with my RA work
by saving a lot of time on web site development.
Two versions, one in ASP (5000 lines of code), one in VB.NET (35,000 lines of code). More ...
- (Summer ~ Fall 2005) RA work. SharePoint-based site generator. More ...
Spell checking
- (Summer ~ Fall 2007) This is part of my current RA work. The engine is developed
as a COM server in plain C. It uses some basic data structures
(AVL tree, heap, hash table) and algorithms (Levenshtein distance by dynamic programming,
Soundex, etc.).
So far I have three versions: a console version that runs under both
Windows and Solaris, a DLL COM version and an EXE COM version. The
COM server can be used in any COM-enabled environment,
e.g., C/C++, VB, VC, .NET, Delphi, VBScript, ASP. The current spell checker
includes clients in C, VBScript, C#, ASP and Ajax. The primary
client is in C#. Some literature research on spell checking
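The Levenshtein dynamic programming mentioned above can be sketched as follows. This is a generic illustration in Python, not the C code of the COM engine; the `suggest` helper, its `max_distance` parameter, and the sample word list are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) using two rows of the DP table."""
    prev = list(range(len(b) + 1))          # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,            # delete ca
                            curr[j - 1] + 1,        # insert cb
                            prev[j - 1] + cost))    # substitute or match
        prev = curr
    return prev[-1]


def suggest(word, dictionary, max_distance=2):
    """Hypothetical helper: dictionary words within max_distance, closest first."""
    scored = sorted((levenshtein(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]
```

For example, `suggest("speling", ["spelling", "spoiling", "apple"])` ranks "spelling" (distance 1) ahead of "spoiling" (distance 2) and drops "apple". Scanning the whole dictionary is linear in its size, which is one reason an engine might first narrow candidates with a phonetic key such as Soundex or a hash table lookup.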
Web crawling and information retrieval
- (Fall 2007 ~ ) This is also needed for part of my RA job. The work was mainly in multithreaded C# with
recursive descent parsing of HTML.
A short memo was written on this topic.
Web application development
- (2000 ~ ) These are mostly for my RA job. There were at least five production
web applications in ASP/MS SQL/IIS/Windows, one in PHP/MySQL/Apache/Linux,
and miscellaneous small projects in Perl CGI, ASP.NET, etc. Newer concepts such as
Web 2.0 (tags, RSS), Ajax and web services were sometimes used.
My technology blog
- Here is my technology blog, started in Summer 2008. There is not much content yet. Currently it includes material such as a Linux (SUSE 10.3) setup note, C# screen capture, simulating mouse and keyboard events in C#, converting PDF to image formats in C#, etc. Source code is available and ready for use.
Misc. readings and literature review
Last updated: 5/28/2009
Conferences
Compiler conferences
2009
- POPL 2009 (abstract: 7/8/08, paper: 7/15/08. Conference: 1/21-23/09)
2008
Others
List of conferences
- http://www.cs.ubc.ca/~zrakamar/conferences.html
- http://www.cs.txstate.edu/~aq10/links/confs.html
- http://www.ida.liu.se/~davbr/conf.html