1505 Seneca Run, Ambler, PA 19002

Home: (215) 628-2798; Lab: (215) 214-4261; Fax: (215) 728-3574


Summary of Qualifications:

  Bioinformatics and Computational Biology
    • Domain knowledge of molecular biology and biochemistry.
    • Bioinformatics algorithm development and application implementation.
    • Machine learning and data mining.
    • Analysis of array data and other large datasets.
    • Sequence space searching, sequence alignment/analysis.
    • Protein structural bioinformatics.
    • Deployment and maintenance of bioinformatics tools and applications.
  Computational Chemistry
    • Ph.D. in Computational/Theoretical Chemistry.
    • Protein structure prediction: fold recognition, homology modeling, and ab initio simulation.
    • Computational sampling of sequence and structural space.
    • Docking and QM/MM/MD simulation.
    • Energy function development.
    • Structure-based protein and sequence design.
  Scientific Programming
    • C/C++, C#, Java, Perl, PHP, XML, R, Matlab.
    • High-performance computing (HPC) with C++ and MPI.
    • Relational databases and SQL programming.
    • Web servers and web services.
    • Development environments: Linux/Unix, Linux clusters, and Windows.
  • Strong communication and interpersonal skills.


    Education:

    Ph.D., Theoretical Chemistry, Jilin University, China, June 1994
    M.S., Theoretical Chemistry, Jilin University, China, June 1991
    B.Sc., Physics, Inner Mongolia University, China, June 1988


    Work Experience:

    2006-present: Sr. Program Analyst II at Fox Chase Cancer Center

  • Protein homology modeling refinement (PSI-funded project with Drs. Roland Dunbrack at FCCC and David Baker at the University of Washington; 67% effort).

    Abstract: Develop new statistical energy functions and apply them in the Rosetta protein modeling package. Evaluate and optimize Rosetta protocols based on the new energy functions for homology modeling, ab initio prediction, and sequence and structure design. Integrate SDPro into MolIDE2 to provide a more powerful homology modeling platform. SDPro is an in-house software package that integrates all my individual algorithms and applications for sequence space searching, sequence alignment/analysis, structure analysis, homology modeling, algorithm benchmarking, and data mining.

    Technologies and tools: Rosetta, homology modeling, .NET/C#, Mono, C++, Fortran, MPICH2, Perl.

  • HPC programming and consulting (33% effort).

    Abstract: Provide HPC support for research at Fox Chase Cancer Center, including molecular simulation, docking, drug screening and design, microarray data analysis, and other structural biology and bioinformatics studies.

    Technologies and tools: Schrödinger/Glide, Amber, CHARMM, Modeller, Rosetta, NMF, C++, MPICH2, Perl, PHP.

    2005: Research Associate in the Bioinformatics Group at Fox Chase Cancer Center

  • Developed a novel algorithm (LSNMF) for microarray data analysis.

    Abstract: Non-negative matrix factorization (NMF) is a machine learning algorithm. I developed a novel variant of NMF (LSNMF) that makes use of uncertainty estimates for the data. LSNMF significantly improved the ability of NMF to link functionally related genes, and it is also much more stable than NMF during simulation. I implemented a C++ package for NMF and LSNMF, including an HPC version.

    Technologies and tools:  C++, LAM/MPI, Perl, Matlab, biostatistics.


  • Provided routine facility support for the entire Fox Chase research community.

    Technologies and tools: C/C++, Perl, R, Matlab, NCBI toolkit and databases.  

    2004: Research Associate in Dr. Roland Dunbrack's lab at Fox Chase Cancer Center

  • Adapted and implemented several Bayesian statistical algorithms and applied them to bioinformatics studies and data mining.

    Abstract: Dirichlet process mixture models and the corresponding simulation techniques are a well-developed statistical methodology for classification and pattern recognition problems. I adapted Neal's various implementations to derive Dirichlet mixtures for amino acid counts in multiple alignment columns and in genome sequence domains, and used the derived Dirichlet mixture models to develop new sequence alignment and sequence segmentation algorithms.

    Technologies and tools: C/C++, Perl, Bayesian statistics, HMM, MCMC.

  • Implemented a comprehensive software package for sequence alignment and homology modeling (SDPro).

    Abstract: Integrated all related sequence alignment software components into a single in-house platform for algorithm development in sequence alignment and homology modeling.

    Technologies and tools: C/C++, Perl, PHP, Java, HTML/Javascript, CGI, LAM/MPI, MySQL

  • Homology modeling in CASP6.

    Abstract: I was the key member of the assessor group that evaluated fold recognition algorithms in CASP6. CASP is the most prestigious international contest for algorithm developers in protein structure prediction. I introduced a new scoring system to judge algorithm performance and developed a pipeline for large-scale data visualization.

    Technologies and tools: sequence and structure alignment, Perl, csh, Grace, InsightII, Rasmol, Modeller.  

    2000-2003: Post-doctoral fellow in Dr. Roland Dunbrack's lab at Fox Chase Cancer Center

  • Developed a novel profile-profile alignment algorithm for genome annotation and fold recognition.

    Abstract: Developed a new structure-dependent profile-profile alignment algorithm that integrates multiple scoring systems to yield more accurate and more specific results. The algorithm combines a position-specific profile-profile alignment score with a location-dependent secondary structure alignment score, making use of as much of the available sequence and structure information as possible. It was shown to be more powerful than the other profile-profile alignment algorithms available at the time.

    Technologies and tools: C++, Perl, PsiPred, 3D-profile, HMM, MySQL.

  • Established a benchmark system for algorithm evaluation and parameter optimization.

    Abstract: Set up a comprehensive benchmark system based on the SCOP database, containing a total of 1,627 representative SCOP entries, 3,442 true alignment pairs for training and testing alignment accuracy, and 2,207,890 decoys for testing homology detection. With it, I tested 7 different scoring functions, 3 approaches for converting MSAs to PSSMs, 3 gap penalty schemes, and 3 different ways of building MSAs from sequence space searches.

    Technologies and tools: C++, Perl, NCBI toolkit, HMM, secondary structure prediction.

  • Built structural models of the severe acute respiratory syndrome coronavirus (SARS-CoV) Mpro protease and identified the key residues and interactions involved in protease specificity.

    Technologies and tools: NCBI toolkit, InsightII, Modeller, Loopy.

  • Web server and biological database design and implementation.
  • PISCES (
  • CASA (
  • S2C (
  • Technologies and tools: Perl, CGI, Php, Java, MySQL, HTML/Javascript, XML.

  • CASP5 contest (ranked among the top 8 groups in the world for homology modeling).

    Abstract: I was the key player in this contest; my major role was building sequence alignments for all targets against known PDB structures.

    Technologies and tools: NCBI toolkit, Perl, C++, shell script, LAM/MPI.

    1999-2000: Post-doctoral fellow in Dr. Huan-Xiang Zhou's lab at Drexel University

  • Developed a new fold recognition algorithm (COBLATH) and implemented the corresponding software package.
  • Implemented an automatic pipeline system for large-scale genome annotation (32 whole genomes annotated).
  • Developed an automatic pipeline system for large-scale homology modeling (applied to the whole yeast genome).
  • CASP4 contest (ranked among the top 6 groups in the world for fold recognition).

    1996-1999: Associate professor at Beijing Institute of Biotechnology, China.

  • Designed two new tumor necrosis factor (TNF) mutants that were experimentally validated to bind specifically to one TNF receptor but not the other.
  • Built structural models of tissue-type plasminogen activator (t-PA) and identified the key residues for its biological functions.
  • Established several protocols for antibody humanization design.
  • Designed several experimentally validated peptide mutants of mastoparan (MP), an antagonist of lipopolysaccharide (LPS).

    1997-1999: Support scientist for the Biosym/MSI (Accelrys) simulation system, NeoTrident Technology Limited, China.

  • Technical troubleshooting and user training.

    1994-1996: Post-doctoral fellow at Beijing Institute of Biotechnology, China.

  • Established the first molecular design laboratory in Beijing Institute of Biotechnology.
  • Developed sets of protocols for structure simulation and molecular design.


    Reviewer for BMC Bioinformatics and the Journal of Biomedical Informatics.