People


Atina Brooks, M.S. (Statistics) is a graduate student in the Department of Statistics at North Carolina State University. Prior to pursuing graduate studies in Statistics, Brooks was an engineer at IBM. As part of Global Services, she worked on a wide variety of projects, including an enterprise automotive quality and statistical reporting system. Her honors include an NSF-VIGRE fellowship for graduate study, a SAMSI Student Fellow Award for the program on Data-Mining and Machine Learning, a Dean's Fellowship in Engineering, and IBM's Golden Circle award.

Moody Chu, Ph.D. (Mathematics) is a world-renowned researcher in matrix analysis, structural dynamical systems and their applications. His pioneering research on continuous realization methods, which have a broad range of applications, led him to serve on the editorial board of the prestigious SIMAX journal for 12 years. His recent monograph, "Inverse Eigenvalue Problems: Theory, Algorithms, and Applications," published by the Oxford University Press, is recognized as the first authoritative treatise on this subject. Dr. Chu has studied matrix inverse problems extensively, where the goal is to develop the theoretical understanding and numerical techniques needed to validate, determine, or estimate the parameters of a system according to its observed or expected behavior. His expertise is relevant to the drug discovery project in several respects, including optimization, structured low rank approximation, nonnegative matrix decompositions, and data mining by LSI techniques. His current NSF-sponsored project studies the centroid decomposition, which approximates the SVD but is easily obtainable and maintains simple interpretability. The technique is closely connected to the drug discovery project.

Douglas M. Hawkins, Ph.D. (Statistics) has a wealth of experience in multivariate analysis and outlier detection, as indicated in his books Identification of Outliers, and the edited work Topics in Applied Multivariate Analysis. Both contain many insights that remain relevant today. Numerous publications in the areas of outlier identification and robust estimation have developed ideas applicable to drug discovery, including a recent PNAS paper that described an approach for dealing with large data arrays containing missing information and outliers in unknown locations. One thread has been better algorithms for dealing with the combinatorial explosion that arises in multiple-outlier settings. Several of these research areas have involved the writing of software, which Hawkins has made freely available on his web site. Hawkins has twice won the American Statistical Association’s ‘Statistics in Chemistry’ award. One award was for a paper on the use of recursive partitioning in drug discovery; the other was for an evaluation of methods of model validation. His 2004 Shewhart Medal and 2005 William Hunter Award recognize a corpus of work related to a broad array of quality-related problems.

Gary W. Howell, Ph.D. (Numerical Linear Algebra and Parallel Computation) is an expert on developing algorithms and speeding computations. He has developed and improved several matrix decompositions, including a reduction of a general square matrix to a similar small-band Hessenberg form, a Krylov subspace computation based on reduction of sparse matrices to a similar Hessenberg form, and most recently Householder bidiagonalization. Householder bidiagonalization is the computationally expensive part of determining matrix singular values, so by speeding this component, Dr. Howell’s algorithms speed computation of the SVD by a factor of almost two without sacrificing stability. He has worked with parallel computation since the 1980s: he has served on the MPI standards board and the BLAS standards board, helped extend the LAPACK libraries, and taught parallel computing and numerical analysis at the graduate level for the last fifteen years. His primary work for the last three years has been helping researchers on interdisciplinary teams implement and improve parallel codes and optimize computations.

Jacqueline M. Hughes-Oliver, Ph.D. (Statistics) has worked with high throughput screening datasets since 1999. Until 2002, Dr. Hughes-Oliver’s work in drug discovery was exclusively on mixture experiments, where multiple compounds are mixed together in solvent and a single test is applied to the combined sample. Very often little or no information is obtained for individual compounds in the mixture, which makes it very difficult to build effective QSAR models. The difficulty is compounded by within-mixture interactions, which add complexity to any QSAR that may exist for individual compounds. In more recent years, Dr. Hughes-Oliver has also considered drug discovery problems not involving mixtures, including a design-of-experiments approach to improve the performance of a cell-based QSAR modeling approach and a specialized recursive partitioning or tree-based approach to building QSAR models that works by simultaneously considering multiple descriptors for creating a single split in the tree.

Morteza G. Khaledi, Ph.D. (Analytical/Medicinal Chemistry) is developing novel QSAR models for ADME-Tox studies. This involves prediction of relevant biological measures such as intestinal absorption, skin permeability, permeability through the blood-brain barrier, and blood-tissue partition coefficients. Various structural and physico-chemical properties such as lipophilicity, hydrogen-bond accepting and donating properties, size, rigidity, and polar surface area are believed to play important roles in these biological processes. There is, however, disagreement, and sometimes conflicting reports, about the significance of the various molecular properties identified as important and about their adherence to chemical principles. One issue is that these studies are often based on QSAR models that have been derived using a limited number of drugs in the training set. Another issue that is often ignored in variable selection for QSAR models is the existence of collinearity between various molecular descriptors; this introduces redundant information into the QSAR and compromises the robustness of such models. Yet another critical issue is the oft-made assumption of linear dependence of biological response on molecular descriptors. A focus of Dr. Khaledi’s research has been to define and develop a more accurate and biologically relevant scale for quantitation of the lipophilicity of molecules in QSAR models.

Lexin Li, Ph.D. (Statistics) has research interests in sufficient dimension reduction, bioinformatics, statistical machine learning, and biostatistics. He obtained a B.E. in Electrical Engineering from Zhejiang University, P.R. China, in 1998 and a Ph.D. in Statistics from the School of Statistics, University of Minnesota, in 2003. Dr. R. Dennis Cook and Dr. Christopher J. Nachtsheim were his advisors for his thesis, Sufficient Dimension Reduction for High-dimensional Data. He worked as a Postdoctoral Researcher in Dr. P.J. Hagerman's lab, School of Medicine, University of California, Davis. He is now an Assistant Professor in the Department of Statistics at North Carolina State University.

Liz Nelson, M.S. (Statistics) is a graduate student in the Department of Statistics at North Carolina State University. She graduated from Whitman College with a B.A. in Mathematics in 2004. While at NCSU, Nelson has taught for 3 semesters, including an introductory statistics course of approximately 65 students. Additionally, she has helped with the NIH-funded SIBS program, to promote biostatistics among undergraduate students. Nelson plans to begin dissertation work in the fall of 2007, within the realm of biostatistics.

Raymond T. Ng, Ph.D. (Computer Science) is internationally known in the data mining research community. One of his most cited studies is the paper describing development of a scalable k-medoids clustering algorithm designed for large databases. This algorithm has inspired many recent attempts in the same direction. Consequently, in the past few years, the conference version has been cited in almost every data mining paper where clustering is a relevant topic. Furthermore, the software developed has become a standard benchmark. According to Google Scholar, the paper has been cited 483 times since March 1, 2005. Dr. Ng is also well known for two other lines of work. One focuses on exploratory mining of associations with constraints (two papers cited over 300 times since March 1, 2005) and the other focuses on distance-based outliers (two papers cited over 200 times since March 1, 2005). Dr. Ng has recently won two best paper awards: the 2001 ACM SIGKDD best paper award (ACM SIGKDD is the most prestigious data mining conference in the world) and the 2004 ACM SIGMOD best paper award (ACM SIGMOD is one of the two most prestigious database conferences worldwide).

Kirtesh Patil, B.S. (Computer Science and Engineering) is a graduate student in the Department of Computer Science at North Carolina State University. Prior to pursuing graduate studies, Patil was a software engineer at Hewlett Packard. He has worked on storage server development projects.



William J. Welch, Ph.D. (Statistics) has worked on statistical methods for high-throughput screening data since 1998, largely in partnership with chemists and statisticians at GlaxoSmithKline. He has collaborated closely with Professor Hugh Chipman, who is now Canada Research Chair at Acadia University. Both were, until recently, at the University of Waterloo, and co-supervised a team of one PhD student and two Master’s students there. These students worked on methods for relevant subsets of descriptor variables, a method of pruning classification trees to facilitate identification of multiple activity mechanisms, and on methods for selecting important descriptors to avoid the “curse of dimensionality” of support vector machines. Another Ph.D. student moved with Welch to the University of British Columbia, and is working on methods for benchmarking statistical methods and data sets; see http://hajek.stat.ubc.ca/~will/. Two further students are working on clustering methods for drug discovery data. Part-time student Raymond Lam, an employee of GlaxoSmithKline, was jointly supervised by Dr. Young and Dr. Welch. His research on methods for selecting representative molecules from large chemical libraries led to the 2000 Statistics in Chemistry Award.

S. Stanley Young, Ph.D. (Statistics) has worked on statistical aspects of biology and chemistry problems since graduating from North Carolina State University, BS (1966), MES (1968) and Ph.D. in Statistics and Genetics (1974). While at Eli Lilly & Co, 1972-1987, he was the lead statistician in a large Toxicology division. In 1993 he published with Peter Westfall the highly cited book on multiple testing, Resampling-Based Multiple Testing. While at what is now GlaxoSmithKline, 1987-2002, he worked on statistical aspects of drug discovery. He led internal and external research teams developing novel statistical methods for drug discovery, work that resulted in two issued patents for drug discovery statistical algorithms and over twenty papers on the application of statistical methods to drug discovery. He was named a Fellow of the American Statistical Association in 1990 and has won five “best paper” awards. He is an adjunct professor of statistics at three major universities where he helps direct graduate student thesis topics. He is also Assistant Director for Bioinformatics at the National Institute of Statistical Sciences and Lead Statistician at Metabolon.

Qianyi Zhang, M.S. (Statistics) is a graduate student in the Department of Statistics at North Carolina State University. She graduated from Columbia University with an M.S. degree in 2003 and worked on financial data modeling. In 2004 Zhang began her Ph.D. studies at NC State under the supervision of Dr. David A. Dickey and Dr. Sastry Pantula. Her research focuses on the seasonal unit root test. Zhang is a SAS certified programmer.

Pankaj Chopra, M.S. (Computer Science) is a Ph.D. student in the Department of Computer Science at North Carolina State University. His current research involves data mining in gene expression datasets. More specifically, it involves the identification of biomarkers that may be involved in various types of human cancers.



Ravish Karki Narayan, B.E. (Computer Science and Engineering) is currently pursuing his Master's in Computer Science at North Carolina State University. Prior to this, Narayan was a Software Engineer at IBM. As part of the IBM-KPMG Global Business Services team, he worked on audit and tax related financial software development. He was a recipient of the IBM-KPMG Encore Award. His current interests include TCP/IP and wireless networking.

Hui Shen, M.S. (Statistics, Computational Mathematics) is a Ph.D. student in the Department of Statistics at the University of British Columbia (UBC). She graduated from the National University of Singapore with an M.S. in Statistics in 2001. Prior to pursuing studies in Statistics, she earned an M.S. in Computational Mathematics from Nankai University in Tianjin, P.R. China, in 1994. Under the supervision of Dr. William J. Welch, her research focuses on cross validation strategies for model assessment and selection with drug discovery data.

Jessica J. Kraker, M.S. (Statistics) is a lecturer at the University of Wisconsin in Eau Claire. She is finishing her Ph.D. dissertation at the University of Minnesota with advisor Douglas M. Hawkins. Her research and dissertation focus on penalized regression models and their application to chemometric data.

© 2006 - ECCR @ NCSU
All Rights Reserved
This work was funded by the National Institutes of Health through the NIH Roadmap for Medical Research, Grant 1 P20 HG003900-01. Information on the Molecular Libraries Roadmap Initiative can be obtained from http://nihroadmap.nih.gov/molecularlibraries/