I have been working for some years in this field to develop a general integrated Bayes/likelihood theory of statistical inference. The theory is based on two main points: that the multinomial model and Dirichlet prior provide a general "always true" model and analysis for sample data, and that Bayesian model comparisons are based not on Bayes factors but on the full posterior distribution of the likelihood ratio between the models.
The first point implies that parametric models are a convenience, not a necessity. An example of a full Bayesian two-level variance component analysis with a regression model using only the multinomial model and Dirichlet prior is given in the Journal of Official Statistics 2008.
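As a minimal illustration of the first point (our own sketch, not the two-level analysis in the 2008 paper), the population mean can be given a full posterior distribution without any parametric model for the data. The sketch assumes the limiting Dirichlet prior with all parameters zero. Under that prior, the posterior for the probabilities on the observed distinct values is Dirichlet with the observed counts. The function names and data are hypothetical.

```python
# Sketch under stated assumptions; function names and data are hypothetical.
import random
from collections import Counter


def dirichlet_draw(alphas, rng):
    """One draw from a Dirichlet distribution via normalised gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]


def posterior_mean_draws(data, n_draws=5000, seed=1):
    """Posterior draws of the population mean sum_k p_k * y_k.

    The cell probabilities p_k on the observed distinct values y_k get a
    Dirichlet(n_1, ..., n_K) posterior, which is what the assumed limiting
    Dirichlet(0, ..., 0) prior produces.
    """
    rng = random.Random(seed)
    counts = Counter(data)
    values = list(counts)
    alphas = [counts[v] for v in values]
    draws = []
    for _ in range(n_draws):
        p = dirichlet_draw(alphas, rng)
        draws.append(sum(pk * yk for pk, yk in zip(p, values)))
    return draws


data = [2, 3, 3, 5, 7, 7, 7, 8, 10, 12]  # hypothetical sample
draws = sorted(posterior_mean_draws(data))
lo, hi = draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))]
print(f"posterior median {draws[len(draws) // 2]:.2f}, "
      f"95% interval ({lo:.2f}, {hi:.2f})")
```

Other functions of the population distribution, such as a median, a variance or a regression coefficient, can be simulated the same way by evaluating them at each Dirichlet draw.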
Parametric model analyses need to be validated against the general multinomial model. The second point allows weak or noninformative prior distributions to be used for this validation, and for model comparisons generally. This is not possible in the Bayes factor approach, because Bayes factors are indeterminate under improper priors and depend strongly on the scale of diffuse proper ones.
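The normal mean with known variance is a standard illustration of the Bayes factor difficulty. It is chosen here for concreteness and is not taken from the papers cited. For n observations from N(μ, σ²), compare H₀: μ = 0 with H₁: μ ~ N(0, τ²), and write z = √n ȳ / σ. The Bayes factor in favour of H₀ is then

$$
B_{01} = \sqrt{1 + \frac{n\tau^2}{\sigma^2}}\;\exp\!\left(-\frac{z^2}{2}\,\frac{n\tau^2}{\sigma^2 + n\tau^2}\right).
$$

For fixed data, the exponential factor tends to exp(−z²/2) as τ² → ∞, while the square-root factor grows without bound, so B₀₁ → ∞. A sufficiently diffuse prior under H₁ therefore makes the data appear to support H₀ whatever the value of z.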
My original work, which generalized Dempster's result for simple null hypotheses, can be found in Statistics and Computing 1997. In an extended paper with Richard Boys and Tom Chadwick in Statistics and Computing 2005, based on Tom Chadwick's PhD thesis at Newcastle, the numerical integrations used in the 1997 paper were replaced by simple posterior Monte Carlo simulations, and several examples were given. An application with Charles Liu to multiple model comparisons in a psychological study can be found in the Journal of Mathematical Psychology 2008. An application with Charles Liu and Tom Chadwick to two-level models, where there are several upper-level models for the random effect, can be found in the Annals of Applied Statistics 2009. A general book-length treatment has been published by Chapman and Hall/CRC 2010. Current applications include a new unit-root test.
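The posterior Monte Carlo idea can be sketched for the simple-null case under simplifying assumptions of our own. The setting is multinomial counts n_k, a fully specified null vector p⁰, and a Dirichlet posterior for the unrestricted cell probabilities p. Each posterior draw of p gives a draw of the likelihood ratio LR = L(p⁰)/L(p) = ∏ₖ (p⁰ₖ/pₖ)^{nₖ}. Posterior probabilities such as Pr(LR < 1 | data) then summarise the evidence. The uniform Dirichlet(1) prior, the function names and the counts are hypothetical choices for illustration, not the settings of the papers cited.

```python
# Sketch under stated assumptions: uniform Dirichlet(1) prior by default;
# function names and data are hypothetical.
import math
import random


def dirichlet_draw(alphas, rng):
    """One draw from a Dirichlet distribution via normalised gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]


def posterior_lr_draws(counts, p0, prior=1.0, n_draws=10000, seed=1):
    """Posterior draws of log LR = log L(p0) - log L(p), p ~ Dirichlet(counts + prior)."""
    rng = random.Random(seed)
    alphas = [n + prior for n in counts]
    out = []
    for _ in range(n_draws):
        p = dirichlet_draw(alphas, rng)
        # multinomial coefficients cancel in the ratio; empty cells contribute nothing
        out.append(sum(n * (math.log(q) - math.log(pk))
                       for n, q, pk in zip(counts, p0, p) if n > 0))
    return out


counts = [18, 30, 52]      # hypothetical observed counts
p0 = [0.25, 0.25, 0.50]    # hypothetical fully specified null
log_lr = posterior_lr_draws(counts, p0)


def prob_lr_below(c):
    """Monte Carlo estimate of Pr(LR < c | data)."""
    return sum(x < math.log(c) for x in log_lr) / len(log_lr)


print(f"Pr(LR < 1 | data) = {prob_lr_below(1):.3f}")
print(f"Pr(LR < 0.1 | data) = {prob_lr_below(0.1):.3f}")
```

Working on the log scale avoids underflow when the counts are large.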
This work has been supported by several grants from the Australian Research Council and the US National Center for Education Statistics.
My current interests are in Bayesian computing, especially large-scale posterior simulation for all kinds of parametric functions. A major application is to posterior distributions of likelihoods and likelihood ratios for Bayesian model comparison. Several examples of this use are given in the paper with Richard Boys and Tom Chadwick in Statistics and Computing 2005, and a general book-length treatment has been published by Chapman and Hall/CRC 2010.
Another interest is in speeding up MCMC computations by drawing initial parameter values from an approximate normal posterior obtained from a maximum likelihood analysis. This also helps with issues such as label switching in finite mixture models.
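Here is a minimal sketch of the starting-value step. It assumes the ML estimate θ̂ and its inverse observed information matrix are already available from some fitting routine. Each chain is started from an independent draw from N(θ̂, I(θ̂)⁻¹). The function names and numbers are hypothetical, and the sampler itself is omitted.

```python
# Sketch under stated assumptions; function names and numbers are hypothetical.
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a small symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def normal_start_values(theta_hat, cov, n_chains, seed=1):
    """One starting vector per chain, drawn from N(theta_hat, cov), where cov is
    the inverse observed information from the ML fit."""
    rng = random.Random(seed)
    L = cholesky(cov)
    d = len(theta_hat)
    starts = []
    for _ in range(n_chains):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        starts.append([theta_hat[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                       for i in range(d)])
    return starts


theta_hat = [1.2, -0.4]                # hypothetical ML estimate
cov = [[0.04, 0.01], [0.01, 0.09]]     # hypothetical inverse information
starts = normal_start_values(theta_hat, cov, n_chains=4)
for s in starts:
    print([round(v, 3) for v in s])
```

One plausible reason this helps with labelling: in a mixture model, chains started near the same ML mode tend to begin with the same assignment of labels to components.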
My earlier interests in this field were primarily in maximum likelihood model fitting, especially in extensions of generalized linear models. These extensions frequently involved unobserved random effects. The models were fitted in GLIM using EM algorithms, with Gaussian quadrature or nonparametric maximum likelihood for the random effect distribution.
A hybrid Gauss-Newton/EM algorithm for these problems was described and evaluated in the paper with Irit Aitkin in Statistics and Computing 1996. Applications were given to overdispersion, variance components and measurement error. Several applications to autoregressive-random effect longitudinal models with Marco Alfò appeared in Statistics and Computing in 1998, 2000 and 2006. A full book-length treatment of these models is given in Statistical Modelling in GLIM4 (2005), and an adaptation of this book to R is Statistical Modelling in R (2009).
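The EM approach to nonparametric maximum likelihood can be sketched for the simplest overdispersion case. The counts are Poisson with a random intercept whose distribution is left unspecified and estimated as K mass points z_k with masses π_k. This is a minimal sketch under our own assumptions: no covariates, crude evenly spaced starting points and a fixed convergence tolerance. It is not the GLIM macros or the hybrid Gauss-Newton/EM algorithm mentioned above, and the function name and data are hypothetical.

```python
# Sketch under stated assumptions; function name and data are hypothetical.
import math


def npml_poisson(y, K=3, n_iter=500, tol=1e-8):
    """EM for a Poisson model whose random intercept has an unspecified
    distribution, estimated as K mass points z_k with masses pi_k.

    Returns (mass points z_k on the log-rate scale, masses pi_k, log-likelihood).
    """
    n = len(y)
    if K == 1:
        lam = [sum(y) / n + 0.5]
    else:
        # crude starting rates spread evenly over the range of the data
        lo, hi = min(y) + 0.5, max(y) + 0.5
        lam = [lo + (hi - lo) * k / (K - 1) for k in range(K)]
    pi = [1.0 / K] * K
    old_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability that observation i comes from mass point k
        w, ll = [], 0.0
        for yi in y:
            logs = [math.log(pi[k]) + yi * math.log(lam[k]) - lam[k]
                    - math.lgamma(yi + 1) for k in range(K)]
            m = max(logs)
            s = sum(math.exp(v - m) for v in logs)
            ll += m + math.log(s)
            w.append([math.exp(v - m) / s for v in logs])
        if ll - old_ll < tol:
            break
        old_ll = ll
        # M-step: masses are average weights; each rate is a weighted mean
        for k in range(K):
            wk = sum(wi[k] for wi in w)
            pi[k] = max(wk / n, 1e-300)
            lam[k] = max(sum(wi[k] * yi for wi, yi in zip(w, y)) / max(wk, 1e-300), 1e-10)
    return [math.log(rate) for rate in lam], pi, ll


y = [0, 0, 1, 0, 2, 1, 0, 9, 11, 8, 10, 12, 0, 1, 7]  # hypothetical overdispersed counts
for K in (1, 2, 3):
    z, pi, ll = npml_poisson(y, K)
    print(K, [round(v, 2) for v in z], [round(p, 2) for p in pi], round(ll, 2))
```

A common practical rule is to increase K until the maximised log-likelihood stops increasing. Choosing the number of components is the finite mixture question discussed below.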
The 1989 book Statistical Modelling in GLIM was revised for Oxford University Press. The second edition, Statistical Modelling in GLIM4 2005, contained a substantial expansion of likelihood theory, a summary of Bayes theory, and three new chapters on finite mixtures, overdispersion and variance component models. This book was adapted for R with Ross Darnell: Statistical Modelling in R 2009.
Random effect models provide a very general extension of exponential family and other statistical models. Early maximum likelihood analyses were restricted either to conjugate random effect distributions, which allow analytic likelihoods, or to approximate quasi-likelihoods. These restrictions were removed by estimating the random effect distribution nonparametrically. The resulting estimate is a discrete distribution on a finite number of mass points, so the problem becomes a general finite mixture maximum likelihood problem. The range of extensions possible with random effects is surprisingly large.

Finite Mixtures
An intractable difficulty with these finite mixture models has been determining the number of components needed in the mixture. The posterior likelihood approach to Bayesian model comparisons gives a new solution to this problem, which is described in the book published by Chapman and Hall/CRC 2010. Applications of this approach to simpler model comparison problems can be found in Statistics and Computing 2005, Journal of Mathematical Psychology 2008, and Annals of Applied Statistics 2009.
Maximum likelihood in finite mixtures has been well developed algorithmically through EM since the 1977 paper of Dempster, Laird and Rubin. Difficulties remain in identifying multiple local maxima and in deciding how many components the mixture needs. The latter problem has so far resisted theoretical treatment, because the usual chi-squared asymptotics of the likelihood ratio test fail when the smaller model lies on the boundary of the larger model's parameter space. It has generally been handled computationally by bootstrapping the distribution of the likelihood ratio test statistic (McLachlan, Applied Statistics 1987). The bootstrap test was implemented in GLIM macros and used in the second (GLIM4) edition of our book, Statistical Modelling in GLIM4 2005. It was also implemented as an R procedure in the R edition, Statistical Modelling in R 2009.
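The parametric bootstrap test for the number of components can be sketched for the simplest case: one versus two components in a Poisson mixture. The procedure has three steps:

1. Fit both models.
2. Simulate B data sets from the fitted one-component model and refit both models to each.
3. Compare the observed likelihood ratio statistic with its bootstrap distribution.

The Poisson components, the starting values, the fixed number of EM iterations, the function names and the data are assumptions of this sketch. It is not the GLIM macros or R procedure mentioned above.

```python
# Sketch under stated assumptions; function names and data are hypothetical.
import math
import random


def poisson_loglik(y, lam):
    lam = max(lam, 1e-10)
    return sum(yi * math.log(lam) - lam - math.lgamma(yi + 1) for yi in y)


def fit_two_poisson(y, n_iter=300):
    """EM for a two-component Poisson mixture; returns the log-likelihood
    after n_iter iterations."""
    n, ys = len(y), sorted(y)
    half = n // 2
    # start the two rates from the lower and upper halves of the data
    lam = [max(sum(ys[:half]) / half, 0.1), max(sum(ys[half:]) / (n - half), 0.2)]
    pi = 0.5
    ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each observation
        r, ll = [], 0.0
        for yi in y:
            u = math.log(pi) + yi * math.log(lam[0]) - lam[0]
            v = math.log(1 - pi) + yi * math.log(lam[1]) - lam[1]
            m = max(u, v)
            s = math.exp(u - m) + math.exp(v - m)
            ll += m + math.log(s) - math.lgamma(yi + 1)
            r.append(math.exp(u - m) / s)
        # M-step: mixing proportion and weighted-mean rates
        s0 = sum(r)
        pi = min(max(s0 / n, 1e-6), 1 - 1e-6)
        lam[0] = max(sum(ri * yi for ri, yi in zip(r, y)) / max(s0, 1e-12), 1e-10)
        lam[1] = max(sum((1 - ri) * yi for ri, yi in zip(r, y)) / max(n - s0, 1e-12), 1e-10)
    return ll


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def lrt_stat(y):
    """Likelihood ratio statistic, two components versus one (clamped at zero)."""
    return max(0.0, 2 * (fit_two_poisson(y) - poisson_loglik(y, sum(y) / len(y))))


def bootstrap_lrt(y, B=99, seed=1):
    rng = random.Random(seed)
    lam1 = sum(y) / len(y)  # fitted one-component model used to simulate
    observed = lrt_stat(y)
    exceed = sum(lrt_stat([poisson_draw(lam1, rng) for _ in y]) >= observed
                 for _ in range(B))
    return observed, (exceed + 1) / (B + 1)


y = [0, 0, 1, 0, 2, 1, 0, 9, 11, 8, 10, 12, 0, 1, 7]  # hypothetical counts
stat, p_value = bootstrap_lrt(y)
print(f"LRT = {stat:.2f}, bootstrap p-value = {p_value:.3f}")
```

The (B + 1) denominator is the usual Monte Carlo p-value convention, so with B = 99 the smallest attainable p-value is 0.01.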
However, p-values from the bootstrap test are non-conservative. An alternative Bayesian model comparison approach gives a new solution to this problem; it is described in the book published in 2010 by Chapman and Hall/CRC Press.
Neural network models are widely used in complex chemical engineering process modelling and control, and in applications in computer science and psychology. In the conventional approach, training the network (that is, fitting the model) requires careful handling. Direct minimization of the residual sum of squares criterion results in serious over-fitting and poor prediction under cross-validation. A common approach is to fit a greatly over-parameterised model and control it with a penalty function on the number of nodes.
Work with Rob Foxall in Statistics and Computing 2003 reformulated the multi-layer perceptron model as an explicit latent variable or factor model with binary latent variables. The reformulated model can be fitted straightforwardly with an EM algorithm or an iteratively reweighted least squares algorithm. The latent variable model has a much better-behaved likelihood surface, which is more easily maximized than the sum of squares function.
The latent variable model is a special case of the mixture of experts model; the connection between the multi-layer perceptron and the mixture of experts model had not previously been recognised.
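The connection can be seen in a minimal case: a single binary hidden unit z with Pr(z = 1 | x) = σ(a + bx), and y | z ~ N(m_z, s²). This is our own sketch, with hypothetical names and simulated data. Its regression function is E(y | x) = m₀ + (m₁ − m₀) σ(a + bx). That is exactly a one-hidden-node perceptron with a logistic activation and a linear output. The model itself is a two-expert mixture of experts with a logistic gate. The sketch fits it by EM, taking one Newton (IRLS) step for the gate at each iteration. That generalized-EM shortcut is our choice, not necessarily the algorithm of the 2003 paper.

```python
# Sketch under stated assumptions; function names and data are hypothetical.
import math
import random


def sigmoid(t):
    t = max(-30.0, min(30.0, t))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-t))


def fit_two_expert(x, y, n_iter=200):
    """EM for y | z ~ N(m_z, s^2) with gate Pr(z = 1 | x) = sigmoid(a + b x).

    Returns ((m0, m1, s, a, b), log-likelihood at the start of each iteration).
    """
    n = len(y)
    med = sorted(y)[n // 2]
    low = [v for v in y if v < med] or y
    high = [v for v in y if v >= med]
    m0, m1 = sum(low) / len(low), sum(high) / len(high)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y) / n
    a = b = 0.0
    history = []
    for _ in range(n_iter):
        # E-step: r_i = posterior probability that the hidden unit is on (z = 1)
        r, ll = [], 0.0
        for xi, yi in zip(x, y):
            g = sigmoid(a + b * xi)
            f1 = g * math.exp(-(yi - m1) ** 2 / (2 * s2))
            f0 = (1 - g) * math.exp(-(yi - m0) ** 2 / (2 * s2))
            r.append(f1 / (f1 + f0))
            ll += math.log(f1 + f0) - 0.5 * math.log(2 * math.pi * s2)
        history.append(ll)
        # M-step for the two "experts": weighted means and a pooled variance
        sr = sum(r)
        m1 = sum(ri * yi for ri, yi in zip(r, y)) / sr
        m0 = sum((1 - ri) * yi for ri, yi in zip(r, y)) / (n - sr)
        s2 = sum(ri * (yi - m1) ** 2 + (1 - ri) * (yi - m0) ** 2
                 for ri, yi in zip(r, y)) / n
        # M-step for the gate: one Newton (IRLS) step of a logistic
        # regression of the fractional responses r on x
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ri in zip(x, r):
            p = sigmoid(a + b * xi)
            w = p * (1 - p)
            g0 += ri - p
            g1 += (ri - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return (m0, m1, math.sqrt(s2), a, b), history


# Hypothetical data simulated from the model itself
rng = random.Random(7)
x = [rng.uniform(-2, 2) for _ in range(400)]
y = [(2.0 if rng.random() < sigmoid(0.5 + 1.5 * xi) else 0.0) + rng.gauss(0, 0.5)
     for xi in x]
(m0, m1, s, a, b), hist = fit_two_expert(x, y)
print(f"m0={m0:.2f} m1={m1:.2f} s={s:.2f} a={a:.2f} b={b:.2f}")


def mlp_mean(xv):
    """Implied regression curve: the output of a one-hidden-node perceptron."""
    return m0 + (m1 - m0) * sigmoid(a + b * xv)
```

Adding more binary hidden units gives more experts, with a correspondingly richer gate.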
Item response models relate the probability of a correct answer on a test item to the candidate's latent ability through a logistic or probit regression. Maximum likelihood fitting of these models was first made practical using an EM algorithm, in Psychometrika (1981).
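An EM algorithm of this kind needs two quantities for each candidate in its E-step. The first is the marginal probability of the response pattern with ability integrated out. The second is the posterior weights over ability. This minimal sketch makes three assumptions: a two-parameter logistic model, N(0, 1) ability, and a simple equally spaced grid in place of Gauss-Hermite points. The item parameters and names are hypothetical.

```python
# Sketch under stated assumptions; function names and parameters are hypothetical.
import math


def quadrature_grid(n_points=21, lo=-4.0, hi=4.0):
    """Equally spaced ability points with normalised N(0, 1) weights
    (a simple stand-in for Gauss-Hermite quadrature)."""
    pts = [lo + (hi - lo) * q / (n_points - 1) for q in range(n_points)]
    wts = [math.exp(-0.5 * t * t) for t in pts]
    total = sum(wts)
    return pts, [w / total for w in wts]


def marginal_and_posterior(responses, a, b, n_points=21):
    """For one candidate's 0/1 responses under a 2PL model
    P(correct | theta) = 1 / (1 + exp(-a_j (theta - b_j))), theta ~ N(0, 1):
    returns the marginal probability of the pattern, the posterior weights on
    the quadrature points, and the posterior mean (EAP) ability."""
    pts, wts = quadrature_grid(n_points)
    joint = []
    for t, w in zip(pts, wts):
        like = 1.0
        for u, aj, bj in zip(responses, a, b):
            p = 1.0 / (1.0 + math.exp(-aj * (t - bj)))
            like *= p if u == 1 else 1.0 - p
        joint.append(w * like)
    marginal = sum(joint)
    posterior = [j / marginal for j in joint]
    eap = sum(t * q for t, q in zip(pts, posterior))
    return marginal, posterior, eap


# Hypothetical item parameters and one response pattern
m, post, eap = marginal_and_posterior([1, 0, 1, 1], a=[1.2, 0.8, 1.0, 1.5],
                                      b=[-0.5, 0.3, 0.0, 1.0])
print(f"marginal probability {m:.4f}, EAP ability {eap:.3f}")
```

Summed over candidates, these posterior weights give the expected counts at each quadrature point, which an M-step of this kind uses to re-estimate the item parameters.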
I have been working since 2003 with Irit Aitkin, in a series of research contracts with the US National Center for Education Statistics, on the development and implementation of fully model-based approaches to the analysis of the large-scale US National Assessment of Educational Progress (NAEP) surveys of primary and high-school students' achievement. We have developed multi-level model-based approaches for the clustered and stratified survey designs, examined alternative distributions for the latent ability, and formulated alternative latent class models for guessing, or "engagement" on the test items. For each of these developments, we have compared the results of the model-based approach with the current methods of analysis using both simulated and actual NAEP data.
We wrote up the results in a series of technical reports, which have been integrated into the new book by Murray and Irit Aitkin, Statistical Modeling of the National Assessment of Educational Progress, Springer 2011.
The Melbourne Psychology Department had a research group in Social Networks, the only such departmental group in Australia. It was headed by Pip Pattison, a mathematician and psychologist, who became Deputy Vice-Chancellor at Melbourne and later moved to Sydney University. As in the few such groups in US and UK universities, the data analysis approach was based on extensions of the exponential random graph model (ERGM), originally due to Holland and Leinhardt. Much of the interest was in identifying subgroups of well-connected members of a network. However, the ERGM as it was used had no simple way of representing group structure; this was done by data analysis on the links connecting the members of the network.
Statistical inference was particularly difficult because the conditional likelihood approach conditions on the sufficient statistics. Maximizing the conditional likelihood was computationally formidable, and computing standard errors was even more so. A popular alternative, similar to Approximate Bayesian Computation (ABC), had three steps. First, generate random draws from the prior distributions of the model parameters. Second, generate random networks from the drawn parameter values. Third, compare aspects of the real network with those of the simulated networks to determine which parameter values were plausible.
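Here is a minimal sketch of this simulate-and-compare scheme in its simplest rejection form. It makes three assumptions. The model is a hypothetical one-parameter random graph in which each link is present independently with probability p. The prior on p is uniform. The compared "aspects" are the link and triangle counts. The tolerance, the distance measure and all names are illustrative choices, not taken from the work described here.

```python
# Sketch under stated assumptions; model, names and data are hypothetical.
import random
from itertools import combinations


def simulate_graph(n, p, rng):
    """Random graph on n nodes: each link present independently with probability p."""
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}


def triangles(n, edges):
    """Number of closed triangles (links stored as (i, j) with i < j)."""
    return sum(1 for i, j, k in combinations(range(n), 3)
               if (i, j) in edges and (i, k) in edges and (j, k) in edges)


def abc_rejection(obs_edges, n, n_draws=3000, tol=0.25, seed=1):
    """Keep prior draws of p whose simulated networks resemble the observed one."""
    rng = random.Random(seed)
    obs = (len(obs_edges), triangles(n, obs_edges))
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # draw from the Uniform(0, 1) prior
        sim = simulate_graph(n, p, rng)
        stat = (len(sim), triangles(n, sim))
        # largest relative discrepancy between simulated and observed summaries
        dist = max(abs(s - o) / max(o, 1) for s, o in zip(stat, obs))
        if dist < tol:
            accepted.append(p)
    return accepted


n = 15
obs_edges = simulate_graph(n, 0.3, random.Random(42))  # stands in for real data
accepted = abc_rejection(obs_edges, n)
print(len(accepted), "accepted; mean p =", sum(accepted) / max(len(accepted), 1))
```

The accepted values approximate the posterior of p only to the extent that the chosen summaries and tolerance capture the information in the network.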
After some years of familiarization with this approach, and from the Bayesian perspective I was using, it became clear to me that a fully conditioned Bayesian analysis had only to deal with the covariance structure of the links. Introducing a latent class structure gave a simple representation of this covariance structure, one for which both maximum likelihood and Bayesian analyses were relatively straightforward.
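Here is a minimal sketch of one simple form of latent class representation. It assumes links are conditionally independent given the classes of the two members, as in a stochastic block model. Given the class labels, the ML estimates of the link probabilities within and between classes are observed link proportions, and the log-likelihood is a sum of Bernoulli terms. In practice the classes are unobserved and would be handled by EM or posterior simulation. Here the labels are supplied, and the network and names are hypothetical; this is not the implementation referred to on this page.

```python
# Sketch under stated assumptions; network, labels and names are hypothetical.
import math
from itertools import combinations


def block_mle(n, edges, classes, K):
    """ML estimates of the K x K link probabilities given class labels."""
    links = [[0] * K for _ in range(K)]
    pairs = [[0] * K for _ in range(K)]
    for i, j in combinations(range(n), 2):
        k, l = sorted((classes[i], classes[j]))
        pairs[k][l] += 1
        links[k][l] += (i, j) in edges
    P = [[0.0] * K for _ in range(K)]
    for k in range(K):
        for l in range(k, K):
            if pairs[k][l]:
                P[k][l] = P[l][k] = links[k][l] / pairs[k][l]
    return P


def complete_loglik(n, edges, classes, P):
    """Bernoulli log-likelihood of all links given classes and block probabilities."""
    ll = 0.0
    for i, j in combinations(range(n), 2):
        p = P[classes[i]][classes[j]]
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard log(0)
        ll += math.log(p) if (i, j) in edges else math.log(1 - p)
    return ll


# Hypothetical network: two fully linked groups of four with one link between them
n = 8
classes2 = [0, 0, 0, 0, 1, 1, 1, 1]
edges = {(i, j) for i, j in combinations(range(n), 2)
         if classes2[i] == classes2[j]} | {(3, 4)}
P2 = block_mle(n, edges, classes2, K=2)
ll2 = complete_loglik(n, edges, classes2, P2)
classes1 = [0] * n
P1 = block_mle(n, edges, classes1, K=1)
ll1 = complete_loglik(n, edges, classes1, P1)
print(P2, round(ll2, 3), round(ll1, 3))
```

Links that share a member are correlated once the classes are integrated out, which is how the latent classes carry the covariance structure of the links.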
The ARC supported Duy Vu in implementing the approach. This allowed us to compare the ML and Bayesian approaches and to establish the superiority of the Bayesian approach.
Fri May 22 2015