### Teaching Interests

My interests in teaching cover a wide range.

#### General linear modelling

I began in 1972 to teach normal-theory regression and ANOVA through a general linear model approach using OMNITAB, the US National Bureau of Standards spreadsheet statistical package. I developed the new course while a Fulbright Senior Fellow at the Educational Testing Service in Princeton NJ. OMNITAB (a precursor of MINITAB) is an English-language package - it is still available, at

http://www.itl.nist.gov/div898/software/omnitab.html

- and was designed for non-programmers. It had good graphical facilities (for the time) and a very efficient regression routine. This allowed weighting, and gave the hierarchical partition of the total sum of squares in the order of terms specified in the model fit statement. This was very valuable in illustrating the need for several permutations of order of model terms in non-orthogonal experimental designs and general regression models.

The importance of this issue has been obscured by the common but unfortunate practice in statistical packages of setting a default analysis of non-orthogonal designs based on an adjustment procedure in which the sum of squares for each term in the model is adjusted for all other terms, or for all other terms of the same order (interactions, main effects). The resulting "ANOVA" tables are at best misleading, and at worst lead to serious error in model specification. I wrote a major discussion paper on this issue: J. Roy. Statist. Soc. A (1978). This has been largely overlooked, and the same errors continue to be made, in textbooks as well as packages.
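The order-dependence of the hierarchical partition is easy to demonstrate numerically. The sketch below is my own illustration (in NumPy, as a stand-in for OMNITAB's regression routine): it fits an unbalanced two-factor layout and computes the sequential sum of squares for factor A twice, once with A entering the model first and once with A entering after B. In a non-orthogonal design the two differ, which is exactly why several permutations of term order are needed.

```python
import numpy as np

rng = np.random.default_rng(0)

# An unbalanced (non-orthogonal) two-factor layout:
# cell counts (a,b): (0,0)=5, (0,1)=2, (1,0)=2, (1,1)=5.
a = np.array([0] * 7 + [1] * 7)
b = np.array([0] * 5 + [1] * 2 + [0] * 2 + [1] * 5)
y = 1.0 + 0.5 * a + 1.5 * b + rng.normal(0, 0.3, size=a.size)

def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

one = np.ones_like(y)
Xa  = np.column_stack([one, a])
Xb  = np.column_stack([one, b])
Xab = np.column_stack([one, a, b])

rss0 = rss(one[:, None], y)

# Sequential (hierarchical) sums of squares depend on the order of entry:
ss_a_first   = rss0 - rss(Xa, y)         # SS(A), A entered first
ss_a_after_b = rss(Xb, y) - rss(Xab, y)  # SS(A | B), A entered after B
print(ss_a_first, ss_a_after_b)
```

Because a and b are correlated in this layout, the unadjusted SS(A) absorbs part of the B effect, so the two sums of squares disagree substantially; in an orthogonal design they would coincide.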

The OMNITAB inversion routine for the information matrix did not have aliasing protection against a singular model specification: the sums of squares partition was always correct but the parameter estimates and standard errors would blow up spectacularly. This proved very useful in illustrating the effects of over-parametrization, the use of dummy variables for factor levels and the designation of a reference category.
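The phenomenon is easy to reproduce in a few lines (my illustration, not OMNITAB's actual routine): with an intercept plus a dummy variable for every level of a factor, the design matrix is rank-deficient, so individual coefficients are not identified, yet the fitted values - and hence the sums-of-squares partition - are unaffected.

```python
import numpy as np

rng = np.random.default_rng(1)
g = np.repeat([0, 1, 2], 4)               # a 3-level factor, 4 obs per level
y = 1.0 + g + rng.normal(0, 0.2, g.size)  # level means 1, 2, 3 plus noise

# Over-parametrized: an intercept plus a dummy for every level.
D = np.eye(3)[g]
X_over = np.column_stack([np.ones(g.size), D])         # 4 columns, rank 3
X_ref  = np.column_stack([np.ones(g.size), D[:, 1:]])  # reference level dropped

print(np.linalg.matrix_rank(X_over))   # 3: the normal equations are singular

# The fitted values, and hence the sums-of-squares partition, agree:
fit = lambda X: X @ np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(fit(X_over), fit(X_ref)))   # True
```

Modern least-squares routines return a minimum-norm solution rather than blowing up, but the underlying lesson about over-parametrization and the need for a reference category is the same.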

I taught this course from 1972 to 1976 to a mixed group of third-year psychology and statistics students; for the psychology students the course was a prerequisite for the fourth honours year in psychology. I gave a proper mathematical model presentation of regression, and provided intensive tutorials in the simple linear algebra and coordinate geometry necessary to understand linear models, for those students who needed help.

The final take-home exam, which counted for 50-60% of the assessment, required the students to analyse a three-way unbalanced cross-classification with three covariates and a normal response, the data (from a psychiatric study of emotional responses of husbands to suicide attempts by their wives) being unique to each student. This analysis would have stretched most statistics honours students at that time (and still would). I wrote a course book based on this approach (Linear Models with Applications in the Social Sciences), which much later morphed into the book Statistical Modelling in GLIM, which appeared in 1989. This used the same regression model formulation I had used in the course, greatly extended to generalized linear models fitted in GLIM.

#### Introductory service courses for non-mathematics students

I have been deeply concerned about these courses since my appointment to Macquarie University in 1969. The University's common course at that time, which was taken by a large majority of first year students, was perhaps no worse than other such courses, but it was disliked by students, and they learned little from it. Many took away a heartfelt dislike of statistics (and sometimes statisticians) and a determination to avoid any more of it, or them, if possible. This made the task of Macquarie statisticians much harder in second- and higher-level courses.

I attribute much of the low esteem in which statistics and statisticians are held directly to such first courses. Why are these courses so bad? The Macquarie course was designed by a committee of users in other departments, and went from nothing to linear regression in 12 weeks, with two hours of lectures a week (and no tutorials). No mathematics could be assumed, and no theory of anything was given, just methods. The main statistical principle, if there was one, was the Central Limit Theorem justifying the t-test, the centrepiece of the course.

I developed a first course based on simple nonparametric methods, and tried for some years to get it adopted as an alternative to the existing course, without success - the absence of the t-test was judged unacceptable and even dangerous.

What should we be aiming for in an introductory course for a wide range of students from different disciplines? The changes in high-school mathematics with the introduction of statistical concepts have made this a different question from the 1960s and 70s, but many of the issues are unchanged.

The first important point is that a first-year student does not have a programme so narrow that he or she cannot be exposed to real-world statistical issues that do not directly relate to his or her major field of study (which may in any case change during the student's University experience). On the contrary, it is very important to demonstrate that statistical issues come up in all fields of study and are met in daily life, regardless of the student's major field.

The second point is that the main statistical issue to be raised should not be how to formulate a statistical hypothesis and do the t-test - this should be the subject of a second or later-year course. The main issues should be those of design, experimental and survey, and how to be able to assess whether the design used allows believable conclusions to be drawn from the study. Central to this is the sample design: is the sample from which conclusions are being drawn representative of the population to which the inference is to be applied? Randomness, and how to achieve it, play a critical role in this.

I have had great success with a 20 lecture-hour course based on these principles, and a population data base from which students draw random samples, then use their samples to draw simple inferences about population parameters. To minimise the amount of probability theory needed, I restrict inference questions to proportions. The current form of this course is frequentist, but with minimal changes it could be Bayesian. The course uses the StatLab data base from Hodges, Krech and Crutchfield's 1974 book. The database was freely available on the Web for many years but now seems to have been withdrawn. I wrote a small course book for this course which I taught at the University of Newcastle UK, under the name Information in the Modern World. It can be downloaded from this address.
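The structure of the course exercise can be sketched in a few lines. Since the StatLab data base is no longer online, the sketch below simulates a hypothetical population of the same general kind (the population size and attribute proportion here are invented for illustration); each student draws a simple random sample and forms an interval estimate for a population proportion, which needs only minimal probability theory.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-in for a population data base: an indicator
# attribute for each of 1,296 families (both numbers are invented).
population = rng.random(1296) < 0.35
true_p = population.mean()   # the parameter the students are estimating

# Each student draws his or her own simple random sample, without replacement.
sample = rng.choice(population, size=40, replace=False)
p_hat = sample.mean()

# Normal-approximation 95% interval for the population proportion.
se = np.sqrt(p_hat * (1 - p_hat) / sample.size)
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
print(round(true_p, 3), round(p_hat, 3), (round(lo, 3), round(hi, 3)))
```

Because every student's sample is different, pooling the class's intervals also demonstrates sampling variability and coverage directly, without any appeal to the t distribution.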

Subsequent courses will develop model-based (especially the normal model) analysis, and even for non-mathematical students it is essential to teach the necessary mathematical model representation, especially for the multiple regression model, as described above.

#### The first course in mathematical statistics

The same principles apply to mathematically capable students as apply to non-mathematicians - except that we can give the theory in full and at length. The question is - what theory?

We have taught statistical theory for far too long as influenced by Neyman, a theory almost divorced from models. Neyman's remarkable developments of the mathematical principles of finite population survey sampling, of the Neyman-Pearson lemma and of confidence intervals have had a very strong effect on courses in introductory mathematical statistics. But in many such courses, the job of the student seems to be to learn the class of estimators for a given population parameter, and then to choose among them according to their sampling variance, or mean square error if biased estimators are allowed. From this point of view maximum likelihood is only one method of estimation, and while it may be asymptotically optimal (in some families of distributions) it need not provide the best estimator in any finite sample situation. Thus other families of estimators have to be considered, and moment estimators, which replicate the population definition in the sample, may be quite appropriate (for the negative binomial distribution for example).
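For the negative binomial case, the moment estimators simply equate the sample mean and variance to their population formulae. A sketch with simulated data (my illustration, under the standard (r, p) parametrization where E[X] = r(1-p)/p and Var[X] = r(1-p)/p^2):

```python
import numpy as np

rng = np.random.default_rng(3)
r_true, p_true = 4.0, 0.4
x = rng.negative_binomial(r_true, p_true, size=5000)

# Moment estimators replicate the population definitions in the sample:
# E[X]/Var[X] = p, and then r = E[X] p / (1 - p).
m, v = x.mean(), x.var()
p_mom = m / v
r_mom = m * p_mom / (1 - p_mom)
print(round(r_mom, 2), round(p_mom, 2))
```

These estimators are trivial to compute, which is exactly their pre-Fisherian appeal; the question the next paragraph raises is whether they are efficient.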

This approach is pre- (or anti-) Fisherian, in the sense of ignoring the very strong issues of sufficiency and efficiency, under which moment estimators, for the negative binomial or any other distributional model, are inefficient unless they are functions of the sufficient statistics. Should the sufficient statistics be hard to handle, or not exist in any simple form, Fisher showed how to obtain asymptotically efficient estimators from inefficient ones by one cycle of the scoring algorithm.
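Fisher's one-cycle device is simple to illustrate. I use the Cauchy location model here rather than the negative binomial, since its score and expected information have clean closed forms (this example is my own choice, not from the discussion above): the sample median is consistent but inefficient, and one scoring step from it, theta1 = theta0 + U(theta0)/I(theta0), gives an asymptotically efficient estimator.

```python
import numpy as np

rng = np.random.default_rng(4)
theta_true = 3.0
x = rng.standard_cauchy(2000) + theta_true   # Cauchy location model

# Start from an inefficient but consistent estimator: the sample median.
theta0 = np.median(x)

# One cycle of Fisher scoring. For the Cauchy location model the score is
# U(t) = sum 2(x - t)/(1 + (x - t)^2) and the expected information is n/2.
u = np.sum(2 * (x - theta0) / (1 + (x - theta0) ** 2))
i = x.size / 2.0
theta1 = theta0 + u / i
print(round(theta0, 3), round(theta1, 3))
```

The same recipe applies to the negative binomial starting from the moment estimators, though there the score involves digamma functions rather than a closed form this tidy.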

Fortunately this pre-Fisherian approach is dying out. The likelihood-based Fisher-Neyman-Pearson (FNP) approach has such a strong base in statistical theory that it is the major and standard approach, though Fisher and Neyman had quite different views on the role of inference (see Alan Welsh's book for a very clear discussion).

Where is Bayesian theory in all this? In the usual teaching of the current FNP paradigm, a small corner is assigned to Bayes theorem and prior/posterior inference. This may have been defensible pre-1990 when Bayesian computation was in the early stages of its explosive development, but it has long since become indefensible.

The reasons are very clear. The first is that Bayes analysis of complex models through MCMC is now able to provide fully efficient analyses when ML is simply unable to handle the complexities. The EM algorithm for ML can take us some way into simple incomplete data models, but with multiple kinds of unobserved, latent and missing data, ML through multiple nested EM algorithms becomes unmanageably complicated, and to obtain the information matrix through the Louis approach or any other method is even more complicated. For such models, FNP theory has ground to a halt - the theory is exhausted computationally. Bayes analysis through MCMC requires very extensive simulations, but these are now extremely fast in WinBUGS (for example) and completely practical on a standard desktop or powerful laptop. Equally importantly, the setting-up of models and priors is a routine process (at least for standard models) and the statistician does not have to be concerned about the computational correctness of the program (though parametrizations and convergence issues are of concern).
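The mechanics of MCMC are themselves simple enough to teach early. The sketch below is not WinBUGS but a bare-bones random-walk Metropolis sampler written for this note, applied to a binomial proportion with a uniform prior - a problem deliberately chosen so the exact posterior (Beta(8, 14)) is known and the simulation can be checked against it.

```python
import numpy as np

rng = np.random.default_rng(5)
y, n = 7, 20   # invented data: 7 successes in 20 trials, uniform prior on p

def log_post(p):
    """Log posterior up to a constant; -inf outside the unit interval."""
    if not 0 < p < 1:
        return -np.inf
    return y * np.log(p) + (n - y) * np.log(1 - p)

# Random-walk Metropolis: propose, then accept with prob min(1, ratio).
draws, p = [], 0.5
for _ in range(20000):
    prop = p + rng.normal(0, 0.1)
    if np.log(rng.random()) < log_post(prop) - log_post(p):
        p = prop
    draws.append(p)

post = np.array(draws[2000:])          # discard burn-in
print(round(post.mean(), 3))           # exact posterior mean is 8/22 = 0.364
```

For realistic complex models one would of course use WinBUGS or a similar engine rather than hand-coding the sampler; the point is that the underlying algorithm is routine, in contrast to nested EM with Louis information.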

But a second reason is even more important theoretically. In such complex models, the likelihoods can be far from log-quadratic in the parameters, and so ML estimates and SEs from the information matrix do not provide an adequate representation of the information about individual parameters. Profile likelihoods are better, but suffer the obvious difficulty of over-precision. Model comparisons through the likelihood ratio test also start to break down for the same reason - the asymptotic distribution is not adequate, and to find a higher order asymptotic expansion in the sample size is impossibly complicated as a routine procedure for different models. Even more critically, there is no standard asymptotic procedure for comparing non-nested models: the limited results of Cox do not generalise to arbitrarily complex models. Bayesian methods are essential for such complex models.

What about the philosophical objections to Bayesian procedures - subjective priors, non-invariance of uniform priors, calibration of credible intervals, the difficulties with Bayes factors? Many Bayesians have addressed these issues. I have been working since 1996 on my own approach to the development of Bayesian methods to deal with these objections. A book-length treatment published by Chapman and Hall/CRC (2010) deals with all these issues, and gives an integrated Bayesian/likelihood approach to posterior inference which requires no more than weak or non-informative priors, for both posterior parameter inference and model comparisons.

The Bayesian approach can be, and in my view should be, the basis for the teaching of statistical theory. To delay it till a later University stage after a standard FNP course wastes precious teaching time and does not equip the student for the complex analyses he or she will need to cope with in modern applied research.