Course No.: 060009
Course name (Chinese): 现代回归分析 (双语课程)
Course name (English): Modern Regression Analysis (bilingual course)
Course classification: Core specialty course
Prerequisites: Advanced calculus, Linear algebra, Probability and statistics
Subsequent courses: Elective courses
Credit hours: 3
Course hours: 54
Textbook: Weisberg, S. (2005). Applied Linear Regression, 3rd edition, New York: John Wiley & Sons.
Summary
Exploring relationships among variables is one of the most important research topics in the social and natural sciences. Linear regression analysis has become one of the most widely used statistical tools for quantifying relationships among factors, even though highly innovative modern tools, such as nonparametric regression, neural networks, support vector machines, tree-based methods, and other machine learning methods, are now available. Regardless of how sophisticated the model is, the most commonly used method for estimating the parameters of interest is least squares. The least squares criterion is simple and has great intuitive appeal, and linear regression analysis provides a conceptually simple method for investigating functional relationships among variables. Probably the most important reason to learn linear regression and least squares estimation is that, even with all the new alternatives, most data analyses continue to be based on this older paradigm, and mastery of the linear model is a prerequisite for working with advanced statistical tools. Why is this? The primary reason is that it works: least squares regression provides good and useful answers to many problems. Journals in any area where data are commonly used for prediction or estimation show that the dominant method is linear regression with least squares estimation.
Regression analysis answers questions about the dependence of a response variable on one or more independent variables, including predicting future values of the response, discovering which predictors are important, and estimating the impact of changing a predictor or a treatment on the value of the response. The standard approach is to take data, fit a model, and then evaluate the fit using statistics. In this course, the emphasis is not on mathematical proofs of formal statistical tests and probability calculations, but on regression concepts, applications of various regression methods and techniques, and practice in analyzing data and real problems in the social and natural sciences. This course provides an introduction to linear regression. Topics include estimation and inference for simple and multiple linear regression, for instance least squares estimation, hypothesis testing, and confidence intervals. The course also covers diagnostics for model validation, detection of influential observations and outliers, and remedial approaches to various violations of model assumptions. These remedial measures include weighted least squares estimation, transformation of variables, and ridge regression. Variable selection is an important topic in model development, and the methods and criteria for determining which independent variables should be included in a model are discussed. Finally, logistic regression is briefly introduced. Note that all linear regression methods discussed in this course are appropriate for cross-sectional data, not for time series data or panel data, which are covered separately in the courses Time Series Analysis and Econometrics.
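As a small illustration of the fit-then-evaluate workflow described above, the following sketch fits a simple linear regression by least squares and computes the standard error and t statistic of the slope. It is only a minimal sketch in plain Python, not the course's prescribed software; the function name fit_simple_ols and the five data points are hypothetical and made up for illustration.

```python
import math


def fit_simple_ols(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares (needs at least 3 points)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and cross products.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residuals and residual sum of squares (the quantity least squares minimizes).
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    rss = sum(e ** 2 for e in residuals)
    # Unbiased estimate of the error variance uses n - 2 degrees of freedom.
    sigma2 = rss / (n - 2)
    se_slope = math.sqrt(sigma2 / sxx)
    # t statistic for the null hypothesis that the slope is zero.
    t_slope = slope / se_slope if se_slope > 0 else math.inf
    return {
        "intercept": intercept,
        "slope": slope,
        "rss": rss,
        "sigma2": sigma2,
        "se_slope": se_slope,
        "t_slope": t_slope,
        "residuals": residuals,
    }


# Hypothetical data, made up for this illustration only.
x_demo = [1, 2, 3, 4, 5]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = fit_simple_ols(x_demo, y_demo)
print("intercept:", round(fit["intercept"], 4))
print("slope:", round(fit["slope"], 4))
print("t statistic for slope:", round(fit["t_slope"], 2))
```

The t statistic would be compared with a t distribution on n - 2 degrees of freedom; a confidence interval for the slope follows the same pattern, using the estimate plus or minus a t quantile times se_slope.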
We rely heavily on graphical representations of the data and employ many variations of plots of regression residuals. We are not overly concerned with precise probability evaluations. Graphical methods for exploring residuals can suggest model deficiencies or point to troublesome observations. Upon further investigation into their origin, the troublesome observations often turn out to be more informative than the well-behaved ones. We often find that more information is obtained from a quick examination of a residual plot than from a formal test of statistical significance of some narrow null hypothesis.
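The residual-based checks described above can be sketched numerically as well as graphically. The example below, a minimal sketch in plain Python, computes fitted values, residuals, leverages, and standardized residuals for a simple linear regression, and flags observations whose standardized residual exceeds 2 in absolute value. The function name residual_diagnostics, the cutoff of 2 (a common rule of thumb, not a value stated in this syllabus), and the data are assumptions made for illustration.

```python
import math


def residual_diagnostics(x, y, cutoff=2.0):
    """Return fitted values, residuals, standardized residuals, and flagged indices."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    # Estimated error standard deviation on n - 2 degrees of freedom.
    sigma = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    # Leverage of each observation in simple linear regression.
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Standardized residuals rescale each residual by its estimated standard deviation.
    standardized = [
        e / (sigma * math.sqrt(1.0 - h)) for e, h in zip(residuals, leverage)
    ]
    flagged = [i for i, r in enumerate(standardized) if abs(r) > cutoff]
    return fitted, residuals, standardized, flagged


# Hypothetical data: points on the line y = 2x, except the fourth response,
# which was deliberately shifted upward to act as a troublesome observation.
x_demo = [1, 2, 3, 4, 5, 6, 7, 8]
y_demo = [2, 4, 6, 14, 10, 12, 14, 16]
fitted, residuals, standardized, flagged = residual_diagnostics(x_demo, y_demo)
for i, (f, e, r) in enumerate(zip(fitted, residuals, standardized)):
    marker = "  <-- check this observation" if i in flagged else ""
    print(f"obs {i}: fitted={f:.3f} residual={e:.3f} standardized={r:.3f}{marker}")
```

Plotting the residuals against the fitted values, in whatever statistical package is used, would show the same observation standing apart from the rest.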
Objective
The basic purpose of the course is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. The emphasis is on regression concepts, rather than on mathematical proofs. Good students, even though they may not have strong mathematical backgrounds, quickly grasp the essential concepts and appreciate the enhanced understanding. The learning process is reinforced with continuous use of numerical examples throughout the text and with several case studies.
Our hope is that, after completing the course, students will have a more thorough grounding in the theory of the linear regression model and will be ready and able to analyze their own data methodically, thoroughly, and confidently. We also hope that they will be able to uncover patterns in data and will have a better understanding of the principles and concepts of regression analysis.
We take for granted the availability of a computer and a statistical package. Recently there has been a qualitative change in the analysis of linear models: from model fitting to model building, from overall tests to clinical examination of data, and from macroscopic to microscopic analysis. A computer is essential for this kind of analysis. Almost all of the analyses we use are now available in software packages. We are particularly heartened by the arrival of the package R, available on the Internet under the General Public License. The package has excellent computing and graphical features, and it is free. Students may also use other packages, such as SAS, SPSS, S-PLUS, and EViews. Through this training, students will be able to connect the text with a computer package for doing the computations, and will strengthen their ability to use standard statistical packages for linear regression analysis. This combination of theory and application will prepare students to explore the literature further and to interpret the output of a linear models package more accurately.
Teaching approach
We connect theory and practice in our teaching, and emphasize the basic statistical theory, core ideas, and practice of linear regression analysis.
Our presentation of the various concepts and techniques of regression analysis relies on carefully developed examples. In each example, we have isolated one or two techniques and discussed them in some detail. The data were chosen to highlight the techniques being presented. Although when analyzing a given set of data it is usually necessary to employ many techniques, we have tried to choose the various data sets so that it would not be necessary to discuss the same technique more than once.
Several case studies help students to understand the methods of regression analysis and to analyze real data and problems methodically, thoroughly and confidently.