Social Metrics I: Introduction to Structural Analysis in the Social Sciences

Open, Lecture—Fall

This course is designed for all students interested in the social sciences who wish to understand the methodology and techniques involved in estimating structural relationships between variables. It is intended for students who wish to carry out empirical work in their particular field, both at Sarah Lawrence College and beyond, and to engage critically with empirical work done by academic or professional social scientists. After taking this course, students will be able to analyze questions such as the following: What effects do race, gender, and educational attainment have on the determination of wages? How does the female literacy rate affect the child mortality rate? How can one model the effect of economic growth on carbon dioxide emissions? What is the relationship among sociopolitical instability, inequality, and economic growth? How do geographic location and state spending affect average public-school teacher salaries? How do socioeconomic factors determine the crime rate in the United States? How can one model the US defense budget? The course is divided, broadly, into three parts. In the first part, we will study the application of statistical methods and techniques in order to: a) understand, analyze, and interpret a wide range of social phenomena such as those mentioned above; b) test hypotheses and theories regarding the possible links between variables; and c) make predictions about prospective changes in the economy. Social metrics is fundamentally a regression-based correlation methodology used to measure the overall strength, direction, and statistical significance of the relationship between a “dependent” variable (the variable whose movement or change is to be explained) and one or more “independent” variables that are used to explain the movement or change in the dependent variable. Social metrics therefore requires a detailed understanding of the mechanics, advantages, and limitations of the “classical” linear regression model.
Thus, the first part of the course will cover the theoretical and applied statistical principles that underlie Ordinary Least Squares (OLS) regression techniques. This part will cover the assumptions needed for OLS to yield the Best Linear Unbiased Estimates of a regression equation's parameters, also known as the “BLUE” conditions. Particular emphasis will be placed on the assumptions regarding the distribution of a model’s error term and on the other BLUE conditions. We will also cover hypothesis testing, sample selection, and the critical role of the t- and F-statistics in determining the statistical significance of a social metric model and its associated slope or “b” parameters. In the second part, we will address the three main problems that arise when particular BLUE assumptions are violated: multicollinearity, autocorrelation, and heteroscedasticity. We will learn how to identify, address, and remedy each of these problems. In addition, we will take a similar approach to understanding and correcting model specification errors. The third part of the fall class will focus on the analysis of historical time-series models and the study of long-run trend relationships between variables. No prior background in economics or the social sciences is required, but a knowledge of basic statistics and high-school algebra is required.
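
For readers who would like a concrete sense of the OLS mechanics described above, the following is a minimal sketch in Python, not course material. It fits a one-variable regression and computes the slope or “b” parameter, its standard error, and its t-statistic. The function name `simple_ols`, the variable names, and the synthetic schooling and wage data are all illustrative assumptions; none of them come from the course itself.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares (one independent variable).

    Hypothetical helper for illustration only. Returns the intercept, the
    slope, the slope's standard error, and the slope's t-statistic.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    residual_variance = sum(e ** 2 for e in residuals) / (n - 2)

    se_slope = math.sqrt(residual_variance / sxx)
    t_slope = slope / se_slope  # tests the null hypothesis that the slope is zero

    return {
        "intercept": intercept,
        "slope": slope,
        "se_slope": se_slope,
        "t_slope": t_slope,
    }


# Synthetic data (an assumption, not real wage data): log wages rise by
# 0.08 per year of schooling, plus random noise.
random.seed(0)
schooling = [random.uniform(8, 20) for _ in range(200)]
log_wage = [1.0 + 0.08 * s + random.gauss(0, 0.3) for s in schooling]

fit = simple_ols(schooling, log_wage)
print(f"slope: {fit['slope']:.4f}  t-statistic: {fit['t_slope']:.2f}")
```

A large t-statistic in absolute value (roughly above 2 in a sample of this size) suggests that the estimated slope is statistically distinguishable from zero, provided the assumptions about the error term hold.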