Projects for 2011-12

CIR Fellows: Shannon Fast, Ellen Squires, Anna Johanson

Neuron classification related to light sensitivity in eyes

Description: Retinal ganglion cells (RGCs) are the output neurons of the retina and send their axons to a host of targets in the brain. Retinas have about 100,000 RGCs. There are estimated to be between 10 and 20 different subclasses of RGCs, each of which processes and transmits different information. One subclass of RGCs is directly sensitive to light, responding even in the absence of rod and cone photoreceptors. These RGCs are called intrinsically photosensitive RGCs (ipRGCs) and express melanopsin, a protein that allows them to respond to light. There are roughly 1,000 ipRGCs in each retina. ipRGCs signal mean light levels to the brain, controlling pupil constriction and setting the phase of our circadian rhythm. However, ipRGCs themselves are not a single class of cells. There are at least two different types of ipRGCs. The statistical task will be to look at the responses of ipRGCs to a series of light stimuli and classify them based on their response types. Once armed with a rigorous classification scheme, we will analyze data from mutant mice to see if certain subclasses of ipRGCs are affected by the mutation while others are not.

Domain Expert: Jay Demas (Physics, Neuroscience)


CIR Fellows: Thomas Hegland, Charlotte Sivanich, Eric King (not pictured)

Alabama after the Civil War

Description: Professor Michael Fitzgerald is soliciting assistance for his current book project, a full-scale history of Alabama during the Civil War and Reconstruction era. The statistical work will primarily involve analysis of Alabama data from the University of Minnesota’s huge public use demographic history project, a 1% sample of the entire US manuscript census. We will be examining demographic data relevant to the author’s research on the Ku Klux Klan, emancipation and black suffrage, and railroad and economic development efforts. There will also be some detailed analysis of voting returns to see what can be determined about the social basis of Reconstruction politics.

Domain Expert: Michael Fitzgerald (History)

CIR Fellows: Amanda Elling, Laura Smith, Michael Ann Finnin

Yield models in college admissions

Description: Accurately forecasting yield (the percentage of admitted applicants who enroll) is an important yet challenging part of a college’s admissions process. Having a reliable yield model helps a college decide on the optimal number and composition of students to admit; admitting too few students can lead to financial pressures, while admitting too many students can lead to pressures on class size and housing. The St. Olaf Office of Admissions is interested in evaluating and potentially improving its current yield models for admitted students. The probability that a particular admitted student enrolls at St. Olaf will vary by individual, but a healthy amount of data is available for each student who applies – from test scores to extracurriculars to high school attended to financial aid and much more – that can be used to create a yield model. In addition, the effects of individual characteristics on probability of enrollment will be of interest to Admissions staff. Data on enrollees can also be used to determine predictors of success in a student’s first semester at St. Olaf.
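To make the idea of a yield model concrete, here is a minimal Python sketch, assuming a logistic regression fitted by plain gradient descent. The two predictors (a campus-visit flag and an aid offer in tens of thousands of dollars), the toy records, and all function names are hypothetical and are not the Office of Admissions' actual model or data. The expected class size is simply the sum of the individual enrollment probabilities.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def enroll_probability(weights, features):
    """Predicted probability that one admitted student enrolls.

    weights[0] is the intercept; weights[1:] pair with the features.
    """
    z = weights[0] + sum(w * f for w, f in zip(weights[1:], features))
    return sigmoid(z)


def fit_logistic(rows, enrolled, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    n = len(rows)
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in zip(rows, enrolled):
            err = enroll_probability(weights, features) - label
            grad[0] += err
            for j, f in enumerate(features):
                grad[j + 1] += err * f
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


def expected_class_size(weights, admitted):
    """Expected number of enrollees among a list of admitted students."""
    return sum(enroll_probability(weights, f) for f in admitted)


if __name__ == "__main__":
    # Hypothetical past admits: [visited campus (0/1), aid offer in $10k].
    past = [[1, 2.0], [1, 1.0], [0, 0.5], [0, 2.5], [1, 3.0],
            [0, 1.0], [1, 0.5], [0, 0.0], [1, 2.5]]
    enrolled = [1, 1, 0, 1, 1, 0, 0, 0, 0]
    weights = fit_logistic(past, enrolled)
    this_year = [[1, 1.5], [0, 2.0], [1, 0.0]]
    print("fitted weights:", weights)
    print("expected enrollees:", expected_class_size(weights, this_year))
```

The fitted coefficients give a first look at how individual characteristics relate to the probability of enrollment; a real analysis would use statistical software that also reports standard errors and would check the model on held-out years.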

Domain Expert: Derek Gueldenzoph (Admissions)


CIR Fellows: Peter Johnson, Michael Miller, Cecilia Noecker

Uncertainties in Chemical Measurements Resulting from Least Squares Regression


Description: Analytical chemistry is replete with instances where a calibration is performed by measuring a signal (y) for known concentrations (x). The resulting calibration curve can be linear or non-linear. Most frequently the method of least squares is used to derive the relationship between the two variables; the resulting equation is used subsequently to determine the concentration (x) for a signal (y) resulting from analysis of an unknown solution. Analytical chemists are keenly interested in knowing the quality of the least squares fit to the data and the uncertainty in the determined concentration. It is this set of uncertainties that establishes the level of confidence in the result.

In this project, we will work on statistical methods to determine the uncertainty in the independent variable x when it is estimated from a linear or quadratic least squares fit. There is interesting literature to explore regarding the “calibration problem”, in which the fitted equation is used to estimate x from an observed y, rather than y from x as is typical. Potential outcomes of the project include a written guide to linear and quadratic regression analysis and a spreadsheet-based computational tool to determine uncertainties.
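As a rough illustration of the calibration problem for the linear case, the Python sketch below fits a straight line by least squares and then inverts it to estimate an unknown concentration x0 from a measured signal y0. The standard error uses the approximation commonly quoted in analytical chemistry texts, s_x0 = (s_y/|b|) * sqrt(1/m + 1/n + (y0 - ybar)^2 / (b^2 * Sxx)), where b is the slope, n the number of calibration points, and m the number of replicate readings of the unknown. The data, the function names, and the use of Python rather than a spreadsheet are illustrative assumptions, not the project's actual tool.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Standard deviation of the residuals, with n - 2 degrees of freedom.
    s_y = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return {"slope": slope, "intercept": intercept, "s_y": s_y,
            "n": n, "ybar": ybar, "sxx": sxx}


def inverse_predict(fit, y0, m=1):
    """Estimate x0 from a signal y0 (mean of m replicates) and its standard error."""
    b = fit["slope"]
    x0 = (y0 - fit["intercept"]) / b
    spread = (1.0 / m + 1.0 / fit["n"]
              + (y0 - fit["ybar"]) ** 2 / (b ** 2 * fit["sxx"]))
    se_x0 = fit["s_y"] / abs(b) * math.sqrt(spread)
    return x0, se_x0


if __name__ == "__main__":
    # Hypothetical calibration standards: concentration versus signal.
    conc = [0.0, 1.0, 2.0, 3.0, 4.0]
    signal = [1.1, 2.9, 5.2, 6.8, 9.0]
    fit = fit_line(conc, signal)
    x0, se = inverse_predict(fit, y0=6.0, m=3)
    print("estimated concentration:", x0, "standard error:", se)
```

Note that the uncertainty grows as y0 moves away from the mean calibration signal and shrinks with more replicate readings of the unknown. The quadratic case needs a different treatment, which is part of what the project would explore.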

Domain Experts: Paul Jackson (Chemistry, Environmental Studies) and Mary Walczak (Chemistry)


CIR Fellows: Carrie Groth, Sam Bailey-Seiler, Andrea Dittman

Examining the association between extraversion and more extreme expressivity

Description: The concept of extraversion includes an element of enthusiasm, prompting the question: Does extraverts’ enthusiasm carry over into how they respond to everyday situations, items or concepts? To examine the relation between extraversion and extreme responses (e.g. the extreme points on a Likert scale, such as a “1” or a “7”), the responses of extraverts and introverts to an online questionnaire can be analyzed. To determine participants’ level of extraversion, the Extraversion scale from the NEO-PI-R was used (Costa & McCrae). The questionnaire then asked participants to rate their reactions to a variety of distinct stimuli: positive and negative self-relevant qualities; positive, negative and neutral hypothetical scenarios; and positive and negative photographs. In addition, participants were given a word preference task, which required them to choose the word they preferred from a matched pair of words that differed in intensity. In a previous study, extraverts were found to endorse significantly more positive extreme responses than introverts did. This research will investigate whether extraverts have a bias toward more extreme responses, which could affect future research using self-report measures. A second potential project is the refinement of a measure to assess to what extent an individual sees intrinsic meaning, extrinsic meaning, or fails to find meaning in life.
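One simple way to quantify extreme responding is sketched below in Python: compute each participant's proportion of responses at the scale endpoints, then compare extraverts and introverts with a permutation test on the difference in group means. The 7-point scale, the split into two groups, the made-up responses, and the choice of a permutation test are all assumptions for illustration, not the study's actual design or analysis.

```python
import random
from statistics import fmean


def extreme_rate(responses, low=1, high=7):
    """Proportion of one participant's ratings at either scale endpoint."""
    return sum(r in (low, high) for r in responses) / len(responses)


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        # Small tolerance guards against floating-point round-off.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical ratings on a 1-7 scale, one list per participant.
    extraverts = [[7, 1, 7, 6, 7], [1, 7, 5, 7, 2], [7, 7, 1, 4, 7]]
    introverts = [[4, 3, 5, 4, 6], [2, 5, 7, 3, 4], [3, 4, 4, 5, 2]]
    rates_e = [extreme_rate(p) for p in extraverts]
    rates_i = [extreme_rate(p) for p in introverts]
    print("extravert rates:", rates_e)
    print("introvert rates:", rates_i)
    print("p-value:", permutation_p_value(rates_e, rates_i))
```

Because extraversion is measured on a continuous scale, an analysis that keeps the score continuous (for example, a regression of extreme-response rate on extraversion score) may be preferable to a two-group split.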

Domain Expert: Donna McMillan (Psychology)


CIR Fellows: Aaron Molstad, Ben Leis, Nicole Bettes

Statistical exploration of stylistic variation in the original and translations of the novels of O.E. Rolvaag


Description: Much research in stylometry, or the use of statistics in the analysis of literary style, has been devoted to identification or characterization of authorship. This study will use many of the techniques of stylometry to explore stylistic characteristics in the novels of O.E. Rolvaag. We will first determine the stylistic characteristics of the novels in their original Norwegian, then look for significant differences between the earlier and the later novels, and finally turn to the English translations to see how the style in English differs from the original and what, if any, stylistic differences emerge that might be due to the different translators of each of the novels. There are a number of steps to this project:

1. Create the corpus. Scan the Norwegian novels and the English translations to create a corpus of digitized texts.
2. Determine the most appropriate encoding scheme and encode (annotate) the texts. I am hopeful that some if not most of this can be done automatically for both the Norwegian and the English texts. Searching for appropriate software is part of the project.
3. Use the annotated texts for both quantitative and qualitative comparisons. This includes, for example, the use of concordances, KWIC (keyword in context), word frequencies, collocations, etc., along with a variety of statistical techniques to describe and explore the stylistic characteristics of each individual text and to compare the stylistic characteristics of the various texts.
4. Write up the results.

For some examples of this type of analysis, see:

D.L. Hoover: Multivariate analysis and the study of style variation. This paper investigates style variation . . . using multivariate analysis, specifically, cluster analysis of the frequencies of frequent words.

D.L. Hoover and Shervin Hess: An exercise in non-ideal authorship attribution: the mysterious Maria Ward. . . . three texts, along with similar texts by other authors, using cluster analysis, Delta analysis, t-testing, and PCA.

J.F. Burrows: Modal verbs and moral principles: an aspect of Jane Austen’s style. Differing frequency-patterns in the modal auxiliary verbs show statistically significant differentiations among Jane Austen’s characters, between dialogue and narrative, and between different modes of narrative.

J. Rybicki: Burrowing into Translation: Character Idiolects in Henryk Sienkiewicz’s Trilogy and its Two English Translations. The method used was Burrows’s technique of multivariate analysis of correlation matrices of relative frequencies of the most frequent words in the dialogue.

And many other articles in journals such as Literary and Linguistic Computing, Computers and the Humanities, Style, etc.
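The frequent-word comparisons used in the articles above can be sketched in a few lines of Python. The example computes relative frequencies of the most frequent words in a small corpus and then Burrows's Delta between each pair of texts, that is, the mean absolute difference of per-word z-scores. The function names and the toy texts are hypothetical, the tokenizer is a simple assumption that keeps Unicode letters (so Norwegian æ, ø and å survive), and the population standard deviation is used here as a simplifying choice.

```python
import re
import statistics
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def relative_frequencies(tokens):
    """Map each word to its share of all tokens in one text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: c / total for word, c in counts.items()}


def burrows_delta(texts, n_words=50):
    """Pairwise Delta for a dict of {name: raw text}.

    Uses the n_words most frequent words of the whole corpus.
    """
    names = list(texts)
    tokenized = {name: tokenize(texts[name]) for name in names}
    corpus_counts = Counter()
    for tokens in tokenized.values():
        corpus_counts.update(tokens)
    vocab = [word for word, _ in corpus_counts.most_common(n_words)]
    freqs = {name: relative_frequencies(tokenized[name]) for name in names}

    # z-score each word's relative frequency across the texts.
    z = {name: [] for name in names}
    for word in vocab:
        column = [freqs[name].get(word, 0.0) for name in names]
        sd = statistics.pstdev(column)
        if sd == 0:
            continue  # identical rate everywhere carries no information
        mean = statistics.fmean(column)
        for name, value in zip(names, column):
            z[name].append((value - mean) / sd)

    deltas = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            diffs = [abs(p - q) for p, q in zip(z[a], z[b])]
            deltas[(a, b)] = sum(diffs) / len(diffs) if diffs else 0.0
    return deltas


if __name__ == "__main__":
    # Toy stand-ins for digitized novels and translations.
    corpus = {
        "original": "og han gikk og hun gikk og de kom hjem",
        "translation_1": "and he went and she went and they came home",
        "translation_2": "he walked while she walked then they came home",
    }
    for pair, delta in burrows_delta(corpus, n_words=10).items():
        print(pair, delta)
```

Smaller Delta values indicate more similar frequent-word profiles. Comparing a Norwegian original with an English translation this way is not meaningful word for word, so in practice the comparison would be run within one language at a time, for example among the translations.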

Domain Expert: Solveig Zempel (Norwegian)