Modeling, College Admissions, and Enrollment Prediction
Description: Accurately forecasting student interest, propensity to apply, and finally yield (the percentage of admitted applicants who enroll) is an important yet challenging part of a college’s admissions process. Reliable models help a college decide where to focus recruitment dollars, estimate the number of applications to expect in the yearly cycle, and, most importantly, determine the optimal number and composition of students to admit.
The St. Olaf Office of Admissions is interested in evaluating and potentially improving its current yield models. The recruitment and matriculation behavior of students at St. Olaf will vary by individual, but a healthy amount of data is available for each student (from test scores to extracurricular activities to high school attended to financial aid and much more) that can be used to create a yield model. Data on enrolled students can also be used to determine predictors of success in a student’s first semester at St. Olaf.
Domain Expert: Admissions Staff (Admissions)
Human Scent Project
Description: Our lab studies human scent profiles in support of work with tracking dogs. Trained dogs can follow the scent of a person over a mile or more, up to several days after the person has left the trail. The current hypothesis is that the dogs are smelling the volatile organic compounds (VOCs) given off by dead epithelial (skin) cells that a person sheds while walking. We are collecting epithelial cells from volunteers and determining in the lab which VOCs are given off by each person. This work has several aims. The first is to show that each person has a unique scent profile that a dog could use to distinguish one person’s trail from those of other people. We routinely detect 50 or more compounds emanating from human epithelial cells, in various quantities. Many of these compounds are the same from person to person, but some differences are observed as well. The amount of each compound can also vary between people. We will ultimately collect roughly 100 sample scent profiles, each with roughly 50 compounds in varying amounts. Aside from demonstrating that everyone has a unique scent profile, we also want to investigate whether the profiles show class characteristics. Along with each sample, we are recording gender, age, ethnicity, and whether the person is a vegetarian. We are interested in applying statistics, likely principal component analysis, to determine, for example, whether scent profiles of a given age group can be classified together, or whether gender can be determined from the scent profile. If class characteristics can be identified, we would then want to build decision trees that might allow age, gender, ethnicity, etc. to be determined from the observed VOC scent profile.
Domain Expert: Doug Beussman (Chemistry)
Statistical Applications in Exercise Science
Description: Students in the exercise science senior seminar class choose a research topic early in the fall semester. CIR students will advise and guide students in the design of a study and data collection procedure. By the end of the first semester, the exercise science students will have written an abstract, introduction, review of the literature, and a proposed methodology. Typically, three or four seniors continue with their projects (i.e., collect data) in order to achieve departmental distinction by the end of spring semester. CIR students will continue to advise students and in some cases participate in data collection. Together the exercise science students and the CIR fellows will be involved in the data analysis during interim and spring for those moving toward distinction. We are excited about the possibility of getting much more from our studies with the collaboration of the statistics students.
Domain Expert: Cindy Book (Exercise Science)
To what degree are outpatient clinics cream-skimming hospitals’ most profitable patients?
Description: Hospitals have complained that outpatient clinics are cream-skimming their most profitable patients. Outpatient clinics, generally owned by doctors, are places where doctors perform the most common procedures on low-risk patients. Hospitals claim that this kind of cream skimming puts their solvency at risk, since they generally need to cross-subsidize the expensive, complex cases with the healthy, low-risk cases. Hospitals therefore fear the recent rapid rise of the outpatient clinic industry. On the other hand, growth in the outpatient clinic industry represents a move toward a more efficient system. By taking on only the high-volume procedures, doctors can hone their skills at those procedures and can both improve outcomes and reduce costs through specialization. Insurers have to answer the question: Do we reimburse outpatient clinics and hospitals differently for the same procedure? If they set the reimbursement for clinics too low, it could slow the growth of the outpatient clinic industry. If they set it too high, the hospitals could suffer financially and potentially close down.
Using data from both hospitals and outpatient clinics in California from 2005 to the present, I would like to develop a measure of the degree to which cream skimming has occurred and the degree to which hospitals have suffered as a result. If we are able to collect data on the different reimbursement schemes used by Medicare, Medi-Cal, and (some estimate of) private insurers, we might also be able to measure the impact of reimbursement schemes on growth in the outpatient clinic industry.
Domain Expert: Ashley Hodgson (Economics)
Variability in Mixing Models in Paleo-ecology
Description: Paleo-ecologists often use stable isotope analysis of charcoal from lake sediment to infer regional changes in the ratio of C3 to C4 plants. C3 and C4 plants differ in how they fix CO2; C4 plants use water more efficiently and generally survive better in warmer, drier climates. We study the carbon isotope ratios of charcoal obtained from lake sediment samples and from them infer the C3/C4 composition of historical forests. We are looking at sources of variation that may arise in this process from burning, heterogeneity of the charcoal mixture, and sample size.
Isotopic values for C3 plants range from -25 to -35‰, while those for C4 plants range from -10 to -18‰. The isotopic values for the fossil charcoal from Sharkey Lake in Big Woods, MN, ranged from -20.9 to -22.2‰ at ~2000 cal BP. Based on a simple mixing model, this suggests that 57% of the plants were C3. The error calculated in the standard manner, due to sample heterogeneity of the charcoal, was about ±0.18‰. However, natural variation in the isotopic values of the assumed fuels and a preliminary simulation analysis suggest a much larger (~20%) range in inferred plant composition.
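As a rough illustration of how sensitive a two-end-member mixing model is to the choice of end members, the sketch below computes the inferred C3 fraction as (sample - C4) / (C3 - C4). The description does not state which end-member values produced the 57% figure, so the function names and the end-member values used here are assumptions for illustration only, and the output is not expected to reproduce that figure. Bracketing with the extreme end-member values is a crude bound, not the preliminary simulation mentioned above.

```python
def c3_fraction(delta_sample, delta_c3, delta_c4):
    """Linear two-end-member mixing model.

    Returns the fraction of C3 material implied by a sample isotopic
    value (per mil), given assumed C3 and C4 end-member values.
    """
    if delta_c3 == delta_c4:
        raise ValueError("End members must differ.")
    return (delta_sample - delta_c4) / (delta_c3 - delta_c4)


def c3_fraction_range(delta_sample, c3_values, c4_values):
    """Smallest and largest C3 fraction over all end-member pairs.

    When the sample lies between every C3 and C4 value, the fraction is
    monotone in each end member, so checking the extremes is sufficient.
    """
    fractions = [
        c3_fraction(delta_sample, c3, c4)
        for c3 in c3_values
        for c4 in c4_values
    ]
    return min(fractions), max(fractions)


# Assumed inputs: midpoint of the reported charcoal range, and the
# extremes of the C3 and C4 ranges quoted in the description.
sample_midpoint = (-20.9 + -22.2) / 2
low, high = c3_fraction_range(sample_midpoint, (-35.0, -25.0), (-18.0, -10.0))
print(f"Inferred C3 fraction between {low:.2f} and {high:.2f}")
```

A fuller treatment would replace the fixed extremes with distributions for each end member (for example, within-species and among-species variances drawn from the literature) and propagate them by simulation.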
I would like to further explore the implications of different levels of variation for the two end members. In particular, we would look at variation due to within-species variability vs. among-species variability. Most of the information on variance would come from the published literature, but we may generate some additional original data using the mass spectrometer. This is a very viable candidate for a note in a paleoecology journal, inasmuch as I have not seen any direct treatment of this issue.
Domain Expert: Charles Umbanhowar (Biology)
Capital Expenditures as a Cost Driver in Higher Education
Description: The escalating cost of higher education, which some have described as an affordability crisis, has received increasing attention from the media, Congressional committees, and academic researchers, spurring debate within the sector about the causes of these cost increases and a search for new models. This project investigates the factors contributing to increasing college costs, identifying the factors responsible for the cost increases we observe and exploring the extent to which different aspects of cost have contributed to comprehensive fee increases. As part of this analysis, we will need to determine whether these factors have played the same role in cost increases over time. We will use publicly available data and statistical methods common in economic research.
Domain Expert: Paul Wojick (Economics)
Stochastic Modeling, Simulation, and Analysis of Aquatic Invasive Species Dispersal
Description: Aquatic invasive species such as zebra mussels pose serious environmental and economic risks. Accurately predicting the spread of invasive species is an important component of environmental policy. In this project we will explore stochastic models of invasive species dispersal in lakes and rivers in the United States. Since many invasive species are transported via recreational boat traffic, models of boating patterns based on distances between lakes and other lake attributes are useful for predicting invasive species establishment. We will assess the challenges, accuracy, and limitations of building a lakes-only model of invasive species spread. Rivers are also critical to invasive species transport, so we will build and analyze a coupled lake and river model using publicly available data as well as synthetic data generated from computer simulations.
Domain Expert: Mike Swift (Biology)
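To make the idea of a distance-based, lakes-only stochastic model concrete, here is a minimal sketch. It is not the project's model: the exponential distance kernel, the parameter names (`base_rate`, `distance_scale`), and the toy lake coordinates are all assumptions chosen for illustration. At each time step, every uninfested lake becomes infested with a probability that grows with its proximity to lakes that are already infested.

```python
import math
import random


def simulate_spread(lakes, initially_infested, steps,
                    base_rate=0.5, distance_scale=10.0, seed=0):
    """Simulate discrete-time spread among lakes.

    lakes: dict mapping lake name -> (x, y) coordinates (hypothetical units).
    Each infested lake independently infests an uninfested lake with
    probability base_rate * exp(-distance / distance_scale) per step
    (an assumed kernel standing in for boat-traffic intensity).
    Returns a list of infested sets, one per time step (including step 0).
    """
    rng = random.Random(seed)
    infested = set(initially_infested)
    history = [set(infested)]
    for _ in range(steps):
        newly_infested = set()
        for target, target_xy in lakes.items():
            if target in infested:
                continue
            # Probability that the target escapes every infested source.
            p_escape = 1.0
            for source in infested:
                d = math.dist(lakes[source], target_xy)
                p_escape *= 1.0 - base_rate * math.exp(-d / distance_scale)
            if rng.random() < 1.0 - p_escape:
                newly_infested.add(target)
        infested |= newly_infested
        history.append(set(infested))
    return history


# Toy example with made-up lake names and coordinates.
toy_lakes = {"A": (0, 0), "B": (5, 0), "C": (0, 8), "D": (40, 40)}
for step, state in enumerate(simulate_spread(toy_lakes, {"A"}, steps=5)):
    print(step, sorted(state))
```

Repeating the simulation over many seeds gives an empirical distribution of infestation times per lake. A coupled lake and river model would add directed downstream links alongside the distance kernel.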