Projects for 2020-21

Title: Discerning structure in random geometric point processes
Domain Expert: Matthew Wright (MSCS)
Stats/DS Mentor: Jaime Davila
CIR Fellows: Prabesh Paudel, Ken Wang, Julie Yuldasheva

Description: Random processes often produce small-scale variability amid large-scale structure. For random geometric processes, this large-scale structure can be studied with a combination of geometry, topology, and statistics. For example, the study of mathematical graphs generated by random processes provides insight into how graph-theoretic properties depend on the underlying randomness, with applications to real-world networks. As another example, the study of configurations of random geometric shapes provides limit theorems for domain coverage problems, with applications to materials science.

This project will study random geometric point processes from the perspective of filtrations: parametrized families of geometric objects constructed from a set of random points. Filtrations are essential constructions in the field of topological data analysis, in which data is indexed by one or more parameters. Students will write code to generate random geometric data according to one or more probability models. Students will construct filtrations from the data and investigate geometric and topological features. Students will then compute statistical properties of these features, with reference to the underlying probability model. This work will provide new insight into unexplored questions at the intersection of geometry, topology, and statistics.
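As a rough illustration of the workflow described above, the sketch below samples uniform random points in the unit square and tracks one simple topological feature, the number of connected components, as a distance threshold grows. The uniform model, the function names (sample_points, component_counts), and the choice of feature are assumptions made for illustration only; they are not the project's actual methods.

```python
import math
import random


def sample_points(n, seed=0):
    """Draw n points uniformly at random from the unit square (assumed model)."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def component_counts(points, radii):
    """Count connected components of the graph joining points at distance <= r.

    Returns one count per radius, in order of increasing radius. The growing
    family of graphs is a simple one-parameter filtration.
    """
    n = len(points)
    # All pairwise edges, sorted by length, so they can be added incrementally.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))  # union-find forest over point indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    counts = []
    components = n
    k = 0
    for r in sorted(radii):
        # Add every edge no longer than the current threshold.
        while k < len(edges) and edges[k][0] <= r:
            _, i, j = edges[k]
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_i] = root_j
                components -= 1
            k += 1
        counts.append(components)
    return counts


if __name__ == "__main__":
    pts = sample_points(100, seed=1)
    print(component_counts(pts, [0.0, 0.05, 0.1, 0.2]))
```

Repeating this over many random samples would give an empirical distribution of the component count at each threshold, which could then be compared against the underlying probability model.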

Title: Staying on the Hill: modeling student retention
Domain Experts: Chris Chapp (Political Science) and Lauren Feiler (IE&A)
Stats/DS Mentor: Paul Roback
CIR Fellows: Martha Barnard, Lindsey Jansen, Kate Rudberg

Description: Colleges and universities have a vested interest in understanding why some students leave before graduation or transfer to other institutions. An ability to predict retention patterns gives the college information about how best to serve its students and fulfill its mission. To this end, St. Olaf’s Retention and Student Success Committee has spent several years piloting new programs to reduce transfer rates. While we have learned a great deal about why some students leave (and while St. Olaf’s overall retention rate is good compared to many of our peer institutions), predicting retention remains a vexing problem.

This project will draw on several data sources to predict student retention. First, we have access to student demographic data and data related to academic preparedness and academic success at St. Olaf. Second, we plan to use a “text as data” approach to examine student writing and see if patterns in open-ended writing about St. Olaf are predictive of retention. We will use several statistical tools, potentially including cluster analysis, propensity score matching, machine learning, and rare event modelling. Students working on this project should have a willingness to explore new tools as well as an ability to explain potentially complex findings to a broad audience. This project will not only help the college better serve future Oles; it will also contribute to research in higher education aimed at improving student success.

Title: Leaks in the classical music pipeline: where are the women soloists?
Domain Expert: Sara Clifton (MSCS)
Stats/DS Mentor: Sharon Lane-Getaz
CIR Fellows: Carly Dammann, Ella Koenig, David McGowan

Description: Women are poorly represented at the highest levels of most professions; the debate over the major causes of this phenomenon remains contentious and unresolved. In an attempt to understand the underlying causes of women’s underrepresentation, we will explore classical music performance as a case study.

Classical soloists are an elite group of professional musicians who tour the world, performing with the best orchestras in existence. These rock stars of the classical world – think Yo-Yo Ma, Joshua Bell, Lang Lang – often have one thing in common: they are mostly men. Using a large dataset of performances from the world’s major orchestras from their inceptions until now, we will explore how and why the representation of women soloists has changed over time. Some open questions include (1) why the pipeline for solo performance is so leaky for women, (2) why women are better represented in solo violin than solo piano, (3) how culture and society affect women musicians of different nationalities, and (4) how the soloist lifestyle influences men and women differently.

This project would be ideal for students with an interest in statistics/data science and one or more of the following fields: classical music, psychology, sociology, gender studies, and/or economics. Experience with R is required. It would be beneficial, but not necessary, to have experience with time series analysis, data scraping, data cleaning, machine learning (particularly gender inference from name), and/or data visualization. The raw data consist of one or more Excel spreadsheets for each of the major world orchestras (around two dozen), going back to the founding of each orchestra (about 100-150 years). For each orchestra, we have information on each performance (e.g., date, venue, type of event, conductor, pieces played). For each piece featuring a soloist, we have the composer, type of piece (usually a concerto), period of the piece, duration, and soloist name. For each soloist, conductor, and composer, we have the instrument (for soloists), inferred gender, date of birth, and nationality. The data will be cleaned and processed using R.

Title: The effect of doctors’ care coordination style on multiple chronic condition patients: evidence from primary care physician retirements and relocations
Domain Expert: Ashley Hodgson (Economics)
Stats/DS Mentor: Laura Boehm Vock
CIR Fellows: Noah Hillman, Sarah Rodman, William Wei

Description: This project will look at the way that doctor “exits” (retirements or relocations) affect multiple chronic condition patients who are forced to switch to a new doctor. When a patient’s primary care doctor exits, that patient is likely to experience a sudden and sustained change in the way they receive health care, according to differences in the way the old and new doctors practice medicine (Fadlon and Van Parys, 2020). One group of patients that is understudied relative to its need for and use of the health care system is multiple chronic condition patients: those with numerous ongoing medical concerns such as diabetes, asthma, chronic depression, and heart disease. Forty-two percent of Americans have multiple chronic conditions, and this group makes up a higher share of those interacting with the health care system, accounting for about 65% of health care dollars. Yet controlled medical studies generally exclude this population from trials of medical treatments due to the complexity they would add to the analysis. One of the most important parts of health care for multiple chronic condition patients is the coordination of their care across doctors with different specialties. Because primary care doctors play an important role in coordinating care, doctor exits are a good opportunity to observe how patients fare when they have a sudden change in the style of coordinating care.

We will use a large data set of Medicare patients from 2016 and 2017. We will focus our analysis on questions relating to care coordination for multiple chronic condition patients, particularly before and after primary care doctor exits:

  • What is the variation in coordination practice styles across primary care physicians? Coordination style measures observable in the data include number of specialists utilized and frequency of specialist use.
  • When a patient’s primary care doctor exits, to what degree do their specialist doctors also change?
  • Do multiple chronic condition patients experience more dramatic or less dramatic changes in practice style when their doctor exits?
  • Do hospitalizations respond more dramatically to a shift in medical practice style for multiple chronic condition patients relative to patients with fewer medical conditions?
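The two coordination style measures named in the first question (number of specialists used and frequency of specialist use) could be summarized per primary care physician along the following lines. This is only a sketch: the record layout (pcp_id, patient_id, specialist_id), the function name coordination_measures, and the toy records are hypothetical and do not reflect the structure of the actual Medicare data.

```python
from collections import defaultdict


def coordination_measures(visits):
    """Summarize specialist use for each primary care physician (PCP).

    visits: iterable of (pcp_id, patient_id, specialist_id) tuples, one per
    specialist visit (a hypothetical layout). Patients with no specialist
    visits do not appear in the input and so are not counted.
    """
    distinct = defaultdict(set)  # (pcp, patient) -> specialists seen
    counts = defaultdict(int)    # (pcp, patient) -> number of visits
    for pcp, patient, specialist in visits:
        distinct[(pcp, patient)].add(specialist)
        counts[(pcp, patient)] += 1

    # Group patients under their PCP.
    patients = defaultdict(list)
    for pcp, patient in counts:
        patients[pcp].append(patient)

    result = {}
    for pcp, plist in patients.items():
        n = len(plist)
        result[pcp] = {
            "mean_distinct_specialists":
                sum(len(distinct[(pcp, p)]) for p in plist) / n,
            "mean_specialist_visits":
                sum(counts[(pcp, p)] for p in plist) / n,
        }
    return result


if __name__ == "__main__":
    # Toy records for illustration only; not real data.
    toy = [
        ("A", "p1", "s1"), ("A", "p1", "s1"), ("A", "p1", "s2"),
        ("A", "p2", "s1"),
        ("B", "p3", "s3"),
    ]
    print(coordination_measures(toy))
```

Comparing these per-physician summaries for a patient's old and new doctors would be one way to quantify the change in coordination style around an exit.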

Title: Donor retention and stewardship efforts at St. Olaf
Domain Expert: Sara Eldridge (Annual Giving)
Stats/DS Mentor: Joe Roith
CIR Fellows: Amelia Cichoski, Heri Lopez, Collin Nill

Description: The St. Olaf Advancement team seeks help to determine how email messages and other touches with current and prospective donors affect their giving to the college. What is the relationship between giving, donor retention, and messaging? Is there any pattern based on constituency (alumni, parents, individuals), general attributes (age, geography, major, other affiliations), type of email communication (solicitation, thank you, event invitation, engagement), signer and voice of email, donor giving capacity, and/or email frequency? Results will be used to inform donor retention and stewardship efforts at St. Olaf.

Title: Using analytics to impact St. Olaf baseball
Domain Expert: Matt McDonald (Athletics)
Stats/DS Mentor: Matt Richey
CIR Fellows: Michael Daly, Luke Edwards, Guannan Liu

Description: The influence of data analytics has been rising rapidly at all levels of baseball, changing everything from the launch angle of batters to the positioning of fielders to the mechanics of pitchers. Data-based evidence has changed how players train and how they are evaluated, while exposing some popular misconceptions. The St. Olaf Baseball Team has been collecting data during both practice sessions (for example, swing speeds and angles) and games and scrimmages (for example, pitch-by-pitch records and the direction of batted balls). We will use historical data and newly collected data from the 2020-21 team to provide insights about player performance and game strategy. We will also study the baseball analytics literature to discern current trends, create valuable visualizations, and plan data collection strategies for the future.