If hundreds of scientists created predictive algorithms with high-quality data, how well would the best predict life outcomes? Not very well. Fragile Families Challenge: paper in PNAS with 112 authors https://doi.org/10.1073/pnas.1915006117 & Special Collection of Socius https://journals.sagepub.com/topic/collections-srd/srd-1-fragile_families/srd
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
We started with high-quality data. The Fragile Families and Child Wellbeing Study (@FFCWS) measured numerous domains of life for a cohort of families over many years. It has been used in more than 750 scientific papers. https://ffpubs.princeton.edu/
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
We used these data in a new way: the common task method. We picked 6 outcome variables (e.g., GPA). Approved researchers who agreed to our terms received predictors for all families (background) & outcomes for half (training). Goal: predict the outcomes they did not receive (holdout).
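A minimal sketch of that setup, assuming a simple random half split. The toy data, the function names, and the 50/50 fraction as coded here are illustrative assumptions, not the Challenge's actual pipeline:

```python
import random

def common_task_split(family_ids, train_fraction=0.5, seed=0):
    """Randomly assign families to training (outcomes released) or holdout (outcomes withheld)."""
    rng = random.Random(seed)
    ids = list(family_ids)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)
    return set(ids[:n_train]), set(ids[n_train:])

def release_to_participants(background, outcomes, train_ids):
    """Participants get predictors for every family but outcomes only for training families."""
    released_outcomes = {fid: y for fid, y in outcomes.items() if fid in train_ids}
    return background, released_outcomes

# Hypothetical toy data: 10 families, 2 predictors, 1 outcome.
background = {fid: {"x1": fid % 3, "x2": fid % 5} for fid in range(10)}
outcomes = {fid: 2.0 + 0.1 * fid for fid in range(10)}

train_ids, holdout_ids = common_task_split(background.keys())
released_background, released_outcomes = release_to_participants(
    background, outcomes, train_ids
)
```

Organizers keep the holdout outcomes private and use them only to score submitted predictions.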
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
160 teams tried. No one was very successful. For every outcome, the best algorithm was much closer to simple guessing than it was to perfect prediction. And it was only slightly better than a 4-variable regression model (dashed line).
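One way to make "closer to simple guessing than to perfect prediction" concrete is a holdout R² in which 0 means no better than predicting the training mean for everyone and 1 means perfect prediction. The sketch below assumes that scale; the numbers are made up and the function name is hypothetical:

```python
def r2_holdout(y_true, y_pred, train_mean):
    """1 - (squared error of the predictions) / (squared error of guessing the training mean)."""
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    sst = sum((y - train_mean) ** 2 for y in y_true)
    return 1.0 - sse / sst

# Hypothetical holdout outcomes and a training-set mean.
y_holdout = [2.0, 3.0, 4.0]
train_mean = 3.0

guess_score = r2_holdout(y_holdout, [train_mean] * 3, train_mean)  # simple guessing
perfect_score = r2_holdout(y_holdout, y_holdout, train_mean)       # perfect prediction
model_score = r2_holdout(y_holdout, [2.8, 3.0, 3.2], train_mean)   # a modest model
```

On this scale a score near 0.2 is much closer to guessing (0) than to perfection (1), even though it beats the baseline.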
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                    
                                    
                    
                        
                        
                        We thought perhaps some algorithms would predict some observations well, and other algorithms would predict other observations well. Nope. They missed pretty much the same way for all families.
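A rough way to check whether two algorithms "miss the same way" is to correlate their errors family by family. This is only an illustrative sketch with invented numbers, not the analysis used in the paper:

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

# Hypothetical true outcomes and predictions from two different algorithms.
y_true = [1.0, 2.0, 3.0, 4.0]
pred_a = [2.0, 2.4, 2.6, 3.0]
pred_b = [2.2, 2.5, 2.5, 2.8]

# Per-family errors (truth minus prediction) for each algorithm.
errors_a = [y - p for y, p in zip(y_true, pred_a)]
errors_b = [y - p for y, p in zip(y_true, pred_b)]

error_correlation = pearson(errors_a, errors_b)
```

A correlation near 1 means the two algorithms are wrong about the same families in the same direction, so combining them would help little.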
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
                        For policymakers deploying predictive algorithms in high-stakes decisions, our result is a reminder of a basic fact: one should not assume that algorithms predict well. That must be demonstrated with transparent, empirical evidence.
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
                        For scientists, our result raises an understanding/prediction paradox: understanding has been generated by these data (as demonstrated by more than 750 published journal articles), yet the very same data could not yield accurate predictions.
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
                        The paradox is resolvable in at least three ways: (1) our understanding is poor, (2) prediction is a poor measure of understanding, or (3) our understanding is incomplete without a theory that points toward poor prediction. Future research is needed.
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
                        Poor predictions by 1 team could be ignored. The collective failure of 160 teams is harder to ignore. This mass collaboration illustrates a broader idea: some social research questions may be better solved collectively rather than individually. We can do more together than alone.
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
Paper: https://doi.org/10.1073/pnas.1915006117 Replication materials: https://doi.org/10.7910/DVN/CXSECU
                        
                                                
                            
                                
                                
                                
                            
                            
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
Filiz Garip (@ProfFilizGarip) wrote a thoughtful commentary on our paper: What failure to predict life outcomes can teach us https://doi.org/10.1073/pnas.2003390117
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
                        The Socius special collection includes 12 papers by participants describing their approaches to the Challenge, 3 papers by our group that will be helpful to researchers creating other mass collaborations, and 1 comment.
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
Salganik, Lundberg, Kindel, and McLanahan. “Introduction to the Special Collection on the Fragile Families Challenge.” @msalganik @IanLundberg1 @alextkindel https://doi.org/10.1177/2378023119871580
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
Ahearn and Brand. “Predicting Layoff among Fragile Families.” @JennieBrand1 https://doi.org/10.1177%2F2378023118809757
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
Altschul. "Leveraging Multiple Machine Learning Techniques to Predict Major Life Outcomes from a Small Set of Psychological and Socioeconomic Variables: A Combined Bottom-Up/Top-Down Approach." @dremalt https://doi.org/10.1177%2F2378023118819943
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
Carnegie and Wu. "Variable Selection and Parameter Tuning for BART Modeling in the Fragile Families Challenge." https://doi.org/10.1177%2F2378023119825886
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
Compton. "A Data-Driven Approach to the Fragile Families Challenge: Prediction through Principal Components Analysis and Random Forests." https://doi.org/10.1177%2F2378023118818720
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
Davidson. "Black-Box Models and Sociological Explanations: Predicting High School GPA Using Neural Networks." @thomasrdavidson https://doi.org/10.1177%2F2378023118817702
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
Filippova, Gilroy, Kashyap, Kirchner, Morgan, Polimis, Usmani, and Wang. "Humans in the Loop: Incorporating Expert and Crowdsourced Knowledge for Predictions Using Social Survey Data." @anna_fil @ccgilroy @ridhikash07 @alliecmorgan @kpolimis https://doi.org/10.1177%2F2378023118820157
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
Goode, Datta, and Ramakrishnan. "Imputing Data for the Fragile Families Challenge: Identifying Similar Survey Questions with Semi-automated Methods." @devDdata @profnaren @VT_DAC https://doi.org/10.1177%2F2378023118822647
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
McKay. "When 4 ≈ 10,000: The Power of Social Science Knowledge in Predictive Performance." @SocialPolicy https://doi.org/10.1177%2F2378023118811774
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
Raes. "Predicting GPA at Age 15 in the Fragile Families and Child Wellbeing Study." @TiUEconomics https://doi.org/10.1177%2F2378023118824803
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
Rigobon, Jahani, Suhara, Al-Ghoneim, Alghunaim, Pentland, and Almaatouq. "Winning Models for GPA, Grit, and Layoff in the Fragile Families Challenge." @eamanjahani @suhara @khazgh @azizkag @alex_pentland @amaatouq https://doi.org/10.1177%2F2378023118820418
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
Roberts. "Friend Request Pending: A Comparative Assessment of Engineering and Social Science Inspired Approaches to Analyzing Complex Birth Cohort Survey Data." https://doi.org/10.1177%2F2378023118820431
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
Stanescu, Wang, and Yamauchi. "Using LASSO to Assist Imputation and Predict Child Wellbeing." @EHWpolisci https://doi.org/10.1177%2F2378023118814623
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
Kindel, Bansal, Catena, Hartshorne, Jaeger, Koffman, McLanahan, Phillips, Rouhani, Vinh, and Salganik. "Improving Metadata Infrastructure for Complex Surveys: Insights from the Fragile Families Challenge." https://doi.org/10.1177%2F2378023118817378
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
Fisher. “Data-specific Functions: A Comment on Kindel et al.” @jacob_c_fisher https://doi.org/10.1177%2F2378023118822893
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
Liu and Salganik. “Successes and Struggles with Computational Reproducibility: Lessons from the Fragile Families Challenge.” @dayvidliu @msalganik https://doi.org/10.1177%2F2378023119849803
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
Lundberg, Narayanan, Levy, and Salganik. "Privacy, Ethics, and Data Access: A Case Study of the Fragile Families Challenge." @IanLundberg1 @random_walker @karen_ec_levy @msalganik https://doi.org/10.1177%2F2378023118813023
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
To promote computational reproducibility, there are Docker images for the Socius papers (see Liu and Salganik 2019) @dayvidliu @msalganik: https://hub.docker.com/r/2018dliu/fragilefamilieschallenge_socius_reproducibility
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
                        The Fragile Families Challenge was supported by grants from the Russell Sage Foundation, NSF, and NICHD.  @RussellSageFdn  @NSF  @NICHD_NIH
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
                        The Fragile Families Challenge builds on more than 20 years of work on the Fragile Families and Child Wellbeing Study, which was supported by grants from NICHD and a consortium of private foundations, including the Robert Wood Johnson Foundation.  @ffcws
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
We are grateful to the Fragile Families Challenge Board of Advisers. https://www.fragilefamilieschallenge.org/#about
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
                        Thank you to everyone who participated in the Fragile Families Challenge!
                        
                        
                        
                        
                                                
                    
                    
                
                 