1/Was asked by @jasleen_grewal what I mean by "controls: they're not just for wet lab experiments". Good question!
Controls are a principle for analysis of biomedical research data, in general. Not just when machine learning is involved. https://twitter.com/michaelhoffman/status/1266408720340332544
                    
                                    
                        
                        
2/Biomedical data analysis often involves using complex methods on large amounts of data. There may have been some amount of ad hoc validation for the method. In my field, there's usually not the sort of rigorous validation with formal proofs of bounds.
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
3/So often one is using the method outside a setting for which it is verified to work. (This times infinity if you're using a commercial analysis tool which has never been described in a paper.)
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
4/What's more, researchers often create pipelines where the output of one complex method becomes the input of the next. It's hard to know a priori where the interactions between these methods might cause them to break down.
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
                        5/I suggest dealing with all this complexity the same way one would for a wet-lab experiment: add controls.
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
                        6/Try replacing your dataset with a dataset that is similar but means nothing. In genomics analyses, the easiest way to do this is just shuffle your dataset—keep the summary statistics of the dataset the same but scramble the list of genes or the position of genomic regions.
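A minimal sketch of a shuffled control, assuming your data is a mapping from gene name to some score. The function name `shuffle_gene_labels` and the toy values are made up for illustration. The score values stay exactly the same, so summary statistics are preserved, but which gene carries which score is scrambled.

```python
import random

def shuffle_gene_labels(scores, seed=0):
    """Return a control dataset: same score values, gene labels scrambled.

    `scores` is assumed to be a dict of gene name -> numeric score.
    """
    rng = random.Random(seed)       # fixed seed so the control is reproducible
    genes = list(scores)            # gene names, in original order
    values = list(scores.values())  # scores, in original order
    rng.shuffle(genes)              # scramble only the labels
    return dict(zip(genes, values))

# Toy data (hypothetical values, not real measurements)
real = {"TP53": 4.2, "BRCA1": 3.1, "MYC": 2.7, "GAPDH": 0.1, "ACTB": 0.0}
control = shuffle_gene_labels(real)

# The distribution of scores is untouched; only the gene-to-score link is broken.
print(sorted(control.values()) == sorted(real.values()))
```

Then feed `control` into the pipeline in place of `real`, with every other setting unchanged.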
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
7/Repeat the rest of the pipeline. Maybe it ends in a gene set enrichment analysis. On your real gene list, you found that the top 5 GO terms are biological processes that are hot right now. On your shuffled control you find… the same thing. Oops!!!
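To make the comparison concrete, here is a rough sketch that runs one downstream step on the real data and on many shuffled controls. `top_k_overlap` is a toy stand-in for an enrichment step, not a real gene set enrichment method, and all names and data are hypothetical. If a large fraction of shuffled controls does as well as the real data, the result probably doesn't mean much.

```python
import random

def top_k_overlap(scores, gene_set, k):
    """Toy stand-in for enrichment: count top-k genes that fall in gene_set."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(g in gene_set for g in top)

def control_fraction(scores, gene_set, k, n_controls=1000, seed=0):
    """Return (observed statistic, fraction of shuffled controls >= observed)."""
    rng = random.Random(seed)
    observed = top_k_overlap(scores, gene_set, k)
    genes, values = list(scores), list(scores.values())
    hits = 0
    for _ in range(n_controls):
        rng.shuffle(genes)                     # scramble gene labels
        shuffled = dict(zip(genes, values))    # same scores, new labels
        if top_k_overlap(shuffled, gene_set, k) >= observed:
            hits += 1
    return observed, hits / n_controls

# Hypothetical data: 20 genes with distinct scores, g0 highest.
scores = {f"g{i}": 20 - i for i in range(20)}

# A specific gene set: the shuffled controls rarely match the real result.
specific = control_fraction(scores, {"g0", "g1", "g2"}, k=3)

# A gene set that contains everything: the control matches every time. Oops!
uninformative = control_fraction(scores, set(scores), k=3)

print(specific, uninformative)
```

The second case is the "Oops" scenario: the statistic looks perfect on the real data, but it looks just as perfect on every scrambled dataset.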
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
8/People with expertise in these methods would expect that result. But with analytical controls you can still find this kind of problem without being an expert. Or even if you are a collaborator and aren't actually performing the analysis in the first place.
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
9/Analytical controls provide a powerful, easy-to-execute, and easy-to-understand technique to ensure that your results mean something. They aren't used nearly often enough.
                        
                        
                        
                        
                                                
                    
                    
                
                 
                                     
                                    