I'm gonna start a thread on what I hope will be helpful R tips to wrangle this huge NFL Big Data Bowl data. If you're an advanced R programmer, this is probably not for you, but feel free to correct me if I made a mistake or offer better alternatives
                        
#1
slice_sample() if you want to quickly preview what your result might look like using a random sampling of rows in your data
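A minimal sketch of that preview step, using a made-up `tracking` table as a stand-in for the real data (the column names here are assumptions, not the actual Big Data Bowl schema):

```r
library(dplyr)

set.seed(42)  # make the random preview reproducible

# Toy stand-in for a large tracking table; column names are hypothetical.
tracking <- tibble(
  frame_id = 1:1000,
  x = runif(1000, 0, 120),
  y = runif(1000, 0, 53.3)
)

# Preview 5 random rows instead of the first 5
preview <- tracking %>% slice_sample(n = 5)

# Or sample a proportion of the rows (here 10%)
preview_prop <- tracking %>% slice_sample(prop = 0.1)
```

Unlike head(), this can surface odd rows from the middle or end of the data before you run a slow pipeline on everything.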
                        
#2
janitor::clean_names() if variable names with random capitalization, spaces and other undesired characters make you sick
with the defaults you can turn gameTimeEastern (😒) into game_time_eastern (😙👌)
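Here is a small sketch of what the defaults do, on a toy `games` data frame (only gameTimeEastern comes from the thread; the other column names are invented for illustration):

```r
library(janitor)

# Toy data with camelCase and spaced column names (hypothetical columns).
games <- data.frame(
  gameId = 1:2,
  gameTimeEastern = c("13:00:00", "16:25:00"),
  `Home Team Abbr` = c("KC", "SF"),
  check.names = FALSE  # keep the messy names exactly as typed
)

# Default case is snake_case
games_clean <- clean_names(games)
names(games_clean)
# "game_id" "game_time_eastern" "home_team_abbr"
```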
                        
#4
lubridate::parse_date_time() for inconsistent date formats
players %>%
  mutate(birth_date = lubridate::parse_date_time(birth_date,
    orders = c("y-m-d", "m/d/y")))
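To see it work outside a pipeline, here is a tiny sketch on a made-up vector that mixes the two formats (the sample dates are invented, not taken from the players data):

```r
library(lubridate)

# The same date written in the two formats (hypothetical values).
birth_dates <- c("1995-03-14", "03/14/1995")

# Each order is tried; separators in the order strings are ignored
parsed <- parse_date_time(birth_dates, orders = c("y-m-d", "m/d/y"))

# parse_date_time() returns POSIXct (UTC by default); convert if you only need dates
birth_dates_clean <- as.Date(parsed)
```

Dates where the day is 12 or lower can be ambiguous between orders, so it is worth spot-checking a few parsed values.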
                        
#7
if you're going to bind all 17 weeks of data into one dataset, save it to disk as a parquet file via {arrow}. from my very unscientific testing with different file formats (rda, fst, feather, rds, tsv.gz), parquet was the fastest read
More on {arrow}: https://arrow.apache.org/docs/r/
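A rough sketch of the bind-then-save round trip, with three tiny made-up weekly tables standing in for the real weekly files (the column names and the temp file path are assumptions for illustration):

```r
library(arrow)
library(dplyr)

# Hypothetical weekly tables; in practice each would be read from its own file.
weeks <- lapply(1:3, function(w) {
  tibble(week = w, frame_id = 1:5, x = runif(5, 0, 120))
})

# Stack all weeks into one data frame
all_weeks <- bind_rows(weeks)

# Write once to parquet, then read it back in later sessions
path <- tempfile(fileext = ".parquet")
write_parquet(all_weeks, path)
all_weeks_back <- read_parquet(path)
```

Read speed will depend on your machine and data, so it is worth timing the formats yourself before committing to one.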