117M plant records after a spatial join with the map of "Terrestrial Ecoregions of the World" (https://www.worldwildlife.org/publications/terrestrial-ecoregions-of-the-world) by Olson et al. (2001). The spatial join was done with PostgreSQL/PostGIS and took around 6 hours.
                        
PostGIS facilitates this kind of large spatial join by extending SQL with its own functions. In this case, ST_Intersects compares the geometries of the species records and the ecoregion polygons, and returns TRUE when a record in "plantae" intersects a polygon in "ecoregions".
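
As a rough sketch of what such a join can look like, the query below pairs each plant record with the ecoregion polygon it intersects. The table names "plantae" and "ecoregions" come from the thread; the column names (id, geom, eco_name), the output table name, and the index names are assumptions made for illustration, and both geometry columns are assumed to share the same SRID.

```sql
-- Sketch only: id, geom and eco_name are assumed column names,
-- and plantae_ecoregions is a hypothetical output table.

-- GiST indexes let PostGIS rule out non-overlapping bounding boxes
-- before it runs the exact geometry test.
CREATE INDEX IF NOT EXISTS plantae_geom_idx ON plantae USING GIST (geom);
CREATE INDEX IF NOT EXISTS ecoregions_geom_idx ON ecoregions USING GIST (geom);

-- Keep each plant record that intersects an ecoregion polygon,
-- together with the name of that ecoregion.
CREATE TABLE plantae_ecoregions AS
SELECT p.id, e.eco_name
FROM plantae AS p
JOIN ecoregions AS e
  ON ST_Intersects(p.geom, e.geom);
```

With an inner join like this, records that fall outside every polygon (for example, points in the sea) are dropped; a LEFT JOIN would keep them with a NULL ecoregion.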
                        
Even though the screen capture was done in pgAdmin4, the operation was run from RStudio with the package "RPostgreSQL". It lets you establish a connection to a PostgreSQL database and use either dplyr or SQL code chunks in an .Rmd file to process the data stored in the database.
                        
So, from a single .Rmd file I have so far been working with Spark to clean the data efficiently, with the system console to set up the PostgreSQL/PostGIS database, and with the database itself to do the spatial operations.
                        
I have been working with GRASS GIS on the side to prepare environmental data, but yesterday I learned I should have been using the R package rgrass7 to do that from the .Rmd as well. Anyway, the rasters I prepared will now be swallowed by PostgreSQL/PostGIS too.