Floritaly, an R package to access the Checklists of the vascular flora of Italy
Fabio Conti;Matteo Conti;Stefano Martellos
2023-01-01
Abstract
The Checklists of native and alien Italian vascular flora (1, 2) are considered the gold standard in plant diversity-related research in Italy, having received 971 combined citations in Scopus-indexed journals since their publication. The Checklists provide comprehensive plant names, synonyms, and associated data, which are publicly available in digital format through the Portal to the Flora of Italy (https://dryades.units.it/floritaly). The Portal offers additional benefits, including semi-annual updates, links to other digital resources, and a powerful name-matching tool that aligns any list of names with the Checklists. Currently, the Checklists data are not included in the growing inventory of packages developed for R, an open-source statistical software environment increasingly used in biodiversity-related research, and its companion GUI, RStudio. A Google Scholar advanced search with the query string “Rstudio biodiversity analysis OR analyses” revealed that the number of scientific papers using RStudio for biodiversity analyses more than tripled from 2020 (38) to 2022 (129). Several R packages are already available that retrieve taxonomic information from repositories of scientific names and standardize name lists, such as taxize and its extension taxizee, taxonlookup, rbg, taxonstand, and rotl. Given the versatility afforded by R in a wide range of analyses and applications, we expect plant diversity researchers to benefit from floritaly, an R package providing easy access to the Checklists data stored in the Portal to the Flora of Italy. Floritaly can be installed by typing devtools::install_github("gibedini/floritaly") in the RStudio console. Any use of the package and the data it contains is subject to a CC-BY license, which requires proper citation of the package and the Checklists (1, 2).
Floritaly offers two key components: the data themselves, and functions for aligning plant names in any given list with the accepted Checklists names along with their associated data. The data are available in three tables (“dataframes” in R jargon): the master table includes the accepted names and their distribution status in the main administrative subdivisions (“regioni”), while two ancillary tables contain synonyms, accepted names, and fully parsed names. The package includes two functions, nameStand() and nameLink(), which enable users to work with these data tables without accessing them directly. The nameStand() function takes an unrevised list of scientific names, identifies the closest matching names from the Checklists, retrieves the associated accepted names, and returns a standardized name table with four columns: unrevised names, matching names, accepted names, and name distances (computed as Levenshtein distance). The nameLink() function complements nameStand() by associating each standardized name with the corresponding Checklists distribution data through an inner join on the accepted name. By sequentially applying nameStand() and nameLink(), users can effortlessly generate standardized datasets that include accepted names and distribution statuses, starting from unrevised lists. Additionally, the resulting dataset produced by nameStand() and nameLink() can be further processed and joined with georeferenced occurrence records from public databases such as GBIF.
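The two-step workflow described above (closest match by Levenshtein distance, then an inner join on the accepted name) can be illustrated with a minimal base-R sketch. This is not the package's code: the functions stand_sketch() and link_sketch(), the toy tables, the placeholder taxon names, and the column names are all assumptions made for illustration, and the actual arguments and return values of nameStand() and nameLink() may differ.

```r
# Toy stand-ins for the package tables. Taxon names, region columns, and
# status values are invented placeholders, not Checklists data.
names_tbl <- data.frame(
  name     = c("Planta alba", "Planta rubra", "Planta rosea"),
  accepted = c("Planta alba", "Planta rubra", "Planta rubra"),  # "rosea" treated as a synonym
  stringsAsFactors = FALSE
)
master_tbl <- data.frame(
  accepted = c("Planta alba", "Planta rubra"),
  regionA  = c("present", "absent"),
  regionB  = c("absent", "present"),
  stringsAsFactors = FALSE
)

# Hypothetical analogue of the standardization step: for each unrevised name,
# pick the checklist name with the smallest Levenshtein distance.
stand_sketch <- function(unrevised, names_tbl) {
  # utils::adist() returns a distance matrix: rows = unrevised, columns = checklist names
  d <- utils::adist(unrevised, names_tbl$name)
  best <- apply(d, 1, which.min)
  data.frame(
    unrevised = unrevised,
    matching  = names_tbl$name[best],
    accepted  = names_tbl$accepted[best],
    distance  = d[cbind(seq_along(unrevised), best)],
    stringsAsFactors = FALSE
  )
}

# Hypothetical analogue of the linking step: merge() with its default
# all = FALSE performs an inner join on the shared key column.
link_sketch <- function(stand_tbl, master_tbl) {
  merge(stand_tbl, master_tbl, by = "accepted")
}

unrevised <- c("Planta roseaa", "Planta alba", "Planta rubra")  # first name has a typo
std    <- stand_sketch(unrevised, names_tbl)
linked <- link_sketch(std, master_tbl)
print(linked)
```

In this sketch, the misspelled name is matched to its nearest checklist entry, mapped to the accepted name, and then given the distribution columns of that accepted name. A real workflow would probably also inspect the distance column and review matches with large distances before trusting them.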