Welcome

This R Markdown document summarises the principal steps used to generate figures and tables from data on organ sizes across vertebrate and invertebrate species. I used a systematic map to search for information on organ sizes, body sizes, and a set of associated metadata (see Figure S1). This approach facilitates a more comprehensive understanding of the database structure and will also serve as the backbone for studying the scaling of organ sizes among species.

Citation

When using the data and/or code associated with this project, they should be cited as follows:

  • Leiva, F. P., Ockhuijsen, L., Polinder, J., Schreyers, L., Xiong, J., Hendriks, A. J. (2025). A systematic map and comprehensive database of invertebrate and vertebrate organ size. Zenodo. DOI will be available here soon.

Contact

This script is authored by Félix P. Leiva. For any questions related to this resource, please contact me at the email address: felixpleiva@gmail.com.

Disclaimer

This code routine may contain typographical errors, specific lines of code, or comments in Spanish (my native language). Should you encounter any errors in the code or data, please let me know via email.

Licence

This repository is provided by the author under the licence Attribution-NonCommercial-NoDerivatives 4.0 International.

Clean working space

rm(list = ls())

Load libraries

library(kableExtra)       # Enhances tables created with 'knitr::kable'
library(DataExplorer)     # Automates exploratory data analysis
library(dplyr)            # Efficient data manipulation
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:kableExtra':
## 
##     group_rows
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)          # Data visualisation based on the grammar of graphics
## Warning: package 'ggplot2' was built under R version 4.3.3
library(RefManageR)       # Manages references and citations
library(ggpubr)           # Creates publication-ready graphics
library(cowplot)          # Arranges and annotates plots
## 
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggpubr':
## 
##     get_legend
library(tidygeocoder)     # Converts addresses into geographic coordinates
## Warning: package 'tidygeocoder' was built under R version 4.3.3
library(rnaturalearth)    # Accesses Natural Earth geographic data
library(ape)              # Analyses phylogenies and evolution
## Warning: package 'ape' was built under R version 4.3.3
## 
## Attaching package: 'ape'
## The following object is masked from 'package:ggpubr':
## 
##     rotate
## The following object is masked from 'package:dplyr':
## 
##     where
library(ggtree)           # Visualises and annotates phylogenetic trees
## ggtree v3.10.1 For help: https://yulab-smu.top/treedata-book/
## 
## If you use the ggtree package suite in published research, please cite
## the appropriate paper(s):
## 
## Guangchuang Yu, David Smith, Huachen Zhu, Yi Guan, Tommy Tsan-Yuk Lam.
## ggtree: an R package for visualization and annotation of phylogenetic
## trees with their covariates and other associated data. Methods in
## Ecology and Evolution. 2017, 8(1):28-36. doi:10.1111/2041-210X.12628
## 
## Shuangbin Xu, Lin Li, Xiao Luo, Meijun Chen, Wenli Tang, Li Zhan, Zehan
## Dai, Tommy T. Lam, Yi Guan, Guangchuang Yu. Ggtree: A serialized data
## object for visualization of a phylogenetic tree and annotation data.
## iMeta 2022, 1(4):e56. doi:10.1002/imt2.56
## 
## S Xu, Z Dai, P Guo, X Fu, S Liu, L Zhou, W Tang, T Feng, M Chen, L
## Zhan, T Wu, E Hu, Y Jiang, X Bo, G Yu. ggtreeExtra: Compact
## visualization of richly annotated phylogenetic data. Molecular Biology
## and Evolution. 2021, 38(9):4039-4042. doi: 10.1093/molbev/msab166
## 
## Attaching package: 'ggtree'
## The following object is masked from 'package:ape':
## 
##     rotate
## The following object is masked from 'package:ggpubr':
## 
##     rotate
library(tibble)           # Alternative to data frames
library(ggthemes)         # Additional themes for 'ggplot2' graphics
## 
## Attaching package: 'ggthemes'
## The following object is masked from 'package:cowplot':
## 
##     theme_map
library(sessioninfo)      # Documents session environment for reproducibility
## Warning: package 'sessioninfo' was built under R version 4.3.3
library(details)          # Adds inline or interactive details
## Warning: package 'details' was built under R version 4.3.3
library(stringr)          # String manipulation
library(tidyr)            # Reshapes data between wide and long formats
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:ggtree':
## 
##     expand

Load data

dat <- read.csv("../outputs/organ_size_with_taxonomy.csv")

Load phylogenetic tree

tree <- read.tree("../outputs/Phylogenetic tree for 363 species.tre")

Load references

refs <- ReadBib("../outputs/Data used in the DB.bib")

Overview of screening steps used to select studies included in the organ size database

Figure S1. PRISMA-type diagram showing the systematic and non-systematic literature search reporting organ size and body size pairs data. (*) We have not received responses from the corresponding author at the time of the manuscript submission.

Data exploration

Change some species names

# Let's change those names

dat$species[dat$species == "Ateles chamek"]                     <- "Ateles belzebuth chamek"
dat$species[dat$species == "Leontocebus nigrifrons"]            <- "Saguinus fuscicollis nigrifrons"
dat$species[dat$species == "Terrapene triunguis"]               <- "Terrapene carolina triunguis"
dat$species[dat$species == "Anarhynchus alexandrinus"]          <- "Charadrius alexandrinus"
dat$species[dat$species == "Cebuella pygmaea"]                  <- "Callithrix pygmaea"
dat$species[dat$species == "Dicotyles tajacu"]                  <- "Pecari tajacu"
dat$species[dat$species == "Heteromys salvini"]                 <- "Liomys salvini"
dat$species[dat$species == "Leontocebus fuscicollis"]           <- "Saguinus fuscicollis"
dat$species[dat$species == "Lithobates pipiens"]                <- "Rana pipiens"
dat$species[dat$species == "pekania pennanti"]                  <- "Martes pennanti"
dat$species[dat$species == "Sephanoides sephaniodes"]           <- "Sephanoides sephanoides"
dat$species[dat$species == "Spodiopsar sericeus"]               <- "Sturnus sericeus"
dat$species[dat$species == "Pekania pennanti"]                  <- "Martes pennanti"
dat$species[dat$species == "Saguinus fuscicollis"]              <- "Saguinus fuscicollis fuscicollis"
dat$species[dat$species == "Presbytis melalophos"]              <- "Presbytis melalophos mitrata"
dat$species[dat$species == "Eulemur fulvus"]                    <- "Eulemur fulvus fulvus"
dat$species[dat$species == "Saimiri sciureus"]                  <- "Saimiri sciureus sciureus"
dat$species[dat$species == "Alouatta seniculus"]                <- "Alouatta seniculus seniculus"
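The same kind of renaming can also be expressed as a single lookup table, which keeps old and new names side by side. The following is a minimal sketch on toy data, not the routine used above: `toy` and `name_map` are hypothetical names introduced only for illustration, and the two name pairs are copied from the list above.

```r
# Toy data standing in for dat$species (illustrative values only)
toy <- data.frame(
  species = c("Lithobates pipiens", "Pekania pennanti", "Mus musculus"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: names are the old labels, values the new ones
name_map <- c(
  "Lithobates pipiens" = "Rana pipiens",
  "Pekania pennanti"   = "Martes pennanti"
)

# Replace only the rows whose species appears in the lookup table
hits <- toy$species %in% names(name_map)
toy$species[hits] <- unname(name_map[toy$species[hits]])

print(toy$species)
```

Note that a lookup table is applied in one pass, whereas the assignments above run in sequence. Chained renames (for example a name that is changed twice) would therefore need the final target name in the table.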

This section gives an overview and description of the metadata associated with the extraction of organ size and body size data in invertebrates and vertebrates. This table is labelled Table S1 in the manuscript.

Check and reformat variables if needed

str(dat)
## 'data.frame':    10605 obs. of  36 variables:
##  $ species_reported                  : chr  "Abrothrix longipilis" "Abrothrix longipilis" "Abrothrix longipilis" "Abrothrix longipilis" ...
##  $ initials                          : chr  "FPLeiva" "FPLeiva" "FPLeiva" "FPLeiva" ...
##  $ key                               : chr  "rayyan-150592721" "rayyan-150592721" "rayyan-150592721" "rayyan-150592721" ...
##  $ context_study                     : chr  "ecological/evolutionary" "ecological/evolutionary" "ecological/evolutionary" "ecological/evolutionary" ...
##  $ taxonomic_group                   : chr  "mammal" "mammal" "mammal" "mammal" ...
##  $ habitat                           : chr  "terrestrial" "terrestrial" "terrestrial" "terrestrial" ...
##  $ origin                            : chr  "wild" "wild" "wild" "wild" ...
##  $ season_of_collection              : chr  "autumn" "autumn" "autumn" "autumn" ...
##  $ lat_dec                           : num  -47.5 -47.5 -47.4 -47.5 -47.4 ...
##  $ long_dec                          : num  -72.8 -72.8 -71.9 -72.8 -71.9 ...
##  $ age_years                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ life_stage_individual             : chr  NA NA NA NA ...
##  $ sex_individual                    : chr  NA NA NA NA ...
##  $ id_cluster                        : int  4 4 2 4 2 3 2 3 4 3 ...
##  $ trait_size_category               : chr  "body size" "small intestine" "stomach" "caecum" ...
##  $ weight_status                     : chr  "fresh" "fresh" "fresh" "fresh" ...
##  $ paired_organs_weighed_individually: chr  NA NA NA NA ...
##  $ organ_side                        : chr  NA NA NA NA ...
##  $ trait_details                     : chr  "body weight" "small intestine weight" "stomach weight" "caecum weight" ...
##  $ trait_unit                        : chr  "g" "g" "g" "g" ...
##  $ mean_trait                        : num  26.274 0.698 0.375 0.25 1.357 ...
##  $ error_trait                       : num  2.1934 0.0596 0.0182 0.0601 0.0675 ...
##  $ error_type                        : chr  "standard_error" "standard_error" "standard_error" "standard_error" ...
##  $ n_for_mean_trait                  : chr  "26" "26" "21" "26" ...
##  $ additional_trait                  : chr  NA NA NA NA ...
##  $ data_source                       : chr  "Figure 4F" "Figure 4C" "Figure 4A" "Figure 4B" ...
##  $ data_doi                          : chr  NA NA NA NA ...
##  $ relevant_notes                    : chr  NA NA NA NA ...
##  $ phylum                            : chr  "Chordata" "Chordata" "Chordata" "Chordata" ...
##  $ class                             : chr  "Mammalia" "Mammalia" "Mammalia" "Mammalia" ...
##  $ order                             : chr  "Rodentia" "Rodentia" "Rodentia" "Rodentia" ...
##  $ family                            : chr  "Cricetidae" "Cricetidae" "Cricetidae" "Cricetidae" ...
##  $ genus                             : chr  "Abrothrix" "Abrothrix" "Abrothrix" "Abrothrix" ...
##  $ species                           : chr  "Abrothrix longipilis" "Abrothrix longipilis" "Abrothrix longipilis" "Abrothrix longipilis" ...
##  $ source                            : chr  "ncbi" "ncbi" "ncbi" "ncbi" ...
##  $ taxo_level                        : chr  "Species" "Species" "Species" "Species" ...
# make a new column of species underscored
dat$species_underscored <- gsub(" ", "_", dat$species)

# Choosing columns I want to convert to factor
columns_to_factor <- c(
  "species_reported", "initials", "key", "context_study", "taxonomic_group",
  "habitat", "origin", "season_of_collection", "life_stage_individual",
  "sex_individual", "id_cluster", "trait_size_category", "weight_status",
  "paired_organs_weighed_individually", "organ_side", "trait_details",
  "trait_unit", "error_type", "additional_trait", "data_source", "data_doi",
  "phylum", "class", "order", "family", "genus", "species", "source",
  "taxo_level", "species_underscored"
)

# Convert columns to factor
dat <- dat %>%
  mutate(across(all_of(columns_to_factor), as.factor))

# Convert column to numeric
dat <- dat %>%
  mutate(mean_trait = as.numeric(mean_trait))

# check again
str(dat)
## 'data.frame':    10605 obs. of  37 variables:
##  $ species_reported                  : Factor w/ 380 levels "Abrothrix longipilis",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ initials                          : Factor w/ 1 level "FPLeiva": 1 1 1 1 1 1 1 1 1 1 ...
##  $ key                               : Factor w/ 235 levels "rayyan-150591684",..: 158 158 158 158 158 158 158 158 158 158 ...
##  $ context_study                     : Factor w/ 6 levels "animal health",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ taxonomic_group                   : Factor w/ 7 levels "amphibian","bird",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ habitat                           : Factor w/ 3 levels "freshwater","marine",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ origin                            : Factor w/ 6 levels "captive","commercial",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ season_of_collection              : Factor w/ 4 levels "autumn","spring",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ lat_dec                           : num  -47.5 -47.5 -47.4 -47.5 -47.4 ...
##  $ long_dec                          : num  -72.8 -72.8 -71.9 -72.8 -71.9 ...
##  $ age_years                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ life_stage_individual             : Factor w/ 13 levels "adult","elderly",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ sex_individual                    : Factor w/ 3 levels "both","female",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ id_cluster                        : Factor w/ 360 levels "1","2","3","4",..: 4 4 2 4 2 3 2 3 4 3 ...
##  $ trait_size_category               : Factor w/ 54 levels "adipose depot",..: 4 46 48 8 31 28 4 48 48 31 ...
##  $ weight_status                     : Factor w/ 4 levels "dried","fixed",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ paired_organs_weighed_individually: Factor w/ 2 levels "no","yes": NA NA NA NA NA NA NA NA NA NA ...
##  $ organ_side                        : Factor w/ 2 levels "left","right": NA NA NA NA NA NA NA NA NA NA ...
##  $ trait_details                     : Factor w/ 443 levels "abdominal fat",..: 14 360 380 37 187 153 14 380 380 187 ...
##  $ trait_unit                        : Factor w/ 16 levels "% of body weight",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ mean_trait                        : num  26.274 0.698 0.375 0.25 1.357 ...
##  $ error_trait                       : num  2.1934 0.0596 0.0182 0.0601 0.0675 ...
##  $ error_type                        : Factor w/ 3 levels "confidence_interval",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ n_for_mean_trait                  : chr  "26" "26" "21" "26" ...
##  $ additional_trait                  : Factor w/ 3 levels "individual metabolic rates",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ data_source                       : Factor w/ 80 levels "archived raw data",..: 42 39 35 37 41 40 42 35 35 41 ...
##  $ data_doi                          : Factor w/ 5 levels "10.5061/\ndryad.6djh9w13b",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ relevant_notes                    : chr  NA NA NA NA ...
##  $ phylum                            : Factor w/ 3 levels "Arthropoda","Chordata",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ class                             : Factor w/ 8 levels "Actinopterygii",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ order                             : Factor w/ 60 levels "Accipitriformes",..: 48 48 48 48 48 48 48 48 48 48 ...
##  $ family                            : Factor w/ 133 levels "Accipitridae",..: 30 30 30 30 30 30 30 30 30 30 ...
##  $ genus                             : Factor w/ 267 levels "Abrothrix","Acomys",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ species                           : Factor w/ 366 levels "Abrothrix longipilis",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ source                            : Factor w/ 3 levels "gbif","itis",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ taxo_level                        : Factor w/ 1 level "Species": 1 1 1 1 1 1 1 1 1 1 ...
##  $ species_underscored               : Factor w/ 366 levels "Abrothrix_longipilis",..: 1 1 1 1 1 1 1 1 1 1 ...

Grouping of organs

In the code below, we group the different organ types included in our database to improve visualisation across the various figures. As we see below, there are 54 distinct trait categories (53 organ types plus body size).

# check the organ names
length(unique(dat$trait_size_category))
## [1] 54

However, some of these actually represent the same organ type.

# check the organ names
unique(dat$trait_size_category)
##  [1] body size                       small intestine                
##  [3] stomach                         caecum                         
##  [5] liver                           intestine                      
##  [7] brain                           spleen                         
##  [9] adipose depot                   heart                          
## [11] digestive tract                 kidney                         
## [13] central nervous system          skeleton                       
## [15] musculature                     reproductive system            
## [17] digestive system                circulatory system and fat body
## [19] malpighian tubules              ileum                          
## [21] duodenum                        gizzard                        
## [23] muscle                          jejunum                        
## [25] lung                            ventricle                      
## [27] rectum                          colon                          
## [29] esophagus                       thymus                         
## [31] bursa                           testes                         
## [33] thyroid/parathyroid glands      salt gland                     
## [35] harderian gland                 ovary                          
## [37] pancreas                        adrenal glands                 
## [39] fat                             proventriculus                 
## [41] gonad                           bone                           
## [43] gill                            uterus                         
## [45] prostate gland                  pituitary gland                
## [47] oviduct                         fore limb                      
## [49] hind limb                       electric organ                 
## [51] epididymides                    gut                            
## [53] ureter                          bladder                        
## 54 Levels: adipose depot adrenal glands bladder body size bone brain ... ventricle

System categories

This happened because during data extraction, I retained the original organ names as reported in the studies.

# Add system column using case_when
dat <- dat %>%
  mutate(system = case_when(
    # Digestive system
    trait_size_category %in% c("liver", "caecum", "intestine", "stomach", 
                              "digestive tract", "digestive system",
                              "jejunum", "duodenum", "gizzard", "ileum",
                              "esophagus", "colon", "rectum", "proventriculus",
                              "gut", "pancreas") ~ "Digestive",
    
    # Excretory system
    trait_size_category %in% c("kidney", "malpighian tubules", "ureter", 
                              "bladder") ~ "Excretory",
    
    # Circulatory system
    trait_size_category %in% c("heart", "ventricle", "circulatory system and fat body",
                              "spleen") ~ "Circulatory",
    
    # Immune system
    trait_size_category %in% c("thymus", "bursa") ~ "Immune",
    
    # Nervous system
    trait_size_category %in% c("brain", "central nervous system", 
                              "pituitary gland") ~ "Nervous",
    
    # Endocrine system
    trait_size_category %in% c("thyroid/parathyroid glands", "adrenal glands",
                              "harderian gland", "salt gland") ~ "Endocrine",
    
    # Respiratory system
    trait_size_category %in% c("lung", "gill") ~ "Respiratory",
    
    # Reproductive system
    trait_size_category %in% c("ovary", "testes", "gonad", "uterus", 
                              "prostate gland", "oviduct", "epididymides",
                              "reproductive system") ~ "Reproductive",
    
    # Musculoskeletal system
    trait_size_category %in% c("skeleton", "bone", "hind limb", "fore limb",
                              "muscle", "musculature") ~ "Musculoskeletal",
    
    # Adipose/fat storage
    trait_size_category %in% c("adipose depot", "fat") ~ "Adipose tissue",
    
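    # Default category: "body size" itself, plus any category not listed above.
    # Note: "small intestine" and "electric organ" are not assigned to a system
    # above, so they also end up here (see the summary table below)
    TRUE ~ "Body size"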
  ))
# Create summary table with group totals
summary_table <- dat %>%
  count(system, trait_size_category) %>%
  arrange(system, trait_size_category) %>%
  # Add group totals
  bind_rows(
    dat %>%
      count(system, name = "n") %>%
      mutate(trait_size_category = "TOTAL") %>%
      select(system, trait_size_category, n)
  ) %>%
  arrange(system, trait_size_category != "TOTAL", trait_size_category)
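Because the default branch of `case_when()` silently absorbs every category that is not listed, it can be useful to check which categories were never assigned explicitly. The following is a minimal sketch on toy data. `toy_traits` and `system_lookup` are hypothetical objects introduced only for illustration and do not reproduce the full grouping above.

```r
# Toy vector standing in for dat$trait_size_category (illustrative values only)
toy_traits <- c("liver", "small intestine", "heart", "electric organ", "body size")

# Hypothetical lookup: names are trait categories, values are systems
system_lookup <- c(
  "liver"     = "Digestive",
  "heart"     = "Circulatory",
  "body size" = "Body size"
)

# Categories present in the data but missing from the lookup
unassigned <- setdiff(unique(toy_traits), names(system_lookup))

print(unassigned)
```

Any name returned in `unassigned` would fall into the default category, so the list can be reviewed before plotting.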

Let's check the grouping

# make a table to see the number of organs measured in each system
kable(summary_table, col.names = c("System", "Organ", "Count")) %>%
  kable_styling("striped", position = "left", full_width = TRUE) %>%
  row_spec(which(summary_table$trait_size_category == "TOTAL"), bold = TRUE, background = "#f5f5f5") %>%
  scroll_box(width = "100%", height = "500px")
System           Organ                            Count
Adipose tissue   TOTAL                              155
Adipose tissue   adipose depot                      108
Adipose tissue   fat                                 47
Body size        TOTAL                             3660
Body size        body size                         3210
Body size        electric organ                      60
Body size        small intestine                    390
Circulatory      TOTAL                              988
Circulatory      circulatory system and fat body     18
Circulatory      heart                              511
Circulatory      spleen                             307
Circulatory      ventricle                          152
Digestive        TOTAL                             2593
Digestive        caecum                             289
Digestive        colon                               27
Digestive        digestive system                    18
Digestive        digestive tract                    117
Digestive        duodenum                            17
Digestive        esophagus                           15
Digestive        gizzard                            163
Digestive        gut                                 88
Digestive        ileum                               13
Digestive        intestine                          656
Digestive        jejunum                             15
Digestive        liver                              741
Digestive        pancreas                            53
Digestive        proventriculus                      26
Digestive        rectum                              23
Digestive        stomach                            332
Endocrine        TOTAL                              183
Endocrine        adrenal glands                      90
Endocrine        harderian gland                     21
Endocrine        salt gland                          40
Endocrine        thyroid/parathyroid glands          32
Excretory        TOTAL                              461
Excretory        bladder                              2
Excretory        kidney                             439
Excretory        malpighian tubules                  16
Excretory        ureter                               4
Immune           TOTAL                              101
Immune           bursa                               41
Immune           thymus                              60
Musculoskeletal  TOTAL                             1136
Musculoskeletal  bone                               479
Musculoskeletal  fore limb                          112
Musculoskeletal  hind limb                          112
Musculoskeletal  muscle                             395
Musculoskeletal  musculature                         18
Musculoskeletal  skeleton                            20
Nervous          TOTAL                              490
Nervous          brain                              436
Nervous          central nervous system              33
Nervous          pituitary gland                     21
Reproductive     TOTAL                              457
Reproductive     epididymides                        13
Reproductive     gonad                               55
Reproductive     ovary                              149
Reproductive     oviduct                              5
Reproductive     prostate gland                       8
Reproductive     reproductive system                 17
Reproductive     testes                             197
Reproductive     uterus                              13
Respiratory      TOTAL                              381
Respiratory      gill                               165
Respiratory      lung                               216

Figure 1. Cumulative number of studies and most common journals

# Panel A: Extracting and cleaning publication years

df_years <- refs %>%
  as.data.frame() %>%
  select(year) %>%
  mutate(year = as.numeric(as.character(year))) %>%  # Ensure year is numeric
  filter(!is.na(year))  # Remove NA values

# Counting studies per year and calculating cumulative values:
studies_per_year <- df_years %>%
  group_by(year) %>%
  summarise(num_studies = n()) %>%
  arrange(year) %>%
  mutate(cumulative_studies = cumsum(num_studies))  # Calculate cumulative count

# Plotting the cumulative number of studies:
plot_years <- 
  ggplot(studies_per_year, aes(x = year, y = cumulative_studies)) +
  geom_line(color = "#009E73", linewidth = 2) +
  scale_y_continuous(limits = c(0, 250)) +  # Limit the y-axis to the range 0-250
  scale_x_continuous(
    limits = c(1955, 2024),
    breaks = seq(1960, 2024, by = 12)  # Place x-axis breaks every 12 years
  ) +
  labs(
    x = "Publication Year",
    y = "Number of Studies"
  ) +
  theme_pubr() +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12)
  )
# ------------------------------------------------------------------------------
# Panel B: reformat the journal names so they can be counted and then plotted
df_journals <- refs %>%
  as.data.frame() %>%
  select(journal) %>%
  filter(!is.na(journal)) %>%
  mutate(
    journal_clean = tolower(journal),
    journal_clean = str_replace_all(journal_clean, "\\\\&", "&"),
    journal_clean = str_squish(journal_clean),
    journal_clean = str_to_title(journal_clean),
    journal_clean = str_replace_all(
      journal_clean,
      "\\b(Of|And|In|On|For|The|A|An|To|With|By|At|From|But|Or|Nor)\\b",
      function(x) tolower(x)
    ),
    # Capitalise the first letter; toupper("\\1") would leave it unchanged
    journal_clean = sub("^([a-z])", "\\U\\1", journal_clean, perl = TRUE)
  ) %>%
  arrange(desc(journal_clean))

df_journals <- df_journals %>%
  group_by(journal_clean) %>%
  summarise(num_articles = n(), .groups = "drop") %>%
  arrange(desc(num_articles)) %>%
  slice_head(n = 10)
plot_journals <- ggplot(df_journals, aes(x = reorder(journal_clean, num_articles), y = num_articles)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  scale_y_continuous(limits = c(0, 50)) +
  coord_flip() +
  theme_pubr() +
  labs(
    x = "Journals",
    y = "Number of Studies"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_blank()
  ) +
  geom_text(aes(label = journal_clean), hjust = 0, vjust = 0.5, 
            color = "black", size = 4, check_overlap = TRUE, 
            position = position_dodge(width = 0.7)) +
  geom_text(aes(label = num_articles), 
            hjust = 2, color = "black", size = 4)
# ------------------------------------------------------------------------------
# Combining Panel A and Panel B
Figure_1 <- plot_grid(
  plot_years, 
  plot_journals,
  labels = c("A", "B"),
  nrow = 2,
  ncol = 1,
  label_size = 15,
  align = "v")

Figure_1

# Saving the combined figure
ggsave('../outputs/Figure_1_Studies_and_Journals.pdf', Figure_1, width = 6, height = 9)
ggsave('../outputs/Figure_1_Studies_and_Journals.png', Figure_1, width = 6, height = 9, dpi = 1200)
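The cumulative count behind Panel A can be checked on a small example. The following is a minimal base R sketch, assuming a plain vector of publication years; `toy_years` holds made-up values and `per_year` is a hypothetical name, neither comes from the reference file.

```r
# Made-up publication years (illustrative only)
toy_years <- c(1998, 2001, 2001, 2010, 2010, 2010)

# Count studies per year; table() orders the years in increasing order
per_year <- as.data.frame(table(year = toy_years), stringsAsFactors = FALSE)
per_year$year <- as.numeric(per_year$year)

# Running total of studies up to and including each year
per_year$cumulative <- cumsum(per_year$Freq)

print(per_year)
```

The last value of `cumulative` equals the number of references with a valid year, which gives a quick sanity check against the upper limit chosen for the y-axis.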

Figure 2: Number of studies by study context, origin and season of collection

# Calculate the number of unique studies by each categorical variable
plot_data <- dat %>%
  group_by(context_study, origin, season_of_collection) %>%
  summarise(unique_studies = n_distinct(key), .groups = "drop") %>%
  pivot_longer(
    cols = -c(unique_studies),
    names_to = "variable",
    values_to = "level"
  )

plot_data <- plot_data %>%
  mutate(variable = recode(variable,
                           context_study = "Study context",
                           origin = "Source of specimens",
                           season_of_collection = "Season of collection")) %>%
  group_by(variable, level) %>%
  summarise(unique_studies = sum(unique_studies), .groups = "drop")

# rename the NA level as "not reported"
plot_data <- plot_data %>%
  mutate(level = if_else(is.na(level), "not reported", as.character(level)))

#  reorder the levels
plot_data <- plot_data %>%
  mutate(
    variable = factor(
      variable,
      levels = c("Study context", "Source of specimens", "Season of collection")
    )
  )

Figure_2 <- 
  ggplot(plot_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  # Add the number of studies just beyond the end of each bar
  geom_text(aes(label = unique_studies), 
            hjust = -0.15, color = "black", size = 4) +
  coord_flip() +
  scale_y_continuous(limits = c(0, 250)) +
  facet_wrap(~ variable, scales = "free_y", ncol = 3) +
  theme_pubr() +
  labs(
    x = NULL,  # Remove x axis label
    y = "Number of studies"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 9),
    strip.text = element_text(face = "bold", size = 10),
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
    strip.background = element_rect(fill = "#B2ABD270", colour = "transparent", linewidth = 0),
    panel.spacing = unit(1, "lines")
  )

Figure_2

# Store plots
ggsave('../outputs/Figure_2_Studies_by_Factors.pdf', Figure_2, width = 15/1.2, height = 4/1.2)
ggsave('../outputs/Figure_2_Studies_by_Factors.png', Figure_2, width = 15/1.2, height = 4/1.2, dpi = 1200)
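One point to keep in mind with the counting above: the studies are first counted per combination of the three variables and then summed per level, so a study that appears in several combinations contributes more than once to a level. If the intention is to count each study once per level, a possible alternative is sketched below. This is a minimal sketch on toy data, not the method used for the figure. `toy`, `count_studies()` and the column values are hypothetical.

```r
# Toy data: study s1 was sampled in two seasons, s2 has two origins
toy <- data.frame(
  key    = c("s1", "s1", "s2", "s2", "s3"),
  origin = c("wild", "wild", "wild", "captive", "captive"),
  season = c("autumn", "spring", "autumn", "autumn", NA),
  stringsAsFactors = FALSE
)

# Count distinct studies per level of one variable (hypothetical helper)
count_studies <- function(data, variable) {
  lev <- as.character(data[[variable]])
  lev[is.na(lev)] <- "not reported"
  # Keep one row per study-level pair so each study counts once per level
  pairs <- unique(data.frame(key = data$key, level = lev, stringsAsFactors = FALSE))
  out <- as.data.frame(table(level = pairs$level), stringsAsFactors = FALSE)
  names(out)[2] <- "unique_studies"
  out
}

origin_counts <- count_studies(toy, "origin")
season_counts <- count_studies(toy, "season")

print(origin_counts)
print(season_counts)
```

In this toy example two studies include wild specimens, whereas summing over the origin-by-season combinations would give three, because study s1 appears under two seasons.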

Figure 3: Number of studies by habitat, life stage and sex

# Calculate the number of unique studies by each categorical variable
plot_data <- dat %>%
  group_by(sex_individual, life_stage_individual, habitat) %>%
  summarise(unique_studies = n_distinct(key), .groups = "drop") %>%
  pivot_longer(
    cols = -c(unique_studies),
    names_to = "variable",
    values_to = "level"
  )

plot_data <- plot_data %>%
  mutate(variable = recode(variable,
                           habitat = "Habitat",
                           sex_individual = "Sex studied",
                           life_stage_individual = "Life stage studied")) %>%
  group_by(variable, level) %>%
  summarise(unique_studies = sum(unique_studies), .groups = "drop")

# rename the NA level as "not reported"
plot_data <- plot_data %>%
  mutate(level = if_else(is.na(level), "not reported", as.character(level)))

#  reorder the levels
plot_data <- plot_data %>%
  mutate(
    variable = factor(
      variable,
      levels = c("Habitat", "Life stage studied", "Sex studied")
    )
  )

Figure_3 <- 
  ggplot(plot_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  # Add the number of studies just beyond the end of each bar
  geom_text(aes(label = unique_studies), 
            hjust = -0.15, color = "black", size = 4) +
  coord_flip() +
  scale_y_continuous(limits = c(0, 300)) +
  facet_wrap(~ variable, scales = "free_y", ncol = 3) +
  theme_pubr() +
  labs(
    x = NULL,  # Remove x axis label
    y = "Number of studies"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 9),
    strip.text = element_text(face = "bold", size = 10),
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
    strip.background = element_rect(fill = "#B2ABD270", colour = "transparent", linewidth = 0),
    panel.spacing = unit(1, "lines")
  )

Figure_3

# Store plots
ggsave('../outputs/Figure_3_Studies_by_Factors.pdf', Figure_3, width = 15/1.5, height = 4/1.5)
ggsave('../outputs/Figure_3_Studies_by_Factors.png', Figure_3, width = 15/1.5, height = 4/1.5, dpi = 1200)

Figure 4: Studies by class and most common species in the database

data_studies <- dat %>%
  group_by(class) %>%
  summarise(unique_studies = n_distinct(key), .groups = "drop")

Figure_4a <- 
  ggplot(data_studies, aes(x = reorder(class, unique_studies), y = unique_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  coord_flip() +
  geom_text(aes(label = unique_studies), hjust = -0.15, color = "black", size = 4) +
  theme_pubr() +
  scale_y_continuous(limits = c(0, 150)) +
  labs(x = NULL, y = "Number of studies") +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 9),
    strip.text = element_text(face = "bold", size = 11),
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
    strip.background = element_rect(fill = "#B2ABD270", colour = "black", linewidth = 0.8),
    panel.spacing = unit(1, "lines")
  )

# Plot the most studied species
plot_data <- dat %>%
  group_by(species) %>%
  summarise(num_studies = n_distinct(key), .groups = "drop") %>%
  arrange(desc(num_studies)) %>%
  slice_head(n = 10) %>%
  mutate(
    # Replace spaces with ~~ so that plotmath keeps the words separated,
    # then wrap the whole name in italic() so it is drawn in italics
    species_italic = gsub(" ", "~~", species),
    species_italic = paste0("italic(", species_italic, ")")
  )

# Plot with species names in italics
Figure_4b <- ggplot(plot_data, aes(x = reorder(species_italic, num_studies), y = num_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  geom_text(aes(label = num_studies), hjust = -0.15, color = "black", size = 4) +
  coord_flip() +
  theme_pubr() +
  scale_y_continuous(limits = c(0, 150)) +
  labs(x = NULL, y = "Number of studies") +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 9),
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8)
  ) +
  scale_x_discrete(labels = function(x) parse(text = x))

# Combining Panel 4A and Panel 4B
Figure_4 <- plot_grid(
  Figure_4a, 
  Figure_4b,
  labels = c("A", "B"),
  nrow = 2,
  ncol = 1,
  label_size = 15,
  align = "v")

Figure_4

# Store Plots
ggsave('../outputs/Figure_4_Most_common_species.pdf', Figure_4, width = 6, height = 9)
ggsave('../outputs/Figure_4_Most_common_species.png', Figure_4, width = 6, height = 9, dpi = 1200)
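
The italic axis labels in panel B rely on plotmath: each species name is turned into a string such as italic(Genus~~species), and parse() converts that string into an expression that ggplot2 can draw. The lines below are a minimal base R sketch of that step alone. The species names are hypothetical placeholders and are not taken from the database.

```r
# Hypothetical species names, used only for illustration
species <- c("Mus musculus", "Gallus gallus")

# Replace each space with the plotmath spacer "~~" and wrap the name in italic()
labels_chr <- paste0("italic(", gsub(" ", "~~", species), ")")

# parse() turns the strings into R expressions, one per label
labels_expr <- parse(text = labels_chr)

labels_chr
length(labels_expr)
```

Names containing characters that are not valid in an R expression (for example a hyphen or an apostrophe) would probably need extra quoting before being passed to parse().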

Figure 5: Number of studies by system

plot_data <- dat %>%
  group_by(system) %>%
  summarise(unique_studies = n_distinct(key), .groups = "drop") %>%
  pivot_longer(
    cols = -c(unique_studies),
    names_to = "variable",
    values_to = "level"
)

plot_data <- plot_data %>%
  mutate(variable = recode(variable,
                           system = "System studied")) %>%
  group_by(variable, level) %>%
  summarise(unique_studies = sum(unique_studies), .groups = "drop")

# Create plot with panel borders
Figure_5 <- 
  ggplot(plot_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  coord_flip() +
  # Add the number of studies at the end of each bar
  geom_text(aes(label = unique_studies), 
            hjust = -0.15, color = "black", size = 4) +
  facet_wrap(~ variable, scales = "free_y", ncol = 1) +
  theme_pubr() +
  labs(
     x = NULL,  # Remove x axis label
    y = "Number of studies"
  ) +
 theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 9),
    strip.text = element_text(face = "bold", size = 12),
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
    strip.background = element_rect(fill = "#B2ABD270", colour = "transparent", linewidth = 0),
    panel.spacing = unit(1, "lines")
  )

Figure_5

# Store plots
ggsave('../outputs/Figure_5_Studies_by_system.pdf', Figure_5, width = 7, height = 7)
ggsave('../outputs/Figure_5_Studies_by_system.png', Figure_5, width = 7, height = 7, dpi = 1200)

Figure 6: Phylogeny of species included in the database, with the organ systems studied

# Summarise data: record, for each species, which organ systems have data
names(dat)
##  [1] "species_reported"                   "initials"                          
##  [3] "key"                                "context_study"                     
##  [5] "taxonomic_group"                    "habitat"                           
##  [7] "origin"                             "season_of_collection"              
##  [9] "lat_dec"                            "long_dec"                          
## [11] "age_years"                          "life_stage_individual"             
## [13] "sex_individual"                     "id_cluster"                        
## [15] "trait_size_category"                "weight_status"                     
## [17] "paired_organs_weighed_individually" "organ_side"                        
## [19] "trait_details"                      "trait_unit"                        
## [21] "mean_trait"                         "error_trait"                       
## [23] "error_type"                         "n_for_mean_trait"                  
## [25] "additional_trait"                   "data_source"                       
## [27] "data_doi"                           "relevant_notes"                    
## [29] "phylum"                             "class"                             
## [31] "order"                              "family"                            
## [33] "genus"                              "species"                           
## [35] "source"                             "taxo_level"                        
## [37] "species_underscored"                "system"
summary_data <- dat %>%
  count(species_underscored, system) %>%  # Count occurrences by species and system
  mutate(presence = if_else(n > 0, "Yes", "No")) %>%  # Convert counts to "Yes"/"No"
  select(-n) %>% # Remove the original count column
  pivot_wider(
    names_from = system,                   # Each system becomes a column
    values_from = presence,                # Values are now "Yes" or "No"
    values_fill = list(presence = "No")    # Fill missing combinations with "No"
  )

summary_data <- summary_data %>%
  select(-`Body size`)

names(summary_data)
##  [1] "species_underscored" "Digestive"           "Adipose tissue"     
##  [4] "Circulatory"         "Excretory"           "Nervous"            
##  [7] "Musculoskeletal"     "Reproductive"        "Respiratory"        
## [10] "Endocrine"           "Immune"
# Check for mismatches between tree tips and data
setdiff(tree$tip.label, summary_data$species_underscored)
## [1] "Saimiri_sciureus_macrodon"
setdiff(summary_data$species_underscored, tree$tip.label)
## [1] "Muusoctopus_aegir" "Neomys_anomalus"   "Neomys_fodiens"   
## [4] "Saimiri_macrodon"
tree <- keep.tip(tree, intersect(tree$tip.label, summary_data$species_underscored))

summary_data <- summary_data[summary_data$species_underscored %in% tree$tip.label, ]

# check again
setdiff(tree$tip.label, summary_data$species_underscored)
## character(0)
setdiff(summary_data$species_underscored, tree$tip.label)
## character(0)
# Align data and tree
datF <- summary_data %>%
  column_to_rownames("species_underscored")

# Plot the tree with species names
circ_names <- ggtree(tree, layout = "fan", open.angle = 18, branch.length = "none") +
  geom_tiplab(offset = 0.1, hjust = 0, size = 1)
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
circ_names <- rotate_tree(circ_names, 90)
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
circ_names

# Plot the tree without species names
circ <- ggtree(tree, layout = "fan", open.angle = 18, branch.length = "none") 
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
circ <- rotate_tree(circ, 90)
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
circ

# Add a heatmap around the tree showing which organ systems were measured per species
tree_data <- gheatmap(
  circ, 
  datF, 
  width = 0.4, 
  offset = 0,  # Offset for placing the heatmap
  colnames_offset_x = 0, 
  colnames_offset_y = 0, 
  font.size = 3, 
  hjust = 0
)
tree_data

# Apply a single fill scale to all organ systems
circ_data <- tree_data + 
  scale_fill_manual(values = c("grey", "#009E73"), name = "System measured") +
  theme(
    legend.position = "inside",              # Place the legend inside the plot area
    legend.position.inside = c(0.56, 0.57),  # Manual legend position
    legend.background = element_rect(fill = "transparent", colour = NA), # Transparent background
    legend.box.background = element_rect(fill = "transparent", colour = NA) # Transparent box border
  )
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
circ_data

ggsave("../outputs/Figure_6_Phylogenetic_tree_with_data.png", circ_data, width = 12, height = 12, dpi = 1500)
ggsave("../outputs/Figure_6_Phylogenetic_tree_with_data.pdf", circ_data, width = 12, height = 12)
ggsave("../outputs/Figure_6_Phylogenetic_tree_with_names.pdf", circ_names, width = 12, height = 12)
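
The matching step between the tree tips and the data in this chunk can be sketched in isolation. The example below uses only base R set operations on two hypothetical character vectors, which stand in for tree$tip.label and summary_data$species_underscored. It shows how names found on only one side are identified and how the shared set is obtained before pruning.

```r
# Hypothetical tip labels and data names, for illustration only
tip_labels <- c("Homo_sapiens", "Mus_musculus", "Saimiri_sciureus_macrodon")
data_names <- c("Homo_sapiens", "Mus_musculus", "Saimiri_macrodon")

# Names present on one side only
only_in_tree <- setdiff(tip_labels, data_names)
only_in_data <- setdiff(data_names, tip_labels)

# Names present on both sides; these are the ones kept after pruning
shared <- intersect(tip_labels, data_names)

only_in_tree
only_in_data
shared
```

Dropping unmatched names, as done above, is the simplest option. If two labels refer to the same taxon under different spellings, renaming one of them before matching would retain that species instead.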

Figure S2: Number of species by taxonomic group (Class)

# Prepare data for plotting
data_taxa <- dat %>%
  group_by(class) %>%
  summarise(unique_species = n_distinct(species), .groups = "drop") %>%
  pivot_longer(
    cols = -c(unique_species),
    names_to = "variable",
    values_to = "level"
  )

# Create plot with panel borders
Figure_S2 <- 
  ggplot(data_taxa, aes(x = reorder(level, unique_species), y = unique_species)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  coord_flip() +
  # Add the number of species at the end of each bar
  geom_text(aes(label = unique_species), 
            hjust = -0.15, color = "black", size = 4) +
  theme_pubr() +
  scale_y_continuous(limits = c(0, 250)) +
  labs(
    x = NULL,  # Remove x axis label
    y = "Number of species"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 9),
    strip.text = element_text(face = "bold", size = 11),
    plot.title = element_text(face = "bold", hjust = 0.5),
    # Panel border additions:
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
    strip.background = element_rect(fill = "gray90", colour = "black", linewidth = 0.8),
    panel.spacing = unit(1, "lines")  # Adds space between facets
  )

Figure_S2

# Store Plots
ggsave('../outputs/Figure_S2_Species_by_class.pdf', Figure_S2, width = 6, height = 6)
ggsave('../outputs/Figure_S2_Species_by_class.png', Figure_S2, width = 6, height = 6, dpi = 1200)

Figure S3: Number of studies per trait category (body and organ)

plot_data <- dat %>%
  group_by(trait_size_category) %>%
  summarise(unique_studies = n_distinct(key), .groups = "drop") %>%
  pivot_longer(
    cols = -c(unique_studies),
    names_to = "variable",
    values_to = "level"
)

plot_data <- plot_data %>%
  mutate(variable = recode(variable,
                           trait_size_category = "Type of organ reported")) %>%
  group_by(variable, level) %>%
  summarise(unique_studies = sum(unique_studies), .groups = "drop")

# Create plot with panel borders
Figure_S3 <- 
  ggplot(plot_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  coord_flip() +
  facet_wrap(~ variable, scales = "free_y", ncol = 1) +
  theme_pubr() +
  scale_y_continuous(limits = c(0, 250)) +
  labs(
     x = NULL,  # Remove x axis label
    y = "Number of studies"
  ) +
 theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 9),
    strip.text = element_text(face = "bold", size = 12),
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
    strip.background = element_rect(fill = "#B2ABD270", colour = "transparent", linewidth = 0),
    panel.spacing = unit(1, "lines")
  )

Figure_S3

# Store plots
ggsave('../outputs/Figure_S3_Studies_by_organ.pdf', Figure_S3, width = 7, height = 7)
ggsave('../outputs/Figure_S3_Studies_by_organ.png', Figure_S3, width = 7, height = 7, dpi = 1200)

Data for the manuscript

Number of species

dat %>%
  distinct(species) %>%
  nrow()
## [1] 366

Studies published over the years

df_years <- refs %>% 
  as.data.frame() %>% 
  select(year) %>% 
  mutate(year = as.numeric(as.character(year))) %>%  # Convert the 'year' column to numeric
  filter(!is.na(year)) %>%  # Remove rows where 'year' is missing (NA)
  group_by(year) %>%
  summarise(num_studies = n()) %>%
  arrange(year) %>%
  mutate(cumulative_studies = cumsum(num_studies))  # Calculate cumulative count

# Range of years covered by the database
df_years %>% 
  reframe(min_year = min(year), 
          max_year = max(year), 
          total_years = max_year - min_year)
## # A tibble: 1 × 3
##   min_year max_year total_years
##      <dbl>    <dbl>       <dbl>
## 1     1964     2024          60
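
The per-year and cumulative counts stored in df_years follow a simple pattern: count the studies in each year, sort the years, then take a running sum. The lines below are a minimal base R sketch of that pattern. The years are hypothetical values chosen for illustration and do not come from refs.

```r
# Hypothetical publication years, for illustration only
years <- c(1964, 1990, 1990, 2001, 2001, 2001, 2024)

# Count studies per year
counts <- as.data.frame(table(year = years), stringsAsFactors = FALSE)
counts$year <- as.numeric(counts$year)

# Sort chronologically, then accumulate the yearly counts
counts <- counts[order(counts$year), ]
counts$cumulative_studies <- cumsum(counts$Freq)

counts
```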

Number of species by class

table_sp_number <- dat %>%
  group_by(class) %>%
  reframe(n_spp = length(unique(species)), 
          total_study = length(unique(key)),
          perc_species = (n_spp/length(unique(dat$species)))* 100) %>%
  arrange(desc(perc_species))
table_sp_number
## # A tibble: 8 × 4
##   class          n_spp total_study perc_species
##   <fct>          <int>       <int>        <dbl>
## 1 Mammalia         227          78       62.0  
## 2 Aves              71         110       19.4  
## 3 Actinopterygii    24          29        6.56 
## 4 Insecta           14           1        3.83 
## 5 Amphibia          13           6        3.55 
## 6 Reptilia          12           6        3.28 
## 7 Chondrichthyes     4           4        1.09 
## 8 Cephalopoda        1           1        0.273
# Display the table with kable and kableExtra
# (there is no "TOTAL" row in this table, so no row is highlighted)
kable(table_sp_number, col.names = c("Class", "N", "Number of studies", "Percentage of species")) %>%
  kable_styling("striped", position = "left", full_width = TRUE) %>%
  scroll_box(width = "100%", height = "500px")
| Class | N | Number of studies | Percentage of species |
|---|---|---|---|
| Mammalia | 227 | 78 | 62.021858 |
| Aves | 71 | 110 | 19.398907 |
| Actinopterygii | 24 | 29 | 6.557377 |
| Insecta | 14 | 1 | 3.825137 |
| Amphibia | 13 | 6 | 3.551913 |
| Reptilia | 12 | 6 | 3.278688 |
| Chondrichthyes | 4 | 4 | 1.092896 |
| Cephalopoda | 1 | 1 | 0.273224 |

Percentage of coverage by organ type

table_organ_type <- dat %>%
  group_by(trait_size_category) %>%
  summarise(
    n_spp = n_distinct(species),
    total_study = n_distinct(key)
  ) %>%
  mutate(
    perc_studies = (total_study / n_distinct(dat$key)) * 100,
    perc_species = (n_spp / n_distinct(dat$species)) * 100
  ) %>%
  arrange(desc(perc_studies))

# Select and rename columns for display
table_show <- table_organ_type %>%
  select(
    Organ = trait_size_category,
    `Number of species` = n_spp,
    `Percentage of species` = perc_species,
    `Number of studies` = total_study,
    `Percentage of studies` = perc_studies
    )

# Display the table with kable and kableExtra
kable(table_show, digits = 2) %>%
  kable_styling("striped", position = "left", full_width = TRUE) %>%
  scroll_box(width = "100%", height = "500px")
| Organ | Number of species | Percentage of species | Number of studies | Percentage of studies |
|---|---|---|---|---|
| body size | 366 | 100.00 | 235 | 100.00 |
| liver | 171 | 46.72 | 161 | 68.51 |
| spleen | 125 | 34.15 | 94 | 40.00 |
| heart | 158 | 43.17 | 93 | 39.57 |
| kidney | 159 | 43.44 | 80 | 34.04 |
| gizzard | 13 | 3.55 | 54 | 22.98 |
| lung | 32 | 8.74 | 38 | 16.17 |
| bursa | 5 | 1.37 | 36 | 15.32 |
| muscle | 21 | 5.74 | 36 | 15.32 |
| testes | 21 | 5.74 | 35 | 14.89 |
| brain | 165 | 45.08 | 34 | 14.47 |
| fat | 12 | 3.28 | 32 | 13.62 |
| intestine | 138 | 37.70 | 32 | 13.62 |
| thymus | 10 | 2.73 | 32 | 13.62 |
| pancreas | 16 | 4.37 | 31 | 13.19 |
| stomach | 167 | 45.63 | 29 | 12.34 |
| small intestine | 36 | 9.84 | 26 | 11.06 |
| caecum | 43 | 11.75 | 25 | 10.64 |
| ovary | 14 | 3.83 | 23 | 9.79 |
| adrenal glands | 14 | 3.83 | 22 | 9.36 |
| proventriculus | 6 | 1.64 | 19 | 8.09 |
| duodenum | 5 | 1.37 | 12 | 5.11 |
| epididymides | 5 | 1.37 | 11 | 4.68 |
| jejunum | 4 | 1.09 | 11 | 4.68 |
| ileum | 3 | 0.82 | 10 | 4.26 |
| uterus | 4 | 1.09 | 9 | 3.83 |
| colon | 21 | 5.74 | 8 | 3.40 |
| gonad | 7 | 1.91 | 8 | 3.40 |
| thyroid/parathyroid glands | 7 | 1.91 | 7 | 2.98 |
| bone | 67 | 18.31 | 6 | 2.55 |
| digestive tract | 104 | 28.42 | 5 | 2.13 |
| gill | 9 | 2.46 | 5 | 2.13 |
| pituitary gland | 7 | 1.91 | 5 | 2.13 |
| prostate gland | 4 | 1.09 | 5 | 2.13 |
| rectum | 17 | 4.64 | 5 | 2.13 |
| ventricle | 5 | 1.37 | 5 | 2.13 |
| adipose depot | 100 | 27.32 | 4 | 1.70 |
| oviduct | 2 | 0.55 | 4 | 1.70 |
| salt gland | 4 | 1.09 | 4 | 1.70 |
| esophagus | 14 | 3.83 | 2 | 0.85 |
| gut | 2 | 0.55 | 2 | 0.85 |
| harderian gland | 2 | 0.55 | 2 | 0.85 |
| skeleton | 14 | 3.83 | 2 | 0.85 |
| bladder | 1 | 0.27 | 1 | 0.43 |
| central nervous system | 14 | 3.83 | 1 | 0.43 |
| circulatory system and fat body | 13 | 3.55 | 1 | 0.43 |
| digestive system | 13 | 3.55 | 1 | 0.43 |
| electric organ | 1 | 0.27 | 1 | 0.43 |
| fore limb | 2 | 0.55 | 1 | 0.43 |
| hind limb | 2 | 0.55 | 1 | 0.43 |
| malpighian tubules | 12 | 3.28 | 1 | 0.43 |
| musculature | 13 | 3.55 | 1 | 0.43 |
| reproductive system | 13 | 3.55 | 1 | 0.43 |
| ureter | 1 | 0.27 | 1 | 0.43 |

Export database

# Check the column names before selecting the most relevant columns
names(dat)
##  [1] "species_reported"                   "initials"                          
##  [3] "key"                                "context_study"                     
##  [5] "taxonomic_group"                    "habitat"                           
##  [7] "origin"                             "season_of_collection"              
##  [9] "lat_dec"                            "long_dec"                          
## [11] "age_years"                          "life_stage_individual"             
## [13] "sex_individual"                     "id_cluster"                        
## [15] "trait_size_category"                "weight_status"                     
## [17] "paired_organs_weighed_individually" "organ_side"                        
## [19] "trait_details"                      "trait_unit"                        
## [21] "mean_trait"                         "error_trait"                       
## [23] "error_type"                         "n_for_mean_trait"                  
## [25] "additional_trait"                   "data_source"                       
## [27] "data_doi"                           "relevant_notes"                    
## [29] "phylum"                             "class"                             
## [31] "order"                              "family"                            
## [33] "genus"                              "species"                           
## [35] "source"                             "taxo_level"                        
## [37] "species_underscored"                "system"
# Select the most relevant columns and reorder them where needed
OrganYsize_DB_v1.0.0 <- dat %>% 
  select("key",
         "initials", "context_study",
         "phylum", "class", "order", "family", "genus","species", 
         "species_reported","species_underscored",
         "origin", "habitat","season_of_collection",
         "lat_dec",
         "long_dec",
         "age_years",
         "sex_individual", "life_stage_individual", "n_for_mean_trait",
         "id_cluster", "trait_size_category", 
         "weight_status","paired_organs_weighed_individually", "organ_side", "trait_details", "trait_unit", 
         "mean_trait", "error_trait", "error_type", 
         "data_source", "data_doi",
         "relevant_notes")

# Export the database as a CSV file
write.csv(OrganYsize_DB_v1.0.0, "../outputs/Organ_Size_Database_v1.0.0.csv", row.names = FALSE)

# Export the database as an Excel workbook
writexl::write_xlsx(OrganYsize_DB_v1.0.0, "../outputs/Organ_Size_Database_v1.0.0.xlsx")

Session information

session_info() %>%
details(summary = 'Current Session Information', open = TRUE)
Current Session Information

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       macOS 15.5
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Berlin
 date     2025-06-25
 pandoc   3.4 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
 quarto   1.6.42 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package       * version date (UTC) lib source
 abind           1.4-8   2024-09-12 [1] CRAN (R 4.3.3)
 ape           * 5.8-1   2024-12-16 [1] CRAN (R 4.3.3)
 aplot           0.2.5   2025-02-27 [1] CRAN (R 4.3.3)
 backports       1.5.0   2024-05-23 [1] CRAN (R 4.3.3)
 bibtex          0.5.1   2023-01-26 [1] CRAN (R 4.3.3)
 bookdown        0.43    2025-04-15 [1] CRAN (R 4.3.3)
 broom           1.0.8   2025-03-28 [1] CRAN (R 4.3.3)
 bslib           0.9.0   2025-01-30 [1] CRAN (R 4.3.3)
 cachem          1.1.0   2024-05-16 [1] CRAN (R 4.3.3)
 car             3.1-3   2024-09-27 [1] CRAN (R 4.3.3)
 carData         3.0-5   2022-01-06 [1] CRAN (R 4.3.3)
 class           7.3-23  2025-01-01 [1] CRAN (R 4.3.3)
 classInt        0.4-11  2025-01-08 [1] CRAN (R 4.3.3)
 cli             3.6.5   2025-04-23 [1] CRAN (R 4.3.3)
 clipr           0.8.0   2022-02-22 [1] CRAN (R 4.3.3)
 codetools       0.2-20  2024-03-31 [1] CRAN (R 4.3.3)
 cowplot       * 1.1.3   2024-01-22 [1] CRAN (R 4.3.1)
 crayon          1.5.3   2024-06-20 [1] CRAN (R 4.3.3)
 data.table      1.17.4  2025-05-26 [1] CRAN (R 4.3.3)
 data.tree       1.1.0   2023-11-12 [1] CRAN (R 4.3.3)
 DataExplorer  * 0.8.3   2024-01-24 [1] CRAN (R 4.3.1)
 DBI             1.2.3   2024-06-02 [1] CRAN (R 4.3.3)
 desc            1.4.3   2023-12-10 [1] CRAN (R 4.3.3)
 details       * 0.4.0   2025-02-09 [1] CRAN (R 4.3.3)
 digest          0.6.37  2024-08-19 [1] CRAN (R 4.3.3)
 dplyr         * 1.1.4   2023-11-17 [1] CRAN (R 4.3.1)
 e1071           1.7-16  2024-09-16 [1] CRAN (R 4.3.3)
 evaluate        1.0.3   2025-01-10 [1] CRAN (R 4.3.3)
 farver          2.1.2   2024-05-13 [1] CRAN (R 4.3.3)
 fastmap         1.2.0   2024-05-15 [1] CRAN (R 4.3.3)
 Formula         1.2-5   2023-02-24 [1] CRAN (R 4.3.3)
 fs              1.6.6   2025-04-12 [1] CRAN (R 4.3.3)
 generics        0.1.4   2025-05-09 [1] CRAN (R 4.3.3)
 ggfun           0.1.8   2024-12-03 [1] CRAN (R 4.3.3)
 ggplot2       * 3.5.2   2025-04-09 [1] CRAN (R 4.3.3)
 ggplotify       0.1.2   2023-08-09 [1] CRAN (R 4.3.0)
 ggpubr        * 0.6.0   2023-02-10 [1] CRAN (R 4.3.0)
 ggsignif        0.6.4   2022-10-13 [1] CRAN (R 4.3.0)
 ggthemes      * 5.1.0   2024-02-10 [1] CRAN (R 4.3.1)
 ggtree        * 3.10.1  2024-02-27 [1] Bioconductor 3.18 (R 4.3.2)
 glue            1.8.0   2024-09-30 [1] CRAN (R 4.3.3)
 gridExtra       2.3     2017-09-09 [1] CRAN (R 4.3.3)
 gridGraphics    0.5-1   2020-12-13 [1] CRAN (R 4.3.3)
 gtable          0.3.6   2024-10-25 [1] CRAN (R 4.3.3)
 htmltools       0.5.8.1 2024-04-04 [1] CRAN (R 4.3.3)
 htmlwidgets     1.6.4   2023-12-06 [1] CRAN (R 4.3.1)
 httr            1.4.7   2023-08-15 [1] CRAN (R 4.3.0)
 igraph          2.1.4   2025-01-23 [1] CRAN (R 4.3.3)
 jquerylib       0.1.4   2021-04-26 [1] CRAN (R 4.3.3)
 jsonlite        2.0.0   2025-03-27 [1] CRAN (R 4.3.3)
 kableExtra    * 1.4.0   2024-01-24 [1] CRAN (R 4.3.1)
 KernSmooth      2.23-26 2025-01-01 [1] CRAN (R 4.3.3)
 knitr           1.50    2025-03-16 [1] CRAN (R 4.3.3)
 labeling        0.4.3   2023-08-29 [1] CRAN (R 4.3.3)
 lattice         0.22-7  2025-04-02 [1] CRAN (R 4.3.3)
 lazyeval        0.2.2   2019-03-15 [1] CRAN (R 4.3.3)
 lifecycle       1.0.4   2023-11-07 [1] CRAN (R 4.3.3)
 lubridate       1.9.4   2024-12-08 [1] CRAN (R 4.3.3)
 magrittr        2.0.3   2022-03-30 [1] CRAN (R 4.3.3)
 networkD3       0.4.1   2025-04-14 [1] CRAN (R 4.3.3)
 nlme            3.1-168 2025-03-31 [1] CRAN (R 4.3.3)
 patchwork       1.3.0   2024-09-16 [1] CRAN (R 4.3.3)
 pillar          1.10.2  2025-04-05 [1] CRAN (R 4.3.3)
 pkgconfig       2.0.3   2019-09-22 [1] CRAN (R 4.3.3)
 plyr            1.8.9   2023-10-02 [1] CRAN (R 4.3.3)
 png             0.1-8   2022-11-29 [1] CRAN (R 4.3.3)
 proxy           0.4-27  2022-06-09 [1] CRAN (R 4.3.3)
 purrr           1.0.4   2025-02-05 [1] CRAN (R 4.3.3)
 R6              2.6.1   2025-02-15 [1] CRAN (R 4.3.3)
 ragg            1.4.0   2025-04-10 [1] CRAN (R 4.3.3)
 RColorBrewer    1.1-3   2022-04-03 [1] CRAN (R 4.3.3)
 Rcpp            1.0.14  2025-01-12 [1] CRAN (R 4.3.3)
 RefManageR    * 1.4.0   2022-09-30 [1] CRAN (R 4.3.0)
 rlang           1.1.6   2025-04-11 [1] CRAN (R 4.3.3)
 rmarkdown       2.29    2024-11-04 [1] CRAN (R 4.3.3)
 rnaturalearth * 1.0.1   2023-12-15 [1] CRAN (R 4.3.1)
 rstatix         0.7.2   2023-02-01 [1] CRAN (R 4.3.0)
 rstudioapi      0.17.1  2024-10-22 [1] CRAN (R 4.3.3)
 sass            0.4.10  2025-04-11 [1] CRAN (R 4.3.3)
 scales          1.4.0   2025-04-24 [1] CRAN (R 4.3.3)
 sessioninfo   * 1.2.3   2025-02-05 [1] CRAN (R 4.3.3)
 sf              1.0-21  2025-05-15 [1] CRAN (R 4.3.3)
 stringi         1.8.7   2025-03-27 [1] CRAN (R 4.3.3)
 stringr       * 1.5.1   2023-11-14 [1] CRAN (R 4.3.1)
 svglite         2.2.1   2025-05-12 [1] CRAN (R 4.3.3)
 systemfonts     1.2.3   2025-04-30 [1] CRAN (R 4.3.3)
 terra           1.8-50  2025-05-09 [1] CRAN (R 4.3.3)
 textshaping     1.0.1   2025-05-01 [1] CRAN (R 4.3.3)
 tibble        * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
 tidygeocoder  * 1.0.6   2025-03-31 [1] CRAN (R 4.3.3)
 tidyr         * 1.3.1   2024-01-24 [1] CRAN (R 4.3.1)
 tidyselect      1.2.1   2024-03-11 [1] CRAN (R 4.3.1)
 tidytree        0.4.6   2023-12-12 [1] CRAN (R 4.3.1)
 timechange      0.3.0   2024-01-18 [1] CRAN (R 4.3.3)
 treeio          1.26.0  2023-11-06 [1] Bioconductor
 units           0.8-7   2025-03-11 [1] CRAN (R 4.3.3)
 utf8            1.2.5   2025-05-01 [1] CRAN (R 4.3.3)
 vctrs           0.6.5   2023-12-01 [1] CRAN (R 4.3.3)
 viridisLite     0.4.2   2023-05-02 [1] CRAN (R 4.3.3)
 withr           3.0.2   2024-10-28 [1] CRAN (R 4.3.3)
 writexl         1.5.4   2025-04-15 [1] CRAN (R 4.3.3)
 xfun            0.52    2025-04-02 [1] CRAN (R 4.3.3)
 xml2            1.3.8   2025-03-14 [1] CRAN (R 4.3.3)
 yaml            2.3.10  2024-07-26 [1] CRAN (R 4.3.3)
 yulab.utils     0.2.0   2025-01-29 [1] CRAN (R 4.3.3)

 [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────