Welcome

This R Markdown document summarises the principal the steps to generate figures and tables using data pertaining to organ sizes across vertebrates and invertebrates species. I employed a systematic map to search for information on organ sizes, body sizes, and a set of metadata (see Figure S1). This approach facilitates a more comprehensive understanding of the database structure and also, will be serve as backbone for the studying scaling of organ sizes among species.

Abstract

The relationship between individual organ size and overall body size in animals is a fundamental biological phenomenon that spans multiple disciplines and has been studied for decades. However, a comprehensive synthesis of the sources of variation in organ-specific scaling remains lacking, even among mammals, the most extensively studied vertebrate group. To address this, we developed a systematic map and compiled a large database of paired organ and body size measurements. This database includes over 10,000 records from 366 species across eight major vertebrate and invertebrate classes: Aves, Mammalia, Actinopterygii, Reptilia, Amphibia, Chondrichthyes, Insecta, and Cephalopoda. Our database provides size estimates for 53 organ types, categorised into 10 physiological systems, with most data derived from digestive, circulatory and excretory systems. In addition, we include extensive metadata to contextualise the original studies, which highlights gaps—such as the season of animal collection and life stage, both of which were among the least frequently reported. We anticipate this comprehensive and reproducible resource will offer a robust foundation for improving the parameterisation and cross-species applicability of simulation models based on physiological and kinetic principles, thereby advancing our understanding of organ size scaling across diverse taxa.

Citation

When using the data and/or code associated with this project, they should be cited as follows:

  • Leiva, F. P., Ockhuijsen L., Polinder, J., Schreyers, L.J., Xiong, J., Hendriks A. J. (2025). A systematic map and comprehensive database of animal organ sizes. Zenodo. DOI will be available here soon.

Contact

This script is authored by Félix P. Leiva. For any questions related to this resource, please contact me at the email address: felixpleiva@gmail.com.

Disclaimer

This code routine may contain typographical errors, specific lines of code, or comments in Spanish (my native language). Should you encounter any errors in the code or data, please let me know via email to: felixpleiva@gmail.com.

Licence

This repository is provided by the author under the licence Attribution-NonCommercial-NoDerivatives 4.0 International.

Clean working space

rm(list = ls())

Load libraries

library(kableExtra)       # Enhances tables created with 'knitr::kable'
library(DataExplorer)     # Automates exploratory data analysis
library(dplyr)            # Efficient data manipulation
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:kableExtra':
## 
##     group_rows
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)          # Data visualisation based on the grammar of graphics
## Warning: package 'ggplot2' was built under R version 4.3.3
library(RefManageR)       # Manages references and citations
library(ggpubr)           # Creates publication-ready graphics
library(cowplot)          # Arranges and annotates plots
## 
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggpubr':
## 
##     get_legend
library(tidygeocoder)     # Converts addresses into geographic coordinates
## Warning: package 'tidygeocoder' was built under R version 4.3.3
library(rnaturalearth)    # Accesses Natural Earth geographic data
library(ape)              # Analyses phylogenies and evolution
## Warning: package 'ape' was built under R version 4.3.3
## 
## Attaching package: 'ape'
## The following object is masked from 'package:ggpubr':
## 
##     rotate
## The following object is masked from 'package:dplyr':
## 
##     where
library(ggtree)           # Visualises and annotates phylogenetic trees
## ggtree v3.10.1 For help: https://yulab-smu.top/treedata-book/
## 
## If you use the ggtree package suite in published research, please cite
## the appropriate paper(s):
## 
## Guangchuang Yu, David Smith, Huachen Zhu, Yi Guan, Tommy Tsan-Yuk Lam.
## ggtree: an R package for visualization and annotation of phylogenetic
## trees with their covariates and other associated data. Methods in
## Ecology and Evolution. 2017, 8(1):28-36. doi:10.1111/2041-210X.12628
## 
## Guangchuang Yu. Using ggtree to visualize data on tree-like structures.
## Current Protocols in Bioinformatics. 2020, 69:e96. doi:10.1002/cpbi.96
## 
## Guangchuang Yu.  Data Integration, Manipulation and Visualization of
## Phylogenetic Trees (1st edition). Chapman and Hall/CRC. 2022,
## doi:10.1201/9781003279242
## 
## Attaching package: 'ggtree'
## The following object is masked from 'package:ape':
## 
##     rotate
## The following object is masked from 'package:ggpubr':
## 
##     rotate
library(tibble)           # Alternative to data frames
library(ggthemes)         # Additional themes for 'ggplot2' graphics
## 
## Attaching package: 'ggthemes'
## The following object is masked from 'package:cowplot':
## 
##     theme_map
library(sessioninfo)      # Documents session environment for reproducibility
## Warning: package 'sessioninfo' was built under R version 4.3.3
library(details)          # Adds inline or interactive details
## Warning: package 'details' was built under R version 4.3.3
library(stringr)
library(tidyr)
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:ggtree':
## 
##     expand

Load data

dat <- read.csv("../outputs/organ_size_with_taxonomy.csv")

Load phylogenetic tree

tree<-read.tree("../outputs/Phylogenetic tree for 363 species.tre")

Load references

refs <- ReadBib("../outputs/Data used in the DB.bib")

Overview of screening steps used to select studies included in the organ size database

Figure S1. PRISMA-type diagram showing the systematic and non-systematic literature search reporting organ size and body size pairs data. (*) We have not received responses from the corresponding author at the time of the manuscript submission.
Figure S1. PRISMA-type diagram showing the systematic and non-systematic literature search reporting organ size and body size pairs data. (*) We have not received responses from the corresponding author at the time of the manuscript submission.

Data exploration

Change some species names

In the code below I am just changing few names in the column species…I know.. I already checked the names those names in NCBI, GBIF, ITIS, pero Open Tree of Life (https://tree.opentreeoflife.org/) have labels at the subpecies level. The idea with the code bewlo is just macth the names in th db and the synthetic tree.

dat$species[dat$species == "Ateles chamek"]                     <- "Ateles belzebuth chamek"
dat$species[dat$species == "Leontocebus nigrifrons"]            <- "Saguinus fuscicollis nigrifrons"
dat$species[dat$species == "Terrapene triunguis"]               <- "Terrapene carolina triunguis"
dat$species[dat$species == "Anarhynchus alexandrinus"]          <- "Charadrius alexandrinus"
dat$species[dat$species == "Cebuella pygmaea"]                  <- "Callithrix pygmaea"
dat$species[dat$species == "Dicotyles tajacu"]                  <- "Pecari tajacu"
dat$species[dat$species == "Heteromys salvini"]                 <- "Liomys salvini"
dat$species[dat$species == "Leontocebus fuscicollis"]           <- "Saguinus fuscicollis"
dat$species[dat$species == "Lithobates pipiens"]                <- "Rana pipiens"
dat$species[dat$species == "pekania pennanti"]                  <- "Martes pennanti"
dat$species[dat$species == "Sephanoides sephaniodes"]           <- "Sephanoides sephanoides"
dat$species[dat$species == "Spodiopsar sericeus"]               <- "Sturnus sericeus"
dat$species[dat$species == "Pekania pennanti"]                  <- "Martes pennanti"
dat$species[dat$species == "Saguinus fuscicollis"]              <- "Saguinus fuscicollis fuscicollis"
dat$species[dat$species == "Presbytis melalophos"]              <- "Presbytis melalophos mitrata"
dat$species[dat$species == "Eulemur fulvus"]                    <- "Eulemur fulvus fulvus"
dat$species[dat$species == "Saimiri sciureus"]                  <- "Saimiri sciureus sciureus"
dat$species[dat$species == "Alouatta seniculus"]                <- "Alouatta seniculus seniculus"
dat$species[dat$species == "Herpailurus yagouaroundi"]          <- "Puma yagouaroundi"
dat$species[dat$species == "Saimiri macrodon"]                  <- "Saimiri sciureus macrodon"

To give an overview and description of the metadata associated with the extraction of organ size and body size data in invertebrates and vertebrates. This table is labeled as Table S1 in the manuscript.

Check and reformat variables if is needed

str(dat); names(dat)
## 'data.frame':    10604 obs. of  39 variables:
##  $ species_reported                  : chr  "Abrothrix longipilis" "Abrothrix longipilis" "Abrothrix longipilis" "Abrothrix longipilis" ...
##  $ id_checker                        : chr  NA NA NA NA ...
##  $ double_checked                    : chr  "no" "no" "no" "no" ...
##  $ notes_double_checking             : chr  "none" "none" "none" "none" ...
##  $ id_data_extractor                 : chr  "FPLeiva" "FPLeiva" "FPLeiva" "FPLeiva" ...
##  $ key                               : chr  "rayyan-150592721" "rayyan-150592721" "rayyan-150592721" "rayyan-150592721" ...
##  $ context_study                     : chr  "ecological/evolutionary" "ecological/evolutionary" "ecological/evolutionary" "ecological/evolutionary" ...
##  $ taxonomic_group                   : chr  "mammal" "mammal" "mammal" "mammal" ...
##  $ habitat                           : chr  "terrestrial" "terrestrial" "terrestrial" "terrestrial" ...
##  $ origin                            : chr  "wild" "wild" "wild" "wild" ...
##  $ season_of_collection              : chr  "autumn" "autumn" "autumn" "autumn" ...
##  $ lat_dec                           : num  -39.6 -39.6 -47.5 -47.5 -47.5 ...
##  $ long_dec                          : num  -73.2 -73.2 -72.8 -72.8 -72.8 ...
##  $ age_years                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ life_stage_individual             : chr  NA NA NA NA ...
##  $ sex_individual                    : chr  NA NA NA NA ...
##  $ id_cluster                        : int  3 3 4 4 4 3 2 4 2 2 ...
##  $ trait_size_category               : chr  "small intestine" "liver" "body size" "small intestine" ...
##  $ weight_status                     : chr  "fresh" "fresh" "fresh" "fresh" ...
##  $ paired_organs_weighed_individually: chr  NA NA NA NA ...
##  $ organ_side                        : chr  NA NA NA NA ...
##  $ trait_details                     : chr  "small intestine weight" "liver weight" "body weight" "small intestine weight" ...
##  $ trait_unit                        : chr  "g" "g" "g" "g" ...
##  $ mean_trait                        : num  2.24 1.841 26.274 0.698 0.25 ...
##  $ error_trait                       : num  0.1191 0.1173 2.1934 0.0596 0.0601 ...
##  $ error_type                        : chr  "standard_error" "standard_error" "standard_error" "standard_error" ...
##  $ n_for_mean_trait                  : chr  "7" "7" "26" "26" ...
##  $ additional_trait                  : chr  NA NA NA NA ...
##  $ data_source                       : chr  "Figure 4C" "Figure 4E" "Figure 4F" "Figure 4C" ...
##  $ data_doi                          : chr  NA NA NA NA ...
##  $ relevant_notes                    : chr  NA NA NA NA ...
##  $ phylum                            : chr  "Chordata" "Chordata" "Chordata" "Chordata" ...
##  $ class                             : chr  "Mammalia" "Mammalia" "Mammalia" "Mammalia" ...
##  $ order                             : chr  "Rodentia" "Rodentia" "Rodentia" "Rodentia" ...
##  $ family                            : chr  "Cricetidae" "Cricetidae" "Cricetidae" "Cricetidae" ...
##  $ genus                             : chr  "Abrothrix" "Abrothrix" "Abrothrix" "Abrothrix" ...
##  $ species                           : chr  "Abrothrix longipilis" "Abrothrix longipilis" "Abrothrix longipilis" "Abrothrix longipilis" ...
##  $ source                            : chr  "ncbi" "ncbi" "ncbi" "ncbi" ...
##  $ taxo_level                        : chr  "Species" "Species" "Species" "Species" ...
##  [1] "species_reported"                   "id_checker"                        
##  [3] "double_checked"                     "notes_double_checking"             
##  [5] "id_data_extractor"                  "key"                               
##  [7] "context_study"                      "taxonomic_group"                   
##  [9] "habitat"                            "origin"                            
## [11] "season_of_collection"               "lat_dec"                           
## [13] "long_dec"                           "age_years"                         
## [15] "life_stage_individual"              "sex_individual"                    
## [17] "id_cluster"                         "trait_size_category"               
## [19] "weight_status"                      "paired_organs_weighed_individually"
## [21] "organ_side"                         "trait_details"                     
## [23] "trait_unit"                         "mean_trait"                        
## [25] "error_trait"                        "error_type"                        
## [27] "n_for_mean_trait"                   "additional_trait"                  
## [29] "data_source"                        "data_doi"                          
## [31] "relevant_notes"                     "phylum"                            
## [33] "class"                              "order"                             
## [35] "family"                             "genus"                             
## [37] "species"                            "source"                            
## [39] "taxo_level"
# make a new column of species underscored
dat$species_underscored <- gsub(" ", "_", dat$species)

# Choosing columns I want to convert to factor
columns_to_factor <- c(
  "species_reported", "id_checker", "double_checked",
  "notes_double_checking", "id_data_extractor", "key", "context_study", "taxonomic_group",
  "habitat", "origin", "season_of_collection", "life_stage_individual",
  "sex_individual", "id_cluster", "trait_size_category", "weight_status",
  "paired_organs_weighed_individually", "organ_side", "trait_details",
  "trait_unit", "error_type", "additional_trait", "data_source", "data_doi",
  "phylum", "class", "order", "family", "genus", "species", "source",
  "taxo_level", "species_underscored"
)

# Convert columns to factor
dat <- dat %>%
  mutate(across(all_of(columns_to_factor), as.factor))

# Convert column to numeric
dat <- dat %>%
  mutate(mean_trait = as.numeric(mean_trait))

# check again
str(dat)
## 'data.frame':    10604 obs. of  40 variables:
##  $ species_reported                  : Factor w/ 380 levels "Abrothrix longipilis",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ id_checker                        : Factor w/ 6 levels "AJHendriks","FPLeiva",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ double_checked                    : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ notes_double_checking             : Factor w/ 47 levels "add age","add age and correct trait value",..: 44 44 44 44 44 44 44 44 44 44 ...
##  $ id_data_extractor                 : Factor w/ 1 level "FPLeiva": 1 1 1 1 1 1 1 1 1 1 ...
##  $ key                               : Factor w/ 235 levels "rayyan-150591684",..: 158 158 158 158 158 158 158 158 158 158 ...
##  $ context_study                     : Factor w/ 6 levels "animal health",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ taxonomic_group                   : Factor w/ 7 levels "amphibian","bird",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ habitat                           : Factor w/ 3 levels "freshwater","marine",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ origin                            : Factor w/ 6 levels "captive","commercial",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ season_of_collection              : Factor w/ 4 levels "autumn","spring",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ lat_dec                           : num  -39.6 -39.6 -47.5 -47.5 -47.5 ...
##  $ long_dec                          : num  -73.2 -73.2 -72.8 -72.8 -72.8 ...
##  $ age_years                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ life_stage_individual             : Factor w/ 13 levels "adult","elderly",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ sex_individual                    : Factor w/ 3 levels "both","female",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ id_cluster                        : Factor w/ 360 levels "1","2","3","4",..: 3 3 4 4 4 3 2 4 2 2 ...
##  $ trait_size_category               : Factor w/ 54 levels "adipose depot",..: 46 31 4 46 8 4 4 31 31 8 ...
##  $ weight_status                     : Factor w/ 4 levels "dried","fixed",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ paired_organs_weighed_individually: Factor w/ 2 levels "no","yes": NA NA NA NA NA NA NA NA NA NA ...
##  $ organ_side                        : Factor w/ 2 levels "left","right": NA NA NA NA NA NA NA NA NA NA ...
##  $ trait_details                     : Factor w/ 445 levels "abdominal fat",..: 362 189 14 362 37 14 14 189 189 37 ...
##  $ trait_unit                        : Factor w/ 16 levels "% of body weight",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ mean_trait                        : num  2.24 1.841 26.274 0.698 0.25 ...
##  $ error_trait                       : num  0.1191 0.1173 2.1934 0.0596 0.0601 ...
##  $ error_type                        : Factor w/ 3 levels "confidence_interval",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ n_for_mean_trait                  : chr  "7" "7" "26" "26" ...
##  $ additional_trait                  : Factor w/ 3 levels "individual metabolic rates",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ data_source                       : Factor w/ 80 levels "archived raw data",..: 39 41 42 39 37 42 42 41 41 37 ...
##  $ data_doi                          : Factor w/ 5 levels "10.5061/\ndryad.6djh9w13b",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ relevant_notes                    : chr  NA NA NA NA ...
##  $ phylum                            : Factor w/ 3 levels "Arthropoda","Chordata",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ class                             : Factor w/ 8 levels "Actinopterygii",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ order                             : Factor w/ 60 levels "Accipitriformes",..: 48 48 48 48 48 48 48 48 48 48 ...
##  $ family                            : Factor w/ 133 levels "Accipitridae",..: 30 30 30 30 30 30 30 30 30 30 ...
##  $ genus                             : Factor w/ 268 levels "Abrothrix","Acomys",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ species                           : Factor w/ 366 levels "Abrothrix longipilis",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ source                            : Factor w/ 3 levels "gbif","itis",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ taxo_level                        : Factor w/ 1 level "Species": 1 1 1 1 1 1 1 1 1 1 ...
##  $ species_underscored               : Factor w/ 366 levels "Abrothrix_longipilis",..: 1 1 1 1 1 1 1 1 1 1 ...

Grouping of organs

In the code below, we will attempt to group the different types of organs included in our database to improve visualisation across the various figures. As we see below, there 54 distinct organ types (including the body size as a trait).

# check the organ names
length(unique(dat$trait_size_category))
## [1] 54

However, some of these actually represent the same system type.

# check the organ names
unique(dat$trait_size_category)
##  [1] small intestine                 liver                          
##  [3] body size                       caecum                         
##  [5] intestine                       stomach                        
##  [7] brain                           heart                          
##  [9] kidney                          adipose depot                  
## [11] digestive tract                 spleen                         
## [13] central nervous system          malpighian tubules             
## [15] digestive system                musculature                    
## [17] skeleton                        reproductive system            
## [19] circulatory system and fat body ileum                          
## [21] duodenum                        gizzard                        
## [23] jejunum                         muscle                         
## [25] lung                            ventricle                      
## [27] colon                           rectum                         
## [29] esophagus                       harderian gland                
## [31] pancreas                        ovary                          
## [33] bursa                           fat                            
## [35] thymus                          testes                         
## [37] proventriculus                  salt gland                     
## [39] adrenal glands                  thyroid/parathyroid glands     
## [41] gonad                           bone                           
## [43] gill                            prostate gland                 
## [45] pituitary gland                 uterus                         
## [47] oviduct                         hind limb                      
## [49] fore limb                       electric organ                 
## [51] epididymides                    gut                            
## [53] ureter                          bladder                        
## 54 Levels: adipose depot adrenal glands bladder body size bone brain ... ventricle

System categories

This happened because during data extraction, I retained the original organ names as reported in the studies.

# Add system column using case_when
dat <- dat %>%
  mutate(system = case_when(
    # Digestive system
    trait_size_category %in% c("liver", "caecum", "intestine", "stomach", 
                              "digestive tract", "digestive system",
                              "jejunum", "duodenum", "gizzard", "ileum",
                              "esophagus", "colon", "rectum", "proventriculus",
                              "gut", "pancreas") ~ "Digestive",
    
    # Excretory system
    trait_size_category %in% c("kidney", "malpighian tubules", "ureter", 
                              "bladder") ~ "Excretory",
    
    # Circulatory system
    trait_size_category %in% c("heart", "ventricle", "circulatory system and fat body",
                              "spleen") ~ "Circulatory",
    
    # Immune system
    trait_size_category %in% c("thymus", "bursa") ~ "Immune",
    
    # Nervous system
    trait_size_category %in% c("brain", "central nervous system", 
                              "pituitary gland") ~ "Nervous",
    
    # Endocrine system
    trait_size_category %in% c("thyroid/parathyroid glands", "adrenal glands",
                              "harderian gland", "salt gland") ~ "Endocrine",
    
    # Respiratory system
    trait_size_category %in% c("lung", "gill") ~ "Respiratory",
    
    # Reproductive system
    trait_size_category %in% c("ovary", "testes", "gonad", "uterus", 
                              "prostate gland", "oviduct", "epididymides",
                              "reproductive system") ~ "Reproductive",
    
    # Musculoskeletal system
    trait_size_category %in% c("skeleton", "bone", "hind limb", "fore limb",
                              "muscle", "musculature") ~ "Musculoskeletal",
    
    # Adipose/fat storage
    trait_size_category %in% c("adipose depot", "fat") ~ "Adipose tissue",
    
    # Default category for anything not matched
    TRUE ~ "Body size"
  ))
# Create summary table with group totals
summary_table <- dat %>%
  count(system, trait_size_category) %>%
  arrange(system, trait_size_category) %>%
  # Add group totals
  bind_rows(
    dat %>%
      count(system, name = "n") %>%
      mutate(trait_size_category = "TOTAL") %>%
      select(system, trait_size_category, n)
  ) %>%
  arrange(system, trait_size_category != "TOTAL", trait_size_category)

Lets check the grouping

# make a table to see the number of organs measured in each system
kable(summary_table, col.names = c("System", "Organ", "Count")) %>%
  kable_styling("striped", position = "left", full_width = TRUE) %>%
  row_spec(which(summary_table$trait_size_category == "TOTAL"), bold = TRUE, background = "#f5f5f5") %>%
  scroll_box(width = "100%", height = "500px")
System Organ Count
Adipose tissue TOTAL 155
Adipose tissue adipose depot 108
Adipose tissue fat 47
Body size TOTAL 3659
Body size body size 3209
Body size electric organ 60
Body size small intestine 390
Circulatory TOTAL 992
Circulatory circulatory system and fat body 18
Circulatory heart 515
Circulatory spleen 307
Circulatory ventricle 152
Digestive TOTAL 2594
Digestive caecum 289
Digestive colon 27
Digestive digestive system 18
Digestive digestive tract 117
Digestive duodenum 17
Digestive esophagus 15
Digestive gizzard 163
Digestive gut 88
Digestive ileum 13
Digestive intestine 656
Digestive jejunum 15
Digestive liver 741
Digestive pancreas 54
Digestive proventriculus 26
Digestive rectum 23
Digestive stomach 332
Endocrine TOTAL 181
Endocrine adrenal glands 88
Endocrine harderian gland 21
Endocrine salt gland 40
Endocrine thyroid/parathyroid glands 32
Excretory TOTAL 457
Excretory bladder 2
Excretory kidney 435
Excretory malpighian tubules 16
Excretory ureter 4
Immune TOTAL 101
Immune bursa 41
Immune thymus 60
Musculoskeletal TOTAL 1136
Musculoskeletal bone 479
Musculoskeletal fore limb 112
Musculoskeletal hind limb 112
Musculoskeletal muscle 395
Musculoskeletal musculature 18
Musculoskeletal skeleton 20
Nervous TOTAL 491
Nervous brain 437
Nervous central nervous system 33
Nervous pituitary gland 21
Reproductive TOTAL 457
Reproductive epididymides 13
Reproductive gonad 55
Reproductive ovary 149
Reproductive oviduct 5
Reproductive prostate gland 8
Reproductive reproductive system 17
Reproductive testes 197
Reproductive uterus 13
Respiratory TOTAL 381
Respiratory gill 165
Respiratory lung 216

Figures

Figure 1. Cumulative number of studies and most common journals

# Panel A: Extracting and cleaning publication years

df_years <- refs %>%
  as.data.frame() %>%
  select(year) %>%
  mutate(year = as.numeric(as.character(year))) %>%  # Ensure year is numeric
  filter(!is.na(year))  # Remove NA values

# Counting studies per year and calculating cumulative values:
studies_per_year <- df_years %>%
  group_by(year) %>%
  summarise(num_studies = n()) %>%
  arrange(year) %>%
  mutate(cumulative_studies = cumsum(num_studies))  # Calculate cumulative count

# Plotting the cumulative number of studies:
plot_years <- 
  ggplot(studies_per_year, aes(x = year, y = cumulative_studies)) +
  geom_line(color = "#009E73", linewidth = 2) +
  scale_y_continuous(limits = c(0, 250)) +  # Limit the y-axis to 400
  scale_x_continuous(
    limits = c(1955, 2024),
    breaks = seq(1960, 2024, by = 12)  # Set x-axis intervals to 55 years
  ) +
  labs(
    x = "Publication Year",
    y = "Number of Studies"
  ) +
  theme_pubr() +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12)
  )
# ------------------------------------------------------------------------------
# lets reformat the names of the journal to have to count them and then plot
df_journals <- refs %>%
  as.data.frame() %>%
  select(journal) %>%
  filter(!is.na(journal)) %>%
  mutate(
    journal_clean = tolower(journal),
    journal_clean = str_replace_all(journal_clean, "\\\\&", "&"),
    journal_clean = str_squish(journal_clean),
    journal_clean = str_to_title(journal_clean),
    journal_clean = str_replace_all(
      journal_clean,
      "\\b(Of|And|In|On|For|The|A|An|To|With|By|At|From|But|Or|Nor)\\b",
      function(x) tolower(x)
    ),
    journal_clean = sub("^([a-z])", toupper("\\1"), journal_clean)
  ) %>%
  arrange(desc(journal_clean))

df_journals <- df_journals %>%
  group_by(journal_clean) %>%
  summarise(num_articles = n(), .groups = "drop") %>%
  arrange(desc(num_articles)) %>%
  slice_head(n = 10)
plot_journals <- ggplot(df_journals, aes(x = reorder(journal_clean, num_articles), y = num_articles)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  scale_y_continuous(limits = c(0, 50)) +
  coord_flip() +
  theme_pubr() +
  labs(
    x = "Journals",
    y = "Number of Studies"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_blank()
  ) +
  geom_text(aes(label = journal_clean), hjust = 0, vjust = 0.5, 
            color = "black", size = 4, check_overlap = TRUE, 
            position = position_dodge(width = 0.7)) +
  geom_text(aes(label = num_articles), 
            hjust = 2, color = "black", size = 4)
# ------------------------------------------------------------------------------
# Combining Panel A and Panel B
Figure_1 <- plot_grid(
  plot_years, 
  plot_journals,
  labels = c("A", "B"),
  nrow = 2,
  ncol = 1,
  label_size = 15,
  align = "v")

Figure_1

# Saving the combined figure
ggsave('../outputs/Figure_1_Studies_and_Journals.pdf', Figure_1, width = 6, height = 9)
ggsave('../outputs/Figure_1_Studies_and_Journals.png', Figure_1, width = 6, height = 9, dpi = 1200)

Figure 2: Number of studies by study context, origin and season of collection

# Calculate number of unique studies per category
plot_data <- dat %>%
  group_by(context_study, origin, season_of_collection) %>%      
  summarise(unique_studies = n_distinct(key), .groups = "drop") %>%  
  pivot_longer(
    cols = -c(unique_studies),                                       
    names_to = "variable",
    values_to = "level"
  ) %>%
  mutate(
    # Recode variable names for clear labels on plots
    variable = recode(variable,
                      context_study = "Study context",
                      origin = "Source of specimens",
                      season_of_collection = "Season of collection"),
    level = if_else(is.na(level), "not reported", as.character(level))
  ) %>%
  group_by(variable, level) %>%                                   
  summarise(unique_studies = sum(unique_studies), .groups = "drop")

# Set the factor levels for ordering the facets/plots consistently
plot_data <- plot_data %>%
  mutate(variable = factor(variable,
                           levels = c("Study context", "Source of specimens", "Season of collection")))

# Subset per plot
context_data <- plot_data %>% filter(variable == "Study context")
source_data <- plot_data %>% filter(variable == "Source of specimens")
season_data <- plot_data %>% filter(variable == "Season of collection")

# Now lets do the plots
Figure_2a <- 
  ggplot(context_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +                 
  geom_text(aes(label = unique_studies),                                         
            hjust = -0.15, color = "black", size = 4) +
  coord_flip() + 
  scale_y_continuous(limits = c(0, 250)) +                                      
  labs(x = NULL, y = "Number of studies") +                                     
  theme_pubr() +                                                                
  theme(
    axis.title.x = element_text(face = "bold", size = 12),                      
    axis.title.y = element_text(face = "bold", size = 12),                      
    axis.text.y = element_text(size = 12),                                      
    strip.text = element_text(face = "bold", size = 11),                        
    plot.title = element_text(face = "bold", hjust = 0.5),                     
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),  
    strip.background = element_rect(fill = "#B2ABD270", colour = "black", linewidth = 0.8),
    panel.spacing = unit(1, "lines")                                            
  )

Figure_2b <- 
  ggplot(source_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +                 
  geom_text(aes(label = unique_studies),                                         
            hjust = -0.15, color = "black", size = 4) +
  coord_flip() +                                                                
  scale_y_continuous(limits = c(0, 250)) +                                      
  labs(x = NULL, y = "Number of studies") +                                     
  theme_pubr() +                                                                
  theme(
    axis.title.x = element_text(face = "bold", size = 12),                      
    axis.title.y = element_text(face = "bold", size = 12),                      
    axis.text.y = element_text(size = 12),                                      
    strip.text = element_text(face = "bold", size = 11),                        
    plot.title = element_text(face = "bold", hjust = 0.5),                     
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),  
    strip.background = element_rect(fill = "#B2ABD270", colour = "black", linewidth = 0.8),
    panel.spacing = unit(1, "lines")                                            
  )

Figure_2c <- 
  ggplot(season_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +                 
  geom_text(aes(label = unique_studies),                                         
            hjust = -0.15, color = "black", size = 4) +
  coord_flip() +                                                                
  scale_y_continuous(limits = c(0, 250)) +                                      
  labs(x = NULL, y = "Number of studies") +                                     
  theme_pubr() +                                                                
  theme(
    axis.title.x = element_text(face = "bold", size = 12),                      
    axis.title.y = element_text(face = "bold", size = 12),                      
    axis.text.y = element_text(size = 12),                                      
    strip.text = element_text(face = "bold", size = 11),                        
    plot.title = element_text(face = "bold", hjust = 0.5),                     
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),  
    strip.background = element_rect(fill = "#B2ABD270", colour = "black", linewidth = 0.8),
    panel.spacing = unit(1, "lines")                                            
  )


# Combining Panel 2A, 2B and 2C
Figure_2 <- plot_grid(
  Figure_2a, 
  Figure_2b,
  Figure_2c,
  labels = c("A", "B", "C"),
  nrow = 3,
  ncol = 1,
  label_size = 15,
  align = "v")

Figure_2

# Store Plots
ggsave('../outputs/Figure_2_Studies_by_context_source_season.pdf', Figure_2, width = 6, height = 12)
ggsave('../outputs/Figure_2_Studies_by_context_source_season.png', Figure_2, width = 6, height = 12, dpi = 1200)

Figure 3: Number of studies by habitat, life stage and sex

# Calculate number of unique studies per category
plot_data <- dat %>%
  group_by(sex_individual, life_stage_individual, habitat) %>%      
  summarise(unique_studies = n_distinct(key), .groups = "drop") %>%  
  pivot_longer(
    cols = -c(unique_studies),                                       
    names_to = "variable",
    values_to = "level"
  ) %>%
  mutate(
    # Recode variable names for clear labels on plots
    variable = recode(variable,
                      habitat = "Habitat",
                           sex_individual = "Sex studied",
                           life_stage_individual = "Life stage studied"),
    level = if_else(is.na(level), "not reported", as.character(level))
  ) %>%
  group_by(variable, level) %>%                                   
  summarise(unique_studies = sum(unique_studies), .groups = "drop")

# Set the factor levels for ordering the facets/plots consistently
plot_data <- plot_data %>%
  mutate(variable = factor(variable,
                           levels = c("Habitat", 
                                      "Life stage studied", 
                                      "Sex studied")))

# Subset per plot
habitat_data <- plot_data %>% filter(variable == "Habitat")
stage_data <- plot_data %>% filter(variable == "Life stage studied")
sex_data <- plot_data %>% filter(variable == "Sex studied")

# Now lets do the plots
Figure_3a <- 
  ggplot(habitat_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +                 
  geom_text(aes(label = unique_studies),                                         
            hjust = -0.15, color = "black", size = 4) +
  coord_flip() +                                                                
  scale_y_continuous(limits = c(0, 300)) +                                      
  labs(x = NULL, y = "Number of studies") +                                     
  theme_pubr() +                                                                
  theme(
    axis.title.x = element_text(face = "bold", size = 12),                      
    axis.title.y = element_text(face = "bold", size = 12),                      
    axis.text.y = element_text(size = 12),                                      
    strip.text = element_text(face = "bold", size = 11),                        
    plot.title = element_text(face = "bold", hjust = 0.5),                     
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),  
    strip.background = element_rect(fill = "#B2ABD270", colour = "black", linewidth = 0.8),
    panel.spacing = unit(1, "lines")                                            
  )

Figure_3b <- 
  ggplot(stage_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +                 
  geom_text(aes(label = unique_studies),                                         
            hjust = -0.15, color = "black", size = 4) +
  coord_flip() +                                                                
  scale_y_continuous(limits = c(0, 300)) +                                      
  labs(x = NULL, y = "Number of studies") +                                     
  theme_pubr() +                                                                
  theme(
    axis.title.x = element_text(face = "bold", size = 12),                      
    axis.title.y = element_text(face = "bold", size = 12),                      
    axis.text.y = element_text(size = 12),                                      
    strip.text = element_text(face = "bold", size = 11),                        
    plot.title = element_text(face = "bold", hjust = 0.5),                     
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),  
    strip.background = element_rect(fill = "#B2ABD270", colour = "black", linewidth = 0.8),
    panel.spacing = unit(1, "lines")                                            
  )

Figure_3c <- 
  ggplot(sex_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +                 
  geom_text(aes(label = unique_studies),                                         
            hjust = -0.15, color = "black", size = 4) +
  coord_flip() +                                                                
  scale_y_continuous(limits = c(0, 300)) +                                      
  labs(x = NULL, y = "Number of studies") +                                     
  theme_pubr() +                                                                
  theme(
    axis.title.x = element_text(face = "bold", size = 12),                      
    axis.title.y = element_text(face = "bold", size = 12),                      
    axis.text.y = element_text(size = 12),                                      
    strip.text = element_text(face = "bold", size = 11),                        
    plot.title = element_text(face = "bold", hjust = 0.5),                     
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),  
    strip.background = element_rect(fill = "#B2ABD270", colour = "black", linewidth = 0.8),
    panel.spacing = unit(1, "lines")                                            
  )


# Combining Panel 2A, 2B and 2C
Figure_3 <- plot_grid(
  Figure_3a, 
  Figure_3b,
  Figure_3c,
  labels = c("A", "B", "C"),
  nrow = 3,
  ncol = 1,
  label_size = 15,
  align = "v")

Figure_3

# # Store Plots
ggsave('../outputs/Figure_3_Studies_by_habitat_stage_sex.pdf', Figure_3, width = 6, height = 12)
ggsave('../outputs/Figure_3_Studies_by_habitat_stage_sex.png', Figure_3, width = 6, height = 12, dpi = 1200)

Figure 4: Studies by class and most common species in the database

data_studies <- dat %>%
  group_by(class) %>%
  summarise(unique_studies = n_distinct(key), .groups = "drop")

Figure_4a <- 
  ggplot(data_studies, aes(x = reorder(class, unique_studies), y = unique_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  coord_flip() +
  geom_text(aes(label = unique_studies), hjust = -0.15, color = "black", size = 4) +
  theme_pubr() +
  scale_y_continuous(limits = c(0, 150)) +
  labs(x = NULL, y = "Number of studies") +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 12),
    strip.text = element_text(face = "bold", size = 11),
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
    strip.background = element_rect(fill = "#B2ABD270", colour = "black", linewidth = 0.8),
    panel.spacing = unit(1, "lines")
  )

# plot most studies species
plot_data <- dat %>%
  group_by(species) %>%
  summarise(num_studies = n_distinct(key), .groups = "drop") %>%
  arrange(desc(num_studies)) %>%
  slice_head(n = 10) %>%
  mutate(
    # Reemplazar espacio por ~ para que ggplot2 lo interprete bien
    species_italic = gsub(" ", "~~", species),
    # Crear expresión para cursiva
    species_italic = paste0("italic('", gsub(" ", " ", species), "')"),
    # Alternativamente, para separar género y especie en cursiva sin comillas:
    species_italic = gsub(" ", "~~", species),
    species_italic = paste0("italic(", species_italic, ")")
  )

# Graficar con nombres en cursiva
Figure_4b <- ggplot(plot_data, aes(x = reorder(species_italic, num_studies), y = num_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  geom_text(aes(label = num_studies), hjust = -0.15, color = "black", size = 4) +
  coord_flip() +
  theme_pubr() +
  scale_y_continuous(limits = c(0, 150)) +
  labs(x = NULL, y = "Number of studies") +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 12),
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8)
  ) +
  scale_x_discrete(labels = function(x) parse(text = x))

# Combining Panel 4A and Panel 4B
Figure_4 <- plot_grid(
  Figure_4a, 
  Figure_4b,
  labels = c("A", "B"),
  nrow = 2,
  ncol = 1,
  label_size = 15,
  align = "v")

Figure_4

# Store Plots
ggsave('../outputs/Figure_4_Most_common_species.pdf', Figure_4, width = 6, height = 9)
ggsave('../outputs/Figure_4_Most_common_species.png', Figure_4, width = 6, height = 9, dpi = 1200)

Figure 5: Number of studies by system

plot_data <- dat %>%
  group_by(system) %>%
  summarise(unique_studies = n_distinct(key), .groups = "drop") %>%
  pivot_longer(
    cols = -c(unique_studies),
    names_to = "variable",
    values_to = "level"
) %>%
  mutate(variable = recode(variable,
                           system = "System studied")) %>%
  group_by(variable, level) %>%
  summarise(unique_studies = sum(unique_studies), .groups = "drop")

# let do the plot
Figure_5 <- 
  ggplot(plot_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  coord_flip() +
  geom_text(aes(label = unique_studies), 
            hjust = -0.15, color = "black", size = 4) +
  theme_pubr() +
  scale_y_continuous(limits = c(0, 250)) +
  labs(
    x = NULL,  # Remove x axis label
    y = "Number of species"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 12),
    strip.text = element_text(face = "bold", size = 11),
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
    strip.background = element_rect(fill = "gray90", colour = "black", linewidth = 0.8),
    panel.spacing = unit(1, "lines")
  )

Figure_5

# # Store Plots
ggsave('../outputs/Figure_5_Studies_by_system.pdf', Figure_5, width = 7, height = 7)
ggsave('../outputs/Figure_5_Studies_by_system.png', Figure_5, width = 7, height = 7, dpi = 1200)

Figure 6: Phylogeny of species included in the DB studies

# Summaries data: calculate the mean value per species for each trait
names(dat)
##  [1] "species_reported"                   "id_checker"                        
##  [3] "double_checked"                     "notes_double_checking"             
##  [5] "id_data_extractor"                  "key"                               
##  [7] "context_study"                      "taxonomic_group"                   
##  [9] "habitat"                            "origin"                            
## [11] "season_of_collection"               "lat_dec"                           
## [13] "long_dec"                           "age_years"                         
## [15] "life_stage_individual"              "sex_individual"                    
## [17] "id_cluster"                         "trait_size_category"               
## [19] "weight_status"                      "paired_organs_weighed_individually"
## [21] "organ_side"                         "trait_details"                     
## [23] "trait_unit"                         "mean_trait"                        
## [25] "error_trait"                        "error_type"                        
## [27] "n_for_mean_trait"                   "additional_trait"                  
## [29] "data_source"                        "data_doi"                          
## [31] "relevant_notes"                     "phylum"                            
## [33] "class"                              "order"                             
## [35] "family"                             "genus"                             
## [37] "species"                            "source"                            
## [39] "taxo_level"                         "species_underscored"               
## [41] "system"
summary_data <- dat %>%
  count(species_underscored, system) %>%  # Count occurrences by species and system
  mutate(presence = if_else(n > 0, "Yes", "No")) %>%  # Convert counts to "Yes"/"No"
  select(-n) %>% # Remove the original count column
  pivot_wider(
    names_from = system,                   # Each system becomes a column
    values_from = presence,                # Values are now "Yes" or "No"
    values_fill = list(presence = "No")    # Fill missing combinations with "No"
  )

summary_data <- summary_data %>%
  select(-`Body size`)

names(summary_data)
##  [1] "species_underscored" "Digestive"           "Adipose tissue"     
##  [4] "Circulatory"         "Excretory"           "Nervous"            
##  [7] "Musculoskeletal"     "Reproductive"        "Respiratory"        
## [10] "Endocrine"           "Immune"
# Check for mismatches between tree tips and data

# are all species of the tree in the database?
setdiff(tree$tip.label, summary_data$species_underscored) 
## character(0)
# are all species from the database in the tree? 
setdiff(summary_data$species_underscored, tree$tip.label)
## [1] "Muusoctopus_aegir" "Neomys_anomalus"   "Neomys_fodiens"
# As I mentioned in the manuscript there were three species with no phylo information. In the code below, we will esclude those species from the dataframe (it is just for preparae data for this figure)

tree <- keep.tip(tree, intersect(tree$tip.label, summary_data$species_underscored))

summary_data <- summary_data[summary_data$species_underscored %in% tree$tip.label, ]

# check again
setdiff(tree$tip.label, summary_data$species_underscored)
## character(0)
setdiff(summary_data$species_underscored, tree$tip.label)
## character(0)
# Align data and tree
datF <- summary_data %>%
  column_to_rownames("species_underscored")

# Plot with species without names
circ_names <- ggtree(tree, layout = "fan", open.angle = 18, branch.length = "none") +
  geom_tiplab(offset = 0.1, hjust = 0, size = 1)
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
circ_names <- rotate_tree(circ_names, 90)
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
# Plot with species without names
circ <- ggtree(tree, layout = "fan", open.angle = 18, branch.length = "none") 
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
circ <- rotate_tree(circ, 90)
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
# Create a new plot with heatmap for each trait using a single scale
tree_data <- gheatmap(
  circ, 
  datF, 
  width = 0.4, 
  offset = 0,  # Offset for placing the heatmap
  colnames_offset_x = 0, 
  colnames_offset_y = 0, 
  font.size = 3, 
  hjust = 0
)

# Apply the same scale for all traits
circ_data <- tree_data + 
  scale_fill_manual(values = c("grey", "#009E73"), name = "System measured") +
  theme(
    legend.position = c(0.56, 0.57),  # Posición manual de la leyenda
    legend.background = element_rect(fill = "transparent", colour = NA), # Fondo transparente
    legend.box.background = element_rect(fill = "transparent", colour = NA) # Borde transparente
  )
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2
## 3.5.0.
## ℹ Please use the `legend.position.inside` argument of `theme()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
circ_data

ggsave("../outputs/Figure_6_Phylogenetic_tree_with_data.png", circ_data, width = 12, height = 12, dpi = 1500)
ggsave("../outputs/Figure_6_Phylogenetic_tree_with_data.pdf", circ_data, width = 12, height = 12)
ggsave("../outputs/Figure_6_Phylogenetic_tree_with_names.pdf", circ_names, width = 12, height = 12)

Figure S2: Number of species by taxonomic group (Class)

#prepare data for ploting
data_taxa <- dat %>%
  group_by(class) %>%
  summarise(unique_species = n_distinct(species), .groups = "drop") %>%
  pivot_longer(
    cols = -c(unique_species),
    names_to = "variable",
    values_to = "level"
  )

# Create plot with panel borders
Figure_S2 <- 
  ggplot(data_taxa, aes(x = reorder(level, unique_species), y = unique_species)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  coord_flip() +
  # Add number of studies inside the bars
  geom_text(aes(label = unique_species), 
            hjust = -0.15, color = "black", size = 4) +
  theme_pubr() +
  scale_y_continuous(limits = c(0, 250)) +
  labs(
    x = NULL,  # Remove x axis label
    y = "Number of species"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 9),
    strip.text = element_text(face = "bold", size = 11),
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
    strip.background = element_rect(fill = "gray90", colour = "black", linewidth = 0.8),
    panel.spacing = unit(1, "lines")  
  )

Figure_S2

# Store Plots
ggsave('../outputs/Figure_S2_Species_by_class.pdf', Figure_S2, width = 6, height = 6)
ggsave('../outputs/Figure_S2_Species_by_class.png', Figure_S2, width = 6, height = 6, dpi = 1200)

Figure S3: Number of studies per category of trait (body and organ)

plot_data <- dat %>%
  group_by(trait_size_category) %>%
  summarise(unique_studies = n_distinct(key), .groups = "drop") %>%
  pivot_longer(
    cols = -c(unique_studies),
    names_to = "variable",
    values_to = "level"
) %>%
  mutate(variable = recode(variable,
                           trait_size_category = "Type of organ reported")) %>%
  group_by(variable, level) %>%
  summarise(unique_studies = sum(unique_studies), .groups = "drop")

# Create plot with panel borders
Figure_S3 <- 
  ggplot(plot_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  coord_flip() +
  geom_text(aes(label = unique_studies), 
            hjust = -0.15, color = "black", size = 4) +
  theme_pubr() +
  scale_y_continuous(limits = c(0, 250)) +
  labs(
    x = NULL,  # Remove x axis label
    y = "Number of studies"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 12),
    strip.text = element_text(face = "bold", size = 11),
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
    strip.background = element_rect(fill = "gray90", colour = "black", linewidth = 0.8),
    panel.spacing = unit(1, "lines")
  )
Figure_S3

# Store Plots
ggsave('../outputs/Figure_S3_Studies_by_organ.pdf', Figure_S3, width = 7, height = 10)
ggsave('../outputs/Figure_S3_Studies_by_organ.png', Figure_S3, width = 7, height = 10, dpi = 1200)

Data for the manuscript

Number of species

dat %>%
  distinct(species) %>%
  nrow()
## [1] 366

Studies published over years

df_years <- refs %>% 
  as.data.frame() %>% 
  select(year) %>% 
  mutate(year = as.numeric(as.character(year))) %>%  
  filter(!is.na(year)) %>%  # Remove rows where 'year' is missing (NA)
  group_by(year) %>%
  summarise(num_studies = n()) %>%
  arrange(year) %>%
  mutate(cumulative_studies = cumsum(num_studies))  

## Range de years included in the database
df_years %>% 
  reframe(min_year = min(year), 
          max_year = max(year), 
          total_years = max_year - min_year)
## # A tibble: 1 × 3
##   min_year max_year total_years
##      <dbl>    <dbl>       <dbl>
## 1     1964     2024          60

Number of species by class

table_sp_number <-dat %>%
  group_by(class) %>%
  reframe(n_spp = length(unique(species)), 
          total_study = length(unique(key)),
          perc_species = (n_spp/length(unique(dat$species)))* 100) %>%
  arrange(desc(perc_species))

# test table kable
kable(table_sp_number, col.names = c("Class", "N", "Number of studies", "Percentage of species")) %>%
  kable_styling("striped", position = "left", full_width = TRUE) %>%
  row_spec(which(table_sp_number$n_spp == "TOTAL"), bold = TRUE, background = "#f5f5f5") %>%
  scroll_box(width = "100%", height = "500px")
Class N Number of studies Percentage of species
Mammalia 227 78 62.021858
Aves 71 110 19.398907
Actinopterygii 24 29 6.557377
Insecta 14 1 3.825137
Amphibia 13 6 3.551913
Reptilia 12 6 3.278688
Chondrichthyes 4 4 1.092896
Cephalopoda 1 1 0.273224

Percentage of coverage by organ type

table_organ_type <- dat %>%
  group_by(trait_size_category) %>%
  summarise(
    n_spp = n_distinct(species),
    total_study = n_distinct(key)
  ) %>%
  mutate(
    perc_studies = (total_study / n_distinct(dat$key)) * 100,
    perc_species = (n_spp / n_distinct(dat$species)) * 100
  ) %>%
  arrange(desc(perc_studies))

# Select and rename columns for display
table_show <- table_organ_type %>%
  select(
    Organ = trait_size_category,
    `Number of species` = n_spp,
    `Percentage of species` = perc_species,
    `Number of studies` = total_study,
    `Percentage of studies` = perc_studies
    )

# Display the table with kable and kableExtra
kable(table_show, digits = 2) %>%
  kable_styling("striped", position = "left", full_width = TRUE) %>%
  scroll_box(width = "100%", height = "500px")
Organ Number of species Percentage of species Number of studies Percentage of studies
body size 366 100.00 235 100.00
liver 171 46.72 161 68.51
heart 158 43.17 94 40.00
spleen 125 34.15 94 40.00
kidney 159 43.44 80 34.04
gizzard 13 3.55 54 22.98
lung 32 8.74 38 16.17
bursa 5 1.37 36 15.32
muscle 21 5.74 36 15.32
testes 21 5.74 35 14.89
brain 165 45.08 34 14.47
fat 12 3.28 32 13.62
intestine 138 37.70 32 13.62
pancreas 17 4.64 32 13.62
thymus 10 2.73 32 13.62
stomach 167 45.63 29 12.34
small intestine 36 9.84 26 11.06
caecum 43 11.75 25 10.64
ovary 14 3.83 23 9.79
adrenal glands 14 3.83 22 9.36
proventriculus 6 1.64 19 8.09
duodenum 5 1.37 12 5.11
epididymides 5 1.37 11 4.68
jejunum 4 1.09 11 4.68
ileum 3 0.82 10 4.26
uterus 4 1.09 9 3.83
colon 21 5.74 8 3.40
gonad 7 1.91 8 3.40
thyroid/parathyroid glands 7 1.91 7 2.98
bone 67 18.31 6 2.55
digestive tract 104 28.42 5 2.13
gill 9 2.46 5 2.13
pituitary gland 7 1.91 5 2.13
prostate gland 4 1.09 5 2.13
rectum 17 4.64 5 2.13
ventricle 5 1.37 5 2.13
adipose depot 100 27.32 4 1.70
oviduct 2 0.55 4 1.70
salt gland 4 1.09 4 1.70
esophagus 14 3.83 2 0.85
gut 2 0.55 2 0.85
harderian gland 2 0.55 2 0.85
skeleton 14 3.83 2 0.85
bladder 1 0.27 1 0.43
central nervous system 14 3.83 1 0.43
circulatory system and fat body 13 3.55 1 0.43
digestive system 13 3.55 1 0.43
electric organ 1 0.27 1 0.43
fore limb 2 0.55 1 0.43
hind limb 2 0.55 1 0.43
malpighian tubules 12 3.28 1 0.43
musculature 13 3.55 1 0.43
reproductive system 13 3.55 1 0.43
ureter 1 0.27 1 0.43

Export database

# check names y sselec the most relevamt columns
names(dat)
##  [1] "species_reported"                   "id_checker"                        
##  [3] "double_checked"                     "notes_double_checking"             
##  [5] "id_data_extractor"                  "key"                               
##  [7] "context_study"                      "taxonomic_group"                   
##  [9] "habitat"                            "origin"                            
## [11] "season_of_collection"               "lat_dec"                           
## [13] "long_dec"                           "age_years"                         
## [15] "life_stage_individual"              "sex_individual"                    
## [17] "id_cluster"                         "trait_size_category"               
## [19] "weight_status"                      "paired_organs_weighed_individually"
## [21] "organ_side"                         "trait_details"                     
## [23] "trait_unit"                         "mean_trait"                        
## [25] "error_trait"                        "error_type"                        
## [27] "n_for_mean_trait"                   "additional_trait"                  
## [29] "data_source"                        "data_doi"                          
## [31] "relevant_notes"                     "phylum"                            
## [33] "class"                              "order"                             
## [35] "family"                             "genus"                             
## [37] "species"                            "source"                            
## [39] "taxo_level"                         "species_underscored"               
## [41] "system"
# slect the most releventa columns and sort where is needed
Organ_Size_DB_v1.0.0 <- dat %>% 
  select("key", 
         "id_checker",
         "double_checked", "id_data_extractor",
         "context_study",
         "phylum", "class", "order", "family", "genus","species", 
         "species_reported","species_underscored",
         "origin", "habitat","season_of_collection",
         "lat_dec",
         "long_dec",
         "age_years",
         "sex_individual", "life_stage_individual", "n_for_mean_trait",
         "id_cluster", "trait_size_category", 
         "weight_status","paired_organs_weighed_individually", "organ_side", "trait_details", "trait_unit", 
         "mean_trait", "error_trait", "error_type", 
         "data_source", "data_doi",
         "relevant_notes")

# export file as csv
write.csv(Organ_Size_DB_v1.0.0, "../outputs/Organ_Size_Database_v1.0.0.csv", row.names = FALSE)

#  and excel
writexl::write_xlsx(Organ_Size_DB_v1.0.0, "../outputs/Organ_Size_Database_v1.0.0.xlsx")

Session information

session_info() %>%
details(summary = 'Current Session Information', open = TRUE)
Current Session Information

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       macOS 15.6
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Amsterdam
 date     2025-08-06
 pandoc   3.4 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
 quarto   1.6.42 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package       * version date (UTC) lib source
 abind           1.4-8   2024-09-12 [1] CRAN (R 4.3.3)
 ape           * 5.8-1   2024-12-16 [1] CRAN (R 4.3.3)
 aplot           0.2.5   2025-02-27 [1] CRAN (R 4.3.3)
 backports       1.5.0   2024-05-23 [1] CRAN (R 4.3.3)
 bibtex          0.5.1   2023-01-26 [1] CRAN (R 4.3.3)
 bookdown        0.43    2025-04-15 [1] CRAN (R 4.3.3)
 broom           1.0.8   2025-03-28 [1] CRAN (R 4.3.3)
 bslib           0.9.0   2025-01-30 [1] CRAN (R 4.3.3)
 cachem          1.1.0   2024-05-16 [1] CRAN (R 4.3.3)
 car             3.1-3   2024-09-27 [1] CRAN (R 4.3.3)
 carData         3.0-5   2022-01-06 [1] CRAN (R 4.3.3)
 class           7.3-23  2025-01-01 [1] CRAN (R 4.3.3)
 classInt        0.4-11  2025-01-08 [1] CRAN (R 4.3.3)
 cli             3.6.5   2025-04-23 [1] CRAN (R 4.3.3)
 clipr           0.8.0   2022-02-22 [1] CRAN (R 4.3.3)
 codetools       0.2-20  2024-03-31 [1] CRAN (R 4.3.3)
 cowplot       * 1.1.3   2024-01-22 [1] CRAN (R 4.3.1)
 crayon          1.5.3   2024-06-20 [1] CRAN (R 4.3.3)
 data.table      1.17.4  2025-05-26 [1] CRAN (R 4.3.3)
 data.tree       1.1.0   2023-11-12 [1] CRAN (R 4.3.3)
 DataExplorer  * 0.8.3   2024-01-24 [1] CRAN (R 4.3.1)
 DBI             1.2.3   2024-06-02 [1] CRAN (R 4.3.3)
 desc            1.4.3   2023-12-10 [1] CRAN (R 4.3.3)
 details       * 0.4.0   2025-02-09 [1] CRAN (R 4.3.3)
 digest          0.6.37  2024-08-19 [1] CRAN (R 4.3.3)
 dplyr         * 1.1.4   2023-11-17 [1] CRAN (R 4.3.1)
 e1071           1.7-16  2024-09-16 [1] CRAN (R 4.3.3)
 evaluate        1.0.3   2025-01-10 [1] CRAN (R 4.3.3)
 farver          2.1.2   2024-05-13 [1] CRAN (R 4.3.3)
 fastmap         1.2.0   2024-05-15 [1] CRAN (R 4.3.3)
 Formula         1.2-5   2023-02-24 [1] CRAN (R 4.3.3)
 fs              1.6.6   2025-04-12 [1] CRAN (R 4.3.3)
 generics        0.1.4   2025-05-09 [1] CRAN (R 4.3.3)
 ggfun           0.1.8   2024-12-03 [1] CRAN (R 4.3.3)
 ggplot2       * 3.5.2   2025-04-09 [1] CRAN (R 4.3.3)
 ggplotify       0.1.2   2023-08-09 [1] CRAN (R 4.3.0)
 ggpubr        * 0.6.0   2023-02-10 [1] CRAN (R 4.3.0)
 ggsignif        0.6.4   2022-10-13 [1] CRAN (R 4.3.0)
 ggthemes      * 5.1.0   2024-02-10 [1] CRAN (R 4.3.1)
 ggtree        * 3.10.1  2024-02-27 [1] Bioconductor 3.18 (R 4.3.2)
 glue            1.8.0   2024-09-30 [1] CRAN (R 4.3.3)
 gridExtra       2.3     2017-09-09 [1] CRAN (R 4.3.3)
 gridGraphics    0.5-1   2020-12-13 [1] CRAN (R 4.3.3)
 gtable          0.3.6   2024-10-25 [1] CRAN (R 4.3.3)
 htmltools       0.5.8.1 2024-04-04 [1] CRAN (R 4.3.3)
 htmlwidgets     1.6.4   2023-12-06 [1] CRAN (R 4.3.1)
 httr            1.4.7   2023-08-15 [1] CRAN (R 4.3.0)
 igraph          2.1.4   2025-01-23 [1] CRAN (R 4.3.3)
 jquerylib       0.1.4   2021-04-26 [1] CRAN (R 4.3.3)
 jsonlite        2.0.0   2025-03-27 [1] CRAN (R 4.3.3)
 kableExtra    * 1.4.0   2024-01-24 [1] CRAN (R 4.3.1)
 KernSmooth      2.23-26 2025-01-01 [1] CRAN (R 4.3.3)
 knitr           1.50    2025-03-16 [1] CRAN (R 4.3.3)
 labeling        0.4.3   2023-08-29 [1] CRAN (R 4.3.3)
 lattice         0.22-7  2025-04-02 [1] CRAN (R 4.3.3)
 lazyeval        0.2.2   2019-03-15 [1] CRAN (R 4.3.3)
 lifecycle       1.0.4   2023-11-07 [1] CRAN (R 4.3.3)
 lubridate       1.9.4   2024-12-08 [1] CRAN (R 4.3.3)
 magrittr        2.0.3   2022-03-30 [1] CRAN (R 4.3.3)
 networkD3       0.4.1   2025-04-14 [1] CRAN (R 4.3.3)
 nlme            3.1-168 2025-03-31 [1] CRAN (R 4.3.3)
 patchwork       1.3.0   2024-09-16 [1] CRAN (R 4.3.3)
 pillar          1.10.2  2025-04-05 [1] CRAN (R 4.3.3)
 pkgconfig       2.0.3   2019-09-22 [1] CRAN (R 4.3.3)
 plyr            1.8.9   2023-10-02 [1] CRAN (R 4.3.3)
 png             0.1-8   2022-11-29 [1] CRAN (R 4.3.3)
 proxy           0.4-27  2022-06-09 [1] CRAN (R 4.3.3)
 purrr           1.0.4   2025-02-05 [1] CRAN (R 4.3.3)
 R6              2.6.1   2025-02-15 [1] CRAN (R 4.3.3)
 ragg            1.4.0   2025-04-10 [1] CRAN (R 4.3.3)
 RColorBrewer    1.1-3   2022-04-03 [1] CRAN (R 4.3.3)
 Rcpp            1.0.14  2025-01-12 [1] CRAN (R 4.3.3)
 RefManageR    * 1.4.0   2022-09-30 [1] CRAN (R 4.3.0)
 rlang           1.1.6   2025-04-11 [1] CRAN (R 4.3.3)
 rmarkdown       2.29    2024-11-04 [1] CRAN (R 4.3.3)
 rnaturalearth * 1.0.1   2023-12-15 [1] CRAN (R 4.3.1)
 rstatix         0.7.2   2023-02-01 [1] CRAN (R 4.3.0)
 rstudioapi      0.17.1  2024-10-22 [1] CRAN (R 4.3.3)
 sass            0.4.10  2025-04-11 [1] CRAN (R 4.3.3)
 scales          1.4.0   2025-04-24 [1] CRAN (R 4.3.3)
 sessioninfo   * 1.2.3   2025-02-05 [1] CRAN (R 4.3.3)
 sf              1.0-21  2025-05-15 [1] CRAN (R 4.3.3)
 stringi         1.8.7   2025-03-27 [1] CRAN (R 4.3.3)
 stringr       * 1.5.1   2023-11-14 [1] CRAN (R 4.3.1)
 svglite         2.2.1   2025-05-12 [1] CRAN (R 4.3.3)
 systemfonts     1.2.3   2025-04-30 [1] CRAN (R 4.3.3)
 terra           1.8-50  2025-05-09 [1] CRAN (R 4.3.3)
 textshaping     1.0.1   2025-05-01 [1] CRAN (R 4.3.3)
 tibble        * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
 tidygeocoder  * 1.0.6   2025-03-31 [1] CRAN (R 4.3.3)
 tidyr         * 1.3.1   2024-01-24 [1] CRAN (R 4.3.1)
 tidyselect      1.2.1   2024-03-11 [1] CRAN (R 4.3.1)
 tidytree        0.4.6   2023-12-12 [1] CRAN (R 4.3.1)
 timechange      0.3.0   2024-01-18 [1] CRAN (R 4.3.3)
 treeio          1.26.0  2023-11-06 [1] Bioconductor
 units           0.8-7   2025-03-11 [1] CRAN (R 4.3.3)
 vctrs           0.6.5   2023-12-01 [1] CRAN (R 4.3.3)
 viridisLite     0.4.2   2023-05-02 [1] CRAN (R 4.3.3)
 withr           3.0.2   2024-10-28 [1] CRAN (R 4.3.3)
 writexl         1.5.4   2025-04-15 [1] CRAN (R 4.3.3)
 xfun            0.52    2025-04-02 [1] CRAN (R 4.3.3)
 xml2            1.3.8   2025-03-14 [1] CRAN (R 4.3.3)
 yaml            2.3.10  2024-07-26 [1] CRAN (R 4.3.3)
 yulab.utils     0.2.0   2025-01-29 [1] CRAN (R 4.3.3)

 [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────