Welcome

This R Markdown document summarises the principal the steps to generate figures and tables using data pertaining to organ sizes across vertebrates and invertebrates species. I employed a systematic map to search for information on organ sizes, body sizes, and a set of metadata (see Figure S1). This approach facilitates a more comprehensive understanding of the database structure and also, will be serve as backbone for the studying scaling of organ sizes among species.

Citation

When using the data and/or code associated with this project, they should be cited as follows:

Leiva, F. P., Ockhuijsen L., Polinder, J., Schreyers, L., Xiong, J., Hendriks A. J. (2025). A systematic map and comprehensive database of invertebrate and vertebrate organ size. Zenodo. DOI will be available here soon.

Contact

This script is authored by Félix P. Leiva. For any questions related to this resource, please contact me at the email address: felixpleiva@gmail.com.

Disclaimer

This code routine may contain typographical errors, specific lines of code, or comments in Spanish (my native language). Should you encounter any errors in the code or data, please let me know via email.

Licence

This repository is provided by the author under the licence Attribution-NonCommercial-NoDerivatives 4.0 International.

Clean working space

rm(list = ls())

Load libraries

library(kableExtra)       # Enhances tables created with 'knitr::kable'
library(DataExplorer)     # Automates exploratory data analysis
library(dplyr)            # Efficient data manipulation

## 
## Attaching package: 'dplyr'

## The following object is masked from 'package:kableExtra':
## 
##     group_rows

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)          # Data visualisation based on the grammar of graphics

## Warning: package 'ggplot2' was built under R version 4.3.3

library(RefManageR)       # Manages references and citations
library(ggpubr)           # Creates publication-ready graphics
library(cowplot)          # Arranges and annotates plots

## 
## Attaching package: 'cowplot'

## The following object is masked from 'package:ggpubr':
## 
##     get_legend

library(tidygeocoder)     # Converts addresses into geographic coordinates

## Warning: package 'tidygeocoder' was built under R version 4.3.3

library(rnaturalearth)    # Accesses Natural Earth geographic data
library(ape)              # Analyses phylogenies and evolution

## Warning: package 'ape' was built under R version 4.3.3

## 
## Attaching package: 'ape'

## The following object is masked from 'package:ggpubr':
## 
##     rotate

## The following object is masked from 'package:dplyr':
## 
##     where

library(ggtree)           # Visualises and annotates phylogenetic trees

## ggtree v3.10.1 For help: https://yulab-smu.top/treedata-book/
## 
## If you use the ggtree package suite in published research, please cite
## the appropriate paper(s):
## 
## Guangchuang Yu, David Smith, Huachen Zhu, Yi Guan, Tommy Tsan-Yuk Lam.
## ggtree: an R package for visualization and annotation of phylogenetic
## trees with their covariates and other associated data. Methods in
## Ecology and Evolution. 2017, 8(1):28-36. doi:10.1111/2041-210X.12628
## 
## Shuangbin Xu, Lin Li, Xiao Luo, Meijun Chen, Wenli Tang, Li Zhan, Zehan
## Dai, Tommy T. Lam, Yi Guan, Guangchuang Yu. Ggtree: A serialized data
## object for visualization of a phylogenetic tree and annotation data.
## iMeta 2022, 1(4):e56. doi:10.1002/imt2.56
## 
## S Xu, Z Dai, P Guo, X Fu, S Liu, L Zhou, W Tang, T Feng, M Chen, L
## Zhan, T Wu, E Hu, Y Jiang, X Bo, G Yu. ggtreeExtra: Compact
## visualization of richly annotated phylogenetic data. Molecular Biology
## and Evolution. 2021, 38(9):4039-4042. doi: 10.1093/molbev/msab166

## 
## Attaching package: 'ggtree'

## The following object is masked from 'package:ape':
## 
##     rotate

## The following object is masked from 'package:ggpubr':
## 
##     rotate

library(tibble)           # Alternative to data frames
library(ggthemes)         # Additional themes for 'ggplot2' graphics

## 
## Attaching package: 'ggthemes'

## The following object is masked from 'package:cowplot':
## 
##     theme_map

library(sessioninfo)      # Documents session environment for reproducibility

## Warning: package 'sessioninfo' was built under R version 4.3.3

library(details)          # Adds inline or interactive details

## Warning: package 'details' was built under R version 4.3.3

library(stringr)
library(tidyr)

## 
## Attaching package: 'tidyr'

## The following object is masked from 'package:ggtree':
## 
##     expand

Load data

dat <- read.csv("../outputs/organ_size_with_taxonomy.csv")

Load phylogenetic tree

tree<-read.tree("../outputs/Phylogenetic tree for 363 species.tre")

Load references

refs <- ReadBib("../outputs/Data used in the DB.bib")

Overview of screening steps used to select studies included in the organ size database

Figure S1. PRISMA-type diagram showing the systematic and non-systematic literature search reporting organ size and body size pairs data. (*) We have not received responses from the corresponding author at the time of the manuscript submission.

Data exploration

Change some species names

# Lets change those names

dat$species[dat$species == "Ateles chamek"]                     <- "Ateles belzebuth chamek"
dat$species[dat$species == "Leontocebus nigrifrons"]            <- "Saguinus fuscicollis nigrifrons"
dat$species[dat$species == "Terrapene triunguis"]               <- "Terrapene carolina triunguis"
dat$species[dat$species == "Anarhynchus alexandrinus"]          <- "Charadrius alexandrinus"
dat$species[dat$species == "Cebuella pygmaea"]                  <- "Callithrix pygmaea"
dat$species[dat$species == "Dicotyles tajacu"]                  <- "Pecari tajacu"
dat$species[dat$species == "Heteromys salvini"]                 <- "Liomys salvini"
dat$species[dat$species == "Leontocebus fuscicollis"]           <- "Saguinus fuscicollis"
dat$species[dat$species == "Lithobates pipiens"]                <- "Rana pipiens"
dat$species[dat$species == "pekania pennanti"]                  <- "Martes pennanti"
dat$species[dat$species == "Sephanoides sephaniodes"]           <- "Sephanoides sephanoides"
dat$species[dat$species == "Spodiopsar sericeus"]               <- "Sturnus sericeus"
dat$species[dat$species == "Pekania pennanti"]                  <- "Martes pennanti"
dat$species[dat$species == "Saguinus fuscicollis"]              <- "Saguinus fuscicollis fuscicollis"
dat$species[dat$species == "Presbytis melalophos"]              <- "Presbytis melalophos mitrata"
dat$species[dat$species == "Eulemur fulvus"]                    <- "Eulemur fulvus fulvus"
dat$species[dat$species == "Saimiri sciureus"]                  <- "Saimiri sciureus sciureus"
dat$species[dat$species == "Alouatta seniculus"]                <- "Alouatta seniculus seniculus"

To give an overview and description of the metadata associated with the extraction of organ size and body size data in invertebrates and vertebrates. This table is labeled as Table S1 in the manuscript.

Check and reformat variables if is needed

str(dat)

## 'data.frame':    10605 obs. of  36 variables:
##  $ species_reported                  : chr  "Abrothrix longipilis" "Abrothrix longipilis" "Abrothrix longipilis" "Abrothrix longipilis" ...
##  $ initials                          : chr  "FPLeiva" "FPLeiva" "FPLeiva" "FPLeiva" ...
##  $ key                               : chr  "rayyan-150592721" "rayyan-150592721" "rayyan-150592721" "rayyan-150592721" ...
##  $ context_study                     : chr  "ecological/evolutionary" "ecological/evolutionary" "ecological/evolutionary" "ecological/evolutionary" ...
##  $ taxonomic_group                   : chr  "mammal" "mammal" "mammal" "mammal" ...
##  $ habitat                           : chr  "terrestrial" "terrestrial" "terrestrial" "terrestrial" ...
##  $ origin                            : chr  "wild" "wild" "wild" "wild" ...
##  $ season_of_collection              : chr  "autumn" "autumn" "autumn" "autumn" ...
##  $ lat_dec                           : num  -47.5 -47.5 -47.4 -47.5 -47.4 ...
##  $ long_dec                          : num  -72.8 -72.8 -71.9 -72.8 -71.9 ...
##  $ age_years                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ life_stage_individual             : chr  NA NA NA NA ...
##  $ sex_individual                    : chr  NA NA NA NA ...
##  $ id_cluster                        : int  4 4 2 4 2 3 2 3 4 3 ...
##  $ trait_size_category               : chr  "body size" "small intestine" "stomach" "caecum" ...
##  $ weight_status                     : chr  "fresh" "fresh" "fresh" "fresh" ...
##  $ paired_organs_weighed_individually: chr  NA NA NA NA ...
##  $ organ_side                        : chr  NA NA NA NA ...
##  $ trait_details                     : chr  "body weight" "small intestine weight" "stomach weight" "caecum weight" ...
##  $ trait_unit                        : chr  "g" "g" "g" "g" ...
##  $ mean_trait                        : num  26.274 0.698 0.375 0.25 1.357 ...
##  $ error_trait                       : num  2.1934 0.0596 0.0182 0.0601 0.0675 ...
##  $ error_type                        : chr  "standard_error" "standard_error" "standard_error" "standard_error" ...
##  $ n_for_mean_trait                  : chr  "26" "26" "21" "26" ...
##  $ additional_trait                  : chr  NA NA NA NA ...
##  $ data_source                       : chr  "Figure 4F" "Figure 4C" "Figure 4A" "Figure 4B" ...
##  $ data_doi                          : chr  NA NA NA NA ...
##  $ relevant_notes                    : chr  NA NA NA NA ...
##  $ phylum                            : chr  "Chordata" "Chordata" "Chordata" "Chordata" ...
##  $ class                             : chr  "Mammalia" "Mammalia" "Mammalia" "Mammalia" ...
##  $ order                             : chr  "Rodentia" "Rodentia" "Rodentia" "Rodentia" ...
##  $ family                            : chr  "Cricetidae" "Cricetidae" "Cricetidae" "Cricetidae" ...
##  $ genus                             : chr  "Abrothrix" "Abrothrix" "Abrothrix" "Abrothrix" ...
##  $ species                           : chr  "Abrothrix longipilis" "Abrothrix longipilis" "Abrothrix longipilis" "Abrothrix longipilis" ...
##  $ source                            : chr  "ncbi" "ncbi" "ncbi" "ncbi" ...
##  $ taxo_level                        : chr  "Species" "Species" "Species" "Species" ...

# make a new column of species underscored
dat$species_underscored <- gsub(" ", "_", dat$species)

# Choosing columns I want to convert to factor
columns_to_factor <- c(
  "species_reported", "initials", "key", "context_study", "taxonomic_group",
  "habitat", "origin", "season_of_collection", "life_stage_individual",
  "sex_individual", "id_cluster", "trait_size_category", "weight_status",
  "paired_organs_weighed_individually", "organ_side", "trait_details",
  "trait_unit", "error_type", "additional_trait", "data_source", "data_doi",
  "phylum", "class", "order", "family", "genus", "species", "source",
  "taxo_level", "species_underscored"
)

# Convert columns to factor
dat <- dat %>%
  mutate(across(all_of(columns_to_factor), as.factor))

# Convert column to numeric
dat <- dat %>%
  mutate(mean_trait = as.numeric(mean_trait))

# check again
str(dat)

## 'data.frame':    10605 obs. of  37 variables:
##  $ species_reported                  : Factor w/ 380 levels "Abrothrix longipilis",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ initials                          : Factor w/ 1 level "FPLeiva": 1 1 1 1 1 1 1 1 1 1 ...
##  $ key                               : Factor w/ 235 levels "rayyan-150591684",..: 158 158 158 158 158 158 158 158 158 158 ...
##  $ context_study                     : Factor w/ 6 levels "animal health",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ taxonomic_group                   : Factor w/ 7 levels "amphibian","bird",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ habitat                           : Factor w/ 3 levels "freshwater","marine",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ origin                            : Factor w/ 6 levels "captive","commercial",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ season_of_collection              : Factor w/ 4 levels "autumn","spring",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ lat_dec                           : num  -47.5 -47.5 -47.4 -47.5 -47.4 ...
##  $ long_dec                          : num  -72.8 -72.8 -71.9 -72.8 -71.9 ...
##  $ age_years                         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ life_stage_individual             : Factor w/ 13 levels "adult","elderly",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ sex_individual                    : Factor w/ 3 levels "both","female",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ id_cluster                        : Factor w/ 360 levels "1","2","3","4",..: 4 4 2 4 2 3 2 3 4 3 ...
##  $ trait_size_category               : Factor w/ 54 levels "adipose depot",..: 4 46 48 8 31 28 4 48 48 31 ...
##  $ weight_status                     : Factor w/ 4 levels "dried","fixed",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ paired_organs_weighed_individually: Factor w/ 2 levels "no","yes": NA NA NA NA NA NA NA NA NA NA ...
##  $ organ_side                        : Factor w/ 2 levels "left","right": NA NA NA NA NA NA NA NA NA NA ...
##  $ trait_details                     : Factor w/ 443 levels "abdominal fat",..: 14 360 380 37 187 153 14 380 380 187 ...
##  $ trait_unit                        : Factor w/ 16 levels "% of body weight",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ mean_trait                        : num  26.274 0.698 0.375 0.25 1.357 ...
##  $ error_trait                       : num  2.1934 0.0596 0.0182 0.0601 0.0675 ...
##  $ error_type                        : Factor w/ 3 levels "confidence_interval",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ n_for_mean_trait                  : chr  "26" "26" "21" "26" ...
##  $ additional_trait                  : Factor w/ 3 levels "individual metabolic rates",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ data_source                       : Factor w/ 80 levels "archived raw data",..: 42 39 35 37 41 40 42 35 35 41 ...
##  $ data_doi                          : Factor w/ 5 levels "10.5061/\ndryad.6djh9w13b",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ relevant_notes                    : chr  NA NA NA NA ...
##  $ phylum                            : Factor w/ 3 levels "Arthropoda","Chordata",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ class                             : Factor w/ 8 levels "Actinopterygii",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ order                             : Factor w/ 60 levels "Accipitriformes",..: 48 48 48 48 48 48 48 48 48 48 ...
##  $ family                            : Factor w/ 133 levels "Accipitridae",..: 30 30 30 30 30 30 30 30 30 30 ...
##  $ genus                             : Factor w/ 267 levels "Abrothrix","Acomys",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ species                           : Factor w/ 366 levels "Abrothrix longipilis",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ source                            : Factor w/ 3 levels "gbif","itis",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ taxo_level                        : Factor w/ 1 level "Species": 1 1 1 1 1 1 1 1 1 1 ...
##  $ species_underscored               : Factor w/ 366 levels "Abrothrix_longipilis",..: 1 1 1 1 1 1 1 1 1 1 ...

Grouping of organs

In the code below, we will attempt to group the different types of organs included in our database to improve visualisation across the various figures. As we see below, there 53 distinct organ types.

# check the organ names
length(unique(dat$trait_size_category))

## [1] 54

However, some of these actually represent the same organ type.

# check the organ names
unique(dat$trait_size_category)

##  [1] body size                       small intestine                
##  [3] stomach                         caecum                         
##  [5] liver                           intestine                      
##  [7] brain                           spleen                         
##  [9] adipose depot                   heart                          
## [11] digestive tract                 kidney                         
## [13] central nervous system          skeleton                       
## [15] musculature                     reproductive system            
## [17] digestive system                circulatory system and fat body
## [19] malpighian tubules              ileum                          
## [21] duodenum                        gizzard                        
## [23] muscle                          jejunum                        
## [25] lung                            ventricle                      
## [27] rectum                          colon                          
## [29] esophagus                       thymus                         
## [31] bursa                           testes                         
## [33] thyroid/parathyroid glands      salt gland                     
## [35] harderian gland                 ovary                          
## [37] pancreas                        adrenal glands                 
## [39] fat                             proventriculus                 
## [41] gonad                           bone                           
## [43] gill                            uterus                         
## [45] prostate gland                  pituitary gland                
## [47] oviduct                         fore limb                      
## [49] hind limb                       electric organ                 
## [51] epididymides                    gut                            
## [53] ureter                          bladder                        
## 54 Levels: adipose depot adrenal glands bladder body size bone brain ... ventricle

System categories

This happened because during data extraction, I retained the original organ names as reported in the studies.

# Add system column using case_when
dat <- dat %>%
  mutate(system = case_when(
    # Digestive system
    trait_size_category %in% c("liver", "caecum", "intestine", "stomach", 
                              "digestive tract", "digestive system",
                              "jejunum", "duodenum", "gizzard", "ileum",
                              "esophagus", "colon", "rectum", "proventriculus",
                              "gut", "pancreas") ~ "Digestive",
    
    # Excretory system
    trait_size_category %in% c("kidney", "malpighian tubules", "ureter", 
                              "bladder") ~ "Excretory",
    
    # Circulatory system
    trait_size_category %in% c("heart", "ventricle", "circulatory system and fat body",
                              "spleen") ~ "Circulatory",
    
    # Immune system
    trait_size_category %in% c("thymus", "bursa") ~ "Immune",
    
    # Nervous system
    trait_size_category %in% c("brain", "central nervous system", 
                              "pituitary gland") ~ "Nervous",
    
    # Endocrine system
    trait_size_category %in% c("thyroid/parathyroid glands", "adrenal glands",
                              "harderian gland", "salt gland") ~ "Endocrine",
    
    # Respiratory system
    trait_size_category %in% c("lung", "gill") ~ "Respiratory",
    
    # Reproductive system
    trait_size_category %in% c("ovary", "testes", "gonad", "uterus", 
                              "prostate gland", "oviduct", "epididymides",
                              "reproductive system") ~ "Reproductive",
    
    # Musculoskeletal system
    trait_size_category %in% c("skeleton", "bone", "hind limb", "fore limb",
                              "muscle", "musculature") ~ "Musculoskeletal",
    
    # Adipose/fat storage
    trait_size_category %in% c("adipose depot", "fat") ~ "Adipose tissue",
    
    # Default category for anything not matched
    TRUE ~ "Body size"
  ))

# Create summary table with group totals
summary_table <- dat %>%
  count(system, trait_size_category) %>%
  arrange(system, trait_size_category) %>%
  # Add group totals
  bind_rows(
    dat %>%
      count(system, name = "n") %>%
      mutate(trait_size_category = "TOTAL") %>%
      select(system, trait_size_category, n)
  ) %>%
  arrange(system, trait_size_category != "TOTAL", trait_size_category)

Lets check the grouping

# make a table to see the number of organs measured in each system
kable(summary_table, col.names = c("System", "Organ", "Count")) %>%
  kable_styling("striped", position = "left", full_width = TRUE) %>%
  row_spec(which(summary_table$trait_size_category == "TOTAL"), bold = TRUE, background = "#f5f5f5") %>%
  scroll_box(width = "100%", height = "500px")

System	Organ	Count
Adipose tissue	TOTAL	155
Adipose tissue	adipose depot	108
Adipose tissue	fat	47
Body size	TOTAL	3660
Body size	body size	3210
Body size	electric organ	60
Body size	small intestine	390
Circulatory	TOTAL	988
Circulatory	circulatory system and fat body	18
Circulatory	heart	511
Circulatory	spleen	307
Circulatory	ventricle	152
Digestive	TOTAL	2593
Digestive	caecum	289
Digestive	colon	27
Digestive	digestive system	18
Digestive	digestive tract	117
Digestive	duodenum	17
Digestive	esophagus	15
Digestive	gizzard	163
Digestive	gut	88
Digestive	ileum	13
Digestive	intestine	656
Digestive	jejunum	15
Digestive	liver	741
Digestive	pancreas	53
Digestive	proventriculus	26
Digestive	rectum	23
Digestive	stomach	332
Endocrine	TOTAL	183
Endocrine	adrenal glands	90
Endocrine	harderian gland	21
Endocrine	salt gland	40
Endocrine	thyroid/parathyroid glands	32
Excretory	TOTAL	461
Excretory	bladder	2
Excretory	kidney	439
Excretory	malpighian tubules	16
Excretory	ureter	4
Immune	TOTAL	101
Immune	bursa	41
Immune	thymus	60
Musculoskeletal	TOTAL	1136
Musculoskeletal	bone	479
Musculoskeletal	fore limb	112
Musculoskeletal	hind limb	112
Musculoskeletal	muscle	395
Musculoskeletal	musculature	18
Musculoskeletal	skeleton	20
Nervous	TOTAL	490
Nervous	brain	436
Nervous	central nervous system	33
Nervous	pituitary gland	21
Reproductive	TOTAL	457
Reproductive	epididymides	13
Reproductive	gonad	55
Reproductive	ovary	149
Reproductive	oviduct	5
Reproductive	prostate gland	8
Reproductive	reproductive system	17
Reproductive	testes	197
Reproductive	uterus	13
Respiratory	TOTAL	381
Respiratory	gill	165
Respiratory	lung	216

Figure 1. Cumulative number of studies and most common journals

# Panel A: Extracting and cleaning publication years

df_years <- refs %>%
  as.data.frame() %>%
  select(year) %>%
  mutate(year = as.numeric(as.character(year))) %>%  # Ensure year is numeric
  filter(!is.na(year))  # Remove NA values

# Counting studies per year and calculating cumulative values:
studies_per_year <- df_years %>%
  group_by(year) %>%
  summarise(num_studies = n()) %>%
  arrange(year) %>%
  mutate(cumulative_studies = cumsum(num_studies))  # Calculate cumulative count

# Plotting the cumulative number of studies:
plot_years <- 
  ggplot(studies_per_year, aes(x = year, y = cumulative_studies)) +
  geom_line(color = "#009E73", linewidth = 2) +
  scale_y_continuous(limits = c(0, 250)) +  # Limit the y-axis to 400
  scale_x_continuous(
    limits = c(1955, 2024),
    breaks = seq(1960, 2024, by = 12)  # Set x-axis intervals to 55 years
  ) +
  labs(
    x = "Publication Year",
    y = "Number of Studies"
  ) +
  theme_pubr() +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12)
  )
# ------------------------------------------------------------------------------
# lets reformat the names of the journal to have to count them and then plot
df_journals <- refs %>%
  as.data.frame() %>%
  select(journal) %>%
  filter(!is.na(journal)) %>%
  mutate(
    journal_clean = tolower(journal),
    journal_clean = str_replace_all(journal_clean, "\\\\&", "&"),
    journal_clean = str_squish(journal_clean),
    journal_clean = str_to_title(journal_clean),
    journal_clean = str_replace_all(
      journal_clean,
      "\\b(Of|And|In|On|For|The|A|An|To|With|By|At|From|But|Or|Nor)\\b",
      function(x) tolower(x)
    ),
    journal_clean = sub("^([a-z])", toupper("\\1"), journal_clean)
  ) %>%
  arrange(desc(journal_clean))

df_journals <- df_journals %>%
  group_by(journal_clean) %>%
  summarise(num_articles = n(), .groups = "drop") %>%
  arrange(desc(num_articles)) %>%
  slice_head(n = 10)

plot_journals <- ggplot(df_journals, aes(x = reorder(journal_clean, num_articles), y = num_articles)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  scale_y_continuous(limits = c(0, 50)) +
  coord_flip() +
  theme_pubr() +
  labs(
    x = "Journals",
    y = "Number of Studies"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_blank()
  ) +
  geom_text(aes(label = journal_clean), hjust = 0, vjust = 0.5, 
            color = "black", size = 4, check_overlap = TRUE, 
            position = position_dodge(width = 0.7)) +
  geom_text(aes(label = num_articles), 
            hjust = 2, color = "black", size = 4)
# ------------------------------------------------------------------------------
# Combining Panel A and Panel B
Figure_1 <- plot_grid(
  plot_years, 
  plot_journals,
  labels = c("A", "B"),
  nrow = 2,
  ncol = 1,
  label_size = 15,
  align = "v")

Figure_1

# Saving the combined figure
ggsave('../outputs/Figure_1_Studies_and_Journals.pdf', Figure_1, width = 6, height = 9)
ggsave('../outputs/Figure_1_Studies_and_Journals.png', Figure_1, width = 6, height = 9, dpi = 1200)

Figure 2: Number of studies by study context, origin and season of collection

# Calculate the number of unique studies by each categorical variable
plot_data <- dat %>%
  group_by(context_study, origin, season_of_collection) %>%
  summarise(unique_studies = n_distinct(key), .groups = "drop") %>%
  pivot_longer(
    cols = -c(unique_studies),
    names_to = "variable",
    values_to = "level"
  )

plot_data <- plot_data %>%
  mutate(variable = recode(variable,
                           context_study = "Study context",
                           origin = "Source of specimens",
                           season_of_collection = "Season of collection")) %>%
  group_by(variable, level) %>%
  summarise(unique_studies = sum(unique_studies), .groups = "drop")

# rename the NA level as "not reported"
plot_data <- plot_data %>%
  mutate(level = if_else(is.na(level), "not reported", as.character(level)))

#  reorder the levels
plot_data <- plot_data %>%
  mutate(
    variable = factor(
      variable,
      levels = c("Study context", "Source of specimens", "Season of collection")
    )
  )

Figure_2 <- 
  ggplot(plot_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  # Add number of studies inside the bars
  geom_text(aes(label = unique_studies), 
            hjust = -0.15, color = "black", size = 4) +
  coord_flip() +
  scale_y_continuous(limits = c(0, 250)) +
  facet_wrap(~ variable, scales = "free_y", ncol = 3) +
  theme_pubr() +
  labs(
    x = NULL,  # Remove x axis label
    y = "Number of studies"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 9),
    strip.text = element_text(face = "bold", size = 10),
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
    strip.background = element_rect(fill = "#B2ABD270", colour = "transparent", linewidth = 0),
    panel.spacing = unit(1, "lines")
  )

Figure_2

# # Store Plots
ggsave('../outputs/Figure_2_Studies_by_Factors.pdf', Figure_2, width = 15/1.2, height = 4/1.2)
ggsave('../outputs/Figure_2_Studies_by_Factors.png', Figure_2, width = 15/1.2, height = 4/1.2, dpi = 1200)

Figure 3: Number of studies by habitat, life stage and sex

# Calculate the number of unique studies by each categorical variable
plot_data <- dat %>%
  group_by(sex_individual, life_stage_individual, habitat) %>%
  summarise(unique_studies = n_distinct(key), .groups = "drop") %>%
  pivot_longer(
    cols = -c(unique_studies),
    names_to = "variable",
    values_to = "level"
  )

plot_data <- plot_data %>%
  mutate(variable = recode(variable,
                           habitat = "Habitat",
                           sex_individual = "Sex studied",
                           life_stage_individual = "Life stage studied")) %>%
  group_by(variable, level) %>%
  summarise(unique_studies = sum(unique_studies), .groups = "drop")

# rename the NA level as "not reported"
plot_data <- plot_data %>%
  mutate(level = if_else(is.na(level), "not reported", as.character(level)))

#  reorder the levels
plot_data <- plot_data %>%
  mutate(
    variable = factor(
      variable,
      levels = c("Habitat", "Life stage studied", "Sex studied")
    )
  )

Figure_3 <- 
  ggplot(plot_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  # Add number of studies inside the bars
  geom_text(aes(label = unique_studies), 
            hjust = -0.15, color = "black", size = 4) +
  coord_flip() +
  scale_y_continuous(limits = c(0, 300)) +
  facet_wrap(~ variable, scales = "free_y", ncol = 3) +
  theme_pubr() +
  labs(
    x = NULL,  # Remove x axis label
    y = "Number of studies"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 9),
    strip.text = element_text(face = "bold", size = 10),
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
    strip.background = element_rect(fill = "#B2ABD270", colour = "transparent", linewidth = 0),
    panel.spacing = unit(1, "lines")
  )

Figure_3

# # Store Plots
ggsave('../outputs/Figure_3_Studies_by_Factors.pdf', Figure_3, width = 15/1.5, height = 4/1.5)
ggsave('../outputs/Figure_3_Studies_by_Factors.png', Figure_3, width = 15/1.5, height = 4/1.5, dpi = 1200)

Figure 4: Studies by class and most common species in the database

data_studies <- dat %>%
  group_by(class) %>%
  summarise(unique_studies = n_distinct(key), .groups = "drop")

Figure_4a <- 
  ggplot(data_studies, aes(x = reorder(class, unique_studies), y = unique_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  coord_flip() +
  geom_text(aes(label = unique_studies), hjust = -0.15, color = "black", size = 4) +
  theme_pubr() +
  scale_y_continuous(limits = c(0, 150)) +
  labs(x = NULL, y = "Number of studies") +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 9),
    strip.text = element_text(face = "bold", size = 11),
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
    strip.background = element_rect(fill = "#B2ABD270", colour = "black", linewidth = 0.8),
    panel.spacing = unit(1, "lines")
  )

# plot most studies species
plot_data <- dat %>%
  group_by(species) %>%
  summarise(num_studies = n_distinct(key), .groups = "drop") %>%
  arrange(desc(num_studies)) %>%
  slice_head(n = 10) %>%
  mutate(
    # Reemplazar espacio por ~ para que ggplot2 lo interprete bien
    species_italic = gsub(" ", "~~", species),
    # Crear expresión para cursiva
    species_italic = paste0("italic('", gsub(" ", " ", species), "')"),
    # Alternativamente, para separar género y especie en cursiva sin comillas:
    species_italic = gsub(" ", "~~", species),
    species_italic = paste0("italic(", species_italic, ")")
  )

# Graficar con nombres en cursiva
Figure_4b <- ggplot(plot_data, aes(x = reorder(species_italic, num_studies), y = num_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  geom_text(aes(label = num_studies), hjust = -0.15, color = "black", size = 4) +
  coord_flip() +
  theme_pubr() +
  scale_y_continuous(limits = c(0, 150)) +
  labs(x = NULL, y = "Number of studies") +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 9),
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8)
  ) +
  scale_x_discrete(labels = function(x) parse(text = x))

# Combining Panel 4A and Panel 4B
Figure_4 <- plot_grid(
  Figure_4a, 
  Figure_4b,
  labels = c("A", "B"),
  nrow = 2,
  ncol = 1,
  label_size = 15,
  align = "v")

Figure_4

# Store Plots
ggsave('../outputs/Figure_4_Most_common_species.pdf', Figure_4, width = 6, height = 9)
ggsave('../outputs/Figure_4_Most_common_species.png', Figure_4, width = 6, height = 9, dpi = 1200)

Figure 5: Number of studies by system

plot_data <- dat %>%
  group_by(system) %>%
  summarise(unique_studies = n_distinct(key), .groups = "drop") %>%
  pivot_longer(
    cols = -c(unique_studies),
    names_to = "variable",
    values_to = "level"
)

plot_data <- plot_data %>%
  mutate(variable = recode(variable,
                           system = "System studied")) %>%
  group_by(variable, level) %>%
  summarise(unique_studies = sum(unique_studies), .groups = "drop")

# Create plot with panel borders
Figure_5 <- 
  ggplot(plot_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  coord_flip() +
  # Add number of studies inside the bars
  geom_text(aes(label = unique_studies), 
            hjust = -0.15, color = "black", size = 4) +
  facet_wrap(~ variable, scales = "free_y", ncol = 1) +
  theme_pubr() +
  labs(
     x = NULL,  # Remove x axis label
    y = "Number of studies"
  ) +
 theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 9),
    strip.text = element_text(face = "bold", size = 12),
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
    strip.background = element_rect(fill = "#B2ABD270", colour = "transparent", linewidth = 0),
    panel.spacing = unit(1, "lines")
  )

Figure_5

# # Store Plots
ggsave('../outputs/Figure_5_Studies_by_system.pdf', Figure_5, width = 7, height = 7)
ggsave('../outputs/Figure_5_Studies_by_system.png', Figure_5, width = 7, height = 7, dpi = 1200)

Figure 6: Phylogeny of species included in the DB with studies

# Summaries data: calculate the mean value per species for each trait
names(dat)

##  [1] "species_reported"                   "initials"                          
##  [3] "key"                                "context_study"                     
##  [5] "taxonomic_group"                    "habitat"                           
##  [7] "origin"                             "season_of_collection"              
##  [9] "lat_dec"                            "long_dec"                          
## [11] "age_years"                          "life_stage_individual"             
## [13] "sex_individual"                     "id_cluster"                        
## [15] "trait_size_category"                "weight_status"                     
## [17] "paired_organs_weighed_individually" "organ_side"                        
## [19] "trait_details"                      "trait_unit"                        
## [21] "mean_trait"                         "error_trait"                       
## [23] "error_type"                         "n_for_mean_trait"                  
## [25] "additional_trait"                   "data_source"                       
## [27] "data_doi"                           "relevant_notes"                    
## [29] "phylum"                             "class"                             
## [31] "order"                              "family"                            
## [33] "genus"                              "species"                           
## [35] "source"                             "taxo_level"                        
## [37] "species_underscored"                "system"

summary_data <- dat %>%
  count(species_underscored, system) %>%  # Count occurrences by species and system
  mutate(presence = if_else(n > 0, "Yes", "No")) %>%  # Convert counts to "Yes"/"No"
  select(-n) %>% # Remove the original count column
  pivot_wider(
    names_from = system,                   # Each system becomes a column
    values_from = presence,                # Values are now "Yes" or "No"
    values_fill = list(presence = "No")    # Fill missing combinations with "No"
  )

summary_data <- summary_data %>%
  select(-`Body size`)

names(summary_data)

##  [1] "species_underscored" "Digestive"           "Adipose tissue"     
##  [4] "Circulatory"         "Excretory"           "Nervous"            
##  [7] "Musculoskeletal"     "Reproductive"        "Respiratory"        
## [10] "Endocrine"           "Immune"

# Check for mismatches between tree tips and data
setdiff(tree$tip.label, summary_data$species_underscored)

## [1] "Saimiri_sciureus_macrodon"

setdiff(summary_data$species_underscored, tree$tip.label)

## [1] "Muusoctopus_aegir" "Neomys_anomalus"   "Neomys_fodiens"   
## [4] "Saimiri_macrodon"

tree <- keep.tip(tree, intersect(tree$tip.label, summary_data$species_underscored))

summary_data <- summary_data[summary_data$species_underscored %in% tree$tip.label, ]

# check again
setdiff(tree$tip.label, summary_data$species_underscored)

## character(0)

setdiff(summary_data$species_underscored, tree$tip.label)

## character(0)

# Align data and tree
datF <- summary_data %>%
  column_to_rownames("species_underscored")

# Plot with species without names
circ_names <- ggtree(tree, layout = "fan", open.angle = 18, branch.length = "none") +
  geom_tiplab(offset = 0.1, hjust = 0, size = 1)

## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.

circ_names <- rotate_tree(circ_names, 90)

## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.

circ_names

# Plot with species without names
circ <- ggtree(tree, layout = "fan", open.angle = 18, branch.length = "none")

## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.

circ <- rotate_tree(circ, 90)

## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.

circ

# Create a new plot with heatmap for each trait using a single scale
tree_data <- gheatmap(
  circ, 
  datF, 
  width = 0.4, 
  offset = 0,  # Offset for placing the heatmap
  colnames_offset_x = 0, 
  colnames_offset_y = 0, 
  font.size = 3, 
  hjust = 0
)
tree_data

# Apply the same scale for all traits
circ_data <- tree_data + 
  scale_fill_manual(values = c("grey", "#009E73"), name = "System measured") +
  theme(
    legend.position = c(0.56, 0.57),  # Posición manual de la leyenda
    legend.background = element_rect(fill = "transparent", colour = NA), # Fondo transparente
    legend.box.background = element_rect(fill = "transparent", colour = NA) # Borde transparente
  )

## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.

## Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2
## 3.5.0.
## ℹ Please use the `legend.position.inside` argument of `theme()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

circ_data

ggsave("../outputs/Figure_6_Phylogenetic_tree_with_data.png", circ_data, width = 12, height = 12, dpi = 1500)
ggsave("../outputs/Figure_6_Phylogenetic_tree_with_data.pdf", circ_data, width = 12, height = 12)
ggsave("../outputs/Figure_6_Phylogenetic_tree_with_names.pdf", circ_names, width = 12, height = 12)

Figure S2: Number of species by taxonomic group (Class)

#prepare data for ploting
data_taxa <- dat %>%
  group_by(class) %>%
  summarise(unique_species = n_distinct(species), .groups = "drop") %>%
  pivot_longer(
    cols = -c(unique_species),
    names_to = "variable",
    values_to = "level"
  )

# Create plot with panel borders
Figure_S2 <- 
  ggplot(data_taxa, aes(x = reorder(level, unique_species), y = unique_species)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  coord_flip() +
  # Add number of studies inside the bars
  geom_text(aes(label = unique_species), 
            hjust = -0.15, color = "black", size = 4) +
  theme_pubr() +
  scale_y_continuous(limits = c(0, 250)) +
  labs(
    x = NULL,  # Remove x axis label
    y = "Number of species"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 9),
    strip.text = element_text(face = "bold", size = 11),
    plot.title = element_text(face = "bold", hjust = 0.5),
    # Panel border additions:
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
    strip.background = element_rect(fill = "gray90", colour = "black", linewidth = 0.8),
    panel.spacing = unit(1, "lines")  # Adds space between facets
  )

Figure_S2

# Store Plots
ggsave('../outputs/Figure_S2_Species_by_class.pdf', Figure_S2, width = 6, height = 6)
ggsave('../outputs/Figure_S2_Species_by_class.png', Figure_S2, width = 6, height = 6, dpi = 1200)

Figure S3: Number of studies per category of trait (body and organ)

plot_data <- dat %>%
  group_by(trait_size_category) %>%
  summarise(unique_studies = n_distinct(key), .groups = "drop") %>%
  pivot_longer(
    cols = -c(unique_studies),
    names_to = "variable",
    values_to = "level"
)

plot_data <- plot_data %>%
  mutate(variable = recode(variable,
                           trait_size_category = "Type of organ reported")) %>%
  group_by(variable, level) %>%
  summarise(unique_studies = sum(unique_studies), .groups = "drop")

# Create plot with panel borders
Figure_S3 <- 
  ggplot(plot_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  coord_flip() +
  facet_wrap(~ variable, scales = "free_y", ncol = 1) +
  theme_pubr() +
  scale_y_continuous(limits = c(0, 250)) +
  labs(
     x = NULL,  # Remove x axis label
    y = "Number of studies"
  ) +
 theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 9),
    strip.text = element_text(face = "bold", size = 12),
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
    strip.background = element_rect(fill = "#B2ABD270", colour = "transparent", linewidth = 0),
    panel.spacing = unit(1, "lines")
  )

Figure_S3

# # Store Plots
ggsave('../outputs/Figure_S3_Studies_by_organ.pdf', Figure_S3, width = 7, height = 7)
ggsave('../outputs/Figure_S3_Studies_by_organ.png', Figure_S3, width = 7, height = 7, dpi = 1200)

Data for the manuscript

Number of species

dat %>%
  distinct(species) %>%
  nrow()

## [1] 366

Studies published over years

df_years <- refs %>% 
  as.data.frame() %>% 
  select(year) %>% 
  mutate(year = as.numeric(as.character(year))) %>%  # Convert the 'year' column to numeric
  filter(!is.na(year)) %>%  # Remove rows where 'year' is missing (NA)
  group_by(year) %>%
  summarise(num_studies = n()) %>%
  arrange(year) %>%
  mutate(cumulative_studies = cumsum(num_studies))  # Calculate cumulative count

## Range de years included in the database
df_years %>% 
  reframe(min_year = min(year), 
          max_year = max(year), 
          total_years = max_year - min_year)

## # A tibble: 1 × 3
##   min_year max_year total_years
##      <dbl>    <dbl>       <dbl>
## 1     1964     2024          60

Number of species by class

table_sp_number <-dat %>%
  group_by(class) %>%
  reframe(n_spp = length(unique(species)), 
          total_study = length(unique(key)),
          perc_species = (n_spp/length(unique(dat$species)))* 100) %>%
  arrange(desc(perc_species))
table_sp_number

## # A tibble: 8 × 4
##   class          n_spp total_study perc_species
##   <fct>          <int>       <int>        <dbl>
## 1 Mammalia         227          78       62.0  
## 2 Aves              71         110       19.4  
## 3 Actinopterygii    24          29        6.56 
## 4 Insecta           14           1        3.83 
## 5 Amphibia          13           6        3.55 
## 6 Reptilia          12           6        3.28 
## 7 Chondrichthyes     4           4        1.09 
## 8 Cephalopoda        1           1        0.273

# test table kable
kable(table_sp_number, col.names = c("Class", "N", "Number of studies", "Percentage of species")) %>%
  kable_styling("striped", position = "left", full_width = TRUE) %>%
  row_spec(which(table_sp_number$n_spp == "TOTAL"), bold = TRUE, background = "#f5f5f5") %>%
  scroll_box(width = "100%", height = "500px")

Class	N	Number of studies	Percentage of species
Mammalia	227	78	62.021858
Aves	71	110	19.398907
Actinopterygii	24	29	6.557377
Insecta	14	1	3.825137
Amphibia	13	6	3.551913
Reptilia	12	6	3.278688
Chondrichthyes	4	4	1.092896
Cephalopoda	1	1	0.273224

Percentage of coverage by organ type

table_organ_type <- dat %>%
  group_by(trait_size_category) %>%
  summarise(
    n_spp = n_distinct(species),
    total_study = n_distinct(key)
  ) %>%
  mutate(
    perc_studies = (total_study / n_distinct(dat$key)) * 100,
    perc_species = (n_spp / n_distinct(dat$species)) * 100
  ) %>%
  arrange(desc(perc_studies))

# Select and rename columns for display
table_show <- table_organ_type %>%
  select(
    Organ = trait_size_category,
    `Number of species` = n_spp,
    `Percentage of species` = perc_species,
    `Number of studies` = total_study,
    `Percentage of studies` = perc_studies
    )

# Display the table with kable and kableExtra
kable(table_show, digits = 2) %>%
  kable_styling("striped", position = "left", full_width = TRUE) %>%
  scroll_box(width = "100%", height = "500px")

Organ	Number of species	Percentage of species	Number of studies	Percentage of studies
body size	366	100.00	235	100.00
liver	171	46.72	161	68.51
spleen	125	34.15	94	40.00
heart	158	43.17	93	39.57
kidney	159	43.44	80	34.04
gizzard	13	3.55	54	22.98
lung	32	8.74	38	16.17
bursa	5	1.37	36	15.32
muscle	21	5.74	36	15.32
testes	21	5.74	35	14.89
brain	165	45.08	34	14.47
fat	12	3.28	32	13.62
intestine	138	37.70	32	13.62
thymus	10	2.73	32	13.62
pancreas	16	4.37	31	13.19
stomach	167	45.63	29	12.34
small intestine	36	9.84	26	11.06
caecum	43	11.75	25	10.64
ovary	14	3.83	23	9.79
adrenal glands	14	3.83	22	9.36
proventriculus	6	1.64	19	8.09
duodenum	5	1.37	12	5.11
epididymides	5	1.37	11	4.68
jejunum	4	1.09	11	4.68
ileum	3	0.82	10	4.26
uterus	4	1.09	9	3.83
colon	21	5.74	8	3.40
gonad	7	1.91	8	3.40
thyroid/parathyroid glands	7	1.91	7	2.98
bone	67	18.31	6	2.55
digestive tract	104	28.42	5	2.13
gill	9	2.46	5	2.13
pituitary gland	7	1.91	5	2.13
prostate gland	4	1.09	5	2.13
rectum	17	4.64	5	2.13
ventricle	5	1.37	5	2.13
adipose depot	100	27.32	4	1.70
oviduct	2	0.55	4	1.70
salt gland	4	1.09	4	1.70
esophagus	14	3.83	2	0.85
gut	2	0.55	2	0.85
harderian gland	2	0.55	2	0.85
skeleton	14	3.83	2	0.85
bladder	1	0.27	1	0.43
central nervous system	14	3.83	1	0.43
circulatory system and fat body	13	3.55	1	0.43
digestive system	13	3.55	1	0.43
electric organ	1	0.27	1	0.43
fore limb	2	0.55	1	0.43
hind limb	2	0.55	1	0.43
malpighian tubules	12	3.28	1	0.43
musculature	13	3.55	1	0.43
reproductive system	13	3.55	1	0.43
ureter	1	0.27	1	0.43

Export database

# check names y sselec the most relevamt columns
names(dat)

##  [1] "species_reported"                   "initials"                          
##  [3] "key"                                "context_study"                     
##  [5] "taxonomic_group"                    "habitat"                           
##  [7] "origin"                             "season_of_collection"              
##  [9] "lat_dec"                            "long_dec"                          
## [11] "age_years"                          "life_stage_individual"             
## [13] "sex_individual"                     "id_cluster"                        
## [15] "trait_size_category"                "weight_status"                     
## [17] "paired_organs_weighed_individually" "organ_side"                        
## [19] "trait_details"                      "trait_unit"                        
## [21] "mean_trait"                         "error_trait"                       
## [23] "error_type"                         "n_for_mean_trait"                  
## [25] "additional_trait"                   "data_source"                       
## [27] "data_doi"                           "relevant_notes"                    
## [29] "phylum"                             "class"                             
## [31] "order"                              "family"                            
## [33] "genus"                              "species"                           
## [35] "source"                             "taxo_level"                        
## [37] "species_underscored"                "system"

# slect the most releventa columns and sort where is needed
OrganYsize_DB_v1.0.0 <- dat %>% 
  select("key",
         "initials", "context_study",
         "phylum", "class", "order", "family", "genus","species", 
         "species_reported","species_underscored",
         "origin", "habitat","season_of_collection",
         "lat_dec",
         "long_dec",
         "age_years",
         "sex_individual", "life_stage_individual", "n_for_mean_trait",
         "id_cluster", "trait_size_category", 
         "weight_status","paired_organs_weighed_individually", "organ_side", "trait_details", "trait_unit", 
         "mean_trait", "error_trait", "error_type", 
         "data_source", "data_doi",
         "relevant_notes")

# export file as csv
write.csv(OrganYsize_DB_v1.0.0, "../outputs/Organ_Size_Database_v1.0.0.csv", row.names = FALSE)

#  and excel
writexl::write_xlsx(OrganYsize_DB_v1.0.0, "../outputs/Organ_Size_Database_v1.0.0.xlsx")

Session information

session_info() %>%
details(summary = 'Current Session Information', open = TRUE)

Current Session Information


─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       macOS 15.5
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Berlin
 date     2025-06-25
 pandoc   3.4 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
 quarto   1.6.42 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package       * version date (UTC) lib source
 abind           1.4-8   2024-09-12 [1] CRAN (R 4.3.3)
 ape           * 5.8-1   2024-12-16 [1] CRAN (R 4.3.3)
 aplot           0.2.5   2025-02-27 [1] CRAN (R 4.3.3)
 backports       1.5.0   2024-05-23 [1] CRAN (R 4.3.3)
 bibtex          0.5.1   2023-01-26 [1] CRAN (R 4.3.3)
 bookdown        0.43    2025-04-15 [1] CRAN (R 4.3.3)
 broom           1.0.8   2025-03-28 [1] CRAN (R 4.3.3)
 bslib           0.9.0   2025-01-30 [1] CRAN (R 4.3.3)
 cachem          1.1.0   2024-05-16 [1] CRAN (R 4.3.3)
 car             3.1-3   2024-09-27 [1] CRAN (R 4.3.3)
 carData         3.0-5   2022-01-06 [1] CRAN (R 4.3.3)
 class           7.3-23  2025-01-01 [1] CRAN (R 4.3.3)
 classInt        0.4-11  2025-01-08 [1] CRAN (R 4.3.3)
 cli             3.6.5   2025-04-23 [1] CRAN (R 4.3.3)
 clipr           0.8.0   2022-02-22 [1] CRAN (R 4.3.3)
 codetools       0.2-20  2024-03-31 [1] CRAN (R 4.3.3)
 cowplot       * 1.1.3   2024-01-22 [1] CRAN (R 4.3.1)
 crayon          1.5.3   2024-06-20 [1] CRAN (R 4.3.3)
 data.table      1.17.4  2025-05-26 [1] CRAN (R 4.3.3)
 data.tree       1.1.0   2023-11-12 [1] CRAN (R 4.3.3)
 DataExplorer  * 0.8.3   2024-01-24 [1] CRAN (R 4.3.1)
 DBI             1.2.3   2024-06-02 [1] CRAN (R 4.3.3)
 desc            1.4.3   2023-12-10 [1] CRAN (R 4.3.3)
 details       * 0.4.0   2025-02-09 [1] CRAN (R 4.3.3)
 digest          0.6.37  2024-08-19 [1] CRAN (R 4.3.3)
 dplyr         * 1.1.4   2023-11-17 [1] CRAN (R 4.3.1)
 e1071           1.7-16  2024-09-16 [1] CRAN (R 4.3.3)
 evaluate        1.0.3   2025-01-10 [1] CRAN (R 4.3.3)
 farver          2.1.2   2024-05-13 [1] CRAN (R 4.3.3)
 fastmap         1.2.0   2024-05-15 [1] CRAN (R 4.3.3)
 Formula         1.2-5   2023-02-24 [1] CRAN (R 4.3.3)
 fs              1.6.6   2025-04-12 [1] CRAN (R 4.3.3)
 generics        0.1.4   2025-05-09 [1] CRAN (R 4.3.3)
 ggfun           0.1.8   2024-12-03 [1] CRAN (R 4.3.3)
 ggplot2       * 3.5.2   2025-04-09 [1] CRAN (R 4.3.3)
 ggplotify       0.1.2   2023-08-09 [1] CRAN (R 4.3.0)
 ggpubr        * 0.6.0   2023-02-10 [1] CRAN (R 4.3.0)
 ggsignif        0.6.4   2022-10-13 [1] CRAN (R 4.3.0)
 ggthemes      * 5.1.0   2024-02-10 [1] CRAN (R 4.3.1)
 ggtree        * 3.10.1  2024-02-27 [1] Bioconductor 3.18 (R 4.3.2)
 glue            1.8.0   2024-09-30 [1] CRAN (R 4.3.3)
 gridExtra       2.3     2017-09-09 [1] CRAN (R 4.3.3)
 gridGraphics    0.5-1   2020-12-13 [1] CRAN (R 4.3.3)
 gtable          0.3.6   2024-10-25 [1] CRAN (R 4.3.3)
 htmltools       0.5.8.1 2024-04-04 [1] CRAN (R 4.3.3)
 htmlwidgets     1.6.4   2023-12-06 [1] CRAN (R 4.3.1)
 httr            1.4.7   2023-08-15 [1] CRAN (R 4.3.0)
 igraph          2.1.4   2025-01-23 [1] CRAN (R 4.3.3)
 jquerylib       0.1.4   2021-04-26 [1] CRAN (R 4.3.3)
 jsonlite        2.0.0   2025-03-27 [1] CRAN (R 4.3.3)
 kableExtra    * 1.4.0   2024-01-24 [1] CRAN (R 4.3.1)
 KernSmooth      2.23-26 2025-01-01 [1] CRAN (R 4.3.3)
 knitr           1.50    2025-03-16 [1] CRAN (R 4.3.3)
 labeling        0.4.3   2023-08-29 [1] CRAN (R 4.3.3)
 lattice         0.22-7  2025-04-02 [1] CRAN (R 4.3.3)
 lazyeval        0.2.2   2019-03-15 [1] CRAN (R 4.3.3)
 lifecycle       1.0.4   2023-11-07 [1] CRAN (R 4.3.3)
 lubridate       1.9.4   2024-12-08 [1] CRAN (R 4.3.3)
 magrittr        2.0.3   2022-03-30 [1] CRAN (R 4.3.3)
 networkD3       0.4.1   2025-04-14 [1] CRAN (R 4.3.3)
 nlme            3.1-168 2025-03-31 [1] CRAN (R 4.3.3)
 patchwork       1.3.0   2024-09-16 [1] CRAN (R 4.3.3)
 pillar          1.10.2  2025-04-05 [1] CRAN (R 4.3.3)
 pkgconfig       2.0.3   2019-09-22 [1] CRAN (R 4.3.3)
 plyr            1.8.9   2023-10-02 [1] CRAN (R 4.3.3)
 png             0.1-8   2022-11-29 [1] CRAN (R 4.3.3)
 proxy           0.4-27  2022-06-09 [1] CRAN (R 4.3.3)
 purrr           1.0.4   2025-02-05 [1] CRAN (R 4.3.3)
 R6              2.6.1   2025-02-15 [1] CRAN (R 4.3.3)
 ragg            1.4.0   2025-04-10 [1] CRAN (R 4.3.3)
 RColorBrewer    1.1-3   2022-04-03 [1] CRAN (R 4.3.3)
 Rcpp            1.0.14  2025-01-12 [1] CRAN (R 4.3.3)
 RefManageR    * 1.4.0   2022-09-30 [1] CRAN (R 4.3.0)
 rlang           1.1.6   2025-04-11 [1] CRAN (R 4.3.3)
 rmarkdown       2.29    2024-11-04 [1] CRAN (R 4.3.3)
 rnaturalearth * 1.0.1   2023-12-15 [1] CRAN (R 4.3.1)
 rstatix         0.7.2   2023-02-01 [1] CRAN (R 4.3.0)
 rstudioapi      0.17.1  2024-10-22 [1] CRAN (R 4.3.3)
 sass            0.4.10  2025-04-11 [1] CRAN (R 4.3.3)
 scales          1.4.0   2025-04-24 [1] CRAN (R 4.3.3)
 sessioninfo   * 1.2.3   2025-02-05 [1] CRAN (R 4.3.3)
 sf              1.0-21  2025-05-15 [1] CRAN (R 4.3.3)
 stringi         1.8.7   2025-03-27 [1] CRAN (R 4.3.3)
 stringr       * 1.5.1   2023-11-14 [1] CRAN (R 4.3.1)
 svglite         2.2.1   2025-05-12 [1] CRAN (R 4.3.3)
 systemfonts     1.2.3   2025-04-30 [1] CRAN (R 4.3.3)
 terra           1.8-50  2025-05-09 [1] CRAN (R 4.3.3)
 textshaping     1.0.1   2025-05-01 [1] CRAN (R 4.3.3)
 tibble        * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
 tidygeocoder  * 1.0.6   2025-03-31 [1] CRAN (R 4.3.3)
 tidyr         * 1.3.1   2024-01-24 [1] CRAN (R 4.3.1)
 tidyselect      1.2.1   2024-03-11 [1] CRAN (R 4.3.1)
 tidytree        0.4.6   2023-12-12 [1] CRAN (R 4.3.1)
 timechange      0.3.0   2024-01-18 [1] CRAN (R 4.3.3)
 treeio          1.26.0  2023-11-06 [1] Bioconductor
 units           0.8-7   2025-03-11 [1] CRAN (R 4.3.3)
 utf8            1.2.5   2025-05-01 [1] CRAN (R 4.3.3)
 vctrs           0.6.5   2023-12-01 [1] CRAN (R 4.3.3)
 viridisLite     0.4.2   2023-05-02 [1] CRAN (R 4.3.3)
 withr           3.0.2   2024-10-28 [1] CRAN (R 4.3.3)
 writexl         1.5.4   2025-04-15 [1] CRAN (R 4.3.3)
 xfun            0.52    2025-04-02 [1] CRAN (R 4.3.3)
 xml2            1.3.8   2025-03-14 [1] CRAN (R 4.3.3)
 yaml            2.3.10  2024-07-26 [1] CRAN (R 4.3.3)
 yulab.utils     0.2.0   2025-01-29 [1] CRAN (R 4.3.3)

 [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────

A systematic map and comprehensive database of invertebrate and vertebrate organ size

Electronic Supplementary Material

Félix P. Leiva

latest update: 25 June 2025