This R Markdown document summarises the principal the steps to generate figures and tables using data pertaining to organ sizes across vertebrates and invertebrates species. I employed a systematic map to search for information on organ sizes, body sizes, and a set of metadata (see Figure S1). This approach facilitates a more comprehensive understanding of the database structure and also, will be serve as backbone for the studying scaling of organ sizes among species.
The relationship between individual organ size and overall body size in animals is a fundamental biological phenomenon that spans multiple disciplines and has been studied for decades. However, a comprehensive synthesis of the sources of variation in organ-specific scaling remains lacking, even among mammals, the most extensively studied vertebrate group. To address this, we developed a systematic map and compiled a large database of paired organ and body size measurements. This database includes over 10,000 records from 366 species across eight major vertebrate and invertebrate classes: Aves, Mammalia, Actinopterygii, Reptilia, Amphibia, Chondrichthyes, Insecta, and Cephalopoda. Our database provides size estimates for 53 organ types, categorised into 10 physiological systems, with most data derived from digestive, circulatory and excretory systems. In addition, we include extensive metadata to contextualise the original studies, which highlights gaps—such as the season of animal collection and life stage, both of which were among the least frequently reported. We anticipate this comprehensive and reproducible resource will offer a robust foundation for improving the parameterisation and cross-species applicability of simulation models based on physiological and kinetic principles, thereby advancing our understanding of organ size scaling across diverse taxa.
When using the data and/or code associated with this project, they should be cited as follows:
This script is authored by Félix P. Leiva. For any questions related to this resource, please contact me at the email address: felixpleiva@gmail.com.
This code routine may contain typographical errors, specific lines of code, or comments in Spanish (my native language). Should you encounter any errors in the code or data, please let me know via email to: felixpleiva@gmail.com.
This repository is provided by the author under the licence Attribution-NonCommercial-NoDerivatives 4.0 International.
rm(list = ls())
library(kableExtra) # Enhances tables created with 'knitr::kable'
library(DataExplorer) # Automates exploratory data analysis
library(dplyr) # Efficient data manipulation
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:kableExtra':
##
## group_rows
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2) # Data visualisation based on the grammar of graphics
## Warning: package 'ggplot2' was built under R version 4.3.3
library(RefManageR) # Manages references and citations
library(ggpubr) # Creates publication-ready graphics
library(cowplot) # Arranges and annotates plots
##
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggpubr':
##
## get_legend
library(tidygeocoder) # Converts addresses into geographic coordinates
## Warning: package 'tidygeocoder' was built under R version 4.3.3
library(rnaturalearth) # Accesses Natural Earth geographic data
library(ape) # Analyses phylogenies and evolution
## Warning: package 'ape' was built under R version 4.3.3
##
## Attaching package: 'ape'
## The following object is masked from 'package:ggpubr':
##
## rotate
## The following object is masked from 'package:dplyr':
##
## where
library(ggtree) # Visualises and annotates phylogenetic trees
## ggtree v3.10.1 For help: https://yulab-smu.top/treedata-book/
##
## If you use the ggtree package suite in published research, please cite
## the appropriate paper(s):
##
## Guangchuang Yu, David Smith, Huachen Zhu, Yi Guan, Tommy Tsan-Yuk Lam.
## ggtree: an R package for visualization and annotation of phylogenetic
## trees with their covariates and other associated data. Methods in
## Ecology and Evolution. 2017, 8(1):28-36. doi:10.1111/2041-210X.12628
##
## Guangchuang Yu. Using ggtree to visualize data on tree-like structures.
## Current Protocols in Bioinformatics. 2020, 69:e96. doi:10.1002/cpbi.96
##
## Guangchuang Yu. Data Integration, Manipulation and Visualization of
## Phylogenetic Trees (1st edition). Chapman and Hall/CRC. 2022,
## doi:10.1201/9781003279242
##
## Attaching package: 'ggtree'
## The following object is masked from 'package:ape':
##
## rotate
## The following object is masked from 'package:ggpubr':
##
## rotate
library(tibble) # Alternative to data frames
library(ggthemes) # Additional themes for 'ggplot2' graphics
##
## Attaching package: 'ggthemes'
## The following object is masked from 'package:cowplot':
##
## theme_map
library(sessioninfo) # Documents session environment for reproducibility
## Warning: package 'sessioninfo' was built under R version 4.3.3
library(details) # Adds inline or interactive details
## Warning: package 'details' was built under R version 4.3.3
library(stringr)
library(tidyr)
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:ggtree':
##
## expand
dat <- read.csv("../outputs/organ_size_with_taxonomy.csv")
tree<-read.tree("../outputs/Phylogenetic tree for 363 species.tre")
refs <- ReadBib("../outputs/Data used in the DB.bib")
In the code below I am just changing few names in the column species…I know.. I already checked the names those names in NCBI, GBIF, ITIS, pero Open Tree of Life (https://tree.opentreeoflife.org/) have labels at the subpecies level. The idea with the code bewlo is just macth the names in th db and the synthetic tree.
dat$species[dat$species == "Ateles chamek"] <- "Ateles belzebuth chamek"
dat$species[dat$species == "Leontocebus nigrifrons"] <- "Saguinus fuscicollis nigrifrons"
dat$species[dat$species == "Terrapene triunguis"] <- "Terrapene carolina triunguis"
dat$species[dat$species == "Anarhynchus alexandrinus"] <- "Charadrius alexandrinus"
dat$species[dat$species == "Cebuella pygmaea"] <- "Callithrix pygmaea"
dat$species[dat$species == "Dicotyles tajacu"] <- "Pecari tajacu"
dat$species[dat$species == "Heteromys salvini"] <- "Liomys salvini"
dat$species[dat$species == "Leontocebus fuscicollis"] <- "Saguinus fuscicollis"
dat$species[dat$species == "Lithobates pipiens"] <- "Rana pipiens"
dat$species[dat$species == "pekania pennanti"] <- "Martes pennanti"
dat$species[dat$species == "Sephanoides sephaniodes"] <- "Sephanoides sephanoides"
dat$species[dat$species == "Spodiopsar sericeus"] <- "Sturnus sericeus"
dat$species[dat$species == "Pekania pennanti"] <- "Martes pennanti"
dat$species[dat$species == "Saguinus fuscicollis"] <- "Saguinus fuscicollis fuscicollis"
dat$species[dat$species == "Presbytis melalophos"] <- "Presbytis melalophos mitrata"
dat$species[dat$species == "Eulemur fulvus"] <- "Eulemur fulvus fulvus"
dat$species[dat$species == "Saimiri sciureus"] <- "Saimiri sciureus sciureus"
dat$species[dat$species == "Alouatta seniculus"] <- "Alouatta seniculus seniculus"
dat$species[dat$species == "Herpailurus yagouaroundi"] <- "Puma yagouaroundi"
dat$species[dat$species == "Saimiri macrodon"] <- "Saimiri sciureus macrodon"
To give an overview and description of the metadata associated with the extraction of organ size and body size data in invertebrates and vertebrates. This table is labeled as Table S1 in the manuscript.
str(dat); names(dat)
## 'data.frame': 10604 obs. of 39 variables:
## $ species_reported : chr "Abrothrix longipilis" "Abrothrix longipilis" "Abrothrix longipilis" "Abrothrix longipilis" ...
## $ id_checker : chr NA NA NA NA ...
## $ double_checked : chr "no" "no" "no" "no" ...
## $ notes_double_checking : chr "none" "none" "none" "none" ...
## $ id_data_extractor : chr "FPLeiva" "FPLeiva" "FPLeiva" "FPLeiva" ...
## $ key : chr "rayyan-150592721" "rayyan-150592721" "rayyan-150592721" "rayyan-150592721" ...
## $ context_study : chr "ecological/evolutionary" "ecological/evolutionary" "ecological/evolutionary" "ecological/evolutionary" ...
## $ taxonomic_group : chr "mammal" "mammal" "mammal" "mammal" ...
## $ habitat : chr "terrestrial" "terrestrial" "terrestrial" "terrestrial" ...
## $ origin : chr "wild" "wild" "wild" "wild" ...
## $ season_of_collection : chr "autumn" "autumn" "autumn" "autumn" ...
## $ lat_dec : num -39.6 -39.6 -47.5 -47.5 -47.5 ...
## $ long_dec : num -73.2 -73.2 -72.8 -72.8 -72.8 ...
## $ age_years : num NA NA NA NA NA NA NA NA NA NA ...
## $ life_stage_individual : chr NA NA NA NA ...
## $ sex_individual : chr NA NA NA NA ...
## $ id_cluster : int 3 3 4 4 4 3 2 4 2 2 ...
## $ trait_size_category : chr "small intestine" "liver" "body size" "small intestine" ...
## $ weight_status : chr "fresh" "fresh" "fresh" "fresh" ...
## $ paired_organs_weighed_individually: chr NA NA NA NA ...
## $ organ_side : chr NA NA NA NA ...
## $ trait_details : chr "small intestine weight" "liver weight" "body weight" "small intestine weight" ...
## $ trait_unit : chr "g" "g" "g" "g" ...
## $ mean_trait : num 2.24 1.841 26.274 0.698 0.25 ...
## $ error_trait : num 0.1191 0.1173 2.1934 0.0596 0.0601 ...
## $ error_type : chr "standard_error" "standard_error" "standard_error" "standard_error" ...
## $ n_for_mean_trait : chr "7" "7" "26" "26" ...
## $ additional_trait : chr NA NA NA NA ...
## $ data_source : chr "Figure 4C" "Figure 4E" "Figure 4F" "Figure 4C" ...
## $ data_doi : chr NA NA NA NA ...
## $ relevant_notes : chr NA NA NA NA ...
## $ phylum : chr "Chordata" "Chordata" "Chordata" "Chordata" ...
## $ class : chr "Mammalia" "Mammalia" "Mammalia" "Mammalia" ...
## $ order : chr "Rodentia" "Rodentia" "Rodentia" "Rodentia" ...
## $ family : chr "Cricetidae" "Cricetidae" "Cricetidae" "Cricetidae" ...
## $ genus : chr "Abrothrix" "Abrothrix" "Abrothrix" "Abrothrix" ...
## $ species : chr "Abrothrix longipilis" "Abrothrix longipilis" "Abrothrix longipilis" "Abrothrix longipilis" ...
## $ source : chr "ncbi" "ncbi" "ncbi" "ncbi" ...
## $ taxo_level : chr "Species" "Species" "Species" "Species" ...
## [1] "species_reported" "id_checker"
## [3] "double_checked" "notes_double_checking"
## [5] "id_data_extractor" "key"
## [7] "context_study" "taxonomic_group"
## [9] "habitat" "origin"
## [11] "season_of_collection" "lat_dec"
## [13] "long_dec" "age_years"
## [15] "life_stage_individual" "sex_individual"
## [17] "id_cluster" "trait_size_category"
## [19] "weight_status" "paired_organs_weighed_individually"
## [21] "organ_side" "trait_details"
## [23] "trait_unit" "mean_trait"
## [25] "error_trait" "error_type"
## [27] "n_for_mean_trait" "additional_trait"
## [29] "data_source" "data_doi"
## [31] "relevant_notes" "phylum"
## [33] "class" "order"
## [35] "family" "genus"
## [37] "species" "source"
## [39] "taxo_level"
# make a new column of species underscored
dat$species_underscored <- gsub(" ", "_", dat$species)
# Choosing columns I want to convert to factor
columns_to_factor <- c(
"species_reported", "id_checker", "double_checked",
"notes_double_checking", "id_data_extractor", "key", "context_study", "taxonomic_group",
"habitat", "origin", "season_of_collection", "life_stage_individual",
"sex_individual", "id_cluster", "trait_size_category", "weight_status",
"paired_organs_weighed_individually", "organ_side", "trait_details",
"trait_unit", "error_type", "additional_trait", "data_source", "data_doi",
"phylum", "class", "order", "family", "genus", "species", "source",
"taxo_level", "species_underscored"
)
# Convert columns to factor
dat <- dat %>%
mutate(across(all_of(columns_to_factor), as.factor))
# Convert column to numeric
dat <- dat %>%
mutate(mean_trait = as.numeric(mean_trait))
# check again
str(dat)
## 'data.frame': 10604 obs. of 40 variables:
## $ species_reported : Factor w/ 380 levels "Abrothrix longipilis",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ id_checker : Factor w/ 6 levels "AJHendriks","FPLeiva",..: NA NA NA NA NA NA NA NA NA NA ...
## $ double_checked : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ notes_double_checking : Factor w/ 47 levels "add age","add age and correct trait value",..: 44 44 44 44 44 44 44 44 44 44 ...
## $ id_data_extractor : Factor w/ 1 level "FPLeiva": 1 1 1 1 1 1 1 1 1 1 ...
## $ key : Factor w/ 235 levels "rayyan-150591684",..: 158 158 158 158 158 158 158 158 158 158 ...
## $ context_study : Factor w/ 6 levels "animal health",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ taxonomic_group : Factor w/ 7 levels "amphibian","bird",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ habitat : Factor w/ 3 levels "freshwater","marine",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ origin : Factor w/ 6 levels "captive","commercial",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ season_of_collection : Factor w/ 4 levels "autumn","spring",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ lat_dec : num -39.6 -39.6 -47.5 -47.5 -47.5 ...
## $ long_dec : num -73.2 -73.2 -72.8 -72.8 -72.8 ...
## $ age_years : num NA NA NA NA NA NA NA NA NA NA ...
## $ life_stage_individual : Factor w/ 13 levels "adult","elderly",..: NA NA NA NA NA NA NA NA NA NA ...
## $ sex_individual : Factor w/ 3 levels "both","female",..: NA NA NA NA NA NA NA NA NA NA ...
## $ id_cluster : Factor w/ 360 levels "1","2","3","4",..: 3 3 4 4 4 3 2 4 2 2 ...
## $ trait_size_category : Factor w/ 54 levels "adipose depot",..: 46 31 4 46 8 4 4 31 31 8 ...
## $ weight_status : Factor w/ 4 levels "dried","fixed",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ paired_organs_weighed_individually: Factor w/ 2 levels "no","yes": NA NA NA NA NA NA NA NA NA NA ...
## $ organ_side : Factor w/ 2 levels "left","right": NA NA NA NA NA NA NA NA NA NA ...
## $ trait_details : Factor w/ 445 levels "abdominal fat",..: 362 189 14 362 37 14 14 189 189 37 ...
## $ trait_unit : Factor w/ 16 levels "% of body weight",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ mean_trait : num 2.24 1.841 26.274 0.698 0.25 ...
## $ error_trait : num 0.1191 0.1173 2.1934 0.0596 0.0601 ...
## $ error_type : Factor w/ 3 levels "confidence_interval",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ n_for_mean_trait : chr "7" "7" "26" "26" ...
## $ additional_trait : Factor w/ 3 levels "individual metabolic rates",..: NA NA NA NA NA NA NA NA NA NA ...
## $ data_source : Factor w/ 80 levels "archived raw data",..: 39 41 42 39 37 42 42 41 41 37 ...
## $ data_doi : Factor w/ 5 levels "10.5061/\ndryad.6djh9w13b",..: NA NA NA NA NA NA NA NA NA NA ...
## $ relevant_notes : chr NA NA NA NA ...
## $ phylum : Factor w/ 3 levels "Arthropoda","Chordata",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ class : Factor w/ 8 levels "Actinopterygii",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ order : Factor w/ 60 levels "Accipitriformes",..: 48 48 48 48 48 48 48 48 48 48 ...
## $ family : Factor w/ 133 levels "Accipitridae",..: 30 30 30 30 30 30 30 30 30 30 ...
## $ genus : Factor w/ 268 levels "Abrothrix","Acomys",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ species : Factor w/ 366 levels "Abrothrix longipilis",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ source : Factor w/ 3 levels "gbif","itis",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ taxo_level : Factor w/ 1 level "Species": 1 1 1 1 1 1 1 1 1 1 ...
## $ species_underscored : Factor w/ 366 levels "Abrothrix_longipilis",..: 1 1 1 1 1 1 1 1 1 1 ...
In the code below, we will attempt to group the different types of organs included in our database to improve visualisation across the various figures. As we see below, there 54 distinct organ types (including the body size as a trait).
# check the organ names
length(unique(dat$trait_size_category))
## [1] 54
However, some of these actually represent the same system type.
# check the organ names
unique(dat$trait_size_category)
## [1] small intestine liver
## [3] body size caecum
## [5] intestine stomach
## [7] brain heart
## [9] kidney adipose depot
## [11] digestive tract spleen
## [13] central nervous system malpighian tubules
## [15] digestive system musculature
## [17] skeleton reproductive system
## [19] circulatory system and fat body ileum
## [21] duodenum gizzard
## [23] jejunum muscle
## [25] lung ventricle
## [27] colon rectum
## [29] esophagus harderian gland
## [31] pancreas ovary
## [33] bursa fat
## [35] thymus testes
## [37] proventriculus salt gland
## [39] adrenal glands thyroid/parathyroid glands
## [41] gonad bone
## [43] gill prostate gland
## [45] pituitary gland uterus
## [47] oviduct hind limb
## [49] fore limb electric organ
## [51] epididymides gut
## [53] ureter bladder
## 54 Levels: adipose depot adrenal glands bladder body size bone brain ... ventricle
This happened because during data extraction, I retained the original organ names as reported in the studies.
# Add system column using case_when
dat <- dat %>%
mutate(system = case_when(
# Digestive system
trait_size_category %in% c("liver", "caecum", "intestine", "stomach",
"digestive tract", "digestive system",
"jejunum", "duodenum", "gizzard", "ileum",
"esophagus", "colon", "rectum", "proventriculus",
"gut", "pancreas") ~ "Digestive",
# Excretory system
trait_size_category %in% c("kidney", "malpighian tubules", "ureter",
"bladder") ~ "Excretory",
# Circulatory system
trait_size_category %in% c("heart", "ventricle", "circulatory system and fat body",
"spleen") ~ "Circulatory",
# Immune system
trait_size_category %in% c("thymus", "bursa") ~ "Immune",
# Nervous system
trait_size_category %in% c("brain", "central nervous system",
"pituitary gland") ~ "Nervous",
# Endocrine system
trait_size_category %in% c("thyroid/parathyroid glands", "adrenal glands",
"harderian gland", "salt gland") ~ "Endocrine",
# Respiratory system
trait_size_category %in% c("lung", "gill") ~ "Respiratory",
# Reproductive system
trait_size_category %in% c("ovary", "testes", "gonad", "uterus",
"prostate gland", "oviduct", "epididymides",
"reproductive system") ~ "Reproductive",
# Musculoskeletal system
trait_size_category %in% c("skeleton", "bone", "hind limb", "fore limb",
"muscle", "musculature") ~ "Musculoskeletal",
# Adipose/fat storage
trait_size_category %in% c("adipose depot", "fat") ~ "Adipose tissue",
# Default category for anything not matched
TRUE ~ "Body size"
))
# Create summary table with group totals
summary_table <- dat %>%
count(system, trait_size_category) %>%
arrange(system, trait_size_category) %>%
# Add group totals
bind_rows(
dat %>%
count(system, name = "n") %>%
mutate(trait_size_category = "TOTAL") %>%
select(system, trait_size_category, n)
) %>%
arrange(system, trait_size_category != "TOTAL", trait_size_category)
Lets check the grouping
# make a table to see the number of organs measured in each system
kable(summary_table, col.names = c("System", "Organ", "Count")) %>%
kable_styling("striped", position = "left", full_width = TRUE) %>%
row_spec(which(summary_table$trait_size_category == "TOTAL"), bold = TRUE, background = "#f5f5f5") %>%
scroll_box(width = "100%", height = "500px")
| System | Organ | Count |
|---|---|---|
| Adipose tissue | TOTAL | 155 |
| Adipose tissue | adipose depot | 108 |
| Adipose tissue | fat | 47 |
| Body size | TOTAL | 3659 |
| Body size | body size | 3209 |
| Body size | electric organ | 60 |
| Body size | small intestine | 390 |
| Circulatory | TOTAL | 992 |
| Circulatory | circulatory system and fat body | 18 |
| Circulatory | heart | 515 |
| Circulatory | spleen | 307 |
| Circulatory | ventricle | 152 |
| Digestive | TOTAL | 2594 |
| Digestive | caecum | 289 |
| Digestive | colon | 27 |
| Digestive | digestive system | 18 |
| Digestive | digestive tract | 117 |
| Digestive | duodenum | 17 |
| Digestive | esophagus | 15 |
| Digestive | gizzard | 163 |
| Digestive | gut | 88 |
| Digestive | ileum | 13 |
| Digestive | intestine | 656 |
| Digestive | jejunum | 15 |
| Digestive | liver | 741 |
| Digestive | pancreas | 54 |
| Digestive | proventriculus | 26 |
| Digestive | rectum | 23 |
| Digestive | stomach | 332 |
| Endocrine | TOTAL | 181 |
| Endocrine | adrenal glands | 88 |
| Endocrine | harderian gland | 21 |
| Endocrine | salt gland | 40 |
| Endocrine | thyroid/parathyroid glands | 32 |
| Excretory | TOTAL | 457 |
| Excretory | bladder | 2 |
| Excretory | kidney | 435 |
| Excretory | malpighian tubules | 16 |
| Excretory | ureter | 4 |
| Immune | TOTAL | 101 |
| Immune | bursa | 41 |
| Immune | thymus | 60 |
| Musculoskeletal | TOTAL | 1136 |
| Musculoskeletal | bone | 479 |
| Musculoskeletal | fore limb | 112 |
| Musculoskeletal | hind limb | 112 |
| Musculoskeletal | muscle | 395 |
| Musculoskeletal | musculature | 18 |
| Musculoskeletal | skeleton | 20 |
| Nervous | TOTAL | 491 |
| Nervous | brain | 437 |
| Nervous | central nervous system | 33 |
| Nervous | pituitary gland | 21 |
| Reproductive | TOTAL | 457 |
| Reproductive | epididymides | 13 |
| Reproductive | gonad | 55 |
| Reproductive | ovary | 149 |
| Reproductive | oviduct | 5 |
| Reproductive | prostate gland | 8 |
| Reproductive | reproductive system | 17 |
| Reproductive | testes | 197 |
| Reproductive | uterus | 13 |
| Respiratory | TOTAL | 381 |
| Respiratory | gill | 165 |
| Respiratory | lung | 216 |
# Panel A: Extracting and cleaning publication years
df_years <- refs %>%
as.data.frame() %>%
select(year) %>%
mutate(year = as.numeric(as.character(year))) %>% # Ensure year is numeric
filter(!is.na(year)) # Remove NA values
# Counting studies per year and calculating cumulative values:
studies_per_year <- df_years %>%
group_by(year) %>%
summarise(num_studies = n()) %>%
arrange(year) %>%
mutate(cumulative_studies = cumsum(num_studies)) # Calculate cumulative count
# Plotting the cumulative number of studies:
plot_years <-
ggplot(studies_per_year, aes(x = year, y = cumulative_studies)) +
geom_line(color = "#009E73", linewidth = 2) +
scale_y_continuous(limits = c(0, 250)) + # Limit the y-axis to 400
scale_x_continuous(
limits = c(1955, 2024),
breaks = seq(1960, 2024, by = 12) # Set x-axis intervals to 55 years
) +
labs(
x = "Publication Year",
y = "Number of Studies"
) +
theme_pubr() +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12)
)
# ------------------------------------------------------------------------------
# lets reformat the names of the journal to have to count them and then plot
df_journals <- refs %>%
as.data.frame() %>%
select(journal) %>%
filter(!is.na(journal)) %>%
mutate(
journal_clean = tolower(journal),
journal_clean = str_replace_all(journal_clean, "\\\\&", "&"),
journal_clean = str_squish(journal_clean),
journal_clean = str_to_title(journal_clean),
journal_clean = str_replace_all(
journal_clean,
"\\b(Of|And|In|On|For|The|A|An|To|With|By|At|From|But|Or|Nor)\\b",
function(x) tolower(x)
),
journal_clean = sub("^([a-z])", toupper("\\1"), journal_clean)
) %>%
arrange(desc(journal_clean))
df_journals <- df_journals %>%
group_by(journal_clean) %>%
summarise(num_articles = n(), .groups = "drop") %>%
arrange(desc(num_articles)) %>%
slice_head(n = 10)
plot_journals <- ggplot(df_journals, aes(x = reorder(journal_clean, num_articles), y = num_articles)) +
geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
scale_y_continuous(limits = c(0, 50)) +
coord_flip() +
theme_pubr() +
labs(
x = "Journals",
y = "Number of Studies"
) +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12),
axis.text.y = element_blank()
) +
geom_text(aes(label = journal_clean), hjust = 0, vjust = 0.5,
color = "black", size = 4, check_overlap = TRUE,
position = position_dodge(width = 0.7)) +
geom_text(aes(label = num_articles),
hjust = 2, color = "black", size = 4)
# ------------------------------------------------------------------------------
# Combining Panel A and Panel B
Figure_1 <- plot_grid(
plot_years,
plot_journals,
labels = c("A", "B"),
nrow = 2,
ncol = 1,
label_size = 15,
align = "v")
Figure_1
# Saving the combined figure
ggsave('../outputs/Figure_1_Studies_and_Journals.pdf', Figure_1, width = 6, height = 9)
ggsave('../outputs/Figure_1_Studies_and_Journals.png', Figure_1, width = 6, height = 9, dpi = 1200)
# Calculate number of unique studies per category
plot_data <- dat %>%
group_by(context_study, origin, season_of_collection) %>%
summarise(unique_studies = n_distinct(key), .groups = "drop") %>%
pivot_longer(
cols = -c(unique_studies),
names_to = "variable",
values_to = "level"
) %>%
mutate(
# Recode variable names for clear labels on plots
variable = recode(variable,
context_study = "Study context",
origin = "Source of specimens",
season_of_collection = "Season of collection"),
level = if_else(is.na(level), "not reported", as.character(level))
) %>%
group_by(variable, level) %>%
summarise(unique_studies = sum(unique_studies), .groups = "drop")
# Set the factor levels for ordering the facets/plots consistently
plot_data <- plot_data %>%
mutate(variable = factor(variable,
levels = c("Study context", "Source of specimens", "Season of collection")))
# Subset per plot
context_data <- plot_data %>% filter(variable == "Study context")
source_data <- plot_data %>% filter(variable == "Source of specimens")
season_data <- plot_data %>% filter(variable == "Season of collection")
# Now lets do the plots
Figure_2a <-
ggplot(context_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
geom_text(aes(label = unique_studies),
hjust = -0.15, color = "black", size = 4) +
coord_flip() +
scale_y_continuous(limits = c(0, 250)) +
labs(x = NULL, y = "Number of studies") +
theme_pubr() +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12),
axis.text.y = element_text(size = 12),
strip.text = element_text(face = "bold", size = 11),
plot.title = element_text(face = "bold", hjust = 0.5),
panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
strip.background = element_rect(fill = "#B2ABD270", colour = "black", linewidth = 0.8),
panel.spacing = unit(1, "lines")
)
Figure_2b <-
ggplot(source_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
geom_text(aes(label = unique_studies),
hjust = -0.15, color = "black", size = 4) +
coord_flip() +
scale_y_continuous(limits = c(0, 250)) +
labs(x = NULL, y = "Number of studies") +
theme_pubr() +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12),
axis.text.y = element_text(size = 12),
strip.text = element_text(face = "bold", size = 11),
plot.title = element_text(face = "bold", hjust = 0.5),
panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
strip.background = element_rect(fill = "#B2ABD270", colour = "black", linewidth = 0.8),
panel.spacing = unit(1, "lines")
)
Figure_2c <-
ggplot(season_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
geom_text(aes(label = unique_studies),
hjust = -0.15, color = "black", size = 4) +
coord_flip() +
scale_y_continuous(limits = c(0, 250)) +
labs(x = NULL, y = "Number of studies") +
theme_pubr() +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12),
axis.text.y = element_text(size = 12),
strip.text = element_text(face = "bold", size = 11),
plot.title = element_text(face = "bold", hjust = 0.5),
panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
strip.background = element_rect(fill = "#B2ABD270", colour = "black", linewidth = 0.8),
panel.spacing = unit(1, "lines")
)
# Combining Panel 2A, 2B and 2C
Figure_2 <- plot_grid(
Figure_2a,
Figure_2b,
Figure_2c,
labels = c("A", "B", "C"),
nrow = 3,
ncol = 1,
label_size = 15,
align = "v")
Figure_2
# Store Plots
ggsave('../outputs/Figure_2_Studies_by_context_source_season.pdf', Figure_2, width = 6, height = 12)
ggsave('../outputs/Figure_2_Studies_by_context_source_season.png', Figure_2, width = 6, height = 12, dpi = 1200)
# Calculate number of unique studies per category
plot_data <- dat %>%
group_by(sex_individual, life_stage_individual, habitat) %>%
summarise(unique_studies = n_distinct(key), .groups = "drop") %>%
pivot_longer(
cols = -c(unique_studies),
names_to = "variable",
values_to = "level"
) %>%
mutate(
# Recode variable names for clear labels on plots
variable = recode(variable,
habitat = "Habitat",
sex_individual = "Sex studied",
life_stage_individual = "Life stage studied"),
level = if_else(is.na(level), "not reported", as.character(level))
) %>%
group_by(variable, level) %>%
summarise(unique_studies = sum(unique_studies), .groups = "drop")
# Set the factor levels for ordering the facets/plots consistently
plot_data <- plot_data %>%
mutate(variable = factor(variable,
levels = c("Habitat",
"Life stage studied",
"Sex studied")))
# Subset per plot
habitat_data <- plot_data %>% filter(variable == "Habitat")
stage_data <- plot_data %>% filter(variable == "Life stage studied")
sex_data <- plot_data %>% filter(variable == "Sex studied")
# Now lets do the plots
Figure_3a <-
ggplot(habitat_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
geom_text(aes(label = unique_studies),
hjust = -0.15, color = "black", size = 4) +
coord_flip() +
scale_y_continuous(limits = c(0, 300)) +
labs(x = NULL, y = "Number of studies") +
theme_pubr() +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12),
axis.text.y = element_text(size = 12),
strip.text = element_text(face = "bold", size = 11),
plot.title = element_text(face = "bold", hjust = 0.5),
panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
strip.background = element_rect(fill = "#B2ABD270", colour = "black", linewidth = 0.8),
panel.spacing = unit(1, "lines")
)
Figure_3b <-
ggplot(stage_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
geom_text(aes(label = unique_studies),
hjust = -0.15, color = "black", size = 4) +
coord_flip() +
scale_y_continuous(limits = c(0, 300)) +
labs(x = NULL, y = "Number of studies") +
theme_pubr() +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12),
axis.text.y = element_text(size = 12),
strip.text = element_text(face = "bold", size = 11),
plot.title = element_text(face = "bold", hjust = 0.5),
panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
strip.background = element_rect(fill = "#B2ABD270", colour = "black", linewidth = 0.8),
panel.spacing = unit(1, "lines")
)
Figure_3c <-
ggplot(sex_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
geom_text(aes(label = unique_studies),
hjust = -0.15, color = "black", size = 4) +
coord_flip() +
scale_y_continuous(limits = c(0, 300)) +
labs(x = NULL, y = "Number of studies") +
theme_pubr() +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12),
axis.text.y = element_text(size = 12),
strip.text = element_text(face = "bold", size = 11),
plot.title = element_text(face = "bold", hjust = 0.5),
panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
strip.background = element_rect(fill = "#B2ABD270", colour = "black", linewidth = 0.8),
panel.spacing = unit(1, "lines")
)
# Combining Panel 2A, 2B and 2C
Figure_3 <- plot_grid(
Figure_3a,
Figure_3b,
Figure_3c,
labels = c("A", "B", "C"),
nrow = 3,
ncol = 1,
label_size = 15,
align = "v")
Figure_3
# # Store Plots
ggsave('../outputs/Figure_3_Studies_by_habitat_stage_sex.pdf', Figure_3, width = 6, height = 12)
ggsave('../outputs/Figure_3_Studies_by_habitat_stage_sex.png', Figure_3, width = 6, height = 12, dpi = 1200)
data_studies <- dat %>%
group_by(class) %>%
summarise(unique_studies = n_distinct(key), .groups = "drop")
Figure_4a <-
ggplot(data_studies, aes(x = reorder(class, unique_studies), y = unique_studies)) +
geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
coord_flip() +
geom_text(aes(label = unique_studies), hjust = -0.15, color = "black", size = 4) +
theme_pubr() +
scale_y_continuous(limits = c(0, 150)) +
labs(x = NULL, y = "Number of studies") +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12),
axis.text.y = element_text(size = 12),
strip.text = element_text(face = "bold", size = 11),
plot.title = element_text(face = "bold", hjust = 0.5),
panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
strip.background = element_rect(fill = "#B2ABD270", colour = "black", linewidth = 0.8),
panel.spacing = unit(1, "lines")
)
# plot most studies species
plot_data <- dat %>%
group_by(species) %>%
summarise(num_studies = n_distinct(key), .groups = "drop") %>%
arrange(desc(num_studies)) %>%
slice_head(n = 10) %>%
mutate(
# Reemplazar espacio por ~ para que ggplot2 lo interprete bien
species_italic = gsub(" ", "~~", species),
# Crear expresión para cursiva
species_italic = paste0("italic('", gsub(" ", " ", species), "')"),
# Alternativamente, para separar género y especie en cursiva sin comillas:
species_italic = gsub(" ", "~~", species),
species_italic = paste0("italic(", species_italic, ")")
)
# Graficar con nombres en cursiva
Figure_4b <- ggplot(plot_data, aes(x = reorder(species_italic, num_studies), y = num_studies)) +
geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
geom_text(aes(label = num_studies), hjust = -0.15, color = "black", size = 4) +
coord_flip() +
theme_pubr() +
scale_y_continuous(limits = c(0, 150)) +
labs(x = NULL, y = "Number of studies") +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12),
axis.text.y = element_text(size = 12),
plot.title = element_text(face = "bold", hjust = 0.5),
panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8)
) +
scale_x_discrete(labels = function(x) parse(text = x))
# Combining Panel 4A and Panel 4B
Figure_4 <- plot_grid(
Figure_4a,
Figure_4b,
labels = c("A", "B"),
nrow = 2,
ncol = 1,
label_size = 15,
align = "v")
Figure_4
# Store Plots
ggsave('../outputs/Figure_4_Most_common_species.pdf', Figure_4, width = 6, height = 9)
ggsave('../outputs/Figure_4_Most_common_species.png', Figure_4, width = 6, height = 9, dpi = 1200)
plot_data <- dat %>%
group_by(system) %>%
summarise(unique_studies = n_distinct(key), .groups = "drop") %>%
pivot_longer(
cols = -c(unique_studies),
names_to = "variable",
values_to = "level"
) %>%
mutate(variable = recode(variable,
system = "System studied")) %>%
group_by(variable, level) %>%
summarise(unique_studies = sum(unique_studies), .groups = "drop")
# let do the plot
Figure_5 <-
ggplot(plot_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
coord_flip() +
geom_text(aes(label = unique_studies),
hjust = -0.15, color = "black", size = 4) +
theme_pubr() +
scale_y_continuous(limits = c(0, 250)) +
labs(
x = NULL, # Remove x axis label
y = "Number of species"
) +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12),
axis.text.y = element_text(size = 12),
strip.text = element_text(face = "bold", size = 11),
plot.title = element_text(face = "bold", hjust = 0.5),
panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
strip.background = element_rect(fill = "gray90", colour = "black", linewidth = 0.8),
panel.spacing = unit(1, "lines")
)
Figure_5
# # Store Plots
ggsave('../outputs/Figure_5_Studies_by_system.pdf', Figure_5, width = 7, height = 7)
ggsave('../outputs/Figure_5_Studies_by_system.png', Figure_5, width = 7, height = 7, dpi = 1200)
# Summaries data: calculate the mean value per species for each trait
names(dat)
## [1] "species_reported" "id_checker"
## [3] "double_checked" "notes_double_checking"
## [5] "id_data_extractor" "key"
## [7] "context_study" "taxonomic_group"
## [9] "habitat" "origin"
## [11] "season_of_collection" "lat_dec"
## [13] "long_dec" "age_years"
## [15] "life_stage_individual" "sex_individual"
## [17] "id_cluster" "trait_size_category"
## [19] "weight_status" "paired_organs_weighed_individually"
## [21] "organ_side" "trait_details"
## [23] "trait_unit" "mean_trait"
## [25] "error_trait" "error_type"
## [27] "n_for_mean_trait" "additional_trait"
## [29] "data_source" "data_doi"
## [31] "relevant_notes" "phylum"
## [33] "class" "order"
## [35] "family" "genus"
## [37] "species" "source"
## [39] "taxo_level" "species_underscored"
## [41] "system"
summary_data <- dat %>%
count(species_underscored, system) %>% # Count occurrences by species and system
mutate(presence = if_else(n > 0, "Yes", "No")) %>% # Convert counts to "Yes"/"No"
select(-n) %>% # Remove the original count column
pivot_wider(
names_from = system, # Each system becomes a column
values_from = presence, # Values are now "Yes" or "No"
values_fill = list(presence = "No") # Fill missing combinations with "No"
)
summary_data <- summary_data %>%
select(-`Body size`)
names(summary_data)
## [1] "species_underscored" "Digestive" "Adipose tissue"
## [4] "Circulatory" "Excretory" "Nervous"
## [7] "Musculoskeletal" "Reproductive" "Respiratory"
## [10] "Endocrine" "Immune"
# Check for mismatches between tree tips and data
# are all species of the tree in the database?
setdiff(tree$tip.label, summary_data$species_underscored)
## character(0)
# are all species from the database in the tree?
setdiff(summary_data$species_underscored, tree$tip.label)
## [1] "Muusoctopus_aegir" "Neomys_anomalus" "Neomys_fodiens"
# As I mentioned in the manuscript there were three species with no phylo information. In the code below, we will esclude those species from the dataframe (it is just for preparae data for this figure)
tree <- keep.tip(tree, intersect(tree$tip.label, summary_data$species_underscored))
summary_data <- summary_data[summary_data$species_underscored %in% tree$tip.label, ]
# check again
setdiff(tree$tip.label, summary_data$species_underscored)
## character(0)
setdiff(summary_data$species_underscored, tree$tip.label)
## character(0)
# Align data and tree
datF <- summary_data %>%
column_to_rownames("species_underscored")
# Plot with species without names
circ_names <- ggtree(tree, layout = "fan", open.angle = 18, branch.length = "none") +
geom_tiplab(offset = 0.1, hjust = 0, size = 1)
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
circ_names <- rotate_tree(circ_names, 90)
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
# Plot with species without names
circ <- ggtree(tree, layout = "fan", open.angle = 18, branch.length = "none")
## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
circ <- rotate_tree(circ, 90)
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
# Create a new plot with heatmap for each trait using a single scale
tree_data <- gheatmap(
circ,
datF,
width = 0.4,
offset = 0, # Offset for placing the heatmap
colnames_offset_x = 0,
colnames_offset_y = 0,
font.size = 3,
hjust = 0
)
# Apply the same scale for all traits
circ_data <- tree_data +
scale_fill_manual(values = c("grey", "#009E73"), name = "System measured") +
theme(
legend.position = c(0.56, 0.57), # Posición manual de la leyenda
legend.background = element_rect(fill = "transparent", colour = NA), # Fondo transparente
legend.box.background = element_rect(fill = "transparent", colour = NA) # Borde transparente
)
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2
## 3.5.0.
## ℹ Please use the `legend.position.inside` argument of `theme()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
circ_data
ggsave("../outputs/Figure_6_Phylogenetic_tree_with_data.png", circ_data, width = 12, height = 12, dpi = 1500)
ggsave("../outputs/Figure_6_Phylogenetic_tree_with_data.pdf", circ_data, width = 12, height = 12)
ggsave("../outputs/Figure_6_Phylogenetic_tree_with_names.pdf", circ_names, width = 12, height = 12)
#prepare data for ploting
data_taxa <- dat %>%
group_by(class) %>%
summarise(unique_species = n_distinct(species), .groups = "drop") %>%
pivot_longer(
cols = -c(unique_species),
names_to = "variable",
values_to = "level"
)
# Create plot with panel borders
Figure_S2 <-
ggplot(data_taxa, aes(x = reorder(level, unique_species), y = unique_species)) +
geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
coord_flip() +
# Add number of studies inside the bars
geom_text(aes(label = unique_species),
hjust = -0.15, color = "black", size = 4) +
theme_pubr() +
scale_y_continuous(limits = c(0, 250)) +
labs(
x = NULL, # Remove x axis label
y = "Number of species"
) +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12),
axis.text.y = element_text(size = 9),
strip.text = element_text(face = "bold", size = 11),
plot.title = element_text(face = "bold", hjust = 0.5),
panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
strip.background = element_rect(fill = "gray90", colour = "black", linewidth = 0.8),
panel.spacing = unit(1, "lines")
)
Figure_S2
# Store Plots
ggsave('../outputs/Figure_S2_Species_by_class.pdf', Figure_S2, width = 6, height = 6)
ggsave('../outputs/Figure_S2_Species_by_class.png', Figure_S2, width = 6, height = 6, dpi = 1200)
plot_data <- dat %>%
group_by(trait_size_category) %>%
summarise(unique_studies = n_distinct(key), .groups = "drop") %>%
pivot_longer(
cols = -c(unique_studies),
names_to = "variable",
values_to = "level"
) %>%
mutate(variable = recode(variable,
trait_size_category = "Type of organ reported")) %>%
group_by(variable, level) %>%
summarise(unique_studies = sum(unique_studies), .groups = "drop")
# Create plot with panel borders
Figure_S3 <-
ggplot(plot_data, aes(x = reorder(level, unique_studies), y = unique_studies)) +
geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
coord_flip() +
geom_text(aes(label = unique_studies),
hjust = -0.15, color = "black", size = 4) +
theme_pubr() +
scale_y_continuous(limits = c(0, 250)) +
labs(
x = NULL, # Remove x axis label
y = "Number of studies"
) +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12),
axis.text.y = element_text(size = 12),
strip.text = element_text(face = "bold", size = 11),
plot.title = element_text(face = "bold", hjust = 0.5),
panel.border = element_rect(colour = "black", fill = NA, linewidth = 0.8),
strip.background = element_rect(fill = "gray90", colour = "black", linewidth = 0.8),
panel.spacing = unit(1, "lines")
)
Figure_S3
# Store Plots
ggsave('../outputs/Figure_S3_Studies_by_organ.pdf', Figure_S3, width = 7, height = 10)
ggsave('../outputs/Figure_S3_Studies_by_organ.png', Figure_S3, width = 7, height = 10, dpi = 1200)
dat %>%
distinct(species) %>%
nrow()
## [1] 366
df_years <- refs %>%
as.data.frame() %>%
select(year) %>%
mutate(year = as.numeric(as.character(year))) %>%
filter(!is.na(year)) %>% # Remove rows where 'year' is missing (NA)
group_by(year) %>%
summarise(num_studies = n()) %>%
arrange(year) %>%
mutate(cumulative_studies = cumsum(num_studies))
## Range de years included in the database
df_years %>%
reframe(min_year = min(year),
max_year = max(year),
total_years = max_year - min_year)
## # A tibble: 1 × 3
## min_year max_year total_years
## <dbl> <dbl> <dbl>
## 1 1964 2024 60
table_sp_number <-dat %>%
group_by(class) %>%
reframe(n_spp = length(unique(species)),
total_study = length(unique(key)),
perc_species = (n_spp/length(unique(dat$species)))* 100) %>%
arrange(desc(perc_species))
# test table kable
kable(table_sp_number, col.names = c("Class", "N", "Number of studies", "Percentage of species")) %>%
kable_styling("striped", position = "left", full_width = TRUE) %>%
row_spec(which(table_sp_number$n_spp == "TOTAL"), bold = TRUE, background = "#f5f5f5") %>%
scroll_box(width = "100%", height = "500px")
| Class | N | Number of studies | Percentage of species |
|---|---|---|---|
| Mammalia | 227 | 78 | 62.021858 |
| Aves | 71 | 110 | 19.398907 |
| Actinopterygii | 24 | 29 | 6.557377 |
| Insecta | 14 | 1 | 3.825137 |
| Amphibia | 13 | 6 | 3.551913 |
| Reptilia | 12 | 6 | 3.278688 |
| Chondrichthyes | 4 | 4 | 1.092896 |
| Cephalopoda | 1 | 1 | 0.273224 |
table_organ_type <- dat %>%
group_by(trait_size_category) %>%
summarise(
n_spp = n_distinct(species),
total_study = n_distinct(key)
) %>%
mutate(
perc_studies = (total_study / n_distinct(dat$key)) * 100,
perc_species = (n_spp / n_distinct(dat$species)) * 100
) %>%
arrange(desc(perc_studies))
# Select and rename columns for display
table_show <- table_organ_type %>%
select(
Organ = trait_size_category,
`Number of species` = n_spp,
`Percentage of species` = perc_species,
`Number of studies` = total_study,
`Percentage of studies` = perc_studies
)
# Display the table with kable and kableExtra
kable(table_show, digits = 2) %>%
kable_styling("striped", position = "left", full_width = TRUE) %>%
scroll_box(width = "100%", height = "500px")
| Organ | Number of species | Percentage of species | Number of studies | Percentage of studies |
|---|---|---|---|---|
| body size | 366 | 100.00 | 235 | 100.00 |
| liver | 171 | 46.72 | 161 | 68.51 |
| heart | 158 | 43.17 | 94 | 40.00 |
| spleen | 125 | 34.15 | 94 | 40.00 |
| kidney | 159 | 43.44 | 80 | 34.04 |
| gizzard | 13 | 3.55 | 54 | 22.98 |
| lung | 32 | 8.74 | 38 | 16.17 |
| bursa | 5 | 1.37 | 36 | 15.32 |
| muscle | 21 | 5.74 | 36 | 15.32 |
| testes | 21 | 5.74 | 35 | 14.89 |
| brain | 165 | 45.08 | 34 | 14.47 |
| fat | 12 | 3.28 | 32 | 13.62 |
| intestine | 138 | 37.70 | 32 | 13.62 |
| pancreas | 17 | 4.64 | 32 | 13.62 |
| thymus | 10 | 2.73 | 32 | 13.62 |
| stomach | 167 | 45.63 | 29 | 12.34 |
| small intestine | 36 | 9.84 | 26 | 11.06 |
| caecum | 43 | 11.75 | 25 | 10.64 |
| ovary | 14 | 3.83 | 23 | 9.79 |
| adrenal glands | 14 | 3.83 | 22 | 9.36 |
| proventriculus | 6 | 1.64 | 19 | 8.09 |
| duodenum | 5 | 1.37 | 12 | 5.11 |
| epididymides | 5 | 1.37 | 11 | 4.68 |
| jejunum | 4 | 1.09 | 11 | 4.68 |
| ileum | 3 | 0.82 | 10 | 4.26 |
| uterus | 4 | 1.09 | 9 | 3.83 |
| colon | 21 | 5.74 | 8 | 3.40 |
| gonad | 7 | 1.91 | 8 | 3.40 |
| thyroid/parathyroid glands | 7 | 1.91 | 7 | 2.98 |
| bone | 67 | 18.31 | 6 | 2.55 |
| digestive tract | 104 | 28.42 | 5 | 2.13 |
| gill | 9 | 2.46 | 5 | 2.13 |
| pituitary gland | 7 | 1.91 | 5 | 2.13 |
| prostate gland | 4 | 1.09 | 5 | 2.13 |
| rectum | 17 | 4.64 | 5 | 2.13 |
| ventricle | 5 | 1.37 | 5 | 2.13 |
| adipose depot | 100 | 27.32 | 4 | 1.70 |
| oviduct | 2 | 0.55 | 4 | 1.70 |
| salt gland | 4 | 1.09 | 4 | 1.70 |
| esophagus | 14 | 3.83 | 2 | 0.85 |
| gut | 2 | 0.55 | 2 | 0.85 |
| harderian gland | 2 | 0.55 | 2 | 0.85 |
| skeleton | 14 | 3.83 | 2 | 0.85 |
| bladder | 1 | 0.27 | 1 | 0.43 |
| central nervous system | 14 | 3.83 | 1 | 0.43 |
| circulatory system and fat body | 13 | 3.55 | 1 | 0.43 |
| digestive system | 13 | 3.55 | 1 | 0.43 |
| electric organ | 1 | 0.27 | 1 | 0.43 |
| fore limb | 2 | 0.55 | 1 | 0.43 |
| hind limb | 2 | 0.55 | 1 | 0.43 |
| malpighian tubules | 12 | 3.28 | 1 | 0.43 |
| musculature | 13 | 3.55 | 1 | 0.43 |
| reproductive system | 13 | 3.55 | 1 | 0.43 |
| ureter | 1 | 0.27 | 1 | 0.43 |
# check names y sselec the most relevamt columns
names(dat)
## [1] "species_reported" "id_checker"
## [3] "double_checked" "notes_double_checking"
## [5] "id_data_extractor" "key"
## [7] "context_study" "taxonomic_group"
## [9] "habitat" "origin"
## [11] "season_of_collection" "lat_dec"
## [13] "long_dec" "age_years"
## [15] "life_stage_individual" "sex_individual"
## [17] "id_cluster" "trait_size_category"
## [19] "weight_status" "paired_organs_weighed_individually"
## [21] "organ_side" "trait_details"
## [23] "trait_unit" "mean_trait"
## [25] "error_trait" "error_type"
## [27] "n_for_mean_trait" "additional_trait"
## [29] "data_source" "data_doi"
## [31] "relevant_notes" "phylum"
## [33] "class" "order"
## [35] "family" "genus"
## [37] "species" "source"
## [39] "taxo_level" "species_underscored"
## [41] "system"
# slect the most releventa columns and sort where is needed
Organ_Size_DB_v1.0.0 <- dat %>%
select("key",
"id_checker",
"double_checked", "id_data_extractor",
"context_study",
"phylum", "class", "order", "family", "genus","species",
"species_reported","species_underscored",
"origin", "habitat","season_of_collection",
"lat_dec",
"long_dec",
"age_years",
"sex_individual", "life_stage_individual", "n_for_mean_trait",
"id_cluster", "trait_size_category",
"weight_status","paired_organs_weighed_individually", "organ_side", "trait_details", "trait_unit",
"mean_trait", "error_trait", "error_type",
"data_source", "data_doi",
"relevant_notes")
# export file as csv
write.csv(Organ_Size_DB_v1.0.0, "../outputs/Organ_Size_Database_v1.0.0.csv", row.names = FALSE)
# and excel
writexl::write_xlsx(Organ_Size_DB_v1.0.0, "../outputs/Organ_Size_Database_v1.0.0.xlsx")
session_info() %>%
details(summary = 'Current Session Information', open = TRUE)
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.2 (2023-10-31)
os macOS 15.6
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/Amsterdam
date 2025-08-06
pandoc 3.4 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
quarto 1.6.42 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/quarto
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
abind 1.4-8 2024-09-12 [1] CRAN (R 4.3.3)
ape * 5.8-1 2024-12-16 [1] CRAN (R 4.3.3)
aplot 0.2.5 2025-02-27 [1] CRAN (R 4.3.3)
backports 1.5.0 2024-05-23 [1] CRAN (R 4.3.3)
bibtex 0.5.1 2023-01-26 [1] CRAN (R 4.3.3)
bookdown 0.43 2025-04-15 [1] CRAN (R 4.3.3)
broom 1.0.8 2025-03-28 [1] CRAN (R 4.3.3)
bslib 0.9.0 2025-01-30 [1] CRAN (R 4.3.3)
cachem 1.1.0 2024-05-16 [1] CRAN (R 4.3.3)
car 3.1-3 2024-09-27 [1] CRAN (R 4.3.3)
carData 3.0-5 2022-01-06 [1] CRAN (R 4.3.3)
class 7.3-23 2025-01-01 [1] CRAN (R 4.3.3)
classInt 0.4-11 2025-01-08 [1] CRAN (R 4.3.3)
cli 3.6.5 2025-04-23 [1] CRAN (R 4.3.3)
clipr 0.8.0 2022-02-22 [1] CRAN (R 4.3.3)
codetools 0.2-20 2024-03-31 [1] CRAN (R 4.3.3)
cowplot * 1.1.3 2024-01-22 [1] CRAN (R 4.3.1)
crayon 1.5.3 2024-06-20 [1] CRAN (R 4.3.3)
data.table 1.17.4 2025-05-26 [1] CRAN (R 4.3.3)
data.tree 1.1.0 2023-11-12 [1] CRAN (R 4.3.3)
DataExplorer * 0.8.3 2024-01-24 [1] CRAN (R 4.3.1)
DBI 1.2.3 2024-06-02 [1] CRAN (R 4.3.3)
desc 1.4.3 2023-12-10 [1] CRAN (R 4.3.3)
details * 0.4.0 2025-02-09 [1] CRAN (R 4.3.3)
digest 0.6.37 2024-08-19 [1] CRAN (R 4.3.3)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.1)
e1071 1.7-16 2024-09-16 [1] CRAN (R 4.3.3)
evaluate 1.0.3 2025-01-10 [1] CRAN (R 4.3.3)
farver 2.1.2 2024-05-13 [1] CRAN (R 4.3.3)
fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.3.3)
Formula 1.2-5 2023-02-24 [1] CRAN (R 4.3.3)
fs 1.6.6 2025-04-12 [1] CRAN (R 4.3.3)
generics 0.1.4 2025-05-09 [1] CRAN (R 4.3.3)
ggfun 0.1.8 2024-12-03 [1] CRAN (R 4.3.3)
ggplot2 * 3.5.2 2025-04-09 [1] CRAN (R 4.3.3)
ggplotify 0.1.2 2023-08-09 [1] CRAN (R 4.3.0)
ggpubr * 0.6.0 2023-02-10 [1] CRAN (R 4.3.0)
ggsignif 0.6.4 2022-10-13 [1] CRAN (R 4.3.0)
ggthemes * 5.1.0 2024-02-10 [1] CRAN (R 4.3.1)
ggtree * 3.10.1 2024-02-27 [1] Bioconductor 3.18 (R 4.3.2)
glue 1.8.0 2024-09-30 [1] CRAN (R 4.3.3)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.3)
gridGraphics 0.5-1 2020-12-13 [1] CRAN (R 4.3.3)
gtable 0.3.6 2024-10-25 [1] CRAN (R 4.3.3)
htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.3.3)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.3.1)
httr 1.4.7 2023-08-15 [1] CRAN (R 4.3.0)
igraph 2.1.4 2025-01-23 [1] CRAN (R 4.3.3)
jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.3.3)
jsonlite 2.0.0 2025-03-27 [1] CRAN (R 4.3.3)
kableExtra * 1.4.0 2024-01-24 [1] CRAN (R 4.3.1)
KernSmooth 2.23-26 2025-01-01 [1] CRAN (R 4.3.3)
knitr 1.50 2025-03-16 [1] CRAN (R 4.3.3)
labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.3)
lattice 0.22-7 2025-04-02 [1] CRAN (R 4.3.3)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.3.3)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.3)
lubridate 1.9.4 2024-12-08 [1] CRAN (R 4.3.3)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.3)
networkD3 0.4.1 2025-04-14 [1] CRAN (R 4.3.3)
nlme 3.1-168 2025-03-31 [1] CRAN (R 4.3.3)
patchwork 1.3.0 2024-09-16 [1] CRAN (R 4.3.3)
pillar 1.10.2 2025-04-05 [1] CRAN (R 4.3.3)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.3)
plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.3)
png 0.1-8 2022-11-29 [1] CRAN (R 4.3.3)
proxy 0.4-27 2022-06-09 [1] CRAN (R 4.3.3)
purrr 1.0.4 2025-02-05 [1] CRAN (R 4.3.3)
R6 2.6.1 2025-02-15 [1] CRAN (R 4.3.3)
ragg 1.4.0 2025-04-10 [1] CRAN (R 4.3.3)
RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.3)
Rcpp 1.0.14 2025-01-12 [1] CRAN (R 4.3.3)
RefManageR * 1.4.0 2022-09-30 [1] CRAN (R 4.3.0)
rlang 1.1.6 2025-04-11 [1] CRAN (R 4.3.3)
rmarkdown 2.29 2024-11-04 [1] CRAN (R 4.3.3)
rnaturalearth * 1.0.1 2023-12-15 [1] CRAN (R 4.3.1)
rstatix 0.7.2 2023-02-01 [1] CRAN (R 4.3.0)
rstudioapi 0.17.1 2024-10-22 [1] CRAN (R 4.3.3)
sass 0.4.10 2025-04-11 [1] CRAN (R 4.3.3)
scales 1.4.0 2025-04-24 [1] CRAN (R 4.3.3)
sessioninfo * 1.2.3 2025-02-05 [1] CRAN (R 4.3.3)
sf 1.0-21 2025-05-15 [1] CRAN (R 4.3.3)
stringi 1.8.7 2025-03-27 [1] CRAN (R 4.3.3)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.1)
svglite 2.2.1 2025-05-12 [1] CRAN (R 4.3.3)
systemfonts 1.2.3 2025-04-30 [1] CRAN (R 4.3.3)
terra 1.8-50 2025-05-09 [1] CRAN (R 4.3.3)
textshaping 1.0.1 2025-05-01 [1] CRAN (R 4.3.3)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidygeocoder * 1.0.6 2025-03-31 [1] CRAN (R 4.3.3)
tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.3.1)
tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.3.1)
tidytree 0.4.6 2023-12-12 [1] CRAN (R 4.3.1)
timechange 0.3.0 2024-01-18 [1] CRAN (R 4.3.3)
treeio 1.26.0 2023-11-06 [1] Bioconductor
units 0.8-7 2025-03-11 [1] CRAN (R 4.3.3)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.3)
viridisLite 0.4.2 2023-05-02 [1] CRAN (R 4.3.3)
withr 3.0.2 2024-10-28 [1] CRAN (R 4.3.3)
writexl 1.5.4 2025-04-15 [1] CRAN (R 4.3.3)
xfun 0.52 2025-04-02 [1] CRAN (R 4.3.3)
xml2 1.3.8 2025-03-14 [1] CRAN (R 4.3.3)
yaml 2.3.10 2024-07-26 [1] CRAN (R 4.3.3)
yulab.utils 0.2.0 2025-01-29 [1] CRAN (R 4.3.3)
[1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
* ── Packages attached to the search path.
──────────────────────────────────────────────────────────────────────────────