get_metadata {chem16S}    R Documentation

Template function for metadata

Description

Simple function that processes metadata files.

Usage

  get_metadata(file, metrics)

Arguments

file

character, metadata file (CSV format)

metrics

data frame, output of get_metrics

Details

This function is provided as a simple example of reading metadata about 16S rRNA datasets. The basic metadata for all datasets include ‘name’ (study name), ‘Run’ (SRA or other run ID), ‘BioSample’ (SRA BioSample), and ‘Sample’ (sample description). Each dataset contains environmental data, but the specific variables differ between datasets. Therefore, this function processes the environmental data for different datasets and appends ‘pch’ and ‘col’ columns (plotting symbol and color) that can be used for visualization and identifying trends between groups of samples.
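As a minimal sketch of how ‘pch’ and ‘col’ columns might be assigned and used, the following base R code builds a toy metadata table and picks a plotting symbol and color for each sample by group. The run IDs, the temperature column, and the 60 degree threshold are all hypothetical and chosen only for illustration; this is not the code used by get_metadata.

```r
# Toy metadata table; every value here is made up for illustration
metadata <- data.frame(
  Run = c("run1", "run2", "run3", "run4"),
  Sample = c("A", "B", "C", "D"),
  temperature = c(85, 72, 55, 40),
  stringsAsFactors = FALSE
)

# Group samples by an arbitrary (hypothetical) temperature threshold
hot <- metadata$temperature > 60

# Append plotting symbol (pch) and color (col) columns
metadata$pch <- ifelse(hot, 19, 15)          # filled circle or filled square
metadata$col <- ifelse(hot, "red", "blue")

# Use the appended columns directly as graphical parameters
plot(seq_len(nrow(metadata)), metadata$temperature,
     pch = metadata$pch, col = metadata$col,
     xlab = "Sample index", ylab = "Temperature")
```

Storing the symbol and color alongside each sample keeps the plotting call short and ensures that the same grouping is used in every plot made from the table.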

Samples are identified by the Run column (case-sensitive) in both file and metrics. It is possible that metrics for one or more samples are not available (for example, if the number of RDP classifications is below mincount in get_metrics). The function therefore removes metadata for unavailable samples. Then, the samples in metrics are placed in the same order as they appear in metadata. Ordering the samples is important because the columns in RDP Classifier output are not necessarily in the same order as the metadata.
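The matching and ordering steps described above can be sketched in base R as follows. The run IDs and Zc values are hypothetical toy data, and the code illustrates the general technique (filter with %in%, then reorder with match) rather than the actual implementation of get_metadata.

```r
# Toy metadata with four samples (hypothetical run IDs)
metadata <- data.frame(
  Run = c("r1", "r2", "r3", "r4"),
  Sample = 1:4,
  stringsAsFactors = FALSE
)

# Toy metrics: different order, and no row for "r3"
# (e.g. too few classifications to compute metrics)
metrics <- data.frame(
  Run = c("r2", "r4", "r1"),
  Zc = c(-0.15, -0.18, -0.14),
  stringsAsFactors = FALSE
)

# Remove metadata rows for samples without metrics
metadata <- metadata[metadata$Run %in% metrics$Run, ]

# Put the metrics rows in the same order as the metadata rows
metrics <- metrics[match(metadata$Run, metrics$Run), ]

# Both tables now list the same runs in the same order
stopifnot(identical(metadata$Run, metrics$Run))
```

Because the comparison uses the Run values as character strings, IDs that differ only in case would not match, which is consistent with the case-sensitive matching noted above.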

Value

If metrics is NULL, a data frame read from file with columns ‘pch’ and ‘col’ appended. Otherwise, a list with two components, metadata and metrics. In this list, metrics is the value from the metrics argument, with samples placed in the same order as in metadata, and metadata is a data frame read from file with columns ‘pch’ and ‘col’ appended, excluding any rows for samples that are unavailable in metrics.

Examples

# Get metrics for the Bison Pool dataset (Swingley et al., 2012)
RDPfile <- system.file("extdata/RDP/SMS+12.tab.xz", package = "chem16S")
RDP <- read_RDP(RDPfile)
map <- map_taxa(RDP, refdb = "RefSeq_206")
metrics <- get_metrics(RDP, map, refdb = "RefSeq_206")
# Read the metadata file
mdatfile <- system.file("extdata/metadata/SMS+12.csv", package = "chem16S")
mdat <- get_metadata(mdatfile, metrics)

# Print sample names
mdat$metadata$Sample # 1 2 3 4 5
# Plot Zc of community reference proteomes at sites 1-5
Site <- mdat$metadata$Sample
Zc <- mdat$metrics$Zc
plot(Site, Zc, ylab = "Carbon oxidation state", type = "b")

# Find the columns of the RDP file for samples 1 to 5
match(mdat$metadata$Run, colnames(RDP)) # 6 5 7 8 9
# NOTE: for this dataset, samples 1 and 2 are in columns 6 and 5, respectively

[Package chem16S version 1.1.0-5 Index]