Importing files

GRIMER is independent of any quantification method and requires a contingency table with raw counts of observations/components for each sample/composition in the study. Observations are usually, but not limited to, taxonomic entries (e.g. genus, species, strains), operational taxonomic units (OTUs), amplicon sequence variants (ASVs), metagenome-assembled genomes (MAGs) or sequence features.

The --input-file option accepts either a file with tab-separated values (.tsv) containing a table of counts (observation table, count table, contingency table, ...) or a .biom file.

The Biological Observation Matrix file (.biom)

GRIMER parses BIOM files and their embedded metadata, if available. Alternatively, an external metadata file can be provided with -m/--metadata-file; it takes precedence over the metadata in the .biom file.

Examples using the UgandaMaternalV3V4.16s_DADA2.taxon_abundance.biom file from microbiomedb.org:

  • Default report (no taxonomy)
grimer --input-file UgandaMaternalV3V4.16s_DADA2.taxon_abundance.biom
  • Integrated NCBI taxonomy (will translate names to taxonomy ids)
grimer --input-file UgandaMaternalV3V4.16s_DADA2.taxon_abundance.biom \
       --taxonomy ncbi \
       --ranks superkingdom phylum class order family genus species
  • Using an external metadata file
grimer --input-file UgandaMaternalV3V4.16s_DADA2.taxon_abundance.biom \
       --metadata-file UgandaMaternalV3V4.16s_DADA2.sample_details.tsv \
       --taxonomy ncbi \
       --ranks superkingdom phylum class order family genus species

Tab-separated file (.tsv)

GRIMER parses .tsv files in which each observation is annotated either with a single taxonomic identifier or name, or with a multi-level annotation (e.g. a lineage).

  • Rows contain observations and columns contain samples (use --transpose if your file has samples in rows instead)
  • First column and first row are used as headers
  • Taxonomy integration: files can have either taxonomic identifiers (NCBI, e.g.: 562) or taxonomic names (NCBI, e.g.: Escherichia coli or GTDB, e.g.: s__Escherichia coli)
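
As an illustration of the layout described above, the following sketch writes a tiny table with observations in rows and samples in columns. The file name, sample names and counts are invented for this example and do not come from any real study.

```shell
# Hypothetical toy table: first row = sample headers, first column = observation names
printf 'obs\tsample_A\tsample_B\n' > toy_counts.tsv
printf 'Escherichia coli\t10\t0\n' >> toy_counts.tsv
printf 'Neisseria animalis\t3\t7\n' >> toy_counts.tsv

# Show the table, then the distinct field counts per row
# (a well-formed table yields a single value)
cat toy_counts.tsv
awk -F'\t' '{ print NF }' toy_counts.tsv | sort -u
```

A file shaped like this could then be passed with --input-file; if the samples were in rows instead, --transpose would be needed.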

Multi-level annotations (e.g. Bacteria;Proteobacteria;Gammaproteobacteria...)

  • Default report (no taxonomy)
grimer --input-file UgandaMaternalV3V4.16s_DADA2.taxon_abundance.tsv \
       --level-separator ";"
  • With metadata
grimer --input-file UgandaMaternalV3V4.16s_DADA2.taxon_abundance.tsv \
       --level-separator ";" \
       --metadata-file UgandaMaternalV3V4.16s_DADA2.sample_details.tsv
  • With integrated NCBI taxonomy (will translate names to taxids)
grimer --input-file UgandaMaternalV3V4.16s_DADA2.taxon_abundance.tsv \
       --level-separator ";" \
       --metadata-file UgandaMaternalV3V4.16s_DADA2.sample_details.tsv \
       --taxonomy ncbi \
       --ranks superkingdom phylum class order family genus species
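
To make the role of --level-separator more concrete, the sketch below splits a lineage string on ";" with awk and prints one level per line. It only illustrates what a multi-level annotation looks like once separated; it is not GRIMER's internal parsing, and the variable names are made up for the example.

```shell
# Example multi-level annotation, levels separated by ";"
lineage='Bacteria;Proteobacteria;Gammaproteobacteria'

# Split on the separator and print "position: level", one per line
levels=$(printf '%s\n' "$lineage" | awk -F';' '{ for (i = 1; i <= NF; i++) print i ": " $i }')
printf '%s\n' "$levels"
```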

Single level annotations (e.g. Neisseria animalis)

# Removing first column with kingdom
cut -f 2- ERP108433_phylum_taxonomy_abundances_SSU_v4.1.tsv > ERP108433_phylum_taxonomy_abundances_SSU_v4.1_parsed.tsv
# Set the identifier for unassigned observations to "Unassigned" (it occurs many times; the counts will be summed)
grimer --input-file ERP108433_phylum_taxonomy_abundances_SSU_v4.1_parsed.tsv \
       --unassigned-header "Unassigned"
  • Re-generating taxonomic lineage from single annotations (in this case only superkingdom)
grimer --input-file ERP108433_phylum_taxonomy_abundances_SSU_v4.1_parsed.tsv \
       --unassigned-header "Unassigned" \
       --taxonomy ncbi \
       --ranks superkingdom phylum 
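
The preprocessing step above can be tried on a toy file. The sketch below drops the first column with cut and then sums the rows labelled "Unassigned" with awk, to show what "will be summed" amounts to. The file names and counts are hypothetical, and the awk step is an illustration of the idea, not GRIMER's own code.

```shell
# Hypothetical input: kingdom, phylum, then one count column per sample
printf 'kingdom\tphylum\tS1\tS2\n' > toy_phylum.tsv
printf 'Bacteria\tFirmicutes\t5\t2\n' >> toy_phylum.tsv
printf 'Bacteria\tUnassigned\t1\t4\n' >> toy_phylum.tsv
printf 'Archaea\tUnassigned\t2\t3\n' >> toy_phylum.tsv

# Drop the first column (kingdom), keeping phylum and the counts
cut -f 2- toy_phylum.tsv > toy_phylum_parsed.tsv

# Sum the per-sample counts of all rows labelled "Unassigned" (illustration only)
unassigned=$(awk -F'\t' '$1 == "Unassigned" { a += $2; b += $3 } END { print a "\t" b }' toy_phylum_parsed.tsv)
printf 'Unassigned\t%s\n' "$unassigned"
```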

From commonly used tools/sources

ganon

ganon table --input *.tre \
            --output-file ganon_table.tsv \
            --header taxid \
            --rank species

grimer --input-file ganon_table.tsv \
       --taxonomy ncbi \
       --ranks superkingdom phylum class order family genus species

MetaPhlAn

# merge_metaphlan_tables.py is available with the metaphlan package
# tail -n +2 skips the first line of the merged output, so the table header becomes the first row
merge_metaphlan_tables.py *.tsv | tail -n +2 > metaphlan_table.tsv

grimer --input-file metaphlan_table.tsv \
       --level-separator "|" \
       --obs-replace '^.+__' '' '_' ' ' \
       --taxonomy ncbi \
       --ranks superkingdom phylum class order family genus species
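
The two pattern/replacement pairs given to --obs-replace can be previewed with sed. This sketch assumes the patterns are applied to a single taxonomic level (that is, after splitting on "|") and that every match is replaced; both are assumptions about behaviour, and the sample name is made up.

```shell
# Hypothetical MetaPhlAn-style name for a single taxonomic level
name='s__Escherichia_coli'

# Mimic the two pairs: '^.+__' -> '' (strip the rank prefix) and '_' -> ' '
clean=$(printf '%s\n' "$name" | sed -E -e 's/^.+__//' -e 's/_/ /g')
printf '%s\n' "$clean"
```

The cleaned name is the kind of string that --taxonomy ncbi would then try to translate into a taxonomy id.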

QIIME2 feature table (.qza)

qiime tools export --input-path feature-table.qza --output-path exported-feature-table
grimer --input-file exported-feature-table/feature-table.biom

phyloseq

#install.packages("BiocManager")
#BiocManager::install(c("biomformat", "phyloseq"))
library("biomformat")
library("phyloseq")
data(soilrep)
b <- make_biom(data = otu_table(soilrep))
write_biom(b, 'out.biom')
  • Generate the report from the exported file (run in a shell, not in R)
grimer --input-file out.biom

MGnify

  • grimer-mgnify.py will download and generate a GRIMER report for any MGnify study accession (e.g. MGYS00006024)
# Install API dependency
conda install "jsonapi-client>=0.9.7"
./grimer-mgnify.py -i MGYS00006024 -o out_folder_mgnify/