Configuration file
GRIMER uses a configuration file to set reference sources of annotation (e.g. contaminants), controls and external tools (decontam, mgnify). The configuration can be provided with the argument -c/--config
and it should be in the YAML format.
A basic example of a configuration file:
references:
"Contaminants": "files/contaminants.yml"
"Human-related": "files/human-related.yml"
controls:
"Negative Controls": "path/file1.tsv"
"Positve Controls":
"Metadata_Field":
- "Metadata_Value1"
- "Metadata_Value2"
external:
mgnify: "files/mgnify5989.tsv"
decontam:
threshold: 0.1
method: "frequency"
references
References can be provided as an external .yml/.yaml
file in a specific format (see below) or as a text file with one taxonomic identifier or taxonomic name per line.
"General Description":
"Specific description":
url: "www.website.com?id={}"
ids: [1,2,3]
A real example of saliva organisms extracted from BacDive (NCBI taxonomic ids):
"Human-related bacterial isolates from BacDive":
"Saliva":
url: "https://bacdive.dsmz.de/search?search=taxid:{}"
ids: [152331, 113107, 157688, 979627, 45634, 60133, 157687, 1624, 1583331, 1632, 249188]
Common contaminants compiled from the literature and human-related possible sources of contamination are available in the GRIMER repository. For more information, please refer to the pre-print. If the target study overlaps with some of those annotation (e.g. study of human skin), related entries can be easily removed from the provided files to not generate redundant annotations.
controls
Several control groups can be provided to annotate samples. They can be provided as a file with one sample identifier per line:
controls:
"Controls": "controls.txt"
or directly from the metadata (-m/--metadata-file
) as a field and value(s) information:
controls:
"Other Controls":
"sample_type": # field
- "blank" # value
- "control" # value
Both methods can be combined into one configuration file.
external
Set the configuration and functionality of external tools executed by GRIMER.
mgnify
GRIMER uses a parsed MGnify database to annotate observations and link them to the respective MGnify repository, reporting most common biome occurrences. Instructions on how to re-generate the parsed database from MGnify can be found here.
A pre-parsed database is available in the GRIMER repository (generated on 2022-03-09). To use it, please set the file in the configuration as follows and activate it with the -g/--mgnify
when running GRIMER.
external:
mgnify: "files/mgnify5989.tsv"
decontam
GRIMER can run DECONTAM with -d/--decontam
, but some configuration is necessary. It is possible to set the threshold (P* hyperparameter) and the method (frequency, prevalence, combined).
For the frequency/combined method, DNA frequencies for each sample have to be provided either in a .tsv
file (sample identifier
For the prevalence/combined method, file(s) with a list of sample identifiers or a metadata field/value can be provided. If none is provided, all samples defined in the "controls" are considered for the prevalence calculation.
Below an example of how to set-up the configuration file for DECONTAM:
external:
decontam:
threshold: 0.1 # P* hyperparameter threshold, values between 0 and 1
method: "frequency" # Options: frequency, prevalence, combined
frequency_file: "path/file1.txt"
# frequency_metadata: "Field1"
# prevalence_file:
# - "path/file1.txt"
# - "path/file2.txt"
prevalence_metadata:
"Field1":
- "ValueA"
- "ValueB"
"Field2":
- "ValueC"
Using the configuration file
Example UgandaMaternalV3V4.16s_DADA2.taxon_abundance.biom file from microbiomedb.org
config.yml (external .yml files are available in the GRIMER repository)
references:
"Contaminants": "files/contaminants.yml"
"Human-related": "files/human-related.yml"
external:
mgnify: "files/mgnify5989.tsv"
decontam:
threshold: 0.1 # [0-1] P* hyperparameter
method: "frequency" # frequency, prevalence, combined
Running GRIMER with DECONTAM and MGnify integration
grimer --input-file UgandaMaternalV3V4.16s_DADA2.taxon_abundance.biom \
--config config.yml \
--decontam --mgnify \
--taxonomy ncbi \
--ranks superkingdom phylum class order family genus species