Reports:
ganon report generates taxonomic reports and summaries based on the results obtained using ganon classify. It can optionally filter and format the report. Results can be summarised in terms of taxonomic and sequence abundances, as well as the total number of matches.
A file with the .tre extension is generated containing the taxonomic report; more information about the format can be found here. Additionally, a summary by taxonomic rank is written to STDERR, for example:
          unique  shared  children  total
root      0%      0%      71.75%    71.75%
domain    0%      0%      71.75%    71.75%
phylum    0%      0%      71.75%    71.75%
class     0%      0%      71.55%    71.55%
order     0%      0%      71.7%     71.7%
family    0%      0%      71.6%     71.6%
genus     0%      0%      71.75%    71.75%
species   26.65%  16.25%  28.85%    71.75%
assembly  17.2%   11.65%  0%        28.85%
Details on unique, shared and children values can be found here.
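As a quick consistency check on the example summary, each row's total appears to equal the sum of its unique, shared and children values. The Python sketch below verifies this for two rows copied from the example. The names `summary` and `row_is_consistent` are illustrative only and are not part of ganon.

```python
import math

# Rows copied from the example summary: (unique, shared, children, total), in percent.
summary = {
    "species": (26.65, 16.25, 28.85, 71.75),
    "assembly": (17.2, 11.65, 0.0, 28.85),
}


def row_is_consistent(unique, shared, children, total, tol=1e-6):
    """Return True when unique + shared + children matches total within tol."""
    return math.isclose(unique + shared + children, total, abs_tol=tol)


for rank, values in summary.items():
    print(rank, row_is_consistent(*values))
```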
Report types (--report-type):
Several reports are available with --report-type: reads, abundance, dist, corr, matches:
- reads reports sequence abundances, which are the basic proportion of reads classified in the sample.
- abundance converts sequence abundances into taxonomic abundances by re-distributing read counts among leaf nodes and correcting by genome size. The re-distribution applies to reads classified with an LCA assignment and is proportional to the number of unique matches of the leaf nodes available in the ganon database (relative to the LCA node). If EM was previously used to re-distribute reads, only the genome size correction is applied (same as corr). Genome size is estimated based on NCBI or GTDB auxiliary files. Genome size correction is applied by rank, based on default ranks only (superkingdom phylum class order family genus species assembly). Read counts in intermediate ranks are corrected based on the closest parent default rank and re-assigned to their original rank.
- dist is the same as reads, with read count re-distribution.
- corr is the same as reads, with correction by genome size.
- matches reports the total number of matches classified, either unique or shared. This option outputs the total number of matches instead of the total number of reads.
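To make the two steps behind the abundance report more concrete, here is a minimal Python sketch of proportional re-distribution followed by genome size correction. It is a simplified illustration and not ganon's implementation: the function names, taxon names, read counts and genome sizes are all hypothetical, and the per-rank handling described for the real tool is left out.

```python
def redistribute(lca_reads, unique_matches):
    """Split reads assigned to an LCA node among its leaf nodes,
    proportionally to each leaf's number of unique matches."""
    total = sum(unique_matches.values())
    return {leaf: lca_reads * n / total for leaf, n in unique_matches.items()}


def correct_by_genome_size(read_counts, genome_sizes):
    """Divide read counts by genome size, then renormalise to fractions."""
    weighted = {t: read_counts[t] / genome_sizes[t] for t in read_counts}
    norm = sum(weighted.values())
    return {t: w / norm for t, w in weighted.items()}


# Hypothetical sample: reads assigned directly to two leaf taxa,
# plus 200 reads assigned to their LCA.
leaf_reads = {"taxon_A": 500, "taxon_B": 300}
lca_share = redistribute(200, {"taxon_A": 30, "taxon_B": 10})

# Read counts after re-distribution (comparable to the dist report).
reads = {t: leaf_reads[t] + lca_share[t] for t in leaf_reads}

# Hypothetical genome sizes in base pairs.
genome_sizes = {"taxon_A": 6_500_000, "taxon_B": 1_750_000}
abundance = correct_by_genome_size(reads, genome_sizes)

print(reads)
print(abundance)
```

In this made-up case taxon_A has more reads, but taxon_B ends up with the higher taxonomic abundance because its genome is much smaller.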
Filter and format:
Results can be filtered in several ways:
- --ranks filters specific taxonomic ranks.
- --top-percentile keeps only the top percentile chosen for each rank (e.g. 0.5 will keep 50% of taxa for each rank).
- --min-count / --max-count set the min/max percentage or number of counts required to keep a taxon. Values between 0 and 1 are treated as a percentage (given as a fraction), values >1 as a number of counts (total).
- --names / --names-with / --taxids keep only entries matching specific names or taxids.
- --skip-hierarchy / --keep-hierarchy skip or keep hierarchy levels in the report.
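The threshold and percentile rules above can be sketched in Python as follows. This only illustrates the described semantics and is not ganon's code: the function names are hypothetical, and details such as inclusive comparison, rounding up in the percentile, and which total the fraction refers to are assumptions.

```python
import math


def passes_min_count(count, total, min_count):
    """Values between 0 and 1 are read as a fraction of total,
    larger values as an absolute number of counts (assumed inclusive)."""
    if 0 < min_count < 1:
        return count / total >= min_count
    return count >= min_count


def top_percentile(counts, fraction):
    """Keep the top fraction of taxa within one rank, ordered by count
    (rounding up is an assumption)."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    keep = math.ceil(len(ranked) * fraction)
    return dict(ranked[:keep])


# Hypothetical species-level counts out of 10000 reads in total.
species = {"sp1": 900, "sp2": 80, "sp3": 15, "sp4": 4}
total_reads = 10000

kept_by_fraction = {t: c for t, c in species.items()
                    if passes_min_count(c, total_reads, 0.0005)}
kept_by_count = {t: c for t, c in species.items()
                 if passes_min_count(c, total_reads, 50)}
kept_top_half = top_percentile(species, 0.5)

print(kept_by_fraction)
print(kept_by_count)
print(kept_top_half)
```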
Results can be formatted with:
- --output-format file output format: text, tsv, csv, bioboxes.
- --sort order of the output: rank, lineage, count, unique.
- --normalize will ignore the number of unclassified reads, normalizing the output to 100%.
- --split-hierarchy splits output reports by hierarchy levels.
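The effect of --normalize can be illustrated with a small Python sketch that rescales percentages so that the classified portion sums to 100%. It assumes normalization is a simple division by the classified percentage, which is an interpretation of the description above rather than ganon's code. The input values are taken from the species row of the example summary, and the function name is hypothetical.

```python
def normalize(percentages, classified_pct):
    """Rescale percentages so that the classified portion becomes 100%."""
    return {name: pct / classified_pct * 100 for name, pct in percentages.items()}


# Species row of the example summary; 71.75% of reads were classified.
species = {"unique": 26.65, "shared": 16.25, "children": 28.85}
normalized = normalize(species, 71.75)

for name, pct in normalized.items():
    print(f"{name}: {pct:.2f}%")
```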
Examples:
Given the output .rep from ganon classify and the database used (--db-prefix):
Taxonomic profile with abundance estimation (default):
ganon report --db-prefix mydb --input results.rep --output-prefix tax_profile --report-type abundance
Sequence profile:
ganon report --db-prefix mydb --input results.rep --output-prefix seq_profile --report-type reads
Matches profile:
ganon report --db-prefix mydb --input results.rep --output-prefix matches --report-type matches
Filtering results:
ganon report --db-prefix mydb --input results.rep --output-prefix filtered --min-count 0.0005 --top-percentile 0.8
This keeps only results with a minimum abundance of 0.05% (0.0005 as a fraction) and, for each rank, only the top 80% most abundant taxa.
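When several reports are needed from the same .rep file, the commands above can be scripted. The Python sketch below builds the command lines using only the options shown in these examples. The helper `build_report_cmd` is hypothetical, and the call that would actually execute ganon is left commented out because it requires ganon to be installed and on the PATH.

```python
import shlex
import subprocess


def build_report_cmd(db_prefix, rep_file, output_prefix,
                     report_type="abundance", extra=()):
    """Assemble a ganon report command line as a list of arguments."""
    cmd = [
        "ganon", "report",
        "--db-prefix", db_prefix,
        "--input", rep_file,
        "--output-prefix", output_prefix,
        "--report-type", report_type,
    ]
    cmd.extend(extra)
    return cmd


# One report per type, using the file names from the examples.
jobs = [
    ("abundance", "tax_profile"),
    ("reads", "seq_profile"),
    ("matches", "matches"),
]

for report_type, prefix in jobs:
    cmd = build_report_cmd("mydb", "results.rep", prefix, report_type)
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to run ganon
```

Filtering options can be passed through `extra`, for example `extra=["--min-count", "0.0005", "--top-percentile", "0.8"]`.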