1. Run BLAST on the reads

Specify the output format with -outfmt 6 (tabular). Details about this format can be found here:
https://sites.google.com/site/wiki4metagenomics/tools/blast/blastn-output-format-6

    blastn -query genes.fasta -subject genome.fasta -outfmt 6

Additional notes:
BLAST is alignment software that finds alignments between sequences in a query file and sequences in a reference file.
How to install and run BLAST: https://doctorlib.info/medical/blast/11.html

-----------------------------------------------------------------------------------------------------------------------------------------------

2. Analyze the BLAST output

Compile analyse_blast_csvout.cpp:

    g++ analyse_blast_csvout.cpp -o abcout.exe

Analyze:

    ./abcout.exe

Additional notes:
BLAST reports the E-value in exponential notation, e.g. 2.44e-12. The number after the '-' is the exponent (12 in this example), and it is used to decide whether an alignment passes the E_value_threshold. The higher this exponent, the smaller the E-value and the better the alignment.
More on E-value: http://www.metagenomics.wiki/tools/blast/evalue

-----------------------------------------------------------------------------------------------------------------------------------------------

3. Find the unique query genes/reads in a BLAST output file

Compile unique_genes.cpp:

    g++ unique_genes.cpp -o unique_genes.exe

Run:

    ./unique_genes.exe
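
-----------------------------------------------------------------------------------------------------------------------------------------------

Example sketch: filtering hits by E-value and collecting unique queries

The E-value check from step 2 and the unique-query lookup from step 3 can be illustrated together. This is only a minimal sketch, not the code in analyse_blast_csvout.cpp or unique_genes.cpp. The names BlastHit, parse_hit, unique_queries and max_evalue are hypothetical. It assumes the default -outfmt 6 layout of 12 tab-separated columns, with the query ID in column 1 and the E-value in column 11. Instead of reading the exponent digits, it converts the whole E-value field to a double and compares it numerically, which also covers values such as 0.0 that carry no exponent.

```cpp
#include <cstdlib>
#include <iostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// One row of BLAST tabular output (-outfmt 6); only the fields used here.
// Hypothetical type, not taken from the repository sources.
struct BlastHit {
    std::string query_id;    // column 1: query sequence ID
    std::string subject_id;  // column 2: subject sequence ID
    double evalue;           // column 11: E-value, e.g. 2.44e-12
};

// Parse one tab-separated line. Returns false when the line has fewer
// than the 12 columns of the default -outfmt 6 layout.
bool parse_hit(const std::string& line, BlastHit& hit) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, '\t')) {
        fields.push_back(field);
    }
    if (fields.size() < 12) {
        return false;
    }
    hit.query_id = fields[0];
    hit.subject_id = fields[1];
    // strtod understands exponential notation such as "2.44e-12".
    hit.evalue = std::strtod(fields[10].c_str(), nullptr);
    return true;
}

// Collect the distinct query IDs that have at least one hit whose
// E-value is at or below max_evalue (smaller E-value = better hit).
std::set<std::string> unique_queries(std::istream& in, double max_evalue) {
    std::set<std::string> ids;
    std::string line;
    BlastHit hit;
    while (std::getline(in, line)) {
        if (parse_hit(line, hit) && hit.evalue <= max_evalue) {
            ids.insert(hit.query_id);
        }
    }
    return ids;
}
```

For example, calling unique_queries(std::cin, 1e-10) from a main function and piping the blastn output into the program would return each query ID once, keeping only queries with a hit at E-value 1e-10 or smaller. A std::set is used so that repeated hits for the same query collapse to a single entry.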