SyntenyLink 🧬
Overview 📖
===========
The SyntenyLink package has six major components. The SyntenyLink algorithm lets users reconstruct the subgenomes of polyploid species more conveniently and separate the set of genes belonging to each subgenome, with the aid of reference proteomes of the polyploid species and a related ancestor. 🌱
For more details, see our published work: https://ieeexplore.ieee.org/document/10385622
All programs are executed using command-line options on Linux or macOS. Usage and help information is built into the programs. 💻
All code is copiable, distributable, modifiable, and usable without any restrictions.
Requirements 🛠️
=============
To use SyntenyLink, ensure you have Python and the following packages and tools:
Python packages:
- biopython
- ipython
- matplotlib
- numpy
- pandas
- seaborn
Python standard library modules (bundled with Python, no installation needed):
- pickle
- csv
- os
- math
- sys
- re
- warnings
Other tools:
- makeblastdb
- blastall
- dagchainer
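
Before running the pipeline, it can help to confirm that the external tools are reachable. The following is a minimal sketch, not part of SyntenyLink: it assumes the executables are named exactly as in the list above (the actual names on your system may differ), and `find_missing_tools` is a hypothetical helper.

```python
import shutil

# Tool names copied from the list above. The executable names installed on a
# given system may differ, so adjust this tuple as needed (assumption).
REQUIRED_TOOLS = ("makeblastdb", "blastall", "dagchainer")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```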
Installation ⚙️
=============
- Clone this repository to your local machine:
git clone https://git.cs.usask.ca/qnm481/syntenylink.git
cd syntenylink
- Install the required dependencies:
pip install -r requirements.txt
How to use SyntenyLink 🚀
=============
- Reproduce all the experiments:
i. Run SyntenyLink.sh
./SyntenyLink.sh ref_pep.fasta ref_cds.fasta query_pep.fasta query_cds.fasta query.gff3 ref.gff3 ref_genelist.txt query.bed -n <number of subgenomes> -s <ploidy status> -chr1 <query chromosome number for subgenome1> -p <gene prefix>
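
If you launch the script from Python, the command above could be assembled as in the sketch below. This is only an illustration: `build_command` is a hypothetical helper, the flag names and positional order are copied from the command line above, the accepted ploidy values are those listed for `-s` in the Usage section, and the example values are arbitrary placeholders rather than recommended settings.

```python
import shlex

VALID_PLOIDY = {2, 4, 6, 8}  # values listed for -s in the Usage section


def build_command(inputs, n_subgenomes, ploidy, chr1, gene_prefix):
    """Return the SyntenyLink.sh argument list in the order shown above."""
    if len(inputs) != 8:
        raise ValueError("expected the 8 positional input files, in order")
    if ploidy not in VALID_PLOIDY:
        raise ValueError(f"ploidy must be one of {sorted(VALID_PLOIDY)}")
    return [
        "./SyntenyLink.sh", *inputs,
        "-n", str(n_subgenomes),
        "-s", str(ploidy),
        "-chr1", str(chr1),
        "-p", gene_prefix,
    ]


# Positional inputs, in the same order as the command above.
inputs = [
    "ref_pep.fasta", "ref_cds.fasta", "query_pep.fasta", "query_cds.fasta",
    "query.gff3", "ref.gff3", "ref_genelist.txt", "query.bed",
]
# Placeholder option values for illustration only.
cmd = build_command(inputs, n_subgenomes=3, ploidy=6, chr1=10, gene_prefix="G")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository directory.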
More information 🚀
=============
Utilize this repository to replicate our experiments and explore the functionalities of SyntenyLink. The codebase is organized to help you easily navigate through different components and reproduce our results.
The executable programs and their parameters are listed in the Usage section.
Usage
=============
Parameter and command examples are shown below.
- SyntenyLink.sh
  - -i Input collinear file.
  - -s Ploidy status: 2 for diploid, 4 for tetraploid, 6 for hexaploid, 8 for octaploid.
  - -p Gene prefix. For example, 'G' is the gene prefix for the gene AT1G01010; it separates the chromosome number part from the integer part.
  - -gt Ground-truth subgenome separation. Default is None.
  - -bed Query bed file.
  - -chr Chromosome number for each subgenome. For example, in Brassica napus chr1 is 10 and chr2 is 9.
  - -n Number of subgenomes.
  - -sub Prefix for the subgenome chromosome name. For example, in Brassica napus sub1 is A and sub2 is C.
  - -dag Collinear file created with blastn output.
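
To make the role of the gene prefix concrete, the sketch below splits a gene ID at the prefix into its chromosome number and integer part. It is one reading of the `-p` description above, not SyntenyLink's actual implementation; `split_gene_id` is a hypothetical helper and it assumes IDs shaped like AT1G01010 (letters, chromosome number, prefix, digits).

```python
import re


def split_gene_id(gene_id, prefix):
    """Split a gene ID at `prefix` into (chromosome number, integer part)."""
    # Optional non-digit lead ("AT"), chromosome digits, the prefix, gene digits.
    match = re.fullmatch(rf"\D*(\d+){re.escape(prefix)}(\d+)", gene_id)
    if match is None:
        raise ValueError(f"{gene_id!r} does not match prefix {prefix!r}")
    return int(match.group(1)), int(match.group(2))


print(split_gene_id("AT1G01010", "G"))  # (1, 1010)
```

IDs that carry a version suffix or follow another naming scheme would need a different pattern.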
- To create the ref_genelist.txt file, provide the following columns:
  - First column: Gene ID.
  - Second column: Gene ID with version number.
  - Third column: Amino acid length.
  - Fourth column: A combination of the fifth and sixth columns, joined by an underscore.
  - Fifth column: Chromosome of the reference genome.
  - Sixth column: Numbers in order.
  - Seventh column: Start position of the gene.
  - Eighth column: End position of the gene.
  - Ninth column: Gene ID with version number.
  - Tenth column: Functional path of the gene (use "None" if not found).

Refer to the ath_genes.txt file in the data folder for an example.

**Note:** Use the prefix 'Chr' to name chromosomes in the ancestral reference genome, and a prefix that does not start with 'C' to name chromosomes in the query genome (this is the prefix you pass to -sub).
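
The ten-column layout described above can be sketched as a small row builder. This is an illustration under assumptions: `genelist_row` is a hypothetical helper, the tab separator is a guess (check ath_genes.txt for the real delimiter), the sixth column is treated as a running index, and the example values are made up.

```python
def genelist_row(gene_id, versioned_id, aa_length, chromosome, order,
                 start, end, pathway=None):
    """Return the ten column values for one gene, in the order listed above."""
    return [
        gene_id,                         # 1: gene ID
        versioned_id,                    # 2: gene ID with version number
        str(aa_length),                  # 3: amino acid length
        f"{chromosome}_{order}",         # 4: column 5 + "_" + column 6
        chromosome,                      # 5: reference chromosome
        str(order),                      # 6: running number (assumed meaning)
        str(start),                      # 7: start position
        str(end),                        # 8: end position
        versioned_id,                    # 9: gene ID with version number
        pathway if pathway else "None",  # 10: functional path or "None"
    ]


# Made-up values for illustration; not taken from ath_genes.txt.
row = genelist_row("AT1G01010", "AT1G01010.1", 100, "Chr1", 1, 1000, 2000)
print("\t".join(row))  # tab separator is an assumption
```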
Tested species
=============
- Brassica rapa
- Brassica oleracea
- Brassica nigra
- Brassica napus
- Brassica carinata
- Brassica juncea
- Sinapis alba
- Cercis canadensis
Contact 📬
===============
For any questions or inquiries, please feel free to open an issue on our repository or contact us at qnm481@usask.ca.
License 📜
===============
This project is licensed under the MIT License.