SW-Actors: Smith–Waterman with Actors
SW-Actors uses the Actor Model to perform parallelized sequence alignment with the Smith–Waterman algorithm across DNA, RNA, or protein sequence pairs. Distributing the alignment work across actors lets it scale to large datasets and make efficient use of the available computational resources. SW-Actors is implemented using the C++ Actor Framework (CAF).
Documentation
This guide will cover the essentials for compiling and running SW-Actors. For further details or to report any issues, feel free to open an issue in this repository.
Compiling SW-Actors
Dependencies
SW-Actors depends on the C++ Actor Framework, specifically the 1.0.1 release.
The following steps can be used to install the C++ Actor Framework on a Linux system:
wget https://github.com/actor-framework/actor-framework/archive/refs/tags/1.0.1.tar.gz
tar -xzf 1.0.1.tar.gz
cd actor-framework-1.0.1/
./configure --prefix=/path/to/install
cd build
make # [-j] for parallel build
make install # [use root privileges if necessary]
Steps To Compile
To compile SW-Actors:
- Clone the repository:
git clone https://git.cs.usask.ca/numerical_simulations_lab/actors/papers/sw-actors
cd sw-actors
- If you installed CAF with the --prefix=/path/to/install option, set the CAF paths in the Makefile to that installation path.
- Run:
make
Running SW-Actors
After compilation, SW-Actors can be executed using the provided job scripts.
Modify the job scripts to match your system's configuration, such as paths, resources, and any environment modules you need.
- Use single.sh to run on a single node.
- Use multi.sh to run the code in distributed mode across multiple nodes.
Example Command:
./swActor -D -s -p 4444 -Q ./dataset/BRCA1.fasta -A 40 -m 2 -M -1 -g -2 -R 1 -C 1 --caf.scheduler.max-threads=40
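The scoring flags in the command above (-m 2 -M -1 -g -2) set the match score, mismatch penalty, and gap penalty used by Smith–Waterman. As a rough illustration of how such values enter the scoring recurrence, here is a minimal C++ sketch. It assumes a linear gap penalty, and the function name smith_waterman_score is hypothetical: it is not taken from the SW-Actors source, and it computes only the best score on a single thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper for illustration only (not part of SW-Actors).
// Returns the best local alignment score of `query` against `subject`,
// assuming a linear gap penalty. Defaults mirror the documented flag defaults.
int smith_waterman_score(const std::string& query, const std::string& subject,
                         int match = 2, int mismatch = -1, int gap = -2) {
    const std::size_t rows = query.size() + 1;
    const std::size_t cols = subject.size() + 1;
    // Only the previous and current rows are needed to compute the score.
    std::vector<int> prev(cols, 0);
    std::vector<int> curr(cols, 0);
    int best = 0;
    for (std::size_t i = 1; i < rows; ++i) {
        curr[0] = 0;  // first column is always zero for local alignment
        for (std::size_t j = 1; j < cols; ++j) {
            const int diag =
                prev[j - 1] + (query[i - 1] == subject[j - 1] ? match : mismatch);
            const int up = prev[j] + gap;        // gap in the subject
            const int left = curr[j - 1] + gap;  // gap in the query
            // Clamping at zero lets a local alignment restart anywhere.
            curr[j] = std::max({0, diag, up, left});
            best = std::max(best, curr[j]);
        }
        std::swap(prev, curr);
    }
    return best;
}
```

For example, smith_waterman_score("GATTACA", "TTAC") returns 8 with these defaults, because the four-character exact match TTAC scores 4 × 2.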
Command-line Options
Option | Flag | Description | Default |
---|---|---|---|
Query Input | -Q / --query | Path to input query sequences | (empty) |
Subject Input | -S / --subject | Path to input subject sequences | (empty) |
Worker Actors | -A / --actorNumber | Number of worker actors per node | 1 |
Match Score | -m / --match | Score for a match | 2 |
Mismatch Penalty | -M / --mismatch | Penalty for a mismatch | -1 |
Gap Penalty | -g / --gap | Penalty for a gap | -2 |
Row Divider | -R / --devideRow | Division factor for scoring matrix rows | 1 |
Column Divider | -C / --devideCol | Division factor for scoring matrix columns | 1 |
Distributed Mode | -D / --distributedMode | Enable distributed execution across multiple nodes | false |
Server Mode | -s / --serverMode | Start the actor system in server mode (listens for incoming connections) | false |
Host | -H / --host | Host address (used in client mode only) | localhost |
Port | -p / --port | Port number for server/client communication | 0 |
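
The -R and -C options divide the scoring matrix along its rows and columns. One common way to exploit such a division when parallelizing Smith–Waterman is to split the matrix into a grid of blocks and process the blocks in anti-diagonal waves, since each block needs only the results of its upper, left, and upper-left neighbours. The C++ sketch below illustrates that idea. It is an assumption about how the division factors might be used, not a description of the SW-Actors scheduler, and the names split_range and wavefront_order are hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Range = std::pair<std::size_t, std::size_t>;  // half-open [begin, end)
using Block = std::pair<std::size_t, std::size_t>;  // (block row, block column)

// Hypothetical helper: split [0, length) into `parts` contiguous ranges
// whose sizes differ by at most one.
std::vector<Range> split_range(std::size_t length, std::size_t parts) {
    std::vector<Range> ranges;
    if (parts == 0) return ranges;
    const std::size_t base = length / parts;
    const std::size_t extra = length % parts;  // first `extra` ranges get one more
    std::size_t begin = 0;
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t size = base + (p < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + size);
        begin += size;
    }
    return ranges;
}

// Hypothetical helper: group the blocks of a row_parts x col_parts grid by
// anti-diagonal. Block (r, c) depends on (r-1, c), (r, c-1), and (r-1, c-1),
// which all lie in earlier waves, so blocks within one wave are independent.
std::vector<std::vector<Block>> wavefront_order(std::size_t row_parts,
                                                std::size_t col_parts) {
    std::vector<std::vector<Block>> waves;
    if (row_parts == 0 || col_parts == 0) return waves;
    waves.resize(row_parts + col_parts - 1);
    for (std::size_t r = 0; r < row_parts; ++r) {
        for (std::size_t c = 0; c < col_parts; ++c) {
            waves[r + c].emplace_back(r, c);
        }
    }
    return waves;
}
```

Under this reading, the defaults -R 1 -C 1 give a single block in a single wave, while larger factors produce more, smaller blocks that could be handed to different worker actors.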
Additional CAF options
You can also pass additional CAF runtime options such as:
--caf.scheduler.max-threads=40
to control the thread pool size used by CAF's scheduler.
Credits
SW-Actors builds on the Smith–Waterman algorithm and the C++ Actor Framework. We credit the original creators of the C++ Actor Framework, which allowed us to implement the actor model in SW-Actors. Their research work is listed below.
- Charousset, D., Schmidt, T. C., Hiesgen, R., Wählisch, M., 2013: Native actors: a scalable software platform for distributed, heterogeneous environments. AGERE!, doi:10.1145/2541329.2541336.
- Charousset, D., Schmidt, T. C., Hiesgen, R., 2016: Revisiting actor programming in C++. Computer Languages, Systems & Structures, doi:10.1016/j.cl.2016.01.002.