Develop your own module
There are three types of modules that can be developed in CloudConductor:
- Tool - represents a tool that can have one or more functions, each implemented as a submodule
- Splitter - represents a tool that splits one input data entity into multiple chunks of data of the same type
- Merger - represents a tool that merges chunks of data of the same type into one output data entity
Tool
To develop a new Tool, create a new Python module in CloudConductor’s Modules/Tools directory, named after the tool you want to develop. Then, for each task that the new tool performs, create a class that extends Modules/Module.
Let’s name our new tool NewTool and its subcommand (task) Subcommand. In this case, the Python module Modules/Tools/NewTool.py should look as follows:
```python
from Modules import Module


class Subcommand(Module):

    def __init__(self, module_id, is_docker=False):
        """
        Initialize the new Subcommand class.

        Args:
            module_id (string) - the unique ID generated by CloudConductor for this object
            is_docker (boolean) - whether the current module should return a Docker-specific command
        """
        super(Subcommand, self).__init__(module_id, is_docker)
        # Define the list of output keys the command will generate data for
        self.output_keys = ["output_key1", "output_key2", "output_key3"]

    def define_input(self):
        """
        Define the input of the subcommand
        """
        pass

    def define_output(self):
        """
        Define the output of the subcommand
        """
        pass

    def define_command(self):
        """
        Generate the actual command
        """
        pass
```
In the Subcommand class constructor, call the constructor of the base class Module and specify the output keys that the subcommand generates.
In the define_input() method, use the inherited method self.add_argument() to define each input key.
An input key has three properties that can be set with the self.add_argument() method:
- is_required - sets whether the input key is mandatory (False by default)
- is_resource - sets whether the input key represents a resource to be searched for in the resource kit (False by default)
- default_value - a default value for the input key, used if it never gets set (None by default)
For example:
```python
def define_input(self):
    self.add_argument("R1", is_required=True)
    self.add_argument("R2")
    self.add_argument("bwa", is_required=True, is_resource=True)
    self.add_argument("samtools", is_required=True, is_resource=True)
```
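The three properties can be pictured with the minimal stand-in below, which runs without CloudConductor. It is a hypothetical sketch, not CloudConductor's implementation: the ArgumentRegistry class, its resolve() method, the resource_kit dictionary, the example paths and the nr_cpus key are all assumptions made for illustration. It shows an optional key staying None, a default_value filling an unset key, a resource key being looked up in a resource kit, and a missing required key being reported.

```python
class ArgumentRegistry(object):
    """Hypothetical stand-in mimicking the input-key properties; not CloudConductor code."""

    def __init__(self):
        # key -> declared properties (is_required, is_resource, default_value)
        self.specs = {}
        # key -> resolved value
        self.values = {}

    def add_argument(self, key, is_required=False, is_resource=False, default_value=None):
        self.specs[key] = {
            "is_required": is_required,
            "is_resource": is_resource,
            "default_value": default_value,
        }

    def resolve(self, provided, resource_kit):
        """Fill each declared key from the provided inputs or the resource kit, else its default."""
        missing = []
        for key, spec in self.specs.items():
            # Resource keys are looked up in the resource kit, other keys in the provided inputs
            source = resource_kit if spec["is_resource"] else provided
            value = source.get(key)
            if value is None:
                value = spec["default_value"]
            if value is None and spec["is_required"]:
                missing.append(key)
            self.values[key] = value
        if missing:
            raise ValueError("Missing required input keys: %s" % ", ".join(sorted(missing)))

    def get_argument(self, key):
        return self.values[key]


args = ArgumentRegistry()
args.add_argument("R1", is_required=True)
args.add_argument("R2")
args.add_argument("bwa", is_required=True, is_resource=True)
args.add_argument("nr_cpus", default_value=8)  # hypothetical key with a default value
args.resolve(provided={"R1": "sample_R1.fastq"}, resource_kit={"bwa": "/usr/local/bin/bwa"})
print(args.get_argument("R2"))       # None: optional and never set
print(args.get_argument("nr_cpus"))  # 8: falls back to default_value
```

Reporting every missing required key in one error, rather than stopping at the first one, is a design choice of this sketch.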
In the define_output() method, use the inherited method self.add_output() to define each output key. You can use the self.get_argument() method to obtain the value of any argument.
An output key has two properties that can be set with the self.add_output() method:
- value - the actual value of the output key. If the output is a file, you can use the inherited method self.generate_unique_file_name() to obtain a unique file name for it
- is_path - sets whether the value is a path (i.e. a file or directory)
For example:
```python
def define_output(self):
    # Obtain a unique file name for the generated BAM file
    bam_output = self.generate_unique_file_name(extension=".bam")
    self.add_output("bam", bam_output)
```
In the define_command() method, you can expect both the input and the output keys to already be associated with their correct values. To obtain the value of an input key, use the self.get_argument() method. To obtain the value of an output key, use the self.get_output() method.
The define_command() method should return the actual command as a string.
For example:
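```python
def define_command(self):
    # Input and output keys are already bound to their values at this point
    R1_fastq = self.get_argument("R1")
    R2_fastq = self.get_argument("R2")
    bwa = self.get_argument("bwa")
    samtools = self.get_argument("samtools")
    bam_output = self.get_output("bam")
    # Simplified: a real pipeline would run "bwa mem" against a reference index
    # and use "samtools view -b" to write BAM instead of SAM
    return "%s -M %s %s !LOG2! | %s view > %s !LOG2!" % (bwa, R1_fastq, R2_fastq, samtools, bam_output)
```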
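The order in which these methods are used can be tried out with the runnable sketch below. MiniModule is a hypothetical stand-in for Modules/Module; its set_argument() helper, its file-naming scheme in generate_unique_file_name() (module ID plus a random suffix) and the call order at the bottom are assumptions based on the description above, and CloudConductor's own base class and scheduler may work differently. The Subcommand methods repeat the examples from this section.

```python
import uuid


class MiniModule(object):
    """Hypothetical stand-in for Modules/Module, for illustration only."""

    def __init__(self, module_id, is_docker=False):
        self.module_id = module_id
        self.is_docker = is_docker
        self.output_keys = []
        self.arguments = {}
        self.outputs = {}

    def add_argument(self, key, is_required=False, is_resource=False, default_value=None):
        # Properties are accepted but not enforced in this stand-in
        self.arguments[key] = default_value

    def set_argument(self, key, value):
        # Hypothetical helper standing in for CloudConductor binding values to input keys
        self.arguments[key] = value

    def get_argument(self, key):
        return self.arguments[key]

    def add_output(self, key, value, is_path=None):
        # is_path is stored nowhere in this stand-in
        self.outputs[key] = value

    def get_output(self, key):
        return self.outputs[key]

    def generate_unique_file_name(self, extension=""):
        # Assumed naming scheme: module ID plus a short random suffix
        return "%s_%s%s" % (self.module_id, uuid.uuid4().hex[:8], extension)


class Subcommand(MiniModule):

    def __init__(self, module_id, is_docker=False):
        super(Subcommand, self).__init__(module_id, is_docker)
        self.output_keys = ["bam"]

    def define_input(self):
        self.add_argument("R1", is_required=True)
        self.add_argument("R2")
        self.add_argument("bwa", is_required=True, is_resource=True)
        self.add_argument("samtools", is_required=True, is_resource=True)

    def define_output(self):
        self.add_output("bam", self.generate_unique_file_name(extension=".bam"))

    def define_command(self):
        return "%s -M %s %s !LOG2! | %s view > %s !LOG2!" % (
            self.get_argument("bwa"), self.get_argument("R1"), self.get_argument("R2"),
            self.get_argument("samtools"), self.get_output("bam"))


# Assumed order: declare inputs, bind their values, declare outputs, then build the command
task = Subcommand("align_1")
task.define_input()
for key, value in {"R1": "s_R1.fq", "R2": "s_R2.fq", "bwa": "bwa", "samtools": "samtools"}.items():
    task.set_argument(key, value)
task.define_output()
print(task.define_command())
```

Because define_output() runs before define_command(), the command can safely read the generated BAM file name with get_output().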
Note: When generating the command, you can use the following placeholders to have CloudConductor handle the output streams and create log files for you:
- “!LOG0!” - redirects both stdout and stderr to /dev/null
- “!LOG1!” - redirects only stdout to a log file that will be available after the module has finished running
- “!LOG2!” - redirects only stderr to a log file that will be available after the module has finished running
- “!LOG3!” - redirects both stdout and stderr to a log file that will be available after the module has finished running
Example command with placeholders: “tool1 !LOG2! | tool2 !LOG2! | tool3 !LOG3!”
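To see what the placeholders amount to, the sketch below expands them into ordinary shell redirections. This is an assumption about the general idea, not CloudConductor's code: the function name expand_log_placeholders, the single shared log_path and the choice of appending (>>) are illustrative.

```python
def expand_log_placeholders(command, log_path):
    """Hypothetical expansion of log placeholders into shell redirections.

    Appending (>>) lets several pipe segments share one log file without
    truncating each other's output.
    """
    replacements = {
        "!LOG0!": "> /dev/null 2>&1",       # discard stdout and stderr
        "!LOG1!": ">> %s" % log_path,       # stdout only
        "!LOG2!": "2>> %s" % log_path,      # stderr only
        "!LOG3!": ">> %s 2>&1" % log_path,  # stdout first, then stderr into the same file
    }
    for placeholder, redirection in replacements.items():
        command = command.replace(placeholder, redirection)
    return command


print(expand_log_placeholders("tool1 !LOG2! | tool2 !LOG2! | tool3 !LOG3!", "module.log"))
# tool1 2>> module.log | tool2 2>> module.log | tool3 >> module.log 2>&1
```

Note that !LOG2! leaves stdout untouched, which is why it is the natural choice for segments that feed a pipe; !LOG1! or !LOG3! on a middle segment would divert its stdout away from the next tool.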
Splitter
There are only two differences between the way splitters and tools are created.
The first difference is that to create a splitter you need to extend the Modules/Splitter abstract class instead of Modules/Module.
The second difference is that the output of a tool is a list of output keys associated with values, while the output of a splitter is a list of splits, each split having its own list of output keys associated with values. Consequently, every output key has an additional property, split_id, which is the ID of the split it belongs to. To define a new split, call the self.make_split() method and then associate output keys with the newly created split ID.
For example:
```python
def define_output(self):
    nr_splits = self.get_argument("nr_splits")
    # range() works in both Python 2 and 3 (xrange() exists only in Python 2)
    for split_ID in range(nr_splits):
        # Create the split first, then attach output keys to it
        self.make_split(split_ID)
        self.add_output(split_id=split_ID, key="square", value=split_ID**2, is_path=False)
        self.add_output(split_id=split_ID, key="cube", value=split_ID**3, is_path=False)
```
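The shape of a splitter's output can be inspected with the hypothetical stand-in below. Only the method names make_split(), add_output() and get_argument() and their keyword arguments come from the example above; the MiniSplitter class, its dictionary layout (split ID mapped to that split's output keys) and the check for unknown split IDs are assumptions for illustration.

```python
class MiniSplitter(object):
    """Hypothetical stand-in for Modules/Splitter, for illustration only."""

    def __init__(self, arguments):
        self.arguments = arguments
        self.splits = {}  # split_id -> {output_key: value}

    def get_argument(self, key):
        return self.arguments[key]

    def make_split(self, split_id):
        # Register a new, initially empty split
        self.splits[split_id] = {}

    def add_output(self, split_id, key, value, is_path):
        # Every output key must belong to a split created with make_split()
        if split_id not in self.splits:
            raise KeyError("Split %r was not created with make_split()" % (split_id,))
        self.splits[split_id][key] = value

    def define_output(self):
        nr_splits = self.get_argument("nr_splits")
        for split_ID in range(nr_splits):
            self.make_split(split_ID)
            self.add_output(split_id=split_ID, key="square", value=split_ID**2, is_path=False)
            self.add_output(split_id=split_ID, key="cube", value=split_ID**3, is_path=False)


splitter = MiniSplitter({"nr_splits": 3})
splitter.define_output()
print(splitter.splits)
```

With nr_splits set to 3, split 2 ends up holding square 4 and cube 8, while each output key stays tied to the split it was added to.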
Merger
There is only one difference between the way mergers and tools are created: you need to extend the Modules/Merger abstract class instead of Modules/Module. Other than that, the rest of the logic is similar.
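A merger performs the reverse step of a splitter: it takes the chunks produced for each split and combines them into one output. The sketch below is hypothetical and only illustrates that idea; the function build_merge_command, the assumed input layout (split ID mapped to that split's output keys) and the use of cat to concatenate chunk files are not part of CloudConductor's documented API.

```python
def build_merge_command(splits, key, output_path):
    """Hypothetical merger step: concatenate one output key's chunk files into a single file.

    `splits` is assumed to map each split ID to that split's output keys.
    """
    # Sort by split ID so chunks are merged in the order they were produced
    chunk_paths = [splits[split_id][key] for split_id in sorted(splits)]
    return "cat %s > %s !LOG2!" % (" ".join(chunk_paths), output_path)


chunks = {
    1: {"txt": "part_1.txt"},
    0: {"txt": "part_0.txt"},
    2: {"txt": "part_2.txt"},
}
print(build_merge_command(chunks, "txt", "merged.txt"))
# cat part_0.txt part_1.txt part_2.txt > merged.txt !LOG2!
```

Sorting by split ID keeps the merged output in the same order as the original data, assuming the splitter numbered its chunks sequentially.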