CLI instructions for running sci-epi2gene (command: scie2g).


The examples cover some simple cases where we map 1) a bed file with peaks that overlap gene bodies (e.g. H3K36me3) and 2) a bed file with peaks we would expect to land in the promoter (e.g. H3K27ac). Lastly, we show how DMRseq and MethylKit CSV files would be annotated to genes.

Genes are assigned to a peak if the peak overlaps the region that runs from 2500 bp (default) upstream of the TSS to 500 bp (default) into the gene body. The peak may overlap any part of this region to be assigned to the gene. A peak can be assigned to multiple genes, e.g. a broad peak (such as H3K27me3) could be assigned to many genes.
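The assignment rule above can be sketched in a few lines of Python. This is a minimal illustration, not the scie2g implementation: the names Gene, promoter_window and genes_for_peak are hypothetical, the strand encoding (1 / -1) and closed-interval overlap test are assumptions, and the handling of reverse-strand genes (TSS taken as the gene end) is assumed rather than taken from the tool.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int
    strand: int  # assumed encoding: 1 = forward, -1 = reverse


def promoter_window(gene: Gene, upflank: int = 2500, overlap: int = 500) -> Tuple[int, int]:
    """Region from `upflank` bp upstream of the TSS to `overlap` bp into the gene body."""
    if gene.strand >= 0:
        tss = gene.start
        return tss - upflank, tss + overlap
    # Assumption: a reverse-strand gene is transcribed from its end coordinate.
    tss = gene.end
    return tss - overlap, tss + upflank


def genes_for_peak(chrom: str, start: int, end: int, genes: List[Gene],
                   upflank: int = 2500, overlap: int = 500) -> List[str]:
    """Return every gene whose promoter window the peak touches (possibly several)."""
    hits = []
    for gene in genes:
        lo, hi = promoter_window(gene, upflank, overlap)
        # Any overlap with the window counts, not only full containment.
        if gene.chrom == chrom and start <= hi and end >= lo:
            hits.append(gene.name)
    return hits


# Made-up genes for illustration only.
genes = [
    Gene("GENE_A", "chr1", 10000, 20000, 1),
    Gene("GENE_B", "chr1", 12000, 30000, 1),
    Gene("GENE_C", "chr1", 50000, 60000, -1),
]

print(genes_for_peak("chr1", 10400, 10450, genes))  # ['GENE_A', 'GENE_B']
print(genes_for_peak("chr1", 15000, 16000, genes))  # []
print(genes_for_peak("chr1", 61000, 61500, genes))  # ['GENE_C']
```

The first peak lands in the promoter windows of two neighbouring genes and is therefore reported for both, which mirrors the "assigned to multiple genes" behaviour described above.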

Annotate gene body overlapping peaks

Here we read in the file test_H3K36me3.bed and annotate peaks that overlap the genes annotated in hsapiens_gene_ensembl-GRCh38.p13.csv (generated using sci-biomart). With the flags used below (--upflank 3000 --downflank 500), a gene is assigned to a peak if the peak overlaps any part of the region from 3000 bp upstream of the TSS to 500 bp past the gene end.

scie2g --a data/hsapiens_gene_ensembl-GRCh38.p13.csv --o data/output_file.csv --l2g data/test_H3K36me3.bed --t b --upflank 3000 --downflank 500 --m overlaps
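To make the effect of --upflank and --downflank in the overlaps method concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the scie2g source: gene_window and overlaps_gene are hypothetical names, and the treatment of reverse-strand genes (flanks swapped) and of interval ends as inclusive is assumed.

```python
from typing import Tuple


def gene_window(gene_start: int, gene_end: int, strand: int,
                upflank: int = 2500, downflank: int = 500) -> Tuple[int, int]:
    """Gene body extended by `upflank` upstream of the TSS and `downflank` past the gene end."""
    if strand >= 0:
        return gene_start - upflank, gene_end + downflank
    # Assumption: on the reverse strand, upstream lies beyond the end coordinate.
    return gene_start - downflank, gene_end + upflank


def overlaps_gene(peak_start: int, peak_end: int,
                  gene_start: int, gene_end: int, strand: int,
                  upflank: int = 2500, downflank: int = 500) -> bool:
    """True if the peak touches any part of the extended gene region."""
    lo, hi = gene_window(gene_start, gene_end, strand, upflank, downflank)
    return peak_start <= hi and peak_end >= lo


# Made-up forward-strand gene at 10000-20000.
# A peak inside the gene body is assigned regardless of the flanks.
print(overlaps_gene(15000, 15500, 10000, 20000, 1, upflank=3000))  # True
# A peak 2600-2800 bp upstream is only caught with the wider --upflank 3000.
print(overlaps_gene(7200, 7400, 10000, 20000, 1, upflank=3000))    # True
print(overlaps_gene(7200, 7400, 10000, 20000, 1, upflank=2500))    # False
# A peak more than 500 bp past the gene end is not assigned.
print(overlaps_gene(20600, 20700, 10000, 20000, 1, upflank=3000))  # False
```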

Annotate promoter region peaks

Here we read in the file test_H3K27me3.bed and annotate peaks that fall in the promoter region of genes annotated in hsapiens_gene_ensembl-GRCh38.p13.csv (generated using sci-biomart). Genes are assigned to a peak if the peak overlaps the region from 2500 bp upstream of the TSS to 500 bp into the gene body. The peak may overlap any part of this region to be assigned to the gene.

scie2g --a data/hsapiens_gene_ensembl-GRCh38.p13.csv --o data/output_file.csv --l2g data/test_H3K27me3.bed --t b --upflank 2500 --overlap 500 --m in_promoter

Annotate DMRseq regions (CSV) to genes

Here we have had to override two of the default column names: the chromosome column ‘chr’ is replaced with ‘seqnames’ using the flag --chr, and the value column is set to ‘stat’ using the flag --value.

scie2g --a data/hsapiens_gene_ensembl-GRCh38.p13.csv --o data/output_file.csv --l2g data/test_dmrseq.csv --t d --upflank 2500 --m overlaps --chr seqnames --value stat
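Before running a CSV through scie2g it can help to confirm which column names need overriding. The sketch below is a hypothetical pre-flight check, not part of scie2g: the function missing_columns and the sample rows are made up for illustration, and the header is only a plausible DMRseq-style layout.

```python
import csv
import io
from typing import List

# Made-up rows with a DMRseq-style header (illustrative only).
SAMPLE_DMRSEQ_CSV = """seqnames,start,end,width,stat,pval,qval
chr1,1000,1500,501,-4.2,0.001,0.01
chr2,2000,2600,601,3.1,0.004,0.03
"""


def missing_columns(csv_text: str, chr_col: str, start_col: str,
                    end_col: str, value_col: str) -> List[str]:
    """Return the requested column names that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [c for c in (chr_col, start_col, end_col, value_col) if c not in header]


# With the default chromosome column name "chr", the header does not match.
print(missing_columns(SAMPLE_DMRSEQ_CSV, "chr", "start", "end", "stat"))       # ['chr']
# Using the names passed via --chr seqnames --value stat, nothing is missing.
print(missing_columns(SAMPLE_DMRSEQ_CSV, "seqnames", "start", "end", "stat"))  # []
```

Any name reported as missing is a candidate for one of the --chr, --start, --end or --value overrides.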

Annotate MethylKit DMCs (CSV) to genes

scie2g --a data/hsapiens_gene_ensembl-GRCh38.p13.csv --o data/output_file.csv --l2g data/test_methyl.csv --t d --upflank 2500 --m overlaps --value meth.diff



usage: scie2g [-h] [--a A] [--o O] [--b B] [--l2g L2G] [--t T] [--upflank UPFLANK] [--downflank DOWNFLANK] [--overlap OVERLAP] [--m M] [--chr CHR] [--start START] [--end END] [--value VALUE] [--hdr HDR] [--chridx CHRIDX] [--startidx STARTIDX] [--endidx ENDIDX] [--valueidx VALUEIDX] [--hdridx HDRIDX] [--hdrlbl HDRLBL] [--gchr GCHR] [--gstart GSTART]
              [--gend GEND] [--gdir GDIR] [--gname GNAME]

Named Arguments


--a
Annotation with the gene locations

--o
Output file (csv)

Default: “l2g_outputfile.csv”

--b
Output file (bed)

Default: “l2g_outputfile.bed”

--l2g
Input file to run scie2g on

--t
The input file type: d=CSV, b=Bed

Default: “b”


--upflank
Maximum distance upstream from the TSS, used by both overlaps and in_promoter

Default: 2500

--downflank
Maximum distance downstream from the gene end, used only by overlaps

Default: 500

--overlap
Overlap with the gene body, used only by in_promoter

Default: 500

--m
Overlap method: overlaps or in_promoter

Default: “in_promoter”


--chr
CSV only: name of your chromosome column

Default: “chr”

--start
CSV only: name of your start column

Default: “start”

--end
CSV only: name of your end column

Default: “end”

--value
CSV only: name of your value column

--hdr
CSV only: comma-separated list of other columns you want to include in the output, e.g. “stat,pvalue”

Default: “”


--chridx
BED only: index of your chromosome column

Default: 0

--startidx
BED only: index of your start column

Default: 1

--endidx
BED only: index of your end column

Default: 2

--valueidx
BED only: index of your value column

Default: 7

--hdridx
BED only: comma-separated list of column indices

Default: “0,1,2,3,6,8”

--hdrlbl
BED only: comma-separated list of human-readable header labels written to your output csv file

Default: "chr","start","end","peak_name","signal","qvalue"


--gchr
Position of the chromosome column in the annotation file

Default: 2

--gstart
Position of the gene start column in the annotation file

Default: 3

--gend
Position of the gene end column in the annotation file

Default: 4

--gdir
Position of the gene direction (strand) column in the annotation file

Default: 5

--gname
Position of the gene name column in the annotation file

Default: 0
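The BED index options above pair up positionally: each index in --hdridx is labelled with the entry at the same position in --hdrlbl, and --valueidx picks the value column. The Python sketch below illustrates that pairing on a made-up line; select_columns is a hypothetical helper, not a scie2g function. The sample line follows a narrowPeak-style layout (signal in column 6, p-value in column 7, q-value in column 8), which the defaults appear to match, though that correspondence is an assumption.

```python
from typing import Dict, List, Tuple


def select_columns(bed_line: str,
                   hdridx: str = "0,1,2,3,6,8",
                   hdrlbl: str = '"chr","start","end","peak_name","signal","qvalue"',
                   valueidx: int = 7) -> Tuple[Dict[str, str], str]:
    """Label the selected tab-separated BED fields and pull out the value column."""
    fields: List[str] = bed_line.rstrip("\n").split("\t")
    indices = [int(i) for i in hdridx.split(",")]
    labels = [label.strip().strip('"') for label in hdrlbl.split(",")]
    if len(indices) != len(labels):
        raise ValueError("hdridx and hdrlbl must list the same number of columns")
    row = {label: fields[i] for label, i in zip(labels, indices)}
    return row, fields[valueidx]


# Made-up narrowPeak-style line (illustrative only).
line = "chr1\t1000\t1500\tpeak_1\t250\t.\t12.5\t8.1\t6.3\t120\n"
row, value = select_columns(line)
print(row["peak_name"], row["signal"], row["qvalue"])  # peak_1 12.5 6.3
print(value)                                           # 8.1
```

If a BED file has a different layout, the same pairing applies: change the index list and the label list together so that they stay the same length.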