Notebook

Example merging DMCs and DMRs

This notebook provides an example on merging files from Methylkit (DMC’s) and DMRs from Methylsig.

[1]:
import pandas as pd
from sciutil import SciUtil
from scidmg import SciDMG
import os
data_dir = f'{os.path.dirname(os.path.realpath("."))}/../../tests/data/'
dmr_file = f'{data_dir}methylSig_prom.csv'
dmc_file = f'{data_dir}methylKit_DMC.csv'
percent_meth_file = f'{data_dir}methylKit_percentMeth.csv'

u = SciUtil()

Inspect files

Just to give you an idea of what the files as input are we print out the head of each input file.

Note that each file has the same column name for the gene symbol external_gene_name. This will be used to merge the files.

[2]:
u.dp(['DMR file header:'])
print(pd.read_csv(dmr_file))

u.dp(['DMC file header:'])
print(pd.read_csv(dmc_file))

u.dp(['Percentage methylation file header:'])
print(pd.read_csv(percent_meth_file))

--------------------------------------------------------------------------------
                               DMR file header:                                      
--------------------------------------------------------------------------------
   idx seqnames  start  end  gene_idx  meth_diff     uid  pvalue    fdr  \
0    1     chr1     10  100         1       -0.6   dmr_1   0.001  0.010
1    2     chr1    123  190         2       -0.1   dmr_2   0.010  0.300
2    3     chr1      3  200         3        0.8   dmr_3   0.010  0.010
3    4     chr1    123  312         4       -0.1   dmr_4   0.010  0.010
4    5     chr1    123  190         2       -0.1   dmr_5   0.010  0.001
5    6     chr1    123  190         2        0.3   dmr_6   0.010  0.010
6    7     chr1    123  190         2       -0.2   dmr_7   0.010  0.010
7    3     chr1      3  200         3       -0.9  dmr_10   0.010  0.010

   ensembl_gene_id external_gene_name chromosome_name  start_position  \
0  ENSG00000278267         AC114488.2            chr1               1
1  ENSG00000116273              PHF13            chrX               2
2  ENSG00000249087         ZNF436-AS1            chr3               3
3  ENSG00000278267         AC114488.2            chr1               1
4  ENSG00000116273              HOXA1            chr4               2
5  ENSG00000116273              HOXA1            chr4               2
6  ENSG00000116273              HOXB9            chr8               2
7  ENSG00000249087         ZNF436-AS1            chr3               3

   end_position  strand
0            30      -1
1            12       1
2           431       1
3           231       1
4            12       1
5            12       1
6            12       1
7           431       1
--------------------------------------------------------------------------------
                               DMC file header:                                      
--------------------------------------------------------------------------------
   idx   chr  start  end  gene_idx  meth.diff     uid  pvalue  qvalue  \
0    1  chr1      1    1         1       -0.6   dmc_1   0.001   0.010
1    2  chrX      1    1         2       -0.1   dmc_2   0.010   0.300
2    3  chr3      1    1         3        0.1   dmc_3   0.010   0.010
3    4  chr1      1    1         4       -0.1   dmc_4   0.010   0.010
4    5  chr1      1    1         5       -0.3   dmc_5   0.010   0.010
5    6  chr1      1    1         6        0.0   dmc_6   0.010   0.010
6    7  chr1      1    1         7        2.0   dmc_7   0.010   0.010
7    5  chr1      1    1         9       -0.1   dmc_8   0.010   0.001
8    3  chr3      1    1         3       -0.7  dmc_10   0.010   0.010

   ensembl_gene_id external_gene_name chromosome_name  start_position  \
0  ENSG00000278267         AC114488.2            chr1               1
1  ENSG00000116273              PHF13            chrX               2
2  ENSG00000249087         ZNF436-AS1            chr3               3
3  ENSG00000278267         AC114488.2            chr1               1
4  ENSG00000278267         AC114488.2            chr1               2
5  ENSG00000278267         AC114488.2            chr1               3
6  ENSG00000278267         AC114488.2            chr1               4
7  ENSG00000116273              HOXA1            chr4               2
8  ENSG00000249087         ZNF436-AS1            chr3               3

   end_position  strand
0             1      -1
1             2       1
2             3       1
3             1       1
4             3       1
5             3       1
6             4       1
7            12       1
8             3       1
--------------------------------------------------------------------------------
                      Percentage methylation file header:                          
--------------------------------------------------------------------------------
   idx   chr  start  end  gene_idx     uid  WT_1  WT_2  WT_3  KO_1  KO_2  \
0    1  chr1      1    1         1   dmc_1  0.33  0.18  0.53  0.11  0.59
1    2  chrX      1    1         2   dmc_2  0.12  0.47  0.22  0.12  0.42
2    3  chr3      1    1         3   dmc_3  0.06  0.21  0.43  0.04  0.39
3    4  chr1      1    1         4   dmc_4  0.72  0.75  0.11  0.76  0.17
4    5  chr1      1    1         5   dmc_5  0.91  0.11  0.24  0.12  0.49
5    6  chr1      1    1         6   dmc_6  0.75  0.84  0.59  0.33  0.44
6    7  chr1      1    1         7   dmc_7  0.98  0.90  0.64  0.18  0.06
7    7  chr1      1    1         7   dmc_8  0.98  0.90  0.64  0.18  0.06
8    3  chr3      1    1         3  dmc_10  0.06  0.21  0.43  0.04  0.39

   KO_3  ensembl_gene_id external_gene_name chromosome_name  start_position  \
0  0.50  ENSG00000278267         AC114488.2            chr1               1
1  0.86  ENSG00000116273              PHF13            chrX               2
2  0.25  ENSG00000249087         ZNF436-AS1            chr3               3
3  0.89  ENSG00000278267         AC114488.2            chr1               1
4  0.23  ENSG00000278267         AC114488.2            chr1               2
5  0.73  ENSG00000278267         AC114488.2            chr1               3
6  0.51  ENSG00000278267         AC114488.2            chr1               4
7  0.51  ENSG00000116273              HOXA1            chr4               2
8  0.25  ENSG00000249087         ZNF436-AS1            chr3               3

   end_position  strand
0             1      -1
1             2       1
2             3       1
3             1       1
4             3       1
5             3       1
6             4       1
7            12       1
8             3       1

Override the default parameters

Here we just show how some of the parameters can be overridden.

For example, we have made this a bit stricter than the defaults and required a cutoff for DMC’s of 0.05.

We also make the DMRs have at least 70% of DMCs in agreement of direction.

[3]:
dmg = SciDMG(dmr_file, percent_meth_file, dmc_file,
           dmc_methdiff="meth.diff", dmc_padj="qvalue", dmc_uid="uid", dmc_padj_cutoff=0.05,
           dmr_methdiff="meth_diff", dmr_padj="fdr", dmr_uid="uid", dmr_padj_cutoff=0.1,
            min_perc_agreement=0.7, min_meth_diff=0.1, plot=True
        )
dmg.run()
dmg.print_stats()
--------------------------------------------------------------------------------
              Please run dmg.run() before plotting the volcanos.                  
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
              Please run dmg.run() before plotting the volcanos.                  
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
WARNING: Running merge_dmc_perc_meth which assumes that you have run the DMC analysis using MethylKit.     
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
                  Length of all merged methylation data:     16                         
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
            Length of merged methylation data grouped by region:     6                   
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
                Number of CpGs to keep based on the regions:         2                       
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Length of filtered methylation dataframe:    3
Number of genes with Methylation:       2       
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
                 Dropping any genes with disagreeing DMRs:   0                         
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
     Length of dataframe filtered to only keep top DMC mapped to genes:      2             
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
                               Printing stats:                                       
--------------------------------------------------------------------------------
Length grouped by DMRs 6
Number of CpGs to keep from grouped DMRs 2
Length of merged DMR and DMC 16
Length grouped by Genes 2
Number of Genes with DMRs that disagree in direction 0

Plot volcanos

We may want to inspect our files as well, so we can plkot a volcano of the DMcs and the DMRs.

[4]:
dmg.plot_dmc_volcano()
dmg.plot_dmr_volcano()

--------------------------------------------------------------------------------
No offset was provided, setting offset to be smallest value recorded in dataset:   0.01    
--------------------------------------------------------------------------------
../_images/examples_python_notebook_7_1.png
--------------------------------------------------------------------------------
No offset was provided, setting offset to be smallest value recorded in dataset:   0.01    
--------------------------------------------------------------------------------
../_images/examples_python_notebook_7_3.png