
SiRCle Regulatory Clustering Model



This notebook is meant as a guide for choosing the settings of the Signature Regulatory Clustering (SiRCle) model depending on the biological question of interest. The guide is divided into three main chapters that give an overview of the SiRCle method. Each chapter covers one of the main settings of the sircleRCM() functions (Regulatory Clustering Method): the “Background method (BG)”, the “Regulation Grouping (RG)” and the thresholds/cutoffs of the input data.

sircleRCM_MRP()
The original Regulatory Clustering Method (RCM), sircleRCM_MRP(), was developed to perform biologically meaningful clustering that integrates DNA methylation, mRNA and protein data at the gene level. Based on the differential analysis (e.g. Tumour versus Normal), we define three states for each gene and data layer: positive, negative or unchanged. Protein expression is the data layer closest to the phenotype, yet it also has the lowest coverage. For this reason, “No Change” is subdivided into four states for the protein data layer: “Not Detected”, “Not Significant”, “Significant negative” and “Significant positive”. For more details and explanation, see the publication by Mora & Schmidt et al. (Mora et al. 2024).

Fig. 1: Overview of SiRCle Regulatory Clustering. a) Alluvial plot showing the ordered 3-state, 3-state and 6-state transitions between the three data layers, a total of 54 possible “flows”. b) Biological meaning conveyed by the flows between the data layers.


sircleRCM_RP()
Often, data are available only for the RNA and protein data layers. For this case we also offer the sircleRCM_RP() function, which uses only RNAseq and proteomics as input. The principle stays the same. The difference is that there is a single 3-state-6-state transition between the two data layers, a total of 18 possible “flows”.

Fig. 2: Overview of SiRCle Regulatory Clustering without the DNA-Methylation data layer.



sircleRCM_2Cond()
Lastly, we provide the sircleRCM_2Cond() function so that SiRCle RCM can be applied to any two data layers of choice. The input can be either two different data layers or the same data layer measured under two conditions (e.g. RNAseq comparing KO versus WT in hypoxia and in normoxia). The functions above take into account that protein expression is the data layer closest to the phenotype but has the lowest coverage. Here we want to give the user full flexibility, so both inputs are split into six states. This gives a 6-state-6-state transition between the data layers, a total of 36 possible “flows”.
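The flow counts follow from enumerating every ordered combination of states. The short Python sketch below does this with itertools.product. It only reproduces the arithmetic (3 × 3 × 6 = 54, 3 × 6 = 18, 6 × 6 = 36). The state lists are copied from the text. Using the same six states for both sircleRCM_2Cond() inputs is an assumption, and none of this is the package's R code.

```python
from itertools import product

# States per data layer, as listed in the text.
METHYLATION = ["Hypermethylation", "Hypomethylation", "No Change"]
RNA = ["UP", "DOWN", "No Change"]
# Six states: "No Change" is split into four sub-states.
SIX_STATE = ["UP", "DOWN", "Not Detected", "Not Significant",
             "Significant negative", "Significant positive"]


def flows(*layers):
    """Return every ordered combination of states across the given layers."""
    return list(product(*layers))


mrp_flows = flows(METHYLATION, RNA, SIX_STATE)   # sircleRCM_MRP(): 3 x 3 x 6
rp_flows = flows(RNA, SIX_STATE)                 # sircleRCM_RP(): 3 x 6
two_cond_flows = flows(SIX_STATE, SIX_STATE)     # sircleRCM_2Cond(): 6 x 6 (assumed states)

for name, f in [("MRP", mrp_flows), ("RP", rp_flows), ("2Cond", two_cond_flows)]:
    print(name, len(f))   # MRP 54, RP 18, 2Cond 36
```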

In the next chapters we will discuss how to choose the “Background method (BG)”, the “Regulation Grouping (RG)” and the thresholds/cutoffs of the input data.

1. Background method (BG)

The Background setting defines which features are considered for the clusters. For example, with two data layers (RNAseq and proteomics), you could include only the features (= genes) detected in both data layers and remove all other features.

Three data layers: Methylation, RNAseq and Proteomics

Background methods 1.1. to 1.7. are listed broadly from most restrictive to least restrictive, so the chosen background method determines how many genes enter the sircleRCM input. Proteomics has the lowest coverage of the three input data layers (Proteomics, RNAseq and DNA-methylation). The proteomics data layer therefore has the fewest detected genes (= features) and has the biggest impact on the number of genes included in the sircleRCM input.

1.1. P&M&R
Most stringent background setting. It leads to a small number of genes, which depends mainly on the proteins detected (proteomics data layer), because proteomics has the lowest coverage of the three omics.

1.2. P&R
This background requires a gene to be detected in both Proteomics and RNAseq. It is similar to 1.1 but less stringent, because the gene does not need to be detected in DNA-methylation.

1.3. P|(M&R)
Focus is on protein expression. P&M&R (1.1.) and P&R (1.2.) are unlikely to exclude many proteins, but a protein is still excluded if it is not detected in the RNAseq and/or DNA-methylation data layer. This background, P|(M&R) (1.3.), therefore includes all detected proteins (= all features of the proteomics data layer). It also includes genes that are not detected on the protein layer (proteomics data) but are detected on both the mRNA layer (RNAseq data) and the DNA-methylation layer (DNA-methylation data).

1.4. (P&M)|(P&R)
This background requires a gene to be detected in Proteomics and in at least one of the other two data layers. It is similar to 1.1 but less stringent, because the gene does not need to be detected in both RNAseq and DNA-methylation.

1.5. (P&M)|(P&R)|(M&R)
This background requires a gene to be detected in at least two of the three data layers. The focus therefore moves away from the protein layer (proteomics data layer). Compared with P&M&R (1.1.) and P&R (1.2.), more genes that are not detected on the proteomics data layer are included.

1.6. P|R
A gene is included in the SiRCle input if it has been detected in at least one of the two data layers (proteomics or RNAseq).

1.7. P|M|R
Least stringent background method: a gene is included in the SiRCle input if it has been detected in at least one of the three data layers (proteomics, RNAseq or DNA-methylation).
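The seven backgrounds are set operations on the genes detected in each layer. The Python sketch below uses three small, made-up detection sets (hypothetical gene IDs G1 to G7, not real data) to show which genes each background keeps. It also shows one property that the list order hides. As sets, (P&M)|(P&R) (1.4.) is contained in (P&M)|(P&R)|(M&R) (1.5.), which is in turn contained in P|(M&R) (1.3.). Of these three, P|(M&R) is therefore the least restrictive.

```python
# Hypothetical detection sets (genes detected per data layer); not real data.
P = {"G1", "G2", "G3", "G7"}   # proteomics (lowest coverage)
M = {"G1", "G2", "G4", "G5"}   # DNA-methylation
R = {"G1", "G3", "G4", "G6"}   # RNAseq


def backgrounds(P, M, R):
    """Return the gene set kept by each background method 1.1 to 1.7."""
    return {
        "P&M&R": P & M & R,
        "P&R": P & R,
        "P|(M&R)": P | (M & R),
        "(P&M)|(P&R)": (P & M) | (P & R),
        "(P&M)|(P&R)|(M&R)": (P & M) | (P & R) | (M & R),
        "P|R": P | R,
        "P|M|R": P | M | R,
    }


bg = backgrounds(P, M, R)
for method, genes in bg.items():
    print(f"{method:<20} {len(genes)} genes: {sorted(genes)}")

# Set inclusion between backgrounds 1.4, 1.5 and 1.3.
assert bg["(P&M)|(P&R)"] <= bg["(P&M)|(P&R)|(M&R)"] <= bg["P|(M&R)"]
```

In this toy example G7 is detected only in proteomics. It is kept by P|(M&R) but dropped by both (P&M)|(P&R) and (P&M)|(P&R)|(M&R).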

Two data layers: RNAseq and Proteomics

As discussed above, often only RNAseq and proteomics are performed. The sircleRCM_RP() function therefore focuses on these two data layers and does not require DNA-methylation data. Its background methods, from most restrictive (1.1.) to least restrictive (1.4.), are:

1.1. P&R
Most stringent background setting. It leads to a small number of genes, which depends mainly on the proteins detected (proteomics data layer), because proteomics has lower coverage than RNAseq.

1.2. P
Focus is on protein expression. P&R (1.1.) is unlikely to exclude many proteins, but a protein is still excluded if it is not detected in the RNAseq data layer. This background therefore includes all detected proteins (= all features of the proteomics data layer).

1.3. R
Focus is on mRNA expression. P&R (1.1.) excludes many genes of the mRNA data layer, so this background includes all detected mRNAs (= all features of the RNAseq data layer).

1.4. P|R
Least stringent background method: a gene is included in the SiRCle input if it has been detected in at least one of the two data layers (proteomics or RNAseq).

Two data sets

The input can be either two different data layers or two different conditions of the same data layer. The background methods, from most restrictive (1.1.) to least restrictive (1.4.), are:

1.1. C1&C2
Most stringent background setting, which leads to a small number of genes.

1.2. C1
Focus is on the gene expression of Condition 1 (C1).

1.3. C2
Focus is on the gene expression of Condition 2 (C2).

1.4. C1|C2
Least stringent background method: a gene is included in the SiRCle input if it has been detected in at least one of the two conditions.
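The two-layer backgrounds (sircleRCM_RP()) and the two-condition backgrounds (sircleRCM_2Cond()) have the same structure: the intersection, either input alone, or the union. The minimal Python sketch below uses hypothetical gene sets. The helper name two_input_backgrounds is made up for illustration and is not a package function.

```python
def two_input_backgrounds(a: set, b: set, name_a: str, name_b: str) -> dict:
    """Gene sets kept by backgrounds 1.1 to 1.4 for two inputs (hypothetical helper)."""
    return {
        f"{name_a}&{name_b}": a & b,   # 1.1: detected in both inputs
        name_a: set(a),                # 1.2: everything detected in the first input
        name_b: set(b),                # 1.3: everything detected in the second input
        f"{name_a}|{name_b}": a | b,   # 1.4: detected in at least one input
    }


# Hypothetical detection sets for proteomics (P) and RNAseq (R).
proteins = {"G1", "G2", "G3"}
mrnas = {"G1", "G3", "G4", "G5", "G6"}
rp = two_input_backgrounds(proteins, mrnas, "P", "R")

# The same helper covers two conditions (C1, C2) of one data layer.
c1 = {"G1", "G2", "G4"}
c2 = {"G2", "G4", "G5"}
cond = two_input_backgrounds(c1, c2, "C1", "C2")

for bgs in (rp, cond):
    print({name: len(genes) for name, genes in bgs.items()})
```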

2. Input data threshold

Here we can set two different thresholds. The first is for differential expression: for the RNAseq and proteomics data layers this is mostly the Log2FC, and for DNA-methylation the Differentially Methylated Regions (DMR). The second is for significance (e.g. p.adj). Together they define whether a feature (e.g. a gene) is assigned to:

1. “UP”, which means a feature is significantly up-regulated in the underlying comparison.

2. “DOWN”, which means a feature is significantly down-regulated in the underlying comparison.

3. “No Change”, which means a feature does not change significantly in the underlying comparison and/or is not defined as up-regulated/down-regulated based on the Log2FC threshold chosen.
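The three-way assignment can be written as a small function. This Python sketch is not the package code. The cutoffs (|Log2FC| > 1, p.adj < 0.05) are illustrative values chosen here, and the strict comparisons are an assumption. For DNA-methylation, “UP” and “DOWN” would correspond to Hypermethylation and Hypomethylation.

```python
def three_state(log2fc: float, padj: float,
                fc_cutoff: float = 1.0, padj_cutoff: float = 0.05) -> str:
    """Assign "UP", "DOWN" or "No Change" (illustrative cutoffs, strict comparisons assumed)."""
    significant = padj < padj_cutoff
    if significant and log2fc > fc_cutoff:
        return "UP"
    if significant and log2fc < -fc_cutoff:
        return "DOWN"
    # Not significant, or significant but below the fold-change cutoff.
    return "No Change"


print(three_state(2.3, 0.001))   # UP
print(three_state(-1.7, 0.01))   # DOWN
print(three_state(-0.8, 0.01))   # No Change (below the Log2FC cutoff)
print(three_state(1.5, 0.20))    # No Change (not significant)
```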


Given that the protein expression is the data layer closest to the phenotype, yet at the same time has the lowest coverage, “No Change” is subdivided into four states for the protein data layer:

1. “Not Detected”, which means a feature is not detected in the underlying data layer.

2. “Not Significant”, which means a feature is not significant in the underlying comparison.

3. “Significant negative”, which means a feature is significant in the underlying comparison and the differential expression is negative, yet does not meet the threshold set for “DOWN” (e.g. Log2FC < -1 = “DOWN” and we have a significant Log2FC = -0.8).

4. “Significant positive”, which means a feature is significant in the underlying comparison and the differential expression is positive, yet does not meet the threshold set for “UP” (e.g. Log2FC > 1 = “UP” and we have a significant Log2FC = 0.8).
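For the protein layer, the same cutoffs split “No Change” into finer states. The Python sketch below works under these assumptions:
- a missing value (None or NaN) means the protein was not detected;
- the cutoffs are illustrative;
- a significant Log2FC of exactly 0 is placed in “Significant negative”.

The direction of the differential expression separates “Significant positive” (positive Log2FC below the “UP” cutoff) from “Significant negative” (negative Log2FC above the “DOWN” cutoff).

```python
import math
from typing import Optional


def protein_state(log2fc: Optional[float], padj: Optional[float],
                  fc_cutoff: float = 1.0, padj_cutoff: float = 0.05) -> str:
    """Assign one of the six protein states (illustrative sketch)."""
    # Assumption: a missing measurement means the protein was not detected.
    if log2fc is None or padj is None or math.isnan(log2fc) or math.isnan(padj):
        return "Not Detected"
    if padj >= padj_cutoff:
        return "Not Significant"
    if log2fc > fc_cutoff:
        return "UP"
    if log2fc < -fc_cutoff:
        return "DOWN"
    # Significant, but the fold change does not pass the cutoff: keep the direction.
    # A Log2FC of exactly 0 falls into "Significant negative" here.
    return "Significant positive" if log2fc > 0 else "Significant negative"


for fc, p in [(0.8, 0.01), (-0.8, 0.01), (2.0, 0.001), (1.5, 0.3), (None, None)]:
    print(fc, p, protein_state(fc, p))
```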


This assignment is done individually for each data layer (proteomics, RNAseq, DNA-methylation) and determines which SiRCle cluster a gene is sorted into. This is discussed in detail below (see “3. Regulation Grouping (RG)”).

3. Regulation Grouping (RG)

Based on the background method (BG) we have defined the genes that are included in the SiRCle input and based on our cut-offs (Log2FC and p-adjusted value) we defined the change (“UP”, “DOWN”, “No Change”) a gene is assigned to.
The results of the SiRCle RCM include different Regulation Groupings (RG). Each RG defines the SiRCle cluster a gene is assigned to and thereby summarizes the flows into a smaller number of SiRCle clusters. In brief, each SiRCle cluster reflects the regulation(s) that ultimately result in the protein expression. The cluster names reflect the biological meaning of the regulation, namely “Enhancing (+1)” or “Suppressing (-1)” (see Fig. 1b). A regulation can happen between DNA methylation and mRNA expression (MR) and/or between mRNA expression and protein expression (RP), and each change of regulation between the data layers defines the SiRCle cluster’s name. For example, suppose a gene is hypermethylated, has a decrease in mRNA expression and displays a decrease in protein expression. We can then likely conclude that dysregulation first occurred on the DNA methylation layer, meaning this gene is suppressed via Methylation-Driven Suppression (MDS).

Fig. 1b: Biological meaning conveyed by the flows between the data layers.


In Figure 1b, we see three different plots and in each plot we have 9 flows connecting the three data layers (DNA methylation, mRNA expression and Protein expression). This gives a total of 27 possible flows.
Our regulatory rules aim to reflect the regulation(s) that ultimately result in the protein expression. It is therefore essential to deal with proteins assigned to “No Change”, since “No Change” on the protein level can mean:
1. “Not Detected”
2. “Not Significant”
3. “Significant negative”
4. “Significant positive”

Depending on the “RG” chosen by the user, different granularities of the regulations are considered, which can lead to SiRCle cluster names reflecting one (MR or RP) or two regulations (MR+RP) (see Table 1).
3.1. RG1: All
Column RG1_All includes all 54 possible flows (= the ordered 3-state, 3-state and 6-state transitions between the three data layers) as the output. It does not summarize multiple flows into a SiRCle cluster.

3.2. RG2: Focus Changes
Column RG2_Changes summarizes the 54 possible flows into 10 SiRCle clusters, taking into account any change between the data layers (Table 1). The focus is on changes, so even a small change between the data layers is reflected by including it in the SiRCle cluster.

3.3. RG3: Focus Translation
Column RG3_Protein only takes into account the regulation between mRNA expression and protein expression (RP). If there has also been a regulation between the DNA-methylation layer and the RNA layer, it is ignored, and the genes are sorted based only on the secondary regulation at the translational level.

3.4. RG4: Focus Detection
Column RG4_Detection does not take into account changes between the mRNA and protein layers if a protein has not been detected. In this case, only the regulation between DNA-methylation and RNA is used to define the SiRCle cluster. If the chosen Background method requires the protein to be detected (e.g. P&M&R), Column RG4_Detection contains the same assignments as RG3_Protein, because the flow through “Not Detected” cannot occur.

Table 1: Regulatory Grouping (RG) 1-4 for sircleRCM_MRP() function
| Methylation | RNA-seq | Proteomics | Proteomics_Detection | RG1_All | RG2_Changes | RG3_Protein | RG4_Detection |
|---|---|---|---|---|---|---|---|
| Hypermethylation | DOWN | DOWN | DOWN | Hypermethylation + DOWN + DOWN | MDS | MDS | MDS |
| Hypomethylation | DOWN | DOWN | DOWN | Hypomethylation + DOWN + DOWN | TPDS | TPDS | TPDS |
| No Change | DOWN | DOWN | DOWN | No Change + DOWN + DOWN | TPDS | TPDS | TPDS |
| Hypermethylation | No Change | DOWN | DOWN | Hypermethylation + No Change + DOWN | TMDS | TMDS | TMDS |
| Hypomethylation | No Change | DOWN | DOWN | Hypomethylation + No Change + DOWN | TMDS | TMDS | TMDS |
| No Change | No Change | DOWN | DOWN | No Change + No Change + DOWN | TMDS | TMDS | TMDS |
| Hypermethylation | UP | DOWN | DOWN | Hypermethylation + UP + DOWN | TPDE+TMDS | TMDS | TPDE+TMDS |
| Hypomethylation | UP | DOWN | DOWN | Hypomethylation + UP + DOWN | MDE+TMDS | TMDS | MDE+TMDS |
| No Change | UP | DOWN | DOWN | No Change + UP + DOWN | TPDE+TMDS | TMDS | TPDE+TMDS |
| No Change | No Change | No Change | No Change | No Change + No Change + No Change | None | None | None |
| Hypermethylation | DOWN | No Change | Not detected | Hypermethylation + DOWN + Not detected | MDS+TMDE | MDS | MDS |
| Hypomethylation | DOWN | No Change | Not detected | Hypomethylation + DOWN + Not detected | TPDS+TMDE | TPDS | TPDS |
| No Change | DOWN | No Change | Not detected | No Change + DOWN + Not detected | TPDS+TMDE | TPDS | TPDS |
| Hypermethylation | No Change | No Change | Not detected | Hypermethylation + No Change + Not detected | None | None | None |
| Hypomethylation | No Change | No Change | Not detected | Hypomethylation + No Change + Not detected | None | None | None |
| No Change | No Change | No Change | Not detected | No Change + No Change + Not detected | None | None | None |
| Hypermethylation | UP | No Change | Not detected | Hypermethylation + UP + Not detected | TPDE+TMDS | TPDE | TPDE |
| Hypomethylation | UP | No Change | Not detected | Hypomethylation + UP + Not detected | MDE+TMDS | MDE | MDE |
| No Change | UP | No Change | Not detected | No Change + UP + Not detected | TPDE+TMDS | TPDE | TPDE |
| Hypermethylation | DOWN | No Change | Not significant | Hypermethylation + DOWN + Not significant | MDS+TMDE | None | MDS+TMDE |
| Hypomethylation | DOWN | No Change | Not significant | Hypomethylation + DOWN + Not significant | TPDS+TMDE | None | TPDS+TMDE |
| No Change | DOWN | No Change | Not significant | No Change + DOWN + Not significant | TPDS+TMDE | None | TPDS+TMDE |
| Hypermethylation | No Change | No Change | Not significant | Hypermethylation + No Change + Not significant | None | None | None |
| Hypomethylation | No Change | No Change | Not significant | Hypomethylation + No Change + Not significant | None | None | None |
| No Change | No Change | No Change | Not significant | No Change + No Change + Not significant | TPDS+TMDE | None | None |
| Hypermethylation | UP | No Change | Not significant | Hypermethylation + UP + Not significant | TPDE+TMDS | None | TPDE+TMDS |
| Hypomethylation | UP | No Change | Not significant | Hypomethylation + UP + Not significant | MDE+TMDS | None | MDE+TMDS |
| No Change | UP | No Change | Not significant | No Change + UP + Not significant | TPDE+TMDS | None | TPDE+TMDS |
| Hypermethylation | DOWN | No Change | Significant negative | Hypermethylation + DOWN + Significant negative | MDS+TMDE | MDS | MDS |
| Hypomethylation | DOWN | No Change | Significant negative | Hypomethylation + DOWN + Significant negative | TPDS+TMDE | TPDS | TPDS |
| No Change | DOWN | No Change | Significant negative | No Change + DOWN + Significant negative | TPDS+TMDE | TPDS | TPDS |
| Hypermethylation | No Change | No Change | Significant negative | Hypermethylation + No Change + Significant negative | None | None | None |
| Hypomethylation | No Change | No Change | Significant negative | Hypomethylation + No Change + Significant negative | None | None | None |
| Hypermethylation | UP | No Change | Significant negative | Hypermethylation + UP + Significant negative | TPDE+TMDS | TMDS | TPDE+TMDS |
| Hypomethylation | UP | No Change | Significant negative | Hypomethylation + UP + Significant negative | MDE+TMDS | TMDS | MDE+TMDS |
| No Change | UP | No Change | Significant negative | No Change + UP + Significant negative | TPDE+TMDS | TMDS | TPDE+TMDS |
| Hypermethylation | DOWN | No Change | Significant positive | Hypermethylation + DOWN + Significant positive | MDS+TMDE | TMDE | MDS+TMDE |
| Hypomethylation | DOWN | No Change | Significant positive | Hypomethylation + DOWN + Significant positive | TPDS+TMDE | TMDE | TPDS+TMDE |
| No Change | DOWN | No Change | Significant positive | No Change + DOWN + Significant positive | TPDS+TMDE | TMDE | TPDS+TMDE |
| Hypermethylation | No Change | No Change | Significant positive | Hypermethylation + No Change + Significant positive | None | None | None |
| Hypomethylation | No Change | No Change | Significant positive | Hypomethylation + No Change + Significant positive | None | None | None |
| Hypermethylation | UP | No Change | Significant positive | Hypermethylation + UP + Significant positive | TPDE+TMDS | TPDE | TPDE |
| Hypomethylation | UP | No Change | Significant positive | Hypomethylation + UP + Significant positive | MDE+TMDS | MDE | MDE |
| No Change | UP | No Change | Significant positive | No Change + UP + Significant positive | TPDE+TMDS | TPDE | TPDE |
| Hypermethylation | DOWN | UP | UP | Hypermethylation + DOWN + UP | MDS+TMDE | TMDE | MDS+TMDE |
| Hypomethylation | DOWN | UP | UP | Hypomethylation + DOWN + UP | TPDS+TMDE | TMDE | TPDS+TMDE |
| No Change | DOWN | UP | UP | No Change + DOWN + UP | TPDS+TMDE | TMDE | TPDS+TMDE |
| Hypermethylation | No Change | UP | UP | Hypermethylation + No Change + UP | TMDE | TMDE | TMDE |
| Hypomethylation | No Change | UP | UP | Hypomethylation + No Change + UP | TMDE | TMDE | TMDE |
| No Change | No Change | UP | UP | No Change + No Change + UP | TMDE | TMDE | TMDE |
| Hypermethylation | UP | UP | UP | Hypermethylation + UP + UP | TPDE | TPDE | TPDE |
| Hypomethylation | UP | UP | UP | Hypomethylation + UP + UP | MDE | MDE | MDE |
| No Change | UP | UP | UP | No Change + UP + UP | TPDE | TPDE | TPDE |
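Each Regulation Grouping is effectively a lookup from a gene's observed states to a cluster name. The Python sketch below hard-codes only five rows copied from Table 1; a real lookup would need every row. The dictionary layout and the function name sircle_cluster are illustrative and not part of the package. RG1_All is built by joining the three states, matching the RG1_All column.

```python
# Five rows copied from Table 1, keyed by (Methylation, RNA-seq, Proteomics_Detection).
# Values are (RG2_Changes, RG3_Protein, RG4_Detection); "None" is the table's cluster label.
TABLE1_SUBSET = {
    ("Hypermethylation", "DOWN", "DOWN"): ("MDS", "MDS", "MDS"),
    ("Hypermethylation", "UP", "DOWN"): ("TPDE+TMDS", "TMDS", "TPDE+TMDS"),
    ("Hypomethylation", "UP", "Not detected"): ("MDE+TMDS", "MDE", "MDE"),
    ("No Change", "DOWN", "Not significant"): ("TPDS+TMDE", "None", "TPDS+TMDE"),
    ("Hypomethylation", "UP", "UP"): ("MDE", "MDE", "MDE"),
}
GROUPING_INDEX = {"RG2_Changes": 0, "RG3_Protein": 1, "RG4_Detection": 2}


def sircle_cluster(methylation: str, rna: str, protein_detection: str,
                   grouping: str = "RG2_Changes") -> str:
    """Look up the SiRCle cluster for one gene (hypothetical helper, partial table)."""
    if grouping == "RG1_All":
        # RG1 keeps every flow: the label is the three states joined with " + ".
        return f"{methylation} + {rna} + {protein_detection}"
    return TABLE1_SUBSET[(methylation, rna, protein_detection)][GROUPING_INDEX[grouping]]


for rg in ("RG1_All", "RG2_Changes", "RG3_Protein", "RG4_Detection"):
    print(rg, sircle_cluster("Hypermethylation", "UP", "DOWN", rg))
```

The Hypermethylation + UP + DOWN flow shows how the groupings differ. RG2_Changes and RG4_Detection keep both regulations (TPDE+TMDS), while RG3_Protein keeps only the mRNA-to-protein regulation (TMDS).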


Lastly, if only RNAseq and proteomics data are available and sircleRCM_RP() is used, all of the above still applies. The only difference is that the flows and resulting SiRCle clusters that include DNA-methylation are not available.
sircleRCM_2Cond() works with any two input data sets and provides three groupings:
- RG1_All;
- RG2_Significant, which takes into account genes that are significant (UP, DOWN, Significant positive, Significant negative);
- RG3_SignificantChange, which only takes into account genes with significant changes (UP, DOWN).



Session information

## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.4.1 (2024-06-14 ucrt)
##  os       Windows 10 x64 (build 19045)
##  system   x86_64, mingw32
##  ui       RTerm
##  language en
##  collate  English_United Kingdom.utf8
##  ctype    English_United Kingdom.utf8
##  tz       Europe/Berlin
##  date     2025-01-15
##  pandoc   3.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package     * version date (UTC) lib source
##  bslib         0.8.0   2024-07-29 [1] CRAN (R 4.4.1)
##  cachem        1.1.0   2024-05-16 [1] CRAN (R 4.4.1)
##  cli           3.6.3   2024-06-21 [1] CRAN (R 4.4.1)
##  colorspace    2.1-1   2024-07-26 [1] CRAN (R 4.4.1)
##  desc          1.4.3   2023-12-10 [1] CRAN (R 4.4.1)
##  digest        0.6.37  2024-08-19 [1] CRAN (R 4.4.1)
##  dplyr       * 1.1.4   2023-11-17 [1] CRAN (R 4.4.1)
##  evaluate      1.0.1   2024-10-10 [1] CRAN (R 4.4.2)
##  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.4.1)
##  forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.4.1)
##  fs            1.6.4   2024-04-25 [1] CRAN (R 4.4.1)
##  generics      0.1.3   2022-07-05 [1] CRAN (R 4.4.1)
##  ggplot2     * 3.5.1   2024-04-23 [1] CRAN (R 4.4.1)
##  glue          1.7.0   2024-01-09 [1] CRAN (R 4.4.1)
##  gtable        0.3.6   2024-10-25 [1] CRAN (R 4.4.2)
##  hms           1.1.3   2023-03-21 [1] CRAN (R 4.4.1)
##  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1)
##  htmlwidgets   1.6.4   2023-12-06 [1] CRAN (R 4.4.1)
##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.4.1)
##  jsonlite      1.8.9   2024-09-20 [1] CRAN (R 4.4.1)
##  kableExtra  * 1.4.0   2024-01-24 [1] CRAN (R 4.4.1)
##  knitr         1.49    2024-11-08 [1] CRAN (R 4.4.2)
##  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.4.1)
##  lubridate   * 1.9.3   2023-09-27 [1] CRAN (R 4.4.1)
##  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.4.1)
##  munsell       0.5.1   2024-04-01 [1] CRAN (R 4.4.1)
##  pillar        1.10.1  2025-01-07 [1] CRAN (R 4.4.1)
##  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.4.1)
##  pkgdown       2.1.1   2024-09-17 [1] CRAN (R 4.4.1)
##  purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.4.1)
##  R6            2.5.1   2021-08-19 [1] CRAN (R 4.4.1)
##  ragg          1.3.3   2024-09-11 [1] CRAN (R 4.4.1)
##  readr       * 2.1.5   2024-01-10 [1] CRAN (R 4.4.1)
##  rlang         1.1.4   2024-06-04 [1] CRAN (R 4.4.1)
##  rmarkdown   * 2.29    2024-11-04 [1] CRAN (R 4.4.2)
##  rstudioapi    0.17.1  2024-10-22 [1] CRAN (R 4.4.2)
##  sass          0.4.9   2024-03-15 [1] CRAN (R 4.4.1)
##  scales        1.3.0   2023-11-28 [1] CRAN (R 4.4.1)
##  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.4.1)
##  stringi       1.8.4   2024-05-06 [1] CRAN (R 4.4.0)
##  stringr     * 1.5.1   2023-11-14 [1] CRAN (R 4.4.1)
##  svglite       2.1.3   2023-12-08 [1] CRAN (R 4.4.1)
##  systemfonts   1.1.0   2024-05-15 [1] CRAN (R 4.4.1)
##  textshaping   0.4.0   2024-05-24 [1] CRAN (R 4.4.1)
##  tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.4.1)
##  tidyr       * 1.3.1   2024-01-24 [1] CRAN (R 4.4.1)
##  tidyselect    1.2.1   2024-03-11 [1] CRAN (R 4.4.1)
##  tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.4.1)
##  timechange    0.3.0   2024-01-18 [1] CRAN (R 4.4.1)
##  tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.4.1)
##  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.4.1)
##  viridisLite   0.4.2   2023-05-02 [1] CRAN (R 4.4.1)
##  withr         3.0.2   2024-10-28 [1] CRAN (R 4.4.2)
##  xfun          0.49    2024-10-31 [1] CRAN (R 4.4.2)
##  xml2          1.3.6   2023-12-04 [1] CRAN (R 4.4.1)
##  yaml          2.3.10  2024-07-26 [1] CRAN (R 4.4.1)
## 
##  [1] C:/Users/chris/AppData/Local/R/win-library/4.4
##  [2] C:/Program Files/R/R-4.4.1/library
## 
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Bibliography

Mora, Ariane, Christina Schmidt, Brad Balderson, Christian Frezza, and Mikael Bodén. 2024. “SiRCle (Signature Regulatory Clustering) Model Integration Reveals Mechanisms of Phenotype Regulation in Renal Cancer.” Genome Medicine 16 (1): 144. https://doi.org/10.1186/s13073-024-01415-3.