The goal of NetCoupler is to estimate causal links between a set of -omic (e.g. metabolomics, lipidomics) or other high-dimensional data and an external variable, such as a disease outcome, an exposure, or both. The NetCoupler-algorithm, initially formulated during Clemens’ PhD thesis (Wittenbecher 2017), links a conditional dependency network with an external variable (i.e. an outcome or exposure) to identify network-independent associations between the network variables and the external variable, classified as direct effects.

A typical use case is a researcher who wants to explore potential pathways between a health exposure like red meat consumption, its impact on the metabolic profile, and the subsequent impact on an outcome like type 2 diabetes incidence. For instance, you might want to ask questions whose answers look like the figure below.

The input for NetCoupler includes:

- Standardized metabolic or other high-dimensional data.
- Exposure or outcome data.
- Network estimating method (default is the PC algorithm (Colombo and Maathuis 2014) from the pcalg package).
- Modeling method (e.g. linear regression with `lm()`), including confounders to adjust for.

The final output is the modeling results along with the results from NetCoupler’s classification. Results can then be displayed as a joint network model in graphical format.

There are a few key assumptions to consider before using NetCoupler for your own research purposes.

- -omics data is the basis for the network. We haven’t tested NetCoupler on non-omics datasets, so we can’t guarantee it works as intended.
- The variables used for the metabolic network are numerical.
- Metabolic data should have a theoretical network underlying it.
- Missing data are not used in any of the NetCoupler processes.
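The assumptions about numerical variables and missing data can be checked before starting. The following is a minimal base R sketch and not part of NetCoupler; the data frame `metabolic_data` and its column names are made up for illustration.

```r
# Hypothetical example data; the column names are illustrative only.
metabolic_data <- data.frame(
  metabolite_1 = c(1.2, 0.8, NA, 1.5),
  metabolite_2 = c(3.1, 2.9, 3.4, 3.0),
  sex = c("f", "m", "f", "m")
)

# Which columns are numeric and could enter the network?
is_numeric <- vapply(metabolic_data, is.numeric, logical(1))
network_vars <- names(metabolic_data)[is_numeric]

# How many values are missing in each candidate network variable?
n_missing <- colSums(is.na(metabolic_data[network_vars]))

# Which rows have complete data across the network variables?
complete_rows <- complete.cases(metabolic_data[network_vars])

network_vars
n_missing
sum(complete_rows)
```

Counting complete rows up front gives a sense of how many observations would actually contribute once missing values are left out.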

NetCoupler was designed with several frameworks in mind:

- Works with magrittr `%>%` or base R `|>` operator.
- Works with tidyselect helpers (e.g. `starts_with()`, `contains()`).
- Is auto-complete friendly (e.g. start function names with `nc_`).
- Inputs and outputs of functions are tibbles/dataframes or tidygraph tibbles.
- Generic modeling approach by using model and settings as function argument inputs.
  - This allows flexibility with what model can be used (e.g. linear regression, Cox models).
  - Almost all functionality of modeling in R is available here, for instance handling of missing data or of categorical variables.
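The idea of passing the model and its settings as function arguments can be illustrated in base R. This is only a sketch of the general pattern, not NetCoupler's implementation: the helper `fit_with()` and its argument names `model_function` and `model_arg_list` are hypothetical, and the built-in `mtcars` dataset stands in for real data.

```r
# Hypothetical helper, not a NetCoupler function: fits any model function
# that accepts a formula and a data frame, plus a list of extra settings.
fit_with <- function(data, formula, model_function, model_arg_list = list()) {
  args <- c(list(formula = formula, data = data), model_arg_list)
  do.call(model_function, args)
}

# Linear regression: no extra settings needed.
linear_fit <- fit_with(mtcars, mpg ~ wt + cyl, model_function = lm)

# Logistic regression: the same helper, with the model-specific setting
# (the family) supplied through the settings list.
logistic_fit <- fit_with(
  mtcars, am ~ wt,
  model_function = glm,
  model_arg_list = list(family = binomial)
)

coef(linear_fit)
coef(logistic_fit)
```

Because the helper never inspects which model it was given, swapping in another modeling function only requires that it accept a formula and a data argument.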

The general workflow for using NetCoupler revolves around several main functions, listed below as well as visualized in the figure below:

- `nc_standardize()`: The algorithm in general, but especially the network estimation method, is sensitive to the values and distribution of the variables. Scaling the variables by standardizing, mean-centering, and natural log transforming them is important for obtaining more accurate estimates.
- `nc_estimate_network()`: Estimates the connections between metabolic variables as an undirected graph based on dependencies between variables. This network is used to identify metabolic variables that are connected to each other as neighbours.
  - We plan on implementing other network estimators aside from the PC-algorithm at some point in the future.
- `nc_estimate_exposure_links()` and `nc_estimate_outcome_links()`: Use the standardized data and the estimated network to classify the conditionally independent relationship between each metabolic variable and an external variable (e.g. an outcome or an exposure) as either being a direct, ambiguous, or no effect relationship.
  - Setting the threshold for classifying effects as direct, ambiguous, or none is done through the argument `classify_option_list`. See the help documentation of the estimating functions for more details. For larger datasets, with more sample size and variables included in the network, we *strongly* recommend lowering the threshold used to reduce the risk of false positives.
- `nc_join_links()`: **Not implemented yet.** Join together the exposure- and outcome-side estimated links.
- `nc_plot_network()`: **Not implemented yet.** Visualize the connections estimated from `nc_estimate_network()`.
- `nc_plot_links()`: **Not implemented yet.** Plots the output results from either `nc_estimate_exposure_links()`, `nc_estimate_outcome_links()`, or `nc_join_links()`.
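To make the standardization and network estimation steps more concrete, here is a rough base R sketch of the underlying ideas. It does not call NetCoupler and does not use the PC algorithm: it swaps in a much simpler technique, thresholded partial correlations, to build an undirected conditional dependency network. The simulated data, the variable names, the order of the transformations (log first, then centering and scaling), and the cutoff are all assumptions made for illustration.

```r
set.seed(123)

# Simulated, strictly positive "metabolite" values (illustrative only).
n <- 200
m1 <- rlnorm(n)
m2 <- m1 * rlnorm(n, sdlog = 0.3) # depends directly on m1
m3 <- m2 * rlnorm(n, sdlog = 0.3) # depends on m2, only indirectly on m1
m4 <- rlnorm(n)                   # unrelated to the others
metabolites <- data.frame(m1, m2, m3, m4)

# Step 1: natural log transform, then mean-center and scale to unit variance.
standardized <- as.data.frame(scale(log(metabolites)))

# Step 2: partial correlations, taken from the inverse correlation matrix.
# Each entry measures the dependency between two variables given all others.
precision <- solve(cor(standardized))
partial_cor <- -cov2cor(precision)
diag(partial_cor) <- 1

# Keep an undirected edge where the absolute partial correlation exceeds
# a cutoff. The value 0.3 is an arbitrary choice for this sketch.
cutoff <- 0.3
adjacency <- abs(partial_cor) > cutoff
diag(adjacency) <- FALSE

# List each edge once by looking only at the upper triangle.
edge_index <- which(adjacency & upper.tri(adjacency), arr.ind = TRUE)
edges <- data.frame(
  from = colnames(adjacency)[edge_index[, "row"]],
  to = colnames(adjacency)[edge_index[, "col"]]
)
edges
```

In this simulation `m1` and `m3` are strongly correlated, but their dependency runs through `m2`, so conditioning on the other variables should leave them unconnected. Variables that remain connected are the "neighbours" that the later link estimation steps take into account.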