## Main Functions

The `FusionLearn` package provides three functions, `fusionbase`, `fusionbinary`, and `fusionmixed`, which combine continuous responses, binary responses, or mixtures of continuous and binary responses from different platforms.

### Fusion Learning Algorithm

#### Continuous Responses

`fusionbase` is the function for datasets in which all response variables are continuous. In `fusionbase`, users can set the penalty parameter `lambda` and choose the penalty function, for example `methods = "scad"` or `methods = "lasso"`. The function inputs also include \(N\), the sample size of each platform; \(p\), the number of predictors; and \(m\), the number of platforms. If all platforms have the same sample size, \(N\) is given as a single value; otherwise \(N\) is specified as a vector. If some predictors are measured on only some of the platforms, the datasets do not have complete information, and users can specify the option `Complete = FALSE`. Detailed examples of this type of missing information are shown in Section 4.2.
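The inputs \(N\), \(p\), and \(m\) can be made concrete with simulated data. The sketch below assumes that `x` is a list of \(m\) design matrices (one \(N_k \times p\) matrix per platform) and `y` is a list of \(m\) response vectors; this layout, the seed, and the coefficient values are illustrative assumptions rather than the package's documented input format, so check `?fusionbase` before relying on it.

```r
# Minimal sketch of multi-platform continuous data.
# Assumption: x is a list of m design matrices (N[k] x p) and y is a list of
# m response vectors; all names and values here are illustrative only.
set.seed(1)
m <- 4                       # number of platforms
p <- 10                      # number of predictors
N <- c(100, 120, 80, 150)    # unequal sample sizes, so N is a vector

# Hypothetical true coefficients: predictors 1-5 matter on every platform
# (with platform-specific values), predictors 6-10 are zero everywhere.
beta_true <- rbind(matrix(runif(5 * m, 1, 5), nrow = 5),
                   matrix(0, nrow = 5, ncol = m))

x <- vector("list", m)
y <- vector("list", m)
for (k in seq_len(m)) {
  x[[k]] <- matrix(rnorm(N[k] * p), nrow = N[k], ncol = p)
  y[[k]] <- as.vector(x[[k]] %*% beta_true[, k] + rnorm(N[k]))
}
```

With equal sample sizes, `N` would instead be a single value such as `N <- 100`.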

```
result <- fusionbase(x, y, lambda = 0.9, N = N, p = p, m = m, methods = "scad",
                     Complete = TRUE)
print(result) ## partial estimated parameters
```

```
## $beta
## [,1] [,2] [,3] [,4]
## [1,] 2.166467 3.040631 7.785279 8.627343
## [2,] 3.981514 4.026194 13.773084 11.871934
## [3,] 3.374323 3.396763 11.606946 9.867747
## [4,] 4.943008 5.614259 17.270441 15.877448
## [5,] 4.262251 5.094297 17.604539 16.271033
## [6,] 0.000000 0.000000 0.000000 0.000000
## [7,] 0.000000 0.000000 0.000000 0.000000
## [8,] 0.000000 0.000000 0.000000 0.000000
## [9,] 0.000000 0.000000 0.000000 0.000000
## [10,] 0.000000 0.000000 0.000000 0.000000
##
## $method
## [1] "scad"
##
## $threshold
## [1] 0.04598454
##
## $iteration
## [1] 10
```

The algorithm yields five non-zero and five zero coefficient vectors for the 10 predictors. Based on the penalized estimates, the predictors with non-zero regression coefficients are selected as important predictors. Users can further apply the model selection function `fusionbase.fit` in this package to obtain the value of the pseudo-likelihood information criterion for the model.
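Reading off the selected predictors amounts to finding the rows of `beta` that are not entirely zero. A minimal sketch follows; the helper `selected_predictors` is hypothetical (not part of `FusionLearn`), and `beta_hat` is a placeholder with the same zero pattern as the output above. With real results, `result$beta` would be passed instead.

```r
# Hypothetical helper: a predictor is selected when its row of the
# (predictors x platforms) coefficient matrix is not entirely zero.
selected_predictors <- function(beta, tol = 0) {
  which(apply(abs(beta), 1, max) > tol)
}

# Placeholder 10 x 4 matrix: rows 1-5 non-zero, rows 6-10 zero
# (values are made up, not taken from fusionbase output).
beta_hat <- rbind(matrix(1, nrow = 5, ncol = 4),
                  matrix(0, nrow = 5, ncol = 4))
selected_predictors(beta_hat)
# [1] 1 2 3 4 5
```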

The algorithm outputs include `beta`, `method`, `iteration`, and `threshold`. The output `beta` lists the estimated coefficients, with one row per predictor and one column per platform, and `method` reports the penalty function used. For the example above, the algorithm correctly selects all five true non-zero coefficients. The figures show the non-zero coefficients and indicate that the linear models fit well with the selected predictors.
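In the output above, whole rows of `beta` are zero, meaning a predictor is dropped on all platforms at once. That pattern is what a group-type penalty on each predictor's coefficient vector produces. The sketch below is an assumption for illustration, not the package's exact objective: it applies the SCAD penalty of Fan and Li (with the conventional \(a = 3.7\)) or the lasso penalty to the Euclidean norm of each row of a coefficient matrix. The function names are hypothetical.

```r
# SCAD penalty (Fan and Li), evaluated elementwise on t >= 0.
scad_penalty <- function(t, lambda, a = 3.7) {
  t <- abs(t)
  ifelse(t <= lambda,
         lambda * t,                                              # lasso-like near 0
         ifelse(t <= a * lambda,
                (2 * a * lambda * t - t^2 - lambda^2) / (2 * (a - 1)),  # tapering
                lambda^2 * (a + 1) / 2))                          # constant for large t
}

lasso_penalty <- function(t, lambda) lambda * abs(t)

# Assumption: each predictor j is penalized through the Euclidean norm of its
# coefficient row beta[j, ] across platforms, so shrinkage acts on whole rows.
group_penalty <- function(beta, lambda, penalty = c("scad", "lasso")) {
  penalty <- match.arg(penalty)
  row_norms <- sqrt(rowSums(beta^2))   # one norm per predictor
  if (penalty == "scad") sum(scad_penalty(row_norms, lambda))
  else sum(lasso_penalty(row_norms, lambda))
}

b <- rbind(c(3, 4), c(0, 0))           # row norms 5 and 0
group_penalty(b, lambda = 0.9, "lasso")  # 4.5
group_penalty(b, lambda = 0.9, "scad")   # 1.9035
```

Unlike the lasso, SCAD stops growing once a row norm exceeds \(a\lambda\), so large coefficient vectors are not shrunk further.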

#### Binary Responses

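If the responses from all platforms are binary, users can use the function `fusionbinary`. Most of its inputs are the same as those of `fusionbase`. Users can specify the link function for the binary response variables with `link = "logit"` or `link = "probit"`. The cross-tabulation following the output compares the true non-zero status of the 10 predictors with the selection result: all five true non-zero predictors are selected, and none of the five zero predictors is.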
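The two link options correspond to two ways of turning a linear predictor into a success probability: the logit link uses the logistic CDF and the probit link uses the standard normal CDF. The sketch below simulates binary responses under each link with base R's `plogis`, `pnorm`, and `rbinom`; the sample size and coefficients are illustrative assumptions, not the data used in the example.

```r
# Sketch: binary responses generated under the logit and probit links.
set.seed(2)
n   <- 200
x1  <- matrix(rnorm(n * 3), nrow = n)
eta <- as.vector(x1 %*% c(2, -1, 0))   # linear predictor (illustrative coefficients)

p_logit  <- plogis(eta)   # logit link:  P(Y = 1) = exp(eta) / (1 + exp(eta))
p_probit <- pnorm(eta)    # probit link: P(Y = 1) = Phi(eta)

y_logit  <- rbinom(n, size = 1, prob = p_logit)
y_probit <- rbinom(n, size = 1, prob = p_probit)
```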

```
result <- fusionbinary(x, y.bin, lambda = 0.15, N = N, p = p, m = m, methods = "scad",
link = "logit")
print(result)
```

```
## $beta
## [,1] [,2] [,3] [,4]
## [1,] 0.03466414 0.02843913 0.04444378 0.05617795
## [2,] 2.05486607 2.01779142 1.13098065 1.59532191
## [3,] 0.17717603 0.10475392 0.16700634 0.13813861
## [4,] 2.13573764 2.08503177 2.24617532 1.81413152
## [5,] 1.24689136 1.11981241 1.84452987 1.91821936
## [6,] 0.00000000 0.00000000 0.00000000 0.00000000
## [7,] 0.00000000 0.00000000 0.00000000 0.00000000
## [8,] 0.00000000 0.00000000 0.00000000 0.00000000
## [9,] 0.00000000 0.00000000 0.00000000 0.00000000
## [10,] 0.00000000 0.00000000 0.00000000 0.00000000
##
## $method
## [1] "scad"
##
## $link_fn
## [1] "logit"
##
## $threshold
## [1] 0.09549591
##
## $iteration
## [1] 12
```

```
##
## FALSE TRUE
## FALSE 5 0
## TRUE 0 5
```

#### Mixed-type Responses

If the responses across platforms include both binary and continuous types, users can use the function `fusionmixed`. Besides the inputs it shares with `fusionbase`, the function requires the number of platforms for each response type (`m1`: the number of platforms with continuous responses; `m2`: the number of platforms with binary responses). The `link` option specifies the link function for the binary response variables. As for binary responses, the cross-tabulation following the output compares the true and selected non-zero predictors.
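A mixed-type response list can be simulated along the same lines. The sketch assumes that the first `m1` platforms carry continuous responses and the remaining `m2` carry binary ones; the column order of `beta` in the output below (large continuous-scale estimates in columns 1 and 2, smaller logit-scale ones in columns 3 and 4) is consistent with this, but the ordering and list layout are assumptions to verify against `?fusionmixed`.

```r
# Sketch of mixed-type data: platforms 1..m1 continuous, m1+1..m binary.
# Assumption: x and y.mixed are lists with one element per platform.
set.seed(3)
m1 <- 2; m2 <- 2; m <- m1 + m2
p  <- 10
N  <- 100
beta_true <- rbind(matrix(runif(5 * m, 1, 3), nrow = 5),
                   matrix(0, nrow = 5, ncol = m))

x <- lapply(seq_len(m), function(k) matrix(rnorm(N * p), nrow = N, ncol = p))
y.mixed <- lapply(seq_len(m), function(k) {
  eta <- as.vector(x[[k]] %*% beta_true[, k])
  if (k <= m1) eta + rnorm(N)                    # continuous platform
  else rbinom(N, size = 1, prob = plogis(eta))   # binary platform, logit link
})
```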

```
result <- fusionmixed(x, y.mixed, lambda = 0.4, N = N, p = p, m1 = 2, m2 = 2,
                      methods = "scad", link = "logit")
print(result)
```

```
## $beta
## [,1] [,2] [,3] [,4]
## [1,] 2.167506 3.030309 0.8995928 1.217401
## [2,] 3.981460 4.036748 1.5948099 1.790453
## [3,] 3.374642 3.424883 1.1906511 1.041814
## [4,] 4.943319 5.613636 2.5516305 2.036690
## [5,] 4.261845 5.083497 2.2867149 1.988079
## [6,] 0.000000 0.000000 0.0000000 0.000000
## [7,] 0.000000 0.000000 0.0000000 0.000000
## [8,] 0.000000 0.000000 0.0000000 0.000000
## [9,] 0.000000 0.000000 0.0000000 0.000000
## [10,] 0.000000 0.000000 0.0000000 0.000000
##
## $method
## [1] "scad"
##
## $link_fn
## [1] "logit"
##
## $threshold
## [1] 0.09406964
##
## $iteration
## [1] 12
```

```
##
## FALSE TRUE
## FALSE 5 0
## TRUE 0 5
```