1 Illustration of PCDs on an Artificial 1D Dataset

This data set consists of simulated points from two classes, $\mathcal{X}$ and $\mathcal{Y}$, where $\mathcal{X}$ points are uniformly distributed on the interval $[a,b]=[0,10]$, while $\mathcal{Y}$ points are chosen at approximately regular distances for better illustration. Here $n_x$ is the size of class $\mathcal{X}$ points, $n_y$ is the size of class $\mathcal{Y}$ points, and for better illustration of certain structures and graph constructs.

a<-0; b<-10; int<-c(a,b)
#nx is number of X points (target) and ny is number of Y points (nontarget)
nx<-10; ny<-5; #try also nx<-40; ny<-10 or nx<-1000; ny<-10;

xf<-(b-a)*.1
set.seed(11)
Xp<-runif(nx,a-xf,b+xf)
Yp<-runif(ny,-1,1)*(b-a)/(10*ny)+ ((b-a)/(ny-1))*(0:(ny-1)) #try also Yp<-runif(ny,a,b)

We take $n_x=$ 10 and $n_y=$ 5 (however, one is encouraged to try the specifications that follow in the comments after “#try also” in the commented out script here and henceforth.) More specifically, $\mathcal{Y}$ points are generated as $Y_i = a + U$ for $a = 0.0, 2.5, 5.0, 7.5, 10.0$ and $U \sim \text{Uniform}(-.25,.25)$ to provide jitter around $a$ values. $\mathcal{X}$ points are denoted as Xp and $\mathcal{Y}$ points are denoted as Yp in what follows.

The scatterplot of $\mathcal{X}$ and $\mathcal{Y}$ points on the real line can be obtained by the below code; $y$-axis is added for better visualization.

XYpts =c(Xp,Yp) #combined Xp and Yp
lab=c(rep(1,nx),rep(2,ny))
lab.fac=as.factor(lab)
plot(XYpts,rep(0,length(XYpts)),col=lab,pch=lab,xlab="x",ylab="",ylim=.005*c(-1,1),
     main="Scatterplot of 1D Points from Two Classes")

The PCDs are constructed with vertices from $\mathcal{X}$ points and Delaunay triangulation of $\mathcal{Y}$ points.

The PCDs in the 1D case are constructed with vertices from $\mathcal{X}$ points and the binary relation that determines the arcs are based on proximity regions which depend on the intervals whose end points are the ordered $\mathcal{Y}$ points (which is the Delaunay tessellation of $\mathcal{Y}$ points in $\mathbb{R}$). More specifically, the proximity regions are defined with respect to the Delaunay cells (i.e., intervals) based on the order statistics of the $\mathcal{Y}$ points and vertex regions in each interval are based on the center $M_c=a+c\,(b-a)$ for the interval $[a,b]$ where $c \in (0,1)$. That is, Delaunay tessellation of $\mathcal{Y}$ points provides an interval partitioning of the range of $\mathcal{Y}$ points based on the order statistics of the $\mathcal{Y}$ points.

The convex hull of $\mathcal{Y}$ points (i.e., the interval $\left[\mathsf{y}_{(1)},\mathsf{y}_{(m)}\right]$) is partitioned by the intervals based on the ordered $\mathcal{Y}$ points (i.e., multiple intervals are the set of these intervals whose union constitutes the range of $\mathcal{Y}$ points).

Below we plot the $\mathcal{X}$ points together with the intervals based on $\mathcal{Y}$ points.

Xlim<-range(Xp)
Ylim<-.005*c(-1,1)
xd<-Xlim[2]-Xlim[1]
plot(Xp,rep(0,nx),xlab="x", ylab=" ",xlim=Xlim+xd*c(-.05,.05), yaxt='n',
     ylim=Ylim,pch=".",cex=3,main="X Points and Intervals based on Y Points")
abline(h=0,lty=2)
#now, we add the intervals based on Y points
par(new=TRUE)
plotIntervals(Xp,Yp,xlab="",ylab="",main="")

The plot of the $X$ points (black circles) in the artificial data set together with the intervals (blue rounded brackets) based on $Y$ points (red circles).

Figure 1.1: The plot of the $X$ points (black circles) in the artificial data set together with the intervals (blue rounded brackets) based on $Y$ points (red circles).

Or, alternatively, we can use the plotIntervals function in pcds to obtain the same plot by executing plotIntervals(Xp,Yp,xlab="",ylab="") command.

1.1 Summary and Visualization with Proportional Edge PCDs

PE proximity regions are defined with respect to the intervals based on $\mathcal{Y}$ points and vertex regions in each interval are based on the centrality parameter c in $(0,1)$. For PE-PCDs, the default centrality parameter used to construct the vertex regions is c=.5 (which gives the center of mass of each interval). The range of $\mathcal{Y}$ is partitioned by the intervals based on the order statistics of (i.e., sorted) $\mathcal{Y}$ points (i.e., multiple intervals are the set of these intervals whose union constitutes the range (or convex hull) of $\mathcal{Y}$ points).

See Ceyhan (2012) for more on PE-PCDs for 1D data.

Number of arcs of the PE-PCD can be computed by the function num.arcsPE1D which is an object of class “NumArcs” and takes the arguments

Xp, a set or vector of 1D points which constitute the vertices of the PE-PCD,
Yp, a set or vector of 1D points which constitute the end points of the partition intervals,
r, a positive real number which serves as the expansion parameter in PE proximity region; must be $\ge 1$.
c, a positive real number in $(0,1)$ parameterizing the center inside the middle (partition) intervals with the default c=.5. For an interval, $(a,b)$, the parameterized center is $M_c=a+c(b-a)$.

Its call (with Narcs in the below script) just returns the description of the digraph. Its summary returns a description of the digraph, number of arcs of the PE-PCD, number of data (Xp) points in the range of Yp (nontarget) points, number of data points in the partition intervals based on Yp points, numbers of arcs in the induced subdigraphs in the partition intervals, lengths of the partition intervals, end points of the vertices of the partition intervals, indices of the partition intervals data points resides. The plot function (i.e., plot.NumArcs) returns the plot of the partition intervals of Yp points, scatter plot of the Xp points and the number of arcs of the induced subdigraphs for each partition interval in the centroid of the interval.

This function returns the list of

res<-list(desc=desc, #description of the output
        num.arcs=narcs, #number of arcs for the entire PCD
        int.num.arcs=arcs, #vector of number of arcs for the partition intervals
        num.in.range=nx2, #number of Xp points in the range of Yp points
        num.in.ints=ni.vec, #number of Xp points in the partition intervals
        weight.vec=Wvec, #lengths of the middle partition intervals
        partition.intervals=t(part.ints), #matrix of the partition intervals, each column is one interval
        data.int.ind=int.ind, #indices of partition intervals in which data points reside, i.e., column number of part.int for each Xp point
        tess.points=Yp, #tessellation points
        vertices=Xp #vertices of the digraph

desc: A description of the PCD and the output
num.arcs: Total number of arcs in all intervals (including the end intervals), i.e., the number of arcs for the entire PE-PCD
int.num.arcs: The vector of the number of arcs of the components of the PE-PCD in the partition intervals (including the end intervals) based on Yp points,
num.in.range: Number of Xp points in the range or convex hull of Yp points
num.in.ints: The vector of number of Xp points in the partition intervals (including the end intervals) based on Yp points
weight.vec: The vector of the lengths of the middle partition intervals (i.e., end intervals excluded) based on Yp points
partition.intervals: Matrix of the partition intervals, each column is one interval
data.int.ind: A vector of indices of partition intervals in which data points reside, i.e., column number of part.int is provided for each Xp point. Partition intervals are numbered from left to right with 1 being the left end interval.
ind.mid: Indices of data points in the middle interval
ind.right.end: Indices of data points in the right end interval
tess.points: intervalization points (i.e. Yp points)
vertices: vertices of the PCD (i.e. Xp points)

r<-2 #try also r=1.5
c<-.4  #try also c=.3

Narcs = num.arcsPE1D(Xp,Yp,r,c)
summary(Narcs)
#> Call:
#> num.arcsPE1D(Xp = Xp, Yp = Yp, r = r, c = c)
#>
#> Description of the output:
#> Number of Arcs of the PE-PCD with vertices Xp and Related Quantities for the Induced Subdigraphs for the Points in the Partition Intervals
#>
#> Number of data (Xp) points in the range of Yp (nontarget) points =  6
#> Number of data points in the partition intervals based on Yp points =  3 3 2 0 1 1
#> Number of arcs in the entire digraph =  5
#> Numbers of arcs in the induced subdigraphs in the partition intervals =  4 1 0 0 0 0
#> Lengths of the (middle) partition intervals (used as weights in the arc density of multi-interval case):
#> 2.606255 2.686573 2.477544 2.453178
#>
#> End points of the partition intervals (each column refers to a partition interval):
#>            [,1]       [,2]     [,3]     [,4]      [,5]     [,6]
#> [1,]       -Inf -0.1299548 2.476300 5.162873  7.640417 10.09359
#> [2,] -0.1299548  2.4763001 5.162873 7.640417 10.093595      Inf
#>
#> Indices of the partition intervals data points resides:
#> 2 1 3 1 1 6 2 3 5 2 
#> 
#plot(Narcs)

The incidence matrix of the PE-PCD can be found by inci.matPE1D function by running inci.matPE1D(Xp,Yp,r,c). As in the 2D case, given the incidence matrix, we can find the approximate or the exact domination number of the PE-PCD, using the functions dom.num.greedy and dom.num.exact.

Plot of the arcs of the digraph PE-PCD are provided by the function plotPEarcs1D, which take the arguments

Xp,Yp,r,c are same as in the function num.arcsPE1D,
Jit, a positive real number that determines the amount of jitter along the $y$-axis, default=0.1 and Xp points are jittered according to $U(-Jit,Jit)$ distribution along the $y$-axis where Jit equals to the range of the union of Xp and Yp points multiplied by Jit).
main an overall title for the plot (default=NULL),
xlab,ylab titles for the $x$ and $y$ axes, respectively (default=NULL for both),
xlim,ylim, two numeric vectors of length 2, giving the $x$- and $y$-coordinate ranges (default=NULL for both),
centers, a logical argument, if TRUE, the plot includes the centers of the intervals as vertical lines in the plot, else centers of the intervals are not plotted, and
..., additional plot parameters.

We plot the arcs together with the centers, with centers=TRUE option in the plot function. Arcs are jittered along the $y$-axis to avoid clutter on the real line and thus provide better visualization.

jit<-.1
set.seed(1)
plotPEarcs1D(Xp,Yp,r,c,jit,xlab="",ylab="",centers=TRUE)

The arcs of the PE-PCD for the 1D artificial data set with centrality parameter $c=.4$, the end points of the $Y$ intervals (red) and the centers (green) are plotted with vertical dashed lines.

Figure 1.2: The arcs of the PE-PCD for the 1D artificial data set with centrality parameter $c=.4$, the end points of the $Y$ intervals (red) and the centers (green) are plotted with vertical dashed lines.

Plots of the PE proximity regions (i.e. proximity intervals) are provided with the function plotPEregs1D, which has the same arguments as the function plotPEarcs1D. We plot the proximity regions together with the centers with centers=TRUE option:

set.seed(12)
plotPEregs1D(Xp,Yp,r,c,xlab="x",ylab="",centers = TRUE)

The PE proximity regions (blue) for the 1D artificial data set, the end points of the $Y$ intervals (black) and the centers (green) are plotted with vertical dashed lines.

Figure 1.3: The PE proximity regions (blue) for the 1D artificial data set, the end points of the $Y$ intervals (black) and the centers (green) are plotted with vertical dashed lines.

The function arcsPE1D is an object of class “PCDs” and has the same arguments as in num.arcsPE1D. Its call (with Arcs in the below script) just provides the description of the digraph, and summary provides a description of the digraph, the names of the data points constituting the vertices of the digraph and also the interval points, selected tail (or source) points of the arcs in the digraph (first 6 or fewer are printed), selected head (or end) points of the arcs in the digraph (first 6 or fewer are printed), the parameters of the digraph (here centrality parameter and the expansion parameter), and various quantities of the digraph (namely, the number of vertices, number of partition points, number of triangles, number of arcs, and arc density. The plot function (i.e., plot.PCDs) provides the plot of the arcs in the digraph together with the intervals based on the ordered $\mathcal{Y}$ points.

For this function, PE proximity regions are constructed for data points inside or outside the intervals based on Yp points with expansion parameter $r \ge 1$ and centrality parameter $c \in (0,1)$. That is, for this function, arcs may exist for points in the middle and end intervals. Arcs are jittered along the $y$-axis in the plot for better visualization. The plot function returns the same plot as in plotPEarcs1D, hence we comment it out below.

Arcs<-arcsPE1D(Xp,Yp,r,c)
Arcs
#> Call:
#> arcsPE1D(Xp = Xp, Yp = Yp, r = r, c = c)
#> 
#> Type:
#> [1] "Proportional Edge Proximity Catch Digraph (PE-PCD) for 1D Points with Expansion Parameter r = 2 and Centrality Parameter c = 0.4"
summary(Arcs)
#> Call:
#> arcsPE1D(Xp = Xp, Yp = Yp, r = r, c = c)
#> 
#> Type of the digraph:
#> [1] "Proportional Edge Proximity Catch Digraph (PE-PCD) for 1D Points with Expansion Parameter r = 2 and Centrality Parameter c = 0.4"
#> 
#>  Vertices of the digraph =  Xp 
#>  Partition points of the region =  Yp 
#> 
#>  Selected tail (or source) points of the arcs in the digraph
#>       (first 6 or fewer are printed) 
#> [1] 3.907723 4.479377 5.617220 8.459662 8.459662 9.596209
#> 
#>  Selected head (or end) points of the arcs in the digraph
#>       (first 6 or fewer are printed) 
#> [1] 4.479377 3.907723 5.337266 9.596209 9.709029 9.709029
#> 
#> Parameters of the digraph
#> centrality parameter  expansion parameter 
#>                  0.4                  2.0 
#> 
#> Various quantities of the digraph
#>         number of vertices number of partition points 
#>                10.00000000                 5.00000000 
#>        number of intervals             number of arcs 
#>                 6.00000000                 6.00000000 
#>                arc density 
#>                 0.06666667

set.seed(1)
plot(Arcs)

1.1.1 Testing Interaction and Uniformity with the PE-PCDs

We can test the interaction between two classes/species or uniformity of points from one class in the 1D setting based on arc density or domination number of PE-PCDs.

The Use of Arc Density of PE-PCDs for Testing 1D Interaction

We can test the 1D interaction of segregation/association or uniformity based on arc density of PE-PCD using the function PEarc.dens.test1D which takes the arguments Xp,Yp,r,c,support.int,end.int.cor,alternative,conf.level where

r,alternative,conf.level are as in PEarc.dens.test,
Xp, a set of 1D points which constitute the vertices of the PE-PCD,
Yp, a set of 1D points which constitute the end points of the partition intervals,
support.int, the support interval $(a,b)$ with $a<b$. Uniformity of Xp points in this interval is tested, default is NULL.
c, a positive real number which serves as the centrality parameter in PE proximity region; must be in $(0,1)$ (default c=.5).
end.int.cor, a logical argument for end interval correction, default is FALSE, recommended when both Xp and Yp have the same interval support.

This function is an object of class “htest” (i.e., hypothesis test) and performs a hypothesis test of complete spatial randomness (CSR) or uniformity of Xp points in the range of Yp points against the alternatives of segregation (where Xp points cluster away from Yp points) and association (where Xp points cluster around Yp points) based on the normal approximation of the arc density of the PE-PCD for uniform 1D data utilizing the asymptotic normality of the $U$-statistics. For testing of uniformity of $\mathcal{X}$ points in a bounded interval support, $\mathcal{Y}$ points are artificially inserted randomly or at regular distances in the support.

The function is based on similar assumptions and returns the similar type of output as in PEarc.dens.test, see Section “VS1_1_2DArtiData” and also Ceyhan (2012) for more on the uniformity test based on the arc density of PE-PCDs for 1D data.

PEarc.dens.test1D(Xp,Yp,r,c) # try also PEarc.dens.test1D(Xp,Yp,r,c,alt="l")
#> 
#>  Large Sample z-Test Based on Arc Density of PE-PCD for Testing
#>  Uniformity of 1D Data ---
#>  without End Interval Correction
#> 
#> data:  Xp
#> standardized arc density (i.e., Z) = -0.77073, p-value = 0.4409
#> alternative hypothesis: true (expected) arc density is not equal to 0.1279913
#> 95 percent confidence interval:
#>  0.05557408 0.15952931
#> sample estimates:
#> arc density 
#>   0.1075517

The Use of Domination Number of PE-PCDs for Testing 1D Interaction

We first provide two functions to compute the domination number of PE-PCDs: PEdom.num1D and PEdom.num1Dnondeg.

The function PEdom.num1D takes the same arguments as num.arcsPE1D and returns a list with four elements as output:

dom.num, the overall domination number of PE-PCD with vertex set Xp and expansion parameter $r \ge 1$ and centrality parameter $c \in (0,1)$,
mds, a minimum dominating set of the PE-PCD,
ind.mds, the vector of data indices of the minimum dominating set of the PE-PCD whose vertices are Xp points,
int.dom.nums, the vector of domination numbers of the PE-PCD components for the partition intervals.

This function takes any center in the interior of the intervals as its argument. The vertex regions in each interval are based on the center $M_c=(a+c(b-a)$ for the interval $[a,b]$ with $c \in (0,1)$ (default for $c=.5$ which gives the center of mass of the interval).

On the other hand, PEdom.num1Dnondeg takes only the arguments Xp,Yp,r and returns the same output as in PEdom.num1D function, but uses one of the non-degeneracy centrality values in the multiple interval case (hence c is not an argument for this function). That is, c is one of the two values $\{(r-1)/r,1/r\}$ that renders the asymptotic distribution of domination number non-degenerate for a given value of $r \in (1,2]$ and M is center of mass (i.e., $c=.5$) for $r=2$.

These two functions are different from the function dom.num.greedy since they give an exact minimum dominating set and the exact domination number and from dom.num.exact, since they give a minimum dominating set and the domination number in polynomial time (in the number of vertices of the digraph, i.e., number of Xp points).

PEdom.num1D(Xp,Yp,r,c)
#> $dom.num
#> [1] 6
#> 
#> $mds
#> [1] -0.453322  2.450930  3.907723  5.617220  8.459662 10.285607
#> 
#> $ind.mds
#> [1] 6 1 3 9 2 5
#> 
#> $int.dom.nums
#> [1] 1 1 1 1 1 0 0 1
PEdom.num1Dnondeg(Xp,Yp,r)
#> $dom.num
#> [1] 7
#> 
#> $mds
#> [1] -0.453322  2.450930  3.907723  5.617220  8.459662  9.596209 10.285607
#> 
#> $ind.mds
#> [1] 6 1 3 9 2 4 5
#> 
#> $int.dom.nums
#> [1] 1 1 1 1 2 0 0 1

We can test the interaction pattern of segregation/association or uniformity based on domination of PE-PCD using the function PEdom.num.binom.test1D or PEdom.num.binom.test1Dint, each of which is an object of class “htest” and performs the same hypothesis test as in PEarc.dens.test1D. This function takes the same arguments as in PEarc.dens.test1D and returns the test statistic, $p$-value for the corresponding alternative, the confidence interval, estimate and null value for the parameter of interest (which is $P(\mbox{domination number}\le 1)$), and method and name of the data set used.

Under the null hypothesis of uniformity of Xp points in the range of Yp points, probability of success (i.e., $P(\mbox{domination number}\le 1)$) equals to its expected value under the uniform distribution) and alternative could be two-sided, or right-sided (i.e., data is accumulated around the Yp points, or association) or left-sided (i.e., data is accumulated around the centers of the triangles, or segregation).

Here, the PE proximity region is constructed with the centrality parameter $c \in (0,1)$ with an expansion parameter $r \ge 1$ that yields non-degenerate asymptotic distribution of the domination number. That is, for the centrality parameter c and for a given $c \in (0,1)$, the expansion parameter $r$ is taken to be $1/\max(c,1-c)$ which yields non-degenerate asymptotic distribution of the domination number.

The test statistic in PEdom.num.binom.test1D is based on the binomial distribution, when success is defined as domination number being less than or equal to 1 in the one interval case (i.e., number of successes is equal to domination number $\le 1$ in the partition intervals). That is, the test statistic is based on the domination number for Xp points inside the range of Yp points for the PE-PCD and default end interval correction, end.int.cor, is FALSE. For this approximation to work, Xp must be at least 5 times more than Yp points (or Xp must be at least 5 or more per partition interval). Here, the probability of success is the exact probability of success for the binomial distribution. See also Ceyhan (2020) for more on the uniformity test based on the domination number of PE-PCDs for 1D data. For testing uniformity of $\mathcal{X}$ points in $(0,10)$, one can run PEdom.num.binom.test1Dint(Xp,int,c) (here the default options are used for the other arguments).

PEdom.num.binom.test1D(Xp,Yp,c) #try also PEdom.num.binom.test1D(Xp,Yp,c,alt="l")
#> 
#>  Large Sample Binomial Test based on the Domination Number of PE-PCD for
#>  Testing Uniformity of 1D Data ---
#>  without End Interval Correction
#> 
#> data:  Xp
#> adjusted domination number = 0, p-value = 0.3042
#> alternative hypothesis: true Pr(Domination Number=2) is not equal to 0.375
#> 95 percent confidence interval:
#>  0.0000000 0.6023646
#> sample estimates:
#>          domination number   || Pr(domination number = 2) 
#>                            6                            0

In all the test functions (based on arc density and domination number) above, the option end.int.cor is for end interval correction (default is “no end interval correction”, i.e., end.int.cor = FALSE) which is recommended when both Xp and Yp have the same interval support. When the symmetric difference of the supports is non-negligible, the tests are modified to account for the $\mathcal{X}$ points outside the range of $\mathcal{Y}$ points. For example, PEarc.dens.test1D(Xp,Yp,r,c,end.int.cor = TRUE) would yield the end interval corrected version of the arc-based test of 1D interaction. Furthermore, we only provide the two-sided tests above, although both one-sided versions are also available.

1.2 Summary and Visualization with Central Similarity PCDs

CS proximity regions are defined similar to the PE proximity regions in Section 1.1. Note also that for CS-PCDs in two dimensions, we use the edge regions to construct the proximity region, however, in the one dimensional setting, vertex and edge regions coincide, so we refer these regions as “vertex” regions for convenience. The default centrality parameter used to construct the vertex regions is again c=0.5 which yields the center of mass of each interval.

The functions for CS-PCD have similar arguments as the PE-PCD functions with the expansion parameter r replaced with t (which must be positive). Number of arcs of the CS-PCD can be computed by the function num.arcsCS1D, which is an object of class “NumArcs” and takes same arguments (except expansion parameter t) and returns similar output items as in num.arcsPE1D.

tau<-2; c<-.4

Narcs = num.arcsCS1D(Xp,Yp,tau,c)
summary(Narcs)
#> Call:
#> num.arcsCS1D(Xp = Xp, Yp = Yp, t = tau, c = c)
#>
#> Description of the output:
#> Number of Arcs of the CS-PCD with vertices Xp and Related Quantities for the Induced Subdigraphs for the Points in the Partition Intervals
#>
#> Number of data (Xp) points in the range of Yp (nontarget) points =  6
#> Number of data points in the partition intervals based on Yp points =  3 3 2 0 1 1
#> Number of arcs in the entire digraph =  6
#> Numbers of arcs in the induced subdigraphs in the partition intervals =  4 2 0 0 0 0
#> Lengths of the (middle) partition intervals (used as weights in the arc density of multi-interval case):
#> 2.606255 2.686573 2.477544 2.453178
#>
#> End points of the partition intervals (each column refers to a partition interval):
#>            [,1]       [,2]     [,3]     [,4]      [,5]     [,6]
#> [1,]       -Inf -0.1299548 2.476300 5.162873  7.640417 10.09359
#> [2,] -0.1299548  2.4763001 5.162873 7.640417 10.093595      Inf
#>
#> Indices of the partition intervals data points resides:
#> 2 1 3 1 1 6 2 3 5 2 

#plot(Narcs)

The incidence matrix of the CS-PCD can be found by inci.matCS1D by running inci.matCS1D(Xp,Yp,t=1.5,c) command. With the incidence matrix, approximate and exact domination numbers can be found by the functions dom.num.greedy and dom.num.exact, respectively.

Plot of the arcs in the digraph CS-PCD is provided by the function plotCSarcs1D, which take same arguments as the function plotPEarcs1D. We plot the arcs together with the centers, with centers=TRUE option in the plot function. Arcs are jittered along the $y$-axis to avoid clutter on the real line (i.e., for better visualization).

set.seed(1)
plotCSarcs1D(Xp,Yp,tau,c,jit,xlab="",ylab="",centers=TRUE)

The arcs of the CS-PCD for the 1D artificial data set with centrality parameter $c=.4$, the end points of the $Y$ intervals (red) and the centers (green) are plotted with vertical dashed lines.

Figure 1.4: The arcs of the CS-PCD for the 1D artificial data set with centrality parameter $c=.4$, the end points of the $Y$ intervals (red) and the centers (green) are plotted with vertical dashed lines.

Plot of the CS proximity regions (or intervals) is provided with the function plotCSregs1D, which take same arguments as the function plotPEregs1D. We plot the proximity regions together with the centers with centers=TRUE option:

plotCSregs1D(Xp,Yp,tau,c,xlab="",ylab="",centers = TRUE)

The CS proximity regions (blue) for the 1D artificial data set, the end points of the $Y$ intervals (black) and the centers (green) are plotted with vertical dashed lines.

Figure 1.5: The CS proximity regions (blue) for the 1D artificial data set, the end points of the $Y$ intervals (black) and the centers (green) are plotted with vertical dashed lines.

The function arcsCS1D is an object of class “PCDs” and has the same arguments as in num.arcsCS1D. Its call, summary, and plot are as in arcsPE1D. For this function, CS proximity regions are constructed for data points inside or outside the intervals based on Yp points with expansion parameter $t > 0$ and centrality parameter $c \in (0,1)$. That is, for this function, arcs may exist for points in the middle or end intervals. Arcs are jittered along the $y$-axis in the plot for better visualization. The plot function returns the same plot as in plotCSarcs1D, hence we comment it out below.

Arcs<-arcsCS1D(Xp,Yp,tau,c)
Arcs
#> Call:
#> arcsCS1D(Xp = Xp, Yp = Yp, t = tau, c = c)
#> 
#> Type:
#> [1] "Central Similarity Proximity Catch Digraph (CS-PCD) for 1D Points with Expansion Parameter t = 2 and Centrality Parameter c = 0.4"
summary(Arcs)
#> Call:
#> arcsCS1D(Xp = Xp, Yp = Yp, t = tau, c = c)
#> 
#> Type of the digraph:
#> [1] "Central Similarity Proximity Catch Digraph (CS-PCD) for 1D Points with Expansion Parameter t = 2 and Centrality Parameter c = 0.4"
#> 
#>  Vertices of the digraph =  Xp 
#>  Partition points of the region =  Yp 
#> 
#>  Selected tail (or source) points of the arcs in the digraph
#>       (first 6 or fewer are printed) 
#> [1] 3.907723 4.479377 5.337266 5.617220 8.459662 8.459662
#> 
#>  Selected head (or end) points of the arcs in the digraph
#>       (first 6 or fewer are printed) 
#> [1] 4.479377 3.907723 5.617220 5.337266 9.596209 9.709029
#> 
#> Parameters of the digraph
#> centrality parameter  expansion parameter 
#>                  0.4                  2.0 
#> Various quantities of the digraph
#>         number of vertices number of partition points 
#>                10.00000000                 5.00000000 
#>        number of intervals             number of arcs 
#>                 6.00000000                 8.00000000 
#>                arc density 
#>                 0.08888889
plot(Arcs)

1.2.1 Testing 1D Interaction or Uniformity with the CS-PCDs

We can test the 1D interaction between two classes/species or uniformity of points from one class in the 1D setting based on arc density of CS-PCDs. The distribution of the domination number of CS-PCDs is still a topic of ongoing work.

The Use of Arc Density of CS-PCDs for Testing 1D Interaction or Uniformity

We can test the 1D interaction of segregation/association or uniformity based on arc density of CS-PCD using the function CSarc.dens.test1D. This function is an object of class “htest” (i.e., hypothesis test), takes the same arguments as the function PEarc.dens.testS1D with expansion parameter r replaced with t, performs the same type of test with the same null and alternative hypotheses, and returns similar output as the PEarc.dens.test1D function. See Section 1.1, and also Ceyhan (2016) for more details.

CSarc.dens.test1D(Xp,Yp,tau,c) #try also CSarc.dens.test1D(Xp,Yp,tau,c,alt="l")
#> 
#>  Large Sample z-Test Based on Arc Density of CS-PCD for Testing
#>  Uniformity of 1D Data ---
#>  without End Interval Correction
#> 
#> data:  Xp
#> standardized arc density (i.e., Z) = -0.75628, p-value = 0.4495
#> alternative hypothesis: true (expected) arc density is not equal to 0.1658151
#> 95 percent confidence interval:
#>  0.08507259 0.20159565
#> sample estimates:
#> arc density 
#>   0.1433341

As in the tests based on PE-PCD, it is possible to account for $\mathcal{X}$ points outside the range of $\mathcal{Y}$ points, with the option end.int.cor = TRUE. For example, CSarc.dens.test1D(Xp,Yp,tau,c,end.int.cor = TRUE) would yield the end interval corrected version of the arc-based test of 1D interaction. Furthermore, we only provide the two-sided test above, although both one-sided versions are also available.

References

Ceyhan, E. 2012. “The Distribution of the Relative Arc Density of a Family of Interval Catch Digraph Based on Uniform Data.” Metrika 75(6): 761–93.

———. 2016. “Density of a Random Interval Catch Digraph Family and Its Use for Testing Uniformity.” REVSTAT 14(4): 349–94.

———. 2020. “Domination Number of an Interval Catch Digraph Family and Its Use for Testing Uniformity.” Statistics 54(2): 310–39.

VS1.3 - Example: An Artificial 1D Dataset

Elvan Ceyhan

2023-12-18

1 Illustration of PCDs on an Artificial 1D Dataset

1.1 Summary and Visualization with Proportional Edge PCDs

1.1.1 Testing Interaction and Uniformity with the PE-PCDs

1.2 Summary and Visualization with Central Similarity PCDs

1.2.1 Testing 1D Interaction or Uniformity with the CS-PCDs