In this vignette, we discuss the robust Horvitz-Thompson (RHT) estimator of Hulliger (1995, 1999). The vignette is organized as follows.

- 1 Workplace data
- 2 Robust Horvitz-Thompson estimator
- References

First, we load the namespace of the package `robsurvey`

and attach it to the search path.

The argument `quietly = TRUE`

suppresses the start-up
message in the call of `library("robsurvey")`

.

The `workplace`

sample consists of payroll data on n = 142
workplaces or business establishments (with paid employees) in the
retail sector of a Canadian province.

- The data display similar characteristics to the original 1999
Canadian Workplace and Employee Survey (WES), however, the
`workplace`

data are not those collected by Statistics Canada but have been generated by Fuller (2009, Example 3.1.1, Table 6.3). - The sampling design is stratified by industry, geographic region, and size (size is defined using estimated employment). A sample of workplaces was then drawn independently in each stratum using simple random sample without replacement (sample size is determined by Neyman allocation).

The original weights of WES were about 2200 for the stratum of small workplaces, about 750 for medium-sized, and about 35 for large workspaces. Several strata containing very large workplaces were sampled exhaustively; see Patak et al. (1998).

The variable of interest is `payroll`

, and the goal is to
estimate the population payroll total in the retail sector (in Canadian
dollars).

```
> head(workplace, 3)
ID weight employment payroll strat fpc
1 1 786 17 260000 2 10718
2 2 32 661 6873000 1 3432
3 3 36 3 366000 1 3432
```

In order to use the **survey methods** (not bare-bone
methods), we must **attach** the `survey`

package (Lumley, 2010, 2021) to the search
path

and specify a survey or sampling design object

```
> dn <- svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
+ data = workplace, calibrate.formula = ~-1 + strat)
```

**Note.** Since **version 4.2**, the
**survey** package allows the definition of pre-calibrated
weights (see argument `calibrate.formula`

of the function
`svydesign()`

). This vignette uses this functionality (in
some places). If you have installed an earlier version of the
`survey`

package, this vignette will automatically fall back
to calling `svydesign()`

without the calibration
specification. See vignette Pre-calibrated
weights of the `survey`

package to learn more.

To get a first impression of the distribution of
`payroll`

, we examine two (design-weighted) boxplots of
`payroll`

(on raw and logarithmic scale). The boxplots are
obtained using function `survey::svyboxplot`

.

From the boxplot with `payroll`

on raw scale, we recognise
that the sample distribution of `payroll`

is skewed to the
right; the boxplot on logarithmic scale demonstrates that log-transform
is not sufficient to turn the skewed distribution into a symmetric
distribution. The outliers need not be errors. Following Chambers (1986), we distinguish representative
outliers from non-representative outliers (\(\rightarrow\) see vignette “Basic Robust
Estimators” for an introduction to the notion of non-/ representative
outliers).

The outliers visible in the boxplot refer to a few large workplaces. Moreover, we assume that these outliers represent other workplaces in the population that are similar in value (i.e., representative outliers).

The following bare-bone estimating methods are available:

`weighted_mean_huber()`

`weighted_total_huber()`

`weighted_mean_tukey()`

`weighted_total_tukey()`

The functions with postfix `_tukey`

are
*M*-estimators with the Tukey biweight \(\psi\)-function. The Huber RHT
*M*-estimator of the payroll total can be computed with

Note that we must specify `type = "rht"`

for the RHT [the
case `type = "rhj"`

is discussed in the vignette “Basic
Robust Estimators”]. Here, we have chosen the robustness tuning constant
\(k = 8\).

The following survey method are available;

`svymean_huber()`

`svytotal_huber()`

`svymean_tukey()`

`svytotal_tukey()`

The survey method of the RHT (and its standard error) is

The `summary()`

method summarizes the most important facts
about the estimate.

```
> summary(m)
Huber M-estimator (type = rht) of the population total
total SE
payroll 1.559e+10 1.182e+09
Robustness:
Psi-function: with k = 8
mean of robustness weights: 0.9917
Algorithm performance:
converged in 4 iterations
with residual scale (weighted MAD): 89474
Sampling design:
Stratified Independent Sampling design
svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight,
data = workplace, calibrate.formula = ~-1 + strat)
```

The estimated location, variance, and standard error can be extracted
from object `m`

with the following commands.

For *M*-estimators, the estimated scale (weighted MAD) can be
extracted with the `scale()`

function.

Additional utility functions are:

`residuals()`

to extract the residuals`fitted()`

to extract the fitted values under the model in use`robweights()`

to extract the robustness weights

In the following figure, the robustness weights of object
`m`

are plotted against the residuals. The Huber RHT
*M*-estimator downweights observations at both ends of the
residuals’ distribution.

An adaptive *M*-estimator of the total (or mean) is defined by
letting the data chose the tuning constant \(k\). This approach is available for the RHT
estimator \(\rightarrow\) see vignette
“Basic Robust Estimators”, Chap. 5.3 on *M*-estimators.

CHAMBERS, R. (1986). Outlier Robust Finite Population Estimation.
*Journal of the American Statistical Association*
**81**, 1063–1069, DOI:
10.1080/01621459.1986.10478374.

FULLER, W. A. (2009). *Sampling Statistics*, Hoboken (NJ):
John Wiley & Sons, DOI:
10.1002/9780470523551.

HULLIGER, B. (1995). Outlier Robust Horvitz–Thompson Estimators.
*Survey Methodology* **21**, 79–87.

HULLIGER, B. (1999). Simple and robust estimators for sampling, in:
*Proceedings of the Survey Research Methods Section, American
Statistical Association*, pp. 54–63.

HULLIGER, B. (2006). Horvitz–Thompson Estimators, Robustified. In:
*Encyclopedia of Statistical Sciences* ed. by Kotz, S. Volume 5,
Hoboken (NJ): John Wiley and Sons, 2nd edition, DOI:
10.1002/0471667196.ess1066.pub2.

LUMLEY, T. (2010). *Complex Surveys: A Guide to Analysis Using R:
A Guide to Analysis Using R*, Hoboken (NJ): John Wiley &
Sons.

LUMLEY, T. (2021). survey: analysis of complex survey samples. R package version 4.0, URL https://CRAN.R-project.org/package=survey.

PATAK, Z., HIDIROGLOU, M. and LAVALLEE, P. (1998). The methodology of
the Workplace and Employee Survey. *Proceedings of the Survey
Research Methods Section, American Statistical Association*,
83–91.