# Modelling income tax and the project function

#### 2021-01-29

The functions model_income_tax and project are the core of the grattan package. Grattan applies them to the ATO’s 2% sample files to produce costings of changes to tax policy. The functions are both $$X^n \to X^n$$. That is, they take a sample file and return a mutated sample file.

With the mutated sample file, the costing for that particular tax year is the weighted sum of the difference between the new_tax and the baseline_tax columns. We can also use the mutated sample file to perform distributional analysis, such as the average change in tax by taxable income percentile.
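The costing arithmetic described above can be sketched in base R without the package. This is only an illustration: the `toy` data frame and its values are made up, and the split into two income groups stands in for the percentile analysis a real sample file would support.

```r
# Toy stand-in for a mutated sample file (values are made up for illustration).
toy <- data.frame(
  Taxable_Income = c(20000, 45000, 90000, 150000),
  baseline_tax   = c(0, 6000, 21000, 45000),
  new_tax        = c(0, 6100, 21500, 46000),
  WEIGHT         = 50L  # each record of a 2% sample stands for 50 taxpayers
)

# Costing: the weighted sum of the change in tax.
delta   <- toy$new_tax - toy$baseline_tax
costing <- sum(delta * toy$WEIGHT)

# Distributional analysis: average change in tax by income group
# (two groups here; a real analysis would use percentiles).
n_groups     <- 2
income_group <- ceiling(n_groups * rank(toy$Taxable_Income) / nrow(toy))
avg_by_group <- tapply(delta, income_group, mean)

costing
avg_by_group
```

Because every record carries the same weight here, the costing is simply 50 times the unweighted sum of the differences.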

Since the input data consists of tax returns and the grattan package does not purport to generate inferences about the wider Australian population, these functions cannot (directly) analyse the effect of policies on households or on the wider population. For example, policies affecting welfare payments, changes to the tax settings of businesses or super funds, or changes which would tax people who do not currently file tax returns are not amenable to the kind of analysis these functions perform.

## How to use model_income_tax

model_income_tax takes a sample file and returns a sample file under the settings given by the function arguments.

To start, let’s load the (minimal) packages we need. We’ll use the synthetic 2015-16 sample file contained in the suggested package taxstats1516. See ?install_taxstats for installation instructions. For future years, use the latest sample file from the ATO.

library(hutilscpp)
library(knitr)
library(data.table)
library(magrittr)
library(hutils)
library(grattan)
require_taxstats1516()

# Use the actual sample file if you've got it
s1516 <- as.data.table(sample_file_1516_synth)
s1516[, WEIGHT := 50L]

This function is purely cosmetic.

#' @return Number formatted as dollar e.g. 30e3 => $30,000
dollar <- function (x, digits = 0) {
  nsmall <- digits
  commaz <- format(abs(x), nsmall = nsmall, trim = TRUE, big.mark = ",", scientific = FALSE, digits = 1L)
  if_else(x < 0,
          paste0("\U2212", "$", commaz),
          paste0("$", commaz))
}

All instances of model_income_tax have two mandatory arguments: sample_file and baseline_fy. These define the baseline_tax column in the result. When an argument is left as NULL, the new_tax column is calculated using the corresponding tax setting that applied in baseline_fy.

s1516 %>%
  model_income_tax(baseline_fy = "2015-16") %>%
  select_grep("tax$", "Taxable_Income") %>%  # just look at relevant cols
  kable
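| Taxable_Income | baseline_tax | new_tax  |
|---------------:|-------------:|---------:|
|          28849 |         2155 |  2155.29 |
|         210436 |        72060 | 72060.64 |
|          22285 |          426 |   426.15 |
|          58461 |        11592 | 11592.96 |
|              0 |            0 |     0.00 |
|          20078 |            0 |     0.00 |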

Note that by default new_tax is a double-precision vector and is not rounded. Use return. = "sample_file.int" to return rounded integer variables.

With the use of a simple function to test equality, we can see that new_tax is just the same as baseline_tax, as expected.

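is_all_equal <- function(x, y) {
  if (is.integer(x) && is.integer(y)) {
    # Exact comparison is safe for integer vectors
    all(x == y)
  } else {
    # Tolerant comparison for doubles
    isTRUE(all.equal(x, y))
  }
}

s1516 %>%
  model_income_tax(baseline_fy = "2015-16",
                   return. = "sample_file.int") %>%
  with(is_all_equal(baseline_tax, new_tax))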

### Changing Medicare levy parameters

The Medicare levy is more complex to calculate than ordinary income tax. There are parameters relating to two thresholds, as well as different thresholds for families and SAPTO-eligible individuals. Even the simplest modification requires changes to multiple parameters. Warnings are emitted whenever parameters are not internally consistent.

Let’s try to increase the Medicare levy rate from 2% to 3%. Observe the warning messages.

## Warning: medicare_levy_upper_threshold was not specified, but its default value would be inconsistent with the parameters that were specified.
## Its value has been set to:
##  medicare_levy_upper_threshold = 30479
## Warning: medicare_levy_upper_sapto_threshold was not specified, but its default value would be inconsistent with the parameters that were specified.
## Its value has been set to:
##  medicare_levy_upper_sapto_threshold = 48197

Note that the warning message says the parameter has been changed for you. However, you should never tolerate the warning; instead, set the parameter explicitly to the suggested value (if you agree with the warning message’s advice).
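The suggested values in the warnings can be reproduced by hand, which helps when deciding whether to agree with them. The sketch below does not call the package. It assumes the usual shade-in rule (a 10% taper above the lower threshold until the levy equals the full rate times income) and the 2015-16 lower thresholds of $21,335 for individuals and $33,738 for SAPTO-eligible individuals. The helper consistent_upper_threshold is hypothetical, not a grattan function.

```r
# Assumed shade-in rule: between the lower and upper thresholds the levy is
# taper * (income - lower); above the upper threshold it is rate * income.
# The two pieces meet where taper * (upper - lower) == rate * upper,
# i.e. upper = lower * taper / (taper - rate).
consistent_upper_threshold <- function(lower, rate, taper = 0.10) {
  stopifnot(rate < taper)
  round(lower * taper / (taper - rate))
}

# Assumed 2015-16 lower thresholds: individuals and SAPTO-eligible individuals
upper_individual <- consistent_upper_threshold(21335, rate = 0.03)
upper_sapto      <- consistent_upper_threshold(33738, rate = 0.03)

upper_individual  # 30479
upper_sapto       # 48197
```

Under these assumptions the results agree with the values reported in the two warnings, so passing them explicitly alongside the new rate should keep the parameters consistent.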

## How to use project

The function project takes a sample file and returns a sample file. The other mandatory argument is h, an integer giving the number of years to project ahead of the sample file provided.

Thus, to get a forecast for the 2018-19 tax year:

s1819 <- project(s1516, h = 3L)
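The relationship between h and the target tax year is plain arithmetic on the financial year label. A minimal base R sketch follows; fy_ahead is a hypothetical helper written for this illustration and is not part of the package.

```r
# Hypothetical helper: the financial year h years after `fy`,
# where `fy` is a label such as "2015-16".
fy_ahead <- function(fy, h) {
  start_year <- as.integer(substr(fy, 1, 4)) + as.integer(h)
  sprintf("%d-%02d", start_year, (start_year + 1L) %% 100L)
}

target_fy <- fy_ahead("2015-16", 3L)
target_fy  # "2018-19"
```

So a 2015-16 sample file projected with h = 3L represents 2018-19.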

This uses the internal forecast methods. To specify particular forecast outcomes, you can use the wage.series and lf.series arguments.

### Wage and labour series

To compare the tax collections under these different assumptions, one would use income_tax separately:

Currently there is no interface to using the upper or lower bounds of the labour force or wage price indices. If you wanted the 80% upper bound of the prediction interval for salary out to 2020-21, for instance, you would pass Sw_amt to excl_vars and manually inflate.

## [1] "$50,470"
## [1] "$50,704"
## [1] "$64,198"
## [1] "$64,431"

## Combining the two

To cost a reduction in the capital gains tax discount from 50% to 25% over the four years from 2018-19, we would run

cgt_25pc_fwd_estimates <-
  lapply(yr2fy(2019:2022), function(fy) {
    s1516 %>%
      project_to(to_fy = fy) %>%
      model_income_tax("2018-19",
                       cgt_discount_rate = 0.25) %>%
      .[, fy_year := fy]
  }) %>%
  rbindlist

Note that this takes a few seconds, most of which is spent within project. We could improve the speed of this by caching the intermediate objects, either as objects in the environment or as files (say, .fst files). You should consider doing this when you find yourself running project many times – likely you are just repeating calculations.
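One way to cache intermediate objects in the environment is sketched below in base R. The function slow_projection is a hypothetical stand-in for an expensive call such as project_to, and the cache is keyed by financial year. A file-based cache would follow the same pattern, with reads and writes in place of the environment lookups.

```r
# Hypothetical stand-in for an expensive projection; it also counts
# how many times it actually runs.
n_calls <- 0L
slow_projection <- function(fy) {
  n_calls <<- n_calls + 1L
  Sys.sleep(0.1)  # pretend this is the slow part
  data.frame(fy_year = fy, stringsAsFactors = FALSE)
}

# Cache keyed by financial year, held in an environment.
projection_cache <- new.env()
cached_projection <- function(fy) {
  if (is.null(projection_cache[[fy]])) {
    projection_cache[[fy]] <- slow_projection(fy)
  }
  projection_cache[[fy]]
}

first  <- cached_projection("2018-19")  # computed
second <- cached_projection("2018-19")  # served from the cache
n_calls
```

The second call returns the stored object without running the slow function again, so several policy variants can be modelled against the same projected file.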

cgt_25pc_fwd_estimates %>%
  mutate_ntile("Taxable_Income", n = 5L, keyby = "fy_year") %>%
  .[, delta := new_tax - baseline_tax] %>%
  .[, .(totDelta = sum(delta),
        avgDelta = mean(delta)),
    keyby = .(fy_year, Taxable_IncomeQuintile)] %>%
  # cosmetic
  .[, lapply(.SD, round), keyby = key(.)] %>%
  kable
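| fy_year | Taxable_IncomeQuintile |  totDelta | avgDelta |
|:--------|-----------------------:|----------:|---------:|
| 2018-19 |                      1 |         0 |        0 |
| 2018-19 |                      2 |  -9572321 |     -178 |
| 2018-19 |                      3 | -42454914 |     -787 |
| 2018-19 |                      4 | -54753252 |    -1015 |
| 2018-19 |                      5 |  54372173 |     1008 |
| 2019-20 |                      1 |         0 |        0 |
| 2019-20 |                      2 |  -9766969 |     -181 |
| 2019-20 |                      3 | -43770465 |     -812 |
| 2019-20 |                      4 | -54703040 |    -1014 |
| 2019-20 |                      5 |  59436677 |     1102 |
| 2020-21 |                      1 |         0 |        0 |
| 2020-21 |                      2 |  -9971943 |     -185 |
| 2020-21 |                      3 | -45009200 |     -835 |
| 2020-21 |                      4 | -54571450 |    -1012 |
| 2020-21 |                      5 |  67143364 |     1245 |
| 2021-22 |                      1 |         0 |        0 |
| 2021-22 |                      2 | -10195667 |     -189 |
| 2021-22 |                      3 | -46075853 |     -854 |
| 2021-22 |                      4 | -54522985 |    -1011 |
| 2021-22 |                      5 |  72131973 |     1338 |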

### lito_multi for custom offsets

While model_income_tax cannot anticipate everything tax policymakers might imagine in future, the argument lito_multi does provide a powerful mechanism for handling complicated offsets. The argument, if provided, must be a list of two components, x and y. These define an offset as follows: for every pair (x_i, y_i), the value of the offset at a taxable income of x_i is y_i, and the values between the points are interpolated linearly.
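The interpolation rule can be illustrated in base R with approx. This sketch does not call the package. It assumes the 2015-16 LITO settings (a maximum offset of $445, tapering at 1.5 cents per dollar above $37,000), and the helper offset_at is hypothetical.

```r
# Points defining the offset: flat at 445 up to 37,000, then tapering
# linearly to zero at 66,666.67 (assumed 2015-16 LITO settings).
lito_points <- list(x = c(0, 37000, 200000 / 3),
                    y = c(445, 445, 0))

# Hypothetical helper: linear interpolation between the points, holding
# the end values constant outside the range (rule = 2).
offset_at <- function(income, points) {
  approx(x = points$x, y = points$y, xout = income, rule = 2)$y
}

offsets <- offset_at(c(30000, 50000, 80000), lito_points)
offsets
```

At a taxable income of $50,000 the interpolated offset is $250, which is 445 less 1.5% of the $13,000 above $37,000.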

For example to simply mimic LITO in 2015-16:

## Empty data.table (0 rows and 67 cols): Gender,age_range,Occ_code,Partner_status,Region,Lodgment_method...

### Budget_... parameters

These were used to cost policies proposed in the 2018 Budget period by the Government and the Opposition. They’re unlikely to have much use except in reproducing past results.

### SAPTO

The Seniors and Pensioner Tax Offset (SAPTO) can also be modified. To cost the abolition of SAPTO, one would use:

To model a change to lower the SAPTO threshold from $32,279 to $27,000:

To cost the proposal in Age of entitlement: age-based tax breaks (2016):

## [1] "-$6.4 billion"