General Ranking of Models/Fits Using the AIC and BIC Metrics
Source: R/eval_ABIC_forFit.R
When comparing different (simulation) fits for the same experimental data (see
eval_sim_EPR_isoFit, eval_kinR_EPR_modelFit, eval_kinR_Eyring_GHS or
smooth_EPR_Spec_by_npreg), the fits can be scored/ranked by different
metrics (e.g. the minimum sum of residual squares or the standard deviation of residuals),
including the Akaike and Bayesian Information Criteria (AIC and BIC, respectively).
These are also applied for best model selection in machine learning (refer to e.g.
Predictive Modelling and Machine Learning or
Error Estimation and Model Selection).
As described in Details, both metrics depend on the maximum logarithmic likelihood (based on the residuals)
with respect to the same data. The smaller the (negative) AIC or BIC, the better the model/fit.
Arguments
- data.fit
Data frame object, usually containing variables/columns like
experiment, fit(ted)/predicted, as well as residuals/errors.
If the latter is missing (see the argument residuals below),
one can easily create/calculate the variable/column as a difference
between the experimental and the fitted/predicted values.
- residuals
Character string, pointing to the variable/column header with residuals/errors within the
data.fit argument (usually residuals = "Residuals" or residuals = "errors").
Default: residuals = NULL.
- k
Numeric value equal to the number of parameters used for the model/fit (see e.g.
Examples in eval_kinR_EPR_modelFit, where k = 2).
- rs.prob.distro
Character string, corresponding to the proposed residuals/errors probability distribution. If set to default
(rs.prob.distro = "auto"), it automatically decides which distribution (normal/Gaussian or
Student's t-distribution) fits the residuals/errors best, based on the implemented AIC and BIC calculations,
additionally supported by the Shapiro-Wilk test (see shapiro.test).
This is particularly suitable for the situation when the residual analysis detects heavier tails (see e.g. Example
in eval_sim_EPR_isoFit) and one is not quite sure of the corresponding probability distribution.
Otherwise, the argument may also specify individual distributions like rs.prob.distro = "normal",
"Gaussian", "Student" or "t-distribution" ("t-distro").
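If the input data frame lacks a residuals/errors column, it can be created beforehand as mentioned above. A minimal sketch in base R, assuming hypothetical column names experiment and fitted:

```r
## hypothetical data frame with experimental and fitted values
data.fit <- data.frame(
  experiment = c(1.02, 1.98, 3.05, 3.97),
  fitted     = c(1.00, 2.00, 3.00, 4.00)
)
## residuals/errors column = experimental minus fitted/predicted values
data.fit$residuals <- data.fit$experiment - data.fit$fitted
```

The resulting "residuals" column can then be referred to via the residuals argument.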
Value
The function returns a list with the following components:
- abic.vec
A numeric vector containing the values of estimated AIC and BIC, respectively.
- message
A sentence (message) describing the residuals/errors probability distribution which has been
proposed for the AIC and BIC calculation (see also the rs.prob.distro argument).
Details
Estimation of the errors that a model/fit makes with respect to the (experimental) data is
one of the most consequential aspects of a statistical (machine learning) analysis. Often, different
modelling/fitting approaches are used in an attempt to identify or select the best model/fit. Therefore,
for such a purpose, one tries to minimize the errors/residuals more and more with each model. To put it another way,
there is an information loss when the model/fit approximates the reality, and a good model minimizes
those losses. The evaluation of AIC and BIC actually approaches the problem from the other side,
because it uses the technique called maximum likelihood estimation (MLE). The idea is to maximize the chance
that each observation in the sample follows a pre-selected distribution with a specific
set of parameters (corresponding to a model/fit). For practical reasons a logarithmic likelihood
(or log-likelihood, \(LL\)) is used, and the formulae for both criteria read:
$$AIC = -2\,LL + 2\,k + (2\,k\,(k + 1)\,/\,(N - k -1))$$
and
$$BIC = -2\,LL + k\,ln(N)$$
where \(k\) and \(N\) correspond to the number of (model/fit) parameters and the number of observations, respectively.
The 3rd term in the \(AIC\) definition represents the correction for a small sample/observation ensemble, which
becomes very small for a high number of observations (and can be neglected,
see e.g. Burnham and Anderson (2004) or Kumar (2023) in the References). For example, for an EPR simulation
fit with 2048 points and 8 parameters it equals \(16 \cdot 9\,/\,2039 \approx 0.0706\). However, for
radical kinetic measurements with 42 EPR spectra and 3 parameters, the 3rd term results
in \(6 \cdot 4\,/\,38 \approx 0.6316\).
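The two formulae above can be evaluated directly from a residuals vector. A minimal sketch in base R, assuming a normal/Gaussian residuals distribution and hypothetical stand-in residuals (the numbers 2048 and 8 mirror the EPR simulation example above):

```r
## stand-in residuals for an EPR simulation fit (hypothetical)
set.seed(7)
rs <- rnorm(2048, mean = 0, sd = 0.05)
N  <- length(rs)                 # number of observations
k  <- 8                          # number of fitted parameters
## MLE of the residual standard deviation (zero-mean residuals)
sigma <- sqrt(mean(rs^2))
## Gaussian log-likelihood of the residuals as a proxy for the model LL
ll <- sum(dnorm(rs, mean = 0, sd = sigma, log = TRUE))
## corrected AIC (with the small-sample 3rd term) and BIC
aic <- -2 * ll + 2 * k + (2 * k * (k + 1)) / (N - k - 1)
bic <- -2 * ll + k * log(N)
```

For these values of \(N\) and \(k\), the small-sample correction term evaluates to \(\approx 0.0706\), as stated above.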
The original MLE/\(LL\) calculation is based on the model. Nevertheless, such a computation can quite often
be impractical or even impossible to perform. To overcome this difficulty, the formulae for both criteria
use a standard assumption that the model and the data residuals/errors are identically distributed.
Therefore, the residuals/errors are applied as a proxy for the MLE/\(LL\) (see e.g. Rossi et al. (2020)
and Kumar (2023) in the References). Evaluation of the latter, in the actual function, proceeds through the sum
of stats::dnorm (for the normal/Gaussian distribution)
or of stats::dt (for the Student's t-distribution), using the log = TRUE option.
For the t-distribution the df/\(\nu\) parameter is unknown; therefore, it is optimized
by maximizing the above-described \(LL\) with the optimize function. Both probability distributions
are included in the function because the residuals/errors do not always follow the normal one. Sometimes, heavier tails
may appear, e.g. for EPR simulation fits (please refer to the Examples
in eval_sim_EPR_isoFit).
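The df/\(\nu\) optimization can be sketched along these lines. This is a minimal illustration in base R, not the function's actual implementation; the residuals, the RMS scale estimate and the search interval (1, 100) are assumptions for the example:

```r
## stand-in residuals with heavier tails (hypothetical)
set.seed(7)
rs <- rt(1024, df = 4) * 0.05
## simple scale estimate for the residuals
s <- sqrt(mean(rs^2))
## log-likelihood of the scaled t-distribution as a function of df/nu,
## via stats::dt with log = TRUE (the -log(s) term accounts for scaling)
ll.t <- function(nu) sum(dt(rs / s, df = nu, log = TRUE) - log(s))
## maximize the log-likelihood over nu with stats::optimize
opt <- optimize(ll.t, interval = c(1, 100), maximum = TRUE)
nu.best <- opt$maximum     # df maximizing the log-likelihood
LL.t    <- opt$objective   # the corresponding maximum LL
```

The maximized \(LL\) then enters the AIC and BIC formulae in the same way as the Gaussian one.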
Consequently, the function may automatically (see the argument rs.prob.distro) decide which distribution
fits the residuals/errors best, based on the lower AIC and BIC values, additionally supported by the Shapiro-Wilk
normality test (shapiro.test). It is recommended to evaluate/apply both information criteria.
The AIC tends to favor a more complex model (over a simpler one) and thus may "overfit" the data, whereas
the BIC is in favor of simpler models because it possesses a stronger penalty (\(k\,ln(N)\)) for complex models
than the AIC (\(2\,k\); see e.g. Fabozzi et al. (2014) and Zhang and Meng (2023) in the References).
References
Fabozzi FJ, Focardi FM, Rachev ST, Arshanapalli BG (2014). The Basics of Financial Econometrics: Tools, Concepts, and Asset Management Applications (Appendix E), John Wiley and Sons, Inc. ISBN 978-1-118-57320-4, https://onlinelibrary.wiley.com/doi/book/10.1002/9781118856406.
Soch J et al. (2024). StatProofBook/StatProofBook.github.io: The Book of Statistical Proofs (Version 2023)., https://statproofbook.github.io/, https://doi.org/10.5281/ZENODO.4305949.
Burnham KP, Anderson DR (2004). "Multimodel Inference: Understanding AIC and BIC in Model Selection", Sociol. Methods Res., 33(2), 261-304, https://doi.org/10.1177/0049124104268644.
Thulin M (2025). Modern Statistics with R: From Wrangling and Exploring Data to Inference and Predictive Modeling, 2nd edition (Version 2.0.2), CRC Press and Taylor and Francis Group, LLC. ISBN 978-1-032-51244-0, https://www.modernstatisticswithr.com/.
Zhang Y, Meng G (2023). "Simulation of an Adaptive Model Based on AIC and BIC ARIMA Predictions", J. Phys.: Conf. Ser., 2449, 012027-7, https://doi.org/10.1088/1742-6596/2449/1/012027.
Svetunkov I (2022). Statistics for Business Analytics, Version 2025, https://openforecast.org/sba/.
Rossi R, Murari R, Gaudio P, Gelfusa M (2020). "Upgrading Model Selection Criteria with Goodness of Fit Tests for Practical Applications", Entropy, 22(4), 447-13, https://doi.org/10.3390/e22040447.
Hyndman RJ, Athanasopoulos G (2021). Forecasting: Principles and Practice, 3rd edition, OTexts, ISBN 978-0-987-50713-6, https://otexts.com/fpp3/.
Hyndman RJ (2013). "Facts and Fallacies of the AIC", https://robjhyndman.com/hyndsight/aic/.
Kumar A (2023). "AIC and BIC for Selecting Regression Models: Formula, Examples", https://vitalflux.com/aic-vs-bic-for-regression-models-formula-examples/#comments.
See also
Other Simulations and Optimization:
eval_sim_EPR_iso()
,
eval_sim_EPR_isoFit()
,
eval_sim_EPR_isoFit_space()
,
eval_sim_EPR_iso_combo()
,
optim_for_EPR_fitness()
,
plot_eval_RA_forFit()
,
quantify_EPR_Sim_series()
,
smooth_EPR_Spec_by_npreg()
Examples
if (FALSE) { # \dontrun{
## to decide which probability distribution fits
## the best to residuals/errors
calc.abic.list.01 <-
eval_ABIC_forFit(
data.fit = triaryl_model_kin_fit_01$df,
residuals = "residuals",
k = 2,
rs.prob.distro = "auto"
)
#
## AIC and BIC values
calc.abic.list.01$abic.vec
#
## ...and the corresponding message
calc.abic.list.01$message
#
## calculation of AIC and BIC, taking into
## account the Student's t-distribution:
calc.abic.list.01 <-
eval_ABIC_forFit(
data.fit = best.sim.fit.df,
residuals = "Errors",
k = 8,
rs.prob.distro = "t-distro"
)
#
## for additional applications, please
## refer to the Examples in `eval_sim_EPR_isoFit()`
## or `eval_kinR_EPR_modelFit()`
#
} # }