Preprint / Version 1

Using Multiple Linear Regression (MLR) to predict the real cost of a car model

Authors

  • Tien Anh Nguyen No

DOI:

https://doi.org/10.58445/rars.1211

Keywords:

Real Value Estimation, Statistics, Multiple Linear Regression, Economy

Abstract

This paper analyzes basic car characteristics to predict the real price of different automobile models. Multiple linear regression (MLR) analysis was performed on the data using structural equation modeling in JMP 17. The methodology has three main steps. The first uses various statistical analysis techniques to evaluate and preprocess the data and the collected variables. The second selects the most significant variables using multiple methods. The third uses RMSE, AICc, BIC, Mallows' Cp, and adjusted R² to compare the MLR models built from the selected variables. The findings indicate that the model built with variables chosen by the Stepwise Selection approach performs better than the models built with the other approaches, having the lowest AICc and RMSE and the highest adjusted R². Overall, the resulting regression model showed a strong ability to predict the price of car models.
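As an illustration of the comparison step, the criteria named in the abstract can be computed for an ordinary least-squares fit roughly as follows. This is a minimal sketch, not the paper's JMP 17 workflow: the data are synthetic, the function names (`fit_ols`, `model_criteria`) are hypothetical, and the formulas assume Gaussian errors with the error variance counted as an estimated parameter in AICc and BIC. Other software may use slightly different conventions, so absolute values can differ while model rankings usually agree.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns coefficients and SSE."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return beta, float(resid @ resid)


def model_criteria(X, y, sigma2_full):
    """Comparison criteria for one candidate model.

    sigma2_full is the error-variance estimate (MSE) of the full model,
    which Mallows' Cp uses as its reference.
    """
    n = len(y)
    p = X.shape[1] + 1          # regression coefficients, incl. intercept
    k = p + 1                   # assumption: error variance also counted
    _, sse = fit_ols(X, y)
    sst = float(((y - y.mean()) ** 2).sum())
    # -2 * maximized Gaussian log-likelihood
    neg2ll = n * (np.log(2 * np.pi * sse / n) + 1)
    return {
        "rmse": float(np.sqrt(sse / (n - p))),
        "adj_r2": 1 - (sse / (n - p)) / (sst / (n - 1)),
        "aicc": float(neg2ll + 2 * k + 2 * k * (k + 1) / (n - k - 1)),
        "bic": float(neg2ll + k * np.log(n)),
        "cp": sse / sigma2_full - n + 2 * p,
    }


# Synthetic stand-in data: four candidate predictors, only two matter.
rng = np.random.default_rng(0)
n = 200
X_all = rng.normal(size=(n, 4))
y = 3 + 2 * X_all[:, 0] - 1.5 * X_all[:, 1] + rng.normal(scale=0.5, size=n)

# Error variance of the full model, used as the reference for Cp.
_, sse_full = fit_ols(X_all, y)
sigma2_full = sse_full / (n - (X_all.shape[1] + 1))

full = model_criteria(X_all, y, sigma2_full)
reduced = model_criteria(X_all[:, :2], y, sigma2_full)

for name, crit in [("full", full), ("reduced", reduced)]:
    print(name, {key: round(val, 3) for key, val in crit.items()})
```

Lower RMSE, AICc, and BIC and higher adjusted R² favor a model, while Mallows' Cp is usually read as good when it is close to the number of fitted coefficients. By construction, the full model's Cp equals its own coefficient count, so Cp is only informative for the reduced candidates.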

Posted

2024-06-16
