Preprint / Version 3

Hazardous Asteroid Classification with Machine Learning using Physical and Orbital Asteroid Properties


  • Arjun Ramakrishnan, Polygence



Potentially Hazardous Asteroids, Machine Learning, Random Forest Classification


Asteroids, rocky objects orbiting the Sun, have long been a key focus of scientific study because they can provide insights into planet formation. Because there are an enormous number of asteroids in space, the possibility that one could collide with our planet and cause devastating effects remains a constant concern. Asteroids that could come into close proximity to, or collide with, Earth are classified as potentially hazardous asteroids (PHAs) (NASA, n.d.). However, manually analyzing large datasets to identify every possibly dangerous asteroid is cumbersome, which makes machine learning techniques well suited to studying trends and making predictions. Machine learning is a method of data analysis in which computer algorithms learn relationships from data; applied to asteroid catalogs, it can improve our ability to assess asteroid threats. It has been used to automate asteroid classification before, for instance by Anish Si in 2018 at the Vellore Institute of Technology in India, where a 15-tree Random Forest model performed best (Si, 2018). The goal of this study was to train multiple machine learning models on physical and orbital asteroid features and identify the model that most accurately classified asteroids as hazardous or non-hazardous. The key differences from that prior work were the use of a different subset of features and a substantially different set of models. A 50-tree Random Forest classification model achieved 98.45% accuracy on the test set, the highest of the models evaluated, indicating that Random Forest was the best-suited model for this asteroid classification task.
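To make the mechanics of a 50-tree Random Forest concrete, here is a minimal pure-Python sketch of the technique: each tree is grown on a bootstrap sample, only a random subset of features is considered at each split (Gini impurity), and the forest predicts by majority vote. This is not the author's implementation. The two features (absolute magnitude H and minimum orbit intersection distance, MOID), the 80/20 split, the tree depth, and all helper names such as `make_synthetic_asteroids` are hypothetical choices made for illustration. The data is synthetic, labeled with NASA's published PHA thresholds (MOID ≤ 0.05 au and H ≤ 22.0) only so the toy problem has a learnable signal; whatever accuracy it prints says nothing about the 98.45% reported above.

```python
import math
import random
from collections import Counter

# NASA's PHA thresholds, used here ONLY to label hypothetical toy data.
PHA_MAX_MOID_AU = 0.05  # minimum orbit intersection distance with Earth (au)
PHA_MAX_H = 22.0        # absolute magnitude (smaller H means a larger body)

def make_synthetic_asteroids(n, seed=0):
    """Hypothetical helper: random (H, MOID) pairs labeled 1 (hazardous) or 0."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        h, moid = rng.uniform(15.0, 29.0), rng.uniform(0.0, 0.1)
        X.append((h, moid))
        y.append(int(moid <= PHA_MAX_MOID_AU and h <= PHA_MAX_H))
    return X, y

def gini(pos, total):
    """Gini impurity of a binary node with `pos` positives out of `total` samples."""
    p = pos / total
    return 2.0 * p * (1.0 - p)

def best_split(X, y, idx, feature):
    """Sweep the sorted values of one feature; return (weighted Gini, threshold) or None."""
    order = sorted(idx, key=lambda i: X[i][feature])
    n, total_pos, left_pos, best = len(order), sum(y[i] for i in order), 0, None
    for k in range(1, n):
        left_pos += y[order[k - 1]]
        a, b = X[order[k - 1]][feature], X[order[k]][feature]
        if a == b:
            continue  # no threshold fits between equal values
        score = (k * gini(left_pos, k) + (n - k) * gini(total_pos - left_pos, n - k)) / n
        if best is None or score < best[0]:
            best = (score, (a + b) / 2.0)
    return best

def build_tree(X, y, idx, depth, rng, max_depth, max_features):
    """Leaves are 0/1 labels; internal nodes are (feature, threshold, left, right)."""
    labels = [y[i] for i in idx]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    best = None
    # Random-forest step: only a random subset of features is tried at each split.
    for f in rng.sample(range(len(X[0])), max_features):
        cand = best_split(X, y, idx, f)
        if cand is not None and (best is None or cand[0] < best[0]):
            best = (cand[0], f, cand[1])
    if best is None:
        return majority
    _, f, t = best
    left = [i for i in idx if X[i][f] <= t]
    right = [i for i in idx if X[i][f] > t]
    if not left or not right:
        return majority
    return (f, t,
            build_tree(X, y, left, depth + 1, rng, max_depth, max_features),
            build_tree(X, y, right, depth + 1, rng, max_depth, max_features))

def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

class RandomForest:
    """Minimal bagged ensemble of randomized Gini trees with majority voting."""
    def __init__(self, n_trees=50, max_depth=8, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n = len(X)
        max_features = max(1, int(math.sqrt(len(X[0]))))  # common sqrt(d) heuristic
        self.trees = []
        for _ in range(self.n_trees):
            boot = [self.rng.randrange(n) for _ in range(n)]  # bootstrap: sample with replacement
            self.trees.append(build_tree(X, y, boot, 0, self.rng, self.max_depth, max_features))
        return self

    def predict(self, X):
        return [Counter(predict_tree(t, x) for t in self.trees).most_common(1)[0][0] for x in X]

def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)

if __name__ == "__main__":
    X, y = make_synthetic_asteroids(1000, seed=42)
    model = RandomForest(n_trees=50, seed=1).fit(X[:800], y[:800])  # assumed 80/20 split
    print(f"test accuracy on synthetic data: {accuracy(y[800:], model.predict(X[800:])):.3f}")
```

In practice a library implementation such as scikit-learn's `RandomForestClassifier(n_estimators=50)` would be the idiomatic choice; the hand-rolled version only exposes the bootstrap, feature-subsampling, and voting steps that distinguish a random forest from a single decision tree.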


Brownlee, J. (2019, August 6). How to improve deep learning performance. Machine Learning Mastery. Retrieved August 3, 2022, from

Brownlee, J. (2022, August 4). How to grid search hyperparameters for deep learning models in Python with Keras. Machine Learning Mastery. Retrieved August 5, 2022, from

Chavan, P. (2013, January 24). How to decide the number of hidden layers and nodes in a hidden layer? ResearchGate. Retrieved July 28, 2022, from

Jain, K. (2021, March 14). How to improve logistic regression? Medium. Retrieved June 30, 2022, from

Koehrsen, W. (2018, January 10). Hyperparameter tuning the random forest in Python. Medium. Retrieved July 27, 2022, from

Nadeem, M. (2020, November 26). Hyperparameter tuning using GridSearchCV and KerasClassifier. GeeksforGeeks. Retrieved August 5, 2022, from

NASA. (2022, March 15). NASA system predicts impact of small asteroid. NASA. Retrieved July 7, 2022, from

NASA. (n.d.). NEO basics. NASA. Retrieved June 30, 2022, from

NASA. (n.d.). Small-body database query. NASA. Retrieved June 23, 2022, from

Notable asteroid impacts in Earth's history. The Planetary Society. (n.d.). Retrieved July 7, 2022, from

Si, A. (2020, March). Hazardous asteroid classification through various machine learning techniques. Tamil Nadu: International Research Journal of Engineering and Technology.

Tilli, D. (2017, October 13). Hyperparameter grid search with XGBoost. Kaggle. Retrieved July 27, 2022, from



2022-11-01 (updated on 2022-12-22)