Preprint / Version 1

Exoplanet Detection with Decision Trees

Authors

  • Sriram Loganathan, Cupertino High School

DOI:

https://doi.org/10.58445/rars.526

Keywords:

exoplanet, decision trees, machine learning algorithms

Abstract

Exoplanets can be detected by observing the brightness and motion of the stars they orbit. Machine learning algorithms have previously been used to classify possible candidates, analyzing large samples of data and automating the otherwise tedious process of classification. In this research, we train a decision tree algorithm on datasets of confirmed exoplanets, candidates, and false positives from the Kepler Mission in the NASA Exoplanet Archive. The resulting decision tree classification model achieves 94.12% accuracy when trained on confirmed exoplanets, candidates, and false positives, and 99.78% accuracy when trained only on confirmed exoplanets and false positives. When we instead train a decision tree regression model to predict Kepler Object of Interest (KOI) scores, we obtain a loss of 0.04. These results indicate that the decision tree algorithm is a viable option for classifying and detecting exoplanets.
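
To illustrate the kind of model the abstract describes, the following is a minimal sketch of a decision tree classifier that chooses splits by Gini impurity. It is not the author's implementation, and it does not reproduce the reported accuracies. The data are synthetic, the two features (labeled here as transit depth and transit duration) and the labeling rule are hypothetical, and only two classes are used for brevity.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini impurity most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a tree recursively; leaves hold the majority class label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )

def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def make_data(n, rng):
    """Synthetic stand-in for KOI rows; the features and rule are hypothetical."""
    rows, labels = [], []
    for _ in range(n):
        depth, duration = rng.uniform(0, 1), rng.uniform(0, 1)
        rows.append((depth, duration))
        is_planet = depth < 0.4 and duration > 0.3  # invented labeling rule
        labels.append("CONFIRMED" if is_planet else "FALSE POSITIVE")
    return rows, labels

def accuracy(tree, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
train_rows, train_labels = make_data(200, rng)
test_rows, test_labels = make_data(100, rng)

tree = build_tree(train_rows, train_labels, max_depth=3)
train_acc = accuracy(tree, train_rows, train_labels)
test_acc = accuracy(tree, test_rows, test_labels)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

In practice, a library implementation such as the decision tree classes in scikit-learn (which the paper cites) would likely be preferable to a hand-written tree, and real KOI data would replace the synthetic rows.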

Posted

2023-10-01