Preprint / Version 1

Analyzing the Distribution of Energy Sources in the United States


  • Suvrath Arvind, Polygence
  • Clayton Greenberg



Statistics, Data Modeling, Energy


The United States is one of the largest consumers of energy in the world, and this energy comes from a wide variety of sources. Energy consumption also varies from state to state and from sector to sector, so no single model can tell the full story of energy distribution in the United States. The goal of this project was to analyze this data, using several techniques to understand the nature of the data set we were provided. To analyze the data effectively, we divided it into three groups: individual energy sources, energy by state, and general aspects of the energy distribution. Our analysis showed that consumption of some energy sources, such as coal, appeared to be decreasing; that states could be grouped into clusters in order to predict coal production from consumption (coal was the main energy source we analyzed); and that other aspects of the energy distribution, such as consumption and expenditure, were almost perfectly correlated.
