A-EYE IN THE SKY: Examining Machine Learning-Based Keystroke Reconstruction from Passive Video Capture
DOI: https://doi.org/10.58445/rars.3457
Keywords: Computer Science, Artificial Intelligence, Machine Learning, Side-Channel Attack, Keystroke Reconstruction, Cybersecurity, Neural Network, Convolutional Neural Network
Abstract
The rapid growth of artificial intelligence (AI) has enabled unprecedented progress across disciplines, but it has also introduced new cybersecurity threats, among them machine learning (ML)-powered side-channel attacks. In this paper, we examine optical side-channel attacks and the role ML plays in this attack vector, investigating the feasibility of vision-based keystroke inference as an automated attack. Using a publicly available dataset of typing videos from Hugging Face, we frame the problem as a supervised classification task and systematically explore a minimal yet effective neural network (NN) pipeline. Our experiments compare five architectures: four convolutional neural networks (EfficientNet-B0, ResNet-18, MobileNetV2-100, and ConvNeXt-Tiny) and a feed-forward multilayer perceptron (MLP). We also explore ten critical hyperparameters and discuss how varying them affects performance. In total, we train 26 models with varying hyperparameters on Kaggle P100 GPUs and evaluate them using F1 score, training loss, validation loss, and validation accuracy. Results indicate that the best performance, a peak F1 score of 0.542, is achieved by a pretrained MobileNetV2-100 trained on min-max normalized, transformed, and generously labeled data without temporal context, using class-balanced training, a hidden layer, dropout regularization, and a time-based learning-rate schedule. Our findings demonstrate that effective keystroke inference can be achieved with relatively modest computational resources and without sophisticated preprocessing or postprocessing. This suggests that such attacks are within reach of moderately skilled adversaries and highlights the need for countermeasures to this attack vector. We discuss the limitations of this work, chiefly frame-label mismatch in the dataset, and propose directions for future work such as blind keystroke recognition and inference on mobile devices.
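The abstract names several ingredients of the best-performing training recipe without defining them. The sketch below shows one standard reading of four of them in plain Python, under stated assumptions: per-frame min-max normalization, inverse-frequency class weights as the class-balancing scheme, the common time-based decay lr_t = lr_0 / (1 + k * t), and macro-averaged F1. The paper does not specify the normalization scope, weighting formula, decay constant, or F1 averaging mode, so the function names and choices here are hypothetical illustrations, not the author's implementation.

```python
from collections import Counter


def min_max_normalize(pixels):
    """Rescale a flat sequence of pixel intensities to the range [0, 1].

    Assumption: normalization is applied per frame. A constant frame maps
    to all zeros to avoid dividing by zero.
    """
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]


def inverse_frequency_weights(labels):
    """Per-class loss weights w_c = N / (K * n_c) for class-balanced training.

    N is the number of samples, K the number of classes, and n_c the count of
    class c. Rare keys get larger weights; a perfectly balanced dataset gets
    weight 1.0 for every class. (Assumed scheme; other balancing schemes exist.)
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def time_based_lr(initial_lr, decay, epoch):
    """Time-based decay: lr_t = lr_0 / (1 + decay * t), with t the epoch index."""
    return initial_lr / (1.0 + decay * epoch)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging mode).

    Per-class F1 = 2*TP / (2*TP + FP + FN), computed for every class that
    appears in either the ground truth or the predictions.
    """
    classes = set(y_true) | set(y_pred)
    if not classes:
        return 0.0
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Three "a" frames and one "b" frame: the rare key is up-weighted.
    print(inverse_frequency_weights(["a", "a", "a", "b"]))  # {'a': 0.6666666666666666, 'b': 2.0}
    # With decay 0.1, the learning rate halves by epoch 10.
    print(time_based_lr(1e-3, 0.1, 10))  # 0.0005
    print(macro_f1(["a", "b", "b"], ["a", "b", "a"]))
```

In a PyTorch pipeline, weights like these could be passed as the `weight` argument of `torch.nn.CrossEntropyLoss`, and the decayed rate applied once per epoch; the plain-Python versions are only meant to make the formulas concrete.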
License
Copyright (c) 2025 Nathan Jin Peng Yong

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.