Preprint / Version 1

Identifying Sensitive Information as a Safety Measure Against Privacy Vulnerabilities Associated With Optical Character Recognition: A Proposed Solution

Authors

  • Farah Mohammed Maadi STEM School for Girls

DOI:

https://doi.org/10.58445/rars.1846

Keywords:

Optical character recognition, sensitive information, computer science

Abstract

Optical character recognition (OCR) is a technology that generates machine-readable text from images and documents. Some OCR applications store the extracted text in cloud storage, which has been shown not to be fully secure for storing sensitive information. Therefore, to preserve users' security, text should not be extracted and stored from items that contain sensitive information; however, this is only possible if the sensitive data is identified first. Based on research into the problem, previous efforts, and currently available tools, this paper proposes a solution that identifies items containing sensitive information and prevents OCR applications that store extracted text in cloud storage from extracting text from those items. This research also tests the validity of the major part of the proposed solution: identifying items that contain sensitive data in the first place. To test this ability, a MobileNet neural network was trained four times to determine whether items contain sensitive data. Testing MobileNet after the last training session showed that it could identify sensitive information with reliable accuracy in a short time, indicating promising results for the proposed solution if applied to a real OCR application using simple conditional logic such as if-else statements.
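The gating step described in the abstract (classify first, then let an if-else decide whether OCR and cloud upload run) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the classifier, OCR engine and upload function are hypothetical callables (a trained binary MobileNet returning the probability that an image contains sensitive data would fill the `classify` slot), and the 0.5 threshold is an assumed value that the paper does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interfaces; the paper does not specify them.
# Classifier: returns a probability in [0, 1] that the image contains sensitive
# data, e.g. the sigmoid output of a binary MobileNet classifier (assumption).
Classifier = Callable[[bytes], float]
OcrEngine = Callable[[bytes], str]      # extracts text from image bytes
CloudUpload = Callable[[str], None]     # stores extracted text remotely


@dataclass
class GuardedOcrResult:
    text: Optional[str]       # None when extraction was blocked
    blocked: bool             # True if the item was judged sensitive
    sensitive_score: float    # classifier output used for the decision


def guarded_ocr(
    image: bytes,
    classify: Classifier,
    ocr: OcrEngine,
    upload: CloudUpload,
    threshold: float = 0.5,   # assumed cut-off, not taken from the paper
) -> GuardedOcrResult:
    """Run OCR and cloud upload only when the item is not flagged as sensitive."""
    score = classify(image)
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"classifier score out of range: {score}")

    if score >= threshold:
        # Sensitive item: skip extraction entirely, so no text reaches cloud storage.
        return GuardedOcrResult(text=None, blocked=True, sensitive_score=score)
    else:
        # Non-sensitive item: extract the text and store it as usual.
        text = ocr(image)
        upload(text)
        return GuardedOcrResult(text=text, blocked=False, sensitive_score=score)


if __name__ == "__main__":
    uploaded = []

    # Stand-ins for a trained model and an OCR engine, for demonstration only.
    def fake_classifier(img: bytes) -> float:
        return 0.92 if b"passport" in img else 0.10

    def fake_ocr(img: bytes) -> str:
        return img.decode("utf-8")

    for item in (b"passport scan", b"grocery list"):
        result = guarded_ocr(item, fake_classifier, fake_ocr, uploaded.append)
        print(item.decode(), "->", "blocked" if result.blocked else result.text)

    print("uploaded:", uploaded)
```

In the demo, the "passport scan" item is blocked and only "grocery list" is passed to the upload function. The check deliberately runs before OCR rather than after it, so text from a flagged item is never produced and therefore cannot be stored. Raising the threshold flags fewer items, which reduces benign items being blocked at the cost of letting more sensitive items through.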

Posted

2024-10-23