TRANSFORMING DATA INSIGHTS: EDUCATIONAL EXPERIENCES IN ADVANCED BIG DATA PROCESSING
L. Antoni, J. Jirásek, R. Krivoš-Belluš, V. Pristaš, Š. Horvát
Pavol Jozef Šafárik University in Košice, Faculty of Science, Institute of Computer Science (SLOVAKIA)
Big data processing is essential because it allows organizations to extract value from large, complex data sets that are otherwise hard to analyze. By employing advanced analytics, machine learning, and real-time processing, organizations can identify patterns, trends, and correlations that inform strategic decisions, improve operational efficiency, and drive innovation. Big data can be defined as data too large to be processed efficiently on a single computer, or as massive amounts of diverse, unstructured data produced by high-performance applications. There are two basic approaches to distributing data processing across many machines: data parallelism, in which the data is divided into chunks and the same algorithm is applied to every chunk, and task parallelism, in which the problem itself is divided into subtasks that run on a cluster of machines.
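The two approaches described above can be illustrated with a minimal sketch. It is not taken from the course materials: it assumes a single machine where a thread pool stands in for a cluster, and the helper names (chunked, count_words, data_parallel_word_count, task_parallel_stats) are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, n_chunks):
    """Split items into at most n_chunks contiguous, roughly equal chunks."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_words(lines):
    """Count whitespace-separated words in a list of text lines."""
    return sum(len(line.split()) for line in lines)


def longest_line(lines):
    """Return the length of the longest line (0 for an empty input)."""
    return max((len(line) for line in lines), default=0)


def data_parallel_word_count(lines, n_workers=4):
    """Data parallelism: the SAME function runs on every chunk of the data."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunked(lines, n_workers)))
    return sum(partial_counts)  # combine the per-chunk results


def task_parallel_stats(lines):
    """Task parallelism: DIFFERENT subtasks run concurrently on the same data."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        words = pool.submit(count_words, lines)
        longest = pool.submit(longest_line, lines)
        return {"words": words.result(), "longest_line": longest.result()}
```

In a real cluster setting, the workers would be separate machines and the chunks would be partitions of a distributed dataset; the thread pool here only mimics that structure on one computer.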

In this paper, we share our practical experience of teaching the course 'Technologies of Big Data Processing' at Pavol Jozef Šafárik University in Košice from 2020 to 2024, in groups of 10 students from the master's study programs in Computer Science or Data Science and Artificial Intelligence. We cover the practical aspects of modern big data processing and storage systems. We present examples of course projects in which classification models (decision trees, neural networks) are built to analyze a large text corpus or tabular data. We finish with distributed neural network training in Apache Spark to manage and process large-scale data efficiently. By examining the '5 V's' of big data (Velocity, Volume, Variety, Veracity, and Value), we present the general characteristics that define the challenges and opportunities inherent in managing heterogeneous datasets and extracting hidden patterns from them.

Keywords: Big data, education, artificial intelligence, neural network, university course.