LOCAL LLMS: SAFEGUARDING DATA PRIVACY IN THE AGE OF GENERATIVE AI. A CASE STUDY AT THE UNIVERSITY OF ANDORRA
A. Dorca Josa, M. Bleda-Bejar
Universitat d'Andorra (ANDORRA)
The growing field of Generative Artificial Intelligence (GAI) presents an unprecedented opportunity for innovation across diverse sectors. However, the inherent nature of these models, trained on vast amounts of data often sourced from the public domain, raises critical concerns regarding data privacy and security. At the same time, the reliance on centralized servers operated by large technology companies, which have access to and can utilize the data sent to these powerful AI tools, introduces a significant vulnerability. This paper argues for the importance of deploying local Large Language Models (LLMs) on on-premise servers to mitigate the risks associated with data leakage to GAI providers.

Moreover, the very act of transmitting data to remote servers inherently introduces security vulnerabilities. Data breaches, cyberattacks, and unauthorized access can compromise the confidentiality of user and company information, potentially leading to identity theft, financial fraud, or other detrimental consequences.

Deploying local LLMs on on-premise servers offers a compelling solution to these data security and accessibility challenges. The University of Andorra (UdA) has implemented a local LLM server using open technologies such as Ollama, Open WebUI, and Automatic1111. This infrastructure enables the hosting of most small to medium-sized open-source LLMs for the teaching and administrative staff community. By retaining both the AI model and the processed data within a controlled environment, organizations can establish and enforce a strong data security framework.
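A stack of this kind is commonly deployed as a pair of containers, with the model server and the web front end on the same on-premise host. The following configuration is an illustrative sketch only, not the UdA's actual setup; the port mappings and volume names are assumptions:

```yaml
# Hypothetical docker-compose.yml for a local LLM stack (illustrative, not UdA's configuration).
services:
  ollama:
    image: ollama/ollama                      # serves open-source LLMs over a local HTTP API
    volumes:
      - ollama_data:/root/.ollama             # keeps downloaded model weights on-premises
    ports:
      - "11434:11434"                         # Ollama's default API port

  open-webui:
    image: ghcr.io/open-webui/open-webui:main # chat front end for staff
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434   # point the UI at the local Ollama instance
    ports:
      - "3000:8080"                           # expose the UI on port 3000 of the host
    depends_on:
      - ollama

volumes:
  ollama_data:
```

Because both services run inside the institution's network and model weights are stored on a local volume, no prompt or document content ever leaves the controlled environment.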

At the same time, local LLMs empower organizations to exercise greater control over their data and AI capabilities. They can fine-tune models to their specific needs, ensuring alignment with their business objectives and ethical considerations.

Implementation of local LLMs, however, is not without its challenges. The initial investment in hardware infrastructure can be substantial, particularly for smaller organizations. Additionally, maintaining and updating local AI models requires technical expertise and ongoing resources.

This paper discusses the pros and cons of this kind of setup as well as the minimum requirements for hosting an LLM on premises, both to deliver the capabilities mentioned above and to ensure a successful user experience. Features such as conversing with documents or websites via Retrieval-Augmented Generation (RAG), user data integrity and privacy, and image generation should be among the initial requirements.

Finally, a qualitative survey was conducted among the teaching and administrative staff at the UdA to gather insights and use cases for improving the setup in the future. The results revealed that data privacy and open access are the most valued features, while the quality of responses, compared with private, closed online models, is perceived as the least favorable aspect. Despite this, there is promising potential for local implementations of LLMs, with plans to extend the service to students in the near future to bridge any accessibility gap they may currently face.

Keywords: LLM, open, privacy, security, user experience.