Federated Learning for Data Security
LLM | Deep Learning | Unstructured Data | Research-based | Python

Machine learning needs large volumes of data for training, but data security is a major concern: over 70% of organizations report having experienced a breach, and the data security market is projected to exceed $250 billion globally by 2031. Getting data into machine learning models securely is therefore essential. Federated Learning, introduced by Google in 2016, addresses this by training models where the data lives and sharing only model parameters, so raw data never leaves its owner.
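The idea behind this parameter exchange is the FedAvg algorithm from Google's original paper: each client trains locally, and a central server averages the returned weights in proportion to each client's dataset size. Here is a minimal NumPy sketch; the layer shapes, client count, and dataset sizes are purely illustrative:

```python
import numpy as np

def fedavg(client_weights: list[list[np.ndarray]],
           client_sizes: list[int]) -> list[np.ndarray]:
    """Weighted average of per-client model parameters (FedAvg).

    Only these parameter arrays ever leave a client; the raw
    training data stays on-device.
    """
    total = sum(client_sizes)
    # Average each layer's weights, weighted by local dataset size.
    return [
        sum(w[layer] * (n / total)
            for w, n in zip(client_weights, client_sizes))
        for layer in range(len(client_weights[0]))
    ]

# Toy example: three clients, one weight matrix each.
clients = [[np.random.randn(4, 4)] for _ in range(3)]
global_weights = fedavg(clients, client_sizes=[100, 50, 150])
```

The averaged weights are then broadcast back to the clients for the next round of local training.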
Our client, Prediction Guard, tasked us with implementing Federated Learning for text translation. The project, titled "Federated Learning for Text Translation," required a deep understanding of two concepts: (A) automated, machine-enabled text translation and (B) Federated Learning. Our approach was to break the large technical problem into smaller ones and solve each with the appropriate resources.
We tested multiple models for translation and found that mBART delivered the desired results. To build a foundational understanding of Federated Learning itself, we started with a smaller use case: image classification. Using CNNs and the CIFAR dataset, this step was successful; a sketch of the setup is shown below. Transitioning from image classification to text classification was then straightforward, as the output remained a single class label. For text classification, we used the DistilBERT model.
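As an illustration of that first step, here is a minimal Flower client wrapping a small CNN trained on CIFAR-10. The architecture, hyperparameters, and server address are illustrative, and the NumPyClient interface shown is Flower's classic 1.x API; newer releases expose a slightly different entry point:

```python
import flwr as fl
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class Net(nn.Module):
    """Small CNN for 32x32 CIFAR-10 images."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc = nn.Linear(32 * 8 * 8, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        return self.fc(x.flatten(1))

model = Net()
trainset = datasets.CIFAR10("data", train=True, download=True,
                            transform=transforms.ToTensor())
loader = DataLoader(trainset, batch_size=32, shuffle=True)

class CifarClient(fl.client.NumPyClient):
    def get_parameters(self, config):
        return [p.detach().numpy() for p in model.parameters()]

    def set_parameters(self, parameters):
        for p, new in zip(model.parameters(), parameters):
            p.data = torch.tensor(new)

    def fit(self, parameters, config):
        # Load the global weights, train one local epoch,
        # and return only the updated parameters.
        self.set_parameters(parameters)
        opt = torch.optim.SGD(model.parameters(), lr=0.01)
        for images, labels in loader:
            opt.zero_grad()
            F.cross_entropy(model(images), labels).backward()
            opt.step()
        return self.get_parameters(config), len(trainset), {}

    def evaluate(self, parameters, config):
        self.set_parameters(parameters)
        # Local evaluation omitted for brevity.
        return 0.0, len(trainset), {}

fl.client.start_numpy_client(server_address="127.0.0.1:8080",
                             client=CifarClient())
```

A matching server process, started with fl.server.start_server, coordinates the rounds and by default runs FedAvg over the weights the clients return.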
The ultimate goal was text translation. Initially, mBART was fine-tuned directly, but its size made federated training computationally impractical. Further research led us to a paper on AdapterHub, whose adapter modules let us cut the set of trainable parameters to roughly 0.5% of the full model, overcoming these issues.
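Below is a sketch of the adapter approach, assuming the AdapterHub adapters library (the maintained successor to adapter-transformers) and the facebook/mbart-large-50-many-to-many-mmt checkpoint; exact adapter config names vary between library versions:

```python
import adapters
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model_name = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# Wire adapter support into the stock Hugging Face model.
adapters.init(model)

# Add a small bottleneck adapter and freeze everything else, so
# only the adapter weights (a fraction of a percent of the full
# model) remain trainable.
model.add_adapter("translation", config="seq_bn")
model.train_adapter("translation")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable / total:.2%} of {total:,} parameters")
```

Because everything outside the adapter is frozen, only that small slice of the weights needs gradients, optimizer state, and, in the federated setting, per-round communication.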
The platforms used for this project were Hugging Face, Flower, and AdapterHub.