Research project comparing how machine learning models perform when trained with federated learning across distributed systems versus traditional centralized training. Implemented in PyTorch and TensorFlow.
Traditional machine learning requires centralizing all training data, which creates privacy concerns and bandwidth limitations. Federated learning instead trains models across distributed devices without sharing raw data; only model updates, such as gradients or weights, leave each device.
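The core server-side mechanism can be sketched as federated averaging (FedAvg): each client trains locally, and the server takes a data-size-weighted average of the resulting models. A minimal sketch, using a toy linear-regression objective; the function names and hyperparameters are illustrative, not from the project:

```python
# Minimal FedAvg sketch: clients train locally, the server averages models.
# Toy linear-regression task; names and hyperparameters are illustrative.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: a few SGD steps on its private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient
        w -= lr * grad
    return w

def fed_avg(client_weights, client_sizes):
    """Server step: weighted average of client models; no raw data shared."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Two clients train in parallel rounds; only weight vectors reach the server.
rng = np.random.default_rng(0)
w_global = np.zeros(3)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(2)]
for _ in range(10):  # communication rounds
    updates = [local_update(w_global, X, y) for X, y in clients]
    w_global = fed_avg(updates, [len(y) for _, y in clients])
```

The weighting by client dataset size keeps the aggregate equivalent to training on the pooled data when client distributions match.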
Built experiments to compare model accuracy, training time, and convergence rates between federated learning and traditional centralized training across different dataset types and sizes.
Tested on image classification, text processing, and numerical data to understand performance across different domains.
Implemented experiments in both PyTorch and TensorFlow to validate results across different ML frameworks.
Measured accuracy, training time, memory usage, and communication overhead for comprehensive analysis.
Federated learning achieved 92-96% of centralized model accuracy while maintaining data privacy. Performance gap varied by dataset complexity.
Network bandwidth became the bottleneck in federated training. Gradient compression techniques helped but didn't eliminate the overhead.
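One common compression technique is top-k gradient sparsification: each client sends only the largest-magnitude gradient entries plus their indices. A hedged sketch of the idea (the exact compression scheme used in the project is not specified here; this is one representative approach):

```python
# Top-k gradient sparsification: transmit only the k largest-magnitude
# entries (as index/value pairs) to cut communication cost.
import numpy as np

def top_k_compress(grad, k):
    """Select the k largest-magnitude entries; send (indices, values)."""
    idx = np.argsort(np.abs(grad))[-k:]
    return idx, grad[idx]

def decompress(idx, vals, size):
    """Server side: rebuild a dense (sparse-filled) gradient."""
    out = np.zeros(size)
    out[idx] = vals
    return out

grad = np.array([0.01, -3.0, 0.2, 5.0, -0.05])
idx, vals = top_k_compress(grad, k=2)
restored = decompress(idx, vals, grad.size)
# Only the two largest-magnitude entries (-3.0 and 5.0) survive.
```

Sending 2 of 5 entries here is a 60% reduction in payload, which illustrates why compression helps but cannot remove the per-round communication entirely.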
Federated models required 2-3x more training rounds to converge, but individual rounds were faster due to parallel processing.
Learned to design controlled experiments, manage multiple variables, and draw meaningful conclusions from data.
Gained hands-on experience with both PyTorch and TensorFlow, along with an understanding of their respective strengths for different use cases.
Learned the practical challenges of coordinating training across multiple nodes and handling network failures.
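One standard way to tolerate node failures is for the server to aggregate only the updates that actually arrive in a round, skipping the round if too few clients respond. A minimal sketch under that assumption; the failure model and function names are hypothetical, not the project's actual implementation:

```python
# Sketch of a dropout-tolerant training round: aggregate whatever updates
# arrive, and skip the round if participation falls below a floor.
# The failure model and names here are illustrative assumptions.

def run_round(clients, train_fn, min_clients=2):
    """Collect updates from responsive clients; average what arrives."""
    updates = []
    for c in clients:
        try:
            updates.append(train_fn(c))
        except ConnectionError:
            continue  # node unreachable this round; retry next round
    if len(updates) < min_clients:
        return None  # too few participants to aggregate safely
    return sum(updates) / len(updates)

def flaky_train(client_id):
    """Simulated client: some nodes are unreachable this round."""
    if client_id in {1, 3}:  # simulate two failed nodes
        raise ConnectionError("client unreachable")
    return float(client_id)

result = run_round(range(5), flaky_train)  # averages clients 0, 2, 4
```

The `min_clients` floor is the key design choice: it trades round progress against the bias introduced when only a small, possibly unrepresentative subset of nodes contributes.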