Machine Learning Research

Distributed ML Performance Study

Research project comparing how machine learning models perform under federated (distributed) training versus traditional centralized training, implemented in both PyTorch and TensorFlow.

PyTorch · TensorFlow · Python · Federated Learning

Timeline

Aug 2022 - Apr 2023

My Role

Co-Researcher & Developer

Team Size

2 researchers

Research Question

The Problem

Traditional machine learning requires centralizing all training data, which creates privacy concerns and bandwidth limitations. Federated learning trains models across distributed devices without sharing raw data.
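The core idea is federated averaging: each client trains on its own data, and only model weights travel to the server. A minimal sketch of one FedAvg round, using NumPy and a toy least-squares update (the function names and the single-step local update are illustrative, not the project's actual code):

```python
import numpy as np

def local_update(weights, data, lr=0.1):
    """One hypothetical local SGD step on a client's private data.
    Here: a least-squares gradient on (X, y); real clients run many steps."""
    X, y = data
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def fedavg_round(global_weights, client_data):
    """One FedAvg round: clients train locally, then the server averages
    the returned models weighted by each client's dataset size."""
    sizes = np.array([len(y) for _, y in client_data], dtype=float)
    local_models = [local_update(global_weights.copy(), d) for d in client_data]
    fracs = sizes / sizes.sum()
    return sum(f * m for f, m in zip(fracs, local_models))

# Toy run: two clients whose data follows the same target y = 2x.
# Raw (X, y) never leaves a client; only weights reach the server.
rng = np.random.default_rng(0)
clients = []
for n in (100, 50):
    X = rng.normal(size=(n, 1))
    clients.append((X, X @ np.array([2.0])))

w = np.zeros(1)
for _ in range(200):
    w = fedavg_round(w, clients)
# w converges toward the true coefficient [2.0]
```

The size-weighted average is what keeps the global model unbiased when clients hold different amounts of data.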

What We Tested

Built experiments to compare model accuracy, training time, and convergence rates between federated learning and traditional centralized training across different dataset types and sizes.

Research code implementation

Multiple Datasets

Tested on image classification, text processing, and numerical data to understand performance across different domains.

Framework Comparison

Implemented experiments in both PyTorch and TensorFlow to validate results across different ML frameworks.

Performance Metrics

Measured accuracy, training time, memory usage, and communication overhead for comprehensive analysis.
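Metrics like these can be captured with the standard library alone. A hedged sketch of a per-round measurement wrapper (helper names are ours, not the project's; serialized model size stands in for communication overhead):

```python
import pickle
import time
import tracemalloc

def measure_round(train_fn, model_state):
    """Run one training round and record wall-clock time, peak memory,
    and the bytes that would cross the network if model_state were sent."""
    tracemalloc.start()
    t0 = time.perf_counter()
    new_state = train_fn(model_state)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    comm_bytes = len(pickle.dumps(new_state))  # proxy for upload size
    return new_state, {"time_s": elapsed, "peak_mem_b": peak, "comm_b": comm_bytes}

# Example: a dummy "round" that just shrinks every weight a little.
state, stats = measure_round(lambda s: [x * 0.9 for x in s], [1.0] * 1000)
```

Logging the same dictionary for both the federated and centralized runs makes the two pipelines directly comparable.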

Key Findings

Accuracy Trade-offs

Federated learning achieved 92-96% of centralized model accuracy while maintaining data privacy. Performance gap varied by dataset complexity.

Communication Costs

Network bandwidth became the bottleneck in federated training. Gradient compression techniques helped but didn't eliminate the overhead.
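A common compression scheme of this kind is top-k sparsification: transmit only the k largest-magnitude gradient entries together with their indices. A generic sketch (not necessarily the exact technique used in the study):

```python
import numpy as np

def topk_compress(grad, k):
    """Keep the k largest-magnitude entries; transmit (indices, values)."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def topk_decompress(idx, vals, size):
    """Rebuild a dense gradient with zeros everywhere else."""
    out = np.zeros(size)
    out[idx] = vals
    return out

rng = np.random.default_rng(1)
g = rng.normal(size=10_000)
idx, vals = topk_compress(g, k=100)          # send ~1% of the entries
approx = topk_decompress(idx, vals, g.size)
# Bandwidth drops ~100x, but the small entries are lost; error feedback
# (accumulating the residual g - approx locally) is the usual remedy.
```

This illustrates why compression reduces but does not eliminate the overhead: each round still ships the indices and values, and the discarded residual slows convergence.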

Convergence Patterns

Federated models required 2-3x more training rounds to converge, but individual rounds were faster due to parallel processing.

What I Learned

Research Design

Learned to design controlled experiments, manage multiple variables, and draw meaningful conclusions from data.

Framework Differences

Gained hands-on experience with both PyTorch and TensorFlow and learned where each framework's strengths lie for different use cases.

Distributed Systems

Learned the practical challenges of coordinating training across multiple nodes and handling network failures.