Glossary term
Glossary term
Infrastructure and Serving
A process that runs on a host machine and executes machine learning programs on TPU devices.
Created for this library
An ML platform team monitors TPU worker health during distributed training to react quickly to hardware issues.
A research engineer profiles per-TPU-worker activity to spot stragglers in synchronous training.
An ML platform team uses TPU worker counts that match the partitioning strategy for the training job.
Definition source: Google for Developers Machine Learning Glossary | Creative Commons Attribution 4.0 License