A comprehensive framework for efficient, scalable, and performance-portable tensor applications
Tensor-centric computations form the compute-intensive core of large-scale parallel applications in scientific computing and machine learning. Sustained performance gains for such computations will depend on improved hardware efficiency through customization and reduced data movement, which in turn requires advances in algorithm-architecture co-design methodology. Further, the transition to customized hardware poses significant challenges for application developer productivity and performance portability.
This research effort is funded by the NSF Principles and Practice of Scalable Systems (PPoSS) program, solicitation NSF 22-507.