DOI: 10.1145/3582016.3582051

The Sparse Abstract Machine

Published: 25 March 2023

ABSTRACT

We propose the Sparse Abstract Machine (SAM), an abstract machine model for targeting sparse tensor algebra to reconfigurable and fixed-function spatial dataflow accelerators. SAM defines a streaming dataflow abstraction with sparse primitives that encompass a large space of scheduled tensor algebra expressions. SAM dataflow graphs naturally separate tensor formats from algorithms and are expressive enough to incorporate arbitrary iteration orderings and many hardware-specific optimizations. We also present Custard, a compiler from a high-level language to SAM that demonstrates SAM's usefulness as an intermediate representation. We automatically bind SAM dataflow graphs to a streaming dataflow simulator. We evaluate the generality and extensibility of SAM, explore the performance space of sparse tensor algebra optimizations using SAM, and show SAM's ability to represent dataflow hardware.
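To make the streaming abstraction concrete, below is a minimal Python sketch of one natural sparse primitive: an intersecter that merges two ordered coordinate streams, as needed when multiplying two sparse operands. The stream encoding, the `intersect` helper, and the dot-product wiring are illustrative assumptions for this sketch rather than SAM's actual interface; in particular, the sketch omits the control tokens a real streaming machine would need for multi-dimensional iteration.

```python
# A minimal sketch (not SAM's actual API) of a streaming intersection over
# two ordered coordinate streams, as used to multiply two sparse operands.
# Each stream element is a (coordinate, reference) pair, where the
# reference indexes into that operand's values array.

def intersect(a, b):
    """Emit (coord, ref_a, ref_b) for coordinates present in both streams.

    Both inputs must be sorted by coordinate; unmatched coordinates are
    dropped, since they multiply against an implicit zero.
    """
    out, ai, bi = [], 0, 0
    while ai < len(a) and bi < len(b):
        (ca, ra), (cb, rb) = a[ai], b[bi]
        if ca == cb:
            out.append((ca, ra, rb))
            ai, bi = ai + 1, bi + 1
        elif ca < cb:
            ai += 1
        else:
            bi += 1
    return out

# Hypothetical sparse vectors x and y in a compressed format:
# coordinates of the nonzeros plus a parallel values array.
x_crd = [(0, 0), (2, 1), (5, 2)]   # x is nonzero at coordinates 0, 2, 5
y_crd = [(2, 0), (3, 1), (5, 2)]   # y is nonzero at coordinates 2, 3, 5
x_val = [1.0, 2.0, 3.0]
y_val = [4.0, 5.0, 6.0]

# The intersecter's output stream feeds a multiplier, then a reducer,
# computing the dot product without ever materializing the zeros.
dot = sum(x_val[ra] * y_val[rb] for _, ra, rb in intersect(x_crd, y_crd))
print(dot)  # 2.0*4.0 + 3.0*6.0 = 26.0
```

Because the intersecter only sees ordered (coordinate, reference) streams, the same graph applies no matter which compressed format produced the streams, mirroring the separation of tensor formats from algorithms that the abstract describes.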
