
I’m Juechu “Joy” Dong

Confidential Computing · GPU Programming Models · Computer Architecture & Systems PhD candidate @UMich

HELLO

Hi there, I’m Joy. Glad you stopped by! 😊 My full Chinese name is 董珏初 Juechu (pronounced roughly ge-ü-e, chew), but Joy works just fine in everyday life.

Bio:
Joy Dong is a PhD candidate at the University of Michigan specializing in computer architecture and GPU programmability, where she is advised by Prof. Satish Narayanasamy. Her research focuses on building the software stack for the next generation of AI, specifically optimizing GPU kernels to make large-scale machine learning more efficient and accessible. She previously worked on FlexAttention and Helion with the PyTorch team at Meta, and on the CuTe DSL ecosystem at NVIDIA.

CV      

News

  • [Oct 2025]

    Helion, our work from my second internship with the PyTorch team, has officially launched; see the PyTorch blog.

  • [Mar 2025]

    Excited to have been selected to receive the MLCommons ML and Systems Rising Star Award!!

  • [Mar 2025]

    FlexAttention is accepted to MLSys ‘25. See you in Santa Clara this summer~

Archived news ...
  • [Oct 2024]

    Our work mm2-gb is accepted to ACM BCB ‘24, the flagship conference of ACM SIGBio. Join us in Shenzhen, China, to see how we accelerate minimap2 on GPUs!

  • [Aug 2024]

    Our work FlexAttention has launched. See our PyTorch blog and our X post with 180k views. Stay tuned for FlexAttention Part II: decoding and paged attention.

  • [Jun 2024]

    Our work Toleo is accepted to ASPLOS ‘24. Its presentation is delayed to ASPLOS ‘25. See you in Rotterdam~

  • [May 2024]

    This summer I joined the PyTorch Compiler team at Meta as a research scientist intern. See you in Menlo Park~

  • [Mar 2024]

    Our work mm2-gb for long-sequence DNA mapping is accepted to BioSys ‘24. Check out our open-source demo. Many thanks to the AMD HPC team! See you in San Diego~

  • [Jan 2024]

    I passed the PhD qualification exam and became a PhD candidate.

  • [Dec 2023]

    I received the Rackham International Student Fellowship for the 2023-2024 academic year.

Selected Honors

  • 2026

    Rackham Predoctoral Fellowship

    The fellowship supports outstanding doctoral students who have achieved candidacy and are actively working on dissertation research and writing.

  • 2025

    MLCommons ML and Systems Rising Stars

    These promising researchers, drawn from over 170 applicants, have demonstrated excellence in Machine Learning (ML) and Systems research and stand out for their current and future contributions and potential. [news]

Archived honors ...
  • 2025

    Rackham Doctorate Internship Fellowship

  • 2024

    Rackham International Student Fellowship

    The award recognizes her academic excellence and will support her ongoing research in CSE. [news]

Experience

Education

  • 2022 Sept - exp. 2027

    University of Michigan

    Ph.D. in Computer Science and Engineering

    Computer Architecture & Systems

  • 2022 Apr

    University of Michigan

    B.S.E. in Computer Engineering

  • 2022 Aug

    Shanghai Jiao Tong University

    B.S.E. in Electrical & Computer Engineering

Industry

  • 2025 - 2026

    NVIDIA

    Deep Learning Compute Architect Intern

    CuTe DSL ecosystem

  • 2024, 2025

    Meta

    Research Scientist Intern | PyTorch Team

    FlexAttention & Helion Distributed

  • 2022 May - 2022 Aug

    NVIDIA

    Deep Learning Compute Architect Intern | GPU Architecture

Publications

  • MLSys ‘25

    FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention

    Joy Dong *, Boyuan Feng *, Driss Guessous *, Yanbo Liang *, Horace He

    *authors contributed equally to this work.
    [poster] [arxiv] [blog] [github] [citeme]

    FlexAttention is a novel compiler-driven programming model for implementing attention variants flexibly and efficiently.
    🌟Flexible: Allows users to implement most attention variants in a few lines of idiomatic PyTorch code.
    🌟Fast & Efficient: Achieves performance comparable to expert-tuned kernels through JIT compilation with torch.compile.
    🌟Block Sparsity: Leverages block sparsity to further improve performance without manual optimization for each specific mask.
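To make the "few lines of idiomatic code" and block-sparsity ideas concrete, here is a toy, pure-Python sketch: a `score_mod` hook rewrites each raw attention score before the softmax, and a helper lists which (query-block, key-block) tiles a mask leaves non-empty, so fully masked tiles could be skipped. This is not the FlexAttention implementation or its API. The names `reference_attention`, `causal`, `distance_bias`, and `live_blocks` are hypothetical, and the hook here takes only `(score, q_idx, kv_idx)`, whereas the `score_mod` passed to PyTorch's `flex_attention` also receives batch and head indices.

```python
import math
from typing import Callable, List, Set, Tuple

Matrix = List[List[float]]
# Hypothetical simplified hook: (raw score, query index, key index) -> new score.
ScoreMod = Callable[[float, int, int], float]


def reference_attention(q: Matrix, k: Matrix, v: Matrix, score_mod: ScoreMod) -> Matrix:
    """Naive single-head attention that applies score_mod to every scaled
    dot-product score before the softmax (a toy, not a fused GPU kernel)."""
    d = len(q[0])
    out: Matrix = []
    for q_idx, q_row in enumerate(q):
        scores = [
            score_mod(sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d), q_idx, kv_idx)
            for kv_idx, k_row in enumerate(k)
        ]
        # Numerically stable softmax; scores set to -inf receive zero weight.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v)) / total
            for j in range(len(v[0]))
        ])
    return out


def causal(score: float, q_idx: int, kv_idx: int) -> float:
    # Mask out keys that come after the query position.
    return score if kv_idx <= q_idx else float("-inf")


def distance_bias(score: float, q_idx: int, kv_idx: int) -> float:
    # ALiBi-style linear penalty on query/key distance (slope 0.5 is arbitrary).
    return score - 0.5 * abs(q_idx - kv_idx)


def live_blocks(mask: Callable[[int, int], bool], q_len: int, kv_len: int,
                block: int) -> Set[Tuple[int, int]]:
    """Return the (q_block, kv_block) tiles that contain at least one
    unmasked entry; every other tile is fully masked and could be skipped."""
    live = set()
    for qb in range(0, q_len, block):
        for kb in range(0, kv_len, block):
            if any(mask(qi, ki)
                   for qi in range(qb, min(qb + block, q_len))
                   for ki in range(kb, min(kb + block, kv_len))):
                live.add((qb // block, kb // block))
    return live
```

For example, with a causal mask on a 4x4 score matrix tiled into 2x2 blocks, `live_blocks(lambda qi, ki: ki <= qi, 4, 4, 2)` keeps three of the four tiles; the upper-right tile is entirely masked, which is the kind of work block sparsity lets a kernel skip.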

  • ASPLOS ‘24

    Toleo: Scaling Freshness to Tera-scale Memory Using CXL and PIM

    Juechu Dong, Jonah Rosenblum, Satish Narayanasamy

    [paper] [github] [poster] [citeme]


    🌟Scale trusted memory capacity from hundreds of MB to tens of TB by extending the span of trust from a single trusted processor to an entire platform, including intelligent memories.
    🌟Design a new freshness-protection scheme that reduces the space requirement by 50x.
    🌟Reduce deployment cost by space-sharing one intelligent memory device among multiple CPUs.

  • Nature Computer Science – under submission

    SECRET-GWAS: Confidential Computing for Population-Scale GWAS

    Jonah Rosenblum, Juechu Dong, Satish Narayanasamy

    [preprint] [code] [citeme]

    Develop a thousand-core platform on Azure Confidential Computing to conduct multi-institutional GWAS on millions of patients in less than a minute.
    Adapt the Spark-based Hail genomic analysis framework to run in a TEE under obliviousness requirements.
    Parallelize GWAS computation on 1k cores to achieve near-linear speedup.

  • ACM BCB ‘24

    mm2-gb: GPU Accelerated Minimap2 for Long Read DNA Mapping

    Juechu Dong *, Xueshen Liu *, Harisankar Sadasivan, Sriranjani Sitaraman, Satish Narayanasamy

    *both authors contributed equally to this work.
    [paper] [github] [slides] [blog] [citeme]

    Performance Boost: Accelerate the bottleneck step (chaining) of minimap2, the state-of-the-art long-sequence mapping tool, by 2.57x-5.33x on GPU.
    Scales Well: Optimize for ultra-long reads of 50 kb+ to keep pace with trends in genome sequencing technology.
    Open Source! Actively maintained and optimized. Community contributions are welcome~

Skills

  • Programming Language & Compilers

    C/C++, Python, CUDA, Helion, Triton, CuTe DSL

  • Architectures

    AMD CDNA2/3 GPU, NVIDIA Hopper/Blackwell GPUs

Email

joydong@umich.edu

WeChat

jiaochewchew

Address

4844 Bob & Betty Beyster Building
2260 Hayward St
Ann Arbor, MI
48105