I’m Juechu “Joy” Dong

Confidential Computing Computer Architect | PhD candidate @UMich

HELLO

Greetings from Joy! Welcome to my page. My Chinese name is 董珏初 Juechu (pronounced ge ü e, chew). If you find it hard to pronounce my name in Mandarin, I’m totally fine with Joy.😊

Bio:
Juechu (Joy) Dong is a PhD candidate in the CSE department at the University of Michigan, advised by Prof. Satish Narayanasamy. Her research focuses on privacy-enhancing technologies and large-scale parallel computing. Her work seeks to advance parallel and confidential computing for privacy-preserving data analytics, with applications ranging from population-scale genomic analysis to generative AI. Joy received dual Bachelor’s degrees in Computer Engineering from Shanghai Jiao Tong University and the University of Michigan. She was awarded the Rackham International Student Fellowship.


NEWS

  • [Oct 2024]

    Our work mm2-gb has been accepted to ACM BCB ’24, the flagship conference of ACM SIGBio. Join us in Shenzhen, China to see how we accelerate minimap2 using GPUs!

  • [Aug 2024]

    Our work FlexAttention has launched. See our PyTorch Blog and 180k-view X post. Stay tuned for FlexAttention Part II: decoding and paged attention.

  • [Jun 2024]

    Our work Toleo has been accepted to ASPLOS ’24. Its presentation has been deferred to ASPLOS ’25. See you in Rotterdam~

  • [May 2024]

    I joined the Meta PyTorch Compiler team this summer as a research scientist intern. See you in Menlo Park~

Archived news ...
  • [Mar 2024]

    Our work mm2-gb for long-sequence DNA mapping was accepted to BioSys'24. Check out our open-source demo. Many thanks to the AMD HPC team! See you in San Diego~

  • [Jan 2024]

    I passed the PhD qualification exam and became a PhD candidate.

  • [Dec 2023]

    I received the Rackham International Student Fellowship for the 2023-2024 academic year.

EXPERIENCE

Education

  • 2022 Sept - exp. 2027

    University of Michigan

    Ph.D. in Computer Science and Engineering | Computer Architecture & Systems
  • 2022 Apr

    University of Michigan

    B.S.E. in Computer Engineering | GPA: 3.99/4.00

    Coursework: EECS470 Computer Architecture (A), EECS482 Operating Systems (A), Parallel CUDA Programming (A)

  • 2022 Aug

    Shanghai Jiao Tong University

    B.S.E. in Electrical & Computer Engineering | GPA: 3.82/4.00

    Coursework: VE401 Probabilistic Methods in Eng. (A+), VV186/VV285/VV286 Honors Mathematics II/III/IV (A-, A, A)

Industry

  • 2024 May - 2024 Aug

    Meta

    Research Scientist Intern | PyTorch Team

    Build a flexible and efficient attention programming model: FlexAttention.
    Work with TorchInductor and conduct performance analysis and optimizations on attention kernels.

  • 2022 May - 2022 Aug

    NVIDIA

    Deep Learning Compute Architect Intern | GPU Architecture

    Model and analyze new memory features on next-gen GPUs such as distributed shared memory and TMA.
    Specialize in: GPU architecture, memory hierarchy & multi-device communication

PUBLICATIONS

  • MLSys ’25 - under submission

    FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention

    Joy Dong *, Boyuan Feng *, Driss Guessous *, Yanbo Liang *, Horace He

    [arxiv] [blog] [code] [citeme] *Authors contributed equally to this work.

    FlexAttention is a novel compiler-driven programming model for implementing attention variants flexibly and efficiently.
    🌟Flexible: Allows users to implement the majority of attention variants in a few lines of idiomatic PyTorch code.
    🌟Fast & Efficient: Achieves performance comparable to expert-tuned kernels via JIT compilation with torch.compile.
    🌟Block Sparsity: Leverages block sparsity to further improve performance without manual optimization for a specific mask.

  • ASPLOS ’24 – Accepted

    Toleo: Scaling Freshness to Tera-scale Memory Using CXL and PIM

    Juechu Dong, Jonah Rosenblum, Satish Narayanasamy

    We will present Toleo at ASPLOS'25! [arxiv] [code] [citeme]

    🌟Scale trusted memory size from hundreds of MB to tens of TB by expanding the span of trust from a single trusted processor to an entire platform, including intelligent memories.
    🌟Design a new freshness protection scheme that reduces the space requirement by 50x.
    🌟Reduce deployment cost by space-sharing one intelligent memory device among multiple CPUs.

  • Nature Computer Science – under submission

    SECRET-GWAS: Confidential Computing for Population-Scale GWAS

    Jonah Rosenblum, Juechu Dong, Satish Narayanasamy

    [preprint] [code] [citeme]

    Develop a thousand-core platform on Azure Confidential Computing to conduct multi-institutional GWAS on millions of patients in less than a minute.
    Adapt the Spark-based Hail genomic analysis framework to run on TEEs under obliviousness requirements.
    Parallelize GWAS computation on 1k cores to achieve near-linear speedup.

  • ACM BCB ’24

    mm2-gb: GPU Accelerated Minimap2 for Long Read DNA Mapping

    Juechu Dong *, Xueshen Liu *, Harisankar Sadasivan, Sriranjani Sitaraman, Satish Narayanasamy

    [paper] [code] [slides] [AMD Blog] [citeme] *both authors contributed equally to this work.

    Performance Boost: Accelerate the bottleneck step (chaining) of the state-of-the-art long-sequence mapping tool minimap2 by 2.57x-5.33x on GPU.
    Scales Well: Optimized for ultra-long reads of 50kb+ to keep pace with genome sequencing technology trends.
    Open Sourced: Actively maintained and optimized. Community contributions welcome~

SERVICES

Leadership

  • Coordinator

    Computer Engineering Lab Reading Group

    Organize weekly paper reading presentations and discussions.
    Host talks from visiting researchers and professors.

  • Co-Founder & Co-President

    UM-SJTU Joint Institute Alumni Association

    Alumni Engagement: Organize alumni and student gatherings.
    Relationship Building: Help expand SJTU-UM collaborations, connect with JI sponsors, and build industry relationships.
    Career Advising: Organize student career development workshops.
    Welcoming: Host new student orientation events, organize airport pickups, and help new students settle in.
    Student Support: Support students during the stressful transition to a new university in a new country, and during urgent crises.

Teaching

  • WN2024

    Graduate Student Instructor: EECS570 Parallel Computer Architecture

    with Prof. Ronald Dreslinski @UMich
  • FA2023

    Graduate Student Instructor: EECS471 CUDA Programming

    with Dr. Valeriy Tenishev @UMich
  • FA2021, WN2022

    Instructional Aide: EECS470 Computer Architecture

    with Prof. Mark Brehob and Prof. Ronald Dreslinski @UMich

    Teach out-of-order (OoO) processor design topics including branch prediction, pipelining, prefetching, and caches. Hold lab sessions and develop exam problems on OoO processor design.

  • SP2021

    Teaching Assistant: VE401 Probabilistic Methods in Eng.

    Instructor Dr. Horst Hohberger @SJTU-UM Joint Institute
  • SU2020

    Teaching Assistant: VP260 Honors Physics

    Instructor Dr. Mateusz Krzyzosiak @SJTU-UM Joint Institute

SKILLS

  • Programming Languages

    C/C++, CUDA, HIP, (System)Verilog, Bash, Makefile

  • Technologies/Frameworks

    GPU Tuning: nsight-compute/nsight-sys, omniperf/omnitrace/rocprof
    Formal Verification: Murphi
    SIMD: AVX-512, AVX2 on Xeon Phi
    Simulation: SniperSim, DRAMSim, pinplay
    Confidential Computing: Open Enclave SDK, Intel SGX

  • Architectures

    AMD CDNA2 Instinct GPU, NVIDIA Hopper GPU, Intel Xeon Phi, Out-of-order CPU

QUOTES

  • Bookstores, free markets and cafés are my must-visits while traveling. My recent favorite is Campfire Coffee in Negaunee, MI, a tiny town near Marquette in the Upper Peninsula. Nice place to visit in fall.
  • Shanghai has only two seasons, winter and summer, and they switch randomly. It is otherwise a wonderful city to live in.
  • Meet my cat Cuda 😼