Ruiying Ma

I received a B.Eng. in Computer Science and Technology from IIIS, Tsinghua University, in June 2025. Prior to that, I visited UC Berkeley as a student researcher, where I was advised by Prof. Aditya Parameswaran and worked on designing databases for unstructured data, especially documents. I also worked as a research intern in the Systems and Networking Research Group at Microsoft Research Asia, where I studied caching problems with Dr. Chieh-Jan Mike Liang, Prof. Francis Y. Yan, and Yanjie Gao.

Email / CV / Google Scholar / GitHub / LinkedIn

Research

I am interested in Computer Systems, especially Databases. Currently, I'm working on databases for PDF documents, a common category of unstructured data. Specifically, I formulate, analyze, and leverage various structures found in real-world PDFs to improve the efficiency and accuracy of document analytics. I'm also working on a classic computer systems problem: caching. I design and analyze new cache replacement policies that achieve state-of-the-art efficiency on evolving system workloads.

Publications

Querying Templatized Document Collections with Large Language Models
Yiming Lin, Madelon Hulsebos, Ruiying Ma, Shreya Shankar, Sepanta Zeighami, Aditya G. Parameswaran, Eugene Wu
ICDE, 2025
paper / arXiv

By leveraging the semantic hierarchical structure of templatized documents, we design ZenDB, a document analytics system with a novel query engine for accurate and cost-effective query execution (~31x cost savings).

Source code from Jon Barron's website.