Welcome to Zora’s home on the web :)
If you want to do research with me, discuss project directions for the 11-711 (Advanced NLP) course, or just generally get connected, you can check my calendar here (via calendly) and book a 30-minute meeting. Please confirm with me first before you schedule any meeting!
- Gave a guest lecture about code generation for the Advanced NLP course (11-711) 👩🏫 more details here
- Gave a talk on 🛠️ Tool using, learning, and making with LLMs at the code generation reading group; check out the video and slides
- Gave a talk about ODEX at the Machine Learning Methods in Software Engineering seminar (video), hosted by the JetBrains Research team. Feel free to check it out! 🙌
- New Preprint: FilCo ⌨️ Learning to Filter Context for Retrieval-Augmented Generation
- Paper appearing in EMNLP main conference: API-Assisted Code Generation for Question Answering on Varied Table Structures, thanks to the great work of my mentees!
- We released SPAE (Semantic Pyramid AutoEncoder), which enables frozen LLMs to perform understanding & generation tasks on non-linguistic modalities such as images or videos
- We just released 🌟 StarCoder, a 15B open-source code generation model with SoTA performance; super happy to have contributed to it as part of BigCode
- Check out ODEX, about open-domain code generation with execution-based evaluation (arXiv)
My primary research interest is building language models with interpretable and generalizable reasoning skills, by generating verifiable programs (e.g., SQL, logical forms, Python), grounding on supporting contexts (unstructured Wikipedia text, structured tables), and exploring increasingly diverse contexts with broader functionalities.
NLP Research Experience
Prior to CMU, I took a gap year and became an Assistant Researcher at Microsoft Research (Asia) through the Star Bridge Program. My focus then was understanding and leveraging structured knowledge such as tables, via large-scale pre-training, complex question answering, and data-to-text generation.
During my undergraduate years, I was also selected as a student researcher by the Tencent Rhino-Bird Talent Cultivation Program. I was lucky to explore both the Shenzhen and Beijing offices in China, and to work on interesting projects about knowledge injection and adaptive inference via self-distillation.
Life As An Undergrad
I received my B.S. in Mathematics from Beijing Normal University, though my explorations were a bit more diverse than this. I studied macro-economic models for Central Bank Digital Currency (CBDC) at the People's Bank of China; modeled human visual coding pathways using self-organizing maps at the IDG/McGovern Institute for Brain Research; led the genetic engineering of an antibiotic-free glucose production method and presented it at the iGEM conference; and examined Chinese language models in terms of world and linguistic knowledge acquisition at the Institute of Chinese Information Processing.
- Reviewer: EACL 2023, main conference (Question Answering track)
- Reviewer: AAAI 2023, 2024, main conference
- Reviewer: NeurIPS 2022, 2023, Table Representation Learning (TRL) workshop
- Teaching Assistant for 11-711: Advanced Natural Language Processing
- Organizing Committee: Student Research Symposium, Language Technologies Institute (CMU), 2022
- Reviewer: EMNLP 2022, 2023, main conference (Unsupervised and Weakly-Supervised Methods, Language Modeling and Analysis, QA)
- Reviewer: NAACL 2022, Structured and Unstructured Knowledge Integration (SUKI) workshop
I Love My Name
My name in Chinese is 王芷若, which reads as Zhiruo Wang in Hanyu Pinyin. It is usually hard for non-native speakers to pronounce, so you can also call me Zora (as ZR is similar to Zhi Ruo). I love my name, especially in Chinese characters, since it has a more beautiful meaning than in the English alphabet. 芷 stands for 白芷 (Angelica dahurica) and 若 stands for 杜若 (Pollia japonica), two kinds of Chinese herbal medicine. 芷若 is also the name of a beautiful fragrant herb.
Feel free to reach out if you have any questions :)