Ziyang Wang

I am a second-year CS Ph.D. student at the University of North Carolina at Chapel Hill, advised by Prof. Mohit Bansal, and I also work closely with Prof. Gedas Bertasius. My current research interest is multimodal learning, with a special focus on video-language understanding. I am affiliated with the UNC-NLP group.

Previously, I was an Applied Scientist Intern at Amazon Alexa AI, working with Heba Elfardy, Kevin Small, and Markus Dreyer. I also interned at Tsinghua AIR, working with Prof. Jingjing Liu. I completed my undergraduate studies at UESTC, where I was advised by Prof. Jingjing Li.

My email address is ziyangw at cs . unc . edu. If you have any questions or relevant intern positions for summer 2024, feel free to contact me!

Google Scholar / Curriculum Vitae / GitHub / LinkedIn


In general, I am interested in the fundamental challenges of video understanding. Recently, I have been interested in using neuro-symbolic methods, built on multiple pretrained models, to achieve complex and explainable video understanding.

Unified Coarse-to-Fine Alignment for Video-Text Retrieval

Ziyang Wang, Yi-Lin Sung, Feng Cheng, Gedas Bertasius, Mohit Bansal
ICCV23, 2023
arXiv / code

UCoFiA captures cross-modal similarity information at different granularity levels (video-sentence, frame-sentence, pixel-word) and unifies the multi-level alignments for video-text retrieval.

Language-Augmented Pixel Embedding for Generalized Zero-shot Learning

Ziyang Wang, Yunhao Gou, Lei Zhu, Heng Tao Shen
IEEE Transactions on Circuits and Systems for Video Technology, 2022

In this paper, we propose a novel GZSL framework named Language-Augmented Pixel Embedding (LAPE), which directly maps image pixels to semantic attributes with cross-modal guidance.

Region Semantically Aligned Network for Zero-Shot Learning

Ziyang Wang, Yunhao Gou, Jingjing Li, Yu Zhang, Yang Yang
CIKM21 (long oral), 2021
arXiv

We propose a novel ZSL framework named Region Semantically Aligned Network (RSAN), which transfers region-attribute alignment from seen classes to unseen classes.


I like sports, especially soccer. I won a couple of trophies as team captain during my undergraduate years.

Design and source code from Leonid Keselman's website, thanks!