About Me
I am a fourth-year Ph.D. candidate in Computer Science at the INtelligent Data Engineering Lab (INDElab), University of Amsterdam, advised by Prof. Dr. Paul Groth and Dr. Klim Zaporojets.
My research builds deep-learning models for multi-modal knowledge graphs that stay accurate as data drifts over time and that disambiguate near-identical entities. I fuse graph structure, text, images, and temporal signals for entity linking, link prediction, and recommendation. My methods consistently improve over prior state of the art (up to +20% on entity linking and +6% on recommendation) and ship as open-source code with released datasets and benchmarks.
Before joining UvA, I received my Master's degree in Control Engineering from Beijing University of Technology in 2022, advised by Prof. Dr. Yong Zhang, where I worked on graph neural networks under power-law distributions and multi-view graph representation learning.
I am on the job market, seeking Data Scientist roles. Feel free to reach out at p.zhang@uva.nl.
News
- 2026.07: Heading to San Diego in July for ACL 2026 to present Fusion Training for Mathematical Generalization in Large Language Models. Come say hi if you'll be around!
- 2026.05: A great week in Dubrovnik presenting Beyond Images at ESWC 2026.
- 2026.05: Presented Graph-TempCZ at LREC 2026 in Palma de Mallorca, with lots of good chats on software mentions in the literature.
- 2026.04: Fusion Training for Mathematical Generalization in Large Language Models is in at the ACL 2026 Student Research Workshop!
- 2026.03: Beyond Images was accepted at ESWC 2026. Thanks to everyone who made it happen.
- 2026.01: Graph-TempCZ accepted at LREC 2026. Great to see this one land.
- 2025.08: Our survey on large language models for data challenges in graphs was accepted in Expert Systems with Applications.
- 2024.11: Presented our work on entity linking and co-occurrence networks at EKAW 2024, right here in Amsterdam.
- 2024.10: Off to Boise for CIKM 2024 to present CYCLE.
- 2024.10: Presented TIGER at ECAI 2024 in Santiago de Compostela.
- 2024.09: Our paper on entity linking in co-occurrence networks was accepted at EKAW 2024.
- 2024.08: CYCLE accepted at CIKM 2024!
- 2024.07: TIGER accepted at ECAI 2024!
Project Experience

Fusion Training · Hybrid reasoning in large language models
Made one model good at both quick answers and deep step-by-step reasoning, instead of trading one for the other.
Newer LLMs (like Qwen3 and GPT-5) switch between fast, concise replies for easy questions and long reasoning for hard ones to save time and compute, but training both behaviors into a single model makes them compete. Using math problem solving as the testbed, we systematically studied how to mix and order the two kinds of training data, showed that interleaving them keeps both skills strong, quantified the trade-off between them, and released an open benchmark (Fusion Bench) for the community.

TimeRoute · Recommendation system
Improved recommendation accuracy by up to 6% on TikTok and Amazon datasets, beating strong baselines.
Recommenders usually blend user signals (clicks, text, images) the same way no matter when they happened. I built a model that learns which signals matter over short versus long time spans and weighs them accordingly, then automatically cleans up noisy and missing data so the recommendations stay reliable.

Time Imprint · Entity resolution
Boosted top-match accuracy by up to 4.81%, and by up to 200% on the hardest cases.
Systems often confuse near-identical records whose text and images look almost the same. I added timing as an extra clue so the model can tell them apart, sharply cutting errors on the most confusable pairs.

Beyond Images · Automated data enrichment
Lifted match accuracy by up to 7%, and by up to 333% on ambiguous logos and symbols.
Many records have missing or low-quality images, which hurts matching. I built an automated pipeline that finds extra images online, turns them into text with vision-language models, and summarizes everything with an LLM, filling the gaps without any manual work.

CYCLE · Entity resolution that holds up over time
Beat the best prior method by 13.9% to 17.8%, with the largest gains on rare records.
Models that match text to a database get worse as that database changes from year to year. I designed a training approach that learns from those yearly changes so accuracy stays high as the data evolves, and released an open benchmark for the problem.

TIGER · Entity resolution with graphs and text
Outperformed the strongest baseline by 16% to 21%.
Tackling the same drift problem, I combined how records connect to each other with their text descriptions to make matching more robust as data changes over time, and released a public benchmark to measure it.
Publications
- ACL 2026: Fusion Training for Mathematical Generalization in Large Language Models. Congfeng Cao, Pengyu Zhang, Jelke Bloem. Annual Meeting of the Association for Computational Linguistics (Student Research Workshop).
  Paper | DOI | Code
- ESWC 2026: Are a Thousand Words Better Than a Single Picture? Beyond Images - A Framework for Multi-Modal Knowledge Graph Dataset Enrichment. Pengyu Zhang, Klim Zaporojets, Jie Liu, Jia-Hong Huang, Paul Groth. 23rd European Semantic Web Conference.
  Paper | DOI | Code | Video (YouTube) | Video (Bilibili)
- LREC 2026: Graph-TempCZ: A Graph Representation of Software Mentions for Predicting Software Usage in Scientific Publications. Congfeng Cao, Pengyu Zhang, Jelke Bloem. International Conference on Language Resources and Evaluation.
  Paper | DOI | Code
- ESWA 2025: A survey of large language models for data challenges in graphs. Mengran Li, Pengyu Zhang, Wenbin Xing, Yijia Zheng, Klim Zaporojets, Junzhou Chen, Ronghui Zhang, Yong Zhang, Siyuan Gong, Jia Hu, Xiaolei Ma, Zhiyuan Liu, Paul Groth, Marcel Worring. Expert Systems with Applications.
  Paper | DOI | Code
- EKAW 2024: Understanding the Impact of Entity Linking on the Topology of Entity Co-occurrence Networks for Social Media Analysis. James Nevin, Pengyu Zhang, Dimitar Dimitrov, Michael Lees, Paul Groth, Stefan Dietze. International Conference on Knowledge Engineering and Knowledge Management.
  Paper | DOI | Code
- CIKM 2024: CYCLE: Cross-Year Contrastive Learning in Entity-Linking. Pengyu Zhang, Congfeng Cao, Klim Zaporojets, Paul Groth. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management.
  Paper | DOI | Code
- ECAI 2024: TIGER: Temporally Improved Graph Entity Linker. Pengyu Zhang, Congfeng Cao, Paul Groth. European Conference on Artificial Intelligence.
  Paper | DOI | Code
- Physica A 2024: Relationship Updating Network with Contrastive Learning. Pengyu Zhang, Yong Zhang, Xinglin Piao, Yongliang Sun, Baocai Yin. Physica A: Statistical Mechanics and its Applications.
  Paper | DOI | Code
- EAAI 2023: MVMA-GCN: Multi-view Multi-layer Attention Graph Convolutional Networks. Pengyu Zhang, Yong Zhang, Jingcheng Wang, Baocai Yin. Engineering Applications of Artificial Intelligence.
  Paper | DOI | Code
- JCAD 2022: Visual Analysis for Name Disambiguation of Academic Papers (in Chinese). Pengyu Zhang, Yong Zhang, Yanjie Cui, Baocai Yin. Journal of Computer-Aided Design and Computer Graphics.
  Paper | DOI | Video (YouTube) | Video (Bilibili)
- Information 2021: Dual-Channel Heterogeneous Graph Network for Author Name Disambiguation. Xin Zheng, Pengyu Zhang, Yanjie Cui, Rong Du, Yong Zhang. Information.
  Paper | DOI | Code
Education
- 2022 - Present, Ph.D. in Computer Science, INDElab, Faculty of Science, University of Amsterdam (UvA), the Netherlands. Supervisors: Prof. Dr. Paul Groth, Dr. Klim Zaporojets.
- 2019 - 2022, M.Eng. in Control Engineering, Faculty of Information Technology, Beijing University of Technology (BJUT), China. Supervisor: Prof. Dr. Yong Zhang.
Skills
- Programming: Python, SQL, Bash, Git, Linux.
- Machine Learning & Deep Learning: PyTorch, Hugging Face, scikit-learn, NumPy, Pandas; large language models, vision-language models, graph neural networks, contrastive learning, recommendation.
- Tools: Weights & Biases, Jupyter.
CV
Computer Science Ph.D. candidate at UvA with five first-author papers on multi-modal and temporal knowledge graphs, seeking Data Scientist roles.