Experience

ACL 2026

Fusion Training · Hybrid-reasoning LLMs

Found that interleaving thinking and non-thinking training data keeps both abilities strong, and measured how the balance shifts (adding more non-thinking data steadily hurts reasoning). Released the open Fusion Bench benchmark.
Newer LLMs mix quick answers with long step-by-step reasoning to save compute, but training one model to do both makes the two modes interfere with each other. Ran a full grid over data ratios and training orders to see what keeps both working.

Under review

TimeRoute · Time-aware recommendation

Raised recommendation accuracy by up to 6% on TikTok, Amazon-Baby, and Amazon-Sports over strong baselines.
Recommenders usually mix a user’s clicks, text, and images the same way no matter when each happened. Built a model that learns which of these matter over short versus long time spans, and that cleans up noisy or missing history on its own.

Under review

Time Imprint · Multi-modal entity disambiguation

Raised top-match accuracy by up to 4.81% overall, and by up to 200% on the hardest, most look-alike cases.
Systems often mix up near-identical records whose text and images look almost the same. Added time as an extra clue so the model can tell them apart, cutting errors most where they were worst.

ESWC 2026

Beyond Images · Knowledge-graph data enrichment

Raised match accuracy by up to 7% overall, and by up to 333% on ambiguous logos and symbols.
Many records have missing or low-quality images, which hurts matching. Built a pipeline that finds extra images online, turns them into text with vision-language models, and writes a summary with an LLM, filling the gaps with no manual work.

LREC 2026

Graph-TempCZ · Large-scale graph link prediction

Raised test accuracy by 5.98% (to 92.88%) with a GraphSAGE GNN over feature-based XGBoost baselines.
Built the first large graph linking research papers to the software they use, with over six million mentions spanning 1959 to 2022, and checked how well the model predicts usage in later years.

CIKM & ECAI 2024

CYCLE: Paper | DOI | Code
TIGER: Paper | DOI | Code

CYCLE & TIGER · Temporally robust entity linking

CYCLE beat the best prior method by 13.9% to 17.8%; TIGER beat the strongest baseline by 16% to 21%, measured over one- to three-year gaps.
Models that match text to a database lose accuracy as the database changes from year to year. CYCLE learns from those yearly changes, and TIGER also uses how records connect to each other. Both come with public benchmarks (GCL-TempEL and Graph-TempEL).

Skills