My recent interests lie in scaling LLM/VLM agents for digital, visual, and 3D/4D automation, including: (i) coding and tool-use agents for real-world workflows and evaluation (Claw-Eval-Live, JarvisArt, JarvisEvo, JarvisX-Cowork), (ii) visual and creative agents for image generation, editing, restoration, and design artifacts (Unify-Agent, Gen-Searcher, JarvisIR, PosterCraft), and (iii) foundation models and benchmarks for 3D/4D world understanding and dynamic reasoning (IR3D-Bench, DynamicVerse, Thinking in Dynamics). These experiences span agent scaffolding and harness design, task and evaluation construction, trajectory rollout and distillation, reinforcement learning and post-training, and multimodal reasoning. Current and previous focal areas include:
- Multimodal LLM Agents
  - Tool-using agents and creative workflows: JarvisArt, JarvisEvo, JarvisX-Cowork
  - World-grounded synthesis agents: Unify-Agent, Gen-Searcher
- Coding Benchmarks
  - Live agent coding benchmark: Claw-Eval-Live
- 3D/4D Large Foundation Models & Benchmarks
  - 3D scene understanding and agentic inverse rendering: IR3D-Bench, HumanCrafter
  - Dynamic 4D world modeling and reasoning: DynamicVerse, Diff4Splat, Thinking in Dynamics
- AIGC / Unified Models
  - Unified image generation and editing: Meta-CoT, iFSQ, ChatUMM
  - Generative design and visual restoration: PosterCraft, JarvisIR, SnowMaster, AGLLDiff, DPLUT
✉️ Feel free to contact me for discussions and collaborations on VLM/LLM/agent and reinforcement learning research.

JarvisX-Cowork: A Personal AI Creative Assistant for End-to-End Creative Workflows
An open-source creative assistant that supports end-to-end workflows from web reference search to image generation, refinement, video creation, and polished demos, decks, or docs.
Treating creative production as a unified agent workflow scales better than chaining isolated tools. A single planner with shared memory and tool interfaces significantly reduces context switching and improves the consistency of final outputs.

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows
Chenxin Li, Zhengyang Tang, Mingxin Huang, Yunlong Lin, Shijue Huang, Shengyuan Liu, Bowen Ye, Rang Li, Lei Li, Benyou Wang, Yixuan Yuan
A live workflow-agent benchmark with refreshable demand signals and verifiable execution traces: 105 tasks across 22 categories. Of 13 frontier models evaluated, the top model passes only 66.7% of tasks.
📄 PDF |
🌐 Project |
🤗 HF Paper |
💻 Code

Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis
Shuang Chen, Quanxin Shou, Hangting Chen, Yucheng Zhou, Kaituo Feng, Wenbo Hu, Yi-Fan Zhang, Yunlong Lin, Wenxuan Huang, Mingyang Song, Dasen Dai, Bolin Jiang, Manyuan Zhang, Shi-Xue Zhang, Zhengkai Jiang, Lucas Wang, Zhao Zhong, Yu Cheng, Nanyun Peng
Image synthesis quality improves when generation is grounded in an explicit world model rather than prompt-only decoding. Unifying perception, planning, and generation inside one agent reduces the mismatch between user intent and rendered scenes.

Gen-Searcher: Reinforcing Agentic Search for Image Generation
Kaituo Feng, Manyuan Zhang, Shuang Chen, Yunlong Lin, Kaixuan Fan, Yilei Jiang, Hongyu Li, Dian Zheng, Chenyang Wang, Xiangyu Yue
Better images come from better search trajectories before generation. Reinforcement learning over agentic search decisions gives stronger global composition and semantic faithfulness than one-shot prompt tuning.

JarvisEvo
Yunlong Lin*, Linqing Wang, Kunjie Lin, Zixu Lin*, Kaixiong Gong, Wenbo Li, Bin Lin, Zhenxi Li, Shiyi Zhang, Yuyang Peng, Wenxun Dai, Xinghao Ding♣, Chunyu Wang†, Qinglin Lu†
A self-evolving loop between editor and evaluator provides stable long-horizon improvement for image editing agents. Reward shaping combined with reflection-driven updates enables continual capability gains without handcrafting a new pipeline for each task.
📄 PDF |
🌐 Project |
🤗 HF Paper |
💻 Code

JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
Yunlong Lin*, Zixu Lin*, Kunjie Lin*, Jinbin Bai, Panwang Pan, Chenxin Li, Haoyu Chen, Zhongdao Wang, Xinghao Ding†, Wenbo Li♣, Shuicheng Yan†
High-quality artistic retouching is easier when the agent decomposes editing into interpretable subgoals and tool calls. This decomposition improves controllability for users while preserving strong aesthetic quality.
📄 PDF |
🌐 Project |
🤗 HF Paper |
🐦 Twitter |
📺 YouTube |
📹 Bilibili |
💻 Code

JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration
Yunlong Lin*, Zixu Lin*, Haoyu Chen*, Panwang Pan*, Chenxin Li, Sixiang Chen, Kairun Wen, Yeying Jin, Wenbo Li, Xinghao Ding
Restoration should be optimized for downstream driving perception, not just pixel-level scores. Injecting agent-style decision signals into enhancement leads to more robust perception gains under adverse conditions.
📄 PDF |
🌐 Project |
🤗 Online Demo |
💻 Code

DynamicVerse: Physically-Aware Multimodal Modeling for Dynamic 4D Worlds
Kairun Wen, Yuzhi Huang, Runyu Chen, Hui Zheng, Yunlong Lin, Panwang Pan, Chenxin Li, Wenyan Cong, Jian Zhang, Junbin Lu, Chenguo Lin, Dilin Wang, Zhicheng Yan, Hongyu Xu, Justin Theiss, Yue Huang, Xinghao Ding, Rakesh Ranjan, Zhiwen Fan
Physical priors are critical for temporally consistent 4D world modeling. Combining physically-aware constraints with multimodal context improves both realism and controllability in dynamic scene generation.

PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
Sixiang Chen, Jianyu Lai, Jialin Gao, Tian Ye, Haoyu Chen, Hengyu Shi, Shitong Shao, Yunlong Lin, Song Fei, Zhaohu Xing, Yeying Jin, Junfeng Luo, Xiaoming Wei, Lei Zhu
A unified framework can jointly handle layout planning and visual rendering instead of treating them as separate stages. This coupling yields better text-image harmony and more stable poster aesthetics.
📄 PDF |
🌐 Project |
💻 Code |
📹 Demo Video

AGLLDiff
Yunlong Lin*, Tian Ye*, Sixiang Chen*, Zhenqi Fu, Yingying Wang, Wenhao Chai, Zhaohu Xing, Lei Zhu, Xinghao Ding
Training-free enhancement can still be practical when diffusion models are guided by carefully designed global and local luminance priors. This design avoids expensive retraining while maintaining real-world robustness.

Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors
Yunlong Lin*, Zhenqi Fu*, Kairun Wen, Tian Ye, Sixiang Chen, Ge Meng, Yingying Wang, Yue Huang, Xiaotong Tu, Xinghao Ding
Lightweight LUT-based enhancement and diffusion priors are complementary rather than conflicting. Their combination provides a strong efficiency-quality trade-off for unsupervised low-light restoration.

SnowMaster: Comprehensive Real-world Image Desnowing via MLLM with Multi-Model Feedback Optimization
Jianyu Lai*, Sixiang Chen*, Yunlong Lin, Tian Ye, Yun Liu, Song Fei, Zhaohu Xing, Hongtao Wu, Weiming Wang, Lei Zhu
MLLM-driven feedback is effective for coordinating multiple restoration experts in difficult weather conditions. Multi-model feedback optimization improves desnowing quality and reduces brittle failure cases.