Research Publications

A comprehensive collection of our peer-reviewed papers, articles, and reports, forming the full record of our scholarly contributions.

1. Generative image reconstruction from gradients

IEEE Transactions on Neural Networks and Learning Systems

2. Vision-amplified semantic entropy for hallucination detection in medical visual question answering

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)

3. Cross-modal obfuscation for jailbreak attacks on large vision-language models

arXiv preprint arXiv:2506.16760

4. A survey and evaluation of adversarial attacks in object detection

IEEE Transactions on Neural Networks and Learning Systems

5. Intention Analysis Makes LLMs A Good Jailbreak Defender

In Proceedings of the 31st International Conference on Computational Linguistics

https://aclanthology.org/2025.coling-main.199/

6. Revisiting Catastrophic Forgetting in Large Language Model Tuning

In Findings of the Association for Computational Linguistics: EMNLP 2024

https://aclanthology.org/2024.findings-emnlp.249/

7. NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models

In Proceedings of the Thirteenth International Conference on Learning Representations

https://openreview.net/forum?id=yaOe2xBcLC

8. Breaking the false sense of security in backdoor defense through re-activation attack

Advances in Neural Information Processing Systems

9. InfoRM: Mitigating reward hacking in RLHF via information-theoretic reward modeling

Advances in Neural Information Processing Systems

https://proceedings.neurips.cc/paper_files/paper/2024/file/f25d75fc760aec0a6174f9f5d9da59b8-Paper-Conference.pdf

10. Learning from models beyond fine-tuning

Nature Machine Intelligence

https://www.nature.com/articles/s42256-024-00961-0

11. Neural Phylogeny: Fine-Tuning Relationship Detection among Neural Networks

In Proceedings of the Thirteenth International Conference on Learning Representations

https://openreview.net/forum?id=jv2zHOalpL

12. Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift

In Proceedings of the Computer Vision and Pattern Recognition Conference (pp. 9477-9486)

13. ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks

International Conference on Machine Learning

14. CopyrightShield: Spatial similarity guided backdoor defense against copyright infringement in diffusion models

IEEE/CVF International Conference on Computer Vision

15. ELBA-Bench: An efficient learning backdoor attacks benchmark for large language models

The 63rd Annual Meeting of the Association for Computational Linguistics

16. CoT-Valve: Length-compressible chain-of-thought tuning

The 63rd Annual Meeting of the Association for Computational Linguistics

17. Open-World Authorship Attribution

In Findings of the Association for Computational Linguistics: ACL 2025 (pp. 17744-17758)