Xiaohan Yan   颜小涵

I am currently a Master's student in Computer Science at the CAD Research Center, Tongji University.

I did my B.Sc. in Computing Science at Hohai University.

My research interests are computer vision, multimodal learning, and reinforcement learning, specifically learning-based methods for 3D point cloud segmentation and multimodal pre-training.

If you find any research interests that we might share, feel free to drop me an email. I am always open to potential collaborations.

I am a former ACMer and a former OIer.

Email  /  CV  /  GitHub  /  LinkedIn

中文 / English / 日本語

Short Bio
I am an M.Sc. student in Computer Science at the CAD Research Center, Tongji University, where I am honored to be advised by Assoc. Prof. Gang Wei.

Before that, I received my B.Sc. degree in Computing Science from Hohai University in 2022 and was honoured as a Charming Graduate of Hohai University.

I am the former captain of the ACM team at Hohai University, and I chaired the 10th and 11th Hohai University ACM Programming Competitions. I also ran the Hohai Online Judge website for a year.

I am a former OIer at JiangSu DaFeng Senior High School, where I first became interested in computer science.

I was born on May 21st, 2000 in Yancheng, China. My hometown lies on the shores of the Yellow Sea and has a national nature reserve; it is also known as the home of the milu (Père David's deer).

Research interests
I work at the intersection of computer vision and multimodal learning, developing new deep learning methods for challenging problems in 3D vision and text-image alignment, with a particular focus on segmentation, pre-trained models, and scene understanding.

My long-term goal is to advance applications of 3D vision that benefit society directly by improving people's living environments.
News
[2024/5/23] Our paper "RE0: Recognize Everything with 3D Zero-shot Open-Vocabulary Instance Segmentation" has been submitted to NeurIPS 2024.

[2024/4/29] I started a research internship at NIO, Shanghai.

[2024/4/28] Our paper "AttenPoint: Exploring Point Cloud Segmentation through Attention-Based Modules" has been submitted to PRCV 2024.

[2024/3/8] Our paper "Anatomical Structure-Guided Medical Vision-Language Pre-training" has been submitted to MICCAI 2024.

[2024/1/8] I started a research internship at the Institute for AI Industry Research (AIR), Tsinghua University.

Internship Experiences

More details can be found in my CV.

• Research Internship at NIO, Shanghai. April 2024 - Present
• Research Internship at the Institute for AI Industry Research (AIR), Tsinghua University. January 2024 - March 2024
Research

Much of my research is about inferring the physical world (shape, motion, color, light, etc.) from images and raw 3D data. Representative work is highlighted.

RE0: Recognize Everything with 3D Zero-shot Open-Vocabulary Instance Segmentation
Xiaohan Yan, Zijian Jiang, Yinghao Shuai, Nana Wang, Xiaowei Song
NeurIPS 2024, 2024-5, Code coming soon, Paper coming soon

We leverage the 3D geometry of the point cloud, the projection relationship between the point cloud and multi-view posed 2D RGB-D frames, and the semantic features that CLIP extracts from those frames to address the challenge of 3D instance segmentation.
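
As a rough illustration of the projection relationship mentioned above, here is a minimal sketch that maps a 3D point into a posed RGB-D frame and checks it against the frame's depth map. It assumes a standard pinhole camera model; the function names, the `tolerance` value, and the 3x4 `world_to_cam` layout are my own illustrative choices and are not taken from the RE0 code.

```python
def project_point(point_world, world_to_cam, fx, fy, cx, cy):
    """Project a 3D world point into a posed pinhole camera.

    world_to_cam is a 3x4 row-major matrix [R | t] (an assumed layout).
    Returns (u, v, depth), or None when the point is behind the camera.
    """
    x, y, z = point_world
    xc, yc, zc = (
        row[0] * x + row[1] * y + row[2] * z + row[3] for row in world_to_cam
    )
    if zc <= 0:
        return None
    u = fx * xc / zc + cx
    v = fy * yc / zc + cy
    return u, v, zc


def is_visible(projection, depth_image, tolerance=0.05):
    """Occlusion test: compare the projected depth with the frame's depth map."""
    if projection is None:
        return False
    u, v, depth = projection
    col, row = int(round(u)), int(round(v))
    if not (0 <= row < len(depth_image) and 0 <= col < len(depth_image[0])):
        return False
    return abs(depth_image[row][col] - depth) <= tolerance
```

Points that pass the visibility check can then be associated with the 2D features at pixel `(u, v)` of that frame.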

AttenPoint: Exploring Point Cloud Segmentation through Attention-Based Modules
Xiaohan Yan, Nana Wang, Xiaowei Song
PRCV 2024, 2024-4, Code coming soon, Paper coming soon

Similar to how humans perceive 3D objects, neural networks discern the class labels of point clouds by combining local and global features of the structures and performance. Based on this, we reviewed the pipeline of few-shot point cloud semantic segmentation and identified three issues.

GreedyAgent: A Simple yet Efficient Approach for Meta-learning from Learning Curves
Jinyu He, Xiaowei Song, Xiaohan Yan, Nana Wang
ICIC 2024 oral, 2024-4, Code, Paper coming soon

Meta-learning plays an increasingly important role in AutoML. A key sub-problem, meta-learning from learning curves, is an immature area within meta-learning that is gradually attracting attention.

Anatomical Structure-Guided Medical Vision-Language Pre-training
Qingqiu Li, Xiaohan Yan
MICCAI 2024, 2024-3, Code, Paper

Learning medical visual representations through vision-language pre-training faces two challenges: local alignment lacks interpretability and clinical relevance, and the internal and external representation learning of image-report pairs is insufficient. To address these issues, we propose an Anatomical Structure-Guided (ASG) framework.

Projects

Many of my projects are about inferring the physical world (shape, motion, color, light, etc.) from images and raw 3D data. Representative projects are highlighted.

End-to-end-SegmentAnything3D
Xiaohan Yan, Nan Wang
Kaggle, 2023-10, Code

This project aims to use Segment Anything 3D to segment a .ply point cloud without 2D labels.

We use a pcd2rgb method to generate 2D RGB and depth images. We then align the inputs and generate the .ply output.

LLM Science Exam - Use LLMs to answer difficult science questions
Xiaohan Yan, Nan Wang, Xiaowei Song, Jinyu He
Kaggle, 2023-10, Code

We scored 0.905 on the leaderboard, placing in the top 4%.

We gather Wikipedia knowledge about science questions and use a bag-of-words model to clean the data. We then use a sentence transformer to measure the similarity between each problem and the cleaned dataset. Finally, we train three large DeBERTa models on different datasets and combine their features to infer the right answer.
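
The retrieval step can be sketched roughly as follows. This is only an illustration of similarity-based passage lookup: it swaps in plain bag-of-words count vectors with cosine similarity in place of sentence-transformer embeddings so that it runs with the standard library alone, and all function names here are hypothetical rather than taken from the competition code.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, passages, top_k=1):
    """Return the top_k passages most similar to the question."""
    query = bag_of_words(question)
    ranked = sorted(
        passages,
        key=lambda p: cosine_similarity(query, bag_of_words(p)),
        reverse=True,
    )
    return ranked[:top_k]
```

In the real pipeline, the count vectors would be replaced by dense sentence embeddings, and the retrieved passages would be passed as context to the answer models.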

Stable Diffusion - Image to Prompts
Xiaohan Yan, Nan Wang, Xiaowei Song
Kaggle, 2023-05, Code

For images generated from text using Stable Diffusion, we use three models: BLIP+CLIP, OFA, and ViT. We then combine their features to predict the text prompt for a given generated image.

HUAWEI Robot Path Planning for CodeCraft
Xiaohan Yan, Nan Wang, Xiaowei Song
CodeCraft, 2023-03, Code

This project is about a HUAWEI robot application. It requires us to assign policies, control scheduling, and plan paths for multiple robots on a single map.

Selected awards
• The 2019 ICPC Asia-East Continent Final - Bronze Medal (2019)

• CCF Collegiate Computer Systems & Programming Contest - Silver Medal (2019)

• Jiangsu Collegiate Programming Contest - Silver Medal, 2nd place (2020)

• CCF Certified Software Professional - 320 (Top 0.88%) (2020)

• Hohai University Academic & Science and Technology Scholarship (2019 - 2021)

• Hohai University Charming Graduates (2022)

• The rest of the awards
What do I do in my spare time?
InsideOut
Origami-hui, Xiaohan Yan
GameJam, 2023-04, Project Page

As an incarnation of matter inhaled by the deep breath, you should try your best to avoid sinking into the human body.

"WASD" move; "R" recover "Oxygen"; "E" interact with environment; "left Shift" sprint; "left mouse button" attack; "right mouse button" parry.

eScape
Origami-hui, Xiaohan Yan
GameJam, 2023-12, Project Page

Scale your device and escape from this geometry storm.

This game ranked 1st in "Innovation" and 2nd in "Theme interpretation" at Game Off 2023.

Misc
Japanese🇯🇵:
        I am learning Japanese, and I plan to take the Japanese N2 exam in summer 2024.

Sports🏃‍♂️:
        Swimming🏊: swimming was my hobby when I was a kid, and I hit 39.22 s in the 50m backstroke.
        Go, Badminton🏸, Flying Disc🥏.

Games🎮:
        I love to play Pokémon-related games such as PTCG, Pokémon Legends: Arceus, etc.
        I am a fan of Nintendo. I think The Legend of Zelda is the best game.

Last updated on 2024/5/29