AIE310004: Computer Vision
Computer vision is the discipline that studies how to enable machines to "see," and it is currently one of the most popular and rapidly advancing research fields in artificial intelligence. This course introduces the fundamental problems of computer vision and helps students grasp the field's basic concepts, foundational knowledge, and essential methods, giving interested students an entry point into research in related areas. By reading classic papers and reproducing classic algorithms and applications, students will also strengthen their hands-on skills in computer vision engineering.
This course covers the history of computer vision and introduces traditional computer vision methods, including classical visual features and early image classification frameworks. It then turns to the fundamentals of neural networks and to neural network architectures and model families such as convolutional neural networks (CNNs), Transformers, and diffusion models. The course focuses on CNN- and Transformer-based methods for image classification, object detection, and image segmentation, along with recent advances such as large-scale multimodal pre-trained models and visual generative models. It also examines practical challenges in building computer vision models, such as excessive parameter counts, data imbalance, and domain shift, and presents corresponding solutions.
The course includes three hands-on sessions in which students build an image classification model, an object detection model, and an image generation model themselves, reinforcing their understanding of the course content and developing their practical skills.