Kyoto, Japan, 28th September, 2009
In association with the
Twelfth IEEE International Conference on Computer Vision
Object and scene categorization has long been a central goal of computer vision research. Changes in lighting, viewpoint, and pose, as well as intra-class differences, lead to enormous appearance variation, making the problem highly challenging. While advances in machine learning and image feature representations have led to great progress in 2D pattern recognition approaches, recent work suggests that large gains can be made by acknowledging that objects live in a physical, three-dimensional world.
By modeling objects and their relations in 3D, we can gain robustness to changes in viewpoint and pose, along with contextual constraints that reflect the underlying structure of real-world scenes. But to do so, we must answer several fundamental questions. How can we effectively learn 3D object representations from images or video? What level of supervision is required? How can we infer spatial knowledge of the scene and use it to aid in recognition?
Following the success of the 3dRR 07 workshop at ICCV 07, we are pleased to announce this second edition, held in conjunction with ICCV 2009. This workshop is an opportunity to bring together experts from multiple areas of computer vision and to provide an arena for stimulating debate. We believe the complementary viewpoint offered by studies in human vision can provide additional insight into this fundamental problem. We encourage submissions that contain interesting new ideas, even with limited validation. Specific questions we aim to address include:
- What are suitable representations of the 3D geometry of object instances or classes that can be exploited for recognition?
- Can we extend known 2D spatial models (e.g. constellation models) to 3D?
Reconstruction and Recognition
- Can recognition and reconstruction be run simultaneously to enhance each other?
- How much does 3D information help?
- How detailed does the 3D representation need to be in order to achieve satisfactory recognition?
- How can we represent and infer the depth and orientation of surfaces and free space in indoor and outdoor scenes?
- How can alternative representations, such as depth maps and surface layout estimates, be combined to improve robustness?
Spatial constraints and contextual recognition
- How can we exploit different degrees of 3D spatial constraints (e.g. the ground plane) for recognition?
- How can 3D spatial constraints be used for joint recognition of scenes and the objects within?
- What can we learn from what we know about our own visual system? How do we humans represent 3D objects or the 3D environment? Can this inspire computational work?