3D Scene Generation
CVPR 2019 Workshop, Long Beach, CA
Image credit: [1, 2, 7, 12, 6, 4, 5]
People spend a large percentage of their lives indoors (in bedrooms, living rooms, offices, kitchens, and other such spaces), and the demand for virtual versions of these real-world spaces has never been higher. Game developers, VR/AR designers, architects, and interior design firms increasingly use virtual 3D scenes for both prototyping and final products. AI, vision, and robotics researchers are also turning to virtual environments to train data-hungry models for tasks such as visual navigation, 3D reconstruction, and activity recognition.
As the vision community shifts from passive, Internet-image-based tasks to applications such as those listed above, virtual 3D environments become critical. The community has recently benefited from large-scale datasets of both synthetic 3D environments and reconstructions of real spaces [8, 9, 14, 16], as well as from 3D simulation frameworks for studying embodied agents [3, 10, 11, 15]. While these datasets are a valuable resource, they are finite in size and do not adapt to the needs of different vision tasks. To enable large-scale embodied visual learning in 3D environments, we must go beyond such static datasets and pursue the automatic synthesis of novel, task-relevant virtual environments.
In this workshop, we aim to bring together researchers working on the automatic generation of 3D environments for computer vision research with researchers who use 3D environment data for a variety of computer vision tasks. We define "generation of 3D environments" to include methods that generate 3D scenes from sensory inputs (e.g., images) or from high-level specifications (e.g., "a chic apartment for two people"). Vision tasks that consume such data include automatic scene classification and segmentation, 3D reconstruction, human activity recognition, robotic visual navigation, and more.
Call for Papers
Call for papers: We invite extended abstracts for work on tasks related to 3D scene generation or tasks leveraging generated 3D scenes. Paper topics may include but are not limited to:
- Generative models for 3D scene synthesis
- Synthesis of 3D scenes from sensor inputs (e.g., images, videos, or scans)
- Representations for 3D scenes
- 3D scene understanding based on synthetic 3D scene data
- Completion of 3D scenes or objects in 3D scenes
- Learning from real-world data for improved models of virtual worlds
- Use of 3D scenes for simulation targeted to learning in computer vision, robotics, and cognitive science
Submission: We encourage submissions of up to 6 pages, excluding references and acknowledgements, in the CVPR format. Reviewing will be single-blind. Accepted extended abstracts will be made publicly available as non-archival reports, so they can still be submitted to archival conferences or journals later. We also welcome already-published papers within the scope of the workshop (without reformatting), including papers from the main CVPR conference. Please submit your paper by the deadline to: email@example.com. If your submission has already been accepted for publication, please say so in your email and name the conference.
Important Dates
| Event | Date |
|---|---|
| Paper Submission Deadline | May 17, 2019 |
| Notification to Authors | May 31, 2019 |
| Camera-Ready Deadline | June 7, 2019 |
| Workshop Date | June 17, 2019 |
Schedule
| Session | Time |
|---|---|
| Welcome and Introduction | 8:45am–9:00am |
| Invited Speaker Talk 1 | 9:00am–9:25am |
| Invited Speaker Talk 2 | 9:25am–9:50am |
| Spotlight Talks (x3) | 9:50am–10:10am |
| Coffee Break and Poster Session | 10:10am–11:10am |
| Invited Speaker Talk 3 | 11:10am–11:35am |
| Invited Speaker Talk 4 | 11:35am–12:00pm |
| Lunch Break | 12:00pm–1:30pm |
| Invited Speaker Talk 5 (Industry Talks) | 1:30pm–2:00pm |
| Invited Speaker Talk 6 | 2:00pm–2:25pm |
| Oral 1 | 2:25pm–2:45pm |
| Oral 2 | 2:45pm–3:05pm |
| Coffee Break and Poster Session | 3:05pm–4:00pm |
| Invited Speaker Talk 7 | 4:00pm–4:25pm |
| Invited Speaker Talk 8 | 4:25pm–4:50pm |
| Panel Discussion and Conclusion | 4:50pm–5:40pm |
Invited Speakers
Vladlen Koltun is a Senior Principal Researcher and the director of the Intelligent Systems Lab at Intel. The lab is devoted to high-impact basic research on intelligent systems. Previously, he was a Senior Research Scientist at Adobe Research and an Assistant Professor at Stanford, where his theoretical research was recognized with the National Science Foundation (NSF) CAREER Award (2006) and the Sloan Research Fellowship (2007).
Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin and a Research Scientist at Facebook AI Research (FAIR). Her research in computer vision and machine learning focuses on visual recognition and search. She received her Ph.D. from MIT before joining UT-Austin in 2007. She is an Alfred P. Sloan Research Fellow and a Microsoft Research New Faculty Fellow, and a recipient of NSF CAREER and ONR Young Investigator awards, the PAMI Young Researcher Award in 2013, the 2013 Computers and Thought Award from the International Joint Conference on Artificial Intelligence (IJCAI), the Presidential Early Career Award for Scientists and Engineers (PECASE) in 2013, and the Helmholtz Prize in 2017.
Marc Pollefeys is a full professor and head of the Institute for Visual Computing in the Department of Computer Science at ETH Zurich, which he joined in 2007. He leads the Computer Vision and Geometry lab. Previously, he was with the Department of Computer Science at the University of North Carolina at Chapel Hill, where he started as an assistant professor in 2002 and became an associate professor in 2005. Before that, he was a postdoctoral researcher at the Katholieke Universiteit Leuven in Belgium, where he also received his M.S. and Ph.D. degrees in 1994 and 1999, respectively. His main area of research is computer vision. One of his main research goals is to develop flexible approaches to capture visual representations of real-world objects, scenes, and events. Dr. Pollefeys has received several prizes for his research, including a Marr Prize, an NSF CAREER award, a Packard Fellowship, and an ERC Starting Grant. He is the author or co-author of more than 280 peer-reviewed papers.
Ellie Pavlick is an Assistant Professor of Computer Science at Brown University, and an academic partner with Google AI. She received her PhD in Computer Science from the University of Pennsylvania. She is interested in building better computational models of natural language semantics and pragmatics: how does language work, and how can we get computers to understand it the way humans do?
Daniel Aliaga does research primarily in 3D computer graphics, with overlaps in computer vision and visualization and strong multidisciplinary collaborations outside computer science. His research activities fall into three groups: (a) his pioneering work in the multidisciplinary area of inverse modeling and design; (b) his first-of-its-kind work in codifying information into images and surfaces; and (c) his work on a visual computing framework that includes high-quality 3D acquisition methods. Dr. Aliaga’s inverse modeling and design work focuses on digital city planning applications that provide innovative “what-if” design tools, enabling urban stakeholders from cities worldwide to automatically integrate, process, analyze, and visualize the complex interdependencies among urban form, function, and the natural environment.
Angela Dai is a postdoctoral researcher at the Technical University of Munich. She received her Ph.D. in Computer Science from Stanford University, advised by Pat Hanrahan. Her research focuses on 3D reconstruction and understanding with commodity sensors. She received her master's degree from Stanford University and her bachelor's degree from Princeton University. She is a recipient of a Stanford Graduate Fellowship.
Jiajun Wu is a fifth-year Ph.D. student at MIT, advised by Bill Freeman and Josh Tenenbaum. He received his undergraduate degree from Tsinghua University, working with Zhuowen Tu. He has also spent time at the research labs of Microsoft, Facebook, and Baidu. His research has been supported by fellowships from Facebook, Nvidia, Samsung, Baidu, and Adobe. He studies machine perception, reasoning, and their interaction with the physical world, drawing inspiration from human cognition.
The workshop also features presentations by representatives of the following companies:
Eloquent Labs, Simon Fraser University
Facebook AI Research, Simon Fraser University
References
[1] Fast and Flexible Indoor Scene Synthesis via Deep Convolutional Generative Models. CoRR, arXiv:1811.12463, 2018.
[2] GRAINS: Generative Recursive Autoencoders for INdoor Scenes. CoRR, arXiv:1807.09193, 2018.
[3] Gibson Env: Real-World Perception for Embodied Agents. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[4] VirtualHome: Simulating Household Activities via Programs.
[5] Embodied Question Answering.
[6] ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[7] SeeThrough: Finding Objects in Heavily Occluded Indoor Scene Images. International Conference on 3D Vision (3DV), 2018.
[8] Matterport3D: Learning from RGB-D Data in Indoor Environments. International Conference on 3D Vision (3DV), 2017.
[9] Joint 2D-3D-Semantic Data for Indoor Scene Understanding. arXiv:1702.01105, 2017.
[10] MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments.
[11] AI2-THOR: An Interactive 3D Environment for Visual AI. arXiv:1712.05474, 2017.
[12] Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[13] Semantic Scene Completion from a Single Depth Image. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[14] ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[15] CARLA: An Open Urban Driving Simulator. Proceedings of the 1st Annual Conference on Robot Learning, pp. 1–16, 2017.
[16] SceneNN: A Scene Meshes Dataset with aNNotations. International Conference on 3D Vision (3DV), 2016.