
Grounding Object Recognition and Scene Understanding (MIT, Fall 2011)

Computer Vision Central - Posted on December 22, 2011 at 3:29 pm.

  • Links: http://people.csail.mit.edu/torralba/courses/6.870_2011f/6.870.grounding.html
  • Details: This class will cover current approaches to object recognition and scene understanding in computer vision and their relation to other disciplines. The goal of this class is to provide an in-depth presentation of computer vision techniques for recognizing objects, scenes, materials, actions, and more, placed in the framework of concrete tasks.

    The class is addressed to students from any discipline, not just vision, who are interested in learning about computer vision techniques that can be applied to their research. We will cover state-of-the-art object recognition and scene understanding techniques and how they relate to robotics, language, computer graphics, crowdsourcing, human-computer interaction, etc. For students in computer vision, this class will allow exploring new tasks and scene representations, beyond labeling objects in images for its own sake.

    The course will cover bag-of-words models, part-based models, classifier-based models, multiclass object recognition and transfer learning, concurrent recognition and segmentation, context models for object recognition, representations for scene understanding, and large datasets for semi-supervised and unsupervised discovery of object and scene categories. We will read a mixture of papers from computer vision and influential works from cognitive psychology and other disciplines.


    Date | Topic | Lecture | Invited speaker | Links to Papers/code

    Sept. 7 | Class goals and a short introduction | Antonio

    Lecture1 (ppt)

    - P. Cavanagh, Vision is getting easier every day, Perception 1996

    Sept. 14 | Edges, textures, ... | Antonio | Lecture2 (ppt)

    Sept. 21 | The importance of data | Antonio

    Boris Katz

    Carl Vondrick

    Lecture3 (ppt)
    Boris (ppt)
    Carl (ppt)

    - LabelMe (website, paper.pdf)
    - Watson (paper.pdf)
    - START (system website, paper.pdf)
    - Video annotation

    Sept. 28 | Object recognition | Antonio

    Seth Teller

    David Hayden

    Lecture4 (ppt)

    - Felzenszwalb, McAllester and Ramanan. A Discriminatively Trained, Multiscale, Deformable Part Model. CVPR 2008. (code)
    - Manipulation (paper.pdf)
    - Natural language commands (paper.pdf)

    Oct. 5 | Object recognition in context | Antonio

    Nicholas Roy

    Ryan Schoen

    Lecture5 (ppt)


    Oct. 12 | Human vision | Antonio

    Aude Oliva

    Deborah Hanus

    Lecture6 (ppt)

    - A. Oliva, A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. IJCV 2001. (gist code)
    - S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. CVPR 2006. (code)

    Oct. 19 | Words and pictures | Antonio

    Regina Barzilay

    Yevgeni Berzak

    Lecture7 (ppt)

    Gestural Cohesion for Discourse Segmentation
    Jacob Eisenstein, Regina Barzilay, Randall Davis
    Proceedings of ACL, 2008

    Modeling Gesture Salience as a Hidden Variable for Coreference Resolution and Keyframe Extraction
    Jacob Eisenstein, Regina Barzilay, Randall Davis
    Journal of Artificial Intelligence Research, 2008

    Turning Lectures into Comic Books with Linguistically Salient Gestures
    Jacob Eisenstein, Regina Barzilay, Randall Davis
    Proceedings of AAAI, 2007

    Oct. 26 | Multiclass models and transfer learning | Antonio

    Daniela Rus

    Sudeep Pillai

    Lecture8 (ppt)

    Nov. 2 | No class

    Nov. 9 | No class | ICCV

    Nov. 16 | Vision and the brain | Antonio

    Jim Di Carlo

    Ha Hong

    Lecture9 (ppt)

    Jim Di Carlo's papers

    Nov. 23 | HCI | Antonio | Students:
    Mike Fleder
    Jeremy Scott
    Yafim Landa 
    Lecture10 (ppt)

    Nov. 30 | 3D scenes | Antonio | Students:
    Emily Zhao
    Xiaodan Jia
    Lecture11 (ppt)

    Dec. 7 | Project presentations | Antonio

    Dec. 14 | Projects | Antonio | Last day of classes


    Related courses:



    Other resources:



    Here are links to useful code for low-level and mid-level vision tasks:

    Other useful code:
