CSc 645 |
Advanced Topics in Algorithm Analysis (Spring 09)
|
| Instructor | Kobus Barnard |
| Office Hours | TBA, by automated sign up. |
| Time and Place | TR 3:30-4:45, Gould-Simpson 906 |
| Description |
This course will explore current research relevant to using parametrized
statistical models to extract meaning from complex data such as image data
of objects, organs, and organisms. Such models can effectively encode
important aspects of many problems while capturing variance across instances
and measurement processes, thereby explaining observed data. Importantly,
models can be fit to data using statistical inference in the Bayesian
framework, and further, models for a category can be learned from groups of
instances of the category. Learned category models can be fit to particular
instances of the data to characterize them and/or identify the category that
the instance belongs to.
As a concrete example, consider building a statistical geometric model for the form of a biological entity such as a particular plant species at a given development stage. The general form of such a model can be specified with the help of domain scientists, and the details of such a model can be inferred from images of plants from the species under study. Models learned in this way can be fit to image data of new instances. If the images are of specimens grown under a particular environmental condition, the process can be be used to gather quantitative information about the form of the plant under that condition. Thus one could, for example, extract quantitative data about the effect of drought on the geometric structure of that plant, and link that information to other data such as genetic data and crop yield. The course is an advanced, research focused seminar course targeted at students who want to integrate this methodology into their research. The material will have a strong emphasis on the key technical issue of efficiently inferring good values for model parameters from real data. This generally becomes an optimization problem that can be addressed by advanced sampling and other methods. A significant fraction of the course time will be spent studying literature that suggests strategies for optimizing the kinds of functions that arise in this problem domain. |
| Topics (UNDER CONSTRUCTION) | Modeling and representation: Graphical model notation, Bayesian networks, Bayesian statistics, stochastic grammar models, and image formation. Inference and optimization: Modern approaches to optimizing resulting posterior functions such as MCMC methods (e.g., Metropolis Hastings, reversible jump) and methods based on statistical physics such as Langevin sampling and accelerated molecular dynamics, with a focus on how our understanding of the construction of these functions can be exploited to make these methods effective. |
| Frequently asked question | What is the relation of this topic to machine learning and the 2007 six hundred level course on the fundamentals of machine learning? This course can be considered a special topic in machine learning, and the area can be considered a subset of machine learning, broadly defined. The approach studied in this course is an increasingly popular way to set up learning from data in order to predict future events or classify new examples. However, there are many other useful techniques that are included in a more general machine learning syllabus. For example, discriminative methods can be very successful at classification without a good underlying model. However, in this course, we will focus on applications where models with good representation in the problem domain are arguably worth pursuing. |
| Prerequisites |
There are no formal prerequisites for this course other than the normal
qualifications required to take graduate level computer science courses.
However, the material is very mathematical, and students must
be prepared to struggle with it |
| Required Text |
There are no required texts. All readings will be made available on-line.
Students interested in a general text that puts the approach in perspective might consider getting a copy of "Pattern Recognition and Machine Learning", by Christopher Bishop. This course overlaps with perhaps one third of this book, and it is an excellent reference for the parts of basic machine learning that we will not deal with here. |
| Format |
This course is very research focused. Students will be expected to read a
number of current papers, present and lead the discussion for several of
them, do a substantive project, write a paper about it, and take part in
reviewing the work of others. Research oriented projects will be strongly
encouraged. Group projects will be possible. Short presentations at several
project stages will be required from each project group.
Before the day when specific papers will be presented, students will hand in a short document in response to the reading. This can include thoughts, summaries, comments, notes, worked examples, programming examples, and questions for discussion. The exact nature of the feedback is flexible, and should be chosen by the student to serve two purposes: 1) increase ones understanding of the material; and 2) prepare for discussing the topic. The canonical format of the class is as follows: First there may be a short, random mini-lecture by the instructor. This will cover random topics related to writing the project paper, presenting topics, and doing research in this and other areas. This part of the course is experimental, and will be discontinued if it is not working out. Second, the presenter for that day will provide a short (10-15 minute) lecture on the material. The goal here is to present the high-level points so that everyone is on the same page, and ensure that the discussion that follows is not on what is readily available in the text, or easily clarified. The presentation may contain figures and formulas for reference during the discussion that are not discussed in detail. Questions from the audience to clarify basic issues are encouraged. However, the presenter may defer the question until the discussion period. The final part of the class will be open discussion on the topic lead by the presenter. It is critical that everyone endeavor to take part, and this will be reflected by the participation grade. The instructor and the presenter will have access to the substantive responses and will use them as needed to increase the quality of the discussion. (The first few presentations will be handled by the instructor, and will not necessarily conform to the above format.) |
| Grading (under construction) |
Grading will be based on performance in the following areas.
A cumulative percentage of 90% guarantees an A, 80% guarantees a B, 70% a C, and 60% a D.
|
| Policies |
Good attendance is required. If you cannot make class due to travel or sickness,
please let the instructor know, as missed classes can count against the
participation grade. The degree of impact will be significant if there are more
than three unexplained missed classes.
Students will be expected to respect the University of Arizona's academic integrity policy.
|