AN INFORMATION THEORETIC APPROACH TO JOINT PROBABILISTIC FACE DETECTION AND TRACKING
Department of Informatics, University of Thessaloniki
E-mail: {eloutas,nikou,pitas}@zeus.csd.auth.gr
A joint probabilistic face detection and tracking algorithm combining a likelihood estimation and a prior probability is proposed in this paper. Face tracking is achieved by a Bayesian framework. The likelihood estimation scheme is based on statistical training of sets of automatically generated feature points, while the prior probability estimation is based on the fusion of an information theoretic tracking cue and a Gaussian temporal model. The likelihood estimation process is the core of a multiple face detection scheme used to initialize the tracking process. The resulting system was tested on real image sequences and is robust to significant partial occlusion and illumination changes.

This study has been partially supported by the Commission of the European Communities, in the framework of the project IST-1999 20993 CARROUSO (Creating, Assessing and Rendering of High Quality Audio-Visual Environments in MPEG-4 context).

1. INTRODUCTION

Automatic detection and tracking of human parts is a challenging research topic with applications in many domains such as human computer interaction and surveillance, face recognition and human joint audio and video localization systems.

In that framework, Bayesian approaches express the posterior probability of the motion parameters in terms of a prior probability and a likelihood function [1]. The prior probability is representative of the tracked object's previous history and the likelihood is representative of the similarity to an appearance based model learnt through statistical training. Bayesian approaches are considered an effective way of updating prior information by forwarding the posterior probability and using it as the prior in the next stage of the process. They also allow the fusion of different tracking cues in order to provide a joint tracking output.

The main characteristics of existing work are the use of an image model learned through statistical training and the fusion of different tracking cues. An appearance model consisting of a stable component, a transient component and an outlier process is proposed in [2]. Object tracking is performed using color, texture, and edge information in [3], while edge and ridge information is used in [4]. Grayscale and motion model information are combined in [5] to perform tracking of 3D articulated figures.

Head orientation is calculated by using either feature based methods [6, 7] or appearance based methods [8, 9]. The latter rely on using training sets of face images under varying pose, while the feature based methods do not require statistical training. Appearance based methods are particularly interesting as they can be combined in a probabilistic framework to obtain a single percep-

The Bayesian face tracking scheme proposed in this paper relies on an appearance based model of automatically generated feature point sets for constructing the likelihood function and a mutual information tracking cue for constructing the prior probability. Our approach introduces the use of mutual information as a separate cue in a Bayesian face tracking framework. Also, the probability of face observation is constrained using a temporal model based on the automatically generated feature point sets. Head orientation calculation is performed using a mutual information based scheme. The proposed approach does not require training for head orientation estimation and has shown good results in determining pose under facial appearance changes and illumination changes.

The tracking algorithm is initialized using a likelihood function estimation framework and is interpreted as a probabilistic face detector. An arbitration scheme is also used to obtain a multiple face detection scheme.

The main contributions of the current work are the use of a novel probabilistic model based on automatically generated feature point sets in an object tracking scheme, the introduction of mutual information as a separate cue in a Bayesian framework and the head orientation calculation method using mutual information.

The proposed tracking scheme was tested on real image sequences. The tracker performs well in partial occlusion and illumination change situations as it combines the robustness of mutual information systems to illumination changes with the robustness of appearance based face detection systems to partial occlusion.

2. LIKELIHOOD ESTIMATION

The acquisition of the likelihood estimates is an important part of a Bayesian tracking framework. Moreover, it can be used in order to construct a face detection scheme. The face detection scheme is used as a tracking initialization procedure and is applied at the beginning of the tracking process or in the case of tracking failure.
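The Bayesian recursion that frames the tracker (the posterior is proportional to the likelihood times the prior, and is forwarded as the prior of the next stage) can be illustrated with a minimal sketch. It assumes a small discrete set of motion hypotheses and made-up likelihood values; the function names `bayes_update` and `track` are hypothetical, and the sketch does not include the mutual information or temporal model terms that the paper uses to build its prior.

```python
def bayes_update(prior, likelihood):
    """Return the normalized posterior over a discrete set of hypotheses."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("likelihood and prior have no common support")
    return [u / total for u in unnormalized]


def track(prior, likelihoods_per_frame):
    """Forward the posterior of each frame as the prior of the next one."""
    belief = list(prior)
    for likelihood in likelihoods_per_frame:
        belief = bayes_update(belief, likelihood)
    return belief


if __name__ == "__main__":
    # Two hypotheses, uniform initial prior, two frames favouring the first one.
    print(track([0.5, 0.5], [[0.8, 0.2], [0.8, 0.2]]))
```

Each frame sharpens the belief in the hypothesis that the observations keep supporting, which is the behaviour the introduction attributes to forwarding the posterior.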
Likelihood is learnt through training of automatically generated feature points. Each image of the training set is described by a set of automatically generated feature points [10, 11]. The feature points represent image corners and are characterized by large gradient variations in both horizontal and vertical directions. The approach is presented in [12] as an edge detection algorithm.

2.1. Face feature generation and training

The feature set is generated using a matrix that is constructed for every candidate feature point from the image gradients. Features having two large eigenvalues of their matrix are selected, and the inter feature distance must not exceed a predefined threshold (feature neighborhood threshold).

The feature set is assumed to be comprised of a given number of feature points. Most of them represent corners generated by the intersection of the object contours, or corners of the local intensity pattern not corresponding to obvious scene features [12]. In the case of faces, the feature set is expected to lie on face areas containing intensity variations such as the face contour, the eyes area and the nose area.

The training procedure involves the feature set generation from a number of training images. The “ORL Database of Faces” [13], containing a total number of 400 images of 40 different persons, was used for training. The number of features is much less than the total number of image pixels.

2.2. Face observation probability estimation

The estimation of the first cue, the face observation probability, is accomplished by calculating the likelihood of the input pattern in the “feature point set space” given the face class. The multiscale extension of the face detection procedure used in [14] is adopted. Using the results obtained there, the likelihood involves a term estimated from the M principal components. The feature points should be generated using the previously described algorithm. An estimate of the face position and scale is thus obtained. The face observation probability is generally normalized with respect to its maximum value. The normalized probability is compared to a predefined threshold in order to perform facial region assignment.

2.3. Tracking algorithm initialization

The face tracking algorithm initialization procedure is based on the estimation of the facial observation probability. The facial observation probability calculation process is extended to handle multiple faces. Candidate facial regions are all those for which the normalized face observation probability exceeds a predefined threshold. In order to eliminate false facial region candidates, an arbitration scheme similar to that presented in [15] is implemented. The steps of the multiple face initialization are:

– Calculate the facial observation probabilities over the whole image.
– Reject all the candidate regions whose normalized facial observation probability is below a predefined threshold. Mark these candidate regions as non face regions.
– Repeat
  – Mark as a face the unmarked image region assigned to the maximum facial observation probability.
  – Perform the arbitration scheme:
      Reject any candidate facial region whose center lies within a previously defined facial region.
      Reject any candidate facial region overlapping with a previously defined facial region.
      Reject any candidate facial region when the number of less probable candidate facial regions within it is less than a predefined threshold.
  until all candidate regions are marked as face or non face.

3. PRIOR PROBABILITY ESTIMATION

The prior probability is representative of the previous knowledge acquired through the tracking process. The estimation of the prior is based on a mutual information tracking cue and a temporal model.

3.1. Mutual information cue

The tracking process can be modeled as a communication between
mum number of grayscale levels). Mutual information is a measure of the amount of information transmitted through the communication channel
their marginal probability mass functions
are the reference and target images respectively,
represent scale and rotation respectively.
The mutual information of two random variables
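The mutual information cue of Section 3.1 can be illustrated with a minimal sketch that estimates the mutual information between a reference patch and a target patch from their joint gray-level histogram. The function name `mutual_information`, the use of flat integer sequences for patches, the histogram-based estimate of the probability mass functions and the base-2 logarithm are all assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter


def mutual_information(reference, target):
    """Mutual information (in bits) between two equally sized grayscale patches.

    Both patches are flat sequences of integer gray levels. The joint
    probability mass function is estimated from co-located pixel pairs and
    the marginals from each patch separately.
    """
    if len(reference) != len(target) or len(reference) == 0:
        raise ValueError("patches must be non-empty and of equal size")
    n = len(reference)
    joint = Counter(zip(reference, target))
    ref_counts = Counter(reference)
    tgt_counts = Counter(target)
    mi = 0.0
    for (u, v), count in joint.items():
        p_uv = count / n
        p_u = ref_counts[u] / n
        p_v = tgt_counts[v] / n
        mi += p_uv * math.log2(p_uv / (p_u * p_v))
    return mi


if __name__ == "__main__":
    reference = [0, 0, 1, 1]
    remapped = [5, 5, 9, 9]    # same structure, different gray levels
    unrelated = [0, 1, 0, 1]   # statistically independent of the reference
    print(mutual_information(reference, remapped))
    print(mutual_information(reference, unrelated))
```

In this toy case a one-to-one remapping of gray levels leaves the mutual information unchanged, which is consistent with the robustness to illumination changes that the paper attributes to the cue. A tracker would evaluate such a score over candidate translations, scales and rotations of the target region.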
The maximum mutual information for a particular prior
is a normalizing factor, while the term
As can be observed, the prior is constructed from the mutual information contribution
Let the prior probability based on the mutual information tracking
indicates a strong match between the reference and the target regions, while a small value of

3.2. Temporal model

The temporal model part of the prior describes the probability of a face to appear given its location at the previous time instant. The temporal model is used as a constraint factor in the tracking process.

In order to model the facial position variation, the feature point sets generated on the reference and target regions are used. The overall facial position variation is also modeled as a Gaussian distribution.

Prior probabilities are not informative if the prior pdf has a larger variance than the likelihood function. Therefore, too small values of the model parameter will render the temporal model non informative and thus unimportant to the tracking process.

4. FACE TRACKING

In order to track the detected faces to the next frame the observation
feature points and their rotation and scaling parameters at time instant

In order to obtain the full estimate of the head orientation, a coarse estimate is obtained at first by finding the translation vector. The estimate is then refined by calculating the scale factor and the rotation angle and the final estimate is obtained. Better results may be obtained by adopting a recursive refining process.

5. EXPERIMENTAL RESULTS

The proposed algorithm was tested on a variety of real face image sequences under different lighting and occlusion conditions. Results on a single face sequence without lighting changes or partial occlusion are presented in Figure 2. As can be observed, the face position and orientation are correctly determined. Tracking results on a similar sequence with lighting changes are presented in Figure 3. A slight drift in the estimated facial position is noticed in very dark image sequences when the tracking process runs for a long time. Results on multiple face image sequences suffering from lighting changes and partial occlusion are presented in Figures 4 and 5 respectively. Facial position is correctly determined in the multiple face case even under severe partial occlusion and illumination changes. In general, the face tracking algorithm proposed in this paper can effectively track multiple faces under significant illumination changes and partial occlusion.

A Bayesian face tracking scheme was presented in this paper. Likelihood estimation is performed using sets of automatically generated feature points, while the prior probability estimation is based on a mutual information tracking cue and a Gaussian temporal model.

The main contributions of the proposed scheme are the introduction of a novel appearance based model for likelihood estimation and the use of a mutual information tracking cue in order to estimate the prior combined with a Gaussian temporal model.

Moreover, the application of an arbitration scheme to face tracking initialization is also important since it allows multiple face tracking.

The proposed algorithm was tested on real face sequences. Results have shown that the facial position is correctly determined even in image sequences presenting important illumination changes and partial occlusion. The face orientation was correctly determined under normal illumination conditions and slight illumination changes. Robustness to illumination changes is obtained by using the mutual information tracking cue, while robustness to partial occlusion is obtained by the use of the appearance based model.
Fig. 1. (a) Feature point set of 100 feature points. Feature neighborhood threshold=5. (b) Feature point set of 100 feature points. Feature neighborhood threshold=3. (c) Feature point set of 300 feature points. Feature neighborhood threshold=3.

Fig. 2. Tracking results under normal lighting conditions.

Fig. 3. Tracking results under different illumination conditions.

Fig. 4. Tracking results in a face image sequence containing two faces under varying lighting conditions.

Fig. 5. Tracking results in a face image sequence containing two faces under varying lighting conditions and partial occlusion.

[1] J. Ruanaidh and W. Fitzgerald, Numerical Bayesian methods applied to signal processing, Springer-Verlag, 1996.
[2] A. Jepson, D. Fleet, and T. Maraghi, “Robust online appearance models for visual tracking,” in Proc. of 2001 Int. Conf. on Computer Vision and Pattern Recognition, 2001, vol. I, pp. 415–422.
[3] C. Rasmussen and G. D. Hager, “Probabilistic data association methods for tracking complex visual objects,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 560–576, 2001.
[4] H. Sidenbladh and M. Black, “Learning image statistics for ...,” in IEEE International Conference on Computer Vision (ICCV), Vancouver, Canada, 2001, vol. 2, pp. 709–716.
[5] H. Sidenbladh, F. De la Torre, and M. Black, “A framework for modeling the appearance of 3d articulated figures,” in IEEE International Conference on Automatic Face and Gesture Recognition (FG), Grenoble, France, 2000, pp. 368–375.
[6] T. Jebara and A. Pentland, “Parametrized structure from motion for 3d adaptive feedback tracking of faces,” in Proceedings of the International Conference on Computer Vision and Pattern Recognition, 1997, pp. 144–150.
[7] A. Nikolaidis and I. Pitas, “Facial feature extraction and pose determination,” Pattern Recognition, Elsevier, vol. 33, no. 11, pp. 1783–1791, 2000.
[8] T. Darrell, B. Moghaddam, and A. Pentland, “Active face tracking and pose estimation in an interactive room,” in Proceedings of the International Conference on Computer Vision and Pattern Recognition, 1996, pp. 67–72.
[9] ... illumination-insensitive head orientation estimation,” IEEE International Conference on Automatic Face and Gesture Recognition (FG), Grenoble, France, 2000, pp. 183–188.
[10] C. Tomasi and T. Kanade, Shape and Motion from Image Streams: a Factorization Method - Part 3 Detection and Tracking of Point Features, 1991.
[11] K. Rohr, Landmark-based image analysis, Kluwer Academic ...
[12] E. Trucco and A. Verri, Introductory techniques for 3-D Computer Vision, Prentice Hall, 1998.
[13] F. Samaria and A. Harter, “Parameterisation of a stochastic model for human face identification,” in Proceedings of 2nd IEEE Workshop on Applications of Computer Vision, Sarasota FL, 1994, pp. 138–142.
[14] B. Moghaddam and A. Pentland, “Probabilistic visual learning for object representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 696–710, 1997.
[15] H. Rowley, S. Baluja, and T. Kanade, “Neural network-based face detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 23–37, 1998.
[16] S. Haykin, Communication Systems, 3rd ed., J. Wiley, 1994.
[17] M. Skouson, Q. Guo, and Z. Liang, “A bound on mutual information for image registration,” IEEE Transactions on Medical Imaging, vol. 20, no. 8, pp. 843–846, 2001.