Auditorily, all speech is made up of sounds describable in terms of quality, pitch, loudness and length. 
All markers in speech thus depend on these variables for their phonetic realization, and the discussion 
that follows is an attempt to explain the phonetic basis of different types of speaker-characteristics.
<p>

There are three different facets of vocal performance to be considered. Each of these facets is subject 
to a different time-perspective. Firstly, there is the facet of vocal performance that represents the 
speaker's permanent or quasi-permanent voice, by which he is recognizable even when his consonants 
and vowels are unintelligible, for example, when heard speaking on the other side of a closed door. The 
other two facets are tone of voice and the phonetic realizations of linguistic units. The time-perspective of 
tone of voice is usually medium-term, and that of linguistic articulations very short-term.

<p>
Because voice features are by definition long-term, they lie quite outside any possibility of signalling 
linguistic meaning, so it is appropriate to refer to such voice features as <I>extralinguistic</I>. Since they are 
not normally consciously manipulated by the speaker, voice features are informative but not 
communicative. The medium-term features that make up tone of voice, and which have the function of 
signalling affective information, have a rather closer resemblance in some ways to the short-term use 
of the vocal apparatus for signalling linguistic meaning, and such features are therefore often referred 
to as <I>paralinguistic</I>. They are <b>para</b>linguistic in the sense that they form a communicative code subject to 
cultural convention for its interpretation; paralinguistic features are not fully linguistic in the sense 
that they lack the possibility of signalling meaning through sequential arrangement into structures, 
which is a criterial property of linguistic communication. 
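<p>
The three-way distinction drawn above can be summarized as a small data structure. The following is only an illustrative sketch: the class name, field names and the helper function are hypothetical labels introduced here, and the values simply restate the properties attributed to each feature type in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocalFeatureType:
    """One of the three types of vocal feature (field names are illustrative)."""
    name: str
    time_perspective: str          # typical time-span of the feature
    communicative: bool            # part of a conventional communicative code?
    sequentially_structured: bool  # can signal meaning through sequential structure?


# Values restate the text: voice features are informative but not communicative;
# tone of voice is communicative but lacks sequential structure.
FEATURE_TYPES = (
    VocalFeatureType("extralinguistic", "long-term", False, False),
    VocalFeatureType("paralinguistic", "medium-term", True, False),
    VocalFeatureType("linguistic", "short-term", True, True),
)


def is_fully_linguistic(feature: VocalFeatureType) -> bool:
    """Apply the criterial property named in the text: communicative and structured."""
    return feature.communicative and feature.sequentially_structured


if __name__ == "__main__":
    for feature in FEATURE_TYPES:
        print(feature.name, feature.time_perspective, is_fully_linguistic(feature))
```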
<p>
<b>Neither extralinguistic nor paralinguistic features are irrelevant to directly linguistic interests, 
since they constitute a background against which the linguistic articulations can achieve their 
perceptual prominence. Strictly, each of the three types of vocal feature, extralinguistic, 
paralinguistic and linguistic, acts as a perceptual ground for the figures of the other two types of 
feature. </b>
<p>
Each of these categories of vocal behaviour will now be discussed in more phonetic detail. A summary 
of the relationship between these vocal variables and their marking functions is given in table 1. <p>

<b>2.1. Extralinguistic voice features </b>

Long-term speaker-characterizing voice features are of two different sorts. One type of voice feature 
arises from anatomical differences between speakers. The second type is the product of the way in 
which the individual speaker habitually 'sets' his vocal apparatus for speaking. Unlike this second type, 
which will be discussed in a moment, the first type of feature is by definition outside any possibility of 
control by the speaker. It includes anatomical influences on aspects of voice quality and of voice 
dynamics.
<p>
Anatomical influences on voice quality are due to factors such as basic vocal tract length, dimensions 
of lips, tongue, nasal cavity, pharynx and jaw, dental characteristics, and geometry of laryngeal 
structures (Abercrombie 1967: 92). These anatomical factors impose limits on the range of spectral 
effects that the speaker can potentially control acoustically, in terms both of formant frequency and 
amplitude ranges and of the distribution of aperiodic noise through the spectrum.
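<p>
How vocal tract length constrains formant frequencies can be illustrated with the standard textbook idealization of the neutral vocal tract as a uniform tube, closed at the glottis and open at the lips, whose resonances fall at odd multiples of c / 4L. The sketch below is not drawn from the studies cited here. The function name, the assumed speed of sound (35,000 cm/s) and the two tract lengths are illustrative assumptions, and real vocal tracts are far from uniform tubes.

```python
# Assumed speed of sound in warm, humid air; an approximation, not a measured value.
SPEED_OF_SOUND_CM_PER_S = 35_000.0


def neutral_tube_formants(tract_length_cm, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Quarter-wavelength model: F_n = (2n - 1) * c / (4 * L).
    """
    if tract_length_cm <= 0:
        raise ValueError("tract length must be positive")
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * tract_length_cm)
        for n in range(1, count + 1)
    ]


if __name__ == "__main__":
    # Illustrative lengths only: a shorter and a longer vocal tract.
    for length_cm in (14.0, 17.5):
        print(length_cm, neutral_tube_formants(length_cm))
```

Under these assumptions a 17.5 cm tube resonates at 500, 1500 and 2500 Hz, and a 14 cm tube at 625, 1875 and 3125 Hz: a longer tract lowers the whole formant range.
<p>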

Anatomical influences on voice dynamics are due to factors such as the dimensions and mass of the 
vocal folds, and respiratory volume. These influence pitch and loudness ranges, by imposing limits on 
the ranges of fundamental frequency and amplitude that the speaker can produce.
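<p>
The dependence of fundamental frequency on vocal fold dimensions and mass can be sketched with the ideal vibrating-string formula, F0 = (1 / 2L) * sqrt(T / mu), where L is the vibrating length, T the tension and mu the mass per unit length. This is a deliberately crude idealization, not a model taken from the sources cited here. The function name and all parameter values are hypothetical and were chosen only to show the direction of the effect.

```python
import math


def string_model_f0(length_m, tension_n, mass_per_length_kg_m):
    """Fundamental frequency (Hz) of an ideal string: (1 / 2L) * sqrt(T / mu)."""
    if length_m <= 0 or tension_n <= 0 or mass_per_length_kg_m <= 0:
        raise ValueError("all parameters must be positive")
    return math.sqrt(tension_n / mass_per_length_kg_m) / (2.0 * length_m)


if __name__ == "__main__":
    # Illustrative values only, not anatomical measurements.
    reference = string_model_f0(0.016, 0.1, 0.005)
    longer = string_model_f0(0.032, 0.1, 0.005)   # doubled length
    heavier = string_model_f0(0.016, 0.1, 0.020)  # quadrupled mass per unit length
    print(reference, longer, heavier)
```

In this idealization, doubling the vibrating length or quadrupling the mass per unit length each halves the fundamental frequency, which is the sense in which longer, more massive vocal folds go with a lower pitch range.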
<p>
Listeners' judgments of physical attributes, based on the product of such anatomically derived 
features, are amongst the most accurate conclusions drawn from the voice. This is precisely because they are based 
on invariant, involuntary aspects of a speaker's vocal performance. Physique, age and sex are all judged 
with fair accuracy, and information about a speaker's medical condition is also sometimes 
accurately inferred.
<p>
Physique and height are probably judged accurately because of the good correlation that seems to exist 
between these factors and the dimensions of the speaker's vocal apparatus. A tall, well-built man will 
tend to have a long vocal tract and large vocal folds. His voice quality will reflect the length of his vocal 
tract by having correspondingly low ranges of formant frequencies, and his voice dynamic features will 
indicate the dimensions and mass of his vocal folds by a correspondingly low range of fundamental 
frequency. His large respiratory volume will be reflected in a powerful loudness range. If we then hear 
such a voice over the telephone, we normally have a confident expectation that the speaker will turn 
out to be a large, strong male. In general, our expectations are fulfilled, within a reasonable margin of 
error. Bonaventura (1935) gave subjects pictures and voices to match, and found that fair accuracy was 
achieved: in terms of Kretschmerian body-types (Kretschmer 1925), judgments of <I>pyknic</I> types were 
most accurate, accuracy was less for <I>leptosome</I> types, and least for <I>athletic</I> types. Moses (1940, 1941) 
gives general support to this, and Fay & Middleton (1940a) report a more detailed finding: they found 
that in judging body-types from voices transmitted over a public address system, the results were 22 
per cent above chance for pyknic types, 20 per cent for leptosomes, but only 1 per cent above chance for 
athletic types. Lass, Beverly, Nicosia & Simpson (1978) report that listeners typically judge weight to 
within 3-4 lbs (though overestimating the weight of males and underestimating that of females), and 
that they judge height to within 1.5 inches (though underestimating the height of both males and 
females). There is one class of voices where the general correlation does not apply, but where listeners 
nevertheless seem to be able to reach successful conclusions about the physical attributes. That is 
where the formant ranges of the voice are radically discrepant with the fundamental frequency, as in 
particular types of dwarfism (Vuorenkoski, Tjernlund & Perheentupa 1972; Weinberg & Zlatin 1970). In 
these cases, the dimensions of the vocal folds are smaller than their general correlation with vocal 
tract length would lead one to expect.

<p>
Exceptions to the general rule of our ability as listeners to attach a particular size and physique to a 
given voice are sufficiently rare to take us aback when they occur.


<p>Age is judged accurately (Dordain, Chevrie-Muller & Gr&#233;my 1967; Hollien & Shipp 1972; Mysak 
1959; Ptacek, Sander, Maloney & Roe Jackson 1966; Shipp & Hollien 1969). Voice quality features 
probably play their part in marking this characteristic, but voice dynamic features are likely to be the 
more important cues. Age is marked by pitch in both males and females: Hollien & Shipp (1972) show a 
progressive lowering of mean pitch with age for males from 20 up to 40, then a rise from age 60 
through the 80s. Mysak (1959) also showed this rise in mean pitch from the 50s upwards. Dordain et 
al. (1967) report a drop in mean pitch for older women, but a rise with extreme age. Ptacek et al. 
(1966) also report a reduced pitch range with extreme age.

<p>
Features of auditory quality can signal aspects of the age of a speaker. These include the quality 
associated with the 'breaking' voice of puberty, and the quality of extreme old age. Vocal indications of 
puberty, referred to in clinical literature as 'vocal mutation', often include whispery voice. Luchsinger 
& Arnold (1965: 132) write that 'In addition to the lowering of the average speaking pitch, the voice is 
frequently husky during mutation, or it may sound weak.' The senescent voice of extreme old age 
derives from a complex of endocrinal, anatomical and physiological changes. The mucal fluid supply 
often becomes disturbed, either greatly increasing or decreasing, tissues become increasingly less 
elastic, and cartilages become calcified and ossified (Fyfe & Naylor 1958; Luchsinger & Arnold 1965; 
Meader & Muyskens 1962; Terracol & Azemar 1949). Meader & Muyskens (1962: 77) comment 
that 'Since the rigidity of tissue is one determination of its resonating qualities, the gradual deposition 
of lime in ... cartilages (replacing them by bone) helps to explain the shrill voice and thin voice 
(deficient in harmonics) of age.' Because muscles atrophy, the glottis of old speakers often has a bowed 
appearance (Luchsinger & Arnold 1965: 136; Tarneaud 1941); this means that, to achieve phonation, 
greater effort has to be exerted to bring the vocal folds together, and a rather harsh voice is often the 
result. When this is combined with inefficient phonation because of an excess of mucus, the type of 
voice that results is a harsh whispery voice, as suggested by the following comment from Luchsinger & 
Arnold (1965: 136): 'Tracheal and laryngeal mucous secretions are increased, sometimes on an allergic 
basis. Together with a tendency to chronic bronchitis, this over-secretion of mucus produces the 
hacking, coughing, throat-clearing, or "moist" hoarseness of the old man.' In old age, fatty tissue can 
build up in the ventricles in the sides of the upper larynx (Ferreri 1959), and the ventricular folds 
above the ventricles can shrink towards the sides of the larynx, giving a wider entrance to the 
ventricles (Luchsinger & Arnold 1965: 136). All these factors can contribute significantly to the fine 
detail of the auditory quality of the phonation being produced.  [...]
<p>