Aquatic ape theory and speech origins: a hypothesis


Marc J. M. Verhaegen


Speculations in Science and Technology 11, 165-171 (1988)


Received: March 1987


Abstract - The question of speech origins is discussed in the light of the theory that humans had semi-aquatic hominid ancestors. Diving requires a special anatomy of the airway entrances and a very refined control of breathing. The brain structures that "voluntarily" controlled the airway entrances’ closure and breathing could also be used for elaborating the older (early hominid, perhaps gibbon-like) sound production. Later, the evolution of association areas in the brain greatly enhanced human ability for attaching a particular meaning to a conventional sound combination.



The aquatic ape theory (AAT) of Sir Alister Hardy (1) states that a few million years ago human ancestors spent a considerable part of their day swimming and diving in a river, lake or sea, and, at least partially, consumed aquatic food. The AAT is supported by the presence of our thick subcutaneous fat layers, by our lack of body hair and by several other features that are absent in non-human primates, but widespread among aquatic mammals (1-13).

The ability to speak is a uniquely human characteristic. Innumerable attempts to explain it have been made but the question of how language emerged is not yet solved. Recently, it has been suggested that the origin of speech was facilitated by our aquatic past (5,14). All aquatic mammals "voluntarily" control their breathing. When surfaced they open the airway passage whenever they want to inhale air, and they can hyperventilate and then close the airway passage when they intend to dive. The subtle "voluntary" control of breathing and airway closure in mammals in general is a pre-adaptation for speech (15,16).

All of this is very speculative but interesting enough to elaborate the hypothesis of the aquatic origin of speech and to propose a possible scenario for speech emergence. Schematically, I discern four, more or less distinct, phases in speech evolution, based on the supposed evolutionary sequence of the speech centres.


Phase I - Gibbon-like song (the tropical forest phase of hominoid evolution)

Presumably, the earliest hominoids were tree-living creatures that were smaller than modern pongids and hominids, and therefore probably had smaller home ranges (17). There are reasons to believe that they produced gibbon-like territorial songs. Each pair of gibbons produces its own duet, a rather stereotyped melody (18) that is recognised by its neighbours as belonging to that pair. This is also seen in other monogamous arboreal vertebrates, especially tropical birds (19,20) and some small and medium-sized primates (21). In the forest canopy, pure tones seem to carry further than sounds without pure tones (21). Music, a rhythmic arrangement of pure tones, often grouped in alternating "voices", can powerfully affect our emotions. It might be a rudiment of a territorial and pair- or group-binding mechanism, as in the case of national anthems, hymns, love songs, etc., and could have its origin in early hominoid territorial song.

If this is so, it seems to imply that the earliest hominoids were monogamous. That is not impossible, since all three (remarkably different) social group types of the great apes (orangs are solitary, gorillas live in harem groups, common chimps live in territories defended by related males, see ref. 22, pp.25 and 36) are easily derivable from ancestral monogamy.

In that case, the great apes must have lost the more musical gibbon-like utterances. The adult males develop large laryngeal air sacs. They produce loud, low-frequency calls, which are probably better suited to their life-style where they are no longer monogamous and have larger home ranges, than the more varied and more musical gibbon-like songs (see ref. 22, p.207).

Gibbon songs are mainly generated by the vocal cords, although the vocal tract also probably plays an important role (as in birds, see ref. 23). It is not known which brain structures the gibbons use for song generation and analysis. The early hominoids could use subcortical, limbic, extra-pyramidal and/or peri-Sylvian structures. Great apes show slight Sylvian enlargements of the left auditory cortex. Probably, the early hominoids did not use the "voluntary" Area 4 (see Phase II) much in song generation. As an illustration, many patients with paralytic strokes and right hemiplegia (left Area 4 lesion) cannot speak but can swear or even sing.


Phase II - Voluntary breathing and airway closure (the semi-aquatic phase)

All animals can "voluntarily" open the mouth to bite. "Voluntary" means, in this context, controlled by the primary motor projection cortex, i.e. in primates, the precentral Area 4 of the cerebral cortex (see Figure 1). Aquatic mammals can close the airway entrances much more completely than land mammals, thus avoiding being drowned by water entering the lungs, and they have a very refined voluntary control of mouth, nose and throat passages.

Modern man has a very special anatomy of the airway entrances that is not incompatible with a previous semi-aquatic lifestyle. He has a smaller mouth which can be closed more efficiently (24) and, presumably, the wet mucosa of our fleshy lips allows a better fit than the dry skin of the lips of non-human primates. In other primates, the tongue is generally flatter and somewhat less mobile than in humans (ref. 16, p.625). Our nasal cavity is elongated by an external nose (ref. 25, Figure 159) and narrowed by strongly developed inferior conchae, which often cause even complete obstruction in some humans (11,26,29). The nasal cavity can be disconnected from the throat by muscles that raise the velum (probably also in apes) (5). In human adults, the larynx is placed more caudally than in non-human mammals (15,25,27, for a possible explanation, see ref. 14). We have a larynx that is much more mobile than that of apes (28). Moreover, humans, like aquatic mammals, can breathe voluntarily; they can hold their breath, though this is never necessary on land, and can hyperventilate whenever they want to (e.g. before diving, see ref. 29). Humans, in contrast with most terrestrial mammals (ref. 25, Figure 25), can voluntarily breathe through the mouth, possibly facilitated by the laryngeal descent (16,25), which could be an adaptation to enable rapid in- or ex-halation of a large amount of air before or just after a dive, the nose passage alone being too small.

Our primary motor projection cortex is much larger than that of apes, mostly due to the expansion of the areas for the musculature of mouth, throat and breathing, i.e. the latero-inferior section of Area 4 (see Figure 1). Just in front of that enlarged Area 4 lies Broca’s Area. It is a typically human structure indispensable for speech generation, and can be distinguished histologically from all other human cortical areas (ref. 30, pp.5-12). In present-day man, Broca’s Area coordinates the activities of the latero-inferior section of Area 4, in order to produce the right sound at the right time. Broca’s Area (or the first Broca-like structure) originates in my theory’s Phase II to coordinate the muscles commanded by the enlarged Area 4 to make the right airway muscle contract at the right moment: just before, during or just after a dive.


Figure 1

Lateral view of left human cerebral cortex.

After Chusid (30), Geschwind (33) and Thompson (39).


Phase III - Voluntary sound production

The varied but merely emotional sound production (Phase I) combined with the voluntary control of the airway musculature (Phase II) predisposed to a voluntary sound production that could be extremely varied. When our ancestors returned to a wholly terrestrial habitat, airway control for diving became superfluous and the refined airway musculature could be used exclusively for improving vocalisation. The sounds generated by the vocal cords could be strongly modified and diversified by contracting certain muscles in the lips, tongue, velum, pharynx and larynx, governed by the neocortex of Area 4 and Broca’s Area. In order to use the voluntary airway control for the vocal apparatus, our ancestor must have been able to register and interpret his own sound production (feedback, cf. motor theory of speech production) (31,32). This was certainly improved by the evolution of the arcuate fasciculus (see Figure 1), a typically human neural pathway between Broca’s Area and Wernicke’s Area (33). Wernicke’s Area, a primary language area used for decoding spoken language, lies immediately dorsal to the primary auditory receptive area, and to the postcentral principal sensory areas for mouth and throat (see Phase IV). In Wernicke’s Area, connections could be made with other, nearby neocortical areas (especially the auditory, visual and sensory areas, see Figure 1), and the sound or certain combinations of sounds (words) could be associated with something that our ancestor was aware of (hearing, seeing, feeling, doing) at the same moment. The first "words" could be an extension or an abbreviation of one’s own melody or an imitation of somebody else’s territorial song or group song, of weeping, crying, laughter, panting, etc., or of natural phenomena, like branch cracking, animal calls, etc. Later (in Phase IV?), a fixed "word" order may have become established by custom (e.g. the actor before the action: subject/verb), and fusion of words that often followed each other created conjugation, inflection and new words.


Phase IV - Association areas and thought

Compared with a chimpanzee’s brain, our association areas are enormously large. These areas are found in the temporal, preoccipital, parietal and inferior frontal lobes (see Figure 1). The cortex of these areas can be distinguished histologically from the other cortical areas and even from Broca’s Area (ref. 30, pp.5-12). This suggests that Broca’s Area and the association areas evolved separately (respectively in Phases II and IV?). In my interpretation, most association areas evolved after the breathing and air-holding function of the enlarged Area 4 and Broca’s Area had been integrated with sound generation (Phase III). The new association areas amplified the possible applications of the sound-producing apparatus. They acted as the hardware of the computer, whereas the sound analysing and producing areas acted as the input/output apparatus. The particular "language" was the software.

There are indications, I think, that our ancestors returned to a more terrestrial habitat not earlier than two million years ago (in a cooler and drier period of the Pleistocene? see ref. 11). In the hominid fossil record, the great expansion of the association areas seems to begin about two million years ago, with the genus Homo (34,35). The limited brain enlargement of Homo habilis could correspond broadly with the enlargement of Area 4, Broca’s Area (34), the arcuate fasciculus and Wernicke’s Area (already in Phase III?); that of Homo erectus with a further association cortex enlargement. This would mean that some sort of speech is much older than one million years. The oldest "languages" could have been tonally different, i.e. more musical than today’s languages. Even today, intonation is indispensable in normal speech, and perhaps half of the world’s languages are tonal (cf. Phase I). The relatively small size of the brain of the australopithecines (possibly without a real Area of Broca (34)) could be explained by their dwelling or having dwelt in inland semi-aquatic habitats (e.g. gallery forests), and not in littoral habitats (11). If early Homo lived at the sea coasts, he had to dive deeper and longer than his freshwater cousins, so the voluntary control of his airway muscles became more important. Brain enlargement is a striking feature of many cetaceans. Conceivably, the support of the body (and brain) weight by the surrounding water allowed sea mammals to obtain large brains (for echo-location or for whatever "purpose"), because of the weakened necessity of brain miniaturisation in an aquatic environment (e.g. the neurone density in the brain of a baleen whale is more than 100 times less than that in the brain of a wren, see ref. 20, p. 119). In this vision, it seems possible that our semi-aquatic life lasted as long as the brain enlargement in Homo, i.e. until less than one million years ago.



Most authors discussing language origins try to explain our speech capacities by an enormous amplification of vocalising abilities that already existed in rudimentary forms in pre-human primates (36,37,38) but fail to explain how exactly this could have occurred. In my view, most of these problems are readily solved by the application of the aquatic theory to the evolution of the vocal and breathing apparatus. For instance, a simultaneous emergence of Broca’s Area, the arcuate fasciculus and Wernicke’s Area seems highly improbable evolutionarily, and yet in the traditional view, each of these structures has no function without the other two. However, in an aquatic phase, a Broca-like structure alone could be used as a coordination centre for controlling breathing and airway closure, and only later did the arcuate fasciculus arise, which induced Wernicke’s Area. The traditional view has the same difficulties in explaining laryngeal descent, etc.

Concerning the relation of language and thought, I assume that a simpler (non-verbal) sort of thinking already existed in our pre-aquatic ancestors, but the great unfolding of human cognitive abilities became possible only after the acquisition of proper input/output organs for the brain. Hence, our great communicational capacities may not have evolved thanks to our large brain; rather the opposite seems true: large association areas only became usable with our voluntary sound production.



I wish to thank Mrs E. Morgan, Dr J. Wind, Dr P. van Cauwenberge, Professor M. LeMay and Professor D. Falk for discussions, corrections and help.




1 Hardy, A. C., "Was man more aquatic in the past?", New Scientist, 7, 642-645 (1960).

2 Morris, D., The Naked Ape. Jonathan Cape, London (1967).

3 Morris, D., Manwatching. Jonathan Cape, London (1977).

4 Morgan, E., The Descent of Woman. Souvenir Press, London (1972).

5 Morgan, E., The Aquatic Ape. Souvenir Press, London (1982).

6 Morgan, E., "The aquatic hypothesis", New Scientist, 1405, 17 (1984).

7 Morgan, E., "Sweaty old man and the sea", New Scientist, 1448, 27-28 (1985).

8 Morgan, E., "Lucy’s child", New Scientist, 1540, 13-15 (1986).

9 Cunnane, S. C., "The aquatic ape theory reconsidered", Medical Hypotheses, 6, 49-58 (1980).

10 Gribbin, J. and Cherfas, J., The Monkey Puzzle. Paladin, London (1983).

11 Verhaegen, M. J. B., "The aquatic ape theory: evidence and a possible scenario", Medical Hypotheses, 16, 16-32 (1985).

12 Verhaegen, M. J. B., "Origin of hominid bipedalism", Nature, 325, 305-306 (1987).

13 Ellis, D. V., "Proboscis monkey and aquatic theory", Sarawak Museum J., 57, 251-262 (1986).

14 Morgan, E. and Verhaegen, M., "In the beginning was the water", New Scientist, 1498, 62-63 (1986).

15 Wind, J., On the Phylogeny and Ontogeny of the Human Larynx. Wolters-Noordhoff, Groningen (1970).

16 Wind, J., "Phylogeny of the human vocal tract", Annals N. Y. Acad. Sci., 280, 612-630 (1976).

17 Clutton-Brock, T. H. and Harvey, P. H., "Primate ecology and social organization", J. Zool. Lond., 183, 1-39 (1977).

18 Brockelman, W. Y. and Schilling, D., "Inheritance of stereotyped gibbon calls", Nature, 312, 634-636 (1984).

19 Thorpe, W. H., "Duet-singing birds", Scient. Am., 229, 70-79 (1973).

20 Chauvin, R., La Biologie de l’Esprit. Rocher, Monaco (1985).

21 Haimoff, E. H., "Convergence of duetting of monogamous Old World primates", J. Hum. Evol., 15, 51-59 (1986).

22 Chalmers, N., Social Behaviour in Primates. E. Arnold, London (1979).

23 Nowicki, S., "Vocal tract resonances in oscine bird sound production: evidence from bird songs in a helium atmosphere", Nature, 325, 53-55 (1987).

24 Hockett, C. F., "The foundations of language in man, the small-mouthed animal", Scient. Am., 217, 141-144 (1967).

25 Negus, V., The Comparative Anatomy of the Larynx. W. Heinemann Medical Books, London (1949).

26 Cauwenberge, P. van, "Clinical use of rhinomanometry in children", Internat. J. Ped. ORL, 8, 163-175 (1984).

27 Laitman, J. T., "Evolution of the hominid upper respiratory tract: the fossil evidence", in Tobias, P. V. (Editor), Hominid Evolution, pp.281-286. A. R. Liss, New York (1985).

28 Fink, B. R. and Frederickson, E. L., "Laryngeal preadaptation to articulated language", in Chivers, D. J. and Joysey, K. A. (Editors), Recent Advances in Primatology, Volume 3, pp. 93-95. Academic Press, London (1978).

29 Verhaegen, M. J. B., "The aquatic ape theory and some common diseases", Medical Hypotheses, 24, 293-300 (1987).

30 Chusid, J. G., Correlative Neuroanatomy and Functional Neurology. Lange Medical Publications, Los Altos (1973).

31 Williams, H. and Nottebohm, F., "Auditory responses in avian vocal motor neurons: a motor theory for song perception in birds", Science, 229, 279-282 (1985).

32 Kelly, D. B., "A motor theory of song perception", Trends Neurosci., 9, 149-150 (1986).

33 Geschwind, N., "Language and the brain", Scient. Am., 226, 76-83 (1972).

34 Falk, D., "Cerebral cortices of East African early hominids", Science, 221, 1072-1074 (1983).

35 Yellen, J. E., "The longest human record", Nature, 322, 774 (1986).

36 Wind, J., "Fossil evidence for primate vocalisations?", in Chivers, D. J. and Joysey, K. A. (Editors), Recent Advances in Primatology, Volume 3, pp. 87-91. Academic Press, London (1978).

37 Lieberman, P., "On the evolution of human syntactic ability", J. Hum. Evol., 14, 657-668 (1985).

38 Steklis, H. D., "Primate communication, comparative neurology, and the origin of language re-examined", J. Hum. Evol., 14, 157-173 (1985).

39 Thompson, R. F., Introduction to Physiological Psychology. Harper International Edition, New York (1975).