We need a new hypothesis to replace one that we claim to falsify. We cannot just claim that Darwin's theory of evolution by natural selection is false if we have nothing else to propose. Every new hypothesis has to account for all observations previously 'explained' by the old interpretation and explain even more. Here, I am arguing that words seem arbitrary only because of our phonocentric perspective on language; in fact, they are mostly iconic representations of visual forms. Indeed, most things for which we have a word are silent. Something that produces a sound can be referred to by imitating its sound (echomimetic onomatopoeia) or otherwise (e.g., by its visual form). A thing that does not produce sound cannot be referred to by 'sound imitation'; it must be named otherwise. We fail to detect iconic descriptions of silent things in our phonemes because a sound does not directly correspond to a visual perception. I am proposing, instead, to search for iconicity in the forms of the graphemes used in written language.
I know that this approach sounds crazy. We all know that all languages start as spoken languages; there are thousands of them worldwide (SIL-International 2019); writing comes after. Or do we really 'know' that? That speech is the default modality for language is only a hypothesis. It is a legitimate and attractive hypothesis because most people did not read or write but spoke for most of history. Japanese adopted a writing system only around the 1st century AD, and Slavic languages took their alphabets as late as the 9th century AD. Still, there are thousands of languages without writing systems. Concerning articulate speech, Charles Darwin expressed his conviction that this 'owes its origin to the imitation and modification of various natural sounds, the voices of other animals and man's instinctive cries, aided by signs and gestures… the imitation of musical cries by articulate sounds may have given rise to words expressive of various complex emotions' (Darwin 1871).
Nearly a decade earlier, Max Müller published a list of speculative theories about the origins of spoken language (M. Müller 1862). These hypotheses are meant to explain how the first language could have developed. They also postulate that the first meaningful words emerged through human mimicry of natural sounds. For example, the bow-wow or cuckoo theory sees the first words as imitations of the cries of beasts and birds. The pooh-pooh theory sees the first words as emotional interjections and exclamations triggered by pain, pleasure, surprise, and so on. The ding-dong theory states that all things have a vibrating natural resonance, echoed somehow by man in his earliest words. The yo-he-ho theory suggests that language emerges out of collective rhythmic labor; the attempt to synchronize collaborative muscular effort results in sounds such as heave alternating with sounds such as ho, thus creating the first words. Scholars today consider such theories not so much wrong (they occasionally offer peripheral insights) as simplistic and irrelevant (Firth 1964; Stam 1976). The impact of onomatopoeia therefore remains marginal, and there is no general theory of iconic words.
No doubt, language can be transferred orally from one individual to another and from population to population. Phonetic mutations may occur during this process. Borrowing the example of a ring species from biological evolution (Fig. 1), we may understand how new spoken dialects and languages arise by this oral propagation (Cruse 1986). In linguistics, the phenomenon is known as a dialect continuum. As a language expands around a geographical barrier, people living in neighboring villages may understand each other throughout its territory. However, accumulated mutations result in people from opposite extremes no longer understanding each other. Neighboring populations all the way from Calabria to Portugal may communicate without difficulty, despite the artificial political frontiers, but Calabrian and Portuguese are different languages. Had history followed a different course, this dialect continuum could have been broken; for example, some Germanic, Slavic, or Arabic languages could have separated the two extremes. Dialects along a dialect continuum may adopt different writing systems at different times, or no writing at all. We may hence find spoken languages without writing systems very far away, in geographical space, from a remotely related language that uses an alphabet. The existence of spoken languages, or the fact that most people in history spoke without reading or writing, neither proves a phonetic nature of language nor disproves the alternative hypothesis that individual terms, if not entire languages, are first created in writing and later spread orally.
Figure 1. Left: A schematic representation of a ring species. Individuals can successfully reproduce (exchange genes) with members of their own species in adjacent populations. Individuals of the populations at the two ends of the ring, separated by a barrier, cannot reproduce with each other when they come into contact. This process represents a form of speciation occurring with gene flow. Artwork by Andrew Z. Colvin; Creative Commons license. Right: Seagulls of the genus Larus form a circumpolar species ring; the gulls can interbreed to some extent with their neighbours, except at the two ends of the ring. Populations: 1 Larus fuscus (Lesser Black-backed Gull); 2 Siberian population of Larus fuscus; 3 Larus fuscus heuglini or Larus heuglini (Heuglin's gull); 4 Larus argentatus birulai (Birula's Gull); 5 Larus argentatus vegae or Larus vegae (East Siberian Herring Gull); 6 Larus argentatus smithsonianus or Larus smithsonianus (American Herring Gull); 7 Larus argentatus (Herring Gull). Artwork and species information by S. Solberg J. and Frédéric Michel; Creative Commons license.
A similar mechanism of linguistic 'speciation' operates diachronically. Consecutive generations understand each other, although progressive linguistic differences are noticeable. However, as these differences accumulate over the centuries, comprehension becomes more difficult. Thus, the ancient and modern versions of the same language sound like different languages; the average Greek can understand very few Homeric words. Nevertheless, practically every word uttered today is somehow related to a previous word attested in the same or another language. Other related aspects of culture and 'knowledge,' such as science and technology, philosophical ideas, religion, or history, are also created focally in writing and spread orally. Non-written lore, like folklore, rumour, or personal or professional experience, has a much shorter range in space and time; it tends to mutate rapidly and end up as cultural shards. Benitez-Bribiesca points out that in the absence of a 'code script,' the arbitrary transfer of ideas from one brain to another would lead to critically low replication accuracy and a high mutation rate, rendering the evolutionary process of such cultural events chaotic (Benitez-Bribiesca 2001). Besides a code script, I would add the requirement of a repair mechanism for memes, analogous to the genetic code and the enzymatic repair systems for genes. This factual criticism of memetics (Dawkins 1976) applies reasonably well to oral language but not to written language, which has them all: hard code, accurate replication machinery, and repair mechanisms. Pragmatically speaking, the vocabulary of a standard language exists in text. An obvious consequence is that, although people may create words verbally all the time (by inspiration or error), only written words survive and spread long enough to become recognizable parts of a standard language, that is, dictionary entries.
Another pragmatic argument for the hypothesis that language emerges through writing is that we cannot have evidence of structured speech from deep prehistory but do have archaeological evidence of drawings and glyphs. The ancient Egyptians had about 1000 hieroglyphs. It is inconceivable that they also had a distinct phoneme for each of them. Modern French has only 36 phonemes (Jakobson and Lotz 1949), while no other major modern language worldwide exceeds 60 phonemes (Hay 2004). Lithuanian appears to have the richest, let us term it, 'phonome' [1] (by analogy to -omics terminology; 'Omics' n.d.); it comprises 59 phonemes. The most concise European phonome is that of Modern Greek, which does its job with only 23 phonemes, second worldwide only to Modern Japanese (22 phonemes). Scholars assume that most Egyptian pictograms corresponded to spoken words, i.e., combinations of phonemes, while others corresponded to syllables or single phonemes. This assumption is questionable. If the Egyptians had glyphs for their phonemes and full-blown, phonetically complex words, they should have been able to represent those words using a few glyphs of phonetic value; they would not have needed myriads of pictograms. The multitude of attested symbols suggests that there were no spoken words for them. Therefore, we may imagine that pictograms had no phonetic representation. In other words, it is not indisputable that the ancient Egyptian writing system was a mere transcription of Egyptian phonetics. It could have been mostly, if not wholly, a written language without a phonetic equivalent.
'A picture is worth a thousand words' is an English adage meaning that complex or multiple ideas can be represented in a single still image more effectively than by mere verbal description. A horse-headed humanoid depicted along with text (Fig. 2) illustrates that concepts are waiting to be named at any time in history. The author of that ancient graffito found it easier to draw the hybrid monster than to describe it in words. Concepts, ideas, and mental representations of objects are born before words. The graphical representation of new concepts is more intuitive and powerful than a verbal description. Likewise, some Egyptian hieroglyphs may have represented ideas or objects without a name. On this line of thought, graphemes precede phonemes.
Having said all that, I do not mean that the Egyptians or more ancient populations were mute. Biological evolution had given us the capability to speak long before we developed language, just as it has given us the ability to ride a bicycle. Had the ancients been given bikes, I am sure they would soon have figured out how to operate them, and even faster with the help of an instructor. We can train our vocal muscles to produce the highly sophisticated sequences of sounds that characterize speech (or song), just as we can train our hand muscles to play a violin concerto. But, to speak, man must have words to pronounce. With most things around having no names, the first language would have consisted of intuitive echomimetic onomatopoeia, gesture, and form-drawing (pictograms). In that case, we need a theory about the graphical representation of objects or concepts followed by a phonetic transcription of the glyphs, i.e., the transcription of forms into vocal sound.
Figure 3. Two models of word-formation. Left: the phonocentric model. Right: the graphocentric model.
The diagram of Fig. 3 represents the two competing models of word-formation. The phonocentric model (sememe > phoneme > grapheme) assumes that speech is older than writing. Words were primarily formed by direct transcription of sememes into phonemes or phonetic morphemes, e.g., syllables; the latter were subsequently transcribed into graphemes by convention. Two serious questions arise here: (i) What are the most basic and essential sememes to render in the limited number of phonemes of each language? (ii) How do these sememes correspond to sounds? The phonocentric model does not propose any rational mechanism that could explain the sememe-to-phoneme transcription on a large scale. We may presume that this transcription was unconscious, driven by instinct or intuition. In essence, the human species would have some innate biological (neurological) mechanism to associate a sememe (stimulus, signified) with a phoneme, and a set of sememes with a series of phonemes.
It would be tough, though, to convince any biologist of this assumption. Behaviors map along a continuum from the innate to the learned. Kinesis, taxis, tropism, and reflex are natural biological responses that require no conscious thought. These movements are fundamental to life and occur in all kinds of organisms, even in unicellular microorganisms and subcellular systems. In man, all innate biological mechanisms are, by definition, universal, i.e., independent of culture. They have certainly changed in evolutionary time (during Homo speciation) but are perceivably conserved and stable in historical time. Wound healing works in the same way in modern-day America as it did in Ancient Mesopotamia. A hungry English Queen salivates upon being presented with a delicious dish in the same way as a hungry enslaved person would have done in ancient Athens.
Instinct is another type of innate behavior defined as the inherent inclination of an animal towards a particular action. Instincts are fixed behavioral patterns carried out without variation in response to corresponding stimuli. A simple example of an instinctive response is a baby sea turtle, newly hatched on a beach, making its way to the ocean. A behavior is instinctive if performed without prior experience, i.e., as an expression of innate biological factors in the absence of learning. Language has nothing to do with these types of behavior. Instead, language is supposed to be an arbitrary social convention.
At the borderline between innate and rational behavior lies intuition. Whether an intuitive response is natural or cognitive is debatable. Ancient Eastern traditions place intuition at the high end of analytical thinking, beyond the mental process of conscious thought, together with the absolute knowledge of the ultimate reality (e.g., Hindu Brahman; eternal, universal truth). It is a sort of knowing how to respond rightly to environmental situations, with no need to think. In Zen Buddhism, for example, intuition is considered a mental state between the Universal mind (common sense) and the individual's capability to discriminate right from wrong. After examining the exposed evidence and arguments, a judge eventually decides on gut feeling. The inherent capacity is associated with Kōan, a Sino-Japanese logogram carrying sememes of public life (official, governmental, common, collective), justice (fair, equitable), and administration (table, desk, law case, record, file, plan, proposal; 'Kōan' n.d.). The emperors or God himself govern by intuition. They do not need to and cannot explain their rationale; we laypeople wouldn't be able to understand it anyway. The Hindu philosopher Aurobindo Ghose (1872–1950) sees intuition and reason as parts of a circular mental process of progress. Reason organizes our perception, thoughts, and actions into metaphysical philosophy (intuition) and experimental science (Aurobindo 2005). The Indian godman Chandra Mohan Jain, also known as Rajneesh or Osho (1931–1990), taught that human consciousness increases from basic animal instincts to intelligence and intuition, the latter being the ultimate goal of humanity. According to Yahya ibn Habash Suhrawardī (1154–1191), a Persian philosopher building upon Zoroastrian and Platonic ideas, intuition is knowledge acquired through illumination for correct judgment (Al-Suhrawardi 2006). For Ibn Sīnā (known in the West as Avicenna; c. 980–1037), another Persian polymath, intuition is a prophetic capacity, a knowledge obtained without intentional acquisition. While common knowledge requires imitation, intuitive understanding reposes on intellectual certitude (Kalin 2010).
The modern Western definitions of intuition employ terms from both sides of the cognitive process. It is cognition, i.e., a method of acquiring knowledge and understanding, though without recourse to conscious rational thought, experience, or the use of senses. Lexico, for example, merges intuition with instinct as 'the ability to understand something instinctively, without the need for conscious reasoning' [2], i.e., understanding (reason) through instinct, which is an innate biological mechanism. The English term intuition comes from Latin intueri (to look at, consider), from in (in, at, on, upon, from, of space; within, while in, of time; into, to; about; according to; against) and tuērī (to look, gaze at, behold, watch, guard, see, observe, view; to care for, defend, protect, support, compensate, make up for; to uphold, keep up, maintain, preserve). If we insist on promoting an esoteric interpretation, we may argue that intuition historically meant looking (for the truth) inwards, inside us; behold the reality that exists in us. However, most of the other sememes and combinations of the Latin stems suggest that intuition meant something like: according to a view, upon consideration, making up from what we see, or for what we do not see, according to observation, visibly, etc. In this sense, intuition is the first step of cognition after sensing. It corresponds to the Platonic concept of low-grade belief or unreliable opinion (Greek πίστις; pistis; good faith, persuasion, confidence, assurance). It follows the primitive awareness of things, eikasia (Greek εἰκασία; imagination, likeness, representation, comparison, estimate, conjecture, apprehension of or employing images or shadows). But it precedes reasoning (Greek διάνοια; dianoia; thought, intention, purpose, meaning, mental process) and comprehension (Greek νόησις; noesis; intelligence, thorough understanding, concrete idea, concept).
Plato appears, in a way, to agree with Aurobindo on a circular (by feedback or inheritance) mechanism of cognition by defining intuition as pre-existing knowledge. If comprehension and knowledge require rational thinking, someone must have done this thinking before. Such knowledge resides in the immortal soul incarnated in each of us. If we need to inquire and seek the truth, this is because we forget most of that pre-existing knowledge at our traumatic birth. Sometimes, however, we can recall that knowledge instead of generating it again. Plato calls this phenomenon anamnesis, remembrance, and gives the example of pre-existing mathematical truths, such as axioms, postulates, self-evident posits, assumptions, starting points, or definitions that do not require reasoning or proof (Plato 1967). In other words, knowledge generated through rational thinking by others comes to us as an innate aptitude. Intuition and common sense, being products of collective intelligence at the highest end of the cognitive scale, return to take the form of fundamental inherent truths written in our biological material. In his dialogue Phaedo, Plato elaborates further on the concept of anamnesis, suggesting that the senses are sources of error. We can only know universal truths. Contingent propositions cannot be eternally transmitted. We have to investigate them at every instance through reason (Plato 1966).
René Descartes (1596–1650) and David Hume (1711–1776) were founders of rationalism and empiricism, respectively. Following their thought, doubt (reasoning) applies to composite and complicated things. Still, elements of nature such as colors, shapes, quantity, size and number, place and time of occurrence, etc., are beyond questioning (Descartes 1639). For rationalists, physical evidence and empirical proof are unnecessary to establish truths. For example, having a triangle in mind, one may extrapolate and imagine a chiliagon, a shape with a thousand angles. We do not need empirical evidence to prove that such a form exists in nature. At least some concepts and knowledge are gained independently of sense experience. Descartes clearly and distinctly understands his body as an extended thing; this fact does not require any mental process to become established. At the same time, he clearly and distinctly understands his existence as a thinking being, which does not require the presence of his body. This intuitive knowledge requires pure sense, or pure reason, and maps at the two opposite extremes of the reasoning process. Indeed, the color red is red by definition, and a chiliagon is a shape with a thousand angles, also by definition. A descriptive definition is the starting point of the cognitive process. When this process starts from sensory observation, it is an explanation; when it starts from a rational idea (prediction), it is validation. The English astronomer and clergyman John Michell predicted the existence of black holes in 1784. The concept found a rational foundation in Albert Einstein's theory of general relativity in 1915. It was only in 2019 that the first direct image of a black hole was published (Event Horizon Telescope Collaboration et al. 2019; 'Black Hole,' n.d.).
Empiricists claim that knowledge comes only or primarily from sensory experience and empirical evidence. Propositions about relations of ideas are intuitively or demonstratively certain. But what appears to be an innate idea, a priori reasoning, intuition, revelation, or tradition arises from previous sense experiences (Hume 1777; Baird and Kaufmann 2008b; Morris and Brown 2019). The idea that the interior angles of a Euclidean triangle sum to 180° requires prior observation of pointed objects, angle measurement, transcription of a triangle on paper, line drawing, etc. An intuitive idea of a black hole could only have arisen in the mind of an experienced astronomer who had already gathered astronomical data and mastered cosmic laws. Even that prediction required some simple, though a posteriori, calculations. Empiricism suggests that the human mind is blank at birth and develops thought only through experience. According to Immanuel Kant (1724–1804), intuition consists of the primary sensory information provided by the cognitive faculty and is independent of the nature of the objects themselves. The receptivity of the observer must necessarily precede an intuition. One's perception of an object is an immediate representation, an empirical intuition. By contrast, perception of space and time is a pure intuition, not objective and real, not including sensation. In any case, intuitions are immediate representations of objects requiring the senses, whereas concepts are mediate representations of the general characteristics of things, like Platonic ideas (Kant 1787).
In modern psychology, intuition is the ability to make decisions quickly and provide reasonable solutions to problems without comparing alternatives. Under time pressure and changing parameters, experts use their experience of previously taken courses of action to identify similar situations by pattern-matching and choose feasible solutions (G. Klein 2003). Therefore, intuition depends on a conscious review of previous observations and associated outcomes, even if that mental process is almost instantaneous. However, experience does not always correlate with accurate intuition. Researchers have attempted to quantify intuition. They compared people considered intuitive, i.e., who made decisions quickly but could not identify their rationale, with subjects who decided according to well-remembered reasoning. The task was to determine, solely based on facial cues, the amount of the payoff offered to gamblers during a specific gambling trial. The prediction accuracy did not differ between intuitive and non-intuitive people, but the study did show that some people are aware of their course of reasoning and some are not (Giannini et al. 1984). In a previous study (Giannini et al. 1978), non-intuitive subjects indicated that they used systematic trial-and-error methods to make their interpretations. Those subjects needed several seconds to respond but responded with greater accuracy under reinforced conditions (rewarding). Intuitive people could not describe the rationale they employed to make choices. They responded instantaneously and more accurately under non-reinforced conditions (without having the opportunity to validate their predictions). In everyday speech, people frequently confuse intuition with instinct. Someone who has experience with children and knows how to deal with them in certain situations is said to have a better 'instinct' about children. But, as explained above, biological instinct does not require prior experience.
Where exactly does language map on this continuum of biological and behavioral responses? Practically speaking, language cannot be an involuntary, innate natural behavior because it presents geographical and temporal variation. It can only be a voluntary, acquired behavior, whether intuitive or rational or both. It is voluntary because we speak or write when we please, and acquired because we learn it. It looks intuitive because it is quasi-automatic in native adult speakers. But it is the product of intellectual exercise and takes several years to master. For most of us, it is the most complicated thing we have ever learned, still imperfectly, and it requires constant practice to stay alive in memory. We achieve fluency in a language when we have accumulated enough experience and training in vocabulary, grammar, and syntax to compress the time required for thought between a stimulus and a speech response. Given the above panorama of propositions about intuition, we may also ask which is the more intuitive representation of sememes describing things: sound or shape?
Firstly, the Latin-born word intuition, the English synonyms insight, second sight, and clairvoyance, and the Greek equivalent ἐνόρασις (enorasis; in + sight; beholding) are all semantically and historically associated with vision, not with sound or hearing. The very etymology of intuition suggests that vision is more intuitive than audition; visual observation is the means par excellence of intuition. Look around. Most things are directly or indirectly visible, whereas relatively few objects are heard, touched, tasted, or smelled. Therefore, we use vision far more frequently than other senses for immediate identification and understanding of natural objects. Secondly, to the extent to which intuition involves pattern-matching, matching visual forms to visual forms is far more intuitive than pairing them with sounds. My intuition suggests that the wheels of a bicycle resemble wedding rings, both types of objects being instances of a Euclidean circle (e.g., the letter O). They do not immediately resemble a 360 Hz sound (the frequency of the first formant of the phoneme /o/). A bicycle wheel may make a sound due to friction when it runs, but not necessarily a 360 Hz sound. A wedding ring makes no sound as it functions. All rings made of solid material may produce a sound when struck. But resonance depends on the size and physical properties of the colliding objects. There is no universal sound that matches a ring pattern. A ring species (Fig. 1) can neither be struck nor produce sound in any way. Its name intuitively recalls a circular pattern. Even the sound-related meanings of the English ring (resonant sound of a bell, pleasant or correct sound, sound or appearance that is characteristic of something, a continued, repeated or reverberated sound, chime, a set of bells harmonically tuned to produce a resonant sound, resound, reverberate, echo, repeat regularly, loudly, or earnestly) point to the periodic, quasi-circular nature of acoustic waves and harmonic patterns.
'Seeing is believing' is a famous English proverb expressing our folk conviction that vision is necessary for generating intuition. We need to see something before accepting that it occurs or exists. Vision is practically always an influential component of perception, even when the primary modality is another sense. We do not feel that we comprehend something if we have no idea what it looks like. We tend to consider, for example, the recently published 'photograph' of a black hole as the ultimate proof that black holes exist and that the theory is correct. However, this visual representation is not a typical photograph but a synthetic description of non-visual data, and we know well that even vision can be deceptive. A large body of literature suggests that speech perception in face-to-face communication is bimodal. The listener's brain integrates auditory cues with visual cues from the talker's mouth and face. Well-known illusions occur when these cues are contradictory and the visual cues override the auditory ones (McGurk and MacDonald 1976; Calvert, Spence, and Stein 2004; Magnotti and Beauchamp 2017). Diagrams and graphs are also artificial visual representations of thought or data, popular since antiquity. If I want to convey a comprehensive and intuitive description of an object or an idea, I had better make a drawing.
Both spoken and written language are undoubtedly learned, consciously or unconsciously, by observing and imitating the behavior of others. This cognitive process is known as observational learning. More primitive learning mechanisms proceed in the absence of social interaction. Examples are the habituation or sensitization phenomena (a diminution or amplification, respectively, of an innate response to a recurring stimulus) or learning by reinforcement (reward or punishment), such as operant or classical conditioning (Overmier 2002; Pear 2014). These innate biological mechanisms of learning occur widely in animals, including protozoa (Wood 1988), and even in plants (Gagliano et al. 2014). Behaviorist theories of the first half of the 20th century invoked an individual's inner biological forces to explain behavior. For example, a hostile impulse would explain one's irascible behavior; through circular reasoning, this behavior would, in turn, pseudo-explain the person's hostile impulse (Bandura 1971). Bandura based his social learning theory on famous experiments showing that children adopt a particular behavior by observing adult demonstrators (Bandura, Ross, and Ross 1961, 1963). He sensibly explained that learning by trial and error alone would be too slow, arduous, and hazardous. Observational learning spares everybody from getting poisoned while trying to learn whether a fruit is edible or not. If somebody eats it, it is edible. If nobody eats it, I had better abstain. Instead, social learning develops through continuous crosstalk between cognition, behavior, and environment. Current phonocentric linguistics is comparable to early behaviorist thinking. Language has long been deemed a spontaneous biological behavior generated by some inner human faculty without help from others. Humans would arbitrarily utter a phoneme or two on a particular occasion without connection to the concurrent environmental or social circumstances that cause the utterance. Later, they would find it helpful to symbolize this utterance by drawing glyphs.
Essential observational learning occurs in many higher animals (mammals and birds) by at least three demonstrated mechanisms: imitation (N. E. Miller and Dollard 1941), affordance (Gibson 1966; E. D. Klein and Zentall 2003), and emulation (Wood 1988). Of course, there are yet higher and more complex learning mechanisms (from play to scientific experimentation or simulation), but let us focus on those intuitive mechanisms that could have triggered the invention and subsequent viral expansion of language.
From childhood men have an instinct for representation, and in this respect [man] differs from the other animals that he is far more imitative and learns his first lessons by representing things [3] (Aristotle 1932).
Imitation hardly requires further explanation and documentation. So strong is the evidence that humans learn by imitation that Andrew Meltzoff proposed Homo imitans instead of Homo sapiens for our species (Meltzoff 1988). In humans, observational learning seems not to need reinforcement, as experiments with animals suggest, but a demonstrator (model) in the learner's social environment. A model is someone of authority or higher status like a parent, teacher, sibling, or esteemed friend in childhood. Adults learn from presumed authorities such as scientists, politicians, specialists, commercials, and authors of any kind. Recently, Cecilia Heyes has challenged the idea that humans have an inborn, genetically inherited 'module' for imitation that differentiates them from other animals. She suggests that there is a continuity of imitation capabilities among animals. 'Human infants learn to imitate using associative mechanisms that we share with other animals, and our prodigious imitative capacity is due primarily to the rich resources provided by our sociocultural environments' (Heyes 2016). Imitation explains well the expansion of language, whether spoken or written, but not the linguistic 'big bang' or the creation of new words.
A more interesting concept to this end is affordance. James Gibson coined the term in 1966 and refined its definition in 1979 (Gibson 1966, 1979):
The affordances of the environment are what it offers the animal, what it provides or furnishes, either for good or ill. The verb to afford is found in the dictionary, but the noun affordance is not. I have made it up. I mean by it something that refers to both the environment and the animal in a way that no existing term does.
The author suggests that the environment affords (provides or supplies) to animals opportunities or facilities such as the terrain, shelters, water, fire, objects, tools, and other animals. The composition and layout of objects and surfaces constitute what they afford. To perceive an object is to perceive what it affords. This radical hypothesis implies that the 'values' and 'meanings' of things can be directly perceived and explains how those values and meanings are external to the perceiver. Man has not created a new environment; we have simply modified the old one to our convenience. Animals discriminate between horizontal and vertical, soft and rigid, flat and protruding surfaces, obstacles and openings, and so on. Terrestrial surfaces are climb-on-able, fall-off-able, get-underneath-able, or bump-into-able relative to the animal. As much as humans, dogs and cats can all identify the most comfortable, soft, warm, and safe surface for sleeping, another surface for sitting, eating, or hiding food. We altered steep slopes by building stairways to afford ascent and descent. Staircase steps afford stepping, up or down, relative to the size of one's legs. Some things are graspable, and some are not. Some need a handle to handle. Concave objects can be used as containers because they afford holding the rain. The list of examples is inexhaustible. The same object or environmental element can provide different affordances to different animals, different individuals, and even to the same individual at different times. In other words, the same object may have different values and meanings, just as alternative objects can have the same affordance, value, purpose, and meaning.
The theory of ecological affordance contrasts with another influential theory of the early 20th century known as Gestalt [4] psychology. The two views are not incompatible, in my opinion. The Gestalt theory suggests that organisms perceive entire patterns or configurations, not merely individual components (Koffka 1935). This school mainly turned psychology into an experimental science based on the principle that there is some correlation between conscious experience and brain activity. Some of its fundamental concepts, called laws of grouping, were empirically demonstrated using vision as the most common sensory modality. These laws are very relevant to our discussion, as we will see them applied in word-formation and mythology.
Organisms perceive some parts as 'hanging together' more tightly than others. This phenomenon is called perceptual grouping. We experience things as regular, orderly, symmetrical, and straightforward. Our mind eliminates complexity and unfamiliarity to observe reality in its most simplistic form and, hence, create meaning. For perceiving an assortment of objects as a group, the objects must lie close together (law of proximity) or be similar (law of similarity). The mind completes incomplete objects and closes gaps for us (law of closure). A dashed line is still a line, an incomplete circle is still a circle, a broken skull is human remains, and a few vertical columns are an ancient temple. We perceive objects as being symmetrical (law of symmetry), forming around a center point, and divide them into an even number of symmetrical parts. When symmetrical elements are not connected, the mind perceptually connects them into a coherent shape. We perceive objects as lines moving along the smoothest path (law of common fate). The movement of an object produces a perceptual path on which our mind places the moving object. We perceive objects as having trends of motion, which indicate the path that the object is on. We perceptually group objects if they are aligned (law of continuity). An intersection tends to split the elements of an object into separate uninterrupted objects. Finally, we categorize visual stimuli according to prior experience (law of past experience). According to the law of continuity, the letters L and I together should be read as U, but the law of past experience will prevent us from reading U instead of LI. Of course, it also works in the opposite direction, reading H instead of |-|.
Emulation refers to observational learning without imitation. The observer, enticed by the result of the demonstrator's behavior, achieves the same outcome following a different strategy (Haggerty 1909; Tomasello et al. 1987; Tomasello 1999; H. C. Miller, Rayburn-Reeves, and Zentall 2009; Whiten et al. 2009). Emulation may occur even in the absence of a demonstrator. Learners see the movement of the objects involved and come to some insight about its relevance to their own problems (Boesch and Tomasello 1998).
In light of the above evidence, one can easily imagine the following sequence of events. An early human pulls an object on a soft surface and notices that a linear trace is left behind. A stick moving on the soft surface also leaves a mark. The mark's shape depends on the trajectory of the hand holding the stick. If the hand follows a circular path, it inscribes a circular shape on the sand. The marks represent the shapes of the objects a hand would describe. A linear mark represents an elongated object, linear trajectory, or track. A person can thus write intuitively, and no speech is yet required. The invention of writing can quickly spread through observational learning within and beyond the inventor's social group. According to Bandura, observational learning requires (i) attention, including modeled events and observer characteristics; (ii) retention, including symbolic coding, cognitive organization, symbolic rehearsal, and motor rehearsal; (iii) motor reproduction, including physical capabilities, self-observation of reproduction, and accuracy of feedback; and (iv) motivation, including external, vicarious and self-reinforcement. In societies that have already developed language, coding an observation into words, labels, or images improves retention. For communities about to invent language, marks drawn on a soft surface are symbols of forms and behaviors that members can follow.
If language is an arbitrary social convention, it is worth focusing on the terms of language and convention. From a phonocentric perspective, language is synonymous with speech. The relevant Wikipedia articles list more than 20 competing theories on the origin of language at the moment of writing ('Origin of Speech' n.d.; 'Origin of Language' n.d.). They are all phonocentric, i.e., about the origin of speech. Two themes prevail. Articulate speech is either an evolutionary continuation of animal capabilities to produce emotional sound or a distinctive feature of human anatomy and behavioral or cultural traits. Estimates of when speech may have appeared in humans go as early as 100000 to 350000 years ago (Nichols 1998; Perreault and Mathew 2012) when Homo sapiens and anatomically modern humans emerged (~300000 BP). Of course, such estimates do not generate testable predictions since speech does not fossilize. The theories about speech evolution generally invite more critics than supporters. For example, the above-cited calculations assume that phonemes, or other linguistic features, accumulate at a constant rate like gene mutations do in evolutionary time. Instead, one may argue that, once a human becomes aware of her capability of producing vocal sounds, she may utter all the basic phonemes a language needs before sunset. Producing random phonemes without specific meaning is not language.
The origin of meaningful speech is related to the emergence of symbolic culture and behavioral, visual, auditory, or other types of signals that contain meaning and can be received and interpreted by others (Knight 2010). This relation links speech to ritual, religion, and signaling theories. Signaling theory is a theoretical framework in the field of evolutionary biology dealing with phenomena of animal communication. Vervet monkeys are known to emit up to thirty distinct vocalizations (calls), some depending upon the kind of predator approaching (Seyfarth, Cheney, and Marler 1980; Arnold and Zuberbühler 2006). As far as I know, animal calls are species-specific and independent of geographical location or social group, which is not the case with human language. As in all evolutionary theories, propositions about a selective advantage of a given biological trait are reasonable and attractive. One can always find examples in nature to support or refute a claim. My main skepticism against a purely organic evolutionary approach to speech and language is that such an approach would imply no conscious human intervention. Nature has not determined what language we speak, just as it cannot decide what rituals we will perform, what religion we will follow, what currency we will use, or how we fix the price of our products. Let me use the bicycle parable again. Worms cannot ride bicycles because they do not have legs. Humans did not evolve legs to ride bikes. They do have an innate capacity to stand and keep their balance, which they employ to balance on bicycles by training their brain and muscles. The advantage of speed outweighs the risk of accidents in busy urban environments, mainly when we observe and understand the rationally designed traffic signals. The ability to speak should be differentiated from the kind of speech we deliver, what we say, what phonemes we use, and how we combine them.
An example of a purely linguistic approach to the origin of speech is the body of theoretical and observational works attempting to identify the 'first words' in a presumed proto-language (Kenneally 2007). Kin names like mama (usually meaning mother), papa (father), etc., are thought to be proto-sapiens words from a language occurring some 50000-100000 years ago in Africa. These words and cognates appear to occur in a great majority of about 1000 studied languages from 14 or so major language families spread in all continents (De-l'Etang and Bancel 2008). Jakobson explained that the 'nursery words' papa and mama are present in many language families from all over the world because they arise as spontaneous, convergent formations of the sound-symbolism type. During breastfeeding, the infant often produces a slight nasal murmur (mur-mur; /mu-mu/). This is the only phonation she can produce with the lips pressed to the mother's breast and the mouth full. She later reproduces this sound as an anticipatory signal at the mere sight of food or a manifestation of a desire to eat in the absence of food or nursing. The nasal murmur is supplied with a labial release; it may also obtain optional vocalic support when the mouth is free (Jakobson 1962). According to folk belief and scholarly works, these words are among the first word-like sounds made by babbling babies. Infants may simply associate the first sound they can make with the first people they see, i.e., their parents. More likely, parents tend to associate these sounds with themselves and subsequently employ them as part of their baby-talk vocabulary. Some believe that the bilabial stops /p/ and /b/ and the bilabial nasal-murmur /m/ are among the easiest consonants to pronounce.
The child's words are not standard language. The terms mama (mum), papa (dad), Greek mpampa (pronounced [baba]; papa; dad), and the like (dada, kaka, pipi, mimi, mam, miam, etc.) exist in everyday Modern Greek but do not appear in any Ancient Greek text or dictionary. Using the mama and papa paradigm to explain names such as Odysseus and Penelope requires a dangerous logical jump. Judging from the frequencies of the phonemes /m/, /p/, and /b/ in various languages, these are not the easiest consonants to pronounce. In the English vocabulary, R, N, T, S, L, or C, and the corresponding phonemes are more frequent than P, M, or B. The vowels A and U are also less frequent than E and I. The Greek equivalent of dad ([baba]) is written with the digraph MP because the Greek alphabet has no single letter for /b/. The letter B and the digraph MP are among the rarest consonants by far. The voiced bilabial nasal /m/, voiceless bilabial stop /p/, and voiced bilabial stop /b/ are all articulated with closed lips. The open unrounded vowel A (/a/ or /ɑː/; from mama) and the open-mid back unrounded U (from mum) require the mouth fully open in most languages. Therefore, pronunciation of the syllables /ma/, /pa/, or /ba/ needs the lips and mouth to move from a completely closed position to wide-open. Note that baby words usually consist of a single syllable repeated for emphasis.
The child thus emits a signal meaning 'I need food' in the visual form of 'I open my mouth,' or 'my mouth is open and empty, please fill it,' accompanied by sound to attract the adult's attention. The syllables are utterly iconic signifiers, transmitting the signified (need to eat) in visual and auditory modalities. Such gestures and sounds represent babies' innate (instinctive) behavior in response to hunger. They occur around the globe and are imitated by the parents, who interpret them as they please. Because this behavior is innate, it is almost certain that the same gestures and sounds were performed by proto-human babies, perhaps even by babies of ancestral Homo species that could produce voice. But I cannot invoke this mechanism to explain Homer's epics.
A language deprivation experiment consists of isolating infants before they speak and observing which natural language they develop. Although scientifically sound, this test with human subjects is not allowed by any ethical standards today. However, several kings and emperors with religious or political motivations have reportedly performed such experiments in the past. Most of the time, any observed baby language met the experimenter's preconceived ideas. From his famous research, Pharaoh Psamtik I of Egypt (also known as Psammetichus; 664–610 BC) deduced that children naturally speak Phrygian[5] (Herodotus 1920). According to King James IV of Scotland (1473–1513), Scottish children raised by a dumb woman on the islet of Inchkeith in 1493 naturally spoke good Hebrew (Lindsay 1814). The island was then presumably uninhabited but was used four years later, from 1497, as a quarantine place for people suffering from syphilis ("Inchkeith" n.d.). Frederick II, Holy Roman Emperor of Hohenstaufen (Sicily, 1211-1250 AD), saw his subjects dying before pronouncing a word. Perhaps the most reliable account is an experiment by the third Mughal emperor Abu'l-Fath Jalal-ud-din Muhammad Akbar (1542–1605 AD), also known as Akbar the Great, a distinguished Indian-Muslim politician, reformer, and intellectual. Akbar isolated 30 children from language to repeat Psamtik's experiment at a larger scale and apply the laws and customs of the country whose language the children would speak. But he observed that, after four years of isolation, all of the children who survived were dumb[6] (Beveridge 1888). This account is consistent with the old observation, traced back to Kaykavus' Classical Persian poem Qabus Nama (c. 1080), that people with congenital loss of hearing are unable to speak[7] (Kaykavus 1886). There is no natural, innate speech; it is all a social construct. We imitate what we hear.
Without claiming any social science expertise, common sense suggests that other conventional social protocols – governance, ethics, law, market, fashion, technology, etc. – are established by more-or-less conscious, rational mechanisms. The 'demonstrator' proposes while the observers accept or reject. The demonstrator decides how to represent a signified symbolically. A politician proposes a policy for better living; a scientist proposes a theory to explain a phenomenon; a manufacturer offers a product at a price; a designer proposes a prototype solution for a purpose; a poet proposes a word for a concept; and so on. Ideas and symbols get established, persist, and die out depending on social needs and environmental circumstances. This mechanism is somewhat analogous to Darwinian adaptation by natural variation and selection transposed to cultural entities. An astrophysical term will probably never make it to an agricultural community; probably, hydraulics specialists will also ignore it. The name of an ancient agrarian technique will be forgotten in modern urban environments. But cultural events occur at a completely different time-scale than heritable variation and natural selection.
If demonstrators and observers belong to the same species, their brains work in roughly the same way. Symbolic coding that is not intuitive to the observers – i.e., requires lengthy reasoning to be understood – will not work. But trial and error will soon find a working symbol intuitively followed by the observers. Symbols are subject to constant adjustment and frequent failure. Such conventions are not arbitrary (random) in the sense that 'anything will do.' Only symbols that mean something to the observer are accepted and copied.
What symbolic sound can describe a straight line that makes no sound; the strings of an instrument that make different sounds; or television that speaks all the time? I cannot suggest any conscious, rational, and describable mechanism for transcribing sememes directly into phonemes. However, I am indicating herein multiple arrangements for the transcription of basic sememes into a minimal set of graphemes, which we may recombine to describe any imaginable concept. Then, these few graphemes can be rationally transcribed into phonemes and converted into intuitive speech by demanding but rewarding practice. This possibility leads us to the alternative, graphocentric model of word-formation.
The first part of that graphocentric theory (sememe > grapheme > phoneme) should not be difficult to demonstrate by experiment. Children can spontaneously put together a set of signifiers from their minds onto a piece of paper as soon as they get hold of drawing materials. They are not necessarily able to describe the same signifiers in words. Man has been drawing and engraving since the Lower Palaeolithic era (500000 years ago; Fig. 4), long before phonetics can be documented. We can also easily imagine how drawing can be abstracted and converted into standardized glyphs with meaning. A straight line may be represented with I (|), a round object with O. A distance, or something thicker than a line, may be shown as the space between two consecutive straight lines (|-| > I-I > H). A zig-zag line, meaning left-right-left-right, runs vertically and looks like the Greek letter Sigma (Σ). A horizontal zig-zag form with up and down bends, like a wave, a denture, etc., may be represented with the Phoenician letter Šīn (∨∨; W), the Latin M or W. The Greek letter Lambda (Λ) points upwards while V points downwards. A parenthesis-like grapheme resembles a smiling mouth as in a smiley (Fig. 5). The Phoenician letter Pē (see section P) would do the job. It inherits the semantic value of mouth along with everything else that a mouth may evoke: a hole, drink, food, breathing, speech, lips, the bilabial phoneme /p/, the papa-person who feeds us, etc.
Figure 4. Homo erectus drawing with geometric incisions on shell c. 500,000 BP. Naturalis Biodiversity Center, Netherlands. Artwork by Henk Caspers / Naturalis Biodiversity Center; Creative Commons license.
The second part of the hypothesis is more challenging: how do we transcribe glyphs and associated signified objects into phonemes? This question is not any more difficult to answer than the opposite question required for a speech-to-writing hypothesis: how we transcribe phonemes into lines and letters. At least, the graphocentric model narrows down the problem. Instead of explaining the phonetic transcription of all man's first observations and inventions, we only need to figure out how to pronounce a score of graphemes per language. In other words, how to configure the mouth muscles to evoke each grapheme and produce a sound that evokes a particular muscle configuration corresponding to the grapheme even in the absence of visual contact. It seems we have some answers already in the literature.
Phoneticians classify the vowels along two primary dimensions: height and backness. Vowel height nominally represents the vertical position of the tongue relative to the roof of the mouth. In practice, however, the term refers to the voice's lowest resonance (the first formant, F1), which is associated with the tongue's height. For close vowels like [i] and [u], also known as high vowels, the tongue goes close to the palate, high in the mouth. The tongue is positioned low in the mouth for open vowels like [a], also known as low vowels. When looking at our sore throat, the doctor asks us to lower the tongue by making an [a] sound. Height is defined as the inverse of this resonance: the higher the frequency of the first formant, the lower (more open) the vowel (Ladefoged 2006). Vowel backness corresponds to tongue positioning along the horizontal axis, relative to the back of the mouth, during the articulation of a vowel. However, like vowel height, backness is defined in terms of voice resonance (the second formant, F2), not the tongue's position. For front vowels, such as [i], the frequency of this resonance is relatively high, corresponding to a position of the tongue forward in the mouth. For back vowels, such as [u], it is low, consistent with a tongue position towards the back of the mouth.
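The acoustic definitions above can be made concrete with a minimal sketch. It assumes that height is read from the lowest resonance of the voice (commonly called the first formant, F1) and backness from the next resonance (the second formant, F2). The class `Vowel`, the two comparison functions, and the frequency values are illustrative assumptions, not measurements from any source cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vowel:
    symbol: str
    f1_hz: float  # lowest resonance (first formant): higher value = more open vowel
    f2_hz: float  # second resonance (second formant): higher value = more front vowel


def more_open(a: Vowel, b: Vowel) -> bool:
    """True if vowel a is acoustically lower (more open) than vowel b."""
    return a.f1_hz > b.f1_hz


def more_front(a: Vowel, b: Vowel) -> bool:
    """True if vowel a is acoustically further front than vowel b."""
    return a.f2_hz > b.f2_hz


# Rounded, purely illustrative values (assumptions, not measured data).
VOWEL_I = Vowel("i", f1_hz=300.0, f2_hz=2300.0)
VOWEL_U = Vowel("u", f1_hz=300.0, f2_hz=800.0)
VOWEL_A = Vowel("a", f1_hz=750.0, f2_hz=1300.0)

if __name__ == "__main__":
    print("[a] more open than [i]:", more_open(VOWEL_A, VOWEL_I))
    print("[i] more front than [u]:", more_front(VOWEL_I, VOWEL_U))
```

The sketch only compares vowels pairwise; it does not try to place them on an absolute scale, because the boundaries between height or backness categories are a matter of convention.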
The third dimension of vowel classification is roundedness. The variable is named after the rounding of the lips in some vowels. Because lip rounding is easily visible, vowels are identified as rounded based on the articulation of the lips. Acoustically, rounded vowels are distinguished chiefly by a decrease in the second resonance of the voice (the second formant, F2), which makes them sound further back, and, slightly, by a decrease in the first. In most languages, roundedness is not a distinctive feature but a reinforcement of mid to high back vowels. In some languages, such as French and German (with front rounded vowels), roundedness is independent of backness to a certain degree. Still, some phonetic correlation usually exists between roundedness and backness.
There are, of course, further subdivisions and higher dimensions of the phonetic (mouth) space. Because sound and space are continuous variables, phonemes may be grouped or distinguished arbitrarily. A small set of so-called cardinal vowels is used by phoneticians as reference points to map all vowels of all languages. A cardinal vowel is a theoretical vowel sound produced with the tongue in extreme positions, high or low (height), front or back (backness). Daniel Jones systematized the current system in the early 20th century (Jones 1917), but the idea goes back to previous works, notably to The Alphabet of Nature by Alexander John Ellis (1814–1890; Ellis 1845) and Visible Speech by Alexander Melville Bell (1819–1905; Bell 1867). For instance, the vowel of the English feet is proximal to the cardinal vowel 1, [i].
To take the phonetically simplest European example, Modern Greek has a system of five vowel phonemes: /i, u, e, o, a/. The first two have qualities approaching their respective cardinal vowels [i, u], the mid vowels /e, o/ are true-mid [e̞, o̞], and the open /a/ is near-open central [ɐ] (Arvaniti 2007). We observe that the pronunciation (phonetic transcription) of the Greek graphemes I, Y and A matches their shape. The narrow I and the narrowing-down Y give close vowels, [i]. The widening form of the A corresponds to an open vowel, [a]. The grapheme H is (supposed to be, and probably was) pronounced as a long [i], i.e., like I but for a longer time, which fits the hypothesis that H represents a double-I (I-I > H). The vowels /e/ and /o/ have intermediate values for height, i.e., they are half-closed, the former being a front vowel, the latter a back vowel. Their backness values corroborate the hypotheses suggested herein that the grapheme E stands for 'opening' (in the sense of a window, rather than definitely 'open') and 'branching' (outward expansion), therefore having low roundness and backness as a phoneme, [e]. In contrast, the graphemes O and Ω likely stand for roundish or concave forms and correspond to a phoneme with high roundness and backness values, [o].
We, therefore, see some correspondence between mouth configuration (pronunciation) and the shape of vowel graphemes, which is relatively easy to establish in either direction, from phoneme to grapheme or vice versa, although it does require human expertise. The question remains whether one can unconsciously describe natural objects using phonemes directly, or some intermediate abstraction into simple forms is required.
Several experiments show crosstalk between brain regions responsible for visual and auditory stimuli and motion responses. This crosstalk is thought to produce the so-called 'bouba/kiki' effect. A related pathological condition would be synaesthesia (Köhler 1929; Ramachandran and Hubbard 2001; "Bouba/Kiki Effect" 2020). The most obvious example is dance, but the researchers have also argued that subjects systematically assigned the name bouba to rounded shapes and kiki to jagged, angular shapes. Ramachandran and Hubbard suggest that the bouba/kiki effect has implications for the evolution of language because it indicates that the naming of objects is not entirely arbitrary. However, their suggestion requires the bold assumption that today's pathological condition was once the norm in human populations.
These experiments usually present a rounded or jagged shape to subjects and ask them to attribute one of the two predefined names, kiki or bouba, to each form. A potential bias is that angular and spiky forms are reminiscent of the letters K and I (straight lines), while the bouba shape recalls roundish forms like B, O, U, or lowercase A (Fig. 6). Providing both the images and the words makes it impossible to discriminate auditory from visual mechanisms by which the subjects associate words to forms. I asked my colleagues to draw any shape of their choice that best corresponds to the sound [kiki] or the sound [buba]. The drawings for [kiki] were indeed enriched in angular shapes and straight lines, while the [buba] forms mainly presented curves and roundish elements (Fig. 7). Objects drawn after [kiki] were generally smaller than those outlined for [buba], either in drawing scale or size of the signified objects (e.g., bird vs. primate). The shapes different people drew for the same sound had nothing to do with the classical kiki or bouba shapes or with each other.
When the experiment is run in this direction, it introduces a new bias. The sounds [kiki] and [buba] may evoke familiar words and associated meanings. The sound [kiki] evokes the French word riquiqui (also spelled rikiki), meaning ridiculously small, petty, skimpy, cramped. This word derives from an onomatopoetic root, rik-, which evokes a short and sharp sound and forms words expressing accuracy, fairness, strictness, then meanness, greed, or narrowness. The morpheme -ki is also used as a diminutive suffix in Greek. Likewise, the sound [buba] evokes baby, bubble, bubby, bubo or bulb, in English; bébé or poupée, in French; various roundish personages in children's shows and toys worldwide; the Greek boubou (chubby female baby); etc. It is, therefore, challenging to discriminate between the hypotheses of phoneme-to-shape or grapheme-to-shape correspondence using the bouba/kiki type of experiment. Moreover, that particular experiment concerns only the phonemes and graphemes it involves (/k, i, b, o, u, a/; K, I, B, O, U, and A); what about the other phonemes and graphemes?
Figure 6. The classical Kiki and Bouba shapes evoke letter shapes. Artwork by Bendž; annotation by ES; Creative Commons license.
In Saussure's terms, the attribution of a signifier (phoneme) to a signified (natural object) seems arbitrary from the above [buba]/[kiki] experiment. We cannot predict how a subject will interpret phonemes heard out of context. By hearing the sound [kiki], people may draw a snake, a star, or a tree, at their discretion. Upon [buba], they may represent a teddy bear, a cloud, or a vehicle. A phoneme-to-shape transcription cannot be specific unless it is limited to drawing standard graphemes. There are only a few ways to transcribe phonemes using graphemes in various languages (e.g. [kiki] = KIKI or QUIQUI, in French). Expanding a grapheme to a fuller graphic representation of an object is also arbitrary. For example, given the grapheme A, one may imagine a pipe, an angle, a roof, an inverted bull's head, open human limbs, etc. Could the transcription object-to-grapheme (signified-to-signifier) be more predictable?
In a pilot experiment, I showed my friends images of recognizable and non-recognizable objects, including photographs and figurative or abstract drawings, without explaining what these represented. I asked them to describe the forms they saw in each image by choosing three letters from a panel (artificial alphabet) of 45 ancient or modern letter-like forms. The panel included some cursive or purposely rotated/created glyphs. I also explicitly asked them to avoid thinking of the words that the images may evoke in their language and to think instead like a painter who abstracts a visual impression using a given set of pure lines.
According to Saussure's hypothesis of arbitrariness, every letter of the panel has the same probability of being chosen by the subjects for every image shown since linguistic signs are supposed to be orthogonal to their signified objects. There should be no statistically significant association of some letters with pictures. This prediction is counter-intuitive – we would instead expect a significant association of, say, round graphemes with round objects – but the absence of statistical association is precisely what Saussure's hypothesis predicts. Well, as expected by intuition, and contrary to Saussure's prediction, there were several statistically significant associations between image forms and letter shapes.
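The prediction of no association can be checked with a standard chi-square test of independence on a table of counts (rows: image categories; columns: letters chosen). The sketch below is a minimal illustration of that generic test, not the analysis actually performed in the pilot experiment. The function name `chi_square_independence` and the toy counts are hypothetical, and the code assumes every row and column contains at least one choice.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a rectangular table of counts.

    Rows are image categories, columns are letters. Under independence
    (the arbitrariness prediction), the expected count of each cell is
    row_total * column_total / grand_total.
    Assumes no row or column total is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical toy counts, NOT data from the pilot experiment:
# columns are the number of times subjects picked "O" and "I".
toy_counts = [
    [18, 2],   # images of round objects
    [3, 17],   # images of elongated objects
]

if __name__ == "__main__":
    stat, dof = chi_square_independence(toy_counts)
    # For 1 degree of freedom, the conventional 5% critical value is 3.841.
    print(f"chi-square = {stat:.3f}, dof = {dof}")
```

A statistic well above the critical value for the given degrees of freedom would count against independence. With 45 letters and many images the table becomes sparse, so a real analysis would probably need pooled categories or a permutation test instead.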
If the relation between the signifier and the signified is arbitrary, then the signifier does not predict the signified, and the signified does not predict the signifier. If we ask subjects to draw a tree, we will probably collect as many different tree shapes as subjects. Some will draw a Christmas tree, some will emphasize the branches, some will draw a trunk with roundish foliage instead, some will draw a palm tree, and some will put fruits on it. There are many ways to describe a tree, and we cannot predict what the individual designer will choose to represent. In the other direction, suppose the designer opts to figuratively represent a trunk with roots, branches, and foliage. In that case, we have no difficulty recognizing that she has thought of a tree, and even the type of tree she had in mind. Different peoples may have different trees in mind simply because they live in different natural environments, with different trees around. Those residing in Alpine environments see conifers (Δ-shaped Christmas trees), those around the Mediterranean coasts see pines, those in fertile valleys see fruit trees, etc. If they are asked to draw letters from a limited alphabet that describe the form of their model tree, we can predict that those authors will pick different graphemes and form different words for their different tree models. In my preliminary experiment, subjects appear to agree on what letters best describe the forms of various trees and predict, without knowing it, at least in part – under the design of this simple experiment – the words for tree in multiple languages.
Perception (signified > signifier), as well as expression (signifier > signified), are sums of signal and noise. As a communication system, language should reduce noise so that the signal can be perceived and understood. The only convention is the set of allowed graphemes – e.g., the alphabet – and their corresponding phonemes accurately pronounced. We cannot just draw anything we have in mind. We have to choose shapes from that limited set that everybody is trained to understand. What we have in mind does iconically correspond to a real signified thing. But, an object acquires a name only when someone describes it using a standard set of graphemes. Knowing many signifiers (words) in one language, we can understand whole texts and ideas. Knowing the meaning of each morpheme and stem in a word, we can predict the word's meaning. If each character, or short string, is a morpheme and we can trace its primary purpose, then we should be in principle able to predict the meaning of a word, of any term, even the meaning of Ancient Greek proper names.
Figure 7. Drawings produced by subjects hearing the non-words [kiki] (left) or [buba] (right). Each part of the image (a to h) shows the drawings of one participant.
References
[1] The ensemble of the phonemes of a language; by analogy to the biological terms ending in -ome for the entire sets of genes (genome), transcripts (transcriptome), proteins (proteome), metabolites (metabolome), phenotypic traits (phenome), etc., of an organism, considered collectively.
[4] The German term for form is also interpreted as pattern or configuration.
[6] Unable to speak, most typically because of congenital deafness.
[7] ‘Une autre preuve de ce que j'avance est ce fait que tout sourd de naissance est en même temps muet; ne voit on pas que tous les muets sont sourds?’ ('Another proof of what I put forward is the fact that every man who is deaf from birth is at the same time dumb; don't we see that all the mutes are deaf?').