Teaching Pronunciation

by Yoshio Okita


A misconception pertaining to teaching pronunciation: A case study of Japan and its implications for teacher training and for teaching pronunciation to ESL/EFL learners whose native language is syllable-timed

This article examines a common misconception among English teachers and learners at Japanese high schools about what it means to teach and learn English pronunciation. It discusses what has caused this misconception, describes it in the light of a recent trend in English education, and, most importantly, considers what needs to be done to put a new concept of teaching pronunciation into practice. The article concludes with a discussion of the implications for teacher training, based on the author’s background as a teaching practitioner and as an inservice and preservice teacher trainer.



Although this article is a case study of language learning in Japan, it may help EFL/ESL instructors in other parts of the world see how pronunciation instruction can be viewed from a communicative perspective. The example here concerns learners who have difficulty producing English words with consonant clusters and/or closed syllables in the stream of speech. Pronunciation difficulties of this kind are common among learners whose native tongues are syllable-timed and open-syllabled.

The following misconception seems to have been held uncritically by some English language teachers for the past few decades and has undoubtedly hindered learners from acquiring basic speaking competence.

Misconception: The purpose of learning pronunciation is 1) to locate the vowel that receives primary stress in a given word and the word that receives major stress in a given sentence, and 2) to distinguish individual vowel and consonant sounds, or phonemes.

The source of the misconception

Entrance Examinations
Since English is one of the core subjects at school, English examinations are given at various times: at mid-term and at the end of each term in high school, and upon entrance to senior high schools, colleges, and universities. These examinations have frequently included written pronunciation questions, which generally require test-takers to find the stress in a given word or sentence or to compare individual vowel and consonant sounds. Appendices 1 and 2 contain typical examples. (Hereafter, "pronunciation questions" means "pronunciation questions in written form.")

Appendix 1 is from the National Entrance Examination (English Portion) given on January 17, 1996. This examination is administered by the National Center for University Entrance Examinations, a central government organization, and is taken by all high school students who want to enter national, prefectural, municipal, and some private colleges and universities. The examination contains 51 questions, eight of which concern pronunciation. Appendix 2 is from the first National Entrance Examination, given in 1979. Clearly, the way pronunciation questions are set has changed little since then. Even the Type C questions in Appendix 1, in which the word with major stress in a given sentence must be selected, are nothing new: questions of this kind first appeared in the National Entrance Examination in 1985.

Besides the National Entrance Examination, there is another category of entrance examination: the examinations that each private college and university administers on its own. A survey conducted by the author shows that examinations in the private sector usually include much the same kinds of pronunciation questions as those found in Appendices 1 and 2.

Accordingly, what passes for pronunciation instruction in senior high school has consisted only of 1) explaining Daniel Jones’s phonemic symbols using a chart of English vowels and consonants, 2) using minimal-pair exercises, 3) having learners memorize patterns for predicting which vowel in a word receives primary stress, and 4) distinguishing differences between vowel or consonant sounds in given words (see Appendices 1 and 2 for examples). All this is done so that learners can score higher on the pronunciation sections of university and college entrance examinations. As for finding the stressed word in a sentence, teachers approach the issue only from a semantic or structural point of view and usually do not have learners say the sentence aloud. It is no wonder that Cronin (1996:16) states, "Unfortunately, in Japan, many students will have studied [English] for six years in school with little useful practice other than of individual words."

A recent trend in the English examinations

As mentioned above, the entrance examinations for senior high schools used to include pronunciation questions. Today, however, a close look at some of these examinations reveals that the questions have changed. The number of prefectures whose prefectural senior high school entrance examinations include pronunciation questions is decreasing: according to Obunsha (1996), 11 of the 47 prefectures included pronunciation questions in 1996, compared with 18 in 1994.

In Japan there is an English proficiency test administered by STEP (the Society for Testing English Proficiency). In 1995, 3.5 million people (from elementary school pupils to adults) took this test. The test once included pronunciation questions as an important component, but its administrators revised it in 1993 and eliminated all pronunciation questions.

In brief, the entrance examinations for universities and colleges seem to be the cause of the misconception described above. If pronunciation questions were removed from university examinations, as many prefectures and STEP have done, there would be a much broader change in the way pronunciation is taught. It is perhaps unfortunate that recent innovations in senior high school entrance examinations have had no practical effect on institutions of higher education.

Taking a more holistic approach

Shifting towards communicative language teaching (CLT)
It was in the early 1980s that some English education specialists began to discuss the importance of introducing communication-oriented language teaching. In the late 1980s the communicative approach became a buzzword in both education and business. The business community complained that students usually did not have a good enough command of English to communicate in the business world, even after ten years of English study.

In the mid-1980s the Ad Hoc Education Advisory Committee put forward a package of recommendations for educational innovation. Part of the package was a recommendation for future English language education. Following the Committee’s recommendation, the Ministry of Education decided that the English curriculum should be revised. As a result, the English curriculum has been geared more toward communication, and textbook publishers have revised or edited their teaching materials according to the Ministry guidelines. It is now generally believed that CLT is the right track for future English education; it seems that both the education administration and school teachers have agreed on the importance of shifting to CLT. But there seems to be no clear idea of what CLT is all about, much less what approach is necessary in teaching pronunciation in a CLT setting.

Pronunciation instruction in CLT
Hinofotis and Bailey (1980:124–5) reported that, up to a certain level of proficiency, the problem that most severely impairs communication by EFL/ESL learners is pronunciation, rather than vocabulary or grammar. Their findings make pronunciation instruction all the more important for improving learners’ communicative competence.

Wong (1993:45) reminds us that the most relevant features of pronunciation (stress, rhythm, and intonation) play a greater role in English communication than individual sounds do. Teaching speech from the perspective of suprasegmentals therefore seems indispensable in a CLT setting. Learning pronunciation should not be limited to finding primary stress and comparing individual vowel and consonant sounds in a given word, as it has often been in the past. As Yule, Hoffman, and Damico (1987:765) claim, focusing on individual vowel and consonant sounds is only the first step in learning English speech.

It is now widely accepted in Japan that the ultimate goal of language teaching is to encourage learners to acquire communicative competence. This makes it all the more important that pronunciation instruction should be approached holistically, not phonetically, and that teaching pronunciation be directed more from the suprasegmental perspective.

Taking the first step toward change

Because there is such a large gap between the holistic concept of teaching pronunciation and the practice already established in Japanese schools, it would be too difficult to introduce a new methodology all at once. It is therefore worth considering what could be done in Japanese schools to begin pronunciation training that goes hand in hand with the recent trend towards CLT.

First, it is essential for English teachers to realize that, besides the pronunciation of individual sounds or phonemes, the English sound system has other distinctive aspects that are not found in Japanese: English is a stress-timed language, whereas Japanese is a syllable-timed language. As is often the case with syllable-timed languages, Japanese is also an open-syllable language (Japanese words usually end in a vowel). When a Japanese speaker pronounces English, this phonological feature is often transferred, and the result is an added vowel, a process known as epenthesis.

Linking and assimilation
Japanese is predominantly open-syllabled; that is, its syllables end in a vowel, and each syllable receives equal stress. Syllables therefore take one of the following forms: CONSONANT+VOWEL, VOWEL alone, or DIPHTHONG. The exception is the consonant /n/, the only consonant that can occur at the end of a syllable as well as at the beginning. If this system is transferred to the English sentence "Look at the red doll!", the words would sound like /lUkuw/, /eetow/, /D@/, /rEdow/, and /daluw/.2 Every word-final consonant is followed by an additional vowel. This phenomenon is common among English learners whose native tongue is mainly open-syllabled. When vowels are added in this way, the speaker is unlikely to produce linking and assimilation in the right places.

Linking is a phenomenon in which the sounds at the boundary between adjacent words are connected, so that speech flows continuously from one word to the next. Avery and Ehrlich (1992:84) say that linking occurs in the following combinations:

1) Consonants and vowels (e.g., When we pronounce "Come on," we don’t say the two words separately but we say /k@man/ as if they were one word. This is known as "liaison.")

2) Consonants and consonants (e.g., When we pronounce "root beer," the stop consonant at the end of the first word is usually kept unreleased, and there seems to be a pause or a sudden stoppage of breath in between. This is known as "open juncture.")

3) Identical consonants (e.g., When we pronounce "red deer," the two identical and adjacent consonants are pronounced as one long consonant. This is another case of "open juncture.")

Assimilation is a phenomenon in which the articulation of a consonant is altered under the influence of the adjacent consonant that follows it. (For example, when "at this" is pronounced, the alveolar /t/ in "at" becomes dentalized under the influence of the following interdental /D/.)

In each of these cases of linking and assimilation, the first word ends in a consonant. If those final consonants are followed by an added vowel, quite often a tense vowel rather than a reduced schwa, as in the examples above, the speaker cannot produce fluent and comprehensible English speech.

Consider what happens when a fluent, comprehensible speaker says "Look at the red doll!" Pronounced individually, the words are /lUk/, /{t/, /D@/, /rEd/, and /dal/. In normal, comprehensible speech, however, liaison linking occurs between /lUk/ and /{t/, assimilation occurs between /{t/ and /D@/, and open-juncture linking occurs between /rEd/ and /dal/. The whole sentence therefore sounds something like [lUk{t9D@rEd*dal#]. ("9" and "*" denote "dentalized" and "unreleased," respectively.)

The author believes that it is vitally important for Japanese learners of English (and others whose native language tends to be of the open-syllable type) to practice linking and assimilation, for three reasons. First, because these two features do not exist in their first-language sound system, learners cannot easily produce the correct English sounds. As a result, as Yule et al. (1987) imply, they have a hard time perceiving, let alone comprehending, phrases or sentences they cannot produce.

Second, by practicing these two features, students will become aware of a basic difference between English and Japanese: English is stress-timed and Japanese is syllable-timed. Students may then become interested in other features of the English sound system, such as rhythm and intonation, both of which are profoundly influenced by stress timing.

Third, mastery of liaison and juncture serves as a useful test of whether learners have learned to produce individual English consonants correctly, which is the necessary first step in learning English pronunciation. When consonants are pronounced with an additional vowel, linking and assimilation are not possible.

English teachers should begin by helping learners realize how important it is to practice linking and assimilation in English speech. Such a realization is important not only for improving learners’ oral production of English but also for improving their listening comprehension.

Implications for teacher training

The preceding sections have argued that, in order to improve learners’ pronunciation, learners and teachers need to be aware that

1) communicative effectiveness can best be promoted through a suprasegmental rather than a segmental approach to pronunciation learning, and that

2) it is practical to begin teaching linking and assimilation in English early on.

It is important for teachers to raise their students’ awareness. But what can raise teachers’ awareness in the first place? Inservice and preservice teacher training courses can ensure that teachers are aware of the state of the art and confident in their teaching.

Inservice teacher training
Japan has 47 prefectures and 11 jurisdictional municipalities. A local inservice teacher training center has been established in each of these administrative divisions, and each center offers organized training courses. Much of the achievement of Japanese school education is due to this inservice teacher training system. For example, novice English teachers are required by ordinance to attend a year-long course during their first year on the job.

The problem with inservice training is that English teachers, like other teachers, find themselves too busy with daily school duties to attend the courses provided, even though the ordinance grants them the right to attend. It was once said that the English teaching curriculum had been revised too drastically for trainers to keep up and that trainers were groping in the dark. That no longer seems to be the case: every training center now has well-qualified teacher trainers who provide well-organized, high-quality courses and programs. The real problem, then, is that most teachers do not have enough time to gain new insights at the center, even if they want to. The administration needs to ensure that teachers have the time, as well as the opportunity, to attend inservice courses.

Preservice teacher training
To get a high school teaching credential, students attend four-year colleges or universities and take a required program. The core courses are "English Teaching Methodology," which is usually a four-credit course and lasts one year, and "Teaching Practice," which is a two-week course in which students do practical teaching with the help of practitioners in an authentic classroom setting. "English Teaching Methodology" consists of theoretical and practical study in which students are engaged in micro-teaching in a laboratory setting.

The problem with preservice training lies with the instructors who teach "English Teaching Methodology." Because TESOL is a relatively recent field of study in Japan, there are not enough instructors trained and knowledgeable in that area, so in some cases the course is taught by non-TESOL specialists.


The misconception about pronunciation instruction undoubtedly stems from the type of pronunciation questions used in university and college entrance examinations. Many teachers still adhere to the kind of pronunciation practice that predominated in the audiolingual era, which is inconsistent with the recent trend towards CLT. Although a suprasegmental approach is what today’s language learning setting seriously needs, allowance should be made for a transitional, more practical approach, and instruction emphasizing linking and assimilation should be introduced. This approach is essential for learners whose native language is of the open-syllable type. At the same time, it is important to raise awareness among everyone involved in English language education of the need to implement this approach in daily classroom activities.


(Special thanks go to Dr. Schaefer at Temple University Japan for valuable advice in completing this article.)

Appendix 1

From the "Center Entrance Examination" (English Portion) given on January 17, 1996. The directions were originally in Japanese and have been translated into English by the author.

[A] Choose the one word in each group whose primary stress falls on a different syllable from that of the other three.
Q1 1) art-ist 2) as-pect 3) boy-cott 4) ca-nal
Q2 1) con-fi-dent 2) del-i-cate 3) po-et-ic 4) sen-si-tive
Q3 1) ad-mi-ra-ble 2) ap-pro-pri-ate 3) com-pli-cat-ed 4) nec-es-sar-y
[B] Choose the two of the four underlined parts that are pronounced with the same sound.
Q1  In the Museum of Architecture you have a chance to see models of machines used for building arches.
Q2  I bought a house in the southern part of France for my cousin.
[C] In the following dialogue,
Q1  Choose which one of the four underlined words is pronounced more strongly than the other three.
A:  May I take your order now?
B:  Yes, please.  I'll have today's special and a cup of coffee.
A:  Would you like your coffee right away?
B:  No thanks. I'd like it later, please.
Q2  Choose which one of the four underlined words is pronounced less strongly than the other three.
A:  Hey! Your baseball just broke my window.
B:  I'm sorry.
A:  You have to be more careful when you play ball.
B:  I will from now on.

Appendix 2
From the "Standard Primary Examination" (English Portion) given on January 15, 1979. The directions were originally in Japanese and have been translated into English by the author.

[A] Choose the one word in each group whose underlined part is pronounced differently from the other three.
Q1 1) power 2) tour 3) flour 4) tower
Q2 1) rough 2) touch 3) glove 4) prove
Q3 1) sew 2) motion 3) foreign 4) blow
Q4 1) shoe 2) truth 3) group 4) throw
Q5 1) receive 2) friend 3) says 4) bread
Q6 1) edge 2) soldier 3) gaze 4) adjust
Q7 1) shepherd 2) triumph 3) elephant 4) philosophy
Q8 1) desire 2) desert 3) disease 4) descend
Q9 1) answer 2) persuade 3) sweet 4) swift
Q10 1) silly 2) assure 3) science 4) passive
[B] Choose the one word in each group whose primary stress falls on a different syllable from that of the other three.
Q1 1) or-i-gin 2) oc-cur 3) lim-it 4) of-fer
Q2 1) in-stru-ment 2) cal-en-dar 3) at-mos-phere 4) ad-vise
Q3 1) ca-nal 2) de-moc-ra-cy 3) char-ac-ter 4) suc-cess
Q4 1) mu-si-cian 2) ne-ces-si-ty 3) au-thor-i-ty 4) pho-to-graph
Q5 1) man-age 2) con-nect 3) o-blige 4) re-veal
Q6 1) a-tom-ic 2) dif-fer-ent 3) se-ri-ous 4) vi-ol-ent
Q7 1) ac-ci-dent 2) ma-chin-e-ry 3) res-tau-rant 4) tel-e-phone
Q8 1) mar-riage 2) mys-ter-y 3) ben-e-fit 4) ex-ist-ence
Q9 1) ad-ven-ture 2) can-di-date 3) ter-ri-ble 4) pol-i-tics
Q10 1) fa-mil-ial 2) im-me-di-ate 3) lit-er-a-ture 4) a-bil-i-ty

Yoshio Okita teaches at the Preservice Teacher Education Course, Kwansei Gakuin University.




