Portion of the Philippine language relations map.

First, a big thanks to everybody for engaging with what I thought was just a simple map to visualize the relationships among Philippine languages. For some, including me, it was an eye-opener. To others, the map was a picture that celebrates our diversity. Whatever this map meant to you, I’m glad that we learned together.

I loved hearing from you. You gave thoughtful comments with lots of interesting facts. This time I’d like to answer your questions. I will also address the comments on the GMAnews online article about the map.

Here goes.

It is not my study.

Yes, I made the map. But the data came from a study by A. Bouchard-Côté and co-authors, who are mathematicians and social scientists from Canada and the United States of America. Their goal was to reconstruct Proto-Austronesian words from modern Austronesian languages [1]. Based on their data, I looked at the relationships among the languages and mapped them out geographically.

Their research is quite impressive. It not only looked at relationships among languages but also made “a first step toward a comprehensive computational model of sound change [2].”

Why is this important? Quentin Atkinson in [2] compared it to the algorithms evolutionary biologists started out with. He said, “compared with the rudimentary models of nucleotide substitution first used by biologists, Bouchard-Côté et al.’s model of sound change is highly sophisticated.” The algorithm of Bouchard-Côté et al. can help historical linguists.

Biologists have benefited from modern computational algorithms. To name a few examples: they refined the story of how humans populated the world, traced the origin of coconut domestication and determined taxonomic relationships among animals.

What did the authors do exactly?

They “used a probabilistic model of sound change and a Monte Carlo inference algorithm (a type of mathematical algorithm) to reconstruct the lexicon and phonology of proto-languages given a collection of cognate sets from modern languages.”

Words are considered cognates when they come from the same root word. A cognate set is a collection of related basic words that share a common root.
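To make the idea of a cognate set concrete, here is a minimal sketch in Python. The handful of words below is a hand-picked sample that I am using for illustration; it is not data taken from the ABVD or from the study, and the function name group_by_meaning is my own. Grouping by meaning only gives candidate sets; in practice, linguists decide which words are truly cognate.

```python
from collections import defaultdict

# Illustrative sample only (an assumption for this sketch), not ABVD data.
# Each record is (language, meaning, word).
records = [
    ("Tagalog", "eye", "mata"),
    ("Cebuano", "eye", "mata"),
    ("Ilocano", "eye", "mata"),
    ("Malay", "eye", "mata"),
    ("Tagalog", "two", "dalawa"),
    ("Cebuano", "two", "duha"),
    ("Ilocano", "two", "dua"),
    ("Malay", "two", "dua"),
]


def group_by_meaning(rows):
    """Group words by meaning; each group is a candidate cognate set."""
    sets = defaultdict(dict)
    for language, meaning, word in rows:
        sets[meaning][language] = word
    return dict(sets)


cognate_sets = group_by_meaning(records)
for meaning, words in cognate_sets.items():
    print(meaning, words)
```

Each printed line is one candidate cognate set: the same meaning, with the word used for it in each language.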

First, the authors gathered words from modern languages that are related to each other. They got these words from a large database, the Austronesian Basic Vocabulary Database (ABVD), which at that time contained 659 languages.

Next, they looked at the phonemes of the words. A phoneme is a basic unit of a language’s phonology, which is combined with other phonemes to form meaningful units such as words or morphemes [wiki].

Then they modeled “the evolution of discrete sequences of phonemes, using a context-dependent probabilistic string transducer. Probabilistic string transducers efficiently encode a distribution over possible changes that a string might undergo as it changes through time [1].”
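To give a feel for what “a distribution over possible changes” means, here is a toy sketch in Python. It is not the authors' model: the rules, the probabilities, the input form and the function names are all invented for illustration. It only simulates changes forward in time by random sampling, while the study works in the harder direction, inferring ancestral forms from modern words.

```python
import random

# Hypothetical rules, invented for this sketch.
# Key: (previous phoneme, current phoneme) -> {output phoneme: probability}.
# "*" as the previous phoneme means "in any context"; "" means the sound is lost.
RULES = {
    ("*", "d"): {"d": 0.7, "r": 0.3},           # d may weaken to r anywhere
    ("a", "d"): {"d": 0.4, "r": 0.6},           # ...and more often after a
    ("*", "S"): {"S": 0.2, "h": 0.5, "": 0.3},  # S may become h or disappear
}


def options(prev, cur):
    """Return the output distribution for `cur` given the preceding phoneme."""
    if (prev, cur) in RULES:
        return RULES[(prev, cur)]
    if ("*", cur) in RULES:
        return RULES[("*", cur)]
    return {cur: 1.0}  # phonemes without a rule are copied unchanged


def evolve(phonemes, rng):
    """Sample one descendant form of a list of phonemes."""
    out = []
    prev = "#"  # word boundary
    for cur in phonemes:
        dist = options(prev, cur)
        choice = rng.choices(list(dist), weights=list(dist.values()))[0]
        if choice:  # an empty string means the phoneme was deleted
            out.append(choice)
        prev = cur
    return out


def sample_forms(phonemes, n, seed=0):
    """Evolve the same ancestral form n times and count the outcomes."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n):
        form = "".join(evolve(phonemes, rng))
        counts[form] = counts.get(form, 0) + 1
    return counts


# Toy ancestral form (an assumption for this sketch), one symbol per phoneme.
print(sample_forms(list("duSa"), 1000))
```

Running it many times shows some descendant forms turning up more often than others, which is the kind of distribution the quote describes.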

Yes, there was a lot of math involved. 

How is all this relevant to your questions?

1. Certain languages were not included in the Austronesian Basic Vocabulary Database (ABVD) at the time of the study.

There are around 1200 modern Austronesian languages, but only 659 of them were in the database. Also, the database is limited to Austronesian languages, so it does not include, for example, Chavacano.

The authors wanted to include more languages to enhance the reconstruction but they were limited by the database.

2.  Some ‘similar’ languages were listed separately. 

For example, Ilonggo and Hiligaynon were listed as different languages in the database.

3. Each language in the database is described by 210 words and these words are known to be stable.

The Ilonggo in Mindanao have the same sets of words as the Ilonggo in Iloilo. This is true also for Ilocano.

Also, languages do not emerge in a short span of time. Researchers say that a new language is formed every 1600 years for each lineage in the Philippines [3]. 

4. Since the relationships are based on a lot of mathematics, I could not just include a language that was not part of the study.

The languages went through a rigorous mathematical analysis that enabled the researchers to establish the relationships. Adding another language that did not go through the same process would disrespect the science and the internal logic that made the relationships possible. It would invalidate the map.

Correcting misconceptions

Perhaps one of the hardest to correct is the idea that the Philippine languages came from Malaysia and Indonesia because that is what’s being taught to us in school. I guess this is the reason why GMAnews placed it as a heading. 

I’ll give you a snippet of one of my future posts with the image below. I hope to convince you that our languages did not come from Malaysia and Indonesia.


Austronesian expansion from [3]. I placed the Proto-Malayo-Polynesian label.

The node I am pointing to is the Malayo-Polynesian node. Notice that the parent of the Philippine languages was the first to split off at the Malayo-Polynesian node. The Austronesian languages of Indonesia and Malaysia are part of the Island South-east Asia group. I hope that it is clear from the image above that our languages did not traverse the path from Malaysia and Indonesia.

Let’s consider two things. 1) If the Philippine languages had descended from Indonesian and Malaysian languages, then they would be subsumed under the Island South-east Asia languages. The plot says they are not. 2) The geographic origin of the Austronesian languages is Taiwan, and the expansion followed a north-to-south path: the Philippines was populated before Island South-east Asia.
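The first point is really a statement about a family tree, and it can be checked mechanically. Here is a small sketch in Python. The tree below is a heavily simplified shape that I assumed from the description above; it is not the full tree in [3], and the node name “Other Malayo-Polynesian” as well as the function names are my own placeholders.

```python
# Simplified, assumed tree (child -> parent); not the full tree from [3].
PARENT = {
    "Formosan (Taiwan)": "Proto-Austronesian",
    "Proto-Malayo-Polynesian": "Proto-Austronesian",
    "Philippine": "Proto-Malayo-Polynesian",
    "Other Malayo-Polynesian": "Proto-Malayo-Polynesian",  # placeholder name
    "Island South-east Asia": "Other Malayo-Polynesian",
}


def ancestors(node):
    """Return the chain of ancestors from `node` up to the root."""
    chain = []
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain


def descends_from(node, candidate):
    """True if `candidate` appears anywhere above `node` in the tree."""
    return candidate in ancestors(node)


print(ancestors("Philippine"))
print(descends_from("Philippine", "Island South-east Asia"))
```

In this simplified tree, the Island South-east Asia group never appears among the ancestors of the Philippine group, which is the argument in point 1.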

Recent analyses of genetic material also point to the same route (click here, here and here).

Is this the final say?

There are competing models, but this is what science is telling us now. So far, this is the best description of the truth given the current evidence.

This is how science works: a never-ending iteration of corrections and investigations.

What else can be studied?

Linguists are having a hard time analyzing the Philippine languages because of word borrowing between neighboring languages.

Researchers said that the Greater Central Philippine language grouping “may be obscured in their analyses due to conflicting signal caused by borrowing between neighboring languages [3].”  Moreover, they said that “the features identifying these subgroups may not be present in the 210-item list of basic vocabulary they have used.”

I highlighted the researchers’ comment because this is where you, a native speaker of a Philippine language, are of great value. There are subtle features that only native speakers can pick up. There are also cultural clues embedded in the cognate words that only a native speaker would know, and these are not incorporated in the algorithm.

Finally, the researchers behind the studies I cited below are all non-Filipinos. This is disheartening. It seems that when it comes to our own languages, non-Filipinos know better than most, if not all, of us.

I’d like to thank you again for reading my blog.

Happy learning!



[1] Bouchard-Côté A, Hall D, Griffiths TL, & Klein D (2013). Automated reconstruction of ancient languages using probabilistic models of sound change. Proceedings of the National Academy of Sciences of the United States of America, 110 (11), 4224-9 PMID: 23401532

[2] Atkinson QD (2013). The descent of words. Proceedings of the National Academy of Sciences of the United States of America, 110 (11), 4159-60 DOI: 10.1073/pnas.1300397110

[3] Gray RD, Drummond AJ, & Greenhill SJ (2009). Language phylogenies reveal expansion pulses and pauses in Pacific settlement. Science (New York, N.Y.), 323 (5913), 479-83 PMID: 19164742