Thursday, 5 March 2015

Talk: Pierre Magistry, Stratification of the Lexical Network of Taiwanese Hokkien

A talk by a former student of the LI program, who applies fairly powerful NLP methods (semi-automatic construction of lexical graphs) to a problem in historical linguistics. It takes place next Wednesday, 11 March, at the CRLAO, from 4 pm to 6 pm, at INaLCO, Salle des Plaques, 2, rue de Lille, 75007 Paris.

Pierre Magistry (Paris Diderot):  
Stratification of the Lexical Network of Taiwanese Hokkien

Taiwanese Hokkien shows strong evidence of multiple historical layers of borrowing, most visibly in the pervasiveness of 「多音字」 duoyinzi, sinograms with multiple readings. For a given sinogram, traditional analyses distinguish between wenduyin 文讀音 and baiduyin 白讀音 (the so-called "literary" and "colloquial" readings), but it is widely accepted that more than two strata remain to be found and described.
We will stress the limits of such an analysis and explain how it can still be useful.
Fortunately, a large amount of lexical data for Taiwanese Hokkien is available as Open Data. We will propose a method to model and explore the Taiwanese lexicon (semi-)automatically in search of such strata.
Our method models the lexicon as a complex network.
We will first introduce all the theoretical notions our modelling relies on (no prior knowledge of graph theory is required to follow this presentation). We will then explain how graph theory lets us model the lexical data as a graph that takes into account the traditional 文讀音 / 白讀音 analysis. Once the model is built, we will show how it can be analysed and explored in search of lexical strata using community detection and advanced visualisation tools.
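The abstract does not say how the graph is built or which community detection algorithm is used, so the following is only a toy sketch of the general idea, under assumptions of our own. Nodes are hypothetical (sinogram, reading) pairs, two nodes are linked when they co-occur in the same word, and a generic label propagation algorithm (swapped in for illustration, not necessarily the speaker's choice) groups readings that tend to appear together. The lexicon `TOY_LEXICON`, its placeholder readings (`-w` for 文讀音, `-b` for 白讀音) and the helper names `build_graph` and `label_propagation` are all hypothetical, not real Hokkien data or an existing API.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy lexicon: each word is a list of (sinogram, reading) syllables.
# Readings are placeholders, NOT real Hokkien data; the suffix records the
# traditional label (-w for 文讀音 wenduyin, -b for 白讀音 baiduyin).
TOY_LEXICON = [
    [("甲", "a-w"), ("乙", "b-w")],
    [("乙", "b-w"), ("丙", "c-w")],
    [("甲", "a-w"), ("丙", "c-w")],
    [("甲", "a-b"), ("乙", "b-b")],
    [("乙", "b-b"), ("丙", "c-b")],
    [("甲", "a-b"), ("丙", "c-b")],
    [("甲", "a-w"), ("乙", "b-b")],  # a mixed word linking the two layers
]


def build_graph(lexicon):
    """Nodes are (sinogram, reading) pairs; edge weights count co-occurrences in words."""
    graph = defaultdict(Counter)
    for word in lexicon:
        for u, v in combinations(set(word), 2):
            graph[u][v] += 1
            graph[v][u] += 1
    return graph


def label_propagation(graph, max_iter=100):
    """Asynchronous label propagation with a fixed node order and min-label tie-breaking."""
    nodes = sorted(graph)
    labels = {node: i for i, node in enumerate(nodes)}
    for _ in range(max_iter):
        changed = False
        for node in nodes:
            # Sum edge weights per label among the node's neighbours.
            scores = Counter()
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            best = max(scores.values())
            new_label = min(label for label, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:  # converged: no label moved during a full pass
            break
    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return sorted(communities.values(), key=sorted)


if __name__ == "__main__":
    graph = build_graph(TOY_LEXICON)
    for community in label_propagation(graph):
        print(sorted(reading for _, reading in community))
    # prints ['a-b', 'b-b', 'c-b'] then ['a-w', 'b-w', 'c-w']
```

On this toy input the two communities recover the traditional literary/colloquial split despite the one mixed word. On real data, the interesting cases would be communities that cut across that two-way split, since they may point to the additional strata mentioned above.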

