A talk by an alumnus of the LI programme, who applies fairly powerful NLP methods (semi-automatic construction of lexical graphs) to a problem in historical linguistics. It takes place next Wednesday, 11 March, at the CRLAO, from 4 pm to 6 pm, at INaLCO, Salle des Plaques, 2, rue de Lille, 75007 Paris.
Pierre Magistry (Paris Diderot):
Stratification du réseau lexical du hokkien de Taïwan (Stratification of the lexical network of Taiwanese Hokkien)
Taiwanese Hokkien shows strong evidence of multiple historical
layers of borrowing in the pervasiveness of 「多音字」 duoyinzi,
sinograms with multiple readings. For a given sinogram, traditional
analysis distinguishes between wenduyin 文讀音 and baiduyin 白讀音 (so-called
"literary" and "colloquial" readings), but it is widely accepted that more
than two strata exist and need to be described.
We will both stress the limits of this two-way analysis and explain how we can still benefit from it.
Fortunately, a large amount of lexical data for Taiwanese Hokkien is
available as Open Data. We will propose a method to model and explore
the Taiwanese lexicon (semi-)automatically in search of such strata.
Our method is based on modelling the lexicon as a complex network.
We will first introduce the theoretical background that our model requires
(no prior knowledge of graph theory is needed to attend this
presentation). We will then explain how graph theory lets us
model the lexical data as a graph that takes into account
the traditional analysis of 文讀音 and 白讀音. Once the model is built, we
will show how it can be analysed and explored in search of lexical
strata using community detection and advanced visualisation tools.
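
To give a rough idea of what "lexicon as a graph" can mean, here is a minimal sketch in Python. It is not the speaker's method: the abstract does not say how nodes and edges are defined, so everything below is an assumption. Nodes are hypothetical (sinogram, reading) pairs, an edge links two pairs that co-occur in a word, and plain connected components stand in, very crudely, for a real community-detection algorithm. The five entries in `WORDS` are typed by hand for illustration and are not taken from the Open Data resources mentioned above.

```python
from collections import defaultdict

# Hypothetical toy entries (hand-typed, not from any dataset):
# each word is a list of (sinogram, reading) pairs.
WORDS = [
    [("大", "tāi"), ("學", "ha̍k")],
    [("學", "ha̍k"), ("生", "sing")],
    [("人", "jîn"), ("生", "sing")],
    [("大", "tuā"), ("人", "lâng")],
    [("大", "tuā"), ("山", "suann")],
]


def build_graph(words):
    """Undirected graph whose nodes are (sinogram, reading) pairs.

    Assumption: two pairs are linked when they co-occur in a word.
    """
    graph = defaultdict(set)
    for word in words:
        for i, a in enumerate(word):
            graph[a]  # register the node even if it ends up isolated
            for b in word[i + 1:]:
                graph[a].add(b)
                graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Group nodes reachable from one another (depth-first search).

    This is only a stand-in for community detection: real lexical
    data would be far too densely connected for it to be useful.
    """
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


graph = build_graph(WORDS)
strata = connected_components(graph)
for k, group in enumerate(strata):
    print(k, sorted(group))
```

On this toy input the two readings of 大 fall into different groups, which is the kind of separation one would hope a stratum-finding method recovers. On a full lexicon, a modularity-based or similar community-detection algorithm would have to replace the connected-components step.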