LINGUIST List 9.627

Tue Apr 28 1998

European Minority Languages


Message 2: "Language Resources for European Minority Languages"

Date: Mon, 27 Apr 1998 15:13:44 +0100
From: Briony Williams <briony@cstr.ed.ac.uk>
Subject: "Language Resources for European Minority Languages"

                      Pre-final programme and
                      Call for participation

                          Workshop on

        "Language Resources for European Minority Languages"
        ----------------------------------------------------

           Wednesday May 27 1998 (morning), Granada, Spain

        In association with the First International Conference on
     Language Resources and Evaluation, May 28-30 1998, Granada, Spain

PROGRAMME:

 8:00  Registration

 8:30  Welcome and Introduction

 8:40  "Overview of minority languages in Europe".  Marc Alemany (Catalan
       Sociolinguistic Institute).

 9:00  "VOCATEL and VOGATEL: Two Telephone Speech Databases of Spanish Minority
       Languages (Catalan and Galician)".  Luis Villarrubia, Paloma Leon, Luis
       Hernandez (Speech Technology Group, Telefonica I&D, Madrid, Spain);
       Climent Nadeu, Ignasi Esquerra, Javier Hernando (Dept. TSC, Universitat
       Politècnica de Catalunya, Barcelona, Spain); Carmen Garcia-Mateo, Laura
       Docio (ETSIT de Telecomunicación, Universidad de Vigo, Vigo, Spain).

 9:20  "Written Linguistic Resources in Catalan: the DCC Project". Joan Soler
       Bou (Institut d'Estudis Catalans, Barcelona, Spain).

 9:40  "The MELIN project". Donncha Ó Cróinín (Institiúid Teangeolaíochta
       Éireann/Linguistics Institute of Ireland, Dublin, Ireland).

10:00  COFFEE

10:30  "A framework for the automatic processing of Basque". I. Aldezabal,
       O. Ansa, J.M. Arriola, A. Díaz de Ilarraza, N. Ezeiza, A. Maritxalar,
       M. Oronoz, K. Sarasola (Euskal Herriko Unibertsitatea, Spain); I. 
       Aduriz, M. Urkia (UZEI, Donostia, Spain).

10:50  "Towards the creation of new Galician language resources: From a printed
       dictionary to the Galician WordNet".  Fernando Magan (Ramón Piñeiro
       Research Center for Humanities, Santiago de Compostela, Spain).

11:10  Poster Session 1  (authors of odd-numbered posters present)

11:50  Poster Session 2  (authors of even-numbered posters present)

12:30  Plenary

13:30  End

==========================================================================

                             Poster papers
                             -------------

 1  "A tagger environment for Galician". M. Vilares, J. Graña (Universidad de
    Corunna, Spain); T. Araujo, D. Cabrero, I. Diz (Ramón Piñeiro Research
    Center for Humanities, Santiago de Compostela, Spain).

 2  "A bilingual Spanish-Catalan database of units for concatenative synthesis".
    I. Esquerra, A. Bonafonte, F. Vallverdú, A. Febrer (Universitat
    Politècnica de Catalunya, Barcelona, Spain).

 3  "Methods and tools for building the Catalan WordNet". L. Benítez, S.
    Cervell, G. Escudero, M. López, G. Rigau, M. Taulé (Universitat
    Politècnica de Catalunya, Barcelona, Spain; Universitat de Barcelona).

 4  "Lemmatisation of the corpus of Cornish". J. Mills (University of Luton,
    England, UK).

 5  "SpeechDat Cymru: A large-scale telephony Welsh database". R.J. Jones,
    J.S. Mason (Univ. of Wales, Swansea, Wales, UK); L. Helliker, M. Pawlewski
    (BT Labs, Ipswich, England, UK).

 6  "KGB Project: Tools and resources for Breton language learning". J. Siroux,
    H. Gourmelon, G. Mercier, J-P. Messager (ENSSAT, Lannion, France).


 7  "A speech database in Basque language". K. López de Ipiña, I. Torres,
    L. Oñederra (Euskal Herriko Unibertsitatea, Spain).

 8  "An overview of the existing language resources for 'Gallego'".
    C. García-Mateo (Universidade de Vigo, Spain); M. González-González
    (Universidade de Santiago, Spain).

 9  "Language standardisation and linguistic resources: The case of Central
    Ladin (Dolomites)". F. Ciochetti (Istitut Ladin, Vigo di Fassa, Italy);
    F. Pianesi (IRST, Trento, Italy).

10  "The LE-PAROLE project and the National Corpus of Irish". D. Ó Cróinín,
    E. Uí Dhonnchadha (Institiúid Teangeolaíochta Éireann/Linguistics
    Institute of Ireland, Dublin, Ireland).

11  "Design of a phonetic corpus for speech recognition in Catalan".
    I. Esquerra, C. Nadeu (Universitat Politècnica de Catalunya, Barcelona,
    Spain); L. Villarrubia, P. León (Telefónica Investigación y Desarrollo,
    Madrid, Spain).

12  "Levels of annotation for a Welsh speech database for phonetic research".
    B. Williams (University of Edinburgh, Scotland, UK).

-----------------------------------------------------------------------------

WORKSHOP SCOPE AND AIMS:

The minority or "lesser used" languages of Europe (e.g. Basque, Welsh,
Breton) are under increasing pressure from the major languages.  Some of them
(e.g. Gaelic) are becoming endangered, while others (e.g. Catalan) are in a
stronger position, with a certain amount of official recognition and funding.
However, the situation with regard to language resources is fragmented and
disorganised.  Some minority languages have been adequately researched
linguistically, but most have not, and the vast majority do not yet possess
basic speech and language resources (such as text and speech corpora)
sufficient to permit the commercial development of products.

If this situation were to continue, the minority languages of Europe would
fall a long way behind the major languages in the availability of
commercial speech and language products.  This in turn would accelerate the
decline of those languages that are already struggling to survive, as speakers
are forced to use the majority language to interact with these products.
To break this vicious circle, it is important to encourage the development of
basic language resources.

The workshop is a very small first step towards encouraging the development of
such resources.  The aim is to share information, so that isolated researchers
will not need to start from nothing.  An important aspect will be the forming
of personal contacts, which at present do not exist.  In particular, the
workshop aims to make it easier for isolated researchers with little funding
and no existing corpora to begin developing a usable speech or text database.
There will be a balance between presentations of existing language resources
and more general presentations designed to give background information.

ORGANISERS:

   Briony Williams    University of Edinburgh, Scotland, UK
   Climent Nadeu      Universitat Politecnica de Catalunya, Catalunya, Spain
   Alex Monaghan      Dublin City University, Ireland