About
Introduction
This tool performs metrical analysis (scansion) of Galician poetry. For each line, it outputs:
- The number of metrical syllables
- The stress pattern, i.e. the positions of the stressed metrical syllables
- The stress pattern excluding extrarhythmic stresses (i.e. stresses that lie outside the main rhythmic pattern of the line)
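To make the three outputs concrete, here is a minimal sketch of a per-line result record. The class name, field names and the "+"/"-" pattern notation are our own assumptions, not GAMA's actual output format, and the values in the usage example are illustrative rather than the analysis of a real line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineScansion:
    """Hypothetical container for the per-line outputs listed above."""

    metrical_syllables: int
    stresses: tuple[int, ...]           # 1-based positions of all stressed metrical syllables
    rhythmic_stresses: tuple[int, ...]  # the same positions minus extrarhythmic stresses

    @property
    def extrarhythmic(self) -> tuple[int, ...]:
        # Stresses that fall outside the main rhythmic pattern.
        return tuple(p for p in self.stresses if p not in self.rhythmic_stresses)

    def pattern(self, rhythmic_only: bool = False) -> str:
        # Render stressed positions as "+" and unstressed ones as "-".
        marked = set(self.rhythmic_stresses if rhythmic_only else self.stresses)
        return "".join(
            "+" if i in marked else "-" for i in range(1, self.metrical_syllables + 1)
        )


line = LineScansion(11, (2, 4, 6, 10), (2, 6, 10))
print(line.pattern())                    # -+-+-+---+-
print(line.pattern(rhythmic_only=True))  # -+---+---+-
print(line.extrarhythmic)                # (4,)
```

Keeping both stress lists lets the extrarhythmic stresses be derived as a difference rather than stored separately.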
GAMA is part of the COMPEL project.
How it works
Metrical analysis is based on the JUMPER library by Marco Remón & Gonzalo (2021), which performs scansion without syllabification. However, JUMPER was designed for Spanish. To make it handle Galician, we made the following modifications:
- The diphthong list and the list of lexically unstressed words were adapted to Galician. For this, we consulted works such as Carballo Calero (1966) and Freixeiro Mato (2006).
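As a rough illustration of what a list of lexically unstressed words is for, the sketch below filters out the words of a line that cannot carry lexical stress. The word set is a small hypothetical excerpt (articles, preposition + article contractions, prepositions, conjunctions, clitic pronouns), not GAMA's actual list, and `stressable_words` is our own name; JUMPER's real use of the list is not shown here.

```python
import re

# Hypothetical excerpt of a list of lexically unstressed Galician words.
# The list used by GAMA is larger and is not reproduced here.
UNSTRESSED = {
    "o", "a", "os", "as",                              # articles
    "do", "da", "dos", "das", "no", "na", "nos", "nas",  # contractions
    "de", "en", "con", "por",                          # prepositions
    "e", "ou", "que", "se",                            # conjunctions
    "me", "te", "lle", "lles",                         # clitic pronouns
}


def stressable_words(line: str) -> list[str]:
    """Return the words of a line that can carry lexical stress."""
    words = re.findall(r"\w+", line.lower())
    return [w for w in words if w not in UNSTRESSED]


print(stressable_words("A luz do mar e a noite"))  # ['luz', 'mar', 'noite']
```

A plain lookup like this cannot resolve homographs whose stress depends on their function in the sentence, which is one reason such lists have to be curated per language.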
- We developed a preprocessing module that transforms the input text before it is passed to the metrical analysis. It does two things:
  - First, it brings historical spelling closer to current Galician (following the ILG/RAG norm). This is not required for scansion, but it can be useful for downstream work with the text (e.g. part-of-speech tagging or dependency parsing with a model trained on current Galician).
  - Second, for scansion with JUMPER only, it creates a version of the text whose accent marks follow Spanish orthographic conventions more closely. This helps JUMPER in cases where Galician and Spanish stress-marking conventions differ: current Spanish has very explicit orthographic stress rules, so the prosodically stressed syllable of a word can be determined unambiguously from its spelling. Galician orthography is somewhat more ambiguous in this respect.
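The two preprocessing outputs can be sketched as a function returning a pair of strings. Both rules below are hypothetical stand-ins chosen for illustration, not GAMA's implementation (which uses a trained contextual normalization model, see Credits). For the first step, we assume that older texts write preposition + article contractions with an apostrophe (e.g. *d'a* for *da*). For the second step, we use one case where the conventions clearly diverge: words such as *cantei* or *papeis* are stressed on the final syllable but written without an accent mark in Galician, whereas Spanish rules would read an unaccented word ending in a vowel or *-s* as stressed on the penultimate syllable, so the sketch adds the accent Spanish would require.

```python
import re

VOWELS = "aeiouáéíóúAEIOUÁÉÍÓÚ"
ACCENTED = set("áéíóúÁÉÍÓÚ")
ADD_ACCENT = str.maketrans("aeiouAEIOU", "áéíóúÁÉÍÓÚ")

# Hypothetical normalization rule: apostrophe contractions such as d'a, n'os
# are joined into the modern forms da, nos.
APOSTROPHE_CONTRACTION = re.compile(r"\b([dnDN])['’]([oaOA][sS]?)\b")

# Polysyllabic word ending in an unaccented falling diphthong, optionally + s.
# The stem must contain a vowel followed by at least one consonant, which
# rules out monosyllables such as "foi" or "dous".
FINAL_DIPHTHONG = re.compile(
    rf"^(?P<stem>.*[{VOWELS}][^{VOWELS}]+)"
    r"(?P<v1>[aeiouAEIOU])(?P<v2>[iuIU])(?P<s>[sS]?)$"
)


def to_spanish_accents(word: str) -> str:
    """Add the accent Spanish spelling would require on a final falling diphthong."""
    if any(ch in ACCENTED for ch in word):
        return word  # already marked: leave unchanged
    m = FINAL_DIPHTHONG.match(word)
    if m is None or m["v1"].lower() == m["v2"].lower():
        return word
    v1, v2 = m["v1"], m["v2"]
    if v1.lower() in "iu":
        v2 = v2.translate(ADD_ACCENT)  # iu / ui: Spanish marks the second vowel
    else:
        v1 = v1.translate(ADD_ACCENT)  # ai, ei, oi, au, eu, ou: mark the open vowel
    return m["stem"] + v1 + v2 + m["s"]


def preprocess(line: str) -> tuple[str, str]:
    """Return (normalized text, Spanish-style accented text used only for scansion)."""
    normalized = APOSTROPHE_CONTRACTION.sub(r"\1\2", line)
    for_scansion = re.sub(r"\w+", lambda m: to_spanish_accents(m.group()), normalized)
    return normalized, for_scansion


print(preprocess("cantei d'a noite"))  # ('cantei da noite', 'cantéi da noite')
```

Only the second string would be handed to the scansion step; the first keeps Galician conventions for any downstream use. The diphthong rule is deliberately narrow (it skips, for instance, words like *Uruguai* whose final diphthong follows another vowel).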
Note that JUMPER itself is very fast, as it does not rely on part-of-speech tagging or other heavy preprocessing. Most of GAMA's processing time is spent on our own preprocessing before scansion.
Credits
The developers are Pauline Moreau and Pablo Ruiz Fabo (PI).
The code is available on GitHub.
Metrical analysis relies on JUMPER's algorithm, as described above.
The vocabulary for contemporary Galician combines entries from the dictionaries in Linguakit (Gamallo et al., 2018) and Apertium (Forcada & Tyers, 2016).
A model for contextual spelling normalization was trained on texts from the Nós project corpus (Gamallo et al., 2024).
The work is supported by the European Union (Grant ID 101149659 MSCA-PF 2023).
How to cite
Moreau, P. & Ruiz Fabo, P. (2025). GAMA web: Interface for the metrical analysis of Galician poetry. CiTIUS - Universidade de Santiago de Compostela.
References
- Carballo Calero, R. (1966). Gramática elemental del gallego común. Vigo: Galaxia.
- Forcada, M. L., & Tyers, F. M. (2016). Apertium: a free/open source platform for machine translation and basic language technology. In Proceedings of the 19th Annual Conference of the European Association for Machine Translation: Projects/Products. Riga, Latvia.
- Freixeiro Mato, X. (2006). Gramática da lingua galega I - Fonética e fonoloxía. Vigo: Edicións A Nosa Terra.
- Gamallo, P., Garcia, M., Piñeiro, C., Martínez-Castaño, R., & Pichel, J. C. (2018). LinguaKit: a Big Data-based multilingual tool for linguistic analysis and information extraction. In Fifth Conference on Social Network Analysis, Management and Security, pp. 239-244. Available at IEEE Xplore.
- Gamallo, P., Rodríguez, P., Paniagua, S., Bardanca, D., Pichel, J. R., & Garcia, M. (2024). Open Generative Large Language Models for Galician. Procesamiento del Lenguaje Natural, 73, pp. 259-270. Available at SEPLN.
- Marco Remón, G., & Gonzalo, J. (2021). Escansión automática de poesía española sin silabación. Procesamiento del Lenguaje Natural, 66, pp. 77-87. Available at SEPLN.