By Anna Feldman
Whereas supervised corpus-based methods are highly accurate for many NLP tasks, including morphological tagging, they are difficult to port to other languages because they require resources that are expensive to create. As a result, many languages have no realistic prospect of morpho-syntactic annotation in the foreseeable future. The method presented in this book aims to overcome this problem by significantly limiting the necessary data and instead extrapolating the relevant information from another, related language. The approach has been tested on Catalan, Portuguese, and Russian. Although these languages are only relatively resource-poor, the same method can in principle be applied to any inflected language, as long as an annotated corpus of a related language is available. The time needed to adjust the system to a new language is a fraction of the time needed for systems with extensive, manually created resources: days instead of years. This book touches upon a number of topics: typology, morphology, corpus linguistics, contrastive linguistics, linguistic annotation, computational linguistics and Natural Language Processing (NLP). Researchers and students who are interested in these scientific areas, as well as in cross-lingual studies and applications, will greatly benefit from this work. Scholars and practitioners in computer science and linguistics are the prospective readers of this book.
Similar study & teaching books
This book develops a framework for discussing primary school teachers making changes to their understandings and practices. The framework has been developed to allow the complexity of external and internal aspects of change processes to be explored in a holistic way. External factors influencing teachers include increased specification of the curriculum, changing demands for types of pedagogy and a rhetoric of Lifelong Learning.
The architectural crit, review or jury is a cornerstone of architectural education worldwide. The defence of ideas, drawings, and models in an open format before staff and peers is intended to be a forum for healthy creative debate, but many students view it as a hostile confrontation: an ego trip for staff and humiliation for them.
This book is the first volume of an intensive "Russian-style" two-year graduate course in abstract algebra, and introduces readers to the basic algebraic structures (fields, rings, modules, algebras, groups, and categories) and explains the main principles of and methods for working with them. The course covers substantial areas of advanced combinatorics, geometry, linear and multilinear algebra, representation theory, category theory, commutative algebra, Galois theory, and algebraic geometry, topics that are often overlooked in standard undergraduate courses.
- The Teacher Monologues: Exploring the Identities and Experiences of Artist-Teachers
- Lecture Notes on Algebraic K-Theory
- Issues in Mathematics Teaching (Issues in Subject Teaching)
- Gendered Identities and Immigrant Language Learning (Critical Language and Literacy Studies)
- Nonkilling History: Shaping Policy with Lessons from the Past
Extra info for A Resource-Light Approach to Morpho-Syntactic Tagging
Chapter 2. Common tagging techniques
This algorithm predicts all morphological categories independently; moreover, the prediction is based on the ACs rather than on the previously predicted values. Thus, the tag suggested by the EXP tagger does not have to be an element of the list of tags returned by the morphological analyzer for the given word. That is why the purely subtag-independent strategy is modified by the so-called Valid Tag Combination (VTC) strategy. ... 21). ... 18). The Penn Treebank dataset has been used for the EXP tagging of English.
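The problem described above can be illustrated with a small sketch. The excerpt does not give the details of the VTC strategy, so the code below shows only one plausible reading: restrict the choice to the tags offered by the morphological analyzer and pick the one whose per-category probabilities multiply to the highest score. The function names, the dictionary representation of tags, and the probabilities are all hypothetical and are not taken from the book.

```python
from math import prod


def predict_subtags_independently(category_probs):
    """Pick the most probable value for each morphological category on its own."""
    return {cat: max(dist, key=dist.get) for cat, dist in category_probs.items()}


def best_valid_tag(category_probs, analyzer_tags):
    """Assumed VTC-like step: choose only among the analyzer's tags,
    scoring each by the product of its per-category probabilities."""
    def score(tag):
        return prod(category_probs[cat].get(val, 0.0) for cat, val in tag.items())
    return max(analyzer_tags, key=score)


# Hypothetical per-category distributions for one word.
category_probs = {
    "pos": {"N": 0.6, "A": 0.4},
    "case": {"acc": 0.55, "nom": 0.45},
}
# Hypothetical tags returned by a morphological analyzer for that word.
analyzer_tags = [
    {"pos": "N", "case": "nom"},
    {"pos": "A", "case": "acc"},
]

independent = predict_subtags_independently(category_probs)
valid = best_valid_tag(category_probs, analyzer_tags)
print(independent, valid)
```

In this toy case the independent prediction combines a noun reading with accusative case, a combination the analyzer never offered, while the restricted choice always returns one of the analyzer's tags.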
2003) investigate bootstrapping part-of-speech taggers using co-training, in which two taggers, TnT (Brants 2000) and the maximum entropy C&C tagger (Curran and Clark 2003), are iteratively re-trained on each other’s output. Since the output of both taggers is noisy, the challenge is to decide which newly labelled examples should be added to the training set. They investigate selecting examples by directly maximizing tagger agreement on unlabeled data. The results show that simply re-training on all of the newly labelled data is surprisingly effective, with performance depending on the amount of newly labelled data added at each iteration.
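A co-training loop of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the setup from the cited study: `FrequencyTagger` is a toy stand-in for the two real taggers (it is neither TnT nor C&C), the two feature functions are invented to give the taggers different views of a word, and the selection policy is the naive one of adding all newly labelled sentences. The `agreement` function shows the quantity that an agreement-based selection method would try to maximize.

```python
from collections import Counter, defaultdict


class FrequencyTagger:
    """Toy tagger: most frequent tag per feature value (hypothetical stand-in)."""

    def __init__(self, feature):
        self.feature = feature
        self.counts = defaultdict(Counter)
        self.all_tags = Counter()

    def train(self, sentences):
        # Retrain from scratch on a list of [(word, tag), ...] sentences.
        self.counts.clear()
        self.all_tags.clear()
        for sent in sentences:
            for word, tag in sent:
                self.counts[self.feature(word)][tag] += 1
                self.all_tags[tag] += 1

    def tag(self, words):
        # Unseen feature values fall back to the overall most frequent tag.
        default = self.all_tags.most_common(1)[0][0]
        tagged = []
        for word in words:
            seen = self.counts.get(self.feature(word))
            tagged.append((word, seen.most_common(1)[0][0] if seen else default))
        return tagged


def co_train(tagger_a, tagger_b, seed, unlabeled, rounds=2, batch=2):
    """Each round, every tagger is retrained on the seed data plus the
    other tagger's output on a fresh batch of unlabeled sentences."""
    train_a, train_b = list(seed), list(seed)
    pool = list(unlabeled)
    for _ in range(rounds):
        tagger_a.train(train_a)
        tagger_b.train(train_b)
        if not pool:
            break
        new, pool = pool[:batch], pool[batch:]
        labelled_by_a = [tagger_a.tag(s) for s in new]
        labelled_by_b = [tagger_b.tag(s) for s in new]
        train_b.extend(labelled_by_a)  # naive policy: keep everything
        train_a.extend(labelled_by_b)
    tagger_a.train(train_a)
    tagger_b.train(train_b)
    return tagger_a, tagger_b


def agreement(tagger_a, tagger_b, sentences):
    """Fraction of tokens on which the two taggers assign the same tag."""
    same = total = 0
    for sent in sentences:
        for (_, tag_a), (_, tag_b) in zip(tagger_a.tag(sent), tagger_b.tag(sent)):
            same += tag_a == tag_b
            total += 1
    return same / total if total else 0.0


seed = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
unlabeled = [["the", "cat", "barks"], ["a", "dog", "sleeps"]]

word_view = FrequencyTagger(lambda w: w.lower())        # whole-word view
suffix_view = FrequencyTagger(lambda w: w.lower()[-2:])  # two-letter suffix view
co_train(word_view, suffix_view, seed, unlabeled)
print(agreement(word_view, suffix_view, unlabeled))
```

Replacing the naive "keep everything" step with a filter that keeps only the sentences that raise `agreement` on held-out unlabeled data would give a rough analogue of the agreement-based selection mentioned above.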
Even though the algorithm described by Yarowsky and Wicentowski (2000) cannot be used directly because of the issues outlined above, their ideas, to a large extent, inspired the current work. The main goal here is to produce detailed morphological resources for a variety of languages without relying on large quantities of annotated training data. Similarly to Yarowsky and Wicentowski (2000), this work relies on a subset of manually encoded knowledge, instead of applying completely unsupervised methods.
A Resource-Light Approach to Morpho-Syntactic Tagging by Anna Feldman