Corpus and Computational Linguistics Resources
The resources below support corpus studies and computational linguistics, with a focus on translation research, language studies, and computational analysis of text.
Corpus Resources
- TR Corpus - A comprehensive resource for translation-related corpora.
- AI Translation Platform - A platform for generating translations using AI, designed to support advanced research in translation studies.
- Corpus of Contemporary American English (COCA) - A corpus of more than one billion words of American English from 1990 onward, balanced across genres such as spoken language, fiction, magazines, newspapers, and academic texts.
- Sketch Engine - A powerful tool for corpus analysis, with access to over 500 corpora in more than 90 languages.
- British National Corpus (BNC) - A 100-million-word corpus of written and spoken British English drawn from a wide range of sources.
- TED Talks Corpus - Transcripts of TED Talks together with their translations, useful for studying spoken English and how it is translated.
- Alpino Treebank - A syntactically annotated corpus for Dutch, providing deep syntactic analysis of texts.
- CHILDES (Child Language Data Exchange System) - A resource for studying language acquisition, with transcripts of child and caregiver speech in many languages.
- Project Gutenberg - A library of over 60,000 free eBooks that can serve as source texts for linguistic and translation studies.
- European Language Resources Association (ELRA) - Provides access to a variety of language resources for research and development.
- OpenSubtitles Corpus - A collection of subtitles for movies and TV shows, useful for linguistic analysis and translation studies.
Computational Linguistics Resources
- Natural Language Toolkit (NLTK) - A leading platform for building Python programs to work with human language data.
- spaCy - An advanced NLP library in Python for processing and understanding large volumes of text.
- Stanford CoreNLP - A suite of NLP tools for performing a wide range of linguistic analysis tasks.
- Apache OpenNLP - A machine learning-based toolkit for processing natural language text.
- TensorFlow - An open-source platform for machine learning, widely used in computational linguistics research.
- Hugging Face Transformers - A library that provides general-purpose architectures for natural language understanding and generation.
- fastText - A library for efficient learning of word representations and sentence classification.
- Thinc - A lightweight deep learning library from the makers of spaCy, which spaCy uses for its machine learning models.
- LREC (Language Resources and Evaluation Conference) - A major conference on language resources and evaluation, with proceedings available online.
- ACL Anthology - A digital archive of research papers in computational linguistics, providing access to thousands of publications.
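As a rough illustration of the kind of analysis these corpus tools and toolkits automate, the sketch below counts word frequencies and builds a simple keyword-in-context (KWIC) concordance using only the Python standard library. The `tokenize` and `kwic` helpers and the sample sentence are hypothetical and written for this example. The regular-expression tokenizer assumes plain English text and is far cruder than what NLTK or spaCy provide.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: letters and apostrophes are enough for this English sample.
    """
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, keyword, window=3):
    """Return keyword-in-context lines with `window` tokens on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Hypothetical sample text standing in for a corpus file.
sample = "The cat sat on the mat. The dog chased the cat."
tokens = tokenize(sample)
freq = Counter(tokens)

print(freq.most_common(2))  # → [('the', 4), ('cat', 2)]
for line in kwic(tokens, "cat", window=2):
    print(line)
```

For real corpora, the sample string would be replaced by text read from a file, and the tokenizer by one from a dedicated library.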