Language variation in Twitter and literary texts
Thursday, 19 April 2018, 4:15 pm to 5:45 pm
In the last few years, the use of big data for linguistic purposes has opened up new paths in corpus linguistics, since it offers new opportunities to investigate language variation on a large scale. We analyze a database of nearly 4 billion geolocated tweets and investigate lexical and orthographic variation in Spanish and English. While linguistic variation in Spanish shows a clear distinction between urban and rural speech, English is characterized by two dominant varieties, namely British and American. Finally, we apply a complexity measure to the analysis of literary texts, which allows us to cluster similar authors in an automated way. Implications of our results for the short-range syntactic structure of different languages will be discussed.
(Institute for Cross-Disciplinary Physics and Complex Systems, UIB-CSIC, Palma de Mallorca)
Room: SOD 1 105