Individual Submission Summary

Automatic Text Analysis Made Easy: Using AmCAT, NLPipe, and R to Do Corpus Management, Linguistic Processing, and Automatic Text Analysis

Fri, May 26, 14:00 to 15:15, Hilton San Diego Bayfront, Floor: 3, Aqua 309

Abstract

Although many programs are available for dictionary coding and manual content analysis, more advanced text analysis techniques such as part-of-speech tagging or dependency parsing remain difficult to use without technical expertise. Moreover, as we move from thousands of documents to millions, data management and parallel computing become crucial to project success, but both are difficult to get right without a strong computational background.

We contribute to solving these issues by presenting a combination of three technologies that complement each other and make the various tasks involved in text analysis easier: corpus management (AmCAT), linguistic processing (NLPipe), and substantive analysis (using R). Although the software can be downloaded and installed as a single Docker container, the components are fully modular and accessible through REST APIs, so each part can also be used by itself and integrated with existing systems and programs.

Authors