DHAsia Hands-On Clinic | Stylometrics and Genre Research in Imperial Chinese Studies, with Paul Vierthaler

REGISTRATION LINK HERE.
In this hands-on workshop, Paul Vierthaler will introduce participants to the basics of using stylometry to analyze classical and vernacular Chinese texts.
At the end of the workshop, participants will have developed a workflow to sanitize, normalize, and then analyze documents in a corpus of Chinese texts. Participants will also work with tools to semi-automatically detect genre or authorship.
This workshop will focus on problems unique to working with the types of language found in pre-1911 Chinese texts. We will begin by going over how to prepare documents for analysis.
This will include discussions on how to tokenize a Chinese text in ways that improve the accuracy of analysis, particularly when comparing texts written in very different linguistic registers. We will also cover text sanitization and data normalization.
The workshop will conclude by discussing hierarchical cluster analysis, principal component analysis, and (pending time constraints) classification algorithms.
IMPORTANT NOTE: Although focused on the Chinese case, the analytical approaches examined here are valuable for scholars working across Asia, on all time periods.
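
For readers who want a sense of the preparation steps described above (sanitizing, normalizing, and tokenizing a Chinese text), here is a minimal Python sketch. It is an illustration only and not necessarily the workflow the workshop will teach: the function names, the choice of character n-grams as tokens, and the restriction to the basic CJK Unified Ideographs block are all assumptions made for this example.

```python
import re
import unicodedata
from collections import Counter


def sanitize(text):
    """Normalize Unicode forms, then keep only basic CJK ideographs.

    Assumption: punctuation, whitespace, and non-Chinese characters are
    noise for this analysis. A real corpus may need a wider character range.
    """
    text = unicodedata.normalize("NFKC", text)
    return "".join(re.findall(r"[\u4e00-\u9fff]", text))


def char_ngrams(text, n=1):
    """Split a string into overlapping character n-grams.

    Character n-grams sidestep word segmentation, which is hard to do
    consistently across very different linguistic registers.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def relative_frequencies(tokens):
    """Turn raw token counts into proportions so texts of different lengths are comparable."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: count / total for tok, count in counts.items()}


sample = "子曰：「學而時習之，不亦說乎？」"
clean = sanitize(sample)
bigrams = char_ngrams(clean, 2)
freqs = relative_frequencies(char_ngrams(clean, 1))
print(clean)
print(bigrams[:3])
```

Frequency tables like `freqs`, built for every document in a corpus, are the kind of input that cluster analysis or principal component analysis would then operate on.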

Details

When:

Thursday, February 11, 2016. 01:30 PM

Where:

CESTA, Wallenberg Hall, 4th Floor

Sponsor:

Wallenberg Hall, Center for Spatial and Textual Analysis (CESTA), History Department, Center for East Asian Studies

Contact:

650-721-1385
tsmullaney@stanford.edu

Admission:

REGISTRATION REQUIRED | SPACE IS LIMITED | Limited to Stanford Students, Faculty, and Staff | Contact: Tom Mullaney (tsmullaney@stanford.edu)

Audience: