Free Resources
I have created a variety of freely available visualization tools. Also included here is Slash/A, a tool developed by two of my former students. In addition to the visualization tools, there are a few corpora, mostly with simple inline annotations of lemma and part of speech.
Visualization tools
DoubleTreeJS is a compact, interactive view of keyword in context (KWIC) and concordance information. [The original DoubleTree, in Java, is available at EURAC, or from this local copy.]
KWICis is a modern concordance (keyword in context = KWIC) visualization that is interactive and designed for structured data.
Slash/A is an ngram viewer for corpora with dated documents. It’s a tool created by two former students of mine: Slava Todorova and Maria Chinkina.
ProD is an experimental visualization for tree(-like) structures (e.g. constituent structures, dependency structures).
Extended Linguistic Dependency Diagrams (xLDDs) is a visualization tool specialized for the graphical presentation of linguistic dependency structures and the dynamic interaction with these visualizations. Download local copy
Structured Parallel Coordinates is an interactive visualization for corpus query results and ranked data. It is a specialized version of Parallel Coordinates. Download local copy
Corpora
Letters
All the letters corpora are annotated with information about the letters (author and/or addressee, date, etc.), as well as with token, lemma, and part-of-speech information (all automatically generated). The Barrett-Browning letters also have (some) named entities annotated, while the Bierce and Michelangelo letters have some additional structural annotations (e.g. salutation and closing, paragraphs). All corpora come in an XML version. The Barrett-Browning letters are in Text Corpus Format (TCF), while the Bierce and Michelangelo letters are in custom formats (DTDs provided). In addition, the Barrett-Browning and Michelangelo letters come in two other formats: individual letters as XML, and all the letters as a tab-delimited "vertical file". The letters are freely available under a Creative Commons License.
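For readers unfamiliar with the vertical format, here is a minimal Python sketch of reading such a file, with one token per line and tab-separated annotations. The column order (token, lemma, part of speech), the sample rows, and the function name `read_vertical` are assumptions made for illustration; check the downloaded files for the actual layout.

```python
import csv
import io

# Hypothetical sample rows; the real files may order columns differently.
SAMPLE = "Dear\tdear\tJJ\nMiss\tmiss\tNNP\nBarrett\tBarrett\tNNP\n"


def read_vertical(handle, columns=("token", "lemma", "pos")):
    """Yield one dict per annotated line of a tab-delimited vertical file.

    `columns` is an assumed column order, not the documented one.
    """
    # QUOTE_NONE keeps quotation-mark tokens from being treated as CSV quotes.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines and lines with fewer fields than expected
        # (for example, structural markup on a line of its own).
        if len(row) < len(columns):
            continue
        yield dict(zip(columns, row))


rows = list(read_vertical(io.StringIO(SAMPLE)))
for row in rows:
    print(row["token"], row["lemma"], row["pos"])
```

To read a downloaded file instead of the sample, pass an open file handle, for instance `open(path, encoding="utf-8", newline="")`, where `path` is wherever the file was saved.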
Letters between Elizabeth Barrett and Robert Browning (1845-1846)
Letters semi-automatically annotated with To/From, dates, tokens, sentences, parts of speech, lemmas, named entities. These letters were prepared with help from students at the University of Tübingen, especially Eyal Schejter.
Download page
Letters from Ambrose Bierce to a variety of people (1892-1913)
All letters (348K), semi-automatically annotated with To, Destination, Date, some text structure, tokens, sentences, parts of speech, lemmas.
Download
Letters from Michelangelo Buonarroti to a variety of people (1497-1524)
Letters (in Italian, 2.5MB) automatically annotated with to, place, dates, sentences, tokens, lemmas, and parts of speech. Includes the Python part-of-speech tagging script.
Download
Tema della settimana
101 email messages (77.5K, plain text) from me to a group of friends, mainly in non-native Italian.
Download messages, plus metadata
Journals
Journal of a Trip to California by the Overland Route Across the Plains, by E. S. Ingalls.
[coming soon]