Interactive visualization tools for analyzing language data

Chris Culy

linguistics.chrisculy.net

Università Cattolica del Sacro Cuore
16 April 2015

My interests in
language and visualization

Mali

Bambara:
malonyinina o malonyinina

Fulfulde:
mi yi'ii ɗum e giɗum

Donno Sɔ Dogon:
Oumar Anta inyemɛñ waa be gi

References: OpenStreetMap, Culy 1985, 1988, 1994a,b, 1996, 1997, 2002; Culy & Gnailbouly Dicko 1988; Culy et al. 1995; Culy & Fagan 2002.

My interests in
language and visualization

Takelma texts

Now those just scattered off, Grizzly Bear did chase the people around. Now this Coyote, for his part, did run off with the chieftainess girl. Then, 'tis said, after a little while, "Are you a female? It must be a female," he thought; Coyote now, for his part, did wish to sleep with her. Tunc nihil vulvae repperit. "What did I, for my part, (take)? That you were a woman I thought," he said to her. Coyote threw Frog into the water. "Do you think you will be a woman? Frog you will always be called," he said to Frog. Proceeding just up to there (it goes). 'Tis finished. Go gather and eat your ba|ap'-seeds.

Reference: Takelma Texts, Culy 1999a,b, 2000

My interests in
language and visualization

Recipes, letters

 

Bake Ø until done

 

EBB: Overjoyed I am

Reference: Culy 1996

My interests in
language and visualization

Visualizations as tools

References: C. Culy, E. Chiocchetti, and N. Ralli 2013; C. Culy, M. Passarotti, and U. König-Cardanobile 2014.

Three distinct goals of visualizations

  • Exploration
  • Comprehension (analysis)
  • Communication (presentation)
A visualization may be suitable for some goals but not others

Reference: B.H. McCormick, T.A. DeFanti, M.D. Brown. 1987. Visualization in Scientific Computing, Computer Graphics 21:6. ACM SIGGRAPH.

Visualizations put ideas into our heads.

Augmenting charts

Basic charts can be augmented in a variety of ways

[demo]

Techniques used:
  • Small multiples
  • Heatmap
  • tSNE dimensionality reduction

Source: Michelangelo by relation

Connecting charts with the source data


[2 demos]

Source: Slash/A by Slava Todorova and Maria Chinkina

KWICs revisited

KWICis

[2 demos]

Source: KWICis
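
For readers unfamiliar with the format, a KWIC (keyword in context) display lists every occurrence of a search term with a window of words on either side. The following is a minimal sketch of that basic idea only, not the KWICis implementation; the function name `kwic`, the `width` parameter, and the whitespace tokenization are assumptions made for illustration. The sample sentence is adapted from the Takelma translation quoted earlier.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, hit, right context) for each match of keyword.

    Hypothetical helper: matching is case-insensitive and the context
    window is `width` tokens on each side.
    """
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            rows.append((" ".join(left), tok, " ".join(right)))
    return rows


# Naive whitespace tokenization, for illustration only
text = "Coyote threw Frog into the water . Frog you will always be called"
rows = kwic(text.split(), "frog", width=2)

# Print with the keyword aligned in a central column
for left, hit, right in rows:
    print(f"{left:>15} | {hit} | {right}")
```

Aligning the keyword in a fixed column is what lets the eye scan the left and right contexts for recurring patterns.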

KWICs reinvented

DoubleTreeJS

[3 demos]

Source: DoubleTreeJS
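
The double tree idea can be sketched as two word trees that share a root: the contexts to the right of the keyword are merged into one tree, and the contexts to the left are merged, read outward from the keyword, into another. The code below is a rough illustration under that assumption and is not the DoubleTreeJS API; `add_path`, `double_tree`, the `depth` parameter, and the nested-dict layout are all hypothetical.

```python
def add_path(tree, words):
    """Insert a sequence of words into a nested-dict tree, counting visits."""
    node = tree
    for w in words:
        child = node.setdefault(w, {"count": 0, "children": {}})
        child["count"] += 1
        node = child["children"]


def double_tree(tokens, keyword, depth=2):
    """Build (left_tree, right_tree) of contexts around each keyword hit.

    The left context is reversed so that both trees grow outward
    from the keyword.
    """
    left, right = {}, {}
    for i, tok in enumerate(tokens):
        if tok == keyword:
            add_path(left, list(reversed(tokens[max(0, i - depth):i])))
            add_path(right, tokens[i + 1:i + 1 + depth])
    return left, right


tokens = "the dog saw the cat and the dog ran".split()
left, right = double_tree(tokens, "the", depth=2)

# "dog" follows "the" twice, then branches into "saw" and "ran"
print(right["dog"]["count"], sorted(right["dog"]["children"]))
```

Compared with a flat KWIC list, merging shared contexts in this way makes repeated continuations visible as single branches with counts.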

Networks of relations

  • Analysing cross-references

    [demo]
  • Exploring authorship

    [demo]

References: C. Culy, E. Chiocchetti, and N. Ralli 2013; E. Chiocchetti, C. Culy, and N. Ralli 2013, "Visualising conceptual relations in the domain of law: Verweis Viewer", at TOTh 13.

Picking a visualization

Grinstein's Grand Challenge

  • Pre-existing first steps:
    Tableau, Spotfire
  • Limitations
    • Limited types of data: numbers, dates, geographic, categories
    • No notion of task
    • No notion of preferences

References: Georges Grinstein, Tableau, Spotfire

Tasks and preferences


[demo]

Source: ProD

Research directions:
Data models and levels of abstraction

  • DoubleTreeJS for dependencies
    [demo]
  • Slash/A for "textual time"
    [demo]

Sources: DoubleTreeJS; Slash/A by Slava Todorova and Maria Chinkina

Research directions:
Visual analytics

  • KWICis' significance for text comparison
  • Slash/A's significance for subset comparison
  • DoubleTreeJS and clusters

Sources: KWICis, DoubleTreeJS, Slash/A

Visualizations put ideas into our heads!

Thank you

 

Chris Culy

linguistics.chrisculy.net

Extras follow

Visualization goals example


[demo]

Sources: Topics demo

Elaborating charts

Icelandic Diachronic Corpus

Reference:
Butt, Miriam, Bögel, Tina, Kotcheva, Kristina, Schätzle, Christin, Rohrdantz, Christian, Sacha, Dominik, Dehé, Nicole and Daniel Keim. 2014. 'V1 in Icelandic: A Multifactorical Visualization of Historical Data'. Proceedings of the LREC 2014 Workshop on Visualization as added value in the development, use and evaluation of LRs (VisLR). Reykjavik, Iceland.

Research directions:
Dataset genres

Corpora | Genre (Bakhtin)
Individual items | (Complex) Utterances
Type(s) of the items | Utterance types
Single corpus characterized by a unifying factor | Instantiation of a genre
Category of corpora characterized by an abstraction of a unifying factor | Genre: the collection of utterances used in a sphere of communication

 

Not unproblematic, but a start ...

References: M. Bakhtin 1986. "The Problem of Speech Genres". Thanks to Yulia Svetashova and the members of the class Development of applications using NLP tools.