My title
Chris Culy
AVML, 7 September 2012

My Three Perspectives

  1. Language

  2. Models

  3. Tools

Mention Math Models as inspiration

Visualizations put ideas into our heads …

What questions come to mind?

What makes L/L data different?

  1. Language is not mappable

  2. Strings are special

  3. Individual pieces of L/L data are meaningful

* Some L/L data that is not (strictly) textual at AVML:
      Geographic information (e.g. typology, dialect mapping papers)
      Audio/Video (as original data)
      Sound information (waves, spectrograms, intonation curves, etc.)
      Time duration in Speech/dialog info (e.g. pauses, overlaps, etc.)
* Connection with original data at AVML: Text Variation Explorer

One visualization, multiple models

Source: by jasondavies

Uses: Structured Parallel Coordinates

Extending the familiar

Uses: xLDD

Connecting with the data

Uses: xLDD

Flexible components

Uses: xLDD

Goethe on seeing

  1. Man erblickt nur, was man schon weiß und versteht.
    You glimpse only what you already know and understand.

  2. Was man weiß, sieht man erst!*
    You see first what you know!

Kanzler F. v. Müller, Unterhaltungen mit Goethe, 24 April 1819, cited in Lexikon Goethe-Zitate
Einleitung in die Propyläen

Perception and Visualization

  1. "Preattentive" / Tunable features

  2. Attention

  3. Sources: VHS Tübingen, photo by L. Lee McIntyre

Testing in new contexts

Uses: ProD

Which visualization for task, user?

Uses: ProD

Sources: DoubleTree, Georges Grinstein

Dataset genres

Sources: on, on

* Difference between genre and data genre:
      data genre is a collection of instances of a genre, with additional properties of the collection  
         an individual letter is an instance of the correspondence genre
         but not of the correspondence-exchange data genre
* cf. Discursis workshop/paper, similarities and differences between conversations and correspondence

Continuity visualization 1

Continuity visualizations 2 and 3

* Adjacency High Frequency: EBB uses the high frequency words more than RB does, except for "heart".
* Adjacency High Frequency: Connection between word frequency and co-occurrence frequency (box sizes)
* Adjacency Lower frequency: in this range EBB uses her higher-frequency words more than RB uses those words
* Adjacency Lower frequency: in this range no connection between word frequency (overall, or for EBB or RB individually) and co-occurrence frequency

* Hopefully, they are asking questions: What about X? Could you show Y? Why not Z?

Visualizations put ideas into our heads!

Thank You