My title
 
Chris Culy
AVML, 7 September 2012

My Three Perspectives

  1. Language

  2. Models

  3. Tools

Mention Math Models as inspiration

Visualizations put ideas into our heads …

What questions come to mind?

What makes L/L data different?

  1. Language is not mappable

  2. Strings are special

  3. Individual pieces of L/L data are meaningful

* Some L/L data that is not (strictly) textual at AVML:
 
      Geographic information (e.g. typology, dialect mapping papers)
      Audio/Video (as original data)
      Sound information (waves, spectrograms, intonation curves, etc.)
      Time duration in Speech/dialog info (e.g. pauses, overlaps, etc)
 
* Connection with original data at AVML: Text Variation Explorer

One visualization, multiple models

Source: http://bl.ocks.org/1341281 by jasondavies

Uses: Structured Parallel Coordinates

Extending the familiar

Uses: xLDD

Connecting with the data

Uses: xLDD

Flexible components

Uses: xLDD

Goethe on seeing

  1. Man erblickt nur, was man schon weiß und versteht.
    You glimpse only what you already know and understand.

  2. Was man weiß, sieht man erst!*
    You see only what you know!

Sources:
Kanzler F. v. Müller, Unterhaltungen mit Goethe, 24 April 1819, cited in Lexikon Goethe-Zitate
Einleitung in die Propyläen

Perception and Visualization

  1. "Preattentive" / Tunable features

  2. Attention

Sources: VHS Tübingen, photo by L. Lee McIntyre

Testing in new contexts

Uses: ProD

Which visualization for task, user?

Uses: ProD

Sources: DoubleTree, Georges Grinstein

Dataset genres

Sources: on archive.org, on archive.org

* Difference between genre and data genre:
 
      data genre is a collection of instances of a genre, with additional properties of the collection  
    e.g.
         an individual letter is an instance of the correspondence genre
         but not of the correspondence-exchange data genre
 
* cf. Discursis workshop/paper, similarities and differences between conversations and correspondence

Continuity visualization 1

Continuity visualizations 2 and 3

* Adjacency High Frequency: EBB uses the high frequency words more than RB does, except for "heart".
 
* Adjacency High Frequency: Connection between word frequency and co-occurrence freq (box sizes)
 
* Adjacency Lower frequency: in this range, EBB uses her higher-frequency words more than RB uses those words
 
* Adjacency Lower frequency: in this range, there is no connection between overall word frequency (or the frequency for EBB or RB) and co-occurrence frequency
 

 
* Hopefully, they are asking questions: What about X? Could you show Y? Why not Z?

Visualizations put ideas into our heads!

Thank You

christopher.culy@uni-tuebingen.de

http://www.sfs.uni-tuebingen.de/~cculy/
