Archive for September, 2006

Tutorial: Key Word In Context with Javascript

The Digital History Hacks blog has been running a nice series on using Python for digital humanities type tasks. One of the tutorials was on creating a Key Word In Context list. It made me want to write a KWIC script too, so I wrote one in javascript:

// get the word you want to find
var word = prompt('Enter the word you wish to find');
//make a regular expression to find the word
var re = new RegExp('(\\w+?\\s+?)?(\\w+?\\s+?)?'+word+'(\\s+?\\w+?)?(\\s+?\\w+?)? ','gim');
var matches = this.document.getElementsByTagName('body')[0].textContent.match(re);
var report = window.open();
report.document.write('<ol>');
for(i = 0; i < matches.length; i++)
{
 report.document.write('<li>'+matches[i]+'</li>');
}
report.document.write('</ol>');

And now you can strip the white space, bung a ‘javascript:’ protocol in front of it, stick it in an href of an anchor and you’ve got a bookmarklet you can drag to your bookmarks toolbar:

KWIC

WordPress doesn’t seem to like bookmarklets, so I’m afraid you’ll have to turn it into a link yourself.

Click it whilst on a web page, and you’ll get a list of all the occurrences of the word in context.
It works with TEI documents, and even plain text too. (On firefox at least).

Advertisements

Comments (3)

Tutorial: Networks (of Folksonomy) with Ruby, del.icio.us and Graphiz

I was idly thinking about my del.icio.us bookmarks, how the tags are connected to each other when they are used to describe the same bookmarks, and wondering what they would look like as a graph.

Instead of simply searching the web and finding this del.icio.us tag grapher, I decided that I wanted to try playing with Graphiz (open source graphing software), so I wrote a ruby script to write the .dot file from my bookmarks.

I really liked Graphiz. It’s a great tool, and .dot is a nice format, as it lets you abstract all the positioning and presentation, whereas if I had been generating an SVG file (for example), I would have had to do lots of calculations for the positioning of all the nodes and everything.

Anyway, this is how I did it:

#open the bookmarks file (after running it through HTML Tidy
# first, to transform it into XML)
require "rexml/document"
file = File.new( "delicious.xhtml" )
doc = REXML::Document.new file

#create a 2D array: an array of an array 
# of the tags used for each bookmark.
tag_sets = Array.new()
doc.elements.each('//a') {|e| tag_sets.push(e.attributes['tags'].split(',')) } 

# I added this following line because I had too many bookmarks, 
# making the graph too big and complicated: ->
#      tag_sets = tag_sets.slice(0..10)

# now flatten the 2D array, and get a 1D array
# of all the tags used - .uniq gets rid of duplicates
tag_list = tag_sets.flatten.uniq         


#get the relationships
relationships = Array.new()

# now iterate through the tag list, 
# and for each tag, look for that in each of the bookmarks.
# If it's found, record a relationship with the other tags of
# that bookmark

tag_list.each do |tag|
 
 tag_sets.each do |tag_set|
   
   if tag_set.include? tag
     tag_set.each do |related_tag|
     relationships.push([tag, related_tag]) if tag!=related_tag 
     end
   end
   
 end
  
end

# relationships is now a 2D array of arrays each
# containing two tags

# put it into the .dot syntax

graph = "digraph x { \r\n"+relationships.uniq.collect{|r|'"'+r.join('" -> "')+'";'}.join("\r")+"}"

# now  write it all into the .dot file


file = File.new("delicious_graph.dot", "w")
file.write(graph)
file.close()

Links to the Results

I don’t expect the results will be of much interest to anyone, but here they are for completeness sake.

the .dot file
an SVG export of the graph (you may need a plugin, or a recent version of firefox, safari or opera)

Comments (1)

Digital Humanities Start-up funding

Comments (1)