<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Algorithms for Matching Strings</title>
	<atom:link href="http://semantichumanities.wordpress.com/2006/08/30/algorithms-for-matching-name-variants/feed/" rel="self" type="application/rss+xml" />
	<link>http://semantichumanities.wordpress.com/2006/08/30/algorithms-for-matching-name-variants/</link>
	<description>web technology and humanities scholarship</description>
	<lastBuildDate>Fri, 23 Oct 2009 15:15:04 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: AAA</title>
		<link>http://semantichumanities.wordpress.com/2006/08/30/algorithms-for-matching-name-variants/#comment-120</link>
		<dc:creator>AAA</dc:creator>
		<pubDate>Sat, 25 Nov 2006 02:14:24 +0000</pubDate>
		<guid isPermaLink="false">https://semantichumanities.wordpress.com/2006/08/30/algorithms-for-matching-name-variants/#comment-120</guid>
		<description>It would be great if you published all your refs. Thanks.</description>
		<content:encoded><![CDATA[<p>It would be great if you published all your refs. Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gene T</title>
		<link>http://semantichumanities.wordpress.com/2006/08/30/algorithms-for-matching-name-variants/#comment-67</link>
		<dc:creator>Gene T</dc:creator>
		<pubDate>Mon, 04 Sep 2006 13:34:13 +0000</pubDate>
		<guid isPermaLink="false">https://semantichumanities.wordpress.com/2006/08/30/algorithms-for-matching-name-variants/#comment-67</guid>
		<description>There are a lot of string/sequence matching algorithms out there:

http://www.dcs.shef.ac.uk/~sam/stringmetrics.html
http://www-igm.univ-mlv.fr/~lecroq/string/index.html
(I can send you dozens more refs if you&#039;re interested)
http://homepages.cs.ncl.ac.uk/brian.randell/Genealogy/NameMatching.pdf

If you want to dig into Computational Linguistics (Information Extraction and Named Entity Recognition), state-of-the-art techniques include maximum entropy models and conditional random fields, e.g.

http://www.cs.umass.edu/~mccallum/papers/collseg04icmlws.pdf</description>
		<content:encoded><![CDATA[<p>There are a lot of string/sequence matching algorithms out there:</p>
<p><a href="http://www.dcs.shef.ac.uk/~sam/stringmetrics.html" rel="nofollow">http://www.dcs.shef.ac.uk/~sam/stringmetrics.html</a><br />
<a href="http://www-igm.univ-mlv.fr/~lecroq/string/index.html" rel="nofollow">http://www-igm.univ-mlv.fr/~lecroq/string/index.html</a><br />
(I can send you dozens more refs if you&#8217;re interested)<br />
<a href="http://homepages.cs.ncl.ac.uk/brian.randell/Genealogy/NameMatching.pdf" rel="nofollow">http://homepages.cs.ncl.ac.uk/brian.randell/Genealogy/NameMatching.pdf</a></p>
<p>If you want to dig into Computational Linguistics (Information Extraction and Named Entity Recognition), state-of-the-art techniques include maximum entropy models and conditional random fields, e.g.</p>
<p><a href="http://www.cs.umass.edu/~mccallum/papers/collseg04icmlws.pdf" rel="nofollow">http://www.cs.umass.edu/~mccallum/papers/collseg04icmlws.pdf</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Got</title>
		<link>http://semantichumanities.wordpress.com/2006/08/30/algorithms-for-matching-name-variants/#comment-62</link>
		<dc:creator>Got</dc:creator>
		<pubDate>Fri, 01 Sep 2006 11:21:54 +0000</pubDate>
		<guid isPermaLink="false">https://semantichumanities.wordpress.com/2006/08/30/algorithms-for-matching-name-variants/#comment-62</guid>
		<description>This kind of research is very interesting. I am trying it in order to query corpora in medieval Latin or Old French, and it can be a solution if the corpora are not lemmatized. Levenshtein distance, in particular, is the base algorithm for fuzzy search and is implemented in eXist, an XML database: http://exist-db.org/ (and the fuzzy search: http://demo.exist-db.org/xquery/functions.xq#text:fuzzy-index-terms). Maybe this implementation can give you some ideas.</description>
		<content:encoded><![CDATA[<p>This kind of research is very interesting. I am trying it in order to query corpora in medieval Latin or Old French, and it can be a solution if the corpora are not lemmatized. Levenshtein distance, in particular, is the base algorithm for fuzzy search and is implemented in eXist, an XML database: <a href="http://exist-db.org/" rel="nofollow">http://exist-db.org/</a> (and the fuzzy search: <a href="http://demo.exist-db.org/xquery/functions.xq#text:fuzzy-index-terms" rel="nofollow">http://demo.exist-db.org/xquery/functions.xq#text:fuzzy-index-terms</a>). Maybe this implementation can give you some ideas.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: semantichumanities</title>
		<link>http://semantichumanities.wordpress.com/2006/08/30/algorithms-for-matching-name-variants/#comment-61</link>
		<dc:creator>semantichumanities</dc:creator>
		<pubDate>Thu, 31 Aug 2006 08:47:37 +0000</pubDate>
		<guid isPermaLink="false">https://semantichumanities.wordpress.com/2006/08/30/algorithms-for-matching-name-variants/#comment-61</guid>
		<description>Yes, I can imagine that where co-authorship is quite common, as in scientific articles, you could do pretty interesting things with networks of co-authorship. Perhaps a handy OPAC feature would be an interface that visualised this and let you navigate through it (see http://www.openacademia.org ). Or a search that let you limit results for one author to those within n degrees of separation from another.</description>
		<content:encoded><![CDATA[<p>Yes, I can imagine that where co-authorship is quite common, as in scientific articles, you could do pretty interesting things with networks of co-authorship. Perhaps a handy OPAC feature would be an interface that visualised this and let you navigate through it (see <a href="http://www.openacademia.org" rel="nofollow">http://www.openacademia.org</a>). Or a search that let you limit results for one author to those within n degrees of separation from another.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pierre</title>
		<link>http://semantichumanities.wordpress.com/2006/08/30/algorithms-for-matching-name-variants/#comment-60</link>
		<dc:creator>Pierre</dc:creator>
		<pubDate>Wed, 30 Aug 2006 19:14:19 +0000</pubDate>
		<guid isPermaLink="false">https://semantichumanities.wordpress.com/2006/08/30/algorithms-for-matching-name-variants/#comment-60</guid>
		<description>Resolving names can be a headache; for example, an author named &quot;Chen C&quot; matches more than &lt;a href=&quot;http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&amp;cmd=Search&amp;itool=pubmed_AbstractPlus&amp;term=%22Chen+C%22%5BAuthor%5D&quot; rel=&quot;nofollow&quot;&gt;2,400 references&lt;/a&gt; on &lt;a href=&quot;http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&quot; rel=&quot;nofollow&quot;&gt;PubMed&lt;/a&gt;.

Could co-authorship be used to distinguish one author from another?

Pierre</description>
		<content:encoded><![CDATA[<p>Resolving names can be a headache; for example, an author named &#8220;Chen C&#8221; matches more than <a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&amp;cmd=Search&amp;itool=pubmed_AbstractPlus&amp;term=%22Chen+C%22%5BAuthor%5D" rel="nofollow">2,400 references</a> on <a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed" rel="nofollow">PubMed</a>.</p>
<p>Could co-authorship be used to distinguish one author from another?</p>
<p>Pierre</p>
]]></content:encoded>
	</item>
</channel>
</rss>
