<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-21972500</id><updated>2011-11-27T16:49:25.017-08:00</updated><title type='text'>PsiVision</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://psivision.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://psivision.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>psivision</name><uri>http://www.blogger.com/profile/06866257676734125644</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='26' src='http://photos1.blogger.com/hello/144/10093/640/home4.1.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>14</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-21972500.post-2009488744450886218</id><published>2010-11-05T09:05:00.000-07:00</published><updated>2010-11-05T09:05:50.057-07:00</updated><title type='text'>Encrypted Search Indices and Multi-Key PKI</title><content type='html'>As an example of encrypted space operations, consider search indices which are encrypted / hashed.&lt;br /&gt;&lt;br /&gt;1. User enters clear text query. Client encrypts and sends to server.&lt;br /&gt;2. Server matches encrypted query to encrypted index. Returns pointers to client indicating result set.&lt;br /&gt;3. Client displays result set.&lt;br /&gt;&lt;br /&gt;The pointers, or references to documents, can also be encrypted. &lt;br /&gt;&lt;br /&gt;The beauty of this is that the server doesn't suffer so much from the risks associated with aggregation of valuable content. The search server with its index and document pointers is not useful to anyone with inappropriate access to it.&lt;br /&gt;&lt;br /&gt;This does raise an interesting question for PKI - can we have a set of data that is encrypted once, but can be decrypted by distinct keys, 1 per distinct authorised client.&lt;br /&gt;&lt;br /&gt;You may suggest a simpler solution - why not have an encrypted server disk, with SSL encrypted communication between client and server. The aim here is to have no plain text material at all on the server or on the communication link - not even in memory.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/21972500-2009488744450886218?l=psivision.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/2009488744450886218'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/2009488744450886218'/><link rel='alternate' type='text/html' href='http://psivision.blogspot.com/2010/11/encrypted-search-indices-and-multi-key.html' title='Encrypted Search Indices and Multi-Key PKI'/><author><name>psivision</name><uri>http://www.blogger.com/profile/06866257676734125644</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='26' src='http://photos1.blogger.com/hello/144/10093/640/home4.1.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-21972500.post-7850297805967217134</id><published>2010-03-10T05:53:00.000-08:00</published><updated>2010-03-10T05:57:34.956-08:00</updated><title type='text'>Data Operations in Encryption Space</title><content type='html'>A key problem with encrypted data is that it needs to be temporarily decrypted before it can be operated on. This opens up an opportunity for security breach, and more sophisticated attacks do take advantage of data in its decrypted form in a CPU, in memory or in transit across a data bus. It isn't enough for the data to be encrypted on a hard disk.&lt;br /&gt;&lt;br /&gt;This raises the question of whether data can be operated on meaningfully whilst still in encryption space.&lt;br /&gt;&lt;br /&gt;This would mean operators would need to be transformed from unencrypted space to encrypted space. An advantage of this is that the operations are then also encrypted, reducing the amount of information that can be gleaned by eavesdropping on the operations.&lt;br /&gt;&lt;br /&gt;Is there a limited class of operators than can be transformed to encryption space? Is this class sufficiently wide enough to be useful? Is this constrained by the type of encryption?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/21972500-7850297805967217134?l=psivision.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/7850297805967217134'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/7850297805967217134'/><link rel='alternate' type='text/html' href='http://psivision.blogspot.com/2010/03/data-operations-in-encryption-space.html' title='Data Operations in Encryption Space'/><author><name>psivision</name><uri>http://www.blogger.com/profile/06866257676734125644</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='26' src='http://photos1.blogger.com/hello/144/10093/640/home4.1.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-21972500.post-9090089163345027605</id><published>2007-10-07T17:29:00.000-07:00</published><updated>2007-10-07T17:38:00.435-07:00</updated><title type='text'>Interpretation of Latent Semantic Indexing</title><content type='html'>Latent Semantic Indexing is the leading approach to information retrieval which aims to uncover some of the underlying meaning of documentsthrough a transformation in the vector document-term model. &lt;br /&gt;&lt;br /&gt;The results are indeed better than simple rerieval based on term matching. In particular you may return relevant documents which happen to contain none of the query terms. &lt;br /&gt;&lt;br /&gt;But what is the intuitive interpretation of Latent Smenatic Indexing. What is the meaning of the transformed space?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/21972500-9090089163345027605?l=psivision.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/9090089163345027605'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/9090089163345027605'/><link rel='alternate' type='text/html' href='http://psivision.blogspot.com/2007/10/interpretation-of-latent-semantic.html' title='Interpretation of Latent Semantic Indexing'/><author><name>psivision</name><uri>http://www.blogger.com/profile/06866257676734125644</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='26' src='http://photos1.blogger.com/hello/144/10093/640/home4.1.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-21972500.post-8679207533687111271</id><published>2007-06-26T20:36:00.000-07:00</published><updated>2007-06-28T07:26:39.318-07:00</updated><title type='text'>Deduction, Abduction &amp; Induction</title><content type='html'>Wikipedia concisely clarifies the distinction between the &lt;a href="http://en.wikipedia.org/wiki/Inference#Three_types_of_logical_inference"&gt;three forms of inference&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Deductive inference&lt;/span&gt; - finding the effect, given the cause and the rule.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Abductive inference&lt;/span&gt; - finding the cause, given the rule and the effect.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Inductive inference&lt;/span&gt; - finding the rule, given the cause and the effect.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;   [ cause ] → → → [ rule ] → → → [ effect ]&lt;br /&gt;  (abduction)     (induction)     (deduction)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Wikipedia gives an example using Hooke's law F=k.x (rule) which relates the force applied (cause) to a spring and its extension (effect). If the force and Hooke's law are known, the elongation of the beam can be &lt;span style="font-weight: bold;"&gt;deducted&lt;/span&gt;. If the elongation and Hooke's law are known, the force acting on the beam can be &lt;span style="font-weight: bold;"&gt;abducted&lt;/span&gt;. If the elongation and the force are known, Hooke's law can be &lt;span style="font-weight: bold;"&gt;inducted&lt;/span&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/21972500-8679207533687111271?l=psivision.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/8679207533687111271'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/8679207533687111271'/><link rel='alternate' type='text/html' href='http://psivision.blogspot.com/2007/06/deduction-abduction-induction.html' title='Deduction, Abduction &amp; Induction'/><author><name>psivision</name><uri>http://www.blogger.com/profile/06866257676734125644</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='26' src='http://photos1.blogger.com/hello/144/10093/640/home4.1.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-21972500.post-116540719131204433</id><published>2006-12-06T04:12:00.000-08:00</published><updated>2007-01-15T08:47:05.936-08:00</updated><title type='text'>Semantic &amp; Syntactic Entailment</title><content type='html'>&lt;span style="font-weight:bold;"&gt;Semantic entailment&lt;/span&gt; is the same as the implication that we have seen earlier. If all truth assignments that satisfy a set of forumulae S also satisfy A, then S semantically entails A. That is, S→A. &lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Syntactic entailment&lt;/span&gt; is slighly different. If A can be derived from S using a finite application of the propositional inference rules, then S syntactically entails A.&lt;br /&gt;&lt;br /&gt;What does it mean if only one of the above applies?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/21972500-116540719131204433?l=psivision.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/116540719131204433'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/116540719131204433'/><link rel='alternate' type='text/html' href='http://psivision.blogspot.com/2006/12/semantic-syntactic-entailment_06.html' title='Semantic &amp; Syntactic Entailment'/><author><name>psivision</name><uri>http://www.blogger.com/profile/06866257676734125644</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='26' src='http://photos1.blogger.com/hello/144/10093/640/home4.1.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-21972500.post-116536822852994921</id><published>2006-12-05T17:10:00.000-08:00</published><updated>2006-12-08T08:47:06.536-08:00</updated><title type='text'>Calculus of Propositional Logic</title><content type='html'>Propositional logic indeed has a calculus which allows us to verify arguments and prove theorems.&lt;br /&gt;&lt;br /&gt;Common algebra has axiomatic &lt;span style="font-weight: bold;"&gt;inference rules&lt;/span&gt; which allow the derivation of new true formulae from assumed true formulae, either as continuations of an existing formula, or as a combination of multiple formulae. Similarly, the &lt;span style="font-weight: bold;"&gt;propositional calculus&lt;/span&gt; provides inference rules for propositional logic.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Inference Rules&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The text &lt;a href="http://mhprofessional.com/product.php?isbn=0070466491"&gt;Schaum's Outline of Logic&lt;/a&gt; presents these rules and provides worked examples of their use.&lt;br /&gt;&lt;br /&gt;The following summarises the 10 rules of propositonal inference rules. All these rules either &lt;span style="font-weight:bold;"&gt;introduce&lt;/span&gt; or &lt;span style="font-weight:bold;"&gt;eliminate&lt;/span&gt; a propositional logic operator. &lt;br /&gt;&lt;br /&gt;The first eight are non-hypothetical:&lt;br /&gt;&lt;pre&gt;* Negation Elimination&lt;br /&gt;   From ¬¬p, infer p.&lt;br /&gt;* Conjunction Elimination&lt;br /&gt;   From (p ∧ q), infer p&lt;br /&gt;   From (p ∧ q), infer q.&lt;br /&gt;* Conjunction Introduction&lt;br /&gt;   From p, q, infer (p ∧ q).&lt;br /&gt;* Disjunction Introduction&lt;br /&gt;   From p, infer (p ∨ q)&lt;br /&gt;   From p, infer (q ∨ p).&lt;br /&gt;* Disjunction Elimination&lt;br /&gt;   From (p ∨ q), (p → r), (q → r), infer r.&lt;br /&gt;* Biconditional Introduction&lt;br /&gt;   From (p → q), (q → p), infer (p ↔ q).&lt;br /&gt;* Biconditional Elimination&lt;br /&gt;   From (p ↔ q), infer (p → q)&lt;br /&gt;   From (p ↔ q), infer (q → p).&lt;br /&gt;* Conditional Elimination&lt;br /&gt;   From p, (p → q), infer q.&lt;/pre&gt;&lt;br /&gt;The final two are hypothetical inference rules in that they temporarily assume the truth of a hypothesis.&lt;br /&gt;&lt;pre&gt;* Negation Introduction&lt;br /&gt;   If hypothesis (p→q) leads to (p→ ¬q), infer ¬p.&lt;br /&gt;* Conditional Introduction&lt;br /&gt;   If hypothesis p allows a proof of q, infer (p → q).&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/21972500-116536822852994921?l=psivision.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/116536822852994921'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/116536822852994921'/><link rel='alternate' type='text/html' href='http://psivision.blogspot.com/2006/12/calculus-of-propositional-logic.html' title='Calculus of Propositional Logic'/><author><name>psivision</name><uri>http://www.blogger.com/profile/06866257676734125644</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='26' src='http://photos1.blogger.com/hello/144/10093/640/home4.1.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-21972500.post-116473427404280963</id><published>2006-11-28T09:10:00.000-08:00</published><updated>2006-12-05T03:06:20.526-08:00</updated><title type='text'>Truth Table for Implication</title><content type='html'>I was surprised the binary &lt;span style="font-weight:bold;"&gt;implication&lt;/span&gt; operator (&amp;rarr;), also called the &lt;span style="font-weight:bold;"&gt;conditional&lt;/span&gt;, has a truth table, just as the AND and OR logical operators do. It is as follows:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;A B (A&amp;rarr;B)&lt;br /&gt;0 0  1&lt;br /&gt;0 1  1&lt;br /&gt;1 0  0&lt;br /&gt;1 1  1&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;That is, we can only say something about B if A is true. There is a similar table for the binary &lt;span style="font-weight:bold;"&gt;biconditional&lt;/span&gt; operator (&amp;harr;):&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;A B (A&amp;harr;B)&lt;br /&gt;0 0  1&lt;br /&gt;0 1  0&lt;br /&gt;1 0  0&lt;br /&gt;1 1  1&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This means we can determine truth tables for compound statements such as A&amp;rarr;(B&amp;rarr;C).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/21972500-116473427404280963?l=psivision.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/116473427404280963'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/116473427404280963'/><link rel='alternate' type='text/html' href='http://psivision.blogspot.com/2006/11/truth-table-for-implication.html' title='Truth Table for Implication'/><author><name>psivision</name><uri>http://www.blogger.com/profile/06866257676734125644</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='26' src='http://photos1.blogger.com/hello/144/10093/640/home4.1.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-21972500.post-114679047721082563</id><published>2006-05-04T17:36:00.000-07:00</published><updated>2006-06-08T07:56:46.146-07:00</updated><title type='text'>Entailment, Implication, Induction</title><content type='html'>What is the difference between &lt;span style="font-weight: bold;"&gt;logical&lt;/span&gt; and &lt;span style="font-weight: bold;"&gt;semantic entailment&lt;/span&gt;? How is it different from &lt;span style="font-weight: bold;"&gt;equality&lt;/span&gt; and &lt;span style="font-weight: bold;"&gt;implication&lt;/span&gt;? &lt;span style="font-weight: bold;"&gt;Induction&lt;/span&gt;, unlike &lt;span style="font-weight: bold;"&gt;deduction&lt;/span&gt;, generates new knowledge, but how can we best approximate human induction algorithmically?&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=0521838053"&gt;The Geometry of Information Retrieval&lt;/a&gt; fails to make clear its discussion of logic and how logical relations are related to information retrieval. As such, I'll be working through &lt;a href="http://www.amazon.co.uk/exec/obidos/ASIN/0521368650/202-2671386-6879800"&gt;Logic for Mathematicians&lt;/a&gt; and &lt;a href="http://www.amazon.co.uk/exec/obidos/ASIN/0521533619/202-2671386-6879800"&gt;Logic, Induction and Sets&lt;/a&gt;, before returning to The Geometry of Information Retrieval.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/21972500-114679047721082563?l=psivision.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/114679047721082563'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/114679047721082563'/><link rel='alternate' type='text/html' href='http://psivision.blogspot.com/2006/05/entailment-implication-induction.html' title='Entailment, Implication, Induction'/><author><name>psivision</name><uri>http://www.blogger.com/profile/06866257676734125644</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='26' src='http://photos1.blogger.com/hello/144/10093/640/home4.1.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-21972500.post-114477076720156249</id><published>2006-04-11T08:47:00.000-07:00</published><updated>2006-04-12T06:39:58.560-07:00</updated><title type='text'>Human - Computer Agreement</title><content type='html'>Speaking with a friend it became clear that there might be a limit to the performance from any computer algorithm aiming to emulate intelligence.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Disagreement&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Two or more humans can disagree on, for example, the relevance of an information retrieval operation, or the semantic content of a visual image. If human minds have trouble agreeing on the semantic content of what they observe, can we then expect a computer algorithm to produce results that generate more agreement than a set of human judges?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/21972500-114477076720156249?l=psivision.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/114477076720156249'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/114477076720156249'/><link rel='alternate' type='text/html' href='http://psivision.blogspot.com/2006/04/human-computer-agreement.html' title='Human - Computer Agreement'/><author><name>psivision</name><uri>http://www.blogger.com/profile/06866257676734125644</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='26' src='http://photos1.blogger.com/hello/144/10093/640/home4.1.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-21972500.post-114287410685273720</id><published>2006-03-20T08:51:00.000-08:00</published><updated>2006-04-11T08:46:22.916-07:00</updated><title type='text'>The Geometry of Information Retrieval</title><content type='html'>Currently reading through &lt;a href="http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=0521838053"&gt;The Geometry of Information Retrieval&lt;/a&gt;  which promises a much needed theoretical unification of the &lt;span style="font-weight: bold;"&gt;logical&lt;/span&gt;, &lt;span style="font-weight: bold;"&gt;probabilistic &lt;/span&gt;and &lt;span style="font-weight: bold;"&gt;vector&lt;/span&gt; space models of information retrieval. It provides this foundation using the language of quantum mechanics, where documents are objects in Hilbert space, and where interactions/measurements are Hermitian operators on this space, and where state vectors collapse upon interaction.&lt;br /&gt;&lt;br /&gt;The derivation of probability is not that surprising - distribute probability over the number/magnitude of eigenvectors of the interaction operators. However, of much more interest will be how the structure of the, possibly non-Boolean, logic is derived from this space and how this relates to measurements on the space.&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;&lt;d|&gt;&lt;d|&gt;&lt;span style="font-family:courier new;"&gt;&lt;/span&gt;&lt;/d|&gt;&lt;/d|&gt;&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/21972500-114287410685273720?l=psivision.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/114287410685273720'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/114287410685273720'/><link rel='alternate' type='text/html' href='http://psivision.blogspot.com/2006/03/geometry-of-information-retrieval.html' title='The Geometry of Information Retrieval'/><author><name>psivision</name><uri>http://www.blogger.com/profile/06866257676734125644</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='26' src='http://photos1.blogger.com/hello/144/10093/640/home4.1.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-21972500.post-114199512134528785</id><published>2006-03-10T04:14:00.000-08:00</published><updated>2006-03-10T08:43:51.293-08:00</updated><title type='text'>The Nature of the Problem</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Structured Data&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Chatting with a friend yesterday, it became apparent that what is meant by &lt;span style="font-weight: bold;"&gt;structured data&lt;/span&gt; is not clear. As yet, there is no set of sample data to illustrate what is meant by this.&lt;br /&gt;&lt;br /&gt;What is clear is that the structured data is richer than simple multi-dimensional vectors. For example, it is hoped that the data structure or its description allows for the expression of orderings on attributes; short &lt; medium &lt; long.&lt;br /&gt;&lt;br /&gt;An established process that works with rich and structured data instances is &lt;span style="font-weight: bold;"&gt;Inductive Logic Programming (ILP)&lt;/span&gt;, a process for inducing descriptive statements for a set of logical clauses. An explanation of this working with the well known trains data set is given by this presentation by Peter Flach. Another example is the determination of carcinogenic compounds, and a worked example is presented here.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Minimal Expressiveness&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;As a general word of caution it should be noted that the language to describe the data or the constraints, should be minimally expressive. Too much expressiveness leads to weak results or uncessary computational complexity.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Distance Metric&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The core of the problem then, is not the particular clustering algorithm to be used, but rather the notion of distance between instances of such rich or structured data. Such a metric must take into account the additional constraints such as orderings, or more general relations between attributes, and other constraints that can be expressed. In ILP the expression of such constraints is with logical clauses, the same language for desribing the data instances themselves.&lt;br /&gt;&lt;br /&gt;So in general, what is required is a data structure and an associated representation language  that allows for the general expression of relations or constraints over the elements of that data structure, as well as the usual value of those elements.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/21972500-114199512134528785?l=psivision.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/114199512134528785'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/114199512134528785'/><link rel='alternate' type='text/html' href='http://psivision.blogspot.com/2006/03/nature-of-problem.html' title='The Nature of the Problem'/><author><name>psivision</name><uri>http://www.blogger.com/profile/06866257676734125644</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='26' src='http://photos1.blogger.com/hello/144/10093/640/home4.1.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-21972500.post-114173395789637768</id><published>2006-03-07T04:13:00.000-08:00</published><updated>2006-03-08T08:05:03.446-08:00</updated><title type='text'>Partitional or Hierarchical Clustering?</title><content type='html'>Previously most work related to clustering in the image domain has been partitional (or fuzzy partitional). I've not yet thought through the merits of hierarchical clustering as applied to the image domain or to the clustering of structured data.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Partitional &lt;/span&gt;clustering simply slices the data space into clusters. However, &lt;span style="font-weight: bold;"&gt;hierarchical &lt;/span&gt;clustering provides more information. It tells you what clusters there would be depending on the level of granularity asked for. You can have one big cluster that covers all the data instances, and as you increase the granularity you see the sub-clusters separating off. This information is usually and naturally presented in the form of a dendogram.&lt;br /&gt;&lt;br /&gt;See wikipedia for more information on these distinctions: &lt;a href="http://en.wikipedia.org/wiki/Data_clustering"&gt;http://en.wikipedia.org/wiki/Data_clustering&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/21972500-114173395789637768?l=psivision.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/114173395789637768'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/114173395789637768'/><link rel='alternate' type='text/html' href='http://psivision.blogspot.com/2006/03/partitional-or-hierarchical-clustering.html' title='Partitional or Hierarchical Clustering?'/><author><name>psivision</name><uri>http://www.blogger.com/profile/06866257676734125644</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='26' src='http://photos1.blogger.com/hello/144/10093/640/home4.1.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-21972500.post-113926740647998783</id><published>2006-02-06T14:57:00.000-08:00</published><updated>2006-03-10T08:14:14.590-08:00</updated><title type='text'>Truly Unsupervised Clustering Algorithm</title><content type='html'>The first problem is the clustering algorithm itself. The most widely used algorithms need to be told how many clusters they are searching for and this means the algorithm is not entirely unsupervised.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Refinements&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The classic algorithm is the &lt;span style="font-weight: bold;"&gt;k-means&lt;/span&gt; and many are variants of this simple procedure. Some of these are simply refinements which effectively lower the margin of error on the results. Some refinements have the side effect of producing correct clusters in a robust manner, even if the initial number of clusters sought is set to the wrong value. I confirmed this positive behaviour in the Fayyad set of refinements to the basic k-means algorithm.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Beneficial Noise&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Another surprising result was that the accuracy of clustering results was significantly improved if the, possibly multi-dimensional, data was extended by another dimension with random values. The reasoning behind this effect is that the random noise actually smooths the objective functions, down which the clustering algorithm descends.&lt;br /&gt;&lt;br /&gt;I'll investigate this affect again to confirm its validity. I've not seen others refer to it, despite a quick search at the time.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Truly Unsupervised Clustering Algorithms&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;However the academic publications have started mentioning other methods which don't require the number of clusters to be preset, and i'll have a look at these.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/21972500-113926740647998783?l=psivision.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/113926740647998783'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/113926740647998783'/><link rel='alternate' type='text/html' href='http://psivision.blogspot.com/2006/02/truly-unsupervised-clustering.html' title='Truly Unsupervised Clustering Algorithm'/><author><name>psivision</name><uri>http://www.blogger.com/profile/06866257676734125644</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='26' src='http://photos1.blogger.com/hello/144/10093/640/home4.1.jpg'/></author></entry><entry><id>tag:blogger.com,1999:blog-21972500.post-113909847299173378</id><published>2006-02-04T16:08:00.000-08:00</published><updated>2006-02-06T05:11:01.363-08:00</updated><title type='text'>Introduction</title><content type='html'>This blog will act as a diary and notebook of my aim to create a method to perform clustering of structured data&lt;span style="font-weight: bold;"&gt;,&lt;/span&gt; incorporating domain knowledge into the algorithm or data structures, and with particular application to the clustering of images.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/21972500-113909847299173378?l=psivision.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/113909847299173378'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/21972500/posts/default/113909847299173378'/><link rel='alternate' type='text/html' href='http://psivision.blogspot.com/2006/02/introduction.html' title='Introduction'/><author><name>psivision</name><uri>http://www.blogger.com/profile/06866257676734125644</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='26' src='http://photos1.blogger.com/hello/144/10093/640/home4.1.jpg'/></author></entry></feed>
