
Artificial Cognition Programme

We are contributing to a collaborative consortium whose aim is to develop the concept of an integrated mathematical model able to represent syntactic and semantic aspects of language in a reduced lexicon, namely Basic English, and ultimately to develop an artificial cognition engine.  The fundamental concept lies in creating a mathematical means of building a multi-dimensional graphical representation of a sentence, in a space pre-conditioned by the inter-relationship of all words in this reduced lexicon.  To translate from Standard English (a vocabulary of 100,000+ words) to a representation using fewer than 2,000 words, a wizard is employed, which is available at www.simplish.org.  Thus, in a multidimensional space the central low-dimensionality points describe words and their relationships, while increasingly complex phrases are represented by points of higher dimensionality.  The core relationships have been derived from a single individual and then refined, unlike the majority of current efforts, which rely on a large corpus of text and therefore suffer from ambiguity as a major problem.  A descriptive phrase may either represent a complex word (i.e. one not in the Basic English lexicon) or an abstract concept.  In this way, our method is able to cope with both data-driven requirements and the concept-driven information needed for problem-solving.  This representation is conceptually similar to Chinese writing, with the added advantage that the source data can be extended both to other languages and to other alphabets.  In this pre-conditioned space, two phrases written using different words that convey similar meaning will be represented by a similar ideogram, which can be compared to existing data, analyzed for concepts being searched for, and/or extrapolated.  Applications under development include a semantic search engine, a conversational agent and human-machine interfaces.

Objectives

Broad objective: 

Our aim is to devise artificial cognition systems able to process unstructured knowledge. Cognitive systems are natural or artificial information processing systems, including those responsible for perception, learning, reasoning, decision-making, communication and action. Many of the new application opportunities lie at the interface between life sciences, social sciences, engineering and physical sciences, such as bioinformatics, data-mining, semantic web, and human-machine interaction for command and control of autonomous vehicles/robots.

Specific staged 3-year objectives:

2008 - This year’s work was based on the concept of producing an integrated mathematical model able to represent syntactic and semantic aspects of full English (100,000+ words) in a reduced lexicon, namely Basic English (of somewhat over 1,000 words).  The translating tool is available on the internet (www.simplish.org).  It processes uploaded files in .pdf, .doc or .txt format, converting many everyday English words to a simpler representation while substantially keeping the information content; it describes complex scientific words in the reduced lexicon in footnotes (with a second layer wherever an explanation itself contains further complex words); and users can add their own words to a personal dictionary.  Current work concentrates on optimization, database refinements and ambiguity removal through semantic and syntactic analysis tools.
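The substitute-and-footnote behaviour described above can be sketched as follows. The tiny BASIC_DEFINITIONS dictionary, the footnote numbering and the bracket notation are illustrative assumptions, not the Simplish implementation, which works from a 100,000+ word vocabulary.

```python
# Hypothetical mini-dictionary: single-word entries are direct substitutions,
# multi-word entries become footnoted explanations (an assumption for this sketch).
BASIC_DEFINITIONS = {
    "automobile": "car",
    "photosynthesis": "the process by which plants make food from light",
}

def simplify(text):
    """Replace complex words with Basic English forms; footnote long explanations."""
    footnotes = []
    words = []
    for word in text.lower().split():
        replacement = BASIC_DEFINITIONS.get(word, word)
        if " " in replacement:  # multi-word explanation -> keep word, add footnote
            footnotes.append(f"{word}: {replacement}")
            words.append(f"{word}[{len(footnotes)}]")
        else:
            words.append(replacement)
    return " ".join(words), footnotes
```

A second layer, as the text describes, would apply the same pass to the footnote texts themselves until only Basic English words remain.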

2009 – The following stage will cover the implementation of the acquisition, storage and retrieval of knowledge in a reduced-vocabulary multidimensional space.  In this space, phrases having a similar meaning will be acquired using the translating tool and stored in a semantics-conditioned space, and a semantic retrieval capability will be implemented.

2010 - The system will then be ready to have subject-specific knowledge embedded in the reduced-lexicon space.  Phrases and paragraphs will be displayed sequentially as a trajectory following a series of waypoints, one for each phrase or complex word within a paragraph.  Thus, context information will be inserted in the same space as data-driven, mission-specific information.
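The trajectory-of-waypoints idea can be sketched as a sequence of points, one per phrase. The toy LEXICON and the splitting of a paragraph on full stops are assumptions made for illustration; the consortium's pre-conditioned space would supply the real coordinates.

```python
# Toy reduced lexicon standing in for Basic English (an assumption for this sketch).
LEXICON = ["move", "north", "stop", "report"]
INDEX = {w: i for i, w in enumerate(LEXICON)}

def point(phrase):
    """Map a phrase to a point: one coordinate per reduced-lexicon word."""
    v = [0] * len(LEXICON)
    for w in phrase.lower().split():
        if w in INDEX:
            v[INDEX[w]] += 1
    return tuple(v)

def trajectory(paragraph):
    """One waypoint per phrase, in reading order, forming the paragraph's path."""
    return [point(p) for p in paragraph.split(".") if p.strip()]
```

Context information and mission data would then live as further points in the same space, alongside these waypoints.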

General Approach

The fundamental concept lies in creating a mathematical means of building a multi-dimensional graphical representation of a sentence, in a space pre-conditioned by the inter-relationship of all words in a reduced lexicon, bearing in mind both a shallow semantic representation and syntactic relations.  To translate from Standard English to Basic English, the Simplish wizard is employed to derive a representation using fewer than 2,000 words.  Thus, in a multidimensional space the central low-dimensionality points describe words and their relationships, while increasingly complex phrases are represented by points of higher dimensionality, since a phrase can be represented by all the words it uses in a large-dimensionality vector.
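The large-dimensionality vector mentioned above can be sketched as a bag-of-words count over the reduced lexicon. The six-word LEXICON here is a stand-in assumption for the ~2,000-word Basic English list; the real space would also carry the pre-conditioned inter-word relationships.

```python
# Toy reduced lexicon (an assumption; Basic English has fewer than 2,000 words).
LEXICON = ["good", "food", "make", "plant", "light", "animal"]
INDEX = {w: i for i, w in enumerate(LEXICON)}

def phrase_vector(phrase):
    """Bag-of-words vector: one dimension per word in the reduced lexicon.

    Words outside the lexicon are ignored; in the full scheme they would first
    be translated into Basic English by the Simplish wizard.
    """
    v = [0] * len(LEXICON)
    for word in phrase.lower().split():
        if word in INDEX:
            v[INDEX[word]] += 1
    return v
```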

Possibly the most important aspect of cognition has to do with memory. We know that the processes of acquisition, storage and retrieval of knowledge lie at the heart of human cognition. Furthermore, it has long been known that organization and memorization are inseparable and that memory is aided by meaning. Therefore, working out a way to establish the meaning of words in an artificial cognition system helps organization and is a crucial step in developing these systems. Our way to achieve this objective is to assign meaning in terms of the other words in a reduced vocabulary, unlike standard natural language where 100,000 words would have to be clustered somehow and related to each other.

Relating the core 1,000 words of Basic English to each other is being done by another team within our group using multivariate methods, which also provides the means to create a multi-dimensional graphical representation of a sentence in a space pre-conditioned by the inter-relationship of all words in this reduced lexicon.  Thus, these 1,000 words enable infinite expressiveness in a relatively low-dimensionality space.  A descriptive phrase may either represent a complex word (i.e. one not in the Basic English lexicon) or an abstract concept.  In this way, our method is able to cope with both data-driven requirements and the concept-driven information needed for problem-solving.  This representation is conceptually similar to Chinese writing with ideograms.  In this pre-conditioned space, two phrases written using different words that convey similar meaning will be represented by a similar ideogram, which can be compared to existing data, analyzed for concepts being searched for, and/or extrapolated.  The net result will be to allow us to move from pattern-matching to concept-matching, via a multidimensional nearest-neighbor search.
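The concept-matching step can be sketched as a nearest-neighbour search among phrase vectors. Cosine similarity is one plausible closeness measure, used here as an assumption; it is not necessarily the metric the consortium's pre-conditioned space would define.

```python
import math

# Toy reduced lexicon (an assumption standing in for Basic English).
LEXICON = ["food", "make", "plant", "light"]
INDEX = {w: i for i, w in enumerate(LEXICON)}

def vec(phrase):
    """Phrase as a count vector over the reduced lexicon."""
    v = [0.0] * len(LEXICON)
    for w in phrase.lower().split():
        if w in INDEX:
            v[INDEX[w]] += 1.0
    return v

def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is empty."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def nearest(query, stored):
    """Return the stored phrase whose 'ideogram' is closest to the query."""
    return max(stored, key=lambda p: cosine(vec(query), vec(p)))
```

Two phrases worded differently but using related Basic English words land near each other, which is what lets the search match concepts rather than surface patterns.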

Core relationships have been derived from a single individual and then refined, unlike the majority of current efforts, which rely on a large corpus of text and therefore suffer from ambiguity as a major problem, since the meaning of a word becomes the consensus of perhaps tens of thousands of people’s ideas.  In our case, the core relationships can be recalculated for people with different outlooks and the results compared to one another to establish the “meaning” of the same phrase to different people.

In this broad strategy, it is possible to assign trajectories to each user, service or specific piece of information in a collaborative network.  Moreover, the direction of these trajectories can be related to a planning path or to the collection of the corresponding mission-derived data.  Trajectories may end on a known precursor condition that can be extrapolated to an actionable conclusion, while other users’ threads, which can be assigned degrees of priority/credibility, are used to corroborate or reject that action.  In this multi-threaded space, information fusion is achievable; moreover, users can be added or deleted as they come into and out of a theater.

Finally, this broad approach can be helpful in the general field of human-machine interaction by enabling humans to express themselves in full-vocabulary natural language while machines actually process only semantically-filtered, limited-vocabulary phrases.

Impact

We believe that applying these methods to real-world military applications will substantially address the objectives of a distributed intelligence and information fusion program.

Civilian technology will also be positively impacted by applying the principles described here to applications such as a semantic search engine, improved data-mining algorithms, and more versatile robots “apparently” able to process natural language.  Moreover, the two-thirds of the world’s population who do not share English’s linguistic roots and have a very limited English vocabulary will find it much easier to understand technical and scientific material, most of which is written in English.

Core Competences

We have been working on a viable representation of both the semantic and syntactic content of language for the last 14 years.  We have a novel method to encapsulate all this information in a low-dimensionality space using multivariate techniques.  Our team has experience implementing the translating tool and familiarity with the mathematics involved.  Perhaps more importantly, we have the translating tool itself, which is crucial in turning unstructured full-vocabulary language into a limited-vocabulary representation that substantially keeps the semantic content and can be processed so as to usefully acquire, store and retrieve knowledge.