Ideas
See also StudentApplicationTemplate, Development, Projects, Publications, GSoC Projects for 2009 and GSoCProjects2008 for background reading.
Ideas listed here may be taken up as projects by anyone with the necessary skill and motivation (the ideas here are not just for GSoC students!).
OpenCog projects are inter-related in many ways, although they vary considerably in the sub-fields of computer science they explore, the programming languages used, their size and scope, and other properties. OpenCog teams overlap and complement each other to varying degrees. The formation of particular projects and teams is influenced primarily by the goals and needs of OpenCog as an integrated and coherent system, and by specific larger project goals, such as building a natural language conversational system.
Most tasks are given difficulty labels:
- (1) Relatively straightforward AI R&D, though not easy
- (2) Pretty difficult
- (3) Rather difficult
(Note that these labels refer only to the AI aspects of the task; there may also be difficult software engineering or systems integration aspects, which haven't been labeled.)
If you add an idea as a Blueprint in Launchpad (under the respective project), please append a link to the blueprint after the idea text on this page.
OpenCog Framework
- Language: C++ (uses Boost and templates)
- Code: https://code.launchpad.net/opencog
- List: http://groups.google.com/group/opencog-developers?hl=en
- License: AGPLv3 + linking exception
The framework projects require no prior experience with AI, NLP or robotics; instead, they require strong coding skills, and a working knowledge of system architecture principles.
Packaging for Ubuntu and/or OSX
Something that is important for community uptake of OpenCog is the ability to easily install and play with the system. Many people get stuck compiling OpenCog and then after a brief period of struggling with assistance on IRC they sometimes give up. OpenCog has Cogbuntu, a customised Ubuntu image, but this is overkill for many people and it would be preferable to allow users/developers to use the familiar "apt-get install opencog".
In addition, the OSX port recently initiated by Joel is not complete. The port only includes a subset of the components within OpenCog, and it would be great to develop a package that conforms to the ".app" packaging of programs in OSX.
Distributed and persistent AtomSpace
Provide AtomSpace persistence by hooking to an open-source BigTable equivalent such as Hypertable or HBase. The AtomTable (the implementation of the AtomSpace) would then be relegated to an in-memory AtomSpace cache of the distributed versions.
In GSoC 2009, Jeremy Schlatter worked on connecting Hypertable to OpenCog as a distributed and persistent AtomSpace. That work is the first step of this project, but we still need to tune the performance, provide indexes that can hopefully mirror the pluggable AtomSpace indexes, and come up with a workable way of having multiple OpenCog instances interact with Hypertable (dealing with conflicting changes, etc.).
There is also a need for redesigning the AtomSpace API to allow clients/MindAgents to specify the depth of their queries. The ability to focus on only those atoms in memory before resorting to searching the distributed AtomSpace is an important requirement to allow AI algorithms to scale.
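One way to picture a query-depth parameter is a two-tier lookup, sketched below. This is only an illustration under stated assumptions: the class `TieredAtomStore`, its `get` method, and the `depth` values are hypothetical names, not the existing AtomSpace API, and a plain dictionary stands in for the distributed store.

```python
class TieredAtomStore:
    """Hypothetical two-tier store: an in-memory cache in front of a slower backing store."""

    def __init__(self, backing):
        self.cache = {}
        self.backing = backing  # any dict-like object standing in for the distributed store
        self.backing_reads = 0  # counts how often the slow tier was consulted

    def get(self, handle, depth="memory"):
        """depth="memory" consults only the cache; depth="distributed" falls through."""
        if handle in self.cache:
            return self.cache[handle]
        if depth != "distributed":
            return None
        self.backing_reads += 1
        atom = self.backing.get(handle)
        if atom is not None:
            self.cache[handle] = atom  # keep it in memory for later shallow queries
        return atom
```

A MindAgent that only cares about atoms already in memory would call `get(handle)` and never pay for a remote read; one that needs completeness would pass `depth="distributed"`.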
This project requires little or no acquaintance with general AI or NLP concepts; instead, it requires strong knowledge of distributed processing, networking, locking, transactional and kernel-type operating system principles. This is an ideal project for students with strong coding skills and theoretical comp-sci knowledge.
It should be relatively easy to just "hook things up". It is considerably harder to set things up so that one gets high-performance, and gets a solution that could scale to thousands of connections at a time (although we'll settle for dozens to begin with!). This is where the need for system architecture/distributed processing skills kicks in.
- Distributed Architecture RFC provides an outline of one possible approach and could be used as a starting point in the design.
- The Launchpad blueprint will soon provide additional details.
Language Bindings
The creators of OpenCog have interest in providing language bindings to a host of languages, so that MindAgents can be written in numerous languages.
Python bindings were implemented in last year's GSoC by Kizzo using Boost::Python. However, these could be made more Pythonic, and they also require more comprehensive examples and tests. In particular, there is the challenge of allowing MindAgents written in Python to be run by the CogServer. Currently the CogServer only loads compiled C++ MindAgents.
AtomSpace Visualizer
It can be hard to understand what is going on within OpenCog, and visualizing the hypergraph can be a big help. Unfortunately, real-world hypergraphs run from 100K to 10M or more atoms, and thus overwhelm most graphing/visualization software. Tulip has been tried, and although it claims to visualize graphs with up to 1M nodes, in practice, it seems to choke on graphs with more than 5K nodes.
Ubigraph is close to being useful, but it seems to be an essentially dead project and is also NOT open source. See the existing ubigraph dynamic library written by Jared Wigmore and Joel Pitt to see how the AtomSpace is currently connected to Ubigraph.
Guess has not been tried.
The current preference is for an embedded JavaScript viewer using an HTML5 canvas element; this would supplement the existing web UI. Processingjs could be used to simplify the graph drawing process, and calls to the REST interface could be used to expand the AtomSpace.
Remote shell
Connected to the above (possibly in the same project if there were sufficient time). Create a shell environment for working with OpenCog. The Python shell (or a variant such as IPython) seems ideal for this, as it already has XML-RPC and undoubtedly JSON-RPC.
Performance measurement suite
Many contemplated changes to the OpenCog infrastructure have a real impact on execution time and on the amount of memory used. A performance suite would collect into one place a large variety of different test cases, instrument them properly so as to measure speed and memory usage, and then report the results, in a completely automated fashion.
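The shape of such a suite can be sketched in a few lines. The sketch below is only an illustration: `measure`, `run_suite`, and the stand-in workload `build_links` are hypothetical names, and `tracemalloc` only sees Python allocations (and slows the timed code down), so measuring the C++ framework itself would need different instrumentation.

```python
import time
import tracemalloc


def measure(func, *args, repeats=3):
    """Run func(*args) several times; return the best wall time and the peak memory."""
    best_seconds = float("inf")
    peak_bytes = 0
    result = None
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        result = func(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        best_seconds = min(best_seconds, elapsed)
        peak_bytes = max(peak_bytes, peak)
    return {"result": result, "seconds": best_seconds, "peak_bytes": peak_bytes}


def build_links(n):
    """Stand-in workload (an assumption): build n (source, target) pairs."""
    return [(i, i + 1) for i in range(n)]


def run_suite(cases):
    """cases maps a test name to (function, argument tuple); returns name -> (seconds, bytes)."""
    report = {}
    for name, (func, args) in cases.items():
        stats = measure(func, *args)
        report[name] = (stats["seconds"], stats["peak_bytes"])
    return report


if __name__ == "__main__":
    for name, (secs, mem) in run_suite({"links_10k": (build_links, (10_000,))}).items():
        print(f"{name}: {secs:.6f} s, peak {mem} bytes")
```

Keeping the cases in one table makes it easy to rerun the whole suite before and after an infrastructure change and compare the two reports.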
OpenCog modules
The ideas listed in this section are all related to parts of OpenCog which are to some extent projects in their own right.
Natural Language Processing (NLP)
The overall OpenCog NLP pipeline is described here. It has many pieces, including the Link Grammar parser, and RelEx as pre-processing stages. Output is fed to OpenCog for reasoning. The projects below address various aspects of the pipeline. Almost all parts of the pipeline merit improvement.
- Code: https://code.launchpad.net/relex/
- List: http://groups.google.com/group/link-grammar (mentors: Linas and Murilo)
- License: BSD, Apache 2.0
- Languages: C, C++, Java, Scheme, Perl
RelEx is an English-language semantic relationship extractor, built on the Carnegie-Mellon link parser. It can identify subject, object, indirect object and many other relationships between words in a sentence. It can also provide part-of-speech tagging, noun-number tagging, verb tense tagging, gender tagging, and so on. Relex includes a basic implementation of the Hobbs anaphora (pronoun) resolution algorithm. Optionally, it can use GATE for entity detection. RelEx also provides semantic relationship framing, similar to that of FrameNet.
The output from RelEx is a hypergraph of Nodes and Links, which are input into OpenCog, and may then be processed in various ways. In addition, the LexAt (lexical attraction) package provides many scripts and database tools to collect statistical information on parsed sentences.
PLN Inference on extracted semantic relationships
Currently RelEx takes in a sentence, and outputs a set of logical relationships expressing the semantics of the sentence. It is possible to take the logical relationships extracted from multiple sentences, and combine them using a logical reasoning engine, to see what conclusions can be derived. Thus, for example, the English-language input "Aristotle is a man. Men are mortal." should allow the deduction of "Aristotle is mortal."
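The Aristotle deduction can be illustrated with a tiny crisp forward chainer. This is a sketch under assumptions: the `(subject, "isa", object)` triples are a hypothetical stand-in and not actual RelEx output, "men" is assumed to have been normalized to "man", and the rule is plain transitivity with no PLN truth values.

```python
def deduce(facts):
    """Close a set of (subject, "isa", object) triples under transitivity."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, rel1, b) in list(known):
            for (c, rel2, d) in list(known):
                # Chain "a isa b" with "b isa d" to conclude "a isa d".
                if rel1 == rel2 == "isa" and b == c and a != d:
                    new = (a, "isa", d)
                    if new not in known:
                        known.add(new)
                        changed = True
    return known


facts = {("Aristotle", "isa", "man"), ("man", "isa", "mortal")}
conclusions = deduce(facts) - facts
```

Here `conclusions` holds the single derived triple `("Aristotle", "isa", "mortal")`. An uncertain reasoner such as PLN would additionally carry a strength and confidence on each triple.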
Some prototype experiments along these lines were performed in 2006, using sentences contained in PubMed abstracts. See paper. But no systematic software approach was ever implemented.
These experiments could be done in a variety of rule engines, including the Probabilistic Logic Networks engine, as well as more standard crisp rule engines such as SWIRLS.
This project is appropriate for a student who is interested in both computational linguistics and logical inference, and has some knowledge of predicate logic.
A simple, detailed example of language-based inference using RelEx and PLN is given here: Image:RelEx PLN Example Inference.pdf.
Related working notes can be found in the OpenCog source code tree, in the directory opencog/nlp/seme/README and opencog/nlp/triples/README.
See also NLP-PLN-NLGen pipeline.
Link Grammar Parser improvements
A number of improvements to the Link Grammar parser would be useful.
- Implement left-to-right, limited-window search through parse space. The parser currently examines sentences as a whole, which means that the parsing of long sentences becomes very slow (approximately as N^3, with N the number of words in the sentence). Implementing a "window" to limit searches for connections between distant words should dramatically improve parse performance. Not only that, but it makes the parser more "neurologically plausible", by limiting difficult, long-range correlations between words. The window algorithm can be thought of as a kind of Viterbi decoder. This project is technically challenging, and requires a reasonable grounding in the theory of context-free grammars and at least a passing acquaintance with the idea of chart parsing, the forward-backward algorithm, and the Viterbi algorithm. This project requires a deep dive into the C code that implements the parse algorithms of Link Grammar. This project is the most critical outstanding fix needed for Link Grammar; the slowness of long-sentence parsing is the biggest thing holding back Link Grammar at this time.
- Implement support for "speech registers", such as newspaper headlines, medical notes, children's books, conversational speech. The style of linguistic expression varies depending on context: for example, in newspaper headlines, articles are frequently omitted. Children's books sometimes use archaic grammatical constructions not commonly used in adult writing. During conversational speech, the rules of grammar are significantly relaxed; run-on sentences can predominate. Thus, a "one-size-fits-all" set of grammar rules has trouble with these situations, because the kind of grammar rules appropriate in one context might not be appropriate in another. Loosening parse rules sufficiently to work well for newspaper headlines or chat will result in rules that are too loose for book or newspaper text, generating an over-abundance of incorrect parses. Thus, a better approach is to apply a different set of grammatical rules, based on the context. This project requires modifications to the Link Grammar parser internals to allow for a different parse scoring mechanism for different speech situations, as well as the development of altered grammar rules that can begin to tackle some of these different "speech registers".
- Currently, Link Grammar is unable to properly handle quoted text and dialogue. A mechanism is needed to disentangle the quoting from the quoted text, so that each can be parsed appropriately. This might be done with some amount of pre-processing (e.g. in RelEx), or possibly within Link Grammar itself.
Statistically similar expressions for question answering
Proper language-based inference requires that similar expressions be recognized as such. For example, "X is the author of Y" and "X wrote Y" are more or less synonymous phrases. Lin and Pantel describe how to automatically discover such similar phrases. (See Dekang Lin and Patrick Pantel. 2001. "Discovery of Inference Rules for Question Answering." Natural Language Engineering 7(4):343-360.) A similar, but more sophisticated approach is given by Hoifung Poon & Pedro Domingos, 2009, "Unsupervised Semantic Parsing".
Both of these systems use statistical techniques to find commonly occurring sub-expressions. Both start with text that has been processed with a dependency parser. While the first searches for chains of grammatical dependency relations that can be taken as synonymous, the second searches for arbitrary patterns that can be clustered into the same "synonym" or "concept". The two papers also differ in the statistical measures used to assign synonymous patterns into a common cluster, but it is not clear which statistical technique is better, or if that even matters very much. Lin uses measures based on mutual information, while Domingos applies the (considerably more complex) theory of Markov Logic Networks.
The goal of this task would be to implement a frequent-pattern-mining system within the OpenCog infrastructure. That is, given a large number of OpenCog hypergraphs, common subgraphs are to be identified, and then, using some statistical measure (such as mutual information, or possibly MLN, or another distance measure), such hypergraphs would be clustered together. Each such discovered cluster can be thought of as being a "concept"; the elements of the cluster are "synonymous expressions for the concept".
Ideally, by searching not just for synonyms, but for common patterns in general, the system will automatically find "speech formulas", or perhaps "lexical functions" as described in Mel'čuk's Meaning-Text Theory. Presumably, such clustering techniques should also be able to group similar things, such as a cluster containing names of musical instruments, etc.
The discovery of such clusters should be useful both for question answering and for natural-language generation (as clusters would contain different ways of expressing the same "idea").
Classifying words by grammatical usage, learning new grammar rules
One of the datasets associated with OpenCog is a large database associating words with their grammatical usage; another database contains triples of (word, grammatical usage, word sense). These databases contain millions of entries, and are largely unexplored. They could be data-mined for clusters of words that are used in grammatically similar ways. This project would require evaluating and selecting appropriate open-source clustering software. The result of applying clustering would include identifying how new, unknown words should be treated grammatically (is the unknown word being used as a noun, or a verb?)
Clusters might distinguish between types of words that link-grammar treats as being the same (such as most adjectives, adverbs), but are, in fact, used quite differently by English speakers. For example, the parser treats "acetylene" and "adroitness" as nouns belonging to the same class; yet clearly one would never say "the adroitness exploded". Perhaps clustering could help split this class of words into smaller, more refined classes. Perhaps clustering might also distinguish different semantic senses: "I tend to sheep" and "I tend to agree" use the word "tend" in two dramatically different senses. Can grammatical clustering distinguish between these senses?
Some words may be in classes that are too narrow: parses fail because the parser does not know that the word can be used in a broader grammatical context. By examining how a word is used, perhaps clustering could identify such cases as well.
Learning simple grammars
RelEx uses the CMU Link Grammar as its underlying English-language parser. Link Grammar's weak point is short sentences, simple commands, directives and the like; sentences which typically occur in chat rooms. The project is to tinker with the automatic acquisition of language via some learning mechanism or another.
Jianfeng Gao and Hisami Suzuki (2003), "Unsupervised Learning of Dependency Structure for Language Modeling", describe a method for learning dependency grammars. The parser currently used by OpenCog, the Link-Grammar parser, has many similarities to dependency grammars. The paper John Lafferty, Daniel Sleator, and Davy Temperley. 1992. "Grammatical Trigrams: A Probabilistic Model of Link Grammar." Proceedings of the AAAI Conference on Probabilistic Approaches to Natural Language, October 1992, describes statistical learning algorithms to generate Link Grammar rules. See also the link grammar bibliography for details.
Yuret's (1998) algorithm tries to create a dependency tree by computing mutual information (MI) between word pairs. The tree is discovered by computing the maximum spanning tree of the MI between all word pairs. There is an alternate approach: true, hierarchical clustering. In hierarchical clustering, one creates an MI-based metric (MI alone is not a metric), and applies cluster analysis techniques. To get hierarchy, one also looks for and computes metric measures between clusters, i.e. one computes the MI between a third word and a word-pair, instead of just between the third word and the head-word of the pair phrase.
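The MI-plus-spanning-tree idea can be sketched as follows. This is a deliberately simplified illustration, not Yuret's actual procedure: it counts co-occurrence within a sentence, ignores word order and link crossing, and uses Kruskal's algorithm to pick the tree. The function names and the toy corpus are assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations


def pair_mi(sentences):
    """Pointwise mutual information for word pairs that co-occur in a sentence."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.split()))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))  # pairs are alphabetically ordered
    n = len(sentences)
    return {
        (a, b): math.log2(count * n / (word_counts[a] * word_counts[b]))
        for (a, b), count in pair_counts.items()
    }


def max_spanning_tree(words, mi):
    """Kruskal's algorithm over the words of one sentence, highest-MI edges first."""
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path compression
            w = parent[w]
        return w

    edges = sorted(
        ((mi[(a, b)], a, b) for a, b in combinations(sorted(set(words)), 2) if (a, b) in mi),
        reverse=True,
    )
    tree = []
    for weight, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # adding this edge does not create a cycle
            parent[root_a] = root_b
            tree.append((a, b, weight))
    return tree


corpus = ["the cat sleeps", "the dog sleeps", "the cat purrs", "a dog barks"]
mi = pair_mi(corpus)
tree = max_spanning_tree("the cat sleeps".split(), mi)
```

With this toy corpus the tree for "the cat sleeps" links "the" to both "cat" and "sleeps". A corpus this small says nothing about real grammar; the point is only the shape of the computation.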
It is not clear how adaptable these algorithms would be to the short, frequently ungrammatical sentences seen in chatrooms; and what mechanisms they suggest for keeping such learning from garbaging up the correct parses of more complex sentences. It is quite possible that link grammar needs new link types which would be used only in short sentences. (!) The idea of learning new link types seems unexplored.
The project is then to learn new Link Grammar links and rules, suitable for parsing the kind of speech often seen in chat rooms.
Natural Language Generation
We have a Java software system called NLGen that, given a set of RelEx-like relations, generates a syntactically correct English-language sentence.
It works OK on short sentences but needs to be improved significantly to be able to handle sentences with complex phrasal or clausal structure. Also, some statistical linguistics should be inserted to allow it to use observed word frequencies to guide its sentence generation.
The basic idea underlying NLGen's algorithm is described here: SegSim
Other language generation approaches include Markus Guhe's (2003) "Incremental conceptualisation for language production"
Explore Landmark Transitivity in Link Grammar
The Link Grammar uses a constraint of "planar graphs" (i.e. no link crossings) to rule out unreasonable parses. It seems that it might be possible to replace this rule by the notion of "Landmark Transitivity" taken from Hudson's Word Grammar. The basic idea is this:
Each Link Grammar link is given a parent-child relationship: one end of the link is the parent, the other the child. Thus, for example, given a noun to noun-modifier link, the noun is the parent of the link. Then, parents are landmarks for children. Transitivity (in the mathematical sense of "transitive relation") is applied to these parent-child relationships. Specifically, the no-links-cross rule is replaced by two landmark transitivity rules:
- If B is a landmark for C, then A is also a type-L landmark for C
- If A is a landmark for C, then B is also a landmark for C
where type-L means either a right-going or left-going link.
Ben hypothesizes that adding Landmark transitivity might be able to eliminate most or all of Link Grammar's post-processing rules. See Ben's PROWL grammar for details. See also Natural Language Processing, below.
Word Grammar Parsing
Implement a Word Grammar parser that utilizes the link grammar dictionary, and utilizes PLN inference for semantic biasing of the parsing process. This gives a strong impression of being an approach to NL comprehension that is suitable for general intelligence.
Inferring Semantic Mapping Rules
Use PLN to combine the hand-coded semantic normalization rules that exist in the RelExToFrame rule-base, to form new rules. Also, generalize the rules to other words not currently covered by them, by combining the rules with semantically-based concept-similarity measures. (1)
Improved reference resolution
Reference resolution (anaphora resolution) is the problem of determining what words like "it", "him" and "her" refer to. There are several ways in which this problem might be attacked.
1) Statistical methods: The current RelEx implementation uses the Hobbs algorithm for this, which is an intelligent but crude mechanism that achieves about 60% accuracy. By combining the Hobbs algorithm with statistical measures (one of which is sometimes called the Hobbs score), it should be possible (according to literature results) to get up to 85% accuracy or so. Appropriate for anyone who is interested in getting some hands-on experience with statistical corpus linguistics.
2) Reasoning: Sentences commonly make assertions, which can be checked for accuracy. So, for example: Jules went to Paris. He saw many things there. Does he refer to Jules, or to Paris? Can Jules see things? Can Paris see things? Does there refer to Paris, or to Jules? Can one see many things in Jules? Can one see many things in Paris? This approach is more powerful than a statistical approach, but requires establishing a large knowledgebase, and performing reasoning there-on. It is expected that PLN would be used for reasoning.
MOSES
- Language: advanced C++
- Code: https://code.launchpad.net/opencog (opencog/learning/moses)
- Earlier versions: http://code.google.com/p/moses/
- List: http://groups.google.com/group/moses-users (mentors: Moshe and Nil)
- License: Apache 2.0
Meta-optimizing semantic evolutionary search (MOSES) is a new approach to program evolution, based on representation-building and probabilistic modeling. MOSES has been successfully applied to solve hard problems in domains such as computational biology, sentiment evaluation, and agent control. Results tend to be more accurate, and require fewer objective function evaluations, in comparison to other program evolution systems. Best of all, the result of running MOSES is not a large nested structure or numerical vector, but a compact and comprehensible program written in a simple Lisp-like mini-language. More at http://metacog.org/doc.html.
Extend MOSES to encompass primitive recursive functions using fold
A good GSoC project would be to implement in Reduct/MOSES the algebra of foldl, as Moshe hints at the bottom of page 5 of
http://www.agi-09.org/papers/paper_69.pdf
and as explained in detail in the paper
G. Hutton. A tutorial on the universality and expressiveness of fold. Journal of Functional Programming, 1999.
which is at
http://www.cs.nott.ac.uk/~gmh/fold.pdf
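For readers unfamiliar with fold, the sketch below shows how a single left fold can express several recursive functions. It uses Python's `functools.reduce` purely for illustration; it says nothing about how fold would be represented inside Reduct/MOSES, and the helper names are assumptions.

```python
from functools import reduce


def foldl(step, initial, items):
    """Left fold: step(...step(step(initial, x0), x1)..., xn)."""
    return reduce(step, items, initial)


def length(items):
    # Count elements by ignoring each one and adding 1 to the accumulator.
    return foldl(lambda acc, _: acc + 1, 0, items)


def reverse(items):
    # Prepend each element to the accumulated list.
    return foldl(lambda acc, x: [x] + acc, [], items)


def factorial(n):
    # Primitive recursion on n, expressed as a fold over 1..n.
    return foldl(lambda acc, k: acc * k, 1, range(1, n + 1))
```

Each function differs only in its step function and initial value, which is what makes fold attractive as a single construct for program evolution to manipulate.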
Higher-order Programmatic Constructs
A very important project, appropriate for a student with some functional programming background, is to extend MOSES to handle higher-order programmatic constructs, including variable expression. This can be done in many ways, including using combinatory logic or lambda calculus. The route that seems best at the moment is Sinot's formalism of "director strings as combinators," and there is opportunity for the student to assist with working out the details of the design as well as the implementation. Much of the work here is in Reduct and representation-building, which would be useful for both MOSES and Pleasure. (2)
Pleasure
Complete the implementation (or start a new one from scratch), and then test and explore the Pleasure Algorithm for program learning started last year by Alesis Novik. http://opencog.org/wiki/MOSES:_the_Pleasure_Algorithm
Transfer Learning
Causing MOSES to generalize across problem instances, so what it has learned across multiple problem instances can be used to help prime its learning in new problem instances. This can be done by extending the probabilistic model building step to span multiple generations, but this poses a number of subsidiary problems, and requires integration of some sort of sophisticated attention allocation method into MOSES to tell it which patterns observed in which prior problem instances to pay attention to. (2)
Arbitrarily Complex Program Learning
More on the previous project suggestion: The motivation for the above is to allow MOSES to learn arbitrarily complex programs. For instance, we would like it to be able to easily learn n log n sorting algorithms without any fancy data preparation or other "cheating." It is possible that integrating Sinot's formalism into MOSES will allow effective learning of moderately complex programs using recursive control, which is something no one has achieved before and which is of critical importance in automated program learning.
Action-Sequences Handling
The current version of MOSES does not elegantly or efficiently handle the learning of programs involving long sequences of actions. This is problematic for applications involving the control of robots or virtual agents. So, an important project is the extension of the Reduct and representation-building components of MOSES to effectively handle action-sequences. This work will be testable via using MOSES to control agents in virtual worlds such as Multiverse or CrystalSpace.
Improved hBOA
MOSES consists of four critical aspects: deme management, program tree reduction, representation-building, and population modeling. For the latter, the hBOA algorithm (invented by Martin Pelikan in his 2002 PhD thesis) is currently used, but we've found it not to be optimal in this context. So there is room for experimentation in replacing hBOA with a different algorithm; for instance, a variant of simulated annealing has been suggested, as has been a pattern-recognition approach similar to LZ compression. A student with some familiarity with evolutionary learning, probability theory and machine learning may enjoy experimenting with alternatives to hBOA so as to help turn MOSES into a super-efficient automated program learning framework. It already works quite well, dramatically outperforming GP, but we believe that with some attention to improving the hBOA component it can be improved dramatically.
Dimensional Embedding for Improved Program Learning
Suppose one is using MOSES (or some related technique) to learn a program tree that contains nodes referring to semantic knowledge in a large knowledge base (e.g. a program tree that contains terms like "cat", "walk" etc. that represent concepts in OpenCog's AtomTable).
Then, mutating these nodes (for "representation building", in MOSES lingo) requires some special mechanism -- for instance, one wants to mutate "cat" into "some other concept that is drawn from a Gaussian of specified variance, centered around 'cat'". One straightforward way to do this is to embed the concepts in the semantic knowledge base in an n-dimensional space, and then use Gaussian distributions in this dimensional space to do mutation.
We have identified some good algorithms for dimensional embedding, but the coding needs to be done, and a bunch of fiddling will likely be required!
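The mutation step described above might look like the following sketch. The embedding is a hand-made two-dimensional table chosen only for illustration (a real one would be computed from the knowledge base by whichever embedding algorithm is selected), and `mutate_concept` is a hypothetical name.

```python
import math
import random

# Hypothetical 2-D embedding; the coordinates are invented for this example.
EMBEDDING = {
    "cat": (0.0, 0.0),
    "dog": (0.5, 0.2),
    "tiger": (0.4, -0.3),
    "walk": (5.0, 5.0),
    "run": (5.3, 4.8),
}


def mutate_concept(concept, sigma, rng=random):
    """Sample a point from a Gaussian centered on `concept`; return the nearest other concept."""
    center = EMBEDDING[concept]
    sample = tuple(rng.gauss(coordinate, sigma) for coordinate in center)
    candidates = [name for name in EMBEDDING if name != concept]
    return min(candidates, key=lambda name: math.dist(EMBEDDING[name], sample))


rng = random.Random(0)
mutants = {mutate_concept("cat", 0.3, rng) for _ in range(200)}
```

With a small variance, "cat" mutates only into nearby concepts such as "dog" or "tiger"; raising `sigma` makes jumps to distant concepts like "walk" possible.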
MOSES Evolution of Recurrent Neural Nets
In principle one can use MOSES to evolve recurrent neural nets; but, in practice, the ComboReduct library used to reduce program trees to an elegant, hierarchical normal form will probably need some tweaking in order to give nice normalizations for recurrent NNs.
This should allow substantially better results than existing methods for GA evolution of neural nets.
EDIT: Last year Joel Lehman worked on that [1]. He did a great job, but due to limitations in MOSES' current handling of continuous knobs he could not fully test his work. (I'll try to make continuous knobs fully supported before the next GSoC starts.)
Improve the reduct engine to be property based instead of operator based
Generalize reduct so that it exploits properties instead of specific operators.
For instance, x+0->x, and x*1->x could be 2 instances of the same reduction rule which would exploit the knowledge that 0 is the neutral element of + and 1 the neutral element of *.
This will have 2 positive effects:
- The reduct engine will be simpler
- When adding a new operator, we only need to specify its properties (or add new rules if it has new properties) instead of adding rules for that particular operator.
This will be very important for operators that are created on the fly as in PLEASURE. In this case we probably need another component to infer (perhaps using PLN) the properties of a given function but that would be for another project.
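A minimal sketch of the property-based idea, assuming expressions are nested `(operator, left, right)` tuples and that properties live in a simple table. Both the representation and the names `PROPERTIES` and `reduce_expr` are assumptions for illustration, not the actual Reduct design.

```python
# Hypothetical property table: each operator declares its neutral element.
PROPERTIES = {
    "+": {"neutral": 0},
    "*": {"neutral": 1},
}


def reduce_expr(expr):
    """Simplify nested (op, left, right) tuples using one rule: drop neutral elements."""
    if not isinstance(expr, tuple):
        return expr  # a variable name or a constant
    op, left, right = expr
    left, right = reduce_expr(left), reduce_expr(right)
    props = PROPERTIES.get(op, {})
    if "neutral" in props:
        if left == props["neutral"]:
            return right
        if right == props["neutral"]:
            return left
    return (op, left, right)


# A new operator needs only a property entry, not a new rule.
PROPERTIES["and"] = {"neutral": True}
```

The single neutral-element rule now covers x+0->x, x*1->x, and the newly declared `and` operator alike.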
PLN
- Language: advanced C++
- Code: TBD
- List: TBD
- License: TBD
Probabilistic Logic Networks (PLN) are a novel conceptual, mathematical and computational approach to uncertain inference. In order to carry out effective reasoning in real-world circumstances, AI software must robustly handle uncertainty. However, previous approaches to uncertain inference do not have the breadth of scope required to provide an integrated treatment of the disparate forms of cognitively critical uncertainty as they manifest themselves within the various forms of pragmatic inference. Going beyond prior probabilistic approaches to uncertain inference, PLN is able to encompass within uncertain logic such ideas as induction, abduction, analogy, fuzziness and speculation, and reasoning about time and causality.
Intensional reasoning
Intensional inheritance has been implemented but more tests are needed, such as comparing the results to data regarding human intensional inference. (2)
Spatial and temporal reasoning
Implement spatial and temporal reasoning into PLN, using the Region Connection Calculus (for space) and Allen's Interval algebra (for time). (3)
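As background for the temporal half, the thirteen relations of Allen's interval algebra can be computed for crisp intervals as sketched below. This is an illustration only: it assumes proper intervals given as `(start, end)` pairs with `start < end`, and it does not show how PLN would attach uncertain truth values to these relations.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a=(start, end) to b=(start, end)."""
    a_start, a_end = a
    b_start, b_end = b
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    if a_start < b_start < a_end < b_end:
        return "overlaps"
    # Remaining cases are the mirror images of before, meets, and overlaps.
    inverse = {"before": "after", "meets": "met-by", "overlaps": "overlapped-by"}
    return inverse[allen_relation(b, a)]
```

For example, `allen_relation((1, 3), (3, 5))` is "meets", while swapping the arguments gives "met-by".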
Combining PLN and MOSES
Integrate MOSES-based supervised categorization into PLN, so that when PLN chaining hits a confusing point, it can launch MOSES to learn patterns in the members of the Atoms at the current end of the chain (which may then provide additional information useful in pruning). (1)
History-Guided Inference
Cause PLN's backward and forward chaining inference to utilize history -- so that an inference step is more likely to be taken if similar steps have been taken in similar instances. (1)
HypergraphDB
- Language: Java and C++
- Code: http://code.google.com/p/hypergraphdb/ (Java version)
- Code: https://launchpad.net/hypergraphdb (C++ version; no code yet)
- List: http://groups.google.com/group/hypergraphdb/ (mentors: Boris)
- License: LGPL (Java version), TBD (C++ port)
Originally HypergraphDB was intended to be the underlying representation and storage of the AtomSpace, but utilising a BigTable equivalent may be preferable to using the current version of HGDB. HypergraphDB would still be beneficial and possibly more efficient, but it would be a more involved project (and thus take longer) as it needs to be ported to C++ and made distributed.
OpenCog and BigTable integration
HGDB should be integrated with OpenCog as a persistent store.
The BerkeleyDB back end of HGDB should be replaced with BigTable or an open-source equivalent.
OpenCog Prime
Projects related to building and testing the OpenCogPrime design for an AGI.
- Language: advanced C++
- Code: TBD
- List: TBD
- License: AGPLv3
Concept Formation
Blending
Implement conceptual blending as a heuristic for combining concepts, with a fitness function for a newly formed concept incorporating the quality of inferential conclusions derived from the concepts, and the quality of the MOSES classification rules learned using the concept. (1)
Map formation
Implement "map formation," a process that uses frequent subgraph mining methods to find frequently co-occurring subnetworks of the AtomTable, and then embodies these as new nodes. This requires some extension and adaptation of current algorithms for frequent subgraph mining. It also requires functional attention allocation. (1)
Context formation
Implement context formation, wherein an Atom C is defined as a context if it is important, and restricting other Atoms A to the context of C leads to conclusions that are significantly different from A's default truth value. (2)
OpenCog Applications
Projects relating to using OpenCog Prime or OpenCog Collective components for specific applications.
- Language: advanced C++ (until bindings are created for other languages, such as Java, Python, Ruby, Lisp, etc.)
- License: any approved FOSS license
Rex Proxy and OpenPetBrain
A proxy currently exists connecting OpenCog to the RealXTend virtual world, enabling OpenCog to control virtual dogs learning tricks and conversing in English in the virtual world. The code can be found at https://code.launchpad.net/~opencog-dev/opencog/embodiment_ReX-Proxy.
However, there are various quirks related to the interaction between OpenCog and RealXTend, partly to do with fixable shortcomings in RealXTend itself.
So, this project is good for someone who wants to tinker with RealXTend as well as the OpenCog/RealXTend proxy.
Note that we would really like to have that RealXTend proxy done (because Multiverse is not compatible with Linux or Wine) so if we find the right person it is pretty likely we will allocate a GSoC slot for it.
Sokoban
Sokoban would be a good toy domain for experimenting with various OpenCog methods. So hooking up OpenCog to a simple Sokoban server would seem worthwhile. MOSES and PLN could be used for Sokoban on their own, but it's more interesting of course to take an integrative approach. (1)
Robotics
Extend OpenCog to communicate with physical robots, using a toolkit like PyRo, or the Player-Stage Framework, or something similar. As mechanisms for communicating with agents in virtual worlds will be provided, this is not a huge conceptual leap, but will doubtless lead to many practical complexities. (1-3 depending on how far you go)
This project is currently being implemented by the Artificial Brain Lab at Xiamen University in Fujian, China.
- XIA-MAN: An Extensible, Integrative Architecture for Intelligent Humanoid Robotics PDF, Ben Goertzel, Hugo de Garis
OpenBiomind
- Language: Java
- Code: http://code.google.com/p/openbiomind
- List: http://groups.google.com/group/openbiomind (mentors: Lucio and Ben)
- License: GPLv2
OpenBiomind contains code for applying genetic programming to analyze gene expression microarray data and SNP data. This approach has been successfully used to learn diagnostic rules for cancer, Alzheimer's, Parkinson's and other diseases, as reflected in several publications.
Neurobiological data analysis
Add the necessary data type preprocessors (e.g. MEG, fMRI, EEG, PET, etc.), analysis algorithm tweaks, documented methodologies and other facilities for analysis of neurobiological data. Many public neurobiological databases exist (falling under the umbrella of the Human Cognome Project), for example the Allen Brain Atlas and others.

