The Datum Universe Model
What?
The Datum Universe model is a knowledge representation scheme. It specifies how to represent data and how to make inferences.
Why?
To identify the fundamental elements of knowledge and of intelligence. The goal is to provide a unified, minimalistic framework that contains the building blocks for representing any kind of knowledge construct.
How?
The Datum Universe represents knowledge using two fundamental concepts:
- The Datum: an abstract element defined entirely by its relations to other datums.
- The "is" relation: a directed link between two datums, and the only type of relation allowed among datums.
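As an illustration only, the two concepts can be held in a very small in-memory structure. The class name `DatumUniverse`, the method `add_is`, and the choice to represent datums as plain strings are assumptions of this sketch, not an API taken from the model's documentation. The point it shows is that nothing but "is" links is stored, and the set of datums is recovered from those links.

```python
from collections import defaultdict


class DatumUniverse:
    """Hypothetical minimal store: datums are plain strings and the only
    relation is a directed "is" link from one datum to another."""

    def __init__(self):
        self.parents = defaultdict(set)   # datum -> datums it "is"
        self.children = defaultdict(set)  # datum -> datums that "are" it

    def add_is(self, a, b):
        """Record the relation "a is b"."""
        if a == b:
            raise ValueError("a datum cannot be linked to itself")
        self.parents[a].add(b)
        self.children[b].add(a)

    def datums(self):
        # A datum is defined only by its relations, so the set of datums
        # is simply every endpoint of some "is" link.
        return set(self.parents) | set(self.children)

    def relation_count(self):
        return sum(len(ps) for ps in self.parents.values())


u = DatumUniverse()
u.add_is("color", "thing")
u.add_is("red", "color")
u.add_is("apple", "red")
print(sorted(u.datums()))  # ['apple', 'color', 'red', 'thing']
print(u.relation_count())  # 3
```

Keeping both `parents` and `children` indexes trades memory for fast lookups in either direction; a more compact store could keep only one of them.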
Intelligence emerges from two fundamental properties of the Datum Universe:
- The transitivity property of the "is" relationship (if a is b and b is c, then a is c) allows for inheritance and generalization. See side notes for examples.
- Induction: the capability of the datum universe to create new datums autonomously. Induced datums reduce the number of connections and provide a built-in classification process. See side notes for an example.
You can understand the datum universe model as a Graph, as a Poset, or by comparison to relational Tables and EAV / RDF models. You can also read the white papers listed.
Advantages
Existing data models rely on higher-level, more complex, and more diverse building blocks to represent real-life knowledge. For example, relational databases have the concepts of table, row, column, field, primary key, and foreign key, while graph-based knowledge representations have nodes and an unlimited number of relationship types. These concepts are "hard-coded": we cannot modify or extend them, nor reason about their properties within the same framework.
The Datum Universe approach is to build these complex concepts as "soft-coded" constructs out of the minimum "hard-coded" elements. This approach makes it possible to:
- Encode intelligence into the framework, providing inheritance, generalization/prediction, and a classification process. These are the building blocks for data mining, machine learning, and natural language understanding applications.
- Study knowledge bases using formal tools such as partial orders and graph theory. The Datum Universe is essentially a directed acyclic graph (DAG) kept in its transitive reduction. This gives deeper insight into the content of the knowledge base.
- Implement the model in various ways to achieve different performance/memory trade-offs, ranging from extremely fast (O(1)) access to extremely compact memory usage.
- Provide a simple operator-based query language.
- Implement the model entirely in hardware, for example in a design similar to IBM's SyNAPSE chip.
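Keeping the graph in its transitive reduction means that an "is" link is not stored when it is already implied by a longer chain of links. The following is a naive sketch of that reduction, assuming the graph is a DAG stored as a dict from each datum to the set of datums it "is"; the function names `reachable` and `transitive_reduction` are hypothetical, and the approach (one depth-first search per edge) is chosen for clarity rather than speed.

```python
def reachable(graph, start, goal, skip_edge=None):
    """Depth-first search over "is" links, optionally ignoring one edge."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if (node, nxt) == skip_edge or nxt in seen:
                continue
            if nxt == goal:
                return True
            seen.add(nxt)
            stack.append(nxt)
    return False


def transitive_reduction(graph):
    """Drop every edge a -> b that is implied by another path a -> ... -> b.
    Assumes the graph is acyclic, in which case the result is unique."""
    reduced = {a: set(bs) for a, bs in graph.items()}
    for a, bs in graph.items():
        for b in bs:
            if reachable(graph, a, b, skip_edge=(a, b)):
                reduced[a].discard(b)
    return reduced


graph = {
    "apple": {"red", "color"},  # "apple is color" is implied by apple -> red -> color
    "red": {"color"},
    "color": {"thing"},
}
print(transitive_reduction(graph))
# {'apple': {'red'}, 'red': {'color'}, 'color': {'thing'}}
```

Because "is" is transitive, removing the redundant link loses no information: "apple is color" can still be derived by following apple -> red -> color.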
Applications
We can highlight three major application areas for the Datum Universe:
- Traditional in-memory database systems. The model provides flexibility in how data is actually represented in memory; different representations may target different memory/performance profiles. Because the model is so fundamental, it is easy to represent temporal data, and even executable code, as datums. This also contributes to the simplicity and power of the model's query operators. See the Datumtron Graph Database API white paper.
- Data mining systems. The induction process builds data mining / machine learning into the database itself. As data changes, new patterns can be detected and predictions made based on the updated patterns. Contrast this with running external data mining algorithms on snapshots of the database. For an example, see the Predict tool.
- Intelligent knowledge agents. Using natural language parsers, we can acquire knowledge from existing text sources and build a datum universe. Since the hard-coded knowledge is minimal, the framework itself imposes no limit on the depth of understanding that an intelligent agent can achieve.
Datum Universe representation of "The color of apple is red"
- color is thing
- red is color
- apple is red
Notice that:
- apple, red, color, and thing are all datums.
- "is" is the only relation allowed and it is a general case of the "IS-A" relation.
- The fact "color of apple is red" is concluded from the two relations: "red is color" and "apple is red".
- A full relational database (Northwind) is converted to the datum universe in the API Tutorial.
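A sketch of how the fact "the color of apple is red" could be read back from the three stored relations, under the assumption that "the value of attribute A for datum D" means "a datum that D is, which is itself an A". The helpers `parents_of` and `value_of` are hypothetical names for this illustration, not the model's query operators.

```python
# The three relations from the example, stored as ("a", "b") meaning "a is b".
RELATIONS = [("color", "thing"), ("red", "color"), ("apple", "red")]


def parents_of(datum):
    """Datums that `datum` directly "is"."""
    return {b for a, b in RELATIONS if a == datum}


def value_of(datum, attribute):
    """Which datums that `datum` is are themselves an `attribute`?"""
    return {p for p in parents_of(datum) if attribute in parents_of(p)}


print(value_of("apple", "color"))  # {'red'}
```

No "color-of" relation is ever stored; the answer comes from combining "apple is red" with "red is color".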
Inheritance
is the ability to deduce attributes of a datum from the attributes of the datums it "is". For example, if we have "apple is red" and "apple1 is apple", then we can conclude that apple1 is red.
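Inheritance follows directly from the transitivity of "is": everything a datum inherits is everything reachable from it along "is" links. A minimal sketch, assuming a module-level dict of direct links; `IS`, `add_is`, and `ancestors` are hypothetical names.

```python
from collections import defaultdict

IS = defaultdict(set)  # datum -> datums it directly "is"


def add_is(a, b):
    IS[a].add(b)


def ancestors(datum):
    """Everything `datum` is, directly or through the transitivity of "is"."""
    found, stack = set(), [datum]
    while stack:
        # dict.get does not trigger the defaultdict factory, so lookups
        # do not create empty entries for leaf datums.
        for parent in IS.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


add_is("apple", "red")
add_is("apple1", "apple")
print("red" in ancestors("apple1"))  # True: apple1 inherits "red" from apple
```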
Generalization
is the opposite of inheritance; deducing properties of a datum using the attributes of its instances. For example, if all of the instances of apple that we have seen are sweet, i.e., "apple1 is sweet", "apple2 is sweet", etc., then we can generalize that all apples are sweet and conclude that "apple is sweet".
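One possible generalization rule, sketched under an explicit assumption: propose "category is d" only when every known instance of the category is d. The source does not specify the exact criterion (a real system might use a threshold instead of "all"), and the function name `generalize` is hypothetical.

```python
def generalize(relations, category):
    """Naive rule (an assumption, not the model's official one): if every
    known instance of `category` is some datum d, propose "category is d"."""
    instances = {a for a, b in relations if b == category}
    if not instances:
        return set()
    shared = None
    for inst in instances:
        # Attributes of this instance, ignoring the link to the category itself.
        attrs = {b for a, b in relations if a == inst and b != category}
        shared = attrs if shared is None else shared & attrs
    return {(category, d) for d in shared}


relations = {
    ("apple1", "apple"), ("apple1", "sweet"), ("apple1", "green"),
    ("apple2", "apple"), ("apple2", "sweet"), ("apple2", "red"),
    ("apple3", "apple"), ("apple3", "sweet"),
}
print(generalize(relations, "apple"))  # {('apple', 'sweet')}
```

Only "sweet" is shared by all three instances, so only "apple is sweet" is proposed; "green" and "red" each appear on a single instance.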
Figure: The induction of a new datum.
Induction
is the creation of a new datum to reduce the number of relations. For example, suppose we have 10 instances of apple that are all sweet and fresh, so that there are many repeated pairs such as "apple1 is sweet", "apple1 is fresh", "apple2 is sweet", "apple2 is fresh", etc. A new datum X may be induced as follows: "X is sweet", "X is fresh", and "apple1 is X", "apple2 is X", etc. This reduces the number of relations from 2*10 = 20 to 2 + 10 = 12.
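The relation count in the example can be checked with a short sketch. Here the shared attributes are passed in explicitly; how the model detects candidate groups is not described in this section, so that step is left out. The function `induce` and the default name `"X"` for the induced datum are hypothetical, and the "appleN is apple" links are omitted to match the 2*10 count above.

```python
def induce(relations, instances, shared_attrs, new_datum="X"):
    """Replace each (instance, attr) pair with instance -> X and X -> attr.
    `new_datum` is a hypothetical name for the induced datum."""
    result = set(relations)
    for inst in instances:
        for attr in shared_attrs:
            result.discard((inst, attr))
        result.add((inst, new_datum))
    for attr in shared_attrs:
        result.add((new_datum, attr))
    return result


apples = [f"apple{i}" for i in range(1, 11)]
before = {(a, attr) for a in apples for attr in ("sweet", "fresh")}
after = induce(before, apples, {"sweet", "fresh"})
print(len(before), len(after))  # 20 12
```

Every apple can still reach "sweet" and "fresh" through X, so by transitivity no fact is lost, and X acts as a newly discovered class of sweet, fresh things.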