
## 9/22/16

### Inductive Learning.

Introduction.

A full understanding of this article requires some familiarity with predicate logic.

Inductive Inference.

Let's consider the following logical consequence statement:

P ∧ W ⊧ K.

Where:
- P is the set of premises,
- W is the student's inborn knowledge,
- K is the set of conclusions,
- ⊧ is the logical consequence operator.

This statement says that the knowledge represented by the conclusions K is a logical consequence of the inborn knowledge W and the set of premises P.

For our purposes, it's convenient to read this statement backwards: assume that the conclusion is the training information T, and that the student, in the process of inductive inference, acquires a certain hypothesis h.

It's also convenient to give the following meanings and notations to the parts of the statement:
- P : knowledge generated during the learning process, also called the hypothesis h,
- W : the student's inborn knowledge,
- K : the training information, denoted T.

Then our statement takes the form:

h ∧ W ⊧ T.

We can say that:
- the hypothesis 'h ∧ W' explains the conclusion 'T',
- 'T' is a logical consequence of the hypothesis 'h ∧ W',
- in this statement, the hypothesis explains its logical consequence.

In more elaborate words:

- the training information acquired by a student is a logical consequence of the inborn knowledge and the generated hypothesis h,
- the inductive hypothesis, together with the student's inborn knowledge, explains the acquired training information.
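To make the statement h ∧ W ⊧ T more tangible, here is a minimal sketch that checks logical consequence by brute force. It assumes propositional logic (not full predicate logic), and the variables `rain`, `wet` and `slippery`, as well as the helper names `entails` and `implies`, are hypothetical and chosen only for illustration.

```python
from itertools import product

def entails(premises, conclusion, variables):
    """Check premises ⊧ conclusion by enumerating every truth assignment."""
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        # A counterexample is a model where all premises hold but the conclusion fails.
        if all(p(model) for p in premises) and not conclusion(model):
            return False
    return True

def implies(a, b):
    """Material implication a -> b."""
    return (not a) or b

# Hypothetical toy vocabulary (an assumption, not from the article).
variables = ["rain", "wet", "slippery"]

W = lambda m: implies(m["wet"], m["slippery"])   # inborn knowledge: wet roads are slippery
h = lambda m: implies(m["rain"], m["wet"])       # hypothesis: rain makes roads wet
T = lambda m: implies(m["rain"], m["slippery"])  # training information: when it rained, roads were slippery

print(entails([h, W], T, variables))  # True: h ∧ W explains T
print(entails([W], T, variables))     # False: W alone does not explain T
```

The second call shows why the hypothesis is needed: the inborn knowledge alone does not have the training information as its consequence.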

Of course, logical consequence holds only when the inborn knowledge, the training information and the hypothesis are all correct. In practice we often have to depart from this assumption and settle for approximate consequence.

In light of the above, finding a correct hypothesis means detecting certain general regularities in the training data that, when joined with the inborn knowledge, explain that data well.

This is close to the popular understanding of inductive inference as getting from facts and individual observations to generalizations.

These facts and observations are called 'training examples', and the training information given to a student is a 'set of training examples'.

The hypothesis that a student has to find, given the training information, is a generalization of the 'training examples'; its purpose is not only to explain them correctly (or approximately correctly) but, more importantly, to predict new facts and observations.

There are three types of inductive learning:
- learning concepts (a way of classifying objects),
- creating concepts (grouping objects and describing the groups),
- learning approximations (mapping objects to real numbers).

Main Types of Inductive Learning.

The goal of inductive learning may take different forms, depending mostly on the knowledge that has to be acquired and on the form of the training information given to a student.

We'll demonstrate the three most important forms of inductive learning, on which most of the theoretical and practical work focuses, and which also have the most practical uses.

In each form of inductive learning, the acquired knowledge is a certain type of mapping from input information to output information.

1. Domain.

The domain is a set of objects X. The objects in X are those that the knowledge acquired by a student relates to. These objects may represent things, people, events, situations, states of things, etc. - anything that can be an argument of the mapping that the student has to learn.

2. Examples.

Each object, that is, each element x ∈ X of the domain, we'll call an example.

3. Attributes.

We'll assume that examples are described using attributes. An attribute is any function defined on the domain. We'll assume that the description of every example x ∈ X consists of the values of n ≥ 1 attributes, a1: X → A1, a2: X → A2, ... , an: X → An.

The set of all attributes defined on the domain we'll denote as A = { a1, a2, ... , an } and call the 'attribute space'.

In practice we sometimes identify an example x with its attribute vector:

< a1(x), a2(x), ... , an(x) >,

So we'll call an example any element of the Cartesian product of the attribute codomains A1 × A2 × ... × An; this simplification might be misleading, but has its uses.

For convenience, we'll denote this vector for an example x as <x>A.
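As a minimal sketch, attributes can be written as ordinary functions and the vector <x>A as a tuple of their values. The furniture objects and the names `attribute_vector`, `a1` and `a2` below are hypothetical. The last lines show one way to read the warning above: two different objects can share the same attribute vector.

```python
def attribute_vector(x, A):
    """Return <x>_A: the tuple (a1(x), ..., an(x)) for the attributes in A."""
    return tuple(a(x) for a in A)

# Hypothetical domain objects; only some of their properties are exposed as attributes.
x = {"legs": 4, "height_cm": 90.0, "material": "wood"}
y = {"legs": 4, "height_cm": 45.0, "material": "wood"}

def a1(obj):
    return obj["legs"]       # a1: X -> A1 (number of legs)

def a2(obj):
    return obj["material"]   # a2: X -> A2 (material)

A = [a1, a2]                 # the attribute space

print(attribute_vector(x, A))                            # (4, 'wood')
print(x == y)                                            # False: different objects
print(attribute_vector(x, A) == attribute_vector(y, A))  # True: same description
```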

Depending on their codomain (set of values), attributes can be divided into types.

The most basic division, sufficient for learning purposes, is as follows:
- nominal attributes: with a finite set of unordered discrete values,
- ordinal attributes: with a countable set of ordered discrete values,
- continuous attributes: with values from the set of real numbers.

For each set of examples P ⊆ X, attribute a : X → A and value v ∈ A, we'll denote by Pav the set of those examples from P for which the attribute a has the value v, thus:

Pav = { x ∈ P | a(x) = v }.
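The definition of Pav translates almost directly into a filter. Here is a minimal sketch; the function name `select` and the sample examples are hypothetical.

```python
def select(P, a, v):
    """Return P_av: the examples x in P for which a(x) == v (order of P is kept)."""
    return [x for x in P if a(x) == v]

# Hypothetical examples, each stored as a dictionary of attribute values.
P = [
    {"size": "small", "color": "red"},
    {"size": "large", "color": "blue"},
    {"size": "small", "color": "green"},
]

def size(x):
    return x["size"]

def color(x):
    return x["color"]

print(select(P, size, "small"))   # the first and the third example
print(select(P, color, "blue"))   # the second example only
print(select(P, color, "black"))  # [] because no example has this value
```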

Example 1. Points on a plane:

Let's consider a domain X = R², that is, a two-dimensional plane. Examples are points on that plane. Each example can be described using two continuous attributes:

a1 : X → R and a2: X → R

that give the Cartesian coordinates of the point relative to an assumed coordinate system.

Similarly, the domain can be taken to be the space Rⁿ for any given n ≥ 1.

Example 2. Binary Strings.

Let's consider a domain X = {0,1}ⁿ for a given value n ≥ 1. All examples from this domain are n-element binary strings.

Examples are naturally described by n attributes:

a1: X → {0,1}, a2: X → {0,1}, ..., an: X → {0,1}, where:

for each x ∈ X and each i = 1, 2, ..., n, the value ai(x) is the i-th element of the string x.

In this example, we can identify examples with their attribute vectors, and for convenience we can write xi instead of ai(x).
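A minimal sketch of this domain for n = 3, assuming binary strings are stored as tuples of 0s and 1s. The helper `attribute` is a hypothetical name; it builds ai with 1-based indexing, as in the text.

```python
from itertools import product

n = 3
# The domain X = {0,1}^n: all n-element binary strings.
X = list(product((0, 1), repeat=n))

def attribute(i):
    """Return the attribute a_i, which reads the i-th element (1-based) of a string."""
    return lambda x: x[i - 1]

a2 = attribute(2)
x = (0, 1, 1)

print(len(X))  # 8, that is 2**n
print(a2(x))   # 1, the second element of x
# Here an example is its own attribute vector:
print(tuple(attribute(i)(x) for i in range(1, n + 1)) == x)  # True
```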

Example 3. Geometric Shapes.

Let's consider a domain consisting of colorful geometric figures of differing sizes and shapes. We can describe examples from this domain with the following attributes:

- size: ordinal attribute with values: small, medium, large,
- color: nominal attribute with values: red, blue, green,
- shape: nominal attribute with values: circle, square, triangle.
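The difference between ordinal and nominal attributes can be sketched with Python's `enum` module: `IntEnum` members can be compared, plain `Enum` members cannot be ordered. Identifying examples with attribute vectors, and the class names below, are assumptions made for illustration.

```python
from enum import Enum, IntEnum
from itertools import product

class Size(IntEnum):   # ordinal: values are ordered
    SMALL = 1
    MEDIUM = 2
    LARGE = 3

class Color(Enum):     # nominal: no order between values
    RED = "red"
    BLUE = "blue"
    GREEN = "green"

class Shape(Enum):     # nominal
    CIRCLE = "circle"
    SQUARE = "square"
    TRIANGLE = "triangle"

# All attribute vectors: the Cartesian product of the three codomains.
X = list(product(Size, Color, Shape))

print(len(X))                   # 27 = 3 * 3 * 3
print(Size.SMALL < Size.LARGE)  # True: ordering makes sense for an ordinal attribute

try:
    Color.RED < Color.BLUE      # plain Enum members do not support ordering
except TypeError:
    print("colors cannot be ordered")
```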

Example 4. Weather.

Let's consider a domain consisting of possible weather states. We can describe each example from this domain with the following attributes:

- aura: nominal attribute with values: sunny, cloudy, rainy,
- temperature: ordinal attribute with values: cold, moderate, warm,
- humidity: ordinal attribute with values: normal, high,
- wind: ordinal attribute with values: weak, strong.

Example 5. Cars.

As another example, we'll consider a domain whose elements are car models available on the market. We'll assume that examples from this domain can be described with the following attributes:

- class: ordinal attribute with values: small, compact, large,
- price: ordinal attribute with values: low, moderate, high,
- performance: ordinal attribute with values: weak, average, good,
- reliability: ordinal attribute with values: low, average, high.

Learning Concepts.

Concepts are one of the forms of our knowledge about the world, used to describe and interpret sensory observations and abstract ideas.

With the concept of a 'chair', we can pick out from a large set of varied furniture the pieces that are 'chairs' and those that are not, even if both groups contain pieces of differing size, color, number of legs and material.

In the most basic case, a concept divides the set of all considered objects, that is the domain, into two categories:
- objects belonging to the concept (positive examples),
- objects not belonging to the concept (negative examples).

Sometimes it's convenient to consider multiple concepts defined on the same domain; we'll call this a 'multiple concept'.

A 'multiple concept' describes a division of the domain into categories, where each category corresponds to one of the 'single concepts'.

def. Concept: Let's assume that a class of concepts, denoted CC, may be specified on a domain. Each concept c ∈ CC is a function c : X → C, where C denotes a finite set of categories for the concepts of class CC.

In the case of 'single concepts' we'll assume C = {0,1}. In the case of 'multiple concepts' C might be any finite set of categories with |C| > 2.

A 'single concept' describes a subset of the domain consisting of the 'positive examples' of this concept:

XC = { x ∈ X | c(x) = 1 }.

In the general case, for a category d ∈ C, a given concept c and any set of examples P ⊆ X, we use the notation Pcd for those examples from P that belong to the category d, thus:

Pcd = { x ∈ P | c(x) = d }.

We may omit c in this notation and write Pd.

In particular, for a single concept c, the set X1 = XC contains all of its positive examples, and the set X0 = X - X1 contains all of its negative examples.
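Here is a minimal sketch of a single concept as a function c : X → {0,1}, together with the sets X1 and X0. It uses the weather domain from example 4 with examples identified with attribute vectors. The concept `good_for_sailing` and its definition are invented for illustration and are not taken from the article.

```python
from itertools import product

AURA = ("sunny", "cloudy", "rainy")
TEMPERATURE = ("cold", "moderate", "warm")
HUMIDITY = ("normal", "high")
WIND = ("weak", "strong")

# The domain: all combinations of attribute values.
X = set(product(AURA, TEMPERATURE, HUMIDITY, WIND))

def good_for_sailing(x):
    """A hypothetical single concept: strong wind and no rain."""
    aura, temperature, humidity, wind = x
    return 1 if wind == "strong" and aura != "rainy" else 0

def category(P, c, d):
    """Return P^d: the examples from P that the concept c assigns to category d."""
    return {x for x in P if c(x) == d}

X1 = category(X, good_for_sailing, 1)  # positive examples
X0 = category(X, good_for_sailing, 0)  # negative examples

print(len(X), len(X1), len(X0))  # 36 12 24
print(X0 == X - X1)              # True
```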

Example 6. (Rectangles on a plane).

For the domain of points on the plane R² introduced in example 1, we can consider a concept class CC represented by all rectangles with sides parallel to the horizontal and vertical axes of the coordinate system.

With the rectangle representing any concept c ∈ CC we can associate the coordinates of its lower-left and upper-right corners, (lC, dC) and (rC, uC) respectively.

Then the set of positive examples of a concept c is defined as the set of all points inside or on the border of this rectangle:

XC = { x ∈ X | lC ≤ a1(x) ≤ rC ∧ dC ≤ a2(x) ≤ uC }.
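The membership condition above can be sketched as a small class. The name `RectangleConcept` and the sample coordinates are hypothetical; the fields `l`, `d`, `r`, `u` follow the text's lC, dC, rC, uC.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RectangleConcept:
    l: float  # left edge
    d: float  # bottom edge
    r: float  # right edge
    u: float  # top edge

    def __call__(self, x):
        """Return 1 if the point x = (a1, a2) lies inside or on the border, else 0."""
        a1, a2 = x
        return 1 if self.l <= a1 <= self.r and self.d <= a2 <= self.u else 0

c = RectangleConcept(l=0.0, d=0.0, r=2.0, u=1.0)

print(c((1.0, 0.5)))  # 1: inside
print(c((2.0, 1.0)))  # 1: on the border (upper-right corner)
print(c((3.0, 0.5)))  # 0: to the right of the rectangle
```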

A concept represented by a rectangle, together with several positive examples (filled circles) and negative examples (empty circles), is shown in the following image.


Example 7. (Boolean Functions).

For the domain of n-element binary strings introduced in example 2, concepts might be represented by n-argument boolean functions.

Definitions of these functions have the form of a logical formula in which the literals (atomic formulas joined into complex formulas by logical connectives) are attributes, whose values '0' and '1' are interpreted as the logical values 'false' and 'true'.

More precisely, a definition of c(x) may contain positive literals ai(x) and negative literals ¬ai(x) for each i = 1,2,...,n.

Positive examples of a concept are those domain elements for which the corresponding formula is satisfied.

For n = 5, example definitions might be:

c1(x) = a1(x) ∨ ¬a3(x) ∧ (a4(x) ∨ a5(x)),
c2(x) = ¬a5(x),
c3(x) = a2(x) ∧ a4(x).
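These three definitions can be sketched as Python functions on 5-element tuples. One assumption matters here: c1 is read with the usual precedence (¬ binds tighter than ∧, which binds tighter than ∨), that is as a1 ∨ (¬a3 ∧ (a4 ∨ a5)); the article does not state the precedence explicitly.

```python
from itertools import product

def c1(x):
    a1, a2, a3, a4, a5 = x
    # Assumed reading: a1 ∨ (¬a3 ∧ (a4 ∨ a5))
    return int(bool(a1 or ((not a3) and (a4 or a5))))

def c2(x):
    return int(not x[4])         # ¬a5

def c3(x):
    return int(x[1] and x[3])    # a2 ∧ a4

X = list(product((0, 1), repeat=5))

def positives(c):
    """Positive examples: the strings for which the formula is satisfied."""
    return [x for x in X if c(x) == 1]

print(c1((0, 0, 0, 1, 0)))  # 1: a3 is false and a4 is true
print(c1((0, 0, 1, 1, 1)))  # 0: a1 is false and a3 is true
print(len(positives(c1)), len(positives(c2)), len(positives(c3)))  # 22 16 8
```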

Example 8.

For the domain of geometric figures introduced in example 3, we can consider a concept class CC consisting of all possible single concepts for this domain.

If we assume that this domain is finite, then |CC| = 2^|X|, because every subset of X is the set of positive examples of exactly one single concept.
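This count can be checked by enumeration on a tiny domain. The three-element domain and the name `all_single_concepts` below are hypothetical. The last line assumes that figures are identified with their attribute vectors, which gives |X| = 3 · 3 · 3 = 27.

```python
from itertools import product

def all_single_concepts(X):
    """Yield every function c : X -> {0,1} as a dictionary from examples to labels."""
    X = list(X)
    for labels in product((0, 1), repeat=len(X)):
        yield dict(zip(X, labels))

X = ["x1", "x2", "x3"]  # a hypothetical three-element domain
concepts = list(all_single_concepts(X))

print(len(concepts))    # 8 = 2**3
print(2 ** 27)          # 134217728 single concepts over 27 attribute vectors
```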

Some of these concepts might have a reasonable interpretation for us, for example: 'shapes that resemble fruits', or 'small shapes'.

Example 9.

For the domain of weather states introduced in example 4, we'll consider a concept class CC consisting of all single concepts that can be specified for this domain. A few of these concepts might have a meaningful interpretation from our perspective - such as 'typical mediterranean weather' or 'weather good for sailing'.

Hypotheses for Learning Concepts.

For a given domain and concept class, a space of possible hypotheses, denoted HH, is specified; it depends on the learning algorithm used.

The hypothesis space consists of all hypotheses that the student can construct.

Every hypothesis h ∈ HH, like every concept, is a function that assigns categories to examples, so we can write:

h: X → C.

The result of learning is always the selection of a hypothesis from HH that is considered best for the given training examples (and possibly for the inborn knowledge as well).
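As a minimal sketch of such a selection, assume a small finite hypothesis space over {0,1}³ (constants and single literals) and take 'best' to mean 'fewest errors on the training examples'. Both choices, and all names below, are assumptions made for illustration; the article does not fix a selection criterion.

```python
n = 3

# A hypothetical hypothesis space: two constants and all single literals.
HH = {"always 0": lambda x: 0, "always 1": lambda x: 1}
for i in range(n):
    HH[f"a{i + 1}"] = lambda x, i=i: x[i]          # positive literal
    HH[f"not a{i + 1}"] = lambda x, i=i: 1 - x[i]  # negative literal

# Training examples as pairs (x, c(x)); here the labels follow a2.
T = [((0, 1, 0), 1), ((1, 0, 1), 0), ((1, 1, 0), 1), ((0, 0, 0), 0)]

def training_error(h, T):
    """Number of training examples that h labels differently from the data."""
    return sum(1 for x, label in T if h(x) != label)

def best_hypothesis(HH, T):
    """Return the name of a hypothesis with the fewest training errors."""
    return min(HH, key=lambda name: training_error(HH[name], T))

best = best_hypothesis(HH, T)
print(best, training_error(HH[best], T))  # a2 0
```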

Exact learning of every target concept c ∈ CC is possible only if CC ⊆ HH. Then it's true that c ∈ HH, that is, the hypothesis space contains a hypothesis identical to the target concept.

In practice, however, for certain algorithms we have HH ⊂ CC, and we have no certainty that we can learn the target concept. This does not mean that we should strive to equip the student with the richest hypothesis space possible, because this would hinder the learning process.
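One possible way to see why a very rich hypothesis space can hinder learning (this is an interpretation, the article does not elaborate here): when the space contains every function X → {0,1}, many hypotheses agree with the training examples but disagree on everything else. Here is a minimal sketch on the domain {0,1}², with hypothetical training data.

```python
from itertools import product

X = list(product((0, 1), repeat=2))  # four examples

# The richest possible space: every function from X to {0,1}.
all_hypotheses = [dict(zip(X, labels)) for labels in product((0, 1), repeat=len(X))]

# Hypothetical training examples covering only two of the four points.
T = [((0, 0), 0), ((1, 1), 1)]

# Hypotheses that label every training example correctly.
consistent = [h for h in all_hypotheses if all(h[x] == label for x, label in T)]

# Their predictions for an unseen example.
predictions = {h[(0, 1)] for h in consistent}

print(len(all_hypotheses))  # 16
print(len(consistent))      # 4
print(predictions)          # {0, 1}: the training data alone cannot decide
```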

(...)

(to be continued/rewritten as needed or necessary, when/if I can).