11/24/13

Data Search & Dragonfly.

Regular Expressions can be used to search for patterns in data.

Exclusions (an exclusion is any part of the searched data that no dictionary 'word' matches) can help to find encrypted data. Once decrypted, that data can be catalogued and indexed for Regular Expression searches.

Defensive Dragonfly should alert users to certain data patterns passing through their internets.

This method can be used to effectively and reliably uncover facts (hidden meanings), at least on the Internet.


Basic Regular Expressions Tutorial:

Sequences of symbols (letters, digits and many others) can form Character Strings that can be used as part of a Regular Expression pattern.

For example:

The regular expression pattern 'Mat' can be found in the Character String 'I liked this Matrix film', but does not match it (to be a match, the pattern would have to cover the whole string exactly).

To find a word inside a Character String, add '.*' at the beginning and at the end of the pattern so that it matches the whole string.

For example:

'.*Mat.*' matches 'I liked this Matrix film'.
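
The difference between finding a pattern and matching a whole string can be tried out in code. This is a minimal sketch using Python's standard 're' module, which is an assumption of this example (the tutorial itself does not name a language): 're.search' looks for the pattern anywhere in the string, while 're.fullmatch' requires the pattern to cover the entire string.

```python
import re

text = "I liked this Matrix film"

# 'Mat' occurs somewhere inside the string, so a search finds it.
found = re.search(r"Mat", text) is not None

# A match in this tutorial's sense must cover the entire string.
exact = re.fullmatch(r"Mat", text) is not None

# Wrapping the pattern in '.*' lets it cover the whole string.
wrapped = re.fullmatch(r".*Mat.*", text) is not None

print(found, exact, wrapped)  # True False True
```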


Dot symbol '.' matches any single symbol and can be used in a pattern.

For example:

The regular expression pattern 'Al...' matches the Character Strings 'Aloha', 'Alice' and any other five-symbol Character String starting with the letters 'Al'.
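
As a quick check of the dot symbol, the sketch below (Python's 're' module again; the list of candidate strings is made up for illustration) keeps only the strings that 'Al...' matches in full.

```python
import re

pattern = r"Al..."

# Hypothetical candidates: two of them have five symbols and start with 'Al'.
candidates = ["Aloha", "Alice", "Alan", "Albert", "Blobs"]

# Keep only the strings that the pattern covers completely.
matches = [s for s in candidates if re.fullmatch(pattern, s)]

print(matches)  # ['Aloha', 'Alice']
```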


Question symbol '?' means 0 or 1 occurrences of the preceding symbol and can be used in a pattern.

For example:

'Ann?' matches either the 'An' or the 'Ann' Character String, while 'An.?' matches 'An', 'Ann', 'Ana', 'An1' and many other Character Strings.
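
The two patterns above can be compared side by side. In this sketch the helper name 'full_matches' and the candidate list are assumptions made for the example, not part of any library.

```python
import re

def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

candidates = ["An", "Ann", "Ana", "An1", "Anna"]

# '?' makes the second 'n' optional.
optional_n = full_matches(r"Ann?", candidates)

# '.?' allows any one symbol, or none, after 'An'.
optional_any = full_matches(r"An.?", candidates)

print(optional_n)    # ['An', 'Ann']
print(optional_any)  # ['An', 'Ann', 'Ana', 'An1']
```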


Plus symbol '+' means 1 or more occurrences of the preceding symbol.

Asterisk symbol '*' means 0 or more occurrences of the preceding symbol.

Brackets '(' and ')' can be used to group symbols, so that a following '?', '+' or '*' applies to the whole group.

For example:

'(an)*as' matches character sequences 'ananas', 'anas', 'as', 'anananas' and many other similar ones.
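
A short sketch of grouping with '*' and '+', using Python's 're' module. The word list is invented for the example; the last two entries are there to show strings that do not match.

```python
import re

words = ["ananas", "anas", "as", "anananas", "anans", "nas"]

# '(an)*' allows the group 'an' zero or more times before 'as'.
star_matches = [w for w in words if re.fullmatch(r"(an)*as", w)]

# '(an)+' requires the group 'an' at least once, so plain 'as' drops out.
plus_matches = [w for w in words if re.fullmatch(r"(an)+as", w)]

print(star_matches)  # ['ananas', 'anas', 'as', 'anananas']
print(plus_matches)  # ['ananas', 'anas', 'anananas']
```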


There are other 'special symbols' used in regular expressions, but these are the basics and the way of thinking.

Check the details of your implementation for more, because regular expressions can be extended, as for example in Java (numbered groups).
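
Numbered groups can be illustrated as follows. The sketch uses Python rather than Java, to stay consistent with one example language; Python's 're' module numbers bracketed groups from 1 in much the same way. The pattern itself is made up for the example.

```python
import re

text = "I liked this Matrix film"

# Two bracketed groups; each one captures the part of the string it matched.
m = re.fullmatch(r"(.*) liked this (.*) film", text)

who = m.group(1)   # text captured by the first group
what = m.group(2)  # text captured by the second group

print(who)   # I
print(what)  # Matrix
```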


Basic Exclusions Tutorial:

We have the Character String: 'I liked this Matrix film'.

We have the regular expression patterns 'Mat...' and 'film'.

This way we can find the words 'Matrix' and 'film' in the input sequence, and the character sequences 'I liked this ' and ' ' as exclusions.
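
One possible way to compute the words and the exclusions is sketched below. The function name 'split_matches_and_exclusions' is hypothetical, and joining the dictionary patterns with '|' (alternation, a special symbol not covered in the tutorial above) is an assumption of this sketch rather than a prescribed method.

```python
import re

def split_matches_and_exclusions(text, patterns):
    """Return (words, exclusions): matched parts and the gaps between them."""
    # Join the dictionary patterns so that any one of them may match.
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))
    words, exclusions = [], []
    position = 0
    for m in combined.finditer(text):
        # Anything between the previous match and this one is an exclusion.
        if m.start() > position:
            exclusions.append(text[position:m.start()])
        words.append(m.group(0))
        position = m.end()
    # Whatever is left after the last match is an exclusion as well.
    if position < len(text):
        exclusions.append(text[position:])
    return words, exclusions

words, exclusions = split_matches_and_exclusions(
    "I liked this Matrix film", ["Mat...", "film"]
)
print(words)       # ['Matrix', 'film']
print(exclusions)  # ['I liked this ', ' ']
```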

If the exclusions were encrypted data, we could find that data this way and decrypt it for further processing, a step that could easily be automated.

If we wished to find the character sequence 'Matrix film' as a whole, we'd have to define the pattern 'Matrix film'. This would not find the two words separately.
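
The difference can be seen on a second, made-up sentence in which the two words are not adjacent. The sketch uses 're.findall' from Python's 're' module; the '|' in the last pattern means "either side" and is not one of the symbols covered in the tutorial above.

```python
import re

together = "I liked this Matrix film"
apart = "That film was not The Matrix"  # hypothetical second input

# The single pattern finds the phrase only where both words are adjacent.
phrase_in_together = re.findall(r"Matrix film", together)
phrase_in_apart = re.findall(r"Matrix film", apart)

# Two alternative patterns find each word on its own.
words_in_apart = re.findall(r"Matrix|film", apart)

print(phrase_in_together)  # ['Matrix film']
print(phrase_in_apart)     # []
print(words_in_apart)      # ['film', 'Matrix']
```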


See also: Dragonfly & Windshield, Internet, the Internet, and Intranets.
