Copyright © 2003 Ben Martin
Abstract
Background information on libferris and specific details relating to how it handles full text and attribute indexing. Information on how the predicates and indexes can be used to support Formal Concept Analysis over filesystems.
Background knowledge on current filesystem designs and interfaces is presented in the section called “Background” which also contains a presentation of the terminology used throughout the paper. This is followed by a formal consideration of filesystems as the data for FCA.
This section describes modern filesystem design and interfaces. Particular attention is paid to libferris and what interfaces and functionality are offered by it.
Traditional filesystems arrange data using two main abstractions: files and directories. Directories can be nested within other directories, and files may be created in any directory. Files are restricted to containing a linear range of octets. Directories cannot have octet content but can contain other files and directories. UNIX filesystems have various specialized objects which are presented as files but are implemented differently under the surface. Such specialized file-like objects include links, pipes and device files. Recent operating system designs also expose internal kernel data through specialized files such as those located in the /proc filesystem in Linux [Linux Proc fileystem]. Note that UNIX systems support files whose linear range of octets may contain holes. A hole is an area within the octet range that appears to exist, in that it consumes part of the linear range as far as lseek(2) is concerned, but consumes only a nominal amount of disk storage and contains no data.
libferris [libferris website] is a Virtual Filesystem (VFS) which runs in the user address space rather than the kernel address space used by a traditional VFS. By moving the VFS into the user address space libferris can use other shared libraries to present data as a filesystem; for example, a Berkeley DB database or an XML file can be mounted as a filesystem. In fact both db and XML files as seen through ferris appear just like normal directories. libferris does away with the notion that a directory cannot have octet content and merges the file and directory abstractions into a single abstraction called a "Context". Context objects in libferris are exposed through smart pointers using intrusive reference counting. The type for a smart pointer to a class in libferris is the classname prefixed by "fh_". The "file://" context objects maintain a hot view of the filesystem as presented by the kernel, in that they update themselves automatically to reflect additions and deletions made to the filesystem by other processes [1]. One can walk the children of a Context using STL style iterators. These iterators remain valid through insertion and deletion, including the deletion of the Context that the iterator currently points to.
Files and directories may have arbitrary key-value data attached. Such a scheme is referred to as Extended Attributes (EA) [ACL]. libferris extends the EA interface of the kernel by allowing the value attached to an attribute to be either stored separately as a kernel level EA or generated dynamically. Examples of EA that can be generated include the width and height of an image, the bitrate of an mp3 file or the MD5 [MD5 digest] of a file. Access to EA in libferris is achieved with either a quick and dirty string interface or through extended std::iostreams. The string interface uses a function like
std::string getStrAttr(c, key, default, getAllLines, throw_for_errors);
fh_context c;
const std::string& key;
const std::string& default;
bool getAllLines;
bool throw_for_errors;
with the last two parameters being optional: getAllLines controls whether all lines of data are obtained, and throw_for_errors controls whether an exception is raised or the default is returned when the attribute does not exist. There is a similar function for setting attributes
std::string setStrAttr(c, key, value);
fh_context c;
const std::string& key;
const std::string& value;
Using the iostream interface one first gets a smart pointer to an Attribute using Context::getAttribute() and then calls getIStream() or getIOStream() on the attribute, depending on whether the data is to be updated. Note that once the final reference to an iostream goes out of scope, the attribute will have its value set if the iostream was updated.
Another way libferris extends the EA interface is by offering schema data for attributes. Such metadata allows for default sorting orders to be set for a datatype, filtering to use the correct comparison operator (integer vs. string comparison), and GUIs to present data in a format which the user will find intuitive.
The standard IOStreams [Standard C++ IOStreams and Locales] system in C++ is extended in libferris to allow stream handles to be passed by value. The underlying streambuffer is not copied during this process. The ability to pass the iostream object by value solves the sticky issue of having a method return data in a newly created iostream. Another extension to the standard allows a streambuf to perform work when the last iostream reference is going out of scope. This can be used by plugins, attributes and methods to return data in an iostream and perform some work on data that the user puts into that iostream once the user is done with it. An iostream can also be monitored by a developer's application by attaching to a SigC signal which is fired when the streambuf is about to be reclaimed. This is done by attaching to StreamHandlableSigEmitter::getCloseSig(). Note that all iostream classes are subclasses of StreamHandlableSigEmitter. Attached functors will be called with an iostream and a std::streamsize indicating how much of the iostream was written to.
IOStreams implementors can use the ferris_basic_streambuf and ferris_basic_double_buffered_streambuf classes as superclasses to reuse some of the boilerplate code of iostreams.
If one has an iostream and wishes to create another iostream that exposes only a range of bytes in the original then there is the following function
fh_istream MakeLimitingIStream(ss, be, en);
fh_istream ss;
std::streampos be;
std::streampos en;
which pretends that the start of the new iostream is at an octet offset into the original iostream and that the eof is at another octet location.
Transparent compression is implemented for contexts in libferris and a context can be converted to and from compressed state using ConvertToCompressedChunkContext() and ConvertFromCompressedChunkContext(). Note that there are many options as to what blocking sizes to use and what compression algorithm is used to compress chunks. See the API documentation for ConvertToCompressedChunkContext() for more information.
An interesting iostream can be created with
fh_ostream MakeHoleyOStream(ss);
fh_ostream ss;
which monitors the data that is written to it and, if ss supports hole creation, replaces any all-zero blocks with holes.
There are also functions to obtain ferris iostream versions of cin, cout and cerr called fcin(), fcout() and fcerr() [2]. Standard iostreams can be wrapped using MakeProxyStream(). File descriptors can be wrapped using MakeFdIStream( int fd ), MakeFdOStream( int fd ) and MakeFdIOStream( int fd ). Memory regions can be wrapped as an iostream using MakeMemoryIStream( ptr, size ) and MakeMemoryIOStream( ptr, size ). And if the operating system supports it a file descriptor can be used to form a memory mapped iostream using MakeMMapIStream() and MakeMMapIOStream().
libferris provides the ability to create facade filesystems. Such filesystems accept another filesystem as an argument and modify the behaviour of access to the underlying filesystem while they themselves present the same interface as the underlying filesystem. This design follows the Decorator pattern in [Design patterns]. Examples include sorting, filtering, set theoretic combinations such as unions and the collection of arbitrary filesystem objects into a single filesystem. Examples of when one may wish to combine multiple filesystem objects to appear as a single filesystem include passing a selection, copy and paste, or returning a search result set. The filesystem that allows arbitrary collection not only presents a fake filesystem containing a list of its collected objects but also changes the reported filename of each object to be its URL. Such modification is needed because many applications implicitly expect the filenames of all children of a directory to be distinct. It should be noted that many of the facade filesystems are hot, in that they will notice what is happening to their underlying filesystem and adjust their contents to reflect file addition and removal in the base.
The main method of resolving a string to a Context in libferris is called
fh_context Resolve(url);
const std::string& url;
This method has support for creating facade contexts for URLs. For example, one can pass a string like "filter:predicate=(is-dir=1)///tmp" to Resolve() and the "file://" context "/tmp" will be subjected to a filtering operation which only exposes directories in "/tmp" through its filesystem. The filtered filesystem will be returned to the user. As can be seen from the above URL, extra data can be passed to the wrapping context between the ":" after the facade context name and the first "/". Note that facade contexts can be chained using either the string URL syntax or the APIs. For example an inheritea context can be attached to a sort context which is attached to a filtering context which is attached to a native kernel "file://" URL.
The following sections describe some of the facade contexts that libferris supports.
cachecontext. This is in the form "cachecontext:updatetime=300s/RemainingURL". This facade will cache reads to attribute values for a selected period or for the entire runtime by default. For more information on which extra params cachedcontext supports in your version see the API docs for Factory::MakeCachedContext().
inheritea. This is in the form "inheritea:/RemainingURL". When asked for an EA this context will forward requests that it can not satisfy to its parent. This can create a sort of value inheritance for a tree allowing default values for an attribute to be set and overridden at a lower level in the tree. Note that inheritea doesn't support extra params at this time. If the method Factory::makeInheritingEAContext() changes to include extra params then they can be passed above.
sort. This allows the filesystem to be sorted in a desired way. A sorting facade contains every object in its base filesystem but presents them in a different order. Sorting of filesystem data can be performed using an ordering over any EA. Objects that are equivalent under one ordering can then be subjected to another ordering, to an arbitrary depth. For example one can sort a directory by mtime, mimetype, atime, ctime, size and then filename. Sorting is a hot filesystem. The syntax is "sort:orderspec/RemainingURL" where orderspec is either a string "x" describing a sort order or a list of such strings each contained in "()" brackets. An example of a list would be "(:#:size)(name)" which will sort numerically by size and, when two or more objects have the same size, will then sort those by name as well. The format for "x" is the name of the attribute to sort on with an optional ":meta:" prefix. The meta applies only to the attribute that follows it and can add reverse ordering (!), lazy ordering (L), or explicitly specify the type to use for the sort: numeric (#), floating point (FLOAT), case insensitive string (CIS) or version sort (V). Version sort gives the semantics of "ls(1) -lv"; lazy ordering performs a normal sort and then appends new objects added to the context to the end of the sort order (much like Microsoft File Explorer). For more information on which extra params sort supports in your version see the API docs for Factory::MakeSortedContext().
filter. This allows a filesystem to present only contexts which match a predicate. The syntax is "filter:(pred)/RemainingURL". Filtering in libferris is done via a predicate resembling an LDAP search string [LDAP search RFC], hereafter referred to as a ferris-filter. Such filters use a Scheme-like syntax and allow basic boolean logic. For example "pred" in the above could be "(firstname==fred)" which requires the value stored in the EA firstname to be equal to "fred". Equality is tested according to the schema bound to the "firstname" attribute, for example case insensitive string comparison. A filter such as "(&(size<=100)(mtime>=3))" requires both predicates to be satisfied, and short-circuit evaluation eliminates "mtime" testing for objects greater than 100 octets in size. Due to the different objectives of libferris and LDAP some extensions have been made to the filter string format. Such extensions include regular expression searching "(=~)". A typical filtering process takes a ferris-filter and an underlying filesystem and presents a new filesystem in which only objects which pass the ferris-filter are presented. For more information on which extra params filter supports in your version see the API docs for Factory::MakeFilteredContext().
union. This allows one or more filesystems to be combined into a single filesystem which presents the set theoretic combination union of objects in all underlying filesystems. There are restrictions on what can be done when a facade context accepts many contexts as parameters. For such contexts each underlying context must be defined by a string containing only one ":" character. For example "union://file://tmp/file:/usr/tmp" will make the union of "/tmp" and "/usr/tmp". Contexts which appear in "/tmp" will appear in preference to those from "/usr/tmp" for example "/usr/tmp/a" will not be shown if "/tmp/a" exists. For more information see the API docs for Factory::MakeUnionContext().
setdifference. This is the set difference of two or more filesystems. The same restrictions as apply to union re underlying context arguments apply to setdifference. Note that the set difference is calculated left to right with the result from one setdifference defining the first argument of the next calculation. For example a collection of contexts "a,b,c,d" will be evaluated as "setdiff(setdiff(setdiff(a,b),c),d)". For more information see the API docs for Factory::MakeSetDifferenceContext().
setintersection. This is the set intersection of two or more filesystems. The same restrictions as apply to union re underlying context arguments apply to setintersection. Note that the set intersection is calculated left to right with the result from one setintersection defining the first argument of the next calculation. For example a collection of contexts "a,b,c,d" will be evaluated as "setinter(setinter(setinter(a,b),c),d)". For more information see the API docs for Factory::MakeSetIntersectionContext().
setsymdifference. This is the set symmetric difference of two or more filesystems. The same restrictions as apply to union re underlying context arguments apply to setsymdifference. Note that the set symmetric difference is calculated left to right with the result from one setsymdifference defining the first argument of the next calculation. For example, given a collection of contexts "a,b,c,d" and a function "f" which operates much like the STL set_symmetric_difference() but on filesystems, the setsymdifference will be evaluated as "f(f(f(a,b),c),d)". For more information see the API docs for Factory::MakeSetSymmetricDifferenceContext().
selection or list. A collection of filesystem objects from arbitrary filesystems presented in one filesystem. See Factory::MakeContextList() and the "selectionfactory://" URL.
Document Object Model. libferris provides two way support for going from a ferris filesystem to a Document Object Model (DOM) and from a DOM to a filesystem.
DOMDocument* makeDOM(c, hideXMLAttribute);
fh_context c;
bool hideXMLAttribute;
fh_context mountDOM(doc);
DOMDocument* doc;
There is also support for mounting XML fragments using
DOMDocument* StreamToDOM(iss);
fh_istream iss;
Developer helper contexts. There are also some Context subclasses that are provided to help developers create new subcontext classes. These Context subclasses typically add a specific feature that a plugin implementor might be looking for. An example is networkRootContext, which creates subcontexts for each name of the computer that the code is executing on and handles requests for other host names. One might subclass networkRootContext for a "sockets://" plugin or a plugin handling network interaction. There are also leafContext and FakeInternalContext, which provide a superclass for plugins presenting childless contexts and internal context nodes of a tree respectively. For context classes which need to be able to set up an arbitrary directory structure quickly, ParentPointingTreeContext serves as a good superclass with its ensureContextCreated() method. Some subclasses might wish to add attributes which they consider very important. Such attributes may be recommended for display to the user in any column list view. The two classes RecommendedEACollectingContext and Statefull_Recommending_ParentPointingTree_Context provide support for custom attributes which will have their names added to the "recommended-ea" attribute.
StateLessEAHolder. Some template Context classes take both the child type and parent type and insert themselves into an inheritance hierarchy between the two. By taking the type of the child Context as a template argument, methods can be made to return objects of the child type and operate using the interface presented by objects of the child type. The StateLessEAHolder class is an example of a class of this type. One inserts a StateLessEAHolder into the inheritance tree to add stateless EA. Stateless EA are described in the section called “EA implementation in libferris”.
mounting. Mounting non kernel based filesystems with libferris happens implicitly. There is a level of facade that must be introduced here to make this happen. For example if one has a fh_context handle to the Context at "/tmp/myfile.xml" and then calls read() on that handle, the XML file will be mounted as a directory implicitly. A new XML Context will be created with the URL "/tmp/myfile.xml" and will be bidirectionally linked with the NativeContext that the user has a handle to. Subsequent child related operations on the NativeContext handle will be forwarded to the XML Context. This way the user doesn't have to care about mounting or unmounting and can just assume the XML file has children. The subtree will be eligible to be reclaimed by the memory manager when the user drops their last reference to the NativeContext.
diff. Two filesystems can be compared by mounting them with the "diff://file://tmp/before/file://tmp/after" syntax. There is also an API call MakeDiffContext() that combines any two filesystems into a diff filesystem. A diff works much like a union filesystem but some interesting EA are added: "was-created", "was-deleted", "is-same", "is-same-bytes", "unidiff", "different-line-count", "lines-added-count", "lines-removed-count". Note that although at present this filesystem hands off to GNU diff to generate the unidiff, it does so in a filesystem independent manner so as to allow getting the unidiff between anything ferris can mount. For example one can get the unidiff between two tables in a relational database via the "unidiff" EA from a diff mount.
The two main ways for a filesystem to attach EA depend on how common the attribute is. If a Context subclass knows that an attribute will exist for all its Context objects then it can bind that attribute as "stateless" using the StateLessEAHolder::tryAddStateLessAttribute() method. A large saving in memory overhead can be achieved using stateless attributes because one attribute object can service any number of Context objects. Prime examples of stateless attributes are the "size", "mtime" and "is-fifo" EA that are bound for all NativeContext objects. Attributes can also be bound to a single Context object using one of the addAttribute() methods. An attribute carries at least three things: the name of the attribute, a getIStream() functor, and the schema for the attribute. Attributes which permit writing also carry a getIOStream() method and a Closed() functor. The Closed() functor has the following signature
void Closed(c, eaname, atom, iss);
Context* c;
const std::string& eaname;
EA_Atom* atom;
fh_istream iss;
The changes that the user has made to a stream between getIOStream() and having the final fh_iostream handle drop out of scope are reflected in the iss stream.
The getIStream() and getIOStream() functions have the following signatures
fh_istream getIStream(c, eaname, atom);
Context* c;
const std::string& eaname;
EA_Atom* atom;
fh_iostream getIOStream(c, eaname, atom);
Context* c;
const std::string& eaname;
EA_Atom* atom;
In many cases the functors passed to tryAddStateLessAttribute() and addAttribute() for getIStream() and getIOStream() actually refer to the same function.
The signatures of the getIStream(), getIOStream() and Closed() functions vary for Stateless attributes. The "Context*" parameter will be declared as a pointer to the derived Context classname. This allows a stateless attribute handler to access the internal methods and data of a derived Context class.
One should consider stateless attributes in the same way as a stateless object in an MTS [Microsoft transaction server] system. This means that the functors themselves should not contain state but can access data in the Context object which is passed in as the first parameter. Thus a stateless functor should always perform the same task if the passed in Context has the same state.
Note that stateless attributes of a subclass can be augmented by calling Context::supplementStateLessAttributes() after the subcontext creates its own stateless attributes. Such a call will augment standard binary attributes with more human accessible versions. For example a "size" attribute will have a "size-human-readable" attribute added, and a "mtime" attribute will be augmented with "mtime-ctime" and "mtime-display". The ctime version converts a time_t value to a human readable version using ctime(3); the display version uses a time format of the user's choosing to present the time.
A schema can be bound to native disk EA by writing a new native disk EA with the prefix "schema:". Also, for an entire directory a schema may be written for all current and future EA with a given name "ean" by writing to the "subtreeschema:ean" native EA in the parent directory.
libferris supports plugin modules which can supplement the standard EA of a context. Usually this supplementation is done by extracting information from the Context itself or from the Context and other attributes.
All EA generators attach stateful attributes using addAttribute(). EA plugins consist of two shared libraries: one library contains the main functionality, and a factory library tells libferris whether the main library should be opened. The factory library will define a factory function in the global "C" scope
MatchedEAGeneratorFactory* MakeFactory();
which will typically create and return a GModuleMatchedEAGeneratorFactory object "gf" by passing it the name of the main library and a "matcher" predicate object. A matcher is much like a ferris-filter (see section filter) in that it takes a Context object and returns true or false. libferris passes Context objects to the "gf" object to decide if the main library should be opened and if the plugin is interested in attaching new attributes to the Context.
EA generator plugins have a default priority level which can be overridden by defining a function which returns a higher or lower than normal priority
AttributeCreator::CreatePri_t getCreatePri();
As all EA generators are consulted from highest priority to lowest, attempts to bind the same attribute name from two plugins can be resolved by adjusting the getCreatePri() level.
The following diagram shows a filesystem fragment and which parts of the code are responsible for which attributes. The directory structure is listed along the left side, and under some Contexts a selection of their EA is shown in brackets with the responsible party named first.
/
/tmp
    (NativeContext: is-dir, mtime, size, inode)
    (Context: name, url, ea-names, size-human-readable)
/tmp/myfile.xml
/tmp/myfile.xml/baseitem
    (ferris.xml: author, title)
    (Context: name, url, ea-names)
/tmp/myimage.jpg
    (NativeContext: is-dir, mtime, size, inode)
    (Context: name, url, ea-names, size-human-readable)
    (ferris.jpeg: width, height, rgba-32bpp)
Note that width, height and rgba-32bpp are made by an EA generator and are stateful attributes attached with addAttribute(). All attributes shown for NativeContext are stateless and are in this case gleaned from a call to lstat(2) on the Context. The attributes bound by Context are also all stateless, but for the /tmp/myfile.xml/baseitem path there is no "size-human-readable" attribute. This is because "size-human-readable" is only added if the "size" attribute is stateless and bound by the subcontext. The stateless EA differ inside the XML file because the Context class that takes care of the XML subtree has a different C++ type and thus can have a different collection of stateless attributes.
If all attributes were stateless then schema binding for attributes in libferris would be relatively simple. Many filesystems require the ability to bind attributes with a one to one mapping of attribute to context. For example when an XML file is mounted we need to be able to bind attributes to only the Context that they should be attached to. Another example is the handling of native kernel EA schemas; such schemas default to the FXD_BINARY_NATIVE_EA type but can be changed at an individual attribute level to an arbitrary schema.
The ability to attach an attribute with an arbitrary schema to just a single Context complicates systems that rely on schema information to work. An example is processing the filter "(key<=7)": the attribute "key" might have a different schema depending on the Context being considered and thus require a different comparison function (integer vs. string comparison).
Access to the full text index that libferris maintains can be achieved either through an API or using a purely filesystem based interface. For example when one reads the URL "fulltextquery://ranked/mad hatter" a ranked query is performed on the full text index and a filesystem which contains the matches is displayed.
libferris maintains an index of EA values in order to support find queries for arbitrary predicates. The interface to this index is presented both through an API and as a filesystem. For example, reading the URL "eaq://(|(size<=100)(name=~.*c))" will perform a search through the index for any object in the entire filesystem with a small size or an extension of ".c" and return a filesystem containing all of these objects. Note that the predicate that defines the search has the same format as a ferris-filter string described in filter.
The attribute index is an inverted file mapping a key to a list of document identifiers. The document identifiers can be used in a second lookup table to obtain the URLs for Context objects. The selection of what comprises the key in the mapping must take into consideration the following
The attribute name
The value for this attribute
The schema for this attribute name, taking into account that a given attribute name may have different schemas depending on where that attribute is bound.
The following lookup operations to get a list of document identifiers from the index must be optimized:
Presence of a given value for an attribute name. For satisfaction of "(name==fred)" searches.
A selected attribute which is less than or equal to a value. For satisfaction of "(name<=fred)" searches.
A selected attribute which has a value matching a regular expression. For satisfaction of "(name=~fr.*ed)" searches.
The ">=" operator and the negation operator "(!(x))" can be implemented by forming the complement of the "<=" and "(x)" result sets respectively. Strictly, the complement of "<=" yields ">", so objects whose value equals the bound have to be added back to obtain ">=".
Each schema type has a unique integer identifier defined in Ferris/SchemaSupport.hh which should never change or be reused for another type in the future. It can thus be used in the index as a schema identifier number "scid".
The following lookup tables are stored as either BTree or Hash structures
attribute name => aid. This is used by all searches.
urlid => URL string. This is used by searches to transform a set of urlid into the Contexts they represent.
value => vid. For resolution of "==" and "=~" a vid is found and a complete lookup can be performed on the inverted file to get the urlid list
vid => value. For resolution of "<=" and ">=" a partial lookup on the aid can be performed and then a binary_search() can resolve which of the inverted lists match.
The inverted file structure is called the Sorted Attribute Value (SAV) lookup and is used in all query resolution. This inverted file allows the following two lookups: aid => list(urlid) and aid,vid => list(urlid). For partial lookups the range {aid,vid_1} to {aid,vid_n} for all vid which share the same aid prefix must be obtainable quickly. A data structure which will allow such access is the B+Tree.
There are four SAV lookups stored each of which sorts its values in a different way. Usually the vid values for a given aid in each inverted file will be different and non overlapping. The four SAV tables sort their values based on either integer, double, string or case insensitive string. Having the range of vid for each aid sorted allows queries on "<=" to perform a binary_search() within the vid range.
There are two versions of most opcodes: those without "?" in them are heuristic, and those with an embedded "?" use the schema for resolution of the query.
Presence of a given value for an attribute name
For satisfaction of "(name==fred)" searches. "name" and "fred" are looked up to get their aid and vid respectively. The schema is guessed from the value "fred" to be a string and the string inverted file is selected. Note that the same style as emacs is used: if the string has capitals in it then case sensitive lookup is assumed. The {aid,vid} combination is looked up in the string SAV to obtain the list of matching urlids. If the opcode is "=?=" then every SAV table is consulted with the value "fred" converted into the type that each SAV table expects.
For satisfaction of "(name<=paul)" searches. "name" and "paul" are looked up to get their aid and vid respectively. The schema is guessed from the value "paul" to be a string. The string SAV is used and the aid is looked up to get a cursor at the start of the equivalence class of all values for that attribute.
Assume that the aid is 100 and the table below is part of the string SAV, with the dereferenced vid shown as an extra column for clarity. An upper_bound() is performed on the vid range for the value "paul". This upper_bound() compares the dereferenced vid from the SAV with the string "paul" to locate the upper bound on that value. In the table below the upper bound would have a vid of 377. The returned urlid list is the union of the inverted lists from the beginning of the vid range inclusive to the upper bound non inclusive, that is, the union of the lists for the vids 432, 21 and 934.
| aid | vid | dereferenced vid | urlids |
| --- | --- | --- | --- |
| 100 | 432 | alice | 1,4,6 |
| 100 | 21 | mary | 2,5,3 |
| 100 | 934 | nancy | 1 |
| 100 | 377 | sally | 2 |
| 100 | 321 | sara | 1,4,6 |
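The "<=" resolution over the table above can be reproduced with std::upper_bound(). The sketch below is a minimal in-memory illustration: the `SavRow` struct and the function names are assumptions made for the example, not the layout or API used by libferris.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One row of the string SAV from the table above (illustrative layout).
struct SavRow {
    int aid;
    int vid;
    std::string value;     // dereferenced vid
    std::set<int> urlids;  // inverted list for {aid,vid}
};

// The rows for aid 100, already sorted by their dereferenced value.
std::vector<SavRow> sample_rows() {
    return {
        {100, 432, "alice", {1, 4, 6}},
        {100, 21, "mary", {2, 5, 3}},
        {100, 934, "nancy", {1}},
        {100, 377, "sally", {2}},
        {100, 321, "sara", {1, 4, 6}},
    };
}

// Resolve (attribute <= key): union the inverted lists from the start of
// the range up to, but not including, the upper bound of key.
std::set<int> less_or_equal(const std::vector<SavRow>& rows,
                            const std::string& key) {
    const auto last = std::upper_bound(
        rows.begin(), rows.end(), key,
        [](const std::string& k, const SavRow& row) { return k < row.value; });
    std::set<int> result;
    for (auto it = rows.begin(); it != last; ++it)
        result.insert(it->urlids.begin(), it->urlids.end());
    return result;
}

// Usage: the query from the text stops at "sally" (vid 377).
void less_or_equal_example() {
    assert((less_or_equal(sample_rows(), "paul") ==
            std::set<int>{1, 2, 3, 4, 5, 6}));
}
```

The comparator dereferences each row to its string value, which mirrors the comparison against the dereferenced vid described in the text.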
Resolution of the ">=" operator can be performed by taking the range from the lower_bound() for the value, inclusive, to the start of the next aid in the index, non inclusive.
To satisfy "(name=~fr.*ed)" searches, "name" is looked up to get its aid. The schema is guessed from the value to be a string. The aid is looked up in the SAV to get a cursor at the start of the equivalence class of all values for that attribute. For each value found from the reverse vid lookup that the regex matches, the urlid list found from {aid,vid} is added to the union that is returned.
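A matching sketch for ">=" uses std::lower_bound(). It assumes that the values of a single aid have been copied into a vector, so that the end of the vector plays the role of the start of the next aid; the `ValueList` type and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// (dereferenced value, urlids) pairs for one aid, sorted by value.
using Entry = std::pair<std::string, std::set<int>>;
using ValueList = std::vector<Entry>;

// Resolve (attribute >= key): union the inverted lists from the lower
// bound of key, inclusive, to the end of this aid's range, non inclusive.
std::set<int> greater_or_equal(const ValueList& values, const std::string& key) {
    const auto first = std::lower_bound(
        values.begin(), values.end(), key,
        [](const Entry& entry, const std::string& k) { return entry.first < k; });
    std::set<int> result;
    for (auto it = first; it != values.end(); ++it)
        result.insert(it->second.begin(), it->second.end());
    return result;
}

// Usage with the string SAV rows shown earlier for aid 100.
void greater_or_equal_example() {
    const ValueList values = {
        {"alice", {1, 4, 6}}, {"mary", {2, 5, 3}}, {"nancy", {1}},
        {"sally", {2}}, {"sara", {1, 4, 6}},
    };
    // "paul" falls between "nancy" and "sally": sally and sara remain.
    assert((greater_or_equal(values, "paul") == std::set<int>{1, 2, 4, 6}));
    assert(greater_or_equal(values, "zed").empty());
}
```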
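The regex case has no sorted shortcut, so every value of the attribute is tested. The sketch below uses std::regex from the standard library as a stand-in for whichever regex engine libferris uses, and it assumes an unanchored match (std::regex_search()); the text does not say whether the pattern is anchored. The `ValueList` type and `regex_lookup()` are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>
#include <utility>
#include <vector>

// (dereferenced value, urlids) pairs for one aid.
using ValueList = std::vector<std::pair<std::string, std::set<int>>>;

// Resolve (attribute =~ pattern): walk every value of the attribute and
// union the inverted lists of the values that the regex matches.
std::set<int> regex_lookup(const ValueList& values, const std::string& pattern) {
    const std::regex re(pattern);
    std::set<int> result;
    for (const auto& entry : values)
        if (std::regex_search(entry.first, re))
            result.insert(entry.second.begin(), entry.second.end());
    return result;
}

// Usage with the string SAV rows shown earlier for aid 100.
void regex_example() {
    const ValueList values = {
        {"alice", {1, 4, 6}}, {"mary", {2, 5, 3}}, {"nancy", {1}},
        {"sally", {2}}, {"sara", {1, 4, 6}},
    };
    // "sa.*" matches "sally" and "sara" only.
    assert((regex_lookup(values, "sa.*") == std::set<int>{1, 2, 4, 6}));
    // No value in the sample rows matches the pattern from the text.
    assert(regex_lookup(values, "fr.*ed").empty());
}
```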
The document numbers are stored as a sorted sequence of numbers, each being the difference from the previous document number to the current one. A docid is thus recovered from the document gaps as the partial_sum() of the gaps up to and including its own position in the sequence.
Two distinct methods of creating a formal context are presented: firstly using EA and predicates, and then using facade filesystems as the attributes themselves. The main difference is that using predicates will create the attributes from a directory G, whereas when using filesystems for M one must derive G as the union of the children of each filesystem in M.
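The gap encoding and its inverse map directly onto two standard algorithms. The following sketch assumes that the first stored number is the first docid itself (its gap from zero); the docids in the example are made up for illustration and the function names are hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Encode a sorted docid list as document gaps. The first entry is kept
// as is, which is the same as its difference from zero.
std::vector<int> to_gaps(const std::vector<int>& docids) {
    std::vector<int> gaps(docids.size());
    std::adjacent_difference(docids.begin(), docids.end(), gaps.begin());
    return gaps;
}

// Decode document gaps back to docids with a running sum.
std::vector<int> from_gaps(const std::vector<int>& gaps) {
    std::vector<int> docids(gaps.size());
    std::partial_sum(gaps.begin(), gaps.end(), docids.begin());
    return docids;
}

// Usage: a round trip over a small, made-up docid list.
void gap_example() {
    const std::vector<int> docids = {3, 7, 8, 15};
    const std::vector<int> gaps = to_gaps(docids);
    assert((gaps == std::vector<int>{3, 4, 1, 7}));
    assert(from_gaps(gaps) == docids);
}
```

Small gaps are the reason for this representation: they tend to need fewer bits than the docids themselves when a variable length integer coding is applied afterwards.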
With the inclusion of EA in the data model one can consider a specific directory d as a multi-valued context (G,M,W,J) with G a set of filesystem objects that reside with d as their immediate parent directory, M a set of EA attribute names, W a set of arbitrary bit strings, and J ⊆ G × M × W with (g,m,w) ∈ J indicating that the filesystem object g has an EA m with a bit string value w.
To generate a formal context (G,A,I) from the above multi-valued context (G,M,W,J) one can repeatedly bind predicates to G. Each bound predicate creates an attribute a ∈ A which uses the values in M and W to determine whether an object g ∈ G should be associated with a in I ⊆ G × A.
Filesystems themselves can form a formal context (G,M,I) by including an implicit plain scale. Consider M = {m_1,...,m_n} as a collection of filesystems, G = ⋃ z for all z ∈ M (the union of the children of every filesystem in M) and I ⊆ G × M with (g,m) ∈ I indicating that the filesystem object g also appears in the filesystem m. To create an interesting formal context using the above method, the filesystems in M are usually the results of fulltext queries or other filesystems that contain an arbitrary collection of objects from the underlying filesystem. Facade filesystems are used in M so that an object g ∈ G can appear in I more than once.
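Binding predicates to G can be pictured with a small sketch. Everything here is an assumption made for illustration: EA values are modelled as strings, the `Predicate` struct and `bind_predicates()` are hypothetical names, and the objects and EA in the example are invented.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// EA name -> value for one filesystem object; together the rows form J.
using Attributes = std::map<std::string, std::string>;
// G with its EA: object name -> attributes.
using ManyValued = std::map<std::string, Attributes>;
// I as a set of (g, a) pairs.
using Incidence = std::set<std::pair<std::string, std::string>>;

// A predicate bound over G; its name becomes a formal attribute a in A.
struct Predicate {
    std::string name;
    std::function<bool(const Attributes&)> test;
};

// Evaluate every predicate on every object and record the hits in I.
Incidence bind_predicates(const ManyValued& objects,
                          const std::vector<Predicate>& predicates) {
    Incidence incidence;
    for (const auto& object : objects)
        for (const auto& predicate : predicates)
            if (predicate.test(object.second))
                incidence.emplace(object.first, predicate.name);
    return incidence;
}

// Usage with two invented objects and one predicate.
void predicate_example() {
    const ManyValued objects = {
        {"notes.txt", {{"owner", "ben"}, {"mime", "text/plain"}}},
        {"logo.png", {{"owner", "ann"}, {"mime", "image/png"}}},
    };
    const std::vector<Predicate> predicates = {
        {"owner==ben", [](const Attributes& ea) {
             const auto it = ea.find("owner");
             return it != ea.end() && it->second == "ben";
         }},
    };
    const Incidence incidence = bind_predicates(objects, predicates);
    assert(incidence.size() == 1);
    assert(incidence.count({"notes.txt", "owner==ben"}) == 1);
}
```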
Combinations of the above methods can be performed by either:
First collecting G as the union of M and then binding predicates on G
Generating a nested diagram with G as the union of M at the top level and creating a nested diagram for each top level concept using logic as the nested intent.
The following example creates a formal context from a collection of filesystems. Texts from Project Gutenberg [Project gutenburg] were added to the default fulltext index, and the filesystems were then created using the following commands.
ferrisls -lh --show-columns="rank,size-human-readable,url" \
--ferris-filter="" fulltextquery://ranked/alice
0.460043 149.9k alice13a.txt
ferrisls -lh --show-columns="rank,size-human-readable,name" \
--ferris-filter="" fulltextquery://ranked/war
0.0643407 47.8k boysw10.txt
0.0548522 253.1k dmoro11.txt
0.18801 109.6k nobos10.txt
0.135122 355.4k warw11.txt
ferrisls -lh --show-columns="rank,size-human-readable,name" \
--ferris-filter="" fulltextquery://ranked/control
0.0378179 253.1k dmoro11.txt
0.0524795 355.4k warw11.txt
ferrisls -lh --show-columns="rank,size-human-readable,name" \
--ferris-filter="" fulltextquery://ranked/tea
0.194187 149.9k alice13a.txt
0.0762724 109.6k nobos10.txt
0.0702319 39.7k snark12.txt
0.048025 355.4k warw11.txt
These filesystems can be combined using (G,M,I) as defined above to give the formal context
| | alice | war | control | tea |
| --- | --- | --- | --- | --- |
| alice13a.txt | 1 | | | 1 |
| boysw10.txt | | 1 | | |
| dmoro11.txt | | 1 | 1 | |
| nobos10.txt | | 1 | | 1 |
| snark12.txt | | | | 1 |
| warw11.txt | | 1 | 1 | 1 |
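The construction of this formal context from the four query filesystems can be sketched in a few lines. The container types and function names are assumptions for the example; the file names are those listed by the ferrisls commands above. In libferris the children would be read from the Context objects rather than from a literal.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// m -> children of filesystem m.
using Filesystems = std::map<std::string, std::set<std::string>>;
// g -> the set of m with (g,m) in I.
using CrossTable = std::map<std::string, std::set<std::string>>;

// G is derived as the union of the children of every m in M, and
// (g,m) is placed in I whenever g appears in m.
CrossTable build_context(const Filesystems& filesystems) {
    CrossTable table;
    for (const auto& fs : filesystems)
        for (const auto& object : fs.second)
            table[object].insert(fs.first);
    return table;
}

// The four fulltext query results listed above.
Filesystems gutenberg_queries() {
    return {
        {"alice", {"alice13a.txt"}},
        {"war", {"boysw10.txt", "dmoro11.txt", "nobos10.txt", "warw11.txt"}},
        {"control", {"dmoro11.txt", "warw11.txt"}},
        {"tea", {"alice13a.txt", "nobos10.txt", "snark12.txt", "warw11.txt"}},
    };
}

// Usage: check two rows of the resulting cross table.
void context_example() {
    const CrossTable table = build_context(gutenberg_queries());
    assert(table.size() == 6);  // |G|
    assert((table.at("alice13a.txt") == std::set<std::string>{"alice", "tea"}));
    assert((table.at("warw11.txt") ==
            std::set<std::string>{"control", "tea", "war"}));
}
```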
[FerrisWebSite] Copyright © 2001 Ben Martin. Ferris' web site.
[Linux Proc fileystem] Linux Proc filesystem.
[Microsoft transaction server] Microsoft transaction server.
[ACL] ACL.
[libferris website] libferris website.
[LDAP search RFC] LDAP search RFC.
[Project gutenburg] Project Gutenberg.
[MD5 digest] MD5 digest.
[STL] STL.
[ORA XSLT] ORA XSLT.
[Design patterns] Design patterns.
[Standard C++ IOStreams and Locales] Standard C++ IOStreams and Locales.
[1] One can set regular expressions to force views of native kernel filesystems to be passive. This allows one to view directories which may be too busy to view actively such as /tmp on a busy server machine. Run ferris-capplet-general for these settings.
[2] All three of these functions live in the Ferris::Factory namespace.