API
formic Package
An implementation of Apache Ant globs.
- The formic.formic module contains the main class, FileSet.
- The formic.command module contains the command-line interface.
class formic.__init__.FileSet(include, exclude=None, directory=None, default_excludes=True, walk=None, symlinks=True, casesensitive=True)
Bases: object
An implementation of the Ant FileSet class.
Arguments to the constructor:
- include: An Ant glob or list of Ant globs for matching files to include in the response. Ant globs can be specified either:
  - As a string, eg "*.py", or
  - As a Pattern object
- exclude: Specified in the same way as include, but any file that matches an exclude glob will be excluded from the result.
- directory: The directory from which to start the search; if None, the current working directory is used.
- default_excludes: A boolean; if True (or omitted), the DEFAULT_EXCLUDES will be combined with the exclude argument. If False, the only excludes used are those in the exclude argument.
- symlinks: Sets whether symbolic links are included in the results or not. Defaults to True.
- walk: A function whose argument is a single directory and which returns an iterable of (dirname, subdirectoryNames, fileNames) tuples with the same semantics as os.walk(). Defaults to os.walk().
- casesensitive: Only effective on POSIX; defaults to True. Always False on NT.
Implementation notes:
- FileSet is lazy: the files in the FileSet are resolved at the time the iterator is looped over. This means that it is very fast to set up and (can be) computationally expensive only when results are obtained.
- You can iterate over the same FileSet instance as many times as you want. Because the results are computed as you iterate over the object, each separate iteration can return different results, eg if the file system has changed.
- The include and exclude arguments to the constructor can be given in several ways: as a string, as a Pattern, or as a list of strings and/or Pattern instances.
- In addition to Apache Ant’s default excludes, FileSet excludes __pycache__.
- You can modify the DEFAULT_EXCLUDES class member (it is a list of Pattern instances). Doing so will modify the behaviour of all instances of FileSet using default excludes.
- You can provide an alternate function to os.walk() that, for example, heavily truncates the files and directories being searched or returns files and directories that don’t even exist on the file system. This can be useful for testing, or even for passing the results of one FileSet as the search path of a second. See formic.treewalk.walk_from_list():

    from formic import FileSet, treewalk

    files = ["CVS/error.py", "silly/silly1.txt", "1/2/3.py", "silly/silly3.txt", "1/2/4.py", "silly/silly3.txt"]
    fileset = FileSet(include="*.py", walk=treewalk.walk_from_list(files))
    # files() yields (rel_dir_name, file_name) tuples
    for directory, filename in fileset.files():
        print(directory, filename)

  This lists 1/2/3.py and 1/2/4.py no matter what the contents of the current directory are. CVS/error.py is not listed because of the default excludes.
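The walk contract described above can be illustrated without formic. The sketch below is a hypothetical stand-in for formic.treewalk.walk_from_list(): the name walk_from_list_sketch, and the convention that the in-memory root is the empty string, are assumptions, not formic's implementation. It turns a list of relative file paths into a function that yields os.walk()-style (dirname, subdirectoryNames, fileNames) tuples.

```python
from collections import defaultdict


def walk_from_list_sketch(paths):
    """Hypothetical: build an os.walk()-like function over an in-memory list
    of "/"-separated relative file paths. Not formic's own implementation."""
    subdirs = defaultdict(set)  # directory -> names of its immediate subdirectories
    files = defaultdict(set)    # directory -> names of the files it contains
    for path in paths:
        parts = path.split("/")
        # Register every ancestor directory with its parent ("" is the root)
        for depth in range(len(parts) - 1):
            subdirs["/".join(parts[:depth])].add(parts[depth])
        files["/".join(parts[:-1])].add(parts[-1])

    def walk(top=""):
        # Top-down, like os.walk(): a directory is yielded before its children.
        # The caller may prune by editing the yielded subdirectory list in place.
        names = sorted(subdirs.get(top, ()))
        yield top, names, sorted(files.get(top, ()))
        for name in names:
            yield from walk(name if not top else top + "/" + name)

    return walk
```

For example, iterating over walk_from_list_sketch(["1/2/3.py", "CVS/error.py"])() visits the root first, then 1, 1/2 and CVS, and removing "CVS" from the root's subdirectory list during the loop skips that tree.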
DEFAULT_EXCLUDES = [**/__pycache__/**/*, **/*~, **/#*#, **/.#*, **/%*%, **/._*, **/CVS, **/CVS/**/*, **/.cvsignore, **/SCCS, **/SCCS/**/*, **/vssver.scc, **/.svn, **/.svn/**/*, **/.DS_Store, **/.git, **/.git/**/*, **/.gitattributes, **/.gitignore, **/.gitmodules, **/.hg, **/.hg/**/*, **/.hgignore, **/.hgsub, **/.hgsubstate, **/.hgtags, **/.bzr, **/.bzr/**/*, **/.bzrignore]
Default excludes shared by all instances. The member is a list of Pattern instances. You may modify this member at run time to modify the behaviour of all instances.
files()
A generator function for iterating over the individual files of the FileSet.
The generator yields a tuple of (rel_dir_name, file_name):
- rel_dir_name: The path relative to the starting directory
- file_name: The unqualified file name
get_directory()
Returns the directory in which the FileSet will be run. If the directory was set to None in the constructor, get_directory() returns the current working directory.
The returned result is normalized so it never contains a trailing path separator.
qualified_files(absolute=True)
An alternative generator that yields files rather than directory/file tuples.
If absolute is False, paths relative to the starting directory are returned; otherwise files are fully qualified.
class formic.__init__.Pattern(elements, casesensitive)
Bases: object
Represents a single Ant Glob.
The Pattern object compiles the pattern into several components:
- file_pattern: The pattern for matching files (not directories), eg, for test/*.py, the file_pattern is *.py. This is always the text after the final / (if any). If the end of the pattern is a /, then an implicit ** is added to the end of the pattern.
- bound_start: True if the start of the pattern is ‘bound’ to the start of the path. If the pattern starts with a /, the start is bound.
- bound_end: True if the end of the pattern is bound to the immediate parent directory where the file matching is occurring. This is True if the pattern specifies a directory before the file pattern, eg **/test/*
- sections: A list of Section instances. Each Section represents a contiguous series of path patterns, and Section instances are separated whenever there is a ** in the glob.
Pattern also normalises the glob, removing redundant path elements (eg **/**/test/* resolves to **/test/*), and normalises the case of the path elements (resolving difficulties with case-insensitive file systems).
all_files()
Returns True if the Pattern matches all files (in a matched directory), ie the file pattern at the end of the glob was / or /*.
static create(glob, casesensitive=True)
match_directory(path_elements)
Returns a MatchType for the match of the directory, expressed as a list of path elements, against the Pattern.
If self.bound_start is True, the first Section must match from the first directory element.
If self.bound_end is True, the last Section must match the last contiguous elements of path_elements.
match_files(matched, unmatched)
Moves all matching files from the set unmatched to the set matched.
Both matched and unmatched are sets of strings, the strings being unqualified file names.
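The set-moving contract of match_files() can be sketched with the standard library. In this hypothetical helper (move_matching_files is not a formic function), a plain fnmatch pattern stands in for the Pattern's file_pattern, and fnmatch.fnmatchcase() is used so the result is case-sensitive on every platform; both choices are assumptions for illustration.

```python
import fnmatch


def move_matching_files(file_pattern, matched, unmatched):
    """Hypothetical: move every name in `unmatched` that matches
    `file_pattern` into `matched`, mutating both sets in place."""
    hits = {name for name in unmatched if fnmatch.fnmatchcase(name, file_pattern)}
    matched |= hits    # in-place union keeps the caller's set object
    unmatched -= hits  # in-place difference likewise
```

Mutating the two sets in place, rather than returning new ones, lets several patterns be applied in turn to whatever files are still unmatched.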
formic.__init__.get_version()
Returns the version of formic.
This method retrieves the version from VERSION.txt, and it should be exactly the same as the version retrieved from the package manager.
exception formic.__init__.FormicError(message=None)
Bases: exceptions.Exception
Formic errors, such as misconfigured arguments and internal exceptions.
formic Module
An implementation of Ant Globs.
The main entry points for this module are:
- FileSet: A collection of include and exclude globs starting at a specific directory.
  - FileSet.files(): A generator returning the matched files as directory/file tuples
  - FileSet.qualified_files(): A generator returning the matched files as qualified paths
- Pattern: An individual glob
class formic.formic.ConstantMatcher(pattern, casesensitive=True)
Bases: formic.formic.Matcher
A Matcher for matching the constant passed in the constructor.
This is used to more efficiently match path and file elements that do not have a wild-card, eg __init__.py.
match(string)
Returns True if the argument matches the constant.
class formic.formic.FNMatcher(pattern, casesensitive=True)
Bases: formic.formic.Matcher
A Matcher that matches simple file/directory wildcards as per DOS or Unix.
FNMatcher("*.py") matches all Python files in a given directory.
FNMatcher("?ed") matches bed, fed and wed, but not failed.
FNMatcher internally uses fnmatch.fnmatch() to implement Matcher.match().
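The examples above can be checked directly against the standard library's fnmatch module, which FNMatcher is documented to use. Note that fnmatch.fnmatch() normalises case with os.path.normcase() (so it is case-insensitive on Windows), while fnmatch.fnmatchcase() always compares case-sensitively; this snippet uses the latter so it behaves the same on every platform.

```python
import fnmatch

names = ["bed", "fed", "wed", "failed", "setup.py", "README"]

# "?" matches exactly one character, so "failed" is rejected
three_letter = [n for n in names if fnmatch.fnmatchcase(n, "?ed")]

# "*" matches any run of characters, including an empty one
python_files = [n for n in names if fnmatch.fnmatchcase(n, "*.py")]

print(three_letter)  # ['bed', 'fed', 'wed']
print(python_files)  # ['setup.py']
```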
match(string)
Returns True if the pattern matches the string.
class formic.formic.FileSet(include, exclude=None, directory=None, default_excludes=True, walk=None, symlinks=True, casesensitive=True)
Bases: object
An implementation of the Ant FileSet class.
Arguments to the constructor:
- include: An Ant glob or list of Ant globs for matching files to include in the response. Ant globs can be specified either:
  - As a string, eg "*.py", or
  - As a Pattern object
- exclude: Specified in the same way as include, but any file that matches an exclude glob will be excluded from the result.
- directory: The directory from which to start the search; if None, the current working directory is used.
- default_excludes: A boolean; if True (or omitted), the DEFAULT_EXCLUDES will be combined with the exclude argument. If False, the only excludes used are those in the exclude argument.
- symlinks: Sets whether symbolic links are included in the results or not. Defaults to True.
- walk: A function whose argument is a single directory and which returns an iterable of (dirname, subdirectoryNames, fileNames) tuples with the same semantics as os.walk(). Defaults to os.walk().
- casesensitive: Only effective on POSIX; defaults to True. Always False on NT.
Implementation notes:
- FileSet is lazy: the files in the FileSet are resolved at the time the iterator is looped over. This means that it is very fast to set up and (can be) computationally expensive only when results are obtained.
- You can iterate over the same FileSet instance as many times as you want. Because the results are computed as you iterate over the object, each separate iteration can return different results, eg if the file system has changed.
- The include and exclude arguments to the constructor can be given in several ways: as a string, as a Pattern, or as a list of strings and/or Pattern instances.
- In addition to Apache Ant’s default excludes, FileSet excludes __pycache__.
- You can modify the DEFAULT_EXCLUDES class member (it is a list of Pattern instances). Doing so will modify the behaviour of all instances of FileSet using default excludes.
- You can provide an alternate function to os.walk() that, for example, heavily truncates the files and directories being searched or returns files and directories that don’t even exist on the file system. This can be useful for testing, or even for passing the results of one FileSet as the search path of a second. See formic.treewalk.walk_from_list():

    from formic import FileSet, treewalk

    files = ["CVS/error.py", "silly/silly1.txt", "1/2/3.py", "silly/silly3.txt", "1/2/4.py", "silly/silly3.txt"]
    fileset = FileSet(include="*.py", walk=treewalk.walk_from_list(files))
    # files() yields (rel_dir_name, file_name) tuples
    for directory, filename in fileset.files():
        print(directory, filename)

  This lists 1/2/3.py and 1/2/4.py no matter what the contents of the current directory are. CVS/error.py is not listed because of the default excludes.
DEFAULT_EXCLUDES = [**/__pycache__/**/*, **/*~, **/#*#, **/.#*, **/%*%, **/._*, **/CVS, **/CVS/**/*, **/.cvsignore, **/SCCS, **/SCCS/**/*, **/vssver.scc, **/.svn, **/.svn/**/*, **/.DS_Store, **/.git, **/.git/**/*, **/.gitattributes, **/.gitignore, **/.gitmodules, **/.hg, **/.hg/**/*, **/.hgignore, **/.hgsub, **/.hgsubstate, **/.hgtags, **/.bzr, **/.bzr/**/*, **/.bzrignore]
Default excludes shared by all instances. The member is a list of Pattern instances. You may modify this member at run time to modify the behaviour of all instances.
files()
A generator function for iterating over the individual files of the FileSet.
The generator yields a tuple of (rel_dir_name, file_name):
- rel_dir_name: The path relative to the starting directory
- file_name: The unqualified file name
get_directory()
Returns the directory in which the FileSet will be run. If the directory was set to None in the constructor, get_directory() returns the current working directory.
The returned result is normalized so it never contains a trailing path separator.
qualified_files(absolute=True)
An alternative generator that yields files rather than directory/file tuples.
If absolute is False, paths relative to the starting directory are returned; otherwise files are fully qualified.
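The relationship between files() and qualified_files() can be sketched as a join of the starting directory (normally what get_directory() returns), the relative directory and the file name. qualify is a hypothetical helper, not formic's code, and it assumes that files in the starting directory itself are reported with an empty rel_dir_name.

```python
import os


def qualify(directory, pairs, absolute=True):
    """Hypothetical: turn (rel_dir_name, file_name) tuples, as yielded by
    FileSet.files(), into paths of the kind qualified_files() describes."""
    for rel_dir_name, file_name in pairs:
        relative = os.path.join(rel_dir_name, file_name)  # "" + name -> name
        yield os.path.join(directory, relative) if absolute else relative
```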
class formic.formic.FileSetState(label, directory, based_on=None, unmatched=None)
Bases: object
FileSetState is an object encapsulating the FileSet in a particular directory, caching inheritable Pattern matches.
This is an internal implementation class and not meant for reuse or to be accessed directly.
Implementation notes:
As the FileSet traverses the directories using, by default, os.walk(), it builds two graphs of FileSetState instances mirroring the graph of directories: one graph of FileSetState instances is for the include globs and the other is for the exclude globs.
FileSetState embodies logic to decide whether to prune whole directories from the search, either by detecting that the include patterns cannot match any file within, or by detecting that an exclude matches all files in this directory and its sub-directories.
The constructor has the following arguments:
- label: A string used only in the __str__() method (for debugging)
- directory: The point in the graph that this FileSetState represents. directory is relative to the starting node of the graph.
- based_on: A FileSetState from the previous directory traversed by walk_func(). This is used as the start point in the graph of FileSetStates to search for the correct parent of this one. This is None to create the root node.
- unmatched: Used only when based_on is None; the set of initial Pattern instances. This is either the original include or exclude globs.
During the construction of the instance, the instance will evaluate the directory patterns in the PatternSet self.unmatched and, for each Pattern, perform one of the following actions:
1. If a pattern matches, it will be moved into one of the ‘matched’ PatternSet instances:
   - self.matched_inherit: the pattern matches this directory and all its subdirectories as well, eg /test/**
   - self.matched_and_subdir: the pattern matches this directory and may match subdirectories as well, eg /test/**/more/**
   - self.matched_no_subdir: the pattern matches this directory, but cannot match any subdirectory, eg /test/*. This pattern will thus not be evaluated in any subdirectory.
2. If the pattern does not match, either:
   - It may be valid in subdirectories, so it stays in self.unmatched, eg **/nomatch/*
   - It cannot evaluate to true in any subdirectory, eg /nomatch/**. In this case it is removed from all PatternSet members in this instance.
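The sorting step above can be sketched as a lookup from a per-directory match result to a bucket. The sketch reuses the values documented for formic.formic.MatchType, but the mapping from match type to bucket is an assumption inferred from the names, classify is a hypothetical helper, and plain glob strings stand in for Pattern instances.

```python
# MatchType values as documented for formic.formic.MatchType
NO_MATCH = 0
MATCH = 1
MATCH_ALL_SUBDIRECTORIES = 3
NO_MATCH_NO_SUBDIRECTORIES = 4
MATCH_BUT_NO_SUBDIRECTORIES = 5

# Assumed mapping from match type to the FileSetState member that keeps the pattern
BUCKETS = {
    MATCH_ALL_SUBDIRECTORIES: "matched_inherit",
    MATCH: "matched_and_subdir",
    MATCH_BUT_NO_SUBDIRECTORIES: "matched_no_subdir",
    NO_MATCH: "unmatched",
}


def classify(results):
    """Hypothetical: `results` maps each glob to its match type for one directory.
    Returns bucket name -> list of globs; NO_MATCH_NO_SUBDIRECTORIES globs are dropped."""
    state = {bucket: [] for bucket in BUCKETS.values()}
    for glob, match_type in results.items():
        bucket = BUCKETS.get(match_type)
        if bucket is not None:  # None: cannot match here or below, so discard
            state[bucket].append(glob)
    return state
```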
match(files)
Given a set of files in this directory, returns all the files that match the Pattern instances which match this directory.
matches_all_files_all_subdirs()
Returns True if there is a pattern that:
- Matches this directory, and
- Matches all sub-directories, and
- Matches all files (eg ends with “*”)
This acts as a terminator for FileSetState instances in the excludes graph.
no_possible_matches_in_subdirs()
Returns True if there are no possible matches for any subdirectories of this FileSetState.
When this FileSetState is used for an ‘include’, a return of True means we can exclude all subdirectories.
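Pruning of this kind relies on a standard os.walk() feature: in top-down mode, removing names from the yielded dirnames list in place stops the walk from descending into them. The sketch below is hypothetical (walk_with_pruning, its prune callback and demo are not formic APIs); prune plays a role similar to the one no_possible_matches_in_subdirs() and matches_all_files_all_subdirs() play for FileSet.

```python
import os
import tempfile


def walk_with_pruning(top, prune):
    """Hypothetical: yield (relative_dir, sorted_file_names) for `top` and its
    subdirectories, skipping any subdirectory for which prune(relative_dir) is True."""
    for dirpath, dirnames, filenames in os.walk(top):
        rel = os.path.relpath(dirpath, top)
        # Slice assignment edits the list os.walk() holds, so pruned dirs are never visited
        dirnames[:] = [d for d in dirnames
                       if not prune(os.path.normpath(os.path.join(rel, d)))]
        yield rel, sorted(filenames)


def demo():
    """Build a tiny tree in a temporary directory and walk it, pruning .git."""
    with tempfile.TemporaryDirectory() as tmp:
        for sub in (".git", "src"):
            os.makedirs(os.path.join(tmp, sub))
        for name in ("a.py", os.path.join(".git", "config"), os.path.join("src", "b.py")):
            open(os.path.join(tmp, name), "w").close()
        return dict(walk_with_pruning(tmp, lambda rel: os.path.basename(rel) == ".git"))
```

Calling demo() walks the starting directory (reported as ".") and src, but never opens .git.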
exception formic.formic.FormicError(message=None)
Bases: exceptions.Exception
Formic errors, such as misconfigured arguments and internal exceptions.
class formic.formic.MatchType
Bases: object
An enumeration of different match/non-match types to optimize the search algorithm.
There are two special considerations in match results that derive from the fact that Ant globs can be ‘bound’ to the start of the path being evaluated (eg bound start: /Documents/**).
The various match possibilities are bitfields using the members whose names start with BIT_.
BIT_ALL_SUBDIRECTORIES = 2
BIT_MATCH = 1
BIT_NO_SUBDIRECTORIES = 4
MATCH = 1
MATCH_ALL_SUBDIRECTORIES = 3
MATCH_BUT_NO_SUBDIRECTORIES = 5
NO_MATCH = 0
NO_MATCH_NO_SUBDIRECTORIES = 4
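The combined values above are bitwise ORs of the BIT_ members, which is why a match type can be tested one bit at a time. The following sketch redeclares the documented values as plain integers (it does not import formic) and checks that composition; the describe helper and its reading of each bit are assumptions based on the member names.

```python
BIT_MATCH = 1
BIT_ALL_SUBDIRECTORIES = 2
BIT_NO_SUBDIRECTORIES = 4

# Composite values built from the bits; they equal the documented constants
NO_MATCH = 0
MATCH = BIT_MATCH                                                # 1
MATCH_ALL_SUBDIRECTORIES = BIT_MATCH | BIT_ALL_SUBDIRECTORIES    # 3
MATCH_BUT_NO_SUBDIRECTORIES = BIT_MATCH | BIT_NO_SUBDIRECTORIES  # 5
NO_MATCH_NO_SUBDIRECTORIES = BIT_NO_SUBDIRECTORIES               # 4


def describe(match_type):
    """Hypothetical: decode into (matches_here, all_subdirs_match, no_subdir_can_match)."""
    return (bool(match_type & BIT_MATCH),
            bool(match_type & BIT_ALL_SUBDIRECTORIES),
            bool(match_type & BIT_NO_SUBDIRECTORIES))
```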
class formic.formic.Matcher(pattern, casesensitive=True)
Bases: object
An abstract class that holds some pattern to be matched; matcher.match(string) returns a boolean indicating whether the string matches the pattern.
The Matcher.create() method is a Factory that creates instances of various subclasses.
static create(pattern, casesensitive=True)
Factory for Matcher instances; returns a Matcher suitable for matching the supplied pattern.
match(_)
Matcher is an abstract class; this will raise a FormicError.
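A factory of this shape can be sketched as follows. The class names are hypothetical stand-ins for Matcher, ConstantMatcher and FNMatcher; choosing the subclass by looking for the fnmatch wild-card characters *, ? and [, and implementing case-insensitivity by lower-casing, are assumptions rather than formic's documented behaviour.

```python
import fnmatch


class MatcherSketch:
    """Hypothetical abstract matcher holding a pattern."""

    def __init__(self, pattern, casesensitive=True):
        self.casesensitive = casesensitive
        self.pattern = pattern if casesensitive else pattern.lower()

    def _normalise(self, string):
        return string if self.casesensitive else string.lower()

    def match(self, string):
        # Abstract; formic's Matcher.match() raises FormicError instead
        raise NotImplementedError

    @staticmethod
    def create(pattern, casesensitive=True):
        """Factory: use a cheap equality test when the pattern has no wild-card."""
        if any(char in pattern for char in "*?["):
            return FNMatcherSketch(pattern, casesensitive)
        return ConstantMatcherSketch(pattern, casesensitive)


class ConstantMatcherSketch(MatcherSketch):
    def match(self, string):
        return self._normalise(string) == self.pattern


class FNMatcherSketch(MatcherSketch):
    def match(self, string):
        return fnmatch.fnmatchcase(self._normalise(string), self.pattern)
```

The point of the split is the one given for ConstantMatcher: names without wild-cards, such as __init__.py, are compared with a plain string equality instead of a glob match.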
class formic.formic.Pattern(elements, casesensitive)
Bases: object
Represents a single Ant Glob.
The Pattern object compiles the pattern into several components:
- file_pattern: The pattern for matching files (not directories), eg, for test/*.py, the file_pattern is *.py. This is always the text after the final / (if any). If the end of the pattern is a /, then an implicit ** is added to the end of the pattern.
- bound_start: True if the start of the pattern is ‘bound’ to the start of the path. If the pattern starts with a /, the start is bound.
- bound_end: True if the end of the pattern is bound to the immediate parent directory where the file matching is occurring. This is True if the pattern specifies a directory before the file pattern, eg **/test/*
- sections: A list of Section instances. Each Section represents a contiguous series of path patterns, and Section instances are separated whenever there is a ** in the glob.
Pattern also normalises the glob, removing redundant path elements (eg **/**/test/* resolves to **/test/*), and normalises the case of the path elements (resolving difficulties with case-insensitive file systems).
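The decomposition above can be sketched in a few lines. split_glob is a hypothetical helper that returns plain values rather than a Pattern with Section instances, and it does not normalise case. Reading a trailing ** as "** directories, then any file (*)" is an assumption consistent with all_files(), not something this page specifies.

```python
def split_glob(glob):
    """Hypothetical: return (bound_start, sections, file_pattern, bound_end)
    for an Ant glob, following the Pattern description (not formic's code)."""
    if glob.endswith("/"):
        glob += "**"                 # a trailing "/" implies "**"
    bound_start = glob.startswith("/")
    elements = [e for e in glob.split("/") if e]
    file_pattern = elements.pop()    # the text after the final "/"
    if file_pattern == "**":         # assumption: "x/**" means "x/**/*"
        elements.append("**")
        file_pattern = "*"
    directories = []
    for element in elements:         # normalise: "**/**" -> "**"
        if not (element == "**" and directories and directories[-1] == "**"):
            directories.append(element)
    bound_end = bool(directories) and directories[-1] != "**"
    sections, current = [], []
    for element in directories:      # "**" separates Sections
        if element == "**":
            if current:
                sections.append(current)
            current = []
        else:
            current.append(element)
    if current:
        sections.append(current)
    return bound_start, sections, file_pattern, bound_end
```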
all_files()
Returns True if the Pattern matches all files (in a matched directory), ie the file pattern at the end of the glob was / or /*.
static create(glob, casesensitive=True)
match_directory(path_elements)
Returns a MatchType for the match of the directory, expressed as a list of path elements, against the Pattern.
If self.bound_start is True, the first Section must match from the first directory element.
If self.bound_end is True, the last Section must match the last contiguous elements of path_elements.
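The two binding rules can be sketched as a small backtracking search that places each Section, in order, on a contiguous run of path elements, with any gap between Sections standing for a **. directory_matches and section_fits are hypothetical: they return a bool rather than a MatchType, match elements with fnmatch.fnmatchcase(), and do not model globs that have a bound start but no directory Sections, such as /*.py.

```python
import fnmatch


def section_fits(section, path, start):
    """True if every pattern in `section` matches path[start:start + len(section)]."""
    return start + len(section) <= len(path) and all(
        fnmatch.fnmatchcase(path[start + k], pattern)
        for k, pattern in enumerate(section))


def directory_matches(sections, path, bound_start, bound_end):
    """Hypothetical: does the directory `path` (a list of names) satisfy the
    Sections in order, with a "**" gap allowed between consecutive Sections?"""
    def search(index, position):
        if index == len(sections):
            # bound_end: the last Section must end exactly at the last element
            return not bound_end or position == len(path)
        section = sections[index]
        if index == 0 and bound_start:
            starts = [0]  # bound_start: the first Section must begin at element 0
        else:
            starts = range(position, len(path) - len(section) + 1)
        return any(section_fits(section, path, s) and search(index + 1, s + len(section))
                   for s in starts)
    return search(0, 0)
```

For /test/* (one Section ["test"], both ends bound) this accepts ["test"] but rejects ["test", "sub"], which is why such a pattern cannot match any subdirectory.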
match_files(matched, unmatched)
Moves all matching files from the set unmatched to the set matched.
Both matched and unmatched are sets of strings, the strings being unqualified file names.
class formic.formic.PatternSet
Bases: object
A set of Pattern instances; PatternSet provides a number of operations over the entire set.
PatternSet contains a number of implementation optimizations and is an integral part of various optimizations in FileSet.
This class is not an implementation of Apache Ant PatternSet.
all_files()
Returns True if there is any Pattern in the PatternSet that matches all files (see Pattern.all_files()).
Note that this method is implemented using lazy evaluation, so direct access to the member _all_files is very likely to result in errors.
append(pattern)
Adds a Pattern to the PatternSet.
empty()
Returns True if the PatternSet is empty.
extend(patterns)
Extends a PatternSet with additional patterns. patterns can be either:
- A single Pattern
- Another PatternSet, or
- A list of Pattern instances
iter()
An iteration generator that allows the loop to modify the PatternSet during the loop.
match_files(matched, unmatched)
Applies the include and exclude filters to those files in unmatched, moving those that are included, but not excluded, into the matched set.
Both matched and unmatched are sets of unqualified file names.
remove(pattern)
Removes a Pattern from the PatternSet.
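The list-like operations above can be sketched with a small hypothetical class in which plain strings stand in for Pattern instances. It is not formic's PatternSet (which, per the notes above, also evaluates all_files() lazily); in particular, iterating over a snapshot is just one way to let a loop modify the set safely.

```python
class PatternSetSketch:
    """Hypothetical stand-in for PatternSet; strings stand in for Patterns."""

    def __init__(self):
        self.patterns = []

    def append(self, pattern):
        self.patterns.append(pattern)

    def remove(self, pattern):
        self.patterns.remove(pattern)

    def empty(self):
        return not self.patterns

    def extend(self, patterns):
        # Accept another set, a list, or a single pattern
        if isinstance(patterns, PatternSetSketch):
            patterns = patterns.patterns
        elif not isinstance(patterns, list):
            patterns = [patterns]
        for pattern in list(patterns):  # copy, so extending with itself terminates
            self.append(pattern)

    def iter(self):
        # Iterate over a copy, so the loop body may append() or remove()
        for pattern in list(self.patterns):
            yield pattern
```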
class formic.formic.Section(elements, casesensitive=True)
Bases: object
A minimal object that holds fragments of a Pattern path.
Each Section holds a list of pattern fragments matching some contiguous portion of a full path, separated by /**/ from other Section instances.
For example, the Pattern /top/second/**/sub/**/end/* is stored as a list of three Section objects:
- Section(["top", "second"])
- Section(["sub"])
- Section(["end"])
match_iter(path_elements, start_at)
A generator that searches over path_elements (starting from the index start_at), yielding once for each match.
Each value yielded is the index into path_elements of the first element after each match. In other words, the returned index has already consumed the matching path elements of this Section.
Matches work by finding a contiguous group of path elements that match the list of Matcher objects in this Section, as they are naturally paired.
This method includes an implementation optimization that simplifies the search for Section instances containing a single path element. This produces significant performance improvements.
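The search described above can be sketched as a generator over the candidate start positions. In this hypothetical version (section_match_iter is not a formic function) a Section is a plain list of fnmatch patterns, and the single-element optimisation mentioned above is omitted.

```python
import fnmatch


def section_match_iter(section, path_elements, start_at=0):
    """Hypothetical: for each contiguous run of path_elements (from start_at)
    matched pairwise by `section`, yield the index just after that run."""
    size = len(section)
    for start in range(start_at, len(path_elements) - size + 1):
        window = path_elements[start:start + size]
        if all(fnmatch.fnmatchcase(element, pattern)
               for element, pattern in zip(window, section)):
            yield start + size  # the matched elements are already consumed
```

For example, a Section ["a", "b"] over ["x", "a", "b", "a", "b"] yields 3 and then 5.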
formic.formic.determine_casesensitive(casesensitive)
Determines the effective case sensitivity: it can be True or False on POSIX, but is always False on NT.
formic.formic.get_initial_default_excludes()
Returns the default excludes as a list of Patterns.
This will be the initial value of FileSet.DEFAULT_EXCLUDES. The list is defined in the Ant documentation (plus __pycache__, as noted for FileSet).
formic.formic.get_path_components(directory)
Breaks a path to a directory into a (drive, list-of-folders) tuple.
Parameters: directory – the path to a directory
Returns: a tuple consisting of the drive (if any) and an ordered list of folder names
formic.formic.get_version()
Returns the version of formic.
This method retrieves the version from VERSION.txt, and it should be exactly the same as the version retrieved from the package manager.
formic.formic.is_root(directory)
Returns True if the directory is root (eg / on UNIX or c:\ on Windows).
formic.formic.reconstitute_path(drive, folders)
Reverts a tuple from get_path_components() into a path.
Parameters:
- drive – A drive (eg ‘c:’). Only applicable for NT systems
- folders – A list of folder names
Returns: A path comprising the drive and list of folder names. The path terminates with os.path.sep only if it is a root directory.
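The round trip between get_path_components() and reconstitute_path() can be sketched with os.path.splitdrive(). These helpers are hypothetical and assume an absolute path; is_root_dir shows one plausible way to recognise a root directory (it has no folder names), not necessarily how is_root() is implemented.

```python
import os


def path_components(directory):
    """Hypothetical: split an absolute path into (drive, [folder, ...])."""
    drive, rest = os.path.splitdrive(os.path.normpath(directory))
    return drive, [folder for folder in rest.split(os.path.sep) if folder]


def rebuild_path(drive, folders):
    """Hypothetical inverse: only a root directory ends with os.path.sep."""
    return drive + os.path.sep + os.path.sep.join(folders)


def is_root_dir(directory):
    """Hypothetical: a root directory has no folder components."""
    return not path_components(directory)[1]
```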
command Module
The command-line glue-code for formic. Call formic.command.main() with the command-line arguments.
Full usage of the command is:

    usage: formic [-i [INCLUDE [INCLUDE ...]]] [-e [EXCLUDE [EXCLUDE ...]]]
                  [--no-default-excludes] [--no-symlinks] [--insensitive] [-r] [-h] [--usage]
                  [--version]
                  [directory]
formic.command.create_parser()
Creates and returns the command line parser, an argparse.ArgumentParser instance.
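A parser accepting the documented usage line can be sketched with argparse. create_parser_sketch is hypothetical: the dest names and the use of store_true for the flags are assumptions, and the meaning of -r is not described on this page, so it is left as an opaque flag. One consequence of the [-i [INCLUDE [INCLUDE ...]]] form (nargs="*") is worth knowing: a directory placed directly after -i or -e is swallowed as another glob.

```python
import argparse


def create_parser_sketch():
    """Hypothetical parser mirroring the documented formic usage line."""
    parser = argparse.ArgumentParser(prog="formic")       # supplies -h/--help
    parser.add_argument("-i", dest="include", nargs="*")  # zero or more globs
    parser.add_argument("-e", dest="exclude", nargs="*")
    parser.add_argument("--no-default-excludes", action="store_true")
    parser.add_argument("--no-symlinks", action="store_true")
    parser.add_argument("--insensitive", action="store_true")
    parser.add_argument("-r", action="store_true")        # meaning not documented here
    parser.add_argument("--usage", action="store_true")
    parser.add_argument("--version", action="store_true")
    parser.add_argument("directory", nargs="?")           # optional starting directory
    return parser
```

With this grammar, "-i *.py src" treats src as a second include glob, whereas "-i *.py --no-default-excludes src" (or putting the directory first) makes src the starting directory.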
formic.command.entry_point()
Entry point for the command line; calls formic.command.main() and then sys.exit() with the return value.
formic.command.main(*kw)
Command line entry point; arguments must match those defined in create_parser(); returns 0 for success, else 1.
Example:

    from formic import command

    command.main("-i", "**/*.py", "--no-default-excludes")

This runs formic, printing all .py files in the current working directory and its children to sys.stdout.
If no arguments are passed (kw is empty), main() will use sys.argv.