[svn r38516] majorly refactor future chapter, mentioning

APIgen and other more current ideas --HG-- branch : trunk
2007-02-11 20:52:11 +01:00 · 2007-02-11 20:52:11 +01:00 · 97aab00607
parent 790c9bbb88
commit 97aab00607
1 changed files with 56 additions and 362 deletions
--- a/py/doc/future/future.txt
+++ b/py/doc/future/future.txt
@ -9,321 +9,62 @@ This document tries to describe directions and guiding ideas
 for the near-future development of the py lib.  *Note that all
 statements within this document - even if they sound factual -
 mostly just express thoughts and ideas. They not always refer to 
-real code so read with some caution.  This is not a reference guide
+real code so read with some caution.*  
 (tm). Moreover, the order in which appear here in the file does 
 not reflect the order in which they may be implemented.* 
 .. _`general-path`: 
 .. _`a more general view on path objects`:
 A more general view on ``py.path`` objects 
 ==========================================
-Seen from a more general persective, the current ``py.path.extpy`` path 
+Distribute tests ad-hoc across multiple platforms
-offers a way to go from a file to the structured content of 
+======================================================
 a file, namely a python object.  The ``extpy`` path retains some
 common ``path`` operations and semantics but offers additional
 methods, e.g. ``resolve()`` gets you a true python object.   
-But apart from python files there are many other examples 
+After some more refactoring and unification of
-of structured content like xml documents or INI-style 
+the current testing and distribution support code
-config files.  While some tasks will only be convenient 
+we'd like to be able to run tests on multiple
-to perform in a domain specific manner (e.g. applying xslt 
+platforms simultanously and allow for interaction
-etc.pp) ``py.path`` offers a common behaviour for 
+and introspection into the (remote) failures. 
 structured content paths. So far only ``py.path.extpy``
 is implemented and used by py.test to address tests 
 and traverse into test files. 
 *You are in a maze of twisty passages, all alike*
 -------------------------------------------------
 Now, for the sake of finding out a good direction, 
 let's consider some code that wants to find all 
 *sections* which have a certain *option* value
 within some given ``startpath``:: 
    def find_option(startpath, optionname): 
        for section in startpath.listdir(dir=1): 
            opt = section.join(optionname) 
            if opt.check(): # does the option exist here? 
                print section.basename, "found:", opt.read() 
 Now the point is that ``find_option()`` would obviously work
 when ``startpath`` is a filesystem-like path like a local
 filesystem path or a subversion URL path. It would then see
 directories as sections and files as option-names and the
 content of the file as values. 
 But it also works (today) for ``extpy`` paths if you put the following
 python code in a file:: 
    class Section1:
        someoption = "i am an option value" 
    class Section2:
        someoption = "i am another option value" 
 An ``extpy()`` path maps classes and modules to directories and 
 name-value bindings to file/read() operations. 
 And it could also work for 'xml' paths if you put
 the following xml string in a file:: 
    <xml ...>
    <root>
        <section1>      
            <someoption>value</name></section1>
        <section2>
            <someoption>value</name></section2></root>
 where tags containing non-text tags map to directories 
 and tags with just text-children map to files (which
 upon read() return the joined content of the text 
 tags possibly as unicode. 
 Now, to complete the picture, we could make Config-Parser 
 *ini-style* config files also available::
    [section1]
    name = value 
    [section2]
    othername = value
 where sections map to directories and name=value mappings
 to file/contents. 
 So it seems that our above ``find_option()`` function would
 work nicely on all these *mappings*. 
 Of course, the somewhat open question is how to make the
 transition from a filesystem path to structured content
 useful and unified, as much as possible without overdoing it. 
 Again, there are tasks that will need fully domain specific
 solutions (DOM/XSLT/...) but i think the above view warrants
 some experiments and refactoring.  The degree of uniformity 
 still needs to be determined and thought about. 
 path objects should be stackable
 --------------------------------
 Oh, and btw, a ``py.path.extpy`` file could live on top of a 
 'py.path.xml' path as well, i.e. take::
    <xml ...>
    <code>
        <py>      
            <magic>
                <assertion>
                    import py 
                    ... </assertion>
                <exprinfo> 
                    def getmsg(x): pass </exprino></magic></py></code>
 and use it to have a ``extpy`` path living on it::
    p = py.path.local(xmlfilename)
    xmlp = py.path.extxml(p, 'py/magic/exprinfo')
    p = py.path.extpy(xmlp, 'getmsg')
    assert p.check(func=1, basename='getmsg') 
    getmsg = p.resolve() 
    # we now have a *live* getmsg() function taken and compiled from 
    # the above xml fragment
 There could be generic converters which convert between 
 different content formats ... allowing configuration files to e.g. 
 be in XML/Ini/python or filesystem-format with some common way 
 to find and iterate values. 
 *After all the unix filesystem and the python namespaces are 
 two honking great ideas, why not do more of them? :-)*
-.. _importexport: 
+Make APIGEN useful for more projects
 ================================================
-Revising and improving the import/export system 
+The new APIGEN tool offers rich information 
-===============================================
+derived from running tests against an application: 
 argument types and callsites, i.e. it shows
 the places where a particular API is used. 
 In its first incarnation, there are still
 some specialties that likely prevent it
 from documenting APIs for other projects. 
 We'd like to evolve to a `py.apigen` tool
 that can make use of information provided
 by a py.test run. 
-    or let's wrap the world all around 
+Distribute channels/programs across networks
 ================================================
-the export/import interface 
+Apart from stabilizing setup/teardown procedures
---------------------------
+for `py.execnet`_, we'd like to generalize its
-
+implementation to allow connecting two programs
-The py lib already incorporates a mechanism to select which
+across multiple hosts, i.e. we'd like to arbitrarily
-namespaces and names get exposed to a user of the library.
+send "channels" across the network. Likely this
-Apart from reducing the outside visible namespaces complexity 
+will be done by using the "pipe" model, i.e. 
-this allows to quickly rename and refactor stuff in the
+that each channel is actually a pair of endpoints,
-implementation without affecting the caller side.  This export
+both of which can be independently transported 
-control can be used by other python packages as well. 
+across the network.  The programs who "own" 
-
+these endpoints remain connected. 
 However, all is not fine as the import/export has a 
 few major deficiencies and shortcomings:
 - it doesn't allow to specify doc-strings 
 - it is a bit hackish (see py/initpkg.py)
 - it doesn't present a complete and consistent view of the API. 
 - ``help(constructed_namespace)`` doesn't work for the root 
  package namespace
 - when the py lib implementation accesses parts of itself 
  it uses the native python import mechanism which is 
  limiting in some respects.  Especially for distributed
  programs as encouraged by `py.execnet`_ it is not clear
  how the mechanism can nicely integrate to support remote
  lazy importing. 
 Discussions have been going on for a while but it is
 still not clear how to best tackle the problem.  Personally, 
 i believe the main missing thing for the first major release 
 is the docstring one.   The current specification 
 of exported names is dictionary based.  It would be 
 better to declare it in terms of Objects. 
 Example sketch for a new export specification 
 ---------------------------------------------
 Here is a sketch of how the py libs ``__init__.py`` file 
 might or should look like:: 
    """
        the py lib version 1.0
        http://codespeak.net/py/1.0
    """
    from py import pkg
    pkg.export(__name__,
        pkg.Module('path',
            '''provides path objects for local filesystem, 
               subversion url and working copy, and extension paths.
            ''',
            pkg.Class('local', '''
               the local filesystem path offering a single
               point of interaction for many purposes.
               ''', extpy='./path/local.LocalPath'),
            pkg.Class('svnurl', '''
               the subversion url path.
            ''', extpy='./path/local/svn/urlcommand.SvnUrlPath'),
        ),
    # it goes on ... 
    )
 The current ``initpkg.py`` code can be cleaned up to support
 this new more explicit style of stating things. Note that
 in principle there is nothing that stops us from retrieving
 implementations over the network, e.g. a subversion repository. 
 Let there be alternatives 
 -------------------------
 We could also specify alternative implementations easily::
    pkg.Class('svnwc', '''
       the subversion working copy.
    ''', extpy=('./path/local/svn/urlbinding.SvnUrlPath', 
                './path/local/svn/urlcommand.SvnUrlPath',)
    )
 This would prefer the python binding based implementation over
 the one working through he 'svn' command line utility.  And
 of course, it could uniformly signal if no implementation is 
 available at all. 
 Problems problems  
 -----------------
 Now there are reasons there isn't a clear conclusion so far. 
 For example, the above approach has some implications, the
 main one being that implementation classes like
 ``py/path/local.LocalPath`` are visible to the caller side but
 this presents an inconsistency because the user started out with
 ``py.path.local`` and expects that the two classes are really much
 the same.  We have the same problem today, of course. 
 The naive solution strategy of wrapping the "implementation
 level" objects into their exported representations may remind
 of the `wrapping techniques PyPy uses`_.  But it
 *may* result in a slightly heavyweight mechanism that affects
 runtime speed.  However, I guess that this standard strategy
 is probably the cleanest. 
 Every problem can be solved with another level ... 
 --------------------------------------------------
 The wrapping of implementation level classes in their export
 representations objects adds another level of indirection.
 But this indirection would have interesting advantages: 
 - we could easily present a consistent view of the library 
 - it could take care of exceptions as well 
 - it provides natural interception points for logging 
 - it enables remote lazy loading of implementations 
  or certain versions of interfaces 
 And quite likely the extra indirection wouldn't hurt so much
 as it is not much more than a function call and we cared
 we could even generate some c-code (with PyPy :-) to speed
 it up.   
 But it can lead to new problems ...
 -----------------------------------
 However, it is critical to avoid to burden the implementation
 code of being aware of its wrapping.  This is what we have 
 to do in PyPy but the import/export mechanism works at 
 a higher level of the language, i think.  
 Oh, and we didn't talk about bootstrapping :-) 
 .. _`py.execnet`: ../execnet.html
 .. _`wrapping techniques PyPy uses`: http://codespeak.net/pypy/index.cgi?doc/wrapping.html
 .. _`lightweight xml generation`: 
-Extension of py.path.local.sysexec()
+Benchmarking and persistent storage 
-====================================
+=========================================
-The `sysexec mechanism`_ allows to directly execute 
+For storing test results, but also benchmarking
-binaries on your system.  Especially after we'll have this
+and other information, we need a solid way 
-nicely integrated into Win32 we may also want to run python 
+to store all kinds of information from test runs. 
-scripts both locally and from the net::
+We'd like to generate statistics or html-overview 
-
+out of it, but also use such information to determine when
-    vadm = py.path.svnurl('http://codespeak.net/svn/vadm/dist/vadm/cmdline.py') 
+a certain test broke, or when its performance
-    stdoutput = vadm.execute('diff')
+decreased considerably. 
 To be able to execute this code fragement, we need either or all of 
 - an improved import system that allows remote imports 
 - a way to specify what the "neccessary" python import
  directories are. for example, the above scriptlet will
  require a certain root included in the python search for module 
  in order to execute something like "import vadm". 
 - a way to specify dependencies ... which opens up another
  interesting can of worms, suitable for another chapter
  in the neverending `future book`_. 
 .. _`sysexec mechanism`: ../misc.html#sysexec
 .. _`compile-on-the-fly`: 
 we need a persistent storage for the py lib 
 -------------------------------------------
 A somewhat open question is where to store the underlying
 generated pyc-files and other files generated on the fly 
 with `CPython's distutils`_.  We want to have a 
 *persistent location* in order to avoid runtime-penalties
 when switching python versions and platforms (think NFS). 
 A *persistent location* for the py lib would be a good idea
 maybe also for other reasons. We could cache some of the
 expensive test setups, like the multi-revision subversion
 repository that is created for each run of the tests. 
 .. _`CPython's distutils`: http://www.python.org/dev/doc/devel/lib/module-distutils.html
@ -364,59 +105,12 @@ is a can of subsequent worms).
 .. _`reiserfs v4 features`: http://www.namesys.com/v4/v4.html
 Improve and unify Path API 
 ==========================
-visit() grows depth control 
+Consider more features
--------------------------- 
+==================================
-Add a ``maxdepth`` argument to the path.visit() method, 
+There are many more features and useful classes 
-which will limit traversal to subdirectories. Example:: 
+that might be nice to integrate.  For example, we might put 
-
+Armin's `lazy list`_ implementation into the py lib. 
    x = py.path.local.get_tmproot()
    for x in p.visit('bin', stop=N): 
        ... 
 This will yield all file or directory paths whose basename
 is 'bin', depending on the values of ``stop``:: 
    p                       # stop == 0 or higher (and p.basename == 'bin')
    p / bin                 # stop == 1 or higher
    p / ... / bin           # stop == 2 or higher
    p / ... / ... / bin     # stop == 3 or higher
 The default for stop would be `255`. 
 But what if `stop < 0`?  We could let that mean to go upwards:: 
    for x in x.visit('py/bin', stop=-255): 
        # will yield all parent direcotires which have a 
        # py/bin subpath 
 visit() returning a lazy list? 
 ------------------------------ 
 There is a very nice "no-API" `lazy list`_ implementation from 
 Armin Rigo which presents a complete list interface, given some 
 iterable.  The iterable is consumed only on demand and retains 
 memory efficiency as much as possible.  The lazy list 
 provides a number of advantages in addition to the fact that
 a list interface is nicer to deal with than an iterator. 
 For example it lets you do:: 
    for x in p1.visit('*.cfg') + p2.visit('*.cfg'): 
        # will iterate through all results 
 Here the for-iter expression will retain all lazyness (with
 the result of adding lazy lists being another another lazy
 list) by internally concatenating the underlying
 lazylists/iterators.  Moreover, the lazylist implementation
 will know that there are no references left to the lazy list
 and throw away iterated elements.  This makes the iteration
 over the sum of the two visit()s as efficient as if we had 
 used iterables to begin with! 
 For this, we would like to move the lazy list into the 
 py lib's namespace, most probably at `py.builtin.lazylist`. 
 .. _`lazy list`: http://codespeak.net/svn/user/arigo/hack/misc/collect.py