[svn r38516] majorly refactor future chapter, mentioning
APIgen and other more current ideas --HG-- branch : trunk
parent 790c9bbb88
commit 97aab00607
				|  | @ -9,321 +9,62 @@ This document tries to describe directions and guiding ideas | ||||||
| for the near-future development of the py lib.  *Note that all | for the near-future development of the py lib.  *Note that all | ||||||
| statements within this document - even if they sound factual - | statements within this document - even if they sound factual - | ||||||
| mostly just express thoughts and ideas. They do not always refer to  | mostly just express thoughts and ideas. They do not always refer to  | ||||||
| real code so read with some caution.  This is not a reference guide | real code so read with some caution.*   | ||||||
| (tm). Moreover, the order in which they appear here in the file does  |  | ||||||
| not reflect the order in which they may be implemented.*  |  | ||||||
| 
 | 
 | ||||||
| .. _`general-path`:  | .. _`general-path`:  | ||||||
| .. _`a more general view on path objects`: | .. _`a more general view on path objects`: | ||||||
| 
 | 
 | ||||||
| A more general view on ``py.path`` objects  |  | ||||||
| ========================================== |  | ||||||
| 
 | 
 | ||||||
| Seen from a more general perspective, the current ``py.path.extpy`` path  | Distribute tests ad-hoc across multiple platforms | ||||||
| offers a way to go from a file to the structured content of  | ====================================================== | ||||||
| a file, namely a python object.  The ``extpy`` path retains some |  | ||||||
| common ``path`` operations and semantics but offers additional |  | ||||||
| methods, e.g. ``resolve()`` gets you a true python object.    |  | ||||||
| 
 | 
 | ||||||
| But apart from python files there are many other examples  | After some more refactoring and unification of | ||||||
| of structured content like xml documents or INI-style  | the current testing and distribution support code | ||||||
| config files.  While some tasks will only be convenient  | we'd like to be able to run tests on multiple | ||||||
| to perform in a domain specific manner (e.g. applying xslt  | platforms simultaneously and allow for interaction | ||||||
| etc.pp) ``py.path`` offers a common behaviour for  | and introspection into the (remote) failures.  | ||||||
| structured content paths. So far only ``py.path.extpy`` |  | ||||||
| is implemented and used by py.test to address tests  |  | ||||||
| and traverse into test files.  |  | ||||||
| 
 |  | ||||||
| *You are in a maze of twisty passages, all alike* |  | ||||||
| ------------------------------------------------- |  | ||||||
| 
 |  | ||||||
| Now, for the sake of finding out a good direction,  |  | ||||||
| let's consider some code that wants to find all  |  | ||||||
| *sections* which have a certain *option* value |  | ||||||
| within some given ``startpath``::  |  | ||||||
| 
 |  | ||||||
|     def find_option(startpath, optionname):  |  | ||||||
|         for section in startpath.listdir(dir=1):  |  | ||||||
|             opt = section.join(optionname)  |  | ||||||
|             if opt.check(): # does the option exist here?  |  | ||||||
|                 print section.basename, "found:", opt.read()  |  | ||||||
| 
 |  | ||||||
| Now the point is that ``find_option()`` would obviously work |  | ||||||
| when ``startpath`` is a filesystem-like path like a local |  | ||||||
| filesystem path or a subversion URL path. It would then see |  | ||||||
| directories as sections and files as option-names and the |  | ||||||
| content of the file as values.  |  | ||||||
| 
 |  | ||||||
| But it also works (today) for ``extpy`` paths if you put the following |  | ||||||
| python code in a file::  |  | ||||||
| 
 |  | ||||||
|     class Section1: |  | ||||||
|         someoption = "i am an option value"  |  | ||||||
| 
 |  | ||||||
|     class Section2: |  | ||||||
|         someoption = "i am another option value"  |  | ||||||
| 
 |  | ||||||
| An ``extpy()`` path maps classes and modules to directories and  |  | ||||||
| name-value bindings to file/read() operations.  |  | ||||||
| 
 |  | ||||||
| And it could also work for 'xml' paths if you put |  | ||||||
| the following xml string in a file::  |  | ||||||
| 
 |  | ||||||
|     <xml ...> |  | ||||||
|     <root> |  | ||||||
|         <section1>       |  | ||||||
|             <someoption>value</someoption></section1> |  | ||||||
|         <section2> |  | ||||||
|             <someoption>value</someoption></section2></root> |  | ||||||
| 
 |  | ||||||
| where tags containing non-text tags map to directories  |  | ||||||
| and tags with just text-children map to files (which |  | ||||||
| upon read() return the joined content of the text  |  | ||||||
| tags, possibly as unicode).  |  | ||||||
| 
 |  | ||||||
| Now, to complete the picture, we could make Config-Parser  |  | ||||||
| *ini-style* config files also available:: |  | ||||||
| 
 |  | ||||||
|     [section1] |  | ||||||
|     name = value  |  | ||||||
|      |  | ||||||
|     [section2] |  | ||||||
|     othername = value |  | ||||||
| 
 |  | ||||||
| where sections map to directories and name=value mappings |  | ||||||
| to file/contents.  |  | ||||||
| 
 |  | ||||||
| So it seems that our above ``find_option()`` function would |  | ||||||
| work nicely on all these *mappings*.  |  | ||||||
| 
 |  | ||||||
| Of course, the somewhat open question is how to make the |  | ||||||
| transition from a filesystem path to structured content |  | ||||||
| useful and unified, as much as possible without overdoing it.  |  | ||||||
| 
 |  | ||||||
| Again, there are tasks that will need fully domain specific |  | ||||||
| solutions (DOM/XSLT/...) but i think the above view warrants |  | ||||||
| some experiments and refactoring.  The degree of uniformity  |  | ||||||
| still needs to be determined and thought about.  |  | ||||||
| 
 |  | ||||||
| path objects should be stackable |  | ||||||
| -------------------------------- |  | ||||||
|   |  | ||||||
| Oh, and btw, a ``py.path.extpy`` file could live on top of a  |  | ||||||
| 'py.path.xml' path as well, i.e. take:: |  | ||||||
| 
 |  | ||||||
|     <xml ...> |  | ||||||
|     <code> |  | ||||||
|         <py>       |  | ||||||
|             <magic> |  | ||||||
|                 <assertion> |  | ||||||
|                     import py  |  | ||||||
|                     ... </assertion> |  | ||||||
|                 <exprinfo>  |  | ||||||
|                     def getmsg(x): pass </exprinfo></magic></py></code> |  | ||||||
| 
 |  | ||||||
| and use it to have a ``extpy`` path living on it:: |  | ||||||
| 
 |  | ||||||
|     p = py.path.local(xmlfilename) |  | ||||||
|     xmlp = py.path.extxml(p, 'py/magic/exprinfo') |  | ||||||
|     p = py.path.extpy(xmlp, 'getmsg') |  | ||||||
|    |  | ||||||
|     assert p.check(func=1, basename='getmsg')  |  | ||||||
|     getmsg = p.resolve()  |  | ||||||
|     # we now have a *live* getmsg() function taken and compiled from  |  | ||||||
|     # the above xml fragment |  | ||||||
| 
 |  | ||||||
| There could be generic converters which convert between  |  | ||||||
| different content formats ... allowing configuration files to e.g.  |  | ||||||
| be in XML/Ini/python or filesystem-format with some common way  |  | ||||||
| to find and iterate values.  |  | ||||||
| 
 |  | ||||||
| *After all the unix filesystem and the python namespaces are  |  | ||||||
| two honking great ideas, why not do more of them? :-)* |  | ||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
| .. _importexport:  | Make APIGEN useful for more projects | ||||||
|  | ================================================ | ||||||
| 
 | 
 | ||||||
| Revising and improving the import/export system  | The new APIGEN tool offers rich information  | ||||||
| =============================================== | derived from running tests against an application:  | ||||||
|  | argument types and callsites, i.e. it shows | ||||||
|  | the places where a particular API is used.  | ||||||
|  | In its first incarnation, there are still | ||||||
|  | some specialties that likely prevent it | ||||||
|  | from documenting APIs for other projects.  | ||||||
|  | We'd like to evolve to a `py.apigen` tool | ||||||
|  | that can make use of information provided | ||||||
|  | by a py.test run.  | ||||||
| 
 | 
 | ||||||
|     or let's wrap the world all around  | Distribute channels/programs across networks | ||||||
|  | ================================================ | ||||||
| 
 | 
 | ||||||
| the export/import interface  | Apart from stabilizing setup/teardown procedures | ||||||
| --------------------------- | for `py.execnet`_, we'd like to generalize its | ||||||
| 
 | implementation to allow connecting two programs | ||||||
| The py lib already incorporates a mechanism to select which | across multiple hosts, i.e. we'd like to arbitrarily | ||||||
| namespaces and names get exposed to a user of the library. | send "channels" across the network. Likely this | ||||||
| Apart from reducing the outside visible namespaces complexity  | will be done by using the "pipe" model, i.e.  | ||||||
| this allows to quickly rename and refactor stuff in the | that each channel is actually a pair of endpoints, | ||||||
| implementation without affecting the caller side.  This export | both of which can be independently transported  | ||||||
| control can be used by other python packages as well.  | across the network.  The programs that "own"  | ||||||
| 
 | these endpoints remain connected.  | ||||||
| However, all is not fine as the import/export has a  |  | ||||||
| few major deficiencies and shortcomings: |  | ||||||
| 
 |  | ||||||
| - it doesn't allow to specify doc-strings  |  | ||||||
| - it is a bit hackish (see py/initpkg.py) |  | ||||||
| - it doesn't present a complete and consistent view of the API.  |  | ||||||
| - ``help(constructed_namespace)`` doesn't work for the root  |  | ||||||
|   package namespace |  | ||||||
| - when the py lib implementation accesses parts of itself  |  | ||||||
|   it uses the native python import mechanism which is  |  | ||||||
|   limiting in some respects.  Especially for distributed |  | ||||||
|   programs as encouraged by `py.execnet`_ it is not clear |  | ||||||
|   how the mechanism can nicely integrate to support remote |  | ||||||
|   lazy importing.  |  | ||||||
| 
 |  | ||||||
| Discussions have been going on for a while but it is |  | ||||||
| still not clear how to best tackle the problem.  Personally,  |  | ||||||
| i believe the main missing thing for the first major release  |  | ||||||
| is the docstring one.   The current specification  |  | ||||||
| of exported names is dictionary based.  It would be  |  | ||||||
| better to declare it in terms of Objects.  |  | ||||||
| 
 |  | ||||||
| 
 |  | ||||||
| Example sketch for a new export specification  |  | ||||||
| --------------------------------------------- |  | ||||||
| 
 |  | ||||||
| Here is a sketch of how the py libs ``__init__.py`` file  |  | ||||||
| might or should look like::  |  | ||||||
| 
 |  | ||||||
|     """ |  | ||||||
|         the py lib version 1.0 |  | ||||||
|         http://codespeak.net/py/1.0 |  | ||||||
|     """ |  | ||||||
| 
 |  | ||||||
|     from py import pkg |  | ||||||
|     pkg.export(__name__, |  | ||||||
|         pkg.Module('path', |  | ||||||
|             '''provides path objects for local filesystem,  |  | ||||||
|                subversion url and working copy, and extension paths. |  | ||||||
|             ''', |  | ||||||
|             pkg.Class('local', ''' |  | ||||||
|                the local filesystem path offering a single |  | ||||||
|                point of interaction for many purposes. |  | ||||||
|                ''', extpy='./path/local.LocalPath'), |  | ||||||
| 
 |  | ||||||
|             pkg.Class('svnurl', ''' |  | ||||||
|                the subversion url path. |  | ||||||
|             ''', extpy='./path/local/svn/urlcommand.SvnUrlPath'), |  | ||||||
|         ), |  | ||||||
|     # it goes on ...  |  | ||||||
|     ) |  | ||||||
| 
 |  | ||||||
| The current ``initpkg.py`` code can be cleaned up to support |  | ||||||
| this new more explicit style of stating things. Note that |  | ||||||
| in principle there is nothing that stops us from retrieving |  | ||||||
| implementations over the network, e.g. a subversion repository.  |  | ||||||
| 
 |  | ||||||
| 
 |  | ||||||
| Let there be alternatives  |  | ||||||
| ------------------------- |  | ||||||
| 
 |  | ||||||
| We could also specify alternative implementations easily:: |  | ||||||
| 
 |  | ||||||
|     pkg.Class('svnwc', ''' |  | ||||||
|        the subversion working copy. |  | ||||||
|     ''', extpy=('./path/local/svn/urlbinding.SvnUrlPath',  |  | ||||||
|                 './path/local/svn/urlcommand.SvnUrlPath',) |  | ||||||
|     ) |  | ||||||
| 
 |  | ||||||
| This would prefer the python binding based implementation over |  | ||||||
| the one working through the 'svn' command line utility.  And |  | ||||||
| of course, it could uniformly signal if no implementation is  |  | ||||||
| available at all.  |  | ||||||
| 
 |  | ||||||
| 
 |  | ||||||
| Problems problems   |  | ||||||
| ----------------- |  | ||||||
| 
 |  | ||||||
| Now there are reasons there isn't a clear conclusion so far.  |  | ||||||
| For example, the above approach has some implications, the |  | ||||||
| main one being that implementation classes like |  | ||||||
| ``py/path/local.LocalPath`` are visible to the caller side but |  | ||||||
| this presents an inconsistency because the user started out with |  | ||||||
| ``py.path.local`` and expects that the two classes are really much |  | ||||||
| the same.  We have the same problem today, of course.  |  | ||||||
| 
 |  | ||||||
| The naive solution strategy of wrapping the "implementation |  | ||||||
| level" objects into their exported representations may remind |  | ||||||
| of the `wrapping techniques PyPy uses`_.  But it |  | ||||||
| *may* result in a slightly heavyweight mechanism that affects |  | ||||||
| runtime speed.  However, I guess that this standard strategy |  | ||||||
| is probably the cleanest.  |  | ||||||
| 
 |  | ||||||
| 
 |  | ||||||
| Every problem can be solved with another level ...  |  | ||||||
| -------------------------------------------------- |  | ||||||
| 
 |  | ||||||
| The wrapping of implementation level classes in their export |  | ||||||
| representations objects adds another level of indirection. |  | ||||||
| But this indirection would have interesting advantages:  |  | ||||||
| 
 |  | ||||||
| - we could easily present a consistent view of the library  |  | ||||||
| - it could take care of exceptions as well  |  | ||||||
| - it provides natural interception points for logging  |  | ||||||
| - it enables remote lazy loading of implementations  |  | ||||||
|   or certain versions of interfaces  |  | ||||||
| 
 |  | ||||||
| And quite likely the extra indirection wouldn't hurt so much |  | ||||||
| as it is not much more than a function call and if we cared |  | ||||||
| we could even generate some c-code (with PyPy :-) to speed |  | ||||||
| it up.    |  | ||||||
| 
 |  | ||||||
| But it can lead to new problems ... |  | ||||||
| ----------------------------------- |  | ||||||
| 
 |  | ||||||
| However, it is critical to avoid to burden the implementation |  | ||||||
| code of being aware of its wrapping.  This is what we have  |  | ||||||
| to do in PyPy but the import/export mechanism works at  |  | ||||||
| a higher level of the language, i think.   |  | ||||||
| 
 |  | ||||||
| Oh, and we didn't talk about bootstrapping :-)  |  | ||||||
| 
 | 
 | ||||||
| .. _`py.execnet`: ../execnet.html | .. _`py.execnet`: ../execnet.html | ||||||
| .. _`wrapping techniques PyPy uses`: http://codespeak.net/pypy/index.cgi?doc/wrapping.html |  | ||||||
| .. _`lightweight xml generation`:  |  | ||||||
| 
 | 
 | ||||||
| Extension of py.path.local.sysexec() | Benchmarking and persistent storage  | ||||||
| ==================================== | ========================================= | ||||||
| 
 | 
 | ||||||
| The `sysexec mechanism`_ allows to directly execute  | For storing test results, but also benchmarking | ||||||
| binaries on your system.  Especially after we'll have this | and other information, we need a solid way  | ||||||
| nicely integrated into Win32 we may also want to run python  | to store all kinds of information from test runs.  | ||||||
| scripts both locally and from the net:: | We'd like to generate statistics or html-overview  | ||||||
| 
 | out of it, but also use such information to determine when | ||||||
|     vadm = py.path.svnurl('http://codespeak.net/svn/vadm/dist/vadm/cmdline.py')  | a certain test broke, or when its performance | ||||||
|     stdoutput = vadm.execute('diff') | decreased considerably.  | ||||||
| 
 |  | ||||||
| To be able to execute this code fragment, we need either or all of  |  | ||||||
| 
 |  | ||||||
| - an improved import system that allows remote imports  |  | ||||||
| 
 |  | ||||||
| - a way to specify what the "necessary" python import |  | ||||||
|   directories are. for example, the above scriptlet will |  | ||||||
|   require a certain root included in the python search for module  |  | ||||||
|   in order to execute something like "import vadm".  |  | ||||||
| 
 |  | ||||||
| - a way to specify dependencies ... which opens up another |  | ||||||
|   interesting can of worms, suitable for another chapter |  | ||||||
|   in the neverending `future book`_.  |  | ||||||
| 
 |  | ||||||
| .. _`sysexec mechanism`: ../misc.html#sysexec |  | ||||||
| .. _`compile-on-the-fly`:  |  | ||||||
| 
 |  | ||||||
| we need a persistent storage for the py lib  |  | ||||||
| ------------------------------------------- |  | ||||||
| 
 |  | ||||||
| A somewhat open question is where to store the underlying |  | ||||||
| generated pyc-files and other files generated on the fly  |  | ||||||
| with `CPython's distutils`_.  We want to have a  |  | ||||||
| *persistent location* in order to avoid runtime-penalties |  | ||||||
| when switching python versions and platforms (think NFS).  |  | ||||||
| 
 |  | ||||||
| A *persistent location* for the py lib would be a good idea |  | ||||||
| maybe also for other reasons. We could cache some of the |  | ||||||
| expensive test setups, like the multi-revision subversion |  | ||||||
| repository that is created for each run of the tests.  |  | ||||||
| 
 | 
 | ||||||
| .. _`CPython's distutils`: http://www.python.org/dev/doc/devel/lib/module-distutils.html | .. _`CPython's distutils`: http://www.python.org/dev/doc/devel/lib/module-distutils.html | ||||||
| 
 | 
 | ||||||
|  | @ -364,59 +105,12 @@ is a can of subsequent worms). | ||||||
| .. _`reiserfs v4 features`: http://www.namesys.com/v4/v4.html | .. _`reiserfs v4 features`: http://www.namesys.com/v4/v4.html | ||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
| Improve and unify Path API  |  | ||||||
| ========================== |  | ||||||
| 
 | 
 | ||||||
| visit() grows depth control  | Consider more features | ||||||
| ---------------------------  | ================================== | ||||||
| 
 | 
 | ||||||
| Add a ``maxdepth`` argument to the path.visit() method,  | There are many more features and useful classes  | ||||||
| which will limit traversal to subdirectories. Example::  | that might be nice to integrate.  For example, we might put  | ||||||
| 
 | Armin's `lazy list`_ implementation into the py lib.  | ||||||
|     p = py.path.local.get_tmproot() |  | ||||||
|     for x in p.visit('bin', stop=N):  |  | ||||||
|         ...  |  | ||||||
| 
 |  | ||||||
| This will yield all file or directory paths whose basename |  | ||||||
| is 'bin', depending on the values of ``stop``::  |  | ||||||
| 
 |  | ||||||
|     p                       # stop == 0 or higher (and p.basename == 'bin') |  | ||||||
|     p / bin                 # stop == 1 or higher |  | ||||||
|     p / ... / bin           # stop == 2 or higher |  | ||||||
|     p / ... / ... / bin     # stop == 3 or higher |  | ||||||
| 
 |  | ||||||
| The default for stop would be `255`.  |  | ||||||
| 
 |  | ||||||
| But what if `stop < 0`?  We could let that mean to go upwards::  |  | ||||||
| 
 |  | ||||||
|     for x in x.visit('py/bin', stop=-255):  |  | ||||||
|         # will yield all parent directories which have a  |  | ||||||
|         # py/bin subpath  |  | ||||||
| 
 |  | ||||||
| visit() returning a lazy list?  |  | ||||||
| ------------------------------  |  | ||||||
| 
 |  | ||||||
| There is a very nice "no-API" `lazy list`_ implementation from  |  | ||||||
| Armin Rigo which presents a complete list interface, given some  |  | ||||||
| iterable.  The iterable is consumed only on demand and retains  |  | ||||||
| memory efficiency as much as possible.  The lazy list  |  | ||||||
| provides a number of advantages in addition to the fact that |  | ||||||
| a list interface is nicer to deal with than an iterator.  |  | ||||||
| For example it lets you do::  |  | ||||||
| 
 |  | ||||||
|     for x in p1.visit('*.cfg') + p2.visit('*.cfg'):  |  | ||||||
|         # will iterate through all results  |  | ||||||
| 
 |  | ||||||
| Here the for-iter expression will retain all laziness (with |  | ||||||
| the result of adding lazy lists being another lazy |  | ||||||
| list) by internally concatenating the underlying |  | ||||||
| lazylists/iterators.  Moreover, the lazylist implementation |  | ||||||
| will know that there are no references left to the lazy list |  | ||||||
| and throw away iterated elements.  This makes the iteration |  | ||||||
| over the sum of the two visit()s as efficient as if we had  |  | ||||||
| used iterables to begin with!  |  | ||||||
| 
 |  | ||||||
| For this, we would like to move the lazy list into the  |  | ||||||
| py lib's namespace, most probably at `py.builtin.lazylist`.  |  | ||||||
| 
 | 
 | ||||||
| .. _`lazy list`: http://codespeak.net/svn/user/arigo/hack/misc/collect.py | .. _`lazy list`: http://codespeak.net/svn/user/arigo/hack/misc/collect.py | ||||||
|  |  | ||||||