diff --git a/README.md b/README.md
index 04a803c..f4ea1d3 100644
--- a/README.md
+++ b/README.md
@@ -36,31 +36,11 @@
   * [modules](#modulesfor-library-developers)
 * [__Release Notes__](#release-notes)
   * [v8.0](#version-80-release)
-* [__Parser__](#parser)
-  * [v1.0](#version-10-parser-last-update-20191014)
-* [__Abstract Syntax Tree__](#abstract-syntax-tree)
-  * [v1.2](#version-12-ast-last-update-20191031)
-  * [v2.0](#version-20-ast-last-update-2020831)
-  * [v3.0](#version-30-ast-last-update-20201023)
-  * [v5.0](#version-50-ast-last-update-202137)
-* [__Bytecode VM__](#bytecode-virtual-machine)
-  * [v4.0](#version-40-vm-last-update-20201217)
-  * [v5.0](#version-50-vm-last-update-202137)
-  * [v6.0](#version-60-vm-last-update-202161)
-  * [v6.5](#version-65-vm-last-update-2021624)
-  * [v7.0](#version-70-vm-last-update-2021108)
-  * [v8.0](#version-80-vm-last-update-2022212)
-  * [v9.0](#version-90-vm-last-update-2022518)
-  * [v10.0](#version-100-vm-latest)
-* [__Benchmark__](#benchmark)
-  * [v6.5 (i5-8250U windows 10)](#version-65-i5-8250u-windows10-2021619)
-  * [v6.5 (i5-8250U ubuntu-WSL)](#version-70-i5-8250u-ubuntu-wsl-on-windows10-2021629)
-  * [v8.0 (R9-5900HX ubuntu-WSL)](#version-80-r9-5900hx-ubuntu-wsl-2022123)
-  * [v9.0 (R9-5900HX ubuntu-WSL)](#version-90-r9-5900hx-ubuntu-wsl-2022213)
+* [__Development History__](./doc/dev.md)
+* [__Benchmark__](./doc/benchmark.md)
 * [__Difference__](#difference-between-andys-and-this-interpreter)
   * [strict definition](#1-must-use-var-to-define-variables)
-  * [(outdated)use after definition](#2-now-supported-couldnt-use-variables-before-definitions)
-  * [default dynamic arguments](#3-default-dynamic-arguments-not-supported)
+  * [default dynamic arguments](#2-default-dynamic-arguments-not-supported)
 * [__Trace Back Info__](#trace-back-info)
   * [native function 'die'](#1-native-function-die)
   * [stack overflow](#2-stack-overflow-crash-info)
@@ -77,11 +57,11 @@ __Contact us if having great ideas to share!__
 ## __Introduction__
 
 __[Nasal](http://wiki.flightgear.org/Nasal_scripting_language)__
-is an ECMAscript-like programming language that used in __[FlightGear](https://www.flightgear.org/)__.
-This language is designed by __[Andy Ross](https://github.com/andyross)__.
+is an ECMAScript-like programming language used in [FlightGear](https://www.flightgear.org/).
+The language was designed by [Andy Ross](https://github.com/andyross).
 
-The interpreter is totally rewritten by __[ValKmjolnir](https://github.com/ValKmjolnir)__ using `C++`(`-std=c++11`)
-without reusing the code in __[Andy Ross's nasal interpreter](<https://github.com/andyross/nasal>)__.
+The interpreter was rewritten from scratch by [ValKmjolnir](https://github.com/ValKmjolnir) in `C++` (`-std=c++11`),
+without reusing any code from [Andy Ross's nasal interpreter](<https://github.com/andyross/nasal>).
 But we really appreciate that Andy created this amazing programming language and his interpreter project.
 
 Now this project uses __MIT license__ (2021/5/4).
@@ -91,7 +71,7 @@ use this project to learn or create more interesting things
 
 __Why writing this nasal interpreter?__
 In 2019 summer holiday,
-members in __[FGPRC](https://www.fgprc.org/)__ told me that it is hard to debug with nasal-console in Flightgear,
+members of [FGPRC](https://www.fgprc.org/) told me that it was hard to debug with the nasal-console in FlightGear,
 especially when checking syntax errors.
 So i tried to write a new interpreter to help them checking syntax error and even, runtime error.
 
@@ -113,15 +93,15 @@ this interpreter a useful tool in your own projects (such as a script in a game
 ![macOS](https://img.shields.io/badge/Apple%20Inc.-MacOS-green?style=flat-square&logo=apple)
 ![linux](https://img.shields.io/badge/GNU-Linux-green?style=flat-square&logo=GNU)
 
+![g++](https://img.shields.io/badge/GNU-g++-A42E2B?style=flat-square&logo=GNU)
+![clang++](https://img.shields.io/badge/LLVM-clang++-262D3A?style=flat-square&logo=LLVM)
+![vs](https://img.shields.io/badge/Visual_Studio-MSVC-5C2D91?style=flat-square&logo=visualstudio)
+
 Better choose the latest update of the interpreter.
 Download the source and build it! It's quite easy to build this interpreter.
 
 __CAUTION__: If want to use the release zip/tar.gz file to build the interpreter, please read the [__Release Notes__](#release-notes) below to make sure this release file has no fatal bugs. There are some tips to fix the release manually.
 
-![g++](https://img.shields.io/badge/GNU-g++-A42E2B?style=flat-square&logo=GNU)
-![clang++](https://img.shields.io/badge/LLVM-clang++-262D3A?style=flat-square&logo=LLVM)
-![vs](https://img.shields.io/badge/Visual_Studio-MSVC-5C2D91?style=flat-square&logo=visualstudio)
-
 Use g++(`MinGW-w64`) or MSVC(`Visual Studio`) on __`Windows`__ platform. Download MinGW-w64 [__HERE__](https://www.mingw-w64.org/downloads/)(Visual Studio also has this), and use g++/clang++ on __`linux/macOS/Unix`__ platform (we suggest `clang`).
 
 We could build the interpreter using `makefile`.
@@ -802,743 +782,6 @@ Another bug is that in `nasal_err.h:class nasal_err`, we should add a constructo
 
 This bug is fixed in `v9.0`. So we suggest that do not use `v8.0`.
 
-## __Parser__
-
-`LL(1)` parser with special check.
-
-```javascript
-(var a,b,c)=[{b:nil},[1,2],func return 0;];
-(a.b,b[0],c)=(1,2,3);
-```
-
-These two expressions have the same first set,so `LL(1)` is useless for this language. We add some special checks in it.
-
-Problems mentioned above have been solved for a long time, but recently i found a new problem here:
-
-```javascript
-var f=func(x,y,z){return x+y+z}
-(a,b,c)=(0,1,2);
-```
-
-This will be recognized as this:
-
-```javascript
-var f=func(x,y,z){return x+y+z}(a,b,c)
-=(0,1,2);
-```
-
-and causes fatal syntax error.
-And i tried this program in flightgear nasal console.
-It also found this is a syntax error.
-I think this is a serious design fault.
-To avoid this syntax error, change program like this, just add a semicolon:
-
-```javascript
-var f=func(x,y,z){return x+y+z};
-                               ^ here
-(a,b,c)=(0,1,2);
-```
-
-### version 1.0 parser (last update 2019/10/14)
-
-First fully functional version of nasal_parser.
-
-Before version 1.0,i tried many times to create a correct parser.
-
-Finally i learned `LL(1)` and `LL(k)` and wrote a parser for math formulas in version 0.16(last update 2019/9/14).
-
-In version 0.17(2019/9/15) 0.18(2019/9/18) 0.19(2019/10/1)i was playing the parser happily and after that i wrote version 1.0.
-
-__This project began at 2019/7/25__.
-
-## __Abstract Syntax Tree__
-
-### version 1.2 ast (last update 2019/10/31)
-
-The ast has been completed in this version.
-
-### version 2.0 ast (last update 2020/8/31)
-
-A completed ast-interpreter with unfinished lib functions.
-
-### version 3.0 ast (last update 2020/10/23)
-
-The ast is refactored and is now easier to read and maintain.
-
-Ast-interpreter uses new techniques so it can run codes more efficiently.
-
-Now you can add your own functions as builtin-functions in this interpreter!
-
-I decide to save the ast interpreter after releasing v4.0. Because it took me a long time to think and write...
-
-### version 5.0 ast (last update 2021/3/7)
-
-I change my mind.
-AST interpreter leaves me too much things to do.
-
-If i continue saving this interpreter,
-it will be harder for me to make the bytecode vm become more efficient.
-
-## __Bytecode Virtual Machine__
-
-### version 4.0 vm (last update 2020/12/17)
-
-I have just finished the first version of bytecode-interpreter.
-
-This interpreter is still in test.
-After this test, i will release version 4.0!
-
-Now i am trying to search hidden bugs in this interpreter.
-Hope you could help me! :)
-
-There's an example of byte code below:
-
-```javascript
-for(var i=0;i<4000000;i+=1);
-```
-
-```x86asm
-.number 0
-.number 4e+006
-.number 1
-.symbol i
-0x00000000: pzero  0x00000000
-0x00000001: loadg  0x00000000 (i)
-0x00000002: callg  0x00000000 (i)
-0x00000003: pnum   0x00000001 (4e+006)
-0x00000004: less   0x00000000
-0x00000005: jf     0x0000000b
-0x00000006: pone   0x00000000
-0x00000007: mcallg 0x00000000 (i)
-0x00000008: addeq  0x00000000
-0x00000009: pop    0x00000000
-0x0000000a: jmp    0x00000002
-0x0000000b: nop    0x00000000
-```
-
-### version 5.0 vm (last update 2021/3/7)
-
-I decide to optimize bytecode vm in this version.
-
-Because it takes more than 1.5s to count i from `0` to `4000000-1`.This is not efficient at all!
-
-2021/1/23 update: Now it can count from `0` to `4000000-1` in 1.5s.
-
-### version 6.0 vm (last update 2021/6/1)
-
-Use `loadg`/`loadl`/`callg`/`calll`/`mcallg`/`mcalll` to avoid branches.
-
-Delete type `vm_scop`.
-
-Use const `vm_num` to avoid frequently new & delete.
-
-Change garbage collector from reference-counting to mark-sweep.
-
-`vapp` and `newf` operand use .num to reduce the size of `exec_code`.
-
-2021/4/3 update: Now it can count from `0` to `4e6-1` in 0.8s.
-
-2021/4/19 update: Now it can count from `0` to `4e6-1` in 0.4s.
-
-In this update i changed global and local scope from `unordered_map` to `vector`.
-
-So the bytecode generator changed a lot.
-
-```javascript
-for(var i=0;i<4000000;i+=1);
-```
-
-```x86asm
-.number 4e+006
-0x00000000: intg   0x00000001
-0x00000001: pzero  0x00000000
-0x00000002: loadg  0x00000000
-0x00000003: callg  0x00000000
-0x00000004: pnum   0x00000000 (4e+006)
-0x00000005: less   0x00000000
-0x00000006: jf     0x0000000c
-0x00000007: pone   0x00000000
-0x00000008: mcallg 0x00000000
-0x00000009: addeq  0x00000000
-0x0000000a: pop    0x00000000
-0x0000000b: jmp    0x00000003
-0x0000000c: nop    0x00000000
-```
-
-### version 6.5 vm (last update 2021/6/24)
-
-2021/5/31 update:
-
-Now gc can collect garbage correctly without re-collecting,
-which will cause fatal error.
-
-Add `builtin_alloc` to avoid mark-sweep when running a built-in function,
-which will mark useful items as useless garbage to collect.
-
-Better use setsize and assignment to get a big array,
-`append` is very slow in this situation.
-
-2021/6/3 update:
-
-Fixed a bug that gc still re-collects garbage,
-this time i use three mark states to make sure garbage is ready to be collected.
-
-Change `callf` to `callfv` and `callfh`.
-And `callfv` fetches arguments from `val_stack` directly instead of using `vm_vec`,
-a not very efficient way.
-
-Better use `callfv` instead of `callfh`,
-`callfh` will fetch a `vm_hash` from stack and parse it,
-making this process slow.
-
-```javascript
-var f=func(x,y){return x+y;}
-f(1024,2048);
-```
-
-```x86asm
-.number 1024
-.number 2048
-.symbol x   
-.symbol y
-0x00000000: intg   0x00000001
-0x00000001: newf   0x00000007
-0x00000002: intl   0x00000003
-0x00000003: offset 0x00000001
-0x00000004: para   0x00000000 (x)
-0x00000005: para   0x00000001 (y)
-0x00000006: jmp    0x0000000b
-0x00000007: calll  0x00000001
-0x00000008: calll  0x00000002
-0x00000009: add    0x00000000
-0x0000000a: ret    0x00000000
-0x0000000b: loadg  0x00000000
-0x0000000c: callg  0x00000000
-0x0000000d: pnum   0x00000000 (1024)
-0x0000000e: pnum   0x00000001 (2048)
-0x0000000f: callfv 0x00000002
-0x00000010: pop    0x00000000
-0x00000011: nop    0x00000000
-```
-
-2021/6/21 update: Now gc will not collect nullptr.
-And the function of assignment is complete,
-now these kinds of assignment is allowed:
-
-```javascript
-var f=func()
-{
-    var _=[{_:0},{_:1}];
-    return func(x)
-    {
-        return _[x];
-    }
-}
-var m=f();
-m(0)._=m(1)._=10;
-
-[0,1,2][1:2][0]=0;
-```
-
-In the old version,
-parser will check this left-value and tells that these kinds of left-value are not allowed(bad lvalue).
-
-But now it can work.
-And you could see its use by reading the code above.
-To make sure this assignment works correctly,
-codegen will generate byte code by `nasal_codegen::call_gen()` instead of `nasal_codegen::mcall_gen()`,
-and the last child of the ast will be generated by `nasal_codegen::mcall_gen()`.
-So the bytecode is totally different now:
-
-```x86asm
-.number 10
-.number 2
-.symbol _
-.symbol x
-0x00000000: intg   0x00000002
-0x00000001: newf   0x00000005
-0x00000002: intl   0x00000002
-0x00000003: offset 0x00000001
-0x00000004: jmp    0x00000017
-0x00000005: newh   0x00000000
-0x00000006: pzero  0x00000000
-0x00000007: happ   0x00000000 (_)
-0x00000008: newh   0x00000000
-0x00000009: pone   0x00000000
-0x0000000a: happ   0x00000000 (_)
-0x0000000b: newv   0x00000002
-0x0000000c: loadl  0x00000001
-0x0000000d: newf   0x00000012
-0x0000000e: intl   0x00000003
-0x0000000f: offset 0x00000002
-0x00000010: para   0x00000001 (x)
-0x00000011: jmp    0x00000016
-0x00000012: calll  0x00000001
-0x00000013: calll  0x00000002
-0x00000014: callv  0x00000000
-0x00000015: ret    0x00000000
-0x00000016: ret    0x00000000
-0x00000017: loadg  0x00000000
-0x00000018: callg  0x00000000
-0x00000019: callfv 0x00000000
-0x0000001a: loadg  0x00000001
-0x0000001b: pnum   0x00000000 (10.000000)
-0x0000001c: callg  0x00000001
-0x0000001d: pone   0x00000000
-0x0000001e: callfv 0x00000001
-0x0000001f: mcallh 0x00000000 (_)
-0x00000020: meq    0x00000000
-0x00000021: callg  0x00000001
-0x00000022: pzero  0x00000000
-0x00000023: callfv 0x00000001
-0x00000024: mcallh 0x00000000 (_)
-0x00000025: meq    0x00000000
-0x00000026: pop    0x00000000
-0x00000027: pzero  0x00000000
-0x00000028: pzero  0x00000000
-0x00000029: pone   0x00000000
-0x0000002a: pnum   0x00000001 (2.000000)
-0x0000002b: newv   0x00000003
-0x0000002c: slcbeg 0x00000000
-0x0000002d: pone   0x00000000
-0x0000002e: pnum   0x00000001 (2.000000)
-0x0000002f: slc2   0x00000000
-0x00000030: slcend 0x00000000
-0x00000031: pzero  0x00000000
-0x00000032: mcallv 0x00000000
-0x00000033: meq    0x00000000
-0x00000034: pop    0x00000000
-0x00000035: nop    0x00000000
-```
-
-As you could see from the bytecode above,
-`mcall`/`mcallv`/`mcallh` operands' using frequency will reduce,
-`call`/`callv`/`callh`/`callfv`/`callfh` at the opposite.
-
-And because of the new structure of `mcall`,
-`addr_stack`, a stack used to store the memory address,
-is deleted from `nasal_vm`,
-and now `nasal_vm` use `nasal_val** mem_addr` to store the memory address.
-This will not cause fatal errors because the memory address is used __immediately__ after getting it.
-
-### version 7.0 vm (last update 2021/10/8)
-
-2021/6/26 update:
-
-Instruction dispatch is changed from call-threading to computed-goto(with inline function).
-After changing the way of instruction dispatch,
-there is a great improvement in nasal_vm.
-Now vm can run test/bigloop and test/pi in 0.2s!
-And vm runs test/fib in 0.8s on linux.
-You could see the time use data below,
-in Test data section.
-
-This version uses g++ extension "labels as values",
-which is also supported by clang++.
-(But i don't know if MSVC supports this)
-
-There is also a change in nasal_gc:
-`std::vector` global is deleted,
-now the global values are all stored on stack(from `val_stack+0` to `val_stack+intg-1`).
-
-2021/6/29 update:
-
-Add some instructions that execute const values:
-`op_addc`,`op_subc`,`op_mulc`,`op_divc`,`op_lnkc`,`op_addeqc`,`op_subeqc`,`op_muleqc`,`op_diveqc`,`op_lnkeqc`.
-
-Now the bytecode of test/bigloop.nas seems like this:
-
-```x86asm
-.number 4e+006
-.number 1
-0x00000000: intg   0x00000001
-0x00000001: pzero  0x00000000
-0x00000002: loadg  0x00000000
-0x00000003: callg  0x00000000
-0x00000004: pnum   0x00000000 (4000000)
-0x00000005: less   0x00000000
-0x00000006: jf     0x0000000b
-0x00000007: mcallg 0x00000000
-0x00000008: addeqc 0x00000001 (1)
-0x00000009: pop    0x00000000
-0x0000000a: jmp    0x00000003
-0x0000000b: nop    0x00000000
-```
-
-And this test file runs in 0.1s after this update.
-Most of the calculations are accelerated.
-
-Also, assignment bytecode has changed a lot.
-Now the first identifier that called in assignment will use `op_load` to assign,
-instead of `op_meq`,`op_pop`.
-
-```javascript
-var (a,b)=(1,2);
-a=b=0;
-```
-
-```x86asm
-.number 2
-0x00000000: intg   0x00000002
-0x00000001: pone   0x00000000
-0x00000002: loadg  0x00000000
-0x00000003: pnum   0x00000000 (2)
-0x00000004: loadg  0x00000001
-0x00000005: pzero  0x00000000
-0x00000006: mcallg 0x00000001
-0x00000007: meq    0x00000000 (b=2 use meq,pop->a)
-0x00000008: loadg  0x00000000 (a=b use loadg)
-0x00000009: nop    0x00000000
-```
-
-### version 8.0 vm (last update 2022/2/12)
-
-2021/10/8 update:
-
-In this version vm_nil and vm_num now is not managed by    `nasal_gc`,
-this will decrease the usage of `gc::alloc` and increase the efficiency of execution.
-
-New value type is added: `vm_obj`.
-This type is reserved for user to define their own value types.
-Related API will be added in the future.
-
-Fully functional closure:
-Add new operands that get and set upvalues.
-Delete an old operand `op_offset`.
-
-2021/10/13 update:
-
-The format of output information of bytecodes changes to this:
-
-```x86asm
-0x000002f2: newf   0x2f6
-0x000002f3: intl   0x2
-0x000002f4: para   0x3e ("x")
-0x000002f5: jmp    0x309
-0x000002f6: calll  0x1
-0x000002f7: lessc  0x0 (2)
-0x000002f8: jf     0x2fb
-0x000002f9: calll  0x1
-0x000002fa: ret
-0x000002fb: upval  0x0[0x1]
-0x000002fc: upval  0x0[0x1]
-0x000002fd: callfv 0x1
-0x000002fe: calll  0x1
-0x000002ff: subc   0x1d (1)
-0x00000300: callfv 0x1
-0x00000301: upval  0x0[0x1]
-0x00000302: upval  0x0[0x1]
-0x00000303: callfv 0x1
-0x00000304: calll  0x1
-0x00000305: subc   0x0 (2)
-0x00000306: callfv 0x1
-0x00000307: add
-0x00000308: ret
-0x00000309: ret
-0x0000030a: callfv 0x1
-0x0000030b: loadg  0x32
-```
-
-2022/1/22 update:
-
-Delete `op_pone` and `op_pzero`.
-Both of them are meaningless and will be replaced by `op_pnum`.
-
-### version 9.0 vm (last update 2022/5/18)
-
-2022/2/12 update:
-
-Local values now are __stored on stack__.
-So function calling will be faster than before.
-Because in v8.0 when calling a function,
-new `vm_vec` will be allocated by `nasal_gc`, this makes gc doing mark-sweep too many times and spends a quite lot of time.
-In test file `test/bf.nas`, it takes too much time to test the file because this file has too many function calls(see test data below in table `version 8.0 (R9-5900HX ubuntu-WSL 2022/1/23)`).
-
-Upvalue now is generated when creating first new function in the local scope, using `vm_vec`.
-And after that when creating new functions, they share the same upvalue, and the upvalue will synchronize with the local scope each time creating a new function.
-
-2022/3/27 update:
-
-In this month's updates we change upvalue from `vm_vec` to `vm_upval`,
-a special gc-managed object,
-which has almost the same structure of that upvalue object in another programming language __`Lua`__.
-
-Today we change the output format of bytecode.
-New output format looks like `objdump`:
-
-```x86asm
-  0x0000029b:       0a 00 00 00 00        newh
-
-func <0x29c>:
-  0x0000029c:       0b 00 00 02 a0        newf    0x2a0
-  0x0000029d:       02 00 00 00 02        intl    0x2
-  0x0000029e:       0d 00 00 00 66        para    0x66 ("libname")
-  0x0000029f:       32 00 00 02 a2        jmp     0x2a2
-  0x000002a0:       40 00 00 00 42        callb   0x42 <__dlopen@0x41dc40>
-  0x000002a1:       4a 00 00 00 00        ret
-<0x29c>;
-
-  0x000002a2:       0c 00 00 00 67        happ    0x67 ("dlopen")
-
-func <0x2a3>:
-  0x000002a3:       0b 00 00 02 a8        newf    0x2a8
-  0x000002a4:       02 00 00 00 03        intl    0x3
-  0x000002a5:       0d 00 00 00 68        para    0x68 ("lib")
-  0x000002a6:       0d 00 00 00 69        para    0x69 ("sym")
-  0x000002a7:       32 00 00 02 aa        jmp     0x2aa
-  0x000002a8:       40 00 00 00 43        callb   0x43 <__dlsym@0x41df00>
-  0x000002a9:       4a 00 00 00 00        ret
-<0x2a3>;
-
-  0x000002aa:       0c 00 00 00 6a        happ    0x6a ("dlsym")
-```
-
-### version 10.0 vm (latest)
-
-2022/5/19 update:
-
-Now we add coroutine in this runtime:
-
-```javascript
-var coroutine={
-    create: func(function){return __cocreate;},
-    resume: func(co)      {return __coresume;},
-    yield:  func(args...) {return __coyield; },
-    status: func(co)      {return __costatus;},
-    running:func()        {return __corun;   }
-};
-```
-
-`coroutine.create` is used to create a new coroutine object using a function.
-But this coroutine will not run immediately.
-
-`coroutine.resume` is used to continue running a coroutine.
-
-`coroutine.yield` is used to interrupt the running of a coroutine and throw some values.
-These values will be accepted and returned by `coroutine.resume`.
-And `coroutine.yield` it self returns `vm_nil` in the coroutine function.
-
-`coroutine.status` is used to see the status of a coroutine.
-There are 3 types of status:`suspended` means waiting for running,`running` means is running,`dead` means finished running.
-
-`coroutine.running` is used to judge if there is a coroutine running now.
-
-__CAUTION:__ coroutine should not be created or running inside another coroutine.
-
-__We will explain how resume and yield work here:__
-
-When `op_callb` is called, the stack frame is like this:
-
-```C++
-+--------------------------+(main stack)
-| old pc(vm_ret)           | <- top[0]
-+--------------------------+
-| old localr(vm_addr)      | <- top[-1]
-+--------------------------+
-| old upvalr(vm_upval)     | <- top[-2]
-+--------------------------+
-| local scope(nas_ref)     |
-| ...                      |
-+--------------------------+ <- local pointer stored in localr
-| old funcr(vm_func)       | <- old function stored in funcr
-+--------------------------+
-```
-
-In `op_callb`'s progress, next step the stack frame is:
-
-```C++
-+--------------------------+(main stack)
-| nil(vm_nil)              | <- push nil
-+--------------------------+
-| old pc(vm_ret)           |
-+--------------------------+
-| old localr(vm_addr)      |
-+--------------------------+
-| old upvalr(vm_upval)     |
-+--------------------------+
-| local scope(nas_ref)     |
-| ...                      |
-+--------------------------+ <- local pointer stored in localr
-| old funcr(vm_func)       | <- old function stored in funcr
-+--------------------------+
-```
-
-Then we call `resume`, this function will change stack.
-As we can see, coroutine stack already has some values on it,
-but if we first enter it, the stack top will be `vm_ret`, and the return `pc` is `0`.
-
-So for safe running, `resume` will return `gc.top[0]`.
-`op_callb` will do `top[0]=resume()`, so the value does not change.
-
-```C++
-+--------------------------+(coroutine stack)
-| pc:0(vm_ret)             | <- now gc.top[0]
-+--------------------------+
-```
-
-When we call `yield`, the function will do like this.
-And we find that `op_callb` has put the `nil` at the top.
-but where is the returned `local[1]` sent?
-
-```C++
-+--------------------------+(coroutine stack)
-| nil(vm_nil)              | <- push nil
-+--------------------------+
-| old pc(vm_ret)           |
-+--------------------------+
-| old localr(vm_addr)      |
-+--------------------------+
-| old upvalr(vm_upval)     |
-+--------------------------+
-| local scope(nas_ref)     |
-| ...                      |
-+--------------------------+ <- local pointer stored in localr
-| old funcr(vm_func)       | <- old function stored in funcr
-+--------------------------+
-```
-
-When `builtin_coyield` is finished, the stack is set to main stack,
-and the returned `local[1]` in fact is set to the top of the main stack by `op_callb`:
-
-```C++
-+--------------------------+(main stack)
-| return_value(nas_ref)    |
-+--------------------------+
-| old pc(vm_ret)           |
-+--------------------------+
-| old localr(vm_addr)      |
-+--------------------------+
-| old upvalr(vm_upval)     |
-+--------------------------+
-| local scope(nas_ref)     |
-| ...                      |
-+--------------------------+ <- local pointer stored in localr
-| old funcr(vm_func)       | <- old function stored in funcr
-+--------------------------+
-```
-
-so the main progress feels the value on the top is the returned value of `resume`.
-but in fact the `resume`'s returned value is set on coroutine stack.
-so we conclude this:
-
-```C++
-resume (main->coroutine) return coroutine.top[0]. coroutine.top[0] = coroutine.top[0];
-yield  (coroutine->main) return a vector.         main.top[0]      = vector;
-```
-
-## Benchmark
-
-![benchmark](./pic/benchmark.png)
-
-### version 6.5 (i5-8250U windows10 2021/6/19)
-
-running time and gc time:
-
-|file|call gc|total time|gc time|
-|:----|:----|:----|:----|
-|pi.nas|12000049|0.593s|0.222s|
-|fib.nas|10573747|2.838s|0.187s|
-|bp.nas|4419829|1.99s|0.18s|
-|bigloop.nas|4000000|0.419s|0.039s|
-|mandelbrot.nas|1044630|0.433s|0.041s|
-|life.nas|817112|8.557s|0.199s|
-|ascii-art.nas|45612|0.48s|0.027s|
-|calc.nas|8089|0.068s|0.006s|
-|quick_sort.nas|2768|0.107s|0s|
-|bfs.nas|2471|1.763s|0.003s|
-
-operands calling frequency:
-
-|file|1st|2nd|3rd|4th|5th|
-|:----|:----|:----|:----|:----|:----|
-|pi.nas|callg|pop|mcallg|pnum|pone|
-|fib.nas|calll|pnum|callg|less|jf|
-|bp.nas|calll|callg|pop|callv|addeq|
-|bigloop.nas|pnum|less|jf|callg|pone|
-|mandelbrot.nas|callg|mult|loadg|pnum|pop|
-|life.nas|calll|callv|pnum|jf|callg|
-|ascii-art.nas|calll|pop|mcalll|callg|callb|
-|calc.nas|calll|pop|pstr|mcalll|jmp|
-|quick_sort.nas|calll|pop|jt|jf|less|
-|bfs.nas|calll|pop|callv|mcalll|jf|
-
-operands calling total times:
-
-|file|1st|2nd|3rd|4th|5th|
-|:----|:----|:----|:----|:----|:----|
-|pi.nas|6000004|6000003|6000000|4000005|4000002|
-|fib.nas|17622792|10573704|7049218|7049155|7049155|
-|bp.nas|7081480|4227268|2764676|2617112|2065441|
-|bigloop.nas|4000001|4000001|4000001|4000001|4000000|
-|mandelbrot.nas|1519632|563856|290641|286795|284844|
-|life.nas|2114371|974244|536413|534794|489743|
-|ascii-art.nas|37906|22736|22402|18315|18292|
-|calc.nas|191|124|109|99|87|
-|quick_sort.nas|16226|5561|4144|3524|2833|
-|bfs.nas|24707|16297|14606|14269|8672|
-
-### version 7.0 (i5-8250U ubuntu-WSL on windows10 2021/6/29)
-
-running time:
-
-|file|total time|info|
-|:----|:----|:----|
-|pi.nas|0.15625s|great improvement|
-|fib.nas|0.75s|great improvement|
-|bp.nas|0.4218s(7162 epoch)|good improvement|
-|bigloop.nas|0.09375s|great improvement|
-|mandelbrot.nas|0.0312s|great improvement|
-|life.nas|8.80s(windows) 1.25(ubuntu WSL)|little improvement|
-|ascii-art.nas|0.015s|little improvement|
-|calc.nas|0.0468s|little improvement|
-|quick_sort.nas|0s|great improvement|
-|bfs.nas|0.0156s|great improvement|
-
-### version 8.0 (R9-5900HX ubuntu-WSL 2022/1/23)
-
-running time:
-
-|file|total time|info|
-|:----|:----|:----|
-|bf.nas|1100.19s||
-|mandel.nas|28.98s||
-|life.nas|0.56s|0.857s(windows)|
-|ycombinator.nas|0.64s||
-|fib.nas|0.28s||
-|bfs.nas|0.156s|random result|
-|pi.nas|0.0625s||
-|bigloop.nas|0.047s||
-|calc.nas|0.03125s|changed test file|
-|mandelbrot.nas|0.0156s||
-|ascii-art.nas|0s||
-|quick_sort.nas|0s||
-
-### version 9.0 (R9-5900HX ubuntu-WSL 2022/2/13)
-
-running time:
-
-|file|total time|info|
-|:----|:----|:----|
-|bf.nas|276.55s|great improvement|
-|mandel.nas|28.16s||
-|ycombinator.nas|0.59s||
-|life.nas|0.2s|0.649s(windows)|
-|fib.nas|0.234s|little improvement|
-|bfs.nas|0.14s|random result|
-|pi.nas|0.0625s||
-|bigloop.nas|0.047s||
-|calc.nas|0.0469s|changed test file|
-|quick_sort.nas|0.016s|changed test file:100->1e4|
-|mandelbrot.nas|0.0156s||
-|ascii-art.nas|0s||
-
-`bf.nas` is a very interesting test file that there is a brainfuck interpreter written in nasal.
-And we use this bf interpreter to draw a mandelbrot set.
-
-In 2022/2/17 update we added `\e` into the lexer. And the `bfcolored.nas` uses this special ASCII code. Here is the result:
-
-![mandelbrot](./pic/mandelbrot.png)
-
 ## __Difference Between Andy's and This Interpreter__
 
 ### 1. must use `var` to define variables
@@ -1572,45 +815,7 @@ foreach(i;[0,1,2,3])
     print(i)
 ```
 
-### 2. (now supported) couldn't use variables before definitions
-
-(__Outdated__: this is now supported) Also there's another difference.
-In Andy's interpreter:
-
-```javascript
-var a=func {print(b);}
-var b=1;
-a();
-```
-
-This program runs normally with output 1.
-But in this new interpreter, it will get:
-
-```javascript
-[code] test.nas:1 undefined symbol "b".
-var a=func {print(b);}
-```
-
-This difference is caused by different kinds of ways of lexical analysis.
-In most script language interpreters,
-they use dynamic analysis to check if this symbol is defined yet.
-However, this kind of analysis is at the cost of lower efficiency.
-To make sure the interpreter runs at higher efficiency,
-i choose static analysis to manage the memory space of each symbol.
-By this way, runtime will never need to check if a symbol exists or not.
-But this causes a difference.
-You will get an error of 'undefined symbol',
-instead of nothing happening in most script language interpreters.
-
-This change is __controversial__ among FGPRC's members.
-So maybe in the future i will use dynamic analysis again to cater to the habits of senior programmers.
-
-(2021/8/3 update) __Now i use scanning ast twice to reload symbols.
-So this difference does not exist from this update.__
-But a new difference is that if you call a variable before defining it,
-you'll get nil instead of 'undefined error'.
-
-### 3. default dynamic arguments not supported
+### 2. default dynamic arguments not supported
 
 In this new interpreter,
 function doesn't put dynamic arguments into vector `arg` automatically.
diff --git a/doc/README_zh.md b/doc/README_zh.md
index 4e65797..407f263 100644
--- a/doc/README_zh.md
+++ b/doc/README_zh.md
@@ -18,7 +18,7 @@
 ## __目录__
 
 * [__简介__](#简介)
-* [__编译__](#编译方式)
+* [__编译__](#编译)
 * [__使用方法__](#使用方法)
 * [__教程__](#教程)
   * [基本类型](#基本类型)
@@ -36,31 +36,11 @@
   * [模块](#模块开发者教程)
 * [__发行日志__](#发行日志)
   * [v8.0](#version-80-release)
-* [__语法分析__](#语法分析)
-  * [v1.0](#version-10-parser-last-update-20191014)
-* [__抽象语法树__](#抽象语法树)
-  * [v1.2](#version-12-ast-last-update-20191031)
-  * [v2.0](#version-20-ast-last-update-2020831)
-  * [v3.0](#version-30-ast-last-update-20201023)
-  * [v5.0](#version-50-ast-last-update-202137)
-* [__字节码虚拟机__](#字节码虚拟机)
-  * [v4.0](#version-40-vm-last-update-20201217)
-  * [v5.0](#version-50-vm-last-update-202137)
-  * [v6.0](#version-60-vm-last-update-202161)
-  * [v6.5](#version-65-vm-last-update-2021624)
-  * [v7.0](#version-70-vm-last-update-2021108)
-  * [v8.0](#version-80-vm-last-update-2022212)
-  * [v9.0](#version-90-vm-last-update-2022518)
-  * [v10.0](#version-100-vm-latest)
-* [__测试数据__](#测试数据)
-  * [v6.5 (i5-8250U windows 10)](#version-65-i5-8250u-windows10-2021619)
-  * [v6.5 (i5-8250U ubuntu-WSL)](#version-70-i5-8250u-ubuntu-wsl-on-windows10-2021629)
-  * [v8.0 (R9-5900HX ubuntu-WSL)](#version-80-r9-5900hx-ubuntu-wsl-2022123)
-  * [v9.0 (R9-5900HX ubuntu-WSL)](#version-90-r9-5900hx-ubuntu-wsl-2022213)
+* [__开发历史__](../doc/dev_zh.md)
+* [__测试数据__](../doc/benchmark.md)
 * [__特殊之处__](#与andy解释器的不同之处)
   * [严格的定义要求](#1-必须用var定义变量)
-  * [(已过时)在定义后调用变量](#2-现在已经支持-不能在定义前使用变量)
-  * [默认不定长参数](#3-默认不定长参数)
+  * [默认不定长参数](#2-默认不定长参数)
 * [__堆栈追踪信息__](#trace-back-info)
   * [内置函数`die`](#1-内置函数die)
   * [栈溢出](#2-栈溢出信息)
@@ -77,34 +57,34 @@ __如果有好的意见或建议，欢迎联系我们!__
 ## __简介__
 
 __[Nasal](http://wiki.flightgear.org/Nasal_scripting_language)__
-是一个与ECMAscript标准语法设计相似的编程语言，并且作为运行脚本语言被著名的开源飞行模拟器 __[FlightGear](https://www.flightgear.org/)__ 所依赖。
-该语言的设计者和初版解释器实现者为 __[Andy Ross](https://github.com/andyross)__。
+是一个与ECMAscript标准语法设计相似的编程语言，并且作为运行脚本语言被著名的开源飞行模拟器 [FlightGear](https://www.flightgear.org/) 所依赖。
+该语言的设计者和初版解释器实现者为 [Andy Ross](https://github.com/andyross)。
 
-这个解释器项目则由 __[ValKmjolnir](https://github.com/ValKmjolnir)__ 完全使用 `C++`(`-std=c++11`)重新实现，没有复用 __[Andy Ross的nasal解释器](<https://github.com/andyross/nasal>)__ 中的任何一行代码。尽管没有任何的参考代码，我们依然非常感谢Andy为我们带来了这样一个神奇且容易上手的编程语言。
+这个解释器项目则由 [ValKmjolnir](https://github.com/ValKmjolnir) 完全使用 `C++`(`-std=c++11`)重新实现，没有复用 [Andy Ross的nasal解释器](<https://github.com/andyross/nasal>) 中的任何一行代码。尽管没有任何的参考代码，我们依然非常感谢Andy为我们带来了这样一个神奇且容易上手的编程语言。
 
 现在这个项目已经使用 __MIT 协议__ 开源 (2021/5/4)。根据该协议的内容，你们可以根据自己的需求进行修改，使用它来学习或者创造更多有趣的东西(不过可别忘了，如果要开源必须要附带本项目拥有者的相关信息)。
 
 __我们为什么想要重新写一个nasal解释器?__
-这是个很偶然的想法。2019年暑假，__[FGPRC](https://www.fgprc.org/)__ 的成员告诉我，在Flightgear中提供的nasal控制台窗口中进行调试实在是太费劲了，有时候只是想检查语法错误，也得花费时间打开这个软件等待加载进去之后进行调试。所以我就想，也许可以写一个全新的解释器来帮助他们检查语法错误，甚至是检查运行时的错误。
+这是个很偶然的想法。2019年暑假，[FGPRC](https://www.fgprc.org/) 的成员告诉我，在Flightgear中提供的nasal控制台窗口中进行调试实在是太费劲了，有时候只是想检查语法错误，也得花费时间打开这个软件等待加载进去之后进行调试。所以我就想，也许可以写一个全新的解释器来帮助他们检查语法错误，甚至是检查运行时的错误。
 
 我编写了nasal的词法分析器和语法分析器，以及一个全新的字节码虚拟机(曾经我们使用ast解释器来直接在抽象语法树中执行，然而在v4.0之后这个解释器已经淘汰)，并用这个运行时来进行nasal程序的调试。我们发现使用这个解释器来检测语法和运行时错误非常快捷，远比每次都需要复制nasal代码到Flightgear的nasal控制台中去查看要方便，且错误信息清晰直观。
 
 当然，你也可以使用这个语言来写一些与Flightgear运行环境无关的其他有趣的程序(它毕竟就是个脚本语言)，并用这个解释器来执行，让这个语言脱离Flightgear的环境，去别的地方大展身手。你也可以编写你自己的模块，让nasal来调用，使得这个语言成为你的项目中一个非常有用的工具。
 
-## __编译方式__
+## __编译__
 
 ![windows](https://img.shields.io/badge/Microsoft-Windows-green?style=flat-square&logo=windows)
 ![macOS](https://img.shields.io/badge/Apple%20Inc.-MacOS-green?style=flat-square&logo=apple)
 ![linux](https://img.shields.io/badge/GNU-Linux-green?style=flat-square&logo=GNU)
 
-我们推荐你下载最新更新的代码包来直接编译，这个项目非常小巧因此你可以非常快速地将它编译出来。
-
-__注意__: 如果你想直接下载发行版提供的zip/tar.gz压缩包来构建这个解释器，在下载之后请阅读下文中对应发行版本的[__发行日志__](#发行日志)以保证这个发行版的文件中不包含非常严重的bug(有的严重bug都是在发行之后才发现，非常搞心态)。在发行版日志中我们会告知如何在代码中手动修复这个严重的bug。
-
 ![g++](https://img.shields.io/badge/GNU-g++-A42E2B?style=flat-square&logo=GNU)
 ![clang++](https://img.shields.io/badge/LLVM-clang++-262D3A?style=flat-square&logo=LLVM)
 ![vs](https://img.shields.io/badge/Visual_Studio-MSVC-5C2D91?style=flat-square&logo=visualstudio)
 
+我们推荐你下载最新更新的代码包来直接编译，这个项目非常小巧因此你可以非常快速地将它编译出来。
+
+__注意__: 如果你想直接下载发行版提供的zip/tar.gz压缩包来构建这个解释器，在下载之后请阅读下文中对应发行版本的[__发行日志__](#发行日志)以保证这个发行版的文件中不包含非常严重的bug(有的严重bug都是在发行之后才发现，非常搞心态)。在发行版日志中我们会告知如何在代码中手动修复这个严重的bug。
+
 __`Windows`__ 用户通过g++(`MinGW-w64`)使用以下命令或者使用MSVC(`Visual Studio`)来进行编译. 没有编译环境的请在[__这里__](https://www.mingw-w64.org/downloads/)下载MinGW-w64。(VS同样也有MinGW-w64)
 __`linux/macOS/Unix`__ 用户可以使用g++或者clang++替代下面命令中中括号的部分来进行编译(我们建议您使用`clang`)。
 
@@ -759,675 +739,6 @@ in __`nasal_dbg.h:215`__: `auto canary=gc.stack+STACK_MAX_DEPTH-1;`
 
 同样这个也在`v9.0`中修复了。所以我们建议不要使用`v8.0`。
 
-## __语法分析__
-
-有特殊语法检查的`LL(1)`语法分析器。
-
-```javascript
-(var a,b,c)=[{b:nil},[1,2],func return 0;];
-(a.b,b[0],c)=(1,2,3);
-```
-
-这两个表达式有同一个first集，所以纯粹的`LL(1)`很难实现这个语言的语法分析。所以我们为其添加了特殊语法检查机制。本质上还是`LL(1)`的内核。
-
-上面这个问题已经解决很久了，不过我最近发现了一个新的语法问题:
-
-```javascript
-var f=func(x,y,z){return x+y+z}
-(a,b,c)=(0,1,2);
-```
-
-这种写法会被错误识别合并成下面这种:
-
-```javascript
-var f=func(x,y,z){return x+y+z}(a,b,c)
-=(0,1,2);
-```
-
-语法分析器会认为这是个严重的语法错误。我在Flightgear中也测试了这个代码，它内置的语法分析器也认为这是错误语法。当然我认为这是语法设计中的一个比较严重的缺漏。为了避免这个语法问题，只需要添加一个分号就可以了:
-
-```javascript
-var f=func(x,y,z){return x+y+z};
-                               ^ 就是这里
-(a,b,c)=(0,1,2);
-```
-
-### version 1.0 parser (last update 2019/10/14)
-
-第一版功能完备的nasal语法分析器完成了。
-
-在version 1.0之前，我多次尝试构建一个正确的语法分析器但是总存在一些问题。
-
-最终我学习了`LL(1)`和`LL(k)`文法并且在version 0.16(last update 2019/9/14)中完成了一个能识别数学算式的语法分析器。
-
-在version 0.17(2019/9/15) 0.18(2019/9/18) 0.19(2019/10/1)中我只是抱着玩的心态在测试语法分析器，不过在那之后我还是完成了version 1.0的语法分析器。
-
-__该项目于2019/7/25正式开始__。
-
-## __抽象语法树__
-
-### version 1.2 ast (last update 2019/10/31)
-
-抽象语法树在这个版本初步完成。
-
-### version 2.0 ast (last update 2020/8/31)
-
-在这个版本我们基于抽象语法树实现了一个树解释器，并且完成了部分内置函数。
-
-### version 3.0 ast (last update 2020/10/23)
-
-我们重构了抽象语法树的代码，现在可以更容易地读懂代码并进行维护。
-
-这个版本的树解释器用了新的优化方式，所以可以更高效地执行代码。
-
-在这个版本用户已经可以自行添加内置函数。
-
-我想在v4.0发布之后仍然保留这个树解释器，毕竟花了很长时间才写完这坨屎。
-
-### version 5.0 ast (last update 2021/3/7)
-
-我改变想法了，树解释器给维护带来了太大的麻烦。如果想继续保留这个解释器，那么为了兼容性，字节码虚拟机的优化工作会更难推进。
-
-## __字节码虚拟机__
-
-### version 4.0 vm (last update 2020/12/17)
-
-我在这个版本实现了第一版字节码虚拟机。不过这个虚拟机仍然在测试中，在这次测试结束之后，我会发布v4.0发行版。
-
-现在我在找一些隐藏很深的bug。如果有人想帮忙的话，非常欢迎！:)
-
-下面是生成的字节码的样例:
-
-```javascript
-for(var i=0;i<4000000;i+=1);
-```
-
-```x86asm
-.number 0
-.number 4e+006
-.number 1
-.symbol i
-0x00000000: pzero  0x00000000
-0x00000001: loadg  0x00000000 (i)
-0x00000002: callg  0x00000000 (i)
-0x00000003: pnum   0x00000001 (4e+006)
-0x00000004: less   0x00000000
-0x00000005: jf     0x0000000b
-0x00000006: pone   0x00000000
-0x00000007: mcallg 0x00000000 (i)
-0x00000008: addeq  0x00000000
-0x00000009: pop    0x00000000
-0x0000000a: jmp    0x00000002
-0x0000000b: nop    0x00000000
-```
-
-### version 5.0 vm (last update 2021/3/7)
-
-从这个版本起，我决定持续优化字节码虚拟机。
-
-毕竟现在这玩意从`0`数到`4000000-1`要花费1.5秒。这效率完全不能忍。
-
-2021/1/23 update: 现在它确实可以在1.5秒内从`0`数到`4000000-1`了。
-
-### version 6.0 vm (last update 2021/6/1)
-
-使用`loadg`/`loadl`/`callg`/`calll`/`mcallg`/`mcalll`指令来减少分支语句的调用。
-
-删除了`vm_scop`类型。
-
-添加作为常量的`vm_num`来减少内存分配的开销。
-
-将垃圾收集器从引用计数改为了标记清理。
-
-`vapp`和`newf`开始使用先前未被使用的.num段来压缩字节码生成数量，减少生成的`exec_code`的大小。
-
-2021/4/3 update: 从`0`数到`4e6-1`只需要不到0.8秒了。
-
-2021/4/19 update: 从`0`数到`4e6-1`只需要不到0.4秒了。
-
-在这次的更新中，我把全局变量和局部变量的存储结构从`unordered_map`变为了`vector`，从而提升执行效率。所以现在生成的字节码大变样了。
-
-```javascript
-for(var i=0;i<4000000;i+=1);
-```
-
-```x86asm
-.number 4e+006
-0x00000000: intg   0x00000001
-0x00000001: pzero  0x00000000
-0x00000002: loadg  0x00000000
-0x00000003: callg  0x00000000
-0x00000004: pnum   0x00000000 (4e+006)
-0x00000005: less   0x00000000
-0x00000006: jf     0x0000000c
-0x00000007: pone   0x00000000
-0x00000008: mcallg 0x00000000
-0x00000009: addeq  0x00000000
-0x0000000a: pop    0x00000000
-0x0000000b: jmp    0x00000003
-0x0000000c: nop    0x00000000
-```
-
-### version 6.5 vm (last update 2021/6/24)
-
-2021/5/31 update:
-
-现在垃圾收集器不会错误地重复收集未使用变量了。
-
-添加了`builtin_alloc`以防止在运行内置函数的时候错误触发标记清除。
-
-建议在获取大空间数组的时候尽量使用setsize，因为`append`在被频繁调用时可能会频繁触发垃圾收集器。
-
-2021/6/3 update:
-
-修复了垃圾收集器还是他妈的会重复收集的bug，这次我设计了三个标记状态来保证垃圾是被正确收集了。
-
-将`callf`指令拆分为`callfv`和`callfh`。并且`callfv`将直接从`val_stack`获取传参，而不是先通过一个`vm_vec`把参数收集起来再传入，后者是非常低效的做法。
-
-建议更多使用`callfv`而不是`callfh`，因为`callfh`只能从栈上获取参数并整合为`vm_hash`之后才能传给该指令进行处理，拖慢执行速度。
-
-```javascript
-var f=func(x,y){return x+y;}
-f(1024,2048);
-```
-
-```x86asm
-.number 1024
-.number 2048
-.symbol x   
-.symbol y
-0x00000000: intg   0x00000001
-0x00000001: newf   0x00000007
-0x00000002: intl   0x00000003
-0x00000003: offset 0x00000001
-0x00000004: para   0x00000000 (x)
-0x00000005: para   0x00000001 (y)
-0x00000006: jmp    0x0000000b
-0x00000007: calll  0x00000001
-0x00000008: calll  0x00000002
-0x00000009: add    0x00000000
-0x0000000a: ret    0x00000000
-0x0000000b: loadg  0x00000000
-0x0000000c: callg  0x00000000
-0x0000000d: pnum   0x00000000 (1024)
-0x0000000e: pnum   0x00000001 (2048)
-0x0000000f: callfv 0x00000002
-0x00000010: pop    0x00000000
-0x00000011: nop    0x00000000
-```
-
-2021/6/21 update:
-
-现在垃圾收集器不会收集空指针了。并且调用链中含有函数调用的赋值语句现在也可以执行了，下面这些赋值方式是合法的:
-
-```javascript
-var f=func()
-{
-    var _=[{_:0},{_:1}];
-    return func(x)
-    {
-        return _[x];
-    }
-}
-var m=f();
-m(0)._=m(1)._=10;
-
-[0,1,2][1:2][0]=0;
-```
-
-在老版本中，语法分析器会检查左值，并且在检测到有特别调用的情况下直接告知用户这种左值是不被接受的(bad lvalue)。但是现在它可以正常运作了。为了保证这种赋值语句能正常执行，codegen模块会优先使用`nasal_codegen::call_gen()`生成前面调用链的字节码而不是全部使用 `nasal_codegen::mcall_gen()`，在最后一个调用处才会使用`nasal_codegen::mcall_gen()`。
-
-所以现在生成的相关字节码也完全不同了:
-
-```x86asm
-.number 10
-.number 2
-.symbol _
-.symbol x
-0x00000000: intg   0x00000002
-0x00000001: newf   0x00000005
-0x00000002: intl   0x00000002
-0x00000003: offset 0x00000001
-0x00000004: jmp    0x00000017
-0x00000005: newh   0x00000000
-0x00000006: pzero  0x00000000
-0x00000007: happ   0x00000000 (_)
-0x00000008: newh   0x00000000
-0x00000009: pone   0x00000000
-0x0000000a: happ   0x00000000 (_)
-0x0000000b: newv   0x00000002
-0x0000000c: loadl  0x00000001
-0x0000000d: newf   0x00000012
-0x0000000e: intl   0x00000003
-0x0000000f: offset 0x00000002
-0x00000010: para   0x00000001 (x)
-0x00000011: jmp    0x00000016
-0x00000012: calll  0x00000001
-0x00000013: calll  0x00000002
-0x00000014: callv  0x00000000
-0x00000015: ret    0x00000000
-0x00000016: ret    0x00000000
-0x00000017: loadg  0x00000000
-0x00000018: callg  0x00000000
-0x00000019: callfv 0x00000000
-0x0000001a: loadg  0x00000001
-0x0000001b: pnum   0x00000000 (10.000000)
-0x0000001c: callg  0x00000001
-0x0000001d: pone   0x00000000
-0x0000001e: callfv 0x00000001
-0x0000001f: mcallh 0x00000000 (_)
-0x00000020: meq    0x00000000
-0x00000021: callg  0x00000001
-0x00000022: pzero  0x00000000
-0x00000023: callfv 0x00000001
-0x00000024: mcallh 0x00000000 (_)
-0x00000025: meq    0x00000000
-0x00000026: pop    0x00000000
-0x00000027: pzero  0x00000000
-0x00000028: pzero  0x00000000
-0x00000029: pone   0x00000000
-0x0000002a: pnum   0x00000001 (2.000000)
-0x0000002b: newv   0x00000003
-0x0000002c: slcbeg 0x00000000
-0x0000002d: pone   0x00000000
-0x0000002e: pnum   0x00000001 (2.000000)
-0x0000002f: slc2   0x00000000
-0x00000030: slcend 0x00000000
-0x00000031: pzero  0x00000000
-0x00000032: mcallv 0x00000000
-0x00000033: meq    0x00000000
-0x00000034: pop    0x00000000
-0x00000035: nop    0x00000000
-```
-
-从上面这些字节码可以看出，`mcall`/`mcallv`/`mcallh`指令的使用频率比以前减小了一些，而`call`/`callv`/`callh`/`callfv`/`callfh`则相反。
-
-并且因为新的数据结构，`mcall`指令以及`addr_stack`，一个曾用来存储指针的栈，从`nasal_vm`中被移除。现在`nasal_vm`使用`nasal_val** mem_addr`来暂存获取的内存地址。这不会导致严重的问题，因为内存空间是 __获取即使用__ 的。
-
-### version 7.0 vm (last update 2021/10/8)
-
-2021/6/26 update:
-
-指令分派方式从call-threading改为了computed-goto。在更改了指令分派方式之后，nasal_vm的执行效率有了非常巨大的提升。现在虚拟机可以在0.2秒内执行完test/bigloop和test/pi！并且在linux平台虚拟机可以在0.8秒内执行完test/fib。你可以在下面的测试数据部分看到测试的结果。
-
-这个分派方式使用了g++扩展"labels as values"，clang++目前也支持这种指令分派的实现方式。(不过MSVC支不支持就不得而知了，哈哈)
-
-nasal_gc中也有部分改动:
-全局变量不再用`std::vector`存储，而是全部存在操作数栈上(从`val_stack+0`到`val_stack+intg-1`)。
-
-2021/6/29 update:
-
-添加了一些直接用常量进行运算的指令:
-`op_addc`,`op_subc`,`op_mulc`,`op_divc`,`op_lnkc`,`op_addeqc`,`op_subeqc`,`op_muleqc`,`op_diveqc`,`op_lnkeqc`。
-
-现在test/bigloop.nas的字节码是这样的:
-
-```x86asm
-.number 4e+006
-.number 1
-0x00000000: intg   0x00000001
-0x00000001: pzero  0x00000000
-0x00000002: loadg  0x00000000
-0x00000003: callg  0x00000000
-0x00000004: pnum   0x00000000 (4000000)
-0x00000005: less   0x00000000
-0x00000006: jf     0x0000000b
-0x00000007: mcallg 0x00000000
-0x00000008: addeqc 0x00000001 (1)
-0x00000009: pop    0x00000000
-0x0000000a: jmp    0x00000003
-0x0000000b: nop    0x00000000
-```
-
-在这次更新之后，这个测试文件可以在0.1秒内运行结束。大多数的运算操作速度都有提升。
-
-并且赋值相关的字节码也有一些改动。现在赋值语句只包含一个标识符时，会优先调用`op_load`来赋值，而不是使用`op_meq`和`op_pop`。
-
-```javascript
-var (a,b)=(1,2);
-a=b=0;
-```
-
-```x86asm
-.number 2
-0x00000000: intg   0x00000002
-0x00000001: pone   0x00000000
-0x00000002: loadg  0x00000000
-0x00000003: pnum   0x00000000 (2)
-0x00000004: loadg  0x00000001
-0x00000005: pzero  0x00000000
-0x00000006: mcallg 0x00000001
-0x00000007: meq    0x00000000 (b=2 use meq,pop->a)
-0x00000008: loadg  0x00000000 (a=b use loadg)
-0x00000009: nop    0x00000000
-```
-
-### version 8.0 vm (last update 2022/2/12)
-
-2021/10/8 update:
-
-从这个版本开始`vm_nil`和`vm_num`不再由`nasal_gc`管理，这会大幅度降低`gc::alloc`的调用并且会大幅度提升执行效率。
-
-添加了新的数据类型: `vm_obj`。这个类型是留给用户定义他们想要的数据类型的。相关的API会在未来加入。
-
-功能完备的闭包：添加了读写闭包数据的指令。删除了老的指令`op_offset`。
-
-2021/10/13 update:
-
-字节码信息输出格式修改为如下形式:
-
-```x86asm
-0x000002f2: newf   0x2f6
-0x000002f3: intl   0x2
-0x000002f4: para   0x3e ("x")
-0x000002f5: jmp    0x309
-0x000002f6: calll  0x1
-0x000002f7: lessc  0x0 (2)
-0x000002f8: jf     0x2fb
-0x000002f9: calll  0x1
-0x000002fa: ret
-0x000002fb: upval  0x0[0x1]
-0x000002fc: upval  0x0[0x1]
-0x000002fd: callfv 0x1
-0x000002fe: calll  0x1
-0x000002ff: subc   0x1d (1)
-0x00000300: callfv 0x1
-0x00000301: upval  0x0[0x1]
-0x00000302: upval  0x0[0x1]
-0x00000303: callfv 0x1
-0x00000304: calll  0x1
-0x00000305: subc   0x0 (2)
-0x00000306: callfv 0x1
-0x00000307: add
-0x00000308: ret
-0x00000309: ret
-0x0000030a: callfv 0x1
-0x0000030b: loadg  0x32
-```
-
-2022/1/22 update:
-
-删除`op_pone`和`op_pzero`。这两个指令在目前已经没有实际意义，并且已经被`op_pnum`替代。
-
-### version 9.0 vm (last update 2022/5/18)
-
-2022/2/12 update:
-
-局部变量现在也被 __存储在栈上__。
-所以函数调用比以前也会快速很多。
-在v8.0如果你想调用一个函数，
-新的`vm_vec`将被分配出来用于模拟局部作用域，这个操作会导致标记清除过程会被频繁触发并且浪费太多的执行时间。
-在测试文件`test/bf.nas`中，这种调用方式使得大部分时间都被浪费了，因为这个测试文件包含大量且频繁的函数调用(详细数据请看测试数据一节中`version 8.0 (R9-5900HX ubuntu-WSL 2022/1/23)`)。
-
-现在闭包会在第一次在局部作用域创建新函数的时候产生，使用`vm_vec`。
-在那之后如果再创建新的函数，则他们会共享同一个闭包，这些闭包会在每次于局部作用域创建新函数时同步。
-
-2022/3/27 update:
-
-在这个月的更新中我们把闭包的数据结构从`vm_vec`换成了一个新的对象`vm_upval`，这种类型有着和另外一款编程语言 __`Lua`__ 中闭包相类似的结构。
-
-同时我们也修改了字节码的输出格式。新的格式看起来像是 `objdump`:
-
-```x86asm
-  0x0000029b:       0a 00 00 00 00        newh
-
-func <0x29c>:
-  0x0000029c:       0b 00 00 02 a0        newf    0x2a0
-  0x0000029d:       02 00 00 00 02        intl    0x2
-  0x0000029e:       0d 00 00 00 66        para    0x66 ("libname")
-  0x0000029f:       32 00 00 02 a2        jmp     0x2a2
-  0x000002a0:       40 00 00 00 42        callb   0x42 <__dlopen@0x41dc40>
-  0x000002a1:       4a 00 00 00 00        ret
-<0x29c>;
-
-  0x000002a2:       0c 00 00 00 67        happ    0x67 ("dlopen")
-
-func <0x2a3>:
-  0x000002a3:       0b 00 00 02 a8        newf    0x2a8
-  0x000002a4:       02 00 00 00 03        intl    0x3
-  0x000002a5:       0d 00 00 00 68        para    0x68 ("lib")
-  0x000002a6:       0d 00 00 00 69        para    0x69 ("sym")
-  0x000002a7:       32 00 00 02 aa        jmp     0x2aa
-  0x000002a8:       40 00 00 00 43        callb   0x43 <__dlsym@0x41df00>
-  0x000002a9:       4a 00 00 00 00        ret
-<0x2a3>;
-
-  0x000002aa:       0c 00 00 00 6a        happ    0x6a ("dlsym")
-```
-
-### version 10.0 vm (latest)
-
-2022/5/19 update:
-
-在这个版本中我们给nasal加入了协程:
-
-```javascript
-var coroutine={
-    create: func(function){return __cocreate;},
-    resume: func(co)      {return __coresume;},
-    yield:  func(args...) {return __coyield; },
-    status: func(co)      {return __costatus;},
-    running:func()        {return __corun;   }
-};
-```
-
-`coroutine.create`用于创建新的协程对象。不过创建之后协程并不会直接运行。
-
-`coroutine.resume`用于继续运行一个协程。
-
-`coroutine.yield`用于中断一个协程的运行过程并且抛出一些数据。这些数据会被`coroutine.resume`接收并返回。而在协程函数中`coroutine.yield`本身只返回`vm_nil`。
-
-`coroutine.status`用于查看协程的状态。协程有三种不同的状态：`suspended`挂起，`running`运行中，`dead`结束运行。
-
-`coroutine.running`用于判断当前是否有协程正在运行。
-
-__注意:__ 协程不能在其他正在运行的协程中创建。
-
-__接下来我们解释这个协程的运行原理:__
-
-当`op_callb`被执行时，栈帧如下所示:
-
-```C++
-+--------------------------+(主操作数栈)
-| old pc(vm_ret)           | <- top[0]
-+--------------------------+
-| old localr(vm_addr)      | <- top[-1]
-+--------------------------+
-| old upvalr(vm_upval)     | <- top[-2]
-+--------------------------+
-| local scope(nas_ref)     |
-| ...                      |
-+--------------------------+ <- local pointer stored in localr
-| old funcr(vm_func)       | <- old function stored in funcr
-+--------------------------+
-```
-
-在`op_callb`执行过程中，下一步的栈帧如下:
-
-```C++
-+--------------------------+(主操作数栈)
-| nil(vm_nil)              | <- push nil
-+--------------------------+
-| old pc(vm_ret)           |
-+--------------------------+
-| old localr(vm_addr)      |
-+--------------------------+
-| old upvalr(vm_upval)     |
-+--------------------------+
-| local scope(nas_ref)     |
-| ...                      |
-+--------------------------+ <- local pointer stored in localr
-| old funcr(vm_func)       | <- old function stored in funcr
-+--------------------------+
-```
-
-接着我们调用`resume`，这个函数会替换操作数栈。我们会看到，协程的操作数栈上已经保存了一些数据，但是我们首次进入协程执行时，这个操作数栈的栈顶将会是`vm_ret`，并且返回的`pc`值是`0`。
-
-为了保证栈顶的数据不会被破坏，`resume`会返回`gc.top[0]`。`op_callb`将会执行`top[0]=resume()`，所以栈顶的数据虽然被覆盖了一次，但是实际上还是原来的数据。
-
-```C++
-+--------------------------+(协程操作数栈)
-| pc:0(vm_ret)             | <- now gc.top[0]
-+--------------------------+
-```
-
-当我们调用`yield`的时候，该函数会执行出这个情况，我们发现`op_callb` 已经把`nil`放在的栈顶。但是应该返回的`local[1]`到底发送到哪里去了？
-
-```C++
-+--------------------------+(协程操作数栈)
-| nil(vm_nil)              | <- push nil
-+--------------------------+
-| old pc(vm_ret)           |
-+--------------------------+
-| old localr(vm_addr)      |
-+--------------------------+
-| old upvalr(vm_upval)     |
-+--------------------------+
-| local scope(nas_ref)     |
-| ...                      |
-+--------------------------+ <- local pointer stored in localr
-| old funcr(vm_func)       | <- old function stored in funcr
-+--------------------------+
-```
-
-当`builtin_coyield`执行完毕之后，栈又切换到了主操作数栈上，这时可以看到返回的`local[1]`实际上被`op_callb`放在了这里的栈顶:
-
-```C++
-+--------------------------+(主操作数栈)
-| return_value(nas_ref)    |
-+--------------------------+
-| old pc(vm_ret)           |
-+--------------------------+
-| old localr(vm_addr)      |
-+--------------------------+
-| old upvalr(vm_upval)     |
-+--------------------------+
-| local scope(nas_ref)     |
-| ...                      |
-+--------------------------+ <- local pointer stored in localr
-| old funcr(vm_func)       | <- old function stored in funcr
-+--------------------------+
-```
-
-所以主程序会认为顶部这个返回值好像是`resume`返回的。而实际上`resume`的返回值在协程的操作数栈顶。综上所述:
-
-```C++
-resume (main->coroutine) return coroutine.top[0]. coroutine.top[0] = coroutine.top[0];
-yield  (coroutine->main) return a vector.         main.top[0]      = vector;
-```
-
-## 测试数据
-
-![benchmark](../pic/benchmark.png)
-
-### version 6.5 (i5-8250U windows10 2021/6/19)
-
-执行时间以及垃圾收集器占用时间:
-
-|file|call gc|total time|gc time|
-|:----|:----|:----|:----|
-|pi.nas|12000049|0.593s|0.222s|
-|fib.nas|10573747|2.838s|0.187s|
-|bp.nas|4419829|1.99s|0.18s|
-|bigloop.nas|4000000|0.419s|0.039s|
-|mandelbrot.nas|1044630|0.433s|0.041s|
-|life.nas|817112|8.557s|0.199s|
-|ascii-art.nas|45612|0.48s|0.027s|
-|calc.nas|8089|0.068s|0.006s|
-|quick_sort.nas|2768|0.107s|0s|
-|bfs.nas|2471|1.763s|0.003s|
-
-指令调用频率:
-
-|file|1st|2nd|3rd|4th|5th|
-|:----|:----|:----|:----|:----|:----|
-|pi.nas|callg|pop|mcallg|pnum|pone|
-|fib.nas|calll|pnum|callg|less|jf|
-|bp.nas|calll|callg|pop|callv|addeq|
-|bigloop.nas|pnum|less|jf|callg|pone|
-|mandelbrot.nas|callg|mult|loadg|pnum|pop|
-|life.nas|calll|callv|pnum|jf|callg|
-|ascii-art.nas|calll|pop|mcalll|callg|callb|
-|calc.nas|calll|pop|pstr|mcalll|jmp|
-|quick_sort.nas|calll|pop|jt|jf|less|
-|bfs.nas|calll|pop|callv|mcalll|jf|
-
-指令总调用数:
-
-|file|1st|2nd|3rd|4th|5th|
-|:----|:----|:----|:----|:----|:----|
-|pi.nas|6000004|6000003|6000000|4000005|4000002|
-|fib.nas|17622792|10573704|7049218|7049155|7049155|
-|bp.nas|7081480|4227268|2764676|2617112|2065441|
-|bigloop.nas|4000001|4000001|4000001|4000001|4000000|
-|mandelbrot.nas|1519632|563856|290641|286795|284844|
-|life.nas|2114371|974244|536413|534794|489743|
-|ascii-art.nas|37906|22736|22402|18315|18292|
-|calc.nas|191|124|109|99|87|
-|quick_sort.nas|16226|5561|4144|3524|2833|
-|bfs.nas|24707|16297|14606|14269|8672|
-
-### version 7.0 (i5-8250U ubuntu-WSL on windows10 2021/6/29)
-
-执行时间:
-
-|file|total time|info|
-|:----|:----|:----|
-|pi.nas|0.15625s|great improvement|
-|fib.nas|0.75s|great improvement|
-|bp.nas|0.4218s(7162 epoch)|good improvement|
-|bigloop.nas|0.09375s|great improvement|
-|mandelbrot.nas|0.0312s|great improvement|
-|life.nas|8.80s(windows) 1.25(ubuntu WSL)|little improvement|
-|ascii-art.nas|0.015s|little improvement|
-|calc.nas|0.0468s|little improvement|
-|quick_sort.nas|0s|great improvement|
-|bfs.nas|0.0156s|great improvement|
-
-### version 8.0 (R9-5900HX ubuntu-WSL 2022/1/23)
-
-执行时间:
-
-|file|total time|info|
-|:----|:----|:----|
-|bf.nas|1100.19s||
-|mandel.nas|28.98s||
-|life.nas|0.56s|0.857s(windows)|
-|ycombinator.nas|0.64s||
-|fib.nas|0.28s||
-|bfs.nas|0.156s|random result|
-|pi.nas|0.0625s||
-|bigloop.nas|0.047s||
-|calc.nas|0.03125s|changed test file|
-|mandelbrot.nas|0.0156s||
-|ascii-art.nas|0s||
-|quick_sort.nas|0s||
-
-### version 9.0 (R9-5900HX ubuntu-WSL 2022/2/13)
-
-执行时间:
-
-|file|total time|info|
-|:----|:----|:----|
-|bf.nas|276.55s|great improvement|
-|mandel.nas|28.16s||
-|ycombinator.nas|0.59s||
-|life.nas|0.2s|0.649s(windows)|
-|fib.nas|0.234s|little improvement|
-|bfs.nas|0.14s|random result|
-|pi.nas|0.0625s||
-|bigloop.nas|0.047s||
-|calc.nas|0.0469s|changed test file|
-|quick_sort.nas|0.016s|changed test file:100->1e4|
-|mandelbrot.nas|0.0156s||
-|ascii-art.nas|0s||
-
-`bf.nas`是个非常有意思的测试文件，我们用nasal在这个文件里实现了一个brainfuck解释器，并且用这个解释器绘制了一个曼德勃罗集合。
-
-在2022/2/17更新中我们给词法分析器添加了对`\e`的识别逻辑。这样 `bfcolored.nas`可以使用特别的ASCII操作字符来绘制彩色的曼德勃罗集合:
-
-![mandelbrot](../pic/mandelbrot.png)
-
 ## __与andy解释器的不同之处__
 
 ### 1. 必须用`var`定义变量
@@ -1453,30 +764,7 @@ foreach(i;[0,1,2,3])
     print(i)
 ```
 
-### 2. (现在已经支持) 不能在定义前使用变量
-
-(__过时信息__: 现在已经支持)
-
-```javascript
-var a=func {print(b);}
-var b=1;
-a();
-```
-
-这个程序在andy的解释器中可以正常运行并输出内容。然而在这个新的解释器中，你会得到:
-
-```javascript
-[code] test.nas:1 undefined symbol "b".
-var a=func {print(b);}
-```
-
-这个差异主要是文法作用域分析带来的。在大多数的脚本语言解释器中，他们使用动态的分析方式来检测符号是不是已经定义过了。然而，这种分析方法的代价就是执行效率不会很高。为了保证这个解释器能以极高的速度运行，我使用的是静态的分析方式，用静态语言类似的管理方式来管理每个符号对应的内存空间。这样虚拟机就不需要在运行的时候频繁检查符号是否存在。但是这也带来了差异。在这里你只会得到`undefined symbol`，而不是大多数脚本语言解释器中那样可以正常执行。
-
-这个差异在FGPRC成员中有 __争议__。所以在未来我可能还是会用动态的分析方法来迎合大多数的用户。
-
-(2021/8/3 update) __现在我使用二次搜索抽象语法树的方式来检测符号是否会被定义，所以在这次更新之后，这个差异不复存在。__ 不过如果你直接获取一个还未被定义的变量的内容的话，你会得到一个空数据，而不是`undefined error`。
-
-### 3. 默认不定长参数
+### 2. 默认不定长参数
 
 这个解释器在运行时，函数不会将超出参数表的那部分不定长参数放到默认的`arg`中。所以你如果不定义`arg`就使用它，那你只会得到`undefined symbol`。
 
diff --git a/doc/benchmark.md b/doc/benchmark.md
new file mode 100644
index 0000000..59580e4
--- /dev/null
+++ b/doc/benchmark.md
@@ -0,0 +1,112 @@
+# __Benchmark__
+
+![benchmark](../pic/benchmark.png)
+
+## version 6.5 (i5-8250U windows10 2021/6/19)
+
+running time and gc time:
+
+|file|call gc|total time|gc time|
+|:----|:----|:----|:----|
+|pi.nas|12000049|0.593s|0.222s|
+|fib.nas|10573747|2.838s|0.187s|
+|bp.nas|4419829|1.99s|0.18s|
+|bigloop.nas|4000000|0.419s|0.039s|
+|mandelbrot.nas|1044630|0.433s|0.041s|
+|life.nas|817112|8.557s|0.199s|
+|ascii-art.nas|45612|0.48s|0.027s|
+|calc.nas|8089|0.068s|0.006s|
+|quick_sort.nas|2768|0.107s|0s|
+|bfs.nas|2471|1.763s|0.003s|
+
+most frequently executed instructions:
+
+|file|1st|2nd|3rd|4th|5th|
+|:----|:----|:----|:----|:----|:----|
+|pi.nas|callg|pop|mcallg|pnum|pone|
+|fib.nas|calll|pnum|callg|less|jf|
+|bp.nas|calll|callg|pop|callv|addeq|
+|bigloop.nas|pnum|less|jf|callg|pone|
+|mandelbrot.nas|callg|mult|loadg|pnum|pop|
+|life.nas|calll|callv|pnum|jf|callg|
+|ascii-art.nas|calll|pop|mcalll|callg|callb|
+|calc.nas|calll|pop|pstr|mcalll|jmp|
+|quick_sort.nas|calll|pop|jt|jf|less|
+|bfs.nas|calll|pop|callv|mcalll|jf|
+
+total execution counts of these instructions:
+
+|file|1st|2nd|3rd|4th|5th|
+|:----|:----|:----|:----|:----|:----|
+|pi.nas|6000004|6000003|6000000|4000005|4000002|
+|fib.nas|17622792|10573704|7049218|7049155|7049155|
+|bp.nas|7081480|4227268|2764676|2617112|2065441|
+|bigloop.nas|4000001|4000001|4000001|4000001|4000000|
+|mandelbrot.nas|1519632|563856|290641|286795|284844|
+|life.nas|2114371|974244|536413|534794|489743|
+|ascii-art.nas|37906|22736|22402|18315|18292|
+|calc.nas|191|124|109|99|87|
+|quick_sort.nas|16226|5561|4144|3524|2833|
+|bfs.nas|24707|16297|14606|14269|8672|
+
+## version 7.0 (i5-8250U ubuntu-WSL on windows10 2021/6/29)
+
+running time:
+
+|file|total time|info|
+|:----|:----|:----|
+|pi.nas|0.15625s|great improvement|
+|fib.nas|0.75s|great improvement|
+|bp.nas|0.4218s(7162 epoch)|good improvement|
+|bigloop.nas|0.09375s|great improvement|
+|mandelbrot.nas|0.0312s|great improvement|
+|life.nas|8.80s(windows) 1.25s(ubuntu WSL)|little improvement|
+|ascii-art.nas|0.015s|little improvement|
+|calc.nas|0.0468s|little improvement|
+|quick_sort.nas|0s|great improvement|
+|bfs.nas|0.0156s|great improvement|
+
+## version 8.0 (R9-5900HX ubuntu-WSL 2022/1/23)
+
+running time:
+
+|file|total time|info|
+|:----|:----|:----|
+|bf.nas|1100.19s||
+|mandel.nas|28.98s||
+|life.nas|0.56s|0.857s(windows)|
+|ycombinator.nas|0.64s||
+|fib.nas|0.28s||
+|bfs.nas|0.156s|random result|
+|pi.nas|0.0625s||
+|bigloop.nas|0.047s||
+|calc.nas|0.03125s|changed test file|
+|mandelbrot.nas|0.0156s||
+|ascii-art.nas|0s||
+|quick_sort.nas|0s||
+
+## version 9.0 (R9-5900HX ubuntu-WSL 2022/2/13)
+
+running time:
+
+|file|total time|info|
+|:----|:----|:----|
+|bf.nas|276.55s|great improvement|
+|mandel.nas|28.16s||
+|ycombinator.nas|0.59s||
+|life.nas|0.2s|0.649s(windows)|
+|fib.nas|0.234s|little improvement|
+|bfs.nas|0.14s|random result|
+|pi.nas|0.0625s||
+|bigloop.nas|0.047s||
+|calc.nas|0.0469s|changed test file|
+|quick_sort.nas|0.016s|changed test file:100->1e4|
+|mandelbrot.nas|0.0156s||
+|ascii-art.nas|0s||
+
+`bf.nas` is a very interesting test file: it contains a brainfuck interpreter written in nasal,
+and we use this bf interpreter to draw a Mandelbrot set.
+
+In the 2022/2/17 update we added recognition of `\e` to the lexer, so `bfcolored.nas` can use this special ASCII escape character to draw a colored Mandelbrot set. Here is the result:
+
+![mandelbrot](../pic/mandelbrot.png)
diff --git a/doc/dev.md b/doc/dev.md
new file mode 100644
index 0000000..ab0f998
--- /dev/null
+++ b/doc/dev.md
@@ -0,0 +1,644 @@
+# __Development History__
+
+## __Contents__
+
+* [__Parser__](#parser)
+  * [v1.0](#version-10-parser-last-update-20191014)
+* [__Abstract Syntax Tree__](#abstract-syntax-tree)
+  * [v1.2](#version-12-ast-last-update-20191031)
+  * [v2.0](#version-20-ast-last-update-2020831)
+  * [v3.0](#version-30-ast-last-update-20201023)
+  * [v5.0](#version-50-ast-last-update-202137)
+* [__Bytecode VM__](#bytecode-virtual-machine)
+  * [v4.0](#version-40-vm-last-update-20201217)
+  * [v5.0](#version-50-vm-last-update-202137)
+  * [v6.0](#version-60-vm-last-update-202161)
+  * [v6.5](#version-65-vm-last-update-2021624)
+  * [v7.0](#version-70-vm-last-update-2021108)
+  * [v8.0](#version-80-vm-last-update-2022212)
+  * [v9.0](#version-90-vm-last-update-2022518)
+  * [v10.0](#version-100-vm-latest)
+
+## __Parser__
+
+An `LL(1)` parser with special checks.
+
+```javascript
+(var a,b,c)=[{b:nil},[1,2],func return 0;];
+(a.b,b[0],c)=(1,2,3);
+```
+
+These two expressions have the same first set, so a pure `LL(1)` parser cannot handle this language. We therefore added some special checks to it; the core is still `LL(1)`.
+
+The problem mentioned above was solved long ago, but recently I found a new one:
+
+```javascript
+var f=func(x,y,z){return x+y+z}
+(a,b,c)=(0,1,2);
+```
+
+This is parsed as if it were written like this:
+
+```javascript
+var f=func(x,y,z){return x+y+z}(a,b,c)
+=(0,1,2);
+```
+
+which causes a fatal syntax error.
+I also tried this program in the FlightGear nasal console,
+and it reports a syntax error as well.
+I think this is a serious flaw in the language design.
+To avoid this syntax error, just add a semicolon:
+
+```javascript
+var f=func(x,y,z){return x+y+z};
+                               ^ here
+(a,b,c)=(0,1,2);
+```
+
+### version 1.0 parser (last update 2019/10/14)
+
+The first fully functional version of nasal_parser.
+
+Before version 1.0, I tried many times to write a correct parser, but there were always problems.
+
+Finally I learned `LL(1)` and `LL(k)` grammars and wrote a parser for math formulas in version 0.16 (last update 2019/9/14).
+
+In versions 0.17 (2019/9/15), 0.18 (2019/9/18) and 0.19 (2019/10/1) I was just playing with the parser for fun, and after that I wrote version 1.0.
+
+__This project began on 2019/7/25__.
+
+## __Abstract Syntax Tree__
+
+### version 1.2 ast (last update 2019/10/31)
+
+The ast has been completed in this version.
+
+### version 2.0 ast (last update 2020/8/31)
+
+A complete AST interpreter, with only part of the built-in library functions implemented.
+
+### version 3.0 ast (last update 2020/10/23)
+
+The ast is refactored and is now easier to read and maintain.
+
+The AST interpreter uses new optimizations, so it can run code more efficiently.
+
+Now you can add your own functions as built-in functions in this interpreter!
+
+I decided to keep the AST interpreter after releasing v4.0, because it took me a long time to design and write...
+
+### version 5.0 ast (last update 2021/3/7)
+
+I changed my mind.
+The AST interpreter is too much of a maintenance burden.
+
+If I keep this interpreter,
+it will be harder for me to make the bytecode VM more efficient.
+
+## __Bytecode Virtual Machine__
+
+### version 4.0 vm (last update 2020/12/17)
+
+I have just finished the first version of the bytecode interpreter.
+
+This interpreter is still being tested.
+After this test, I will release version 4.0!
+
+Now I am hunting for hidden bugs in this interpreter.
+I hope you can help me! :)
+
+Here is an example of the generated bytecode:
+
+```javascript
+for(var i=0;i<4000000;i+=1);
+```
+
+```x86asm
+.number 0
+.number 4e+006
+.number 1
+.symbol i
+0x00000000: pzero  0x00000000
+0x00000001: loadg  0x00000000 (i)
+0x00000002: callg  0x00000000 (i)
+0x00000003: pnum   0x00000001 (4e+006)
+0x00000004: less   0x00000000
+0x00000005: jf     0x0000000b
+0x00000006: pone   0x00000000
+0x00000007: mcallg 0x00000000 (i)
+0x00000008: addeq  0x00000000
+0x00000009: pop    0x00000000
+0x0000000a: jmp    0x00000002
+0x0000000b: nop    0x00000000
+```
+
+### version 5.0 vm (last update 2021/3/7)
+
+I decided to keep optimizing the bytecode VM from this version on.
+
+It takes more than 1.5s to count `i` from `0` to `4000000-1`. This is not efficient at all!
+
+2021/1/23 update: Now it can count from `0` to `4000000-1` within 1.5s.
+
+### version 6.0 vm (last update 2021/6/1)
+
+Use `loadg`/`loadl`/`callg`/`calll`/`mcallg`/`mcalll` to reduce branching.
+
+Delete type `vm_scop`.
+
+Use constant `vm_num` values to avoid frequent `new` and `delete`.
+
+Change garbage collector from reference-counting to mark-sweep.
+
+The `vapp` and `newf` instructions use the previously unused `.num` field to reduce the size of `exec_code`.
+
+2021/4/3 update: Now it can count from `0` to `4e6-1` in less than 0.8s.
+
+2021/4/19 update: Now it can count from `0` to `4e6-1` in less than 0.4s.
+
+In this update I changed the storage of the global and local scopes from `unordered_map` to `vector` to improve performance.
+
+So the generated bytecode changed a lot.
+
+```javascript
+for(var i=0;i<4000000;i+=1);
+```
+
+```x86asm
+.number 4e+006
+0x00000000: intg   0x00000001
+0x00000001: pzero  0x00000000
+0x00000002: loadg  0x00000000
+0x00000003: callg  0x00000000
+0x00000004: pnum   0x00000000 (4e+006)
+0x00000005: less   0x00000000
+0x00000006: jf     0x0000000c
+0x00000007: pone   0x00000000
+0x00000008: mcallg 0x00000000
+0x00000009: addeq  0x00000000
+0x0000000a: pop    0x00000000
+0x0000000b: jmp    0x00000003
+0x0000000c: nop    0x00000000
+```
+
+### version 6.5 vm (last update 2021/6/24)
+
+2021/5/31 update:
+
+Now the gc collects garbage correctly without collecting the same object twice,
+which used to cause a fatal error.
+
+Added `builtin_alloc` to avoid triggering mark-sweep while a built-in function is running,
+which would mark objects still in use as garbage to be collected.
+
+Prefer `setsize` plus assignment when building a big array;
+frequent calls to `append` may trigger the garbage collector repeatedly, which is slow.
+
+2021/6/3 update:
+
+Fixed a bug where the gc still re-collected garbage;
+this time I use three mark states to make sure garbage is collected correctly.
+
+Split `callf` into `callfv` and `callfh`.
+`callfv` fetches arguments from `val_stack` directly instead of first collecting them into a `vm_vec`,
+which is not very efficient.
+
+Prefer `callfv` over `callfh`:
+`callfh` has to fetch a `vm_hash` from the stack and parse it,
+which makes the call slower.
+
+```javascript
+var f=func(x,y){return x+y;}
+f(1024,2048);
+```
+
+```x86asm
+.number 1024
+.number 2048
+.symbol x   
+.symbol y
+0x00000000: intg   0x00000001
+0x00000001: newf   0x00000007
+0x00000002: intl   0x00000003
+0x00000003: offset 0x00000001
+0x00000004: para   0x00000000 (x)
+0x00000005: para   0x00000001 (y)
+0x00000006: jmp    0x0000000b
+0x00000007: calll  0x00000001
+0x00000008: calll  0x00000002
+0x00000009: add    0x00000000
+0x0000000a: ret    0x00000000
+0x0000000b: loadg  0x00000000
+0x0000000c: callg  0x00000000
+0x0000000d: pnum   0x00000000 (1024)
+0x0000000e: pnum   0x00000001 (2048)
+0x0000000f: callfv 0x00000002
+0x00000010: pop    0x00000000
+0x00000011: nop    0x00000000
+```
+
+2021/6/21 update: Now the gc will not collect nullptr.
+Assignment through a call chain that contains function calls is now supported,
+so these kinds of assignments are allowed:
+
+```javascript
+var f=func()
+{
+    var _=[{_:0},{_:1}];
+    return func(x)
+    {
+        return _[x];
+    }
+}
+var m=f();
+m(0)._=m(1)._=10;
+
+[0,1,2][1:2][0]=0;
+```
+
+In older versions,
+the parser checked the left value and rejected these forms as not allowed (bad lvalue).
+
+Now they work,
+as the code above shows.
+To make sure this assignment works correctly,
+codegen generates the bytecode of the call chain with `nasal_codegen::call_gen()` instead of `nasal_codegen::mcall_gen()`,
+and only the last child of the ast is generated by `nasal_codegen::mcall_gen()`.
+So the bytecode is totally different now:
+
+```x86asm
+.number 10
+.number 2
+.symbol _
+.symbol x
+0x00000000: intg   0x00000002
+0x00000001: newf   0x00000005
+0x00000002: intl   0x00000002
+0x00000003: offset 0x00000001
+0x00000004: jmp    0x00000017
+0x00000005: newh   0x00000000
+0x00000006: pzero  0x00000000
+0x00000007: happ   0x00000000 (_)
+0x00000008: newh   0x00000000
+0x00000009: pone   0x00000000
+0x0000000a: happ   0x00000000 (_)
+0x0000000b: newv   0x00000002
+0x0000000c: loadl  0x00000001
+0x0000000d: newf   0x00000012
+0x0000000e: intl   0x00000003
+0x0000000f: offset 0x00000002
+0x00000010: para   0x00000001 (x)
+0x00000011: jmp    0x00000016
+0x00000012: calll  0x00000001
+0x00000013: calll  0x00000002
+0x00000014: callv  0x00000000
+0x00000015: ret    0x00000000
+0x00000016: ret    0x00000000
+0x00000017: loadg  0x00000000
+0x00000018: callg  0x00000000
+0x00000019: callfv 0x00000000
+0x0000001a: loadg  0x00000001
+0x0000001b: pnum   0x00000000 (10.000000)
+0x0000001c: callg  0x00000001
+0x0000001d: pone   0x00000000
+0x0000001e: callfv 0x00000001
+0x0000001f: mcallh 0x00000000 (_)
+0x00000020: meq    0x00000000
+0x00000021: callg  0x00000001
+0x00000022: pzero  0x00000000
+0x00000023: callfv 0x00000001
+0x00000024: mcallh 0x00000000 (_)
+0x00000025: meq    0x00000000
+0x00000026: pop    0x00000000
+0x00000027: pzero  0x00000000
+0x00000028: pzero  0x00000000
+0x00000029: pone   0x00000000
+0x0000002a: pnum   0x00000001 (2.000000)
+0x0000002b: newv   0x00000003
+0x0000002c: slcbeg 0x00000000
+0x0000002d: pone   0x00000000
+0x0000002e: pnum   0x00000001 (2.000000)
+0x0000002f: slc2   0x00000000
+0x00000030: slcend 0x00000000
+0x00000031: pzero  0x00000000
+0x00000032: mcallv 0x00000000
+0x00000033: meq    0x00000000
+0x00000034: pop    0x00000000
+0x00000035: nop    0x00000000
+```
+
+As the bytecode above shows,
+the `mcall`/`mcallv`/`mcallh` instructions are now used less often,
+and `call`/`callv`/`callh`/`callfv`/`callfh` more often.
+
+Because of the new structure of `mcall`,
+`addr_stack`, a stack used to store memory addresses,
+has been removed from `nasal_vm`,
+and `nasal_vm` now uses `nasal_val** mem_addr` to store the memory address.
+This does not cause fatal errors because the memory address is used __immediately__ after it is obtained.
+
+### version 7.0 vm (last update 2021/10/8)
+
+2021/6/26 update:
+
+Instruction dispatch changed from call-threading to computed-goto (with inline functions).
+After this change,
+nasal_vm is much faster.
+The vm can now run test/bigloop and test/pi in 0.2s,
+and test/fib in 0.8s on linux.
+See the timing data below,
+in the test data section.
+
+This version uses the g++ extension "labels as values",
+which is also supported by clang++.
+(I don't know whether MSVC supports it.)
+
+There is also a change in nasal_gc:
+the `std::vector` global is removed,
+and global values are now all stored on the stack (from `val_stack+0` to `val_stack+intg-1`).
+
+2021/6/29 update:
+
+Added instructions that take a constant operand:
+`op_addc`,`op_subc`,`op_mulc`,`op_divc`,`op_lnkc`,`op_addeqc`,`op_subeqc`,`op_muleqc`,`op_diveqc`,`op_lnkeqc`.
+
+The bytecode of test/bigloop.nas now looks like this:
+
+```x86asm
+.number 4e+006
+.number 1
+0x00000000: intg   0x00000001
+0x00000001: pzero  0x00000000
+0x00000002: loadg  0x00000000
+0x00000003: callg  0x00000000
+0x00000004: pnum   0x00000000 (4000000)
+0x00000005: less   0x00000000
+0x00000006: jf     0x0000000b
+0x00000007: mcallg 0x00000000
+0x00000008: addeqc 0x00000001 (1)
+0x00000009: pop    0x00000000
+0x0000000a: jmp    0x00000003
+0x0000000b: nop    0x00000000
+```
+
+This test file runs in 0.1s after this update.
+Most calculations are faster.
+
+The bytecode for assignment has also changed a lot.
+The first identifier called in an assignment is now assigned with `op_load`,
+instead of `op_meq` followed by `op_pop`.
+
+```javascript
+var (a,b)=(1,2);
+a=b=0;
+```
+
+```x86asm
+.number 2
+0x00000000: intg   0x00000002
+0x00000001: pone   0x00000000
+0x00000002: loadg  0x00000000
+0x00000003: pnum   0x00000000 (2)
+0x00000004: loadg  0x00000001
+0x00000005: pzero  0x00000000
+0x00000006: mcallg 0x00000001
+0x00000007: meq    0x00000000 (b=2 use meq,pop->a)
+0x00000008: loadg  0x00000000 (a=b use loadg)
+0x00000009: nop    0x00000000
+```
+
+### version 8.0 vm (last update 2022/2/12)
+
+2021/10/8 update:
+
+In this version `vm_nil` and `vm_num` are no longer managed by `nasal_gc`,
+which reduces calls to `gc::alloc` and makes execution more efficient.
+
+A new value type is added: `vm_obj`.
+This type is reserved for users to define their own value types.
+Related APIs will be added in the future.
+
+Fully functional closures:
+added new instructions that get and set upvalues,
+and deleted the old instruction `op_offset`.
+
+2021/10/13 update:
+
+The output format of bytecode changes to this:
+
+```x86asm
+0x000002f2: newf   0x2f6
+0x000002f3: intl   0x2
+0x000002f4: para   0x3e ("x")
+0x000002f5: jmp    0x309
+0x000002f6: calll  0x1
+0x000002f7: lessc  0x0 (2)
+0x000002f8: jf     0x2fb
+0x000002f9: calll  0x1
+0x000002fa: ret
+0x000002fb: upval  0x0[0x1]
+0x000002fc: upval  0x0[0x1]
+0x000002fd: callfv 0x1
+0x000002fe: calll  0x1
+0x000002ff: subc   0x1d (1)
+0x00000300: callfv 0x1
+0x00000301: upval  0x0[0x1]
+0x00000302: upval  0x0[0x1]
+0x00000303: callfv 0x1
+0x00000304: calll  0x1
+0x00000305: subc   0x0 (2)
+0x00000306: callfv 0x1
+0x00000307: add
+0x00000308: ret
+0x00000309: ret
+0x0000030a: callfv 0x1
+0x0000030b: loadg  0x32
+```
+
+2022/1/22 update:
+
+Deleted `op_pone` and `op_pzero`.
+Both are redundant and are replaced by `op_pnum`.
+
+### version 9.0 vm (last update 2022/5/18)
+
+2022/2/12 update:
+
+Local values are now __stored on stack__,
+so function calls are faster than before.
+In v8.0, each function call
+allocated a new `vm_vec` through `nasal_gc`, which made the gc run mark-sweep too often and cost a lot of time.
+The test file `test/bf.nas` took too long to run because it makes so many function calls (see the test data below in table `version 8.0 (R9-5900HX ubuntu-WSL 2022/1/23)`).
+
+An upvalue is now generated, using `vm_vec`, when the first new function is created in a local scope.
+Functions created after that share the same upvalue, and the upvalue is synchronized with the local scope each time a new function is created.
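+
+A small sketch of what sharing an upvalue means at the language level. The names `counter`, `inc` and `get` are illustrative only: both inner functions are created in the same local scope, so they are expected to see the same `n`.
+
+```javascript
+var counter=func(){
+    var n=0;
+    return {
+        inc:func(){n+=1;return n;},   # first function created in this scope
+        get:func(){return n;}         # created later, shares the same upvalue
+    };
+}
+var c=counter();
+c.inc();
+c.inc();
+print(c.get(),"\n");                  # reads the value updated through inc
+```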
+
+2022/3/27 update:
+
+This month's updates change the upvalue from `vm_vec` to `vm_upval`,
+a special gc-managed object,
+whose structure is almost the same as the upvalue object in the programming language __`Lua`__.
+
+Today we changed the output format of bytecode.
+The new format looks like the output of `objdump`:
+
+```x86asm
+  0x0000029b:       0a 00 00 00 00        newh
+
+func <0x29c>:
+  0x0000029c:       0b 00 00 02 a0        newf    0x2a0
+  0x0000029d:       02 00 00 00 02        intl    0x2
+  0x0000029e:       0d 00 00 00 66        para    0x66 ("libname")
+  0x0000029f:       32 00 00 02 a2        jmp     0x2a2
+  0x000002a0:       40 00 00 00 42        callb   0x42 <__dlopen@0x41dc40>
+  0x000002a1:       4a 00 00 00 00        ret
+<0x29c>;
+
+  0x000002a2:       0c 00 00 00 67        happ    0x67 ("dlopen")
+
+func <0x2a3>:
+  0x000002a3:       0b 00 00 02 a8        newf    0x2a8
+  0x000002a4:       02 00 00 00 03        intl    0x3
+  0x000002a5:       0d 00 00 00 68        para    0x68 ("lib")
+  0x000002a6:       0d 00 00 00 69        para    0x69 ("sym")
+  0x000002a7:       32 00 00 02 aa        jmp     0x2aa
+  0x000002a8:       40 00 00 00 43        callb   0x43 <__dlsym@0x41df00>
+  0x000002a9:       4a 00 00 00 00        ret
+<0x2a3>;
+
+  0x000002aa:       0c 00 00 00 6a        happ    0x6a ("dlsym")
+```
+
+### version 10.0 vm (latest)
+
+2022/5/19 update:
+
+Coroutines are now available in this runtime:
+
+```javascript
+var coroutine={
+    create: func(function){return __cocreate;},
+    resume: func(co)      {return __coresume;},
+    yield:  func(args...) {return __coyield; },
+    status: func(co)      {return __costatus;},
+    running:func()        {return __corun;   }
+};
+```
+
+`coroutine.create` creates a new coroutine object from a function.
+The coroutine does not run immediately.
+
+`coroutine.resume` continues running a coroutine.
+
+`coroutine.yield` interrupts a running coroutine and passes some values out.
+These values are received and returned by `coroutine.resume`.
+`coroutine.yield` itself returns `vm_nil` inside the coroutine function.
+
+`coroutine.status` reports the status of a coroutine.
+There are 3 statuses: `suspended` means waiting to run, `running` means currently running, and `dead` means finished.
+
+`coroutine.running` tells whether a coroutine is running now.
+
+__CAUTION:__ a coroutine should not be created or run inside another coroutine.
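+
+A minimal usage sketch, assuming the `coroutine` hash shown above is loaded and `print` is available; `gen` and `co` are illustrative names:
+
+```javascript
+var gen=func(){
+    coroutine.yield(1);                 # hand a value back to the caller
+    coroutine.yield(2);
+    return;
+}
+var co=coroutine.create(gen);           # created, but not running yet
+print(coroutine.status(co),"\n");
+var first=coroutine.resume(co);         # runs until the first yield
+var second=coroutine.resume(co);        # continues until the second yield
+coroutine.resume(co);                   # lets the function return
+print(coroutine.status(co),"\n");
+```
+
+All calls to `coroutine.resume` are made from the main program here, in line with the caution above.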
+
+__How resume and yield work:__
+
+When `op_callb` is called, the stack frame looks like this:
+
+```C++
++--------------------------+(main stack)
+| old pc(vm_ret)           | <- top[0]
++--------------------------+
+| old localr(vm_addr)      | <- top[-1]
++--------------------------+
+| old upvalr(vm_upval)     | <- top[-2]
++--------------------------+
+| local scope(nas_ref)     |
+| ...                      |
++--------------------------+ <- local pointer stored in localr
+| old funcr(vm_func)       | <- old function stored in funcr
++--------------------------+
+```
+
+In the next step of `op_callb`, the stack frame is:
+
+```C++
++--------------------------+(main stack)
+| nil(vm_nil)              | <- push nil
++--------------------------+
+| old pc(vm_ret)           |
++--------------------------+
+| old localr(vm_addr)      |
++--------------------------+
+| old upvalr(vm_upval)     |
++--------------------------+
+| local scope(nas_ref)     |
+| ...                      |
++--------------------------+ <- local pointer stored in localr
+| old funcr(vm_func)       | <- old function stored in funcr
++--------------------------+
+```
+
+Then we call `resume`, which switches the stack.
+The coroutine stack already has some values on it,
+but on first entry its top is a `vm_ret` whose return `pc` is `0`.
+
+So, to keep the stack top intact, `resume` returns `gc.top[0]`.
+`op_callb` then does `top[0]=resume()`, so the value does not change.
+
+```C++
++--------------------------+(coroutine stack)
+| pc:0(vm_ret)             | <- now gc.top[0]
++--------------------------+
+```
+
+When we call `yield`, the stack looks like this.
+`op_callb` has put `nil` at the top,
+but where is the returned `local[1]` sent?
+
+```C++
++--------------------------+(coroutine stack)
+| nil(vm_nil)              | <- push nil
++--------------------------+
+| old pc(vm_ret)           |
++--------------------------+
+| old localr(vm_addr)      |
++--------------------------+
+| old upvalr(vm_upval)     |
++--------------------------+
+| local scope(nas_ref)     |
+| ...                      |
++--------------------------+ <- local pointer stored in localr
+| old funcr(vm_func)       | <- old function stored in funcr
++--------------------------+
+```
+
+When `builtin_coyield` finishes, the stack is switched back to the main stack,
+and the returned `local[1]` is in fact placed at the top of the main stack by `op_callb`:
+
+```C++
++--------------------------+(main stack)
+| return_value(nas_ref)    |
++--------------------------+
+| old pc(vm_ret)           |
++--------------------------+
+| old localr(vm_addr)      |
++--------------------------+
+| old upvalr(vm_upval)     |
++--------------------------+
+| local scope(nas_ref)     |
+| ...                      |
++--------------------------+ <- local pointer stored in localr
+| old funcr(vm_func)       | <- old function stored in funcr
++--------------------------+
+```
+
+So the main program sees the value on top as the return value of `resume`,
+but in fact the return value of `resume` is set on the coroutine stack.
+In summary:
+
+```C++
+resume (main->coroutine) return coroutine.top[0]. coroutine.top[0] = coroutine.top[0];
+yield  (coroutine->main) return a vector.         main.top[0]      = vector;
+```
diff --git a/doc/dev_zh.md b/doc/dev_zh.md
new file mode 100644
index 0000000..76d99ef
--- /dev/null
+++ b/doc/dev_zh.md
@@ -0,0 +1,577 @@
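+# __Development History__
+
+## __Contents__
+
+* [__Parser__](#parser)
+  * [v1.0](#version-10-parser-last-update-20191014)
+* [__Abstract Syntax Tree__](#abstract-syntax-tree)
+  * [v1.2](#version-12-ast-last-update-20191031)
+  * [v2.0](#version-20-ast-last-update-2020831)
+  * [v3.0](#version-30-ast-last-update-20201023)
+  * [v5.0](#version-50-ast-last-update-202137)
+* [__Bytecode Virtual Machine__](#bytecode-virtual-machine)
+  * [v4.0](#version-40-vm-last-update-20201217)
+  * [v5.0](#version-50-vm-last-update-202137)
+  * [v6.0](#version-60-vm-last-update-202161)
+  * [v6.5](#version-65-vm-last-update-2021624)
+  * [v7.0](#version-70-vm-last-update-2021108)
+  * [v8.0](#version-80-vm-last-update-2022212)
+  * [v9.0](#version-90-vm-last-update-2022518)
+  * [v10.0](#version-100-vm-latest)
+
+## __Parser__
+
+An `LL(1)` parser with special syntax checks.
+
+```javascript
+(var a,b,c)=[{b:nil},[1,2],func return 0;];
+(a.b,b[0],c)=(1,2,3);
+```
+
+These two expressions have the same first set, so a pure `LL(1)` parser can hardly parse this language. We therefore added a special syntax check mechanism; the core is still `LL(1)`.
+
+The problem above was solved long ago, but I recently found a new syntax problem:
+
+```javascript
+var f=func(x,y,z){return x+y+z}
+(a,b,c)=(0,1,2);
+```
+
+This code is wrongly recognized and merged into the following:
+
+```javascript
+var f=func(x,y,z){return x+y+z}(a,b,c)
+=(0,1,2);
+```
+
+The parser treats this as a serious syntax error. I also tested this code in Flightgear, and its built-in parser rejects it as well. I think this is a fairly serious flaw in the syntax design. To avoid the problem, just add a semicolon:
+
+```javascript
+var f=func(x,y,z){return x+y+z};
+                               ^ right here
+(a,b,c)=(0,1,2);
+```
+
+### version 1.0 parser (last update 2019/10/14)
+
+The first fully functional nasal parser is complete.
+
+Before version 1.0, I tried many times to build a correct parser, but there were always some problems.
+
+Eventually I learned `LL(1)` and `LL(k)` grammars and, in version 0.16 (last update 2019/9/14), finished a parser that could recognize math formulas.
+
+In version 0.17 (2019/9/15), 0.18 (2019/9/18) and 0.19 (2019/10/1) I was only testing the parser for fun, but after that I did finish the version 1.0 parser.
+
+__This project officially started on 2019/7/25__.
+
+## __Abstract Syntax Tree__
+
+### version 1.2 ast (last update 2019/10/31)
+
+The abstract syntax tree was first completed in this version.
+
+### version 2.0 ast (last update 2020/8/31)
+
+In this version we implemented a tree-walking interpreter based on the abstract syntax tree and finished some of the built-in functions.
+
+### version 3.0 ast (last update 2020/10/23)
+
+We refactored the abstract syntax tree code, so it is now easier to read and maintain.
+
+The tree-walking interpreter in this version uses a new optimization approach, so it executes code more efficiently.
+
+In this version users can add their own built-in functions.
+
+I want to keep this tree-walking interpreter after v4.0 is released; after all, it took me a long time to finish this pile of crap.
+
+### version 5.0 ast (last update 2021/3/7)
+
+I changed my mind: the tree-walking interpreter makes maintenance too troublesome. Keeping it would, for the sake of compatibility, make optimizing the bytecode virtual machine harder.
+
+## __Bytecode Virtual Machine__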
+
+### version 4.0 vm (last update 2020/12/17)
+
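+I implemented the first version of the bytecode virtual machine in this version. It is still being tested; once testing is done, I will publish the v4.0 release.
+
+Right now I am hunting for deeply hidden bugs. If anyone wants to help, you are very welcome! :)
+
+Here is a sample of the generated bytecode: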
+
+```javascript
+for(var i=0;i<4000000;i+=1);
+```
+
+```x86asm
+.number 0
+.number 4e+006
+.number 1
+.symbol i
+0x00000000: pzero  0x00000000
+0x00000001: loadg  0x00000000 (i)
+0x00000002: callg  0x00000000 (i)
+0x00000003: pnum   0x00000001 (4e+006)
+0x00000004: less   0x00000000
+0x00000005: jf     0x0000000b
+0x00000006: pone   0x00000000
+0x00000007: mcallg 0x00000000 (i)
+0x00000008: addeq  0x00000000
+0x00000009: pop    0x00000000
+0x0000000a: jmp    0x00000002
+0x0000000b: nop    0x00000000
+```
+
+### version 5.0 vm (last update 2021/3/7)
+
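+From this version on, I decided to keep optimizing the bytecode virtual machine.
+
+After all, it currently takes 1.5 seconds to count from `0` to `4000000-1`. That is simply unacceptable.
+
+2021/1/23 update: Now it really can count from `0` to `4000000-1` within 1.5 seconds.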
+
+### version 6.0 vm (last update 2021/6/1)
+
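+Use the `loadg`/`loadl`/`callg`/`calll`/`mcallg`/`mcalll` instructions to reduce branching.
+
+Removed the `vm_scop` type.
+
+Added constant `vm_num` values to reduce memory allocation overhead.
+
+Changed the garbage collector from reference counting to mark-sweep.
+
+`vapp` and `newf` now use the previously unused .num field to reduce the number of generated bytecodes and the size of the generated `exec_code`.
+
+2021/4/3 update: Counting from `0` to `4e6-1` now takes less than 0.8 seconds.
+
+2021/4/19 update: Counting from `0` to `4e6-1` now takes less than 0.4 seconds.
+
+In this update I changed the storage of global and local variables from `unordered_map` to `vector` to improve execution efficiency, so the generated bytecode looks very different now.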
+
+```javascript
+for(var i=0;i<4000000;i+=1);
+```
+
+```x86asm
+.number 4e+006
+0x00000000: intg   0x00000001
+0x00000001: pzero  0x00000000
+0x00000002: loadg  0x00000000
+0x00000003: callg  0x00000000
+0x00000004: pnum   0x00000000 (4e+006)
+0x00000005: less   0x00000000
+0x00000006: jf     0x0000000c
+0x00000007: pone   0x00000000
+0x00000008: mcallg 0x00000000
+0x00000009: addeq  0x00000000
+0x0000000a: pop    0x00000000
+0x0000000b: jmp    0x00000003
+0x0000000c: nop    0x00000000
+```
+
+### version 6.5 vm (last update 2021/6/24)
+
+2021/5/31 update:
+
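+The garbage collector no longer wrongly collects unused variables more than once.
+
+Added `builtin_alloc` to prevent mark-sweep from being wrongly triggered while a built-in function is running.
+
+To get a large array, prefer `setsize`, because calling `append` frequently may trigger the garbage collector frequently.
+
+2021/6/3 update:
+
+Fixed the bug where the garbage collector still damn well collected things twice; this time I designed three mark states to make sure garbage is collected correctly.
+
+Split the `callf` instruction into `callfv` and `callfh`. `callfv` takes its arguments directly from `val_stack` instead of first collecting them in a `vm_vec` and passing that in, which was very inefficient.
+
+Prefer `callfv` over `callfh`, because `callfh` has to take the arguments from the stack and pack them into a `vm_hash` before the instruction can process them, which slows execution down.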
+
+```javascript
+var f=func(x,y){return x+y;}
+f(1024,2048);
+```
+
+```x86asm
+.number 1024
+.number 2048
+.symbol x
+.symbol y
+0x00000000: intg   0x00000001
+0x00000001: newf   0x00000007
+0x00000002: intl   0x00000003
+0x00000003: offset 0x00000001
+0x00000004: para   0x00000000 (x)
+0x00000005: para   0x00000001 (y)
+0x00000006: jmp    0x0000000b
+0x00000007: calll  0x00000001
+0x00000008: calll  0x00000002
+0x00000009: add    0x00000000
+0x0000000a: ret    0x00000000
+0x0000000b: loadg  0x00000000
+0x0000000c: callg  0x00000000
+0x0000000d: pnum   0x00000000 (1024)
+0x0000000e: pnum   0x00000001 (2048)
+0x0000000f: callfv 0x00000002
+0x00000010: pop    0x00000000
+0x00000011: nop    0x00000000
+```
+
+2021/6/21 update:
+
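+The garbage collector no longer collects null pointers. Assignments whose call chain contains function calls can now be executed too; the following assignments are valid: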
+
+```javascript
+var f=func()
+{
+    var _=[{_:0},{_:1}];
+    return func(x)
+    {
+        return _[x];
+    }
+}
+var m=f();
+m(0)._=m(1)._=10;
+
+[0,1,2][1:2][0]=0;
+```
+
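+In older versions, the parser checked the left value and, on finding special calls in it, told the user that such a left value is not accepted (bad lvalue). Now it works. To make sure this kind of assignment executes correctly, the codegen module uses `nasal_codegen::call_gen()` to generate bytecode for the leading part of the call chain instead of using `nasal_codegen::mcall_gen()` throughout, and uses `nasal_codegen::mcall_gen()` only for the last call.
+
+So the generated bytecode is completely different now: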
+
+```x86asm
+.number 10
+.number 2
+.symbol _
+.symbol x
+0x00000000: intg   0x00000002
+0x00000001: newf   0x00000005
+0x00000002: intl   0x00000002
+0x00000003: offset 0x00000001
+0x00000004: jmp    0x00000017
+0x00000005: newh   0x00000000
+0x00000006: pzero  0x00000000
+0x00000007: happ   0x00000000 (_)
+0x00000008: newh   0x00000000
+0x00000009: pone   0x00000000
+0x0000000a: happ   0x00000000 (_)
+0x0000000b: newv   0x00000002
+0x0000000c: loadl  0x00000001
+0x0000000d: newf   0x00000012
+0x0000000e: intl   0x00000003
+0x0000000f: offset 0x00000002
+0x00000010: para   0x00000001 (x)
+0x00000011: jmp    0x00000016
+0x00000012: calll  0x00000001
+0x00000013: calll  0x00000002
+0x00000014: callv  0x00000000
+0x00000015: ret    0x00000000
+0x00000016: ret    0x00000000
+0x00000017: loadg  0x00000000
+0x00000018: callg  0x00000000
+0x00000019: callfv 0x00000000
+0x0000001a: loadg  0x00000001
+0x0000001b: pnum   0x00000000 (10.000000)
+0x0000001c: callg  0x00000001
+0x0000001d: pone   0x00000000
+0x0000001e: callfv 0x00000001
+0x0000001f: mcallh 0x00000000 (_)
+0x00000020: meq    0x00000000
+0x00000021: callg  0x00000001
+0x00000022: pzero  0x00000000
+0x00000023: callfv 0x00000001
+0x00000024: mcallh 0x00000000 (_)
+0x00000025: meq    0x00000000
+0x00000026: pop    0x00000000
+0x00000027: pzero  0x00000000
+0x00000028: pzero  0x00000000
+0x00000029: pone   0x00000000
+0x0000002a: pnum   0x00000001 (2.000000)
+0x0000002b: newv   0x00000003
+0x0000002c: slcbeg 0x00000000
+0x0000002d: pone   0x00000000
+0x0000002e: pnum   0x00000001 (2.000000)
+0x0000002f: slc2   0x00000000
+0x00000030: slcend 0x00000000
+0x00000031: pzero  0x00000000
+0x00000032: mcallv 0x00000000
+0x00000033: meq    0x00000000
+0x00000034: pop    0x00000000
+0x00000035: nop    0x00000000
+```
+
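+As the bytecode above shows, the `mcall`/`mcallv`/`mcallh` instructions are used somewhat less often than before, while `call`/`callv`/`callh`/`callfv`/`callfh` are used more often.
+
+Because of the new structure of the `mcall` instructions, `addr_stack`, a stack once used to store pointers, is removed from `nasal_vm`. `nasal_vm` now uses `nasal_val** mem_addr` to hold the fetched memory address temporarily. This does not cause serious problems because the address is __used as soon as it is obtained__.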
+
+### version 7.0 vm (last update 2021/10/8)
+
+2021/6/26 update:
+
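+Instruction dispatch changed from call-threading to computed-goto. After this change, nasal_vm runs dramatically faster. The virtual machine can now finish test/bigloop and test/pi within 0.2 seconds! On linux it finishes test/fib within 0.8 seconds. You can see the results in the test data section below.
+
+This dispatch method uses the g++ extension "labels as values", which clang++ currently supports as well. (Whether MSVC supports it, I have no idea, haha)
+
+There are also some changes in nasal_gc:
+Global variables are no longer stored in a `std::vector`; they all live on the operand stack (from `val_stack+0` to `val_stack+intg-1`).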
+
+2021/6/29 update:
+
+Added some instructions that operate directly on constants:
+`op_addc`,`op_subc`,`op_mulc`,`op_divc`,`op_lnkc`,`op_addeqc`,`op_subeqc`,`op_muleqc`,`op_diveqc`,`op_lnkeqc`.
+
+The bytecode of test/bigloop.nas now looks like this:
+
+```x86asm
+.number 4e+006
+.number 1
+0x00000000: intg   0x00000001
+0x00000001: pzero  0x00000000
+0x00000002: loadg  0x00000000
+0x00000003: callg  0x00000000
+0x00000004: pnum   0x00000000 (4000000)
+0x00000005: less   0x00000000
+0x00000006: jf     0x0000000b
+0x00000007: mcallg 0x00000000
+0x00000008: addeqc 0x00000001 (1)
+0x00000009: pop    0x00000000
+0x0000000a: jmp    0x00000003
+0x0000000b: nop    0x00000000
+```
+
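+After this update, this test file finishes within 0.1 seconds. Most arithmetic operations are faster.
+
+The assignment bytecode has also changed. When an assignment involves only a single identifier, `op_load` is now preferred for the assignment, instead of `op_meq` and `op_pop`.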
+
+```javascript
+var (a,b)=(1,2);
+a=b=0;
+```
+
+```x86asm
+.number 2
+0x00000000: intg   0x00000002
+0x00000001: pone   0x00000000
+0x00000002: loadg  0x00000000
+0x00000003: pnum   0x00000000 (2)
+0x00000004: loadg  0x00000001
+0x00000005: pzero  0x00000000
+0x00000006: mcallg 0x00000001
+0x00000007: meq    0x00000000 (b=2 use meq,pop->a)
+0x00000008: loadg  0x00000000 (a=b use loadg)
+0x00000009: nop    0x00000000
+```
+
+### version 8.0 vm (last update 2022/2/12)
+
+2021/10/8 update:
+
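+From this version on, `vm_nil` and `vm_num` are no longer managed by `nasal_gc`, which greatly reduces calls to `gc::alloc` and greatly improves execution efficiency.
+
+Added a new data type: `vm_obj`. This type is reserved for users to define their own data types. Related APIs will be added in the future.
+
+Fully functional closures: added instructions that read and write closure data. Removed the old instruction `op_offset`.
+
+2021/10/13 update:
+
+The bytecode output format is changed to the following: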
+
+```x86asm
+0x000002f2: newf   0x2f6
+0x000002f3: intl   0x2
+0x000002f4: para   0x3e ("x")
+0x000002f5: jmp    0x309
+0x000002f6: calll  0x1
+0x000002f7: lessc  0x0 (2)
+0x000002f8: jf     0x2fb
+0x000002f9: calll  0x1
+0x000002fa: ret
+0x000002fb: upval  0x0[0x1]
+0x000002fc: upval  0x0[0x1]
+0x000002fd: callfv 0x1
+0x000002fe: calll  0x1
+0x000002ff: subc   0x1d (1)
+0x00000300: callfv 0x1
+0x00000301: upval  0x0[0x1]
+0x00000302: upval  0x0[0x1]
+0x00000303: callfv 0x1
+0x00000304: calll  0x1
+0x00000305: subc   0x0 (2)
+0x00000306: callfv 0x1
+0x00000307: add
+0x00000308: ret
+0x00000309: ret
+0x0000030a: callfv 0x1
+0x0000030b: loadg  0x32
+```
+
+2022/1/22 update:
+
+Removed `op_pone` and `op_pzero`. These two instructions no longer serve any real purpose and have been replaced by `op_pnum`.
+
+### version 9.0 vm (last update 2022/5/18)
+
+2022/2/12 update:
+
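+Local variables are now also __stored on the stack__,
+so function calls are much faster than before.
+In v8.0, calling a function
+allocated a new `vm_vec` to simulate the local scope, which triggered mark-sweep frequently and wasted too much execution time.
+In the test file `test/bf.nas`, this calling scheme wasted most of the running time, because the file contains a large number of frequent function calls (for details see `version 8.0 (R9-5900HX ubuntu-WSL 2022/1/23)` in the test data section).
+
+A closure is now created, using `vm_vec`, the first time a new function is created in a local scope.
+Functions created after that share the same closure, which is synchronized each time a new function is created in the local scope.
+
+2022/3/27 update:
+
+In this month's updates we changed the closure data structure from `vm_vec` to a new object, `vm_upval`, whose structure is similar to that of closures in another programming language, __`Lua`__.
+
+We also changed the bytecode output format. The new format looks like `objdump`: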
+
+```x86asm
+  0x0000029b:       0a 00 00 00 00        newh
+
+func <0x29c>:
+  0x0000029c:       0b 00 00 02 a0        newf    0x2a0
+  0x0000029d:       02 00 00 00 02        intl    0x2
+  0x0000029e:       0d 00 00 00 66        para    0x66 ("libname")
+  0x0000029f:       32 00 00 02 a2        jmp     0x2a2
+  0x000002a0:       40 00 00 00 42        callb   0x42 <__dlopen@0x41dc40>
+  0x000002a1:       4a 00 00 00 00        ret
+<0x29c>;
+
+  0x000002a2:       0c 00 00 00 67        happ    0x67 ("dlopen")
+
+func <0x2a3>:
+  0x000002a3:       0b 00 00 02 a8        newf    0x2a8
+  0x000002a4:       02 00 00 00 03        intl    0x3
+  0x000002a5:       0d 00 00 00 68        para    0x68 ("lib")
+  0x000002a6:       0d 00 00 00 69        para    0x69 ("sym")
+  0x000002a7:       32 00 00 02 aa        jmp     0x2aa
+  0x000002a8:       40 00 00 00 43        callb   0x43 <__dlsym@0x41df00>
+  0x000002a9:       4a 00 00 00 00        ret
+<0x2a3>;
+
+  0x000002aa:       0c 00 00 00 6a        happ    0x6a ("dlsym")
+```
+
+### version 10.0 vm (latest)
+
+2022/5/19 update:
+
+In this version we added coroutines to nasal:
+
+```javascript
+var coroutine={
+    create: func(function){return __cocreate;},
+    resume: func(co)      {return __coresume;},
+    yield:  func(args...) {return __coyield; },
+    status: func(co)      {return __costatus;},
+    running:func()        {return __corun;   }
+};
+```
+
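+`coroutine.create` creates a new coroutine object. The coroutine does not run right after creation.
+
+`coroutine.resume` continues running a coroutine.
+
+`coroutine.yield` interrupts a running coroutine and passes out some data. The data is received and returned by `coroutine.resume`. Inside the coroutine function, `coroutine.yield` itself returns only `vm_nil`.
+
+`coroutine.status` reports the status of a coroutine. A coroutine has three statuses: `suspended`, `running`, and `dead` (finished).
+
+`coroutine.running` tells whether a coroutine is currently running.
+
+__Note:__ a coroutine cannot be created inside another running coroutine.
+
+__Next we explain how coroutines work:__
+
+When `op_callb` is executed, the stack frame looks like this: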
+
+```C++
++--------------------------+(main operand stack)
+| old pc(vm_ret)           | <- top[0]
++--------------------------+
+| old localr(vm_addr)      | <- top[-1]
++--------------------------+
+| old upvalr(vm_upval)     | <- top[-2]
++--------------------------+
+| local scope(nas_ref)     |
+| ...                      |
++--------------------------+ <- local pointer stored in localr
+| old funcr(vm_func)       | <- old function stored in funcr
++--------------------------+
+```
+
+During the execution of `op_callb`, the stack frame at the next step is:
+
+```C++
++--------------------------+(main operand stack)
+| nil(vm_nil)              | <- push nil
++--------------------------+
+| old pc(vm_ret)           |
++--------------------------+
+| old localr(vm_addr)      |
++--------------------------+
+| old upvalr(vm_upval)     |
++--------------------------+
+| local scope(nas_ref)     |
+| ...                      |
++--------------------------+ <- local pointer stored in localr
+| old funcr(vm_func)       | <- old function stored in funcr
++--------------------------+
+```
+
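+Then we call `resume`, which swaps the operand stack. The coroutine's operand stack already holds some data, but when the coroutine is entered for the first time, the top of this stack is a `vm_ret` and the return `pc` value is `0`.
+
+To keep the data at the stack top from being corrupted, `resume` returns `gc.top[0]`. `op_callb` then executes `top[0]=resume()`, so although the stack top is overwritten once, it still holds the original data.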
+
+```C++
++--------------------------+(coroutine operand stack)
+| pc:0(vm_ret)             | <- now gc.top[0]
++--------------------------+
+```
+
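+When we call `yield`, the function reaches the state below, and we find that `op_callb` has already put `nil` at the stack top. But where has the `local[1]` that should be returned gone?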
+
+```C++
++--------------------------+(coroutine operand stack)
+| nil(vm_nil)              | <- push nil
++--------------------------+
+| old pc(vm_ret)           |
++--------------------------+
+| old localr(vm_addr)      |
++--------------------------+
+| old upvalr(vm_upval)     |
++--------------------------+
+| local scope(nas_ref)     |
+| ...                      |
++--------------------------+ <- local pointer stored in localr
+| old funcr(vm_func)       | <- old function stored in funcr
++--------------------------+
+```
+
+When `builtin_coyield` finishes, the stack switches back to the main operand stack, and the returned `local[1]` is actually placed at the top of this stack by `op_callb`:
+
+```C++
++--------------------------+(main operand stack)
+| return_value(nas_ref)    |
++--------------------------+
+| old pc(vm_ret)           |
++--------------------------+
+| old localr(vm_addr)      |
++--------------------------+
+| old upvalr(vm_upval)     |
++--------------------------+
+| local scope(nas_ref)     |
+| ...                      |
++--------------------------+ <- local pointer stored in localr
+| old funcr(vm_func)       | <- old function stored in funcr
++--------------------------+
+```
+
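+So the main program sees the value at the top as if it were returned by `resume`, while the actual return value of `resume` sits at the top of the coroutine's operand stack. In summary: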
+
+```C++
+resume (main->coroutine) return coroutine.top[0]. coroutine.top[0] = coroutine.top[0];
+yield  (coroutine->main) return a vector.         main.top[0]      = vector;
+```