LeoPostings.leo

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet ekr_stylesheet?>
<leo_file>
<leo_header file_format="2" tnodes="0" max_tnode_index="0" clone_windows="0"/>
<globals body_outline_ratio="0.5" body_secondary_ratio="0.5">
	<global_window_position top="50" left="50" height="500" width="700"/>
	<global_log_window_position top="0" left="0" height="0" width="0"/>
</globals>
<preferences/>
<find_panel_settings/>
<vnodes>
<v t="ekr.20071028032354"><vh>@chapters</vh></v>
<v t="ekr.20050421221914"><vh>About this file</vh></v>
<v t="ekr.20050425064819"><vh>3.x 2001 @file trees</vh>
<v t="ekr.20050425064819.1"><vh>Designing @file trees (From LeoDocs.Leo)</vh>
<v t="ekr.20050425064819.2"><vh>Deciding to do Leo2</vh></v>
<v t="ekr.20050425064819.3"><vh>A prototype</vh></v>
<v t="ekr.20050425064819.4"><vh>User interaction</vh></v>
<v t="ekr.20050425064819.5"><vh>The write code</vh></v>
<v t="ekr.20050425064819.6"><vh>The read code</vh></v>
<v t="ekr.20050425064819.7"><vh>The load/save code</vh></v>
<v t="ekr.20050425064819.8"><vh>Attributes, mirroring and dummy nodes</vh></v>
<v t="ekr.20050425064819.9"><vh>Clones</vh></v>
<v t="ekr.20050425064819.10"><vh>Error recovery, at last</vh></v>
</v>
</v>
<v t="ekr.20050421212523"><vh>4.0 (2002-2003) New file format w/o child indices, eliminated error "recovery"</vh>
<v t="ekr.20050422071739"><vh>  2003-10-17 From 4.0 readme</vh></v>
<v t="ekr.20050422065602.7"><vh>2002 &amp; 2003: Early ideas</vh>
<v t="ekr.20050421205312"><vh>2002-10-21 design.doc</vh></v>
<v t="ekr.20050421192149.12"><vh>2002-10 gti-open.doc New (long) design notes ***</vh>
<v t="ekr.20050422055636"><vh>2002-10-21 Theme 1: Global Tnode Indices </vh></v>
<v t="ekr.20050422055636.1"><vh>2002-10-21 Theme 2: Small (template) .leo files </vh></v>
<v t="ekr.20050422055636.2"><vh>2002-10-21 Theme 3: @@file nodes </vh></v>
<v t="ekr.20050422055636.8"><vh>2002-10-21 Summary of themes 1-3 </vh></v>
<v t="ekr.20050422055636.6"><vh>2002-10-21 Theme 4: Revised XML file format</vh></v>
<v t="ekr.20050422055636.7"><vh>2002-10-22 RE: Theme 4: Revised XML file format</vh></v>
<v t="ekr.20050422055636.3"><vh>2002-10-23 Yes, GTI's _are_ possible ***</vh></v>
<v t="ekr.20050422055636.4"><vh>2002-10-24 RE: Yes, GTI's _are_ possible, more </vh></v>
<v t="ekr.20050422055636.5"><vh>2002-10-24 sequence numbers in gti's </vh></v>
<v t="ekr.20050422055636.9"><vh>2002-10-24 Glorious unification &amp; leo.py 4.0 **</vh></v>
<v t="ekr.20050422055636.10"><vh>2002-10-26 Setting global name: LeoID.txt </vh></v>
<v t="ekr.20050422060227"><vh>2002-10-24 Embedded XML != XML </vh></v>
<v t="ekr.20050422060227.1"><vh>2002-10-27 Embedded XML escapes</vh></v>
<v t="ekr.20050422060227.2"><vh>2002-10-29 Embedded XML escapes: second thoughts</vh></v>
</v>
<v t="ekr.20050421192149.13"><vh>2002-12 gti's.doc  Big picture: why 4.0 is important ***</vh>
<v t="ekr.20050421205312.1"><vh>2002-12-06 Big picture: why 4.0 is important: more</vh></v>
<v t="ekr.20050421205312.2"><vh>2002-12-07 Big picture: why 4.0 is important: more</vh></v>
<v t="ekr.20050421192149.18"><vh>2002-12-07 structure rule.doc (Continuation of big picture)</vh></v>
<v t="ekr.20050421205312.3"><vh>2002-12-17 Big picture: thick or thin?</vh></v>
<v t="ekr.20050421205312.4"><vh>2002-12-18 Big picture: thick or thin?</vh></v>
<v t="ekr.20050421210335"><vh>2002-12-18 Big picture: thick or thin?</vh></v>
<v t="ekr.20050421210335.1"><vh>2002-12-18 Big picture: thick or thin?</vh></v>
<v t="ekr.20050421210335.2"><vh>2002-12-18 Big picture: thick or thin?</vh></v>
<v t="ekr.20050421210335.3"><vh>2002-12-19 Teamwork with LEO</vh></v>
<v t="ekr.20050421210335.4"><vh>2002-12-19 RE: Teamwork with LEO</vh></v>
<v t="ekr.20050421210335.5"><vh>2002-12-19 RE: Teamwork with LEO</vh></v>
<v t="ekr.20050421210335.6"><vh>2002-12-19 RE: Teamwork with LEO</vh></v>
<v t="ekr.20050421210335.7"><vh>2002-12-18 Separate presentation from content</vh></v>
<v t="ekr.20050421210335.8"><vh>2002-12-18 RE: Separate presentation from content</vh></v>
<v t="ekr.20050421210335.9"><vh>2002-12-18 RE: Separate presentation from content</vh></v>
<v t="ekr.20050421210335.10"><vh>2002-12-19 RE: Separate presentation from content</vh></v>
<v t="ekr.20050421210335.11"><vh>2002-12-19 RE: Separate presentation from content</vh></v>
<v t="ekr.20050421210335.12"><vh>2002-12-19 RE: Separate presentation from content</vh></v>
<v t="ekr.20050421210335.13"><vh>2002-12-19 RE: Separate presentation from content</vh></v>
<v t="ekr.20050421210335.14"><vh>2002-12-30 Why thick is required </vh></v>
<v t="ekr.20050421210335.15"><vh>2002-12-31 RE: Thick &amp; thin</vh></v>
<v t="ekr.20050421210335.16"><vh>2003-01-02 RE: Thick &amp; thin</vh></v>
<v t="ekr.20050421210335.17"><vh>2003-01-06 RE: Thick &amp; thin</vh></v>
</v>
<v t="ekr.20050421192149.11"><vh>2003-02-18 gti_summary.doc</vh></v>
</v>
<v t="ekr.20050421194542.1" a="EM"><vh>2003-04 thru 2003-07 Doubts about reliability &amp; resolution</vh>
<v t="ekr.20050421211313"><vh>2003-04-30 Design questions</vh></v>
<v t="ekr.20050421204424" a="M"><vh>2003-05-01 thru  2003-05-30 vxnodes (shared nodes)</vh>
<v t="ekr.20050421192149.15"><vh>2003-05-01 nodes.doc</vh></v>
<v t="ekr.20050421203956"><vh>2003-05-01 shared nodes.doc</vh></v>
<v t="ekr.20050421192149.14"><vh>2003-05-02 inodes redux.doc</vh></v>
<v t="ekr.20050421203956.4"><vh>2003-05-29 scrolling.doc</vh></v>
<v t="ekr.20050421203956.2"><vh>2003-05-30 More about vxnodes.doc</vh></v>
</v>
<v t="ekr.20050421192149.6"><vh>2003-05-06 Conflicts.doc</vh></v>
<v t="ekr.20050421192149.5"><vh>2003-05-07 Conflicts2.doc ** (user can't resolve conflicts)</vh></v>
<v t="ekr.20050421211313.1"><vh>2003-05-13 Progress.doc</vh></v>
<v t="ekr.20050421192149.2"><vh>2003-05-25 4.0 is dead, long live leo.doc  **  (Valid concerns, wrong conclusion: see 2004-02-05)</vh></v>
<v t="ekr.20050421192149.20"><vh>2003-05-26 Using gnx's safely.doc</vh></v>
<v t="ekr.20050421192149.7"><vh>2003-05-27 Eliminate clones.doc</vh></v>
<v t="ekr.20050421192149.16"><vh>2003-05-30 Objections to link nodes.doc</vh></v>
<v t="ekr.20050421192149.3"><vh>2003-05-31 clones2links script.doc</vh></v>
<v t="ekr.20050421203956.3"><vh>2003-06-02 positions.doc</vh>
<v t="ekr.20050421204424.1"><vh>Overview</vh></v>
<v t="ekr.20050421204424.2"><vh>Positions</vh></v>
</v>
<v t="ekr.20050421203956.1"><vh>2003-06-02 Giant Aha re positions.doc ***</vh></v>
<v t="ekr.20050421211313.2"><vh>2003-06-09 Progress report</vh></v>
<v t="ekr.20050421192149.4"><vh>2003-06-12 Cold feet.doc</vh></v>
<v t="ekr.20050421211313.3"><vh>2003-06-17 Progress report (shared tnodes delayed)</vh></v>
<v t="ekr.20050421192149.10"><vh>2003-06-18 gnxs must go.doc ** (Valid concerns, wrong conclusion: see 2004-02-05)</vh></v>
<v t="ekr.20050421192149.19"><vh>2003-06-18 ugnx.doc</vh></v>
<v t="ekr.20050421192149.8"><vh>2003-06-19 Eliminating child indices.doc</vh></v>
<v t="ekr.20050421192149.17"><vh>2003-06-22 Reply 6-22.doc</vh></v>
<v t="ekr.20050421192149"><vh>2003-06-26 What has been gained.doc</vh></v>
<v t="ekr.20050421212523.1"><vh>2003-07-09 New design principles.doc</vh></v>
<v t="ekr.20050421192149.1"><vh>2003-07-26 ++About consistency.doc</vh>
<v t="ekr.20050421195802.1"><vh>A new 4.0?  Consistency</vh></v>
<v t="ekr.20050421195802.2"><vh>A new 4.0?  ironical gnx's</vh></v>
<v t="ekr.20050421195802.3"><vh>A new 4.0?  Derived files can be the SUM</vh></v>
<v t="ekr.20050421195802.4"><vh>A new 4.0?  .leo files must be disjoint unions</vh></v>
<v t="ekr.20050421195802.5"><vh>A new 4.0?  Owned &amp; unowned clones</vh></v>
<v t="ekr.20050421195802.6"><vh>A new 4.0?  Acid tests</vh></v>
<v t="ekr.20050421195802.7"><vh>A new 4.0?  Primary &amp; secondary data</vh></v>
<v t="ekr.20050421195802.8"><vh>A new 4.0?  gnx's redux</vh></v>
</v>
<v t="ekr.20050421212523.2"><vh>2003-07-30 About at-include.doc</vh></v>
<v t="ekr.20050421212523.3"><vh>2003-07-30 More about at-include.doc</vh></v>
<v t="ekr.20050421212523.4"><vh>2003-07-31 comments about 4-0 design.doc</vh></v>
<v t="ekr.20050421212523.5"><vh>2003-07-31 synch reply 2.doc</vh></v>
</v>
<v t="ekr.20050422065602.8"><vh>2003-09 Code details</vh>
<v t="ekr.20050421212523.6"><vh>2003-09-05 New 4-0 Design Notes.doc</vh>
<v t="ekr.20050421212523.7"><vh>Executive summary</vh></v>
<v t="ekr.20050421212523.8"><vh>Background and discussion</vh></v>
<v t="ekr.20050421212523.9"><vh>Examples</vh></v>
</v>
<v t="ekr.20050421212523.10"><vh>2003-09-03 ProgressReport.doc</vh></v>
<v t="ekr.20050421214628"><vh>2003-09-14 tempBodyString.doc</vh></v>
<v t="ekr.20050421214628.1"><vh>2003-09-17 Progress.doc *** (summary of features of 4.0)</vh></v>
<v t="ekr.20050421214628.2"><vh>2003-09-18 Farewell to at-ws.doc</vh></v>
<v t="ekr.20050421214628.3"><vh>2003-09-22 4-0 complete.doc</vh></v>
<v t="ekr.20050421214628.4"><vh>2003-09-23 Transition.doc</vh></v>
</v>
</v>
<v t="ekr.20050421214628.5"><vh>4.1 (2003) Unicode, gui-agnostic code and gnx's in .leo files.</vh>
<v t="ekr.20050422071739.1"><vh>  2004-02-20 From 4.1 readme</vh></v>
<v t="ekr.20050421214628.6"><vh>2003-11-03 4-1a1 released.doc</vh></v>
</v>
<v t="ekr.20050421214704"><vh>4.2 (2004) shared tnodes, positions and @thin (gnx's in derived files)</vh>
<v t="ekr.20050422071739.2"><vh>  2004-09-20 From 4.2 readme</vh></v>
<v t="ekr.20050421214921.6" a="M"><vh>2004-02-05 at-file-thin.doc ***** (abandon the synchronization principle)</vh></v>
<v t="ekr.20050421214921.16"><vh>2004-02-29 New plans-2-29-04.doc **</vh></v>
<v t="ekr.20050421214921.21"><vh>2004-02-29 shared tnode design.doc ***</vh></v>
<v t="ekr.20050421214921.9"><vh>2004-02-29 Code details of shared tnode.doc</vh></v>
<v t="ekr.20050421214921.12"><vh>2004-02-29 Details and schedule.doc</vh></v>
<v t="ekr.20050421214921"><vh>2004-03-02 Transition notes.doc</vh></v>
<v t="ekr.20050421214921.7"><vh>2004-03-02 Better convert routine.doc</vh></v>
<v t="ekr.20050421214921.19"><vh>2004-03-02 Progress report 3-2-04.doc</vh></v>
<v t="ekr.20050421214921.20"><vh>2004-03-04 Progress report 3-4-04.doc</vh></v>
<v t="ekr.20050421214921.14"><vh>2004-03-04 Iter test code.doc</vh></v>
<v t="ekr.20050421214921.11"><vh>2004-03-04 Design answers.doc</vh></v>
<v t="ekr.20050421214921.23"><vh>2004-03-04 Status report 3-5-04.doc</vh></v>
<v t="ekr.20050421214921.15"><vh>2004-03-05 iterators make positions safe.doc ***</vh></v>
<v t="ekr.20050421214921.18"><vh>2004-03-07 positions can be compatible.doc **</vh></v>
<v t="ekr.20050421214921.4"><vh>2004-03-08 A little gem.doc **</vh></v>
<v t="ekr.20050421214921.17"><vh>2004-03-09 New read logic works.doc</vh></v>
<v t="ekr.20050421214921.8"><vh>2004-03-10 cmp and nonzero.doc</vh></v>
<v t="ekr.20050421214921.24"><vh>2004-03-11 Status report 3-11-04.doc</vh></v>
<v t="ekr.20050421214921.10"><vh>2004-03-11 Compatibility report.doc</vh></v>
<v t="ekr.20050421214921.13"><vh>2004-03-13 Heavy lifting.doc</vh></v>
<v t="ekr.20050421214921.22"><vh>2004-03-14 Small code--big aha.doc **</vh></v>
<v t="ekr.20050421214921.1"><vh>2004-03-15 4-2 liftoff near.doc</vh></v>
<v t="ekr.20050421214921.26"><vh>2004-03-19 The taste of dog food.doc *** (eliminating positions)</vh></v>
<v t="ekr.20050421214921.25"><vh>2004-03-23 Status report 3-23-04.doc</vh></v>
<v t="ekr.20050421214921.3"><vh>2004-03-25 4-2a1 now on cvs.doc</vh></v>
<v t="ekr.20050421214921.2"><vh>2004-03-26 4-2 looks solid.doc</vh></v>
<v t="ekr.20050421214921.5"><vh>2004-05-01 at-file-thin works.doc</vh>
<v t="ekr.20050421221330"><vh>@all directive</vh></v>
<v t="ekr.20050421221330.1"><vh>@file-thin-wait won't work</vh></v>
<v t="ekr.20050421221330.2"><vh>Organizing projects with @file-thin</vh></v>
<v t="ekr.20050421221330.3"><vh>Embedded sentinels are essential</vh></v>
</v>
</v>
<v t="ekr.20050422065602.9"><vh>4.3 (2005) Settings dialog</vh></v>
<v t="ekr.20050425053621"><vh>The essentials</vh>
<v t="ekr.20050422065602.1"><vh>The big questions that Leo must answer</vh>
<v t="ekr.20050422065602.2"><vh>How to read derived files reliably?</vh></v>
<v t="ekr.20050422065602.3"><vh>How to ensure the integrity of data?</vh></v>
<v t="ekr.20050422065602.4"><vh>How to represent clones?</vh></v>
<v t="ekr.20050422065602.5"><vh>How to make derived files friendly to cvs?</vh></v>
<v t="ekr.20050422071828"><vh>How to handle unicode reliably?</vh></v>
</v>
<v t="ekr.20050421214628.1"></v>
<v t="ekr.20050425053635"><vh>About consistency</vh>
<v t="ekr.20050421192149.5"></v>
<v t="ekr.20050421192149.2"></v>
<v t="ekr.20050421192149.10"></v>
<v t="ekr.20050421214921.6" a="M"></v>
</v>
<v t="ekr.20050425060514"><vh>About positions</vh>
<v t="ekr.20050421203956.1"></v>
<v t="ekr.20050421214921.21"></v>
<v t="ekr.20050421214921.15"></v>
<v t="ekr.20050425064819.11"><vh>Missing paper here? Referenced in previous paper</vh></v>
<v t="ekr.20050421214921.18"></v>
<v t="ekr.20050421214921.4"></v>
<v t="ekr.20050421214921.22"></v>
<v t="ekr.20050421214921.26"></v>
</v>
</v>
<v t="ekr.20050425060514.1"><vh>To do</vh></v>
</vnodes>
<tnodes>
<t tx="ekr.20050421192149">What has been gained?

The recent design changes are subtle: much remains from the old 4.0 design.  In this posting I'd like to summarize what has been gained.

1. The design of Leo is complete and solid.  There should be no need for further extended design discussions about the fundamentals of Leo.

2. The "single-owner" rule for clones ensures that .leo files will remain consistent and meaningful even in collaborative environments.  This ensures that @file-thin and @file x.leo (@include) can be made to work reliably.

3. There is now a simple strategy for resolving conflicts: namely the Resolve Conflicts command.   This command will depend neither on detailed information from cvs nor on gnx's.  This command may _use_ such information, but the Resolve Conflicts command will be designed to work even if that information is missing or unreliable.

4. It is now clear that gnx's create only non-essential information such as clone links from .leo files to derived files.  Information used to resolve conflicts is also non-essential.

5. There is a clear plan for changes to Leo's file formats.  In 4.0 sentinels will contain gnx's.  Clone indices will be gone.  Except for these changes the format of derived files will remain the same.  .leo files will contain xml elements needed to recreate non-essential information such as marks and node order.

Edward

Minor choices

There are a few minor choices yet to make about the new 4.0.   Please make your views known.

1.  Should the new gnx's include an id field?

  Now that gnx's give non-essential data it would be conceivable to eliminate the id field.  This would make it a bit more convenient for new users: Leo wouldn't immediately prompt them for an id.  OTOH, I believe this field might be useful in some situations, say the Resolve Conflicts command.  I am inclined to retain the id field.

2.  Make minimal changes to the format of derived files for 4.0?

In 4.0 sentinels will contain gnx's.  Clone indices will be gone.  No other changes are required for 4.0.  I would prefer not to make any changes to make derived files "friendlier" to cvs.  Such changes would make derived files a bit more cluttered, for very little gain.

Edward

Transitioning to the "new Leo"

I plan to implement the new design in the following phases:

Phase 1: Implement the "single-owner" restriction for clones.

This can be done in 3.13: no changes to file formats are needed.   The atFile.read code will no longer do error "recovery".  Experience shows such recovery is useless.  This might be delayed until 4.0.

Phase 2: Revise file formats

This will be the basis of 4.0.  I plan to do this in September.  Derived files will no longer contain child indices.  Sentinels will have full gnx's.  .leo files will contain xml elements needed to recreate non-essential information such as marks and node order.

Phase 3: Implement @file-flat.

Phase 4: Implement @file x.leo

These last two phases could be done in Phase 2, but we'll probably have our hands full with the transition to 4.0.

Edward
</t>
<t tx="ekr.20050421192149.1">A new 4.0?  

I do appreciate people's efforts to revive 4.0.  This shows good "fighting spirit".

It should be possible to get just about everything anyone has ever wanted for 4.0.  I'll be writing up my thoughts in separate, shorter postings in this thread.  I've noticed that I have extreme difficulty following long postings, and I suspect others have the same problem :-)

Edward

P.S.  I am fairly confident that the scheme I am about to discuss will meet with general approval.    I put a question mark in the title to indicate that nothing has been firmly decided.
</t>
<t tx="ekr.20050421192149.10">gnx's MUST DIE

[WARNING: the concerns in this posting are real.  The conclusions are WRONG!  See 2004-02-05]

Standing in the shower this morning I saw again how dangerous gnx's are.

Proof:  Suppose I wanted to create a file called LeoAttic.leo containing old project nodes that aren't very useful but that I wanted to keep around "just in case".  Creating this file _decouples_ the contents of all the nodes in the attic from the ongoing development.  Would I _ever_ want to _recouple_ these nodes?  Absolutely not!  The "old" nodes are _exceedingly_ dangerous!  Bad (old) data are the enemy of all good (up-to-date) data!

As we shall see, the notion of time pervades the entire discussion.

Once I saw how really bad gnx's might be, a whole new train of thought arose immediately:

gnx's are an attempt to solve a problem at the wrong level.  Conflicts are not about nodes, they are about outlines, or even entire projects.

Gnx's are not needed for LeoN.  What we want is collaboration at the outline level, not at the node level.  The identities of outlines and @file nodes change very slowly if at all.  

Why are we so eager to have global node indices?  Why aren't we as suspicious of global nodes as we are of global variables?  What we are asking for is a completely chaotic situation that _falsely identifies_ nodes that a) were created at the same instant and b) now have arbitrarily different data and structure.  In my mind, this is a recipe for disaster.  It's a completely stupid idea.  My criticism can be so harsh because the idea was mine :-)

It is impossible to resolve conflicts between versions of code that vary greatly _in time_.  We know this in our bones!  Programs are really complex, and changing anything can have profound consequences throughout the code.  Yes, even in Python.  The recent fiasco with cut/paste shows this clearly.  You want such fiascos to become routine?  Then start messing with arbitrary conflicting nodes!

We can only resolve conflicts in code that has _recently_ been changed.  Even if merging code separated in time could be done, it would be really foolish to create an environment that makes such a horrid undertaking part of the anticipated work flow.  BTW, the algorithms that Rodrigo has been studying are attempts to synchronize development that is happening "concurrently".  Even that is complex.  Synchronizing present development with work that happened two months ago is futile.

Leo works _because of_, not in spite of, the close relationship between Leo outlines and derived files.  Indeed, Leo outlines guarantee that all derived files are related _in time_.  In other words, Leo ensures that derived files were all written "at the same time" or were all current at the time the .leo file was written.

Consistency is a property of the entire outline, not of parts of it!  In other words, consistency is a global property, not the sum of individual properties of nodes.  N.B.  This is a far different use of the word "global" than in the so-called "global" indices.  You can't recreate global properties by summing the properties of individual nodes !!

We aren't going to cure cvs's problems with a new file format.  LeoN isn't based on cvs, and the "Resolve Conflicts" command really needs to know only that there is, in fact, a conflict between files.  Something like a gnx might be tempting for the Resolve Conflicts command.  I would consider adding some kind of identifying mark to sentinel lines provided that Leo doesn't use such marks to join nodes!

I would be more willing to add _modification_ dates to nodes rather than creation dates.  But adding modification dates is going to make cvs's problems worse, not better.

Similarly, @include is also dangerous.  There is no way to keep @included info joined _in time_.  If we are going to have @include at all, we must _break_ the links between nodes in different files.  For sure we must never create links between nodes that are separated in time.

Conclusions

At last the picture is clear.  Gnx's are dangerous because they join old data to new.

Unless I hear an _absolutely convincing_ argument to the contrary, I plan to abandon 4.0 immediately.  Please note: I will be the sole judge of what "absolutely convincing" means.   This is not a matter for experiment or "muddling through".  I won't even consider gnx's further unless somebody shows why connecting old data to new data makes any kind of sense.  Good luck :-)

No matter how many false starts we have taken with gnx's, this clear result is worthwhile and encouraging.  I trust you will agree with me.

The sooner I stop trying to solve the wrong problems the sooner I can get 3.12 out the door and the sooner we can do LeoN and the "Resolve Conflicts" command :-)

Edward

P.S.  The solution to the attic problem is either:

a) To _throw away_ stuff that is no longer useful (like we should) or
b) To make _dead_ copies of stuff and put them in the attic.

We _never_ want stuff in the attic to come to life automatically.  The consequences of the dead coming to life would be similar to a "Friday the 13th" movie :-)

EKR
</t>
<t tx="ekr.20050421192149.11">In the last day or so I have been reviewing and rewriting all the notes about gti's that have appeared in the Leo Forums.  This posting summarizes what I plan to do and why.  I plan to begin rewriting leoAtFile.py in the next day or so.  This is a good time to make any comments...

1. Global Tnode Indices (gti's) are the defining feature of 4.0.  A "full" gti is a string of the form: "userid:location:timestamp:index" where userid is a cvs name, like edream or dthein, location identifies a location, timestamp denotes a time, and index is an integer used to disambiguate gti's that would otherwise be identical.  The user will specify userid and location strings in a file, say leoID.txt. This file is private: it will not be part of any distribution nor will it be part of cvs.  I'm not sure what Leo should do if it can't find leoID.txt.

Derived files will specify defaults for the userid and location strings.  A minimal gti is a string of the form "::timestamp", where the userid, location and index strings are taken to be the defaults.
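
The gti format described above can be sketched in Python.  This is only an illustration, not code from Leo: the names Gti and parse_gti are hypothetical, and the sketch assumes that no field contains a colon and that an omitted index defaults to 0.

```python
from typing import NamedTuple


class Gti(NamedTuple):
    """A fully expanded global tnode index (hypothetical helper type)."""
    userid: str
    location: str
    timestamp: str
    index: int


def parse_gti(s, default_userid, default_location, default_index=0):
    """Expand a full or minimal gti string using the derived file's defaults.

    Assumption: fields never contain ':' themselves.
    """
    parts = s.split(":")
    if len(parts) not in (3, 4):
        raise ValueError("expected userid:location:timestamp[:index]")
    timestamp = parts[2]
    if not timestamp:
        raise ValueError("a gti must have a timestamp")
    # Empty fields fall back to the defaults given by the derived file.
    userid = parts[0] or default_userid
    location = parts[1] or default_location
    has_index = len(parts) == 4 and parts[3] != ""
    index = int(parts[3]) if has_index else default_index
    return Gti(userid, location, timestamp, index)
```

With defaults "edream" and "home", the minimal gti "::20030218" would expand to the same value as "edream:home:20030218:0".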

2. Leo does not need childIndex values in sentinels in order to reconstruct the outline. Leo can deduce the order of nodes introduced in the derived file by the @others directive.  The order for nodes introduced in the derived file by section references is inessential. Therefore, we may store order information separately in .leo files.  .leo files will have &lt;marks&gt; and &lt;order&gt; elements containing this inessential information.  The &lt;marks&gt; and &lt;order&gt; elements will be a list of gti's of nodes.
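
A rough sketch of how a list of gti's from an &lt;order&gt; element might be applied when the outline is rebuilt.  The function name apply_order and the policy for nodes missing from the list (they keep their relative order and go last) are assumptions made for illustration; the notes above do not specify either.

```python
def apply_order(children, order):
    """Reorder child gti's according to an order list read from the .leo file.

    children: gti strings in the order deduced from the derived file.
    order: gti strings from the order element (inessential information).
    Children not mentioned in order keep their relative order and go last.
    """
    rank = {gti: i for i, gti in enumerate(order)}
    # sorted() is stable, so children sharing the fallback rank stay in place.
    return sorted(children, key=lambda gti: rank.get(gti, len(order)))
```

Because the order list is inessential, a missing or stale list merely leaves the nodes in the order deduced from the derived file.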

3. All essential information (structure and content) of an @file tree must be kept together.  Therefore, derived files must be fat.

@file trees in outlines can be thin. We no longer need information in the .leo file to recreate clone links.

@file-asis and @file-nosent trees in outlines must be thick.  The corresponding derived files contain no sentinels.  The _only_ way to create a thin derived file is to use @file-nosent or @file-asis. This way all essential information is in one place, namely in the outline. 

4. Whether a node is clone or not is a property of the outline in which the node resides; it is _not_ an intrinsic property of a node.  There can be no such thing as a clone index.

Unanswered question:  what happens if a node is used in several derived files and has different text in each?  This issue may arise more often now that outlines don't mirror structure in derived files. This can't be a show-stopper, and it must be handled somehow, if only to warn people away from certain practices...

5. Gti's should allow Leo to handle included .leo files in an outline.  These will probably be represented in an outline as @file x.leo, though perhaps @include x.leo would be more accurate and less likely to cause confusion with the various flavors of @file nodes.

6. There is still the possibility of cvs corrupting the structure of derived files.  The format of derived files should be designed so that Leo can recover from corrupted derived files.  @+nodes will contain only the gti, and the headline will _follow_ the @+node.  This scheme may be expanded in order to facilitate recovering from cvs interference.

7. Some way should be found to eliminate extra blank lines in derived files. This is a long-standing request, and it should be a requirement of 4.0 derived files.  This may involve defining new sentinels to handle whitespace issues.

The following are the goals of the new format for derived files:
a) The minimum of sentinels needed to properly recreate the outline.
b) A robust way of telling whether newlines belong to sentinels or not.
c) A minimum of intrusion and ugliness.
d) No unnecessary blank lines.

8. The code in leoAtFile.py will follow the present model, except that routines may be dispatched using a dispatching dict as in the syntax colorer.  The code in leoFileCommands.py will change slightly (mainly to handle &lt;marks&gt; and &lt;order&gt; elements).  It would be possible to use Python's xmllib or similar modules, but this is a fairly low priority, and not really connected with any other 4.0 design issues.
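
A minimal sketch of what dispatching read routines through a dict could look like.  Nothing here is taken from leoAtFile.py: the handler names, the "#" comment delimiter and the two sentinel kinds shown are assumptions chosen only to illustrate the technique.

```python
def read_start_node(line, log):
    log.append(("start-node", line))


def read_end_node(line, log):
    log.append(("end-node", line))


def read_text(line, log):
    log.append(("text", line))


# Hypothetical dispatch table: sentinel kind to handler.
DISPATCH = {
    "@+node": read_start_node,
    "@-node": read_end_node,
}


def read_lines(lines, delim="#"):
    """Route each line to a handler; unknown sentinels are treated as text."""
    log = []
    for line in lines:
        stripped = line.strip()
        handler = read_text
        if stripped.startswith(delim + "@"):
            # The kind is everything between the delimiter and the first ':'.
            kind = stripped[len(delim):].split(":", 1)[0].split()[0]
            handler = DISPATCH.get(kind, read_text)
        handler(line, log)
    return log
```

Adding a new sentinel then means adding one handler and one dict entry, rather than extending a chain of if statements.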

Edward

P.S. A note to myself:  Python dictionaries will simplify both the read and write code. 
</t>
<t tx="ekr.20050421192149.12">By: edream ( Edward K. Ream ) 
New (long) design notes 
2002-10-21 08:49 
One of the great joys of the Leo project is the way it takes, in unexpected ways and at unexpected times, surprising new directions. The last time a major change in Leo happened was a little more than a year ago when I decided that @file trees were feasible. I believe a similar seismic shift is about to happen. 

There seems to be a natural rhythm involved: expansion and contraction, invention and consolidation/completion, positive and negative. I believe part of this natural rhythm involves forgetting, especially forgetting why things don’t work. Often a slightly new point of view invalidates formerly real obstacles. 

Such changes and rhythms are heralded by largely unconscious thought. Recently there have been a great many requests for user options, as well as other features. I believe this has had the mostly unconscious effect of changing my thinking from “what is the right way?” to “why not do it every way?” Another formerly unconscious impetus for the present avalanche of ideas was the recent question about whether someone had imported the entire Linux kernel into Leo. That brought to my mind several problems with the present way of doing things: 

1. Leo files can get very large. 
2. It can take a long time to read all derived files. 
3. The larger the file, and in particular the more clones, the more time it takes to move cloned nodes. 
4. Leo practically hangs when recovering from read errors. 
5. Using .leo files with CVS is a real pain. I’ve pushed this to the background by promising the “Resolve CVS conflicts” command, but this command may not be easy to do, or even possible. 

These thoughts got me thinking about all parts of Leo’s implementation, especially clones. I recalled the discussion about “global” clone indices. These thoughts have suddenly created a flood of new ideas, in several related directions or themes. 

The major problems facing any new implementation strategy involve clones. At present, clone linking happens entirely within .leo files as the result of redundantly saving all information in the derived files as part of the .leo file. The derived files create content, the .leo file creates clone links, marks, etc. 

This post is already too long, and it is just the introduction. So I am going to break the rest of this posting into 5 themes, each a "response" on this thread. </t>
<t tx="ekr.20050421192149.13">Big picture: why 4.0 is important 
2002-12-03 22:05 

There are many reasons why 4.0 is important. Yes, gti's solve many implementation problems. Yes, gti's allow for much smaller .leo files. But these are minor issues in the grand scheme of things. 

As I see it, the biggest unfinished project is to make Leo suitable for using _everywhere_. In particular, I'd like to see some or all of the Python project done in Leo. Whether or not that ever happens isn't up to me, but this goal keeps me focused. 

As I see it, there are several main drawbacks to doing Python in Leo: 

1. Sentinels in derived files. I'm not sure how serious this issue is in general, and recent developments (@file-asis and @file-nosent) are a pretty complete solution. 

2. Spurious CVS diffs. Without gti's _many_ sentinels change whenever a node gets moved. With gti's sentinels _never_ change. Of course this issue won't arise if, say, Python uses @file-nosent trees, but in general gti's will make using Leo with CVS much easier. 

3. Dreaded read errors. These can and will happen if .leo files aren't downloaded from CVS "in synch" with derived files. Such read errors will disappear completely with gti's. 

So 4.0 will make Leo CVS friendly. In the Open Source world I think this is absolutely essential. 
</t>
<t tx="ekr.20050421192149.14">Actually, my first analysis incomplete.  Suppose we eliminate vnodes completely?  Leo would then redraw the screen directly from the inodes.  This wouldn't be so hard: each inode would contain an list of Tk.Text widgets.  The drawing code merely has to place the an unused widget in the correct place in the screen (the Tk.Canvas).

In some sense, the array of Tk.Text widgets in each inode is like the join list, but only visible nodes need be on this list.  Furthermore, this list only needs to be updated when the outline is actually redrawn.  It would be easy to insert or delete new Text widgets in this list.  There are lots of possibilities, all easy to do in Python.

However, replacing vnodes with inodes is likely to be a very bad idea, for several reasons:

1. As I mentioned earlier, this implementation would require massive changes throughout Leo's code.  All "user" code would have to use an iterator to traverse the outline.  In particular, the fundamental code to manage the outlines would be changed significantly and would almost certainly become more complex.

2.  Inodes complicate Leo from the user's point of view.  The present data model is much better because

** The vnode tree corresponds directly to what the user sees on the screen **

Giving up this correspondence seems like a big step backward.

3.  As mentioned in an earlier post, marks present a problem without vnodes.  Perhaps each inode could contain a list of locations (in the full tree traversal) that should be marked.  However, updating this kind of list when the outline changes could be very complex.  It wouldn't be horrible to say that all joined nodes must be marked in synchronization, but it wouldn't be a step forward.

4. There are other ways of improving Leo's performance without touching the data model at all.  In particular, rewriting the vnode, tnode, atFile and fileCommands modules as C++ code in a Python extension will almost certainly double the speed of key operations.  And as mentioned in the first post, there are optimizations that the vnode class can do to avoid deleting dependent trees and then immediately recreating them.  And don't forget that the average speed of our computers doubles every 2-3 years or so.  So just waiting for a faster machine is a highly effective optimization!

Revised conclusions

Replacing vnodes with inodes is possible, and it might even provide some performance gains for huge outlines containing many clones.  However, replacing vnodes with inodes would be an extremely high risk project: very complex, with possible negative consequences.

The present code base is plenty good enough for most outlines, and there are much simpler and better ways to speed up key outline operations.  The result of all this noodling is that I have no more interest in inodes and their attendant complexities.

Edward
</t>
<t tx="ekr.20050421192149.15">Design notes: tnodes and vnodes

I have just realized that only tnodes need to be uniquely identified with a gnx (global node index).  Vnodes do _not_ need to be so identified.  Indeed, vnodes are now, and can always be, "anonymous".  This will simplify the format of both .leo files and derived files in 4.0.  I shall justify this conclusion in several informal ways:

1. Only tnodes have indices in pre-4.0 .leo files.  This causes no problems whatever.  Indeed, the vnodes section of .leo files shows the nesting structure of vnodes by the nesting of the v tags.  This is all that is required to create vnodes properly.

2. After creating an outline, Leo never at any time needs to refer to a particular vnode "by name."  Instead, Leo simply traverses vnodes using the threadNext, back, next, etc. methods.

3. While we speak loosely of cloned vnodes, what we have in fact are _separate_ vnodes that share uniquely identified tnodes.  In pre-4.0 files, tnodes are identified in .leo files by tx fields.  These indices are generated as needed when Leo writes the .leo file.

4. Conceptually, vnodes are simply locations on the screen (or equivalently, positions in an outline) attached to tnodes that hold body text.  The implementation is different from the concept: headlines are held in vnodes.  This was done for historical reasons: in the Borland version of C it was natural to put headlines in vnodes.

However, it would be more natural to place both headlines and body text in tnodes because cloned nodes must all have the same headline.  In fact, you could say that the present code acts "as if" headlines were really stored in tnodes.  It may be desirable later on (after 3.11.x becomes truly stable) to move headlines into tnodes.  Whatever representation is finally chosen, however, Leo (in particular the read code and the event handlers) will ensure that all vnodes sharing the same tnode will in fact have the same headline.

5. The identity of tnodes (represented by gnx's in 4.0) is sufficient to do everything that Leo needs to do.  A formal proof would be difficult.  An informal proof is easy:  Leo's 4.0 code is very similar to the pre-4.0 code in all respects, and the pre-4.0 code did not use the identity of vnodes in any real way.

For example, when reading a 4.0 derived file, Leo can create (anonymous) vnodes in the outline without any kind of identity whatever.  All that is needed is the nesting structure of vnodes indicated by the sentinel lines in the derived file.
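
A minimal sketch of the data model argued for in points 1-5, in Python.  The names TNode, VNode, join_list and build are hypothetical and chosen only for illustration; this is not Leo's actual code.  The sketch assumes that outline structure arrives as nested tuples, and it shows that vnodes can be created from nesting alone, with only tnodes carrying a gnx.

```python
class TNode:
    """Shared node data, identified by a gnx (hypothetical class)."""
    def __init__(self, gnx, body=""):
        self.gnx = gnx
        self.body = body
        self.join_list = []  # every vnode that shares this tnode


class VNode:
    """An anonymous position in the outline: it has no identifier of its own."""
    def __init__(self, tnode, headline):
        self.t = tnode
        self.headline = headline
        self.children = []
        tnode.join_list.append(self)

    def set_headline(self, headline):
        # Keep all vnodes that share the tnode in step.
        for v in self.t.join_list:
            v.headline = headline


def build(nested, tnodes):
    """Create vnodes from nested (gnx, headline, children) tuples.

    Only the nesting and the tnode gnx are used; vnodes get no names.
    """
    gnx, headline, children = nested
    t = tnodes.setdefault(gnx, TNode(gnx))
    v = VNode(t, headline)
    v.children = [build(child, tnodes) for child in children]
    return v


# Two clones of the same tnode under one root.
tnodes = {}
root = build(("ekr.1", "root", [("ekr.2", "shared", []),
                                ("ekr.2", "shared", [])]), tnodes)
first, second = root.children
first.set_headline("renamed")
```

After the call to set_headline, both clones carry the new headline even though neither vnode was ever looked up by name.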

Summary &amp; Conclusion

However one looks at it, it seems clear that the identity of vnodes is never needed.  Leo's present code (including the 4.0 code) never uses the identity of vnodes, and it is not at all clear how Leo could use the identity of vnodes even if Leo had them.

The only thing that is important is the identity of tnodes.  Leo uses the fact that vnodes share tnodes to create join links and in turn to ensure that all vnodes have the same headline.  I believe the 4.0 code should do essentially the same.

The present 4.0 code needlessly writes gnx fields for vnodes in .leo files and derived files.  I shall remove these fields very soon. This will remove quite a bit of clutter, which is especially important in derived files.  Perhaps more importantly, this discussion shows that the present 4.0 code is already essentially complete.

Edward
</t>
<t tx="ekr.20050421192149.16">Serious objections to link-target nodes?

I am seriously considering adding link-target nodes to Leo 4.x.  As stated in the 10 breakthroughs post, there are a number of advantages to doing so, chiefly the following:

1.  A foundation for making distinctions about kinds of clones.  This may be very important in LeoN.

2.  A way to greatly speed up fundamental outline operations.

3.  A way to unify vnodes and tnodes internally, and a way to simplify the format of .leo files.

The drawbacks:

1.  Slightly different user interface.  Target nodes will have a bulls-eye.

N.B.  The key features of clones will be retained.  Link nodes will appear to have descendents just like today's clones, and you will be able to edit those virtual descendents just as today.

2.  There may possibly be restrictions on using link nodes in LeoN.  No such restrictions will exist in single-user Leo.

3.  This change implies some differences in Leo's data model.  Scripts may be affected slightly.

4.  This change implies some difference in how Leo traverses trees.  By default (and maybe always?) Leo will simply skip virtual descendents during tree traversals.  I'll have a bit more to say about this in another post, coming today.

If there are any serious objections to this plan I'd like to hear them immediately.

Edward

P.S.  Single-user Leo will probably always have the option of using old-style clones.  Certainly this option will be available for the foreseeable future.

EKR
</t>
<t tx="ekr.20050421192149.17">Many, many thanks for this posting.  I was beginning to think people didn't care about the end of 4.0 :-)

Let me assure you, LeoN is important to me, and I shall be glad to support it in any way I can, _including_ adding "names" that are exactly like the "old" gnx's.  The point of the original post was not that any particular tool or technique was wrong.  Such an idea would be brain-dead.  Rather, the main idea was this:  it would be _fatal_ to Leo to confuse old data with new data.  We simply must not allow this.

&gt; Still, I strongly dislike this idea that the whole of the .LEO is a monolithic, binary block which changes drastically with each little change in the LEO source tree.

My original mistake was thinking that gnx's would allow Leo to link cloned nodes reliably across different files.  That idea was COMPLETELY WRONG.  Old nodes are poison: we must never link to them.  So it is not the gnx's themselves that are dangerous, it is using gnx's to link nodes that is wrong, wrong, wrong.

This is basically a database discussion.  Whatever the means, we must maintain a single, unified and consistent view of all the data that we presently keep in a .leo file.  Failing to do this kills the Leo project.  I can see no way to maintain consistency of clone links when these links somehow reside in distinct files that may be changed arbitrarily at different times.  No distributed database could possibly exist in this kind of chaotic environment.

In short, CONSISTENCY OF DATA is driving everything.  The discussion about "worse is better" is irrelevant here.  We are not talking about marketing or hype.  We are talking about the engineering foundations of Leo.  Also, the question of "heroic" solutions does not apply here.  I didn't kill 4.0 because I was timid, I killed 4.0 to protect the consistency of Leo's data.

Please note:  cvs also creates a "unified" view of data.  Yes, cvs may manage many data files, but for each file cvs presents a consistent picture.  Moreover, cvs forces users to update before committing, so that changes always happen "at the same time".

&gt; For me, LEO nodes can serve as a "grand unifying concept"

No.  You can't unify data using "atoms".  Consistency of data is a global property, especially where clones are concerned.  And it's not just the "identity" of data that is important.  The data must be "up-to-date" as well.  You CAN NOT create a distributed database when that database does not control each of its parts!

&gt; the current implementation of LEO is totally unsuited for representing large source trees, and even more so for multiple people working on the same tree.

As you know, I've spent quite a bit of time thinking about alternative representations of data in Leo.  However, I am inclined to disagree with you.  Sure, Leo would bog down dealing with huge outlines, but so what?  Large enough data files are going to break any implementation.  And I fail to see how the present implementation would discourage the LeoN project.  Leo's vnodes and tnodes have proven themselves to be robust views of Leo's data.

&gt; To summarize this post: 

1. 4.x is a step to make LEO more modular. A monolithic LEO file is unsuitable for collaboration without massive additional tool support. 

2. Factoring out suboutlines as self-contained LEO files, which seamlessly integrate with the super-outline, is a relatively simple way to both get "low-tech" collaboration support and a dramatic increase in scalability.

Alas, both of these points ignore the fundamental problem of ensuring the consistency of Leo's data.

Edward

P.S.  I believe that the "Resolve Conflicts" command should be relatively straightforward.  The key idea is that this command should _not_ try to guess the intentions of programmers.  Rather, this command will display differences between outlines in some simple way and leave it to programmers to resolve those differences.

EKR
</t>
<t tx="ekr.20050421192149.18">This really is a continuation of the thread Big picture: why 4.0 is important.

Yesterday, after writing that gti's don't solve all CVS problems, I had another Aha regarding 4.0 derived files.  This is still not a complete solution, and it may be a big step in that direction.  Recall that even with gti's two problems with CVS remain:

Problem 1: moving nodes can cause @node sentinels to change.
Problem 2: CVS "helpfully" inserts lines into a file when it detects a conflict.

These two problems are related in a nasty way:  CVS can corrupt the structure of  changed sentinels!

I was lightly dozing in the middle of the day, thinking quite vaguely about these problems when I suddenly realized both problems can be made to go away!  With gti's there is really no need to specify outline structure in derived files at all!  We could adopt the following rule:

Structure rule: derived files specify content; the outline specifies structure.

The structure rule can be made to solve both Problem 1 and Problem 2 as follows:

1. Leo 4.0 could generate just a single @gti sentinel for each section reference (or for body text in @file-noref trees).  This sentinel will contain only the gti of the defining vnode.  This gti doesn't change no matter how the outline is reorganized, so CVS will never alter it!

2. For documentation purposes, Leo should generate the headline text of the vnode as a comment following the @gti sentinel.  Headlines _can_ change, so CVS might alter such comments when CVS detects a conflict, but changing this comment will _not_ corrupt the @gti lines!  Robust recovery from CVS meddling may be possible.

3. Similarly, @ref sentinels could represent the actual reference in the body text.  Again, the @ref sentinel would be followed by the actual text of the reference, so the actual @ref sentinel will never be altered by CVS.

The result is a radical simplification of derived files.  At present (without gti's) Leo is forced to represent outline structure using nested @node sentinels and @body sentinels.  When outline structure changes, arbitrarily many of these sentinels can change.   In the new scheme, all these sentinels can be replaced by a single @gti sentinel that can never change.


Earlier I said that the Aha (i.e., the structure rule) does not solve all problems:

1.  CVS could still corrupt @gti sentinels if it decides that a range of lines including @gti sentinels have changed.  I'm not sure exactly what to do about this, but clearly the structure rule will reduce the number of times this will happen.  Indeed, rather than posing problems for CVS's diff (as changed @node sentinels presently do), @gti lines will provide "islands of stability" for CVS.

2. As always, there is the problem of keeping .leo files and derived files in synch.  Clearly, this problem can never go away completely.  What Leo must do is to provide ways of a) discovering out-of-synch conditions and b) recovering in a straightforward way.  With the new scheme, we know we are out-of-synch if an @file or @file-noref node refers to a gti not found in the derived file, or conversely, if the derived file contains an @gti sentinel with no corresponding tnode in the outline.  The outline and derived files could be out-of-synch in other ways that would be undetectable.  For example, suppose the only change to an outline is that a node was moved.  This will cause no changes to gti's.  On the other hand, such out-of-synch conditions might safely be ignored.  I'm not sure about this though...

We might try to associate a global time stamp (using the same techniques used to create gti's) with @file nodes and @file-noref nodes.  Leo's atFile read logic can then determine whether the derived file was created by the @file node.  However, this is a dubious idea: if this timestamp is represented in the derived file then CVS will complain and interfere when it changes...

3. If we adopt the structure rule then outlines must specify the structure of all @file trees, just as they do now.  In particular, @file and @file-noref nodes can _not_ be placeholders.  But we have the option of not saving body text that is used _only_ in @file and @file-noref trees. This has the potential to radically reduce the size of .leo files.

However, removing information from .leo files makes it more difficult to recover from errors.  I think the solution to this kind of dilemma is an option, either in leoConfig.txt or in the .leo file itself, specifying whether the .leo file will contain all information (that is, whether some body text may be deleted).  I'm not sure about what this option should be when using CVS.

4. With or without the structure rule CVS can still interfere with .leo files in unpleasant ways.  However, the structure rule makes it impossible to use placeholders for @file and @file-noref nodes, so the structure rule ensures that LeoPy.leo must be part of Leo's cvs tree.  This might be considered a step backwards.
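
The out-of-synch test described in point 2 above amounts to comparing two sets of gti's.  Here is a minimal sketch in Python.  The spelling of the sentinel line ("#@gti" followed by the gti) and the function names are assumptions made for illustration; the post does not fix the exact syntax.

```python
import re

# Hypothetical sentinel spelling: "#@gti", whitespace, then the gti itself.
GTI_LINE = re.compile(r"^#@gti\s+(\S+)")


def gtis_in_derived_file(lines):
    """Return the set of gti's named by @gti sentinels in a derived file."""
    found = set()
    for line in lines:
        m = GTI_LINE.match(line)
        if m:
            found.add(m.group(1))
    return found


def check_synch(outline_gtis, derived_lines):
    """Return (gti's missing from the file, gti's missing from the outline)."""
    derived = gtis_in_derived_file(derived_lines)
    outline = set(outline_gtis)
    return outline - derived, derived - outline


derived_lines = [
    "#@gti ekr.1  # root",
    "print('a')",
    "#@gti ekr.3  # helper",
    "pass",
]
missing_from_file, missing_from_outline = check_synch({"ekr.1", "ekr.2"}, derived_lines)
```

Both result sets are empty exactly when the outline and the derived file agree on which gti's exist.  As point 2 notes, a moved node changes no gti, so this check cannot see it.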


To summarize:

1. The structure rule completely solves Problem 1, and goes a long way towards solving Problem 2.
2. The structure rule greatly simplifies derived files.
3. The structure rule allows smaller .leo files, at the cost of making error recovery more difficult.
4. The structure rule would require that LeoPy.leo be part of Leo's CVS tree.

An important way to evaluate designs is whether the design is "headed in the right direction", that is, whether the design is tending to become more complex or less complex.  Heading in the right direction is always important because simplifications tend to suggest further simplifications.  I have some hope that further improvements may be possible...

Clearly, the structure rule greatly simplifies derived files.  As a result, both the atFile.read and the atFile.write code will become simpler.  The structure rule simplifies error recovery by essentially making it impossible!  Either body text for a gti appears somewhere (in the .leo file or the derived file) or it doesn't.  If it doesn't, then we have an out-of-synch condition for which no error recovery is possible.  We can make sure that .leo files contain all the text for all their gti's by writing all body text to .leo files, just as is done now.  Even so, out-of-synch conditions could still happen if the derived file defines a gti that does not appear in the outline.  In that case the outline is missing a node used to create the derived file and no recovery is possible.  But similar situations exist today, so the new way is no worse than the old.

To summarize further: the structure rule appears to be headed in the right direction, and only experience will tell whether error recovery considerations will allow us to use "small" .leo files.  The structure rule appears to require that LeoPy.leo remain a part of Leo's CVS tree, which might be considered a step backward.  I doubt that CVS will ever handle XML files well.  On balance, I think the structure rule is worth implementing to see what happens.

Now is a good time for your comments.

Edward
</t>
<t tx="ekr.20050421192149.19">Many thanks, Paul, for these thoughtful comments.

&gt; Speaking off the top of my head, I understood than Gnx's mainly attempted to solve the problem of false conflicts in CVS raised when the local structure around a node changed but the text did not. eg in a derived file.

I had _many_ fond hopes for gnx's.  This was one of them.

Let me be clear:  there is still plenty of room for invention concerning Leo file formats, data structures, whatever.  Today's "aha" was simply a realization that it is stupid to rely _too much_ on a particular kind of data.  We could, in fact, put something that looks like a gnx into derived files _provided_ that we don't use the "new gnx"  stupidly.  

We must not use the new gnx's to create clone links, but I see no harm in using them as aids to the present "mirroring" scheme.  I trust you see the importance of the distinction.  The mirroring scheme relies on data _in the .leo file_ to create links, especially clone links.  The forbidden gnx scheme threw all that data away pursuing a fool's errand.  The fundamental reason the mirroring scheme works is that all the data that must be consistent are in a single outline.

&gt;  The situation in a .leo file is worse because the resulting CVS conflict marker would break the XML structure.

My present opinion is that trying to recover from the damage cvs may do to _any_ file is futile.  Better to take the much simpler approach of using cvs conflicts merely to signal that we must do conflict resolution.  Conflict resolution should be done on two successive, undamaged versions of a file.  Both adjectives are important. "Successive:"  We don't want to do anything but the simplest merges.  Note that cvs enforces this constraint by requiring updates before commits.  "Undamaged:"  It's so much easier just to ignore cvs's notion of what is going on.  What is _really_ going on is that the user has changed, inserted, deleted _outlines_, not just blocks of text that happen to have these confusing sentinels in them ;-)

&gt; There are two problems here,

1. Structural changes cause node ID's to change
2. CVS conflict markers break XML structure.

As I indicated in my reply to Rich, there is a good chance that we can make problem 1 go away, and in so doing clean up the appearance of derived files.  I'd like to sidestep problem 2 completely by dealing with only undamaged files.  I think this is entirely reasonable.

Furthermore, the old gnx's would not really have made the Resolve Conflicts command easier to do.  In fact, they may have misled us into considering algorithms that were fundamentally flawed.  Make no mistake, the Resolve Conflicts command is non-trivial.  I am only suggesting that the real problems with the old gnx's would likely have made gnx's useless (or even worse than useless) for resolving conflicts.

My vision is this:  The Resolve Conflicts command _must_ rely on the user to make sense out of the differences between two conflicting files.  All we can expect is that the Resolve Conflicts command can display the _approximate_ differences between two outlines.  The user is going to have to sort out what is important and accurate.  At worst, two users are going to have to email back and forth to discuss what actually was intended.  N.B.  This is _exactly_ the worst case in how people presently use cvs.

In other words,  diffs are _always_ and always _inherently_ approximate, so the "pseudo precision" of the old gnx's was always going to be nothing but a red herring.

&gt; What if we had a two component GNX?

My first, and probably last, reaction is that this kind of scheme is way too heroic.  I don't see how it could help, and even if it could help we don't want to put ugnx's in derived files.  Also, I've considered various schemes to "put all the pollution in the derived file in one place".  I don't like the whole notion, though I play with it from time to time.

No.  The way to solve problems is not with heroism but with fundamental simplicity.  If we accept that the "Resolve Conflicts" command will work only on undamaged files we get the following benefits:

1. We can play with schemes to simplify derived files still further.  This won't really change anything, and it would placate those who dislike (oh horrors!) the data that Leo puts into derived files.  And it would make cvs a bit happier, though not enough happier to make a real difference ;-)

2. These schemes can add new data to .leo files.  We can "think these thoughts" because the unrealistic expectations for gnx's have died.

Summary

The "Resolve Conflicts" command should work only on undamaged files.  I am willing to consider adding data either to .leo files or derived files to help the Resolve Conflicts command, provided we don't have unrealistic expectations about what that data can do.

Paul, was it you who pointed out fundamental problems with resolving conflicts, even using "plain" cvs?  I think we should remember those fundamental limitations and design a Resolve Conflicts command that relies on the user to do what people do best, namely understand the _meaning_ and _intention_ of changes to code.

Edward
</t>
<t tx="ekr.20050421192149.2">4.0 is dead! Long live Leo!

[WARNING: the concerns in this posting are real.  The conclusions are WRONG!  See 2004-02-05]

I have been growing increasingly uneasy about 4.0.  The thoughts weren't fully formed, and I had vague worries about gnx's creating problems similar to the ill-fated backup .leo files.

Last night, the picture suddenly became clear.   Gnx's have no chance of working.  Don't panic; good things will come of this.


Conflicts: the fatal flaw

Lying in bed last night I saw this simple picture:  two versions of the _same_ node (two nodes with the same gnx), each having different subtrees!

Yes, this is possible.  There is nothing to prevent two people from editing separate copies of a .leo file so as to create different children for any node.  People can rearrange .leo files in endless ways.

Unless I am greatly mistaken, there is no way of resolving all the messes that could result.  This is the kind of problem that invalidates an entire design.  Gnx's are history.

In retrospect, the situation seems clear:  it is not nearly enough to identify nodes uniquely.  Entire trees must match in outlines and derived files.   Identifying _individual_ nodes as "the same" in no way guarantees that the trees of which they are a part have similar shape or contents.  End of story.  End of design.
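
The conflict can be made concrete with a small sketch in Python.  It assumes each outline is given as nested (gnx, children) tuples; the function names are hypothetical.  The sketch only _detects_ nodes that share a gnx but have different children in two copies of an outline.  It makes no attempt to resolve them, which is the hard part.

```python
def children_by_gnx(tree, table=None):
    """Map each gnx in a nested (gnx, children) tuple to its list of child gnx's."""
    if table is None:
        table = {}
    gnx, children = tree
    table[gnx] = [child[0] for child in children]
    for child in children:
        children_by_gnx(child, table)
    return table


def structural_conflicts(tree_a, tree_b):
    """Return the gnx's present in both trees whose child lists differ."""
    a = children_by_gnx(tree_a)
    b = children_by_gnx(tree_b)
    return sorted(g for g in a if g in b and a[g] != b[g])


# Two people edit separate copies of the same outline.
mine = ("n1", [("n2", []), ("n3", [])])
theirs = ("n1", [("n2", [("n4", [])])])
conflicts = structural_conflicts(mine, theirs)
```

Here the nodes n1 and n2 are "the same" in both copies by gnx, yet each has a different subtree in each copy.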


Long live Leo

Last night I was more relieved than upset.   Maybe I was confident that something good would appear.  Maybe I was just grateful to see the true situation clearly.  More likely, my intuition has known for quite awhile that gnx's wouldn't work.  Anyway,  when I awoke this morning a completely new train of thought appeared.  It went something like this:

1. Gnx's won't work, so the problem they solved must be solved anew.  The fundamental problem that gnx's actually _would_ have solved is creating sturdy links between nodes in derived files and nodes in outlines.

2. Alas, we can easily imagine those "sturdy links" creating conflicts when reading derived files:  the structure of cloned nodes in a tree won't match the structure of the cloned nodes in the derived files.  The read code is toast: it has no way of recovering.  This is the "dreaded read error" with a vengeance!

3. Therefore, we are stuck with the "mirroring" scheme used in all recent versions of Leo:  clone links must be contained in the .leo file, not in derived files.  What if we make a virtue out of necessity?  That is, what if .leo files once again become the primary source files?

We can remove _almost all_ sentinel lines from derived files!

Could the read code detect changes made to the derived file and incorporate those changes into the outline?  Yes, it could!  Leo would only need #@ lines in derived files that denote the start of a node.  These lines would contain no other information: just raw marker lines.  We could follow those lines with #@&lt;&lt;name&gt;&gt; lines to denote the section names, but such lines would be strictly optional.  

Leo's atFile.read code would first ensure that the number of nodes in the derived file matched the number of nodes in the outline.  If not, no simple "untangling" is possible, and a warning message would be sent to the log pane.  Otherwise, Leo would compare the text section by section, and replace the outline with new text in the derived file as needed.  This would be a major simplification of the read code!

4. What if we do the unthinkable and remove _all_ sentinel lines from the derived files?

There are several consequences:

A: @file nodes would become @file-nosentinel nodes by default.  Don't worry!  For the foreseeable future Leo will allow you to write @file-sentinel nodes by default if you want.

B:   Derived files become "clean"; Leo adds nothing to them.   This is _crucial_ for the wider adoption of Leo.  Aside:  I do like the context provided by #@&lt;&lt; name &gt;&gt; comments.  There could be an option to write such lines in @file-nosentinel files.  Such extra lines don't matter at all if Leo never reads those derived files.

C:  Leo will  load .leo files much more quickly.  The pass that loads @file nodes will do nothing.

D:  We can't update outlines from changes made to derived files; no more automatic untangling.

This last consequence seems severe.  However,  recent developments make it seem bearable, for the following reasons:

A:  Leo's Open With command provides an easy way of updating derived files outside of Leo and then integrating the changes back into the Leo outline.

B:  The new workflow has almost entirely eliminated the need for me to make changes to derived files outside of Leo.  I run tests in a separate copy of Leo, and if that copy gets corrupted I make changes in an earlier version of Leo that still works.

Yes, I must remember not to close Leo.  (Maybe a new Inhibit Close command would prevent me from doing something that I don't want to do.)  Anyway, even if I forget and do close Leo with a corrupted copy of leo.py, all I would need to do is open LeoPy.leo from a non-corrupted copy of leo.py.

So there is less need for reading derived files when opening a .leo file.   We might rely on an explicit Untangle command to update the outline as needed, provided of course, that at least minimal sentinels were written to the derived files.
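
The minimal-sentinel read described in point 3 above, which an explicit Untangle command could reuse, can be sketched in Python as follows.  The sketch assumes that each node starts with a line consisting of just "#@" and that the outline's body texts are available as a list in the same order; the function names and the log list are hypothetical.  Optional section-name lines are not handled.

```python
MARKER = "#@"  # raw marker line that starts each node


def split_sections(lines):
    """Split derived-file lines into one body string per marker line."""
    sections = []
    for line in lines:
        if line.strip() == MARKER:
            sections.append([])
        elif sections:
            sections[-1].append(line)
        # Lines before the first marker are ignored in this sketch.
    return ["\n".join(body) for body in sections]


def update_outline(bodies, derived_lines, log):
    """Return (new bodies, indices of changed nodes).

    If the node counts differ, warn and leave the outline untouched.
    """
    sections = split_sections(derived_lines)
    if len(sections) != len(bodies):
        log.append("warning: node count mismatch; no untangling possible")
        return list(bodies), []
    changed = [i for i, (old, new) in enumerate(zip(bodies, sections))
               if old != new]
    return sections, changed


log = []
bodies = ["a = 1", "b = 2"]
derived = ["#@", "a = 1", "#@", "b = 3"]
new_bodies, changed = update_outline(bodies, derived, log)
```

Matching by position is what makes this read simple, and also what makes it fragile: a single inserted or deleted marker line disables the update for the whole file.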


Leo and cvs

I am considering making @file-nosentinel files the standard way of interacting with cvs.  Let's look at the consequences:

1. Cvs conflicts will have less direct effect on derived files.  Cvs conflicts alter derived files, and must be dealt with, but the conflicts will not affect how Leo reads the corresponding .leo file.

2.  We should be able to design a Resolve CVS Conflicts command to deal with such conflicts in a semi-automatic way.  This command might just set up a script-oriented Find/Change panel.  The script would walk through the derived file and present changes in the outline, using the code for the Go To Line command.

Leo should be able to open plain text files in a plain text window.  This should have been done long ago.  We may want to open the conflicting derived file in such a text window while executing the Resolve CVS Conflicts command.

3. Cvs is sufficient to manage development of derived files!  It no longer has to worry about tracking clones across files.  This is a _major_ step forward.

4.  As before, cvs is completely incompetent to handle changes or conflicts in .leo files.   In some ways, the situation is just the same as it has always been.   In some ways, the situation is actually better, because cvs can handle derived files properly.

A manual solution is viable in the short term: developers would see the conflicts in derived files, agree which conflicting .leo file to use as the "base" .leo file, and put the resolved code into the base .leo file by hand.

Longer term, we clearly need to implement the xmldiff approach to resolving conflicting .leo files.  It should be possible to run xmldiff from within a reference copy of LeoPy.leo, i.e., a copy not affected by the cvs conflicts.  N.B.:  the xmldiff command or script can be self contained and will affect no other part of Leo's design.


Questions to consider

1.   How important is it to have the option of automatic untangling when reading .leo files?  Are the advantages gained in dealing with cvs enough to overcome the inconvenience of not having automatic untangling?

2.  Supposing that Leo does write sentinels, what format should be used?  The choices:

A:  Use the old way.  This is fairly appealing visually and it robustly identifies structure.

B:  Use the minimalist way using only #@ markers, possibly followed by #@&lt;&lt;name&gt;&gt; lines.  This way probably isn't robust enough for automatic untangling, and it could be used to support an explicit Untangle @file Node command.

C:  Use the way developed for 4.0, modified so that it doesn't use gnx's.  I am not comfortable with this approach.  It seems to be neither as visually clear as choice A nor as resistant to cvs conflicts as choice B.  My vague unhappiness with this format has been growing for quite some time.


Summary and conclusions

Gnx's create problems that cannot be resolved.  My intuition is relieved to be rid of them and I am not looking for ways to bring them back.  Yes,  I am open to ways of resuscitating 4.0, and I don't expect that this can be done.

The section called "Long Live Leo" proposes new ways of using Leo.  These changes are a major new way of understanding how Leo fits into the world.  These changes imply no major changes to Leo's code base.

These changes will create many good things in the long run, regardless of how unsettling they may be in the short run.  My intuition tells me that the "new old Leo" is a big step in the right direction.

Using clean derived files, devoid of all sentinels, may be best when using cvs.   Clean derived files promise to simplify how people collaborate using Leo.  Resolving cvs conflicts _inside Leo_ becomes feasible using relatively simple scripts.

Your comments and suggestions are very important now.  I shall make no major changes in Leo's code until your comments have sunk in thoroughly.

Edward

P.S.  I envisage continuing the 3.x version numbering for the foreseeable future.  It may well be that 3.12 can come out soon.  I'd like that very much.

EKR
</t>
<t tx="ekr.20050421192149.20">Using gnx's safely

[Warning: these are obsolete ideas:
- thin derived files carry all essential info, including structure and content.
- The root @thin node in an outline carries only marks, expansion state and uA's]

My initial intuition wasn't completely faulty, I think:  using gnx's naively has the potential to create almost unlimited chaos.  The problem is this:  if "flat" (derived) files contain embedded structure, the structure implied by a derived file may not match the structure of the corresponding (cloned) nodes in a .leo file.  As I have said before, this situation may result in an extreme form of "dreaded read errors".  Note that such mismatches of structure may occur even if @file-thin is in effect.  When many people are editing flat files simultaneously the potential for conflicts seems almost unlimited.

Maybe we should revisit an old idea, namely that outlines should carry structure and that flat files should carry content.  The implications:

1. There will be no @file-thin or @file-thick options.  Outlines, not flat files, will be the _only_ determinant of outline structure.

2.  Flat files will use gnx's _only_ to delimit text, not to show structure.  This will greatly simplify the format of flat files.  Indeed, about the only sentinels will be:

#@+ &lt;tnx&gt;
#@- &lt;tnx&gt;

These mark the start and end of body text for the tnode with the given tnx.  There will be a few other sentinels, roughly the same as presently, for marking section references, verbatim escapes, etc.  What will _not_ be part of flat files are the sentinels that describe nested vnodes.  Note: we still have to handle the _effects_ of nested vnodes, which is why we need #@- &lt;tnx&gt; sentinels and @ref sentinels.

3.  Reading a flat file might be considerably simpler than the present atFile.read code.  The code need not recreate structure; it need only update the contents of tnodes that exist in the outline.  It might be that the new simpler code could handle cvs conflict markers in a separate pre-scan.

4.  This scheme works well with the Resolve CVS Conflicts command that I just described on a recent thread.
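
As a rough illustration of point 3 (a sketch only, not Leo's atFile.read code), a reader for such content-only flat files might look like the following.  The "#@+ tnx" and "#@- tnx" spellings come from the description above; the name read_flat_file, the rule that nested text belongs to the innermost open tnx, and the sample tnx values are all assumptions made for this example.

```python
def read_flat_file(lines):
    """Return a dict mapping each tnx to the body text found between
    its '#@+ tnx' and '#@- tnx' sentinels.

    Assumption: text inside a nested region belongs only to the
    innermost open tnx; the outer body resumes after the inner '#@-'.
    """
    bodies = {}   # tnx -- list of body lines
    stack = []    # tnx values of the regions that are currently open
    for line in lines:
        stripped = line.strip()
        if stripped.startswith('#@+ '):
            tnx = stripped[4:].strip()
            bodies.setdefault(tnx, [])
            stack.append(tnx)
        elif stripped.startswith('#@- '):
            tnx = stripped[4:].strip()
            if not stack or stack[-1] != tnx:
                raise ValueError('mismatched sentinel: ' + stripped)
            stack.pop()
        elif stack:
            bodies[stack[-1]].append(line)
        # Text outside every region is ignored in this sketch.
    if stack:
        raise ValueError('unterminated region: ' + stack[-1])
    return {tnx: '\n'.join(text) for tnx, text in bodies.items()}


sample = [
    '#@+ ekr.1',
    'def spam():',
    '#@+ ekr.2',
    '    return 1',
    '#@- ekr.2',
    '# end of spam',
    '#@- ekr.1',
]
result = read_flat_file(sample)
```

The reader never creates or moves nodes: it only yields new body text for tnodes that the outline already knows about, which is the simplification described in point 3.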

We might not want this simplification if Gil can convince me that Leo can easily represent conflicts in tree structure.

Edward
</t>
<t tx="ekr.20050421192149.3">Links, clones &amp; mode bits

Today I wrote a script that scans a .leo file looking to see how clones are used.   Something like this script could be used to convert clones to links.

The results were most interesting:

targetsInDerivedFiles: 418
 [snip]  Clones for which exactly one item on the join list is in an @file tree.

clonesInNoDerivedFiles: 20
[snip]  Clones for which no item on the join list is in an @file tree.

clonedAtFileNodes: 9
[snip]  Cloned @file trees themselves.

multipleTargetsInDerivedFiles: 8
    &lt;&lt; Append any unused text to the parent's body text &gt;&gt;
    &lt;&lt; Check both parts for @ comment conventions &gt;&gt;
    &lt;&lt; Compare single characters &gt;&gt;
    &lt;&lt; Set the default directory &gt;&gt;
    class nodeIndices
    frame.OpenWithFileName
    recentButtonCallback
    replacePatterns

The sections in multipleTargetsInDerivedFiles show a bug in how I have been using Leo (!)  For example, OpenWithFileName is indeed defined twice in LeoPy.leo (!!)  Can you see how this happened?

This script shows that the vast majority of clones could be converted to links automatically, simply by picking as the target the unique node on the join list that appears in some @file tree, including cloned @file nodes themselves.  multipleTargetsInDerivedFiles indicate problems with the present version of LeoPy.leo.  clonesInNoDerivedFiles are clones that are "floating" without any target in any derived file.  The clones2links script could pick one at random to be the target without great harm being done.  This might be subject to the constraint that no link can point to an ancestor, something for which the present test script did _not_ check.
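
The script itself is not reproduced here.  The following is only a sketch of the classification it performs, written against a hypothetical data model: Member, classify_join_list and pick_target are names invented for this example, and the rule that a join list containing an @file node counts as clonedAtFileNodes is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for a cloned vnode: its headline, whether it is
# an @file node itself, and whether it lies inside some @file tree.
Member = namedtuple('Member', 'headline is_at_file in_at_file_tree')


def classify_join_list(join_list):
    """Return the category name for one join list of clones."""
    if any(m.is_at_file for m in join_list):
        return 'clonedAtFileNodes'
    count = sum(1 for m in join_list if m.in_at_file_tree)
    if count == 0:
        return 'clonesInNoDerivedFiles'
    if count == 1:
        return 'targetsInDerivedFiles'
    return 'multipleTargetsInDerivedFiles'


def pick_target(join_list):
    """Return the unique member inside an @file tree, or None when the
    choice is not automatic (no such member, or more than one)."""
    inside = [m for m in join_list if m.in_at_file_tree]
    return inside[0] if len(inside) == 1 else None
```

A clones2links conversion would call pick_target on each join list and fall back to some other policy (a random pick, or an error report) whenever it returns None.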

So I am beginning to think that links might actually be better, in some sense, than clones.  Leo would refuse to allow targets that appear twice in derived files, whether twice in the same derived file or in two separate derived files.

N.B.  This is special pleading!!  This argument is _not_ disinterested.  I am considering reneging on my statement that clones will remain as they are in the single-user version of Leo.  The problem is that this is going to lead to all sorts of coding problems.  In effect Leo would need one or more "mode bits" that indicate which flavor of code to use:

- use clones or use links.
- use gnx's or use file indices.

The present code uses the app().use_gnx switch to determine whether to write gnx's or not.  This was always intended as a _strictly temporary_ expedient.  Enshrining such mode bits as part of Leo would be very bad practice.  For example, the recent train wreck involving cut &amp; paste of nodes was probably related to app().use_gnx:  the old code works only if the tnodesDict is cleared, but that wasn't done when use_gnx was false.

In any event, there is no way I am going to allow even 1 mode bit in Leo on a permanent basis.  Each mode bit in effect doubles the number of paths through the code.  Sure, we must have backward compatibility, but that kind of special-case code can't be helped.  Mode bits must be avoided.

Edward
</t>
<t tx="ekr.20050421192149.4">Please read: cold feet

After all the interesting design work with shared vnodes,  I am having major doubts that such a change to Leo is wise.  Here are my concerns:

1.  Using shared vnodes will likely cause all kinds of subtle compatibility problems:  Eliminating tnodes (even with the self.t = self hack) has the potential to create complex and subtle changes in the meaning of code.  For example, at present both vnodes and tnodes contain status bits.  In particular, merging the vnode and tnode "visited" bits into a single vnode is going to affect existing code.  I am already seeing this kind of problem as the result of comparatively minor changes needed for 4.0.  Eliminating tnodes altogether makes me extremely nervous.

2. Sharing vnodes means that different nodes on the screen are _really_ identical.  For example, with a shared vnode scheme all shared vnodes would have to be marked if any shared vnode were marked.  This would be a small violation of what users might expect.

3.  The present separation between vnodes and tnodes is actually quite natural.  Tnodes represent shared information.  Vnodes represent nodes on the screen.

4.  Although theoretically interesting, major changes are needed neither to support 4.x nor to support LeoN.

In short, sharing vnodes looks like all pain and no gain:

1. The potential for a major disruption to Leo's progress is very real, in spite of the trick of making positions look like vnodes.

2.  Scalability issues do not seem pressing now.  Moreover, there may be less drastic ways of improving Leo's performance besides changing Leo's fundamental data model.

3.  Abandoning the shared vnode scheme would mean that I could release 4.0 beta 1 in a matter of a week or three.

I am going to let these thoughts and fears sink in for a few more days at the least.  There is no need for another quiet time.  Indeed, I encourage your comments on this vital subject.

Edward

P.S:  Leo's file formats are not, in fact, strongly connected to how Leo represents data internally.  We are pretty much free to represent data in .leo files and derived files as we like: Leo's various kinds of read/write can easily and cleanly translate from any external file format to/from any internal data representation.

EKR
</t>
<t tx="ekr.20050421192149.5">Note: a copy of this will be on Leo's wiki shortly.  I am posting this here because I think it is important that everyone sees it.

Yesterday in the bath I was mulling over how Leo should deal with conflicting contents of the "same" node (nodes with the same vnx or tnx).  In particular, I was considering Gil's picture of conflict nodes on Leo's wiki.

I didn't understand a lot of the details and assumptions about that picture, and I was wondering how to go about understanding the details, when I suddenly realized that it wouldn't matter if I _did_ understand the details.  Any conflict scheme based on a complex model, with complex operations, implementing subtle distinctions would fail completely.  To be useful to the user, conflicts must be dead simple to see, to use and to understand.  Everything must be obvious and intuitive to the _naive_ user.

This conclusion is based on my own experience with "dreaded read errors" (errors that arise in 3.x as the result of mismatches between the outline structure in .leo files and the corresponding derived files.)  Here I was, the creator of Leo, with intimate knowledge of all aspects of Leo and its implementation, and my one and only reaction to a dialog asking me to make a choice was helpless, blind panic.  The dialog didn't provide me with nearly enough information to make a proper choice, and the panic that the dialog induced in me would have prevented me from making an informed choice even if I did have the missing information!

A similar situation pertains regarding conflicts.  There is no use in having fancy distinctions about kinds of conflicts.  The user won't understand those distinctions, no matter how sophisticated the user is.  I am sure that I would not be able to understand those distinctions.  Moreover, we _must_ assume that the user knows nothing about various flavors of conflicts.  Users won't read documentation until well _after_ the conflicts have been presented to him or her, if ever.  And I would be most unwilling to answer endless questions on Leo's Help Forum regarding various kinds of conflicts.

With all this in mind, the fundamental design issues became quite clear.  The following is based on Gil's diagram, with several simplifying assumptions:

- Leo must indeed handle conflicts.  Gil has just pointed out that cvs can't detect conflicts in nodes in different files.

- Leo can't use dialogs to resolve conflicts: dialogs offer too little context.

- N.B. Leo will represent conflicts by a single _conflict node_ (possibly cloned).  Conflict nodes behave _exactly_ as regular nodes as far as most of Leo is concerned.  In particular, conflict nodes are cloned if and only if the corresponding regular node would be cloned.  Moving, inserting, deleting, cloning conflict nodes happens just as with any other nodes.  A change to any part of a conflict node is propagated to all other joined nodes, and all those joined nodes will be conflict nodes.

- N.B. On the screen, Leo shows conflicts as "fat nodes."  Fat nodes look like a tree, but in reality _fat nodes are a single node_.   Leo draws fat nodes as a _conflict_ tree, consisting of a _main headline_ and one or more _conflict headlines_.  Leo will draw the conflict tree in a distinctive manner to indicate that the nodes of a conflict tree are closely related to the main headline.   There may be a command to delete a selected conflict headline. BTW, the details of how to draw fat tnodes are confined solely to code in leoTree.py.

- N.B. The _main headline_ represents the _main data_ of the conflict node.  Leo uses the main data of a node in all operations.  In particular, ** Leo only writes main data to .leo or derived files **

In other words, _conflicts are invisible_ to Leo for most purposes.  All conflicts will be lost when Leo exits.  Until then, users can update or replace the main data with whatever data they choose.

- N.B. Leo will represent fat nodes in memory as a _fat tnode_.  Fat tnodes are exactly like present tnodes except that they contain a new conflictList ivar.  This ivar holds a list of alternate values for the headline and body text stored in tnodes.  This ivar does _not_ affect the rest of Leo (except as noted below).  In particular, tnode getters and setters _ignore_ the contents of conflictList, so the file code is unchanged, the find command ignores alternative conflicting values, etc. etc.  (Of course, it would be possible to write scripts that would access t.conflictList data.)

In this representation,  the _conflict data_ is the contents of t.conflictList.  The main headline is the fat tnode's regular headline text.  Conflict headlines are headlines associated with items in conflictList.

- N.B. There is only one conflict tnode associated with any set of conflicting nodes with the same gnx or tnx.  This is because in 4.0 tnodes contain headline text.  No matter how many conflicts are associated with a particular vnode or tnode, all such conflicts get merged into a single conflict tnode.

- We can extend the Cut Node and Copy Node commands to work on conflict headlines.  We shall certainly also want to have Go To Next/Previous Conflict commands.  We might want to add Insert Conflict Node or Delete Conflict Node commands, but these are optional.  There should be no need for other operations on conflicts.  As always, users could create new nodes to squirrel away data wherever they please.
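
To make the fat tnode idea concrete, here is a minimal sketch.  Only the conflictList ivar comes from the description above; the class name FatTnode, the headString and bodyString attribute names, and the addConflict, promote and isFat methods are assumptions made for illustration, not existing Leo code.

```python
class FatTnode:
    """Sketch of a tnode carrying alternates in a conflictList ivar."""

    def __init__(self, headline, body):
        self.headString = headline
        self.bodyString = body
        # Alternate (headline, body) pairs.  Ordinary getters, setters
        # and the file code would never look at this list.
        self.conflictList = []

    def addConflict(self, headline, body):
        """Record an alternate value unless it duplicates known data."""
        alt = (headline, body)
        main = (self.headString, self.bodyString)
        if alt != main and alt not in self.conflictList:
            self.conflictList.append(alt)

    def promote(self, index):
        """Make conflictList[index] the main data; keep the old main
        data as an alternate so nothing is lost before Leo exits."""
        old = (self.headString, self.bodyString)
        self.headString, self.bodyString = self.conflictList[index]
        self.conflictList[index] = old

    def isFat(self):
        """True when the drawing code should show a conflict tree."""
        return bool(self.conflictList)
```

Because all conflicts for a given tnx are merged into this one object, every joined (cloned) vnode that shares the tnode sees the same conflict headlines.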

That is just about all there is to it.  Note in particular what is _not_ present:

- We haven't created anything new; we have just added the conflictList ivar to tnodes.  True, Leo draws fat nodes differently, but this changes little.

- All operations on nodes remain exactly as have been in the past.  In particular, Leo always writes main data to .leo and derived files.  In other words, Leo mostly ignores conflicts.

- Leo will not expect the user to see or understand any kind of distinctions regarding conflicts.  It doesn't matter how or why the conflicts arose: if data conflicts, no matter what the reasons, Leo simply shows all the possibilities.  Leo will pick one of those possibilities (somewhat at random) to use as the main headline.  I say "somewhat at random" because in some cases Leo might make an educated guess about what the main data should be.  See the next section.

Avoiding conflicts

In some cases Leo won't even bother to create items in t.conflictList, but will instead silently replace one version of data with another.  In other words, it may be valid for Leo to make distinctions about types of conflicts, _as long as those distinctions never become apparent to the user_ !!  In particular, I plan to change the format of .leo files to add a file modification date to the &lt;v&gt; element of each @file node.  This will allow Leo to do the following:

1. Leo will silently ignore conflicts that arise solely because the user has edited a derived file after the derived file is created.  Leo does this now: it's essential to make "automatic untangle" work without being intrusive.

2. Leo must warn when reading a derived file that was created _before_ the modification date in the corresponding &lt;v&gt; element in the .leo file.  Such situations are more like file reversions than conflicts.

3. Leo probably will warn when conflicts arise on the _same_ node as the result of reading the node from two different derived files.  Such conflicts seem to me to reflect bad organization.  Note: in spite of this warning (probably given in the log pane) Leo will create entries in t.conflictList as usual.  There is _nothing special_ about such conflicts.
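
The three rules above could be sketched roughly as follows.  This assumes that the modification date stored with each @file node is a plain timestamp in seconds; the function names, the returned strings and the log list are hypothetical and exist only for this example.

```python
def classify_read(derived_mtime, recorded_mtime):
    """Compare a derived file's modification time with the time
    recorded in the .leo file for the matching @file node.

    Both arguments are timestamps in seconds (for example the values
    returned by os.path.getmtime).
    """
    if derived_mtime == recorded_mtime:
        return 'unchanged'
    newest = max(derived_mtime, recorded_mtime)
    if newest == derived_mtime:
        # Rule 1: edited after it was written; accept silently.
        return 'accept-silently'
    # Rule 2: older than the outline expects; looks like a reversion.
    return 'warn-reversion'


def note_source(seen, tnx, path, log):
    """Rule 3: warn when one node arrives from two derived files.

    seen maps each tnx to the first derived file that supplied it.
    """
    first = seen.setdefault(tnx, path)
    if first != path:
        log.append('node %s read from both %s and %s' % (tnx, first, path))
```

Only the warnings are sketched here; creating the entries in t.conflictList would happen in the read code regardless of what note_source reports.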

Conclusions &amp; Summary

I believe the general form of the conflict issue is becoming clear:

- Conflicts must be simple.   Users, even sophisticated and experienced users like me, simply won't be able to understand complex conflicts.

- Leo must present only one kind of conflict to the user, and there must be no new kinds of outline nodes or outline operations.  Fat nodes only appear to be new.  In fact, they work exactly as always.

- Adding a few simple commands will suffice to support conflicts.

-  Leo can continue to ignore some (most?) conflicts, silently replacing one node with another.

The scheme just outlined can handle all conflicts because it doesn't attempt to do too much.  Conflicts are simply presented to the user for the user to handle or ignore.  Leo can't be expected to reconcile conflicting versions of nodes.

I believe this scheme presents a firm foundation for further work.  Indeed, I probably won't implement fat tnodes now, secure in the knowledge that they can be added easily at any time.  For 4.0 alpha 0 it will suffice to do the following:

- Add modification dates to &lt;v&gt; elements in .leo files corresponding to @file nodes.
- Warn on conflicts in the log pane and use the version of the conflict in the derived file that is read last.

With these changes we can see how often serious conflicts actually arise.

That's all for now.  I am most interested in your comments.

Edward

P.S. Thanks again, Gil, for your many comments.  To repeat, your diagram on Leo's wiki was crucial to all these thoughts.  Fat tnodes are obviously based directly on that diagram.

EKR
</t>
<t tx="ekr.20050421192149.6">Conflicts: new directions

Sometimes in the creative process it is useful to _increase_ the confusion.

With that in mind, I'd like to propose two different, contradictory approaches to resolving node conflicts in 4.x.

I.  Avoid the conflicts entirely

 It might be possible to avoid conflicts completely using cvs.  Indeed, cvs is quite successful in managing concurrent development.  Let's not forget this fact!

Suppose we declare that all developers using Leo must use cvs if we are going to use Leo cooperatively.  Might it not be true that this rule will, by itself, solve all the 4.x conflict issues?  It certainly seems plausible to me.  After all, that's really cvs's job!

II. Redesign Leo along client-server lines

Another way would be to use the Lotus Notes approach, whatever that is.  Here is where homework comes in.  Does anyone have personal knowledge of Lotus Notes?  One of my friends does have that experience.  He's a realtor, so he won't be tainted by any knowledge of Leo, which might be really useful: he will focus on the big picture.

The nice thing about design is that we can contemplate huge changes in Leo (like making it work like Lotus Notes) without any investment in coding.

Comments please.

Edward
</t>
<t tx="ekr.20050421192149.7">OTW: Replace clones with hoists?

I've titled this Off The Wall because it is pure speculation.

Nobody is more fond of clones than I, and we are seeing all sorts of implementation and design problems that arise directly from clones:

- Clones slow down fundamental outline operations.  This limits how big Leo outlines can be.  OTOH, all programs have limits, and it may be that clones really aren't the limiting factor.  Clearly, drawing many nodes on a screen is going to be slow in any case.

- Clones appear to make @file-thin impossible.

- Clones greatly complicate error recovery, and error recovery drives all aspects of Leo's design.

- Clones complicate resolving conflicts between files, both .leo files and derived files.

So my question is:  can we get the _benefits_ of clones without actually implementing clones?

Well, what are the benefits of clones?  How we answer this question is crucial!  I am going to focus on how I actually use clones; I'm going to ignore uses of clones that I haven't yet thought of.

1.  Clones allow different views of a tree.  We clone nodes and gather them together for easy viewing.

2.  Clones join all these different views together.

3.  Clones are "live".  Altering any clone, including its structure, alters all other clones.

At present, all clones of a node are equivalent to the node itself.  There is no such thing as a "master" node from which all clones are derived.  Suppose we alter this as follows:

1.  We replace clone nodes with "link nodes" (patent pending).  Link nodes have no structure and no content.  They merely point to another node, the target node.  Linking to a link node is exactly equivalent to linking to the target node.

2.  Selecting a link node takes us to the target node.  
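
A minimal sketch of how selecting a link node could find its target, assuming a hypothetical Node class with a link_to field (neither is part of Leo).  Following chains of links implements the rule that linking to a link node is equivalent to linking to its target; the cycle check is an added safeguard, not something specified above.

```python
class Node:
    """Hypothetical outline node; link_to is set only on link nodes."""

    def __init__(self, headline, link_to=None):
        self.headline = headline
        self.link_to = link_to


def resolve(node):
    """Follow links until a non-link node is reached."""
    seen = set()
    while node.link_to is not None:
        if id(node) in seen:
            raise ValueError('circular link at ' + node.headline)
        seen.add(id(node))
        node = node.link_to
    return node
```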

We must make sure that we don't get dizzy as Leo skips around the outline.  The great advantage of clones is that we can move from clone to clone in a project headline _without_ constantly redrawing the outline, and without having the vertiginous experience of the outline changing.  So if we eliminate clones we want to do so in a way that doesn't create visual chaos.  

Hoisting may provide a way to do this.  Hoisting is one of MORE's other cute ideas.  Hoisting just means replacing the view of the outline in the outline pane by the presently selected suboutline.  Naturally, there must be a way of "dehoisting":  going back to the original view.  BTW, in MORE, (and in Leo without clones) there can be arbitrarily many levels of hoisting.

Hoists can be done in several ways:

1.  Inside the Leo window:  Leo would replace the view of the entire outline with a view only of the linked outline. This isn't so jolly because the set of link nodes in the project view disappears and we would have to dehoist in order to see the project view.

2.  In a separate Hoist Window.   This can be done in two ways: we could show the hoisted outline either in the Hoist Window or in the main Leo Window.  There are advantages to either.  Perhaps we shall want to allow either way.  Either way, clicking a link node hoists the target node.

If we show the hoisted outline in the main Leo window we need a command to copy the present node (Presumably a project node including links) to the Hoist Window.  If we show hoisted code in the Hoist Window then we need no such command.  Clicking a link node shows the hoisted tree immediately in the Hoist window.

I think I favor using the Hoist Window for showing the project view: then we always see the project view while hoisting.  The alternative is to show the hoisted outline and the body text in two panes of the Hoist Window.  The advantage is that the entire outline is always visible in the main Leo window.


Consequences

This would be a radical restructuring and simplification of Leo:

1.  There would no longer be a need for separate tnodes and vnodes.  All we need is a separate link for link nodes.

2.  Gnx's make link nodes possible!

3.  Leo could insist that link nodes appear only outside of @file trees.  This kind of restriction would be similar to the ban on orphan and ignored nodes.

4.  Therefore, only .leo files contain link nodes.  Link nodes refer to whatever tree the gnx refers.  Link nodes _have no structure_.  Actually, we could allow link nodes to have descendents, but the structure of such descendents does not affect the link node in any way.  It would probably be less confusing, though, to disallow link nodes from having children.  In any event, this is a minor point.

5.  Leo could use a much simpler file format, possibly even OPML.  However, we would need a kludge so that we could represent link nodes in OPML.

6.  Leo's fundamental outline operations would be much simpler.  No need for dependent trees.  No need for join lists!

7.  @file-thin nodes become possible again.  Derived files can completely specify structure.  Link nodes refer to gnx's, whatever their structure.   Without clones we have no more structure mismatches when reading derived files.

8.  Structure conflicts are still possible between two .leo files, i.e., between two versions of a single flat file.  I'm not sure how to resolve this.  In any case, the problems are no worse than before.

9.  When deleting an original node all link nodes become "broken".  We must deal with this somehow.  Maybe we can even ignore it and let the user delete the links.  Or we can prompt.  Note that undo can restore the target of the link, so we probably don't want to automatically delete link nodes (unless undo also restores the link nodes!).

Summary

Hoists would eliminate clones and dependent trees, at the expense of a bit more work on the user's part.

This is a completely experimental idea.  I would have to implement it completely before deciding whether it is a good idea.  It does have appeal.

The notion of tabbed Leo windows and the recent work with Mark and Recent windows has shown that altering Leo's visual presentation is relatively easily done.

Naturally, your comments and suggestions are crucial.

Edward

P.S.  I think the best time for blue-sky thinking is when things are already completely confused :-)  Now is that time, so let her rip.

EKR
</t>
<t tx="ekr.20050421192149.8">No more indices in derived files!

There is no need for the "childIndex" field of sentinels in derived files!  These are the fields that change when the user moves nodes; eliminating these fields will eliminate many spurious changes in cvs diffs.

The .leo file contains all the information we need to reconstruct the order of nodes, so eliminating these index fields is easy.  The atFile.read code has a routine called createNthChild(n,parent,headline) where n is the childIndex field obtained from an @+node sentinel.  But we can get the n from the outline!  The @+node sentinel contains the headline text of the node, so we merely search for a child node, say c, of parent and use c.childIndex() instead of n.
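
A minimal sketch of that lookup, using a plain list of headlines in place of real vnodes.  The name find_child_index and the exceptions it raises are assumptions for illustration; in Leo the result would stand in for the n passed to createNthChild.

```python
def find_child_index(child_headlines, headline):
    """Return the index of the child whose headline matches the one
    found in an @+node sentinel.

    child_headlines is the ordered list of headlines of the parent's
    children, taken from the outline.
    """
    matches = [i for i, h in enumerate(child_headlines) if h == headline]
    if not matches:
        raise LookupError('no child with headline: ' + headline)
    if len(matches) != 1:
        raise LookupError('ambiguous headline: ' + headline)
    return matches[0]
```

The lookup is only well defined when exactly one child carries the headline, so the sketch reports both a missing and a duplicated headline as errors.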

Two minor complications:

1. All sibling nodes in @file trees must have unique headlines.  Leo would deal with violations of this restriction just as it presently deals with orphan and ignored nodes.

Does anyone object to this restriction?  If so, Leo could have  an option to write child indices in derived files.

2. Error recovery is a little different with this scheme.  At present, recovering from "dreaded read errors" throws away the old outline and trashes clone links.  This kind of "recovery" is almost useless.

It might be best to do no recovery at all.  After reading the outline Leo would check to see if there were any read errors.  If so, Leo would reread the entire outline just as it does for the Read Outline Only command, and then mark the @file nodes that could not be read properly.  The great advantage of this new error recovery scheme is that it won't break clone links in the outline.  It also removes some very ugly code in atFile.read().

I plan to make these changes for 3.13, or 3.12 if I just can't resist :-)

Edward

P.S. This Aha is a direct result of killing gnx's.  It is now clear that .leo files must be primary.  Indeed, only .leo files guarantee the integrity of clone links.  This new point of view got me thinking about moving information from derived files to .leo files.  Once I did that the rest followed immediately.

EKR
</t>
<t tx="ekr.20050421194542.1">0</t>
<t tx="ekr.20050421195802.1">A new 4.0?  Consistency

The overriding requirement for 4.0 is that Leo must preserve the _meaning_, _consistency_ and _intention_ of code in all situations.  I know from experience that this does not happen automatically.  In particular, I am haunted by the ill-fated concept (and implementation) of "backup" .leo files.  Such files did not contain all the information typically found in .leo files.  Backup .leo files were truly dangerous.  It was way too easy to unwittingly revert to previous versions of code using such backup files.

The meaning, consistency and intention of code is not a property of individual nodes; it is a property of all nodes of a group. But which group?  A derived file?  An entire .leo file?  Some other group? 

I am most interested in this question:  what is the "smallest unit of meaning" (SUM) in a .leo file?   I believe there is a way to make derived files the smallest unit of meaning.  This will have important consequences.

Edward

P.S.  Cvs treats files as the smallest unit of meaning.  Yes, cvs reports conflicts in parts of files, but ultimately cvs requires humans to resolve conflicts at the file level.  In other words, cvs does the best it can to create accurate diffs, and CVS DOES NOT RELY ON THE ACCURACY OF DIFFS.  Whatever the diffs, in essence cvs relies on humans to keep _files_ consistent.

</t>
<t tx="ekr.20050421195802.2">A new 4.0?  ironical gnx's

gnx's are a tool in search of a problem to solve.  Individual nodes are not, and never can be, the smallest unit of meaning in a program.  The proof is immediate:  the meaning, consistency and intention of code is a property of _all_ the nodes of some group.  Therefore, gnx's are not _any part_ of a solution to the fundamental problem.

Indeed, I have adopted the following ironical rule of thumb: any solution requiring gnx's is the wrong solution.  I'll state this rule of thumb without any proof or justification except this:  it has turned out to be useful and valid.

Edward</t>
<t tx="ekr.20050421195802.3">A new 4.0?  Derived files can be the SUM

Without clones,  the smallest unit of meaning (SUM) in a .leo file would be a derived file.  Without clones, we could manage projects as usual--we would have to be careful about old copies of code lying around, and that is nothing new.

Clones complicate matters.  The problem is this:  clones create copies of code _in the same .leo file_ that may be used in different contexts.  Indeed, in the present implementation clones may be used in arbitrarily many derived files.  What happens if some, but not all, of those clones change in the derived file?  How can Leo know what to do?  In particular, there would be no automatic way for Leo to read .leo files containing such conflicts.

This suggests the following strategy:

1.  Make it the goal of 4.0 to make derived files the "smallest unit of meaning".

2.  Restrict clones (very slightly!) so this goal is possible.

The next several postings show how to do this.

Edward</t>
<t tx="ekr.20050421195802.4">A new 4.0?  .leo files must be disjoint unions

At present clones can affect any part of a Leo outline and a .leo file is a mass of interrelated information.  We must disentangle these relationships if we are to make derived files the smallest unit of meaning.  

In mathematical terms, we want a .leo file to be a _disjoint union_ of sets.  In other words, every node of a .leo file must be associated with exactly one _owning file_: either a derived file or the .leo file itself.  The union of these files "covers" all the nodes of the .leo file.  This union is disjoint:  the intersection of any of the files that cover the .leo file is empty.

I hope this picture is clear:  a .leo file becomes a jigsaw puzzle such that _no pieces overlap_.  This is the property we must have if derived files are to be the smallest unit of meaning: we must _never_ have two derived files associated with the same node.
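
The disjoint-union property can be stated as a small check.  This is a sketch only: nodes are represented by arbitrary ids, and check_disjoint_cover and its arguments are hypothetical names, not part of Leo.

```python
def check_disjoint_cover(all_nodes, owned_by):
    """all_nodes: set of node ids in the .leo file.
    owned_by: dict mapping each owning file to the set of node ids it
    claims (the .leo file itself is one of the keys).

    Return (unowned, shared): ids claimed by no file, and ids claimed
    by more than one file.  Both are empty for a disjoint union.
    """
    owners = {}
    for path, nodes in owned_by.items():
        for node in nodes:
            owners.setdefault(node, []).append(path)
    unowned = {n for n in all_nodes if n not in owners}
    shared = {n for n, paths in owners.items() if len(paths) != 1}
    return unowned, shared
```

An empty unowned set means the files cover the outline; an empty shared set means no two pieces of the jigsaw overlap.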

Edward

</t>
<t tx="ekr.20050421195802.5">A new 4.0?  Owned &amp; unowned clones

In the last posting I suggest that every node in a .leo file should be associated with a single "owning" file: either the .leo file itself or a single derived file.   It is easy to enforce this property as follows:

1. When writing derived files, Leo will check to see that no clone in that derived file is used in any other @file node.  This restriction is similar to the present restrictions on "orphan" and "ignored" nodes.

2.  When writing .leo files, Leo will distinguish between "owned clones" (clones contained in exactly one derived file) and "unowned clones" (clones contained in no derived files).  Leo will write unowned clones and their subtrees as usual.  Leo will _not_ write the subtrees of owned clones.  Instead those owned clones completely depend on the derived file for their existence and meaning.

3.  When reading, Leo will attempt to recreate owned clones from derived files.  To do this, Leo will use links within the .leo file that point at the derived file.  Once again, gnx's should not be used.
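
Rules 1 and 2 amount to a three-way decision for each clone.  A minimal sketch, assuming each clone is described by the list of places it appears (a derived file name, or None for a position outside every @file tree); the function names and returned strings are hypothetical.

```python
def owning_files(clone_locations):
    """Return the sorted distinct derived files that contain the clone.

    clone_locations has one entry per place the clone appears: the
    derived file name, or None when that position lies outside every
    @file tree.
    """
    return sorted({path for path in clone_locations if path is not None})


def plan_write(clone_locations):
    """Decide how the .leo writer treats one clone."""
    owners = owning_files(clone_locations)
    if not owners:
        return 'write-subtree'      # unowned: written as usual
    if len(owners) == 1:
        return 'omit-subtree'       # owned: the derived file is primary
    return 'error'                  # used in two derived files
```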

Notes:

1.  An unowned clone may have an owned clone in its subtree.  That's not a problem: Leo will simply not write the subtree of owned clones, even if the owned clone appears as a subtree of an unowned clone.

2. The look, feel and operation of Leo's clones will remain almost completely unchanged with these new restrictions.

3.  It is quite possible that links from owned clones to derived files may be broken.  This is _absolutely essential_ if derived files are to be the smallest unit of meaning.  In other words, we must put "meaning" in a _single_ place to ensure consistency, and if we do that there must _necessarily_ be the possibility that the links to "that place" will be broken.

4. Derived files become primary data; .leo files are secondary.  This is true for both @file-thin and @file-thick trees.

Edward

</t>
<t tx="ekr.20050421195802.6">A new 4.0?  Acid tests

Let us suppose that every clone in a .leo file is "owned" by exactly one file: either the .leo file itself or exactly one derived file.  The following are the "acid tests" of this scheme:

1.  Is putting nodes in "an attic" completely safe?

The answer is clearly yes.  If the attic is a separate file the cloned nodes in the attic can have _no effect_ on any derived file.  It doesn't matter how "out of date" nodes in the attic get; they aren't shared by any other derived file.  Because we _don't_ use gnx's there can never be improper linking between old and new nodes.

2.  Can @file-thin be made completely safe?

At long last the answer is an unequivocal yes.  Because a .leo file is a disjoint union of separate files, the _temporal_ or _semantic_ relationships between different derived files _do not really matter_.  Sure, one would like to keep all derived files together "in time", but the integrity of the .leo file will not be affected even if that is not so!!  This is a hugely important result.

3. Can @include x.leo be made completely safe?

Again, the answer is finally yes.  The subsidiary question is: what if two .leo files "share" the same derived file?  Leo can easily handle this situation as follows.  When Leo reads a derived file, it will simply create a cloned @file node if Leo has already read the file.

For example:

@file f.py
@file x.leo
 ....@file f.py

When Leo handles the @file f.py node in x.leo, Leo will see that it has already read f.py.  Leo will simply create a clone of f.py.  Linking these two trees ensures data consistency.  Note that both clones are owned, and owned by the same derived file.  The code that checks the ownership of cloned nodes will have to take this complication into account.
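
A minimal sketch of this "read once, then clone" rule, assuming a hypothetical OutlineReader class that remembers which derived files it has already read.  The dictionary standing in for a parsed tree is a placeholder, not Leo's real data structure.

```python
# Hypothetical sketch: read each derived file once; later requests get a clone.
class OutlineReader:
    def __init__(self):
        self.trees = {}   # derived file name: shared tree already read
        self.reads = 0    # how many times a file was really parsed

    def read_derived(self, filename):
        """Return (tree, is_clone) for an @file node."""
        if filename in self.trees:
            # Already read: link to the existing tree instead of re-reading.
            return self.trees[filename], True
        self.reads += 1
        tree = {"file": filename, "children": []}  # stand-in for real parsing
        self.trees[filename] = tree
        return tree, False

reader = OutlineReader()
first, first_is_clone = reader.read_derived("f.py")    # @file f.py in the outer outline
second, second_is_clone = reader.read_derived("f.py")  # @file f.py inside x.leo
```

Both @file nodes end up sharing one tree, which is what keeps the data consistent.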

Significantly and ironically, gnx's need not and should not be [used] to implement @include.  In this context it is even _more_ essential that there be the possibility of broken links into derived files.  Indeed, the _last_ things we want are permanent unbreakable links!

Edward</t>
<t tx="ekr.20050421195802.7">A new 4.0?  Primary &amp; secondary data

To summarize the preceding posts:

- Derived files can be made the smallest unit of meaning.
- Derived files are the "primary data" from which Leo recreates outlines when reading .leo files.
- Because derived files contain all primary data, links from owned cloned nodes to derived files might break.

There are a few minor details left to handle:

1.  We don't want to change derived files just because "secondary" data like marks change in the outline.  Therefore, .leo files should contain "mark bits" even for @file-thick nodes.  These bits will be held in new xml elements somewhere in the .leo files.

2. For the same reason we don't want derived files to contain information sufficient to recreate full node order.  We don't need that information at all for @file-thick trees.  For @file-thin trees we shall again resort to adding the information somewhere in the .leo file.

The crucial point is this:  marks and ordering information are inessential (secondary) data, so no real harm is done if such data becomes inconsistent with the derived file.

Edward

</t>
<t tx="ekr.20050421195802.8">A new 4.0?  gnx's redux

Now that gnx's are no longer an essential part of the new 4.0 we can consider adding them back for convenience.  The new gnx's would be used to complete the link from an owned clone to a node in a derived file.   In most cases it would suffice to use the filename:headline combination to make this link.  However, it might turn out to be convenient to use a timestamp.index scheme similar to the old gnx's.  No need for a "creator" field: we are not requiring that links be absolutely unique.  In any case, the new gnx will be augmented by a virtual pointer from the owned cloned node to a single derived file.

Edward

P.S.  The exact details don't matter much now that we aren't asking  the new gnx's to do much.

P.P.S.  Many clone links might break when the .leo files and derived files are out of synch.  This really is the same situation as happens now.  Indeed, the announcement that links have been broken will serve as a useful warning that not everything is in synch.  In any event, there is no way to deal with this kind of situation automatically, and no real danger in this situation either.

EKR</t>
<t tx="ekr.20050421203956">Breakthrough re shared trees &amp; a surprising result

Today I have had a series of important insights regarding Leo's implementation.  To state my conclusion first:

** No representation of clones is likely to be better than the present way using vnodes and tnodes. **

Since Leo's earliest days I have wondered whether it might not be possible to improve how Leo represents clones.  The present scheme laboriously creates and deletes "dependent trees" of vnodes when a node is moved that is a clone or a descendent of a clone.  Couldn't we do better if we represented clones as a single tree of nodes that is shared by other nodes?

Today I thought of a new way of looking at things.  You can think of it as a thought experiment, though at first I thought of it as a viable implementation strategy.  This new point of view has greatly clarified the essentials of the situation.  It strongly suggests the conclusion stated above.

The thought was this:  suppose vnodes are not "real" or "permanent", but merely epiphenomena of the underlying "reality", the shared nodes.  I thought of vnodes as existing in "the floating world."  The "real" nodes, the potentially shared nodes, I call inodes (information nodes.)  All fundamental information would reside in inodes.

Furthermore, suppose that vnodes are created by the tree class _as a by-product of redrawing the screen_.  This actually is a pretty clever idea.  It has the following big advantages:

1.  Only a single iterator is ever needed to traverse the tree of inodes.  This iterator is called only by the tree class when the tree needs to be redrawn.  The result of the tree traversal is the tree of vnodes.   Most other Leo code (and user scripts) can use and traverse the tree of transient vnodes  without any modifications.

2. Creating and destroying dependent trees happens automatically as the result of the tree traversal.  There is no need for the complex special cases found in the present vnode class.

So this seems like hot stuff.   Very clever indeed, if not a big breakthrough.  However, a closer examination reveals that almost nothing is gained by this "cleverness".   If this clever way doesn't really improve matters, it is most unlikely that any other scheme will.  This insight is the real breakthrough.

So let us look at the details and implications of generating temporary vnodes while redrawing the screen:

1. Because inodes may be shared, they need not have unique parents.  But the generated vnodes _will_ have unique parents.  Therefore, all the code that traverses vnodes will work just fine.

2. Alas, alas, there is a huge cost in creating vnodes on the fly.  We must create:
a) the vnodes themselves,
b) the Tk.Text widgets corresponding to visible vnodes and
c) all the data in the vnodes, including links to other vnodes and links to inodes.

These costs can't be optimized away.

3. Most importantly, even if vnodes are "ephemeral" they are in fact essential.  We need _unique_ vnodes to represent _different_ areas on the screen, even if those vnodes are on the same join list.  In other words, we need _distinct_ Tk.Text widgets in order to draw the screen at all.

4. Moreover, it turns out that join lists are essential!  Indeed, if the headline of a cloned node changes, _different_ Tk.Text widgets must be updated on the screen.  No matter how elegant the implementation of inodes, we _still_ must have a way of quickly updating all joined vnodes _on the screen_.
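
Points 3 and 4 can be illustrated with a small sketch.  The Widget class below merely stands in for a Tk.Text widget, and JoinedVnode is a hypothetical name; the point is only that each clone needs its own widget and that a join list is what keeps them in step.

```python
# Hypothetical sketch: a join list keeps every on-screen copy of a clone in step.
class Widget:
    def __init__(self):
        self.text = ""

class JoinedVnode:
    def __init__(self, join_list):
        self.widget = Widget()       # stands in for a distinct Tk.Text widget
        self.join_list = join_list   # shared by all clones of one node
        join_list.append(self)

    def set_headline(self, text):
        # Every joined vnode has its own widget, so each must be updated.
        for v in self.join_list:
            v.widget.text = text

joined = []
v1, v2 = JoinedVnode(joined), JoinedVnode(joined)
v1.set_headline("new headline")
```

Changing the headline through v1 updates the separate widget owned by v2 as well.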

Summary

At long last the situation has become clear:

Fact 1: Separate vnodes are required to redraw the screen at all.
Fact 2: Join lists (whether in vnodes or tnodes) are also essential so that items _on the screen_ can be redrawn.
Fact 3: Leo must create and destroy dependent trees of vnodes, _because those trees appear and disappear from the screen._
Fact 4: Using shared inodes will not change any of facts 1, 2 or 3.

In short, nothing would be gained by trying to represent clones as shared trees because:

** join lists and dependent trees are needed to draw the screen correctly **

This came as a great surprise to me.  It resolves a question that has been on my mind for at least 8 years.  As a result, there is no need to do any prototyping of inodes or iterators. 

Edward

P.S.  As I have said before, the present representation of join links can most likely be improved greatly.  I plan to do this (and probably not much else) for 4.0.

P.P.S. There are other implementation problems with temporary vnodes.  For example, what would carry marks?  These problems aren't really part of the discussion, but they would be quite important if one actually rewrote the code!

P.P.P.S. I'll look at Gil's ideas (and any others) thoroughly before fixing 4.0 in stone.  As Gil suggests, designing 4.0 properly is more important than releasing in the next week :-)
EKR
</t>
<t tx="ekr.20050421203956.1">@nocolor
In my last posting on Leo's Developers Forum I discussed in great detail how Leo can represent clones as shared trees of vnodes.   In this post I'll show why this new design can have minimal impact on Leo.

To summarize this design:

- The design introduces two new kinds of vnodes: link vnodes and target vnodes.
- User scripts and Leo itself must use position iterators to traverse trees.

You might think that such changes would require massive changes throughout Leo.  Yesterday I realized that is not true;  we can drastically change how Leo represents clones without changing how "user code" deals with those trees!

OK.  I've kept you in suspense long enough.  The aha is the following:

**** Leo 4.x can treat positions just as Leo 3.x treats vnodes ****

It's actually quite easy to do this:

- c.rootVnode() will return a _position_, namely position(c.tree.rootVnode,[])
- c.currentVnode() will return the present position, namely c.tree.currentPosition.
- All the setters and getters presently found in the vnode class will move to the position class.

With these easy changes, user code (including almost all code within Leo!) can traverse trees as usual.  For example:

v = c.rootVnode()
while v:
....&lt;&lt; do something with v &gt;&gt;
....v = v.threadNext()

Moreover, v.headString(), v.bodyString(), v.setVisited(), v.isCloned() will work just as before, even though v is a position, not a vnode!

Using this scheme,  all tree traversals will cover all nodes of the entire tree.  Indeed, without special code it won't even be _possible_ to tell the difference between old and new ways of representing outlines.  In particular, if we want to create the notion of "anchor" clones (so that we can say that anchors may appear in at most one derived file), we shall have to add more code to do so.  In the new design, all cloned nodes are identical, just as they have always been!
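
To make the idea concrete, here is a minimal sketch of a position that offers the old vnode interface.  The VNode and Position classes below are simplified, hypothetical stand-ins written for this illustration; they are not Leo's actual classes, and only headString and threadNext are modelled.

```python
# Hypothetical sketch of a position acting as a proxy for a vnode.
class VNode:
    def __init__(self, head, children=None):
        self.head = head
        self.children = children or []

class Position:
    def __init__(self, v, stack):
        self.v = v            # the vnode at this position, or None
        self.stack = stack    # list of (parent vnode, child index) pairs

    def __bool__(self):
        # An exhausted position is false, so "while v:" loops still work.
        return self.v is not None

    def headString(self):
        return self.v.head

    def threadNext(self):
        """Return the next position in outline order, or an empty position."""
        if self.v.children:
            return Position(self.v.children[0], self.stack + [(self.v, 0)])
        stack = list(self.stack)
        while stack:
            parent, i = stack.pop()
            if i + 1 != len(parent.children):
                return Position(parent.children[i + 1], stack + [(parent, i + 1)])
        return Position(None, [])

shared = VNode("B", [VNode("C")])
root = VNode("root", [shared, shared])   # the same subtree appears twice

heads = []
v = Position(root, [])    # plays the role of c.rootVnode()
while v:
    heads.append(v.headString())
    v = v.threadNext()
```

The loop has the same shape as the 3.x loop above, yet it visits the shared subtree once per appearance.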

In short, it appears that transitioning to the new representation of clones might be done in a matter of days, not weeks or months!


Discussion and more details

This is all you really need to know about this Aha.  You can stop reading now if you want :-)  However, there are several fascinating details related to this Aha:

1. In retrospect, we can see that positions can be made into a "proxy" class for the old vnode class.  In other words, the position class becomes a thin wrapper that preserves the interface to the "old" vnode class.

I'm not sure I ever would have discovered this in a language like C++.  In C++ you can't get away from the actual names of classes: they appear in so many declarations.  In python, we don't _know_ and don't _care_ about what the type of "v" is.  So we can say v = c.rootVnode() without worrying about what c.rootVnode() "really" returns.

2. What's in a name?  There was some discussion recently about whether we should rename the vnode class to be something else.  At that time I dismissed the idea, saying it would require massive changes to the code.  We see now that neither the original suggestion nor my response was precisely on target.  _It doesn't matter_ what the name of the vnode class is!  It has _no effect_ on statements such as v = c.rootVnode()  (!!!)

In other words, the "real" vnode class becomes essentially invisible to user code.  Sure, you can get at the details of virtual vnodes if you really wanted to, but typical "user" scripts will never care.  I'll probably leave the name of the vnode class unchanged because the code is clear the way it is.

3.  I mentioned that the getters and setters of the vnode class will move to the position class.  It certainly makes no sense to leave those methods in the old vnode class.  What's really fascinating is this: the fundamental traversal methods could be used in the position class _without any changes_!

For example:  the v.threadBack method is defined in the vnode class as:
@color

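def threadBack (self):

    # Return the node that precedes self in outline order.
    back = self.back()
    if back:
        lastChild = back.lastChild()
        if lastChild:
            # The previous sibling has descendents: return the last of them.
            return lastChild.lastNode()
        else:
            return back
    else:
        # No previous sibling: the parent comes just before self.
        return self.parent()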

@nocolor
** Exactly the same code will work in the position class! **

If you think this point is unimportant, you should look at the code for previous versions of the inode class.  It was based on the vnode setters, and it was much different and much more complex.

The reason the code works in either the vnode or the position class is that the position class has p.back, p.next, p.parent and p.firstChild getters that hide the details of "moving around in the bag". 

4.  This kind of flexibility is similar to so-called protocols in Objective-C.  A protocol is an agreement or guarantee that a class implements certain methods.  Objective-C can enforce that agreement.  Python doesn't support protocols directly, though it would be trivial to check that a class contains all the methods of a protocol.  The reason threadBack can move unchanged to the position class is that the position class supports the same protocol that the old vnode class supports.
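
As a rough illustration of such a check, the sketch below tests whether a class defines every method named in a "protocol".  The TRAVERSAL_PROTOCOL tuple, missing_methods and the two sample classes are hypothetical and exist only for this example.

```python
# Hypothetical sketch: check that a class offers every method of a "protocol".
TRAVERSAL_PROTOCOL = ("back", "next", "parent", "firstChild", "lastChild")

def missing_methods(cls, protocol=TRAVERSAL_PROTOCOL):
    """Return the protocol methods that cls does not define as callables."""
    return [name for name in protocol if not callable(getattr(cls, name, None))]

class FakePosition:
    def back(self): return None
    def next(self): return None
    def parent(self): return None
    def firstChild(self): return None
    def lastChild(self): return None

class Incomplete:
    def back(self): return None
```

missing_methods(FakePosition) returns an empty list, while missing_methods(Incomplete) lists the four methods that are absent.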

Edward</t>
<t tx="ekr.20050421203956.2">More about vxnodes

It is truly remarkable how the design process works.  I think of it as "jiggling".  Conclusions that seem firm get radically revised with just a slightly different point of view or a slightly different assumption.

The real reason why the 10 breakthroughs were possible was that I forgot temporarily all the problems involved with iterators and tree traversals.  It turns out that it will be quite useful to reintroduce some of these complexities.  So these breakthroughs were really a result of a subtle change in point of view.

Here are my present plans:

1.  The tree drawing code will use iterators internally.  This will allow the tree drawing code to avoid creating the entire tree of vxnodes when redrawing the screen.  Given an iterator, we can easily move backward or forward in the tree of vnodes/vxnodes, so scrolling can be done quickly.  Scrolling need only:

- create the vxnodes that have just become visible as the result of the scrolling
- destroy the vxnodes (or at least their Tk widgets) that have just become invisible.

This should make the asymptotic performance of the screen redraw code constant (linear in the size of the visible portion of the Tk.Canvas widget, which is bounded and therefore constant).

2.  Using iterators requires an expanded version of the notion of "the current vnode".  Basically, the new definition of the current vnode is a tuple (v,position), where v is a vnode and position is an iterator, basically a list of link nodes that are the ancestors of the current vnode.

3.  When this code works in the context of the tree drawing code, we can make the notions of position available to the rest of Leo.  We can add an optional "traverseLinkedDescendents" option to all of Leo's tree traversal routines:  v.back, v.next, v.threadBack, v.threadNext, v.visBack, v.visNext, v.firstChild, etc.

In other words, we can implement Josef's recent request that " the traversal of linked subtrees should be made an option."

Edward

P.S.  I am, perhaps naturally, slightly uneasy about making such radical changes to the core of Leo.  For one thing, some implementation surprises might loom.  I _think_ I understand all the details, but there is no guarantee that more jiggling won't happen.  More importantly, unifying vnodes and tnodes is a huge change to Leo's data model.

So I'll probably let this percolate for a few more days.  In the meantime, fixing the cut/paste bug in the cvs code is really important.

EKR
</t>
<t tx="ekr.20050421203956.3">Positions &amp; shared vnodes

Yesterday I started a detailed design, including a lot of code, for the shared vnodes version of Leo.  This long posting will discuss  the design in detail.  A later, shorter posting will discuss a number of Aha's that are the direct result of this work.

IMPORTANT: you do _not_ have to understand this long post in order to understand the Aha's in the next post.  That's why the Aha's are so exciting!

Summary

No doubt this seems quite complex.  It is.  It turns out that we can completely hide this complexity from the user.  The next post will show how.
Edward
</t>
<t tx="ekr.20050421203956.4">There is a glitch in the vxnode idea:  scrolling.  If we only create vxnodes for vnodes visible on the screen, we shall have to recreate vxnodes when the user scrolls the screen.

A simple fact drives the design of vxnodes:  it is easy to traverse a vnode tree, even in the presence of link vnodes, _provided_ that we traverse the tree of vnodes all at once.  Indeed, the traversal code need only recursively call itself to create the vxnodes from a shared portion of the vnode tree.

There appears to be no easy way for Leo to add vxnodes when scrolling, so it looks like the drawing code must create the _entire_ tree of vxnodes every time the screen is redrawn.  This needn't be expensive: the drawing code will create Tk widgets (the expensive part) only for vxnodes that are actually visible.  Scrolling would create Tk widgets for newly visible vxnodes and destroy (or recycle) Tk widgets for newly invisible vxnodes.   In this way, the number of Tk widgets stays roughly constant, and that's what is important for performance.
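
The bookkeeping for scrolling can be sketched as a comparison of the old and new visible windows.  The function below is hypothetical and works on row indices only; it says nothing about how the Tk widgets themselves would be created, destroyed or recycled.

```python
# Hypothetical sketch: compute which rows gain or lose widgets after a scroll.
def scroll_update(old_first, new_first, rows_visible):
    """Return (to_create, to_destroy) row indices for a window of fixed height."""
    old = set(range(old_first, old_first + rows_visible))
    new = set(range(new_first, new_first + rows_visible))
    return sorted(new - old), sorted(old - new)

to_create, to_destroy = scroll_update(old_first=10, new_first=12, rows_visible=5)
```

Scrolling down two rows creates widgets for rows 15 and 16 and releases those for rows 10 and 11, so the number of live widgets stays equal to the window height.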

This scheme will complicate the scrolling code, and that can't be helped.  It's clearly an optimization that must be performed to make Leo more scalable.  

This optimization also depends on the existence of scrolling events.  This should not be a problem.  The "command" callback of Tk.Scrollbar widgets can presumably be used to do more processing than just scrolling the associated Tk widget.  Exactly how easy this is to do remains to be seen.

Edward

P.S. The drawing code now has a worst-case running time linear in the size of the outline.  However, the average case is still good: it's linear in the number of visible vnodes.

EKR
</t>
<t tx="ekr.20050421204424"></t>
<t tx="ekr.20050421204424.1">This design is a much-improved version of the inode design that I've worked on for many years.  However, I've dropped the term inode.  Rather than define a new inode class, Leo will use different flavors of vnodes.

This design represents clones as shared subtrees of vnodes.   Cloned vnodes will "point" (a loose term) to the shared subtree.  (We'll see how this actually works in great detail below.)  Because several nodes may point at the same vnode, vnodes no longer have a unique parent.</t>
<t tx="ekr.20050421204424.2">Because vnodes no longer have unique parents, it is no longer possible to traverse a tree simply by following the back, next, parent and firstChild links of vnodes.  Instead, we must have a new kind of object called a __position__.  Positions represent a particular state of a tree traversal.  You can think of positions as iterators if you like.  Positions are actually quite simple: a position is an object containing a pointer to a vnode v and a _parents stack_ of vnodes representing the parents of v.  We push entries on the parents stack when entering a shared tree of vnodes, and pop the parents stack when leaving shared trees.

Here is an example showing (sort of) how Leo represents trees and uses positions.  Suppose we have the following tree, where primes denote clones:

root
..A'
....B'
......C
....B'
......C
..A'
....B'
......C
....B'
......C

In other words, all nodes except C are clones.  Using shared vnodes, Leo would represent this tree internally _sort of_ like this (Again, we'll see exactly how later).

root
..A1 node: pointer to tree for A
..A2 node: pointer to tree for A

A (the tree for A)
.. B1 node: pointer to tree for B
.. B2 node: pointer to tree for B

B (the tree for B)
..C

To traverse the tree, visiting all 11 nodes, we must traverse the shared tree for A twice and the shared tree for B 4 times, visiting C 4 times in all.  As stated earlier, we push entries on the position stack when entering a shared tree of vnodes, and pop the stack when leaving shared trees.

The following shows the v and stack ivars of the position iterator during a complete traversal of the outline:

root [ ]
a1 [root]
b1 [root,a1]
c [root,a1,b1]
b2 [root,a1]
c [root,a1,b2]
a2 [root]
b1 [root,a2]
c [root,a2,b1]
b2 [root,a2]
c [root,a2,b2]
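
A small sketch can generate a trace of this kind.  It is a simplification made for illustration: every node is modelled as a hypothetical Link object pointing at a Target that holds the shared children, which is not exactly the representation described in the next section.

```python
# Hypothetical sketch: walk shared subtrees while keeping a parents stack.
class Target:
    """Root of a shared tree; holds the children shared by all its links."""
    def __init__(self, children=None):
        self.children = children or []

class Link:
    """One appearance of a node in the outline; points at a shared target."""
    def __init__(self, name, target):
        self.name = name
        self.target = target

def traverse(node, stack, trace):
    # Record the node together with the parents stack at the time of the visit.
    trace.append((node.name, [p.name for p in stack]))
    stack.append(node)                 # entering the tree below this node
    for child in node.target.children:
        traverse(child, stack, trace)
    stack.pop()                        # leaving it again

c = Link("c", Target())
tree_b = Target([c])
tree_a = Target([Link("b1", tree_b), Link("b2", tree_b)])
root = Link("root", Target([Link("a1", tree_a), Link("a2", tree_a)]))

trace = []
traverse(root, [], trace)
for name, parents in trace:
    print(name, "[" + ",".join(parents) + "]")
```

Only three shared trees exist in memory, yet the traversal produces 11 entries, and each entry for c carries the stack of the particular path by which it was reached.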


Actual representation:

A relatively simple picture clarified my thinking about how to represent shared vnodes.

You can see this picture in two places:

- The bottom of Leo's Icon page: http://webpages.charter.net/edreamleo/icons.html

- On Leo's Wiki: http://leo.shwartz.org/tiki-index.php?page=BagsAndSharedVnodes

Please refer to this picture somehow while reading the next section.
The outermost circle represents what I call a _bag_.  It denotes a shared tree of vnodes and _all_ the cloned nodes that "use" this tree.

The two smaller black circles represent _link vnodes_.  They represent cloned nodes, especially on the screen.  The three arrows coming out of each link vnode represent the parent, back and next ivars.   Leo uses these ivars to link the link vnode into the outline, just as in the present implementation.

The white circle inside the bag represents a _target vnode_.  It's larger than the black vnode because it also represents all the descendents of the target vnode, though that isn't shown.  The single arrow coming out of the target vnode represents the firstChild ivar of the vnode.  Leo uses the firstChild ivar just as it does in the present implementation.
The lines connecting the link vnodes represent two new ivars:  the link and links ivars.  Only link vnodes have non-None link ivars.  The link ivar of a link vnode points at the target node.  The links ivar of a target vnode is a Python list of all link nodes that point to the target.

N.B. Link nodes are never target nodes.  Target nodes are never link nodes.
For each shared tree of vnodes, there is exactly one target vnode.  The target vnode is the root of the shared tree.  As mentioned before, the target vnode contains the firstChild ivar that points to the first child of its subtree.  The target vnode also contains the headline and body text that is shared by all the cloned nodes.
 Each link vnode represents a cloned node _in the outline pane_.  The parent, back and next ivars of a link node position the node in the outline visually.

Therefore, a _single_ link vnode together with its target vnode fully represents a cloned node.  You can think of each such combination of a link vnode and its (shared) target vnode as a _virtual vnode_.  For any virtual vnode, Leo can get any information found in a normal vnode (not a cloned vnode) either from the link vnode or the target vnode.  Moreover, a virtual vnode corresponds to a _single_ position in a tree traversal.
Internally, Leo "moves around in the virtual vnode" as needed to set or get information from the virtual vnode.  Traversals of the tree are unaware that the virtual vnode has structure: it looks exactly like any other vnode.  Because all positions within a bag are equivalent, Leo must be careful to compare positions properly.</t>
<t tx="ekr.20050421205312">One of the great joys of the Leo project is the way it takes, in unexpected ways and at unexpected times, surprising new directions.  The last time a major change in Leo happened was a little more than a year ago when I decided that @file trees were feasible.  I believe a similar seismic shift is about to happen.

There seems to be a natural rhythm involved: expansion and contraction, invention and consolidation/completion, positive and negative.  I believe part of this natural rhythm involves forgetting, especially forgetting why things don’t work.  Often a slightly new point of view invalidates formerly real obstacles.

Such changes and rhythms are heralded by largely unconscious thought.  Recently there have been a great many requests for user options, as well as other features.  I believe this has had the mostly unconscious effect of changing my thinking from “what is the right way?” to “why not do it every way?”  Another formerly unconscious impetus for the present avalanche of ideas was the recent question about whether someone had imported the entire Linux kernel into Leo.  That brought to my mind several problems with the present way of doing things:

1.	Leo files can get very large.
2.	It can take a long time to read all derived files.
3.	The larger the file, and in particular the more clones, the more time it takes to move cloned nodes.
4.	Leo practically hangs when recovering from read errors.
5.	Using .leo files with CVS is a real pain.  I’ve pushed this to the background by promising the “Resolve CVS conflicts” command, but this command may not be easy to do, or even possible.

These thoughts got me thinking about all parts of Leo’s implementation, especially clones.  I recalled the discussion about “global” clone indices. These thoughts have suddenly created a flood of new ideas, in several related directions or themes.

The major problems facing any new implementation strategy involve clones.  At present, clone linking happens entirely within .leo files as the result of redundantly saving all information in the derived files as part of the .leo file.  The derived files create content, the .leo file creates clone links, marks, etc.

Theme 1: Global Tnode Indices

Previously I have dismissed the notion of global node (or clone) indices, saying that there is no way to guarantee uniqueness.  This is correct, but it is not the whole story.  CVS provides another model: that of resolving conflicts.

So suppose every node (in a derived file or in a .leo file) has a global node index.  This index would be associated with _all_ tnodes, not just cloned nodes.  These indices (global tnode indices, or gti’s for short) would be _semi_ mutable.  That is, they would be immutable except for the (rare) instances in which gti conflicts occurred.  A gti conflict occurs when two nodes in a .leo file have the same gti.  At that point, one or both of the gti’s must change.

A subsidiary part of this scheme is that .leo files (and derived files?), would contain an indication of the maximum gti used so far.  When writing a file, the maximum gti would be set to the maximum of all the max gti fields found when reading all the files.  When creating a tnode for the very first time, its gti would be set to the max gti value, and that value then would be incremented.  Possibly a new @max_gti sentinel would be needed in derived files.  This might (will?) cause conflicts: to resolve the CVS conflict we would simply pick the larger number.

In short, we have a “semi-global” index that is _not_ guaranteed to be unique, but which will be “almost” unique.  We resolve conflicts as follows.  Each .leo file will contain a gti table, associating gti’s with headlines.  A conflict occurs when two nodes have different headlines and the same gti’s.  The gti table will give the headline that was written last for each gti, so the gti table will usually match one of the headlines of the conflicting nodes. The other tnode will have its gti changed.  If neither node’s headline is in the table we will change both gti’s.
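
The conflict rule can be sketched as follows.  This is a sketch under stated assumptions: resolve_conflict and its arguments are hypothetical, the gti table is modelled as a plain dictionary from gti to the last-written headline, and fresh gti values are taken from the max gti counter as described above.

```python
# Hypothetical sketch of the gti conflict rule described above.
def resolve_conflict(gti, headline_a, headline_b, gti_table, max_gti):
    """Return (gti_a, gti_b, new_max_gti) for two nodes that share one gti."""
    last_written = gti_table.get(gti)
    if headline_a == last_written:
        return gti, max_gti, max_gti + 1          # b gets a fresh gti
    if headline_b == last_written:
        return max_gti, gti, max_gti + 1          # a gets a fresh gti
    return max_gti, max_gti + 1, max_gti + 2      # neither matches: change both

table = {7: "Parse input"}
result = resolve_conflict(7, "Parse input", "Write output", table, max_gti=20)
```

In this example the node whose headline matches the table keeps gti 7, the other node receives 20, and the max gti becomes 21.  No decision is asked of the user.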

Theme 2: Small (template) .leo files and CVS

Presently, updating LeoPy.leo to CVS takes a long time and isn’t very useful.  CVS doesn’t understand the format of .leo files; CVS destroys the XML structure of .leo files when trying to merge conflicts.

The “heroic” solution is the “Resolve CVS Conflicts” command, but I think another way might be better, and much simpler.  The idea is just this: don’t include LeoPy.leo in CVS at all.

To do this, we need to make LeoPy.leo “irrelevant” as far as CVS is concerned.  We can do that by having the derived files contain _all_ information in LeoPy.leo.  So LeoPy.leo would become nothing but a shell, or placeholder.  This would be just what is needed for truly huge projects, starting with the Linux kernel.

The elements of this scheme might be as follows:

1.	A user option: copy_derived_info = 0 that causes @file trees in .leo files not to be saved when derived files are written.  For completeness, perhaps we would have a Write Full Outline command that does write everything.  Such a small .leo file might be called a “template” .leo file.  It would consist mainly of the vnodes describing @file nodes (but not their descendents), as well as nodes that exist only in the outline.  We would want to reduce or eliminate such non-@file nodes: see below.
2.	 Tables within .leo files (or in other files?) would allow marks and clone links to be recreated when derived files are read.  I call these gti tables.  Regardless of where these tables reside, these tables are conceptually _local_ to each .leo file; they do _not_ have to be managed by CVS.  In other words, clone links are created by each .leo file, _not_ by derived files.  The acid test of this scheme is whether it can handle the section in LeoPy.leo now called (Project views)
3.	A @read_only option causes the .leo file to be read-only.  The Save command would be dimmed in that case.  This isn’t really needed, but it might emphasize that the .leo file isn’t part of the CVS distribution and isn’t mutable.
4.	All information in LeoPy.leo should be “carried” by derived files.  An @file local_notes.txt section creates a derived file _not_ part of CVS, for the private use of people besides me.  The Notes section becomes @file EdwardsNotes.txt, containing the "official" (i.e., my) notes about Leo.  “Official” derived files are the files handled by CVS.  Local files are not managed by CVS, and can be changed by anyone at any time for any reason.  LeoDocs.leo and leoConfig.leo would probably exist as before.

I’ve got to double check that this scheme avoids the horrors of the old “backup” .leo files.  I think it does because there will be no such thing as read errors.  Such errors arose because of mismatches between the structure of the derived file in the .leo file and the real structure as specified by the derived file.  If the .leo file contains no structure information no such mismatch can exist.

Theme 3: @@file nodes

These nodes would be read only when the user first selects them.  This eliminates the reading of derived files when the template .leo file is opened.

What underlies all these themes is that the user experience of Leo would be almost completely unchanged!  The only difference is that Leo would load template .leo files much more quickly, and the messages about loading derived files would happen only on demand.  New user options would control all the “controversial” parts of these proposals.

I’m excited about these new directions.  It’s possible that there will be gotchas that can’t be resolved.  I’ve tried some of these ideas before without success.  However, I think it is more likely that gti’s form a firm foundation for new implementations.

The key to all of this is minimizing gti conflicts, and handling them properly when they happen.  If this can’t be done without heroic decisions from the user the whole scheme fails.  It may be possible, it may not be.  By a “heroic” decision I mean a decision at the time the conflict is announced requiring information that will not be present until after the decision was made.  Some of the old error recovery schemes required such heroic decisions, which is why they didn’t work.

Theme 4: Revised XML file format

The present XML file format is “regular”, that is, all vnodes and tnodes are represented in the same way.  In particular, tnodes are associated with vnodes using tnode indices.  Tnode indices are computed each time the .leo file is written, and a “small” change in a .leo file can cause many of these indices to change.  This causes CVS to report many diffs.

It would be possible to use a slightly more complex scheme that would take an entirely different approach.

- We don’t use the &lt;tnodes&gt; element at all.  Instead, we associate text with the first headline that uses it, using an attribute field.  For example, instead of:

&lt;v t="T4"&gt;&lt;vh&gt;The headline text&lt;/vh&gt;...any nested vnodes&lt;/v&gt;

We would have:

&lt;v tx="The body text"&gt;&lt;vh&gt;The headline text&lt;/vh&gt;...&lt;/v&gt;

To make this work, we must ensure that two nodes are cloned if and only if their headline texts are identical.  Leo can do this as follows.  When writing a .leo file, Leo will first scan the entire list of vnodes, entering headline text into a Python dictionary.  If two vnodes have the same headline text but are _not_ clones a disambiguating tag will be added, something like this:

&lt;v tag="1" tx="The body text"&gt;&lt;vh&gt;The headline text&lt;/vh&gt;...&lt;/v&gt;

The read logic will consider that vnodes with the same headline text are clones provided that:

1.	They don’t use the old t= tnode index field
2.	They either have no new tag= field or they have the same tag= field.

This scheme will eliminate all the old tnode index fields, and the new tag= fields won’t change when other vnodes or tnodes change.  This means that essentially all changes to .leo files will be as the result of adding vnodes or changing tnode text.  That is, changes in .leo files will correspond directly to user changes.  This presumably will make CVS happier.
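
The tagging pass described above might look roughly like the following sketch.  It is only an illustration of the idea, not Leo code: the function names, the (headline, clone_id) input pairs and the choice of "1", "2", ... as tag values are all assumptions made for the example.

```python
def assign_tags(nodes):
    """Hypothetical sketch of the write-time scan.

    nodes: list of (headline, clone_id) pairs in outline order, where
    clone_id is any value shared by all clones of one node.
    Returns one tag per node: None for the first clone group seen under
    a headline, then "1", "2", ... for later, unrelated groups.
    """
    groups_by_headline = {}  # headline: {clone_id: tag}
    tags = []
    for headline, clone_id in nodes:
        groups = groups_by_headline.setdefault(headline, {})
        if clone_id not in groups:
            # The first group keeps the bare headline; others get a number.
            groups[clone_id] = str(len(groups)) if groups else None
        tags.append(groups[clone_id])
    return tags


def read_clone_key(headline, tag):
    """On reading, nodes that share this key would be treated as clones."""
    return (headline, tag)
```

With this sketch a tag depends only on the other nodes that share the same headline, so editing unrelated nodes leaves existing tags alone.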

Theme 5: File format options and local configuration options.

Given that there are a number of ways to write .leo files, the new attitude says that rather than trying to figure out which is best we give the user the ability to pick any way.  When writing a file, we might want to specify:

1.	Whether to use the old, compatible file format, or the new file format without the &lt;tnodes&gt; element.
2.	Whether to redundantly write information to the .leo file, as is done presently, or whether to write template @file nodes.

The natural place to specify such options is in headlines, not body text, because only headlines are guaranteed to be written in template @file nodes.  Moreover, such options are more similar to @file, @rawfile and @silentfile than to “real” directives.

We could generalize such “options headlines” to specify many other options.  For example,

@option remove_sentinels_extension = .txt
@option body_text_font_family = Courier New
etc.
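
As a rough sketch of how such “options headlines” could be recognized (the @option syntax is only the proposal made here, and parse_option_headline is a made-up name, not an existing Leo function):

```python
def parse_option_headline(headline):
    """Return (name, value) for an '@option name = value' headline.

    Returns None for any other headline.  The syntax is assumed from the
    examples in this posting; it is not an existing Leo directive.
    """
    prefix = "@option "
    if not headline.startswith(prefix):
        return None
    name, sep, value = headline[len(prefix):].partition("=")
    if not sep:
        return None  # no '=' found, so this is not a usable option
    return name.strip(), value.strip()
```

For example, the first headline above would yield the name remove_sentinels_extension and the value .txt.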

Not all options could be “localized” this way.  I doubt, for example, that keyboard shortcuts should be dependent on the location of the headline.

All comments welcome.

Edward

</t>
<t tx="ekr.20050421205312.1">By: edream ( Edward K. Ream ) 
 Big picture: why 4.0 is important: more
2002-12-06 20:12

There are two other items that seem to me to have strategic importance: 

1. Moving nodes with many clones can be very slow. This is, I think, a hard problem. The MORE outliner
(from which I borrowed all significant details about clones) could also be very slow. Anyway, the goal is to
have Leo work smoothly with very large outlines containing many clone links. 

2. Gti's don't completely solve all problems with CVS. In particular, derived files must have "structure" links
that indicate outline structure (when the derived file is loaded into the Leo outline). The present scheme
uses child indices and nested @node sentinels to indicate outline structure. 

4.0 will probably use another way: It will combine @node and @body sentinels into a single kind of sentinel.
The outline structure will be indicated by parent, firstChild, back and next gti's, exactly as vnodes are
presently linked using similar ivars. This means that moving, adding or deleting a tree will cause at most 4
and at least 2 sets of sentinels to change. 

This is a much better way than at present, when moving a node can cause arbitrarily many sets of sentinels
to change. However, it is _not_ a complete solution because CVS conflicts can still corrupt derived files.
Indeed, I don't think a perfect solution exists as long as CVS insists on stuffing extra lines into "conflicting"
files. 

Conceivably the new sentinel scheme will be simple enough that I could actually envisage recovering a
derived file from a mangled derived file. However, this would be _very_ ugly because it would depend on
exactly what "warning" lines CVS inserts. For example, changes to CVS could break this kind of code. 

Does anyone have some thoughts on how Leo could more smoothly deal with CVS conflicts? I'm afraid I'm
out of ideas. 

Note: after 4.0 comes out it may be possible to eliminate LeoPy.leo from CVS entirely because it will turn
into just a placeholder... 
</t>
<t tx="ekr.20050421205312.2">By: edream ( Edward K. Ream ) 
Big picture: why 4.0 is important: more
2002-12-07 17:48

There will always be an option to write all information to .leo files. As you say, using .leo files
as archive files is very handy. 

As I mention in the "structure rule" posting, it's not so clear whether smaller .leo files are worth doing.

4.0 and the structure rule 
2002-12-07 15:36 
This really is a continuation of the thread Big picture: why 4.0 is important. 

Yesterday, after writing that gti's don't solve all CVS problems, I had another Aha regarding 4.0 derived files. This is still not a complete solution, and it may be a big step in that direction. Recall that even with gti's two problems with CVS remain: 

Problem 1: moving nodes can cause @node sentinels to change. 
Problem 2: CVS "helpfully" inserts lines into a file when it detects a conflict. 

These two problems are related in a nasty way: CVS can corrupt the structure of changed sentinels! 

I was lightly dozing in the middle of the day, thinking quite vaguely about these problems when I suddenly realized both problems can be made to go away! With gti's there is really no need to specify outline structure in derived files at all! We could adopt the following rule: 

Structure rule: derived files specify content; the outline specifies structure. [I think this is wrong now]

The structure rule can be made to solve both Problem 1 and Problem 2 as follows: 

1. Leo 4.0 could generate just a single @gti sentinel for each section reference. (or body text in @file-noref trees). This sentinel will contain only the gti of the defining vnode. This gti doesn't change no matter how the outline is reorganized, so CVS will never alter it! 

2. For documentation purposes, Leo should generate the headline text of the vnode as a comment following the
@gti sentinel. Headlines _can_ change, so CVS might alter such comments when CVS detects a conflict, but
changing this comment will _not_ corrupt the @gti lines! Robust recovery from CVS meddling may be possible. 

3. Similarly, @ref sentinels could represent the actual reference in the body text. Again, the @ref sentinel would be followed by the actual text of the reference, so the actual @ref sentinel will never be altered by CVS. 

The result is a radical simplification of derived files. At present (without gti's) Leo is forced to represent outline
structure using nested @nodes sentinels and @body sentinels. When outline structure changes, arbitrarily many
of these sentinels can change. In the new scheme, all these sentinels can be replaced by a single @gti sentinel
that can never change. 

Earlier I said that the Aha (i.e., the structure rule) does not solve all problems: 

1. CVS could still corrupt @gti sentinels if it decides that a range of lines including @gti sentinels has changed.
I'm not sure exactly what to do about this, but clearly the structure rule will reduce the number of times this will
happen. Indeed, rather than posing problems for CVS's diff (as changed @node sentinels presently do), @gti lines will provide "islands of stability" for CVS. 

2. As always, there is the problem of keeping .leo files and derived files in synch. Clearly, this problem can never go away completely. What Leo must do is to provide ways of
a) discovering out-of-synch conditions and
b) recovering in a straightforward way.

With the new scheme, we know we are out-of-synch if an @file or @file-noref node refers to a gti not found in the derived file, or conversely, if the derived file contains an @gti sentinel with no corresponding tnode in the outline. The outline and derived files could be out-of-synch in other ways that would be undetectable. For example, suppose the only change to an outline is that a node was moved. This will cause no changes to gti's. On the other hand, such out-of-synch conditions might safely be ignored. I'm not sure about this though... 

We might try to associate a global time stamp (using the same techniques used to create gti's) with @file nodes
and @file-noref nodes. Leo's atFile read logic can then determine whether the derived file was created by the @file node. However, this is a dubious idea: if this timestamp is represented in the derived file then CVS will complain and interfere when it changes... 

3. If we adopt the structure rule then outlines must specify the structure of all @file trees, just as it does now. In
particular, @file and @file-noref nodes can _not_ be placeholders. But we have the option of not saving body text that is used only in @file and @file-noref trees. This has the potential to radically reduce the size of .leo files. 

However, removing information from .leo files makes it more difficult to recover from errors. I think the solution to this kind of dilemma is an option, either in leoConfig.txt or the .leo file itself, specifying whether the .leo file will
contain all information (that is, whether some body text may be deleted). I'm not sure about what this option should be when using CVS. 

4. With or without the structure rule CVS can still interfere with .leo files in unpleasant ways. However, the
structure rule makes it impossible to use placeholders for @file and @file-noref nodes, so the structure rule
ensures that LeoPy.leo must be part of Leo's cvs tree. This might be considered a step backwards. 
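
The detectable out-of-synch conditions mentioned in item 2 above amount to two set differences.  Here is a minimal sketch, assuming the gti's have already been collected from the outline and from the @gti sentinels of the derived file (how they are collected, and the name find_out_of_synch, are assumptions for illustration only):

```python
def find_out_of_synch(outline_gtis, derived_gtis):
    """Compare the gti's known to the outline with those in a derived file.

    Both arguments are iterables of gti strings.  Returns a dict with the
    gti's that appear on one side only; both lists are empty when no
    detectable out-of-synch condition exists.
    """
    outline_gtis = set(outline_gtis)
    derived_gtis = set(derived_gtis)
    return {
        # Referenced by the @file tree but absent from the derived file.
        "missing_from_derived": sorted(outline_gtis - derived_gtis),
        # Present as an @gti sentinel but with no tnode in the outline.
        "missing_from_outline": sorted(derived_gtis - outline_gtis),
    }
```

Note that a moved node changes neither set, which is exactly the undetectable case described above.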

To summarize: 

1. The structure rule completely solves Problem 1, and goes a long way towards solving Problem 2. 
2. The structure rule greatly simplifies derived files. 
3. The structure rule allows smaller .leo files, at the cost of making error recovery more difficult. 
4. The structure rule would require that LeoPy.leo be part of Leo's CVS tree. 

An important way to evaluate designs is whether the design is "headed in the right direction", that is, whether the
design is tending to become more complex or less complex. Heading in the right direction is always important
because simplifications tend to suggest further simplifications. I have some hope that further improvements may be possible... 

Clearly, the structure rule greatly simplifies derived files. As a result, both the atFile.read and the atFile.write code will become simpler. The structure rule simplifies error recovery by essentially making it impossible! Either body text for a gti appears somewhere (in the .leo file or the derived file) or it doesn't. If it doesn't, then we have an out-of-synch condition for which no error recovery is possible. We can make sure that .leo files contain all the text for all their gti's by writing all body text to .leo files, just as is done now. Even so, out-of-synch conditions could still happen if the derived file defines a gti that does not appear in the outline. In that case the outline is missing a node used to create the derived file and no recovery is possible. But similar situations exist today, so the new way is no worse than the old. 

To summarize further: the structure rule appears to be headed in the right direction, and only experience will tell
whether error recovery considerations will allow us to use "small" .leo files. The structure rule appears to require
that LeoPy.leo remain a part of Leo's CVS tree, which might be considered a step backward. I doubt that CVS will ever handle XML files well. On balance, I think the structure rule is worth implementing to see what happens. </t>
<t tx="ekr.20050421205312.3">Big picture: thick or thin?
2002-12-17 16:02 

This posting discusses some important choices that Leo may (or may not) present to the user. 

Let's set the stage a moment. Here are the salient facts as I see them. 

1. The defining characteristic of 4.0 will be file formats that use Global Tnode Indices, aka gti's. This is a fundamental advance in Leo's implementation. For the first time, we can _reliably_ identify nodes that have been created by different people. This makes the implementation of clone links much simpler and more robust. 

2. It is becoming clear that CVS does not handle large binary files well. It appears that CVS is missing some
crucial features that Leo really would like to use. In particular, we would like diff to treat .leo files as text, but we
would also like to update .leo files as if they were binary. If anyone knows how to do this _please_ let me know! 

Let us define the terms "thick" and "thin" as follows:

A thick _derived_ file contains sentinels that can be used to recreate outline structure using only the derived file. A thin derived file contains only sentinels that delimit tnodes (body text). 

A thick _.leo_ file contains full body text of all nodes in the outline. A thin .leo file contains only structure
information (vnodes), plus body text of nodes that appear in no derived file. 

At present, both .leo files and derived files are thick (and the user has no way of creating thin files.) I have always contemplated that 4.0 will create thin derived files. This makes it unlikely for CVS to corrupt @gti sentinel lines. Because CVS is behaving so badly with .leo files, I am becoming more and more inclined to "thinning" .leo files as much as possible. This will take the following forms: 

1. By default, .leo files will be thin. However, some command will exist to create a thick .leo file. Such files would be "stand-alone archives", just like .leo files are presently. 

2. I would like to implement @include trees in Leo. This has never been possible before gti's. A thin .leo file would have _nothing_ descending from the @include node. A thick .leo file would contain the entire contents of the @included .leo file. (Maybe there will be options concerning this; such options are not important to this
discussion.) 

@include x.leo would, in effect, incorporate x.leo into the outline as needed. Similarly, @include x.y would act like an on-the-fly import of x.y (probably a "plain" import, without creating any structure, much like @read-only now works.) 

3. If @include trees can be made to work (and I am pretty sure they can), then the next step would be to organize LeoPy.leo as follows: 

@include leoNotes.leo 
@include leoProjects.leo 
@include leoX.leo (for all @file leoX.py nodes presently in LeoPy.leo) 
etc. 

In other words, LeoPy.leo becomes the skinniest of thin files, and we would have a .leo file for every .py file in Leo's distribution. These "subsidiary" .leo files will be part of the CVS archive, and will be _much_ smaller than LeoPy.leo is at present. This should make CVS happier. 

4. Only time will tell whether there are major problems with this "thin-thin" scheme. As I said earlier, Leo will
probably give the user the option to create fat .leo files, but we had best use thin .leo files when committing to
CVS. 

The big disadvantage to a thin-thin scheme is that _both_ the .leo file and the derived file will be needed to recreate the entire outline. There are backup and archiving problems any time two sets of files must be kept in synch. However, I think that thin-thin may be worth the problems. 

Also note that we can recreate a "flat" outline just using a thin derived file. This isn't as good as the "Read @file
Node" command, and it is much better than nothing. Indeed, in an emergency it would be much easier to recreate the outline structure than it would to recreate the text! Also note that the "plain body text" contains many clues (in the form of @others directives and section references) that could be used to _partially_ recreate the outline just from @gti sentinels in the derived files. It's easy to see that a total reconstruction of the outline is not possible. Indeed, structure nodes in an outline are completely invisible to the derived file. 

Anyway, this is my present thinking on these topics. All ideas, comments and questions are welcome. </t>
<t tx="ekr.20050421205312.4">By: korakot ( Korakot Chaovavanich ) 
Big picture: thick or thin?
2002-12-18 08:01 
I like the thin-thin idea and @include. 

My previous problem is that I try to contain everything in 
a single leo file. When it gets bigger and bigger, I can't just 
put it on a diskette anymore. With @include, I can separate 
it into active data &amp; archive data and can maintain it in 
'virtually' a single file. 

In the new 'tintin' format, I would suggest that leo become 
tolerant to a missing file. For example, without tnode data, 
leo can still display only the headlines. Or without complete 
structure data, leo can still open the derived file in flat-outline 
mode. 

There's one more idea I would like to discuss. It's about the 
directives in general. Can we just use @file leoNotes.leo 
instead of @include leoNotes.leo ? My instinct tells me that 
many more types of file will be supported by leo in the future. 
Now it can deal with source code (plain text). I tried to 
implement an @folder. Someone else might try an @xml 
(and have tag/attribute editable directly). Now you are having 
an @include. 

It might be a good approach to put them all into @file directive 
and let leo intelligently import the file according to its type. 
Now simply by the file extension, but it could look at file content 
as well in the future. 

In this way, we won't clutter the directive space with 
@yafld (yet another file like directive). </t>
<t tx="ekr.20050421210335">By: edream ( Edward K. Ream ) 
Big picture: thick or thin?
2002-12-18 13:44 
&gt; In the new 'tintin' format, I would suggest that leo become tolerant to a missing file. 

Yes. This may be a good idea. 

&gt; Can we just use @file leoNotes.leo  instead of @include leoNotes.leo 

Again, probably a good idea. There are a zillion details to be handled. Examples: 

1. What happens when two "included" files refer to each other? My solution: the @include/@file node
in the "included" .leo file is non-functional, maybe with a special icon to denote that fact. Another set
of possibilities: an Open in New Window command (for @include/@file or maybe even any node). 

2. How do we resolve conflicting preferences in multiple .leo files? It probably depends on how
"important" the prefs are, or maybe we can have a tree of prefs...It will surely take some work. 

3. [The big one] What kind of errors are going to happen? How often will they occur? How easy will it
be to recover. If thin-thin can't handle these issues it will fail. We can't have CVS issues ruin Leo's
error handling abilities. </t>
<t tx="ekr.20050421210335.1">By: nobody ( Nobody/Anonymous ) 
Big picture: thick or thin?
2002-12-18 16:32 
My 2 (s)cents: 
&gt;&gt;1. What happens when two "included" files refer to each other? 
Bad Response: The system crashes. 
Better: An error message ("Don't DO that!") is shown in the log window. 
Better: An error message ("Don't DO that!") is shown in the log window, and a new Leo is
opened with the offending file. 

&gt;&gt;2. How do we resolve conflicting preferences in multiple .leo files? 
Make the "includer" the Master. What it says, goes. 

&gt;&gt;3. [The big one] What kind of errors are going to happen? 
Hopefully, Santa will bring us all a crystal ball... 

This @include function in combination with the @folder function, could make it easier to
update customizeLeo. Customizations could be in separate files in a separate folder, and the
user could @include whatever customizations they wanted. 

</t>
<t tx="ekr.20050421210335.10">By: edream ( Edward K. Ream ) 
RE: Separate presentation from content
2002-12-19 07:27 
I'm trying to see where, if anywhere, we disagree. First, let me say that I agree with
much of what you have said. If I understand you, the heart of your proposal is this: 

&gt; I propose that transient settings (presentation as opposed to content) should go into
a separate file...These settings would include the &lt;globals&gt;, &lt;find_panel_settings&gt; and
&lt;preferences&gt; elements and portions of the "a" tag in &lt;v&gt; elements. 

It is true that these elements are in some ways different from the "content" of the .leo
file, so what you are suggesting makes sense, at least from one point of view. And I
agree with much of the background rationale for this proposal. 

However, I think you overstate the benefits of this suggestion and minimize the
problems associated with it: 

&gt; This should decrease cvs thrashing with leo dramatically and allow more sensible
diffs as well. 

I don't see how this can be true. It certainly can not be true of derived files, as none of
the items you mention are contained in derived files. So you must be speaking of .leo
files. Well, I just don't see how removing these fields can possibly either a) decrease
cvs "thrashing" or b) allow more sensible diffs. 

a) the .leo file or files (if we are talking about using "super-thin" .leo files with @include)
will change whenever a derived file changes, so there is almost nothing to be gained by
trying to reduce the number of changes to .leo files as far as cvs is concerned. 

b) the problem with diffs is that .leo files contain structure and cvs doesn't understand
this structure, _regardless_ of how the structure is represented. Presently, .leo files
use xml to represent structure and the fundamental problem is not that .leo files
contain extraneous information; the problem is that cvs doesn't diff xml well, and ruins
the xml file when trying to present conflicts. 

So, in my view, separating out parts of .leo files would not solve the cvs problems. 

Now let's turn to the problems with creating yet another kind of file associated with Leo.
This is the time to trot out an old math joke. Question: how do mathematicians count?
Answer: one, two, many and too many. 

The point is this. There are already problems with having just two different kinds of files:
.leo files and all other derived files. Adding a third kind of file makes these problems
much worse. The fundamental problem is that we want to keep all files "in synch".
Right now that means keeping all the derived files in synch with the .leo file. Yes, cvs
helps us do this. No, cvs does not _require_ that we do this. Probably the rule to use is
this: commit LeoPy.leo whenever committing any derived file. 

So how would this change if we added, say, .lpf (Leo presentation file) files? The rule
would be: commit both LeoPy.leo and LeoPy.lpf whenever committing any derived file. 

But with the thin-thin scheme I envision there would be a "master" file, still called
LeoPy.leo and a separate .leo file for every derived file: leoAtFile.leo, etc. So then do
we have also a separate .lpf file for every derived file? 

And there are other problems. What I am most concerned about with thin-thin are
out-of-synch conditions. At least with the old "dreaded read errors" we could recover
using the Read Outline Only command because .leo files are thick. In a thin-thin world
error recovery is harder. This may, in practice, be the death of thin-thin. In any case,
what happens if the structure implied by the "a" tags of vnodes does not match the
structure of the .leo file? Well, it's not fatal, since none of the information in a .lpf file
would be crucial, but it's not pleasant either. 

In short, I don't believe .lpf files really solve any fundamental problem and at the same
time they would complicate (even if only slightly) the heart of the matter, namely the
question of whether thin-thin will work at all. 

Edward 

P.S. The major rewrite of code for 4.0 provides the rare(!) opportunity to fix a number of
problems with the present implementation. Leo will use Python's xml tools to parse
.leo files and will use a dictionary-driven method of dispatching methods in scanText,
very similar to the method used in the present colorizing code. However, there must be
strict limits to experimentation. There are major questions about whether thin-thin will
work at all. Until these questions are answered all other matters must be left for later. 

P.P.S I keep saying that "thin-thin" is untried and may fail. Please note that gti's will
still be extremely valuable even if we must abandon "thin-thin" and stay with the
present "thick-thick" scheme. In other words, 4.0 _will_ happen. Only the details of 4.0
remain to be thrashed out. 

</t>
<t tx="ekr.20050421210335.11">By: gilshwartz ( Gil Shwartz ) 
RE: Separate presentation from content
2002-12-19 08:05 
I am not an expert on CVS, but I did get the point of CVS complaining about
insignificant diffs in Leo files due to the presentation information. Wouldn't it be
simpler to just move the presentation information to the end of the Leo file, and
thus when one encounters the first presentation related diff, then the rest can
safely be ignored? 

Re. "thin-thin" (and I did like the suggestion to call it "tintin"), and looking at how
you implement changes Edward, I am sure that behavior will be configuration
driven so it will be easy to switch back to a "thick-thin" mode (available
anyway). Since a "thick" Leo is kind of a project backup (and it may be quite
big), I believe it would be interesting to be able to "generate compressed thick
Leo" as well. 
</t>
<t tx="ekr.20050421210335.12">By: edream ( Edward K. Ream ) 
 RE: Separate presentation from content
2002-12-19 13:02 
The problem isn't so much that diffs are hard to read, because I suspect
_nobody_ reads diffs of .leo files. The problem is that cvs corrupts .leo
files when cvs reports conflicts (spurious or not!) 

Moving "mark" bits to a separate location at the rear of the .leo file just
makes things messier. Ditto for other info. 

In short, this problem isn't going away with hacks to the format of .leo
files. It's a basic, real, cvs problem. 

P.S. re thick/thin. Yes. Leo will offer options in this regard. I may also
offer all four thick/thin options: thick-thick, thick-thin, thin-thick and
thin-thin. 

Besides configuration options there may be new commands as well. It
may be that Write Outline Only will always write a thick outline. For sure
we want to have Read Outline Only available in a pinch. 
</t>
<t tx="ekr.20050421210335.13">By: jmgilligan ( Jonathan M. Gilligan ) 
RE: Separate presentation from content
2002-12-19 18:52 
I am indeed talking about .leo files. We disagree because I don't like having
CVS track which tree nodes are open and closed, while you don't mind this. 

I have made my case. You disagree with me. I don't think there is a unique best
way to do things and while I would do things differently, I respect your decision.
Time to move on. 

By: edream ( Edward K. Ream ) 
 RE: Separate presentation from content
2002-12-19 20:08 
&gt; Time to move on. 

Thanks. I appreciate your good humor about this. As always, the
subconscious will play with these ideas as it sees fit. Who knows what
will ultimately come of these discussions? 

</t>
<t tx="ekr.20050421210335.14">Thick &amp; thin
2002-12-30 23:55 
Why thick is required 

The more I think about it, the more I am convinced that my previous idea of "thin" derived files (and the structure rule) would be a very bad idea. Here's why: 

The first, and most important reason is that thin derived files would have to be kept in synch with one (or more) .leo files. Experience shows a) that this can not be done and b) that no proper error recovery will be possible when files get out-of-synch. I don't believe there is any way around this problem. In other words, both structure and content must, at all costs, be kept together. 

Secondly, I had originally intended that a separate .leo file be created for each derived file in cvs. The more I think about this, the sillier it seems. It will hardly be a selling point for Leo to require cvs to manage double the number of files.

The previous two reasons are negative; they are reasons why not to have thin derived files. Immediately after
confronting the fact that thin won’t work I saw a way to make thick work much better. 

Before giving the “Aha”, I’d like to recall for you the problems with the present scheme: 

1. Moving, adding or deleting nodes cause the childIndex field of arbitrarily many sentinels to change, which greatly confuses cvs. 

2. When many sentinels change “improperly” cvs may tend to report “false” conflicts. 

3. Whenever cvs reports a conflict involving sentinel lines, cvs corrupts the derived file by duplicating sentinels. 

With this background the “Aha” is simply this: Leo does not need childIndex values in sentinels in order to
reconstruct the outline! Perhaps I should have seen this earlier, but at the time of the original design the problems above were not apparent. 

Here is how it works: We can deduce the order of nodes introduced in the derived file by the @others directive; we can make up an order for nodes introduced in the derived file by section references. Indeed, an @others directive causes the following sentinels and text to be placed in the derived file: 

@+others 
@+node 
text of first node following @others 
@-node 
@+node 
text of second node following @others 
@-node 
and so on 
@-others 

So Leo can recreate the order of the nodes in the outline because the order is the same as the order of the
sentinels in the derived file. 
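
To make this concrete, here is a minimal sketch of recovering node order from the sentinels shown above.  It assumes the simplified sentinel spellings used in this posting (real sentinels carry comment delimiters and more information), it ignores nesting, and nodes_in_others_order is a hypothetical name, not part of atFile:

```python
def nodes_in_others_order(lines):
    """Return the body text of each node, in the order found in the file.

    lines: the lines of a (simplified) derived file.  Only the @+node and
    @-node sentinels matter here; everything between them is body text.
    """
    nodes = []      # completed node bodies, in file order
    current = None  # lines of the node being read, or None outside a node
    for line in lines:
        stripped = line.strip()
        if stripped == "@+node":
            current = []
        elif stripped == "@-node" and current is not None:
            nodes.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(line)
    return nodes
```

No childIndex is consulted anywhere: the position of each @+node sentinel in the file is the only ordering information used.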

We can _not_ deduce the original outline order of nodes included as the result of section references, but I don’t believe that matters! Indeed, I usually keep nodes containing section definitions in one of two orders: alphabetical order by name of the section being defined, or “reference order”, the order in which the references appear in the parent node. Using either of these orders would recreate the outline sufficiently for almost any purpose. 

In short, we do not need the childIndex field of @node sentinels in order to _fully_ recreate the outline structure and content! So in effect, derived files _are_ another form of outline! This has several implications... 

1. .leo files no longer need to “mirror” the structure contained in derived files. Now that the gti’s in derived files are completely reliable we no longer need the info in the .leo file to recreate clone links. With gti’s there is absolutely no need for such redundancy! 

2. Therefore, only the vnode corresponding to the @file node in the outline needs to be saved in the outline:
everything else can come from the derived file! So .leo files can be “super thin”. An outline (.leo file) need contain only those tnodes and vnodes that are contained in no derived file.

Aside: vnodes and tnodes in @file-asis and @file-nosentinel trees must still be completely saved in .leo files because files derived from such trees contain no sentinels. 

3. There is only one minor complication: namely the red “marks” attached to nodes by the mark and unmark
commands. Clearly, such marks are non-essential. At present, these marks are carried by the “mirrored”
(redundant) outline in the .leo file. It would be wrong to represent those marks in derived files: First, such marks
would create problems for cvs if stored in derived files. Second, it is more natural to regard marks as belonging to a particular outline (.leo file) rather than a derived file. For these reasons, .leo files should have a &lt;marks&gt; element that simply lists all the gti’s of marked nodes. If such nodes don’t happen to exist any longer in derived files, nobody will care and no harm can come of it. 

So we can now have the best of both worlds: we can use “thin” trees in outlines (we don’t have to mirror info
redundantly anymore) and derived files now are “complete” in the sense that they can contain enough structure and content to recreate fully the original outline (except for the unimportant order of section definition nodes). 

The only disadvantage is that there is still the possibility of cvs corrupting the structure of derived files. This problem can not wholly be cured as long as cvs corrupts text files. However, the new scheme will lessen this problem. Indeed, @node sentinels will no longer have childIndex fields. This means that moving, inserting and deleting nodes will have _no effect_ on other sentinels. So changes (and potential conflicts) can only arise as the result of real changes to sentinels (say as the result of changing the name of a section). But such problems can be much more easily handled by hand if there aren’t dozens of other spurious changes to confuse the issue. 
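
A small Python sketch of the point about child indices. The sentinel text is simplified, and the gti values (gti1, gti2, ...) and the new sentinel layout are hypothetical; the sketch only counts how many sentinel lines differ after one node is inserted, as a crude stand-in for a cvs diff.

```python
def old_sentinels(headlines):
    # 3.x style: each @+node sentinel carries a 1-based child index.
    return ["#@+node:%d::%s" % (i + 1, h) for i, h in enumerate(headlines)]

def new_sentinels(nodes):
    # Assumed 4.0 style: each sentinel carries only an immutable gti and the headline.
    return ["#@+node:%s:%s" % (gti, h) for gti, h in nodes]

def changed_lines(before, after):
    # Lines present in exactly one of the two versions.
    return sorted(set(before) ^ set(after))

nodes = [("gti1", "alpha"), ("gti2", "beta"), ("gti3", "gamma")]
# Insert one new node in front of the existing ones.
edited = [("gti4", "new")] + nodes

old_diff = changed_lines(old_sentinels([h for _, h in nodes]),
                         old_sentinels([h for _, h in edited]))
new_diff = changed_lines(new_sentinels(nodes), new_sentinels(edited))
```

With child indices every following sentinel is renumbered, so old_diff touches all of them; without child indices only the inserted node's sentinel appears in new_diff.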

Let’s step back now and see if we can see the broad outlines of the problem and the solution. 

1. We have completely eliminated the potential for files to be out-of-synch. I believe only such a fundamental approach is correct. 

2. We have eliminated the need to “mirror” structure of derived files in .leo files, provided only that we are willing to accept that section definition nodes may appear in a canonical order (alphabetical or reference order) that might be different from the order in which nodes appeared in the original outline. This might actually be considered a benefit; I can’t see any down-side to this at all. 

3. We can still write thick .leo files as self-contained archives. gti’s make this a complete, and completely safe, solution. 

4. We have completely eliminated “false” cvs diffs when inserting, deleting or changing nodes or headlines. The potential for cvs to corrupt derived files still exists, but corruption can happen only when a real conflict does, in fact, exist. Given that separating structure and content is a bad idea, this is as good as we can get. 

5. We have eliminated the thick/thin option, which I believe would have caused endless confusion and endless opportunity for out-of-synch files. Instead, .leo files will be much smaller by default, which should make upping LeoPy.leo to cvs much more pleasant. 

6. We could add @file-mirror option if we want to mirror info in the .leo file on a file-by-file basis, but this isn’t likely to be all that useful... 

7. This scheme is conservative in the best sense. It requires minimal changes to derived files (and the code that reads and writes them). It does not require any special procedures to keep cvs files in synch. Indeed, it makes it almost impossible for files to get out-of-synch. It should make Leo immediately more pleasant to use with CVS. 

As always, comments and discussion are most welcome. 

</t>
<t tx="ekr.20050421210335.15">By: gilshwartz ( Gil Shwartz ) 
RE: Thick &amp; thin
2002-12-31 18:35 

1. One of Leo's main tasks is being an editor. I do not like it when my editor forgets things the way I put
them, even if it thinks it knows better ("it looks like you are writing a letter, would you like some help?").
I am STRONGLY against Leo forgetting anything in general, and reference order in particular. 

2. It does not seem reasonable to me to have one Leo per source file. A more realistic situation I would
envision would be to have a Leo file per library or a reasonable functional set. This helps partition code to
small pieces (files), while maintaining integrity and overall view. Basically this is Leo's concept of using
sections. A "master" Leo can include child Leos. 

3. I do most of my file editing in Leo for two reasons. The first is to take advantage of sections and the
outline Leo provides (sure, otherwise why use it :-)). The second is because all the sentinels "pollute" the
source file with too many remarks. This is the main reason I thought thin for derived is good. It would provide
better source file readability. 

However, improved readability of derived files can also be done by minimizing Leo sentinels without
sacrificing the structural content. E.g. a typical derived file may look like this: 

#@+node:3::Event handlers (Frame) 
#@+node:1::frame.OnCloseLeoEvent 
#@+body 

#@&lt;&lt; Prompt for change &gt;&gt; 
#@+node:1::&lt;&lt; Prompt for change &gt;&gt; 
#@+body 

#@&lt;&lt; Put up a file save dialog &gt;&gt; 
#@+node:1::&lt;&lt; Put up a file save dialog &gt;&gt; 
#@+body 

#@-body 
#@-node:1::&lt;&lt; Put up a file save dialog &gt;&gt; 

#@-body 
#@-node:1::&lt;&lt; Prompt for change &gt;&gt; 

#@-body 
#@-node:1::frame.OnCloseLeoEvent 
#@+node:2::OnActivateBody 
#@+body 

#@-body 
#@-node:2::OnActivateBody 
#@+node:3::frame.OnControlKeyUp/Down 
#@+body 

#@-body 
#@-node:3::frame.OnControlKeyUp/Down 
#@-node:3::Event handlers (Frame) 
#@+node:4::Menus, Commands &amp; Shortcuts 
#@+node:1::canonicalizeShortcut 
#@+body 
#@+at 
... 

Notice start and end nodes, body start/end, and section repetition. The simplified approach I thought about is
something like this: 

a. No child count - you are absolutely right. 

b. No need for both section marker and section node start. Obviously if a section node starts, e.g. 

#@+node::&lt;&lt; Put up a file save dialog &gt;&gt;::gti 

This is the place of the marker. 

c. The node name is useful as comment. Put the gti in the end as they are likely to be ugly and not very
informative when one looks at the code. 

d. No need for the +body. As far as I could see the body or contained nodes always come after a node
start. 

e. Use "=node" instead of "+node" to indicate the first subnode. The use of "+node" thus reflects sister
nodes. A node ends at the beginning of a sister node. The last subnode in a row may end before the body of
its parent ends, or may end the body of a few parent nodes. Thus, use "-node" WITH A RELATIVE count to
indicate how many levels are ending, e.g. -node:1 closes just the subnode and the parent body may
continue, or next come a (sister) node. 

With these rules the layout above will look like this: 

#@+node::Event handlers (Frame)::gti1 
#@=node::frame.OnCloseLeoEvent::gti2 

#@=node::&lt;&lt; Prompt for change &gt;&gt;::gti3 

#@=node::&lt;&lt; Put up a file save dialog &gt;&gt;::gti4 

#@-node:1 

#@-node:1 

#@+node::OnActivateBody::gti5 

#@+node::frame.OnControlKeyUp/Down::gti6 

#@-node:1 
#@+node::Menus, Commands &amp; Shortcuts::gti7 
#@=node::canonicalizeShortcut::gti8 
#@+at 
... 

This takes much less space and distraction without loss of information. Also, having fewer sentinels makes
the chance of breaking their integrity smaller (same goes for reducing redundancy). 
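
One possible reading of rules a through e, as a rough Python sketch. The function name parse_compact and the sample gti values are hypothetical, and the section brackets are left out of the sample headlines for simplicity. "=node" opens a first subnode, "+node" opens a sister of the current node, and "-node:n" closes n levels.

```python
def parse_compact(lines):
    """Return a list of (level, headline, gti) tuples, level 0 being the top."""
    result = []
    depth = 0  # number of currently open nodes
    prefix_len = len("#@+node::")  # same length as "#@=node::"
    for line in lines:
        s = line.strip()
        if s.startswith("#@=node::"):
            # First subnode: opens one level below the current node.
            depth += 1
        elif s.startswith("#@+node::"):
            # Sister node: implicitly closes the current node, if any.
            depth = max(depth, 1)
        elif s.startswith("#@-node:"):
            # Relative count of levels being closed.
            depth -= int(s[len("#@-node:"):])
            continue
        else:
            continue  # body text or blank line
        headline, gti = s[prefix_len:].rsplit("::", 1)
        result.append((depth - 1, headline, gti))
    return result

sample = [
    "#@+node::Event handlers (Frame)::gti1",
    "#@=node::frame.OnCloseLeoEvent::gti2",
    "#@=node::Prompt for change::gti3",
    "#@-node:1",
    "#@+node::OnActivateBody::gti5",
    "#@-node:1",
    "#@+node::Menus, Commands and Shortcuts::gti7",
]
tree = parse_compact(sample)
```

Under this reading the outline levels can be recovered from the relative counts alone, which is the claim being made; whether whitespace and body text survive a round trip is a separate question that the sketch does not address.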

</t>
<t tx="ekr.20050421210335.16">By: edream ( Edward K. Ream ) 
RE: Thick &amp; thin
2003-01-02 20:53 
Thanks for these thoughtful comments, Gil. In general, I think you understand the issues; but then
again, I think I do too :-) 

&gt; I am STRONGLY against Leo forgetting anything in general, and reference order in particular. 

I suppose there could be an &lt;order&gt; element of vnodes containing references that could specify the
order. However, I think it unlikely that as a matter of fact you would ever notice if nodes were "out of
order". Note three things: 

1. The proposal to "make up" an order applies only to normal @file trees, not to @file-nosent or
@file-asis trees. 

2. It cannot be the case that the order of reference nodes in the outline matters in the derived file, as
the order of text in the derived files is completely determined by the referencing text. 

3. @others could always be used in an organizing node if you really did need to preserve order in the
outline, say to make an external tool happy. 

Still, perhaps you are right and an &lt;order&gt; element in the outline would be best. Note that this is an
entirely safe idea since order is never essential, so even if things get out of synch nothing terrible will
happen. 

&gt; I do most of my file editing in Leo for two reasons. The first is to take advantage of sections and the
outline Leo provides (sure, otherwise why use it :-)). The second is because all the sentinels "pollute"
the source file with too many remarks. This is the main reason I thought thin for derived is good. It
would provide better source file readability. 

That’s why I liked thin derived files as well. Alas, I know from experience that thin derived files will
cause horrible synchronization problems. It just won’t work. The _only_ way to create a super thin
derived file is to use @file-nosent or @file-asis. This way all essential information is in one place,
namely the outline. 

&gt; No need for both section marker and section node start. 

Alas, this is not strictly correct. The section start node represents the headline containing the
section definition (including, say, any trailing comments), while the section marker represents the
actual text used to reference the section. I am not sure anything better is possible, and I’ll keep your
suggestion in the back of my mind. 

&gt;No need for the +body. 

Maybe. The problem, as usual, is getting whitespace, especially newlines, correct. The present
scheme was designed when it wasn’t clear that sentinels could do the job at all. Later, a “caching”
scheme was added to eliminate newlines between sentinels. Finally, I tried to eliminate all “extra”
blank lines, but without success after several days of trying. While on vacation I have given this
subject some more thought. It may be an @ws sentinel will be needed to eliminate “extra” blank
lines. 

My guess is that you are correct, but this subject affects all parts of the read and write code.
Merging @body and @node sentinels has been on my list for a long time, and I’ll make every effort to do so. 

&gt; The node name is useful as comment. Put the gti in the end as they are likely to be ugly and not
very informative when one looks at the code. 

My present idea is to have the @+node contain only the gti, and to have the headline _follow_ the
@+node. This way changes to a section name do not necessarily corrupt the @+node sentinel.
Remember that only gti’s are immutable; headlines are not. 

BTW, I’m still dithering about the format of @-node sentinels. Maybe they should not contain the
headline at all. And maybe it would be possible and desirable to merge several @-node sentinels. 

&gt; Use "=node" instead of "+node" to indicate the first subnode. 

I believe you are suggesting a scheme similar to the MORE outline format. I’m not at all sure this is a
good idea, and again I’ll keep it in the back of my mind. I believe the present sentinels scheme will
be best once child indices are gone. 

In conclusion, let me say the following: 

1. We want:
a) the minimum of sentinels needed to properly recreate the outline, including in particular a robust way of telling
whether newlines belong to sentinels or not
b) a minimum of intrusion and ugliness,
c) no unnecessary blank lines.

I shall be coding 4.0 starting with a blank canvas, keeping these issues always before me. 

2. The key principle I am following at present is that essential information (structure or content) must
_not_ be split between files. If the derived file is thick, the .leo file may mirror (or otherwise contain)
nonessential info like marks and order of reference nodes. This principle is essential to make sure
info stays in synch. 

3. There will be at least several months of experimentation with 4.0, including a minimum of a month
of beta testing before anything official goes out the door. There will be plenty of time for suggestions
after I have given the design and coding my first best shot. 

</t>
<t tx="ekr.20050421210335.17">By: nobody ( Nobody/Anonymous ) 
RE: Thick &amp; thin
2003-01-06 12:21 
Hopefully you have looked at the even simpler format of TreePad (www.treepad.com). We have far fewer
problems with it with CVS - almost none. 
- Rajiv Bhagwat 
</t>
<t tx="ekr.20050421210335.2">By: edream ( Edward K. Ream ) 
Big picture: thick or thin?
2002-12-18 21:55

1. Having identified the problem, it isn't too likely that Leo is going to crash :-) Furthermore, there is _no_ reason to prohibit mutual references. Indeed, it would be very bad to do so: we want .leo files to be valid no matter who refers to them! The only proper action, IMNSHO, is to break one of the links, with a gentle indication
that this has been done. 

2. That is one option. It may even be the best option. We shall see... 

&gt; This @include function in combination with the @folder function, could make it easier
to update customizeLeo. 

A nice idea. Thanks for suggesting it. </t>
<t tx="ekr.20050421210335.3">By: klamano ( Norbert Klamann ) 
Teamwork with LEO
2002-12-19 06:56 
This @include Idea has the potential to solve a problem I see with Leo: 
How do you work as a team with Leo ? 

Leo's strength is (among others) that it connects information snippets either by
structure (Outlining) or by reference(Cloning of Headlines and Named Sections).
Well in a way Outlining is a kind of referencing too. 

It would be very cool if these links could cross file boundaries. This would be
important for team work with LEO because at the moment a LEO file contains
'everything' about a problem and can't be divided along the division of labor. 

If all files are @included together (how do I know that?) then all references
should be resolved. Missing references should be findable. 

If i work with LEO on a sub-file, broken references are quite normal, but they
should be findable too. 

Well maybe I created some confusion, but the problem how to work in teams
with LEO interests me. </t>
<t tx="ekr.20050421210335.4">By: edream ( Edward K. Ream ) 
 RE: Teamwork with LEO
2002-12-19 13:58 
&gt; It would be very cool if these links could cross file boundaries. 

This is _exactly_ what gti's will allow Leo to do. It may be easy to lose
track of this central fact during all the discussion of thick vs. thin. 


</t>
<t tx="ekr.20050421210335.5">By: gilshwartz ( Gil Shwartz ) 
 RE: Teamwork with LEO
2002-12-19 07:50 
I think you are quite right around the teamwork theme. When I first
encountered Leo, I immediately thought the @include (of other .leo files)
are required in order to facilitate parallel development of different project
aspects. I have learnt that this is cooking so I let it go, but I am happy to
see it maturing toward implementation. 

The way I see it, capturing "sub-files" with Leo is kind of an over-kill,
since typically a project file should be of some reasonably managed size
(otherwise you should break it down anyway). Also, Leo is good at
maintaining a set of closely related complete files, and could be cluttered
when incomplete sub-files come to play. I have a feeling that having .leo
files for sub-files means that they are temporarily created for that piece of
code and are not really part of the project structure description. </t>
<t tx="ekr.20050421210335.6">By: edream ( Edward K. Ream ) 
 RE: Teamwork with LEO
2002-12-19 14:11 
&gt; The way I see it, capturing "sub-files" with Leo is kind of an
over-kill... 

I'm not sure what you mean by "sub-files". It's not a term I use. 

The idea of associating a single .leo file with each derived file is
intended to reduce problems with cvs. Only experience will show
whether this idea has any merit. 

Please keep in mind that _all_ choices I am considering will leave
the user experience of using, say, LeoPy.leo almost completely
unchanged. 

1. You will still see the _entire_ project, regardless of whether
LeoPy.leo contains nothing but @include nodes. 

2. You will be able to create and move clones at will, across "file
boundaries" in outline. 

The reason for this continuity is simple: when loading files (from
whatever source) Leo will simply create a tree of tnodes, exactly
as it does now. The _only_ differences between Leo 3.x and Leo
4.x will lie in how Leo reads and writes files. 

Edward 

P.S. I use @include as a convenient mental shorthand for @file
x.leo nodes. It may well be that Leo won't need a separate
@include directive. 

P.P.S It is conceivable that a new @file option could be
introduced (@file-noload ?) that delays the loading of the file (and
the creation of the outline) until the user explicitly requests that
the outline be loaded, say by expanding the node. This is about
the only change to the experience of using Leo that I contemplate.
This option will surely be delayed until much weightier issues have
been resolved. </t>
<t tx="ekr.20050421210335.7">By: jmgilligan ( Jonathan M. Gilligan ) 
Separate presentation from content
2002-12-18 19:31 
Here are some further thoughts. 

Orthogonal to the thick/thin question, there are two kinds of information stored in a .leo file: content and
presentation. 

The content consists of the tree structure and the text contained in the tnode bodies. 

The presentation information consists of the way the tree is presented when it is opened (which nodes are
expanded, etc.). 

If the presentation data is included in the .leo file, then there will be a lot of unnecessary thrashing in cvs. If
.leo is checked in as -kb, then the whole .leo file will have to be copied over the network when someone
does an update or a commit. 

I propose that transient settings (presentation as opposed to content) should go into a separate file. 

These settings would include the &lt;globals&gt;, &lt;find_panel_settings&gt; and &lt;preferences&gt; elements and
portions of the "a" tag in &lt;v&gt; elements. 

This should decrease cvs thrashing with leo dramatically and allow more sensible diffs as well. Right now
when you diff two revisions of a .leo file, there is a lot of garbage having to do with different nodes being
expanded or contracted in the visual representation of the tree, which have nothing to do with the content of
the file. 

If there were a separate file (.lep for leo presentation, with the nice coincidence that "p" follows "o" in the
alphabet) with the presentation data, I think use of cvs could be much cleaner in both thick and thin modes. 

</t>
<t tx="ekr.20050421210335.8">By: edream ( Edward K. Ream ) 
RE: Separate presentation from content
2002-12-18 21:51 
Most "presentation" information is already present in .leo files, and that is certainly not going to
change. Indeed, the essential features of gti's are that they are _unique_ and _immutable_. The
"separate" file you speak of is the .leo file. 

Unless I am missing something: 
a) I totally agree with what you say, 
b) this is the way that Leo already works (except for clone indices), 
c) this is the way that Leo _must_ work in 4.0. 

</t>
<t tx="ekr.20050421210335.9">By: jmgilligan ( Jonathan M. Gilligan ) 
RE: Separate presentation from content
2002-12-18 22:16 
I think you may be missing something or else I do not understand why you have made the
choices you have made. 

I think that having the presentation mixed in with the content has two bad consequences ---
almost as bad as the lack of gti's: 

1) it makes it hard to track substantive changes with cvs because it makes file diffs hard to
read. If I change a few nodes, but expand and collapse a large number, most of the changes
reported by cvs diff will be changes in the presentation (expanded and collapsed nodes), and I
will have a hard time reading the diff and finding the places where actual tnode text or tree
structure has changed. 

2) it may increase the frequency of conflicts when cvs tries to merge two versions together.
For instance, suppose neither of us changes the content of a LEO file, but we both expand
and contract different nodes in order to look at different parts of the file. Now each of us saves
his own copy of the file. CVS may report a conflict between two files that have the same
content, but different presentation. 

</t>
<t tx="ekr.20050421211313">Design questions: conflicts among global nodes

There is only one design issue left before finishing the 4.0 code.  I believe there is a simple and good solution, and I want here to discuss the problem and various solutions.

In 4.0 we have the ability to reliably and permanently identify individual nodes, both vnodes (representing headlines in an outline) and tnodes (representing text that may be shared by several vnodes).  This ability raises some interesting questions...

First some implementation history:

It may not be apparent to the casual user (or even the not-so-casual user) but earlier versions of Leo really never identified nodes apart from the outline that contained them.  Indeed, .leo files contain "indices" for tnodes of the form "Tn" where n is an integer.  These indices are computed when the outline is saved, and are a way of linking tnodes to vnodes _within the context of a .leo file_.  Leo uses these indices while reading derived files.  This is the _only_ way, pre 4.0, for Leo to recreate clone links spanning files, that is, clone links between nodes in .leo file and derived files.

As a consequence, in all previous versions of Leo there had to be a "mirroring" between the structure of the outline and the structure of the derived file being read.  Otherwise, the links from the outline couldn't be used to reestablish the links from the derived file.  This was why "dreaded read errors" (mismatches between the structure of the outline and the derived file) were so catastrophic pre-4.0: all clone links got cleared.

[End of implementation history]

With 4.0 the situation is entirely different.  We can identify vnodes and tnodes uniquely because their gnx's (global node indices) never change.  We don't need to "mirror" the structure of the derived file in the outline.  Indeed, the present 4.0 read code in leoFileCommands.py deletes the corresponding part of the outline (if any) _before_ reading the derived file, trusting that gnx's will be enough to recreate all clone links.  So reading an outline is exactly the same in 4.0 regardless of whether the outline contains redundant info (@file-fat) or not (@file-thin).  In particular, there is no such thing in 4.0 as a "structure error" while reading derived files, a major improvement in reliability.

Aside: the near certainty of structure errors was what doomed the ill-fated Create Backup command.  A similar command might be useful in 4.0.  As I write this it occurs to me that it might be a good idea to write an attribute to the .leo file (say in the global options section) for such files.  This option would cause Leo _not_ to load derived files when loading the .leo file, in effect always simulating the Read Outline Only command.  [End of aside.]

A note about clones:  In 4.0 there is no need to treat cloned nodes differently from other nodes as far as file format is concerned.  Indeed, whether a vnode is a clone or not is entirely a property of the structure of the outline containing the node.  In 4.0 clone bits are never written to the outline.  Moreover, the same node may be a clone in one outline and be a normal node in another outline.  Leo 4.0 handles this without any substantial changes to the file code.

Anyway, with all this background we can now discuss the last problem in the design of 4.0, namely what to do if various "copies" of nodes do not match while reading .leo files and derived files?  I think of the question in this way:  How global are global node indices?

This is a non-trivial question, for the following reasons:

1.  Leo will eventually allow "included" .leo files, like this:  @file x.leo.
2. There may be more than one window open at once, each potentially with various versions of the same node!

So unless we are careful we could have chaos on our hands:  multiple outlines each containing multiple copies of arbitrarily many nodes.  This is the kind of situation that a good design absolutely must avoid.  Fortunately, I believe there are a few guidelines that can simplify the situation so that it becomes manageable.  Indeed, little or no change will be needed in the present code.

Here are my present thoughts:

1. Leo will ignore the problem of keeping different nodes in different outlines "in synch".  The principle is this: the user is responsible for keeping track of different copies of the "same" nodes in different outlines.  That is, when a node changes Leo will update all "joined" nodes only within the present outline.  Not only is this easier to implement, it is also a perfectly reasonable thing to do.  Indeed, suppose a node appears in outlines A, B and C, and suppose we are editing outlines A and B.  Leo has no way of updating C in any case, so updating B will hardly ensure any consistency.  In short, if the user insists on using the same node in different outlines it will be entirely up to the user to make sense of all the potential complications.

In other words, the information that Leo uses to keep track of nodes, namely, the tnodesDict ivar of the fileCommands class, will remain local to a particular outline.  (The fileCommands object is a subcommander attached to a particular outline.)  Leo will _not_ try to use a truly global app().tnodesDict or app().vnodesDict to keep track of connections between different open outlines.

2. The only conflicts that Leo need trouble itself with are conflicts between a) a _dirty_ node in an outline and b) a node that is being read from a derived file.  As will be seen from the discussion above, the present code _requires_ that Leo reads the outline first, then the derived files.  For sanity's sake we _must_ assume that nodes in a derived file override non-dirty nodes in the outline.  Otherwise there will be no end to the conflicts that are reported to the user.

In the present version of Leo, nodes in the outline can never be dirty as the outline is being read.   In the future things may get more complicated because of the @file-wait option.  This option (not implemented yet) will load the derived file only when the user selects the @file-wait node.  Because of clones, a node may be dirty in the outline while the derived file is being read.

The somewhat paradoxical upshot of reporting conflicts only involving dirty nodes is that _saving_ a .leo file will allow future reads of derived files to override whatever information is shared in the outline!  This isn't so bad:  saving the outline will typically rewrite derived files, so a future read won't in fact result in any conflicts.  Yes, one can imagine weird cases.  I believe only the user can resolve such pathologies.

In short, the only time we need raise a dialog is when the user edits a cloned node in the outline then reads an @file-wait derived file.  The dialog will give the user the choice of using the outline version of the node, the derived file version of the node, or of canceling the read.  It is even possible that we could dispense with the dialog entirely, and simply report that n dirty nodes in the outline have been replaced by the values of nodes from the derived files.
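
The rule in point 2 can be sketched in a few lines of Python. This is an illustration only, not the code in leoFileCommands.py: the function name resolve, the use of plain dicts keyed by gnx, and the dirty set are all assumptions made for the example.

```python
def resolve(outline_nodes, derived_nodes, dirty):
    """Merge nodes read from a derived file into nodes already in the outline.

    outline_nodes, derived_nodes: dicts mapping gnx to body text.
    dirty: set of gnx's edited in the outline since it was loaded.
    Returns (merged dict, sorted list of conflicting gnx's).
    """
    merged = dict(outline_nodes)
    conflicts = []
    for gnx, text in derived_nodes.items():
        if gnx in dirty and gnx in outline_nodes and outline_nodes[gnx] != text:
            # Only dirty nodes are reported; the user (or a policy) must choose.
            conflicts.append(gnx)
        else:
            # The derived file silently overrides non-dirty outline nodes.
            merged[gnx] = text
    return merged, sorted(conflicts)

merged, conflicts = resolve(
    {"a": "old a", "b": "edited b"},
    {"a": "new a", "b": "file b", "c": "file c"},
    dirty={"b"},
)
```

Here node a is taken from the derived file without comment, node c is simply added, and only the dirty node b is left for the user to decide.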

3. There are related issues involving the Cut, Copy and Paste Nodes commands.  All previous versions of Leo explicitly cleared all clone links when cutting and pasting outlines.  Leo had to do this because there was no way to recreate _all_ the clone links reliably; recreating clone links only within the pasted tree would have created endless confusion.  With 4.0 we have the potential for two kinds of cut/copy commands: commands that preserve the "identity" of nodes and commands that create nodes with a new identity.  Both kinds of commands are reasonable and useful, and Leo will support both.  When pasting nodes whose identities have been preserved there is again the potential for conflicts.  There can be arbitrarily many conflicts between pasted nodes and nodes in the outline, so again we may not want to use an endless stream of dialogs...

Conclusion

In the "normal" cases, conflicts will not arise provided that nodes reside in at most one derived file.  I believe the user must be responsible for handling conflicting definitions when multiple derived files contain the same node.  Indeed, there is very little that Leo can do in such situations other than alerting the user that a conflict has occurred.  The present 4.0 code should do well in most situations, and a little tweaking may be needed.  I have no plans at all for "heroic" solutions to conflicting nodes.

All comments and suggestions are welcome.  There will be plenty of time for changes after 4.0 alpha 1 comes out.  Actually, my focus right now is on getting 3.11.1 out the door ;-)

Edward
</t>
<t tx="ekr.20050421211313.1">Progress report 5/13/03: please read

Early this morning I uploaded the new 4.0 code base to cvs.  This is an important milestone in Leo's history.  I urge anyone using code on cvs to at least skim through this long posting.

1.  Code on cvs is safe and stable

AT PRESENT, the code on cvs reads and writes 3.x files just as before.   The code appears stable, which is not surprising because the new code is based on the old.  Very few changes were needed to handle the new file formats.  These changes are enabled when app().use_gnx is true. app().use_gnx is false on cvs now.

I think the present cvs code is safe and stable.  However, I _strongly_ recommend that you assume otherwise:  Please make backups of any files or folders affected by the new cvs code.

Most of the work required to read and write 4.x .leo and derived files is complete; there are only a few days work left before the transition to 4.x formats.  I do NOT recommend that you enable the new code; some file format changes are in store.  You will be warned when the transition to the new file formats happens on cvs.  This will be another major event: all sentinels will change.

2.  Code bases have merged

The code on cvs merges the 3.x and 4.x code bases.  In retrospect, I regret splitting the development.  It really didn't help.  In any event, there will be no more splits in development.

With the merged code base, one can easily imagine releasing both 3.x versions and 4.0 beta versions in parallel.  Indeed, the only difference between the two versions would be the setting of app().use_gnx.  Also note that even in the 4.0 final version you will be able to write 3.x files.  Just set use_pre_4pt0_file_formats = 1 in leoConfig.txt.

The latest cvs code assumes that leo folders contain the config, doc,  plugins, scripts and src subfolders.  The new version of the installer script, leo.nsi, found in LeoPy.leo, creates these folders.  See the diary entry for 5/12 in LeoPy.leo for a list of the changes made to support the new directory structure.

3.  Improved code base &amp; new invariant

I took the opportunity afforded by 4.0 to clean up the old code base in several important ways.  First, the new code uses proper Python lists to represent join lists.  This greatly clarifies the relevant sections of code, and may even provide a performance boost.  Secondly, tnodes now contain both headline and body text.   This change creates an important invariant:  vnodes with the same vnx's always point to tnodes with the same tnx's.  This invariant is vital to resolving conflicts properly.
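
The invariant is easy to state as a check. The sketch below is hypothetical (Leo's real vnode and tnode classes are not used); it takes plain (vnx, tnx) pairs and returns the vnx's that violate the invariant.

```python
def check_invariant(pairs):
    """pairs: iterable of (vnx, tnx). Return the sorted vnx's seen with two different tnx's."""
    seen = {}
    bad = set()
    for vnx, tnx in pairs:
        # setdefault records the first tnx seen for this vnx and returns it afterwards.
        if seen.setdefault(vnx, tnx) != tnx:
            bad.add(vnx)
    return sorted(bad)

ok = check_invariant([("v1", "t1"), ("v2", "t2"), ("v1", "t1")])
broken = check_invariant([("v1", "t1"), ("v1", "t9")])
```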

4.  Prototype of graphics &amp; styled text

leoColor.py contains a prototype of code that allows wiki-style markup in body panes.  This is important work, because it tests Tk's abilities to handle styling and graphics in the highly dynamic environment that is Leo's body pane.  Some details:

- There are a few bugs left to fix, but it is clear that this prototype is a great success.  The combination of Python and Tk is easily up to the job.  The programming details are interesting, but not interesting enough to discuss here :-)  Look at the topic called (Graphics &amp; Styled Text: Wiki format) in LeoPy.leo if you are curious.

- The code that handles the wiki formatting is disabled at present on cvs.  To enable it, just change if 1: to if 0: in doWikiText.  It is quite safe to do so: the syntax colorer catches all exceptions.  The worst that can happen is that some of your text won't be colored properly.  To repeat, there are a few known bugs:  I'll be fixing them soon.

- The present code only handles wiki markup in Python triple-quoted strings denoted by three double quotes.  Because wiki markup uses single quotes, it does not seem prudent to allow wiki formatting in ''' strings.  Still to do: allow wiki markup in doc parts and comments.  This will be easy enough.

- doWikiText at present only handles the following markup:

    __bold__,
    ''italics'' (two single quotes, not a single double quote!)
   {picture file=&lt;filename&gt;}
   ~~&lt;color specifier&gt;~~ 

To repeat: using wiki markup to denote styling and graphics is merely a prototyping expedient.  I'm not sure how to represent such information.  The great advantage of a text-based specification is that it doesn't change the rest of Leo _at all_.  In particular, I had a nice aha: we can use the Show Invisibles command to show or hide the wiki text.  When invisibles are hidden, all you see are the _effects_ of the wiki markup.  When invisibles are visible, you see the markup plus their effects.  No need for extra commands.

- rst is another alternative to wiki format.  I'll soon start another thread to discuss the pros &amp; cons.  Please don't vote here.

- Instead of the present hand-written parser, Leo should use an "official" parser for whatever markup is chosen.  However, this can wait.
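
As a rough illustration of the markup listed above, here is a minimal sketch in present-day Python.  It is not the code in leoColor.py.  It recognizes only __bold__ and ''italics''; the name scan_wiki_markup and the (style, text) pairs it returns are assumptions made for this example.

```python
import re

# Matches __bold__ or ''italics'' (two single quotes on each side).
WIKI_PATTERN = re.compile(r"__(.+?)__|''(.+?)''")

def scan_wiki_markup(s):
    """Split s into (style, text) pairs; style is 'plain', 'bold' or 'italic'."""
    result = []
    i = 0
    for m in WIKI_PATTERN.finditer(s):
        if m.start() != i:
            # Unstyled text between the previous match and this one.
            result.append(("plain", s[i:m.start()]))
        if m.group(1) is not None:
            result.append(("bold", m.group(1)))
        else:
            result.append(("italic", m.group(2)))
        i = m.end()
    if i != len(s):
        result.append(("plain", s[i:]))
    return result
```

A colorer could walk the returned pairs and apply a Tk text tag for each style, hiding or showing the markup characters depending on the Show Invisibles setting.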

5. To do

There is significant additional work to do before 4.0 final:

- I shall soon write a script that converts most @doc sections to Python doc strings.  This will bring Leo's code base closer to proper Python style, and will allow Python's docutils to process Leo's derived files, for what that is worth.

- After 4.0 alpha 1 comes out (maybe in a week?), I plan to rewrite the code that reads .leo files.  The new code will use an xml parser (probably sax) to read .leo files rather than the present hand-written mess.  I have fairly high hopes that sax will significantly speed up the reading of .leo files themselves.  Note, however, that such tools can _not_ be used to read derived files because derived files are not xml files.
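
To make the sax idea concrete, here is a minimal sketch in present-day Python of reading headlines with xml.sax.  It is an illustration only, not Leo's read code.  The handler collects the text of vh elements; the sample data is built with xml.etree.ElementTree purely for the example, and the names HeadlineHandler, make_sample_leo_file and read_headlines are assumptions.

```python
import xml.sax
import xml.etree.ElementTree as ET

class HeadlineHandler(xml.sax.ContentHandler):
    """Collect the text of every vh element, in document order."""
    def __init__(self):
        super().__init__()
        self.headlines = []
        self.in_vh = False
        self.chunks = []

    def startElement(self, name, attrs):
        if name == "vh":
            self.in_vh = True
            self.chunks = []

    def characters(self, content):
        # sax may deliver the text of one element in several pieces.
        if self.in_vh:
            self.chunks.append(content)

    def endElement(self, name):
        if name == "vh":
            self.headlines.append("".join(self.chunks))
            self.in_vh = False

def make_sample_leo_file():
    """Build a tiny stand-in for a .leo file (hypothetical contents)."""
    root = ET.Element("leo_file")
    vnodes = ET.SubElement(root, "vnodes")
    for headline in ("@file a.py", "Design notes"):
        v = ET.SubElement(vnodes, "v")
        ET.SubElement(v, "vh").text = headline
    return ET.tostring(root)  # bytes

def read_headlines(data):
    """Parse data with sax and return the list of headlines."""
    handler = HeadlineHandler()
    xml.sax.parseString(data, handler)
    return handler.headlines
```

Calling read_headlines(make_sample_leo_file()) returns the two sample headlines in order.  A real reader would also have to track nesting of v elements and the attributes that link vnodes to tnodes.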

Edward
</t>
<t tx="ekr.20050421211313.2">Progress report: 6/9/03

Much progress has been made during quiet time.  Quiet time continues...

Here is my present thinking:

Whatever problems remain in 4.0 can be solved independently of each other.  This is the big news.  Indeed, 4.0 will be very much as planned previously.

My present plan for 4.0 releases is as follows:

4.0 alpha 1.x will implement gnx's, and new formats for .leo files and derived files.  Most likely, the format of .leo files will anticipate the shared vnode scheme, although in fact the format of .leo files is largely independent of the internal representation of nodes.

When reading files, Leo will use the method of resolving conflicts that I discussed on Leo's wiki site.  I believe this scheme is no worse (and possibly better) than Leo's present atFile.read code.

4.0 alpha 2.x will use Python's xml.sax parser to read .leo files.   Initial experiments indicate that sax will provide only a modest speedup.  Whatever the actual speedup (if any), using an official xml parser is long overdue.

4.0 alpha 3.x will implement the shared vnode scheme.  I have made major progress on this in the last several days.  See the following paragraphs.

About shared vnodes

For several weeks I have been dithering about whether the tree traversal methods of the position class (p.next, p.back, p.parent, p.threadNext, etc.) should return copies of self or alter self to reflect the new position.

Today I resolved that question in a most satisfactory manner as follows:

1. For compatibility with present code in Leo itself, as well as compatibility with existing user scripts, p.next and all similar routines _must_ return a copy of p that indicates the new position.  Failure to do this would break code such as the following:

v = c.rootVnode() # N.B. v is actually a position.
while v:
....v2 = v
....&lt;&lt; do something with v2 &gt;&gt;
....v = v.threadNext()

The problem is this:  v2 is simply another name for v, and if v.threadNext alters v it will alter v2 as well.  This would lead to utter chaos.  See the P.S. for an illustration of the problem.

Consequently, p.threadNext() and all similar routines must return an altered _copy_ of p.  I call the methods that return copies of positions "copying" or "safe" methods.

2. The copying methods are implemented by methods that _do_ alter positions.  This is done elegantly as follows.  p.copy() returns a copy of position p.  All the safe methods are implemented just like this:

def copyMethod(self):
....return self.copy().moveMethod()

For example:

def threadNext(self):
.... return self.copy().moveToThreadNext()

This is the way it is written in "the book".  Exactly one copy of self is made for each such call.   You may think of moveToThreadNext() as an optimization of threadNext().  Rarely does an optimization have so elegant an expression, and rarely can the relationship between two sets of methods be so concisely expressed.

3.  Traversing a large outline will create many temporary copies of positions.  These copies are immediately eligible to be garbage collected, so these copies probably will never be of any concern.  However, the new moveToX methods provide easy ways of completely eliminating the extra copies as needed.

There are two ways of traversing an entire outline using the moveToX routines:

# First way:  assigning to v
v = c.rootVnode() # N.B. v is actually a position.
while v:
....&lt;&lt; do something with v &gt;&gt;
....v = v.moveToThreadNext()

# Second way: using side effects.
v = c.rootVnode()
while v.isValid():
....&lt;&lt; do something with v &gt;&gt;
....v.moveToThreadNext()

The first way uses the fact that v.moveToThreadNext() returns None if there is no threadNext position.  The second way relies on the fact that v.moveToThreadNext() alters v to point at the threadNext position.  If there is no such position v.isValid() will return false.  The two ways allocate exactly the same amount of memory.  Indeed, only one position ever gets allocated in either loop!
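
The relationship between the copying and moving methods can be tried out with a toy model.  The following is a minimal sketch in present-day Python, not Leo's position class: a flat list of headlines stands in for the outline, and the index attribute, the headline method and the two visit functions are assumptions made for this example.

```python
class Position:
    """Toy position: an index into a list of headlines in outline order."""
    def __init__(self, nodes, index=0):
        self.nodes = nodes
        self.index = index

    def copy(self):
        return Position(self.nodes, self.index)

    def isValid(self):
        return self.index in range(len(self.nodes))

    def headline(self):
        return self.nodes[self.index]

    def moveToThreadNext(self):
        """Moving method: alter self; return self, or None past the end."""
        self.index += 1
        return self if self.isValid() else None

    def threadNext(self):
        """Copying (safe) method: self is never altered."""
        return self.copy().moveToThreadNext()

def visit_by_assignment(nodes):
    """First way: assign the result of moveToThreadNext."""
    result = []
    p = Position(nodes)
    while p is not None and p.isValid():
        result.append(p.headline())
        p = p.moveToThreadNext()
    return result

def visit_by_side_effect(nodes):
    """Second way: rely on moveToThreadNext altering p."""
    result = []
    p = Position(nodes)
    while p.isValid():
        result.append(p.headline())
        p.moveToThreadNext()
    return result
```

Both visit functions allocate a single Position.  Calling threadNext on a position leaves every alias of that position untouched, which is the property existing scripts depend on.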

In conclusion, we have three extremely important results:

1. We can use the copying traversal routines with complete safety because those methods never cause side effects.  Such routines work _exactly_ like the corresponding vnode methods in previous versions of Leo.

2. We can use the moving traversal routines as needed to reduce memory allocation to an absolute minimum.  Leo's file i/o code and Leo's find/change code might be rewritten to use the moving traversal routines.  To repeat, in the vast majority of cases the copying traversal routines would be perfectly good enough.

3. The "copying" routines are implemented in terms of the "moving" traversal methods, so only one version of each routine needs to be maintained.  Moreover, the "moving" traversal routines are similar, though not identical, to the present vnode traversal methods.  The simplicity of the code puts the whole shared vnode scheme on a firm foundation.

Odds and ends

4.0 alpha 1.x will make Leo more cvs friendly.  However, conflicts are still possible.  Stephen Schaefer pointed out that we can use cvs merely to flag conflicts.  In such cases, we would try to resolve conflicts using _valid_ .leo files or derived files.  I believe this approach makes excellent sense.  It shows, I think, that there aren't any insoluble problems regarding the resolution of conflicts.  At worst, we shall have to resolve conflicts by hand.  This is exactly how people use cvs with other editors.

Some 4.0 alpha version may support @file-thin and @include (actually @file x.leo).  The user will bear sole responsibility for resolving any conflicts that might arise.  I'll remove these features if they prove too dangerous.  More likely some conflict resolution scheme can be found.

It may be two or three months (or more) before 4.0 final goes out the door.  Just for fun, I'll probably implement the following projects during the 4.0 cycle:

- Replacing leoConfig.leo with menus that create leoConfig.txt.

- Adding spell checking.  Paul Paterson has just sent me a prototype spell checker!

- Regression testing.  I'll probably start work on this after releasing 4.0 beta 1.  Given the nature of the 4.0 release, I estimate that at least a month will separate 4.0 beta 1 and 4.0 final.

- Zero or more other fun projects.

Edward

P.S.  Here is an illustration of the problem with aliases.  The output of the following program is  5,5,6 (!)

class obj:
....def __init__(self,n):
........self.n = n
		
o = obj(5) ; print o.n
o2 = o ; print o2.n
o.n = 6 ; print o2.n

The reason for the "surprise" is that o2 is an alias for o, so changing o changes o2.  This means, for example, that routines such as p.threadNext() must return a separate and distinct copy of p.

P.P.S.  I welcome comments now, but on this post _only_.  Please respect quiet time for a few more days.  Thanks.

EKR
</t>
<t tx="ekr.20050421211313.3">Progress report: 4.0 beta 1 by July 1

It may not be apparent, and I am now completely focused on completing all unfinished projects that are in the works:

- For most of you, the spelling checker looks like yet another new project.  Actually, Paul and I have been working on it since last year.

- Two days ago I plugged a _huge_ memory leak.  This leak was the biggest performance problem in Leo.  Indeed, it was apparent that Leo was slowing down when many nodes were opened.  For example, you could see Leo slow down during the Find command as more and more parts of the outline became visible.

Expanding and contracting outlines now results in no net increase in memory.  This is in stark contrast with all previous versions of leo.py.  In the next few days I plan to put this issue completely to rest by ensuring that all objects associated with a Leo window are recycled when Leo closes that window.  This can't be a big issue unless you are editing several .leo files simultaneously, which is likely a rare occurrence.  Still, I plan to finish this project in the interest of closure.

Leo now contains simple and effective tools for monitoring memory allocation.  The new trace_gc.py plugin prints gc (garbage collection) statistics at every call to every hook.  This small-grained picture of storage allocation will provide conclusive information about what Leo is doing.  trace_gc.py uses a new routine in leoGlobals.py called printGarbage, which shows the _change_ in memory usage from one call to the next.  Very simple; very handy.

Also in the interest of closure, I plan to ask the Python gurus about some fine points of the gc.  Python 2.2 contains a new gc.get_objects() routine that returns a list of all objects the gc knows about!  I have found it impossible to do anything with this list except compute its length; if it were possible to compare the contents of gc.get_objects() at successive times one would immediately be able to see _what_ objects have been newly created.  N.B.  This is very different from knowing what objects have been newly destroyed.

- I have decided to postpone the shared vnodes project indefinitely.  This is partly because fixing the memory leak promises to improve Leo's performance significantly.  More importantly,  I have heard no objections to the "cold feet" posting, and I see very little to be gained by this project.  The project can be put aside now without further cost:  the prototype code is clear and well documented and appears to contain no difficult outstanding issues.  So it would be possible to pick this project up again easily even years from now.

- In the next few days (or however long it takes) I intend to fix all serious outstanding bugs.  The cut/paste nodes commands appear largely fixed.  I say "largely" because there are issues involving clones that are difficult to get 100% correct while the code is a hybrid of 3.x and 4.x code.  There are also several problems with undo that must be fixed asap.  These were on hold until I resolved the question of whether to use shared vnodes.  The new spell checker really needs to have undo restore selection ranges, so fixing this bug is likely to happen this week.

- Also this week I plan to really start using unit testing in Leo.  There is some dummy code in leoTest.py that really should be completed before 4.0 beta 1.  

- When all this is completed, I estimate it will take about one more week to transition to the 4.0 file formats.  This is a major step and we are almost there!
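
Regarding the gc question raised above, one way to compare the contents of gc.get_objects() at successive times is to count objects by type name and subtract the counts.  The following is a minimal sketch in present-day Python; it is not the printGarbage routine, and the names snapshot, new_objects, Node and demo are assumptions made for this example.

```python
import gc
from collections import Counter

def snapshot():
    """Count the objects the gc currently tracks, grouped by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())

def new_objects(before, after):
    """Return the type names whose counts grew, with the size of the increase."""
    # Counter subtraction keeps only positive differences.
    return dict(after - before)

class Node:
    """Stand-in for any object allocated between two snapshots."""
    def __init__(self):
        self.children = []

def demo():
    """Allocate ten Node objects between two snapshots and report the change."""
    before = snapshot()
    keep = [Node() for _ in range(10)]  # keep references so nothing is freed
    after = snapshot()
    return new_objects(before, after), keep
```

The dictionary returned by demo should show an increase for Node, along with the lists the nodes own.  Counting by type does not say which individual objects are new, and types whose counts shrank are dropped from the report.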

In short, I plan to release 4.0 beta 1 in a few weeks, and I have some confidence in this plan.

Edward

P.S.  It should be much easier to release new versions of Leo after 4.0 is complete.  This is reason enough to finish 4.0 with all possible speed.  It is almost intolerable to wait months before fixing bugs in official releases:  many people find downloading files from cvs difficult or impossible.

P.P.S.  I am incredibly excited about Rodrigo's work with LeoN.  I think the best way I can support this project at present is to get 4.0 out the door so that it will be possible to change Leo more quickly.

EKR
</t>
<t tx="ekr.20050421212523">@nocolor

The major innovations in 4.0 were:

- New format for derived files: Eliminated child indices WITHOUT using gnx's.
    - Eliminated @node sentinels that indicate outline structure.
	- New @nl and @nonl sentinels simplified read logic.

- Eliminating the error-prone error-recovery scheme used when reading derived files.</t>
<t tx="ekr.20050421212523.1">New 4.0 ideas: your comments pls

I have been taking long bicycle rides and long baths trying to understand my reluctance to do 4.0.  Several interesting phrases have emerged from the creative unconscious.  I'll dignify these phrases with the term "design principles."

Design Principles for 4.0

1. I really like the present format of derived files, and I don't want to litter it with gnx's.  I just can not bear to look at dozens of @+node and @+body sentinels containing gnx's like edream.071203150310.23.  Indeed, the new 4.0 will make derived files cleaner by eliminating child indices and related cruft.

2. The phrase "hidden machinery" came to me yesterday.  Leo will use hidden machinery in .leo files to eliminate gnx's and child indices from derived files.  This hidden machinery consists of new attributes of &lt;v&gt; elements in the xml format of .leo files and the corresponding Python logic.

3. The Resolve CVS command will work only on undamaged files, so we need no longer worry about the damage that cvs can do to files.  This removes some of the rationale for the old 4.0 derived file formats and permits us to use the simpler, more intuitive present format for derived files.

4.  The real key in the new design of 4.0 is the phrase "synchronized files".  In an earlier posting I talked of the "smallest unit of meaning" (SUM) in a program.  At that time I said that the SUM was a derived file.  That's not exactly wrong, and it's not the whole story.  Implicit in any cvs repository is the assumption that all files are updated in unison.  We can generalize this implicit assumption by the following "Synchronization Principle":

A .leo file makes sense only if it is in synch with all files derived from it.

If files aren't in synch we are going to find it impossible to make sense of different versions of nodes with the same gnx.  OTOH, many new possibilities arise if Leo can assume that all files are in synch.  This is a dramatic change in point of view.

BTW, the Read Outline Only command recreates the outline without reading any derived files, so the synchronization principle is trivially true in such cases.

5.  To support the synchronization principle, Leo will stamp each derived file with a gnx.  This gnx will appear in both the .leo file (in the &lt;v&gt; element corresponding to the @file node) and in the derived file itself in the @+leo sentinel.  For example:

#@+leo-ver=2-gnx=edream.071203150310-encoding=iso-8859-1.

In fact, this will be the _only_ gnx contained in each derived file!  With this single gnx Leo can determine whether derived files are in synch with the present .leo file.  BTW, principle 3 above implies that we don't much care whether cvs would report a conflict in the @+leo sentinel.

6.  Now that Leo can know for sure whether files are in synch with a particular .leo file, Leo can do things that weren't possible before.  For example, there is no need for gnx's to associate nodes in the .leo file with nodes in the derived file.  Data structures in the .leo file (that is, attributes of &lt;v&gt; elements) can refer to the "nth node" written to the derived file.  This is a reliable link as long as files are in synch.  The details are a bit complicated, and that doesn't matter.  What does matter is that neither explicit child indices nor explicit gnx's are required in derived files: they can be replaced by hidden machinery.  An appendix to this posting shows the proposed new format for derived files.

7. Both explicit gnx's and implicit machinery have their limitations.  In particular, I am still struggling to understand the conditions under which @file-thin and @file x.leo (@include) can be made to work safely.  As usual, issues of clones and of safe backups remain murky.  I do understand some people want small .leo files.  However, the present scheme contains redundant information that can be a lifesaver when files get out of synch.  I am still not convinced that all the complications can be handled safely.  I may defer these issues yet further...

8. Another Aha.  There is another way of getting the benefits of small .leo files while retaining the safety that comes from redundant information.    If we really want to "optimize" .leo files it would be better to use a Zope db file as the basis of .leo files.  That way we don't have to read the entire db:  we simply update the db when writing and we randomly access the DB when reading or drawing the screen.  The advantage of this approach is that Zope has already solved all the issues concerning safety and fast access.

Please note that I have no immediate plans to move to a db implementation of .leo files.  I merely point out that an entirely different approach might actually give people fast access and better safety.

9. Even simpler formats for derived files are possible if Leo can reliably assume that all files are in synch.  For example, the hidden machinery in the .leo file could represent all outline structure and the derived file would only have to represent the start and end of body text.  Unadorned @+body and @-body sentinels could do this:  Leo could figure out the correspondence between snippets of body text in the derived file and nodes in the outline.

However, IMO this approach would be going too far, for several reasons.  First, it's a fragile design and bad engineering.  The crucial atFile read code would become much more complicated.  Second, the present derived file format contains highly useful information for human readers as well as for Leo.  If people really don't want sentinels there is a way to eliminate them all.  Otherwise, the present sentinels give a clear indication of where code came from.

Summary and conclusions

1.  The present format of derived files is an excellent basis for further improvements.   In 4.0 derived files will be exactly the same as at present except that
a) @node sentinels will no longer contain child indices and
b) @+leo sentinels will contain an unambiguous time stamp in the form of a gnx.

2. The synchronization principle forms the essential basis for thinking about Leo.  This principle opens the way for some implementation tricks (hidden machinery) that are not visible in derived files.  Good engineering demands that we not get carried away with these tricks.

3. It is not clear that @include and @file-thin are compatible with the synchronization principle.  The "single-owner" rule for clones may be difficult or impossible to enforce in the presence of these "thin" ways of representing structure.  These questions remain open.

I plan to use these design principles as the basis of 4.0 unless I hear good reasons not to do so.  I estimate 1-2 weeks of work will be needed to complete the basic hidden machinery for 4.0.  The hidden machinery must do the following:

a) Link nodes in the derived file with the proper tnodes in the .leo file.  This will set clone links properly.
b) Specify the order of section definition nodes.
c) Set mark bits in nodes.

The new code will be contained in LeoPlugins.leo until it meets with your full approval.  I plan no work on @file-thin or @include until 4.0 is completely solid.
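
Task (a) above, linking the "nth node" of a derived file to data kept in the .leo file, might look like the following.  This is a minimal sketch in present-day Python under stated assumptions: sentinels use the # comment delimiter as in the appendix below, and tnode_ids is a hypothetical list standing in for whatever the attributes of the .leo file would record.  It is not the planned implementation.

```python
NODE_PREFIX = "#@+node:"

def node_headlines(derived_lines):
    """Return the headlines of the @+node sentinels, in file order."""
    return [line[len(NODE_PREFIX):].rstrip("\n")
            for line in derived_lines
            if line.startswith(NODE_PREFIX)]

def link_nodes(derived_lines, tnode_ids):
    """Pair the nth node of the derived file with the nth stored tnode id.

    tnode_ids stands in for data kept in the .leo file (hypothetical).
    The pairing is only meaningful when the files are in synch.
    """
    headlines = node_headlines(derived_lines)
    if len(headlines) != len(tnode_ids):
        raise ValueError("derived file and .leo file are out of synch")
    return list(zip(headlines, tnode_ids))
```

The length check is only a crude guard.  The design relies on the gnx in the @+leo sentinel, not on node counts, to decide whether the files are in synch.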

Edward

Appendix:  Example outline and corresponding derived file

Here is the outline I am using to test the 4.0 code, i.e., the output of the Flatten Outline command:

+ @file c:\prog\test\gnxText.txt
# gnxText.txt

before
&lt;&lt; ref &gt;&gt; afterref
middle
&lt;&lt; ref2 &gt;&gt;
after
@others
after at-others
	- &lt;&lt; ref &gt;&gt; (this should remained cloned)
ref line 1
ref line 2
	- &lt;&lt; ref2 &gt;&gt;
ref2 line
	+ Organizer node
		- node 1
node 1 line 1
node 2 line 2
		- empty node
	- Node 2
node 2 line 1
node 2 line 2

Here is the corresponding derived file.  Note that @node sentinels are simpler than before and that a gnx has been added to the @+leo sentinel.

#@+leo-ver=2-gnx=edream.071203150310-encoding=iso-8859-1.
#@+node:@file c:\prog\test\gnxText.txt
#@+body
# gnxText.txt

before

#@&lt;&lt; ref &gt;&gt;
#@+node:&lt;&lt; ref &gt;&gt; (this should remained cloned)
#@+body
ref line 1
ref line 2
#@-body
#@-node:&lt;&lt; ref &gt;&gt; (this should remained cloned)
 afterref
middle

#@&lt;&lt; ref2 &gt;&gt;
#@+node:&lt;&lt; ref2 &gt;&gt;
#@+body
ref2 line
#@-body
#@-node:&lt;&lt; ref2 &gt;&gt;

after

#@+others
#@+node:Organizer node
#@+node:node 1
#@+body
node 1 line 1
node 2 line 2
#@-body
#@-node:node 1
#@+node:empty node
#@-node:empty node
#@-node:Organizer node
#@+node:Node 2
#@+body
node 2 line 1
node 2 line 2
#@-body
#@-node:Node 2
#@-others

after at-others
#@-body
#@-node:@file c:\prog\test\gnxText.txt
#@-leo

EKR
</t>
<t tx="ekr.20050421212523.10">The new ideas in the New Design Notes are baring fruit.

1. After much messing around, I finally realize that every line in body text corresponds to one or more lines in the derived file.  Seeing this simple fact wasn't possible if sentinels begin and end with newlines.

2. All "lines" end in a newline, except possibly the last line of body text.  In derived files there are no exceptions to this rule.

3. I have written the first draft of the write logic.  Unlike earlier versions, the write code handles each line of body text independently.  The code is much cleaner than before.  Instead of having separate putCode and putDoc methods containing much duplication of effort, the new code simply has an inCode var which indicates whether we are in code mode or doc mode.  The new @nonl and @ws directives look like they are just what are needed.

4. The original implementation was as a plugin that overrode a number of classes, and the atFile class in particular.  This was cute, and a cute mistake.  The problem is that we need both the old and new versions of the read code for compatibility, and overriding the old code breaks it (it can't read old-format derived files).

I'm not sure exactly what the roadmap is for integrating the two versions of code.  Probably I'll create a new module called leoDerivedFile.py that will implement the new read/write code.  Yes, this ends up duplicating lots of code.  However, it guarantees that the old atFile code won't ever be touched by mistake.  Eventually some of the duplication may be merged away, but that is a potentially error-prone operation and there is no urgent need to do that.

5. Once I figure out how to have two flavors of read code coexisting I'll actually write the new read code.  Hopefully I'll start in a few days.  The new read code should benefit greatly from the simplicity of the new write code.
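
Points 1 and 2 above can be illustrated with a small round trip.  The following is a minimal sketch in present-day Python, not the draft write code: every body line becomes one or more derived-file lines, every derived-file line ends in a newline, and a sentinel records a missing final newline.  The exact spelling of the sentinel line (NONL) and the names write_body and read_body are assumptions, and the sketch assumes body text never contains the sentinel line itself.

```python
NONL = "#@nonl\n"  # hypothetical spelling of the sentinel line

def write_body(body):
    """Return derived-file lines for body; every returned line ends in a newline."""
    pieces = body.split("\n")
    # Every piece except the last was followed by a newline in body.
    out = [piece + "\n" for piece in pieces[:-1]]
    last = pieces[-1]
    if last:
        # The last body line lacks a newline: add one and record that fact.
        out.append(last + "\n")
        out.append(NONL)
    return out

def read_body(lines):
    """Invert write_body."""
    parts = []
    for line in lines:
        if line == NONL and parts:
            parts[-1] = parts[-1][:-1]  # drop the newline that write_body added
        else:
            parts.append(line)
    return "".join(parts)
```

Because each body line is handled independently, the reader never has to decide whether a newline "belongs" to a sentinel.  Code mode versus doc mode is left out of this sketch.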

Edward

P.S.  Yes, I have been ignoring everything else except 4.0 recently.  Sorry, but this is the only way 4.0 is ever going to get finished.

EKR</t>
<t tx="ekr.20050421212523.2">About @include

Yesterday during another long bicycle ride the situation regarding @include became clear.  To state my conclusions first:

1. Leo can support only @include nodes that open the included .leo file in a _separate_ window.

2. @url nodes can already do this, so @include is much  the same as @url.  For example, double clicking
@url file:///C:/prog/x.leo opens x.leo in another copy of Leo.  The only possible improvement would be to special-case the @url logic so that .leo files were opened in the same copy of Leo.

Clones are the reason nothing "better" can be done with @include.  Indeed, the purpose of opening the "included" .leo file in a _separate_ window is to emphasize that no clones can be created from the "including" .leo file to the "included" .leo file.  Such "cross-dot-leo-file" clones can not be allowed because they create a virulent form of the multiple update problem.

For example, suppose we allow cross-dot-leo-file clones.  Let a.leo include b.leo and suppose we clone a node in b.leo and move it into a.leo.  Now suppose we close a.leo and open b.leo and change any of the "cross-dot-leo-file" clones in b.leo.  We then open a.leo.  What version of the cross-dot-leo-file clones do we use?

The recent work on the synchronization principle makes the situation clear:  it is impossible to keep _separate_ .leo files in synch if they contain "cross-dot-leo-file" clones.  This is the end of the story.

Edward
</t>
<t tx="ekr.20050421212523.3">Here is another picture, perhaps clearer, illustrating the problems with @include.  The new 4.0 file format will feature a gnx timestamp in @+leo sentinels.   A .leo file is out-of-synch with a derived file if the timestamp in the &lt;v&gt; element of the @file node in the .leo file does not match the timestamp in the derived file.

The question is, which .leo file gets to write the timestamp in the derived file if a.leo includes b.leo?  You could glibly say:  "b.leo, of course".  But what happens if a.leo contains a clone of the @file node and @include b.leo gets put in an @ignore node?  The problem is that there are just way too many possibilities, and all such possibilities must be accounted for in the design and handled with special-case code.  This is not progress.

Furthermore, putting @include files in separate windows is best, regardless of implementation details.  For example, consider the various parts of the Python project.  The Python compiler really should be separate from each individual Python module, and from separate tools such as Idle.  What could possibly be gained from intermingling the code in a single .leo file?

I suppose there may be instances, for example when managing websites, where one would like to intermingle "code" in disparate projects.  However, Leo isn't the proper tool for that kind of job.  Zope comes to mind as a much better tool.  And you can always write Python scripts to do weird and wonderful things with files.

In short, it is just a bad idea to have @include intermingle code in the same outline.

Edward
</t>
<t tx="ekr.20050421212523.4">&gt; I'm not sure I understand what you mean when you say that files are
&gt; in sync. Specifically, what does it mean to say that a set of files are out
&gt; of sync? How do the new gnx's identify this situation?

Excellent question, Paul.   First of all, it will simplify matters to ask whether a particular derived file is in synch with a particular .leo file.  Several derived files are in synch with each other if and only if they are all in synch with the same .leo file.

So the question is, when is a derived file in synch with a .leo file?  There are several possible answers, all equivalent.

1.  A derived file is in synch with a .leo file if a) Leo wrote the derived file using the .leo file, b) the @file node corresponding to the derived file has not become dirty since then, and c) no changes have been made to the derived file's sentinels.  The basic idea is simple:  the derived file must have come from the same version of the .leo file.

2.  A derived file is in synch with a .leo file if the gnx in the &lt;v&gt; element of the @file node corresponding to the derived file is the same as the gnx in the @+leo sentinel of the derived file, and the derived file has not become dirty and no changes have been made to the derived file's sentinels.

3.  The outline structure of the @file node corresponding to the derived file is the same as the implied structure of the derived file.

4. Neither the derived file nor the corresponding part of the .leo file has been changed in Leo nor written _by Leo_ since the derived file was created by Leo.

Note that all of these answers allow for the possibility that the derived file has been altered _outside of Leo_ by an external editor.  This is perfectly valid provided that no changes have been made to the sentinels in the derived file.  In particular, the definition of files being in synch does not, and can not, rely on file modification dates.  Provided that no changes have been made to the sentinels of the derived file, Leo can use the gnx in the @+leo sentinel of the derived file to determine whether the .leo file is in synch with the derived file.
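
The gnx comparison just described might be sketched as follows in present-day Python.  This is an illustration only: it assumes the proposed sentinel form #@+leo-ver=2-gnx=edream.071203150310-encoding=iso-8859-1. with the # comment delimiter, and the names gnx_of_derived_file and in_synch are made up for this example.

```python
import re

# Matches the gnx field of a sentinel such as
# #@+leo-ver=2-gnx=edream.071203150310-encoding=iso-8859-1.
GNX_PATTERN = re.compile(r"^#@\+leo-.*?gnx=([^-]+)-")

def gnx_of_derived_file(first_line):
    """Return the gnx stamped in the @+leo sentinel, or None if absent."""
    m = GNX_PATTERN.match(first_line)
    return m.group(1) if m else None

def in_synch(vnode_gnx, first_line):
    """True if the gnx stored for the @file node matches the derived file."""
    file_gnx = gnx_of_derived_file(first_line)
    return file_gnx is not None and file_gnx == vnode_gnx
```

The check cannot detect hand edits to other sentinels; it only confirms that the derived file was written from the same version of the .leo file.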

&gt; 2. simpler file format: +1! This would be great if it can be done safely.

The synchronization principle tells us exactly when we can safely elide child indices.

BTW, I may tinker with the new file format to see if I can fold @no-nl sentinels into @+ref and @+others sentinels.

&gt; 3. @include: I would have two use cases for these,
&gt; 
&gt; a) cloning across .leo files. I might like to do this but it probably represents
&gt; a bad workflow design (if code nodes then this should be a library, if a text
&gt; node then a template file or maybe @template).

This can never happen.

&gt; b) organizing multiple .leo projects. This is the primary use and probably requires
&gt; more than just a navigational aid (like your @url example).
&gt; 
&gt; If I am organizing multiple .leo projects then I need to see them all in one
&gt; place. I'm happy with restrictions on what I can clone but I'd really like to
&gt; have them on one tree view.

This could happen, I suppose, and putting in restrictions on cloning will turn out to be extremely complex.

&gt; For me, Leo's power is that it allows you to see both the big picture and small
&gt; picture for a given project. @include should do the same *across* projects.
&gt; @include is purely a GUI function, the data from individual files is never mixed,
&gt; only the representation on the screen.

Again, the implementation details would be hairy.  It most likely will never happen.  The point is that @url is simple and good.  It is good because it shows quite clearly that cross-dot-leo-file clones can't happen.   Having the gui lead users to believe that cross-dot-leo-file clones are possible, and then putting in complex code to make sure it isn't possible does not seem like a good use of my time :-)

Edward</t>
<t tx="ekr.20050421212523.5">The question of when files are in synch is pragmatic.  The atFile read code needs to know that the structure of a derived file is unchanged from when Leo last wrote it from the present .leo file.  In practice, the atFile read code just needs to compare the gnx in the vnode with the gnx in the derived file.

Basically, files get out-of-synch when several people are using several different versions of the same .leo file to write derived files.  Committing changes to .leo files to cvs when committing any derived files should keep files in synch.

One can conceive of situations in which an @file node is ignored, and then one tries to use the Read @file Nodes command to try to read the "out-of-synch-with-itself" derived file.  In that case the atFile read code will complain.  Unlikely to happen and no big deal if it does.

What we must avoid like the plague is generating the same derived file from two _different_ .leo files.  This is related to the ban on cross-dot-leo-file clones.  Each no-no creates an unmanageable multiple-update problem.

Edward
</t>
<t tx="ekr.20050421212523.6">New Design Notes for 4.0

4.0 is the last best chance to improve both the format of derived files and the code that reads and writes derived files.  In this posting I discuss the new format in detail.  I am writing this to clarify my own thinking _before_ writing the code so some details will change.  Most people may be happy just reading the executive summary or skipping this posting entirely.

Again, this is a working document.  Details will change as the interactions between the actual write and read code become clearer.

Edward</t>
<t tx="ekr.20050421212523.7">The goals of the improved format for derived files are:

- Remove child indices from @+node and @-node sentinels.
- Write @+node sentinels and @-node sentinels only for nodes containing body text.
- Eliminate @+body and @-body sentinels entirely.
- Eliminate "extra" blank lines before and after sentinels.
- Simplify the code that reads sentinel lines, which presently is way too complex.

To accomplish these goals, derived files will contain the new @ws and @nonl sentinels.
</t>
<t tx="ekr.20050421212523.8">We can't eliminate extra blank lines unless we are willing to rewrite the old code (in leoAtFile.py) and substantially modify the conceptual framework on which the old code was based.  I wandered around a bit before I realized that the write code must "come first".  Not only does the write code create the format of derived files, but the write code creates the conceptual framework that will simplify (or not!) the job of the read code.

The old read code has great difficulties deciding whether newlines "belong" to sentinels or not.  The culprit came from considering that sentinels begin _and_ end with newlines.  The old write code "pretended" to write newlines before and after all sentinels, then used a "newline pending" scheme to eliminate redundant newlines between adjacent sentinels.

The advantage of the old scheme was that it did, in fact, ensure that all sentinels started on their own line and that a newline follows the last sentinel line (in any group of sentinels).  Alas, this way really makes life miserable for the read code.  Moreover, this scheme has the effect of using "extra" blank lines to indicate newlines in body text.

With these problems in mind, a new way of writing and reading derived files has taken shape.  Here are its key features:

1. Sentinel lines will be "regular" lines.  That is, they will end with a newline but they will _not_ start with a newline.  This will eliminate the old "newline pending" scheme used in the write code.  As a result, the new read code will be much simpler.  This is very important.

2. Because sentinels are no longer deemed to start with a newline, some new way must be found to ensure that all sentinels do, in fact, start on a new line.  As we shall see, this is fairly easy to do.  Sometimes we know that a sentinel will start on a new line because the corresponding construct in the body text starts on a new line.  We call this kind of assumption "borrowing" a newline.  In any event, the write code must in fact ensure that all sentinels start on a new line.

3. A notion of "contribution" clarifies the workings of both the write and read code.  Sentinel lines "contribute" text to the "output stream" being accumulated while _reading_ a derived file.  Some examples:

-  The read code creates a new stream (for a new vnode) when it sees an @+body sentinel.  The read code sets the vnode's body text when the read code sees an @-body sentinel.   So the @+body and @-body sentinels make no contribution to the current stream.  Rather, they switch streams.

- The @verbatim sentinel contributes the entire next line (including the trailing newline) to the "output" stream.

- The @@ sentinel contributes everything following the first @ (including the trailing newline) to the output stream.

- Non-sentinel lines contribute the entire line (including the trailing newline) to the output stream.

4. Because sentinel lines must start on a new line, it is sometimes necessary to end a non-sentinel line with an "extra" newline.  The new @nonl sentinel will _remove_ that extra newline.  That is, the @nonl has a _negative_ contribution.  The @nonl sentinel allows the read code to remove newlines _after_ they have been added to the output stream, and this is the key element that must be present in a "look behind" scheme.

5. Sentinel lines don't happen "at random".  The old code did not take sufficient advantage of this fact.  Indeed, sentinels tend to occur in groups, and some sentinels appear only following other sentinels.  For example, the @+node sentinel occurs only directly following @ref or @+others sentinels, possibly with an intervening @nonl sentinel.

Moreover, there are only a few ways for groups of one or more sentinels to start:

- As the result of a section reference or an @others directive.  These are the most complex cases for Leo to handle.  We'll look at these cases in detail below.

- As the result of some other directive that creates a single @@ sentinel.  Directives start on a new line, so we know that the corresponding sentinel line will also start on a new line.  This is an example of "borrowing" a newline.

- As the result of something that "looks" like a directive but isn't.  This can create an @verbatim sentinel.  Again, we can borrow a newline.

- At the end of body text.  The @-node sentinel occurs only after a newline in the body text or the @nonl sentinel.  If an extra newline must be inserted at the end of body text to "get ready" for an @-node sentinel, that extra newline will be followed immediately by an @nonl sentinel (and then the @-node sentinel).

6.  With a single exception, the whitespace that precedes sentinel lines makes no contribution to the output stream.  The write code outputs this whitespace just to make the derived file look good.  The single exception is the new @ws sentinel.  The leading whitespace preceding the @ws sentinel is contributed to the output stream.  The newline that follows the @ws sentinel is _not_ contributed to the output stream.

7. As examples will soon show, the write logic must be careful to insert newlines and @ws and @nonl sentinels for the various special cases.  This should be relatively easy to do.  The rewards are large for the read code.  Also note that simplifying the read code will also simplify the task of scripts that handle derived files.

8. The "line-splitting" convention used in @doc parts should not need to change.</t>
<t tx="ekr.20050421212523.9">The new @ws and @nonl sentinels allow the write code to ensure that the sentinels created by section references and @others directives start on a new line.

Here is a summary of the new derived file format.  Each example starts with a single construct from the body text of the outline, followed by one or more lines written to the derived file.  Notes may accompany the lines written to the derived file.

Conventions:

- The body text of each example starts a line.
- [ws] indicates non-empty whitespace.
- [ws*] indicates possibly empty whitespace.
- [in] indicates present whitespace corresponding to self.indent
- [in2] indicates new values of self.indent as the result of processing [ws].
- x, y indicate strings of characters that begin with a non-whitespace character.
- We assume that # is the comment delimiter
- Extra information in @+node and @-node sentinels not shown.

Example 1: &lt;&lt;ref&gt;&gt;

[in]#@ref

Note: the @ref sentinel borrows the nl from the body text

Example 2: [ws]&lt;&lt;ref&gt;&gt;

[in][ws]#@ws     
[in2]#@ref

Note:  @ws sentinel contributes [ws] to the stream being accumulated by the read code.  The "extra" newline following @ws is _not_ contributed.  In effect, the @ws sentinel is borrowing the preceding newline and the @ref sentinel is borrowing the newline that ends the @ws sentinel.

Example 3: [ws*]x&lt;&lt;ref&gt;&gt;

[in][ws*]x
[in2]#@nonl
[in2]#@ref

Note: The first line is a nonsentinel line, so the entire line is contributed, including the extra inserted newline following x.  The @nonl sentinel removes this extra inserted newline.
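
The notes on Examples 2 and 3 can be pictured with a tiny reader.  This is only a sketch of the "contribution" idea, not the real read code: it assumes [in] is empty, it handles just the @ws, @nonl and @ref sentinels, and it contributes the made-up placeholder {ref} where the real code would switch to the referenced node's stream.

```python
# Sketch of the "contribution" idea for a few sentinels (assumptions as
# stated above: empty [in], '#' comment delimiter, "{ref}" is a stand-in).

def read_stream(lines):
    """Accumulate the text contributed by each line of a simplified derived file."""
    out = []
    for line in lines:
        kind = line.strip()
        if kind == "#@ws":
            # The whitespace before @ws is contributed; its newline is not.
            out.append(line[:line.index("#@ws")])
        elif kind == "#@nonl":
            # Negative contribution: remove the extra newline added earlier.
            if out and out[-1].endswith("\n"):
                out[-1] = out[-1][:-1]
        elif kind == "#@ref":
            # Leading whitespace of this sentinel contributes nothing.
            out.append("{ref}")
        else:
            # Non-sentinel line: the whole line, including its newline.
            out.append(line)
    return "".join(out)

# Example 2: [ws] followed by a section reference.
print(repr(read_stream(["  #@ws\n", "  #@ref\n"])))
# Example 3: [ws*]x followed by a section reference.
print(repr(read_stream(["  x\n", "  #@nonl\n", "  #@ref\n"])))
```

In both cases the text before the reference comes back exactly as it appeared in the body text: "  " for Example 2 and "  x" for Example 3.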

Example 4: &lt;&lt;ref&gt;&gt;x

[in]#@ref
[in]#@+node
 ...
[in]#@nonl 
[in]#@-node
[in]#@verbatim
[in]x

Note: in this and similar examples Leo writes the @nonl sentinel before the @-node sentinel only if the body text did not end in a newline (so that an extra newline had to be added before the @-node sentinel).

Example 5: [ws]&lt;&lt;ref&gt;&gt;x 

[in][ws]#@ws
[in2]#@ref
[in2]#@+node
...
[in2]#@nonl  (if needed)
[in2]#@-node      
[in]#@verbatim
[in]x

Example 6: [ws*]x&lt;&lt;ref&gt;&gt;y

[in][ws*]x
[in2]#@nonl
[in2]#@ref
[in2]#@+node
...
[in2]#@nonl  (if needed)
[in2]#@-node
[in]#@verbatim
[in]y

Examples involving @others are actually quite similar to the examples involving section references.

Example 7: @others

[in]#@+others
[in]#@+node
...
[in]#@nonl  (if needed)
[in]#@-node
[in]#@-others

Notes: no change to indentation.  One or more @+node...@-node groups may appear in this example and the next.

Example 8: [ws]@others

[in][ws]#@ws
[in2]#@+others
[in2]#@+node
...
[in2]#@nonl  (if needed)
[in2]#@-node
[in2]#@-others

Note: indentation changes.</t>
<t tx="ekr.20050421214628">I've just uploaded code that implements a simple and hugely effective scheme for handling read errors.  This scheme eliminates essentially all the old problems with error recovery, and is much simpler as well.  It took about 30 minutes to do.

The key ideas:

1.  The read code does not set the body text of tnodes directly.  Instead, the read code sets a temporary attribute of tnodes called t.tempBodyString.  Not all tnodes need have such a string.

2. If no errors are found, the read code simply copies all t.tempBodyString attributes to the permanent t.bodyString attribute.  If the t.tempBodyString attribute does not exist, the empty string is used.

3. In all cases, the read code deletes all t.tempBodyString attributes after the read is finished.

This is extremely fast and neat, and it means that no outline structure needs to be restored if there are errors, and the outline remains exactly as it was before the failed read.  In particular, clone links in the outline never get broken.  In fact, the new read code _never_ changes outline structure in any way.  If a node is referenced that does not exist in the outline a read error will occur and absolutely no changes will be made to the outline.
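
The three steps above can be sketched roughly as follows.  This is an illustration under stated assumptions, not the code I uploaded: the TNode class, the read_into function and the dictionary that maps made-up keys to tnodes are all hypothetical; only the tempBodyString and bodyString attribute names come from the description above.

```python
# Sketch of the temp-attribute scheme: stage new text, commit only on success.

class TNode:
    def __init__(self, body=""):
        self.bodyString = body

def read_into(tnodes, new_bodies):
    """Stage new_bodies (key -> text) on existing tnodes; commit if no errors."""
    ok = True
    for key, text in new_bodies.items():
        if key not in tnodes:
            # A reference to a node that is not in the outline is a read error.
            ok = False
            break
        tnodes[key].tempBodyString = text   # step 1: set only the temp attribute
    if ok:
        for t in tnodes.values():
            # Step 2: copy temp text; nodes without one get the empty string.
            t.bodyString = getattr(t, "tempBodyString", "")
    for t in tnodes.values():
        # Step 3: always delete the temp attributes.
        if hasattr(t, "tempBodyString"):
            del t.tempBodyString
    return ok

outline = {"a": TNode("old a"), "b": TNode("old b")}
print(read_into(outline, {"a": "new a", "zzz": "oops"}))  # False
print(outline["a"].bodyString)                            # old a
print(read_into(outline, {"a": "new a", "b": "new b"}))   # True
print(outline["a"].bodyString)                            # new a
```

Note that the failed read leaves every bodyString untouched, which is the whole point of the scheme.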

In short, all error recovery problems have been solved.

Edward
</t>
<t tx="ekr.20050421214628.1">Progress (!) report: 9-17-03

4.0 is nearly complete.  This posting will summarize what has been accomplished and will highlight some non-obvious details.  As usual, this posting is something akin to "notes to myself".

From my point of view, 4.0 is a complete success:

1.  The derived files are clean and good looking.  They are easy to read and write and the read/write code is in fact simple and elegant.  The new @nonl and @nl sentinels make it absolutely clear where the newlines are and aren't.  There is no "cruft" left in sentinels such as clone indices or node indices.  Remember that this will eliminate false cvs diffs.  The @+leo sentinel now contains a version field so that Leo can distinguish old-style derived files from new-style.

Minor note:  looking at new derived files, it would appear that the @ws sentinel could be eliminated.  Usually it could, but not always, so trying to eliminate @ws would probably be a bad idea: it would just complicate matters and make the derived file less easy to understand.

2. The "hidden machinery" in .leo files is working very well.  The new read code never creates new vnodes or tnodes: the read code _only_ uses vnodes and tnodes that already exist in the outline.  This means that clone links and marks (and potentially any other information contained in the outline) never get disrupted.  In fact, the read and write code don't know anything about them!  Everything "just works".

3. As I mentioned in another post, the "error recovery" logic which has caused so much misery in the past _no longer exists at all_.  This is a huge step forward.  When errors are found they will almost always be because the derived files are out-of-synch with the .leo file.  Leo will now refuse to do anything stupid, and instead will revert to the original @file tree of the outline.

In particular, when a derived file is out-of-synch, the Read @file Node command won't work either.  As an emergency measure, Leo will have a new Import Derived File command.  This will recover all body text in a derived file, placing distinct body text in distinct nodes.  However, the Import Derived File can't fully recreate the outline structure because that outline structure no longer exists in derived files.  That is, when putting @+node sentinels, Leo no longer writes the ancestor @+node sentinels needed to recreate the outline structure.

The Import Derived File command is, in fact, the best that one can hope for.  Recall that read errors typically arise from out-of-synch conditions.  In such cases, the semantic integrity of the derived file is dubious, so matching an out-of-synch body text to already-existing outline text would be actively misleading.   In other words, out-of-synch conditions are essentially unsolvable problems, so the approach used by the Import Derived File command is simple, good and reasonable.

BTW, the Import Derived File command will use all the new read code.  The _only_ difference between the read code and the Import Derived File command is that the findNode routine will create a new vnode and tnode when "importing" instead of using the hidden machinery to get a tnode when reading.

Code status

The 4.0 code presently exists as the use_gnx.py plugin.  I did this to provide an easy way to back out of the new code during development.  I shall soon integrate the new code into LeoPy.leo itself.  It would not be proper to leave the 4.0 as a plugin--it's fundamental to how Leo works.

The present 4.0 code creates a _new_ atFile object created from a subclass of the old atFile class.  It is imperative that these two objects remain separate.  In particular, once I make a few trivial changes to the old atFile class (so that it calls the new atFile2 object to handle new derived files), I plan to make _no_ further changes to the old atFile code.  In other words, any remaining read/write bugs will be fixed only in the new code.

In short, aside from very minor changes needed to fold the plugin into LeoPy.leo, the fundamental 4.0 read/write code is complete.  I haven't tested the @nosentinelsfile and @rawfile code yet.  That will happen eventually, perhaps before transition, perhaps after.

Transition from 3.x to 4.0

The present 4.0 code appears to read and write leoAtFile.py correctly.  I've checked this "by hand" using a graphical diff program.  The next step is to write a script that compares the output of the old and new write code.  This script will ignore all blank and comment lines.  Hopefully, such a simple script will be enough to demonstrate the semantic equivalence of old and new derived files.  Once this script works I shall write each derived file in LeoPy.leo in 4.0 format and will verify that the new derived file passes the "semantic test".
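
A minimal sketch of such a comparison script follows.  It is not the script itself (which I have yet to write): the function names are hypothetical, # is assumed to be the comment delimiter, and the sentinel lines in the sample text are simplified stand-ins rather than exact 3.x or 4.0 sentinels.

```python
# Sketch of the "semantic test": compare two derived files while ignoring
# all blank lines and all comment lines (sentinels are comment lines).

def significant_lines(text, comment="#"):
    """Return the lines that are neither blank nor comment lines."""
    result = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment):
            result.append(line.rstrip())  # keep leading whitespace
    return result

def semantically_equal(old_text, new_text, comment="#"):
    return significant_lines(old_text, comment) == significant_lines(new_text, comment)

old_file = "\n".join([
    "#@+leo", "#@+node:0::f", "#@+body",
    "def f():", "    return 1", "",
    "#@-body", "#@-node:0::f", "#@-leo", ""])
new_file = "\n".join([
    "#@+leo-ver=4", "#@+node:f",
    "def f():", "    return 1",
    "#@-node:f", "#@-leo", ""])

print(semantically_equal(old_file, new_file))                     # True
print(semantically_equal(old_file, new_file.replace("1", "2")))   # False
```

Because only trailing whitespace is stripped, a change in indentation still counts as a difference, which matters for Python sources.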

Another test is the "round trip" test.  I use two temporary commands in the Read/Write menu: Read 4.0 Derived File and Write 4.0 Derived file.  All nodes of an @file tree should remain unchanged after writing a 4.0 derived file and then reading that file.  leoAtFile.py already passes the round-trip test.

N.B. The round-trip test can fail even when the semantic test passes.  This is due to minor variations in how the new write code works.  There are small differences in how Leo outputs doc parts and how Leo handles @first and @last lines.  The new code works a little better than the old, and I am willing to live with these small differences.  So in some cases a derived file will fail the round-trip test the first time only, and thereafter will pass the round-trip test.  I don't believe this will be upsetting to people.  In fact, it is unlikely that people will notice at all.  In particular, all files in LeoPy.leo will pass the round-trip test once transition is complete.

Transition will essentially be complete once all derived files of LeoPy.leo pass the semantic and round-trip tests.  At this point it will become feasible to use 4.0 for all files derived from LeoPy.leo.  Note that Leo will be able to read old-style derived files "forever".  I'll upload to cvs the 4.0 version of all derived files in LeoPy.leo shortly after transition is complete.

As a user option (and an escape in later emergency situations), I shall soon add a "use_old_derived_file_format" option to leoConfig.txt.  This will cause Leo to write old-style derived files when saving .leo files and when executing the Write @file Node command.  However, I plan to deprecate the use of old-style derived files shortly after 4.0 final is released.

Schedule

I estimate it will take a few days for all files to pass all tests.  I'll then turn my attention to nosentinel files and other minor details.  I am shooting for sometime next week to release 4.0 alpha 1 as an official release.

Edward

P.S. The transition to 4.0 is similar to bootstrapping a compiler.  One uses an existing, debugged compiler to compile the new compiler, and then one verifies that the new compiler, after compiling itself, produces the same code as it did when compiled with the old compiler.

EKR
</t>
<t tx="ekr.20050421214628.2">Goodbye @ws

Over the last several days I just couldn't let go of the idea of eliminating @ws.  I finally discovered a clean way to do it.

The essence of the problem is (rare) lines in the outline of the form:

[whitespace]non-whitespace&lt;&lt;section reference&gt;&gt;

If it were not for those lines, we could recreate the whitespace that precedes section references and the @others directive using the indentation in #@others and #@&lt;&lt; sentinels.  Alas, in the special case given above the line:

[whitespace]non-whitespace

that precedes the #@&lt;&lt; sentinel will already have contributed the [whitespace], so we need a way of suppressing the leading whitespace that would otherwise be contributed by the #@&lt;&lt; sentinel.

The solution is simple and effective.  Rather than using the indentation of the #@&lt;&lt; sentinel or #@others sentinel to determine how much whitespace should be contributed, we will expand the syntax of those sentinels to indicate the whitespace "explicitly".  That is, when an #@&lt;&lt; sentinel is to contribute [ws], it will have the form:

#@[ws]&lt;&lt;

Similarly:

#@[ws]@others

will indicate that:

[ws]@others

appeared in the outline.  BTW, the section reference is the only noweb construct and @others is the only directive that may be preceded by leading whitespace.

A new ivar, at.leadingWs, will be set by the write logic to indicate that [ws] is to be part of the #@others or #@&lt;&lt; sentinel.  In the special case above, at.leadingWs will be empty.  In the cases where @ws was formerly generated, at.leadingWs will be non-empty. When reading a derived file, the code that handles the #@others and #@&lt;&lt; sentinels can easily calculate the leading whitespace to be contributed.

This is a perfect solution.  The #@others and #@&lt;&lt; sentinels show clearly the whitespace they contribute.
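
Here is a rough sketch of how the read side might recover that whitespace.  It is an illustration only, not the code in the new atFile class: the function name and its parameters are hypothetical, and # is assumed to be the comment delimiter.  The same function would serve for section references by passing the reference's opening brackets as the marker.

```python
# Sketch: recover the explicit [ws] from a sentinel of the form #@[ws]@others.

def sentinel_leading_ws(line, marker="@others", delim="#"):
    """Return the [ws] between the sentinel prefix and marker, or None."""
    s = line.strip()          # indentation of the sentinel itself is ignored
    prefix = delim + "@"
    if not s.startswith(prefix):
        return None           # not a sentinel line at all
    rest = s[len(prefix):]
    body = rest.lstrip()
    if not body.startswith(marker):
        return None           # some other sentinel
    return rest[:len(rest) - len(body)]

print(repr(sentinel_leading_ws("#@    @others")))   # '    '
print(repr(sentinel_leading_ws("    #@@others")))   # ''
print(repr(sentinel_leading_ws("#@+node")))         # None
```

The second call shows the special case: the sentinel is indented in the derived file, but it contributes no whitespace because none appears after #@.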

This Aha removes the last "rough" edge in the new file format.  It took less than an hour to implement...

Edward
</t>
<t tx="ekr.20050421214628.3">4.0 is (almost) complete!

This is a big day for Leo.  At long last 4.0 alpha 1 is now on cvs! With a few exceptions (see below), the 4.0 project is complete! In particular:

- Leo will automatically read either old-format (3.x) derived files or new-format (4.x) derived files.
- Leo can write either 3.x or 4.x derived files by default, depending on a setting in leoConfig.txt.
- The code that reads and writes 3.x files is virtually unchanged, and should be safe to use.
- New commands allow you to write individual derived files either in 3.x or 4.x format at any time.
- Leo reads and writes all of its own files correctly, both in 3.x format and in 4.x format.
- Leo reads and writes "hidden machinery" in .leo files.  Old versions of Leo can't read this machinery.
- The new error recovery scheme makes using Leo much safer on flaky derived files.
- Some minor features haven't been fully tested.  See below.
- The new code has been fully integrated into Leo's code base; it no longer exists in leoPlugins.leo.
- The code contains several traces that write to the console window.  These will go away soon.

WARNING: this is still alpha quality code: there may be significant bugs lurking in the code.  Please make full backups and use extreme caution if you use this code.

Important details:

1. [Choosing the format of derived files]  The write_old_format_derived_files setting in leoConfig.txt determines which format of derived file Leo will write by default.  At present, that setting on cvs is:

write_old_format_derived_files = 1

so Leo still writes old-format derived files by default.  Today I shall complete the transition to 4.0 by setting:

write_old_format_derived_files = 0

in my copy of leoConfig.txt.  I'll change the setting on cvs once I am comfortable that Leo works well.

2. [Bugs and incomplete testing] There is one known bug that must be fixed asap.  At present the code that reads and writes doc parts does not fully preserve whitespace following @doc or @space directives.  If you follow those directives with exactly one space the read code recreates the outline exactly as it was.  Otherwise, Leo will add a blank if none exists and Leo will delete additional whitespace.  To fix this the @+doc and @+at sentinels will indicate exactly the whitespace that follows the corresponding @doc or @space directives.

The @raw, @endraw and @delims directives haven't been tested and may not work.  Also, Leo hasn't been tested with @silentfile, @rawfile and @nosentinelsfile trees.

3. [Error recovery]  The new error recovery scheme works extremely well, and for both 3.x and 4.x derived files.  If there are problems reading a derived file Leo instantly reverts back to the code contained in the outline.  The new error recovery scheme makes using 4.0 a1 much safer than it would be otherwise.  It has saved my bacon several times already.

4. [New commands] Regardless of this setting, you can force derived files to be written in either 3.x or 4.x format using two new commands in the File:Read/Write menu.  The Write 3.x Derived Files command and the Write 4.x Derived Files command are just like the Write @file nodes command except that they force Leo to use the indicated format when writing the derived file.  In contrast, the Write @file Nodes command uses the write_old_format_derived_files setting to determine the format of the derived file.

Supposedly you may write a .leo file compatible with previous versions of Leo using the Write 3.x Outline command.  I've forgotten exactly what this does, and it may go away soon.  I would avoid using this command for now.

The Import Derived Files command in the File:Import menu works for 4.x derived files.  I'm not sure whether it works for 3.x derived files.  Probably not.  There is still some work to be done on this command.

5. [Auto-save, .leo files &amp; hidden machinery] Leo will automatically save the .leo file when writing 4.x derived files with either the Write 4.x Derived Files command or with the Write @file Nodes command that actually writes 4.x derived files.  This is essential so that the hidden machinery in the .leo file always remains in synch with the corresponding 4.x derived files.  Leo will write a message in blue to the log pane when it auto-saves a .leo file.

Leo writes a new "tnodeList" attribute to &lt;v&gt; elements of .leo files for all @file nodes whose derived files have been written in 4.x format.  Old versions of Leo can't read such .leo files.  Old versions of Leo _should_ be able to read .leo files that don't contain any tnodeList attributes.

Leo is careful to remove tnodeList attributes of _vnodes_ when writing 3.x derived files, but Leo does _not_ do an autosave in that situation.  So to remove the tnodeList attributes _in the .leo file_ you must save the .leo file.  This seems wrong.  I'll probably change Leo so that it either does an autosave or marks the .leo file dirty when removing tnodeList attributes.  That way you can be sure of writing a .leo file that old versions of Leo can read simply by writing all derived files in 3.x format.

6. [No gnx's &amp; no leoID.txt]  There seems to be no need for _any_ gnx's anywhere in Leo.  The hidden machinery makes it extremely unlikely that Leo could read an out-of-synch derived file without detecting an error.  Unless there are howls of protest, @+leo sentinels will _not_ contain a gnx.

Eliminating gnx's entirely will eliminate the need for a leoID.txt file in the config directory.  This will make setting up Leo easier for newbies.  In particular, there is no need for a dialog demanding that the user specify a unique cvs id.  I think this is a nice step forward.

7. [Derived file format] I don't plan any more changes to the 4.x derived file format.  It seems just about perfect.  In particular:

- No sentinels will change when nodes are moved in the outline.
- The @nl and @nonl sentinels eliminate all extra blank lines in the derived file.
- 4.x derived files are simpler and better looking than 3.x derived files.
- The code to read and write 4.x code is much simpler and better than the corresponding 3.x code.

The potential for cvs conflicts involving sentinel lines still exists.  I see no way at all of eliminating this entirely.  As I have said before, the correct way of handling such situations is to implement the Resolve CVS conflicts command.

8. [Documentation &amp; schedule]  There aren't any official docs yet for 4.0.  I'll be writing them this week.  I would like to release 4.0 a1 officially in about a week, and 4.0 b1 about a week later.  So by the time SourceForge gets cvs fully working again (supposedly this week) we will have a solid code base for Leo's developers to use.

Edward

P.S. Quiet time is now officially over.  However, I have a lot of email and other business to catch up on, so please consider _not_ sending me email just yet if it is not urgent.  Thanks.

EKR
</t>
<t tx="ekr.20050421214628.4">Transition to 4.0 complete

I am now using 4.0 a1 to develop Leo.  This is a major milestone.  I encourage all developers to start using the latest cvs code.  I'll be creating an official 4.0 a2 release in Leo's Files section today or tomorrow.

I have just made two mass updates to cvs.  The first contains all derived files in 3.x format just before the transition to 4.0.  The second contains all derived files in 4.x format just after the transition.

Please be sure that the write_old_format_derived_files setting in leoConfig.leo/.txt matches what you want: I'm not sure what the settings on cvs are.  If you want Leo to generate 4.x derived files automatically make sure this setting is 0 in leoConfig.txt.

The code upped to cvs today has the following differences from yesterday:

1.  The code that reads .leo files computes the join lists of all vnodes _before_ reading @file nodes.  This must be done so the new read logic for derived files can find the hidden machinery properly.  I'm not sure why this didn't fail earlier, but it did fail as soon as I attempted the transition to 4.0.

2. The new code now does an auto-save of the .leo file for all commands in the Read/Write menu that write derived files.  I added new logic so that the auto-save is done only if some derived file was actually written.

3. I made @raw work properly.

It should be safe to use 4.0 a1.  Indeed, the new error recovery scheme is a giant step forward: _nothing_ changes in the outline when there are read errors:  the outline structure remains exactly the same as do all clone links, marks--everything.  In fact, the code that reads @file nodes _never_ changes anything in the outline except body text, regardless of whether the read succeeded or failed.  No more "clone links will be broken" messages!

The only place where I would expect some problems are the "unusual" directives like @comment, @raw, @file-nosent, etc.  These really don't get exercised by LeoPy.leo very well.  The file leoTest.leo contains simple tests of these constructs (and they all appear to pass), but that doesn't mean much.

Edward
</t>
<t tx="ekr.20050421214628.5">@nocolor

- Leo uses gnx's in .leo files (but not derived files)</t>
<t tx="ekr.20050421214628.6">4.1 a1 released

I have just uploaded 4.1 alpha 1 to Leo's cvs site.  This is alpha quality code:  Please make full backups before playing with it!

Warning: Older versions of Leo cannot read files that Leo 4.1 writes by default.  Leo 4.1 should be able to read all previous Leo files.  A setting allows Leo 4.1 to create .leo files compatible with Leo 3.x versions.  Leo converts between 4.1 .leo files (using gnx's) and earlier files (using ints) automatically depending on a setting in leoConfig.txt (see below).  I have tested this feature as well as I can, and the process is tricky.  The possibility of glitches in the conversion is the main reason I am calling this an alpha release.

This release marks another significant milestone in Leo's history.  The highlights:

- Leo's 4.x file code is complete.  At present I have no plans to make any further changes to the format of .leo files or derived files.  I am eating my own dog food: I do all my editing with the 4.1 code base.

- By default, Leo 4.1 uses immutable gnx's (id:timestamp:n) to associate tnodes with vnodes in .leo files.  This makes Leo as cvs-friendly as possible.  From now on .leo files will be checked in to cvs with the -ko (text/keywords off) option.  To repeat: previous versions of Leo can not read .leo files containing gnx's.

- The use_gnx setting in leoConfig.txt determines whether Leo uses gnx's (cvs friendly) or ints.  Only gnx's are immutable: Leo recomputes all non-gnx indices from scratch whenever writing a .leo file.  It should be possible to convert between 3.x and 4.1 file formats by changing the use_gnx setting.

- The 4.1 code base has been reorganized to support gui's other than tkinter.  Leo's src directory contains several new source files.

- The file test.leo in the test directory contains real regression tests for syntax coloring.   Regression testing scripts create regression tests dynamically from data in Leo's outline.  Very cool, very easy, very general.  In particular, regression tests may use temporary nodes in test.leo rather than creating separate Tk windows for testing.
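
To make the gnx highlight above concrete, here is a small sketch of an allocator for indices built from an id, a timestamp and a counter n.  Everything in it is an assumption made for illustration: the class name, the timestamp layout, the "." separator and the rule that n is omitted when it is zero.  It is not the code in Leo's core.

```python
# Sketch of gnx allocation: once created, a gnx never changes, so moving
# a node in the outline changes nothing that cvs could see.
import time

class GnxGenerator:
    def __init__(self, user_id, sep="."):
        self.user_id = user_id
        self.sep = sep
        self.last_stamp = None
        self.n = 0

    def new_gnx(self, now=None):
        """Return a fresh index; now is an optional time.struct_time."""
        if now is None:
            now = time.localtime()
        stamp = time.strftime("%Y%m%d%H%M%S", now)
        if stamp == self.last_stamp:
            self.n += 1          # same second: bump the counter
        else:
            self.last_stamp, self.n = stamp, 0
        parts = [self.user_id, stamp]
        if self.n:
            parts.append(str(self.n))
        return self.sep.join(parts)

gen = GnxGenerator("ekr")
fixed = time.strptime("2003-10-17 12:00:00", "%Y-%m-%d %H:%M:%S")
print(gen.new_gnx(fixed))   # ekr.20031017120000
print(gen.new_gnx(fixed))   # ekr.20031017120000.1
```

Contrast this with int indices, which Leo recomputes from scratch on every write and which therefore change whenever nodes are inserted, deleted or moved.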

Schedule:

- The Extract commands eat one character too much.  Will be fixed this week.

- Leo needs a new command to check for possible _user_ mistakes involving clones, such as cloning a node in two different @file trees.  This is very easy to do, and will be done shortly.

- I plan to fix all long-standing bugs by Nov. 20.  Fixing bugs is once again easy now that Leo has a unified code base.

- I plan to release 4.1 beta 1 this Friday with as many bug fixes as possible.

- The __wx_gui.py plugin is only partly functional.  I shall move code from gui-specific classes to gui-independent base classes in Leo's core.  This will reduce the amount of work that gui plugins must do.  This work will wait until all major bugs have been fixed.  OTOH, I'll make small changes as needed to support other people's work on gui plugins.

Edward
</t>
<t tx="ekr.20050421214704">@nocolor

The major innovations in 4.2 were:

- Representing clones as shared subtrees.

- Using positions (and especially position iterators) to traverse Leo outlines.

- @thin trees and 'thin' derived files using gnx's.

    (gnx's in .leo files were introduced in 4.1)</t>
<t tx="ekr.20050421214921">Transition

In the past I would have converted LeoPy.leo to use positions in one big step.  This time, I am testing the position class first.  This gets tricky because there are many files involved:

- The original LeoPy.leo and all derived files.  This isn't changing, so I can always edit .leo files properly.
- The new Leo42.leo.  This contains all the changes.  Again, I edit this with a stable copy of Leo.
-  A new copy of test.leo.

I can run test.leo in two ways:
A) using the original copy of Leo.
B) by running a script that loads test.leo with the _new_ code, that is, the code derived from Leo42.leo.

B is where things get interesting, and B is where I run my "mini tests" of the position class.  Here is a typical mini test.  I run this using the Execute Script command:

from leoGlobals import *
c = top()
c.convertTreeToSharedNodes()
c.disableSaveCommands()

print ; print "start"

p = c.rootPosition() # same as p = position(c.rootVnode(),[ ])
while p and p.v.headString().strip().lower() != "clone test":
..p.moveToThreadNext()

after = p.copy().moveToNodeAfterTree()

while p and p != after:
..print p.level(), p.v.headString()
..p.moveToThreadNext()

print ; print "done"

Some notes:

1. This script must be running under "plan B", that is, with the code derived from Leo42.leo.
2. At this point, _no_ big changes have been made to Leo42.py.  The only thing new is the position class.  Therefore, when this script is executed for the first time, the internal data structures (the tree of vnodes) are the _old_ data structures.
3. The c.convertTreeToSharedNodes() converts the tree of vnodes to the new format.
4. Naturally, changing the data structures without "telling" the old code freaks it out. The call to c.disableSaveCommands() ensures that I don't absent-mindedly destroy test.leo by saving it.

My plan is to test the new position class as thoroughly as possible in the next day or so.  This will provide the raw material for unit tests.  I may also write some unit tests for some other crucial code. At that point it will no longer be possible to avoid the "one big conversion" step.  This will replace all the vnode-based traversal code with position-based code.  Naturally, this could break everything.  However, my experience converting Leo from C++ to Python indicates that getting Leo working again will take just a day or so.  At that point, every line of Leo must be retested.  I plan to make a unit test for everything that fails during the conversion.

Edward

P.S.  Here is c.convertTreeToSharedNodes.  It shows how easy it is to convert to the new data format.  Note that we must not make changes that affect v.threadNext until the very last step.  This last step will break all of Leo's code.  For example, once this routine is executed the tree redraw code will crash.

def convertTreeToSharedNodes(self):

..c = self

..# Return if the tree has already been converted.
..v = c.rootVnode()
..while v:
....if v._firstChild and not v._firstChild._parent:
......print ; print "already converted"
......return
....v = v.threadNext()

..# Init.
..v = c.rootVnode()
..while v:
....v.t.vnodeList = []
....v = v.threadNext()

..# Set _firstChild and vnodeList in tnodes.
..v = c.rootVnode()
..while v:
....v.t._firstChild = v.firstChild()
....try:    v.t.vnodeList.append(v)
....except: v.t.vnodeList = [v]
....v = v.threadNext()
....
..# Clear _parent field of any node whose parent is a clone.
..v = c.rootVnode()
..clearList = []
..while v:
....next = v.threadNext()
....if v.firstChild() and len(v.t.vnodeList) &gt; 1:
......child = v.firstChild()
......while child:
........clearList.append(child)
........child = child.next()
....v = v.threadNext()

..# Doing this will break all vnode-based traversal code.
..for v in clearList:
....v._parent = None

P.P.S. Consider the line in the script (in the main body of this letter, not in convertTreeToSharedNodes):

after = p.copy().moveToNodeAfterTree()

This line did not work the first time I tried it.  The reason is that the p.moveToX routines that I published a few days ago did not return anything (i.e., they implicitly returned None).  I've changed the conventions so that all the p.moveToX routines now return self.
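
A minimal sketch of the convention, in Python 3.  The Position class here is a toy stand-in over a flat list, not Leo's position class, and moveToNext is only an example of a moveToX method.

```python
class Position:
    """Toy stand-in for a position (hypothetical, not Leo's class)."""

    def __init__(self, nodes, index=0):
        self.nodes = nodes    # a flat list standing in for an outline
        self.index = index

    def copy(self):
        return Position(self.nodes, self.index)

    def moveToNext(self):
        self.index += 1
        return self           # returning self is what makes chaining work

p = Position(["a", "b", "c"])
after = p.copy().moveToNext()   # would be None if moveToNext returned nothing

assert after is not None
assert p.index == 0 and after.index == 1
```

The copy is moved and the original position stays where it was, which is exactly what the line defining after relies on.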

BTW, this line shows why I am not eager to use iterators in the position class.  If Leo used iterators we would have to use the full iterator machinery here to define after.  That would be clumsy.

EKR
</t>
<t tx="ekr.20050421214921.1">
4.2 liftoff near

Mixed metaphor alert!  I am about to start "eating my own dog food".  4.2 looks considerably ahead of schedule.

I'll release 4.2 pre-alpha 1 after using it to edit itself for several days: certainly before PyCon.  4.2 alpha won't happen until there are at least minimal unit tests for all commands.

Lots of code in Leo42.py presently runs (and runs correctly) in "confused mode".  That is, the code appears to be using vnodes when in fact it is using positions.  That's fine for now, and I shall soon write a script (probably a find-script) that changes v to p properly.  I'll make this script as spiffy as I can: it may be useful for others.

As an aid to transition, the position class defines a p.__getattr__ method that keeps p.v.t and v.t in synch.  That will be turned off during testing, but will be on as an aid to compatibility in 4.2 final.  All unit tests presently pass, whether or not p.__getattr__ exists.

Edward

P.S.  It's fine with me if your code runs in "confused mode" forever :-)

EKR
</t>
<t tx="ekr.20050421214921.10">Compatibility report

The version of Leo42.leo on cvs is now mostly compatible with previous scripts:

A.  I just ran the Find command and the code works without any changes!!  This is just amazing.  What makes this work is the following:

1. c.rootVnode() and c.currentVnode() now return positions, just like c.rootPosition() and c.currentPosition()
2. The position class supports most methods of the old vnode class.
3. The position class has a __getattr__ method that returns p.v.t whenever p.t gets referenced (and that's all it does).

So while the find code thinks it is dealing with vnodes, it is actually dealing with positions, and everything "just works".
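
Point 3 above can be sketched as follows, in Python 3.  The three classes are toy stand-ins and bodyString is just an illustrative field; the only thing taken from the description is that p.t resolves to p.v.t.

```python
class TNode:
    def __init__(self, body=""):
        self.bodyString = body

class VNode:
    def __init__(self, t):
        self.t = t

class Position:
    def __init__(self, v):
        self.v = v

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails,
        # so ordinary attributes such as self.v never come through here.
        if name == "t":
            return self.v.t
        raise AttributeError(name)

t = TNode("body text")
p = Position(VNode(t))
assert p.t is t   # old code that says p.t reaches the shared tnode
```

Any other missing attribute still raises AttributeError, so the shim does not hide genuine mistakes.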

B.  I made a few minor mods to leoPlugins.leo and Leo now loads those plugins correctly.  I had to make the following changes:

1. Replace app() by app.
2. Replace scanAllDirectives(x,v=v) with scanAllDirectives(x,p=v).

In other words, I changed the name of the v keyword parameter to p.  I suppose this could support both v and p keyword parameters, but what the heck, a few changes to scripts won't kill anybody :-)

Still to do:

1. I have removed all the g.sharedNodes switches from the code, and in doing so I found that the atFile.read logic has been disabled.  So that has never been tested.  This will be working in a day or so.

2. I will be making the code "say what it means" by converting v to p throughout.  This will take some time...

3. After that, every line of code must be retested.

Edward
</t>
<t tx="ekr.20050421214921.11">&gt; I have some difficulties in understanding the design. I'd be obliged if you could clarify the next issues further.

I'm glad you asked.  This is the perfect time for these questions.  I'm sure others have similar questions.

&gt; What are back and next links in v-nodes; can you show/draw an example?

Internally Leo represents the tree as a set of vnodes.  The back/next/parent/firstChild links indicate the structure.

Consider the tree:

root
..A
..B

In the _present_ scheme we have:

root.firstChild is A.
root.back is None.
root.next is None.
A.parent is root.
A.back is None.
A.next is B.
B.parent is root.
B.back is A.
B.next is None.

All other links, i.e., all other parent and firstChild links are None.

The new scheme uses the same links, but in slightly different ways.  In particular:

1.  The links are called _back, _next, _parent, _firstChild rather than mBack, mNext, mParent and mFirstChild.
2.  The _firstChild field gets moved to v's tnode, that is, v.t._firstChild.
3.  The _parent link is None if the parent is a clone.  In that case, there are _multiple_ parents, so the _vnode_ can not tell what the "proper" parent in the traversal is.

Here is where positions come in.  The "next" parent in a traversal is a property of the traversal, _not_ of the vnode.  So we have to have a _position_ at hand to figure out what the next vnode should be.

&gt; Slowly the difference between the 'outline as drawn in the outline pane' and the 'outline under the hood' is dawning on me.

This is a most perceptive comment.  In the old way, Leo actually created copies of vnodes when creating clones.  Updating these "dependent" trees of vnodes can take a _lot_ of time.

&gt; As a newbie I thought they were the same...

They are the same at present.  They will be different in 4.2.

&gt; Would it be an idea to call the representation in the outline pane: the tree; and the outline under the hood: the outline? Or will this lead to other difficulties in terminology and discussions?

Don't go there.  We don't need new terms.  The proper way is to think of the fundamental classes:  vnodes and tnodes in the "old" (pre 4.2) Leo, with the addition of positions in the "new" Leo.

&gt; It makes to me clearer that in the tree
A'
..B
C
A'
..B

B has two parents, but that B has one and the same v-part (vnode) since the outline looks like
A'
..B
C
A'

Yes, this is basically correct.  Note that both A' nodes "point" to B.  More correctly, in the new way, A.t._firstChild is B for both nodes A.
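
Here is a small sketch, in Python 3, of the A', C, A' outline under the new scheme.  The field names _firstChild, _parent, _back, _next and vnodeList come from the discussion above; the constructors and the link_siblings helper are assumptions made for illustration.

```python
class TNode:
    def __init__(self, headline):
        self.headline = headline
        self._firstChild = None   # the child link lives in the tnode
        self.vnodeList = []       # every vnode that shares this tnode

class VNode:
    def __init__(self, t):
        self.t = t
        self._parent = None
        self._back = None
        self._next = None
        t.vnodeList.append(self)

def link_siblings(*vnodes):
    """Hypothetical helper: chain vnodes with _next/_back links."""
    for left, right in zip(vnodes, vnodes[1:]):
        left._next = right
        right._back = left

tA, tB, tC = TNode("A"), TNode("B"), TNode("C")
a1, c, a2 = VNode(tA), VNode(tC), VNode(tA)   # a1 and a2 are the two clones A'
b = VNode(tB)                                  # only one vnode for B
tA._firstChild = b
b._parent = None   # B's parent is a clone, so the vnode records no parent
link_siblings(a1, c, a2)

assert a1.t._firstChild is b and a2.t._firstChild is b
assert len(tB.vnodeList) == 1
```

Which A' is "the" parent of B during a traversal is not stored anywhere in these nodes; that is the information a position carries.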

&gt; Now I'll start pondering again about the positions...

It appears you have an excellent grasp of the general situation.  Thanks for these questions.

Edward

P.S.  The next/back/parent/firstChild links have been around since Leo's earliest days.  Doing things the simplest possible way like this has turned out to be an extremely good decision.  I have never regretted it for an instant.  In particular, last year I experimented with using a Python list to represent the children of a node.  Bad idea.  That way uses more memory and is slower than using "C-style" links.

EKR
</t>
<t tx="ekr.20050421214921.12">Details and schedule

Some leftovers:

1. Why are shared tnodes so much faster?

The short answer is that Leo no longer needs v.createDependents and v.destroyDependents.  At present, Leo calls these routines to delete all "dependent" trees at the start of a move and to recreate the dependent trees at the end of a move.  These routines can be _very_ expensive in practice when an outline has many cloned copies of a node.  These routines must do a lot of work: for example, they call joinTreeTo and unjoinTree to create "join links".  Such links do not exist in the shared tnode scheme: there is only one copy of every cloned tree, so there is nothing to join.   The work done by v.createDependents and v.destroyDependents is proportional to c * s, where c is the number of trees cloned to the moved tree and s is the size of each tree.  c * s can be large enough to cause Leo to appear to die.

There are several other benefits to the shared tnode scheme.  As shown in the code-level posting, it becomes trivial to determine whether a node is a clone or not:  v is a clone if and only if len(v.t.vnodeList) &gt; 1.  This means that Leo never needs to compute a separate clone bit at all: the code that draws the outline pane can recompute the clone bit as needed.  This in turn means that Leo never needs to call c.computeAllCloneBits, which in turn calls v.shouldBeClone.  Well, v.shouldBeClone is expensive, again proportional to something like c * s.  Moreover, the old code had bugs resulting from missing calls to c.computeAllCloneBits and similar routines.  These are all pure gains, and big gains at that.
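
The clone test can be sketched in a few lines of Python 3.  The classes are toy stand-ins and isCloned is an assumed method name; the point is only that the answer is derived from the shared tnode on demand rather than stored as a bit.

```python
class TNode:
    def __init__(self):
        self.vnodeList = []   # all vnodes sharing this tnode

class VNode:
    def __init__(self, t):
        self.t = t
        t.vnodeList.append(self)

    def isCloned(self):
        # No stored clone bit: derive the answer from the shared tnode.
        return len(self.t.vnodeList) > 1

shared = TNode()
v1 = VNode(shared)
assert not v1.isCloned()

v2 = VNode(shared)               # cloning adds a second vnode for the tnode
assert v1.isCloned() and v2.isCloned()

shared.vnodeList.remove(v2)      # deleting the clone
assert not v1.isCloned()
```

Nothing has to be recomputed when a clone is created or deleted, so there is no bit that can be left stale by a missing call.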

In short, the worst case performance of the old code is O(n**2) where n is the number of nodes in the outline.  The worst-case performance of the shared tnode scheme is O(1).  That is, performance of crucial algorithms is bounded and small.

2.  Again, why do I care about not making copies of positions?

My answer to this question was a bit incomplete and a bit misleading.  The keys to this question are the following:

a)  The new code is simple and good.  If this were not true my original answer would not hold up.

b) The new code bases the "copying" methods on the "moving" methods.  This is vital.  Even if I chose to include the copying code, there would _still_ be only one code base, namely the "moving" methods.  Thus, the code can be made solid; there will never be a need to keep two different versions of what amounts to the same code in synch.

Aside:  if I did choose to implement p.next, p.back, etc. these "copying" methods would have to have new names, like p.copyNext, p.copyBack, etc.

In short, I can justify the "optimization" of not creating copies of positions because the optimization is simple and good.  In other words, the issue is not _just_ a management issue; it is a matter of engineering judgment.  After years and years of wondering, confusion and revision, at last there is a way that looks simple enough to be doable.

3. Incremental drawing

It's no good having gigantic outlines if they can't be drawn in the outline pane.  I don't believe this will be too difficult.  The basic ideas are these:

a) We want to bound the number of gui widgets that Leo allocates.  This bound will be something like 3 * n, where n is the number of headlines that can actually appear in the outline pane.  The number n is typically about 20 on my screen.  100 would be about tops.

b) Each headline will contain a copy of a position, so that Leo can figure out what the current position is.  The number of such copies is again bounded by something like 3 * n, so it _does not matter_ that Leo must create those copies.

c) In order to make scrollbars work properly, Leo probably must allocate _space_ in the outline pane (e.g., the Tk.Canvas) for all widgets that are _potentially_ visible.  However, Leo need only allocate actual gui widgets for headlines that are _actually_ visible.  To reduce jerkiness when scrolling, Leo will probably allocate widgets for nodes just before and after the visible nodes.  Again, the number of such nodes can be bounded: we could even make that number a configuration option.

To repeat: Leo can strictly limit the number of gui widgets that it allocates.  Leo can do this by recycling widgets (putting them on a list) when widgets are no longer visible.  That way, Python's (or Tk's) storage allocator never gets stressed at all.  In fact, after startup, the drawing code may not call the storage allocators ever again.
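
The recycling idea can be sketched like this in Python 3.  WidgetPool, its method names and the hard limit are all assumptions for illustration, and dict stands in for a real widget constructor.

```python
class WidgetPool:
    """Hypothetical pool that recycles widgets instead of reallocating them."""

    def __init__(self, factory, limit):
        self.factory = factory   # callable that creates one widget
        self.limit = limit       # upper bound on widgets ever created
        self.free = []           # recycled widgets waiting for reuse
        self.created = 0

    def acquire(self):
        if self.free:
            return self.free.pop()
        if self.created == self.limit:
            raise RuntimeError("widget limit reached")
        self.created += 1
        return self.factory()

    def release(self, widget):
        self.free.append(widget)

pool = WidgetPool(factory=dict, limit=3)
visible = [pool.acquire() for _ in range(3)]
pool.release(visible.pop())       # a headline scrolls out of view
visible.append(pool.acquire())    # a new headline scrolls into view
assert pool.created == 3          # no new allocation was needed
```

After the pool has filled once, scrolling only moves widgets between the visible list and the free list.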

BTW, the scheme I have just discussed could be used to drive any gui, not just Tk.  For example, it could be used to reduce the load on wxWindows or Java tree classes.

4. Schedule

Clearly, implementing shared tnodes will take some work.  However, it should not take as long as transliterating Leo from C++ to Python, and that project only took a month or two.  OTOH, speed is not paramount here.  I definitely want to make this a "test driven" project.  To repeat, this is a perfect time to do unit tests.  Yesterday the phrase "quick test" popped into my mind.  These would be really simple scripts that could be done "instantly" to check code.  Presumably these could "grow" into unit tests.  Or maybe there is a way to make adding unit tests so easy that I'll just naturally do it.  It will be worth days or even weeks of effort to make creating unit tests natural and automatic.  So a matter of 4-8 weeks might be a reasonable guess for how long this will take.

Comments please

So that's about it.  This is a kind of "decompression" time for me now that the big picture is pretty clear.  In my experience, this kind of decompression allows new details to emerge.  I'll be writing about such other nits as they appear.

This is a real good time for comments.  About the only controversial choice that I see is the decision to revise all the tree-traversal code in Leo's core, plugins and scripts.  Again, I think this extra work is worthwhile.  Given good unit tests, it would be very easy to catch mistakes in the transliteration.  Anyway, it's probably not as big a deal as the gui reorg.  Moreover, I am looking forward to really making good unit tests for Leo's tree operations.

Edward

P.S.  I haven't mentioned changes to Leo's Commands class.  That's because there aren't likely to be any major changes needed.

EKR
</t>
<t tx="ekr.20050421214921.13">Heavy lifting

In the last several days I have been deep into debugging the position class.  This is not too surprising: the new code is subtly different from the vnode code.  It's a good thing that the fundamentals are simple.  Even so, the details are hairy.

There is no doubt that the code can be made to work.  Actually doing so will take a day or two more.  In particular, the p.moveToVisBack and p.moveToThreadBack routines continue to cause problems.  That's not surprising either.

I moved the lastVisible routine from the tk tree code to the position code where it belongs.  This is another method that is very tricky to get right at all, and even trickier to do without creating a large number of temporary positions.  Right now even the simple way doesn't work :-)

I squashed some subtle bugs in the p.linkX and p.unlink methods. For the first time the level of nodes appears to be computed correctly in all cases.  This satisfies a crucial assert in p.moveToThreadBack.

I added p.dump and v.dump methods, and improved the p.__repr__ method so it shows the level and stack size.  It is becoming more obvious when things go right or wrong.  It turns out that putting a trace in the tree.force_draw_node method is highly useful.

Edward
</t>
<t tx="ekr.20050421214921.14">I've just made the following experiment:

from leoGlobals import *
import leoNodes

sharedNodes = false

class traverseAllNodes:
..def __init__(self,porv=None):
....if porv is None:
......c = top(); v = c.rootVnode()
......if sharedNodes:
........self.p = leoNodes.position(v,[])
........self.after = leoNodes.position(None,[])
......else:
........self.p = v
........self.after = None
....else:
......if sharedNodes:
........self.p = porv
........self.after = porv.copy().moveToNodeAfterTree()
......else:
........self.p = porv

..def __iter__(self):
....return self
....
..def next(self):
....if sharedNodes:
......self.p = self.p.moveToThreadNext()
......if not self.p: raise StopIteration
......return self.p.v
....else:
......self.p = self.p.threadNext()
......if not self.p: raise StopIteration
......return self.p

# It's easy to use this iterator: it always yields vnodes, one at a time.
for v in traverseAllNodes():
..print v.headString()

The nice thing about this code is that the traverseAllNodes class hides the details of whether we are using shared nodes.  In both cases the iterator "returns" a list of vnodes.  I'm not sure whether having this will actually simplify Leo's core code: much of that code _does_ need to know about positions.  However, there is no doubt that user scripts will benefit from this class.

Thanks again, Bernhard, for the suggestion.

Edward
</t>
<t tx="ekr.20050421214921.15">iterators make positions safe

Bernhard Mulder's suggestion to use iterators is the last important piece of the puzzle, for the following reasons:

1. Iterators simplify the code, especially during transition.

At present the code base is filled with fragments such as:

if sharedNodes:  porv = c.rootPosition()
else: porv = c.rootVnode()

while porv:
..&lt;&lt; do something &gt;&gt;
..if sharedNodes: porv.moveToThreadNext()
..else: porv = porv.threadNext()

With iterators the code would be:

for p in porv.allNodes_iter():
..&lt;&lt; do something &gt;&gt;

In other words, the iterator hides most of the messy details of the transition.  

2. Iterators make the position class safe to use.

Iterators create a safe policy for using positions.  This policy is:

** Only methods in the position class should call p.moveToX methods **

Recall that the p.moveToX methods create the potential for somewhat subtle bugs involving positions that "magically" change when code isn't expecting it.  But iterators remove the need for "user code" (code outside the position class) to call the moveToX routines.  So I only need to verify that p.moveToX methods are correct inside the position class.

This is a _big_ step forward for 4.2.  It removes all my doubts that the position class can be used safely throughout Leo.  The position class now rests on a firm foundation: The position class needs only to maintain the following invariant:

** A position method must make a copy of p before calling p.moveToX**

N.B.  The copies made inside the methods of the position class are "innocent".  That is, they will be created seldom enough that they won't stress the storage allocator.  In particular, generators create exactly one copy throughout the entire tree traversal.
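
To illustrate the invariant and the "exactly one copy" remark, here is a toy sketch in Python 3.  Node, Position and nodes_iter are hypothetical stand-ins over a single-rooted tree, not Leo's classes; only copy and moveToThreadNext follow names used above.

```python
class Node:
    def __init__(self, head, children=()):
        self.head = head
        self.children = list(children)

class Position:
    def __init__(self, node, stack=()):
        self.node = node
        self.stack = list(stack)   # (parent, child index) pairs

    def __bool__(self):
        return self.node is not None

    def copy(self):
        return Position(self.node, self.stack)

    def moveToThreadNext(self):
        if self.node.children:
            self.stack.append((self.node, 0))
            self.node = self.node.children[0]
            return self
        while self.stack:
            parent, i = self.stack.pop()
            if i + 1 != len(parent.children):   # a next sibling exists
                self.stack.append((parent, i + 1))
                self.node = parent.children[i + 1]
                return self
        self.node = None
        return self

    def nodes_iter(self):
        p = self.copy()            # the only copy made during the traversal
        while p:
            yield p
            p.moveToThreadNext()   # moveToX is called only inside the class

root = Node("root", [Node("A", [Node("A1")]), Node("B")])
start = Position(root)
heads = [p.node.head for p in start.nodes_iter()]
print(heads)
assert start.node is root          # the caller's position never moved
```

The generator hands out the same position object at every step, so callers that need to keep a position beyond the current iteration must copy it themselves.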

Edward

P.S. The position class will define at least the following iterators:

p.allNodes_iter():  steps through the entire tree, starting at p.c.rootPosition().
p.subtree_iter(): steps through all nodes of p's tree in threadNext order.
p.children_iter(): steps through p's children.
p.parents_iter(): steps through p's parents: parent, then grandparent, etc.
p.siblings_iter(): steps through all of p's siblings.
p.visNodes_iter(): steps through all visible nodes, starting at p.c.rootPosition().

I'll add other iterators as needed.

Edward
</t>
<t tx="ekr.20050421214921.16">New plans for 4.2.

I am going to change direction for 4.2.  The major feature of 4.2 will be the new shared tnode scheme.  There are four reasons for this change:

1.  If I knew I only had six months to live I would focus on the following projects:

- Shared tnodes:  makes Leo's internal data structures as efficient as possible.
- Incremental drawing of the screen: Leo allocates only visible outline widgets.
- @file-thin:  makes Leo useable in typical work environments.
- The unification of @file and @root nodes:  a major conceptual simplification.

The shared tnode project fundamentally changes Leo's internals.  The last two projects will require big changes to the format of Leo's derived files.  I want to work on the fundamentals first.

2. Last week I revised the shared-tnode code. (I didn't plan this: an aha came to me in the bathtub, and then I just couldn't let go of the code :-) For the first time, the code is a) complete and b) simple enough to warrant serious consideration. To put the code on the shelf now would be wrong.  Moreover, converting Leo to use this code is the perfect place to write unit tests that exercise the vnode and tnode classes.

3. Now that shared tnodes look feasible, actually getting the code to work becomes a matter of completing an already-started project.  Such projects have high priority, especially when they are as fundamentally important as the shared tnode project.   Moreover, I am probably the only person in the world who can finish the shared tnode project properly.  Finishing this project is thus crucial to Leo's future.

4.  Spending months on spiffy configuration code just isn't a high enough priority now.  4.2 _will_ fix the problems with the present configuration scheme, but that should take only a few days.  The present plans:

- Leo will write leoConfig.txt only when settings actually change.
  This will prevent most conflicts that arise when more than one copy of Leo is open.
- I shall fix the bugs in the Apply Settings command.  Not stupidly writing leoConfig.txt will help.
- Leo will update leoConfig.txt so as to keep comments.  This implies not using the ConfigParser for writes.
- Leo will search in more than one place for settings:  per-file settings will override more general settings.
- Leo will keep track of default settings, probably in default settings files.
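
The "search in more than one place" plan above could look something like this in Python 3.  This is only a sketch using the standard library's ChainMap; the three layers and the tab_width setting are made up for illustration (use_gnx is the setting from leoConfig.txt).

```python
from collections import ChainMap

# Assumed layers, from most general to most specific.
defaults = {"use_gnx": "1", "tab_width": "4"}
per_user = {"tab_width": "8"}
per_file = {"use_gnx": "0"}

# Lookups try per_file first, then per_user, then defaults.
settings = ChainMap(per_file, per_user, defaults)

assert settings["use_gnx"] == "0"     # the per-file value wins
assert settings["tab_width"] == "8"   # falls through to the per-user value
```

Keeping the defaults as the last layer means a lookup never fails for a known setting, and no layer has to be rewritten just because another one changed.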

These changes will give people about 75% of everything they have ever asked for, without massive amounts of gui-specific code.  Indeed, replacing one outline (leoConfig.leo) with another (the gui) does not seem like such a big step forward.

Coming posts

Several posts will discuss the shared tnode scheme.  A design document will discuss the overall design.  A separate document will discuss code-level details.  In no other project are such details so important.  This is maybe the sixth major revision of the code I've worked on over the years.  This is the first time I am reasonably confident that the code can be made to work.  There are a _lot_ of details that must be juggled, and the time to debug is now while I remember them all :-)

The shared tnode scheme will require substantial changes to Leo's core, to all plugins and to all scripts.  As you may know, I have considered several "clever" schemes that supposedly reduced the changes needed to code.  I now believe that such "clever" schemes would be unwise:  much better to "say what we mean" and be done with it.  I'll discuss this issue in detail in the coming posts.  Notice that the new scheme will require massive testing no matter what kind of changes we make, so requiring changes throughout Leo's code really doesn't change the fundamental situation much.

There will be ample time to comment on the new plans, new design and new code.  4.2 will be another "major" release--probably comparable in length to 4.1.  We are talking months rather than weeks.  I plan to make Leo's internals "perfect" for 4.2 so that no other major changes will (or even could) be made.  In other words, 4.2 will contain no compromises.

Edward

P.S.  There is a close connection between shared-tnodes and incremental drawing of the outline pane.  Indeed, LeoPy.leo contains prototype code for both projects.  Moreover, outlines big enough to make shared-tnodes essential also make incremental drawing essential.  Incremental drawing is almost, but not quite, good enough to use at present.  I plan to remove all flaws in the incremental drawing scheme for 4.2.

P.P.S.  I have started work on the shared-tnode scheme in a separate sandbox.  I shall soon upload Leo42.leo to cvs.  This will contain the work in progress.

P.P.P.S.  There has been lots of talk lately about more flexible tangling/untangling.  I'll implement both schemes in the 4.3 release.  4.3 will be another major release:  it will support both @file-thin and the unification of @file and @root nodes.  These two projects will be considered together because they both imply important changes to derived files.  It would be stupid to change the format of derived files twice if only one change would suffice.

P.P.P.P.S Using jEdit's xml description files to drive syntax coloring is still a high priority.  I would like to have it be part of 4.2.

EKR
</t>
<t tx="ekr.20050421214921.17">New read logic works

Leo now reads .leo files as discussed in the "little gem" posting, creating an outline in which clones share their child subtrees.  Leo passes mini tests that run when running with the new code in g.sharedNodes mode.  More unit tests are coming.

This is a big step forward.  Running the "new" Leo now automatically uses the new data structures.  There is no need to run c.convertTreeToSharedNodes or to switch between g.sharedNodes = true and g.sharedNodes = false.  This makes the testing process more reliable and less confusing.

The present code is on cvs.  A functional (pre-alpha!) version of Leo 4.2 should be ready in maybe 5-10 days.  I'll post that version to Leo's Files section.

Edward
</t>
<t tx="ekr.20050421214921.18">Aha: positions &amp; vnodes _can_ be compatible

Yesterday I realized that Leo's position class _can_ be made compatible with the present vnode class.  This is a big breakthrough and it probably is, really and truly, the last piece of the puzzle in the saga of converting Leo from vnodes to positions.

Recall that the reason why I did not want the position class to be compatible with the vnode class was that the following code would create a large number of copies of positions:

p = c.rootPosition() # same as c.rootVnode()
while p:
..&lt;&lt; do something with p &gt;&gt;
..p = p.threadNext()

Indeed, p.threadNext must create a copy of self: otherwise positions will change in unwanted situations.  Moreover, methods called by p.threadNext could create further copies.

However, iterators change matters in a crucial way.  With iterators, Leo's core won't execute the kind of loop shown above.  Instead, Leo's core will use:

root = c.rootPosition()
for p in root.allNodes_iter():
..&lt;&lt; do something with p &gt;&gt;

That is, iterators will replace most calls to next(), threadNext() etc., so any _remaining_ calls to these routines do not have to be efficient!  In other words, iterators are the place to optimize tree traversals.  The old-style code will continue to work, but in a non-optimized way. This means that most of Leo's core code will continue to work without modification!

To increase compatibility further I will do the following:

1.  Add proxy methods to the position class that simply call the corresponding vnode method.   For example,

def isDirty(self): return self.v.isDirty()

In other words, the intention is to have the position and vnode classes support the same methods.

The only exception to this is that the various v.link and v.unlink methods will _not_ be a part of the position class. These should be considered private methods, to be used only within the vnode class or by code (such as the undo/redo code) that really knows what it is doing.  In particular, scripts should never need to call these v.link and v.unlink methods.  Scripts _can_ do so using p.v.link or p.v.unlink: I just want to make sure that code never calls these methods by mistake.

2. Add a t ivar to the position class: self.t = self.v.t.  This means that p.t means the same thing as p.v.t.  This small change will save me a great deal of work during the transition.

3.  The vnode class will implement v.setCurrentVnode and v.currentVnode in terms of p.setCurrentPosition and p.currentPosition.  Leo's core will use p.setCurrentPosition and p.currentPosition exclusively.  Furthermore, during the transition p.setCurrentPosition and p.currentPosition will return _vnodes_ if g.sharedNodes is false.

4. Now that compatibility with old scripts is important,  I am going to revert to true and false in Leo's core code rather than True and False.  This means that the new "standard" way to import leoGlobals is:

import leoGlobals as g
from leoGlobals import true,false

However, the _old_ way will work _exactly_ as before.  That is, scripts that do:

from leoGlobals import *

should continue to work!

Conclusion

The new aha is extremely important: most scripts and plugins will continue to work with minimal changes in Leo 4.2.

Unchanged code is unoptimized code: replacing traversal loops by iterators is the way to reduce stress on Python's storage allocator.

Leo42.leo is a thorough test environment for this new/old approach.  I don't expect any major problems.

Edward

P.S.  My plan of action is as follows:

- Changes in Leo42.leo are marked with "if g.sharedNodes".  I'll review all the changed code to see whether it makes sense to revert to the old way of doing things.

- Change calls to c.currentVnode to c.currentPosition.

- Change calls to c.rootVnode to c.rootPosition.

- Convert old-style traversal loops to iterator loops.

P.P.S.  Even though the position and vnode classes will implement similar methods, there won't be much room for confusing the two classes.  Leo's core denotes positions by p and vnodes by v.  I would recommend doing the same in all new scripts and plugins.  However, for compatibility it is important that code such as the following continue to work:

v = rootVnode() # returns a _position_
while v:
..whatever
..v = v.threadNext()

EKR
</t>
<t tx="ekr.20050421214921.19">Progress report 3/2/04

I now have a work flow that makes creating unit tests easy and natural.  This is a big step forward for Leo and for me personally.

The idea is to start with "mini tests" in test.leo (or test42.leo).  These are simply fairly short scripts such as the following:

from leoGlobals import *

c = top()
c.convertTreeToSharedNodes()
c.disableSaveCommands()

print ; print "checking next/back links"

p = c.rootPosition() ; count = 0
while p:
..back = p.getBack()
..next = p.getNext()
..if back:
....count += 1
....assert(back.getNext() == p)
..if next:
....count += 1
....assert(next.getBack() == p)
..p.moveToThreadNext()

print ; print "checked %d nodes" % count

I put each mini test in its own node, and run these scripts with the Execute Script command.  Some notes on this process:

-  Creating a new mini test just involves cutting and pasting code from a previous mini test to a new node.  The new node is essentially a new test case.  Nothing could be simpler.

-  As more mini tests get written the common parts of the code naturally get refactored into methods.  That's how c.convertTreeToSharedNodes() and c.disableSaveCommands() came to be.

-  The more mini tests get written, the less convenient it becomes to rerun these tests.  So there is a natural incentive to create actual unit tests from these mini tests.  Defining a new unit test case repays itself over and over again because it's so easy to rerun unit tests.

- As you can see, the mini test above uses asserts to do the checking.  This is good unit-test style, and turning mini tests into unit tests will be easy:  just eliminate the print statements.  The test above simply checks that the outline in test42.leo does itself pass the tests, so the only setup required is to put test cases somewhere in test42.py.

- I plan to move the @file leoTests.py node from LeoPy.leo to test.leo and test42.leo.  Putting all the test stuff in one place will simplify the process of creating new test classes.

Mini tests have revealed their first bug

I now have small but powerful set of mini tests that demonstrate that Leo's new position class fulfills all expected properties.  The consistency check of the p.moveToThreadBack and p.moveToThreadNext methods revealed the first bug in the position class.  I had to rewrite the p.moveToThreadBack routine using a new helper method.  I won't give the code here:  it's in Leo42.py on cvs.

Mini tests inspired new convenience methods

The initial versions of the mini tests had quite a number of lines of code such as:

parent = p.copy().moveToParent()

I quickly got tired of this verbiage, so I decided to add the following convenience routines in the position class:

def getBack          (self): return self.copy().moveToBack()
def getFirstChild    (self): return self.copy().moveToFirstChild()
def getLastChild     (self): return self.copy().moveToLastChild()
def getLastNode      (self): return self.copy().moveToLastNode()
def getNext          (self): return self.copy().moveToNext()
def getNodeAfterTree (self): return self.copy().moveToNodeAfterTree()
def getNthChild    (self,n): return self.copy().moveToNthChild(n)
def getParent        (self): return self.copy().moveToParent()
def getThreadBack    (self): return self.copy().moveToThreadBack()
def getThreadNext    (self): return self.copy().moveToThreadNext()
def getVisBack       (self): return self.copy().moveToVisBack()
def getVisNext       (self): return self.copy().moveToVisNext()

These should look familiar :-)  It's OK to have these routines as long as:

-  Users know that they create copies of positions.
-  They do _not_ have the same names as the corresponding vnode routines.

BTW, now that the p.moveToX routines return self, these routines don't have to use the idiom: "return self and self.copy().moveToX".

Conclusion

This has been a big day for Leo.  I created substantial tests of the position class in a natural manner, and there is now a simple and clear way to convert mini tests to unit tests.  Unit testing revealed bugs in the code and suggested useful convenience routines.  It is now easy and natural for Leo to get the benefits of unit testing.

To repeat, what is important about the new work flow is:

1.  It is simple and natural to create mini tests.
2.  There is a substantial incentive to convert mini tests to full-blown unit tests.

The file test42.leo presently contains the mini tests I created today.  I'll add some more tests tomorrow.  Leo is almost ready for the grand transition!

Edward

P.S.  This paper might form the basis of a chapter in "The Book of Leo" called "Leo and Unit Testing".

P.P.S. I had budgeted about two weeks to do today's work.  Once again, the combination of Leo and Python is remarkably effective.

EKR
</t>
<t tx="ekr.20050421214921.2">4.2 looks solid

For the last several days I have been fixing nits.  At present Leo appears at least as solid and easy to use as any previous version of Leo.  In several ways Leo 4.2 now works better than previous versions:

- Moving nodes containing cloned subtrees is noticeably faster than before.
- Reading and writing files is approximately as fast as before.
- Leo redraws the screen better than ever before: some duplicate redraws are gone.  In particular, Leo handles mouse clicks in headlines more smoothly.

Leo now apparently handles resources frugally in all situations, especially when drawing the screen and reading and writing files.  Yesterday I squashed three of the weirdest performance/resource allocation bugs I have ever seen.  To have them all appear in one day was truly bizarre.  See yesterday's diary entries for details...

Reading and writing of .leo files could always be faster.  However, moving to @file-thin-wait will deliver more payoff than tweaks could possibly deliver.  The idea is simple:  Leo won't load @file-thin-wait nodes until the user actually uses them.  This will make the initial load of a .leo file almost instantaneous.  As always, cross-file clones complicate matters substantially.  We shall see how this all works out.

The todo list looks like this:

1. Complete the transition by replacing "confused" code by code that explicitly uses positions.  I'll probably write some scripts to help do this.

2. Add support for @file-thin.  Part of that support will include deferred loading of files (@file-thin-wait).

3. Add support for different kinds of undo granularity.

4. Testing, especially unit tests.  Unit testing continues to show its value and I would like to add tests for many more commands.

5. Documentation.  I'll probably update the documentation in a few days so people can take full advantage of 4.2.

At present 4.2 is _way_ ahead of schedule.  I suspect it will be a month or two before 4.2 final goes out the door.

Edward

P.S. Leo 4.2 will require Python 2.2 or later.  This is because 4.2 makes heavy use of iterators.  Leo will never again be compatible with Python 2.1.  That being so, I have converted the vnode, tnode and position classes to new-style classes.  This may bring some slight performance benefits.

P.P.S.  I'm not so sure 4.2 will optimize screen redraws further by allocating only visible or "nearly visible" widgets.  This doesn't seem to be a big deal in practice, and I don't want to delay important projects to mess with this.  It's pretty clear at this point that this project is feasible, so there doesn't seem to be any real reason to do it now.

EKR
</t>
<t tx="ekr.20050421214921.20">I have just uploaded Leo42.py to cvs. This shows yesterday's work.  It was a big day:

- Completed initial unit testing of the position class.  I added some more routines and those will need better tests.

- Added a sharedNodes global in leoGlobals.py.  This controls whether Leo uses the new or old code, i.e., whether Leo uses the position class and makes only a single copy of cloned trees.

- Added c.currentPosition and similar routines.

- Changed the code obviously needed to get Leo started up with the new code.

- Now the real work began:  I fired up the new leo.py and waited for Python to throw exceptions.  I didn't have to wait long :-)  I made dozens if not hundreds of changes.  Leo42 now starts up properly, draws the screen, handles keystrokes in the body and outline and can insert nodes.  Oh yes, I can select the menu and the menu enablers don't crash.

Nothing else works, including undo, read/write, import/export and most commands.  Use with extreme caution:  do NOT save files!

Notes

1. My general plan, done more by accident or instinct than forethought, was to insert new code rather than replace code.   In the case of small routines such as the menu enablers, the routine looks like:

def x():
..if sharedNodes:
....all the new code
..else:
....all the old code

For more complicated routines only parts of the code are changed.  Again, those parts are marked with "if sharedNodes"

2. One result of this plan, again unplanned, was that I replaced a lot of v arguments to methods by an argument called porv.  The idea is that in the old code porv corresponds to v, and in the new code porv corresponds to p (a position).  This scheme actually looks like a good intermediate step.  The reason is that code often needs _both_ porv and v, so replacing v by p wouldn't work:  the code would have to replace v sometimes with p and sometimes with v.  This would be error prone.  Using porv preserves some of the "history" of the code.

So the plan is to do the complete conversion by adding code and using porv as needed.  When all is complete I shall eliminate the old code and replace porv by p throughout.

3. This is a _big_ project.  It looks like perhaps 50% of all of Leo's lines may eventually change.  One happy consequence of this fact is that it becomes feasible to make other big changes that would themselves require lots of changes.  So as part of 4.2 I plan to replace "from leoGlobals import *" by "import leoGlobals as g".  This will fix the namespace pollution created by leoGlobals.  However, it means that all calls to routines in leoGlobals must be preceded by g.  For instance, g.match.

I am also starting to wonder whether the app proxy, defined in leoGlobals.py, is really such a good idea.  It was created to solve some startup problems.  However, I discovered yesterday that a test such as "if app.sharedNodes" will not work in the top level of most modules because the app class may not have been created.

So perhaps using g.app is the right approach.  To make this work, Leo's startup code will have to be very careful about when things get imported.  This is still very much an open question, and it's "nasty" in the sense that a lot of code might have to change before the question is properly resolved.

The big question and second thoughts

The _big_ question, and the question that was dogging me all yesterday, is this:  am I doing the right thing by messing with all of Leo's code?  There are several aspects to this question.

1. There is no doubt that the new way "intrudes" itself into all parts of Leo.  Not just Leo's core must change, but also all plugins and all scripts, and also a considerable part of Leo's documentation.  This is most  unsettling.  Yesterday this question was at or just below the surface of my consciousness all day.

2.  Besides the "raw" changes required, using "moveable" positions has the potential to introduce subtle bugs.  In particular, one must remember to call p.copy() in a method if that method calls any p.moveToX method.  Usually this is easy to do.

The most subtle version of this bug I've seen so far was in the code that bound callbacks in the position class.  Those callbacks had to be copies, otherwise clicking a headline invoked the callback and the c.currentPosition changed!  I've just realized that setting c.currentPosition should create a copy (returning c.currentPosition already creates a copy).
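
The missing-copy bug can be shown with a minimal sketch.  The Position class below is a hypothetical stand-in (a cursor over a flat list), not Leo's position class; only the names copy, moveToNext and getNext follow the routines discussed in these postings.

```python
class Position:
    """Hypothetical stand-in: a movable cursor over a flat list of headlines."""
    def __init__(self, nodes, index=0):
        self.nodes, self.index = nodes, index

    def copy(self):
        return Position(self.nodes, self.index)

    def moveToNext(self):
        self.index += 1
        return self                      # moveToX methods return self

    def getNext(self):
        return self.copy().moveToNext()  # safe: only the copy moves

    def headline(self):
        return self.nodes[self.index]

current = Position(["a", "b", "c"])

alias = current                          # no copy: two names, one object
alias.moveToNext()
moved_by_accident = current.headline()   # current has moved as well

safe = current.copy()
safe.moveToNext()
still_here = current.headline()          # current is unaffected this time
```

Binding a callback to alias rather than to a copy is the same mistake: whoever moves the shared object moves it for everybody.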

3.  Despite these drawbacks, I don't see a better alternative.  Indeed, the porv hack clearly shows that the new code must preserve and highlight the distinction between positions and vnodes.  Any scheme that would make the position class a proxy for the vnode class would muddy this distinction.  This confusion would affect every line of Leo's code.

Keeping the positions clearly separate from vnodes has many advantages.  For example, I realized that what are now the "outer level" vnode methods that insert nodes, delete nodes, etc. must be part of the position class.  However, the "helper functions" that link and unlink nodes should remain in the vnode class.  Indeed, these helper functions only use the v._next, v._back, v._parent and v.t._firstChild links, so the natural place for these functions is in the vnode class.  This kind of small aha would be harder if the position class tried to support most vnode methods.

It is true that using positions is a bit more tricky than using vnodes.  Why should we be surprised?  Positions are naturally more complex than vnodes.  However, it's not all that hard to find bugs arising from missing calls to p.copy().  I'll do the real debugging work in Leo's core.

I don't expect users to suffer much.  If you are worried about your scripts, you can always use the p.getX routines to make copies of all positions.  That is about as safe as you can imagine.  But if you don't want to generate unnecessary copies of positions there is an easy way to do that.

Finally, I would like to follow Bernhard Mulder's suggestion that Leo could support various iterator protocols.  Leo could use two such protocols in particular:  p.allNodes and p.allParentsOf.  I'll have to see how feasible this is.  If it can be done it would be a nice improvement.

Conclusions?

I now know in my bones, literally my arms, wrists and elbows, just how much work the "shared nodes" project will be.  I'll probably do less work today to let my body recover.

In spite of the short-term pain of converting to use positions, I think the benefits are real.  The advantages of making positions explicit in the code (saying what we mean) far outweigh the disadvantages.   Leo's core code, especially read/write/import/export code, will have to be thoroughly tested to make sure that no subtle bugs can ever cause problems.  But that has to happen in any case.

Again, I do not see a better way.  If you see a simpler way please let me know immediately :-)

Edward
</t>
<t tx="ekr.20050421214921.21">Design of shared tnodes

This post discusses the overall design of the shared tnode scheme.  A separate post will discuss the code issues.  Such code issues are more important for this project than usual.

Here are the most important parts of the new shared-tnode design:

1.  Data structures

Kent Tenney made the crucial discovery that the head of a shared tree looks like a tnode.  Without Kent's crucial contribution none of the following would have been possible.

- A "node" consists of a vnode-tnode pair.
- Cloned nodes have different vnodes that share the same tnode.
- Vnodes contain parent, back and next links.
- Tnodes contain firstChild links and headline and body text.

Several vnodes may point at the same tnode, and this tnode t forms the (logical) root of a tree of _vnodes_ whose root is t.firstChild.  When traversing the entire outline, this subtree will be visited once for every vnode v such that v.t is t.  An example should make this clearer:

a) When we create a node for the first time we create a vnode-tnode pair.  The vnode contains the links that link the pair into the outline.  The tnode contains the link to the node's subtree.

b) When we clone a node, we make a copy of the original vnode, and link that into the outline using the parent, back and next links.  We do _not_ copy the tnode: we simply point the new vnode at the tnode used by the original vnode.  Therefore, the new vnode will automatically use the same subtree and the same headline and body text as the original vnode.

You can think of a node as consisting of a "variable part" (the vnode) and a "shared part" (the tnode).  To repeat: vnodes contain parent, back and next links, while tnodes contain firstChild links.  This makes everything "just work".  In particular, we can move vnodes around the screen by just changing the parent, back and next links.  We do _not_ have to change the firstChild link: that link does _not_ change when vnodes move.  When the firstChild link _does_ change, the effect is "felt" by all the vnodes that point at the tnode.

I think of vnodes now as "virtual" nodes rather than "visual" nodes.  vnodes are the part that changes when a node is cloned.  tnodes are the part that doesn't change when a node gets cloned.
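
A minimal sketch of the vnode/tnode split may help.  The classes below are hypothetical and much smaller than Leo's real vnode and tnode classes; the vnodeList attribute is an assumption borrowed from later postings.

```python
class TNode:
    """The shared part: text plus the link to the subtree."""
    def __init__(self, headline, body=""):
        self.headline = headline
        self.body = body
        self.firstChild = None   # shared by every vnode pointing here
        self.vnodeList = []      # all vnodes that point at this tnode

class VNode:
    """The variable part: links that place the node in the outline."""
    def __init__(self, t):
        self.t = t
        self.parent = self.back = self.next = None
        t.vnodeList.append(self)

def clone(v):
    """Make a new vnode that shares v's tnode; the tnode is not copied."""
    return VNode(v.t)

a = VNode(TNode("A"))
b = VNode(TNode("B"))
a.t.firstChild = b           # B is A's only child
a2 = clone(a)

# A change made through one clone is seen through the other.
a2.t.headline = "A (renamed)"
```

Here a and a2 have their own parent, back and next links, yet a.t.firstChild and a2.t.firstChild are the same vnode b.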

vnodes no longer correspond exactly to headlines in the outline pane.  The drawing code will have to create nodes (I think I'll call them xnodes) to represent "real estate" on the Tk canvas (or widget in other gui's).  The drawing code will create and destroy xnodes as needed: this is the reason that the "incremental drawing" project is intimately related to the shared-tnode project.

[Note: the drawing code actually creates Tk widgets for screen items, but nothing like xnodes]

2. Positions

A vnode is no longer sufficient to locate a "spot" in the tree traversal.  Indeed, the first child of a cloned vnode has no unique parent.  Therefore, we need the concept of _position_ in order to keep track of where we are in the tree traversal.  A position is much like an iterator.  For example, suppose we have the tree:

A' (1)
..B
C
A' (2)
..B

As usual, A' indicates that A is a clone.  Note that both nodes B are the _same_ vnode.   Therefore, B has two parents.  The first time the traversal wants to know B's parent, the answer should be A(1).  The second time, the answer should be A(2).  The full traversal is  A(1), B, C, A(2), B.

Leo will implement a position as a vnode plus a stack of parent cloned vnodes.  This stack tells us where to "go next" when we reach a vnode that does not have a unique parent.  When we move to the first child of a cloned node c, we push c on the stack.  When we ask for the parent of a node that does not have a unique parent we pop the stack to get the particular parent in the tree traversal.

So far, so good.  This is a simple, robust scheme.
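
The traversal of the example tree can be sketched as follows.  This is a simplified illustration, not Leo's code: the classes are hypothetical, and for brevity the sketch pushes _every_ parent on the stack, not just cloned parents.

```python
class TNode:
    def __init__(self, headline):
        self.headline = headline
        self.firstChild = None

class VNode:
    def __init__(self, t):
        self.t = t
        self.next = None

class Position:
    def __init__(self, v):
        self.v = v
        self.stack = []          # parent vnodes, outermost first

    def moveToThreadNext(self):
        """Advance in outline order; self.v becomes None at the end."""
        if self.v.t.firstChild:
            self.stack.append(self.v)   # remember the parent we came through
            self.v = self.v.t.firstChild
            return self
        # Climb until some node on the path has a next sibling.
        while self.v.next is None and self.stack:
            self.v = self.stack.pop()
        self.v = self.v.next
        return self

# Build the outline  A'(1) [B], C, A'(2) [B]  with a single B vnode.
tA, tB, tC = TNode("A"), TNode("B"), TNode("C")
a1, a2, b, c = VNode(tA), VNode(tA), VNode(tB), VNode(tC)
tA.firstChild = b
a1.next, c.next = c, a2

p = Position(a1)
visited = []
while p.v:
    parent = p.stack[-1] if p.stack else None
    visited.append((p.v.t.headline, parent))
    p.moveToThreadNext()
```

The same vnode b is visited twice; the stack, not b itself, says that its parent is a1 the first time and a2 the second time.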

3.  Traversing trees.

At first glance it would seem good to use the same interface as always to traverse the tree.  We can do this "faux-cleverly" as follows:

- c.rootVnode() and c.currentVnode() will return positions.
- Traversal routines, like v.threadNext(), v.back(), v.next() and v.firstChild() will return positions.
- Positions will be proxies for the vnode class, so if v is a position, then v.headString, v.bodyString, etc are valid.

But all is not as simple as it seems...

4. Memory considerations.

There is a problem with this supposedly clever scheme.  To be safe, the following tree traversal must create at least one position for every node of the tree:

v = c.rootVnode() # note: v is a position
while v:
..&lt;&lt;do something with v&gt;&gt;
..v = v.threadNext()

The reason is straightforward.  The section &lt;&lt; do something with v &gt;&gt; might contain the following kinds of statements:

if v.parent():
or
v2 = v.back()

This kind of code will break the traversal if the effect of v.parent() or v.back() is to _move_ the position.  Instead, these routines must return _copies_ of the position.  But in that case, every call to v.threadNext() will also create a copy of the position.

[Note: The following concern turned out to be completely justified.  Eliminating extra positions was very important.]

Alas, positions are not mere pointers (object references), such as are presently returned by the "traversal routines" v.threadNext(), v.back(), v.next() and v.firstChild().  Instead, positions are fairly "heavy" objects: they contain an arbitrarily large stack of information.  We do not want tree traversals  to create one or more positions for each node in the entire tree.

This starts to be worrisome.  The reason we want shared tnodes in the first place is to reduce the time needed to create, destroy and move nodes.  In other words, the shared tnodes scheme is an optimization.  It seems strange to replace an inefficiency in time with an added burden on the storage allocator.  What are we to do?

The next paper will discuss the solution to this problem and other code-level details that heavily influence the design.

Edward
</t>
<t tx="ekr.20050421214921.22">@nocolor
Small code--big aha

I have just discovered a code improvement that puts the 4.2 code on a rock-solid foundation.

In the old vnode-oriented world there was a  "harmless" inefficiency in v.lastNode:
@color

# Returns the last node of v's tree, or v itself if v has no descendents.
def lastNode (self):
    v = self ; level = v.level()
    result = None
    while v:
        result = v
        v = v.threadNext()
        if not v or v.level() &lt;= level:
            break
    return result
    
@nocolor
In the position-oriented world of Leo 4.2 this kind of code causes no end of
difficulties: it generates lots of copies of positions. Like this:
@color

def oldLastNode (self): # The old, inefficient way.
    p = self.copy() ;  result = p.copy()
    level = p.level()
    while p:
        p.moveToThreadNext()
        if not p or p.level() &lt;= level:
            break
        result = p.copy() # Very expensive.
    return result

@nocolor
This wasn't so bad in itself, but it caused me to rewrite p.moveToThreadBack to
avoid these extra copies, and that led to really horrible code.

But there is a _much_ better way to define moveToLastNode!
@color

def moveToLastNode (self): # Huge improvement for 4.2.
    p = self
    while p.hasChildren():
        p.moveToLastChild()
    return p

@nocolor
As mathematicians say, "this is the way it is written in The Book".

So now p.moveToThreadBack becomes dead simple again:
@color

def moveToThreadBack (self):
    p = self
    if p.hasBack():
        p.moveToBack()
        p.moveToLastNode() # No longer expensive.
    else:
        p.moveToParent()
    return p

@nocolor
I can't tell you how important these code changes are: they are the difference
between reliable and unreliable code.

Edward
</t>
<t tx="ekr.20050421214921.23">Short status report 3/5/04

1.  I am now convinced that the position class will be safe to use.  See the posting called "Iterators make positions safe" in the Developers Forum.  NOTE: 4.2 will require Python 2.2 because 4.2 will use iterators.

2.  Leo's code now accesses all constants and functions x in leoGlobals using g.x.  I used a script to do most of the work.  Leo's code now does:

import leoGlobals as g
from leoGlobals import True,False # See point 3 below.

instead of

from leoGlobals import *

N.B.  Scripts may still use the old way of accessing leoGlobals: the choice is yours.  Furthermore, scripts and plugins are free to import other frequently used attributes from leoGlobals.  For example:

import leoGlobals as g
from leoGlobals import True,False # See point 3 below.
from leoGlobals import es,trace,match # whatever you use frequently.

BTW, Leo no longer uses the app proxy class in leoGlobals.py.  Instead, Leo carefully initializes the app attribute in leoGlobals.py.  Again, plugins and scripts may access app as before (or g.app if using "import leoGlobals as g").

3.  I am about to convert true and false to True and False throughout Leo's core.   The proper way to initialize a module will then be:

import leoGlobals as g
from leoGlobals import True,False

N.B: This will work in Python 2.2, even though True and False are first defined in Python 2.3.

Edward
</t>
<t tx="ekr.20050421214921.24">Status report 3-11-04

The version of Leo42.leo on cvs boasts the following features:

- Position class is complete and passes substantial unit tests.

Both the vnode and position classes will have similar link/unlink and insert methods.  The read logic uses the vnode counterparts of the position methods.  This makes sense: positions don't exist during the read logic.

- Leo42 can read .leo files correctly.

- Screen drawing, node selection, and expanding and contracting nodes all work.

- All move, promote and demote commands work and are undoable.

- Typing in the headline and body pane work and are undoable.

Still to do:

- Write .leo files.

- Write derived files.

- Debug and test all remaining commands.

A fully functional pre-alpha version of Leo 4.2 should be ready in about a week.

Edward
</t>
<t tx="ekr.20050421214921.25">Status report 3-23-04

I have been using the Leo 4.2 code base for several days without serious incident.  The 4.2 code base seems remarkably robust and there are now no longer any serious difficulties in using Leo.

4.2 alpha 1 is getting close.  The following will happen before alpha 1:

- Replace v.parents by  v._parent.t.vnodeList and add a v.parents method (and unit test).
- Speed up menu enablers.
- (Maybe) implement undo granularity setting.

Edward

P.S.   @file-thin should be ready about a week after alpha 1 goes out the door.

EKR
</t>
<t tx="ekr.20050421214921.26">@nocolor

Eating my own dog food: status report

Yesterday I switched over to using the Leo 4.2 code base.  Here is a status report.

1.  My concern, nay obsession, with generating the minimum number of positions has been fully justified.

With g.app.debug on, Leo generates enough positions to trigger the gc.  Even without g.app.debug on, Leo initially generated about 250 positions in the tree redraw code when redrawing Leo42.leo.  The exact number depended on how many nodes were visible.

I added a simple and highly effective optimization to most iterators: the iterator doesn't make a copy of the starting position if the iterator is going to generate an empty sequence.  For example, the __init__ method in p.children_iter() tests p.hasChildren() before calling copy().  This optimization reduces the number of positions generated by a factor of more than 2.
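
A minimal sketch of this optimization, under stated assumptions: the Position class below is a hypothetical stand-in built on a dictionary of child names, and the copies counter exists only to make the saving visible.  Only the names children_iter, hasChildren and copy come from the text above.

```python
class Position:
    copies = 0   # class-wide counter, for this demonstration only

    def __init__(self, tree, name):
        self.tree, self.name = tree, name

    def copy(self):
        Position.copies += 1
        return Position(self.tree, self.name)

    def hasChildren(self):
        return bool(self.tree.get(self.name))

    def children_iter(self):
        if not self.hasChildren():
            return iter(())              # empty sequence: no copy is made
        return self._children(self.copy())

    @staticmethod
    def _children(p):
        for child in p.tree[p.name]:
            p.name = child               # move the one copy; allocate nothing
            yield p

tree = {"root": ["a", "b"], "a": [], "b": []}
root = Position(tree, "root")
names = [p.name for p in root.children_iter()]
copies_for_root = Position.copies

leaf_children = list(Position(tree, "a").children_iter())
copies_after_leaf = Position.copies
```

Iterating the children of root costs one copy; iterating the children of a leaf costs none.  Since most nodes in a typical outline are leaves, the saving adds up quickly.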

Moving a node up, down, left or right now generates about 10 positions or so.  This is quite an improvement.  A bug in one of the iterators initially caused the Move Up command to generate about 20,000 positions!

2.  At present typing generates one or more positions for each character typed.  This is due to the undo logic, which generates a separate undo node (containing at least one position) for each letter.  This is probably the time to improve undo by consolidating sequences of typing operations.  Having undo/redo work on words rather than characters will reduce the memory overhead by a factor of n, where n is the average word size.  The undo logic generates a copy of the _entire_ body text for each undo node, so consolidating undo operations has the potential to save a _lot_ of memory.  Undo may be the reason why Leo has a hard time handling nodes containing a lot of body text.

3.  I received quite a shock at the start of the changeover.  Changes made in a clone were not marking as dirty @file nodes that were parents of other cloned nodes.  This caused Leo not to save some derived files when it should.

In previous versions of Leo, marking ancestor @file nodes dirty is handled by looking at all nodes joined to a particular node.  In 4.2, there are no join lists, so p.setAllAncestorAtFileNodesDirty must find some other way. I created the following helper routine:
@color

def findAllPotentiallyDirtyNodes(self):
    p = self
    # Start with all the vnodes in p's vnodeList.
    nodes = []
    newNodes = p.v.t.vnodeList[:]
    # Add nodes until no more are added.
    while newNodes:
        addedNodes = []
        nodes.extend(newNodes)
        for v in newNodes:
            for v2 in v.t.vnodeList:
                if v2 not in nodes and v2 not in addedNodes:
                    addedNodes.append(v2)
                for v3 in v2.parents: 
                    if v3 not in nodes and v3 not in addedNodes:
                        addedNodes.append(v3)
        newNodes = addedNodes[:]
    return nodes

@nocolor
This is one of those kinds of graph algorithms that one might study in school.  It creates a list of all nodes that might be an ancestor of p, or an ancestor of a clone of p, or an ancestor of one of the nodes already on the list.

This method is fast.  The loop is executed approximately n times, where n is the total number of parents of any p and any clone of p.  Worst case n could be very large, but the worst case will never happen.  The typical case is 5 to 50, which is no problem at all.

4.  Sharp-eyed readers will have noticed that p.findAllPotentiallyDirtyNodes uses a new data structure, namely v.parents.  Leo needs something like this to discover the parents of _all_ p's clones.  In other words, the position p can only give us the parents of p, not the parents of the clones of p (or the parents of other cloned ancestors of p).

I was in a bit of shock that such a data structure was needed.  I hacked the v.link and v.unlink so that they update v.parents properly.  Later, I realized that v.parents is equivalent to pv.t.vnodeList, where pv is _any_ parent vnode of v.

The present code clears the v._parent field of a vnode when the vnode has more than one parent.  This is done as a signal to pop the p.stack when trying to compute p.parent().  However, there is no need to do this.  Instead of using a null v._parent link as a signal, the code could look at len(v._parent.t.vnodeList).  If that length is greater than 1, v has more than one parent.  In any case, v._parent.t.vnodeList is equivalent to v.parents.

In short, Leo can do without the new v.parents list, provided that Leo doesn't clear the v._parent field of vnodes that have multiple parents.  Leo will just have to be sure not to "believe" everything the v._parent field seems to be saying.  Instead of being _the_ parent, the v._parent field will point to _some_ parent.
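
The equivalence can be sketched as follows.  The classes are hypothetical and minimal; the parents method is one possible way to compute the list on demand, assuming v._parent is never cleared.

```python
class TNode:
    def __init__(self, headline):
        self.headline = headline
        self.vnodeList = []       # all vnodes that share this tnode

class VNode:
    def __init__(self, t, parent=None):
        self.t = t
        self._parent = parent     # _some_ parent, not necessarily _the_ parent
        t.vnodeList.append(self)

    def parents(self):
        """All parent vnodes: every clone of whichever parent we point at."""
        if self._parent is None:
            return []
        return list(self._parent.t.vnodeList)

tA, tB = TNode("A"), TNode("B")
a1, a2 = VNode(tA), VNode(tA)     # A and its clone
b = VNode(tB, parent=a1)          # B records only one of its parents

all_parents = b.parents()
has_several_parents = len(all_parents) == 2
```

Nothing has to be kept up to date when nodes are linked or unlinked: the list of parents falls out of the vnodeList that the tnode maintains anyway.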

Conclusions

Leo appears quite solid.  I am now using the code derived from Leo42.leo for most of my work.  I can freely switch back and forth between using Leo 4.1 and Leo 4.2.

A lot of the code in Leo42.leo "just works" unchanged from previous versions of Leo.  This is due to the fact that code can use positions without being aware of doing so.  

The v.parents list is going to go away.  It's completely redundant.  Worse, the code that supposedly keeps v.parents up-to-date has not been thoroughly tested.

I'll be releasing Leo 4.2pre-alpha-1 in a day or so.

Edward
</t>
<t tx="ekr.20050421214921.3">4.2 alpha 1 now on cvs

I have now completely changed over to the 4.2 code base.  The new code base is on cvs: you should see it in an hour or two...

The code appears to be solid.  However use EXTREME caution when using this code.  Make frequent backups!  I always do.

Much of Leo's code uses the "confused" style: it appears to be using vnodes when in fact it is using positions.  This is fine for now.  Fairly soon I will write a script to help convert to the "explicit positions" style.

I'll be writing documentation for the new position class and especially iterators in the next few days.

The enabled plugins in LeoPlugins.leo work.  However, I haven't run the script that converts the code to the new "import leoGlobals as g" style.  There was some kind of weird reversion that happened recently.  I'll be redoing the lost work soon.  LeoPlugins.leo contains Paul Paterson's new typing completion plugin.  This isn't quite ready for prime time...

The Save command appears to be a little slower than before.  I am investigating.  However, the real speedup will be due to @file-thin.  I plan to have Leo load and save @file-thin trees only when the user actually opens those nodes.  This will result in a big speedup.  @file-thin should be ready in one to three weeks.  It will definitely be part of 4.2 final.

Adding more kinds of undo granularity turns out to be very complicated.  It is still scheduled for 4.2 final.

Edward
</t>
<t tx="ekr.20050421214921.4">A little gem

This paper presents a solution to another problem related to positions.  Unlike some of the other problems, I have always known a solution existed.  The happy surprise is how elegant the solution is.

The problem is this:  How will Leo read .leo files in 4.2?  That is, how will Leo create the tree of vnodes such that the cloned nodes will share their descendent subtrees?

Clearly, a solution exists: just create the tree as usual, then run a slightly modified version of the c.convertTreeToSharedNodes method that appears in Leo42.py.  This is a brute-force approach:  it creates a pre-4.2 tree of vnodes, then throws away duplicate portions of the tree.  I won't discuss this algorithm further:  it's slightly complicated and completely uninteresting.

There is a much better way:  Read the .leo file (that is, the file's &lt;v&gt; and &lt;t&gt; elements) as at present, with the following changes:

1. When we see a &lt;v&gt; element, create a vnode v (as at present), link v into the tree (as at present) and add v to v.t.vnodeList.  We determine v.t as at present, using an auxiliary data structure called atFile.tnodesDict.

2. If v.t.vnodeList contains another vnode, then all nodes in v.t.vnodeList are clones (see discussion later).  In that case, 

a) Skip all &lt;v&gt; elements that are embedded in the &lt;v&gt; element giving rise to v.  In effect, this skips all descendents of v.  In particular, we generate no further vnodes for any descendent &lt;v&gt; elements.

b) Set v.t._parent to None.  This in effect sets vi.t._parent to None for all vi in v.t.vnodeList because all such vnodes share the same tnode.  Setting v.t._parent to None indicates that the node is a clone. (See the p.moveToParent method).

That's _all_!  This is a little gem of an algorithm:

- it generates only those vnodes that will appear in the final outline.
- it operates in a single pass and will be no slower than (and usually considerably faster than) the present way of reading .leo files.
- it requires no new helper data structures.
- it uses the present code with only minor changes.
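
The steps above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not the code in Leo: nested &lt;v&gt; elements are modeled as (tnode index, nested elements) pairs, the names TNode, VNode and read_elements are hypothetical, the shared subtree is kept on the tnode as a plain list, and a boolean flag stands in for setting v.t._parent to None.

```python
class TNode:
    def __init__(self, index):
        self.index = index
        self.vnodeList = []   # every vnode that shares this tnode
        self.children = []    # hypothetical: the one shared list of child vnodes
        self.cloned = False   # stands in for the "v.t._parent is None" marker

class VNode:
    def __init__(self, t):
        self.t = t

def read_elements(elements, tnodesDict):
    """Create vnodes for a list of (tnode index, nested elements) pairs."""
    vnodes = []
    for index, nested in elements:
        t = tnodesDict.get(index)
        if t is None:
            t = tnodesDict[index] = TNode(index)
        v = VNode(t)              # step 1: create v and link it in
        t.vnodeList.append(v)
        vnodes.append(v)
        if len(t.vnodeList) == 1:
            # First sighting of this tnode: read the nested elements once.
            t.children = read_elements(nested, tnodesDict)
        else:
            # Step 2: v is a clone.  Skip its nested elements (2a)
            # and mark the shared tnode (2b).
            t.cloned = True
    return vnodes

# An outline with top-level nodes a', b', b', where each b' has the child a'.
tnodesDict = {}
tree1 = [("a", []), ("b", [("a", [])]), ("b", [("a", [])])]
top = read_elements(tree1, tnodesDict)
total = sum(len(t.vnodeList) for t in tnodesDict.values())
print(len(top), total)
```

The sketch creates three top-level vnodes and four vnodes in all: the second b' contributes no new child because its nested elements are skipped.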

Why does this work?  The intuition for the proof goes something like this:

1.  In the present (pre-4.2) data structures, vnodes that share the same tnode are joined, but they are not necessarily clones.

2.  In any tree of joined nodes, the topmost element that is joined must be a clone.

3.  The algorithm will traverse every cloned &lt;v&gt; element exactly twice.  Proving this statement constitutes a formal proof.  To see that this statement is plausible, consider the following three trees.  (As always, primes denote clones):

Tree 1:

a'
b'
..a'
b'
..a'

Tree 2:

b'
..a'
b'
..a'
a'

Tree 3:

b'
..a'
..a'
b'
..a'
..a'

You can verify for yourself that the algorithm visits the &lt;v&gt; elements corresponding to a' and b' exactly twice in each tree.
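
If you would rather let Python do the checking, the following sketch counts the visits for the three trees.  It is an assumption-laden model, not Leo code: each &lt;v&gt; element is a (name, nested elements) pair, and count_visits is a hypothetical helper that skips the nested elements of any element it has already seen once.

```python
from collections import Counter

def count_visits(elements, seen, visits):
    """Count visits to each element, skipping the subtrees of repeats."""
    for name, nested in elements:
        visits[name] += 1
        if name not in seen:
            seen.add(name)
            count_visits(nested, seen, visits)
    return visits

a = ("a", [])
trees = {
    "Tree 1": [a, ("b", [a]), ("b", [a])],
    "Tree 2": [("b", [a]), ("b", [a]), a],
    "Tree 3": [("b", [a, a]), ("b", [a, a])],
}
results = {}
for label, tree in trees.items():
    results[label] = dict(count_visits(tree, set(), Counter()))
    print(label, results[label])
```

Each tree reports two visits for a and two for b.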

A formal proof relies on the definition of a clone implicitly implemented by v.shouldBeClone.  Explicitly, a node is a clone if and only if it is structurally _dissimilar_ to a node joined to it.  Structurally _similar_ joined nodes have non-null, distinct and joined parents, and have the same child indices.

Here is the formal proof:

Let Vx denote the &lt;v&gt; element corresponding to the vnode x.   The proof proceeds by induction on N, the maximum nesting level of clones in a tree.  We shall show that the algorithm visits Vc exactly twice for all cloned nodes c in the tree.

N = 0:  Let c be any clone of the tree.  By the definition of clone given above, one of the following must be true:

Case 1:  [different parents] :  c has distinct parents p1 and p2.

If either p1 or p2 is empty, c appears at the top level and the algorithm will visit Vc once for each empty parent.  Otherwise, both p1 and p2 exist, and because N is 0 neither p1 nor p2 is a clone, and neither p1 nor p2 appears in a cloned tree, so the algorithm visits both Vp1 and Vp2, and thus the algorithm visits Vc twice.

Case 2: [different child indices]:  c appears in two different positions c1 and c2 of the same parent p.  p is neither a clone nor appears in any cloned tree.  Thus, the algorithm visits Vp and thus visits both Vc1 and Vc2.

N = N + 1:  Assume the theorem applies to all nodes having N or fewer cloned parents.  We must show that the theorem applies to all nodes having N+1 cloned parents.

Case 1: [different parents]  By the induction hypothesis, the algorithm visits Vp1 and Vp2 once or twice, depending on whether Vp1 and Vp2 are themselves clones.  However, the algorithm will visit Vc exactly once for each parent: if either parent p is a clone the algorithm will visit its subtree once, and skip the subtree the second time the algorithm visits Vp.

Case 2: [different child indices]:  c appears in two different positions c1 and c2 of the same parent p.  By the induction hypothesis, the algorithm visits Vp twice.  The first time the algorithm visits Vp the algorithm will visit both Vc1 and Vc2.  The second time the algorithm visits Vp the algorithm will skip the entire descendent tree, including both Vc1 and Vc2.

A similar argument shows that the algorithm visits every non-cloned node exactly once.  This concludes the proof. 

Edward
</t>
<t tx="ekr.20050421214921.5">@file-thin now works.  I am now using a LeoPy.leo file that contains mostly @file-thin nodes.

The code is on cvs.  Please use with caution, if at all. A bad clone bug remains and some non-trivial changes are likely to both the code and the file format.  In particular, reading an @file-thin derived file when expecting @file confuses Leo:  I plan to add a "-thin" field to the @+leo sentinel to help Leo choose the proper atFile.read methods.

The sentinels created by @file-thin are almost exactly the same as the sentinels created in 4.x by regular @file trees.  This will simplify needed changes to the Go To Line number command.  This is another big step forward for Leo.  In particular, for the first time in Leo's history, there is a firm foundation for both @file-shadow and @file-import.  I plan to do both in the next few weeks.

Clones "just work".  The gnx mechanism works perfectly.

Edward

P. S.  I plan to discuss the following topics in separate postings:

-  I added a new directive, @all, to handle @file-thin trees that contain mostly clones of data.  

-  I realized that @file-thin-wait is probably not ever going to work.  

- @file-thin implies a different way of organizing projects.  

EKR
</t>
<t tx="ekr.20050421214921.6">&gt; &gt; The other major area of development will be @file-thin.

&gt; Have there been any developments on this? Last I heard, you were thinking on
it. I'm sure that the upcoming 4.1 takes most of your leo project time, but
I was wondering the status, as this is a very desirable feature in the mixed-editor
groups I work in.

I'm glad you asked.  The short answer is: this _will_ happen, and for exactly the reason you mention: it is essential if Leo is to be used in mixed-editor groups.

How can @file-thin be made to work?  First, we must abandon the so-called synchronization principle, which states (or stated) that an entire .leo file is the "small unit of meaning" of a program.  Instead, we put the burden of "file-level consistency" on the work environment (where it belongs).  For example, in a multi-user workgroup that uses cvs, it is up to the _users_ (with the help of cvs) to ensure consistency when several people are working on files simultaneously.  There is really no other option.  Perhaps Leo can assist in making files consistent, but that is  a minor point: in the general case users will have to discuss potential problems, cvs conflicts, failing (or missing) unit tests or any other issue that might affect the integrity of a program.

Once we absolve Leo of a task that it simply cannot perform in a multi-user setting it becomes possible to design @file-thin in a relatively straightforward manner.

1.  @file-thin implies that all "essential" data, including especially outline structure, must reside in derived files.  There is no other place for the data.

2.  We no longer need to be afraid of "old" data coming "out of the attic" to corrupt the integrity of a project.  Again, this is the responsibility of the workgroup (that is, the _humans_ in the workgroup using whatever tools are available.)  Thus, the old posting, gnx's must die, is completely invalid.  It _is_ valid to use gnx's to identify nodes uniquely!

3. Now that gnx's have been resuscitated, we can use them to associate (cloned) nodes in an outline with corresponding nodes in derived files.   This means that clone links can not break.

4.  @file-thin is inherently less robust than at present, because no full error recovery is possible when there are read errors.  We expect such errors to be quite rare in a "well-run" project, i.e.,

a) a project in which the consistency of files is maintained by the users and
b) a project in which users respect Leo sentinels.

In the presumably rare cases in which derived files are corrupted the user could use one of Leo's Import commands to recover most of the data: structure information will likely be completely lost, as well as all clone links.

Leo's read logic must associate nodes in derived files with corresponding cloned nodes in the outline.  Now that we don't worry about "corrupted" or "out-of-date" derived files (because the project is "well-run") we need not worry about linking clones in the .leo file with the "wrong" data in derived files.  Moreover, we can use a recent suggestion for dealing with clones in different files (two or more derived files or a .leo file and one or more derived files) to deal more intelligently with the situation in which the "same" node has two different headline or body texts.  This is not a matter of "recovering" clone links, it is simply a matter of having the user choose which of two or more conflicting cloned data to use.  Presumably in a "well-run" project this choice would be the result of a discussion amongst the project members, or this choice would trigger a discussion between project members.  In any event, Leo isn't responsible for the choice :-)

5. We have now everything we need to design sentinels for Leo 5.0 (or whatever it will be called).

- All nodes will probably be delimited explicitly, and @+node sentinels must contain gnx's.
- The present sentinels will probably be retained pretty much as they are otherwise.
- The _only_ data available will be the data in the derived files, so Bernhard Mulder's algorithm can not be used.  Indeed, that algorithm is essentially meaningless in the @file-thin context: derived files _are_ the outline.

Let's compare this proposed format for derived files with the present format.  The present format is "optimal" in the sense that it contains absolutely the bare minimum of data required in a derived file.  To do this, the present format assumes that some "hidden data" exists in the .leo file.  Leo uses this hidden data to associate nodes in the derived file with particular nodes in the outline, without even using gnx's!  In the present file format no real outline structure exists in the derived files.  In the proposed file format _all_ structure must be present (somehow) in the derived file, and moreover all nodes must be identified with gnx's.

In short, the proposed format for derived files will have more @node sentinels and @node sentinels will have gnx's.  As a result, derived files will be slightly more cluttered than 4.x derived files.  Clearly, that can not be helped.

6.  In @file-thin there is no longer any need for the slightly odious "auto-save" feature because there is no essential information that must be kept in synch between .leo files and derived files.

7.  The preceding discussion has been about "essential" information:  headline text, body text and outline structure.  As in all previous versions of Leo we definitely do not want to have derived files change when "inessential" information like marks changes.  The strategy is clear:  save such info in .leo files, indexed by gvx (global _vnode_ index).  Probably Leo would silently ignore problems in associating gvx's with gnx's in derived files.

That's about it.  I believe that @file-thin will be the new "gold standard" for Leo files.  I envision that LeoPy.leo would become mostly a series of @file-thin trees.  It may be possible to create an @file-thin projects.txt node corresponding to the present (Projects) tree, but this might result in many conflicts between clones in projects.txt and corresponding nodes in other derived files.

This shows that Leo will need a policy for automatically updating clones: changes to clones in derived files probably should take priority over the corresponding data in the .leo file.  I suspect this will work well in practice, though this remains to be demonstrated conclusively.
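
One possible shape for such a policy is sketched below.  Everything here is an assumption made for illustration: merge_clone_data is a hypothetical helper, node data is modeled as (headline, body) pairs keyed by gnx, and the gnx strings are made up.  The sketch lets the derived file win and reports the gnx's whose data differed, so that the user can still be asked about them.

```python
def merge_clone_data(leo_nodes, derived_nodes):
    """Return merged node data keyed by gnx, plus the gnx's that conflicted."""
    merged = dict(leo_nodes)
    conflicts = []
    for gnx, data in derived_nodes.items():
        if gnx in merged and merged[gnx] != data:
            conflicts.append(gnx)
        merged[gnx] = data  # data from the derived file takes priority
    return merged, conflicts

# Made-up gnx's and (headline, body) pairs.
leo_nodes = {
    "ekr.1": ("Projects", "old body"),
    "ekr.2": ("Notes", "private notes"),
}
derived_nodes = {
    "ekr.1": ("Projects", "new body"),
    "ekr.3": ("run", "def run(): pass"),
}
merged, conflicts = merge_clone_data(leo_nodes, derived_nodes)
print(conflicts)
```

Nodes that appear only in the .leo file are left alone, which matches the idea that private data stays private.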

Thanks again for this question.

Edward

P.S.  I would like to have done this a long time ago :-)  However, it is not completely clear when I shall get around to this.  There are several factors that are interacting, and these factors may delay @file-thin considerably.  I hope not.

EKR
</t>
<t tx="ekr.20050421214921.7">Better convertTreeToSharedNodes

The previous version of convertTreeToSharedNodes set v.t.vnodeList improperly for nodes that were joined but not cloned.  The following makes more "mini unit tests" pass:

def convertTreeToSharedNodes(self):

..c = self

..# Return if the tree has already been converted.
..v = c.rootVnode()
..while v:
....if v._firstChild and not v._firstChild._parent:
......print ; print "already converted"
......return
....v = v.threadNext()

..# Init.
..v = c.rootVnode()
..while v:
....v.t.vnodeList = []
....v = v.threadNext()

..# Create a list of cloned nodes:
..v = c.rootVnode() ; cloneList = []
..while v:
....if v.isCloned():
......print "clone",v
......cloneList.append(v)
....v = v.threadNext()

..# Set _firstChild in tnodes.
..v = c.rootVnode()
..while v:
....child = v.firstChild()
....# Careful: set the field only the first time we see a shared tree.
....# This logic must work with the logic below that sets v.t.vnodeList.
....if child:
......if not hasattr(v.t,"_firstChild"):
........v.t._firstChild = child
....else: v.t._firstChild = None
....v = v.threadNext()

..# Set v.t.vnodeList.
..v = c.rootVnode()
..while v:
....# Careful: only set one value for non-cloned joined nodes.
....if v in cloneList: # Cloned
......try:    v.t.vnodeList.append(v)
......except: v.t.vnodeList = [v]
....elif not hasattr(v.t,"vnodeList"): # Maybe joined.
......v.t.vnodeList = [v]
....v = v.threadNext()

..# Clear _parent field of any node whose parent is a clone.
..v = c.rootVnode() ; clearList = []
..for v in cloneList:
....child = v.firstChild()
....while child:
......clearList.append(child)
......child = child.next()

..for v in clearList:
....v._parent = None

This is starting to get tricky.  It's probably worth it, though, to do as many tests as possible on the position class before the grand transition.

Edward
</t>
<t tx="ekr.20050421214921.8">At present, Leo's vnode class contains neither a __cmp__ nor a __nonzero__ method.  There is plenty of code that does:

if v == v2:

or

while v != v2:

or

while v:

so any __cmp__ or __nonzero__ method must continue to make those usages work.  Otherwise, you are free to define an ordering as you please.

HOWEVER, I would urge caution, for several reasons:

1.  Leo 4.2 will use positions as well as vnodes, and the position class defines both p.__cmp__ and p.__nonzero__.

2. Leo 4.2 will use iterators as the primary way of traversing trees, and these require p.__cmp__ and p.__nonzero__.

3. There is no natural order for either positions or vnodes, though I suppose threadNext is a total ordering.  But using threadNext to define a total order would be very expensive.  Moreover, vnodes that have been deleted may be undeleted by undo (v.exists returns whether a node has been deleted), so some vnodes do not appear in threadNext order at all.

4. It would be better, I think, to define some more explicit ordering and use that for your particular purpose rather than try to coerce Leo's fundamental vnode or position methods.
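
As an illustration of point 4, here is a minimal sketch of an explicit ordering that leaves the node classes untouched.  The Node class and the helpers thread_order and order_index are hypothetical stand-ins, not Leo's API.  The index is a snapshot: it must be rebuilt after the outline changes, and deleted nodes simply do not appear in it.

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def thread_order(roots):
    """Yield nodes in outline (preorder) order."""
    for node in roots:
        yield node
        yield from thread_order(node.children)

def order_index(roots):
    """Map id(node) to its place in outline order, computed in one pass."""
    return {id(node): i for i, node in enumerate(thread_order(roots))}

c = Node("c")
b = Node("b", [c])
a = Node("a")
roots = [a, b]
index = order_index(roots)
# Sort with an explicit key instead of defining __cmp__ on the node class.
ordered = sorted([c, a, b], key=lambda node: index[id(node)])
print([node.name for node in ordered])
```

Computing the index once avoids the cost of walking the outline for every comparison.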

Edward
</t>
<t tx="ekr.20050421214921.9">Code details of shared tnodes

As in no other project that I have ever worked on, the design of the shared tnode project is heavily influenced by code-level details.  This paper will examine those details in depth, pointing out potential pitfalls and the solutions that I have invented.

In the last paper I posed the problem of how to avoid making copies of positions while traversing trees.  This paper will discuss this question at great length.  The reason is simple: the answer to this question is the tail that wags the dog.

1. How to avoid copying positions

There are several alternative ways to handle this problem.  Most are complicated, messy and ugly.  However, after much experimentation I believe I have come up with a _truly_ clever solution--one that is simple, "lightweight", safe and pleasant to use.

The trick is _not_ to make positions be proxies for the vnode or tnode class.  Instead, we will force traversal code to "say what it means" and convert from positions to vnodes explicitly.  Like this:

p = c.rootPosition() # Say what we mean:  p is a position.
while p:  # p evaluates to "nonzero" as long as p points to a valid position: see below.
..v = p.v # v won't change when p changes.
..&lt;&lt; do something with v &gt;&gt; # safe
..p.moveNext () # changes p "in place"

It is crucial that the assignment v = p.v is "safe". That is, v won't change even if p does change later.  See the P.S. for details.  BTW, in the pattern above p won't change inside &lt;&lt;do something with v&gt;&gt; so we could, in fact, safely use p.v instead of v.  The point is, however, that we could use copies of p.v _outside_ the loop above safely.

We must not allow any possibility of confusion between v.next and p.next.  So to be safe we must do the following:

- eliminate all traversal routines from the vnode class.
- replace v.threadNext with two separate routines: p.moveNext and p.hasNext.
- make sure _not_ to define p.next() as a synonym for p.moveNext().  Thus, we can never inadvertently move a position with a test like if p.next().  p.next() doesn't exist: we must either explicitly move the position or test whether a position exists.
- Use p.copy() to make a separate copy of position p in those rare cases where a separate copy is required.

This is the solution that I presently favor.  It does require a rewrite of _all_ tree traversal code in Leo.  But that can't be helped.  Again, the benefits are the following:

1. The code says what it means clearly and unambiguously.
2. There is no need for "proxifying" the position class.
3. It implies the best possible use of memory without any particular effort on the part of users.
4. The "while p" construct is an elegant way of testing for the end of the tree traversal.
5. Python will immediately catch any errors made in transliterating tree traversal code.  It is not possible for mistakes in transliteration to create serious "time-bomb" errors.

2.  Why do we care about copying positions?

This is a crucial question.  One would think that we should measure Leo to see whether copying positions will result in an undue burden on Python's storage allocator.  But what outline should we measure?

In fact, this is one situation where measurements would be misleading, no matter what measurements we made.  To repeat a statement made in the design paper, we are interested in shared tnodes as an optimization.  That is, we want Leo outlines to work well with no intrinsic limits to their size.  So no matter how well Python handles copied positions, copied positions will stress Python's memory allocator unnecessarily.

There is another reason why I want to completely eliminate copies of positions:  I can do so now, so not doing so creates an unnecessary incompletion in the Leo project.  The time to optimize Leo's storage allocation is _now_, not later.  There is _no way_ to guarantee that copying positions will _never_ be a problem, so we might as well solve the problem completely now, once and for all.  Again, we are going to have to test everything anyway, so we might as well test the best possible code.

You could say this is primarily a management issue, and a fairly unusual one at that.  Usually the operative motto is: keep it simple.  In this case, this motto doesn't apply.  Instead, the principle is: solve the problem once and for all.

3.  The representation of positions

Positions are simple.  The ctor for the position class is:

def __init__ (self,v,stack):
..self.v = v
..self.stack = stack[:] # Creating a copy here is safest and best.

4. Testing for and creating invalid positions

A position will become "invalid" when it moves past the last node of the tree.  For example, if p points to the last node of the tree, calling p.moveToThreadNext() results in p being an invalid position.

There are two valid ways of testing for valid positions: test p directly or test p.v.  Examples:

- if p: &lt;&lt; do something&gt;&gt;
- if p.v: &lt;&lt; do something &gt;&gt;
- while p: &lt;&lt; do something&gt;&gt; 
- while p.v: &lt;&lt; do something &gt;&gt;

Tests like "if p is None" or "if p is not None" will not work properly.  Tests such as "while p" imply the following position method:

def __nonzero__(self):
..return self.v != None

So to mark a position as invalid, position methods need only set self.v = None.  In particular, they do _not_ need to reset the stack, although in all normal cases the stack will be [ ] when a position becomes invalid.  However, leaving the stack unchanged might potentially be useful for code that somehow wants to "back out" of a move.
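
The following self-contained sketch shows the validity test at work.  It is written for Python 3, where the truth-test hook is named __bool__ rather than __nonzero__.  The Position class here is a stripped-down stand-in, and the string values are placeholders for real vnodes.

```python
class Position:
    def __init__(self, v, stack=()):
        self.v = v
        self.stack = list(stack)

    def __bool__(self):  # named __nonzero__ in Python 2
        return self.v is not None

p = Position("some vnode", ["a parent"])
checks = [bool(p), p is None]
p.v = None   # what a move past the end of the tree does
checks += [bool(p), p is None, p.stack]
print(checks)
```

The position object itself never becomes None, which is why "if p is None" cannot detect an invalid position, and the stack survives the invalidation.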

5. Methods that move positions

These routines have been very tricky to get right.  I have not tested the following code, but I am pretty confident that it will work.  The reason I am confident is that the routines are now simple enough to be understandable.

a)  Moving a position to its back or next position is straightforward:

def moveToBack (self):
..p = self
..p.v = p.v and p.v._back

def moveToNext (self):	
..p = self
..p.v = p.v and p.v._next

In both cases, the code does nothing if p is already invalid (the "p.v and" clause).  Otherwise, the code sets p.v to either p.v._next or p.v._back.  In particular, the stack does not change.

b) We push the stack when moving to the child of a cloned vnode:

def moveToFirstChild (self):
..p = self
..if not p: return
..child = p.v.t._firstChild
..if child:
....if p.v.isCloned():
......p.stack.append(p.v) # only push clones.
....p.v = child
..else: p.v = None

Let's go this step by step:
- As always, we return if p is not valid: "if not p: return".
- The child link is contained in the tnode: "child = p.v.t._firstChild".
- If the child exists, and p is cloned, we push p.v on the stack.
- If the child exists, we set p.v to the child link: "p.v = child".
- if the child does not exist, the position becomes invalid: "else: p.v = None".

BTW, tnodes contain a Python list of all vnodes that point to them.  The test whether a vnode is cloned is very simple:

def isCloned(self):
..v = self
..return len(v.t.vnodeList) &gt; 1

This is a stupendous improvement over the corresponding code now in Leo.  In particular, there is no need for routines such as c.initAllCloneBits, etc.

c) We pop the stack when moving to the parent of a cloned vnode.

def moveToParent (self):
..p = self
..if not p: return
..if p.v._parent:
....p.v = p.v._parent
..elif p.stack: # only pop clones.
....p.v = p.stack.pop()
..else:
....p.v = None

Again, step by step:
- p.v._parent will exist if and only if p.v has exactly one parent.
- Otherwise, if p.v has more than one parent we get it by popping the stack.
  N.B.  Popping the stack implies that we are _moving_ the position.
- Otherwise, p.v has no parent at all, and we set p.v = None
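
To see the push and pop at work, here is a small runnable sketch written for Python 3.  It is a simplified model rather than the code in Leo42.py: the TNode, VNode and Position classes carry only the fields used above, and the sketch assumes (as in convertTreeToSharedNodes) that the child of a cloned node has its _parent field cleared.

```python
class TNode:
    def __init__(self):
        self.vnodeList = []
        self._firstChild = None

class VNode:
    def __init__(self, name, t):
        self.name = name
        self.t = t
        self._parent = self._next = self._back = None
        t.vnodeList.append(self)

    def isCloned(self):
        # vnodeList always contains self, so any other length means a clone.
        return len(self.t.vnodeList) != 1

class Position:
    def __init__(self, v, stack=()):
        self.v = v
        self.stack = list(stack)

    def __bool__(self):
        return self.v is not None

    def moveToFirstChild(self):
        if not self:
            return
        child = self.v.t._firstChild
        if child is not None and self.v.isCloned():
            self.stack.append(self.v)  # only push clones
        self.v = child

    def moveToParent(self):
        if not self:
            return
        if self.v._parent is not None:
            self.v = self.v._parent
        elif self.stack:
            self.v = self.stack.pop()  # only pop clones
        else:
            self.v = None

# b1 and b2 are clones that share one tnode; a is their shared child.
tb, ta = TNode(), TNode()
b1, b2 = VNode("b1", tb), VNode("b2", tb)
a = VNode("a", ta)
tb._firstChild = a   # a._parent stays None because its parent is cloned

p = Position(b2)
p.moveToFirstChild()
child_name, depth = p.v.name, len(p.stack)
p.moveToParent()
print(child_name, depth, p.v.name, len(p.stack))
```

Starting from b2, the position reaches a and then returns to b2 rather than b1, because the stack remembers which clone was entered.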

6. Defining routines that do copy positions

After a stupendous amount of work (I'm not kidding, I have rewritten these routines more than 20 times) I have at long last created a clear and simple way of describing the relationship between traversal routines that copy positions and traversal routines that do not.  It's easiest just to show you the final result in all its simple glory:

def copy (self):
.."""Return an independent copy of a position."""
..return position(self.v,self.stack)

def back (self): return self and self.copy().moveToBack()
def firstChild (self): return self and self.copy().moveToFirstChild()
def lastChild (self): return self and self.copy().moveToLastChild()
def lastNode (self): return self and self.copy().moveToLastNode()
def next (self): return self and self.copy().moveToNext()
def nodeAfterTree (self): return self and self.copy().moveToNodeAfterTree()
def nthChild (self,n): return self and self.copy().moveToNthChild(n)
def parent (self): return self and self.copy().moveToParent()
def threadBack (self): return self and self.copy().moveToThreadBack()
def threadNext (self): return self and self.copy().moveToThreadNext()
def visBack (self): return self and self.copy().moveToVisBack()
def visNext (self): return self and self.copy().moveToVisNext()

You see the pattern: to create a "safe" traversal, we create a copy of the present position and then move the _copy_.  The idiom  "return self and self.copy().moveToX()" will return an invalid position if self evaluates to False.  Actually, the test will result in self.__nonzero__ being called.  Anyway, the code should work spiffily.

N.B. I do _not_ plan to include these methods in Leo's position class.  Indeed, doing so would invite all sorts of problems, as I said in point 1 above.  In particular, we want _new_ names of position methods so that Python will complain about any confusions or omissions in converting from vnode-based traversals to position-based traversals.  Also, we do not want to create opportunities for users to stress Python's storage allocator needlessly.

The point of this section is simply this:  it is now absolutely clear what the relationship between copying and non-copying traversals is.  I trust this section shows why avoiding copies is A Good Thing.
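
For the record, here is a runnable Python 3 sketch of the copy-then-move idiom for one routine.  It rests on an assumption that the moveToX routines shown earlier do not spell out: each moveToX method must return the position it moved, otherwise the idiom has nothing to hand back.  The classes are minimal stand-ins, not Leo's.

```python
class VNode:
    def __init__(self, name):
        self.name = name
        self._next = None
        self._back = None

class Position:
    def __init__(self, v, stack=()):
        self.v = v
        self.stack = list(stack)  # each position owns its own stack

    def __bool__(self):
        return self.v is not None

    def copy(self):
        """Return an independent copy of a position."""
        return Position(self.v, self.stack)

    def moveToNext(self):
        self.v = self.v and self.v._next
        return self  # assumption: returning self makes the idiom below work

    def next(self):
        # Move a copy, so the original position never changes.
        return self and self.copy().moveToNext()

a, b = VNode("a"), VNode("b")
a._next, b._back = b, a

p = Position(a)
q = p.next()      # q is a new position at b
end = q.next()    # one step past the last sibling
print(p.v.name, q.v.name, bool(end))
```

Every call to next() allocates a new position and a new stack, which is exactly the cost that the in-place moveToX routines avoid.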

7. Utility routines that do not create copies of positions.

This is a recent and important development.  It's not good enough for the moveToX routines not to create copies of positions directly.  The moveToX routines must not create copies of positions _anywhere_, directly or indirectly.

I invented a crucial utility routine this week.  This utility provides the pattern for all similar code:

def vParentWithStack(self,v,stack,n):
..if not v: return None,n
..elif v._parent: return v._parent,n # don't change stack.
..elif stack and n &gt;= 0: return stack[n],n-1 # simulate popping the stack.
..else: return None,n

Given a position, this utility allows us to look up the stack to get information without
-  having to change a position, or
-  having to make a copy of a position.

It may be easiest to show vParentWithStack in action, rather than trying to describe exactly what it does.  The vParentWithStack method allows us to define the hasThreadNext method as follows:

def hasThreadNext(self):
..p = self ; v = p.v
..if not p.v: return false
..if v.t._firstChild or v._next:
....return true
..else:
....n = len(p.stack)-1
....v,n = p.vParentWithStack(v,p.stack,n)
....while v:
......if v._next: return true
......v,n = p.vParentWithStack(v,p.stack,n)
....return false

Compare this with the moveToThreadNext method:

def moveToThreadNext (self):	
..p = self
..if not p: return
..if p.v.t._firstChild:
....p.moveToFirstChild()
..elif p.v._next:
....p.moveToNext()
..else:
....p.moveToParent()
....while p:
......if p.v._next:
........p.moveToNext() ;break
......p.moveToParent()

The pattern of both methods is the same.  The difference is p.vParentWithStack allows us to simulate the effect of p.moveToParent without changing or copying any position.
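
Here is a self-contained Python 3 sketch of the same idea that can be run outside Leo.  It is a simplified model under stated assumptions: vParentWithStack and hasThreadNext are written as plain functions, the classes carry only the fields the test needs, and the tiny outline is made up for the demonstration.

```python
class TNode:
    def __init__(self):
        self._firstChild = None

class VNode:
    def __init__(self, name, t):
        self.name = name
        self.t = t
        self._parent = self._next = None

class Position:
    def __init__(self, v, stack=()):
        self.v = v
        self.stack = list(stack)

def vParentWithStack(v, stack, n):
    if v is None:
        return None, n
    if v._parent is not None:
        return v._parent, n          # the stack index does not move
    if n in range(len(stack)):
        return stack[n], n - 1       # simulate popping the stack
    return None, n

def hasThreadNext(p):
    v = p.v
    if v is None:
        return False
    if v.t._firstChild is not None or v._next is not None:
        return True
    n = len(p.stack) - 1
    v, n = vParentWithStack(v, p.stack, n)
    while v is not None:
        if v._next is not None:
            return True
        v, n = vParentWithStack(v, p.stack, n)
    return False

# Outline: b1 and b2 are top-level clones; a is their shared child.
tb, ta = TNode(), TNode()
b1, b2, a = VNode("b1", tb), VNode("b2", tb), VNode("a", ta)
b1._next = b2
tb._firstChild = a

first = Position(a, [b1])    # a reached through b1
second = Position(a, [b2])   # a reached through b2
print(hasThreadNext(first), hasThreadNext(second))
print(len(first.stack), len(second.stack))
```

The same vnode a gives different answers depending on the stack, and neither stack changes: the lookup reads the stack by index instead of popping it.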

For more complicated examples, I had to rewrite the p.moveToLastNode and p.moveToThreadBack routines using different algorithms than those presently used in the corresponding v.lastNode and v.threadBack routines in Leo.  The reason is that I had to simulate calls to p.moveToParent() without actually calling p.moveToParent, and I didn't want the algorithms to use anything except the simulation of p.moveToParent.  The actual code is now on cvs in Leo42.py.

Finally, the p.moveToLastNode and p.moveToThreadBack methods are right at the limit of what can be done robustly.  Leo42.py contains some "lookahead" routines called p.vNext and p.vBack that allow code to peek at p.moveToNext().v, p.moveToBack().v and similar routines without copying a position.  (These routines return the value of p.v at the "moved" positions.) However, more complex routines such as p.vThreadBack are just too complicated to do reliably and they probably aren't ever going to exist.

Edward

P.S.  It is crucial that the assignment v = p.v is safe, that is, that v does not change even if p changes "in place" later.  As usual, Python works just as is required.  I just tested this out with the following code.

from leoGlobals import *
class position:
..def __init__(self): self.v = "a"
..def move(self): self.v = "b"
		
p = position()
v = p.v
print "before move", v, p.v, v is p.v
p.move()
print "after  move", v, p.v, v is p.v

The output of this script is:

before move a a True
after  move a b False

This verifies that the assignment v = p.v is safe.  The reason this works is that the "assignment" self.v = "b" actually makes self.v a reference to a _new_ object, namely "b".  Of course, we Python gurus already knew that ;-)

P.P.S.  It is conceivable that there will be a big problem with the code, and if so another complete iteration of both the design and the code would be required.  I am starting to have confidence, though, that all can be made right.  This is another reason to do xnodes (incremental screen refresh) now.  I want to prove the design _and_ the code as much as possible as soon as possible.

EKR
</t>
<t tx="ekr.20050421221330">@all directive

Leo now supports an @all directive.  This works something like @others, except that @all simply puts all nodes into the derived file without disallowing orphan or ignored nodes.

Leo needs this directive to be able to put the (Projects) node in LeoPy.leo in a separate derived file.  Without this directive there would be no way to put this node except by using @file-nosent.

I took the trouble to do @all because I wanted to create a .leo file that contained no data except @file-thin nodes.  Such a .leo file would be very small: probably about 10K or less.  The present LeoPy.leo file contains @file-nosent nodes to generate documentation files.  I'll probably move those nodes over to LeoDocs.leo.

Edward

</t>
<t tx="ekr.20050421221330.1">@file-thin-wait won't work

@file-thin-wait is probably a bad idea.  Suppose Leo has not loaded an @file-thin-wait file, and suppose that the user changes a node whose clone is in the @file-thin-wait file, or _would_ be in the file if Leo had loaded it.  Leo has no way to mark the @file-thin-wait file dirty, except by reading it, but that means that all @file-thin-wait files would eventually have to be read before saving any file!

Not being able to properly mark @file-thin-wait files as dirty is an extremely serious problem.  If @file-thin-wait files are not updated properly, files will become dangerously out-of-synch.  Out-of-date clones become the enemy of changed clones.  This would be a disaster waiting to happen.

Given the problems with "disappearing and reappearing" clones in @file-thin-wait files, it seems that @file-thin-wait is just a bad idea.  Moreover, we now have a much better alternative: reading all @file nodes in a separate thread.

Edward</t>
<t tx="ekr.20050421221330.2">Organizing projects with @file-thin

The @file-thin code has worked remarkably well, and the changeover to a LeoPy.leo file went without serious incident.  However, I was troubled by the change.  In part, this is because the new LeoPy.leo file looks quite different from the old, and changes to one's entrenched routine are always stressful.  However, there was a bit more to it than this.

The main reason, IMO, for having @file-thin is so that developers will _not_ have to commit any .leo files at all: thin derived files contain all essential information, both data and structure.  But developers obviously _do_ need some .leo file in order to use Leo.  What to do?

The answer turns out to be simple:  commit a _reference_ .leo file to cvs.  This reference file contains only @file-thin nodes and possibly organizer nodes.  Presumably the project's administrator would be the only person to commit changes to this file.

Developers would _rename_ the reference file and use the renamed file in their work.  Their renamed copy of the .leo file would be private: each developer would use their private .leo file as they wish.  In particular, they could create clones of nodes (especially of nodes in @file-thin trees) and put those clones in their private .leo file.  For example, I plan to add LeoPy(ref).leo to cvs soon, and to remove the actual LeoPy.leo.  To use Leo to develop Leo itself you would get LeoPy(ref).leo from cvs, then rename it to LeoPy.leo and use it as you please.

N.B.  Although the @all directive makes it possible to create thin derived files containing "random" clones, my present thinking is that it does not make a lot of sense to commit such derived files to cvs.  Most project clones are probably of interest to a single developer.  Moreover, keeping files containing the @all directive in synch with other files is a bit of a challenge.  Presumably data in derived files _not_ containing @all should take precedence over data in files that do contain @all.

So @all should also serve as a marker to Leo that if there is a choice the data in an @all tree is to be considered secondary to other data.  At present, Leo does _not_ enforce this policy.  One of the items on Leo's to-do list is to update conflicting clones more sensibly.  I plan to address both issues at the same time sometime before releasing 4.2 final.

In short, it is probably best not to commit to cvs files that use @all.  I believe this rule resolves my anxiety about @file-thin.  If clones need to be widely distributed, they can be put in the reference .leo file.

Edward</t>
<t tx="ekr.20050421221330.3">Embedded sentinels are essential

I look forward to implementing @file-shadow.  All the pieces for this are now in place.  While thinking about this, I considered again various schemes for eliminating sentinels altogether.  It's sort of a hobby of mine :-)

One suggestion that crops up from time to time is to reduce the visual annoyance of sentinels by putting them "somewhere else".  I think that putting them in a separate file is a _really_ bad idea, except in the case of @file-shadow, where we assume (perhaps unwisely) that the burden of keeping files in synch can be handled reliably.

But I digress.  The other place to put sentinels "somewhere else" is grouped at the beginning or end of the (thin) derived file.

One scheme that could actually work is to use "dummy" sentinels.  The idea is to write a sentinel such as #@ and have Leo associate such a dummy sentinel with a "real" sentinel while reading the derived file.  This reduces the visual effect of sentinels, but actually increases the number of sentinels.  This is conceivable.  Actually implementing it would probably require a two-pass algorithm: reading and writing such files would likely be slower.

Deleting or inserting a "dummy" sentinel would corrupt a file just as surely as corrupting any other sentinel, so perhaps this scheme is slightly more dangerous than the present scheme.

However, it would not be possible to entirely delete sentinels embedded in the derived file.  I hesitate to explain why: people aren't going to remember the explanation, they won't believe it, and they won't like it even if they do believe it :-)  Nevertheless, embedded sentinels _are_ essential.  The proof is as follows:

1.  Sentinels should be immutable so that cvs does not report "false" diffs.  But this statement applies to _all_ "sentinel-like" data in a derived file.  A derived file must _never_ contain data that will change when nodes (in the outline) are moved, inserted, deleted, etc.

2.  If sentinels are grouped at the beginning or end of a derived file, then there is _no way_ to associate those sentinels with parts of a derived file _immutably_ except by using dummy sentinels.  For example, any "pointer" (index) that tells which line a sentinel "belongs to" will be mutable: the pointer will change when the outline changes.  Worse,  changes to the derived file will corrupt all such pointers or indices.

This concludes the proof.

So are dummy sentinels a good idea?  I don't much like "fancy" schemes, and I think dummy sentinels qualify as fancy.  Moreover, most sentinels have value as comments.  One half-way measure is to remove just the gnx field from dummy sentinels.  This makes the scheme more robust, and makes it unlikely that users will remove dummy sentinels by accident.

However, I think the extra complication of grouping the "real" sentinels isn't worth the trouble.  What people _really_ want, I think, is no sentinels at all in derived files.  IMO, @file-shadow or @file-import are the only ways to do this.

Again, I'm sure this won't end the noodling, but hey, I tried :-)

Edward
</t>
<t tx="ekr.20050421221914">@nocolor

This file contains many design-related postings from my hard drive. These
correspond fairly closely, but not exactly, to SourceForge postings.

**You do not need to read this file to understand Leo.**

This file is for 'historians' only. These are the raw data for Chapter 10,
History of Leo, in LeoDocs.leo. This file shows the complexities involved in
designing Leo. For historians (that is, me), the big problem with an Aha is that
it alters the mental landscape in such a way as to make it essentially
impossible to remember what the world was like before the Aha. This file is an
attempt to retain some of what the world was like before the big Aha's that
made Leo what it is today.</t>
<t tx="ekr.20050422055636">Theme 1: Global Tnode Indices 
2002-10-21 08:51 
Previously I have dismissed the notion of global node (or clone) indices, saying that there is no way to guarantee
uniqueness. This is correct, but it is not the whole story. CVS provides another model: that of resolving conflicts. 

So suppose every node (in a derived file or in a .leo file) has a global node index. This index would be associated with _all_ tnodes, not just cloned nodes. These indices (global tnode indices, or gti’s for short) would be _semi_ mutable. That is, they would be immutable except for the (rare) instances in which gti conflicts occurred. A gti conflict occurs when two nodes in a .leo file have the same gti. At that point, one or both of the gti’s must change.

A subsidiary part of this scheme is that .leo files (and derived files?), would contain an indication of the maximum
gti used so far. When writing a file, the maximum gti would be set to the maximum of all the max gti fields found
when reading all the files. When creating a tnode for the very first time, its gti would be set to the max gti value,
and that value then would be incremented. Possibly a new @max_gti sentinel would be needed in derived files.
This might (will?) cause conflicts: to resolve the CVS conflict we would simply pick the larger number. 

In short, we have a “semi-global” index that is _not_ guaranteed to be unique, but which will be “almost” unique.
We resolve conflicts as follows. Each .leo file will contain a gti table, associating gti’s with headlines. A conflict
occurs when two nodes have different headlines and the same gti’s. The gti table will give the headline that was
written last for each gti, so the gti table will usually match one of the headlines of the conflicting nodes. The other
tnode will have its gti changed. If neither node’s headline is in the table we will change both gti’s.
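
The conflict rule above might be sketched in Python as follows. This is only an illustration under stated assumptions: gti's are taken to be plain integers, and the names resolve_conflict, gti_table and max_gti are hypothetical, not part of Leo's code.

```python
def resolve_conflict(gti, headline_a, headline_b, gti_table, max_gti):
    """Resolve a conflict between two nodes that share a gti.

    gti_table maps each gti to the headline written last for it.
    Returns (gti_a, gti_b, new_max_gti).
    """
    last_headline = gti_table.get(gti)
    gti_a, gti_b = gti, gti
    # A node keeps the gti only if the table remembers its headline.
    # Otherwise it takes the current max gti, which is then incremented.
    if headline_a != last_headline:
        gti_a, max_gti = max_gti, max_gti + 1
    if headline_b != last_headline:
        gti_b, max_gti = max_gti, max_gti + 1
    return gti_a, gti_b, max_gti
```

When the table matches one headline, only the other node is renumbered; when it matches neither, both nodes get fresh gti's.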
</t>
<t tx="ekr.20050422055636.1">Theme 2: Small (template) .leo files 
2002-10-21 08:54 
Presently, updating LeoPy.leo to CVS takes a long time and isn’t very useful. CVS doesn’t understand the format of .leo files; CVS destroys the XML structure of .leo files when trying to merge conflicts. 

The “heroic” solution is the “Resolve CVS Conflicts” command, but I think another way might be better, and much simpler. The idea is just this: don’t include LeoPy.leo in CVS at all. 

To do this, we need to make LeoPy.leo “irrelevant” as far as CVS is concerned. We can do that by having the
derived files contain _all_ information in LeoPy.leo. So LeoPy.leo would become nothing but a shell, or
placeholder. This would be just what is needed for truly huge projects, starting with the Linux kernel. 

The elements of this scheme might be as follows: 

1. A user option: copy_derived_info = 0 that causes @file trees in .leo files not to be saved when derived files are written. For completeness, perhaps we would have a Write Full Outline command that does write everything. Such a small .leo file might be called a “template” .leo file. It would consist mainly of the vnodes describing @file nodes (but not their descendents), as well as nodes that exist only in the outline. We would want to reduce or eliminate such non-@file nodes: see below. 

2. Tables within .leo files (or in other files?) would allow marks and clone links to be recreated when derived files are read. I call these gti tables. Regardless of where these tables reside, these tables are conceptually _local_ to each .leo file; they do _not_ have to be managed by CVS. In other words, clone links are created by each .leo file, _not_ by derived files. The acid test of this scheme is whether it can handle the section in LeoPy.leo now called (Project views). 

3. A @read_only option causes the .leo file to be read-only. The Save command would be dimmed in that case.
This isn’t really needed, but it might emphasize that the .leo file isn’t part of the CVS distribution and isn’t
mutable. 

4. All information in LeoPy.leo should be “carried” by derived files. An @file local_notes.txt section creates a
derived file _not_ part of CVS, for the private use of people besides me. The Notes section becomes @file
EdwardsNotes.txt, containing the "official" (i.e., my) notes about Leo. “Official” derived files are the files handled by CVS. Local files are not managed by CVS, and can be changed by anyone at any time for any reason. LeoDocs.leo and leoConfig.leo would probably exist as before. 

I’ve got to double check that this scheme avoids the horrors of the old “backup” .leo files. I think it does because there will be no such thing as read errors. Such errors arose because of mismatches between the structure of the derived file in the .leo file and the real structure as specified by the derived file. If the .leo file contains no structure information no such mismatch can exist. </t>
<t tx="ekr.20050422055636.10">By: edream ( Edward K. Ream ) 
Setting global name: LeoID.txt 
2002-10-26 13:15 
There are a few nits concerning setting the global id:loc field that I'd like to mention here. 

1. I don't believe the id:loc field should be part of leoConfig.leo/leoConfig.txt. If it were it would be difficult to distribute leoConfig.leo without risk of resetting the id:loc. Instead, a separate file, say LeoID.txt should contain the id:loc field. The first non-comment line of the file will contain the field, with comment lines starting with #. 

2. LeoID.txt might be placed in a location associated with the user. But this creates the further problem: how
do we tell Leo about this location? 

Remember what the situation is: we want to make sure that the id:loc field disambiguates possibly identical
timestamps. Even with a single user, it would be conceivable, with a time grain of one second, to create two
nodes in two different copies of Leo on the same machine at the "same" time. To handle this, we could add
another timestamp, the time at which Leo was launched (which would be unique, even for two copies of Leo on
the same machine). However, this extra timestamp would be ugly, and it would destroy the ability to have a
default id:loc associated with every machine. 

So I favor the simple solution of either a) putting LeoID.txt in the same folder as leo.py, or b) having the user
specify the location of the directory containing LeoID.txt. If different users on Linux can specify different
locations for LeoID.txt, so much the better. 

3. Leo 4.x must raise an error if LeoID.txt can not be found. It will put up a dialog saying that LeoID.txt can not
be found, and prompting the user either 

a) for the location of an existing LeoID.txt file or 
b) for an id:loc field and the location of the place to put a new LeoID.txt file. 

Once again, we have the problem of where to put LeoID.txt. Any ideas, especially from Linux people, would be
welcome here. 

4. leoConfig.txt will contain at least one option related to the new gti scheme. This option tells whether to use
gti's at all, or whether to use the old way. (For the foreseeable future, leo.py will be able to read both old and
new style derived files and .leo files.) We must have this option to suppress the dialog mentioned above. This
option causes no distribution problem: it will always be set to use_gtis = 1 when updating to CVS, and nobody
using CVS would be tempted to change it. 
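
Reading the id:loc field as described in item 1 could look roughly like this. It is a sketch only: the function name read_id_loc is hypothetical, and skipping blank lines is an assumption that the posting does not state.

```python
def read_id_loc(lines):
    """Return the id:loc field: the first line that is not a # comment.

    Blank lines are skipped too (an assumption). Returns None if no
    field is found.
    """
    for line in lines:
        s = line.strip()
        if s and not s.startswith("#"):
            return s
    return None  # No field found: the caller would put up the dialog.
```

A caller would pass the open file, for example read_id_loc(open("LeoID.txt")), and fall back to the dialog of item 3 when the result is None.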

So that's it. These are minor details which, except for the problem of where to put LeoID.txt are easily solved.
Any suggestions for where to put LeoID.txt so Leo can find it would be appreciated. Note that this is a minor
problem: gti's are going to happen. </t>
<t tx="ekr.20050422055636.2">Theme 3: @@file nodes 
2002-10-21 08:56 
@@file nodes would be read only when the user first selects them. This eliminates the reading of derived files when the template .leo file is opened. 

By: edream ( Edward K. Ream ) 
Theme 5: @option headlines 
2002-10-21 09:03 
Given that there are a number of ways to write .leo files, the new attitude says that rather than trying to figure out
which is best we give the user the ability to pick any way. When writing a file, we might want to specify: 

1. Whether to use the old, compatible file format, or the new file format without the &lt;tnodes&gt; element. 

2. Whether to redundantly write information to the .leo file, as is done presently, or whether to write template @file nodes. 

The natural place to specify such options is in a headline, not body text, because only headlines are guaranteed to be written in template @file nodes. Moreover, such options are more similar to @file, @rawfile and @silentfile than to “real” directives. 

We could generalize such “options headlines” to specify many other options. For example, 

@option remove_sentinels_extension = .txt 
@option body_text_font_family = Courier New 
etc. 
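
Recognizing such headlines could be as simple as the following sketch. The function name parse_option_headline is hypothetical, and the exact syntax (a single = separating name and value) is assumed from the two examples above.

```python
def parse_option_headline(headline):
    """Return (name, value) for an '@option name = value' headline, else None."""
    prefix = "@option "
    if not headline.startswith(prefix):
        return None
    # Split at the first = only, so values may themselves contain =.
    name, sep, value = headline[len(prefix):].partition("=")
    if not sep or not name.strip():
        return None
    return name.strip(), value.strip()
```

Headlines that are not option headlines, such as @file nodes, simply yield None.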

Not all options could or should be “localized” this way. I doubt, for example, that keyboard shortcuts should be
dependent on the location of the headline, although I think it probably could be done. And the read_only option
presently applies to the entire .leo file. We could create a local read-only option, but that would be a new option, and @ignore also has the same effect. I'm not going to get bogged down in details here: the point is that we don't have to dogmatically say that all options in leoConfig.txt should be available in an @option headline. 


</t>
<t tx="ekr.20050422055636.3">By: edream ( Edward K. Ream ) 
Yes, GTI's _are_ possible 
2002-10-23 19:58 

Standing in the shower after my noon workout I realized that there is a very easy and simple way to generate true, immutable, unique GTI's. We simply concatenate a user name, specified, say in leoConfig.txt, with any local timestamp. 

The beauty of this scheme is that it works so well with CVS. Indeed, we simply use the Sourceforge user name,
edream, dthein, jmgilligan (but not nobody :-) and we instantly have a guarantee of uniqueness. Presumably this
will be good enough, though if several people could log in under the same id from several places at the _same_
time we could extend the scheme to include the location: edream:home, and edream:fiji_islands,
edream:crab_nebula, for example. 

This truly changes everything. At last there is a really good way to uniquely identify nodes, and in a meaningful
form as well. What could be better than creator:creationTime? Thanks to all who have kept pestering me about
this. 

Of course there are minor complications. Maybe leo will use "nobody" if the creator_name field in leoConfig.txt is not specified. Or maybe we could fall back on other schemes: just using the local timestamp (while issuing a
warning). It may even be possible to use the CVS $Author: edream $ variable. In any case, these are just nits related to
initialization. 

Victory is at hand! 

BTW, with such nice indices, there is probably no need to "improve" the present .leo file format. We must,
however, write these true gti's in node sentinels in derived files. So these sentinels will be longer, but that won't
matter because they will never ever show up in spurious diffs. 
</t>
<t tx="ekr.20050422055636.4">By: dthein ( Dave Hein ) 
 RE: Yes, GTI's _are_ possible, more 
2002-10-24 00:21 
I just thought of an even more compelling reason for adding
sequence numbers -- file imports. File imports will add nodes as fast
as possible, guaranteeing multiple GTI generation requests within the
minimum clock interval. 

By: dthein ( Dave Hein ) 
 RE: Yes, GTI's _are_ possible 
2002-10-23 22:54 
Boy, that was easy. :-) All we had to do was wait :-) :-) 

If you are happy with "timestamp:userid:location" as the GTI, then so
am I -- almost. I'd also add a sequence number in case somebody
hits Ctrl-I key several times quickly and so generates several GTIs
within the resolution of the timestamp. Just increment a global seqno
with each GTI generation request. So I'd like to see
"seqno:timestamp:userid:location". 

Note: You also probably want to use time.time() to get the
timestamp ... it gets sub-second times on different platforms;
time.clock() only gets seconds on UNIX. Of course, if you want a
human readable time string instead of a floating point number, then
seconds is the finest resolution you can get and the seqno
mentioned above becomes even more important. </t>
<t tx="ekr.20050422055636.5">By: edream ( Edward K. Ream ) 
sequence numbers in gti's 
2002-10-24 11:17 
&gt; I'd like to see "seqno:timestamp:userid:location". 

I agree. Leo would actually generate the seqno field only if the presently generated timestamp matches the previousTimestamp ivar in the app class. 

Derived files will specify a default userid:location field. Within
derived files, Leo will write only seqno:timestamp when the
userid:location matches this default. When reading a
derived file, Leo will use the default userid:location field in the
derived file's header to reconstitute the full gti. Since seqno
will mostly be empty, within derived files gti's will reduce, in
most cases, to just the timestamp! This will make derived
files much more pleasant to read. In particular, all files derived
from LeoPy.leo will have edream as the default userid:location
field, so those few headlines originally created by somebody
else will be clearly visible. It's perfect. 
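
A rough sketch of both halves of this scheme, generation and abbreviation, might look like this. The class and function names are hypothetical, and the timestamp is assumed to be a string containing no colons; only previous_timestamp corresponds to the ivar mentioned above.

```python
class GtiGenerator:
    """Generate gti's of the form seqno:timestamp:userid:location."""

    def __init__(self, userid_loc):
        self.userid_loc = userid_loc  # for example "edream:home"
        self.previous_timestamp = None
        self.seqno = 0

    def new_gti(self, timestamp):
        # Generate a seqno only when the timestamp repeats.
        if timestamp == self.previous_timestamp:
            self.seqno += 1
        else:
            self.seqno = 0
            self.previous_timestamp = timestamp
        seqno = str(self.seqno) if self.seqno else ""
        return "%s:%s:%s" % (seqno, timestamp, self.userid_loc)


def abbreviate(gti, default_userid_loc):
    """Return the form written to a derived file whose header names default_userid_loc."""
    seqno, timestamp, userid_loc = gti.split(":", 2)
    if userid_loc != default_userid_loc:
        return gti  # Created by somebody else: write the full gti.
    return "%s:%s" % (seqno, timestamp) if seqno else timestamp
```

With the default userid:location and an empty seqno, abbreviate returns just the timestamp, which is the common case described above.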

Neither the format nor the resolution of the timestamp affects this scheme in any way. 

1. The formats of the timestamps do not need to match
"across" different machines because the 
userid:location part of the gti (whether explicit or implicit)
disambiguates the gti. All that is required is that timestamps
be unique on a particular machine, whatever their format. 

2. The resolution of timestamps similarly does not matter
provided that we use seqno's as you suggest. </t>
<t tx="ekr.20050422055636.6">By: edream ( Edward K. Ream ) 
Theme 4: Revised XML file format 
2002-10-21 08:58 
The present XML file format is “regular”, that is, all vnodes and tnodes are represented in the same way. In particular, tnodes are associated with vnodes using tnode indices. Tnode indices are computed each time the .leo file is written, and a “small” change in a .leo file can cause many of these indices to change. This causes CVS to report many diffs.

It would be possible to use a slightly more complex scheme that would take an entirely different approach. 

We don’t use the &lt;tnodes&gt; element at all. Instead, we associate text with the first headline that uses it, using an
attribute field. For example, instead of: 

&lt;v t="T4"&gt;&lt;vh&gt;The headline text&lt;/vh&gt;...any nested vnodes&lt;/v&gt; 

We would have: 

&lt;v tx="The body text"&gt;&lt;vh&gt;The headline text&lt;/vh&gt;...&lt;/v&gt; 

To make this work, we must ensure that two nodes are cloned if and only if their headline texts are identical. Leo can do this as follows. When writing a .leo file, Leo will first scan the entire list of vnodes, entering headline text into a Python dictionary. If two vnodes have the same headline text but are _not_ clones a disambiguating tag will be added, something like this: 

&lt;v tag="1" tx="The body text"&gt;&lt;vh&gt;The headline text&lt;/vh&gt;...&lt;/v&gt; 

The read logic will consider that vnodes with the same headline text are clones provided that: 

1. They don’t use the old t= tnode index field 
2. They either have no new tag= field or they have the same tag= field. 

This scheme will eliminate all the old tnode index fields, and the new tag= fields won’t change when other vnodes or tnodes change. This means that essentially all changes to .leo files will be as the result of adding vnodes or changing tnode text. That is, changes in .leo files will correspond directly to user changes. This presumably will make CVS happier. 
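
The read logic above could be sketched as follows. This is an illustration only: vnodes are modeled as plain dicts, and the name find_clone_groups is hypothetical.

```python
from collections import defaultdict


def find_clone_groups(vnodes):
    """Group vnodes that the read logic would treat as clones.

    Each vnode is a dict with a 'headline' key and optional 't' and 'tag' keys.
    """
    groups = defaultdict(list)
    for v in vnodes:
        if "t" in v:
            continue  # Old-style node: its tnode index decides, not its headline.
        # Same headline and same tag (or no tag on either) means cloned.
        groups[(v["headline"], v.get("tag"))].append(v)
    # Only groups with two or more members are clones of each other.
    return [group for group in groups.values() if len(group) != 1]
```

A node carrying a disambiguating tag never joins the untagged nodes that share its headline, which is exactly what the tag is for.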
</t>
<t tx="ekr.20050422055636.7">By: dthein ( Dave Hein ) 
RE: Theme 4: Revised XML file format 
2002-10-22 20:23 

Putting the body text into the value of an XML attribute is, I think, not a good thing. There are limits as to what
kind of characters can be used for attribute values, and there may be length limits for attributes imposed by
some tools. In general, it is better for the inner text (the text between the start and end XML tags) to be used
for _data_ and the attributes to be used for _metadata_ -- which is the way the current leo XML is defined.
Usually, using an attribute for basic data will lead to trouble down the road. 

What is the problem here? Is it not that inserting a new node or moving a node causes all the tnode indexes
to be renumbered (thus generating lots of diffs)? 

Would we not be better served by relying on the vnode structure to indicate the node positioning in the tree,
and using "semi-mutable" :-) identifiers for the tnode indexes so that they are not resequenced each time a
node is inserted, removed, repositioned? 

By: edream ( Edward K. Ream ) 
RE: Theme 4: Revised XML file format 
2002-10-22 23:06 
&gt; Careful. 

Ok. I misspoke. The data should be in an element (like the headline is), rather than in an attribute. This
doesn't change matters materially. 

&gt; What is the problem here? 

Yes, one of the problems is tnode indices changing, with the resulting "false" diffs. Making tnode
indices semi-mutable may or may not be feasible or desirable. 

My present thinking is that we might be able to get rid of all (or almost all) indices. Exactly what the
file format becomes is a secondary issue. For sure it won't be compatible with old versions of Leo. 

Again, the idea is to identify nodes by headline, with a disambiguating attribute field in those rare cases
where non-cloned nodes have identical headlines. We could even associate vnodes with tnodes (and
vice versa) this way. 

But these are details. As always, the main problems are: 

1. Clone links (maybe solved in this case, but maybe not in general) and 

2. The multiple-update problem that CVS addresses but which also may occur outside the CVS context
and which has, in the past, been the source of the dreaded read errors. Perhaps I should list read
errors themselves as major problems in their own right. 

3. Reducing false diffs in CVS. 

4. Handling CVS conflicts, or better yet, eliminating them completely. 

Anyway, I wouldn't take what I say in this posting too seriously, or worry too much about details. There
is so much to handle that even broad-brush ideas will probably not be too close to the final solutions.
Maybe the only idea to take from this theme is that Leo's XML format is, after years of stasis, possibly
open to major change. Obviously, I won't do this unless the new scheme solves major outstanding
problems. Secondary issues like compactness are, to me, completely irrelevant. 

Finally, it should be obvious that nothing is going to happen on this matter for quite a while. </t>
<t tx="ekr.20050422055636.8">By: edream ( Edward K. Ream ) 
Summary of themes 1-3 
2002-10-21 08:56 
What underlies these first 3 themes is that the user experience of Leo would be almost completely unchanged! The only difference is that Leo would load template .leo files much more quickly, and the messages about loading derived files would happen only on demand. New user options would control all the “controversial” parts of these proposals. 

I’m excited about these new directions. It’s possible that there will be gotchas that can’t be resolved. I’ve tried some of these ideas before without success. However, I think it is more likely that gti’s form a firm foundation for new implementations. 

The key to all of this is minimizing gti conflicts, and handling them properly when they happen. If this can’t be done without heroic decisions from the user the whole scheme fails. It may be possible, it may not be. By a “heroic” decision I mean a decision at the time the conflict is announced requiring information that will not be present until after the decision was made. Some of the old error recovery schemes required such heroic decisions, which is why they didn’t work. 

By: dthein ( Dave Hein ) 
RE: Summary of themes 1-3 
2002-10-23 11:41 
I like having the marks and clone links only in the .leo file. Of course, this does require having a unique
identifier for the tnode. 

I agree that some sort of reorg is required to address performance issues (both CVS diff times and load/save
times). 

As you note elsewhere, it is worth calling these ideas out and mulling them over for a while. Eventually, an
elegant solution will arise :-)</t>
<t tx="ekr.20050422055636.9">By: edream ( Edward K. Ream ) 
Glorious unification &amp; leo.py 4.0 
2002-10-24 00:51 
Today’s breakthrough ranks among the very most important events in Leo’s history. Let me explain why true immutable gti’s are so important. 

Clones are the real complicating factor in Leo’s implementation. In particular, it has always been a major problem associating cloned nodes in .leo files with cloned nodes in derived files. The present mirroring scheme is a relatively good way, maybe the only good way, of linking nodes between .leo file and derived files. However, the mirroring scheme essentially requires that information be duplicated in .leo files. Read errors result when the mirroring scheme breaks down. 

True, immutable gti’s solve all these problems in a way never before possible. We can dispense with writing @file nodes in .leo files! This in turn means that read errors are eliminated! At a stroke, .leo files become much, much smaller. For the first time we can represent the entire Linux kernel in a compact .leo file. Indeed, the file would have approximately one @file node per kernel file, plus a few other nodes for our own local use. 

And there is more, much more. Because true, immutable gti’s simplify everything, it is possible to consider even
more improvement. Sitting in my bath (how decadent, a shower followed by a bath) I realized that it is now possible to create an entirely new organization for derived files. This new organization takes full advantage of the new opportunities, and in turn creates even more opportunities. 

The immediate impetus for my thoughts was the realization that the new gti’s will not, by themselves, remove all
“spurious” diffs reported by CVS. Indeed, even with immutable gti’s, moving a node in the outline will change the child indices of all siblings that follow the moved node. And it’s not so easy to see how to represent child indices in such a way that they mostly remain the same when nodes are moved. 

I first considered replacing child indices with back and next gti’s, but this would make the derived file even more
cluttered: gti’s are going to be pretty ugly. Then the Aha hit: instead of representing the structure of the outline in
“pieces”, using @node and @body sentinels, we can represent the entire outline structure in one place, just as is
done in the .leo file itself. This is now completely safe, because gti’s provide bullet-proof links. 

Indeed, the structure of a derived file (say in 4.x of leo.py) will mirror the structure of .leo files almost exactly. Like this: 

@+leo sentinel 
@…other leading sentinels 
@+vnodes 
…the XML-like representation of vnodes, just as in the .leo file, in comments, on many comment lines 
@-vnodes (the end of the xml representation of vnodes) 
code 

Note that the representation of vnodes will be exactly like the “real” XML in the .leo file, except that the XML will all be contained in the comment delimiters in effect for the derived file. So this isn’t “real” XML, it is “XML in comments.” However, it would be very easy for code to strip off the comments to get at the embedded XML. 

We no longer need @+-node sentinels sprinkled throughout the text! Indeed, within the actual code, we need only represent the tnode, something like this: 

@tnode gti=name:time 

As an added nicety, we can put the tnode’s headline (the headline of any vnode that contains the tnode) in a
comment on the very next line, like this: 

@tnode gti=name:time 
# &lt;&lt; the headline for the reference &gt;&gt; 

This will look much nicer than the present way; they won’t be surrounded by @node sentinels, so they will look much cleaner. Of course, these sentinels will, as at present, be properly indented. 

And as a further nicety, the tnode sentinel can indicate whether the expansion of the reference was followed by text, like this: 

@tnode nonewline=1, gti=name:time 
# &lt;&lt; the headline for the reference &gt;&gt; 
@+body 
the body 
@-body 
whatever follows the reference, if nonewline is 1 
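
Reading such a sentinel back might be sketched like this. The function name is hypothetical, the comma-separated key=value syntax is assumed from the example above, and the line is assumed to have had its comment delimiter stripped already.

```python
def parse_tnode_sentinel(line):
    """Parse '@tnode key=value, ...' into a dict; return None for other lines."""
    s = line.strip()
    prefix = "@tnode "
    if not s.startswith(prefix):
        return None
    fields = {}
    for part in s[len(prefix):].split(","):
        # Split at the first = only: the gti itself contains a colon, not an =.
        key, _, value = part.strip().partition("=")
        fields[key] = value
    return fields
```

The nonewline field is simply absent from the result when the sentinel does not carry it.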

This scheme solves a number of nagging or hard problems. 

1. It is much easier to generate the nodes of the outline because it is done in a single spot, and like the &lt;vnodes&gt;
element of .leo files we can recreate the outline structure without using any vnode indices at all. When the structure of the outline changes, only lines of the @vnodes sentinel area will change. That is, the @vnodes area in the derived file will change if and _only_ if the outline structure changes. Spurious diffs will be completely eliminated, and moving nodes will probably not result in CVS conflicts. 

2. The sentinels embedded within code no longer carry any structure information; that is all carried in the @vnodes area. So the visual appearance of the text area is simplified. Much more importantly, tnodes will never change, so CVS conflicts involving @tnode lines are impossible. Conflicts will still happen if two people change the text of a headline “simultaneously”, but CVS will then mess with the line _following_ the @tnode sentinel, not the sentinel itself. 

3. While I am messing with the format of derived files, it would be a good idea to solve the problem of how to
represent text that follows a reference. The present code ends up inserting spurious newlines to represent the end of a section reference, and some people have complained about those extra newlines. As shown above, we can represent text following a reference with the nonewline “attribute”. 

4. The similarity between the structure of .leo files and derived files suggests that other XML fields might be placed at the start of derived files. For example, we could have @+-preferences sentinels corresponding to the real XML &lt;preferences&gt; element in the .leo file. This would be the natural place to carry per-file preferences such as a font. 

5. Leo can recreate marks and clone links very easily, without “polluting” the derived file. That is, we do _not_ want to represent marks in a derived file, because to do so would cause essentially spurious diffs. Rather, Leo will represent marks in a new &lt;marks&gt; element in the .leo file. Now that gti’s are solid, this is completely safe. Similarly, gti’s create rock-solid “join links”. That is, when reading a derived file Leo can be absolutely sure that two nodes are joined if and _only_ if they have the same gti. It is then trivial to recreate clone marks and links. Two nodes v1 and v2 are cloned if and only if they have the same gti and shouldBeCloned(v1,v2) returns true. 
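
The join rule in point 5 can be sketched in a few lines of Python. This is only an illustration: the name find_join_groups and the (headline, gti) pairs are hypothetical, and the shouldBeCloned test mentioned above is left out because its definition is not given here.

```python
from collections import defaultdict

def find_join_groups(nodes):
    """Group (headline, gti) pairs by gti; nodes sharing a gti are joined."""
    groups = defaultdict(list)
    for headline, gti in nodes:
        groups[gti].append(headline)
    # Only a gti that occurs more than once describes joined nodes.
    return {gti: heads for gti, heads in groups.items() if len(heads) > 1}
```

Because the grouping key is the gti alone, nothing about outline position or child indices is needed to rebuild the join lists.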

This last point is of supreme importance. It means that cloned nodes can be carried in a .leo file _without_ having to mirror structure. If and when we ever read a derived file, we can reconstitute the clone links (and marks) with _complete_ reliability! We no longer need redundant mirroring information in .leo files! This in turn makes it possible to use a single “template” .leo file to represent the entire Linux kernel, or an entire Linux distribution, for that matter. All we need is a way of deferring the loading of derived files until they are wanted (until the user selects an @@file node). 

I called this posting “glorious unification” for several reasons. 

[1] There is no longer any doubt about how to associate nodes in .leo files and derived files. At last we can join nodes in such a way that they can never be parted. As has already been explained, this hugely simplifies the
implementation, and eliminates the need for redundant information in .leo files. .leo files can be very small! 

[2] The unification of file formats (that is, the intimate association of XML elements in the .leo file with similar
sentinels in the derived files) not only solves many lingering problems, but makes derived files much more elegant and good looking. Moreover, tools that apply to .leo files can now easily be adapted to apply to derived files. And the unification of formats suggests new areas in which information can be shared between .leo files and derived files. 

[3] It is now almost certain that we can include one .leo file inside another using a @include x.leo node. This was always iffy before, but with gti’s it becomes safe to do. Indeed, because mirroring is no longer required, we no longer need to worry about the various .leo files becoming out-of-synch! Whatever the included .leo file contains, the including .leo file will handle it with ease. No longer must we worry about error recovery when the mirroring scheme shatters. 

This is, indeed, a great day for Leo. 

Edward </t>
<t tx="ekr.20050422060227">By: edream ( Edward K. Ream ) 
Embedded XML != XML 
2002-10-24 13:54 

For languages without single-line comments, like HTML, the "embedded" XML within a derived file must
escape the ending comment delimiter used to create sentinel lines. Whatever escape convention is used, it won't
be compatible with XML. So we can't parse the embedded XML until that escape convention is undone. 

This issue does not arise for languages that have a single-line comment: that is, we convert from embedded
XML to "real" XML simply by stripping off the leading comment delimiter. 
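
For the single-line-comment case, the stripping step might look like the following Python sketch. The function names and the default delimiter are hypothetical; the sketch assumes every sentinel line starts with the delimiter and that the remaining text is well-formed XML.

```python
import xml.etree.ElementTree as ET

def embedded_xml(lines, delim="#"):
    """Return the XML text carried by lines that start with the comment delimiter."""
    stripped = []
    for line in lines:
        text = line.lstrip()
        if text.startswith(delim):
            stripped.append(text[len(delim):])
    return "\n".join(stripped)

def parse_embedded_xml(lines, delim="#"):
    """Parse the recovered text with the standard library XML parser."""
    return ET.fromstring(embedded_xml(lines, delim))
```

Lines that do not start with the delimiter are treated as ordinary code and skipped.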

Yes, this is a nit, and an important nit. </t>
<t tx="ekr.20050422060227.1">By: edream ( Edward K. Ream ) 
Embedded XML escapes 
2002-10-27 10:10 
Some scheme must be found to escape the ending block comment delimiter within embedded XML.
This would only be needed for languages without a single-line comment and without its own escape
convention. 

One general-purpose escape convention would be to define an escape sequence that "goes away", say
&amp;none. Suppose the ending block delimiter is ch1...chn. We escape this as &amp;nonech1,...&amp;nonechn.
We must also escape &amp;none to &amp;&amp;nonenone. To unescape such escaped text we simply delete all
instances of &amp;none. I believe this will work as long as the ending block comment delimiter of no
language is in fact &amp;none. 

For example, if HTML (or XML) did not already have its own escape convention, we would escape XML
embedded in HTML (or XML) by replacing all --&gt; strings by &amp;none-&amp;none-&amp;none&gt;. 
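
A minimal Python sketch of the &amp;none convention described above, assuming the two steps are applied in the order given: escape the marker first, then break up the ending delimiter. The function names are hypothetical. The ampersand is built with chr(38) only so that the listing needs no XML escapes of its own.

```python
AMP = chr(38)          # the ampersand character
MARK = AMP + "none"    # the escape sequence that "goes away"

def escape(text, end_delim):
    """Escape the marker itself first, then break up every ending delimiter."""
    text = text.replace(MARK, AMP + MARK + "none")
    broken = "".join(MARK + ch for ch in end_delim)
    return text.replace(end_delim, broken)

def unescape(text):
    """Undo escape() by deleting every instance of the marker."""
    return text.replace(MARK, "")
```

Unescaping needs no knowledge of the delimiter, which is the attraction of a sequence that simply disappears.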

However, HTML does have an escape convention, so instead of using the above scheme we should
replace --&gt; by --&amp;gt;. Perhaps this whole issue will not arise. HTML is one of the very few languages
without a single-line comment. But if there is, in fact, a language without a single-line comment, and
without its own escape convention, the &amp;none convention could be used. 

Hmm. I don't like the fact that under this scheme &amp;none can't be the ending comment delimiter of the
language being escaped. Perhaps a more general way can be imagined. The problem appears hard. For
example, the HTML/XML escape convention works by escaping &amp;, &lt; and &gt;. The convention is "complete";
there are no "holes" like the &amp;none hole in the scheme I have just proposed. But completeness is
possible precisely because the special characters in HTML/XML are known beforehand. 
</t>
<t tx="ekr.20050422060227.2">By: edream ( Edward K. Ream ) 
Embedded XML escapes: second thoughts 
2002-10-29 10:23 
The more I think of it, the less I like the &amp;none scheme. It's ugly, it doesn't work in general, it
probably isn't needed, and there is a better, simpler way. 

1. It's ugly: I mean this in the mathematical sense of something overly complex and not elegant. 

2. It doesn't work in general: Escaping &amp;none results in &amp;&amp;nonenone, so this scheme fails if the
ending comment delimiter is &amp;&amp;. 

3. It probably isn't needed: the only languages Leo knows about at present that don't have
single-line comments are HTML and XML, and both of these have a perfectly good escape
mechanism. 

4. There is a better way: we simply need to require that the "specification" for each language w/o
single-line comments include an escape mechanism to be used when generating embedded XML. 

I'm not saying definitely that I'll never try to come up with a general escaping mechanism, but
there seems to be no need to use this ugly scheme at present. </t>
<t tx="ekr.20050422065602.1">@nocolor


- 
- </t>
<t tx="ekr.20050422065602.2">There are two parts to this question:

1. Prior to using @file nodes, Leo used an exceedingly complex Untangle
algorithm to update an outline based on changes to files derived from @root.
Moreover, the user had to apply the Untangle command explicitly.

@file trees greatly simplified Leo's read logic and eliminated the need for
explicit Untangling.

2. Leo 3.x versions had a not-very-reliable error recovery scheme that could, in
fact, corrupt the meaning of outlines. The big aha in 4.0 was that it would be
possible to eliminate such schemes entirely.

Leo 4.0 also introduced a new way of treating newlines surrounding sentinel
lines. The @ws sentinel is a key element in this scheme. As a result, the
atFile.read logic was dramatically simplified.</t>
<t tx="ekr.20050422065602.3">There are actually two issues:

1. How can Leo ensure that .leo files and derived files remain in-synch? In
other words, how can Leo ensure that out-of-synch derived files do not destroy
the integrity of projects?

2. How can we be sure that using clones is safe, especially if clones exist in
multiple derived files?

Both of these issues were vexing: they were discussed in various forms on Leo's
forums for several years.
</t>
<t tx="ekr.20050422065602.4">There are two parts of this question:

1.  How should Leo represent clones internally?

All versions of Leo prior to Leo 4.2 represented clones by copying vnodes so that vnodes represented (one for one) the headlines on the screen.  This scheme had major problems:

A: Some outline operations slowed down considerably as the size of the outline grew.  Indeed, Leo had to scan the entire outline in order to move nodes!  BTW, even the great MORE outliner suffered from this defect.

2.  How should Leo represent cloned nodes in derived files?

In other words, how could we reliably associate nodes in derived files with nodes in outlines?

All 3.x versions of Leo used "child indices" to associate nodes in derived files with nodes in outlines.  This scheme created many problems for cvs.</t>
<t tx="ekr.20050422065602.5"></t>
<t tx="ekr.20050422065602.7"></t>
<t tx="ekr.20050422065602.8"></t>
<t tx="ekr.20050422065602.9"></t>
<t tx="ekr.20050422071739">Leo 4.0 final                     October 17, 2003

More than a year in the making, Leo 4.0 is finally here.

Highlights of 4.0 final:
-----------------------

** Improved and simplified format of derived files.
	- Eliminated child indices, extraneous blank lines and @body sentinels.
	- Eliminated @node sentinels that indicate outline structure.
	- New @nl and @nonl sentinels indicate where newlines are and aren't.
	- These changes will largely eliminate unwanted cvs conflicts.

** Greatly improved error handling.
	- Reading derived files _never_ alters outline structure or links.
	- Read errors leave the outline completely unchanged.
	- Broken clone links are gone forever.
	- As a result, 4.0 is much safer than all previous versions.

** Full compatibility with previous versions of Leo.
	- Leo 4.0 reads all derived files properly, regardless of version.
	- Leo writes new-format derived files by default, and this default may be changed.
	- New commands in the read/write menu allow you to explicitly specify the format of derived files.

* New commands:
	- Write 3.x Derived File and Write 4.x Derived File.
	- Import Derived File.
	- Clear Recent Files.

* Dozens of other improvements, including:
	- Better Unicode support.
	- New configuration settings.
	- Several new plugins.</t>
<t tx="ekr.20050422071739.1">Leo 4.1 Final              February 20, 2004

Leo 4.1 Final is the culmination of four months of work. No significant bugs
have been reported since 4.1 rc4. Several people have contributed nifty plugins
recently. See leoPlugins.leo for full details.

The highlights of Leo 4.1:

- Leo runs in batch mode when invoked with --script aScriptFile.py
- Leo supports Unicode characters (e.g. Chinese) in path and file names.
- @directives and section references are now valid when executing scripts.
- @ignored and orphan nodes now valid in @file-nosent trees.
- Script-based find/change commands.
- Check Outline command.
- Hoist &amp; DeHoist commands.
- A new gui-agnostic architecture: useful for batch mode and unit tests.
- Several new configuration settings.
- Many new unit tests.
- Excellent new plugins.
- A host of bug fixes.</t>
<t tx="ekr.20050422071739.2">Leo 4.2 Final       September 20, 2004

The highlights of Leo 4.2:

- @thin trees make Leo much more friendly to cvs. Files derived from @thin can
be committed to cvs and updated from cvs without having to commit or update the
corresponding .leo file. There is no longer any need to keep .leo files and
derived files in synch.

- Leo's data structures have been reorganized. As a result, all outline
operations are much faster. To support this organization, scripts that traverse
Leo's data structures must now use positions rather than vnodes. Old scripts
that appear to use vnodes will still work because methods like c.currentVnode
that appear to return vnodes actually return positions.

- A new mod_scripting plugin is a big advance in scripting and testing.
test.leo now uses @test and @script nodes to define unit tests without
explicitly creating subclasses of unittest.TestCase. Converting scripts to unit
tests now takes a few seconds!

- A much faster and more robust spell checker plugin. (requires Python 2.3)

- Leo is now much more friendly to using spaces instead of tabs.

- The Execute Script command reports erroneous lines more clearly.</t>
<t tx="ekr.20050422071828"></t>
<t tx="ekr.20050425053621">@nocolor</t>
<t tx="ekr.20050425053635"></t>
<t tx="ekr.20050425060514">A recurring theme in these discussions is eliminating unwanted copies of positions.

The problem was stated in: 2004-02-29 shared tnode design.doc

The problem was solved in various places in the code, and discussed in:

2004-03-05 iterators make positions safe.doc
2004-03-07 positions can be compatible.doc
2004-03-14 Small code--big aha.doc

and finally solved in:

2004-03-19 The taste of dog food.doc *** (eliminating positions)</t>
<t tx="ekr.20050425060514.1">@nocolor

- The Users Guide still refers to the synchronization principle.

    - In fact, the key is eliminating error recovery in the read logic!</t>
<t tx="ekr.20050425064819">@nocolor

This was the so-called "New Leo" (Aka Leo2) discussed in the History of Leo.

Its defining feature was @file (The "old" Leo had only @root).</t>
<t tx="ekr.20050425064819.1">The following sections give a pseudo-chronological list of the major Aha's involved in creating Leo2. These Aha's form the real design and theory of operation of Leo. See the "Diary", "Notes" and "Letters to Speed Ream" sections in LeoDocs.leo for a more accurate and less tidy history of Leo2.

I am writing these notes for several reasons.  First, the initial design and coding of Leo2, spanning a period of about 8 weeks, was some of the most creative and rewarding work I have ever done. The result is elegant and simple.  I'm proud of it.  Second, much of the design work is not reflected in the code, because improved design often eliminated code entirely. The final code is so elegant that it obscures the hard work that created it.  Third, you must understand this design in order to understand the implementation of @file trees and their derived files.  Someday someone else may take charge of Leo. That person should know what really makes Leo2 work.</t>
<t tx="ekr.20050425064819.10">At first I thought we could make sure that the .leo file always correctly mirrors all derived files, but disastrous experience showed that is a completely false hope. Indeed, backup .leo files will almost never mirror derived files correctly. So it became urgent to find a completely fool-proof error recovery scheme.

I had known for quite a while that error recovery should work "as if" the mirroring nodes were deleted, then recreated afresh. Several failed attempts at an error recovery scheme convinced me that error recovery would actually have to delete all dummy nodes and then do a complete reread. This is what Leo2 does.

But erasing dummy nodes would destroy any orphan and ignored nodes--by definition such nodes appear nowhere in the derived file. Therefore, I had to enforce the rule that @file nodes should contain no such nodes. Here is an email I wrote to my brother, Speed Ream, discussing what turned out to be the penultimate error recovery scheme:

"The error recovery saga continues. After much pondering and some trial coding I have changed my mind about orphans and @ignored nodes. They simply should never appear as descendants of @file nodes. Fortunately, this simplifies all aspects of Leo2.
Leo2 will issue a warning (not an error) if an orphan or @ignored node appears as the descendant of an @file node when a .leo file is being saved. If any warnings occur while writing the derived file, Leo2 will write the "offending" @file tree to the .leo file instead of the derived file. This has several advantages:

1.	The user gets warned about orphan nodes. These are useful warnings! Orphan nodes arise from missing @others directives or missing section references.

2. The user doesn't have to change anything immediately in order to save an outline. This is very important. Besides warnings about orphans, Leo2 will also warn about undefined or unreferenced sections. Users shouldn't have to fix these warnings to do a Save!

3. No errors or alerts will occur during Reading or Writing, so the user's anxiety level goes way down. At worst, some informational message will be sent to the log. The user will never have to make important decisions during Loads or Saves. [At last the dubious distinction between errors and warnings disappears.]

4. Error recovery can be bullet-proof. Simple code will guarantee that after any read operation the structure of an @file node will match the structure of the derived file. Also, sentinels in derived files will now account for all children of an @file node. There are no more "missing nodes" that must be filled in using the .leo file. Finally, error recovery will never change the @file tree in any way: no more "recovered nodes" nodes.

5. The present read code can be used almost unchanged. The only addition is the posting of a warning if the structure of the .leo file does not match the structure of the derived file. We need a warning because non-essential attributes of nodes (like user marks) may be altered."

This ends the original history of Leo2. In fact, it took quite a while before Leo recovered properly from all errors. I finally saw that .leo files should duplicate all information in derived files. This allows a .leo file to be used as a single backup file and allows maximal error recovery in all situations.  It took several months to stamp out several subtle bugs involving clones that caused spurious read errors. Such errors undermine confidence in Leo and can cause disastrous reversions. See my diary entries for January 2002 in leo.py for details.

</t>
<t tx="ekr.20050425064819.11"></t>
<t tx="ekr.20050425064819.2">In the summer of 2001 I began work on a project that for a long time I had considered impossible.  I had long considered that "private" file formats such as .leo files were the only way to represent an outline properly and safely.  I'm not sure exactly what changed my mind, but I finally was willing to consider that information embedded in derived files might be useful.  This meant accepting the possibility that sentinel lines might be corrupted.  This was a crucial first step.  If we can trust the user not to corrupt sentinel lines then we can embed almost any kind of information into a derived file.

There were several motivations for this work.  I wanted to eliminate the need for explicit Tangle and Untangle commands. I thought of this as "Untangle on Read/Tangle on Write."  If tangling and untangling could be made automatic it would save the user a lot of work.  I also wanted to make derived files the primary source files.  .leo files might be made much smaller if derived files contained the primary source information. This hope turned out to be false.

The result of this design work was something I originally called Leo2, though I now usually prefer to talk about @file trees.  Initially most design issues were unresolved or unknown. I resolved to attempt a robust error-recovery scheme, not knowing in advance what that might involve. I also wanted to solve what I thought of as the "cross-file clone" problem: clones that point from a .leo outline into a derived file. With Leo1 cross-file clones do not exist; everything is in the same .leo file. It was clear that Leo2 would have to change some aspects of clones, but all details were fuzzy.
</t>
<t tx="ekr.20050425064819.3">The next step was also crucial.  I started to use Leo1 as a prototype to design what the new body pane would look like to the user. In retrospect, using Leo1 as a prototype for Leo2 was just as inspired as using MORE as a prototype for Leo1.  Both prototypes marked the true beginning of their respective projects.  The Leo2 prototype was a mockup in Python of the code for reading and writing derived files. The file LeoDocs.leo contains these first prototype nodes.

Writing the prototype got me thinking about improving noweb.  With my experience with Leo1, I was able to create a new markup language that took advantage of outline structure.  I called the new language  "simplified noweb", though that terminology is obsolete.  I created @file nodes to distinguish between the old and new ways of creating derived files.  In Leo1, the @code directive is simply an abbreviation for a section definition line.  Simplified noweb used @c as an abbreviation for @code.  More importantly, simplified noweb used @c to separate doc parts from code parts without necessarily specifying a section name.  It quickly became apparent that most nodes could be unnamed.  All I needed was the @others directive to specify the location for all such unnamed nodes.

From the start, simplified noweb was a joy to use. Indeed, the @others directive could replace all section definition lines.  Furthermore, I could make the @doc directive optional if the body pane started in "code mode".  But this meant that plain body text could become a "literate" program! This was an amazing discovery.  These Aha's got me excited about Leo2. This was important, as it motivated me to do a lot of difficult design work.</t>
<t tx="ekr.20050425064819.4">In spite of this excitement, I was uneasy. After much "daydreaming" I realized that I was afraid that reading and writing derived files would be interrupted by a long series of alerts. I saw that designing the "user interaction" during reading and writing would be very important. The next Aha was that I could replace a long series of alerts with messages to the log window, followed by a single "summary" alert. Much later I saw how to eliminate alerts entirely.

At this time I thought there would be two kinds of "errors" while reading derived files. Warnings would alert the user that something non-serious had happened. True errors would alert the user that data might have been lost. Indeed, if Leo2 saves orphan and ignored nodes in a .leo file under an @file node, then read errors could endanger such nodes. Much later I saw that a robust error recovery scheme demands that @file nodes not contain orphan and @ignored nodes. (More on this subject later.) But if orphan and @ignored nodes are moved out of @file trees, there are no read errors that can cause data loss! So the distinction between warnings and errors finally went away.</t>
<t tx="ekr.20050425064819.5">I next turned my attention to writing @file nodes.  A huge Aha: I realized that sentinel lines must contain both a leading and a trailing newline.  The general principle is this: the write code must contain absolutely no "conditional" logic, because otherwise the read code could not figure out whether the condition should be true or false.  So derived files contain blank lines between sentinel lines. These "extra" newlines are very useful, because the read (untangle) code can now easily determine exactly where every blank, tab and newline of the derived file came from.  It would be hard to overstate how important this simplifying principle was in practice.

Much later, with urging from a customer, I realized that the write code could safely remove "extra" newlines between sentinels with a caching scheme in the low level atFile::os() routine. This scheme does not alter the body of the write code in any way: in effect, sentinels still contain leading and trailing "logical" newlines. The read code had to be modified to handle "missing" leading newlines, but this can always be done assuming that sentinels still contain logical leading and trailing newlines!

At about this time I designed a clever way of having the write code tell the read code which newlines were inserted in doc parts. (The whole point of doc parts is to have the write code format long comments by splitting long lines.) To quote from my diary:

"We can use the following convention to determine where putDocPart has inserted line breaks: A line in a doc part is followed by an inserted newline if and only if the newline is preceded by whitespace. This is a really elegant convention, and is essentially invisible to the user.

Tangle outputs words until the line would become too long, and then it inserts a newline. To preserve all whitespace, tangle always includes the whitespace that terminates a word on the same line as the word itself. Therefore, split lines always end in whitespace. To make this convention work, tangle only has to delete the trailing whitespace of all lines that are followed by a 'real' newline."
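
The whitespace convention quoted above can be illustrated with a short Python sketch. It is not the actual putDocPart code; the names split_doc_part and join_doc_part and the width parameter are hypothetical, and the sketch assumes real newlines are never preceded by whitespace.

```python
import re

def split_doc_part(text, width):
    """Write side: split long lines; every inserted break follows whitespace."""
    out = []
    for line in text.split("\n"):
        # A real newline must never be preceded by whitespace.
        line = line.rstrip()
        # Each word keeps the whitespace that terminates it.
        tokens = re.findall(r"\s*\S+\s*", line)
        current = ""
        for tok in tokens:
            if current and len(current) + len(tok.rstrip()) > width:
                out.append(current)  # ends in whitespace: an inserted break
                current = tok
            else:
                current += tok
        out.append(current)
    return "\n".join(out)

def join_doc_part(text):
    """Read side: a line ending in whitespace continues on the next line."""
    out = []
    current = ""
    for line in text.split("\n"):
        current += line
        if line[-1:].isspace():
            continue
        out.append(current)
        current = ""
    if current:
        out.append(current)
    return "\n".join(out)
```

The read side never needs the line width: the trailing whitespace alone tells it which newlines were inserted.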

</t>
<t tx="ekr.20050425064819.6">After the write code was working I turned my attention to the read (untangle) code.  Leo's Untangle command is the most complex and difficult code I have ever written. Imagine my surprise when I realized that the Leo2 read code is essentially trivial! Indeed, the Leo2 untangle code is like an assembler. The read code scans lines of a derived file looking for "opcodes", that is, sentinel lines, and executes some simple code for each separate opcode. The heart of this code is the scanText routine in atFile.cpp.
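
The "assembler" idea can be sketched in Python as a table of handlers keyed by opcode. This is not the scanText routine: the sentinel spelling used here (a comment delimiter followed by @+node or @-node and a headline) and all names are hypothetical simplifications.

```python
def read_derived(lines, delim="#"):
    """Dispatch on sentinel "opcodes"; every other line is body text."""
    root = {"head": "root", "body": [], "children": []}
    stack = [root]

    def open_node(arg):
        node = {"head": arg, "body": [], "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)

    def close_node(arg):
        stack.pop()

    # One small handler per opcode, as in an assembler.
    handlers = {"@+node": open_node, "@-node": close_node}
    prefix = delim + "@"
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(prefix):
            opcode, _, arg = stripped[len(delim):].partition(" ")
            handlers[opcode](arg.strip())
        else:
            stack[-1]["body"].append(line)
    return root
```

Supporting another sentinel means adding one entry to the handlers table; the scanning loop itself does not change.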

The read code was written and debugged in less than two days! It is the most elegant code I have ever written. While perfecting the read code I realized that sentinel lines should show the complete nesting structure found in the outline, even if this information seems redundant. For example, I was tempted to use a single sentinel to represent an @others directive, but finally abandoned this plan in favor of the @+other and @-other sentinels.

This redundancy greatly simplified the read code and made the structure of derived files absolutely clear. Moreover, it turned out that we need, in general, all the information created by the present sentinel lines. In short, sentinels are as simple as they can be, and no simpler.

The atFile::createNthChild method is very important: it ensures that nodes will be correctly inserted into the outline. createNthChild must be bullet-proof if the Read code is to be robust. Note that the write code outputs @node sentinels, that is, section definitions, in the order in which sections are referenced in the outline, not the order in which sections appear in the outline. So createNthChild must insert the n'th node of parent p properly even if p contains fewer than n-1 children! The write code ensures that section references are properly nested: @node sentinels are enclosed in @node sentinels for all their ancestors in the @file tree. createNthChild creates dummy siblings as needed, then replaces the dummy siblings later when their actual definitions, that is, @node sentinels, are encountered.
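
The dummy-sibling idea can be sketched in Python as follows. This is an illustration only, not the atFile::createNthChild code: the Node class, the 1-based index and the "dummy" headline are assumptions.

```python
class Node:
    def __init__(self, headline, dummy=False):
        self.headline = headline
        self.dummy = dummy
        self.children = []

def create_nth_child(parent, n, headline):
    """Make headline the n'th child (1-based) of parent, padding with dummies."""
    # Create placeholder siblings until position n exists.
    while n > len(parent.children):
        parent.children.append(Node("dummy", dummy=True))
    # Fill in the slot, whether it was just created or was an earlier dummy.
    child = parent.children[n - 1]
    child.headline = headline
    child.dummy = False
    return child
```

Calling create_nth_child(parent, 3, "c") on an empty parent leaves two dummies in front of "c"; a later call for position 1 replaces the first dummy in place.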

At this point the fundamental read/write code was complete. I found three minor bugs in the code over the next week or so, but it was clear that the read/write code formed a rock-solid base from which to continue design and implementation. This was an entirely unexpected surprise.</t>
<t tx="ekr.20050425064819.7">At this point I could read and write derived files "by hand", using temporary Read and Write commands. The next step was to integrate the reading and writing of derived files with the loading and saving of .leo files.  From time to time I made minor changes to the drivers for the read/write code to accommodate the Load and Save code, but at no time did I significantly alter the read or write code itself.

The user interaction of the Load and Save commands drove the design and implementation of the load/store code. The most important questions were: "what do we tell the user?", and "what does the user do with the information?" It turns out that the user can't make any complex decision during error recovery because the user doesn't have nearly enough information to make an informed choice. In turn, this means that certain kinds of error recovery schemes are out of the question...</t>
<t tx="ekr.20050425064819.8">I now turned my attention to "attributes" of nodes.  Most attributes, like user marks, are non-essential. However, clone information is essential; we must never lose clone links. At this time I had a preliminary design for cross-file clones that involved a two part "pointer" consisting of a full path name and an immutable clone index within the derived file. Eventually such pointers completely disappeared, but the immutable clone indices remain.

My first thought was that it would be good to store all attributes in @node sentinels in the derived file, but experience showed that would be irritating. Indeed, one wants Leo2 to rewrite derived files only if something essential has changed. For example, one doesn't want to rewrite the derived file just because a different node has been selected.

At this point I had another Aha: we can use the .leo file to store all non-essential attributes. For example, this means that the .leo file, not the derived files, will change if we select a new node. In effect, the .leo file mirrors the derived file. The only reason to store nodes in the .leo file under an @file node is to carry these attributes, so Leo2 wrote dummy nodes that do not reference body text.  Much later I saw that dummy nodes were dangerous and that .leo files should contain all information found in derived files.</t>
<t tx="ekr.20050425064819.9">The concept of mirroring created a huge breakthrough with cross-file clones. Here is an excerpt of an email I sent to my brother Speed:

"I realized this morning that since a .leo file contains dummy vnodes for all nodes in a derived file, those dummy nodes can carry clone info! I changed one line to make sure that the write code always writes clone info in dummy vnodes and voila! Cross-file clones worked!"

All of Leo1's clone code could be used completely unchanged. Everything "just works".</t>
<t tx="ekr.20071028032354"></t>
</tnodes>
</leo_file>