Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CoEditor Unable to Reconcile Simultaneous Changes #185

Open
ccotter opened this issue Jun 12, 2012 · 4 comments
Open

CoEditor Unable to Reconcile Simultaneous Changes #185

ccotter opened this issue Jun 12, 2012 · 4 comments
Assignees
Milestone

Comments

@ccotter
Copy link
Member

ccotter commented Jun 12, 2012

The CoEditor appears to work properly when clients atomically make changes, but when two clients make simultaneous changes, the operational transform algorithm destroys the HTML structure embedded in the raw character text. See the example below.

Initially:
  • Client 1 html: <br>
  • Client 2 html: <br>
  • Client 1 changes <br> into <b>a</b><br> and sends the following syncs to the server:
    • insert 2 >
    • insert 3 a
    • insert 4 <
    • insert 5 /
    • insert 6 b
    • insert 7 >
    • insert 8 <
    • insert 9 b
  • Client 2 changes <br> into s and sends the following syncs to the server:
    • delete 0
    • delete 0
    • delete 0
    • update 0 s
At this point:
  • Client 1 html: <b>a</b><br>
  • Client 2 html: s
After the operational transform:
  • Client 1 receives the following remote syncs:
    • insert 0 >
    • insert 1 a
    • insert 2 <
    • insert 3 /
    • insert 4 b
    • insert 5 >
    • insert 6 <
    • insert 7 b
  • Client 2 receives the following remote syncs:
    • delete 0
    • delete 0
    • delete 8
    • update s 8

The final HTML of both clients does in fact remain the same - however, it is malformed: >a</b><bs

The core issue is the operational transform is agnostic to information encoded in HTML tags (i.e. the OT only provides syntactical consistency, not semantic consistency). This is not an OT bug, but rather a design flaw in how the editor data is synced.
#167 is related in behavior, although the underlying problems are different and perhaps more severe for this issue.

@ccotter
Copy link
Member Author

ccotter commented Jun 12, 2012

Try to brainstorm some ideas here...

Solutions

  • We could treat the editor content as a DOM tree and not as a single sequence of characters. We could use the same method of syncing the tree as the CoTree demo.

Current Approach

I have been working for the past two days on a DOM tree based approach to representing and syncing the editor data. I've come up with a representation (likely to change somewhat) of the DOM tree that we can use to sync. The difficult part is doing a diff on two different DOM trees. I had not considered how to diff two different trees initially, and by far figuring out an efficient approach had taken up most of my time.

Tree Diffing

I could not find any useful existing libraries that could diff trees (in JavaScript), so I found some papers describing techniques for doing so. I have implemented an algorithm described by Chawathe et. al. [1], and have been testing it for correctness and designing a series of regression tests. The algorithm essentially takes two trees and finds an EditScript, or a sequence of operations (insert, update, move, and delete) to change one tree into another.

Currently, one part of the algorithm is unacceptably slow, but [1] and [2] offer solutions. Another caveat is that [1] makes some assumptions about the tree structure and data in order to guarantee to find a minimal EditScript - these assumptions do not hold for our editor content. Fortunately, [2] and others (not listed here) offer more efficient solutions that relax assumptions about the trees that are acceptable for the CoEditor.

Update June 17th 2012

I've created a branch (treeBasedEditor https://github.com/opencoweb/cowebx/tree/treeBasedEditor) with the code changes. In the current state, the coeditor has been converted to use the tree based approach, but there are some bugs that prevent syncing to work properly. In other words, I just haven't finish fully implementing everything, but the majority of the work is done, and with some tweaking the editor should be fully functional.

[1] http://ilpubs.stanford.edu:8090/115/1/1995-46.pdf
[2] http://www.ifi.uzh.ch/pax/uploads/pdf/publication/704/fluri-tse2007.pdf

@ccotter
Copy link
Member Author

ccotter commented Jul 31, 2012

I believe we can improve the raw HTML based editor to a state that eliminates the effects of the inherent issue discussed above.

  • Add a moderator to the editor to track changes to the rich text content.
  • Anytime the raw HTML becomes invalid, the moderator will revert back to a version that is valid HTML.
  • Anytime the raw HTML becomes invalid, browser clients will destroy their collaborative object, and create a new one, and will retrieve the valid copy from the moderator.

Since all clients will converge to the invalid HTML (guaranteed by the OT algorithm), eventually each will request a fresh copy from the moderator, who will always have a valid HTML string.

@ccotter
Copy link
Member Author

ccotter commented Aug 1, 2012

The coedit app uses a moderator to track the rich text editor widget changes. Now, I need to

  • Add logic to the browser javascript to check for inconsistent states and reinitialize the collab object to pull valid state from the moderator.
  • Implement a validator in the moderator to check if a string is valid html.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant