syncdiff

GSoC-2014-Ideas

You are looking at an old revision of the page GSoC-2014-Ideas. This revision was created by John Hawley.


--== Google Summer of Code 2014 - Ideas Page ==--

About:

SyncDiff(erent) is a stateful file synchronizer, with ideas shamelessly stolen from rsync, git, csync2, unison and probably a variety of other places. Two of the major design goals were to lower the time it takes to determine what has changed on disk (particularly in comparison to rsync), and to allow for multi-way synchronization using a star topology while accepting eventual consistency.

Why is this important? There are two primary targets where a stateful file synchronizer would be useful: data mirroring and cloud environments. More or less, these are situations where a clustered or cloud file system isn't appropriate, but where keeping data in sync is important.

=====================================================

SyncDiff(erent) is a very young and small project, currently comprising one primary developer and a number of alpha testers, which gives students an opportunity to get in on the ground floor, so to speak. There is a lot of work that can be done, as only the basic framework is in place. It is hoped that students who work on SyncDiff(erent) will be interested in working on the project long term.

We are looking to take on only two students this year, both because there is likely to be a lot of flux in the code base over the summer and because there aren't yet many folks intimately familiar with the code base to help support the students.

Ideas

HTTP Accessor

Mentors: John 'Warthog9' Hawley, TBD

Difficulty: Medium

SyncDiff(erent) currently transfers files using a slightly custom protocol over TCP. This project would add an intermediate CGI that allows a client to communicate with the server through a web server. The communication protocol involves JSON.

Requirements:

  • Perl
  • Understanding of IPCs and Multi-processed programs
  • Basic understanding of TCP/IP
  • Knowledge of Wireshark, and ability to run it
  • Good understanding of CGI interfaces
  • Understanding of Apache and Nginx (targeted web servers, knowledge of others beneficial)

This may require extension or modification to the current protocol, and students should be willing to make suggestions and recommendations, if that is required to meet the end goal.
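As a rough illustration of the relay idea, the sketch below shows a CGI-style handler that reads a JSON request body, forwards it to the TCP server, and writes the JSON reply back as an HTTP response. It is written in Python purely for illustration (the project itself is Perl), and everything about the wire format is an assumption: the newline-delimited framing, the function names `exchange` and `handle_cgi`, and the host and port parameters are hypothetical and not taken from the actual protocol.

```python
import json
import socket


def exchange(conn, message):
    """Send one JSON message over a connected socket and return the JSON reply.

    Assumed framing (not from the real protocol): one JSON document per line.
    """
    conn.sendall(json.dumps(message).encode("utf-8") + b"\n")
    with conn.makefile("rb") as reader:
        line = reader.readline()
    if not line:
        raise ConnectionError("backend closed the connection without replying")
    return json.loads(line)


def handle_cgi(environ, stdin, stdout, host, port):
    """Relay one HTTP POST body to the TCP server (hypothetical host/port).

    `stdin` and `stdout` are binary streams, as a CGI program would see them.
    """
    length = int(environ.get("CONTENT_LENGTH") or 0)
    # Parse first so malformed input is rejected before contacting the backend.
    message = json.loads(stdin.read(length))
    with socket.create_connection((host, port), timeout=10) as conn:
        reply = exchange(conn, message)
    payload = json.dumps(reply).encode("utf-8")
    stdout.write(b"Content-Type: application/json\r\n")
    stdout.write(b"Content-Length: %d\r\n\r\n" % len(payload))
    stdout.write(payload)
```

A real CGI entry point would call `handle_cgi(os.environ, sys.stdin.buffer, sys.stdout.buffer, host, port)`. Opening one backend connection per HTTP request keeps the sketch simple, though it may be exactly the kind of thing that motivates a protocol change.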

File collision avoidance

Mentors: John 'Warthog9' Hawley, TBD

Difficulty: Easy-Medium

SyncDiff(erent) retains enough information about its own current state, and the state of its remote syncs, to be able to detect if a file has been changed in both places. It is desirable to properly detect this, and take action based on user-configurable options. The simplest case would be to detect the conflict, declare one copy the "good" copy, create a secondary file .conflict.hostname, and then let the new conflict file resync back outwards, eventually letting the user determine which file should actually win.

It would be preferable if the resolution mechanism were implemented as a module of sorts so that new policies could be written.
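One way to picture the detection step and a pluggable policy layer is the sketch below. It is in Python for illustration only (the project is Perl), and all names are hypothetical: `FileState`, `Resolution`, the policy registry, and the policy names are not part of SyncDiff(erent). It also assumes the conflict copy is named by appending `.conflict.<hostname>` to the original path, which is one reading of the naming described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class FileState:
    path: str
    base_hash: Optional[str]    # hash recorded at the last successful sync
    local_hash: Optional[str]   # current hash on this host
    remote_hash: Optional[str]  # current hash reported by the remote
    local_host: str
    remote_host: str


@dataclass(frozen=True)
class Resolution:
    winner: str         # "local" or "remote"
    conflict_copy: str  # where the losing copy is preserved


def is_conflict(state: FileState) -> bool:
    """True only if both sides changed since the last sync, and differently."""
    local_changed = state.local_hash != state.base_hash
    remote_changed = state.remote_hash != state.base_hash
    return local_changed and remote_changed and state.local_hash != state.remote_hash


# Registry of resolution policies, keyed by a user-configurable name.
POLICIES: Dict[str, Callable[[FileState], Resolution]] = {}


def policy(name: str):
    def register(func):
        POLICIES[name] = func
        return func
    return register


@policy("keep-local")
def keep_local(state: FileState) -> Resolution:
    # The remote copy loses and is kept under the remote host's name.
    return Resolution("local", f"{state.path}.conflict.{state.remote_host}")


@policy("keep-remote")
def keep_remote(state: FileState) -> Resolution:
    # The local copy loses and is kept under the local host's name.
    return Resolution("remote", f"{state.path}.conflict.{state.local_host}")


def resolve(state: FileState, policy_name: str) -> Optional[Resolution]:
    """Return None when there is nothing to resolve."""
    if not is_conflict(state):
        return None
    return POLICIES[policy_name](state)
```

A new policy (for example, "newest modification time wins") would only need to register another function under a new name, leaving the detection logic untouched.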

Requirements:

  • Perl
  • Understanding of IPCs and Multi-processed programs

Automatic Detection and notification of file changes

Mentors: John 'Warthog9' Hawley, TBD

Difficulty: Medium

SyncDiff(erent) retains a lot of information about its own current state, and one thing that still needs to be done is the integration of a live file watcher using inotify-type events. This would run as an independent process and notify remotes that files have changed, so they can schedule a synchronization.
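The notification side of such a watcher might look roughly like the sketch below, which coalesces a burst of file events into a single JSON message once the file system has been quiet for a moment. It is Python for illustration only (the project is Perl) and deliberately leaves out the inotify binding itself: events are assumed to arrive from some watcher as `(path, kind)` pairs. The class name `ChangeBatcher` and the message layout are hypothetical, not part of the existing protocol.

```python
import json
import time
from typing import Dict, Optional


class ChangeBatcher:
    """Collect file events and emit one JSON notification per quiet period."""

    def __init__(self, quiet_seconds: float = 2.0, clock=time.monotonic):
        self._quiet = quiet_seconds
        self._clock = clock
        self._pending: Dict[str, str] = {}  # path -> most recent event kind
        self._last_event: Optional[float] = None

    def record(self, path: str, kind: str) -> None:
        # Later events for the same path replace earlier ones.
        self._pending[path] = kind
        self._last_event = self._clock()

    def flush_if_quiet(self) -> Optional[str]:
        """Return a JSON message if events are pending and things have settled."""
        if not self._pending:
            return None
        if self._clock() - self._last_event < self._quiet:
            return None
        message = json.dumps({
            "type": "files-changed",  # hypothetical message type
            "changes": [
                {"path": path, "event": kind}
                for path, kind in sorted(self._pending.items())
            ],
        })
        self._pending.clear()
        return message
```

Batching like this is a design choice rather than a requirement: it avoids sending a notification per write when a large file is being copied, at the cost of a short delay before remotes hear about the change.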

Requirements:

  • Perl
  • Understanding of IPCs and Multi-processed programs
  • Understanding of inotify
  • JSON

Soft Delete Support

Mentors: John 'Warthog9' Hawley, TBD

Difficulty: Medium

One thing that would be nice to implement in the way that SyncDiff(erent) synchronizes data is a "soft" delete. Basically, take the file and move it to a folder that SyncDiff(erent) ignores, so that the file is missing from the normal hierarchy. After a configurable time, the file can be fully deleted from the file system. This solves a couple of problems:

  • transient errors on the server side
  • accidental file deletions
  • file disappears and reappears

What should happen is the file is moved to the "hidden" area, and the space is not reclaimed. Should a file matching the same hash reappear, the file can be copied into the new location, or, in the case of the file completely coming back, the hashed file can be moved back into the original location. This can speed up transfers and shorten recovery ("oops") times considerably, at the cost of some additional disk space on the client side.
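The behaviour described above could be sketched along these lines. This is Python for illustration only (the project is Perl), and the details are assumptions: the hidden directory name `.syncdiff-trash`, the use of SHA-256 as the hash, storing files under their hash, and the function names are all hypothetical.

```python
import hashlib
import os
import shutil
import time
from pathlib import Path
from typing import Optional

TRASH_DIR = ".syncdiff-trash"  # hypothetical name for the ignored folder


def file_hash(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def soft_delete(root: Path, relative: str) -> Path:
    """Move a file into the hidden area, stored under its content hash."""
    source = root / relative
    trash = root / TRASH_DIR
    trash.mkdir(exist_ok=True)
    target = trash / file_hash(source)
    os.replace(source, target)
    os.utime(target)  # reset mtime to now; it marks the start of the retention period
    return target


def restore(root: Path, content_hash: str, relative: str, keep_copy: bool = True) -> bool:
    """Bring back a soft-deleted file by hash; returns False if it is not held."""
    stored = root / TRASH_DIR / content_hash
    if not stored.is_file():
        return False
    destination = root / relative
    destination.parent.mkdir(parents=True, exist_ok=True)
    if keep_copy:
        shutil.copyfile(stored, destination)  # same content showed up at a new path
    else:
        os.replace(stored, destination)       # the file itself came back
    return True


def purge_expired(root: Path, max_age_seconds: float, now: Optional[float] = None) -> int:
    """Permanently remove entries older than the configured retention time."""
    trash = root / TRASH_DIR
    if not trash.is_dir():
        return 0
    cutoff = (time.time() if now is None else now) - max_age_seconds
    removed = 0
    for entry in trash.iterdir():
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            entry.unlink()
            removed += 1
    return removed
```

Note that this sketch reuses the file's modification time as the deletion timestamp, so the original mtime is lost; a real implementation would more likely record the deletion time in the synchronizer's own state.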

Requirements:

  • Perl
  • Understanding of IPCs and Multi-processed programs
  • File systems knowledge

This may require extension or modification to the current protocol, and students should be willing to make suggestions and recommendations, if that is required to meet the end goal.

TLS implementation

Mentors: John 'Warthog9' Hawley, TBD

Difficulty: Hard

SyncDiff(erent) currently transfers files using a slightly custom protocol over TCP. This project would add TLS to the communications channel to encrypt all traffic being transferred. The communication protocol involves JSON.

Requirements:

  • Perl
  • Understanding of IPCs and Multi-processed programs
  • Basic understanding of TCP/IP
  • Knowledge of Wireshark, and ability to run it
  • Understanding of TLS communications, including how certificate revocation works

This may require extension or modification to the current protocol, and students should be willing to make suggestions and recommendations, if that is required to meet the end goal.
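To give a feel for the moving parts, the sketch below builds TLS contexts for both ends of the connection using Python's standard `ssl` module (Python is used for illustration only; the project is Perl). It assumes mutual authentication, where the server also verifies client certificates, and an optional certificate revocation list; neither is stated as a requirement above, and the function names and file parameters are hypothetical.

```python
import ssl
from typing import Optional


def server_context(cert_file: Optional[str] = None,
                   key_file: Optional[str] = None,
                   ca_file: Optional[str] = None,
                   crl_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a server-side context that requires a valid client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # assumption: clients authenticate too
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    if crl_file:
        # CRLs are loaded like CA files; the flag makes the peer's own
        # certificate get checked against them.
        ctx.load_verify_locations(cafile=crl_file)
        ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF
    return ctx


def client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side context that verifies the server's certificate and name."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

An existing TCP socket would then be wrapped with `ctx.wrap_socket(sock, server_side=True)` on the server and `ctx.wrap_socket(sock, server_hostname=name)` on the client, after which the JSON traffic flows over the encrypted channel unchanged.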