GSoC 2014 - Ideas Page

About:

SyncDiff(erent) is a stateful file synchronizer, with ideas shamelessly stolen from rsync, git, csync2, unison and probably a variety of other places. Two of the major design goals were to reduce the time it takes to determine what has changed on disk (particularly in comparison to rsync), and to allow multi-way synchronization using a star topology while accepting eventual consistency.

Why is this important?

There are two primary targets where a stateful file synchronizer would be useful: data mirroring and cloud environments. Broadly, these are situations where a clustered or cloud file system isn't appropriate, but where keeping data in sync is important.

What we are looking forward to in GSoC 2014

SyncDiff(erent) is a very young and small project, currently comprising one primary developer and a number of alpha testers, which gives students an opportunity to get in on the ground floor, so to speak. There is a lot of work to be done, since only the basic framework is currently in place. We hope that students who work on SyncDiff(erent) will be interested in working on the project long term.

We are looking to take on only two students this year, both because there is likely to be a lot of flux in the code base over the summer and because there aren't many people intimately familiar with the code base yet to help support the students.

How to find us

The best way to find us is on IRC (Internet Relay Chat); we hang out in #syncdiff@irc.freenode.net.

A web-based IRC client can be found at http://webchat.freenode.net/?nick=syncdiffGSoC.&channels=syncdiff&uio=d4

We've also got a mailing list at http://syncdiff.org/mailman/listinfo/syncdiff

A list of mentors can be found on our 2014 Mentors page.

Ideas


HTTP Accessor

Mentors: John 'Warthog9' Hawley, Greg Lund-Chaix

Difficulty: Medium

Right now SyncDiff(erent) transfers files using a slightly custom protocol over TCP. This project would add an intermediate CGI program that allows a client to communicate with the server through a web server. The communication protocol uses JSON.
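
A minimal sketch of such a CGI relay follows. It is written in Python for illustration only, since the project itself is Perl. It assumes, hypothetically, that the daemon listens on 127.0.0.1:7070 and exchanges exactly one newline-terminated JSON message per request. The real framing, port and message fields are not specified here.

```python
import json
import os
import socket
import sys

# Hypothetical address of the local SyncDiff(erent) daemon.
BACKEND = ("127.0.0.1", 7070)


def forward(request: dict, sock: socket.socket) -> dict:
    """Send one JSON message terminated by a newline, then read one JSON reply line.

    The newline framing is an assumption made for this sketch.
    """
    sock.sendall(json.dumps(request).encode("utf-8") + b"\n")
    with sock.makefile("rb") as reader:
        line = reader.readline()
    return json.loads(line)


def handle_cgi(environ, stdin, stdout, connect=None):
    """Read a JSON POST body, relay it to the daemon, and write a JSON response."""
    if connect is None:
        connect = lambda: socket.create_connection(BACKEND)
    length = int(environ.get("CONTENT_LENGTH") or 0)
    request = json.loads(stdin.read(length) or b"{}")
    with connect() as sock:
        reply = forward(request, sock)
    body = json.dumps(reply).encode("utf-8")
    # A CGI response is headers, a blank line, then the body.
    stdout.write(b"Content-Type: application/json\r\n")
    stdout.write(b"Content-Length: %d\r\n\r\n" % len(body))
    stdout.write(body)


if __name__ == "__main__":
    handle_cgi(os.environ, sys.stdin.buffer, sys.stdout.buffer)
```

The `connect` parameter exists so the relay can be exercised against a fake daemon without a running web server.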

Requirements:

  • Perl
  • Understanding of IPC and multi-process programs
  • Basic understanding of TCP/IP
  • Knowledge of Wireshark and the ability to run it
  • Good understanding of CGI
  • Understanding of Apache and Nginx (the targeted web servers; knowledge of others is beneficial)

This may require extension or modification to the current protocol, and students should be willing to make suggestions and recommendations, if that is required to meet the end goal.

SSH Transport

Mentors: John 'Warthog9' Hawley, Greg Lund-Chaix

Difficulty: Medium

Right now SyncDiff(erent) transfers files using a slightly custom protocol over TCP. This project would allow that stream to run over an authenticated SSH connection instead of a plain TCP connection, similar to scp, sftp, rsync, etc. The communication protocol uses JSON.
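
One common approach, used by rsync, is to start the remote end through `ssh` and run the existing protocol over the child process's stdin and stdout. The Python sketch below is illustrative only. The remote command `syncdiff-server --stdio` and the one-JSON-object-per-line framing are hypothetical. `-T` is the standard OpenSSH option that disables pseudo-terminal allocation.

```python
import json
import subprocess


def ssh_argv(host: str, remote_command=("syncdiff-server", "--stdio")) -> list:
    """Build the ssh command line. -T disables pseudo-terminal allocation so the
    channel carries raw protocol bytes. The remote command is hypothetical."""
    return ["ssh", "-T", host, *remote_command]


class PipeTransport:
    """Exchange newline-delimited JSON messages with a child process.

    With ssh_argv(...) the child is an authenticated SSH session. Any command
    that reads stdin and writes stdout can be used the same way.
    """

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE
        )

    def send(self, message: dict) -> None:
        self.proc.stdin.write(json.dumps(message).encode("utf-8") + b"\n")
        self.proc.stdin.flush()

    def recv(self) -> dict:
        line = self.proc.stdout.readline()
        if not line:
            raise EOFError("remote side closed the connection")
        return json.loads(line)

    def close(self) -> int:
        """Close our end, wait for the child, and return its exit status."""
        self.proc.stdin.close()
        code = self.proc.wait()
        self.proc.stdout.close()
        return code
```

For local testing, an echo-like command such as `cat` can stand in for the SSH session. The protocol code does not need to know which transport it is running over.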

Requirements:

  • Perl
  • Understanding of IPC and multi-process programs
  • Basic understanding of TCP/IP
  • Knowledge of Wireshark and the ability to run it
  • Good understanding of SSH

This may require extension or modification to the current protocol, and students should be willing to make suggestions and recommendations, if that is required to meet the end goal.

File collision avoidance

Mentors: John 'Warthog9' Hawley, Greg Lund-Chaix

Difficulty: Easy-Medium

SyncDiff(erent) retains enough information about its own current state, and the state of its remote syncs, to detect whether a file has been changed in both places. It is desirable to detect this reliably and take action based on user-configurable options. The simplest approach would be to detect the conflict, declare one copy the "good" copy, create a secondary .conflict.hostname file, and let that conflict file sync back out to the other hosts, eventually letting the user decide which file should actually win.

Ideally the resolution mechanism would be implemented as a pluggable module, so that new policies can be written.
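
Detection can be framed as a three-way comparison. If the hash of a file as of the last successful sync is remembered, a conflict exists when the local and remote hashes both differ from it and from each other. The Python sketch below shows that check plus a pluggable policy registry. The function names, the `policies` dictionary and the `keep_local` policy are hypothetical. Only the `.conflict.hostname` naming comes from the description above, and using the remote host's name in it is an assumption.

```python
from typing import Callable, Dict, Optional, Tuple


def classify(base: Optional[str], local: Optional[str], remote: Optional[str]) -> str:
    """Compare content hashes against the last-synced hash (base).

    None means the file does not exist on that side.
    """
    if local == remote:
        return "in_sync"
    if local == base:
        return "pull"        # only the remote copy changed
    if remote == base:
        return "push"        # only the local copy changed
    return "conflict"        # both sides changed, differently


# A policy decides which copy stays at the original path and where the
# other copy is saved. It returns (winner, path_for_losing_copy).
Policy = Callable[[str, str, str], Tuple[str, str]]
policies: Dict[str, Policy] = {}


def register(name: str):
    """Decorator that adds a policy to the registry under `name`."""
    def wrap(fn: Policy) -> Policy:
        policies[name] = fn
        return fn
    return wrap


@register("keep_local")
def keep_local(path: str, local_host: str, remote_host: str) -> Tuple[str, str]:
    # Local copy wins. The remote copy is kept as <path>.conflict.<remote_host>
    # so it syncs back out and the user can decide later.
    return "local", f"{path}.conflict.{remote_host}"


def resolve(policy_name: str, path: str, local_host: str, remote_host: str):
    return policies[policy_name](path, local_host, remote_host)
```

New policies, for example "newest modification time wins", would simply be additional functions registered under their own names.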

Requirements:

  • Perl
  • Understanding of IPC and multi-process programs

Automatic Detection and notification of file changes

Mentors: John 'Warthog9' Hawley, Greg Lund-Chaix

Difficulty: Medium

SyncDiff(erent) retains a lot of information about its own current state; one thing that still needs to be done is integrating a live file watcher based on inotify-style events. This would run as an independent process and notify remotes that files have changed, so they can schedule a synchronization.
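
The watcher itself would sit on inotify, which Python's standard library does not wrap. The Python sketch below therefore covers only what happens after events arrive: it coalesces a burst of change events into a single JSON notification once things go quiet, so remotes are not flooded. The class name, the message fields (`type`, `host`, `paths`) and the quiet-period batching are hypothetical choices.

```python
import json
import time
from typing import Optional


class ChangeBatcher:
    """Collect changed paths and emit one notification per quiet period.

    feed() is called for every filesystem event (e.g. from an inotify loop).
    flush_if_quiet() returns a JSON message once no new events have arrived
    for `quiet_seconds`, and None otherwise.
    """

    def __init__(self, host: str, quiet_seconds: float = 2.0):
        self.host = host
        self.quiet_seconds = quiet_seconds
        self.pending = set()
        self.last_event: Optional[float] = None

    def feed(self, path: str, now: Optional[float] = None) -> None:
        self.pending.add(path)
        self.last_event = time.monotonic() if now is None else now

    def flush_if_quiet(self, now: Optional[float] = None) -> Optional[str]:
        if not self.pending:
            return None
        now = time.monotonic() if now is None else now
        if now - self.last_event < self.quiet_seconds:
            return None
        message = {"type": "changed", "host": self.host, "paths": sorted(self.pending)}
        self.pending.clear()
        return json.dumps(message)
```

Waiting for a quiet period means a file being written in many small chunks produces one notification rather than dozens.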

Requirements:

  • Perl
  • Understanding of IPC and multi-process programs
  • Understanding of inotify
  • JSON

Soft Delete Support

Mentors: John 'Warthog9' Hawley, Greg Lund-Chaix

Difficulty: Medium

One feature that would be nice to add to the way SyncDiff(erent) synchronizes data is a "soft" delete. Rather than deleting a file outright, SyncDiff(erent) would move it into a folder it ignores by default, so the file disappears from the normal hierarchy; after a configurable period, the file is fully deleted from the file system. This addresses several problems:

  • transient errors on the server side
  • accidental file deletions
  • files that disappear and then reappear

When a file is soft-deleted, it is moved to the "hidden" area and its space is not yet reclaimed. Should a file with the same hash reappear at a new location, the stored copy can be copied there; if the file comes back at its original location, the stored copy can simply be moved back. This can considerably speed up transfers and shorten recovery time after an "oops", at the cost of some additional disk space on the client side.
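
A minimal Python sketch of the hidden area follows, keyed by content hash. It assumes SHA-256 hashes, a hypothetical trash directory that the synchronizer already ignores, and an in-memory record of when each entry was deleted. None of these names come from SyncDiff(erent) itself. `os.replace` only works when the trash directory is on the same file system as the synced tree.

```python
import hashlib
import os
import shutil
import time
from typing import Dict, Optional


def file_hash(path: str) -> str:
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


class SoftDeleteStore:
    def __init__(self, trash_dir: str, retention_seconds: float):
        self.trash_dir = trash_dir
        self.retention = retention_seconds
        self.deleted_at: Dict[str, float] = {}  # hash -> time moved to trash
        os.makedirs(trash_dir, exist_ok=True)

    def _slot(self, digest: str) -> str:
        return os.path.join(self.trash_dir, digest)

    def soft_delete(self, path: str, now: Optional[float] = None) -> str:
        """Move the file into the trash under its hash instead of unlinking it."""
        digest = file_hash(path)
        os.replace(path, self._slot(digest))  # same file system assumed
        self.deleted_at[digest] = time.time() if now is None else now
        return digest

    def restore(self, digest: str, dest: str, move: bool = True) -> bool:
        """Reuse a trashed copy for `dest`; return False if none is stored.

        move=True suits a file returning to its original path. move=False
        copies it and keeps the trashed copy for other reappearances.
        """
        slot = self._slot(digest)
        if digest not in self.deleted_at or not os.path.exists(slot):
            return False
        if move:
            os.replace(slot, dest)
            del self.deleted_at[digest]
        else:
            shutil.copy2(slot, dest)
        return True

    def purge(self, now: Optional[float] = None) -> int:
        """Permanently delete entries older than the retention period."""
        now = time.time() if now is None else now
        expired = [d for d, t in self.deleted_at.items() if now - t >= self.retention]
        for digest in expired:
            os.remove(self._slot(digest))
            del self.deleted_at[digest]
        return len(expired)
```

A real implementation would also need to persist `deleted_at` across restarts and tell remotes which hashes it can supply locally. Both of these are outside this sketch.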

Requirements:

  • Perl
  • Understanding of IPC and multi-process programs
  • File systems knowledge

This may require extension or modification to the current protocol, and students should be willing to make suggestions and recommendations, if that is required to meet the end goal.

TLS implementation

Mentors: John 'Warthog9' Hawley, TBD

Difficulty: Hard

Right now SyncDiff(erent) transfers files using a slightly custom protocol over TCP. This project would add TLS to the communications channel to encrypt all traffic being transferred. The communication protocol uses JSON.
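
Python's standard `ssl` module can illustrate the client side. The existing TCP socket is wrapped before any protocol bytes are exchanged, and the server certificate is verified against a CA bundle. A certificate revocation list (CRL) can optionally be loaded so revoked server certificates are rejected. This is a sketch for illustration only, since SyncDiff(erent) is written in Perl. The function names, file paths and the TLS 1.2 minimum are assumptions.

```python
import socket
import ssl
from typing import Optional


def client_context(cafile: Optional[str] = None,
                   crlfile: Optional[str] = None) -> ssl.SSLContext:
    """Build a verifying client context.

    cafile: CA bundle for the sync server's certificate (system store if None).
    crlfile: optional PEM certificate revocation list. When given, the peer's
    own (leaf) certificate is checked against it.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if crlfile is not None:
        ctx.load_verify_locations(cafile=crlfile)
        ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF
    return ctx


def connect_tls(host: str, port: int, ctx: ssl.SSLContext) -> ssl.SSLSocket:
    """Open a TCP connection, then run the TLS handshake on it.

    The JSON protocol would run unchanged on top of the returned socket.
    """
    raw = socket.create_connection((host, port))
    try:
        return ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

Keeping the TLS setup in one context object means the protocol code only swaps a plain socket for a wrapped one. The server side would use `ssl.Purpose.CLIENT_AUTH` and load its own certificate and key.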

Requirements:

  • Perl
  • Understanding of IPC and multi-process programs
  • Basic understanding of TCP/IP
  • Knowledge of Wireshark and the ability to run it
  • Understanding of TLS communications, including how certificate revocation works

This may require extension or modification to the current protocol, and students should be willing to make suggestions and recommendations, if that is required to meet the end goal.

How things work / What do I do now / How do I proceed:

For many of you this may be your first year participating, or you may never have worked with us before, so let's break this down:

  • February 25th - March 10th: Students are encouraged to start looking at the project, join the IRC channel, talk on the mailing list and generally interact with us. This will help you get a feel for who we are, and whether you want to work with us. This is also a great time to ask about the ideas we have posted and to start thinking about what you might want to work on.

  • March 10th - March 21st: Students should be working on their applications. This is a collaborative process: we want you to ask questions, get feedback, make revisions and make your proposal as good as possible. Holing up and not talking to us will make your proposal suffer.

  • March 21st - April 6th: This is where our organization is going to differ from some others. Students will be asked to sit down with us for a one-hour IRC-based interview during this time. We will discuss your proposal and ask you questions, and there will be a coding exercise or two. Students shouldn't feel the need to cram or worry much about this; it is mostly a time to chat with the student and trade questions. This interview is, however, required by our organization.

  • March 21st - April 21st: In general, this period is for more discussion between the students and the organization. Submit bugs and patches, and talk with the mentors. A general hint: lobbing the application at us and disappearing will count against you. Stick around and keep up the conversation you've already started.

  • April 21st: Notifications about acceptance go out to students, at which point the summer really starts!