Wednesday, July 20, 2011

So that's what happened to Wave

I tried Wave, was unimpressed, but felt like it would follow the classic hype curve. At any rate, I'm working on my Master of Science in Information Science at the School of Information and Library Science. It's a wonderful program, though challenging with family + full-time job + music gig. I'm holding it together (but procrastinating on some homework right now).

I have been interested in collaborative search and sense-making since back in the day at RENCI, when I was looking at InfoMesa, and recent research in my summer class has me thinking about this again. Many aspects of collaborative search and sense-making were captured in the original intent of Wave. I never bought that it was a new form of communication, or that it was revolutionary in that respect. What I still think is that it has some aspects of a platform for collaborative search and sense-making. I especially like how it can combine time-shifted asynchronous activities with synchronous activities. What's most compelling is to look at what Wave was doing as an application platform. Collaborative search and knowledge discovery could be a great application for Wave.

I guess I've been busy working, because I'd hardly noticed that Wave has gone over to Apache as 'Wave in a Box'. While it appears to be in early incubator stage (they probably don't know what to do with it either), it bears watching.

Tuesday, July 19, 2011

Working on releasing Jargon-core

I'm working on a beta release of Jargon core, and all the fun involved in setting up the maven release plug-in for git and Nexus. Needless to say, I've made a few U-turns.

Here's my tip (I'm saving this for myself) on deleting the git tag to tackle some config errors:


Thanks Nathan!

Saturday, July 16, 2011

Recent Jargon Updates

There's a lot of activity on various types of interfaces, but I wanted to update folks on things at the API level, especially in the jargon-core API.

https://code.renci.org/gf/project/jargon/

This is late beta. We are working on moving to a schedule of releases, but frankly, we first need to work out maven, git, and the maven release plug-in. There is a lot going on and we're trying to get that done.

Some highlights:

  • The big push has been to put a new public API out that is easier to use and maintain. I think this is shaping up (you can tell me whether that's so). There are plenty of capabilities that have never been exposed outside of the C API that are either in there, or planned.
  • There is a lot of implementing and testing going on in several places, providing a nice amount of friction to help the API come along. As things settle in, we're starting to be able to shift perspective more to optimization. There are lots more places where buffering is implemented, and we are taking baby steps toward looking at NIO. With the presence of the RENCI team, we're starting to stand up resources where we could start doing benchmarks to help guide optimization.
  • We will be working to pull the jargon-trunk and php code out of the main iRODS trunk into separate areas for the next release. The schedules are pretty tight, so that's still a tentative plan.
  • Several key community wish list items are under development. Some things that may be of interest are below:

Configuration, setting options for operations, defaulting

With the 'clean code' practice of separating code from metadata in mind, there is a jargon.properties file now. This is where config info is being consolidated over time. The IRODSSession object is the place where expensive stuff like loading configuration properties, extensible metadata mappings, and such occurs. You can access the default jargon properties there, or override them. (So in Spring, you can wire in configured properties at startup for your web app, etc).
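To make the "defaults, or override them" idea concrete, here is a minimal sketch using only java.util.Properties. Everything in it is an assumption for illustration: the class names and the property keys are made up and are not the real jargon.properties keys, and it does not touch IRODSSession at all.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical settings holder for this sketch; not a Jargon class. */
final class SketchTransferSettings {
    final boolean useParallelTransfer;
    final int maxTransferThreads;

    private SketchTransferSettings(boolean useParallelTransfer, int maxTransferThreads) {
        this.useParallelTransfer = useParallelTransfer;
        this.maxTransferThreads = maxTransferThreads;
    }

    /** Reads the (made-up) keys, falling back to built-in defaults when a key is absent. */
    static SketchTransferSettings fromProperties(Properties props) {
        boolean parallel = Boolean.parseBoolean(
                props.getProperty("sketch.transfer.use.parallel", "true"));
        int threads = Integer.parseInt(
                props.getProperty("sketch.transfer.max.threads", "4"));
        return new SketchTransferSettings(parallel, threads);
    }
}

final class PropertiesSketch {
    public static void main(String[] args) throws IOException {
        // Stand-in for the contents of a properties file loaded once at startup.
        Properties loaded = new Properties();
        loaded.load(new StringReader("sketch.transfer.max.threads=8\n"));

        // An application can layer its own overrides on top of what was loaded;
        // lookups fall through to 'loaded' when a key is not overridden.
        Properties overrides = new Properties(loaded);
        overrides.setProperty("sketch.transfer.use.parallel", "false");

        SketchTransferSettings settings = SketchTransferSettings.fromProperties(overrides);
        System.out.println(settings.useParallelTransfer + " " + settings.maxTransferThreads);
    }
}
```

The point of the layering is that the expensive load happens once, and a web app can still wire in its own values without editing the file.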

When you do transfers, you can call methods like below:

/**
 * Put a file or a collection (recursively) to iRODS. This method allows
 * registration of a TransferStatusCallbackListener that will
 * provide callbacks about the status of the transfer suitable for progress
 * monitoring
 *
 * @param sourceFile
 *            File with the source directory or file.
 * @param targetIrodsFile
 *            {@link org.irods.jargon.core.pub.io.IRODSFile} with the target
 *            iRODS file or collection.
 * @param transferStatusCallbackListener
 *            an optional TransferStatusCallbackListener that receives
 *            callbacks about the status of the transfer. This may be set
 *            to null if not required.
 * @param transferControlBlock
 *            an optional
 *            {@link org.irods.jargon.core.transfer.TransferControlBlock}
 *            that provides a common object to communicate between the
 *            object requesting the transfer, and the method performing the
 *            transfer. This control block may contain a filter that can be
 *            used to control restarts, and provides a way for the
 *            requesting process to send a cancellation. This may be set to
 *            null if not required.
 * @throws JargonException
 */
void putOperation(
        final File sourceFile,
        final IRODSFile targetIrodsFile,
        final TransferStatusCallbackListener transferStatusCallbackListener,
        final TransferControlBlock transferControlBlock)
        throws JargonException;

I wanted to point out the 'transferStatusCallbackListener'. You can implement this interface, pass it to the method, and get callbacks like initiation of operation, file transfer update, completion of operation. We are working on also providing a callback for 'messages', like 'starting parallel transfer', or 'computing checksum' so those can start surfacing in UI like iDrop. There are plans soon to provide optional 'intra-file' callbacks, so you could throw up a progress bar for what's going on inside of a file transfer. Look for that soon. Of course, you can leave that null if you don't care.
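Here is a rough sketch of the callback pattern described above, for anyone who wants to see the shape of it. Note the assumptions: SketchStatusListener and its three methods are stand-ins I made up to mirror "initiation, file transfer update, completion", they are not the actual TransferStatusCallbackListener interface, and the "transfer" just walks a list of names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a transfer status listener; not the Jargon interface. */
interface SketchStatusListener {
    void onStart(int totalFiles);
    void onFileTransferred(String name, int filesSoFar, int totalFiles);
    void onComplete(int totalFiles);
}

/** Simulated transfer that reports progress to an optional listener. */
final class SketchTransfer {
    static void put(List<String> files, SketchStatusListener listener) {
        if (listener != null) listener.onStart(files.size());
        int done = 0;
        for (String name : files) {
            done++; // a real transfer would move bytes here
            if (listener != null) listener.onFileTransferred(name, done, files.size());
        }
        if (listener != null) listener.onComplete(files.size());
    }
}

/** Listener that records events; a UI would update a progress bar instead. */
final class RecordingListener implements SketchStatusListener {
    final List<String> events = new ArrayList<>();

    @Override public void onStart(int totalFiles) {
        events.add("start " + totalFiles);
    }
    @Override public void onFileTransferred(String name, int filesSoFar, int totalFiles) {
        events.add("file " + name + " " + filesSoFar + "/" + totalFiles);
    }
    @Override public void onComplete(int totalFiles) {
        events.add("complete " + totalFiles);
    }
}

final class ListenerSketch {
    public static void main(String[] args) {
        RecordingListener listener = new RecordingListener();
        SketchTransfer.put(Arrays.asList("a.txt", "b.txt"), listener);
        listener.events.forEach(System.out::println);

        // Passing null is fine when nobody cares about progress.
        SketchTransfer.put(Arrays.asList("c.txt"), null);
    }
}
```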

The 'transferControlBlock' is important. It is the communication pipeline between the thing calling the transfer and the transfer itself. You can peek at aggregate numbers, set a cancel flag, etc. There is now also a transferOptions that can be set on it. The transfer options can include things like 'no parallel transfers', 'max transfer threads', buffer sizes, or the recently added 'compute and verify checksum'. You can either set transfer options in that transfer control block, or leave them alone and defaults will be computed based on the settings in the jargon properties. The checksum and checksum validation functions were just added (a community request), and the rerouting of gets/puts to the owning resource server should be there soon.
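The control block idea can be sketched in a few lines: one shared object that carries a cancel flag, running totals, and optional per-transfer options, with defaults used when no options are set. All of the names below (SketchControlBlock, SketchOptions, the default values) are hypothetical and only illustrate the pattern; they are not the real TransferControlBlock or its options.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-transfer options; not the Jargon transfer options class. */
final class SketchOptions {
    final int maxThreads;
    final boolean verifyChecksum;

    SketchOptions(int maxThreads, boolean verifyChecksum) {
        this.maxThreads = maxThreads;
        this.verifyChecksum = verifyChecksum;
    }
}

/** Hypothetical control block shared by the caller and the transfer. */
final class SketchControlBlock {
    private final AtomicBoolean cancelled = new AtomicBoolean(false);
    private final AtomicInteger filesTransferred = new AtomicInteger();
    private volatile SketchOptions options; // null means "use the defaults"

    void cancel() { cancelled.set(true); }
    boolean isCancelled() { return cancelled.get(); }
    void fileDone() { filesTransferred.incrementAndGet(); }
    int getFilesTransferred() { return filesTransferred.get(); }
    void setOptions(SketchOptions options) { this.options = options; }
    SketchOptions getOptions() { return options; }
}

final class ControlBlockSketch {
    // Stand-in for defaults that would be computed from configuration.
    static final SketchOptions DEFAULTS = new SketchOptions(4, false);

    /** Options set on the block win; otherwise fall back to the defaults. */
    static SketchOptions effectiveOptions(SketchControlBlock block) {
        return (block != null && block.getOptions() != null) ? block.getOptions() : DEFAULTS;
    }

    /** Simulated transfer that checks the cancel flag before each file. */
    static void transfer(int totalFiles, SketchControlBlock block) {
        for (int i = 0; i < totalFiles; i++) {
            if (block.isCancelled()) {
                return;
            }
            block.fileDone();
        }
    }

    public static void main(String[] args) {
        SketchControlBlock block = new SketchControlBlock();
        block.setOptions(new SketchOptions(1, true)); // e.g. no parallel threads, verify checksum
        transfer(3, block);
        System.out.println(block.getFilesTransferred() + " files, maxThreads="
                + effectiveOptions(block).maxThreads);
    }
}
```

In a real client the cancel would come from another thread (a button in a UI, say), which is why the flag and the counter are atomic in this sketch.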

Parallel Transfer Pools

Another community request in the optimization department was to set up a thread pool for parallel transfer threads. This has been done in a first implementation, but is turned off by default. This should be handy in mid-tier apps, where you might want to cap the number of threads. More testing needs to be done, and there are open questions: should the pool be a fixed size, what should the time-outs be, etc.
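For a feel of what capping the threads buys you, here is a small sketch using a plain java.util.concurrent fixed-size pool. This is not the jargon-core pool implementation, just the standard library pattern; the class name, the simulated "chunks", and the sleep are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class PooledTransferSketch {

    /**
     * Runs 'chunks' simulated transfer tasks on a pool capped at 'maxThreads'
     * and returns the highest number of tasks observed running at once.
     */
    static int runChunks(int chunks, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < chunks; i++) {
                futures.add(pool.submit(() -> {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(20); // stand-in for moving one chunk of a file
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    active.decrementAndGet();
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // wait for every chunk to finish
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("simulated transfer failed", e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // Eight chunks, but never more than three in flight at a time.
        System.out.println("peak concurrency: " + runChunks(8, 3));
    }
}
```

However many chunks are queued, the pool never runs more than maxThreads of them at once, which is the property a mid-tier app serving many users would care about.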

GSS/Kerberos, etc

There are threads here about security, GSS, etc. This is a high-priority item from the user group meeting, and it is next on the list of 'core' development. Folks who are really facile with GSS, Kerberos and the like, and who have comments and insight, are invited to give input. I think once this part is in jargon-core, we can go to a full-on 'release'. The code is pretty solid, much more so than the older API. We're seeing, and will continue to see, performance gains, and it's much easier to use in mid-tier situations, or if you are 'Spring'-happy. Please let me know your experiences, comments, criticisms, and keep an eye out!