The purpose of the Ediflow project is to investigate models and algorithms, and to propose an architecture for a system that helps scientists organize and make the most of their data. The research spans three related yet distinct areas, between which we expect it to build bridges: workflow modeling; database execution and optimization; and information visualization. The milestones of the work are as follows:
An appropriate process model dialect has been identified for modeling user-dataset interactions in scientific applications. The model allows for the usual process composition primitives (in the style of BPML or the Workflow Management Coalition model), extended so that the user can inspect, interrupt, resume, and guide the evolution of all or part of the active instances, with little effort, at any point. This interactivity is crucial for the success of real-life data processing applications; the design of the process model therefore strives to capture, to the extent possible, the various interactions that rich data manipulation interfaces allow.
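The interactive primitives named above (inspect, interrupt, resume, guide) can be sketched as a minimal Java class; all class and method names here are illustrative assumptions, not the actual Ediflow API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a process instance supporting the interactive
// operations the model calls for. Names are illustrative, not Ediflow's.
enum InstanceState { RUNNING, SUSPENDED }

class ProcessInstance {
    private InstanceState state = InstanceState.RUNNING;
    private final List<String> trace = new ArrayList<>();

    // Execute one activity; refused while the instance is suspended.
    void step(String activity) {
        if (state != InstanceState.RUNNING) {
            throw new IllegalStateException("instance is " + state);
        }
        trace.add(activity);
    }

    void interrupt() { state = InstanceState.SUSPENDED; } // user pauses
    void resume()    { state = InstanceState.RUNNING; }   // user continues

    // Inspect: a read-only view of the activities executed so far.
    List<String> inspect() { return List.copyOf(trace); }

    // Guide: while suspended, the user injects the next activity and
    // the instance continues from there.
    void guide(String nextActivity) {
        trace.add(nextActivity);
        state = InstanceState.RUNNING;
    }
}
```

The point of the sketch is that suspension and guidance are first-class operations on a running instance, rather than something bolted on after process deployment.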
A prototype platform called Ediflow has been developed to support processes described in the above model. The platform is built in Java and deployed on top of the Oracle database management system; it integrates visualization and computation units.
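One way such an integration could be organized is to place computation and visualization units behind a common interface driven by the engine. This is a sketch under assumed names (`Unit`, `Engine`, etc.), not the actual Ediflow classes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: computation and visualization units share one
// interface so the engine can dispatch data to both uniformly.
interface Unit {
    void process(List<Double> column); // consume one batch of values
}

class ComputationUnit implements Unit {
    double mean; // last computed result
    public void process(List<Double> column) {
        mean = column.stream().mapToDouble(Double::doubleValue).average().orElse(0);
    }
}

class VisualizationUnit implements Unit {
    final List<String> rendered = new ArrayList<>(); // stands in for real drawing
    public void process(List<Double> column) {
        rendered.add("plot of " + column.size() + " points");
    }
}

class Engine {
    private final List<Unit> units = new ArrayList<>();
    void register(Unit u) { units.add(u); }
    void push(List<Double> column) { // dispatch a batch to every registered unit
        for (Unit u : units) u.process(column);
    }
}
```

With this shape, adding a new kind of unit (e.g. an export or monitoring unit) only requires implementing the common interface.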
Performance and optimizations will be studied based on these applications. Blending powerful querying and visualization over large data volumes poses several performance challenges, ranging from internal system issues such as access path selection (where "fast" means on the order of seconds or minutes) to performance at the level of the user interface (where a response time of several seconds is considered slow!). The purpose of this stage is to study how the data storage layer can best serve the needs of a workflow engine driven by frequent, extensive user interaction. Caching, replication, and vertical fragmentation are promising avenues.
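As one illustration of the caching avenue, a small LRU result cache placed in front of the storage layer can keep recently used query results within interactive response times. The sketch below builds on `java.util.LinkedHashMap`'s access-order mode; the `QueryCache` name and the fixed-capacity policy are assumptions made for the example:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an LRU cache for query results, evicting the
// least-recently-accessed entry once capacity is exceeded.
class QueryCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    QueryCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true enables LRU ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict when over capacity
    }
}
```

A workflow engine could consult such a cache before issuing a query to the database, refreshing entries only when the underlying tables change; replication and vertical fragmentation would attack the same latency problem from the storage side.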