The StreamX Platform: Powerful and Easy to Use


Capture: Record the data stream using your own system or the StreamX Data Recorder. The StreamX Data Recorder can capture multi-sensor data at up to 4.4 GB/s, eliminating the need for a separate recorder for each sensor stream. Up to 256 TB of removable storage holds extended data collection sessions; the storage can then be removed and shipped to the data center without disconnecting the sensors.
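As a rough back-of-the-envelope check on what "extended" means here, the sketch below estimates how long the full storage capacity lasts at the peak capture rate. It assumes decimal units (1 TB = 1000 GB) and a sustained peak rate; the function name is illustrative and not part of any StreamX tool.

```python
def recording_hours(capacity_tb: float, rate_gb_per_s: float) -> float:
    """Hours until storage fills at a sustained write rate.

    Assumes decimal units (1 TB = 1000 GB); this is an illustrative
    estimate, not a StreamX specification.
    """
    seconds = (capacity_tb * 1000.0) / rate_gb_per_s
    return seconds / 3600.0


hours = recording_hours(256, 4.4)
print(f"{hours:.1f} hours")  # prints "16.2 hours"
```

Real sessions that run below the peak rate would last proportionally longer.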


Store: The incoming data must be incorporated into the distributed storage cluster and organized so that the StreamX tools can see it. This is done with an import function that associates each of the streams in a multi-sensor dataset with a time-synchronized project. When the StreamX Data Recorder is used, the data is already stored in projects and is immediately accessible.


Tag: Once the data is recognized by the system, the next step is to tag, or annotate, the data based on the content that matters to the application. The user provides one or more image or pattern recognition algorithms in the form of a small application that runs on the data and indicates when the content of interest is detected in a particular data frame. The StreamX platform uses those mini-applications to crawl through the data distributed across the cluster nodes in the background, mining the data in parallel for interesting content. When the content is found, the system tags every timestamp in the data file where that content appears. A timestamp can carry many different tags to indicate multiple types of content or multiple versions of the detection algorithm.
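The tagging idea can be sketched in a few lines. This is a minimal illustration only: the frame layout, the detector signature, and the "name:version" tag naming are assumptions made for the example, not the StreamX mini-application interface.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple

Frame = dict  # hypothetical stand-in for one decoded data frame
Detector = Callable[[Frame], bool]  # hypothetical detector signature


def tag_frames(
    frames: Iterable[Tuple[float, Frame]],
    detectors: Dict[str, Detector],
) -> Dict[float, Set[str]]:
    """Run every detector on every frame and collect tags per timestamp."""
    tags: Dict[float, Set[str]] = defaultdict(set)
    for timestamp, frame in frames:
        for tag_name, detect in detectors.items():
            if detect(frame):
                tags[timestamp].add(tag_name)
    return dict(tags)


# Toy data: each frame lists the objects a sensor saw at that timestamp.
frames = [
    (0.0, {"objects": ["car"]}),
    (0.1, {"objects": ["car", "pedestrian"]}),
    (0.2, {"objects": []}),
]

# Tag names carry a version suffix so several detector versions can coexist.
detectors = {
    "pedestrian:v1": lambda f: "pedestrian" in f["objects"],
    "vehicle:v1": lambda f: "car" in f["objects"],
}

tags = tag_frames(frames, detectors)
print(tags)
```

Timestamps where no detector fires simply receive no tags, and one timestamp may hold several tags at once.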


Search: When a user wants to find specific sequences of data within any of the large multi-sensor data files, the user sets up a parallel search through a web-based interface. The search criteria are the user-defined tags set up during the tagging and mining phase, and the criteria can be as simple or as complex as needed. The search runs in parallel on every data node in the cluster and returns a list of time sequences that match the search criteria. The search results can be stored, partitioned, and managed for repeated testing and validation.
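To make "a list of time sequences" concrete, the sketch below shows one way a tag search could work on a single node: it requires every listed tag to be present and merges consecutive matching timestamps into (start, end) ranges. The data layout and function name are assumptions for illustration; the actual StreamX search criteria can be more complex.

```python
from typing import Dict, List, Set, Tuple


def find_sequences(
    tags_by_time: Dict[float, Set[str]],
    required: Set[str],
    timeline: List[float],
) -> List[Tuple[float, float]]:
    """Return (start, end) ranges of consecutive timestamps
    whose tags include every required tag."""
    sequences: List[Tuple[float, float]] = []
    start = prev = None
    for t in timeline:
        if required <= tags_by_time.get(t, set()):
            if start is None:
                start = t  # a new matching run begins here
            prev = t
        elif start is not None:
            sequences.append((start, prev))  # the run just ended
            start = None
    if start is not None:
        sequences.append((start, prev))  # close a run that reaches the end
    return sequences


timeline = [0.0, 1.0, 2.0, 3.0, 4.0]
tags_by_time = {
    0.0: {"vehicle"},
    1.0: {"vehicle", "pedestrian"},
    2.0: {"vehicle", "pedestrian"},
    4.0: {"vehicle", "pedestrian"},
}

matches = find_sequences(tags_by_time, {"vehicle", "pedestrian"}, timeline)
print(matches)  # prints [(1.0, 2.0), (4.0, 4.0)]
```

In a cluster, each node would run the same search over its local data and the per-node lists would be combined into the stored result set.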


Use: Once the user has identified which sections of each data file are needed, the user launches an application, such as a simulator, into the cluster through a web-based interface. The application runs in parallel on every node where any of the target data is located. It is the same application that runs on a desktop or workstation, with no recoding required, and it is often large, complex, and proprietary. The application results are collected and stored separately. The process can be saved so that the same test run can be repeated automatically, in the same way, in the future.
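The "run where the data lives" idea can be sketched as follows. This is a local stand-in only: the node names, the segment tuples, and run_on_node are hypothetical, and a thread pool takes the place of the cluster launch mechanism, which this document does not describe.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical search result: (node, data file, start time, end time).
Segment = Tuple[str, str, float, float]


def group_by_node(segments: List[Segment]) -> Dict[str, List[Tuple[str, float, float]]]:
    """Group the target segments by the node that stores them."""
    by_node: Dict[str, List[Tuple[str, float, float]]] = defaultdict(list)
    for node, path, start, end in segments:
        by_node[node].append((path, start, end))
    return dict(by_node)


def run_on_node(node: str, work: List[Tuple[str, float, float]]) -> Tuple[str, int]:
    """Stand-in for launching the unmodified application on one node.
    Here it only reports how many segments it was handed."""
    return node, len(work)


segments = [
    ("node-a", "drive1.dat", 1.0, 2.0),
    ("node-b", "drive1.dat", 4.0, 4.5),
    ("node-a", "drive2.dat", 0.0, 3.0),
]

plan = group_by_node(segments)

# Threads simulate the per-node runs executing in parallel.
with ThreadPoolExecutor() as pool:
    results = dict(pool.map(lambda item: run_on_node(*item), plan.items()))

print(results)
```

Because the work is grouped by node first, each run only touches data that is already local to it, and the per-node results are collected into one place afterward.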


Automate: Every part of the process can be saved and automated to save time and provide reliable testing and validation. Complex searches can be saved and re-run after new datasets are imported. Parallel execution runs can be saved and re-run on updated sets of search results. Training runs can be set up with thousands or millions of examples to train a classifier. Simulators can be given slightly different input parameters automatically to vary the environmental test conditions, making testing and validation more robust. The test automation system supports active feedback, so that the results from one simulation run can be used as input to the next.
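Active feedback can be pictured as a simple loop in which each run's results determine the next run's parameters. The sketch below is illustrative only: the toy simulator, the "fog" parameter, and the update rule are invented for the example and do not reflect any StreamX interface.

```python
from typing import Callable, Dict, List

Params = Dict[str, float]
Result = Dict[str, float]


def run_with_feedback(
    simulate: Callable[[Params], Result],
    update: Callable[[Params, Result], Params],
    initial: Params,
    runs: int,
) -> List[Result]:
    """Run the simulation repeatedly, feeding each result into the
    function that chooses the next run's parameters."""
    history: List[Result] = []
    params = dict(initial)
    for _ in range(runs):
        result = simulate(params)
        history.append(result)
        params = update(params, result)
    return history


def toy_simulate(params: Params) -> Result:
    # Toy model: the detection score drops as fog density rises.
    return {"fog": params["fog"], "score": 1.0 - params["fog"]}


def toy_update(params: Params, result: Result) -> Params:
    # Make conditions harder only while the score stays above 0.5.
    step = 0.25 if result["score"] > 0.5 else 0.0
    return {"fog": params["fog"] + step}


history = run_with_feedback(toy_simulate, toy_update, {"fog": 0.0}, runs=4)
print([r["fog"] for r in history])  # prints [0.0, 0.25, 0.5, 0.5]
```

The loop stops raising the difficulty once the score reaches the threshold, which is one simple way a saved test sequence could home in on the conditions where a system starts to fail.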