Quality’s is a product that fills a gap in the current market for tools that check data quality at intake. Our focus is to give customers a powerful tool that monitors the quality of their data in real time. Quality’s is a system based on rules and analytic patterns, in which users define the rules to apply to their data at the moment it arrives.
Quality defines several kinds of rules:
- Cell-based rules: analyse each value on its own, detecting nulls, checking regular expressions, measuring completeness against a source of truth, and more.
- Unicity rules: detect whether duplicated data is arriving at the system.
- Timeliness rules: detect whether arriving data is still valid in time, that is, whether its lifespan has not expired.
- Static Value Distribution: detect whether the histogram or distribution of some key indicators of the data has shifted by more than a given threshold.
- Dynamic Value Distribution: measure ad-hoc metrics at each arrival and compare them using time-series analysis in order to detect anomalies.
- Rules for each quality dimension: veracity, validity, timeliness, unicity, completeness, and more.
- Ad-hoc rules: created by users to cover quality checks not listed above.
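As a rough illustration of what these rule types check, the sketch below expresses a few of them in plain Python. It is not Quality's API: every name in it (`not_null`, `matches`, `is_fresh`, `duplicates`, `distribution_shift`, `is_anomaly`) is hypothetical, and the distance and z-score measures are just one reasonable choice for each kind of rule.

```python
import re
import statistics
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical helpers for illustration only; none of these names come from Quality.

def not_null(field):
    """Cell-based rule: the field must be present and not None."""
    return lambda record: record.get(field) is not None

def matches(field, pattern):
    """Cell-based rule: the field must be a string that fully matches the regex."""
    compiled = re.compile(pattern)
    return lambda record: (isinstance(record.get(field), str)
                           and compiled.fullmatch(record[field]) is not None)

def is_fresh(field, max_age, now):
    """Timeliness rule: the timestamp in `field` must be at most `max_age` old."""
    return lambda record: now - record[field] <= max_age

def duplicates(records, key):
    """Unicity rule: return the set of key values that occur more than once."""
    seen, dupes = set(), set()
    for record in records:
        value = record[key]
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return dupes

def distribution_shift(baseline, current):
    """Static distribution rule: total variation distance between the value
    frequencies of two non-empty samples (0 = identical, 1 = disjoint)."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    b, c = Counter(baseline), Counter(current)
    keys = set(b) | set(c)
    return 0.5 * sum(abs(b[k] / len(baseline) - c[k] / len(current)) for k in keys)

def is_anomaly(history, value, z_threshold=3.0):
    """Dynamic distribution rule: flag a metric whose z-score against its own
    history exceeds the threshold."""
    mean = statistics.mean(history)
    std = statistics.pstdev(history)
    if std == 0:
        return value != mean
    return abs(value - mean) / std > z_threshold
```

A static distribution rule would then compare `distribution_shift(baseline, current)` against a configured threshold, while a dynamic one would call something like `is_anomaly` on each new metric value as it arrives.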
Quality is designed to work with both your real-time ingestion system and your batch data loads. It is an easy, non-invasive mechanism to integrate with your ingestion flow: your Quality Department defines the Quality Point, the Data Source and the Rules. Applying the quality rules to your ingestion processes then takes only two lines of code, after which Quality applies the rules to your data and collects all of its quality metrics.
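To make the division of labour concrete, here is a minimal sketch of how such an integration could look. It is an assumption-laden model in plain Python, not Quality's actual interface: the `QualityPoint` class, its `apply` method, and the `orders` example data are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable

Record = Dict[str, object]
Rule = Callable[[Record], bool]

@dataclass
class QualityPoint:
    """Hypothetical stand-in for a Quality Point: named rules bound to a data source."""
    name: str
    data_source: str
    rules: Dict[str, Rule] = field(default_factory=dict)

    def apply(self, records: Iterable[Record]) -> Dict[str, Dict[str, int]]:
        """Evaluate every rule on every record and return pass/fail counts per rule."""
        metrics = {rule_name: {"passed": 0, "failed": 0} for rule_name in self.rules}
        for record in records:
            for rule_name, rule in self.rules.items():
                outcome = "passed" if rule(record) else "failed"
                metrics[rule_name][outcome] += 1
        return metrics

# Defined once, for example by the Quality Department (illustrative values).
orders_point = QualityPoint(
    name="orders-intake",
    data_source="orders",
    rules={
        "id_not_null": lambda r: r.get("id") is not None,
        "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float))
        and r["amount"] >= 0,
    },
)

# Inside the ingestion job, applying the rules is a single call on the batch.
batch = [{"id": 1, "amount": 9.5}, {"id": None, "amount": 3.0}, {"id": 3, "amount": -1}]
metrics = orders_point.apply(batch)
```

The point of the sketch is the split: rule definitions live with the people who own data quality, and the ingestion code only references the quality point and applies it.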
Quality also lets you configure, in a single line of code, a flow-control mechanism that stops the flow, warns or informs when erroneous data arrives.
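One way to picture the three flow-control behaviours is the sketch below. The names `OnError`, `QualityCheckFailed` and `control_flow` are hypothetical and chosen only for this example; the sketch assumes that "stop" means aborting the ingestion with an exception, and that "warn" and "inform" differ only in log severity.

```python
import logging
from enum import Enum

logger = logging.getLogger("quality.flow_control")

class OnError(Enum):
    """Hypothetical flow-control modes mirroring stop, warn and inform."""
    STOP = "stop"
    WARN = "warn"
    INFORM = "inform"

class QualityCheckFailed(RuntimeError):
    """Raised to halt ingestion when the mode is STOP and records failed."""

def control_flow(failed_count, on_error):
    """Decide what happens after the rules have run.

    Returns True when ingestion may continue; raises QualityCheckFailed
    when records failed and the configured mode is STOP.
    """
    if failed_count == 0:
        return True
    message = f"{failed_count} record(s) failed quality rules"
    if on_error is OnError.STOP:
        raise QualityCheckFailed(message)
    if on_error is OnError.WARN:
        logger.warning(message)
    else:
        logger.info(message)
    return True
```

In an ingestion job this would be the single extra line, for example `control_flow(failed_count, OnError.WARN)`, placed right after the rules are applied.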
At the moment we support Spark and Flink, two of the most important data processing frameworks in the Big Data ecosystem.