Most analytical data processing systems perform a similar set of tasks: data ingestion, data storage, data transformation, data querying, and data visualization. Together, these tasks form a pipeline of services for building an analytical system.
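The stages can be pictured as functions chained so that each one consumes the previous stage's output. The sketch below assumes toy, hypothetical stand-ins for each stage (plain Python lists instead of real storage and query engines). It only illustrates the shape of the pipeline, not how any particular system implements it.

```python
from typing import Callable, Iterable

# Each stage is a function that takes the previous stage's output.
Stage = Callable[[object], object]

def run_pipeline(data: object, stages: Iterable[Stage]) -> object:
    """Pass data through each stage in order and return the final result."""
    for stage in stages:
        data = stage(data)
    return data

# Hypothetical stand-ins for the five stages.
def ingest(source):        # capture raw records from a source
    return list(source)

def store(records):        # persist records (here: just copy them into a list)
    return list(records)

def transform(records):    # e.g. drop missing values
    return [r for r in records if r is not None]

def query(records):        # e.g. compute a summary
    return {"count": len(records), "total": sum(records)}

def visualize(summary):    # e.g. render a simple text report
    return f"count={summary['count']} total={summary['total']}"

report = run_pipeline([3, None, 4, 5], [ingest, store, transform, query, visualize])
print(report)  # count=3 total=12
```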

The data being ingested can come from many different sources:
- it could be taken from devices that (for example) measure environmental information such as temperature and pressure
- it could come from point-of-sale devices that record the items customers purchase in a supermarket
- it could be financial data recording the movement of money between bank accounts
- it could be weather data coming from a weather station
- some of the data can also come from your OLTP systems or your on-premises relational databases.
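As a small illustration of ingestion, the sketch below parses readings from an environmental sensor into typed records. It assumes the device delivers CSV text with hypothetical columns `device_id`, `timestamp`, `temperature` and `pressure`; real devices and feeds will use their own formats and transport.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

@dataclass
class Reading:
    device_id: str
    timestamp: str
    temperature_c: float
    pressure_hpa: float

def ingest_readings(raw: str) -> List[Reading]:
    """Parse CSV text from a (hypothetical) environmental sensor into typed records."""
    reader = csv.DictReader(io.StringIO(raw))
    return [
        Reading(
            device_id=row["device_id"],
            timestamp=row["timestamp"],
            temperature_c=float(row["temperature"]),
            pressure_hpa=float(row["pressure"]),
        )
        for row in reader
    ]

raw_feed = """device_id,timestamp,temperature,pressure
sensor-1,2024-01-01T10:00:00,21.5,1013.2
sensor-2,2024-01-01T10:00:00,19.0,1011.8
"""

readings = ingest_readings(raw_feed)
print(len(readings), readings[0].temperature_c)  # 2 21.5
```

Converting every source into one common record type at ingestion time makes the later storage and transformation stages simpler, because they no longer need to know where each record came from.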
To process and analyze this data, we must first store it in a repository, such as a file store, a document database, or a relational database. We can then transform and process the stored data:
- perform cleaning operations (data cleansing) and data standardization
- perform data aggregations, such as calculating profit, margin, or any other key performance indicators (KPIs) used to evaluate how the business is growing and how it performs against the industry or the market
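The sketch below strings storage, cleansing, standardization and aggregation together using an in-memory SQLite database as the relational store. The point-of-sale rows are hypothetical, and margin is assumed to mean profit divided by revenue.

```python
import sqlite3

# Hypothetical point-of-sale rows: (product, sale_price, unit_cost).
raw_sales = [
    (" apples ", 3.00, 2.00),
    ("APPLES", 3.00, 2.00),
    ("Bread", 2.50, 1.50),
    ("bread", None, 1.50),  # missing price: removed during cleansing
]

conn = sqlite3.connect(":memory:")  # in-memory relational store for the sketch
conn.execute("CREATE TABLE sales (product TEXT, price REAL, cost REAL)")

# Cleansing: drop rows with missing values.
# Standardization: trim and lower-case product names so variants group together.
clean = [
    (product.strip().lower(), price, cost)
    for product, price, cost in raw_sales
    if price is not None and cost is not None
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean)

# Aggregation: revenue, profit and margin (profit / revenue) per product.
rows = conn.execute(
    """
    SELECT product,
           SUM(price)                     AS revenue,
           SUM(price - cost)              AS profit,
           SUM(price - cost) / SUM(price) AS margin
    FROM sales
    GROUP BY product
    ORDER BY product
    """
).fetchall()

for product, revenue, profit, margin in rows:
    print(f"{product}: revenue={revenue:.2f} profit={profit:.2f} margin={margin:.1%}")
```

Without the standardization step, `" apples "` and `"APPLES"` would be aggregated as two different products.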
The data can be processed in batches or as a stream. With batch processing:
- newly arrived data elements are collected into a group, and the whole group is processed at a future time as a batch
- processing runs at a scheduled time interval (for example, every hour) or in response to some trigger
- the advantage is that large volumes of data can be processed at a convenient time
- the disadvantage is that there is often a time delay between ingesting the data and getting the results
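A minimal sketch of batch processing, assuming a hypothetical `BatchProcessor` that buffers sale amounts and processes the whole group (here, summing it) once a scheduled interval has elapsed. The current time is passed in explicitly to keep the example deterministic; a real system would rely on a scheduler or trigger.

```python
from typing import List, Optional

class BatchProcessor:
    """Collects incoming records and processes the whole group once per interval."""

    def __init__(self, interval_s: float = 3600.0):
        self.interval_s = interval_s
        self.buffer: List[float] = []
        self.window_start: Optional[float] = None

    def add(self, value: float, now: float) -> Optional[float]:
        """Buffer a record; if the interval has elapsed, process the previous batch first."""
        if self.window_start is None:
            self.window_start = now
        result = None
        if now - self.window_start >= self.interval_s:
            result = self.flush()
            self.window_start = now
        self.buffer.append(value)
        return result

    def flush(self) -> Optional[float]:
        """Process the current batch (here: total sales) and clear the buffer."""
        if not self.buffer:
            return None
        total = sum(self.buffer)
        self.buffer = []
        return total

batcher = BatchProcessor(interval_s=3600)
events = [(0, 10.0), (600, 5.0), (3000, 2.5), (3700, 4.0), (4000, 1.0)]
results = []
for t, amount in events:
    batch_total = batcher.add(amount, now=t)
    if batch_total is not None:
        results.append(batch_total)
results.append(batcher.flush())
print(results)  # [17.5, 5.0]
```

The sale recorded at `t=0` is not reflected in any result until `t=3700`, which is the time delay described above.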
With stream processing:
- data is handled and processed in real time, as it arrives
- there is no waiting for a scheduled batch, which is beneficial for scenarios where new, dynamic data is generated continuously
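By contrast, a stream processor updates its result as each record arrives. The sketch below keeps a running average of temperature readings; the generator `sensor_feed` is a hypothetical stand-in for a continuous source such as a sensor or message queue.

```python
from typing import Iterable, Iterator, Tuple

def running_average(readings: Iterable[float]) -> Iterator[Tuple[float, float]]:
    """Yield (reading, running mean) immediately for each reading as it arrives."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield value, total / count

def sensor_feed() -> Iterator[float]:
    """Hypothetical continuous source; here a short fixed sequence."""
    yield from [20.0, 22.0, 24.0, 18.0]

for value, avg in running_average(sensor_feed()):
    print(f"reading={value} running_avg={avg:.2f}")
```

Only a count and a total are kept in memory, so the same approach works for a feed that never ends.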
Once the data has been processed, it can be queried. Many database management systems provide tools that let you run ad hoc queries against your data and generate regular reports.
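The difference between an ad hoc query and a regular report can be sketched with SQLite. The `transfers` table and its rows are hypothetical; the ad hoc query answers a one-off question, while `daily_report` is a fixed query that would be run on a schedule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfers (account TEXT, amount REAL, day TEXT)")
conn.executemany(
    "INSERT INTO transfers VALUES (?, ?, ?)",
    [
        ("ACC-1", 250.0, "2024-01-01"),
        ("ACC-2", 100.0, "2024-01-01"),
        ("ACC-1", -75.0, "2024-01-02"),
    ],
)

# Ad hoc query: a one-off question, e.g. the net movement for one account.
(net,) = conn.execute(
    "SELECT SUM(amount) FROM transfers WHERE account = ?", ("ACC-1",)
).fetchone()
print(net)  # 175.0

# Regular report: a fixed query that would be run on a schedule, e.g. daily totals.
def daily_report(connection: sqlite3.Connection):
    return connection.execute(
        "SELECT day, SUM(amount) FROM transfers GROUP BY day ORDER BY day"
    ).fetchall()

print(daily_report(conn))  # [('2024-01-01', 350.0), ('2024-01-02', -75.0)]
```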
Once we have the results of our queries, we usually want to present them visually; this is the data visualization stage. In this stage we can:
- generate charts, such as bar charts and line charts, and plot geographical data onto maps
- illustrate how data changes over time and pick out the trends in your data
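Charts are normally drawn with a charting library or reporting tool. As a dependency-free stand-in, the sketch below renders a text bar chart of hypothetical monthly sales and estimates the trend as the least-squares slope of the values against their position in time.

```python
from typing import Dict, List

def text_bar_chart(values: Dict[str, float], width: int = 20) -> List[str]:
    """Render a horizontal bar chart as text lines, scaled to the largest value."""
    largest = max(values.values())
    lines = []
    for label, value in values.items():
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<4} {bar} {value:g}")
    return lines

def trend_slope(series: List[float]) -> float:
    """Least-squares slope of the series against its index (positive = upward trend)."""
    n = len(series)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

monthly_sales = {"Jan": 120, "Feb": 150, "Mar": 90, "Apr": 180}
print("\n".join(text_bar_chart(monthly_sales)))
print(f"trend: {trend_slope(list(monthly_sales.values())):+.1f} per month")  # trend: +12.0 per month
```

The dip in March stands out in the chart, while the positive slope shows that the overall trend is still upward.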