[Main visual: CrackSense “Spotlight on” interview about Data Engineering and Process Optimization]

Spotlight on: Data Engineering and Process Optimization Up Close

EXPERT IN THE SPOTLIGHT: Teodora Savic Popovic is a professional data engineer, a PhD student in applied mathematics at the University of Novi Sad, Serbia, and an enthusiastic cyclist. Her expertise spans telecommunications and supply chain data management as well as machine learning, and she offers insights into the critical role of data engineering and its impact on modern businesses.

TOPIC IN THE SPOTLIGHT: In the world of data-driven decision-making, the role of data engineers keeps growing. These professionals are the architects behind the scenes, responsible for constructing the frameworks that process, manage, and organize vast amounts of data. Their expertise lies in transforming raw data into meaningful insights, thereby enabling businesses to make informed decisions. In this interview we explore the intricacies, challenges, and captivating stories behind the scenes of data engineering.

With a diverse portfolio of expertise, our guest today, Teodora Savic Popovic, will share intriguing anecdotes and provide the most recent insights into the field of data engineering and data management.

1. To begin with, it is a great pleasure to have you as a guest in the CrackSense “Spotlight on” session. Please provide us with a short summary of your background, your work, and your participation in the data engineering sector.

To keep it short: I work in the IT industry as a data engineer, with experience ranging from telecommunications to supply chain management in the food production industry, and I am also a PhD student in Applied Mathematics at the University of Novi Sad in Serbia.

2. From the perspective of a data engineer, what are the crucial points to address at the very beginning of the project, particularly in the context of data accessibility and technology evaluation for business reports?
As in every project, expectations must be transparent, and the deadlines and milestones must be well defined. Technology constraints must be highlighted to clients, but without going into too much technical detail. For example, in one project I was involved in, the clients and business analysts expected live streaming of data, but the whole architecture was based on a daily refresh rate! So the goals of the project had to be changed along the way, which resulted in a little bit of disappointment on the clients’ and business analysts’ side.
[Image: CrackSense “Spotlight on” interview 2 – Teodora Savic Popovic; source: freepik.com]
3. Could you share an engaging anecdote or experience from your work in supply chain data management regarding duplicate data?

Of course! As I mentioned in our small talk, a well-defined data flow is essential, and a well-defined data flow certainly does not tolerate duplicates. Anyway, the following task landed on my desk: find out why the table that records load deliveries contains duplicates. And really, for one driver we had duplicate deliveries of the same load. Of course, I traced the data lineage from its entry into the data staging area all the way to its export to the DWH and the business reports, and I could not find a single logical mistake.

On the call with the client, I described the issue to him. After thinking about it for a while, he shouted: “Wait a minute! Read me the driver’s ID and name again. I think I know what is going on! I know who the driver is – old Joe is in his seventies. Surely he was practising how to use the application in which he has to mark the load delivery, so while exploring the application he probably entered the same load twice!” So, anyway, sometimes unclean data can tell a real-life story about its contributors.
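A check for exactly this kind of duplicate can be sketched very briefly. The snippet below is a minimal illustration, assuming a pandas-based staging step with hypothetical column names (driver_id, load_id, delivered_at); it is not the actual pipeline from the project described above.

```python
import pandas as pd

# Hypothetical extract of the load-delivery staging table;
# column names and values are illustrative, not the real project schema.
deliveries = pd.DataFrame({
    "driver_id": [101, 101, 102, 103],
    "load_id": ["L-7", "L-7", "L-8", "L-9"],
    "delivered_at": pd.to_datetime([
        "2023-05-02 08:15", "2023-05-02 08:17",
        "2023-05-03 09:40", "2023-05-03 11:05",
    ]),
})

# Flag every row where the same driver has marked the same load more than once,
# before the data is exported to the DWH and business reports.
duplicates = deliveries[
    deliveries.duplicated(subset=["driver_id", "load_id"], keep=False)
]
print(duplicates)
```

In a real pipeline a check like this would typically run before the export to the DWH, so that the business reports never see the duplicated rows.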

4. How would you describe the intersection between data engineering and client relationships in your field?
Building strong client relationships is paramount. Collaborating closely with business analysts, managers, and clients helps us deliver the KPIs that are essential for insightful reports. Also, trust and responsibility must exist between us all; that is not only essential for the current project but also opens the door to new projects and deals.
[Image: farmer using a tablet computer to analyse agricultural data; source: freepik.com]
5. Across different media today, data science and ML have become strong buzzwords and seem to promise very disruptive changes across different sectors, including agriculture. CrackSense, for instance, promises to utilise the latest advancements in AI and ML to mitigate the risk of fruit cracking. Could you elaborate on the significance of clean data in the realm of ML and its relevance to businesses?

Well, data science, AI and ML really can be found in every soup today. But just to make it clear: there is no efficient machine learning without clean and reliable data. You must have clean data on which to train and test your machine learning model, because you want to avoid the garbage-in, garbage-out scenario. And to do so, you must have well-structured data, and well-structured data comes from good data preparation and engineering. I will put it this way: first on the plate comes data engineering, then for dessert we have machine learning.

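To make the “data engineering first, machine learning for dessert” point concrete, here is a minimal sketch assuming pandas and scikit-learn, with entirely made-up sensor columns; it illustrates the cleaning-before-modelling order and is not CrackSense’s actual model.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical orchard sensor readings; columns and values are illustrative only.
raw = pd.DataFrame({
    "humidity": [0.61, 0.61, None, 0.75, 0.80],
    "rainfall_mm": [2.0, 2.0, 1.1, 5.3, 6.0],
    "cracking_rate": [0.05, 0.05, 0.02, 0.12, 0.15],
})

# Data engineering first: drop duplicate rows and rows with missing values
# so the model never trains on "garbage in".
clean = raw.drop_duplicates().dropna()

# Then the "dessert": fit a simple model on the cleaned features.
model = LinearRegression().fit(
    clean[["humidity", "rainfall_mm"]], clean["cracking_rate"]
)
print(model.coef_)
```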

6. What are some types of data issues that pose challenges for both data engineers and business analysts?

Well, data issues manifest in various forms, but let’s mention just a few of them. For example, discrepancies arising from faulty ETL (Extract, Transform, Load) logic pose significant challenges for both data engineers and business analysts. Also, erroneous data due to user input errors adds complexity to data processing (keep in mind old Joe’s scenario with his duplicate loads).
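Both classes of issue are often caught with simple validation rules wired into the ETL step. The sketch below is a hypothetical example in pandas; the column names and checks are invented purely for illustration.

```python
import pandas as pd

def validate_deliveries(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality issues found in a delivery extract."""
    issues = []
    # User-input errors, like old Joe marking the same load twice.
    if df.duplicated(subset=["driver_id", "load_id"]).any():
        issues.append("duplicate load deliveries")
    # Faulty transform logic often surfaces as impossible values downstream.
    if (df["load_weight_kg"] <= 0).any():
        issues.append("non-positive load weights")
    if df["delivered_at"].isna().any():
        issues.append("missing delivery timestamps")
    return issues

# Illustrative extract containing one duplicated delivery.
sample = pd.DataFrame({
    "driver_id": [101, 101],
    "load_id": ["L-7", "L-7"],
    "load_weight_kg": [1200, 1200],
    "delivered_at": pd.to_datetime(["2023-05-02 08:15", "2023-05-02 08:17"]),
})
print(validate_deliveries(sample))  # ['duplicate load deliveries']
```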

Final thoughts 

So, at the end of the story, we can conclude the following: for a successful project, it is essential to have good collaboration between all the players involved, to focus on data preparation and cleaning, and to build efficient data pipelines based on business logic that deliver reliable reports. In CrackSense, consortium partners with overlapping but distinct areas of expertise join forces to explore the possibilities of different data collection and sensory tools, and to design efficient, effective, and insightful data pipeline systems to combat fruit cracking.