The Price of Data

Going Odoo Episode 4

If you prefer audio, click above to listen to the podcast. If you prefer a written format, take a look at the show notes below.


Data buzzwords are becoming more common than ever. It's a struggle to find a business article that doesn't talk about the rise of machine learning and AI, or the growing importance of big data. But for ERPs, data is more than just a buzzword: it's at the heart of how they work. So today I want to look at what data you really need and how to get it.
It comes as a surprise to a lot of clients just how expensive data can get, and how easily it can set back the rollout. Odoo themselves have stated that if you want every bit of transactional data imported, it can make up over 50% of your implementation cost. For this reason, it's advisable to pay attention to data from the start.

My Approach

The way I do this is with the infamous six W's: Who, What, When, Where, Why, and How. Interrogating the data like this will help you decide what data is needed, and the approach to be taken.


The first port of call is the what. What data are we looking at? How many records are there? Is it every facet of that data, or are we just importing a summary or a subset? For instance, for inventory we summarise by importing the stock on hand at the changeover date, not every historic stock move, while for purchase orders we may take a subset and import only those we are still waiting on for delivery, not the completed orders.
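As a sketch of the inventory case above, collapsing historic stock moves into a single on-hand figure per product might look like this in Python. The field names `product`, `qty` and `direction` are placeholders for whatever your old system actually exports:

```python
from collections import defaultdict

def summarise_stock_on_hand(moves):
    """Collapse historic stock moves into on-hand quantities per product.

    Each move is a dict with placeholder keys 'product', 'qty' and
    'direction' ('in' or 'out') -- adjust to match your export format.
    """
    on_hand = defaultdict(float)
    for move in moves:
        sign = 1 if move["direction"] == "in" else -1
        on_hand[move["product"]] += sign * move["qty"]
    # Only import products actually in stock at the changeover date.
    return {product: qty for product, qty in on_hand.items() if qty > 0}

moves = [
    {"product": "DESK-01", "qty": 10, "direction": "in"},
    {"product": "DESK-01", "qty": 4, "direction": "out"},
    {"product": "CHAIR-02", "qty": 5, "direction": "in"},
    {"product": "CHAIR-02", "qty": 5, "direction": "out"},
]
print(summarise_stock_on_hand(moves))  # {'DESK-01': 6.0}
```

Thousands of moves become a handful of rows, which is exactly the kind of summarising that keeps import costs down.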


Next is where. Where is your data coming from? Is there a single source of truth, or do we need to marry up multiple data sets? You may have customers in your sales system and in your accounting system, but no link between them. Every time more than one system holds the same information, time and complexity can rise rapidly.
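One way to marry up two customer lists is to pick a matching key, here a normalised email address, and merge on it. This is only a sketch with made-up field names; real matching usually needs fuzzier rules (names, phone numbers, and a manual review of whatever doesn't match):

```python
def merge_customers(sales_customers, accounting_customers):
    """Marry up two customer lists using email as the matching key.

    Field names ('name', 'email', 'payment_terms') are placeholders --
    substitute whatever your two exports actually contain. When both
    systems have a field, the accounting record wins.
    """
    merged = {}
    for cust in sales_customers:
        key = cust["email"].strip().lower()
        merged[key] = dict(cust)
    for cust in accounting_customers:
        key = cust["email"].strip().lower()
        if key in merged:
            # Layer accounting fields over the sales record.
            merged[key].update({k: v for k, v in cust.items() if k != "email"})
        else:
            merged[key] = dict(cust)  # exists only in accounting
    return list(merged.values())

sales = [{"name": "Acme", "email": "ap@acme.com"}]
accounting = [{"name": "Acme Ltd", "email": "AP@acme.com", "payment_terms": 30}]
print(merge_customers(sales, accounting))
# [{'name': 'Acme Ltd', 'email': 'ap@acme.com', 'payment_terms': 30}]
```

Deciding which system wins on conflicting fields is a business decision, not a technical one, and is worth agreeing on before anyone writes a script.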


It's time to get critical with the why. Why are we importing this data? 
When starting a new project, I break data into three categories:

Functional Data: Data that is required to make the system run moving forward. This can be entity data (you can't run an online store without products) or transactional data (if you're cutting off access to the old system, you need to continue current work). These are generally the non-negotiables.

Legacy Data: Data that needs to be accessible, but only in certain cases. This may mean keeping copies of old delivery receipts in case a vendor requests them, or accounting records for government compliance, or even just old sales so your staff can look up a customer's regular orders.

Analytical Data: Data that is there purely for measuring and improving business performance. While this often crosses over with legacy data, I define it as data for internal use only. So if you've tracked the time breakdown of employees on a project, but it doesn't change the billing and the customer can't access the information, it's analytical data.

This is the first time we really get critical and decide whether data needs to be imported at all. Often we can simply export the data from your old system and refer to that export whenever it's needed. All too often we spend hours getting exported data into the system only for it to be accessed a handful of times. In those cases we would have been better off accepting the small time loss of searching through the exports and investing our time in other parts of the system.


With the data categorised, we can now ask when. Simply put, functional data is required before going live, and legacy and analytical data generally aren't. While it would be beneficial for some of this data to be in beforehand, it's important to be critical and decide what data is worth pushing back the project for if it can't be done in time. Again, categorising each dataset will give everyone a clearer understanding: is it required for go-live, preferred before go-live, or fine after go-live?
In most cases this data can be imported just as easily after go-live, so there's no loss from waiting.

Who and How?

And lastly we have who and how. How are we accessing, cleaning and inputting the data, and who is responsible for each of these tasks? Of all the questions, these have the most variations. 

The first step is getting the data out. Depending on where the data is located, you may be able to do a simple export to Excel, you may be able to use an API, or you may have to request a backup of your data from the provider.

Next is cleaning the data. Often other systems will not have all of the same information as Odoo, or not in the same format. This is especially true for product templates and product variants. Some systems will have the same parent-child relationship, while others will have no relationship, so we need to create one.
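If the old system stores variants as flat SKUs with no relationship, one possible starting point is to derive the template from the SKU itself. This sketch assumes a hypothetical "TEMPLATE-VARIANT" naming convention; real data is rarely this tidy, so treat anything like this as a first pass to be reviewed by hand:

```python
from collections import defaultdict

def build_templates(skus):
    """Group flat SKUs into template -> variants, assuming a hypothetical
    'NAME-VARIANT' naming convention like 'TSHIRT-RED' or 'TSHIRT-BLUE'.
    SKUs with no dash become single-variant templates.
    """
    templates = defaultdict(list)
    for sku in skus:
        base, _, variant = sku.partition("-")
        templates[base].append(variant or sku)
    return dict(templates)

print(build_templates(["TSHIRT-RED", "TSHIRT-BLUE", "MUG"]))
# {'TSHIRT': ['RED', 'BLUE'], 'MUG': ['MUG']}
```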

And lastly, how are we inputting the data? This could be through importing an Excel file, programmatically with an API, or through manual input. While many people shy away from manually inputting data, there are cases where it's actually a really good option. My favourite is manufacturing companies with a small number of bills of materials. Instead of trying to clean up all of the references to different products and units of measure, as part of training I sit down with anyone who will be adding or configuring bills of materials in the future and we put them in by hand. Not only do we then have the data, the staff are well versed in that part of the system and can ask any questions they need.
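For the spreadsheet route, Odoo's built-in importer accepts a CSV with an `id` column holding an external ID, which lets a re-run update existing records instead of duplicating them. A minimal sketch, assuming product columns `name` and `list_price` (check the column names against an export from your own database, and note the `__import__` prefix is just an example namespace; any `module.name` pair works):

```python
import csv
import io

def to_import_csv(products):
    """Build a CSV string in the shape Odoo's CSV importer expects.

    A stable external ID in the 'id' column makes the import idempotent:
    re-importing the same file updates records rather than duplicating them.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "name", "list_price"])
    writer.writeheader()
    for p in products:
        writer.writerow({
            # '__import__' is an example namespace, not a requirement.
            "id": f"__import__.product_{p['sku'].lower()}",
            "name": p["name"],
            "list_price": p["price"],
        })
    return buf.getvalue()

print(to_import_csv([{"sku": "DESK-01", "name": "Desk", "price": 120.0}]))
```

The same rows could instead be pushed programmatically through Odoo's external API; the CSV route just has the advantage that a non-developer can review the file before it goes in.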


Taking the time to answer these questions will help ensure only the right data is imported, and lower the number of surprises that can pop up later in the project. Because let's face it, no one wants to spend 50% of their budget on data.