
URL: https://thenewstack.io/is-this-the-end-of-data-refactoring/

Is This the End of Data Refactoring? - The New Stack


For “extract/transform/load,” are you transforming your application’s data for the purposes of your applications or the requirements of your database?
Nov 18th, 2022 7:52am by Jennifer Reif
Featured image via Pixabay.
Neo4j sponsored this post.

In one of my early coding projects, I remember going through requirements meetings, hashing out a data model, drawing up a database design and submitting it to a DBA team for review and approval. There were numerous back-and-forth communications on naming, data types and structure conventions. Weeks later, the tables were created in the development environment so that I could ingest test data, and build and test the code against it.

When requirements changed, when stakeholders’ understanding of the data model evolved, when test data iterations produced different results or when scope creep set in, we would start the data model → database design → DBA review/approval → development creation process all over again.

A project like this isn’t a one-off. Typically, a lot of project hours are burned in each iteration, through development and production as well as maintenance and enhancement. By reducing or eliminating the translation step between a data model and database design, you can dramatically improve time to market and cut maintenance costs down the road. That’s something I wish I could’ve done back then.

What Is Data Refactoring?

The roots of data refactoring likely trace back to code refactoring of computer programs. As Martin Fowler defines refactoring in his book by that title, it’s “a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior.”

But how does refactoring apply to data and databases?

Code refactoring is often done after an initial draft, or when improvements or features need to be added to the code. It can happen before or after a project’s initial implementation, and in either case it makes code cleaner, more efficient and more maintainable. Some refactorings may also occur because of changes to the data structure supporting the code. Such changes often stem from a deeper understanding of the data model, as well as additions to or removals from the collected data.

Refactoring applies to the data and the data model as well. While code refactoring can happen without data model refactoring, code is nearly always affected by changes to the underlying data model. Data refactoring therefore drives refactors of the model, the storage system and the code. Each piece needs careful planning and thorough understanding, which consumes considerable project resources.


Refactoring pays off in better data and better code. So isn’t the time spent well worth it? What’s the trouble here?

The data refactoring component of a project comes with several tasks, including:

  • Making changes to the data model to fit the new case, feature, etc.
  • Aligning the data model with the storage technology’s format, such as relational, graph or other. Each type of data storage format comes with its own set of rules, so we often end up making further changes to the data to accommodate the required structure.

This “translation” step between the data model and the data storage format is what can be cumbersome, especially when each change needs approval from project managers to confirm that the database’s structured format still aligns with the real-world configuration of the data. Graph databases can shorten or cut the translation phase because they more naturally model data as it exists in the real world.

Data Refactoring for Databases

In a previous article, we used data sets for a coffee shop with sales receipts, products and customers. We can use this same data set to look at data refactoring, but in this instance, focus on a different section of the data: stores and their staff assignments. All of the data, change scripts and more in these examples may be downloaded from this GitHub repository.

Our example data set is in Tables 1 and 2 below. Shop locations include data about the location type, size, address details, phone number and assigned manager. The staff table contains names, positions, start date and assigned shop location for each associate.

Table 1: The shop location table

Table 2: The staff table

A graph representation of this data could look something like Figure 1 below.

Figure 1: Graph representation of shop locations and staff

In this graph model, we have two main entities (nodes): Shop and Staff. The relationships between these nodes tell us how they are connected: a staff member is assigned to work at a location, and a shop is managed by a particular staff member.

Next, let’s look at how a few specific refactorings would affect each of these models.

Refactoring 1: Adding a New Column

Storing additional information is a common change among database projects. In our coffee shop case, we might want to also track the open date for a shop location (how long a location has been in operation). For a relational format, the change process would involve adding a new column to the table. This likely also entails:

  • Explaining the change to stakeholders.
  • Writing and executing a data definition language (DDL) statement to alter the table structure, or dropping the entire table and rebuilding it with the new column in the DDL.
  • Adding the new data to the population set.
  • Finally, importing the actual data into the table.

Some of these tasks may require additional change-approval steps in between. This change does not affect the staff table, so no changes would be required there. The result of this refactoring would look like Table 3.

Table 3: Refactor No. 1 for shop location table
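To make the relational steps concrete, here is a minimal sketch using Python’s built-in sqlite3 module and an in-memory database. It is not one of the repository’s scripts: the trimmed-down shop_location columns, the open_date column name and all sample values are hypothetical placeholders, since the real columns appear only in the table images.

```python
import sqlite3

# In-memory database standing in for the relational store.
# Hypothetical, trimmed-down version of Table 1; real column names may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE shop_location (
           location_id   INTEGER PRIMARY KEY,
           location_type TEXT,
           address       TEXT,
           manager_id    INTEGER
       )"""
)
conn.execute("INSERT INTO shop_location VALUES (3, 'retail', '123 Example St', 6)")

# Step 1: DDL change. Existing rows get the new column with a NULL value.
conn.execute("ALTER TABLE shop_location ADD COLUMN open_date TEXT")

# Step 2: populate the new column for existing rows (placeholder date).
conn.execute(
    "UPDATE shop_location SET open_date = ? WHERE location_id = ?",
    ("2017-03-01", 3),
)
conn.commit()

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(shop_location)")]
print(columns)  # ['location_id', 'location_type', 'address', 'manager_id', 'open_date']
```

ALTER TABLE keeps the existing rows here; the drop-and-rebuild route from the list above would instead mean re-creating the table and reloading all of its data.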

Meanwhile, for a graph format, we would need to add a new property to the Shop node. As with the relational process above, we would need to explain the change to stakeholders and get any necessary change approvals. However, the steps of altering the table structure, or dropping it and rebuilding it with the addition, disappear with a graph because there is no strict DDL. In a graph, the structure is not forced on the data. Rather, the data itself determines the structure and can adapt when the information it represents shifts, as depicted in Figure 2.

Figure 2: Refactor No. 1 for graph
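For contrast, a toy in-memory property graph in plain Python illustrates the schema-free side of the argument. This is not Neo4j or its driver: the Node class, the node ids and the openDate property name are illustrative assumptions. The point is only that a new property is attached to the nodes that have the data, with no structural step beforehand.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A labeled node with an open-ended set of properties (toy model)."""
    label: str
    props: dict = field(default_factory=dict)


# Hypothetical Shop nodes; ids and property values are placeholders.
nodes = {
    "shop-3": Node("Shop", {"type": "retail", "address": "123 Example St"}),
    "shop-5": Node("Shop", {"type": "retail", "address": "456 Sample Ave"}),
}

# Refactor 1: record an open date. There is no ALTER step; the property is
# simply set on the node, loosely analogous to a Cypher SET clause in Neo4j.
nodes["shop-3"].props["openDate"] = "2017-03-01"

for node_id, node in nodes.items():
    # Nodes without the new property are untouched and just lack the key.
    print(node_id, node.props.get("openDate"))
```

In practice you would set the property on every Shop whose open date is known; the sketch sets just one to show that the others need no change.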

Example scripts for both relational and graph processes are included in our code repository on GitHub.

Refactoring 2: Adding a New Table and Relationship

Next, suppose we have issues with staffing coverage at our locations. We want to retain employee addresses to help us determine who might be able to cover a shift at another location.

While we could store employee addresses directly in the Staff table, addresses are more likely to change than other data, and we might want to keep staff members’ personal details separate from their business information. Instead, we can create a separate table for addresses. That requires DDL for the new table, inserts for the address data and a new foreign key column on the Staff table that links each staff row to its address.

Table 4: Adding foreign key column address_id to Staff table

Table 5: New staff_address table
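A minimal sqlite3 sketch of the relational side of this refactoring follows. The address_id foreign key mirrors Table 4; every other column name and all sample values are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

# Existing (hypothetical, trimmed-down) Staff table with one row.
conn.execute("CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, name TEXT, location_id INTEGER)")
conn.execute("INSERT INTO staff VALUES (1, 'Staff A', 3)")

# 1. DDL for the new table.
conn.execute(
    """CREATE TABLE staff_address (
           address_id INTEGER PRIMARY KEY,
           street     TEXT,
           city       TEXT
       )"""
)
# 2. DDL for the new foreign key column on the existing table (defaults to NULL).
conn.execute(
    "ALTER TABLE staff ADD COLUMN address_id INTEGER REFERENCES staff_address(address_id)"
)
# 3. Insert the address data, then link each staff row to its address.
conn.execute("INSERT INTO staff_address VALUES (10, '123 Example St', 'Example City')")
conn.execute("UPDATE staff SET address_id = 10 WHERE staff_id = 1")
conn.commit()

row = conn.execute(
    """SELECT s.name, a.street
         FROM staff AS s
         JOIN staff_address AS a ON a.address_id = s.address_id"""
).fetchone()
print(row)  # ('Staff A', '123 Example St')
```

Note that the existing staff table itself had to change (step 2), which is exactly the kind of structural edit that needs its own review and approval.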

For the graph, we only need to add the data for new StaffAddress nodes and the relationships connecting them to Staff nodes, as shown in Figure 3. Existing data is not affected, so we would not need to alter Staff entities in the database.

Figure 3: Refactor No. 2 for graph
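Continuing the toy-graph illustration (plain Python, not a Neo4j API), the sketch below adds a StaffAddress node and a relationship while leaving the existing Staff node untouched. The StaffAddress label comes from the article; the Graph class, the WORKS_AT and HAS_ADDRESS relationship types, the ids and the values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy property graph: labeled nodes plus typed, directed relationships."""
    nodes: dict = field(default_factory=dict)  # node id -> (label, properties)
    rels: list = field(default_factory=list)   # (start id, type, end id)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def add_rel(self, start, rel_type, end):
        self.rels.append((start, rel_type, end))

    def neighbors(self, node_id, rel_type):
        return [end for start, t, end in self.rels if start == node_id and t == rel_type]


g = Graph()
g.add_node("staff-1", "Staff", name="Staff A")
g.add_node("shop-3", "Shop", address="123 Example St")
g.add_rel("staff-1", "WORKS_AT", "shop-3")

# Snapshot the existing Staff node's properties before the refactor.
staff_before = dict(g.nodes["staff-1"][1])

# Refactor 2: a new StaffAddress node plus one relationship; nothing is altered.
g.add_node("addr-10", "StaffAddress", street="456 Sample Ave", city="Example City")
g.add_rel("staff-1", "HAS_ADDRESS", "addr-10")

print(g.neighbors("staff-1", "HAS_ADDRESS"))  # ['addr-10']
print(g.nodes["staff-1"][1] == staff_before)  # True
```

Unlike the relational version, no foreign key column is added to Staff; the link lives in the relationship itself.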

Refactoring 3: Adding Data to Existing Tables

For our third and final refactoring, business is booming, and we might want to add newly hired staff members to a new shop location.

With the relational structure, creating a new staff member in the table means adding a shop location assignment to make the row complete. We would likely need some sort of dependency rule (a foreign key constraint) ensuring that we cannot insert a value into the Staff table’s location column unless it exists in the Shop_Location table. If we want to add the new staff member’s address to the Staff_Address table, we’d need to set up the same guardrails for that table as well. As a result, adding a new staff member at a new location requires us to first create the location, then the staff member, and then their address. Doing these steps in the wrong order results in errors.

Table 6: Shop_Location table with new row

Table 7: Staff table with 2 new staff

Table 8: Staff_Address table with 2 new addresses
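The ordering problem can be sketched with sqlite3 and foreign key enforcement switched on. Table and column names loosely follow the article, but the exact columns and all values are hypothetical, and only one of the new hires is shown. Because address_id on the staff row is nullable in this sketch, the address can be added after the staff member and linked with an UPDATE, matching the order described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the dependency rules
conn.executescript(
    """
    CREATE TABLE shop_location (location_id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE staff_address (address_id INTEGER PRIMARY KEY, street TEXT);
    CREATE TABLE staff (
        staff_id    INTEGER PRIMARY KEY,
        name        TEXT,
        location_id INTEGER NOT NULL REFERENCES shop_location(location_id),
        address_id  INTEGER REFERENCES staff_address(address_id)
    );
    """
)

# Wrong order: the staff row points at a location that does not exist yet.
rejected = False
try:
    conn.execute("INSERT INTO staff VALUES (101, 'New Hire A', 9, NULL)")
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)

# Working order: location, then staff member, then address and its link.
conn.execute("INSERT INTO shop_location VALUES (9, '789 Placeholder Rd')")
conn.execute("INSERT INTO staff VALUES (101, 'New Hire A', 9, NULL)")
conn.execute("INSERT INTO staff_address VALUES (201, '12 Sample Ln')")
conn.execute("UPDATE staff SET address_id = 201 WHERE staff_id = 101")
conn.commit()

print(rejected, conn.execute("SELECT COUNT(*) FROM staff").fetchone()[0])  # True 1
```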

For our graph version in Figure 4, we simply need to add the new data. Structure remains the same, and existing data is not affected.

Figure 4: Refactor No. 3 for graph
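The graph side of the same change, in the same toy style (plain dicts and lists, not a Neo4j API), is purely additive. The ids, relationship types and values are hypothetical, and this toy model does no validation at all, so it says nothing about constraints a real graph database might enforce; it only shows that existing nodes are left as they were.

```python
# Toy property graph in plain dicts/lists (not a Neo4j API).
nodes = {
    "shop-3": {"label": "Shop", "address": "123 Example St"},
    "staff-1": {"label": "Staff", "name": "Staff A"},
}
rels = [("staff-1", "WORKS_AT", "shop-3")]

# Copy the existing nodes so we can confirm they are untouched afterwards.
snapshot = {node_id: dict(props) for node_id, props in nodes.items()}

# Refactor 3: a new shop, two new hires and their addresses. Additions only.
nodes["shop-9"] = {"label": "Shop", "address": "789 Placeholder Rd"}
for staff_no, street in ((101, "12 Sample Ln"), (102, "34 Sample Ln")):
    nodes[f"staff-{staff_no}"] = {"label": "Staff", "name": f"New Hire {staff_no}"}
    nodes[f"addr-{staff_no}"] = {"label": "StaffAddress", "street": street}
    rels.append((f"staff-{staff_no}", "WORKS_AT", "shop-9"))
    rels.append((f"staff-{staff_no}", "HAS_ADDRESS", f"addr-{staff_no}"))

unchanged = all(nodes[node_id] == props for node_id, props in snapshot.items())
print(len(nodes), len(rels), unchanged)  # 7 5 True
```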

Graphs Reduce Data Refactoring

You’ve seen how data refactoring affects both relational and graph databases. Relational databases require a more intensive process for making changes because of the separation of table structure and actual data. By contrast, graphs remove the extra translation step between real-world data and database structure because they more naturally model data as it exists in the real world.

This example may seem so small as to be trivial compared to real-world systems. Yet what happens when you have dozens of shops, thousands of staff members and hundreds of thousands of delivery addresses? In business-critical systems, the labor involved and the potential impact on the organization’s workflow discourage teams from making changes at all. Graphs keep the data and model we already have and alter only what has changed.

Graph refactoring allows businesses to be adaptable and agile, giving them the power to morph as the industry or data around them changes. Moving existing projects to graphs can reduce time to market right now, while improving future maintainability, risk mitigation and additional feature development.

Where to Begin

Jennifer Reif is a developer relations engineer at Neo4j. She holds an MS degree in computer management information systems and is recognizable to many as a prolific public speaker and blogger, appearing in CodeProject and Foojay.io. An avid developer and problem-solver,...