VOOZH about

URL: https://www.javacodegeeks.com/2016/10/neo4j-detecting-rogue-spaces-csv-headers-load-csv.html

⇱ Neo4j: Detecting rogue spaces in CSV headers with LOAD CSV - Java Code Geeks


Last week I was helping someone load the data from a CSV file into Neo4j and we were having trouble filtering out rows which contained a null value in one of the columns.

This is what the data looked like:

load csv with headers from "file:///foo.csv" as row
RETURN row
╒══════════════════════════════════╕
│row │
╞══════════════════════════════════╡
│{key1: a, key2: (null), key3: c}│
├──────────────────────────────────┤
│{key1: d, key2: e, key3: f} │
└──────────────────────────────────┘

We’d like to filter out any rows which have ‘key2’ as null, so let’s tweak our query to do that:

load csv with headers from "file:///foo.csv" as row
WITH row WHERE NOT row.key2 is null
RETURN row
(no rows)

Hmmm that’s odd, it’s got rid of both rows. We’d expect to see the 2nd row since that doesn’t have a null value.

At this point we might suspect that what we’re seeing on the screen isn’t actually what the data looks like. Let’s write the following query to check our header values:

load csv with headers from "file:///foo.csv" as row
WITH row LIMIT 1
UNWIND keys(row) AS key
RETURN key, SIZE(key)
╒═════╤═════════╕
│key │SIZE(key)│
╞═════╪═════════╡
│key1 │4 │
├─────┼─────────┤
│ key2│5 │
├─────┼─────────┤
│ key3│5 │
└─────┴─────────┘

The second column tells us that there are some extra characters in the columns for ‘key2’ and ‘key3’ or rather ‘ key2’ and ‘ key3’. In this case they are spaces, but it could easily be another character:

load csv with headers from "file:///foo.csv" as row
WITH row LIMIT 1
UNWIND keys(row) AS key
RETURN key, replace(key, " ", "_SPACE_") AS spaces
╒═════╤═══════════╕
│key │spaces │
╞═════╪═══════════╡
│key1 │key1 │
├─────┼───────────┤
│ key2│_SPACE_key2│
├─────┼───────────┤
│ key3│_SPACE_key3│
└─────┴───────────┘

If we clean up our CSV file and try again everything works as expected:

load csv with headers from "file:///foo.csv" as row
WITH row LIMIT 1
UNWIND keys(row) AS key
RETURN key, SIZE(key)
╒════╤═════════╕
│key │SIZE(key)│
╞════╪═════════╡
│key1│4 │
├────┼─────────┤
│key2│4 │
├────┼─────────┤
│key3│4 │
└────┴─────────┘
load csv with headers from "file:///foo.csv" as row
WITH row WHERE NOT row.key2 is null
RETURN row
╒═══════════════════════════╕
│row │
╞═══════════════════════════╡
│{key1: d, key2: e, key3: f}│
└───────────────────────────┘
Do you want to know how to develop your skillset to become a Java Rockstar?
Subscribe to our newsletter to start Rocking right now!
To get you started we give you our best selling eBooks for FREE!
1. JPA Mini Book
2. JVM Troubleshooting Guide
3. JUnit Tutorial for Unit Testing
4. Java Annotations Tutorial
5. Java Interview Questions
6. Spring Interview Questions
7. Android UI Design
and many more ....
I agree to the Terms and Privacy Policy

Thank you!

We will contact you soon.

Tags
Neo4j
👁 Photo of Mark Needham
Mark Needham
October 19th, 2016Last Updated: October 19th, 2016
0 107 1 minute read
Subscribe

This site uses Akismet to reduce spam. Learn how your comment data is processed.

0 Comments
Oldest
Newest Most Voted
Back to top button
Close
wpDiscuz