The lost art of XML — mmagueta

Kissaki@programming.dev · 1 day ago

AnitaAmandaHuginskis@lemmy.world · edit-2 24 hours ago

I love XML, when it is properly utilized. Which, in most cases, it is not, unfortunately.

JSON > CSV though, I fucking hate CSV. I do not get the appeal. “It’s easy to handle” – NO, it is not. It’s the “fuck whoever needs to handle this” of file “formats”.

JSON is a reasonable middle ground, I’ll give you that

unique_hemp@discuss.tchncs.de · 18 hours ago

CSV >>> JSON when dealing with large tabular data:

1. Can be parsed row by row
2. Does not repeat column names; JSON repeats them in every record, which makes it more complicated (so slower) to parse

1 can be solved with JSONL, but 2 is unavoidable.
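
To make the two points concrete, here is a minimal Python sketch, assuming a tiny made-up table (the `id`, `name`, `age` columns and the helper names are only for illustration). It reads CSV and JSONL one record at a time, and shows that JSONL still carries the column names on every line:

```python
import csv
import io
import json

# Hypothetical sample data; column names are made up for illustration.
csv_text = "id,name,age\n1,bob,44\n2,alice,7\n"
jsonl_text = (
    '{"id": 1, "name": "bob", "age": 44}\n'
    '{"id": 2, "name": "alice", "age": 7}\n'
)


def iter_csv_rows(stream):
    """Yield one dict per CSV row; the header is read once, up front."""
    yield from csv.DictReader(stream)


def iter_jsonl_rows(stream):
    """Yield one dict per JSONL line; every line is a complete JSON document."""
    for line in stream:
        if line.strip():  # skip blank lines
            yield json.loads(line)


csv_rows = list(iter_csv_rows(io.StringIO(csv_text)))
jsonl_rows = list(iter_jsonl_rows(io.StringIO(jsonl_text)))

# The column name appears once in the CSV, but once per record in the JSONL.
csv_name_count = csv_text.count("name")
jsonl_name_count = jsonl_text.count('"name"')
```

Note that the CSV values come back as strings (`"44"`), while the JSONL values keep their JSON types (`44`).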

entwine@programming.dev · edit-2 2 hours ago

{
    "columns": ["id", "name", "age"],
    "rows": [
        [1, "bob", 44], [2, "alice", 7], ...
    ]
}

There ya go, problem solved without the unparseable ambiguity of CSV

Please stop using CSV.
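
A rough sketch of how a consumer might read that layout in Python, assuming the `columns`/`rows` keys from the example above (the `to_records` helper is hypothetical, not part of any library):

```python
import json

# Same shape as the example above, without the elided rows.
doc = json.loads("""
{
    "columns": ["id", "name", "age"],
    "rows": [[1, "bob", 44], [2, "alice", 7]]
}
""")


def to_records(table):
    """Pair each row with the shared column list, yielding one dict per row."""
    columns = table["columns"]
    for row in table["rows"]:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(row)}")
        yield dict(zip(columns, row))


records = list(to_records(doc))
```

The column names are stored once, so the repetition is gone. `json.loads` still reads the whole document before any row is available, though, so this alone does not give you row-by-row parsing.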

flying_sheep@lemmy.ml · 7 hours ago

No:

1. CSV isn’t good for anything unless you exactly specify the dialect. CSV is unstandardized, so you can’t parse arbitrary CSV files correctly.
2. You don’t have to serialize tables to JSON in the “list of named records” format

Just use Zarr or so for array data. A table with more than 200 rows isn’t “human readable” anyway.

abruptly8951@lemmy.world · 13 hours ago

Yes…but compression

And with CSV you just gotta pray that your parser parses the same as their writer… and that their writer was correctly implemented… and they set the settings correctly
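
A small illustration of that mismatch using Python's standard `csv` module. The sample text is made up: it assumes a writer that used `;` as the delimiter and a decimal comma, and a reader that guesses `,`:

```python
import csv
import io

# Hypothetical file contents: ';' as delimiter, ',' as decimal separator.
text = "name;price\nwidget;1,50\n"


def parse(raw, delimiter):
    """Parse CSV text with an explicitly chosen delimiter."""
    return list(csv.reader(io.StringIO(raw), delimiter=delimiter))


right = parse(text, ";")  # reader matches the writer's dialect
wrong = parse(text, ",")  # reader assumes the wrong dialect
```

With the matching delimiter the rows are `["name", "price"]` and `["widget", "1,50"]`. With the wrong one the parser raises no error and returns `["name;price"]` and `["widget;1", "50"]`, which is the silent kind of damage described here.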

unique_hemp@discuss.tchncs.de · 8 hours ago

Compression adds another layer of complexity for parsing.

JSON can also have configuration mismatch problems. Main one that comes to mind is case (in)sensitivity for keys.
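
For example, Python's `json` module keeps keys that differ only by case as separate entries, so a case-insensitive consumer sees an ambiguity. A minimal sketch with a made-up payload and a hypothetical lookup helper:

```python
import json

# Hypothetical payload: two keys that differ only by case.
payload = '{"Name": "bob", "name": "alice"}'
doc = json.loads(payload)


def lookup_case_insensitive(obj, key):
    """Return every value whose key matches `key` when case is ignored."""
    return [value for k, value in obj.items() if k.lower() == key.lower()]


exact = doc["name"]                            # case-sensitive lookup
loose = lookup_case_insensitive(doc, "name")   # case-insensitive lookup
```

The case-sensitive lookup gets one value, while the case-insensitive one gets two candidates and has to pick.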

abruptly8951@lemmy.world · 7 hours ago

Nahh, you’re nitpicking there, large CSVs are gonna be compressed anyways

In practice I’ve never met a JSON I can’t parse, every second CSV is unparseable

thingsiplay@lemmy.ml · 20 hours ago

Biggest problem is, CSV is not a standardized format like JSON. For very simple cases it could be used as a database-like format. But it depends on the parser and that’s not ideal.

flying_sheep@lemmy.ml · 7 hours ago

Exactly. I’ve seen so much data destroyed silently deep in some bioinformatics pipeline due to this that I’ve just become an anti-CSV advocate.

Use literally anything else that doesn’t need out of band “I’m using this dialect” information that has to match to prevent data loss.