1683: "Digital Data"


Coyoty
Posts: 195
Joined: Wed Jun 06, 2012 5:56 pm UTC

Re: 1683: "Digital Data"

Postby Coyoty » Fri May 20, 2016 6:43 pm UTC

The most enduring format is the Akashic Records. The original cloud storage. Unfortunately, you need someone like Edgar Cayce to read them.

Yu_p
Posts: 51
Joined: Wed Oct 12, 2011 9:00 am UTC

Re: 1683: "Digital Data"

Postby Yu_p » Fri May 20, 2016 6:50 pm UTC

I experienced data degradation by user error myself with TeX files which I edited on both Windows and Linux, without knowing about text encodings or that they are not detected automatically.
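For anyone who hasn't been bitten by this yet, here is a minimal sketch of how that kind of damage can happen. It assumes one editor writes UTF-8 and the other silently assumes Latin-1; the sample string and the exact pair of encodings are illustrative guesses, not a reconstruction of the actual files.

```python
# Hypothetical sample text; any non-ASCII characters behave the same way.
original = "Schrödinger's naïve café"

# Editor A saves the file as UTF-8.
raw = original.encode("utf-8")

# Editor B opens it assuming Latin-1. No error is raised, because every
# byte value is a valid Latin-1 character, so the wrong guess goes unnoticed.
misread = raw.decode("latin-1")

# Editor B then saves what it sees as UTF-8, baking the damage into the file.
resaved = misread.encode("utf-8")

# The damage is reversible only if you know the exact chain of wrong guesses.
repaired = resaved.decode("utf-8").encode("latin-1").decode("utf-8")

print(misread)
print(repaired == original)
```

Each round trip through the wrong editor makes the file longer and the text harder to recover, which is why it feels like the data is rotting.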

By the way, can anyone think of a file format suitable for long-term document storage? PostScript sounds fine, but as far as I know it lacks the ability to properly compress embedded pixel images, and features like internal links, which are among the features that make digital documents actually useful outside of archiving.

Tub
Posts: 475
Joined: Wed Jul 27, 2011 3:13 pm UTC

Re: 1683: "Digital Data"

Postby Tub » Fri May 20, 2016 7:17 pm UTC

jc wrote:I wouldn't be too trusting of PDF.[snip]

There's PDF/A. It's an ISO standard. It cannot be changed retroactively, not even by Adobe. It is documented. I'm not aware of any required licenses or patents. It has multiple open source implementations. If you need to archive documents while preserving the exact layout (including fonts, images, vector data, ...), then PDF/A is currently the best there is. For simpler tasks, ASCII, PNG/TIFF or maybe simplified HTML will do.

wisnij
Posts: 427
Joined: Mon Jun 26, 2006 5:03 pm UTC
Location: a planet called Erp

Re: 1683: "Digital Data"

Postby wisnij » Fri May 20, 2016 7:37 pm UTC

Even worse is when the original content is moved or removed entirely, and old links simply stop working. This strikes me as particularly infuriating when it's a news article or something else historical that you would reasonably expect to be able to access well after the fact, rather than random memes or whatever.
I burn the cheese. It does not burn me.

da Doctah
Posts: 996
Joined: Fri Feb 03, 2012 6:27 am UTC

Re: 1683: "Digital Data"

Postby da Doctah » Fri May 20, 2016 8:31 pm UTC

Flashback to the days when CDs were replacing vinyl as the medium of choice for music. Funny how nobody ever tried to argue that digital was better because it didn't degrade; the selling point was always that it was a more faithful reproduction of the sound.

An argument I never bought, because I realized that the human ear is an analog instrument.

Pfhorrest
Posts: 5482
Joined: Fri Oct 30, 2009 6:11 am UTC

Re: 1683: "Digital Data"

Postby Pfhorrest » Fri May 20, 2016 8:57 pm UTC

da Doctah wrote:the selling point was always that it was a more faithful reproduction of the sound.

An argument I never bought, because I realized that the human ear is an analog instrument.

The reproduced sound off a CD is analog too; the speaker vibrates in an analog fashion causing analog sound waves to travel to your ear.

And just because the recording medium is analogue, and you skip any ADC process between the sound waves of the original performance and the sound waves of your ear, doesn't mean that there's no loss in the analogue recording and playback.

With sufficiently high sampling rate and dynamic range, it's possible to digitally record and reproduce sound completely indistinguishable to the human ear from its original source.
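To put rough numbers on "sufficiently high", here is a back-of-the-envelope sketch using the standard textbook formulas: the Nyquist limit (half the sampling rate) and the ideal quantization signal-to-noise ratio for a full-scale sine wave. The function names are made up for illustration, and real converters fall somewhat short of these ideal figures.

```python
import math


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency that can be represented at a given sampling rate."""
    return sample_rate_hz / 2.0


def ideal_dynamic_range_db(bits: int) -> float:
    """Ideal signal-to-quantization-noise ratio for a full-scale sine wave."""
    return 20.0 * math.log10(2 ** bits) + 1.76


# CD audio parameters: 44.1 kHz sampling rate, 16 bits per sample.
cd_nyquist = nyquist_hz(44_100)
cd_range = ideal_dynamic_range_db(16)

print(f"CD Nyquist limit: {cd_nyquist:.0f} Hz")
print(f"CD ideal dynamic range: {cd_range:.1f} dB")
```

The Nyquist limit for CD audio sits a little above the commonly cited upper limit of human hearing of about 20 kHz, and every extra bit per sample adds roughly 6 dB of dynamic range.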

That's also possible with sufficiently well-engineered analog recording equipment, but it's a lot harder to engineer that kind of quality in analog than it is to just capture and store more samples in a digital recording.

And then, back on topic, the digital recording stays exactly as it was forever, and doesn't slowly lose quality over use and time like any analog recording would.
Forrest Cameranesi, Geek of All Trades
"I am Sam. Sam I am. I do not like trolls, flames, or spam."
The Codex Quaerendae (my philosophy) - The Chronicles of Quelouva (my fiction)

Keyman
Posts: 340
Joined: Thu Jun 19, 2014 1:56 pm UTC

Re: 1683: "Digital Data"

Postby Keyman » Fri May 20, 2016 9:23 pm UTC

Vinyl is back! I worked part time at a major bookseller and I knew they were serious about it when I saw we were selling these:
Spoiler:
Image

Also, most of the new releases also come with a 'code' that allows you to download a digital version to abuse on your favorite device.
Nothing could be more ill-judged than that intolerant spirit which has, at all times, characterized political parties. - A. Hamilton

Mikeski
Posts: 1113
Joined: Sun Jan 13, 2008 7:24 am UTC
Location: Minnesota, USA

Re: 1683: "Digital Data"

Postby Mikeski » Fri May 20, 2016 10:45 pm UTC

Keyman wrote:Vinyl is back!
Also, most of the new releases also come with a 'code' that allows you to download a digital version to abuse on your favorite device.

Ah, the worst of both worlds. A physical copy in a terrible mutant* analog format that degrades every time it's played, and whose data-holding surfaces aren't physically protected in any way. With a free non-physical copy (if it's still downloadable) that should be as close to flawless as the recording equipment allows, but is likely corrupted with a lossy compression algorithm, and expected to be played back on equipment with fidelity that really only loses out to ... vinyl playback equipment.

And people buy it. Marketing is an amazing force.

* - treble rolled off; bass frequencies rolled off and summed to mono; recording based entirely on physical movement, so having the playback equipment in audible range of the speakers causes microphonic effects; noise and wow/flutter specs that were obsolete 40 years ago; etc etc etc etc.

commodorejohn
Posts: 1200
Joined: Thu Dec 10, 2009 6:21 pm UTC
Location: Placerville, CA

Re: 1683: "Digital Data"

Postby commodorejohn » Fri May 20, 2016 11:56 pm UTC

Or, you know, people actually prefer the sound of the technically inferior format because they find it aesthetically pleasing.

Nah. That's just crazy talk.
"'Legacy code' often differs from its suggested alternative by actually working and scaling."
- Bjarne Stroustrup
www.commodorejohn.com - in case you were wondering, which you probably weren't.

Mikeski
Posts: 1113
Joined: Sun Jan 13, 2008 7:24 am UTC
Location: Minnesota, USA

Re: 1683: "Digital Data"

Postby Mikeski » Sat May 21, 2016 12:08 am UTC

commodorejohn wrote:Or, you know, people actually prefer the sound of the technically inferior format because they find it aesthetically pleasing.

In a thread about preservation of data, I spoke of the preservation of data. I figured that would be assumed, but I guess some data got lost along the way. :mrgreen:

Eebster the Great
Posts: 3485
Joined: Mon Nov 10, 2008 12:58 am UTC
Location: Cleveland, Ohio

Re: 1683: "Digital Data"

Postby Eebster the Great » Sat May 21, 2016 2:25 am UTC

HES wrote:
dennisw wrote:xeroxed

Another example for the brand/generic discussion in the Contrails thread. I had to look this up to know it meant "photocopy", since as far as I'm concerned "Xerox" is just a brand name. Though it turns out the photocopier I'm currently sharing a room with is, in fact, a Xerox.

Some people in my county seem to have the opposite problem.

jc
Posts: 356
Joined: Fri May 04, 2007 5:48 pm UTC
Location: Waltham, Massachusetts, USA, Earth, Solar System, Milky Way Galaxy

Re: 1683: "Digital Data"

Postby jc » Sat May 21, 2016 3:43 am UTC

commodorejohn wrote:Or, you know, people actually prefer the sound of the technically inferior format because they find it aesthetically pleasing.

Nah. That's just crazy talk.


Heh. Reminds me of a fun set of tests of audio equipment that Consumer Reports published years ago. What they did was organize two sets of "expert listeners" to judge the equipment. One was made of professional sound engineers; the other was made of professional musicians. The results showed that, within each group, there was close agreement on the perceived quality of various sound equipment, but the two groups consistently disagreed with each other about which equipment sounded best.

The summary was that the sound engineers liked the equipment that did the best job of reproducing the original sound (which they hadn't actually heard; they only heard the recordings, but they knew enough to know when things were missing). The musicians, however, preferred the equipment that produced the clearest music, which in many cases meant cutting back on some parts of the sound produced by the instruments or the recording hall/studio.

Of course, many people think that all musicians are a bit crazy, including the musicians themselves. Why would anyone devote so much time to something that returns so little respect and so little money to its practitioners? But that's a different topic. Fact is that the musicians wanted to hear the music, and didn't particularly like the non-musical stuff that was part of an accurately-reproduced recording. So they preferred the equipment that (partially) suppressed the parts of the sound that they didn't want to hear.

CatCube
Posts: 38
Joined: Wed Sep 21, 2011 5:28 pm UTC

Re: 1683: "Digital Data"

Postby CatCube » Sat May 21, 2016 3:47 am UTC

richP wrote:Before considering your (quite valid) data retention argument, has your office considered the workflow? All-digital needs to be more than just buying a big-ass scanner to slurp in old documents. You also need:

* big enough monitors to work with the necessary data.
* Big monitors and heavy-duty workstations where the drawings need to be used. Key example is on the prototype area of a small airplane manufacturer: they have a huge screen so they can pull up engineering drawings out where they're needed, not just upstairs at the engineer's desk.
...
* A way to usefully bring in all those paper docs for archival (even if that means re-creating in electronic versions)


The problem with "big enough monitors" is that almost everybody in our office needs to read drawings at some point or another. I personally wouldn't want a big enough monitor to read them on my desk, because I've got other things I want to have on the desktop. If the big monitors were on a common workstation, then I couldn't use my own desk and computer while reading drawings. The solution we tend to have is to print (reduced size) hard copies of the drawings--which are often discarded when no longer needed, because there's no place to keep the damn things.

The other thing is that electronic drawings aren't really convenient if you don't know exactly what you're looking for. I'm working with facilities that might be over a mile long in their longest dimension, and when I'm starting to gather information for a new project, it's nice to be able to just lazily flip through a book of drawings that roughly covers the area I'm working on and see what jumps out at me. Some of the most important things I've found have been while flipping to the stuff I was looking for. Also, when you're moving from drawing to drawing quickly, lag in opening the next drawing becomes irritating--especially when trying to work two drawings together.

As far as re-creating vector copies of old drawings, we do that from time to time, but these are for civil works projects from the '30s to the '60s so it's an extremely onerous task compared to just using the paper copies. Edit to add: there's also thousands of them.

jc wrote:I wouldn't be too trusting of PDF. I've gotten PDFs that get fatal errors from my readers and/or printers on the day they were created. Even when using software from Adobe, they're sometimes unreadable. PDF has gone through a number of major versions, and it's a proprietary format, so its owner (Adobe, or whoever buys them out, or whoever buys them out, or ...) can make undocumented, incompatible changes at any time. They can also withdraw legal permissions to use the format. Granted, the cat's probably out of the bag on this one, and there will be people with good information about the format for at least a few decades. But for archival storage, it's not in any sense an acceptable, reliable format.


The actual drawing copies are generally in .TIF format, the PDF is just used to bundle them together into logical groups. Raster formats in general are actually a poor way to hold engineering drawings, which are "inherently" vector. There's no such thing as shading or grayscale on an engineering drawing: there's either a line of a specified weight or there is not. That's why hatches and patterning are used extensively--no grayscale, just lines.

All this aside, no matter what the format is, if you don't maintain knowledge of the file format you lose the information in it. I think that old file formats will be forgotten quickly once they fall out of favor, and if there's no signal to convert the information in an old format to a newer format before that forgetting happens, you can have digital copies of information that you can't read. This leaves aside if you happen to have the data on an orphaned hardware system.

Steve the Pocket
Posts: 707
Joined: Mon Apr 23, 2007 4:02 am UTC
Location: Going downtuuu in a Luleelurah!

Re: 1683: "Digital Data"

Postby Steve the Pocket » Sat May 21, 2016 3:53 am UTC

Justin Lardinois wrote:
suso wrote:Best alt text ever.

It's getting harder and harder to read alt text in newer browsers.


I'm going to be that pedantic asshole.

xkcd's popup postscripts are title text, not alt text, and those are two different things. Alt text is alternative text to be used if the image itself can't be shown; title text is additional information that's historically rendered as a tooltip when you hover over an element.

The name confusion comes from older versions of Internet Explorer (and maybe newer ones as well, I don't know) rendering the alt text as a tooltip if no title element was present. This is and was always wrong, but it definitely confused several generations of web developers about what alt text really was.
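For the curious, the two attributes are easy to tell apart programmatically. This is a small sketch using Python's standard `html.parser`; the `<img>` tag in `sample` is made-up markup for illustration, not copied from the xkcd site.

```python
from html.parser import HTMLParser


class ImgTextCollector(HTMLParser):
    """Collects the alt and title attributes of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            self.images.append(
                {
                    # Fallback text when the image cannot be shown.
                    "alt": attributes.get("alt"),
                    # Extra information, typically rendered as a tooltip.
                    "title": attributes.get("title"),
                }
            )


# Hypothetical markup, for illustration only.
sample = '<img src="comic.png" alt="Comic name" title="The extra joke shown on hover">'

collector = ImgTextCollector()
collector.feed(sample)
print(collector.images)
```

An image can carry either attribute, both, or neither, which is part of why the two got conflated.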

Oh right, forgot to mention: Now that Randall has officially referred to those things as "mouseover text", can we take that as a sign to adopt it as standard terminology from now on, at least on this forum? I've seen it described as at least four different things depending on who posts the new thread.
cephalopod9 wrote:Only on Xkcd can you start a topic involving Hitler and people spend the better part of half a dozen pages arguing about the quality of Operating Systems.

Baige.

ijuin
Posts: 1150
Joined: Fri Jan 09, 2009 6:02 pm UTC

Re: 1683: "Digital Data"

Postby ijuin » Sat May 21, 2016 4:06 am UTC

jc wrote:Paper is an interesting fallback, but people are missing the fact that in practice, paper is a very low-quality storage medium, and is much of the reason that we are so ignorant of so much of our history. Information that people actually decide they want to keep might be better kept in binary form (in as many archives as will accept them), with paper as an another-path sort of backup.


Cheapest-grade paper, such as newsprint or the old-style "pulp" magazines, may last eighty or ninety years at most if kept dry and out of sunlight. High-quality cotton or linen paper may last two to three centuries under similar conditions (barring pests). That is still fourfold less than parchment, and fiftyfold less than stuff engraved on clay or stone (5k years or more--writing literally has not existed for long enough for stone carvings to fade away if they are protected from the weather).

LockeZ
Posts: 51
Joined: Mon Jan 19, 2009 8:30 am UTC

Re: 1683: "Digital Data"

Postby LockeZ » Sat May 21, 2016 4:27 am UTC

Eebster the Great wrote:
LockeZ wrote:Here's some amazing irony for you. Even though the alt text is supposed to look like gibberish characters that your browser can't render correctly, sardia's browser apparently legitimately can't render the last fake unrenderable gibberish character.

Image

It's a C1 control code character called 'Device Control String', DCS, or U+0090. Its intended use was to introduce a string of 8-bit characters that would be passed as an instruction to the device. It's basically obsolete, so it's not surprising to see it rendered differently in different environments.

By the way, here it is again, between the single-quotes: ''


Man, weird. Apparently the lack of rendering is in my browser, not Sardia's. It shows up in the mouseover text, and it shows up in my input box if I quote you, but it doesn't show up in your message.

I do like how Firefox shows the numerical code inside the square, as opposed to Chrome which just shows a plain square.

Mikeski
Posts: 1113
Joined: Sun Jan 13, 2008 7:24 am UTC
Location: Minnesota, USA

Re: 1683: "Digital Data"

Postby Mikeski » Sat May 21, 2016 6:09 am UTC

jc wrote:Heh. Reminds me of a fun set of tests of audio equipment that Consumer Reports published years ago. What they did was organize two sets of "expert listeners" to judge the equipment. One was made of professional sound engineers; the other was made of professional musicians. The results showed that, within each group, there was close agreement on the perceived quality of various sound equipment, but the two groups consistently disagreed with each other about which equipment sounded best.

The summary was that the sound engineers liked the equipment that did the best job of reproducing the original sound (which they hadn't actually heard; they only heard the recordings, but they knew enough to know when things were missing). The musicians, however, preferred the equipment that produced the clearest music, which in many cases meant cutting back on some parts of the sound produced by the instruments or the recording hall/studio.

Of course, many people think that all musicians are a bit crazy, including the musicians themselves. Why would anyone devote so much time to something that returns so little respect and so little money to its practitioners? But that's a different topic. Fact is that the musicians wanted to hear the music, and didn't particularly like the non-musical stuff that was part of an accurately-reproduced recording. So they preferred the equipment that (partially) suppressed the parts of the sound that they didn't want to hear.


That sounds to me like "sound engineers liked music that sounded like studio monitors" and "musicians liked music that sounded like stage monitors". (Or, engineers like the sound of a violin as heard from the 10th row of the theater, musicians like the sound of a violin as heard from the first violinist's chair on stage, if this isn't amplified music we're talking about.)

That is, they both thought "what I hear all the time" is "what music should sound like".

HES
Posts: 4896
Joined: Fri May 10, 2013 7:13 pm UTC
Location: England

Re: 1683: "Digital Data"

Postby HES » Sat May 21, 2016 11:05 am UTC

CatCube wrote:The problem with "big enough monitors" is that almost everybody in our office needs to read drawings at some point or another. I personally wouldn't want a big enough monitor to read them on my desk, because I've got other things I want to have on the desktop.

I don't know, if my desk was the monitor, with some sort of smart touch input, that would be pretty handy.
He/Him/His Image

Eebster the Great
Posts: 3485
Joined: Mon Nov 10, 2008 12:58 am UTC
Location: Cleveland, Ohio

Re: 1683: "Digital Data"

Postby Eebster the Great » Sat May 21, 2016 11:27 am UTC

LockeZ wrote:
Eebster the Great wrote:
LockeZ wrote:Here's some amazing irony for you. Even though the alt text is supposed to look like gibberish characters that your browser can't render correctly, sardia's browser apparently legitimately can't render the last fake unrenderable gibberish character.

Image

It's a C1 control code character called 'Device Control String', DCS, or U+0090. Its intended use was to introduce a string of 8-bit characters that would be passed as an instruction to the device. It's basically obsolete, so it's not surprising to see it rendered differently in different environments.

By the way, here it is again, between the single-quotes: ''


Man, weird. Apparently the lack of rendering is in my browser, not Sardia's. It shows up in the mouseover text, and it shows up in my input box if I quote you, but it doesn't show up in your message.

It's not supposed to show up at all. Unicode is transparent to DCS strings. On Chrome it always shows up as an invisible character.
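If anyone wants to check what their own system makes of the character, here is a quick sketch using Python's standard `unicodedata` module. It only reports how Unicode classifies U+0090; it says nothing about how a particular browser or font chooses to draw it.

```python
import unicodedata

ch = "\u0090"  # the C1 control character discussed above

# General category "Cc" means "Other, control": no glyph is defined for it.
category = unicodedata.category(ch)

# In UTF-8 it takes two bytes, even though the code point is below 256.
utf8_bytes = ch.encode("utf-8")

# Control characters have no formal character name in the Unicode database,
# so unicodedata.name() raises ValueError for them.
try:
    name = unicodedata.name(ch)
except ValueError:
    name = None

print(category, utf8_bytes, name, ch.isprintable())
```

Since the standard defines no visible form for it, an empty space, a plain box, and a box with the hex code inside are all fallbacks chosen by the renderer.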

Soupspoon
You have done something you shouldn't. Or are about to.
Posts: 4060
Joined: Thu Jan 28, 2016 7:00 pm UTC
Location: 53-1

Re: 1683: "Digital Data"

Postby Soupspoon » Sat May 21, 2016 12:24 pm UTC

HES wrote:I don't know, if my desk was the monitor, with some sort of smart touch input, that would be pretty handy.
You'll doubtless need a coffee-holder!

(Which would probably be a mug or cup of some kind, as per usual - unless you usually pour it into the bowl of your hands - so it'd be a coffee-cup-holder you'd actually want. 100%§ MugOfTea™ and CanOfSoda™©®°^ Compatible. § May not be 100% - See manual for exemptions, pages 427 through 791, plus regional Addenda pack.)

CatCube
Posts: 38
Joined: Wed Sep 21, 2011 5:28 pm UTC

Re: 1683: "Digital Data"

Postby CatCube » Sat May 21, 2016 2:45 pm UTC

HES wrote:
CatCube wrote:The problem with "big enough monitors" is that almost everybody in our office needs to read drawings at some point or another. I personally wouldn't want a big enough monitor to read them on my desk, because I've got other things I want to have on the desktop.

I don't know, if my desk was the monitor, with some sort of smart touch input, that would be pretty handy.


Well, I work for the US Government, so if you think that purchasing that for everybody is a good use of your taxpayer dollars....

Seriously, though, just maintaining a space to keep large-size drawings is a much easier and more cost-effective way for everybody to have access to the ability to see a whole sheet at once.

gmalivuk
GNU Terry Pratchett
Posts: 26824
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There

Re: 1683: "Digital Data"

Postby gmalivuk » Sat May 21, 2016 6:00 pm UTC

ijuin wrote:
jc wrote:Paper is an interesting fallback, but people are missing the fact that in practice, paper is a very low-quality storage medium, and is much of the reason that we are so ignorant of so much of our history. Information that people actually decide they want to keep might be better kept in binary form (in as many archives as will accept them), with paper as an another-path sort of backup.

Cheapest-grade paper, such as newsprint or the old-style "pulp" magazines, may last eighty or ninety years at most if kept dry and out of sunlight. High-quality cotton or linen paper may last two to three centuries under similar conditions (barring pests). That is still fourfold less than parchment, and fiftyfold less than stuff engraved on clay or stone (5k years or more--writing literally has not existed for long enough for stone carvings to fade away if they are protected from the weather).
Among common storage media, typical digital formats are at the high end of portability and copiability and the low end of longevity, while carvings are at the high end of longevity and the low end of portability and copiability. The average person's ignorance of history is likely more due to the fact that old information is not easily accessible (low portability) than the fact that it has degraded (low longevity), whereas the ignorance of experts is due to a lack of longevity.

Which is all to say, "low quality" isn't an especially clear description without further context.
Unless stated otherwise, I do not care whether a statement, by itself, constitutes a persuasive political argument. I care whether it's true.
---
If this post has math that doesn't work for you, use TeX the World for Firefox or Chrome

(he/him/his)

Brandesianisme
Posts: 6
Joined: Wed Jan 16, 2013 8:31 am UTC

Re: 1683: "Digital Data"

Postby Brandesianisme » Sat May 21, 2016 8:24 pm UTC

Golly gee, thanks for reminding me of the inevitable end of all and the rational futility of everything

MechR
Posts: 10
Joined: Fri Sep 30, 2011 1:45 pm UTC

Re: 1683: "Digital Data"

Postby MechR » Sat May 21, 2016 8:33 pm UTC

eidako wrote:Metal Gear Solid 2 made a pretty good case for why data loss is a good thing.
The mapping of the human genome was completed early this century. As a result, the evolutionary log of the human race lay open to us. We started with genetic engineering, and in the end, we succeeded in digitizing life itself. But there are things not covered by genetic information. Human memories, ideas. Culture. History. Genes don't contain any record of human history. Is it something that should not be passed on? Should that information be left at the mercy of nature?

We've always kept records of our lives. Through words, pictures, symbols... from tablets to books... But not all the information was inherited by later generations. A small percentage of the whole was selected and processed, then passed on. Not unlike genes, really. That's what history is, Jack. But in the current, digitized world, trivial information is accumulating every second, preserved in all its triteness. Never fading, always accessible. Rumors about petty issues, misinterpretations, slander...

All this junk data preserved in an unfiltered state, growing at an alarming rate. It will only slow down social progress.

That reminds me of a good rebuttal from Shut Hell.

Yurul wrote:
Image

If they deceive, it is not the letters, but someone's brush, that lies. The Mongols have already gained writing. The Mongols can speak with the brush of the victor. But Temujin, what the brush of the victor writes is always full of deceit in the eyes of the vanquished. One day someone will reach for that brush. As long as people struggle for that single victor's brush, what it writes will always be lies to someone.

That is why we must make more brushes, and more kinds of brushes. Surely you, who have achieved the greatest power on earth, are capable of that. Everyone, everywhere will write their own truths with those words and letters that deceive the heart. Those truths will continue to accumulate. It will be a mass of countless happenings and hearts. And it is precisely because they are each written in letters that deceive, that the mass will be no one's property, and no one color. Someone's victory, as well as someone's sadness, someone's death. Strong rule, and the hearts that rebel against it. If all of it is written down, and continues to live, that mass of countless happenings and hearts will itself one day become a thing free from deception.

eidako
Posts: 126
Joined: Wed Apr 06, 2011 10:24 am UTC

Re: 1683: "Digital Data"

Postby eidako » Sun May 22, 2016 6:50 am UTC

Brandesianisme wrote:Golly gee, thanks for reminding me of the inevitable end of all and the rational futility of everything

Image

MechR wrote:Shut Hell

Best manga title ever?

rmsgrey
Posts: 3655
Joined: Wed Nov 16, 2011 6:35 pm UTC

Re: 1683: "Digital Data"

Postby rmsgrey » Sun May 22, 2016 6:51 pm UTC

Pfhorrest wrote:And then, back on topic, the digital recording stays exactly as it was forever, and doesn't slowly lose quality over use and time like any analog recording would.


A digital recording is much more robust to use than an analog one, but time still degrades it - if nothing else, the physical medium itself decays...

speising
Posts: 2365
Joined: Mon Sep 03, 2012 4:54 pm UTC
Location: wien

Re: 1683: "Digital Data"

Postby speising » Sun May 22, 2016 10:31 pm UTC

rmsgrey wrote:
Pfhorrest wrote:And then, back on topic, the digital recording stays exactly as it was forever, and doesn't slowly lose quality over use and time like any analog recording would.


A digital recording is much more robust to use than an analog one, but time still degrades it - if nothing else, the physical medium itself decays...

Media decay, but data is eternal.

Wildcard
Candlestick!
Posts: 253
Joined: Wed Jul 02, 2008 12:42 am UTC
Location: Outside of the box

Re: 1683: "Digital Data"

Postby Wildcard » Sun May 22, 2016 11:03 pm UTC

eidako wrote:Metal Gear Solid 2 made a pretty good case for why data loss is a good thing.

I don't know about Metal Gear Solid 2, but have you read Haunted By Data?

Justin Lardinois wrote:Alt text is alternative text to be used if the image itself can't be shown; title text is additional information that's historically rendered as a tooltip when you hover over an element.

I, for one, appreciate this clarification. If you're being pedantic, then so is Noah Webster a pedant. Knowing the actual meanings of words is important, so thanks.

CatCube wrote:Seriously, though, just maintaining a space to keep large-size drawings is a much easier and more cost-effective way for everybody to have access to the ability to see a whole sheet at once.

This is 100% true, but of course a paper system doesn't create employment for as many people. Mind you, I think that's a good thing—the fact that you don't need a whole team of highly technical people to manage the system which allows you to access your construction drawings should be cause for celebration—but the technical people like to feel necessary and to create systems that no one else can understand. Also known as "making yourself irreplaceable." I'm not a fan of that practice.
There's no such thing as a funny sig.

Solarn
Posts: 14
Joined: Fri Dec 07, 2012 1:27 am UTC

Re: 1683: "Digital Data"

Postby Solarn » Mon May 23, 2016 9:26 am UTC

ijuin wrote:
jc wrote:Paper is an interesting fallback, but people are missing the fact that in practice, paper is a very low-quality storage medium, and is much of the reason that we are so ignorant of so much of our history. Information that people actually decide they want to keep might be better kept in binary form (in as many archives as will accept them), with paper as an another-path sort of backup.


Cheapest-grade paper, such as newsprint or the old-style "pulp" magazines, may last eighty or ninety years at most if kept dry and out of sunlight. High-quality cotton or linen paper may last two to three centuries under similar conditions (barring pests). That is still fourfold less than parchment, and fiftyfold less than stuff engraved on clay or stone (5k years or more--writing literally has not existed for long enough for stone carvings to fade away if they are protected from the weather).


But digital data doesn't need to be kept protected from weather, doesn't have pests and doesn't shatter into a hundred tiny pieces if dropped. And the more durable the analog storage medium is, the less information it can store in the same space. Meanwhile, regardless of how durable the device you store your digital data on is, as long as you copy it over to another device before the original one fails, the data is kept without loss, possibly indefinitely.

Soupspoon
You have done something you shouldn't. Or are about to.
Posts: 4060
Joined: Thu Jan 28, 2016 7:00 pm UTC
Location: 53-1

Re: 1683: "Digital Data"

Postby Soupspoon » Mon May 23, 2016 9:44 am UTC

speising wrote:Media decay, but data are eternal.

FTFY, but kudos for not saying "media decays" ;)

rmsgrey
Posts: 3655
Joined: Wed Nov 16, 2011 6:35 pm UTC

Re: 1683: "Digital Data"

Postby rmsgrey » Mon May 23, 2016 12:19 pm UTC

Solarn wrote:Meanwhile, regardless of how durable the device you store your digital data on is, as long as you copy it over to another device before the original one fails, the data is kept without loss, possibly indefinitely.


As long as you copy it over without errors and then copy it off the new device before it fails, again without errors...

After all, DNA is digital, and pretty much all copying errors by now.
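The "without errors" caveat can be checked rather than assumed. The following is a minimal sketch, not anything proposed in the thread: it copies a file and compares SHA-256 digests of the source and the copy. The helper names `sha256_of` and `verified_copy` are made up for illustration, and reading the copy back may be served from the operating system's cache rather than the physical medium, so this only catches some classes of error.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_copy(source: Path, target: Path) -> str:
    """Copy source to target; raise if the two digests differ."""
    expected = sha256_of(source)
    shutil.copyfile(source, target)
    actual = sha256_of(target)
    if actual != expected:
        raise IOError(f"copy of {source} is corrupt: {actual} != {expected}")
    # Hypothetical convention: keep the digest to re-check the copy later.
    return expected
```

Storing the returned digest next to the copy would let the next migration confirm that what it reads is still what was written.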

mikael
Posts: 28
Joined: Mon Feb 16, 2015 6:56 pm UTC
Location: Avignon, France
Contact:

Re: 1683: "Digital Data"

Postby mikael » Mon May 23, 2016 4:57 pm UTC

For the longest time, I've been thinking about a related problem: the fact that digital data is mostly created by transforming existing data, but that the transformation process itself is almost always lost.

For example, to a given picture (the existing data), I can apply a filter (the process) and get a new picture (the new data). Once that data has been created, the fact that the new picture is just the old one with that filter applied is not "remembered" in any way by the system.

I've been thinking about this mostly in the context of peer-to-peer file-sharing networks: given a single movie (or ISO or whatever), you will find dozens (if not hundreds) of shared files that were all derived from a common data source through subtly different processes. But because these relations ("applying function F to file A gives file B") are not shared within the system, the original data and its derivatives are all perceived as independent data.

In effect, from the point of view of the network, and because it's so easy to transform files with modern devices, the users end up creating vast amounts of data out of thin air.

How hard would it be to get computer systems to remember these processes by which we, the users, endlessly create new data? And how could this information be shared and used by the network?

Any ideas or pointers you could share with me?
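One way to picture the relation "applying function F to file A gives file B" is as a small record that travels with the result. The sketch below is only an illustration under stated assumptions: files are identified by their SHA-256 digest, the transformation is identified by a plain name, and `Derivation`, `content_id`, `derive` and `invert` are hypothetical names, not part of any existing file-sharing system.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Callable, Tuple


def content_id(data: bytes) -> str:
    """Identify a blob by the hex SHA-256 digest of its content."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Derivation:
    source_id: str   # digest of file A
    transform: str   # name of function F (an assumed naming convention)
    params: dict     # arguments passed to F
    result_id: str   # digest of file B


def derive(data: bytes, name: str, transform: Callable[..., bytes],
           **params) -> Tuple[bytes, Derivation]:
    """Apply a transformation and return the result plus its provenance."""
    result = transform(data, **params)
    record = Derivation(content_id(data), name, params, content_id(result))
    return result, record


def invert(data: bytes) -> bytes:
    """Toy stand-in for a picture filter: invert every byte."""
    return bytes(255 - value for value in data)


original = bytes([0, 128, 255])
filtered, record = derive(original, "invert", invert)
print(json.dumps(asdict(record), indent=2))
```

A network that shared such records could, in principle, recognize two files as relatives instead of treating them as independent data; how peers would trust or verify the records is left open here.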

ucim
Posts: 6890
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: 1683: "Digital Data"

Postby ucim » Mon May 23, 2016 5:51 pm UTC

mikael wrote:but that the transformation process itself is almost always lost
Audacity (audio processor) is an example where it is not (although it can be discarded). When you make edits within Audacity, the system remembers what you did, but keeps the original and "re-does" it each time. So, you can chop a sound file into bits and reassemble those bits; what's saved are the edit points. Some transformations are destructive; but then this is true even when creating a text document in a word processor. Do you save every version of your document? If not, all the intermediaries are lost.

In a sense, the act of creation never stops. We're just used to the idea that if it's the same creator, the intermediaries are not important. But once it passes from one creator to another, there's something about the intermediate version that's sacrosanct.

More troubling for me is that when I search on the net for something, I find hundreds of nearly identical undated versions that are clearly ripoffs of one another, but with no indication of which (if any) are the original, and who is the original author.

There's a Shoe cartoon that goes something like: "Why do they call it a word processor?" "Well, you know what a food processor does with food..."

mikael wrote:How hard would it be to get computer systems to remember these processes by which we, the users, endlessly create new data? And how could this information be shared and used by the network?

Any ideas or pointers you could share with me?
How would you deal with intellectual property issues? Especially when the derived version uses just a small piece of the source work.

Jose
Order of the Sillies, Honoris Causam - bestowed by charlie_grumbles on NP 859 * OTTscar winner: Wordsmith - bestowed by yappobiscuts and the OTT on NP 1832 * Ecclesiastical Calendar of the Order of the Holy Contradiction * Heartfelt thanks from addams and from me - you really made a difference.

commodorejohn
Posts: 1200
Joined: Thu Dec 10, 2009 6:21 pm UTC
Location: Placerville, CA
Contact:

Re: 1683: "Digital Data"

Postby commodorejohn » Mon May 23, 2016 6:43 pm UTC

mikael wrote:How hard would it be to get computer systems to remember these processes by which we, the users, endlessly create new data? And how could this information be shared and used by the network?

Step 1: create a diff every time a document is written to.
Step 2: wonder where all your disk space went.
Step 3: stop doing that.

We've had versioning filesystems since all the way back on ITS (if not earlier.) The problem is that by and large you're expending a lot of space to hang onto information that for the most part nobody cares about.
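Steps 1 and 2 can be made concrete with a small sketch. It uses Python's standard `difflib.unified_diff` to record a diff on every write and adds up how much history accumulates; the `DiffHistory` class is hypothetical and works on in-memory text only, so it says nothing about how a real versioning filesystem stores its data.

```python
import difflib


class DiffHistory:
    """Keep the latest text plus one unified diff per write."""

    def __init__(self, text: str = "") -> None:
        self.current = text
        self.diffs = []  # one list of diff lines per write

    def write(self, new_text: str) -> None:
        """Record the diff from the current text, then replace it."""
        diff = list(difflib.unified_diff(
            self.current.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile="before",
            tofile="after",
        ))
        self.diffs.append(diff)
        self.current = new_text

    def history_size(self) -> int:
        """Total characters spent on history (the 'where did my disk go' number)."""
        return sum(len(line) for diff in self.diffs for line in diff)
```

For a file that is rewritten often, such as a log, `history_size()` keeps growing even though only `current` is ever read again.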
"'Legacy code' often differs from its suggested alternative by actually working and scaling."
- Bjarne Stroustrup
www.commodorejohn.com - in case you were wondering, which you probably weren't.

speising
Posts: 2365
Joined: Mon Sep 03, 2012 4:54 pm UTC
Location: wien

Re: 1683: "Digital Data"

Postby speising » Mon May 23, 2016 7:11 pm UTC

Soupspoon wrote:
speising wrote:Media decay, but data are eternal.

FTFY, but kudos for not saying "media decays" ;)

Either i don't get it, or i disagree. I'm using it here as a mass noun, which is more appropriate in this context, in my humble opinion. (Especially in juxtaposition to the multiple physical media…)

Tyndmyr
Posts: 11443
Joined: Wed Jul 25, 2012 8:38 pm UTC

Re: 1683: "Digital Data"

Postby Tyndmyr » Mon May 23, 2016 8:22 pm UTC

commodorejohn wrote:
mikael wrote:How hard would it be to get computer systems to remember these processes by which we, the users, endlessly create new data? And how could this information be shared and used by the network?

Step 1: create a diff every time a document is written to.
Step 2: wonder where all your disk space went.
Step 3: stop doing that.

We've had versioning filesystems since all the way back on ITS (if not earlier.) The problem is that by and large you're expending a lot of space to hang onto information that for the most part nobody cares about.


Honestly, it's not that hard, or disk space intensive, particularly for text data. Subversion or whatever is pretty easy, and for basic backup/restoration, is quite sufficient.

commodorejohn
Posts: 1200
Joined: Thu Dec 10, 2009 6:21 pm UTC
Location: Placerville, CA
Contact:

Re: 1683: "Digital Data"

Postby commodorejohn » Mon May 23, 2016 8:27 pm UTC

No, but you'll note that A. you're talking specifically about plain text, whereas the original question was general with a specific bent towards multimedia data (videos/etc.) and B. you're suggesting a solution like Subversion which strictly targets a specific set of files or folders rather than generally tracking all document activity. Imagine if the system made a new copy (or even a new partial copy) every time there was a change in your bash history or logfiles or whatever.
"'Legacy code' often differs from its suggested alternative by actually working and scaling."
- Bjarne Stroustrup
www.commodorejohn.com - in case you were wondering, which you probably weren't.

speising
Posts: 2365
Joined: Mon Sep 03, 2012 4:54 pm UTC
Location: wien

Re: 1683: "Digital Data"

Postby speising » Mon May 23, 2016 8:53 pm UTC

I'd fancy a filesystem which versions files with a specific attribute only. Text editors would probably set that attribute, anything that writes to /tmp (or %TEMP%) wouldn't.
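As a rough sketch of that policy (not a real filesystem feature), the class below keeps a version only when a path carries a "versioned" flag and does not live under an excluded root such as /tmp. The flag is modelled as an in-memory set standing in for a file attribute, and `SelectiveVersioner` and its methods are hypothetical names.

```python
from pathlib import PurePosixPath


class SelectiveVersioner:
    """Version only flagged files that are outside excluded directories."""

    def __init__(self, excluded_roots=("/tmp",)) -> None:
        self.excluded_roots = [PurePosixPath(root) for root in excluded_roots]
        self.flagged = set()   # stand-in for a per-file "versioned" attribute
        self.versions = {}     # path -> list of saved contents

    def flag(self, path: str) -> None:
        """Mark a file as wanting history (what a text editor might do)."""
        self.flagged.add(PurePosixPath(path))

    def should_version(self, path: str) -> bool:
        candidate = PurePosixPath(path)
        if any(root in candidate.parents for root in self.excluded_roots):
            return False  # scratch data never gets history
        return candidate in self.flagged

    def record_write(self, path: str, content: bytes) -> bool:
        """Save a version if policy allows; return whether one was kept."""
        if not self.should_version(path):
            return False
        self.versions.setdefault(PurePosixPath(path), []).append(content)
        return True
```

The excluded roots win over the flag here, which is one possible design choice: a temp file stays unversioned even if some program flags it by mistake.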

commodorejohn
Posts: 1200
Joined: Thu Dec 10, 2009 6:21 pm UTC
Location: Placerville, CA
Contact:

Re: 1683: "Digital Data"

Postby commodorejohn » Mon May 23, 2016 9:31 pm UTC

Yeah, I could see that being handy. It'd just need some kind of fundamental restriction or configuration so that only the files/file types you actually want a history on are stored that way.
"'Legacy code' often differs from its suggested alternative by actually working and scaling."
- Bjarne Stroustrup
www.commodorejohn.com - in case you were wondering, which you probably weren't.

mikael
Posts: 28
Joined: Mon Feb 16, 2015 6:56 pm UTC
Location: Avignon, France
Contact:

Re: 1683: "Digital Data"

Postby mikael » Mon May 23, 2016 9:33 pm UTC

ucim wrote:Audacity (audio processor) is an example where it is not (although it can be discarded).

Now that I think of it, every application that has an "undo" command basically behaves this way. The problem is that this information is not accessible from outside the application: while you're mixing your tracks with Audacity, the process you apply is stored and actionable (for example with "undo"). But once you export your project, that history is severed and what you get just looks like "brand new" data.

ucim wrote:More troubling for me is that when I search on the net for something, I find hundreds of nearly identical undated versions that are clearly ripoffs of one another, but with no indication of which (if any) are the original, and who is the original author.

Exactly. A general framework for keeping track of which works are derivatives of others wouldn't be used by everybody just because it exists, but provided it's sufficiently unobtrusive and useful, it could greatly alleviate that problem.

ucim wrote:How would you deal with intellectual property issues? Especially when the derived version uses just a small piece of the source work.

The short answer is that I wouldn't: intellectual property issues are best kept in the ambiguous and messy world of the legal system, where they belong. Try transposing them to the technical world and you'll get DRM-like debacles every time.

But a clear way of deriving data from other data could actually help with IP issues. For example, the LAME project started as a set of patches against some proprietary encoder. In the same way, one could publish a new work that would require some copyrighted data to be "plugged in" without actually distributing any of the protected material. The burden of acquiring that material would thus shift to the end user.

commodorejohn wrote:We've had versioning filesystems since all the way back on ITS (if not earlier.) The problem is that by and large you're expending a lot of space to hang onto information that for the most part nobody cares about.

OK, so keeping every single intermediate step of a computation just wastes space. But on the other hand, not keeping intermediate steps can also waste space: consider the case of a large file which is slightly modified and then written to a new file. You get twice the storage requirement even though the two files are mostly the same.

So it seems the question is to decide which steps to keep and which to discard. But before that question can be answered, the actual use of that information needs to be specified much more precisely...
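The "large file, slightly modified" case can be illustrated with a content-addressed chunk store, where identical chunks are kept only once. This is a sketch under simplifying assumptions: `ChunkStore` is a hypothetical in-memory class, and fixed-size chunks only share data when the change is made in place. An insertion shifts every later chunk, which is why real tools tend to use content-defined chunk boundaries instead.

```python
import hashlib


class ChunkStore:
    """Store files as lists of chunk digests; keep each distinct chunk once."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes
        self.files = {}   # file name -> ordered list of digests

    def put(self, name: str, data: bytes) -> None:
        digests = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # no-op if already stored
            digests.append(digest)
        self.files[name] = digests

    def get(self, name: str) -> bytes:
        """Reassemble a file from its chunks."""
        return b"".join(self.chunks[digest] for digest in self.files[name])

    def stored_bytes(self) -> int:
        """Bytes actually kept, counting shared chunks once."""
        return sum(len(chunk) for chunk in self.chunks.values())
```

With two nearly identical files in the store, `stored_bytes()` grows by the size of the changed chunks rather than by the size of the whole second file.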

