Thursday, April 20, 2023

Meta Stuff


The A.I. Monster


No, this blog post is not about Zuckerberg’s wet dream of virtual people walking around without legs. It’s an investigation into the nature of the metadata that explicitly and implicitly surrounds datapoints and should be accounted for in any A.I. discourse.


The GPT technologies can only be as bright as all the data and text … and the metadata … that exist on internet servers … and the way this data, text and metadata are processed. And there is an umbra of important information about this data that does not exist there today and so is not accounted for … or, if it does exist, is not included in these A.I. responses. I contend this is the Achilles heel of the current A.I. mania.


I obviously must elaborate:


The metadata that surrounds a single datapoint includes:


- Precise identification: What exactly does this datapoint represent?


- Data type: integers, text, fractions, rational numbers, imaginary numbers, Boolean values, percentages, angles, vectors, probabilities, random numbers, transcendental numbers, geographic locations, calendar dates, clock times, global date/time reconciliations, locations in outer space, etc.


- Data units: SI units (International System of Units), derived units, monetary units, industry-specific counting units, units agreement during calculations, international unit conversions, recognized established constants


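- Data scaling: scientific notation, range of textual scaling options, scaling ambiguity (England vs. U.S., e.g. a "billion" traditionally meaning a million million in one and a thousand million in the other), scaling agreement during calculations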


- Data accuracy: confidence intervals, estimates vs. actual measures, approximations, significant digits lost during calculations, accuracy lost during data type conversions or combinations


- Data source: known vs. anonymous source, veracity level of source, source accuracy history


- Data freshness: last update, update frequency


- Data validity: peer review, reproducibility, suspected propaganda, multiple data source resolution


- Data ownership: data usage fees, legal restrictions on data usage, government secrecy classifications, encryption
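

To make the list above a little more concrete, here is a minimal Python sketch of what a datapoint that carries some of its own metadata might look like. The class name `Datapoint`, its field names and the `add` helper are all hypothetical, invented for illustration, and they cover only a handful of the items listed. This is an assumption about how such a record could be built, not a description of how any GPT app actually stores data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Scaling ambiguity: the same word has meant different multipliers.
SHORT_SCALE_BILLION = 1e9    # U.S. usage
LONG_SCALE_BILLION = 1e12    # traditional British usage


@dataclass(frozen=True)
class Datapoint:
    """Hypothetical record: a value plus a few pieces of its metadata."""
    value: float
    description: str                      # precise identification
    unit: str                             # data units, e.g. "USD"
    scale: float = 1.0                    # data scaling multiplier
    uncertainty: float = 0.0              # data accuracy: +/- error after scaling
    source: Optional[str] = None          # data source; None = anonymous or mixed
    last_updated: Optional[date] = None   # data freshness
    peer_reviewed: bool = False           # data validity

    def magnitude(self) -> float:
        """Return the value with its scaling applied."""
        return self.value * self.scale


def add(a: Datapoint, b: Datapoint, description: str) -> Datapoint:
    """Add two datapoints, refusing mismatched units and keeping
    the weaker of the two sets of metadata."""
    if a.unit != b.unit:
        raise ValueError(f"units disagree: {a.unit!r} vs. {b.unit!r}")
    if a.last_updated is not None and b.last_updated is not None:
        freshness = min(a.last_updated, b.last_updated)  # the staler date
    else:
        freshness = None
    return Datapoint(
        value=a.magnitude() + b.magnitude(),  # result is stored unscaled
        description=description,
        unit=a.unit,
        uncertainty=a.uncertainty + b.uncertainty,  # worst-case bound
        source=a.source if a.source == b.source else None,
        last_updated=freshness,
        peer_reviewed=a.peer_reviewed and b.peer_reviewed,
    )
```

With this sketch, adding a value of 2.5 scaled by `SHORT_SCALE_BILLION` to a value of 500 scaled by a million, both in "USD", gives a magnitude of 3e9, while trying to add "USD" to "GBP" raises a `ValueError`. The point is that even one simple addition has to reconcile units, scaling, accuracy, source, freshness and validity, and the result is only as trustworthy as the weaker of its two inputs.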



So I contend, Pilgrim, that this umbra of metadata surrounding a single computer-server datapoint is enormous and does not all exist electronically … or is often not accounted for … or is inaccurately accounted for when these A.I. bots construct their responses.


Until these metadata problems … and some language deep structure issues … are effectively dealt with, these GPT apps are just dangerous toys.


Trust me, Pilgrim … I’ve wrestled this A.I. monster before …


Afterword: Traditionally, computer datapoints were 32 or 64 bits long. Powerful A.I. that accounts for this needed metadata may require datapoints to be many hundreds of thousands of bits long. It may take quantum computers to deal with them effectively.


I make no claim that the above listing is complete.


After afterword: It is possible that a soupçon of metadata can be captured from the surrounding text in the GPT apps. How effectively this is done is unclear to me.


STAND UP FOR REAL A.I.!


2 comments:

DEN said...

Your point is obfuscated by your dazzling erudition.
What is a Pilgrim supposed to do with this?

George W. Potts said...

Not to be sucked into the current A.I. maelstrom …