Submitted by logTom t3_10qlx29 in MachineLearning
smyliest t1_j6rlgp9 wrote
Do we have large data sets of alien signals to train a model?
cdsmith t1_j6tg9z7 wrote
Awesome question! I definitely laughed.
The serious answer that the GitHub link clarifies is that the model is semi-unsupervised. That means they have a lot of data, but only some of it is labeled. Presumably, the labeled data is all negative because we understand its natural origin. So effectively this becomes almost an anomaly detection sort of thing, looking for data that is least like the known natural signals.
Even if it just directs scientists to look at new natural phenomena, this sounds like a valuable task.
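To make the "least like the known natural signals" idea concrete, here is a minimal sketch of one way to score it. It is not the method from the paper or the GitHub repo. The feature vectors, the function name `anomaly_scores`, and the choice of a k-nearest-neighbour distance are all assumptions made for illustration.

```python
import math

def anomaly_scores(known_natural, candidates, k=3):
    """Score each candidate by its mean distance to its k nearest known-natural examples.

    A larger score means the candidate looks less like anything in the labeled set.
    """
    scores = []
    for c in candidates:
        dists = sorted(math.dist(c, n) for n in known_natural)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores

# Hypothetical 2-D feature vectors, made up for illustration (not telescope data).
known_natural = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.1, 0.1), (0.0, 0.2)]
candidates = [(0.1, 0.15), (3.0, 3.0)]

scores = anomaly_scores(known_natural, candidates)

# Rank candidates from most to least unusual; the top entries are what a human would review.
ranked = sorted(zip(scores, candidates), reverse=True)
most_unusual = ranked[0][1]
```

In practice the features would presumably come from a learned representation rather than hand-picked numbers, but the ranking step would look similar.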
FedRCivP11 t1_j709ds5 wrote
Wouldn’t the sorts of signals our own planet emits be a good dataset for training a model to recognize the sorts of signals a civilization might generate? I’d assumed from the article this is what they’d done. Seems to me the key is whether we can discern, not necessarily interpret, communications, perhaps encrypted, from cosmic noise and natural phenomena, right? So train a model to distinguish any human signals from noise. You’d look in those bands that we emit that are likely to make the journey to our neighbors.
To make the data more useful, you could simulate phase shifting in the datasets of our own EM communications. Perhaps you’d want to simulate other phenomena that are likely to modify celestial signals from a neighboring civilization.
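As a rough sketch of that kind of simulation, the snippet below injects a narrowband tone whose frequency drifts linearly over time into a grid of Gaussian noise, the way a signal would drift if transmitter and receiver were accelerating relative to each other. Treating the modification as a linear frequency drift is an assumption, as are the function name `inject_drifting_signal`, the grid size, and every parameter value. It is not taken from the paper.

```python
import random

def inject_drifting_signal(n_time, n_freq, start_bin, drift_bins_per_step, amplitude, seed=0):
    """Return a noise-only time/frequency grid with one drifting narrowband tone added.

    spec[t][f] is the power at time step t and frequency bin f.
    track lists the (t, f) cells where the tone was injected.
    """
    rng = random.Random(seed)
    # Background: unit-variance Gaussian noise standing in for receiver noise.
    spec = [[rng.gauss(0.0, 1.0) for _ in range(n_freq)] for _ in range(n_time)]
    track = []
    for t in range(n_time):
        # The tone moves by a fixed number of bins per time step (linear drift).
        f = round(start_bin + drift_bins_per_step * t)
        if 0 <= f < n_freq:
            spec[t][f] += amplitude
            track.append((t, f))
    return spec, track

# Hypothetical example: 8 time steps, 32 frequency bins, tone starting at bin 4.
spec, track = inject_drifting_signal(
    n_time=8, n_freq=32, start_bin=4, drift_bins_per_step=2, amplitude=20.0
)
```

Labeled examples like `spec` could then be mixed into the training set so the model sees drifting signals as well as stationary ones.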
logTom OP t1_j6rq944 wrote
All data used in this paper are stored as high-resolution FILTERBANK and HDF5 format collected and generated from observations by the Robert C. Byrd Green Bank Telescope, which are available through the Breakthrough Listen Open Data Archive at http://seti.berkeley.edu/opendata.