danby t1_ishjkfs wrote on October 16, 2022 at 1:05 AM

Agreed. "Proportion of non-shared base pairs" is at least a decent enough, semi-objective way to compare the differences between two genomes without getting too far into the weeds about what exactly constitutes a difference. There are, at the end of the day, lots of differences that simply can't be expressed as a percentage difference (like gene/chromosome translocation).
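
To make the "proportion of non-shared base pairs" idea concrete, here is a minimal Python sketch. It assumes the two sequences are already aligned to the same length, and the function name and toy sequences are made up for illustration. The second example shows why a rearrangement is poorly captured by a position-by-position percentage: moving an unchanged block makes every shifted position count as a mismatch.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences differ.

    Assumes the sequences are pre-aligned and of equal length
    (a simplification; real genome comparison needs an alignment step).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        return 0.0
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * mismatches / len(seq_a)


# Toy example 1: a single substitution in ten bases.
reference = "ACGTACGTAC"
substituted = "ACGTTCGTAC"
print(percent_difference(reference, substituted))  # 10.0

# Toy example 2: the same three blocks, with the first two swapped.
# No base is altered, yet a naive positional count reports a large difference.
blocks = "AAAACCCCGGGG"
rearranged = "CCCCAAAAGGGG"
print(round(percent_difference(blocks, rearranged), 1))  # 66.7
```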

Fmatosqg t1_ishmusp wrote on October 16, 2022 at 1:32 AM

Since all of this is meant to produce proteins, it's only fair that the calculation is biased towards things that make different proteins.

So if a gene/allele gets moved to a different place, it still counts as no difference.

DreamWithinAMatrix t1_ishq800 wrote on October 16, 2022 at 1:58 AM

Protein production was the focus back in the days of the term "junk DNA", but we've since learned that there are sequences with functions other than encoding protein. Promoters and alternative splicing are the ones that come to mind. There are viral gene inserts which were originally thought to have no function but seem to be amplified in some regions and are now hypothesized to be a source of accelerated evolution, for example in neurons, which may have contributed to how humans diverged from chimps. The epigenome includes the methyl groups attached to DNA, which can open up or close off genes for expression; it might be mainly driven by environmental conditions and can change frequently. There are also some portions of DNA which might fold on themselves to prevent expression.

If you only look at the raw gene sequence and say only the protein-producing parts count, you have no way of telling:

how much protein is made
how many kinds
how fast
and whether the protein is currently being expressed

without taking all of those things into account. Also, so many of them are still being discovered that there's really no way to calculate all that yet.

joalheagney t1_isi6mvr wrote on October 16, 2022 at 4:18 AM

Not to mention all the various segments that code for functional but non-protein encoding RNA.

[deleted] t1_isjaww4 wrote on October 16, 2022 at 12:39 PM

[removed]

doc_nano t1_ishqweh wrote on October 16, 2022 at 2:04 AM

Well… sort of. While encoding proteins is arguably the most important and certainly the most visible function of the genome, there are parts that code for RNA that does not get translated into protein. These and other non-coding segments actually make up the majority of the human genome, and many of them play important roles. Though it is true that almost all those roles support the expression or regulation of proteins in some indirect way.

Also, a gene moving to a different locus can actually make a big difference, because the way it is expressed and regulated can change, even if it codes for the same protein.

danby t1_isis6zk wrote on October 16, 2022 at 8:39 AM

> So if a gene/allele gets moved to a different place, it still counts as no difference.

Definitely not. Translocation often leads to or implies different expression of genes. As an aside, many, many translocations over large amounts of evolutionary time can lead to things like chromosome loss and/or speciation events. These are important forms of genetic change/mutation that do lead to important functional change. And they do make genomes quite different in ways that aren't measurable by simple percentages.

BryKKan t1_isj5b4f wrote on October 16, 2022 at 11:38 AM

See, that's the problem though. Simply translocating a sequence, with no alteration, can diminish or amplify expression dramatically. So that could still be considered a difference.

derefr t1_isih06b wrote on October 16, 2022 at 6:11 AM

"Easy" (but impractical to calculate in practice) concrete answer: it's the information-theoretic co-compressibility of all the dependent information required to construct one individual's proteome relative to another individual's.

(I.e., if you have all the DNA + methylations et al. of one person's genome, stored in a file, which you then compress in an information-theoretically optimal way [not with a general-purpose compressor, but rather one that takes advantage of the structure of DNA, rearranging things to pack better], and then measure the file size of the result; and then you create another file which contains all that same [uncompressed] information, plus the information of a second person's DNA + methylations et al.; and you optimally compress that file; then by what percentage is the second optimally-compressed file larger than the first?)

Or, to use a fanciful analogy: if we had a machine to synthesize human cells "from the bottom up", and you had all the information required to print one particular human's cells stored somewhere — then how much more information would you need as a "patch" on the first human's data, to describe an arbitrary other particular human, on average?
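
As a rough illustration of the "how much larger is the second compressed file" measurement described above, here is a small Python sketch. It makes several loud assumptions: it uses `zlib`, which is exactly the kind of general-purpose compressor the comment says is not optimal; the "genomes" are short random toy strings, not real data; methylation and other non-sequence information are ignored; and all function names are hypothetical. The sequences are kept well under zlib's 32 KB window so the second sequence can be encoded as references back to the first.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def extra_information_ratio(first: bytes, second: bytes) -> float:
    """Fractional growth in compressed size when `second` is appended to `first`.

    A stand-in for the "patch size" idea: the more of `second` that can be
    described by pointing back at `first`, the smaller this ratio becomes.
    """
    base = compressed_size(first)
    combined = compressed_size(first + second)
    return (combined - base) / base


def random_sequence(rng: random.Random, length: int) -> str:
    """Toy 'genome': a uniformly random string over the four bases."""
    return "".join(rng.choices("ACGT", k=length))


def with_substitutions(rng: random.Random, seq: str, count: int) -> str:
    """Copy of `seq` with `count` positions changed to a different base."""
    bases = list(seq)
    for pos in rng.sample(range(len(bases)), count):
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases)


rng = random.Random(0)
person_a = random_sequence(rng, 5000)
person_b = with_substitutions(rng, person_a, 5)  # nearly identical to person_a
unrelated = random_sequence(rng, 5000)           # shares nothing by design

near = extra_information_ratio(person_a.encode(), person_b.encode())
far = extra_information_ratio(person_a.encode(), unrelated.encode())
print(f"near-copy adds {near:.1%}, unrelated sequence adds {far:.1%}")
```

The near-copy should add only a small fraction to the compressed size, while the unrelated sequence should roughly double it. A compressor built around the structure of DNA, as the comment proposes, would presumably tighten both numbers.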

Inariameme t1_isk4gr1 wrote on October 16, 2022 at 4:21 PM

idk that i tend to agree with any of the computational architectures ;)

Simply, is DNA as linear as has been suggested? Probabilistically?

[deleted] t1_isi9idi wrote on October 16, 2022 at 4:47 AM

[removed]

[deleted] t1_isisapp wrote on October 16, 2022 at 8:41 AM

[removed]

snuffleupugus_anus t1_isnsed2 wrote on October 17, 2022 at 11:11 AM

Would a metric like the ratio of varying base pairs to the differential in expressed proteins be a better metric? I realize that it's just a theoretical number and that we can't actually count literally every protein in a human body, but, as a thought experiment I suppose, is that a more meaningful depiction of actual genetic difference?

When it's said 99.9% of human DNA is the same in all humans, is this referring to only coding DNA or both coding and non-coding DNA combined?

RantingRobot t1_ishhfay wrote on October 16, 2022 at 12:49 AM
