RantingRobot t1_ishhfay wrote

Also, there isn't really a concrete answer to the question, since the number of differences varies depending on how you count them.

Some stretches of DNA do multiple, overlapping things, so is that counted as one difference or four? Some stretches of DNA can be the same in two people, but epigenetically expressed in one but not the other, so is that counted as one difference or none?

The number will always be kind of a guesstimate.

238

danby t1_ishjkfs wrote

Agreed. "Proportion of non-shared base pairs" is at least a decent, semi-objective way to compare two genomes without getting too far into the weeds about what exactly constitutes a difference. At the end of the day, there are lots of differences that simply can't be expressed as a percentage (like gene/chromosome translocation).
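To make "proportion of non-shared base pairs" concrete, here is a minimal sketch. It assumes the two sequences have already been aligned to the same length, which is the hard part in practice and is skipped here. The function name and the toy sequences are made up for illustration.

```python
def non_shared_fraction(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which the two sequences differ.

    Assumption: both inputs are already aligned and have equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    # Count positions where the bases disagree (case-insensitive).
    mismatches = sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return mismatches / len(seq_a)


# Toy data: one substitution across ten positions.
person_1 = "ACGTACGTAC"
person_2 = "ACGTTCGTAC"
print(non_shared_fraction(person_1, person_2))
```

This counts only position-by-position substitutions. Insertions, deletions, and rearrangements would need a real alignment step before a number like this means anything.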

80

Fmatosqg t1_ishmusp wrote

Since all of this is meant to produce proteins, it's only fair that the calculation is biased towards things that make different proteins.

So if a gene/allele gets moved to a different place, it still counts as no difference.

10

DreamWithinAMatrix t1_ishq800 wrote

Protein production used to be the thinking back in the days of the term "junk DNA", but we've since learned that there are sequences with non-protein-generating functions. Promoters and alternative splicing are the ones that come to mind. There are viral gene inserts that were originally thought to have no function but seem to be amplified in some regions, and these are now hypothesized to be a source of accelerated evolution, for example in neurons, which may have contributed to how humans diverged from chimps. The epigenome includes the methyl groups attached to the DNA, which can open up or close off regions and so allow or prevent gene expression; this might be driven mainly by environmental conditions and can change frequently. There are also some portions of DNA that might fold on themselves to prevent expression.

If you only look at the raw gene sequence and say only the protein-producing ones count, you have no way of telling:

  • how much protein is produced
  • how many kinds
  • how fast
  • and whether the protein is currently being expressed

You can't tell any of that without taking all of the mechanisms above into account. Also, so many of them are still being discovered that there's really no way to calculate all that yet.

59

joalheagney t1_isi6mvr wrote

Not to mention all the various segments that code for functional, non-protein-coding RNA.

14

doc_nano t1_ishqweh wrote

Well… sort of. While encoding proteins is arguably the most important and certainly the most visible function of the genome, there are parts that code for RNA that does not get translated into protein. These and other non-coding segments actually make up the majority of the human genome, and many of them play important roles. Though it is true that almost all those roles support the expression or regulation of proteins in some indirect way.

Also, a gene moving to a different locus can actually make a big difference, because the way it is expressed and regulated can change, even if it codes for the same protein.

10

danby t1_isis6zk wrote

> So if a gene/allele gets moved to a different place, it still counts as no difference.

Definitely not. Translocation often leads to or implies different expression of genes. As an aside, many, many translocations over large amounts of evolutionary time can lead to things like chromosome loss and/or speciation events. These are important forms of genetic change/mutation that do lead to important functional change. And they do make genomes quite different in ways that aren't measurable by simple percentages.
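A toy illustration of why a simple percentage struggles with rearrangements: in the sketch below, two segments swap places and no base is altered, yet a naive position-by-position comparison reports a large difference. The sequences and function name are invented for the example, and the sketch says nothing about expression, which is the biologically important part.

```python
def positional_difference(seq_a: str, seq_b: str) -> float:
    """Naive fraction of positions that differ between equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


# Hypothetical segments; only their order changes between the two sequences.
segment_1 = "ACGTTGCA"
segment_2 = "GGATCCTA"
tail = "TTTTAAAA"

original = segment_1 + segment_2 + tail
moved = segment_2 + segment_1 + tail  # segments swapped, bases untouched

# Same bases overall, but half of the positions no longer match.
print(sorted(original) == sorted(moved))
print(positional_difference(original, moved))
```

A single rearrangement event therefore shows up as either "no change" (if you only compare content) or "huge change" (if you compare positions), and neither number describes what actually happened.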

9

BryKKan t1_isj5b4f wrote

See, that's the problem though. Simply translocating a sequence, with no alteration, can diminish or amplify expression dramatically. So that could still be considered a difference.

2

derefr t1_isih06b wrote

"Easy" (but impractical to calculate) concrete answer: it's the information-theoretic co-compressibility of all the dependent information required to construct one individual's proteome relative to another individual's.

(I.e., take all the DNA + methylations et al. of one person's genome, stored in a file, and compress it in an information-theoretically optimal way [not with a general-purpose compressor, but with one that takes advantage of the structure of DNA, rearranging things to pack better], then measure the file size of the result. Next, create another file containing all that same [uncompressed] information plus the second person's DNA + methylations et al., and optimally compress that file too. By what percentage is the second optimally compressed file larger than the first?)

Or, to use a fanciful analogy: if we had a machine to synthesize human cells "from the bottom up", and you had all the information required to print one particular human's cells stored somewhere — then how much more information would you need as a "patch" on the first human's data, to describe an arbitrary other particular human, on average?
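The compression comparison described above can be sketched in a few lines, with one big caveat: this uses `zlib`, a general-purpose compressor, purely as a stand-in for the DNA-aware optimal compressor the idea actually calls for. The random sequences are synthetic toy data, methylation is ignored entirely, and the function names are made up for illustration.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib compression (stand-in for an optimal compressor)."""
    return len(zlib.compress(data, 9))


def extra_information_ratio(first: bytes, second: bytes) -> float:
    """Relative growth in compressed size when `second` is appended to `first`."""
    base = compressed_size(first)
    combined = compressed_size(first + second)
    return (combined - base) / base


rng = random.Random(0)  # fixed seed so the toy data is reproducible

# Synthetic "genome" of 5000 random bases.
person_a = "".join(rng.choice("ACGT") for _ in range(5000)).encode()

# A near-copy of person_a with five substituted bases.
near_copy = bytearray(person_a)
for pos in rng.sample(range(len(near_copy)), 5):
    current = "ACGT".index(chr(near_copy[pos]))
    near_copy[pos] = ord("ACGT"[(current + 1) % 4])  # always a different base
person_b = bytes(near_copy)

# An unrelated random sequence for contrast.
unrelated = "".join(rng.choice("ACGT") for _ in range(5000)).encode()

print(extra_information_ratio(person_a, person_b))
print(extra_information_ratio(person_a, unrelated))
```

The near-copy should add far less to the compressed size than the unrelated sequence does, which is the "patch size" intuition. Real genomes are far longer than zlib's 32 KB window, so this trick would not carry over to them as written.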

3

Inariameme t1_isk4gr1 wrote

idk that i tend to agree with any of the computational architectures ;)

Simply, is DNA as linear as has been suggested, probabilistically?

2

snuffleupugus_anus t1_isnsed2 wrote

Would a metric like the ratio of varying base pairs to the difference in expressed proteins be a better metric? I realize that it's just a theoretical number and that we can't literally count every protein in a human body, but, as a thought experiment I suppose, is that a more meaningful depiction of actual genetic difference?

1