malachai926
malachai926 t1_j64oqam wrote
Reply to comment by terrykrohe in [OC] best-fit lines, correlations: ed spending vs evangelical –– 2020 election by terrykrohe
To be frank, it's just poor presentation. Statisticians like myself will see lots of problems with this. If I am confused, I guarantee that the layperson will be even more so.
>red = Rep states in 2020 election
>blue = Dem states in 2020 election
Even here, you aren't being clear enough. Are they "republican" because their votes for president in the 2020 election were majority in favor of the Republican candidate? Republican because they elected more Republican House congresspeople / senators? I can infer that you're likely referring to the electoral college result, but when people have to infer what you mean with your data, that's just bad practice that is bound to get you in trouble in the future.
>t-tests are usually reported using the p-value
Not always, no. A lot of published research will tell you both the t-statistic AND the p-value. If you're giving us a p-value, you should say it's a p-value, end of story.
>the t-test is sensitive to small mean variations: the top right plot shows the means separated by a SD, which is NOT a small difference ( t-test = 0.000015).
That's great, but why didn't you state that result in the graph? And again, don't say "t-test equals", at least say "t-test p-value equals". It's nonsense to say that a test equals something. The test generates a statistic and a p-value which equal something, but the test itself is a test. It pays to be explicit with what you are saying, or else other statisticians could misinterpret what you are saying. In this case, if someone thought you meant the t-statistic was 0.000015, that would mean the results were highly non-significant and would think you screwed up your calculation.
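To make the labeling point concrete, here is a minimal sketch in Python using only the standard library. The spending figures are made up for illustration and are not the data from the post, and `t_pdf`, `two_sided_p`, and `welch_t_test` are hypothetical helper names. It assumes Welch's unequal-variance t-test, which may not be the variant used in the original graphic.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the t-density mass between -|t| and +|t|.

    Uses Simpson's rule, so very small p-values are only approximate.
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def welch_t_test(a, b):
    """Return (t_statistic, degrees_of_freedom, two_sided_p_value)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_sided_p(t, df)

# Made-up per-pupil spending figures (thousands of dollars), NOT the post's data.
red = [9.1, 9.8, 10.4, 8.7, 9.5, 10.1, 9.0, 9.9]
blue = [11.2, 12.5, 10.9, 13.1, 11.8, 12.2, 11.5, 12.9]

t_stat, df, p_value = welch_t_test(red, blue)
# Label each number explicitly so nobody has to guess which one is which.
print(f"Welch t-test: t-statistic = {t_stat:.2f}, df = {df:.1f}, p-value = {p_value:.2g}")
```

In practice you would likely reach for a statistics library instead, for example `scipy.stats.ttest_ind(red, blue, equal_var=False)`, but the point is the same: report the statistic and the p-value as two separately labeled numbers.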
You seem to have some idea in your mind of how things are "typically" interpreted by various groups of people, but you should NOT rely on those assumptions because inevitably someone will interpret gray area in a way you didn't intend. It is always far, far preferable to be as explicit as you can with your definitions of things.
Again, I think showing this as a sorted scatterplot is just weird. You really ought to show this data as a histogram. You're using a t-test, yeah? So it's really incumbent on you to demonstrate that the data in each group is at least roughly normally distributed, which is what the t-test assumes, to prove to your audience that such a test is acceptable. A histogram achieves that; this scatterplot does not.
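A per-group histogram does not need anything fancy. Below is a rough sketch in plain Python that bins each group and prints a text histogram, so the shape of each group is visible at a glance. The values are invented for illustration, and `histogram` and `print_histogram` are hypothetical helper names.

```python
def histogram(values, bins=5):
    """Return (lowest_value, bin_width, counts) for equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width if all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return lo, width, counts

def print_histogram(label, values, bins=5):
    """Print one row of '#' marks per bin."""
    lo, width, counts = histogram(values, bins)
    print(label)
    for i, n in enumerate(counts):
        left = lo + i * width
        print(f"  {left:6.2f} to {left + width:6.2f} | {'#' * n}")

# Made-up per-pupil spending figures (thousands of dollars), NOT the post's data.
red = [9.1, 9.8, 10.4, 8.7, 9.5, 10.1, 9.0, 9.9, 9.6, 9.4]
blue = [11.2, 12.5, 10.9, 13.1, 11.8, 12.2, 11.5, 12.9, 12.0, 12.1]

print_histogram("red states", red)
print_histogram("blue states", blue)
```

With only about 25 points per group, any histogram will be lumpy, so treat it as a sanity check on the shape rather than proof of anything.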
Finally, maybe it's just me, but grouping these things together on a state level just feels like you're losing so much detail and misclassifying so much data that I really question the validity of your results. Maybe this is the best you have to work with, but you are classifying a state that went 51% in favor of the Democrat as 100% Democratic and vice versa, which then classifies every single school district in that state, including the likely numerous rural school districts where people are more likely to be conservative, as "Democratic" school districts contributing however much money they contributed towards education.
You'd get a lot more robust data and far less of this kind of error if you were able to get this data by school district. If you don't have that data, it is what it is, but the end result is that I'll consider everything I said here and think "eh, this is kinda just bad analysis and is meaningless" and it gets disregarded. And I imagine you wouldn't want the analysis you spent all of this time and effort on to be disregarded, yeah?
malachai926 t1_j63chqv wrote
Reply to [OC] best-fit lines, correlations: ed spending vs evangelical –– 2020 election by terrykrohe
Showing the first two plots as a sorted scatter plot is kind of an odd way to convey the data. You'd have been better served showing a histogram of each of your two data sets, with a fitted distribution curve drawn over each one. The overlap between those two curves is what gives you the best visual representation of a statistical difference.
It's also not really clear how you are classifying the data. Is every data point a state? Are you classifying a state as "Democrat" or "Republican" based on majority vote for president in some election? This info is necessary to properly interpret your results. If that's what you're doing, that's also kind of an odd analysis, since the state as a whole clearly doesn't represent just one party, not to mention that the population differences in conjunction with their classification really ought to be weighted accordingly. You're destroying the whole concept of "per capita" if this is what you are doing.
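Here is a quick sketch of why the weighting matters, in plain Python. The state names, spending figures, and populations are all invented for illustration, and `unweighted_mean` and `weighted_mean` are hypothetical helper names.

```python
def unweighted_mean(values):
    """Treats every state as one equal data point."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Weights each state's per-capita figure by its population."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical states: (name, per-capita spending in $k, population in millions)
states = [
    ("Small State", 14.0, 0.6),
    ("Mid State", 11.0, 5.0),
    ("Large State", 9.0, 30.0),
]

spending = [s[1] for s in states]
population = [s[2] for s in states]

print(f"unweighted mean of state figures: {unweighted_mean(spending):.2f}")
print(f"population-weighted mean:         {weighted_mean(spending, population):.2f}")
```

The unweighted number lets the smallest state count as much as the largest one, which is exactly how a "per capita" figure stops describing what the typical person in the group actually gets.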
The "t-test" number on the top left is confusing. Is that the t-statistic or the p-value? And why doesn't the top right graph have a number, especially when it looks more likely to have a statistical difference?
Your bottom chart has a typo. "Evangelival."
I see you post stuff like this regularly. IMO you ought to clean up your presentation quite a bit and give it more thought. It's kind of a mess.
Signed, a biostatistician
malachai926 t1_iu3318w wrote
Reply to comment by malachai926 in Cygnus region of the Milky Way setting over the Blue Ridge Mountains in Virginia [3376 x 4672] (OC) by MrJackDog
My brother, where do you intend to go tonight? I heard that you missed your connecting flight.... To the blue ridge mountains, over near Tennessee.
malachai926 t1_iu32qrv wrote
Reply to Cygnus region of the Milky Way setting over the Blue Ridge Mountains in Virginia [3376 x 4672] (OC) by MrJackDog
In the constellation of Cygnus, there lurks an invisible and mysterious force: the black hole of Cygnus X-1.
BWAHHHH!
Six stars of the northern cross BWAHHHH! in mourning for their sister's loss, in a final flash of glory, nevermore to grace the night.
BWAHHHH!
rumbling noises
...... doooo DOOO do dah DAH!
malachai926 t1_j65vrxn wrote
Reply to comment by terrykrohe in [OC] best-fit lines, correlations: ed spending vs evangelical –– 2020 election by terrykrohe
>I have never had a non-"lay person" ask if the t-test is the statistic or the p-value
Not everyone is as thorough as I am. You seem to have a strong interest in statistics, based on the content you typically post, and if you want to succeed in the field of statistics and get noticed, you'll have to start cleaning up your presentation.
>The data is a visualization of tabular data presented by the source. The data is visualized using the Mathematica function "ListPlot":
Then you really ought to use a different function. This is just a strange way of presenting your data. You've posted very similar types of graphs here often, and it seems like they don't get much of a response. The strange presentation is probably why.
>There is NO analysis being done here; just data presentation. Inferences are the Reader's prerogative.
A t-test is analysis.