Search

8 results for www.alignmentforum.org:

currentscurrents OP t1_j2hdsvv wrote

Someone else posted [this example](https://www.alignmentforum.org/posts/N6WM6hs7RQMKDhYjB/a-mechanistic-interpretability-analysis-of-grokking), which is kind of what I was interested in. They trained a neural network to do a toy problem, addition mod 113, and then were able

6

currentscurrents t1_j2uwlrh wrote

think interpretability will help us build better models too. For example, in [this paper](https://www.alignmentforum.org/posts/N6WM6hs7RQMKDhYjB/a-mechanistic-interpretability-analysis-of-grokking) they deeply analyzed a model trained to do a toy problem - addition `mod 113`. They found that

34
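For reference, the toy problem both comments point to is learning `(a + b) mod 113` from examples. The linked paper trains a one-layer transformer on this task; the snippet below is only a minimal sketch of the same dataset with a small MLP (model size, train fraction, and optimizer settings here are illustrative assumptions, not the paper's exact setup).

```python
# Minimal sketch of the "addition mod 113" toy task (not the paper's exact setup,
# which uses a one-layer transformer): build the full dataset and train a small MLP.
import torch
import torch.nn as nn

P = 113  # modulus used in the grokking toy problem

# All (a, b) pairs and their labels (a + b) mod P
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))  # shape (P*P, 2)
labels = (pairs[:, 0] + pairs[:, 1]) % P

# One-hot encode both operands and concatenate them as the input vector
inputs = torch.cat(
    [nn.functional.one_hot(pairs[:, 0], P), nn.functional.one_hot(pairs[:, 1], P)],
    dim=1,
).float()

# Random train/test split; a small train fraction plus weight decay is the
# regime where grokking (delayed generalization) is typically observed
perm = torch.randperm(P * P)
cut = int(0.3 * P * P)
train_idx, test_idx = perm[:cut], perm[cut:]

model = nn.Sequential(nn.Linear(2 * P, 256), nn.ReLU(), nn.Linear(256, P))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(5000):
    opt.zero_grad()
    loss = loss_fn(model(inputs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 500 == 0:
        with torch.no_grad():
            acc = (model(inputs[test_idx]).argmax(-1) == labels[test_idx]).float().mean()
        print(f"step {step}: train loss {loss.item():.3f}, test acc {acc:.3f}")
```

The point of the analysis in the linked post is then to reverse-engineer what such a trained network computes internally, not just to fit the task.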

DukkyDrake t1_j8fvyr5 wrote

mostly in hand. Here are some more informed comments regarding alignment concerns and [CAIS](https://www.alignmentforum.org/posts/HvNAmkXPTSoA4dvzv/comments-on-cais), which is what I think we'll end up with by default at the turn of the decade

3

SchmidhuberDidIt OP t1_j9rqdje wrote

Thanks, I actually read [this](https://www.alignmentforum.org/posts/CoZhXrhpQxpy9xw9y/where-i-agree-and-disagree-with-eliezer) today. He and Richard Ngo are the names I've come across for researchers who've deeply thought about alignment and hold views grounded in the literature

9

mano-vijnana t1_j9s5zl4 wrote

they don't see doom as inevitable. This is the sort of scenario Christiano worries about: [https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like](https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like) And this is Ngo's overview of the topic: [https://arxiv.org/abs/2209.00626](https://arxiv.org/abs/2209.00626)

15