Search
8 results for www.alignmentforum.org:
currentscurrents t1_j2uwlrh wrote
Reply to comment by Mental-Swordfish7129 in [R] Do we really need 300 floats to represent the meaning of a word? Representing words with words - a logical approach to word embedding using a self-supervised Tsetlin Machine Autoencoder. by olegranmo
think interpretability will help us build better models too. For example, in [this paper](https://www.alignmentforum.org/posts/N6WM6hs7RQMKDhYjB/a-mechanistic-interpretability-analysis-of-grokking) they deeply analyzed a model trained to do a toy problem - addition `mod 113`. They found that
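For readers who want to see what that toy setup looks like in code, here is a minimal sketch of the addition-mod-113 task described in the linked post. The post itself trains a small one-layer transformer and then reverse-engineers its weights; the MLP, train/test split fraction, and hyperparameters below are illustrative stand-ins, not the authors' exact configuration.

```python
# Minimal sketch of the "addition mod 113" toy task from the linked grokking post.
# The post trains a one-layer transformer; a small MLP stands in for it here,
# just to show the shape of the dataset and training loop. Hyperparameters are illustrative.
import torch
import torch.nn as nn

P = 113  # modulus used in the post

# Full dataset: every pair (a, b) with label (a + b) mod P.
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))  # shape (P*P, 2)
labels = (pairs[:, 0] + pairs[:, 1]) % P

# Random train/test split; grokking shows up when only a fraction is used for training.
perm = torch.randperm(len(pairs))
n_train = int(0.3 * len(pairs))
train_idx, test_idx = perm[:n_train], perm[n_train:]

class ToyModel(nn.Module):
    def __init__(self, p=P, d=128):
        super().__init__()
        self.embed = nn.Embedding(p, d)
        self.mlp = nn.Sequential(nn.Linear(2 * d, 256), nn.ReLU(), nn.Linear(256, p))

    def forward(self, ab):
        e = self.embed(ab)             # (batch, 2, d)
        return self.mlp(e.flatten(1))  # logits over residues mod p

model = ToyModel()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)  # heavy weight decay, as in grokking setups
loss_fn = nn.CrossEntropyLoss()

for step in range(10_000):
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            test_acc = (model(pairs[test_idx]).argmax(-1) == labels[test_idx]).float().mean()
        print(f"step {step}: train loss {loss.item():.4f}, test acc {test_acc.item():.3f}")
```

Grokking is typically observed by training well past the point where the training loss is near zero and watching test accuracy jump much later; the interpretability analysis in the post then reverse-engineers what algorithm the trained weights actually implement.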
artifex0 t1_j50v0ju wrote
Reply to comment by I_am_so_lost_hello in The year is 2058. I awake in my pod. by katiecharm
alignment problem is [not easy](https://www.alignmentforum.org/), but also not without [hope](https://www.lesswrong.com/posts/BfN88BfZQ4XGeZkda/concrete-reasons-for-hope-about-ai).
lol-its-funny t1_j55ge3m wrote
months back, also very useful for the future of scaling, time and (traditional) data limits. https://www.alignmentforum.org/posts/6Fpvch8RR29qLEWNH/chinchilla-s-wild-implications Basically, even the largest model to date, PaLM, is very suboptimal, leaning towards WAY more parameters
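To make the "suboptimal" claim concrete, here is a rough back-of-the-envelope sketch of the Chinchilla heuristic the linked post builds on (roughly 20 training tokens per parameter for compute-optimal training). The PaLM figures (540B parameters, ~780B training tokens) are the publicly reported ones; the arithmetic is an approximation added here, not part of the original comment.

```python
# Back-of-the-envelope check of the Chinchilla rule of thumb (~20 tokens per parameter).
# PaLM figures are the publicly reported ones; treat this as a rough sketch.
params = 540e9                 # PaLM parameter count
tokens_used = 780e9            # tokens PaLM was reportedly trained on
tokens_optimal = 20 * params   # ~10.8e12 tokens under the Chinchilla heuristic

print(f"Chinchilla-optimal tokens: ~{tokens_optimal / 1e12:.1f}T")
print(f"Tokens actually used:      ~{tokens_used / 1e9:.0f}B "
      f"({tokens_used / tokens_optimal:.0%} of optimal)")
```

Under that heuristic, a model of PaLM's size would want on the order of ten trillion training tokens rather than hundreds of billions, which is roughly the data-limits implication the post's title refers to.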
DukkyDrake t1_j8fvyr5 wrote
Reply to comment by FusionRocketsPlease in Altman vs. Yudkowsky outlook by kdun19ham
mostly in hand. Here are some more informed comments regarding alignment concerns and [CAIS](https://www.alignmentforum.org/posts/HvNAmkXPTSoA4dvzv/comments-on-cais), which is what I think we'll end up with by default at the turn of the decade
SchmidhuberDidIt OP t1_j9rqdje wrote
Reply to comment by Tonkotsu787 in [D] To the ML researchers and practitioners here, do you worry about AI safety/alignment of the type Eliezer Yudkowsky describes? by SchmidhuberDidIt
Thanks, I actually read [this](https://www.alignmentforum.org/posts/CoZhXrhpQxpy9xw9y/where-i-agree-and-disagree-with-eliezer) today. He and Richard Ngo are the names I've come across for researchers who've deeply thought about alignment and hold views grounded in the literature
mano-vijnana t1_j9s5zl4 wrote
Reply to comment by SchmidhuberDidIt in [D] To the ML researchers and practitioners here, do you worry about AI safety/alignment of the type Eliezer Yudkowsky describes? by SchmidhuberDidIt
they don't see doom as inevitable. This is the sort of scenario Christiano worries about: [https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like](https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like). And this is Ngo's overview of the topic: [https://arxiv.org/abs/2209.00626](https://arxiv.org/abs/2209.00626)
currentscurrents OP t1_j2hdsvv wrote
Reply to comment by MrAcurite in [D] Is there any research into using neural networks to discover classical algorithms? by currentscurrents
Someone else posted [this example](https://www.alignmentforum.org/posts/N6WM6hs7RQMKDhYjB/a-mechanistic-interpretability-analysis-of-grokking), which is kind of what I was interested in. They trained a neural network to do a toy problem, addition mod 113, and then were able