Search
50 results for arxiv.org:
Submitted by AutomaticVisit1543 t3_11wfxwx in singularity
Submitted by DragonForg t3_11zqmkt in Futurology
Submitted by currentscurrents t3_125uxab in MachineLearning
OpenAI publishes paper on the economic impact of GPT-4: Higher income workers most exposed
arxiv.org
Submitted by Surur t3_11whznb in Futurology
Submitted by floppy_llama t3_1266d02 in MachineLearning
Submitted by FrereKhan t3_11zg5rr in MachineLearning
Submitted by LesleyFair t3_10fw22o in deeplearning
Think About Scaling LLMs In 2020, a team of researchers from OpenAI released a [paper](https://arxiv.org/pdf/2001.08361.pdf) called: “Scaling Laws For Neural Language Models”. They observed a predictable decrease in training loss when increasing ... that is what people did. The models got larger and larger with GPT-3 (175B), [Gopher](https://arxiv.org/pdf/2112.11446.pdf) (280B), [Megatron-Turing NLG](https://arxiv.org/pdf/2201.11990) (530B) just to name a few. But the bigger ... number of training tokens should double as well. This was published in DeepMind’s 2022 [paper](https://arxiv.org/pdf/2203.15556.pdf): “Training Compute-Optimal Large Language Models” The researchers fitted over 400 language models ranging from
Submitted by LesleyFair t3_11alh40 in singularity
www.siegemedia.com/seo/most-popular-keywords#:~:text=The) winner of most popular,or "weather" for short. \[5\] [https://twitter.com/vladquant/status/1624996869654056960?s=46&t=oAzVIB-avPf-JbQAnhcbtA](https://twitter.com/vladquant/status/1624996869654056960?s=46&t=oAzVIB-avPf-JbQAnhcbtA) \[6\] [https://arxiv.org/pdf/2112.09332.pdf](https://arxiv.org/pdf/2112.09332.pdf) \[7\] [https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/](https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/) \[8\] [https://arxiv.org/abs/1706.03762](https://arxiv.org/abs/1706.03762) \[9\] [https://arxiv.org/abs/2201.08239](https://arxiv.org/abs/2201.08239) \[10\] [https://arxiv.org/abs/2112.04426](https://arxiv.org/abs/2112.04426) ... www.quora.com/What-percentage-of-web-search-queries-are-navigational](https://www.quora.com/What-percentage-of-web-search-queries-are-navigational) \[13\] [https://www.statista.com/statistics/413229/search-query-size-search-engine-share/](https://www.statista.com/statistics/413229/search-query-size-search-engine-share/) \[14\] [https://www.forbes.com/sites/johanmoreno/2021/08/27/google-estimated-to-be-paying-15-billion-to-remain-default-search-engine-on-safari/?sh=40cbbfcf669b](https://www.forbes.com/sites/johanmoreno/2021/08/27/google-estimated-to-be-paying-15-billion-to-remain-default-search-engine-on-safari/?sh=40cbbfcf669b) \[15\] [https://businessquant.com/microsoft-revenue-by-product](https://businessquant.com/microsoft-revenue-by-product) \[16\] [https://arxiv.org/abs/2209.01667](https://arxiv.org/abs/2209.01667)
Submitted by fromnighttilldawn t3_y11a7r in MachineLearning
popular practice/belief is unsound or useless. Some famous examples are: **Troubling Trends in ML** [https://arxiv.org/pdf/1807.03341.pdf](https://arxiv.org/pdf/1807.03341.pdf) **ML that Matters** [https://arxiv.org/abs/1206.4656](https://arxiv.org/abs/1206.4656) **On the Convergence of ADAM** [https://arxiv.org/abs/1904.09237](https://arxiv.org/abs/1904.09237) **On the Information Bottleneck ... iopscience.iop.org/article/10.1088/1742-5468/ab3985](https://iopscience.iop.org/article/10.1088/1742-5468/ab3985) **Implementation Matters in Deep Policy Gradients** [https://arxiv.org/abs/2005.12729](https://arxiv.org/abs/2005.12729) (showed a certain purported algorithm gain is actually mainly due to code-level optimization) **Critique of Turing Award** [https://people.idsia.ch/\~juergen/critique-turing-award-bengio-hinton-lecun.html](https://people.idsia.ch/~juergen/critique-turing-award-bengio-hinton-lecun.html) ... basically a critique on the citation practice in ML) **Deep Learning a Critical Appraisal** [https://arxiv.org/abs/1801.00631](https://arxiv.org/abs/1801.00631) However, these are a little bit dated. Does anyone have any recent critique papers of similar flavour
Submitted by mjrossman t3_11ws42u in Futurology
trend has been AI's societal impact. if anyone's read the [recent job impact paper](https://arxiv.org/abs/2303.10130), one of the factors that jumped out was the exposure of blockchain engineering to AI-based ... function of any group of market participants. with respect to ML frameworks like [sparsely-gated MoE](https://arxiv.org/abs/1701.06538v1), [world models](https://arxiv.org/abs/2301.04104v1), [multimodality](https://arxiv.org/abs/2303.03378), and [adaptive agents](https://arxiv.org/abs/2301.07608):
Submitted by kizumada t3_11rfxca in MachineLearning
understanding model in 2019 and evolved to ERNIE 3.0 Titan with 260 billion parameters. ERNIE 1.0: [https://arxiv.org/abs/1904.09223](https://arxiv.org/abs/1904.09223) ERNIE 2.0: [https://arxiv.org/abs/1907.12412](https://arxiv.org/abs/1907.12412) ERNIE 3.0: [https://arxiv.org/abs/2112.12731](https://arxiv.org/abs/2112.12731) ERNIE for text-to-image ... arxiv.org/abs/2210.15257](https://arxiv.org/abs/2210.15257) ERNIE Bot live-stream on YouTube: [https://www.youtube.com/watch?v=ukvEUI3x0vI](https://www.youtube.com/watch?v=ukvEUI3x0vI)
Submitted by IamTimNguyen t3_105v7el in MachineLearning
papers: Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes: [https://arxiv.org/abs/1910.12478](https://arxiv.org/abs/1910.12478) Tensor Programs II: Neural Tangent Kernel for Any Architecture: [https://arxiv.org/abs/2006.14548](https://arxiv.org/abs/2006.14548) Tensor Programs III: Neural ... Matrix Laws: [https://arxiv.org/abs/2009.10685](https://arxiv.org/abs/2009.10685) Tensor Programs IV: Feature Learning in Infinite-Width Neural Networks: [https://proceedings.mlr.press/v139/yang21c.html](https://proceedings.mlr.press/v139/yang21c.html) Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer: [https://arxiv.org/abs/2203.03466](https://arxiv.org/abs/2203.03466)
InfuriatinglyOpaque t1_ivb9otw wrote
Reply to Training a board game player AI for an asymmetric game by computing_professor
Dorka, N., Burgard, W., Koltun, V., & Brox, T. (2020). Scaling Imitation Learning in Minecraft. [http://arxiv.org/abs/2007.02701](http://arxiv.org/abs/2007.02701) Bramlage, L., & Cortese, A. (2021). Generalized Attention-Weighted Reinforcement Learning. Neural Networks. [https://doi.org/10.1016/j.neunet.2021.09.023](https://doi.org/10.1016/j.neunet.2021.09.023) Frey ... Characterizing the dynamics of learning in repeated reference games. Cognitive Science, 44(6), e12845. [http://arxiv.org/abs/1912.07199](http://arxiv.org/abs/1912.07199) Kumaran, V., Mott, B. W., & Lester, J. C. (2019). Generating Game Levels for Multiple Distinct Games with ... Hjelm, D., Bachman, P., & Courville, A. (2021). Pretraining Representations for Data-Efficient Reinforcement Learning. [http://arxiv.org/abs/2106.04799](http://arxiv.org/abs/2106.04799) Sibert, C., Gray, W. D., & Lindstedt, J. K. (2017). Interrogating Feature Learning Models to Discover Insights