MambaByte: Token-Free Language Modeling

5,519 views

Sasha Rush 🤗

17 days ago

Mamba for efficient token-free language modeling - arxiv.org/abs/2401.13660 from Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan
Tutorial on Mamba: "Do we need Attention? A Mamba Primer"

COMMENTS: 11
@ChaseFreedomMusician · 15 days ago
Great presentation!! Thank you!!
@JunxiongWang-jxw · 13 days ago
Hi, a correction: the number of parameters for the transformer is 361M, not the 261M shown in this video, as reported in the paper.
@BooleanDisorder · 14 days ago
I definitely think attention in some form will survive even into a refined future Mamba model, due to its powerful ability to capture high-dimensional representations.
@tan-uz4oe · 10 days ago
Thank you for a great presentation, Sasha (@srush_nlp). You mentioned that MambaByte is still behind token-based models. I wonder what makes Mamba theoretically inferior to token-based transformers, or is it just a matter of discovering best practices and tricks?
@AM-yk5yd · 15 days ago
Charformer (which is mentioned in the paper; gradient-based tokenizers sound like pog) tests itself on multilingual tasks. It would be interesting to see how RNNs behave there. RWKV can easily jump to the wrong language (its model card discusses how a trailing space in the prompt can "upset the tokenizer"). Also, can we go lower? MambaBit, when? Just imagine: a vocab size of 2. 😱
@donnychan1999 · 15 days ago
Although it's an interesting idea, I don't understand how it is beneficial. Humans neither understand nor produce output in bits. I think it's more reasonable for each token to be a smallest semantic/graphical unit (a sememe/grapheme); a bit holds no semantic or graphical information.
@AM-yk5yd · 15 days ago
@donnychan1999 Humans also don't produce output in bytes: there is no reason for 'кот' to take twice as many "thought units" as 'cat'. Meanwhile, I decided to be the change I want to see in the world and published Maykeye/MambaBit on HF after torturing my laptop for 10 hours. "The cat can never" -> "The cat can never many be my father, Or else and the good many be my father, In the good many lord, and my father come." This is so cursed. Yet it's much better than I expected.
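(The 2x cost for 'кот' comes straight from UTF-8, where Cyrillic letters take two bytes each. A minimal sketch of what byte-level and bit-level "vocabularies" do to sequence length — purely illustrative, not the MambaByte or MambaBit implementation:)

```python
# Sketch: sequence lengths under token-free vocabularies.
# Byte-level (vocab 256): UTF-8 penalizes non-Latin scripts.
# Bit-level (vocab 2): every sequence grows another 8x.

def byte_ids(text: str) -> list[int]:
    """Byte-level 'tokenization': symbols drawn from a vocab of 256."""
    return list(text.encode("utf-8"))

def bit_ids(text: str) -> list[int]:
    """Bit-level 'tokenization': symbols drawn from a vocab of 2."""
    return [(b >> i) & 1
            for b in text.encode("utf-8")
            for i in range(7, -1, -1)]  # most-significant bit first

print(len(byte_ids("cat")), len(byte_ids("кот")))  # 3 vs 6
print(len(bit_ids("cat")), len(bit_ids("кот")))    # 24 vs 48
```

So the same three-letter word costs twice the "thought units" at the byte level, and the bit level multiplies every sequence by another factor of eight — which is why shrinking the vocab trades model-side simplicity for much longer sequences.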
@DewEfresh · 14 days ago
Nice work. I see the models on Hugging Face. Is there also a GitHub repo or notebooks to train or run inference on them?
@vibingcat1 · 15 days ago
Great work and presentation! Have you also compared MambaByte to the baselines on any downstream tasks/benchmarks?
@Khari99 · 14 days ago
Great work!
@marcfruchtman9473 · 15 days ago
Very impressive.