Mamba for efficient token-free language modeling (arxiv.org/abs/2401.13660), from Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan. Tutorial on Mamba: "Do we need Attention?" ...
COMMENTS: 11
@ChaseFreedomMusician 15 days ago
Great presentation!! Thank you!!
@JunxiongWang-jxw 13 days ago
Hi, one correction: the number of parameters for the transformer is 361M, as reported in the paper, not 261M as shown in this video.
@BooleanDisorder 14 days ago
I definitely think attention in some form will survive even into a refined future Mamba model, given its powerful ability to capture high-dimensional representations.
@tan-uz4oe 10 days ago
Thank you for a great presentation, Sasha (@srush_nlp). You mentioned that MambaByte is still behind token-based models. I wonder what makes Mamba theoretically inferior to token-based transformers, or is it just a matter of discovering the best practices and tricks?
@AM-yk5yd 15 days ago
Charformer (which is mentioned in the paper; gradient-based tokenizers sound like pog) evaluates itself on multilingual tasks. It would be interesting to see how RNNs behave there. RWKV can easily jump to the wrong language (the model card talks about how a trailing space at the end of the prompt can "upset the tokenizer"). Also, can we go lower? MambaBit when? Just imagine: a vocab size of 2. 😱
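To make "vocab size of 2" concrete: every UTF-8 byte can be unpacked into 8 binary tokens, so a bit-level model simply sees sequences 8x longer than a byte-level one. A minimal sketch in plain Python (the helper names are illustrative, not from any repo mentioned here):

```python
def text_to_bits(text: str) -> list[int]:
    """Encode text as a flat sequence over the 2-symbol vocabulary {0, 1}, MSB first."""
    return [(byte >> i) & 1
            for byte in text.encode("utf-8")
            for i in range(7, -1, -1)]

def bits_to_text(bits: list[int]) -> str:
    """Invert text_to_bits: regroup bits into bytes, then decode as UTF-8."""
    data = bytes(
        int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8)
    )
    return data.decode("utf-8")

bits = text_to_bits("cat")          # 3 bytes -> 24 binary "tokens"
assert bits_to_text(bits) == "cat"  # lossless round trip
print(len(bits))                    # 24
```

The model and training loop would stay the same; the only change from byte-level modeling is the 8x longer sequence, on top of the blow-up that bytes already pay over subword tokens.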
@donnychan1999 15 days ago
Although it's an interesting idea, I don't understand how it is beneficial. Humans neither understand nor produce output in bits. I think it's more reasonable for each token to be a smallest semantic/graphical unit (a sememe/grapheme); bits carry no semantic or graphical information.
@AM-yk5yd 15 days ago
@donnychan1999 Humans also don't produce output in bytes: there is no reason for 'кот' to take twice as many "thought units" as 'cat'. Meanwhile, I decided to be the change I want to see in the world and published Maykeye/MambaBit on HF after torturing my laptop for 10 hours. "The cat can never" -> "The cat can never many be my father, Or else and the good many be my father, In the good many lord, and my father come." This is so cursed. Yet it's much better than I expected.
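For reference on the 'кот' vs 'cat' point: under UTF-8, ASCII letters take one byte each while Cyrillic letters take two, so at the byte level the Russian word really does cost twice as many steps. A quick check:

```python
print(len("cat".encode("utf-8")))  # 3: ASCII letters are 1 byte each
print(len("кот".encode("utf-8")))  # 6: Cyrillic letters are 2 bytes each in UTF-8
```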
@DewEfresh 14 days ago
Nice work. I see the models on Hugging Face. Is there also a GitHub repo or notebooks to train them or run inference on them?
@vibingcat1 15 days ago
Great work and presentation! Have you also compared MambaByte to the baselines on any downstream tasks/benchmarks?