COMMENTS
@Karl-Asger
@Karl-Asger 13 hours ago
Looking forward to more on TaskGen. Great work!
@johntanchongmin
@johntanchongmin 2 hours ago
Tomorrow is the TaskGen Ask Me Anything session - join my Discord group for the link!
@Karl-Asger
@Karl-Asger 1 hour ago
@johntanchongmin I joined the Discord and saw that! Really happy about that - I spent my entire Sunday catching up on your projects through your great videos. Unfortunately I'm in CET so it will be 4am for me, but I'll catch up on it after! Thanks for sharing your work so enthusiastically 😎
@Karl-Asger
@Karl-Asger 21 hours ago
Amazing work, John. I hope I can be part of future meetings - I love the points that you are most focused on in these developments.
@ginisksam
@ginisksam 3 days ago
Hi John. Thanks for this package. It can be very powerful and flexible by adding more 'stuff' within the output_format like 'Thoughts', 'Reflection', etc. Will explore further - it's so refreshing. BTW I have hooked it up to the Ollama API running locally. Keep up the good work.
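A minimal sketch of that idea (the extra output_format keys, the model name and the Ollama endpoint are illustrative assumptions, not from this thread):

from strictjson import strict_json
from openai import OpenAI

# hypothetical wrapper for a local model served via Ollama's OpenAI-compatible endpoint
def llm(system_prompt: str, user_prompt: str) -> str:
    client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
    response = client.chat.completions.create(
        model='llama3',
        messages=[{'role': 'system', 'content': system_prompt},
                  {'role': 'user', 'content': user_prompt}])
    return response.choices[0].message.content

res = strict_json(
    system_prompt='You are a travel planner',
    user_prompt='Plan a day trip to Kyoto',
    output_format={'Thoughts': 'Step-by-step reasoning about the trip',
                   'Reflection': 'Critique of the draft plan',
                   'Plan': 'Final itinerary'},
    llm=llm)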
@johntanchongmin
@johntanchongmin 2 days ago
Hope you like it :)
@ginisksam
@ginisksam 1 day ago
Hi @johntanchongmin, I'm getting this error when using a local LLM: An exception occurred: "Broad Trip Plan" not in json string output. You must use "###{key}###" to enclose the {key}. Current invalid json format: {... What is the likely cause? Looking forward to your kind guidance. Cheers
@johntanchongmin
@johntanchongmin 1 day ago
@ginisksam Local LLMs are likely not as good at instruction following. I'll be releasing a patch in a few days. Meanwhile you can add this line to your system prompt: "Begin your response with {{ and end with }}"
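A minimal sketch of that workaround (the prompts and keys here are illustrative; llm is a local-LLM wrapper like the one sketched earlier):

res = strict_json(
    system_prompt='You are a travel planner. Begin your response with {{ and end with }}',
    user_prompt='Plan a 3-day trip to Tokyo',
    output_format={'Broad Trip Plan': 'Overall plan for the trip',
                   'Day-by-Day Itinerary': 'List of activities per day'},
    llm=llm)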
@johntanchongmin
@johntanchongmin 1 day ago
@ginisksam Could you also show me the invalid json format?
@ginisksam
@ginisksam 1 day ago
@johntanchongmin Here is a snippet from running the cell below from strictjson_AMA...ipynb:
res2 = strict_json(system_prompt = '''Given the triplet list, write some code to display this as a Knowledge Graph''',
    user_prompt = triplet_list,
    output_format = {'Code': 'Code in Python to display knowledge graph from triplet list'},
    llm = llm)
Error: An exception occurred: list index out of range
Current invalid json format: {
"'###Code###': "
"import networkx as nx "
"from networkx.drawing.nx_agraph import write_dot "
"from itertools import permutations "
" "
"triplets = [{ "
"'obj1': 'LLM', 'relationship': 'provides', 'obj2': 'Base Functionalities'}, "
"'obj1': 'Tutorial.ipynb', 'relationship': 'refers to', 'obj2': 'Base Functionalities'}, "
"'obj1': 'int, float, str, dict, list', 'relationship': 'supports', 'obj2': 'Base Functionalities'}, "
"'obj1': 'Dict[], List[], Enum[]', 'relationship': 'supported by', 'obj2': 'Base Functionalities'}, "
"'obj1': 'bool type forcing', 'relationship': 'handled by', 'obj2': 'Base Functionalities'}, "
"'obj1': 'LLM-based error correction', 'relationship': 'used for', 'obj2': 'Base Functionalities'}, "
"'obj1': 'Function (renamed from strict_function)', 'relationship': 'available in', 'obj2': 'Base Functionalities'}, "
"'obj1': 'OpenAI JSON Mode', 'relationship': 'enabled by', 'obj2': 'Base Functionalities'}, "
"'obj1': 'llm variable', 'relationship': 'exposed for', 'obj2': 'strict_json, Function'} "
"] "
" "
"# Create a graph "
"G = nx.DiGraph() "
" "
"# Add nodes and edges from triplets "
"for obj1, rel, obj2 in triplets: "
" G.add_edge(obj1, obj2, relationship=rel) "
...
Thanks for your continuous guidance.
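For reference, a minimal working version of what that generated code appears to be attempting (assuming the triplets are dicts with 'obj1', 'relationship' and 'obj2' keys, as in the notebook):

import networkx as nx
import matplotlib.pyplot as plt

triplet_list = [
    {'obj1': 'LLM', 'relationship': 'provides', 'obj2': 'Base Functionalities'},
    {'obj1': 'Tutorial.ipynb', 'relationship': 'refers to', 'obj2': 'Base Functionalities'},
]

# build a directed graph with the relationship stored as an edge attribute
G = nx.DiGraph()
for t in triplet_list:
    G.add_edge(t['obj1'], t['obj2'], relationship=t['relationship'])

pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, node_color='lightblue', node_size=1500, font_size=8)
nx.draw_networkx_edge_labels(G, pos, edge_labels=nx.get_edge_attributes(G, 'relationship'))
plt.show()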
@johntanchongmin
@johntanchongmin 5 days ago
Companion Notebook for this Tutorial: github.com/tanchongmin/strictjson/blob/main/strictjson_AMA_30Apr2024.ipynb
@johntanchongmin
@johntanchongmin 5 days ago
Repo Link: github.com/tanchongmin/strictjson
@johntanchongmin
@johntanchongmin 13 days ago
If you want the reference links to have a hyperlink to the exact place they are mentioned, insert \usepackage{hyperref} in the preamble.
@johntanchongmin
@johntanchongmin 13 days ago
Overleaf (free account can co-write online with 1 other person): www.overleaf.com/
Google Scholar: scholar.google.com/
Llama 3: www.meta.ai/
ChatGPT: chat.openai.com/
@johntanchongmin
@johntanchongmin 13 days ago
Basic prompt to generate the overall paper code (note: use the style guide if you have one for the conference):
Give me sample latex for a research paper titled “Llama 3: Uses and Applications”. Include all useful packages for a research paper. Fill in all sections with placeholders. Use bib latex named “references.bib” for references.
Basic prompt to generate code for a specific figure, table, or equation:
Generate me latex code for <insert specific guidelines here>
Basic prompt to refine the latex code:
<Existing latex code snippet> <Your requested changes>
@chrisogonas
@chrisogonas 14 days ago
While I appreciate the flexibility of knowledge graphs (KGs) in easily representing relationships, I agree with you that KGs are not necessarily the best or most effective way to represent intelligence. I will stay tuned to your work. I hope to publish on this in the near future. Thanks for the presentation.
@johntanchongmin
@johntanchongmin 13 days ago
Glad it helps. I am actively pursuing my idea of multiple abstraction spaces, and KG can be one of them. The rest of how we store memory will depend on what kind of memory - semantic facts, episodic memory and so on. These can be stored in various ways like traditional databases, or even in video/image format.
@chrisogonas
@chrisogonas 12 days ago
@johntanchongmin Thanks for sharing your research. I will particularly follow closely your work on context-dependent embeddings. That's an exciting angle to explore in depth.
@moglixdhd
@moglixdhd 14 days ago
buuuuuu where is the footage! i cant trust u
@johntanchongmin
@johntanchongmin 13 days ago
Haha what footage?
@johntanchongmin
@johntanchongmin 19 days ago
Slides can be found here: github.com/tanchongmin/TensorFlow-Implementations/blob/main/Paper_Reviews/SORA.pdf
@boonkiathan
@boonkiathan 19 days ago
Honestly, until Sora is publicly released (like Stable Diffusion), even to a select group of public influencers to prompt, we are looking at output that may be:
- from a much longer prompt
- selected from a wide range of many failed generations
- carefully picked from video capabilities the model is confident in from the training set, esp. avoiding fast movements, hands and eyes, occlusion, etc.
Not to mention it may have cost tens of thousands in undivulged GPU/APU compute. I believe it is not edited - that is as much as I can trust OpenAI.
@johntanchongmin
@johntanchongmin 19 days ago
That said, the spacetime patches idea is interesting, and if it works out, it could form the basis of a lot of innovation in the video prediction domain.
@johntanchongmin
@johntanchongmin 19 days ago
References Part 2:
Blog comparing DALL-E, Stable Diffusion, Imagen: tryolabs.com/blog/2022/08/31/from-dalle-to-stable-diffusion
Paper attempting to reverse engineer SORA (I only agree with 20% of the paper): arxiv.org/abs/2402.17177
Vision Transformer: arxiv.org/abs/2010.11929
Good blog post about Vision Transformer: towardsdatascience.com/vision-transformers-explained-a9d07147e4c8
Diffusion Transformer: arxiv.org/abs/2212.09748
@johntanchongmin
@johntanchongmin 19 days ago
References:
SORA main page: openai.com/sora
SORA technical report: openai.com/research/video-generation-models-as-world-simulators
OpenAI CLIP Image and Text Embeddings: arxiv.org/abs/2103.00020
DALL-E: arxiv.org/abs/2102.12092
DALL-E 2: arxiv.org/abs/2204.06125
DALL-E 3: cdn.openai.com/papers/dall-e-3.pdf
Stable Diffusion: arxiv.org/abs/2112.10752
Stable Diffusion XL - making Stable Diffusion more high-res: arxiv.org/abs/2307.01952
Stable Diffusion 3: arxiv.org/pdf/2403.03206.pdf
ControlNet - adding more conditions to Stable Diffusion: arxiv.org/abs/2302.05543
I-JEPA (Meta): ai.meta.com/blog/yann-lecun-ai-model-i-jepa/
V-JEPA (Meta): ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture/
Make-A-Video (Meta): ai.meta.com/blog/generative-ai-text-to-video/
Imagen (Google): arxiv.org/abs/2205.11487
Denoising Diffusion Probabilistic Models (DDPM) - diffusion in pixel space: arxiv.org/abs/2006.11239
@KimSiaSim
@KimSiaSim 21 days ago
Are the examples, such as the AWS Bot, open sourced by any chance?
@johntanchongmin
@johntanchongmin 21 days ago
Brian has open sourced most of them on the TaskGen repo!
@johntanchongmin
@johntanchongmin 26 days ago
1:07:31 This is a mistake on my end - this is not the ImageNet Supervised Learning model. Li et al. is actually the Visual N-gram model, where they predict n-grams (n words) for each picture: arxiv.org/pdf/1612.09161.pdf Here, I believe the CLIP authors did not even implement that model (it has quite low performance of 10+% accuracy on ImageNet), but rather just borrowed the method of using the class name text directly and applied it to CLIP. Basically, the paper was misleading - they did not even need to refer to Li et al. for that chart, as the methodology is totally different. It is just CLIP with ImageNet class names without any added prompt engineering.
@johntanchongmin
@johntanchongmin 26 days ago
For the loss function at 1:00:15, they use Cross Entropy Loss with the unnormalised logits as input (the cosine similarities scaled by the exponential of the temperature t). That is why the resultant cosine similarity matrix needs to be multiplied by exp(t) to form the logits. Inside the Cross Entropy Loss function, each exponentiated term is then divided by the sum of the exponentiated terms over all the other inputs (i.e., normalised via softmax). See pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html for details.
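A minimal sketch of that loss in PyTorch (following the pseudocode in the CLIP paper; variable names are illustrative):

import torch
import torch.nn.functional as F

def clip_loss(image_embeds, text_embeds, t):
    # normalise so that the dot product is cosine similarity
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    # unnormalised logits: cosine similarity matrix scaled by exp(t)
    logits = image_embeds @ text_embeds.t() * t.exp()
    # matching image-text pairs lie on the diagonal
    labels = torch.arange(logits.shape[0], device=logits.device)
    # symmetric cross entropy over rows (image->text) and columns (text->image);
    # F.cross_entropy applies the softmax normalisation internally
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

This is the same computation that is described as InfoNCE in the comment below.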
@johntanchongmin
@johntanchongmin 26 days ago
CLIP's loss function has also been described as InfoNCE loss, a common loss term for contrastive learning. See builtin.com/machine-learning/contrastive-learning for details. It is essentially Cross Entropy over cosine similarity terms, which is what is done in CLIP.
@johntanchongmin
@johntanchongmin 26 days ago
At 58:22, the weights W_i and W_t are the projections into the embedding space from the image model output and text model output respectively (allowing for a change in embedding dimension). This allows more generic text and image models with different output dimensions to all map to the same embedding dimension.
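A minimal sketch of those projections (all dimensions are illustrative); the resulting embeddings can then be fed into the loss sketch above:

import torch
import torch.nn as nn

d_img, d_txt, d_embed = 1024, 768, 512       # example model output / shared embedding dimensions
W_i = nn.Linear(d_img, d_embed, bias=False)  # projects image model output into the shared space
W_t = nn.Linear(d_txt, d_embed, bias=False)  # projects text model output into the shared space

image_features = torch.randn(8, d_img)       # e.g. Vision Transformer outputs
text_features = torch.randn(8, d_txt)        # e.g. text Transformer outputs
image_embeds = W_i(image_features)           # both now live in the same d_embed-dimensional space
text_embeds = W_t(text_features)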
@johntanchongmin
@johntanchongmin 26 days ago
Jupyter Notebook Code can be found here if you want to do your own experiments too: github.com/tanchongmin/TensorFlow-Implementations/tree/main/Paper_Reviews/CLIP/CLIP%20Code
@Qzariuss
@Qzariuss 26 days ago
going to try this tomorrow
@johntanchongmin
@johntanchongmin 1 month ago
Updated the companion notebook to this video as OpenAI API and StrictJSON has been updated: github.com/tanchongmin/strictjson/blob/main/Experiments/LLM%20with%20Knowledge%20Graphs.ipynb
@evofx
@evofx 1 month ago
How can I join your classes? Is there a better UI grounding model than SeeClick?
@johntanchongmin
@johntanchongmin 1 month ago
Hey, the link to join will be in my Discord group (see the link in my UKposts profile). Also, for AppAgent, they actually use the XML directly to ground the agent's input space. SeeClick, if I'm not wrong, uses a Vision Transformer directly - it will not have the same nuance and precision for on-screen positions as AppAgent does.
@mwd6478
@mwd6478 1 month ago
Could you add examples of nested dictionaries / jsons? The lists you have are amazing. I think a lot of folks might have relational data they want in a nested way.
@johntanchongmin
@johntanchongmin 1 month ago
Hey, definitely! Please see the Tutorial.ipynb on the github for the example. Also, I have modified how the Function works, I am thinking of updating this tutorial soon!
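As an illustration only (the exact nesting syntax is shown in Tutorial.ipynb; this sketch assumes a nested dict can be passed directly in output_format and that llm is your own wrapper):

res = strict_json(
    system_prompt='You are an HR assistant',
    user_prompt='Summarise this candidate: Jane, 29, data engineer in Berlin',
    output_format={'Name': 'Candidate name',
                   'Details': {'Age': 'Age of candidate, type: int',
                               'Role': 'Job title',
                               'Location': 'City'}},
    llm=llm)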
@mwd6478
@mwd6478 1 month ago
@johntanchongmin Awesome! Thanks for the reply. I saw in the tutorial how to use another LLM. I'm intending to use Mixtral 8x7B with Groq as the provider. Will be interesting!
@mwd6478
@mwd6478 1 month ago
Does this work for making nested dictionaries?
@RealUniquee
@RealUniquee 1 month ago
Nice explanation, and I also agree that using SFT is better than RLHF.
@yeong0120
@yeong0120 1 month ago
Very informative sharing, Brian!
@johntanchongmin
@johntanchongmin 1 month ago
53:53 You can start here if you want to go straight to theory before the demo!
@johntanchongmin
@johntanchongmin 1 month ago
Note that for Brian's showcase, there were some errors when running them live - but all these are interfacing issues and can be solved easily :) Don't be afraid to use TaskGen for your use case, and we can build and improve TaskGen together!
@johntanchongmin
@johntanchongmin 1 month ago
Part 2 here: ukposts.info/have/v-deo/h4eca6OmiZyFsXk.html
@johntanchongmin
@johntanchongmin 1 month ago
Part 1 here: ukposts.info/have/v-deo/h4-JrYqLbod4qWg.html
@johntanchongmin
@johntanchongmin 1 month ago
v1.3.0 out with Global Context instead of Additional Context: github.com/simbianai/taskgen
@johntanchongmin
@johntanchongmin 1 month ago
Note: After this session, the "get_additional_context" variable will be renamed to "get_global_context". I'll create a series of tutorial videos soon on how to use TaskGen once we finalise most details in a production version of it.
@MichaelChenAdventures
@MichaelChenAdventures 1 month ago
great video John!
@doctor2943
@doctor2943 1 month ago
Can I use the same thing with Gemini? Or Mistral?
@johntanchongmin
@johntanchongmin 1 month ago
Sure thing - there is an llm variable that you can pass to strict_json or Function, which allows you to use your own LLM.
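A minimal sketch of that pattern, assuming (as in the companion notebook) an llm function that takes a system_prompt and a user_prompt and returns a string; the provider client call is a hypothetical placeholder to swap for your Gemini/Mistral SDK:

def my_llm(system_prompt: str, user_prompt: str) -> str:
    # call Gemini, Mistral or any other provider here and return the text response
    response = my_provider_client.chat(      # hypothetical client - replace with your SDK of choice
        model='my-model',
        messages=[{'role': 'system', 'content': system_prompt},
                  {'role': 'user', 'content': user_prompt}])
    return response.text                     # must return a plain string

res = strict_json(system_prompt='You are a classifier',
                  user_prompt='It rained all day',
                  output_format={'Sentiment': 'Type of sentiment'},
                  llm=my_llm)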
@Constructive-ty6pl
@Constructive-ty6pl 1 month ago
For "use the hammer to open the door" I feel like old Sierra adventures are sensible enough to support that logic. But old LucasArts adventures are deliberately nonsensical sometimes, so no LLM logic could figure it out. Maybe you need to use a flowervase to pour water out on the floor, then use a can opener which causes the dog to come running and slip on the water, and they crash through the door. But, LucasArts games, you cannot die or lose or become "soft locked". So an agent can spend as much time being more and more creative until they find the solution, they will never break the game or run out of tries. Also the hint guides for LucasArts are structed I think where there are levels of hints, starting with text that is truly only a hint, like in-lore too. But the third level of hint is very explicit like [use] the [kerosene] on the [statue]. Compared to Sierra games, where your character can die, or miss an opportunity to pick up a required object, and now you can't progress or even go back. When to give up and load a previous save, or how far back to load, would be a difficult state to detect I think. Maybe you just need to increase the temperature of your LLM completion, to get a more creative result? So I wonder which is harder to train for. Sierra style, where the logic is straight and probably compatible with LLM knowledge, but the actual training itself has dead ends and punishment. Or, LucasArts, where it is a safe space, but the """logic""" is totally silly. I would love to see an agent trained on LucasArts, that outputs its reasoning for each step. "I will use the Parrot on the Intercom, because parrots mimic speech, so it will say the password". Even if that is not the solution to the puzzle, it would be entertaining to see how creative the LLM is being. Oh, also point and click adventures usually are waiting on the player, no worries about response time or Vision latency! And hey maybe a person can simply crank up the cycles in DOSBox so the game plays faster and training is accelerated, I bet there are a few games where the walk animations would be fast forwarded. If CRADLE is proof that this setup can work (extracting objectives text, using cosine similiary to limit the potential actions, etc) I bet playing old Point and Click adventures would reveal right away the limits of an LLMs reasoning and planning. Is there a LucasArts benchmark? lol
@Qzariuss
@Qzariuss 1 month ago
Very excited to see this advancement happening - it's been so many years of custom automation in games. LLMs will bring a true revolution, letting us explore so much more than what was possible before.
@johntanchongmin
@johntanchongmin 1 month ago
It is great that they can use memory and automatic skill learning for this. However, the prompts are very game-dependent and lots of game-specific things are mentioned. While this may not be general, I do feel that even for AGI-like systems, custom prompts will need to be added. Right now we perform this custom prompt addition ourselves, but it could very well be that in the near future the prompt is learned automatically.
@johntanchongmin
@johntanchongmin 1 month ago
Part 2 here: ukposts.info/have/v-deo/oICUjGquhap3rYU.html
@johntanchongmin
@johntanchongmin 1 month ago
Part 1 here: ukposts.info/have/v-deo/hXR4p32lrY2XrJs.html
@leonlysak4927
@leonlysak4927 1 month ago
Ben Goertzel discusses something more "old school" like Kernel PCA being the best way they've found to create node embeddings.
@leonlysak4927
@leonlysak4927 1 month ago
You're the first person I've heard mention this concept of context-dependent embeddings. I started tinkering with the same idea back in December of last year, but never had a name for it. I was doing some self-reflection and thought about how some of my own behaviors and thoughts were sometimes contradictory - dependent on how my emotions were and such. If I could make a certain perspective of mine a 'node', its embedding would very likely change given different contexts.
@johntanchongmin
@johntanchongmin 1 month ago
Nice, do let me know if you have any feedback / add-ons to this idea
@johntanchongmin
@johntanchongmin 1 month ago
Also, video on Context-Dependent Embeddings here: ukposts.info/have/v-deo/kYqFiJ6jh51h04k.html
@snehotoshbanerjee1938
@snehotoshbanerjee1938 1 month ago
John, BTW, you are a great teacher!! Your teaching method is great, especially the "Questions to Ponder" at the end :)
@snehotoshbanerjee1938
@snehotoshbanerjee1938 1 month ago
John, one question... Does the Cohere rerank algorithm use embeddings behind the scenes for semantic search and ranking? I guess embeddings are necessary for semantic search. What I am confused about is "embed each sentence and compare" vs "put both document and query into the algorithm". Do these two approaches differ in creating two embeddings vs one single embedding space?
@johntanchongmin
@johntanchongmin 1 month ago
I believe the Cohere rerank model takes in both query and document, and outputs a score. This means embeddings for the query/document need not be generated, as it is a full end-to-end system. The normal embedding method works at the sentence level and allows us to compare arbitrary sentences by cosine similarity. With the Cohere reranker, you need to redo this comparison for every different query and document pair you have.
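A minimal sketch of the "embed each sentence and compare" approach, with the reranker shown only as a commented contrast (the embedding model is an illustrative choice, not a recommendation):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')   # one possible sentence embedding model

query = 'best way to parse JSON from an LLM'
documents = ['StrictJSON enforces a JSON schema on LLM output',
             'TaskGen splits a task into subtasks for agents']

# bi-encoder: embed once, then reuse the document embeddings for any future query
doc_embeds = model.encode(documents)
query_embed = model.encode(query)
scores = [float(np.dot(query_embed, d) / (np.linalg.norm(query_embed) * np.linalg.norm(d)))
          for d in doc_embeds]

# cross-encoder reranker (e.g. Cohere rerank): the model sees each (query, document) pair
# directly and outputs a relevance score, so every new query means re-scoring every document.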
@snehotoshbanerjee1938
@snehotoshbanerjee1938 1 month ago
Ok. Thank u John!
@snehotoshbanerjee1938
@snehotoshbanerjee1938 1 month ago
Knowledge-packed, with great insight on embeddings and their current scope for improvement.
@crypticnomad
@crypticnomad 2 months ago
I've been playing around with Encodec, both the Python library and the Hugging Face transformer (slight differences in API), and as a mere mortal without infinite compute this is kinda hard to do much with. It is faster than realtime, which is awesome, but on my system it takes roughly 1.2 seconds to encode a ~7 second audio file, and in Google Colab it is ~0.2 seconds on a GPU instance. That is fine for end use cases, but it makes training something else on the output of this a huge pita with the standard transformer pipeline methods.

So what I did was take a subset of Common Voice Spanish balanced by accent, and then encode the audio files first. I'm saving them as numpy files and the file size reduction is really nice! It has been running for about 12 hours so far, and this is just the audio encoding! I'll probably upload this dataset to Hugging Face once it is finished.

Another issue I noticed is that batch processing with Encodec is pretty confusing, and there seems to be little documentation on the subject. The Hugging Face transformer has a param in the encode method for a padding mask, but afaik no padding mask is output, and decoding returns a slightly different shape, so the original mask is useless. Did they train this thing one sample at a time? Would that not be painfully slow?
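A minimal sketch of that pre-encoding step with the Hugging Face Encodec model (the model name, file paths and resampling step are illustrative; treat the exact API as an assumption to verify against the transformers docs):

import numpy as np
import torch
import torchaudio
from transformers import AutoProcessor, EncodecModel

model = EncodecModel.from_pretrained('facebook/encodec_24khz')
processor = AutoProcessor.from_pretrained('facebook/encodec_24khz')

waveform, sr = torchaudio.load('clip_0001.wav')                            # illustrative input file
waveform = torchaudio.functional.resample(waveform, sr, processor.sampling_rate)

inputs = processor(raw_audio=waveform[0].numpy(),
                   sampling_rate=processor.sampling_rate, return_tensors='pt')
with torch.no_grad():
    codes = model.encode(inputs['input_values'], inputs['padding_mask']).audio_codes

np.save('clip_0001_codes.npy', codes.numpy())                              # far smaller than raw audio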
@crypticnomad
@crypticnomad 2 months ago
Although VALL-E was only trained on English, I tested Encodec with Spanish, Portuguese, Russian and Mandarin, and the errors in the encoding/decoding process were roughly the same for all languages tested.
@TheHoinoel
@TheHoinoel 2 months ago
Thanks for this - the talk was excellent. I've been looking to combine LLMs with KGs and have very similar intuitions when it comes to using the same embedding space for the KG as for the LLM. I really like your framing of having the right abstraction spaces to solve the problem at hand. Having written countless prompts, and having looked at how humans have solved problems over the years, it seems to me that fostering the right context (abstraction space) is vital when trying to solve a new problem. Einstein's discoveries were, in part, possible due to the context of his life experience that gave him intuitions for solving a certain type of problem. The cool thing with LLMs is that we can bootload intuition at will, allowing us to swap out abstraction spaces until we find a combination that gives us the right context to solve a problem. Great work!
@johntanchongmin
@johntanchongmin 2 months ago
Slides: github.com/simbianai/taskgen/blob/main/resources/TaskGen.pdf
@arusharma5393
@arusharma5393 1 month ago
Dude, I wanted to know more about your project.
@johntanchongmin
@johntanchongmin 1 month ago
@arusharma5393 Sure, connect with me on Discord.
@arusharma5393
@arusharma5393 1 month ago
@johntanchongmin Can you share the link with me?
@snehotoshbanerjee1938
@snehotoshbanerjee1938 2 months ago
Knowledge-packed video and excellent teaching skills.
@Qzariuss
@Qzariuss 2 months ago
Thank you John
@johntanchongmin
@johntanchongmin 2 months ago
The multi-agent framework I said I was going to create is finally out: TaskGen - github.com/simbianai/taskgen It uses JSON as output, which is non-verbose and more targeted!