COMMENTS
@Karl-Asger
@Karl-Asger 13 hours ago
Looking forward to more on TaskGen. Great work!
@johntanchongmin
@johntanchongmin 2 hours ago
Tomorrow is the TaskGen Ask Me Anything session - join my Discord group for the link!
@Karl-Asger
@Karl-Asger 1 hour ago
@johntanchongmin I joined the Discord and saw that! Really happy about that - I spent my entire Sunday catching up on your projects through your great videos. Unfortunately I'm in CET so it will be 4am for me, but I'll catch up on it after! Thanks for sharing your work so enthusiastically 😎
@Karl-Asger
@Karl-Asger 21 hours ago
Amazing work, John. I hope I can be part of future meetings - I love the points that you are most focused on in these developments.
@ginisksam
@ginisksam 3 days ago
Hi John. Thanks for this package. It can be very powerful and flexible by adding more 'stuff' within the output_format like 'Thoughts', 'Reflection', etc. Will explore further - it's so refreshing. BTW I have hooked it up to the Ollama API running locally. Keep up the good work.
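A minimal sketch of that idea (the extra output_format keys, the model name and the Ollama endpoint are illustrative assumptions, not from this thread):

from strictjson import strict_json
from openai import OpenAI

# hypothetical wrapper for a local model served via Ollama's OpenAI-compatible endpoint
def llm(system_prompt: str, user_prompt: str) -> str:
    client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
    response = client.chat.completions.create(
        model='llama3',
        messages=[{'role': 'system', 'content': system_prompt},
                  {'role': 'user', 'content': user_prompt}])
    return response.choices[0].message.content

res = strict_json(
    system_prompt='You are a travel planner',
    user_prompt='Plan a day trip to Kyoto',
    output_format={'Thoughts': 'Step-by-step reasoning about the trip',
                   'Reflection': 'Critique of the draft plan',
                   'Plan': 'Final itinerary'},
    llm=llm)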
@johntanchongmin
@johntanchongmin 2 days ago
Hope you like it :)
@ginisksam
@ginisksam 1 day ago
Hi @johntanchongmin, I'm getting this error when using a local LLM: An exception occurred: "Broad Trip Plan" not in json string output. You must use "###{key}###" to enclose the {key}. Current invalid json format: {... What is the likely cause? Looking forward to your kind guidance. Cheers
@johntanchongmin
@johntanchongmin 1 day ago
@ginisksam Local LLMs are likely not as good at instruction following. I'll be releasing a patch in a few days. Meanwhile you can add this line to your system prompt: "Begin your response with {{ and end with }}"
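A minimal sketch of that workaround (the prompts and keys here are illustrative; llm is a local-LLM wrapper like the one sketched earlier):

res = strict_json(
    system_prompt='You are a travel planner. Begin your response with {{ and end with }}',
    user_prompt='Plan a 3-day trip to Tokyo',
    output_format={'Broad Trip Plan': 'Overall plan for the trip',
                   'Day-by-Day Itinerary': 'List of activities per day'},
    llm=llm)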
@johntanchongmin
@johntanchongmin 1 day ago
@ginisksam Could you also show me the invalid json format?
@ginisksam
@ginisksam 1 day ago
@johntanchongmin Here is a snippet from running the cell below from strictjson_AMA...ipynb:
res2 = strict_json(system_prompt = '''Given the triplet list, write some code to display this as a Knowledge Graph''',
    user_prompt = triplet_list,
    output_format = {'Code': 'Code in Python to display knowledge graph from triplet list'},
    llm = llm)
Error: An exception occurred: list index out of range
Current invalid json format: {
"'###Code###': "
"import networkx as nx "
"from networkx.drawing.nx_agraph import write_dot "
"from itertools import permutations "
" "
"triplets = [{ "
"'obj1': 'LLM', 'relationship': 'provides', 'obj2': 'Base Functionalities'}, "
"'obj1': 'Tutorial.ipynb', 'relationship': 'refers to', 'obj2': 'Base Functionalities'}, "
"'obj1': 'int, float, str, dict, list', 'relationship': 'supports', 'obj2': 'Base Functionalities'}, "
"'obj1': 'Dict[], List[], Enum[]', 'relationship': 'supported by', 'obj2': 'Base Functionalities'}, "
"'obj1': 'bool type forcing', 'relationship': 'handled by', 'obj2': 'Base Functionalities'}, "
"'obj1': 'LLM-based error correction', 'relationship': 'used for', 'obj2': 'Base Functionalities'}, "
"'obj1': 'Function (renamed from strict_function)', 'relationship': 'available in', 'obj2': 'Base Functionalities'}, "
"'obj1': 'OpenAI JSON Mode', 'relationship': 'enabled by', 'obj2': 'Base Functionalities'}, "
"'obj1': 'llm variable', 'relationship': 'exposed for', 'obj2': 'strict_json, Function'} "
"] "
" "
"# Create a graph "
"G = nx.DiGraph() "
" "
"# Add nodes and edges from triplets "
"for obj1, rel, obj2 in triplets: "
" G.add_edge(obj1, obj2, relationship=rel) "
...
Thanks for your continuous guidance.
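For reference, a minimal working version of what that generated code appears to be attempting (assuming the triplets are dicts with 'obj1', 'relationship' and 'obj2' keys, as in the notebook):

import networkx as nx
import matplotlib.pyplot as plt

triplet_list = [
    {'obj1': 'LLM', 'relationship': 'provides', 'obj2': 'Base Functionalities'},
    {'obj1': 'Tutorial.ipynb', 'relationship': 'refers to', 'obj2': 'Base Functionalities'},
]

# build a directed graph with the relationship stored as an edge attribute
G = nx.DiGraph()
for t in triplet_list:
    G.add_edge(t['obj1'], t['obj2'], relationship=t['relationship'])

pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, node_color='lightblue', node_size=1500, font_size=8)
nx.draw_networkx_edge_labels(G, pos, edge_labels=nx.get_edge_attributes(G, 'relationship'))
plt.show()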
@johntanchongmin
@johntanchongmin 5 days ago
Companion Notebook for this Tutorial: github.com/tanchongmin/strictjson/blob/main/strictjson_AMA_30Apr2024.ipynb
@johntanchongmin
@johntanchongmin 5 days ago
Repo Link: github.com/tanchongmin/strictjson
@johntanchongmin
@johntanchongmin 13 days ago
If you want the reference links to have a hyperlink to the exact place they are mentioned, insert \usepackage{hyperref} in the preamble.
@johntanchongmin
@johntanchongmin 13 days ago
Overleaf (free account can co-write online with 1 other person): www.overleaf.com/
Google Scholar: scholar.google.com/
Llama 3: www.meta.ai/
ChatGPT: chat.openai.com/
@johntanchongmin
@johntanchongmin 13 days ago
Basic prompt to generate the overall paper code (note: use the style guide if you have one for the conference):
Give me sample latex for a research paper titled “Llama 3: Uses and Applications”. Include all useful packages for a research paper. Fill in all sections with placeholders. Use bib latex named “references.bib” for references.
Basic prompt to generate code for a specific figure, table, or equation:
Generate me latex code for <insert specific guidelines here>
Basic prompt to refine the latex code:
<Existing latex code snippet> <Your requested changes>
@chrisogonas
@chrisogonas 14 days ago
While I appreciate the flexibility of knowledge graphs (KGs) in easily representing relationships, I agree with you that KGs are not necessarily the best or most effective way to represent intelligence. I will stay tuned to your work. I hope to publish on this in the near future. Thanks for the presentation.
@johntanchongmin
@johntanchongmin 13 days ago
Glad it helps. I am actively pursuing my idea of multiple abstraction spaces, and KG can be one of them. The rest of how we store memory will depend on what kind of memory - semantic facts, episodic memory and so on. These can be stored in various ways like traditional databases, or even in video/image format.
@chrisogonas
@chrisogonas 12 days ago
@johntanchongmin Thanks for sharing your research. I will particularly follow closely your work on context-dependent embeddings. That's an exciting angle to explore in depth.
@moglixdhd
@moglixdhd 14 days ago
buuuuuu where is the footage! i cant trust u
@johntanchongmin
@johntanchongmin 13 days ago
Haha what footage?
@johntanchongmin
@johntanchongmin 19 days ago
Slides can be found here: github.com/tanchongmin/TensorFlow-Implementations/blob/main/Paper_Reviews/SORA.pdf
@boonkiathan
@boonkiathan 19 days ago
Honestly, until Sora is publicly released (like Stable Diffusion), even to a select group of public influencers to prompt, we are looking at output that may be:
- from a much longer prompt
- selected from a wide range of many failed generations
- carefully picked from video capabilities the model is confident in from the training set, esp. avoiding fast movements, hands and eyes, occlusion, etc.
Not to mention it may have cost tens of thousands in undivulged GPU/APU compute. I believe it is not edited - that is as much as I can trust OpenAI.
@johntanchongmin
@johntanchongmin 19 days ago
That said, the spacetime patches idea is interesting, and if it works out, it could form the basis of a lot of innovation in the video prediction domain.
@johntanchongmin
@johntanchongmin 19 days ago
References Part 2:
Blog comparing DALL-E, Stable Diffusion, Imagen: tryolabs.com/blog/2022/08/31/from-dalle-to-stable-diffusion
Paper attempting to reverse engineer SORA (I only agree with 20% of the paper): arxiv.org/abs/2402.17177
Vision Transformer: arxiv.org/abs/2010.11929
Good blog post about Vision Transformer: towardsdatascience.com/vision-transformers-explained-a9d07147e4c8
Diffusion Transformer: arxiv.org/abs/2212.09748
@johntanchongmin
@johntanchongmin 19 days ago
References:
SORA main page: openai.com/sora
SORA technical report: openai.com/research/video-generation-models-as-world-simulators
OpenAI CLIP Image and Text Embeddings: arxiv.org/abs/2103.00020
DALL-E: arxiv.org/abs/2102.12092
DALL-E 2: arxiv.org/abs/2204.06125
DALL-E 3: cdn.openai.com/papers/dall-e-3.pdf
Stable Diffusion: arxiv.org/abs/2112.10752
Stable Diffusion XL - making Stable Diffusion more high-res: arxiv.org/abs/2307.01952
Stable Diffusion 3: arxiv.org/pdf/2403.03206.pdf
ControlNet - adding more conditions to Stable Diffusion: arxiv.org/abs/2302.05543
I-JEPA (Meta): ai.meta.com/blog/yann-lecun-ai-model-i-jepa/
V-JEPA (Meta): ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture/
Make-A-Video (Meta): ai.meta.com/blog/generative-ai-text-to-video/
Imagen (Google): arxiv.org/abs/2205.11487
Denoising Diffusion Probabilistic Models (DDPM) - diffusion in pixel space: arxiv.org/abs/2006.11239
@KimSiaSim
@KimSiaSim 21 days ago
Are the examples, such as the AWS Bot, open sourced by any chance?
@johntanchongmin
@johntanchongmin 21 days ago
Brian has open sourced most of them on the TaskGen repo!
@johntanchongmin
@johntanchongmin 26 days ago
1:07:31 This is a mistake on my end - this is not the ImageNet Supervised Learning model. Li et al. is actually the Visual N-gram model, where they predict n-grams (n words) for each picture: arxiv.org/pdf/1612.09161.pdf Here, I believe the CLIP authors did not even implement that model (it has quite low performance of 10+% accuracy on ImageNet), but rather just borrowed the method of using the class name text directly and applied it to CLIP. Basically, the paper was misleading - they did not even need to refer to Li et al. for that chart, as the methodology is totally different. It is just CLIP with ImageNet class names without any added prompt engineering.
@johntanchongmin
@johntanchongmin 26 days ago
For the loss function at 1:00:15, they use Cross Entropy Loss with the unnormalised logits as input (the cosine similarities scaled by the exponential of the temperature t). That is why the resultant cosine similarity matrix needs to be multiplied by exp(t) to form the logits. Inside the Cross Entropy Loss function, each exponentiated term is then divided by the sum of the exponentiated terms over all the other inputs (i.e., normalised via softmax). See pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html for details.
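A minimal sketch of that loss in PyTorch (following the pseudocode in the CLIP paper; variable names are illustrative):

import torch
import torch.nn.functional as F

def clip_loss(image_embeds, text_embeds, t):
    # normalise so that the dot product is cosine similarity
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    # unnormalised logits: cosine similarity matrix scaled by exp(t)
    logits = image_embeds @ text_embeds.t() * t.exp()
    # matching image-text pairs lie on the diagonal
    labels = torch.arange(logits.shape[0], device=logits.device)
    # symmetric cross entropy over rows (image->text) and columns (text->image);
    # F.cross_entropy applies the softmax normalisation internally
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

This is the same computation that is described as InfoNCE in the comment below.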
@johntanchongmin
@johntanchongmin 26 days ago
CLIP's loss function has also been described as InfoNCE loss, a common loss term for contrastive learning. See builtin.com/machine-learning/contrastive-learning for details. It is essentially Cross Entropy over cosine similarity terms, which is what is done in CLIP.
@johntanchongmin
@johntanchongmin 26 days ago
At 58:22, the weights W_i and W_t are the projections into the embedding space from the image model output and text model output respectively (allowing for a change in embedding dimension). This allows more generic text and image models with different output dimensions to all map to the same embedding dimension.
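A minimal sketch of those projections (all dimensions are illustrative); the resulting embeddings can then be fed into the loss sketch above:

import torch
import torch.nn as nn

d_img, d_txt, d_embed = 1024, 768, 512       # example model output / shared embedding dimensions
W_i = nn.Linear(d_img, d_embed, bias=False)  # projects image model output into the shared space
W_t = nn.Linear(d_txt, d_embed, bias=False)  # projects text model output into the shared space

image_features = torch.randn(8, d_img)       # e.g. Vision Transformer outputs
text_features = torch.randn(8, d_txt)        # e.g. text Transformer outputs
image_embeds = W_i(image_features)           # both now live in the same d_embed-dimensional space
text_embeds = W_t(text_features)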
@johntanchongmin
@johntanchongmin 26 days ago
Jupyter Notebook Code can be found here if you want to do your own experiments too: github.com/tanchongmin/TensorFlow-Implementations/tree/main/Paper_Reviews/CLIP/CLIP%20Code
@Qzariuss
@Qzariuss 26 days ago
going to try this tomorrow
@johntanchongmin
@johntanchongmin 1 month ago
Updated the companion notebook to this video as OpenAI API and StrictJSON has been updated: github.com/tanchongmin/strictjson/blob/main/Experiments/LLM%20with%20Knowledge%20Graphs.ipynb
@evofx
@evofx 1 month ago
How can I join your classes? Is there a better UI grounding model than SeeClick?
@johntanchongmin
@johntanchongmin 1 month ago
Hey, the link to join will be in my Discord group (see the link in my UKposts profile). Also, for AppAgent, they actually use the XML directly to ground the agent's input space. SeeClick, if I'm not wrong, uses a Vision Transformer directly - it will not have the same nuance and precision for on-screen positions as AppAgent does.
@mwd6478
@mwd6478 1 month ago
Could you add examples of nested dictionaries / jsons? The lists you have are amazing. I think a lot of folks might have relational data they want in a nested way.
@johntanchongmin
@johntanchongmin 1 month ago
Hey, definitely! Please see the Tutorial.ipynb on the github for the example. Also, I have modified how the Function works, I am thinking of updating this tutorial soon!
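As an illustration only (the exact nesting syntax is shown in Tutorial.ipynb; this sketch assumes a nested dict can be passed directly in output_format and that llm is your own wrapper):

res = strict_json(
    system_prompt='You are an HR assistant',
    user_prompt='Summarise this candidate: Jane, 29, data engineer in Berlin',
    output_format={'Name': 'Candidate name',
                   'Details': {'Age': 'Age of candidate, type: int',
                               'Role': 'Job title',
                               'Location': 'City'}},
    llm=llm)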
@mwd6478
@mwd6478 1 month ago
@johntanchongmin Awesome! Thanks for the reply. I saw in the tutorial how to use another LLM. I'm intending to use Mixtral 8x7B with Groq as the provider. Will be interesting!
@mwd6478
@mwd6478 1 month ago
Does this work for making nested dictionaries?
@RealUniquee
@RealUniquee 1 month ago
Nice explanation, and I also agree that using SFT is better than RLHF.
@yeong0120
@yeong0120 1 month ago
Very informative sharing, Brian!
@johntanchongmin
@johntanchongmin 1 month ago
53:53 You can start here if you want to go straight to theory before the demo!
@johntanchongmin
@johntanchongmin 1 month ago
Note that for Brian's showcase, there were some errors when running them live - but all these are interfacing issues and can be solved easily :) Don't be afraid to use TaskGen for your use case, and we can build and improve TaskGen together!
@johntanchongmin
@johntanchongmin 1 month ago
Part 2 here: ukposts.info/have/v-deo/h4eca6OmiZyFsXk.html
@johntanchongmin
@johntanchongmin 1 month ago
Part 1 here: ukposts.info/have/v-deo/h4-JrYqLbod4qWg.html
@johntanchongmin
@johntanchongmin 1 month ago
v1.3.0 out with Global Context instead of Additional Context: github.com/simbianai/taskgen
@johntanchongmin
@johntanchongmin 1 month ago
Note: After this session, the "get_additional_context" variable will be renamed to "get_global_context". I'll create a series of tutorial videos soon on how to use TaskGen once we finalise most details in a production version of it.
@MichaelChenAdventures
@MichaelChenAdventures 1 month ago
great video John!
@doctor2943
@doctor2943 1 month ago
Can I use the same thing with Gemini? Or Mistral?
@johntanchongmin
@johntanchongmin 1 month ago
Sure thing - there is an llm variable that you can pass to strict_json or Function, which allows you to use your own LLM.
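A minimal sketch of that pattern, assuming (as in the companion notebook) an llm function that takes a system_prompt and a user_prompt and returns a string; the provider client call is a hypothetical placeholder to swap for your Gemini/Mistral SDK:

def my_llm(system_prompt: str, user_prompt: str) -> str:
    # call Gemini, Mistral or any other provider here and return the text response
    response = my_provider_client.chat(      # hypothetical client - replace with your SDK of choice
        model='my-model',
        messages=[{'role': 'system', 'content': system_prompt},
                  {'role': 'user', 'content': user_prompt}])
    return response.text                     # must return a plain string

res = strict_json(system_prompt='You are a classifier',
                  user_prompt='It rained all day',
                  output_format={'Sentiment': 'Type of sentiment'},
                  llm=my_llm)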
@Constructive-ty6pl
@Constructive-ty6pl 1 month ago
For "use the hammer to open the door" I feel like old Sierra adventures are sensible enough to support that logic. But old LucasArts adventures are deliberately nonsensical sometimes, so no LLM logic could figure it out. Maybe you need to use a flowervase to pour water out on the floor, then use a can opener which causes the dog to come running and slip on the water, and they crash through the door. But, LucasArts games, you cannot die or lose or become "soft locked". So an agent can spend as much time being more and more creative until they find the solution, they will never break the game or run out of tries. Also the hint guides for LucasArts are structed I think where there are levels of hints, starting with text that is truly only a hint, like in-lore too. But the third level of hint is very explicit like [use] the [kerosene] on the [statue]. Compared to Sierra games, where your character can die, or miss an opportunity to pick up a required object, and now you can't progress or even go back. When to give up and load a previous save, or how far back to load, would be a difficult state to detect I think. Maybe you just need to increase the temperature of your LLM completion, to get a more creative result? So I wonder which is harder to train for. Sierra style, where the logic is straight and probably compatible with LLM knowledge, but the actual training itself has dead ends and punishment. Or, LucasArts, where it is a safe space, but the """logic""" is totally silly. I would love to see an agent trained on LucasArts, that outputs its reasoning for each step. "I will use the Parrot on the Intercom, because parrots mimic speech, so it will say the password". Even if that is not the solution to the puzzle, it would be entertaining to see how creative the LLM is being. Oh, also point and click adventures usually are waiting on the player, no worries about response time or Vision latency! And hey maybe a person can simply crank up the cycles in DOSBox so the game plays faster and training is accelerated, I bet there are a few games where the walk animations would be fast forwarded. If CRADLE is proof that this setup can work (extracting objectives text, using cosine similiary to limit the potential actions, etc) I bet playing old Point and Click adventures would reveal right away the limits of an LLMs reasoning and planning. Is there a LucasArts benchmark? lol
@Qzariuss
@Qzariuss 1 month ago
Very excited to see this advancement happening - it's been so many years of custom automation in games. LLMs will bring a true revolution, letting us explore so much more than what was possible before.
@johntanchongmin
@johntanchongmin 1 month ago
It is great that they can use memory and automatic skill learning for this. However, the prompts are very game-dependent and lots of game-specific things are mentioned. While this may not be general, I do feel that even for AGI-like systems, custom prompts will need to be added. Right now we perform this custom prompt addition ourselves, but it could very well be that in the near future the prompt is learned automatically.
@johntanchongmin
@johntanchongmin 1 month ago
Part 2 here: ukposts.info/have/v-deo/oICUjGquhap3rYU.html
@johntanchongmin
@johntanchongmin 1 month ago
Part 1 here: ukposts.info/have/v-deo/hXR4p32lrY2XrJs.html
@leonlysak4927
@leonlysak4927 1 month ago
Ben Goertzel discusses something more "old school" like Kernel PCA being the best way they've found to create node embeddings.
@leonlysak4927
@leonlysak4927 1 month ago
You're the first person I've heard mention this concept of context-dependent embeddings. I started tinkering with the same idea back in December of last year, but never had a name for it. I was doing some self-reflection and thought about how some of my own behaviors and thoughts were sometimes contradictory - dependent on how my emotions were and such. If I could make a certain perspective of mine a 'node', its embedding would very likely change given different contexts.
@johntanchongmin
@johntanchongmin 1 month ago
Nice, do let me know if you have any feedback / add-ons to this idea
@johntanchongmin
@johntanchongmin 1 month ago
Also, video on Context-Dependent Embeddings here: ukposts.info/have/v-deo/kYqFiJ6jh51h04k.html
@snehotoshbanerjee1938
@snehotoshbanerjee1938 1 month ago
John, BTW, you are a great teacher!! Your teaching method is great, especially the "Questions to Ponder" at the end :)
@snehotoshbanerjee1938
@snehotoshbanerjee1938 1 month ago
John, one question... Does the Cohere rerank algorithm use embeddings behind the scenes for semantic search and ranking? I guess embeddings are necessary for semantic search. What I am confused about is "embed each sentence and compare" vs "put both document and query into the algorithm". Do these two approaches differ in creating two embeddings vs one single embedding space?
@johntanchongmin
@johntanchongmin 1 month ago
I believe the Cohere rerank model takes in both query and document, and outputs a score. This means embeddings for the query/document need not be generated, as it is a full end-to-end system. The normal embedding method works at the sentence level and allows us to compare arbitrary sentences by cosine similarity. With the Cohere reranker, you need to redo this comparison for every different query and document pair you have.
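A minimal sketch of the "embed each sentence and compare" approach, with the reranker shown only as a commented contrast (the embedding model is an illustrative choice, not a recommendation):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')   # one possible sentence embedding model

query = 'best way to parse JSON from an LLM'
documents = ['StrictJSON enforces a JSON schema on LLM output',
             'TaskGen splits a task into subtasks for agents']

# bi-encoder: embed once, then reuse the document embeddings for any future query
doc_embeds = model.encode(documents)
query_embed = model.encode(query)
scores = [float(np.dot(query_embed, d) / (np.linalg.norm(query_embed) * np.linalg.norm(d)))
          for d in doc_embeds]

# cross-encoder reranker (e.g. Cohere rerank): the model sees each (query, document) pair
# directly and outputs a relevance score, so every new query means re-scoring every document.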
@snehotoshbanerjee1938
@snehotoshbanerjee1938 1 month ago
Ok. Thank u John!
@snehotoshbanerjee1938
@snehotoshbanerjee1938 1 month ago
Knowledge-packed, with great insight on embeddings and their current scope for improvement.
@crypticnomad
@crypticnomad 2 months ago
I've been playing around with Encodec, both the Python library and the Hugging Face transformer (slight differences in API), and as a mere mortal without infinite compute this is kinda hard to do much with. It is faster than realtime, which is awesome, but on my system it takes roughly 1.2 seconds to encode a ~7 second audio file, and in Google Colab it is ~0.2 seconds on a GPU instance. That is fine for end use cases, but it makes training something else on the output of this a huge pita with the standard transformer pipeline methods.

So what I did was take a subset of Common Voice Spanish balanced by accent, and then encode the audio files first. I'm saving them as numpy files and the file size reduction is really nice! It has been running for about 12 hours so far, and this is just the audio encoding! I'll probably upload this dataset to Hugging Face once it is finished.

Another issue I noticed is that batch processing with Encodec is pretty confusing, and there seems to be little documentation on the subject. The Hugging Face transformer has a param in the encode method for a padding mask, but afaik no padding mask is output, and decoding returns a slightly different shape, so the original mask is useless. Did they train this thing one sample at a time? Would that not be painfully slow?
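A minimal sketch of that pre-encoding step with the Hugging Face Encodec model (the model name, file paths and resampling step are illustrative; treat the exact API as an assumption to verify against the transformers docs):

import numpy as np
import torch
import torchaudio
from transformers import AutoProcessor, EncodecModel

model = EncodecModel.from_pretrained('facebook/encodec_24khz')
processor = AutoProcessor.from_pretrained('facebook/encodec_24khz')

waveform, sr = torchaudio.load('clip_0001.wav')                            # illustrative input file
waveform = torchaudio.functional.resample(waveform, sr, processor.sampling_rate)

inputs = processor(raw_audio=waveform[0].numpy(),
                   sampling_rate=processor.sampling_rate, return_tensors='pt')
with torch.no_grad():
    codes = model.encode(inputs['input_values'], inputs['padding_mask']).audio_codes

np.save('clip_0001_codes.npy', codes.numpy())                              # far smaller than raw audio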
@crypticnomad
@crypticnomad 2 months ago
Although VALL-E was only trained on English, I tested Encodec with Spanish, Portuguese, Russian and Mandarin, and the errors in the encoding/decoding process were roughly the same for all languages tested.
@TheHoinoel
@TheHoinoel 2 months ago
Thanks for this - the talk was excellent. I've been looking to combine LLMs with KGs and have very similar intuitions when it comes to using the same embedding space for the KG as for the LLM. I really like your framing of having the right abstraction spaces to solve the problem at hand. Having written countless prompts, and having looked at how humans have solved problems over the years, it seems to me that fostering the right context (abstraction space) is vital when trying to solve a new problem. Einstein's discoveries were, in part, possible due to the context of his life experience that gave him intuitions for solving a certain type of problem. The cool thing with LLMs is that we can bootload intuition at will, allowing us to swap out abstraction spaces until we find a combination that gives us the right context to solve a problem. Great work!
@johntanchongmin
@johntanchongmin 2 months ago
Slides: github.com/simbianai/taskgen/blob/main/resources/TaskGen.pdf
@arusharma5393
@arusharma5393 1 month ago
Dude, I wanted to know more about your project.
@johntanchongmin
@johntanchongmin 1 month ago
@arusharma5393 Sure, connect with me on Discord.
@arusharma5393
@arusharma5393 1 month ago
@johntanchongmin Can you share the link with me?
@snehotoshbanerjee1938
@snehotoshbanerjee1938 2 months ago
Knowledge-packed video and excellent teaching skills.
@Qzariuss
@Qzariuss 2 months ago
Thank you John
@johntanchongmin
@johntanchongmin 2 months ago
The multi-agent framework I said I was going to create is finally out: TaskGen - github.com/simbianai/taskgen It uses JSON as output, which is non-verbose and more targeted!