61 points by Apanatshka 6 days ago | 2 comments
Scene_Cast2 3 days ago
codebje 3 days ago
An LR parser will produce a parse tree showing how the given token stream satisfies the parser's grammar - or an error, if the token stream doesn't satisfy the grammar. Your typical parser algorithm is deterministic, even if the grammar it matches isn't, so you'd expect to get the same parse tree for the same token stream, every time. By and large LR parsers (as described in TFA) are shift/reduce parsers: there's a bunch of states, and in each state the next token in the input stream determines whether the parser shifts (pushes the token onto a term stack, jumps to another state, and moves to the next input token) or reduces (pops a number of terms off the term stack, replaces them with the term being reduced to, and goes back to a prior state). TFA talks about the situation where more than one shift or reduce action is possible for a given state and input token.
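If it helps to make that concrete, here's a toy shift/reduce driver in Python. The grammar (E -> E '+' n | n), the ACTION/GOTO tables, and the state numbers are all made up for this comment - they're hand-built for the toy grammar, not anything from TFA:

    # Toy SLR(1) table for:  E -> E '+' n  |  n
    RULES = {
        1: ("E", 3),   # E -> E '+' n   (pops 3 stack entries)
        2: ("E", 1),   # E -> n         (pops 1 stack entry)
    }

    ACTION = {
        (0, "n"): ("shift", 2),
        (1, "+"): ("shift", 3),
        (1, "$"): ("accept", None),
        (2, "+"): ("reduce", 2), (2, "$"): ("reduce", 2),
        (3, "n"): ("shift", 4),
        (4, "+"): ("reduce", 1), (4, "$"): ("reduce", 1),
    }

    GOTO = {(0, "E"): 1}

    def parse(tokens):
        tokens = tokens + ["$"]            # end-of-input marker
        states, terms = [0], []            # state stack and term (parse tree) stack
        i = 0
        while True:
            action = ACTION.get((states[-1], tokens[i]))
            if action is None:
                raise SyntaxError("unexpected %r in state %d" % (tokens[i], states[-1]))
            kind, arg = action
            if kind == "shift":            # push token, jump to next state, advance input
                terms.append(tokens[i])
                states.append(arg)
                i += 1
            elif kind == "reduce":         # pop the rule's right-hand side, push its left-hand side
                lhs, length = RULES[arg]
                children, terms = terms[-length:], terms[:-length]
                states = states[:-length]
                terms.append((lhs, children))
                states.append(GOTO[(states[-1], lhs)])
            else:                          # accept: the single remaining term is the parse tree
                return terms[0]

    print(parse(["n", "+", "n", "+", "n"]))
    # ('E', [('E', [('E', ['n']), '+', 'n']), '+', 'n'])

Same tokens in, same tree out, every time. A state/token pair with two possible entries in that ACTION table would be exactly the shift/reduce (or reduce/reduce) conflict situation TFA is about.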
An LLM uses embeddings. AFAIK the current state of the art algorithm (method, perhaps?) for this is still transformers with attention: roughly speaking, each token in the input gets projected into an embedding vector, a query vector, and a key vector; attention means that for each token, every other token whose key vector is near that token's query vector gets to alter that token's embedding, allowing context to have an impact. Multiple layers of attention can run one after another, eventually producing a final set of token values. Different embedding models use different means to pick one of those values (eg, pick the last one in the input sequence, if your query/key vectors tend to push information 'forwards' to following tokens), and that's your whole input's embedding. Neural nets can be deterministic, but those used for embedding rarely are: the values produced by repeated runs over the same input will (or should) always be pretty close, but generally won't be exactly the same.
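If you want to see the shape of that, here's a single-head, single-layer attention sketch in NumPy. The weights, dimensions, and pooling choices are invented for illustration; a real embedding model adds multiple heads, many layers, positional information, normalisation, and so on:

    import numpy as np

    rng = np.random.default_rng(0)
    d_model = 8                            # per-token vector size (toy value)

    def attention_layer(x, w_q, w_k, w_v):
        # x: (seq_len, d_model) token embeddings -> contextualised embeddings
        q = x @ w_q                        # query vector per token
        k = x @ w_k                        # key vector per token
        v = x @ w_v                        # value each token offers to the others
        scores = q @ k.T / np.sqrt(q.shape[-1])           # query/key "nearness"
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
        return weights @ v                 # each output mixes in tokens with nearby keys

    tokens = rng.normal(size=(5, d_model))                # 5 already-projected tokens
    w_q, w_k, w_v = (0.1 * rng.normal(size=(d_model, d_model)) for _ in range(3))
    contextualised = attention_layer(tokens, w_q, w_k, w_v)

    last_token_embedding = contextualised[-1]             # "pick the last one"
    mean_pooled_embedding = contextualised.mean(axis=0)   # another common pooling choice

Run twice on the same input, this toy version is bit-for-bit identical; the non-determinism in real embedding models tends to come from things like parallel floating-point reductions and batching on actual hardware, which is why repeated runs are only "pretty close" rather than exact.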