A list of puns related to "Parsec (parser)"
Consider this parser for dot-separated integers:
import Text.Parsec as T.P
import Text.Parsec.Char
import Control.Monad
dottedint :: Parsec String () [Int]
dottedint = map read <$> sepBy1 num (char '.')
  where num = liftM (:[]) (char '0') <|> liftM2 (:) nzdigit (many digit)
nzdigit :: Parsec String () Char
nzdigit = satisfy (`elem` ['1'..'9']) <?> "non-zero digit"
It accepts dotted decimals whose numbers have no redundant leading zeroes.
The parsing works as expected but I'm not satisfied with the parsing errors:
> parseTest (dottedint <* eof) "1.."
parse error at (line 1, column 3):
unexpected "."
expecting "0" or non-zero digit
This isn't wrong, but I'd prefer it to simply say
expecting digit
> parseTest (dottedint <* eof) "1.1a"
parse error at (line 1, column 4):
unexpected 'a'
expecting "." or end of input
This is more serious, as it omits that a "digit" would be acceptable too. In other words, I expected
expecting digit or "." or end of input
Can somebody explain how Parsec computes the set of expected tokens, and how to modify the parser example above to address the two issues?
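Here is a minimal sketch for the first issue only. It assumes Parsec 3 (the `Text.Parsec` modules). The idea is to attach one label with `<?>` to the whole number parser. When `num` fails without consuming input, that label replaces the separate "0" and "non-zero digit" expectations, so the "1.." case should report `expecting digit`. Lifting `num` to the top level and using the `Parser` alias from `Text.Parsec.String` are restructurings made for this sketch, not part of the original code. The sketch does not address the second issue, where "digit" is missing after "1.1".

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A non-zero leading digit, as in the original post.
nzdigit :: Parser Char
nzdigit = satisfy (`elem` ['1' .. '9']) <?> "non-zero digit"

-- One dot-separated component: either a lone "0" or a number without a
-- redundant leading zero. The single label replaces the inner
-- "\"0\" or non-zero digit" expectations when num fails without consuming.
num :: Parser String
num = (((: []) <$> char '0') <|> ((:) <$> nzdigit <*> many digit)) <?> "digit"

-- Same shape as the original dottedint.
dottedint :: Parser [Int]
dottedint = map read <$> sepBy1 num (char '.')
```

In GHCi, `parseTest (dottedint <* eof) "1.."` should now list only `digit` under "expecting". Valid inputs such as "1.20.0" still parse as before.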
I have been working on this library for a while, but I think it is now at a point where others may get some use out of it. Everything should still be considered unstable for now, though, so expect breakage if you end up using it.
The library is exactly what it says on the box: it provides parser combinators, which, if you don't know, are a simple way of writing (LL(1)) parsers by combining simple parsers into more advanced ones. It is based on the Haskell library Parsec, though internally it is quite different. On the user side, this will mostly show up as distinct types for each parser (similar to how iterators work), which should allow for efficient static dispatch at the expense of compilation time.
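To make "combining simple parsers into more advanced ones" concrete, here is a deliberately tiny parser-combinator sketch in Haskell. Everything in it (`Parser`, `satisfy`, `andThen`, `orElse`, `many0`, `number`) is hypothetical and unrelated to this library's actual API. It only illustrates the idea that each combinator takes parsers and returns a new parser.

```haskell
import Data.Char (isDigit)

-- Hypothetical minimal parser type: consume a prefix of the input and
-- return a result plus the remaining input, or Nothing on failure.
newtype Parser a = Parser {runParser :: String -> Maybe (a, String)}

-- Primitive parser: one character satisfying a predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser $ \s -> case s of
  (c : cs) | p c -> Just (c, cs)
  _ -> Nothing

-- Always succeeds without consuming input.
succeed :: a -> Parser a
succeed x = Parser $ \s -> Just (x, s)

-- Sequencing combinator: run p, then feed its result to f.
andThen :: Parser a -> (a -> Parser b) -> Parser b
andThen (Parser p) f = Parser $ \s -> case p s of
  Just (x, rest) -> runParser (f x) rest
  Nothing -> Nothing

-- Choice combinator: try p; if it fails, run q on the same input.
orElse :: Parser a -> Parser a -> Parser a
orElse (Parser p) (Parser q) = Parser $ \s -> case p s of
  Nothing -> q s
  ok -> ok

-- Zero or more repetitions (p must consume input when it succeeds).
many0 :: Parser a -> Parser [a]
many0 p =
  (p `andThen` \x -> many0 p `andThen` \xs -> succeed (x : xs))
    `orElse` succeed []

-- A "more advanced" parser built only from the pieces above.
number :: Parser Int
number =
  satisfy isDigit `andThen` \d ->
    many0 (satisfy isDigit) `andThen` \ds -> succeed (read (d : ds))
```

`runParser number "123abc"` gives `Just (123,"abc")`. Unlike Parsec, `orElse` here always backtracks to the original input. Parsec only tries the second alternative if the first one failed without consuming input, which gives it its predictive, LL(1)-style behaviour unless `try` is used.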
The library has been on GitHub for a while, but I also uploaded it to crates.io so that it would be easy to get if someone wanted to try it out.
Questions or suggestions for improvement are welcome!
npm: https://www.npmjs.com/package/body-parsec (because parsec is reserved)
github: https://github.com/talentlessguy/parsec
I have written a body parser that accepts form / JSON / raw / text data and can easily be used with Express as middleware. It can also be used with the built-in http server. I know that body and co-body already exist, but they have extra dependencies, and I wanted to make something between Express body-parser and co-body: not too primitive, but at the same time not too huge. The package is in development but can be used for simple tasks, such as parsing text forms.
application/json
for parsec.json)
Please send feedback or criticism here.
A week ago I asked a question on Stack Overflow about writing a Parsec parser that uses constant heap space. It didn't receive any answers, but it did receive a comment that made me think I'm misunderstanding something fundamental. I would really appreciate it if anyone had any hints as to where I might be going wrong. Or even just pointers to papers/blog posts with more information.
I've reproduced the question below for ease of viewing.
I wrote the following function:
{-# LANGUAGE BangPatterns, ScopedTypeVariables #-}
import Text.Parsec
manyLength
  :: forall s u m a. ParsecT s u m a -> ParsecT s u m Int
manyLength p = go 0
  where
    go :: Int -> ParsecT s u m Int
    go !i = (p *> go (i + 1)) <|> pure i
This function is similar to many. However, instead of returning [a], it returns the number of times it was able to run the parser p successfully.
This works well, except for one problem. It doesn't run in constant heap space.
Here's an alternative way of writing manyLength that does run in constant heap space:
manyLengthConstantHeap
  :: forall s u m a. ParsecT s u m a -> ParsecT s u m Int
manyLengthConstantHeap p = go 0
  where
    go :: Int -> ParsecT s u m Int
    go !i =
      ((p *> pure True) <|> pure False) >>=
        \success -> if success then go (i + 1) else pure i
This is a significant improvement, but I don't understand why manyLengthConstantHeap uses constant heap space while my original manyLength doesn't.
If you inline (<|>) in manyLength, it looks somewhat like this:
manyLengthInline
  :: forall s u m a. Monad m => ParsecT s u m a -> ParsecT s u m Int
manyLengthInline p = go 0
  where
    go :: Int -> ParsecT s u m Int
    go !i =
      ParsecT $ \s cok cerr eok eerr ->
        let meerr :: ParseError -> m b
            meerr err =
              let neok :: Int -> State s u -> ParseError -> m b
                  neok y s' err' = eok y s' (mergeError err err')
                  neerr :: ParseError -> m b
                  neerr err' = eerr $ mergeError err err'
              in unParser (pure i) s cok cerr neok neerr
...
Since Parsec is quite procedural in how it consumes characters, it is easy to mis-parse input by eating too many or too few characters. In those cases, a function like this that outputs the current state of the input stream is useful:
seeNext :: Int -> ParsecT String u Identity ()
seeNext n = do
  s <- getParserState
  let out = take n (stateInput s)
  println out
Here's a full program that shows usage:
import Text.Parsec
import Text.Parsec.Prim
import Debug.Trace
import Data.Functor.Identity
println msg = trace (show msg) $ return ()
seeNext :: Int -> ParsecT String u Identity ()
seeNext n = do
  s <- getParserState
  let out = take n (stateInput s)
  println out
betweenBraces = char '{' >> manyTill (seeNext 10 >> anyChar) (char '}')
test = parseTest betweenBraces "{12345}"
{-
> test
"12345}"
"2345}"
"345}"
"45}"
"5}"
"12345"
-}
Hi.
In the following gist, I summarized a few issues I have with Parsec, and I'm sure you can help me find a cleaner solution:
https://gist.github.com/guibou/509c3537f3a9e256296b
My issues are two-fold. First, I'm trying to parse lines such as "Hello I can contain *bold* content". This can be summarized as:
data LineContent = Bold String | RawText String
anyItem :: Parser LineContent
anyItem = RawText . (:[]) <$> anyChar
bold :: Parser LineContent
bold = Bold <$> (char '*' *> manyTill anyChar (char '*'))
untilEol :: Parser [LineContent]
untilEol = manyTill (choice [bold, anyItem]) (string "\n")
However, I'm not satisfied, because I get one RawText per character instead of many characters in a single RawText. I wanted to write:
anyItem = RawText <$> (many anyChar)
but this does not work because anyItem can consume a bold sequence.
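One way around the one-RawText-per-character problem is to collect a maximal run of characters that cannot start a bold span, instead of using `many anyChar`. This sketch assumes a raw run simply stops at `*` or at the end of the line. The name `rawText` is hypothetical; `bold` is the post's parser, and `Eq` is derived only so the result can be compared.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data LineContent = Bold String | RawText String
  deriving (Show, Eq)

-- The post's bold parser, unchanged.
bold :: Parser LineContent
bold = Bold <$> (char '*' *> manyTill anyChar (char '*'))

-- Hypothetical replacement for anyItem: one RawText holding the longest run
-- of characters that can neither open a bold span nor end the line.
rawText :: Parser LineContent
rawText = RawText <$> many1 (noneOf "*\n")

untilEol :: Parser [LineContent]
untilEol = manyTill (choice [bold, rawText]) (char '\n')
```

An unmatched `*` will still make `bold` consume input and then fail. For real input, you may need `try bold` together with a fallback that treats a lone `*` as raw text.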
My second issue is that I have the feeling that most parsers must know their surroundings and context to work. The syntax I'm trying to parse is composed of blocks starting with a pattern ("Q. " or "A. ") and followed by many lines. Currently I have something like:
parseBlock = string "Q. " *> many lines
lines = notFollowedBy (choice [string "A. ", string "Q. "])
          *> manyTill anyChar (string "\n")
I'm not satisfied because any new block type will force an update of the lines parser.
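For the second issue, one option is to keep the block markers in a single list. Both the block-start parser and the body-line parser are then derived from that list, so a new block type only touches it. This sketch assumes every line ends in `\n`. The names `blockMarkers`, `blockStart`, `bodyLine`, `block` and `document` are hypothetical; `lines` is renamed partly to avoid clashing with `Prelude.lines`.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical single source of truth for block markers.
blockMarkers :: [String]
blockMarkers = ["Q. ", "A. "]

-- `try` lets a line such as "Quick..." fall through to the next marker.
blockStart :: Parser String
blockStart = choice (map (try . string) blockMarkers)

-- A body line is any newline-terminated line that does not start a block.
bodyLine :: Parser String
bodyLine = notFollowedBy blockStart *> manyTill anyChar (char '\n')

-- A block: its marker, then the rest of its first line and following lines.
block :: Parser (String, [String])
block = (,) <$> blockStart <*> many bodyLine

document :: Parser [(String, [String])]
document = many block <* eof
```

Adding, say, a "C. " block would then only require extending `blockMarkers`.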
Thank you.
Hello,
Does anyone have a link handy to a tutorial on writing efficient Parsec parsers? I have a parser (part of the External Core library, for anyone who cares) that's using a huge amount of memory, and I'm not sure where to start in improving the code. I know the advice to avoid backtracking as much as possible, and the parser certainly does make heavy use of backtracking, but I'm not sure where to begin. I don't expect a code critique here, but any links to general advice would be more than welcome. Googling didn't turn up much, nor did searching the haskell-cafe archives.
I'm using Parsec 2.1.0.1, fwiw.
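One general technique for reducing backtracking is left-factoring. When two alternatives share a prefix, parse the prefix once and then decide, instead of wrapping the first alternative in `try`. The sketch below uses a hypothetical toy grammar (a variable `f` or a call `f(x,...)`) and is written against Parsec 3's `Text.Parsec`. Parsec 2.1 exposes similar combinators under `Text.ParserCombinators.Parsec`, but this exact code assumes Parsec 3 and may need adjusting there.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Expr = Var String | Call String [Expr]
  deriving (Show, Eq)

identifier :: Parser String
identifier = many1 letter

argList :: Parser Expr -> Parser [Expr]
argList e = between (char '(') (char ')') (e `sepBy` char ',')

-- Backtracking version: on a plain variable, the identifier is parsed,
-- `try` rewinds, and the identifier is parsed a second time.
exprBacktrack :: Parser Expr
exprBacktrack = try (Call <$> identifier <*> argList exprBacktrack) <|> (Var <$> identifier)

-- Left-factored version: parse the shared identifier once, then look for
-- an optional argument list. No `try` is needed.
exprFactored :: Parser Expr
exprFactored = do
  name <- identifier
  args <- optionMaybe (argList exprFactored)
  pure (maybe (Var name) (Call name) args)
```

Both versions accept the same inputs. Whether the change measurably reduces memory in a particular parser is something to confirm by profiling; it is a common first step, not a guarantee.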
I was wondering whether it would be better to use a hand-written recursive descent parser or write my own parser combinator library. What are the pros and cons of each?
Thanks!
I have to write a DSL in .NET, and since I know Parsec from Haskell, I decided to look for an alternative in C#/F#. I've tried language-ext.Parsec because it's fairly popular, it isn't abandoned, and I don't know F#. But I didn't enjoy the lack of expressiveness in C# syntax: there are no custom operators, and function calls need parentheses, so even function composition starts to look like Lisp.
Should I try F#? Are there any good parsec-like libraries that weren't abandoned years ago?
Hello everyone, I'm a bit late, but I am still working on AoC 2021.
I am currently at day 16, which involves some parsing of binary data.
I thought that this is easily done with some data types and Parsec, however I am stuck on one thing:
I have an ADT:
data Type = Literal Int | Operator [Packet]
and I am parsing a group of 5 bits with some simple Parsec stuff (anyChar) and a parser with this signature:
groupParser :: ParsecT String u Identity String
The String result, however, is actually a binary number, which I now want to feed into the Literal constructor: pure $ Literal groupParser
This obviously won't work, since it's a type mismatch... Even if you do this:
Literal (binToDec <$> groupParser) (where binToDec :: String -> Int)
it won't work, since it's not an Int...
Is there any way to get the Int out of the ParsecT? I know monads normally don't work like this (they're well defined), but since I know my input is safe, I'd like to just get the Int there...
Even if the input is not safe a Maybe Int would work....
But I seem to miss some trick/idea to translate what I had into mind into working Haskell code...
So: is there any way to parse the input via groupParser directly into an Int, or do I have to rethink the way I implement my data types?
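The usual fix is to apply the conversion inside the parser with `fmap` (`<$>`). That produces a parser of `Type`, so you never need to extract an `Int` from the `ParsecT`. In this minimal sketch, `Packet` is a placeholder because the post doesn't show it. This `groupParser` (five `0`/`1` characters) and `binToDec` are also stand-ins, not the full day 16 literal format.

```haskell
import Data.Char (digitToInt)
import Data.Functor.Identity (Identity)
import Data.List (foldl')
import Text.Parsec

-- Placeholder: the post's Packet type is not shown.
data Packet = Packet deriving (Show, Eq)

data Type = Literal Int | Operator [Packet]
  deriving (Show, Eq)

-- Convert a string of '0'/'1' characters to its integer value.
binToDec :: String -> Int
binToDec = foldl' (\acc c -> acc * 2 + digitToInt c) 0

-- Hypothetical stand-in for the post's groupParser: five binary digits.
groupParser :: ParsecT String u Identity String
groupParser = count 5 (oneOf "01")

-- Compose the conversion and the constructor, then map over the parser.
literalParser :: ParsecT String u Identity Type
literalParser = Literal . binToDec <$> groupParser
```

`Literal . binToDec <$> groupParser` is equivalent to `do { bits <- groupParser; pure (Literal (binToDec bits)) }`. The `Int` only exists inside the parser, and you get the final value by running the whole parser with `parse`.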
Remember when everyone said you needed to hit rank 14 if you wanted to compete in SoM?
https://vanilla.warcraftlogs.com/character/eu/dreadnaught/ytal
Laty is running 21/30/0 spec to get death wish, sweeping strikes, improved slam, and flurry... he's using a two-hander, even on single target fights... and he has exactly zero PvP gear equipped.
The only caveat is that this strategy works much better on Horde. Laty is filling his extra global cooldowns with hamstring, which has a chance to proc windfury in classic. You can see on this top Ragnaros parse that he used hamstring 17 times, which would give him an average of 3.4 extra melee swings over the course of the fight that an Alliance warrior could never get. The increased rage generation from windfury also allows for more slams and whirlwinds than an Alliance warrior gets, although that's harder to quantify.
It remains to be seen whether this strategy continues to be useful on BWL bosses, which have fewer adds than Molten Core bosses. But given his performance on even some of the pure single-target fights in MC (Shazzrah, Baron Geddon), I think two-hand will continue to be a viable option for warriors.
I have this simple type of Boolean expressions:
data BoolExp a =
    BSelect a
  | BNot (BoolExp a)
  | BAnd [BoolExp a]
  | BOr [BoolExp a]
instance Show a => Show (BoolExp a) where
  show (BSelect a) = show a
  show (BNot a) = "! " ++ pwrap (show a)
  show (BAnd as) = intercalate " & " ((pwrap . show) <$> as)
  show (BOr as) = intercalate " | " ((pwrap . show) <$> as)
pwrap p = "(" ++ p ++ ")"
My goal is to be able to parse them. I have written this code:
import Text.Parsec
import Text.Parsec.String
import qualified Text.Parsec.Token as P
import Text.Parsec.Language (emptyDef)
import qualified Data.ByteString as B
import Data.List (intercalate)
import Regex.Types
lexer = P.makeTokenParser emptyDef
parens = P.parens lexer
braces = P.braces lexer
symbol = P.symbol lexer
natural = P.natural lexer
pSelect :: Integral a => Parser (BoolExp a)
pSelect = BSelect <$> fromIntegral <$> natural
pAnd :: Parser (BoolExp a) -> Parser (BoolExp a)
pAnd prsr = BAnd <$> prsr `sepBy1` (symbol "&")
pOr :: Parser (BoolExp a) -> Parser (BoolExp a)
pOr prsr = BOr <$> prsr `sepBy1` (symbol "|")
pNot :: Parser (BoolExp a) -> Parser (BoolExp a)
pNot prsr = BNot <$> ((symbol "!") *> prsr)
pBool :: Integral a => Parser (BoolExp a)
pBool = try (pNot pBool)
    <|> try (pOr pBool)
    <|> try (pAnd pBool)
    <|> try (parens pBool)
    <|> pSelect
I think the grammar I am trying to go for is self-explanatory based on the code. However, this goes into a loop for most expressions, even parse pBool "" "! 12". How can I fix this?
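The loop comes from left recursion. `pOr pBool` starts by calling `pBool` again at the same position, and `try` cannot help because the recursive call never gets to consume input or fail. A common fix is to stratify the grammar by precedence, so that each level first calls the next, tighter level. This sketch assumes `!` binds tighter than `&`, and `&` tighter than `|`. `collapse` is a hypothetical helper. The post's custom `Show` instance is replaced by a derived one so the example stands alone.

```haskell
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P

data BoolExp a
  = BSelect a
  | BNot (BoolExp a)
  | BAnd [BoolExp a]
  | BOr [BoolExp a]
  deriving (Show, Eq)

lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef

parens :: Parser b -> Parser b
parens = P.parens lexer

symbol :: String -> Parser String
symbol = P.symbol lexer

natural :: Parser Integer
natural = P.natural lexer

-- Hypothetical helper: don't wrap a single operand in BAnd/BOr.
collapse :: ([BoolExp a] -> BoolExp a) -> [BoolExp a] -> BoolExp a
collapse _ [x] = x
collapse f xs = f xs

pBool :: Integral a => Parser (BoolExp a)
pBool = pOr

-- Loosest level: one or more conjunctions separated by "|".
pOr :: Integral a => Parser (BoolExp a)
pOr = collapse BOr <$> pAnd `sepBy1` symbol "|"

-- Middle level: one or more negations/atoms separated by "&".
pAnd :: Integral a => Parser (BoolExp a)
pAnd = collapse BAnd <$> pNot `sepBy1` symbol "&"

-- Tightest level: "!" consumes input before recursing, so no left recursion.
pNot :: Integral a => Parser (BoolExp a)
pNot = (BNot <$> (symbol "!" *> pNot)) <|> pAtom

-- Parentheses restart at the loosest level.
pAtom :: Integral a => Parser (BoolExp a)
pAtom = parens pBool <|> (BSelect . fromIntegral <$> natural)
```

Every recursive call now happens only after some input (`!`, `(`, `&` or `|`) has been consumed, which is what rules out the infinite loop. `Text.Parsec.Expr.buildExpressionParser` is another way to get the same layering from an operator table.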
The org-parser is an amazing library written by 200ok-ch. I want to share my joy with you about it.
Certainly, this is not the first org parser out there. What's different, however, is that it's a programmatic parser with a formal specification. Lost? Let me explain.
TL;DR: it takes a formal grammar of org-mode written in EBNF and generates a parser from it!
Let's quickly go through what EBNF is. Roughly, it is a notation for specifying grammars: a grammar of grammars! For example, the following is an EBNF grammar for simple math expressions (taken from here).
(setq simple-math-grammar
"
<S> = VAL | EXPR | PAR
PAR = <'('> S <')'>
<EXPR> = S OP S
VAL = #'[0-9]+'
OP = '+' | '-' | '*' | '/'
")
With this grammar specification, it's an interesting problem to write a parser-generator parser that turns simple-math-grammar into a programmatic parser:
(funcall (parser simple-math-grammar) "((2*3)+1+2)/4")
=> ([:PAR [:PAR [:VAL "2"] [:OP "*"] [:VAL "3"]]
[:OP "+"] [:VAL "1"] [:OP "+"] [:VAL "2"]]
[:OP "/"] [:VAL "4"])
How magical, right? (Check this blog post to see how it lets you transform the parsed data easily afterward.)
It's great for several reasons. First, it is a specification of the grammar of grammars: it provides a theoretical framework for studying grammars.
Practically, what's more important perhaps is that it provides a formal way to specify a language! This is far better than having a loose description of a language by examples. It makes the language rigorous, better, cleaner, portable, and easier to extend!
The official specification of org-mode does a good job, but it is neither machine-readable nor formal. So there are only tests (but no proofs) that the official parser (org-element.el) really does what the spec says. It would be nice if the official org-mode could have such a formal specification. (For other benefits, see this.)
Enter org-parser! It is exactly such a thing, already implemented! Remember the magical parser I mentioned above? It is already implemented here:
[Engelberg/instaparse](https://githu
Hello, I'm new to C#. I'm trying to parse some XML data from an API, but there seem to be a lot of different XML parsers out there: XmlDocument, XDocument, XmlSerializer, XmlReader, etc.
Is there a standard one everyone uses? Or which one is newer, performs better, or has nicer syntax?
Would love some advice/suggestions. Thanks!
I want to implement a functional language, so I recently decided to move from Scala-style function application (f(x, y)) to Haskell-style application (f x y).
So I've been trying to think of how to implement it in a way that respects operator precedence but I've come up blank. I tried Google but I haven't got any useful results (Maybe it's because I'm using the wrong search terms). So I decided to ask here.
After thinking, I have distilled the problem down to this: Function application is an infix operation. To put it in the precedence table, I need an operator token. The application operator is just the whitespace between the function and the argument. But, I throw away all the unnecessary whitespace tokens in the lexer so I can't use them in the parser.
By the way, my parser is based on the [blog post](http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/) by u/munificent.
Thanks in advance!
E: Linked to the blog post.
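A common way to handle this in a Pratt parser, without keeping whitespace tokens, is to treat juxtaposition itself as the application operator. In the infix loop, if the next token could start an atom (an identifier, a literal, or `(`) and application binds tighter than the current minimum precedence, parse one atom as an argument. The sketch below works on an already-lexed token list. The `Token`/`Expr` types, the precedence numbers, and the function names are all hypothetical.

```haskell
data Token = TIdent String | TNum Int | TOp Char | TLParen | TRParen
  deriving (Show, Eq)

data Expr = Var String | Num Int | App Expr Expr | BinOp Char Expr Expr
  deriving (Show, Eq)

-- Hypothetical precedence for juxtaposition: tighter than any infix operator.
appPrec :: Int
appPrec = 100

infixPrec :: Char -> Maybe Int
infixPrec c
  | c `elem` "+-" = Just 10
  | c `elem` "*/" = Just 20
  | otherwise = Nothing

-- Tokens that can begin an atom, and therefore an argument.
startsAtom :: Token -> Bool
startsAtom (TIdent _) = True
startsAtom (TNum _) = True
startsAtom TLParen = True
startsAtom _ = False

-- Parse an expression whose operators all bind tighter than minPrec.
parseExpr :: Int -> [Token] -> Maybe (Expr, [Token])
parseExpr minPrec ts = do
  (lhs, rest) <- parseAtom ts
  loop lhs rest
  where
    loop lhs (t : rest)
      -- No operator token: an atom in infix position means "apply".
      | startsAtom t, appPrec > minPrec = do
          (arg, rest') <- parseAtom (t : rest)
          loop (App lhs arg) rest'
      -- Ordinary left-associative binary operator.
      | TOp c <- t, Just p <- infixPrec c, p > minPrec = do
          (rhs, rest') <- parseExpr p rest
          loop (BinOp c lhs rhs) rest'
    loop lhs rest = Just (lhs, rest)

parseAtom :: [Token] -> Maybe (Expr, [Token])
parseAtom (TIdent x : rest) = Just (Var x, rest)
parseAtom (TNum n : rest) = Just (Num n, rest)
parseAtom (TLParen : rest) = do
  (e, rest') <- parseExpr 0 rest
  case rest' of
    TRParen : rest'' -> Just (e, rest'')
    _ -> Nothing
parseAtom _ = Nothing
```

Parsing the argument with `parseAtom` rather than `parseExpr` makes `f x y` left-associative, `App (App f x) y`, and makes `g 2 * 3` mean `(g 2) * 3`. Prefix operators need care: with a unary minus, `f -1` becomes ambiguous, which is why Haskell parses it as subtraction.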
There are many parser combinator libraries in Rust, but I've yet to find a good comprehensive comparison. It would be nice to read about things like ease of use, documentation, performance, etc.
My take:
The most popular seems to be nom, and there are many blogs using nom, which is nice. But for some reason I find combine much more readable while trying to learn all this. This is not a deal breaker, but nom returning (rest, match) instead of (match, rest) is very unintuitive and slightly hurts my eyes; combine seems to solve this. Also, combine seems to be inspired by a Haskell parsing library called Parsec. I don't know Parsec well enough to tell whether following its style matters.
There are many other libraries, but I don't think I have the energy to explore all of them. If someone with more experience can "parse" through all this, it would be of great help.
I want to know about a simple method to parse JSON.
I have read jsmn (https://github.com/zserge/jsmn).
Is there some way to build on some middle-layer library?
It still needs to be nerfed, with a slower rotation speed and hitscan changed to projectile-based.