A list of puns related to "Binary Encoding"
Problem: I have a list of MD5s that came out in a rather oddly delimited CSV file. Because there are nulls, most tools think it is binary.
#> head -n 2 md5s.txt
þfilemd5þ
þ00d75d3634eaeba5a29e5362e549d645þ
#> hexdump -c -
0000000 \0 376 \0 f \0 i \0 l \0 e \0 m \0 d \0 5
0000010 \0 376 \0 \n \0 376 \0 0 \0 0 \0 d \0 7 \0 5
0000020
I copied my terminal window for the output of the head command, but those characters are actually the þ character. \376 (octal for 0xFE) is the "Latin lower-case thorn" character, þ, and it is in place of the double-quote character (0x22, octal 042) because the other columns in the original CSV can have embedded commas, colons, semicolons, single and double quotes... and even embedded newlines in them as valid (which sucks). Those nulls make it look binary to most things.
Attempted solutions: cat, file, iconv, vim, awk, recode... Notepad++ on Windows works, but I need to script it on the Linux box (CentOS 7).
Additional detail:
The CSV uses the DC4 (0x14, octal 024) character as the delimiter. I won't try to reproduce that below, but the layout of the data is:
col1,col2,filemd5, .... col14,col15\n
To get the list of MD5s, which is >700,000 lines long, I just do
awk -F'\024' 'NR!=1' weirdcsv.csv | awk -F'\024' '{print $3}'
That pipes an awk command, whose only job is to skip the first line, to another awk command, whose only job is to print the 3rd column. Awk is actually pretty great at leaving the original weird character encoding (which everything thinks is binary) alone, which is helpful if the program that produced the CSV in the first place needs to import a list back.
Edit: the locale was POSIX for some reason, and iconv -f UTF-16BE weirdfile.csv
worked (with errors), but
tr -d '\000' | tr '\376' '\042'
worked a treat even when the locale was wrong. I can't believe I forgot about tr! Thanks to all. The steps to diagnose the locale/terminal nonsense are gold too; I had always taken it for granted that everything is some kind of UTF-8 these days. This is year 36 for me bashing away (not daily for many years, but often). I love that I can still learn new and useful things. Also, we need a new standard that is a meta-set of all past, present, and possible future characters, encoded at 64 bits per glyph (/s).
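For anyone who needs to script the same cleanup without tr, here is a minimal Python sketch under the same assumptions (UTF-16BE file, DC4 delimiter, thorn as the quote character; it naively splits on newlines, so it ignores the embedded-newline-in-a-field case mentioned above):

    # Decode the UTF-16BE file, skip the header, take column 3,
    # and strip the thorn (U+00FE) "quote" characters.
    with open("weirdfile.csv", "rb") as f:
        text = f.read().decode("utf-16-be")

    for lineno, line in enumerate(text.splitlines()):
        if lineno == 0:
            continue  # skip the header row, like NR!=1 in awk
        fields = line.split("\x14")  # DC4 delimiter
        print(fields[2].strip("\u00fe"))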
Now I have to go look at xxd again to remember why I stopped using it / hate it for some reason.
Let's say I want to encode information onto a necklace with black and white beads. There's no end to a necklace; it's cyclic. However, the beads have an irregular shape (let's say arrows), so you can tell whether you're reading clockwise or counterclockwise.
If we know the length of the binary sequence, is there a good way to make a necklace from it? Is there a way to also keep the original message inside the necklace, so I could just cut it out?
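One possible direction (a sketch, not the only answer): since the arrow-shaped beads fix the reading direction, rotation is the only ambiguity, so you can either canonicalize the rotation or spend a few beads on a start marker. The sketch below uses an HDLC-style flag with bit stuffing, so the marker pattern provably never occurs anywhere else in the cycle and the cut point is recoverable:

    FLAG = "01111110"  # HDLC-style flag; stuffing keeps it unique in the cycle

    def to_necklace(msg: str) -> str:
        """Prefix a flag, bit-stuffing the payload (insert 0 after five 1s)."""
        out, run = [], 0
        for b in msg:
            out.append(b)
            run = run + 1 if b == "1" else 0
            if run == 5:
                out.append("0")
                run = 0
        return FLAG + "".join(out)

    def from_necklace(neck: str) -> str:
        """Rotate the unique flag to the front, strip it, undo the stuffing."""
        i = (neck + neck).index(FLAG) % len(neck)
        body = (neck[i:] + neck[:i])[len(FLAG):]
        out, run, skip = [], 0, False
        for b in body:
            if skip:               # this 0 was inserted by the stuffer: drop it
                skip, run = False, 0
                continue
            out.append(b)
            run = run + 1 if b == "1" else 0
            if run == 5:
                skip = True
        return "".join(out)

    msg = "0111110101"
    neck = to_necklace(msg)
    assert from_necklace(neck[3:] + neck[:3]) == msg  # any rotation decodes

The cost is the 8-bit flag plus one stuffed bit per five consecutive 1s; a canonical-rotation scheme (e.g. the lexicographically smallest rotation, computable with Booth's algorithm) avoids that overhead, but only if you don't need to recover which rotation was the original message.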
The Simple Binary Encoding (SBE) project now includes support for generating Rust code. The generated code does not use unsafe and has no dependencies on any other crates.
SBE is an OSI layer 6 presentation for encoding and decoding binary application messages for low-latency financial applications.
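For readers unfamiliar with SBE: the generated codecs read and write fields at fixed offsets in a buffer instead of parsing text. This is not SBE's actual generated API, just a rough illustration of the fixed-layout idea using Python's struct module, with a made-up message:

    import struct

    # Hypothetical fixed layout: u64 price, u32 quantity, u16 symbol id,
    # little-endian and packed. Fields live at known offsets; no parsing.
    LAYOUT = struct.Struct("<QIH")

    def encode(price: int, qty: int, symbol: int) -> bytes:
        return LAYOUT.pack(price, qty, symbol)

    def decode(buf: bytes):
        return LAYOUT.unpack_from(buf, 0)

    buf = encode(1_234_500, 100, 42)
    assert decode(buf) == (1_234_500, 100, 42)  # 14 bytes total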
So first, a bit of background to clarify what I mean (the graph transformation notation below is the same as that used by Wolfram).
Graphs can be seen as sets of relations, so {{1, 2}, {1, 3}, {3, 2}} is a graph with nodes 1, 2, and 3, and edges 1->2, 1->3, and 3->2. Note that relations can be hyperedges like {1, 2, 3}, meaning an edge 1->2->3.
Graph transformation rules can be defined so that they have a pattern of relations they match and a pattern that they produce: with that example graph, applying the rule {{a, b}, {a, c}} -> {{b, a}, {a, d}, {c, d}} would match so that the variables bind to a=1, b=2, c=3; the free variable d would be a new node with the next free ID, 4; and it would produce the new graph {{2, 1}, {1, 4}, {3, 4}, {3, 2}}.
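To make that semantics concrete, here is a naive Python sketch of rule application (it assumes, as in the example, that the first matching combination of edges is rewritten and that free right-hand variables get fresh IDs; a real implementation would need a smarter matcher):

    from itertools import permutations

    def apply_rule(graph, lhs, rhs):
        """Rewrite the first match of lhs in graph with rhs (naive search)."""
        next_id = max(n for edge in graph for n in edge) + 1
        for edges in permutations(graph, len(lhs)):
            bind = {}
            ok = all(len(pat) == len(edge)
                     and all(bind.setdefault(v, n) == n
                             for v, n in zip(pat, edge))
                     for pat, edge in zip(lhs, edges))
            if not ok:
                continue
            for pat in rhs:                 # fresh IDs for free variables
                for v in pat:
                    if v not in bind:
                        bind[v] = next_id
                        next_id += 1
            kept = [e for e in graph if e not in edges]
            return kept + [tuple(bind[v] for v in pat) for pat in rhs]
        return graph                        # no match: graph is unchanged

    g = [(1, 2), (1, 3), (3, 2)]
    print(apply_rule(g, [("a", "b"), ("a", "c")],
                        [("b", "a"), ("a", "d"), ("c", "d")]))
    # -> [(3, 2), (2, 1), (1, 4), (3, 4)]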
Rules have an arity, and since hyperedges (edges with arbitrary numbers of nodes) are allowed, this arity can be mixed: e.g. {{a, b}, {c, d, a}} -> {{a, d, b}, {d, c, a}} has an arity of 1₂1₃ -> 2₃ (one binary edge and one ternary edge in, two ternary edges out).
To guarantee that the produced graphs are always at least weakly connected, the following invariant must hold for all rules: the edges of the match pattern must share variables, so e.g.
{{a, b}, {c, d}} -> …
is not allowed.

Now, my question is this: how could I come up with a binary representation for rules that works with an arbitrary but predefined arity (i.e. the arity depends on configuration parameters and won't change during the decoding), so that any binary string of some maximum length always encodes a correct rule that preserves those invariants? This would naturally mean that multiple strings could decode to the same rule, i.e. it'd be a surjective mapping. What I don't want is to have to reject encoded strings that don't produce correct rules; all strings must result in a valid transformation.
Not looking to be handed an answer; any ideas or pointers are welcome.
Edit: just as a side note, this is related to some neuroevolution ideas I'm thinking about, like my earlier CA question. I'm trying to come up with compact encodings that can produce structured and potentially complex neural network topologies. (I'll have to make them resilient to small mutations too, but that can be a problem "one level up", so to speak, in the genome representation of the genetic programming framework.)
I know the title is a bit weird, but how many bits are needed before Huffman encoding really saves space? E.g. if you compress a file which is pretty small, or one that can't be compressed much, wouldn't storing the binary tree take more bits than the encoding saves?
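This can be estimated directly. The sketch below builds code lengths with the usual heap construction and charges a toy header of two bytes per distinct symbol (one byte for the symbol, one for its code length; real formats like DEFLATE store this more compactly), then compares against the raw size:

    import heapq
    from collections import Counter

    def code_lengths(data: bytes) -> dict[int, int]:
        """Huffman code length per byte value (symbol depths in the tree)."""
        freq = Counter(data)
        if not freq:
            return {}
        if len(freq) == 1:
            return {next(iter(freq)): 1}
        # Heap entries: (weight, tiebreak, {symbol: depth so far}).
        heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
        heapq.heapify(heap)
        nxt = len(heap)
        while len(heap) > 1:
            w1, _, d1 = heapq.heappop(heap)
            w2, _, d2 = heapq.heappop(heap)
            merged = {s: d + 1 for s, d in (*d1.items(), *d2.items())}
            heapq.heappush(heap, (w1 + w2, nxt, merged))
            nxt += 1
        return heap[0][2]

    def huffman_bits(data: bytes) -> int:
        lengths = code_lengths(data)
        payload = sum(lengths[b] for b in data)
        header = 16 * len(lengths)  # toy header: 2 bytes per distinct symbol
        return payload + header

    for data in (b"abacabad", b"abacabad" * 100):
        print(len(data) * 8, "raw bits vs", huffman_bits(data), "encoded")
    # 64 raw bits vs 78 encoded    <- small input: the tree costs more
    # 6400 raw bits vs 1464 encoded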
I am reading "Designing Data-Intensive Applications". It mentions that you can extend JSON encoding with binary encoding (using libraries like MessagePack).
I am getting a little confused, because everything gets encoded down to binary to be sent across the network anyway, right?
Is the only difference between regular JSON and JSON with binary encoding (e.g. MessagePack) that the JSON characters like { and : are encoded using a bit or something, rather than using their UTF-8 values (which are potentially multiple bytes per character)?
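For a concrete byte-level comparison, here is {"a": 123} hand-encoded from the published MessagePack format spec, just for illustration (real code would use the msgpack library):

    import json

    doc = {"a": 123}

    json_bytes = json.dumps(doc, separators=(",", ":")).encode()

    # MessagePack framing by hand:
    #   0x81 = fixmap with 1 entry, 0xA1 = fixstr of length 1,
    #   0x7B = 123 as a single-byte positive fixint.
    msgpack_bytes = bytes([0x81, 0xA1]) + b"a" + bytes([0x7B])

    print(len(json_bytes), json_bytes)        # 9 b'{"a":123}'
    print(len(msgpack_bytes), msgpack_bytes)  # 4 b'\x81\xa1a{'

So the braces, quotes, and colons disappear entirely (replaced by type/length prefixes), and the number is stored as a machine integer instead of ASCII digits, which is where most of the savings come from.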
Hi everyone, I'm not really sure if this is the right place for me to be (sorry if it isn't), but I'm having a lot of trouble trying to figure out binary encoding. I honestly think it's probably pretty easy, but I'm really struggling to figure it out. Any help would be appreciated: I need to encode the number 13 using 4 bits, 147 using 8 bits, and 62 using 6 bits. Thanks guys, I'm a chemistry major and this definitely is not something I understand very well. Any resources would be helpful!
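One way to check answers like these: Python's format() zero-pads to a width, so each conversion (found by summing powers of two) can be verified like this:

    print(format(13, "04b"))   # 1101      13 = 8 + 4 + 1
    print(format(147, "08b"))  # 10010011  147 = 128 + 16 + 2 + 1
    print(format(62, "06b"))   # 111110    62 = 32 + 16 + 8 + 4 + 2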
I missed a question about the binary encoding of a word from a tree on my DM1 PA but I can't find the lesson that covers this. Where could I find the info on this?
I've just created a crate for binary encoding/decoding of Rust types, with support for directly writing values to any type implementing AsyncWrite and reading values from any type implementing AsyncRead. All of this with async/await syntax!
Can I get binary data from a websocket in JS, send it through a port, and decode it in Elm?
Hi everyone,
I know that machines only understand numerical data, so the one-hot encoding method is used. This question may sound odd, but I wondered: is it necessary to use the one-hot encoding method for a CSV column containing binary values? What is the benefit of doing so, if any?
Thanks.
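A small pandas sketch of the case in question (the column name here is made up): one-hot encoding a two-valued column just yields two complementary 0/1 columns, so a single 0/1 column carries exactly the same information:

    import pandas as pd

    df = pd.DataFrame({"smoker": ["yes", "no", "yes"]})

    onehot = pd.get_dummies(df["smoker"], dtype=int)  # two columns: no, yes
    single = (df["smoker"] == "yes").astype(int)      # one 0/1 column

    print(onehot.values.tolist())   # [[0, 1], [1, 0], [0, 1]]
    print(single.tolist())          # [1, 0, 1]  -- same information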
stringsext is a Unicode enhancement of the GNU strings tool with additional functionality: stringsext recognizes Cyrillic, Arabic, CJKV characters and other scripts in all supported multi-byte encodings, while GNU strings fails to find any of these scripts in UTF-16 and many other encodings.
Binaries for Windows, Linux, iOS (new): Releases · getreu/stringsext
Source: getreu/stringsext: Find multi-byte-encoded strings in binary data.
Hi guys, I just joined Kaggle and I'm working on the Titanic competition. In the dataset they provide, there is a "Sex" feature that's either "male" or "female." I'm trying to figure out how I should encode this feature.
Conventional one-hot encoding would have me set "female" to 1 and "male" to 0 or the other way around. But I was wondering if this might negatively affect how GD and SGD function later on (I haven't really chosen a model yet, would appreciate suggestions). What I mean is that if "male" is encoded to 0, then the "Sex" partial derivative for a single "male" observation would always be 0. This is probably fine for GD because it calculates the gradient with the whole dataset. But for SGD, if the algorithm comes to a "male" observation, then the weight for the "Sex" feature won't get updated because the partial derivative would be 0.
Would appreciate some insight, thanks!
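The zero-gradient claim is easy to check numerically: for a logistic model the per-example gradient for weight j is (ŷ − y)·x_j, so x_j = 0 zeroes that component. A sketch with made-up numbers for two features, [Sex, Fare]:

    import numpy as np

    w, b = np.array([0.5, -0.02]), 0.1     # weights for [Sex, Fare], bias
    x, y = np.array([0.0, 7.25]), 1.0      # one "male" passenger, Sex = 0

    p = 1 / (1 + np.exp(-(w @ x + b)))     # logistic prediction
    grad_w = (p - y) * x                   # per-example weight gradient

    print(grad_w)  # first entry is exactly 0: no update to the Sex weight

Note that the bias term still updates on such examples, and over a shuffled epoch SGD sees the female examples too, so the Sex weight is still learned; encoding male as 0 mainly means the "male" effect gets absorbed into the bias.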
I've got a text from my friend with a binary code. I thought it was simple ASCII encoding, but when I tried to decode it, all I got was random characters.
00111100011000110101001001001010
I also received another code of the same length: 32 bits. It also decodes to just random characters as ASCII.
01001111010111100101101101001010
She said it is not just random numbers and it is decodable. I think it is some other type of encoding; maybe it is a legacy system. I don't really know a lot about this stuff. Also, we speak Hungarian, so there might be some accented letters in the text (á, é, ó, ű, ...).
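Not a solution, but here is how one might start poking at 32-bit strings like these in Python: look at the raw bytes, the reversed bit order, and plain integer readings before assuming an exotic encoding (all of these interpretations are just guesses):

    import struct

    codes = ["00111100011000110101001001001010",
             "01001111010111100101101101001010"]

    for bits in codes:
        data = int(bits, 2).to_bytes(4, "big")
        print("bytes:        ", data)
        print("bits reversed:", int(bits[::-1], 2).to_bytes(4, "big"))
        print("uint32 BE/LE: ", struct.unpack(">I", data)[0],
              struct.unpack("<I", data)[0])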
We're using the following codec IC as part of a project at work - http://www.ti.com/lit/ds/symlink/tlv320aic3107.pdf
Simply enough, I'm trying to find out what binary encoding scheme the thing uses - i.e., signed magnitude, ones complement, twos complement, or offset-N. I can't seem to find anything indicating this in the datasheet. It's geared for audio purposes, so maybe there's some convention to be assumed?
I don't have direct access to the hardware layer so unfortunately I can't just check it directly.
Any insight here would be most appreciated.
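While digging through the datasheet, it may help to see how one and the same byte reads under each of those conventions; a quick sketch (for what it's worth, PCM audio converters are conventionally twos complement, but the datasheet is the authority):

    def decode_all(raw: int, bits: int = 8) -> dict[str, int]:
        """Interpret one bit pattern under the candidate signed conventions."""
        sign = raw >> (bits - 1)
        mag = raw & ((1 << (bits - 1)) - 1)
        return {
            "sign-magnitude":  -mag if sign else mag,
            "ones-complement": raw - ((1 << bits) - 1) if sign else raw,
            "twos-complement": raw - (1 << bits) if sign else raw,
            "offset-binary":   raw - (1 << (bits - 1)),
        }

    print(decode_all(0b10000001))
    # {'sign-magnitude': -1, 'ones-complement': -126,
    #  'twos-complement': -127, 'offset-binary': 1}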