[REQ] How do you force the encoding of a text file that looks like binary to most things?

Problem: I have a list of MD5s that came out of a rather oddly delimited CSV file. Because there are nulls, most tools think it is binary.

#> head -n 2 md5s.txt
▒filemd5▒
▒00d75d3634eaeba5a29e5362e549d645▒

#> hexdump -c -
0000000  \0 376  \0   f  \0   i  \0   l  \0   e  \0   m  \0   d  \0   5
0000010  \0 376  \0  \n  \0 376  \0   0  \0   0  \0   d  \0   7  \0   5
0000020

I copied the output of the head command from my terminal window, but those characters are actually the þ character. Octal 376 (0xFE) is the "Latin small letter thorn" character, þ, and it stands in for the double-quote character (octal 042, 0x22), because the other columns in the original CSV can contain embedded commas, colons, semicolons, single and double quotes... and even embedded newlines as valid data (which sucks). Those nulls make it look binary to most things.

Attempted solutions: cat, file, iconv, vim, awk, recode... Notepad++ on Windows works, but I need to script it on the Linux box (CentOS 7).

Additional detail:

The CSV uses the DC4 character (octal 024, 0x14) as the delimiter. I won't try to reproduce that below, but the layout of the data is:

col1,col2,filemd5, .... col14,col15\n

To get the list of MD5s, which is >700,000 lines long, I just do

awk -F'\024' 'NR!=1' weirdcsv.csv | awk -F'\024' '{print $3}'

That pipes an awk command, whose only job is to skip the first line, to another awk command whose only job is to print the 3rd column. Awk is actually pretty great at leaving the original weird character encoding (which everything thinks is binary) alone, which is helpful for the program that produced the csv in the first place if it needs to import a list.

Edit: The locale was POSIX for some reason, and iconv -f utf16be weirdfile.csv worked with errors, but tr -d '\000' | tr '\376' '\042' worked a treat even when the locale was wrong. I can't believe I forgot about tr! Thanks to all. The steps to diagnose the locale/terminal nonsense are gold too; I have always taken it for granted that it is some kind of UTF-8 these days. This is year 36 for me bashing away (not daily for many years, but often). I love that I can still learn new and useful things. Also, we need a new standard that is a meta set of all past, present, and possible future characters encoded in 64 bits per glyph (/s)
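For anyone who would rather do the same cleanup in Python, here is a minimal sketch. It assumes the file really is UTF-16BE, as the hexdump suggests; the sample bytes and the function names are made up for illustration and are not part of the original post.

```python
# Sample reconstructed from the hexdump above: thorn-quoted header, UTF-16BE.
sample = "\u00fefilemd5\u00fe\n".encode("utf-16-be")


def clean_bytes(raw: bytes) -> bytes:
    r"""Byte-level equivalent of: tr -d '\000' | tr '\376' '\042'."""
    return raw.replace(b"\x00", b"").replace(b"\xfe", b'"')


def clean_text(raw: bytes) -> str:
    """Decode as UTF-16BE (an assumption), then swap thorn for a double quote."""
    return raw.decode("utf-16-be").replace("\u00fe", '"')


print(clean_bytes(sample))
print(clean_text(sample))
```

The byte-level version mirrors the tr pipeline and only works while every character fits in one byte; the decoding version is the safer choice if the other columns can hold non-Latin-1 text.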

Now I have to go look at xxd again to remember why I stopped using it/or hate it for some re

...

👍 5
👤 u/zyzzogeton
📅 Nov 12 2021
Encoding binary sequence into a binary necklace

Let's say I want to encode information onto a necklace with black and white beads. There's no end to a necklace; it's cyclic. However, the beads have an irregular shape (let's say arrows), so you can tell whether it's clockwise or counterclockwise.

If we know the length of a binary sequence, is there a good way to make a necklace from it? Is there a way to also keep the original message inside this necklace, so I could just cut it out?
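To make the difficulty concrete, here is a small sketch of why the starting point is lost: every rotation of a bead string describes the same necklace. Picking the lexicographically smallest rotation as a canonical form is one common convention, used here as an assumption; the function names are hypothetical.

```python
def canonical_rotation(beads: str) -> str:
    """Return the lexicographically smallest rotation of a cyclic bead string."""
    if not beads:
        return beads
    return min(beads[i:] + beads[:i] for i in range(len(beads)))


def same_necklace(a: str, b: str) -> bool:
    """Two bead strings are the same necklace if one is a rotation of the other."""
    return len(a) == len(b) and canonical_rotation(a) == canonical_rotation(b)


print(canonical_rotation("0110"))
print(same_necklace("0110", "1100"))
print(same_necklace("0110", "0101"))
```

Because "0110" and "1100" collapse to the same necklace, any scheme for cutting the original message back out has to add something that marks where it begins.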

👍 15
👤 u/Zeta0114942
📅 Oct 05 2021
Simple Binary Encoding (SBE) now supports Rust

The Simple Binary Encoding (SBE) project now includes support for generating Rust code. The generated code does not use unsafe and has no dependencies on any other crates.

SBE is an OSI layer 6 presentation for encoding and decoding binary application messages for low-latency financial applications.

👍 27
👤 u/m2ward
📅 Sep 22 2021
Asteria: Deep Learning-based AST-Encoding for Cross-platform Binary Code Similarity Detection arxiv.org/abs/2108.06082
👍 30
👤 u/joxeankoret
📅 Aug 31 2021
Invariant-preserving binary encoding of graph transformation rules

So first a bit of background to clarify what I mean (the graph transformation notation below is the same as used by Wolfram.)

Graphs can be seen as sets of relations, so {{1, 2}, {1, 3}, {3, 2}} is a graph with nodes 1, 2 and 3, and edges 1->2, 1->3 and 3->2. Note that relations can be hyperedges like {1, 2, 3}, meaning an edge 1->2->3.

Graph transformation rules can be defined so that they have a pattern of relations they match, and a pattern that they produce: with that example graph, applying the rule {{a, b}, {a, c}} -> {{b, a}, {a, d}, {c, d}} would match so that the variables would bind to a=1, b=2, c=3, and the free variable d would be a new node with the next free ID 4, and it would produce the new graph {{2, 1}, {1, 4}, {3, 4}, {3, 2}}.
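The rule application described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the graph is non-empty, the first match found is used, distinct variables are allowed to bind to the same node, and the produced edges are listed before the untouched ones. The names `try_bind` and `apply_rule` are hypothetical.

```python
from itertools import permutations


def try_bind(graph, lhs, picked):
    """Bind pattern variables to nodes for one choice of edges, or return None."""
    binding = {}
    for pattern, index in zip(lhs, picked):
        edge = graph[index]
        if len(edge) != len(pattern):
            return None
        for var, node in zip(pattern, edge):
            # setdefault binds a new variable, or returns the earlier binding.
            if binding.setdefault(var, node) != node:
                return None
    return binding


def apply_rule(graph, lhs, rhs):
    """Apply lhs -> rhs at the first match found; return the new graph or None."""
    next_id = max(node for edge in graph for node in edge) + 1
    for picked in permutations(range(len(graph)), len(lhs)):
        binding = try_bind(graph, lhs, picked)
        if binding is None:
            continue
        # Free variables on the right-hand side become fresh nodes.
        for pattern in rhs:
            for var in pattern:
                if var not in binding:
                    binding[var] = next_id
                    next_id += 1
        produced = [tuple(binding[var] for var in pattern) for pattern in rhs]
        untouched = [edge for i, edge in enumerate(graph) if i not in picked]
        return produced + untouched
    return None


graph = [(1, 2), (1, 3), (3, 2)]
lhs = [("a", "b"), ("a", "c")]
rhs = [("b", "a"), ("a", "d"), ("c", "d")]
print(apply_rule(graph, lhs, rhs))
```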

Rules have an arity, and since hyperedges (edges with more than two nodes) are allowed, this arity can be mixed: e.g. {{a, b}, {c, d, a}} -> {{a, d, b}, {d, c, a}} has an arity of 1₂1₃ -> 2₃

To guarantee that the produced graphs are always at least weakly connected, the following invariants must hold for all rules:

  1. each side of the rule has to describe at least a weakly connected graph, so e.g. {{a, b}, {c, d}} -> … is not allowed
  2. the right hand side must use all variables present on the left hand side

Now, my question is this: how could I come up with a binary representation for rules that works with an arbitrary but predefined arity (ie. the arity depends on configuration parameters and won't change during the decoding), so that any binary string of some maximum length will always encode a correct rule that preserves those invariants? This would naturally mean that multiple strings could decode to the same rule, i.e. it'd be a surjective mapping. What I don't want is to have to reject encoded strings if they don't produce correct rules; all strings must result in a valid transformation.

Not looking to be handed an answer, any ideas or pointers are welcome.

Edit: just as a side note, this is related to some neuroevolution ideas I'm thinking about, like my earlier CA question. I'm trying to come up with ways of having compact encodings that can result in structured and potentially complex neural network topologies (I'll have to make them resilient to small mutations too, but that can be a problem "one level up", so to speak, in the genome representation of the genetic programming framework)

👍 32
👤 u/physicomorphic
📅 Jul 31 2021
Encoding binary files to include in scripts davejlong.com/encoding-bi…
👍 13
👤 u/davejlong
📅 Jul 15 2021
Doesn't a binary tree in Huffman Encoding take more space than it saves?

I know the title is a bit weird, but how many bits are needed before Huffman encoding really saves space? E.g. if you compress a file that is pretty small or can't be compressed much, wouldn't storing the binary tree take more bits than the encoding saves?
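A rough way to explore the question is to count bits directly. The sketch below computes the Huffman payload size for a byte string and adds an estimated tree cost. The tree cost is an assumption: one flag bit per tree node plus 8 bits per leaf symbol, which is one simple serialization among many. Function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_payload_bits(data: bytes) -> int:
    """Total number of bits in the Huffman-coded payload (tree not included)."""
    freq = Counter(data)
    if len(freq) == 1:
        return len(data)  # a lone symbol still needs one bit per occurrence
    heap = list(freq.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b  # each merge adds one bit to every symbol beneath it
        heapq.heappush(heap, a + b)
    return total


def tree_bits(data: bytes) -> int:
    """Assumed tree cost: 1 flag bit per node (2n - 1 nodes) plus 8 bits per leaf."""
    n = len(set(data))
    return 0 if n == 0 else (2 * n - 1) + 8 * n


for text in (b"abcd", b"abracadabra"):
    raw = 8 * len(text)
    packed = huffman_payload_bits(text) + tree_bits(text)
    print(text, "raw bits:", raw, "huffman + tree bits:", packed)
```

Under these assumptions the four-byte input grows, while the eleven-byte input with repeated symbols already shrinks, so the break-even point depends on how skewed the symbol frequencies are rather than on a fixed size.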

👍 14
👤 u/Michael428
📅 Jan 09 2021
What is the difference between normal JSON and JSON with binary encoding?

I am reading "Designing Data-Intensive Applications". It mentions that you can extend JSON encoding with binary encoding (using libraries like MessagePack).

I am getting a little confused, because everything gets encoded down to binary to be sent across the network, right?

Is the only difference between regular JSON and JSON with binary encoding (e.g. MessagePack) that the JSON characters like { and : are encoded using a bit or something, rather than using their UTF-8 values (which are potentially multiple bytes per character)?
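One way to see the difference is to compare how a single number is laid out. JSON writes it as decimal digit characters, while a binary format stores the integer's bytes behind a short type tag. The sketch below builds the MessagePack uint 32 layout by hand (type byte 0xce followed by four big-endian bytes) rather than calling a MessagePack library, so treat it as an illustration, not as how any particular library is implemented.

```python
import json
import struct

value = 1234567890

# JSON: the number is written out as ten ASCII digit characters.
as_json_text = json.dumps(value).encode("utf-8")

# Hand-built MessagePack uint 32: type byte 0xce plus 4 big-endian bytes.
as_uint32 = b"\xce" + struct.pack(">I", value)

print(len(as_json_text), as_json_text)
print(len(as_uint32), as_uint32.hex())
```

Both forms end up as bytes on the wire; the binary form is shorter here because it does not spell the value out as text.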

👍 7
👤 u/Ronan998
📅 May 16 2021
I click on Run Python file in Terminal to get a key generated but keep getting a message of "binary or unsupported encoding". What am I missing or doing wrong?
👍 2
👤 u/fromRonnie
📅 Jun 07 2021
Hello everyone! I'm trying to "draw a sound wave in Excel" for my students. I would love to find a data set online with all the voltage values that are output from a mic and arrive at a sound card during a 44.1 kHz recording, before the binary encoding process. Any ideas where to look?
👍 11
👤 u/lapsaroundthesun
📅 Nov 27 2020
Need help binary encoding

Hi everyone, I'm not really sure if this is the right place for me to be (sorry if it isn't), but I'm having a lot of trouble trying to figure out binary encoding. I honestly think it's probably pretty easy, but I'm really struggling with it. Any help would be appreciated. I need to encode the number 13 using 4 bits, 147 using 8 bits, and 62 using 6 bits. Thanks guys, I'm a chemistry major and this definitely is not something I understand very well. Any resources would be helpful!
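As a quick check on the three numbers in the question, here is a small sketch that writes an unsigned integer in a fixed number of bits. The helper name `to_bits` is made up for illustration.

```python
def to_bits(value: int, width: int) -> str:
    """Return value as an unsigned binary string padded to exactly width bits."""
    if not 0 <= value < 2 ** width:
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value, f"0{width}b")


print(to_bits(13, 4))   # 8 + 4 + 1
print(to_bits(147, 8))  # 128 + 16 + 2 + 1
print(to_bits(62, 6))   # 32 + 16 + 8 + 4 + 2
```

Each 1 marks a power of two that is part of the sum, read from the largest power on the left to 1 on the right.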

👍 6
👤 u/chemrisa
📅 Feb 03 2021
bare-cli version 0.4.0 was just released. You can now use already defined types inside code generated types. Isn't that awesome? 😀 Doing binary encoding was never easier! gitlab.com/nilshelmig/bar…
👍 5
👤 u/whyyoushould
📅 Mar 13 2021
BareNET - .NET implementation of BARE. A simple and space efficient binary encoding standard with type safety and code generation gitlab.com/nilshelmig/bar…
👍 11
👤 u/whyyoushould
📅 Feb 12 2021
Binary encoding of a tree in DM1

I missed a question about the binary encoding of a word from a tree on my DM1 PA but I can't find the lesson that covers this. Where could I find the info on this?

👍 3
📅 Dec 29 2020
BareNET - .NET implementation of BARE. A simple and space efficient binary encoding format. Supports netstandard 2.0 gitlab.com/nilshelmig/bar…
👍 3
👤 u/whyyoushould
📅 Feb 12 2021
.NET implementation of BARE. A simple, fast and space efficient binary encoding format gitlab.com/nilshelmig/bar…
👍 2
👤 u/whyyoushould
📅 Feb 10 2021
My design for a Minecraft Ender-porter with universal teleportation between locations via 8-bit to binary encoding youtube.com/watch?v=e_s_D…
👍 77
👤 u/AdamantlyContent
📅 Aug 20 2020
In this video walkthrough, we demonstrated PHP filtering bypass by using base64 encoding to view the source file and appending the required parameters. Privilege escalation was accomplished by exploiting env binary. youtube.com/watch?v=z5_gI…
👍 21
👤 u/MotasemHa
📅 Nov 17 2020
Cute tricks for SIMD vectorized binary encoding and decoding of nucleotides. github.com/Daniel-Liu-c0d…
👍 22
👤 u/c0deb0t
📅 Jul 28 2020
Nimble: Async friendly binary encoding/decoding

I've just created a crate for binary encoding/decoding of Rust types with support for directly writing values to any type implementing AsyncWrite and reading values from any type implementing AsyncRead. All of this with async/await syntax!

  • Crates.io: https://crates.io/crates/nimble
  • Documentation: https://docs.rs/nimble/0.1.0/nimble/
  • Repository: https://github.com/devashishdxt/nimble
👍 39
👤 u/devashishdxt
📅 Feb 15 2020
Encoding binary in ASCII very fast lemire.me/blog/2020/05/02…
👍 37
👤 u/skeeto
📅 May 02 2020
Handling binary encoding over Websockets with Elm

Can I get binary data from a websocket in JS, send it through a port, and decode it in Elm?

👍 9
👤 u/Agitates
📅 Mar 07 2020
one-hot encoding for binary values

Hi everyone,

I know that machines only understand numerical data, so the one-hot encoding method is used. This question may sound odd, but I wondered: is it necessary to use one-hot encoding for a CSV column containing binary values? Is there any benefit if we use it, or none?
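To illustrate what is at stake, here is a small sketch with a made-up yes/no column. It compares a single 0/1 column with a two-column one-hot encoding, using plain Python lists rather than any particular library.

```python
values = ["yes", "no", "no", "yes"]  # hypothetical binary column

# Option 1: a single 0/1 column.
single_column = [1 if v == "yes" else 0 for v in values]

# Option 2: one-hot encoding, one column per category (sorted: "no", "yes").
categories = sorted(set(values))
one_hot = [[1 if v == c else 0 for c in categories] for v in values]

print(single_column)
print(one_hot)

# The "no" column is always 1 minus the "yes" column, so it adds no information.
print(all(row[0] == 1 - row[1] for row in one_hot))
```

For a two-valued column the second one-hot column is fully determined by the first, which is why a single 0/1 column usually carries the same information.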

thanks.

👍 3
👤 u/sidneyy9
📅 Jun 17 2020
World's first binary text encoding: Francis Bacon's 5-bit "Bi-literarie Alphabet" (1624)
👍 503
👤 u/okayIfUSaySo
📅 Sep 04 2018
Stringsext (search for multi-byte encodings in binary data) for iOS is available

stringsext is a Unicode enhancement of the GNU strings tool with additional functionalities: stringsext recognizes Cyrillic, Arabic, CJKV characters and other scripts in all supported multi-byte-encodings, while GNU strings fails in finding any of these scripts in UTF-16 and many other encodings.

👍 11
👤 u/getreu
📅 Mar 19 2020
I'm developing a Discord bot which can help with encoding/decoding of Morse, Base64, Binary and more
👍 200
👤 u/BruceCCCCCC
📅 Jan 08 2019
One-hot encoding a binary feature for GD/SGD

Hi guys, I just joined Kaggle and I'm working on the Titanic competition. In the dataset they provide, there is a "Sex" feature that's either "male" or "female." I'm trying to figure out how I should encode this feature.

Conventional one-hot encoding would have me set "female" to 1 and "male" to 0 or the other way around. But I was wondering if this might negatively affect how GD and SGD function later on (I haven't really chosen a model yet, would appreciate suggestions). What I mean is that if "male" is encoded to 0, then the "Sex" partial derivative for a single "male" observation would always be 0. This is probably fine for GD because it calculates the gradient with the whole dataset. But for SGD, if the algorithm comes to a "male" observation, then the weight for the "Sex" feature won't get updated because the partial derivative would be 0.
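The concern can be checked with a single update step. The sketch below assumes a one-feature linear model with a bias term and squared-error loss, which is only a stand-in since no model has been chosen yet; the function name and the numbers are made up.

```python
def sgd_step(w, b, x, y, lr=0.1):
    """One SGD step for pred = w * x + b with loss 0.5 * (pred - y) ** 2."""
    error = (w * x + b) - y
    grad_w = error * x  # zero whenever the feature value x is 0
    grad_b = error      # the bias gradient does not depend on x
    return w - lr * grad_w, b - lr * grad_b


w, b = 0.5, 0.0

# Observation with the feature encoded as 0: w is untouched, b still moves.
print(sgd_step(w, b, x=0.0, y=1.0))

# Observation with the feature encoded as 1: both w and b move.
print(sgd_step(w, b, x=1.0, y=1.0))
```

In this toy setup the 0-encoded observation still contributes through the bias, and the feature weight is updated whenever a 1-encoded observation is drawn.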

Would appreciate some insight, thanks!

👍 2
👤 u/___word___
📅 May 25 2020
Finite State Machine (FSM) encoding in VHDL: binary, one-hot, and others insights.sigasi.com/tech/…
👍 6
👤 u/oelang
📅 Mar 09 2020
Binary encoding of variable length options with Golang. On TLV encoding with go link.medium.com/CrXa8IViv…
👍 8
👤 u/rotemtam
📅 Jun 14 2019
I need help with a binary code. I don't know the encoding system that was used.

V sbyybjrq gur ehyrf (please ignore this)

I've got a text from my friend with a binary code and I thought it was simple ASCII encoding but when I tried to decode it all I got was random characters.

00111100011000110101001001001010

I also received another code of the same length: 32 bits. It is also just random characters in ASCII.

01001111010111100101101101001010

She said it is not just random numbers and it is decodable. I think it is some other type of encoding; maybe it is a legacy system. I don't really know a lot about this stuff. Also, we speak Hungarian, so there might be some weird letters in the text (á, é, ó, ű, í...).
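For reference, here is a short sketch of the plain 8-bits-per-character ASCII reading that was already tried, so the "random characters" result can be reproduced. The helper name is hypothetical, and this does not attempt to guess the real encoding.

```python
def bits_to_bytes(bits: str) -> bytes:
    """Split a bit string into 8-bit groups and return them as bytes."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


codes = (
    "00111100011000110101001001001010",
    "01001111010111100101101101001010",
)

for code in codes:
    raw = bits_to_bytes(code)
    print(raw.hex(), raw.decode("ascii"))
```

Both codes end in the same byte, which may be a useful clue when trying other groupings or encodings.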

👍 16
👤 u/Exclusivefrog28
📅 Jan 10 2019
ADC binary encoding question

We're using the following codec IC as part of a project at work - http://www.ti.com/lit/ds/symlink/tlv320aic3107.pdf

Simply enough, I'm trying to find out what binary encoding scheme the thing uses, i.e. signed magnitude, ones' complement, two's complement, or offset-N. I can't seem to find anything indicating this in the datasheet. It's geared for audio purposes, so maybe there's some convention to be assumed?
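To show how much the answer matters, here is a sketch that reads the same raw sample word under each of the four schemes. It assumes 16-bit samples and an offset of half the range for the offset-N case, and it makes no claim about which scheme this codec actually uses. The function name is hypothetical.

```python
def interpret(raw: int, bits: int = 16) -> dict:
    """Interpret one raw sample word under four common signed encodings."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    raw &= mask
    negative = bool(raw & sign_bit)
    return {
        "signed_magnitude": -(raw & ~sign_bit) if negative else raw,
        "ones_complement": -(~raw & mask) if negative else raw,
        "twos_complement": raw - (1 << bits) if negative else raw,
        "offset_binary": raw - sign_bit,  # assumed offset of 2 ** (bits - 1)
    }


for word in (0x0001, 0x8001):
    print(hex(word), interpret(word))
```

If captured data is available, a quiet passage helps tell the schemes apart: samples near silence cluster around 0x0000 and 0xFFFF in two's complement, but around 0x8000 in offset binary.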

I don't have direct access to the hardware layer so unfortunately I can't just check it directly.

Any insight here would be most appreciated.

👍 3
👤 u/Theweekendstate
📅 May 28 2019
Are there other software or tools that can read the .geodatabase SQLite database inside an Esri mobile map package (mmpk) zip archive, including its geometry blob? Any SQLite tool can read the attributes; I'm just wondering if anyone has figured out the encoding of the geometry. It would be nice if it were just WKB (well-known binary).
👍 11
📅 Jan 25 2020
A CBOR (like JSON, but binary) encoding library for Solidity github.com/smartcontractk…
👍 66
👤 u/nickjohnson
📅 Apr 09 2018
Gray code - AKA Reflected Binary Code is a really interesting error correction method for binary encoding. blog.jordansitkin.com/pos…
👍 3
📅 Jan 15 2020
