36 Hilarious Web crawler Puns - Punstoppable 🛑

I managed to build a web crawler…why? Because I was horny… reddit.com/r/ProgrammerHu…

👍︎ 145

📅︎ Dec 28 2021

Does DDG have its own web crawler and search index?

Hello,

Does DDG have its own web crawler and search index? Can someone please provide an abstract overview of DDG web crawler and search index architecture?

👍︎ 38

👤︎ u/git_world

📅︎ Jan 07 2022

Recently released an SSR Proxy (Server-Side Rendering), which allows for SEO-friendly SPAs by serving pre-rendered web pages to web crawlers. Any feedback is more than welcome! github.com/Tpessia/ssr-pr…

👍︎ 13

👤︎ u/pessiat

📅︎ Dec 06 2021

Anyone know what web crawler Muta uses for DW?

I want to browse funky stuff and find ARGs too

👍︎ 7

👤︎ u/KombatWithTheWombat

📅︎ Jan 06 2022

crawley - the unix-way web-crawler

https://github.com/s0rg/crawley

features:

fast html SAX-parser (powered by golang.org/x/net/html)
small (<1000 SLOC), idiomatic, 100% test covered codebase
grabs most useful resource URLs (pics, videos, audios, etc.)
found URLs are streamed to stdout and guaranteed to be unique
scan depth (limited by starting host and path, by default - 0) can be configured
can crawl robots.txt rules and sitemaps
brute mode - scan html comments for urls (this can lead to bogus results)
make use of HTTP_PROXY / HTTPS_PROXY environment values
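
To make the "event-driven parsing, unique URLs streamed out" idea concrete, here is a minimal Python sketch. It is not crawley's code (crawley is written in Go); `LinkStreamer`, `extract_urls`, and the `URL_ATTRS` set are hypothetical names chosen for illustration, and the attribute list is deliberately incomplete.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes that commonly carry resource URLs (illustrative, not exhaustive).
URL_ATTRS = {"href", "src", "poster"}


class LinkStreamer(HTMLParser):
    """Event-driven (SAX-style) parser that reports each URL only once."""

    def __init__(self, base_url, emit):
        super().__init__()
        self.base_url = base_url
        self.emit = emit      # callback invoked per new URL, e.g. print
        self.seen = set()     # remembers URLs already emitted

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRS and value:
                url = urljoin(self.base_url, value)  # resolve relative links
                if url not in self.seen:
                    self.seen.add(url)
                    self.emit(url)


def extract_urls(html, base_url):
    """Collect the unique URLs of a page in the order they appear."""
    found = []
    parser = LinkStreamer(base_url, found.append)
    parser.feed(html)
    parser.close()
    return found
```

Passing `print` as the `emit` callback would stream the URLs to stdout as they are found, which is roughly the behavior the feature list describes.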

👍︎ 35

👤︎ u/Swimming-Medicine-67

📅︎ Nov 10 2021

Found this Web Crawler in my dining room

👍︎ 39

👤︎ u/0spectating0

📅︎ Dec 07 2021

Should we archive deleted YouTube videos archived on wayback that have been archived with the web crawler? /r/YTdatahoarding/comment…

👍︎ 30

👤︎ u/Deathguard72

📅︎ Dec 12 2021

Can we really trust the SBU web crawler if he’s attacking our beloved mascot? v.redd.it/haptnehofc481

👍︎ 45

👤︎ u/Nimenog

📅︎ Dec 08 2021

Recently released an SSR Proxy (Server-Side Rendering), which allows for SEO-friendly SPAs by serving pre-rendered web pages to web crawlers. Any feedback is more than welcome!

It's focused on flexibility and customization, and also works with any SPA framework, such as React.js, Vue.js and Angular, using Puppeteer to render the pages.

https://github.com/Tpessia/ssr-proxy-js

https://www.npmjs.com/package/ssr-proxy-js

For more info about SSR in general, here is a very good article about it: https://medium.com/@baphemot/whats-server-side-rendering-and-do-i-need-it-cb42dc059b38.

👍︎ 7

👤︎ u/pessiat

📅︎ Dec 12 2021

Do research web crawler programs exist?

This may be a stupid question, but bear with me.

I’m trying to automate tasks at work, and much of that work is web research: sorting and reading through public documents, govt websites, etc., and creating memos for my boss and our clients to read over. This work is extremely tedious, and much of my time is spent just searching for the information. So, I feel like there has to be a program that can help me skip the searching portion and just review the results.

My question is:

Are there consumer programs available that use a web crawler (or some equivalent mechanism) to automate an advanced keyword search, and compile the results in a Word Doc template?

If not, how many hours would it take an average developer to create such a program?

I know very little about programming, but I feel like there has to be some sort of tech out there that can scrape data for me so I don’t have to.

Work smarter, not harder. Amirite?

Thanks in advance.

👍︎ 5

👤︎ u/oliver--cromwell

📅︎ Dec 16 2021

A lightweight web crawler framework for your daily needs

Hello again,

I created another framework just for you, to ease your life and help you with your daily dose of scraping the internet!

It has a nice look and feel to it, try inspecting other web scraping framework examples available for Crystal and you will see what I mean!

When you write using Anonymous you feel like you are driving around a Mercedes-Benz vehicle, when you use something else to write your scraping logic you feel like you are driving a crusty Honda Civic.

Anyways thank you for your attention, feel free to contribute!

Behold the link to the GitHub page: https://github.com/grkek/anonymous

👍︎ 14

👤︎ u/Fabulous-Repair-8665

📅︎ Dec 30 2021

Does presearch employ its own web crawler?

Does presearch crawl sites on its own and serve results based on this data? If it uses external search engine APIs, are there plans to wean off this and employ their own crawler & algorithm to serve results?

My concern is that reliance on third party engines is prone to service denial, should presearch become a large enough competitor. Could Google et al lock out search queries originating from presearch nodes and effectively bring down the ecosystem?

👍︎ 14

👤︎ u/cryptoburna

📅︎ Nov 26 2021

Can web crawlers reach pages that have no inbound links?

For example, say I create a "private" page on my webserver that isn't the index of the site, and has no links pointing to it.

Can it be reached by web crawlers somehow? Or is it un-indexable?

👍︎ 4

👤︎ u/thelonious_skunk

📅︎ Dec 02 2021

How do Dark Web Crawlers/Scrapers Work?

How do companies like Recorded Future and their competitors scrape and index Dark Web data? While I understand they use NLP to process and categorize the data, how do they get it in the first place? For example, do they use scripts that work in a similar fashion to ones that would scrape the Clear Net? Do Dark Web (Tor) hidden services employ things like the robots exclusion standard? I’m probably just overthinking this…
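
For context on the robots exclusion part of the question: whether a given hidden service publishes a robots.txt is up to its operator, and a crawler only honors it if it chooses to. The sketch below shows how such a check could look using Python's standard `urllib.robotparser`. The rules, the `examplebot` user agent, and the `example.onion` host are made up for illustration, and the rules are parsed from a string so nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; a real crawler would download it from the site.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_policy(robots_txt):
    """Parse robots.txt text into a policy object that can answer can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(policy, url, user_agent="examplebot"):
    """Return True if the parsed rules permit this user agent to fetch the URL."""
    return policy.can_fetch(user_agent, url)


policy = build_policy(EXAMPLE_ROBOTS)
```

With these rules, `allowed(policy, "http://example.onion/private/page")` is rejected while a URL outside `/private/` is permitted. The check is purely advisory: nothing stops a scraper from ignoring it.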

👍︎ 17

👤︎ u/pwnanon

📅︎ Nov 10 2021

Steps to build a web crawler

I want to build a simple web crawler as a project, but I'm not sure where to start. Everything I'm trying to google wants to sell me something, have me download something, or just present the code to build it.

I want to know what a crawler does, how it processes dynamic sites, what to look for in terms of relevant links to follow, how to avoid ads. Is there a good resource for getting into web crawling that doesn't want to sell you something or do it all for you? Many thanks
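
As a rough picture of what a crawler does at its core, here is a minimal Python sketch: keep a queue of URLs, fetch each one, pull out its links, and enqueue the ones not seen before. Everything here is an assumption made for illustration: `crawl`, `AnchorCollector`, and the `FAKE_SITE` dictionary are hypothetical, the "fetch" step is a dictionary lookup so the example runs offline, and staying on the starting host is just one simple way to avoid wandering off to ad domains. It does not execute JavaScript, so dynamic sites would need a headless browser on top of this.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl that stays on the start URL's host."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)          # returns page text, or None on failure
        if html is None:
            continue
        visited.append(url)
        collector = AnchorCollector()
        collector.feed(html)
        for href in collector.hrefs:
            link = urldefrag(urljoin(url, href)).url  # absolute URL, no #fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# A fake site so the sketch runs without network access.
FAKE_SITE = {
    "https://example.com/": '<a href="/about">About</a> <a href="https://ads.example.net/x">Ad</a>',
    "https://example.com/about": '<a href="/">Home</a> <a href="/contact">Contact</a>',
    "https://example.com/contact": "<p>No links here</p>",
    "https://example.com/unlinked": "<p>Nothing points here</p>",
}
```

Calling `crawl("https://example.com/", FAKE_SITE.get)` visits the home, about, and contact pages. The ad link is skipped because it is on another host, and the unlinked page is never found because no visited page points to it.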

👍︎ 3

👤︎ u/Loose-Cranberry85

📅︎ Dec 06 2021

What is the difference between a web scraper, web crawler and a bot?

👍︎ 6

👤︎ u/MagazineVivid

📅︎ Nov 24 2021

Sanity Eater, XTZ crawler among the web...

👍︎ 3

👤︎ u/No_Claim_5706

📅︎ Nov 28 2021

[FOR HIRE] Web Scrapers, Crawlers, Spiders / Website automation. from 0.01 BNB

All size websites considered. Message/ chat with url and data requirements for a specific quote

0.01 BNB refers to a single-page site.
Github with plenty of scraping experience https://github.com/coderpaddy

👍︎ 3

👤︎ u/coderpaddy

📅︎ Nov 21 2021

WTS> I want to do web crawling for XMR

I can help you crawl data in exchange for XMR

👍︎ 4

👤︎ u/love_tinker

📅︎ Nov 24 2021

[Web][2010] Kid's educational dungeon crawler

Platform(s): This was a web game, which I believe was for school. It may have been its own standalone website; it definitely wasn't attached to sites like Hoodamath or Coolmathgames. It was freely accessible online, because I played it at home.

Genre: It was part dungeon crawler and part-puzzle game I believe. There was definitely an educational aspect, so probably something to do with math.

Estimated year of release: I think I played it in 2010 but it could've very well been earlier.

Graphics/art style: I believe the background of the game and website was black, with the game being built off bright primary colors. The environment was in white and I believe it was top-down and 2D. Maybe tile based.

Notable gameplay mechanics: I believe you had to get a key of some sort, and there were stages/levels/floors. I believe there were enemies. I said dungeon crawler in the title but "maze" might be a more appropriate descriptor.

Other details: May have been fantasy/wizard themed. When I say this is a kid's game, I mean for very young kids. I couldn't have been beyond the 3rd grade. Looking through the subreddit, I found Fun School 6 Magic Land, which is similar but not it.

👍︎ 7

👤︎ u/greyli

📅︎ Oct 24 2021

Does CSS impact SEO - is a Search Engine's web crawler CSS agnostic?

Suppose I have a standard landing page with a clear value proposition, visual, CTA, & social proof.

Now consider the following three scenarios for how I can host this landing page:

without any styling (just HTML)
with styling (external CSS files)
with styling but UX-unfriendly. For example, elements overlap or do not fit inside the viewport.

In all three scenarios the content and HTML are exactly the same, so would the landing page score the same regardless of the scenario?

TL;DR

Putting UX aside, does "good" CSS add any value to a landing page's SEO?

👍︎ 5

👤︎ u/lewz3000

📅︎ Oct 12 2021

Crawley v1.0.0: fast unix-way web crawler/spider /r/golang/comments/qh8rrl…

👍︎ 3

👤︎ u/Swimming-Medicine-67

📅︎ Oct 29 2021

a portable lightweight web crawler using Powerpage.

Just coded a portable lightweight web crawler using Powerpage. Powerpage Web Crawler is a portable JavaScript application running on Powerpage. It is written in vanilla JavaScript in about 350 lines of code, without any dependencies.

Powerpage Web Crawler is a portable program: simply download and run powerpage.exe. It is a powerful and easy-to-use web crawler suitable for blog site crawling and offline reading.

Simply define the settings below, for example:

base-url := https://dev.to/casualwriter // the home page of your favorite blog site
index-pattern := none // RegExp of the URL pattern of category pages
page-pattern := /casualwriter/[a-z] // RegExp of the URL pattern of content pages
content-css := #main-title h1, #article-body // CSS selector for blog content

The program will

crawl all category pages.
find the URLs of all content pages.
crawl content for one page, or all pages.
save settings and links to a database (supports multiple sites).
save content pages to local files.
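
The way the two URL patterns could drive those steps can be sketched in a few lines. This is a Python illustration only (the tool itself is JavaScript, and how Powerpage applies the patterns internally is not shown in the post); `SETTINGS`, its keys, and `classify` are hypothetical names, and `none` is modeled as "no category pages".

```python
import re

# Settings mirroring the example above; the dictionary keys are made up here.
SETTINGS = {
    "base_url": "https://dev.to/casualwriter",
    "index_pattern": None,                   # "none": the site has no category pages
    "page_pattern": r"/casualwriter/[a-z]",  # URLs that look like content pages
}


def classify(url, settings):
    """Return 'index', 'page', or 'skip' for a discovered URL."""
    index_pattern = settings["index_pattern"]
    if index_pattern and re.search(index_pattern, url):
        return "index"   # category page: crawl it to find more links
    if re.search(settings["page_pattern"], url):
        return "page"    # content page: extract and save its content
    return "skip"        # anything else is ignored
```

Under these assumptions a post URL such as `https://dev.to/casualwriter/some-post` is classified as a content page, while the bare profile URL and links to other authors are skipped.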

👍︎ 3

👤︎ u/casualwriter-hk

📅︎ Nov 09 2021

Found mushrooms growing out of the bottom of my monstera, opened her up and found this. There were little creepy crawlers crawling around in the webs and the white dots all through the soil. Stripped all the soil out and put the clean roots in fresh water. What is it? reddit.com/gallery/oy451j

👍︎ 97

👤︎ u/stoenhearts

📅︎ Aug 04 2021

Infinity Search - search engine with its crawlers and web framework built with Python

https://infinitysearch.co (try it out with the username demo and password demo)

We created this search engine with the goal of transforming how people search the web into a more enjoyable, customizable, and efficient experience.

We built this service using Flask for our web framework and Python for our web crawlers, which use some popular packages like Beautiful Soup and Requests.

Several of our projects are open source and on our Gitlab.

👍︎ 32

👤︎ u/InfinitySearch1

📅︎ Sep 06 2021

The unix-way web crawler github.com/s0rg/crawley

👍︎ 2

👤︎ u/donutloop

📅︎ Oct 30 2021

I built a huge web crawler in python, to automate some osint stuff. Here is how I did it.

I built a web crawler in Python years ago to automate some OSINT stuff. I’ve added to it over the years and it now crawls 600+ sites. I’ve posted this over in the OSINT sub but thought you guys may be interested. I’ve written an article on how you can build a similar crawler with a little bit of Python and very little experience. Hope it’s useful:

Article here

👍︎ 416

👤︎ u/justbrowsingtosay

📅︎ Jul 21 2021

A Unix-style personal search engine and web crawler for your digital footprint. github.com/amirgamil/apol…

👍︎ 56

👤︎ u/beleeee_dat

📅︎ Jul 26 2021

[TASK] Paying $20 for a python script for creating a very basic web crawler

I need a Python script that scrapes the data from a website. It is just one website, so there is nothing advanced or time-consuming about it. The script should run from the Windows command line and take the website URL as a parameter.

Please message if you have any more questions or are interested in making the script. I am open to offers.

Payment will be made using PayPal

Thank you.
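
A script of the shape described (run from the command line, URL as its parameter) could start from something like the sketch below. It only downloads the page and reports its size; what to extract from it depends on the site, which the post does not specify. The function names are hypothetical and only the Python standard library is used.

```python
import argparse
import urllib.request


def parse_args(argv=None):
    """Read the website URL from the command line."""
    parser = argparse.ArgumentParser(description="Fetch one page and print its size.")
    parser.add_argument("url", help="address of the website to scrape")
    return parser.parse_args(argv)


def fetch(url, timeout=10):
    """Download the page body as text (UTF-8 assumed, bad bytes replaced)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def main(argv=None):
    args = parse_args(argv)
    html = fetch(args.url)
    print(f"Fetched {len(html)} characters from {args.url}")
```

Saved as, say, `scrape.py` with a call to `main()` on the last line, it would be run as `python scrape.py https://example.com` from the Windows command line.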

👍︎ 2

👤︎ u/ByMAster2

📅︎ Aug 12 2021

Possible Web Crawler Research Program?

I realize this may not fit this sub, but I thought I’d try here anyway. This is probably a stupid question, but bear with me.

I’m trying to automate tasks at work, and much of that work is web research - sorting/reading through public documents, govt websites, etc and creating memos for my boss and our clients to read over. This work is extremely tedious and much of my time is spent just searching for the information. So, I feel like there has to be a program that can help me skip the searching portion and just review the results.

My question is:

Are there consumer programs available that use a web crawler (or some equivalent mechanism) to automate an advanced keyword search, and compile the results in a Word Doc template?

If not, how many hours would it take an average developer to create such a program?

I know very little about programming, but I feel like there has to be some sort of software out there that can scrape data for me so I don’t have to.

Work smarter, not harder. Amirite?

Thanks in advance.

👍︎ 4

👤︎ u/oliver--cromwell

📅︎ Dec 16 2021

Crawley v1.0.0: fast unix-way web crawler/spider

https://github.com/s0rg/crawley

Main features: