I managed to build a web crawler…why? Because I was horny… reddit.com/r/ProgrammerHu…
👍︎ 145
👀︎ u/exploooooosions
📅︎ Dec 28 2021
Does DDG have its own web crawler and search index?

Hello,

Does DDG have its own web crawler and search index? Can someone please provide an abstract overview of DDG web crawler and search index architecture?

👍︎ 38
👀︎ u/git_world
📅︎ Jan 07 2022
Recently released an SSR Proxy (Server-Side Rendering), which allows for SEO-friendly SPAs by serving pre-rendered web pages to Web Crawlers. Any feedback is more than welcome! github.com/Tpessia/ssr-pr…
👍︎ 13
👀︎ u/pessiat
📅︎ Dec 06 2021
Anyone know what web crawler Muta uses for DW?

I want to browse funky stuff and find ARGs too

👍︎ 7
📅︎ Jan 06 2022
crawley - the unix-way web-crawler

https://github.com/s0rg/crawley

features:

  • fast HTML SAX parser (powered by golang.org/x/net/html)
  • small (<1000 SLOC), idiomatic, 100% test-covered codebase
  • grabs most useful resource URLs (pictures, videos, audio, etc.)
  • found URLs are streamed to stdout and guaranteed to be unique
  • configurable scan depth (limited to the starting host and path; default 0)
  • can crawl robots.txt rules and sitemaps
  • brute mode: scans HTML comments for URLs (this can lead to bogus results)
  • uses the HTTP_PROXY / HTTPS_PROXY environment variables
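To illustrate the "found URLs are streamed to stdout and guaranteed to be unique" idea, here is a rough Python sketch using the standard-library `html.parser` (itself an event-driven, SAX-style parser). It is not crawley's Go code; the choice to collect every `href`/`src` attribute is an assumption standing in for crawley's own resource rules.

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urljoin

# Assumption: treat any href/src attribute as a resource URL (links, images, media, scripts).
URL_ATTRS = {"href", "src"}


class ResourceExtractor(HTMLParser):
    """Collects absolute URLs from href/src attributes, de-duplicated in first-seen order."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.seen: Set[str] = set()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag; self-closing tags like <img/> also land here.
        for name, value in attrs:
            if name in URL_ATTRS and value:
                url = urljoin(self.base_url, value.strip())  # resolve relative links
                if url not in self.seen:
                    self.seen.add(url)
                    self.urls.append(url)
                    print(url)  # stream each new URL to stdout as soon as it is found


def extract_urls(page_html: str, base_url: str) -> List[str]:
    parser = ResourceExtractor(base_url)
    parser.feed(page_html)
    parser.close()
    return parser.urls
```

Because the parser works on events rather than building a full DOM, URLs can be emitted while the page is still being read, and the `seen` set is what provides the uniqueness guarantee.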
👍︎ 35
📅︎ Nov 10 2021
Found this Web Crawler in my dining room
👍︎ 39
👀︎ u/0spectating0
📅︎ Dec 07 2021
Should we archive deleted YouTube videos archived on wayback that have been archived with the web crawler? /r/YTdatahoarding/comment…
👍︎ 30
👀︎ u/Deathguard72
📅︎ Dec 12 2021
Can we really trust the SBU web crawler if he’s attacking our beloved mascot? v.redd.it/haptnehofc481
👍︎ 45
👀︎ u/Nimenog
📅︎ Dec 08 2021
Recently released an SSR Proxy (Server-Side Rendering), which allows for SEO-friendly SPAs by serving pre-rendered web pages to Web Crawlers. Any feedback is more than welcome!

It's focused on flexibility and customization, and also works with any SPA framework, such as React.js, Vue.js and Angular, using Puppeteer to render the pages.
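One common way an SSR proxy decides what to serve is to look at the request's User-Agent: known crawlers get pre-rendered HTML, regular browsers get the normal SPA shell. The sketch below shows that decision in Python. It is a generic illustration, not ssr-proxy-js's API or behavior; the bot list is hypothetical, and the `prerendered` dict stands in for pages that a headless browser such as Puppeteer would render.

```python
import re
from typing import Dict, Optional

# Hypothetical list of crawler user-agent substrings; a real proxy would maintain its own.
BOT_PATTERN = re.compile(
    r"googlebot|bingbot|duckduckbot|yandex|baiduspider|twitterbot|facebookexternalhit",
    re.IGNORECASE,
)


def is_crawler(user_agent: Optional[str]) -> bool:
    """True if the User-Agent header looks like a known search or social-media crawler."""
    return bool(user_agent) and BOT_PATTERN.search(user_agent) is not None


def choose_response(
    user_agent: Optional[str], path: str, spa_shell: str, prerendered: Dict[str, str]
) -> str:
    """Crawlers get cached pre-rendered HTML when available; everyone else gets the SPA shell."""
    if is_crawler(user_agent) and path in prerendered:
        return prerendered[path]
    return spa_shell
```

User-agent sniffing is simple but easy to get wrong as bot names change, which is one reason proxies of this kind tend to make the detection rules configurable.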

https://github.com/Tpessia/ssr-proxy-js

https://www.npmjs.com/package/ssr-proxy-js

For more info about SSR in general, here is a very good article about it: https://medium.com/@baphemot/whats-server-side-rendering-and-do-i-need-it-cb42dc059b38.

👍︎ 7
👀︎ u/pessiat
📅︎ Dec 12 2021
Do research web crawler programs exist?

This may be a stupid question, but bear with me.

I’m trying to automate tasks at work, and much of that work is web research - sorting/reading through public documents, govt websites, etc., and creating memos for my boss and our clients to read over. This work is extremely tedious and much of my time is spent just searching for the information. So, I feel like there has to be a program that can help me skip the searching portion and just review the results.

My question is:

Are there consumer programs available that use a web crawler (or some equivalent mechanism) to automate an advanced keyword search, and compile the results in a Word Doc template?
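The core of such a tool, once the pages have been fetched, is a keyword filter plus a report writer. The Python sketch below assumes the page texts are already available as a dict of URL to text (fetching is left out), uses a naive sentence split, and writes a plain-text memo; producing an actual Word document would need an extra library such as python-docx, which is not shown here. All function names are hypothetical.

```python
import re
from typing import Dict, List


def find_hits(pages: Dict[str, str], keywords: List[str]) -> Dict[str, List[str]]:
    """Return, per URL, the sentences that mention any keyword (case-insensitive)."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    hits: Dict[str, List[str]] = {}
    for url, text in pages.items():
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", text)
        matching = [s.strip() for s in sentences if pattern.search(s)]
        if matching:
            hits[url] = matching
    return hits


def render_memo(title: str, hits: Dict[str, List[str]]) -> str:
    """Format the hits as a simple plain-text memo, grouped by source URL."""
    lines = [title, "=" * len(title), ""]
    for url, sentences in hits.items():
        lines.append(f"Source: {url}")
        lines.extend(f"  - {s}" for s in sentences)
        lines.append("")
    return "\n".join(lines)
```

Most of the real effort in a tool like this tends to go into the parts not shown: fetching and cleaning each site's pages and getting the output into the firm's memo template.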

If not, how many hours would it take an average developer to create such a program?

I know very little about programming, but I feel like there has to be some sort of tech out there that can scrape data for me so I don’t have to.

Work smarter, not harder. Amirite?

Thanks in advance.

👍︎ 5
👀︎ u/oliver--cromwell
📅︎ Dec 16 2021
A lightweight web crawler framework for your daily needs

Hello again,

I created another framework just for you, to ease your life and help you with your daily dose of scraping the internet!

It has a nice look and feel to it; try inspecting the other web scraping framework examples available for Crystal and you will see what I mean!

When you write using Anonymous you feel like you are driving around a Mercedes-Benz vehicle, when you use something else to write your scraping logic you feel like you are driving a crusty Honda Civic.

Anyway, thank you for your attention, and feel free to contribute!

Behold the link to the GitHub page: https://github.com/grkek/anonymous

👍︎ 14
📅︎ Dec 30 2021
Does presearch employ its own web crawler?

Does presearch crawl sites on its own and serve results based on this data? If it uses external search engine APIs, are there plans to wean off this and employ their own crawler & algorithm to serve results?

My concern is that reliance on third-party engines is prone to service denial, should presearch become a large enough competitor. Could Google et al. lock out search queries originating from presearch nodes and effectively bring down the ecosystem?

👍︎ 14
👀︎ u/cryptoburna
📅︎ Nov 26 2021
Can web crawlers reach pages that have no inbound links?

Can web crawlers reach pages that have no inbound links?

For example, say I create a "private" page on my webserver that isn't the index of the site, and has no links pointing to it.

Can it be reached by web crawlers somehow? Or is it un-indexable?

👍︎ 4
👀︎ u/thelonious_skunk
📅︎ Dec 02 2021
How do Dark Web Crawlers/Scrapers Work?

How do companies like Recorded Future and their competitors scrape and index Dark Web data? While I understand they use NLP to process and categorize the data, how do they get it in the first place? For example, do they use scripts that work in a similar fashion to ones that would scrape the Clear Net? Do Dark Web (Tor) hidden services employ things like the robots exclusion standard? I’m probably just overthinking this…
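Whether a given site (hidden service or not) publishes a robots.txt file is up to its operator; a crawler that chooses to honor the robots exclusion standard checks each URL against it before fetching. A minimal sketch with Python's standard `urllib.robotparser`, assuming the robots.txt text has already been downloaded (the helper name `allowed` is hypothetical):

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against already-downloaded robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)
```

Note that robots.txt is purely advisory: nothing stops a scraper from ignoring it, so it says little about how a particular company actually collects data.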

👍︎ 17
👀︎ u/pwnanon
📅︎ Nov 10 2021
Steps to build a web crawler

I want to build a simple web crawler as a project, but I'm not sure where to start. Everything I'm trying to google wants to sell me something, have me download something, or just present the code to build it.

I want to know what a crawler does, how it processes dynamic sites, what to look for in terms of relevant links to follow, how to avoid ads. Is there a good resource for getting into web crawling that doesn't want to sell you something or do it all for you? Many thanks
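At its core, a crawler repeats one loop: take a URL from a queue, fetch it, extract links, and queue the links it has not seen yet, stopping at some depth. The breadth-first Python sketch below shows that loop using only the standard library. The `fetch` function is passed in (an assumption made so the logic can be tried without a network), and the crawl is restricted to the start host, which also happens to skip most third-party links such as ad servers. It does not run JavaScript, so pages that build their content client-side would need a headless browser such as Puppeteer instead.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects raw href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], str], max_depth: int = 2) -> Dict[str, int]:
    """Breadth-first crawl restricted to the start host; returns {url: depth}."""
    host = urlparse(start_url).netloc
    seen = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        if depth >= max_depth:
            continue  # record the page, but don't follow its links
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.links:
            # Resolve relative links and drop #fragments so /a and /a#top count once.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen[link] = depth + 1
                queue.append(link)
    return seen
```

For real pages, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read().decode("utf-8", "replace")`; a polite crawler would also add delays between requests and check robots.txt.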

👍︎ 3
👀︎ u/Loose-Cranberry85
📅︎ Dec 06 2021
What is the difference between a web scraper, web crawler and a bot?
👍︎ 6
👀︎ u/MagazineVivid
📅︎ Nov 24 2021
Sanity Eater, XTZ crawler among the web...
👍︎ 3
👀︎ u/No_Claim_5706
📅︎ Nov 28 2021
[FOR HIRE] Web Scrapers, Crawlers, Spiders / Website automation. from 0.01 BNB

Websites of all sizes considered. Message/chat with the URL and data requirements for a specific quote.

0.01 BNB refers to a single-page site.
GitHub with plenty of scraping experience: https://github.com/coderpaddy

👍︎ 3
👀︎ u/coderpaddy
📅︎ Nov 21 2021
WTS> I want to do web crawling for XMR

I can crawl data for you in exchange for XMR.

👍︎ 4
👀︎ u/love_tinker
📅︎ Nov 24 2021
[Web][2010] Kid's educational dungeon crawler

Platform(s): This was a web game, which I believe was for school. It may have been its own standalone website; it definitely wasn't attached to sites like Hoodamath or Coolmathgames. It was freely accessible online, because I played it at home.

Genre: It was part dungeon crawler and part puzzle game, I believe. There was definitely an educational aspect, probably something to do with math.

Estimated year of release: I think I played it in 2010 but it could've very well been earlier.

Graphics/art style: I believe the background of the game and website was black, with the game built from bright primary colors. The environment was white, and I believe it was top-down and 2D. Maybe tile-based.

Notable gameplay mechanics: I believe you had to get a key of some sort, and there were stages/levels/floors. I believe there were enemies. I said dungeon crawler in the title, but "maze" might be a more appropriate descriptor.

Other details: May have been fantasy/wizard themed. When I say this is a kid's game, I mean for very young kids. I couldn't have been beyond the 3rd grade. Looking through the subreddit, I found Fun School 6 Magic Land, which is similar but not it.

👍︎ 7
👀︎ u/greyli
📅︎ Oct 24 2021
Does CSS impact SEO - is a Search Engine's web crawler CSS agnostic?

Suppose I have a standard landing page with a clear value proposition, visual, CTA, & social proof.

Now consider the following three scenarios for how I can host this landing page:

  1. without any styling (just HTML)
  2. with styling (external CSS files)
  3. with styling but UX-unfriendly. For example, elements overlap or do not fit inside the viewport.

In all three scenarios the content and HTML are exactly the same, so would the landing page score the same regardless of the scenario?

TL;DR

Putting UX aside, does "good" CSS add any value to a landing page's SEO?

👍︎ 5
👀︎ u/lewz3000
📅︎ Oct 12 2021
Crawley v1.0.0: fast unix-way web crawler/spider /r/golang/comments/qh8rrl…
👍︎ 3
📅︎ Oct 29 2021
a portable lightweight web crawler using Powerpage.

I just coded a portable lightweight web crawler using Powerpage. Powerpage Web Crawler is a portable JavaScript application that runs with Powerpage. It is written in about 350 lines of vanilla JavaScript, with no dependencies.


Powerpage Web Crawler is a portable program; simply download and run powerpage.exe. It is a powerful, easy-to-use web crawler suited to crawling blog sites for offline reading.

Simply define the following, for example:

  • base-url := https://dev.to/casualwriter // the home page of your favorite blog site
  • index-pattern := none // RegExp of the URL pattern of category pages
  • page-pattern := /casualwriter/[a-z] // RegExp of the URL pattern of content pages
  • content-css := #main-title h1, #article-body // CSS selector for blog content

The program will:

  • crawl all category pages.
  • find the URLs of all content pages.
  • crawl content for one page, or all pages.
  • save settings and links to a database (supports multiple sites).
  • save content pages to local files.
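To make the role of `index-pattern` and `page-pattern` concrete, here is a small Python sketch of how such regular expressions can sort discovered links into category pages and content pages. Powerpage Web Crawler itself is JavaScript; this is only an illustration of the idea, the function name `classify_links` is hypothetical, and an `index-pattern` of `none` is modeled as `None`. Extracting content with `content-css` would additionally need a CSS-selector-capable parser, which is not shown.

```python
import re
from typing import Iterable, List, Optional, Tuple


def classify_links(
    links: Iterable[str], page_pattern: str, index_pattern: Optional[str] = None
) -> Tuple[List[str], List[str]]:
    """Split links into (index_pages, content_pages) using regex patterns."""
    page_re = re.compile(page_pattern)
    index_re = re.compile(index_pattern) if index_pattern else None
    index_pages: List[str] = []
    content_pages: List[str] = []
    for link in links:
        if page_re.search(link):  # content pages take priority if both patterns match
            content_pages.append(link)
        elif index_re and index_re.search(link):
            index_pages.append(link)
    return index_pages, content_pages
```

With the example settings, a link such as `https://dev.to/casualwriter/my-first-post` matches `/casualwriter/[a-z]` and is treated as a content page, while the bare profile URL `https://dev.to/casualwriter` does not match.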
👍︎ 3
👀︎ u/casualwriter-hk
📅︎ Nov 09 2021
Found mushrooms growing out of the bottom of my monstera, opened her up and found this. There were little creepy crawlers crawling around in the webs and the white dots all through the soil. Stripped all the soil out and put the clean roots in fresh water. What is it? reddit.com/gallery/oy451j
👍︎ 97
👀︎ u/stoenhearts
📅︎ Aug 04 2021
Infinity Search - search engine with its crawlers and web framework built with Python

https://infinitysearch.co (try it out with the username demo and password demo)

We created this search engine with the goal of transforming how people search the web into a more enjoyable, customizable, and efficient experience.

We built this service using Flask for our web framework and Python for our web crawlers, which use some popular packages like Beautiful Soup and Requests.

Several of our projects are open source and on our GitLab.

👍︎ 32
👀︎ u/InfinitySearch1
📅︎ Sep 06 2021
The unix-way web crawler github.com/s0rg/crawley
👍︎ 2
👀︎ u/donutloop
📅︎ Oct 30 2021
I built a huge web crawler in python, to automate some osint stuff. Here is how I did it.

I built a web crawler in Python years ago to automate some OSINT stuff. I've added to it over the years, and it now crawls 600+ sites. I’ve posted this over in the OSINT sub but thought you guys might be interested. I’ve written an article on how you can build a similar crawler with a little bit of Python and very little experience. Hope it’s useful:

Article here

👍︎ 416
👀︎ u/justbrowsingtosay
📅︎ Jul 21 2021
A Unix-style personal search engine and web crawler for your digital footprint. github.com/amirgamil/apol…
👍︎ 56
👀︎ u/beleeee_dat
📅︎ Jul 26 2021
[TASK] Paying $20 for a python script for creating a very basic web crawler

I need a Python script that scrapes data from a website. It is just one website, so there is nothing advanced or time-consuming about it. The script should run from the Windows command line and take the website URL as a parameter.
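A minimal sketch of the requested shape in Python, using only the standard library: `argparse` reads the URL from the command line, `urllib.request` fetches the page, and `html.parser` pulls out data. Since the post does not say which data is needed, the page title is used as a hypothetical stand-in; the real extraction logic would replace `extract_title`.

```python
import argparse
import urllib.request
from html.parser import HTMLParser
from typing import List, Optional


class TitleParser(HTMLParser):
    """Captures the text inside the first <title> element (placeholder for the real data)."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.title:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Fetch a page and print its title.")
    parser.add_argument("url", help="page to scrape, e.g. https://example.com")
    return parser.parse_args(argv)


def extract_title(page_html: str) -> str:
    parser = TitleParser()
    parser.feed(page_html)
    return parser.title.strip()


def main(argv: Optional[List[str]] = None) -> int:
    args = parse_args(argv)
    with urllib.request.urlopen(args.url, timeout=30) as resp:
        page_html = resp.read().decode("utf-8", errors="replace")
    print(extract_title(page_html))
    return 0
```

Saved as, say, `scrape.py` with the usual `if __name__ == "__main__": raise SystemExit(main())` entry point at the bottom, it would be run from the Windows command line as `python scrape.py https://example.com`.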

Please message if you have any more questions or are interested in making the script. I am open to offers.

Payment will be made using PayPal

Thank you.

👍︎ 2
👀︎ u/ByMAster2
📅︎ Aug 12 2021
Possible Web Crawler Research Program?

I realize this may not fit this sub, but I thought I’d try here anyway. This is probably a stupid question, but bear with me.

I’m trying to automate tasks at work, and much of that work is web research - sorting/reading through public documents, govt websites, etc., and creating memos for my boss and our clients to read over. This work is extremely tedious and much of my time is spent just searching for the information. So, I feel like there has to be a program that can help me skip the searching portion and just review the results.

My question is:

Are there consumer programs available that use a web crawler (or some equivalent mechanism) to automate an advanced keyword search, and compile the results in a Word Doc template?

If not, how many hours would it take an average developer to create such a program?

I know very little about programming, but I feel like there has to be some sort of software out there that can scrape data for me so I don’t have to.

Work smarter, not harder. Amirite?

Thanks in advance.

👍︎ 4
👀︎ u/oliver--cromwell
📅︎ Dec 16 2021
Crawley v1.0.0: fast unix-way web crawler/spider

https://github.com/s0rg/crawley

Main features:

  • fast parser
  • small (<1000 SLOC), idiomatic, 100% test covered codebase
👍︎ 14
📅︎ Oct 27 2021
How to build a Web-Crawler for OSINT

I built a huge OSINT web crawler in Python to automate some OSINT queries, and I've kept adding to it over the years.

I’ve written an article about how to make a basic one with very little coding experience. Hope it’s useful!

Article

👍︎ 41
👀︎ u/justbrowsingtosay
📅︎ Jul 21 2021
