ROCKPRO64 for SCRAPING
#1
Hi all,
 
I am new to the forum, and I apologize in advance for my poor English.
 
My project is to build a small device that scrapes websites such as Amazon. The device will run 24 hours a day, and each day it will scrape the whole list of web pages again.
To scrape the pages, I use Python 3.6 with Selenium and BeautifulSoup.
 
I have tried a Raspberry Pi 3+ and an Odroid XU4 to scrape more than 10,000 web pages.
Pi 3+: it scrapes 100 web pages/hour
XU4: 360 web pages/hour
 
The goal is to reach 600 web pages/hour.
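
As a rough check on these numbers, the sketch below converts each rate into the time needed for one full pass over the list, and works out the minimum rate needed to finish a pass within one day. It assumes exactly 10,000 pages (the post says "more than 10,000"), and the names TOTAL_PAGES and hours_for_full_pass are only illustrative.

```python
# Assumption: the list holds exactly 10,000 pages ("more than 10,000" in the post).
TOTAL_PAGES = 10_000

# Rates quoted in the post, in web pages per hour.
rates = {"Pi 3+": 100, "XU4": 360, "goal": 600}


def hours_for_full_pass(pages_per_hour: float) -> float:
    """Hours needed to scrape every page once at a steady rate."""
    return TOTAL_PAGES / pages_per_hour


for name, rate in rates.items():
    print(f"{name}: {hours_for_full_pass(rate):.1f} hours for one full pass")

# Minimum steady rate that finishes one full pass in 24 hours.
min_rate_daily = TOTAL_PAGES / 24
print(f"minimum for one pass per day: {min_rate_daily:.1f} pages/hour")
```

Under that assumption, any rate below about 417 pages/hour cannot finish a full pass within a day on a single device, which is why the XU4 figure falls short and the 600 pages/hour goal leaves some margin.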
 
A few days ago, I found the RockPro64 with its hexa-core processor (dual-core plus quad-core), and I think this device could be the solution to reach my goal.
 
------------
My cart:
ROCKPro64 2GB/4GB
7″ LCD TOUCH SCREEN PANEL
ROCKPro64 2×2 MIMO Dual Band WIFI 802.11AC/BLUETOOTH 4.2 MODULE
ROCKPro64 PLAYBOX ENCLOSURE
ROCKPro64 12V 5A POWER SUPPLY or ROCKPro64 12V 3A POWER SUPPLY
16GB eMMC Module
------------
 
Before confirming my order, I would like to know how many web pages per hour a RockPro64 can open.
 
Could someone help me by running this Python test on a RockPro64 and sharing the elapsed time?
 
If it works, I will add 10 RockPro64 boards to my cart.

Thank you very much.

---------------PYTHON CODE------------
from selenium import webdriver
import time
from urllib.request import urlopen
from bs4 import BeautifulSoup
 
#---CHROME OPTION AND OPEN DRIVER---
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_options.add_experimental_option("prefs", prefs)
#driver = webdriver.Chrome('/usr/lib/chromium-browser/chromedriver',chrome_options=chrome_options)
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.set_window_size(2000, 1000)
 
#---LIST of WEBPAGE---
# Named "urls" so the built-in list type is not shadowed.
urls = [
    'https://www.amazon.fr/dp/B07SJFD9N4',
    'https://www.amazon.fr/dp/B07S6FB64X',
    'https://www.amazon.fr/dp/B07Q2J5NCZ',
    'https://www.amazon.fr/dp/B07QR9MDXS',
    'https://www.amazon.fr/dp/B07Q2LSS1P',
    'https://www.amazon.fr/dp/B07RBTKJ7H',
    'https://www.amazon.fr/dp/B07PZFNG1F',
    'https://www.amazon.fr/dp/B07HG3XG7Q',
    'https://www.amazon.fr/dp/B07DDFC9B9',
    'https://www.amazon.fr/dp/B07Q4216T1',
]

#---START TIME---
start = time.time()

#---Show WEBPAGE---
for url in urls:
    driver.get(url)
    # Scroll 10000 px down the page.
    driver.execute_script('window.scrollBy(0,10000);')
    print('-----------> ' + url)
    time.sleep(1)  # fixed pause; it is counted in the elapsed time
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    print('-----> BS4 OK')

#---END TIME---
end = time.time()

#---ELAPSED---
elapsed = end - start

print(elapsed)
driver.quit()  # close the browser once the timing is done
#2
(07-30-2019, 06:47 AM)Ulthor_31 Wrote: I have tried a Raspberry Pi 3+ and an Odroid XU4 to scrape more than 10,000 web pages.
Pi 3+: it scrapes 100 web pages/hour
XU4: 360 web pages/hour
 
The goal is to reach 600 web pages/hour.

I ran your code on a pristine Buster install of Armbian, and it output the following:

Code:
# python3 test.py
test.py:11: DeprecationWarning: use options instead of chrome_options
 driver = webdriver.Chrome(chrome_options=chrome_options)
-----------> https://www.amazon.fr/dp/B07SJFD9N4
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07S6FB64X
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07Q2J5NCZ
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07QR9MDXS
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07Q2LSS1P
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07RBTKJ7H
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07PZFNG1F
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07HG3XG7Q
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07DDFC9B9
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07Q4216T1
-----> BS4 OK
102.66483640670776


There are a few things to note. Firstly, I was SSH'd into the system, so the GUI was forwarded over X. Also, I'm in Australia, so the .fr lookups may be faster if you are in France.
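
For comparison with the pages/hour figures quoted above, here is a minimal sketch that converts the elapsed time into a rate. It assumes the run covered the 10 URLs in the script and that the fixed time.sleep(1) per page is the only artificial delay; the function name pages_per_hour is hypothetical.

```python
PAGES = 10                       # URLs in the test script
ELAPSED_S = 102.66483640670776   # elapsed time printed by the run above
SLEEP_PER_PAGE_S = 1.0           # the script calls time.sleep(1) once per page


def pages_per_hour(pages: int, elapsed_s: float) -> float:
    """Convert a page count and an elapsed time in seconds to pages/hour."""
    return pages * 3600.0 / elapsed_s


# Rate as measured, sleeps included.
rate = pages_per_hour(PAGES, ELAPSED_S)

# Rate if the fixed sleeps were removed (assumes nothing else changes).
rate_no_sleep = pages_per_hour(PAGES, ELAPSED_S - PAGES * SLEEP_PER_PAGE_S)

print(f"{rate:.1f} pages/hour as measured")
print(f"{rate_no_sleep:.1f} pages/hour without the fixed sleeps")
```

This works out to roughly 350 pages/hour as measured, so this run is in the same range as the XU4 figure rather than at the 600 pages/hour goal.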
#3
Thanks very much, EvilBunny.

Could you try again with this Python script? I only changed the URLs to amazon.com.au.

---------------PYTHON CODE------------
from selenium import webdriver
import time
from urllib.request import urlopen
from bs4 import BeautifulSoup
 
#---CHROME OPTION AND OPEN DRIVER---
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_options.add_experimental_option("prefs", prefs)
#driver = webdriver.Chrome('/usr/lib/chromium-browser/chromedriver',chrome_options=chrome_options)
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.set_window_size(2000, 1000)
 
#---LIST of WEBPAGE---
# Named "urls" so the built-in list type is not shadowed.
urls = [
    'https://www.amazon.com.au/dp/B07HPKWGJH',
    'https://www.amazon.com.au/dp/B07FTN21JL',
    'https://www.amazon.com.au/dp/B07HPCDHQS',
    'https://www.amazon.com.au/dp/B07FTN21JL',
    'https://www.amazon.com.au/dp/B079DQ7JK6',
    'https://www.amazon.com.au/dp/B07CBPS16T',
    'https://www.amazon.com.au/dp/B07CBP38HS',
    'https://www.amazon.com.au/dp/B073R3MJ87',
    'https://www.amazon.com.au/dp/B01LZG4KPC',
    'https://www.amazon.com.au/dp/B077DX1NFH',
]

#---START TIME---
start = time.time()

#---Show WEBPAGE---
for url in urls:
    driver.get(url)
    # Scroll 10000 px down the page.
    driver.execute_script('window.scrollBy(0,10000);')
    print('-----------> ' + url)
    time.sleep(1)  # fixed pause; it is counted in the elapsed time
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    print('-----> BS4 OK')

#---END TIME---
end = time.time()

#---ELAPSED---
elapsed = end - start

print(elapsed)
driver.quit()  # close the browser once the timing is done


-----
Thank you again.
#4
(08-01-2019, 03:51 AM)Ulthor_31 Wrote: Thanks very much, EvilBunny.

Could you try again with this Python script? I only changed the URLs to amazon.com.au.


It was slightly faster: 87.21881079673767
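
To put the two runs side by side, this sketch converts both elapsed times into pages/hour and estimates how many boards would be needed for 600 pages/hour. It assumes throughput scales linearly with the number of boards and that network or site rate limits do not interfere, which may not hold in practice; boards_needed is a hypothetical helper.

```python
import math

GOAL_PAGES_PER_HOUR = 600
PAGES_PER_RUN = 10  # URLs in each test script

# Elapsed seconds reported in this thread for the two runs.
runs = {
    "amazon.fr": 102.66483640670776,
    "amazon.com.au": 87.21881079673767,
}


def boards_needed(goal: float, rate_per_board: float) -> int:
    """Boards required to reach the goal, assuming linear scaling."""
    return math.ceil(goal / rate_per_board)


results = {}
for site, elapsed_s in runs.items():
    rate = PAGES_PER_RUN * 3600.0 / elapsed_s
    results[site] = (rate, boards_needed(GOAL_PAGES_PER_HOUR, rate))
    print(f"{site}: {rate:.1f} pages/hour, "
          f"{results[site][1]} board(s) for {GOAL_PAGES_PER_HOUR} pages/hour")
```

Both runs land between 300 and 600 pages/hour, so under these assumptions two boards would be enough for the stated goal.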
#5
(08-01-2019, 04:01 AM)evilbunny Wrote:
(08-01-2019, 03:51 AM)Ulthor_31 Wrote: Thanks very much, EvilBunny.

Could you try again with this Python script? I only changed the URLs to amazon.com.au.


It was slightly faster: 87.21881079673767

Thank you very much.

I am going on holiday, so I will confirm my order in September.

