Web Elements - Question 1 use only the SECOND search box

ChrisGreaves · Post by **ChrisGreaves** » 01 Nov 2021, 13:59

I am developing an application in Word2003/VBA which uses the Selenium driver for Chrome to communicate with eight or so web browsers, harvest the data (to MSWord documents) and then turn the text over to Word/VBA parsing routines.
It is not as simple as it first appeared, and I would like to pose some extremely basic questions about the technology in the hope of receiving some extremely basic answers. I want to get the job done, so I can cut a few corners; the job is massive and will run overnight on this laptop and 24/7 on two other laptops. It is far faster than the error-prone manual copy/paste method employed to date, so timing is not ultra-critical. Non-stop running is critical.

Question 1: Chrome Browser, DuckDuckGo search engine. Search element INSPECT
I right-click the search box on the DuckDuckGo home page, choose Inspect, then Edit as HTML, and see this:

<input id="search_form_input_homepage" class="js-search-input search__input--adv" type="text" autocomplete="off" name="q" tabindex="1" value="" autocapitalize="off" autocorrect="off" placeholder="Search the web without being tracked">

I type some text into the search box, press <Enter>, then right-click and Inspect the search box again, which gives me this:

<input type="text" name="q" tabindex="1" autocomplete="off" id="search_form_input" class="search__input--adv js-search-input" value="search terms" autocapitalize="off" autocorrect="off">

Leaving aside my choice of identification (id, class, name, etc.), it is apparent that the two search boxes are different (although the user generally doesn’t notice this; it’s just The Search Box, right?).

Rather than write two separate chunks of code (one for "search_form_input_homepage" and one for "search_form_input"), I am inclined to run a dummy first search and use only the second search box for my real searches, so that my code always deals with a standard search box: the second HTML snippet shown above, "search_form_input".

Are there any known pitfalls in this approach?
The slight extra time to institute a dummy search is trivial to me. Being able to use a single working chunk of code to access as many search engines as I care to access, through as many browsers as I can, is important. Simplicity of code is the watchword.
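One way to check whether a single locator could serve both boxes is to compare the two pasted tags attribute by attribute. The sketch below does that in Python (chosen only for brevity; the comparison itself is language-neutral) using the two snippets exactly as posted above. The helper names `input_attributes` and `shared_attributes` are made up for this illustration. It only analyses the pasted HTML and does not drive a browser.

```python
from html.parser import HTMLParser

# The two tags exactly as pasted in the post above.
HOMEPAGE_BOX = (
    '<input id="search_form_input_homepage" '
    'class="js-search-input search__input--adv" type="text" '
    'autocomplete="off" name="q" tabindex="1" value="" '
    'autocapitalize="off" autocorrect="off" '
    'placeholder="Search the web without being tracked">'
)
RESULTS_BOX = (
    '<input type="text" name="q" tabindex="1" autocomplete="off" '
    'id="search_form_input" class="search__input--adv js-search-input" '
    'value="search terms" autocapitalize="off" autocorrect="off">'
)


class InputAttributes(HTMLParser):
    """Collect the attributes of the first <input> tag encountered."""

    def __init__(self):
        super().__init__()
        self.attributes = None

    def handle_starttag(self, tag, attrs):
        if tag == "input" and self.attributes is None:
            self.attributes = dict(attrs)


def input_attributes(html):
    """Hypothetical helper: parse one tag and return its attributes."""
    parser = InputAttributes()
    parser.feed(html)
    parser.close()
    return parser.attributes


def shared_attributes(first, second):
    """Hypothetical helper: attributes with identical values in both tags."""
    return {key: value for key, value in first.items()
            if second.get(key) == value}


home = input_attributes(HOMEPAGE_BOX)
results = input_attributes(RESULTS_BOX)
common = shared_attributes(home, results)

print("shared:", sorted(common))
print("ids differ:", home["id"], "vs", results["id"])
```

The id differs between the two tags, but name="q" is identical in both, so a locator keyed on the name attribute (for example the CSS selector `input[name="q"]`) would plausibly match either box without a dummy search. Whether that holds on other search engines is something to verify per site.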

Thanks
Chris

ChrisGreaves · Post by **ChrisGreaves** » 01 Nov 2021, 22:25

ChrisGreaves wrote: ↑
01 Nov 2021, 13:59
Leaving aside my choice of identification (id, class, name, etc.), it is apparent that the two search boxes are different (although the user generally doesn’t notice this; it’s just The Search Box, right?).

I just thought to check the URL in each case.
The URL is the same "https://duckduckgo.com/" on the initial search page as it is on the second and subsequent pages, although the search box element differs, as I have stated above:-
<input id="search_form_input_homepage" class="js-search-input search__input--adv" type="text" autocomplete="off" name="q" tabindex="1" value="" autocapitalize="none" autocorrect="off">
and
<input type="text" name="q" tabindex="1" autocomplete="off" id="search_form_input" class="search__input--adv js-search-input" value="" autocapitalize="none" autocorrect="off">

Sigh!
If the different search elements arose from different URLs, I could always use the second URL, hence the second element, and avoid shifting between first and second elements.
If there were a way to force the web page with the second element ("search_form_input") I would be happy. The change in visual cosmetics doesn't concern me, because I won't be watching the screen.
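A possible way to land on the results page directly is to put the search terms in the URL. Both pasted inputs carry name="q", which suggests (but does not prove) that the form submits the terms as a `q` query parameter. The Python sketch below builds such a URL; the function name `results_url` and the sample terms are invented for this illustration, and the assumption that a given engine accepts `?q=...` needs checking per site.

```python
from urllib.parse import urlencode


def results_url(base_url, terms, param="q"):
    """Hypothetical helper: append the search terms as a query parameter.

    Assumes the engine reads the terms from `param` (taken here from the
    name="q" attribute of the inputs shown above).
    """
    return f"{base_url}?{urlencode({param: terms})}"


url = results_url("https://duckduckgo.com/", "pumpkin preserves")
print(url)
```

If the engine honours the parameter, the browser opens on the page that already contains the second search box, so the code never has to touch the home-page element.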

[later] MonsterCrawler, a meta-search engine, has two URLs: http://monstercrawler.com/ and https://search.monstercrawler.com
[later] as does Search: https://www.search.com and https://www.search.com/web
Cheers
Chris

ChrisGreaves · Post by **ChrisGreaves** » 20 Nov 2021, 07:47

ChrisGreaves wrote: ↑
01 Nov 2021, 13:59
Leaving aside my choice of identification (id, class, name, etc.), it is apparent that the two search boxes are different (although the user generally doesn’t notice this; it’s just The Search Box, right?).

I have found two workarounds.
Since this is a programmed search, I need the URL and the web element identifiers for both pages: the "first" and the "succeeding" search boxes.

(1) Store both sets of data (URL, web-element descriptors) and write the program loop as follows:-

Load first set of descriptors
Do
    some work in here
    Load second set of descriptors
Loop until all searches are done

(2) Since the program doesn't care about the cosmetic appearance of a web page, just use the second URL and ignore the first URL altogether.
That is, use "searchengine.com/web" from the get-go and ignore "searchengine.com/home"
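Both workarounds can be sketched with one small descriptor table. The Python below is an illustration only: `PageDescriptor` and `plan_searches` are hypothetical names, the URL and ids are the DuckDuckGo values quoted earlier in this thread, and the real "work" (driving the browser) is replaced by simply recording which descriptor each query would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageDescriptor:
    """Hypothetical record: where to go and which search box to look for."""
    url: str
    search_box_id: str


# Values taken from the DuckDuckGo snippets quoted in this thread.
FIRST = PageDescriptor("https://duckduckgo.com/", "search_form_input_homepage")
SUCCEEDING = PageDescriptor("https://duckduckgo.com/", "search_form_input")


def plan_searches(queries, first, succeeding):
    """Pair each query with the descriptor in force when it is typed."""
    plan = []
    current = first                      # load first set of descriptors
    for query in queries:
        plan.append((query, current))    # stand-in for "some work in here"
        current = succeeding             # load second set of descriptors
    return plan


# Workaround (1): the first query uses the home-page box, the rest the second.
plan_one = plan_searches(["alpha", "beta", "gamma"], FIRST, SUCCEEDING)

# Workaround (2): start from the second descriptor and never use the first.
plan_two = plan_searches(["alpha", "beta", "gamma"], SUCCEEDING, SUCCEEDING)

for query, page in plan_one:
    print(query, "->", page.search_box_id)
```

Seen this way, workaround (2) is workaround (1) with the same descriptor passed twice, which is why it keeps the loop to a single code path.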

Cheers
Chris

PJ_in_FL · Post by **PJ_in_FL** » 20 Nov 2021, 20:46

So Chris, you're a mild-mannered gardener/composter by day, crypto-spy data miner by night. Who'd a thunk!

ChrisGreaves · Post by **ChrisGreaves** » 20 Nov 2021, 21:19

PJ_in_FL wrote: ↑
20 Nov 2021, 20:46
So Chris, you're a mild-mannered gardener/composter by day, crypto-spy data miner by night. Who'd a thunk!

Indeed, yes, Who'd a thunk I could find the hours for a pastime posing as a pandemic-period part-time pumpkin pulp, peel, plus pips preserver!
Peripatetically yours, PJ,
your Pal

P.S. NOT mild-mannered.

Eileen's Lounge