I already responded directly to Frank, but then thought of some things
I'd like to add, and figured others might have some discussion points.
In particular, I'm interested in thoughts about protecting email
addresses and other contact details from crawlers. I will say that I
didn't do anything to protect instructor emails on CE's course listings
until today, primarily because the information is publicly available in
the first place: instructor emails are also listed in the course
catalogue and on the websites of several colleges. However, that doesn't
mean there's nothing I can do to prevent crawlers from grabbing contact
information.

What I did was output the base64 encoding of the mailto URL, instead of
the mailto URL itself, into the link's href attribute. Additionally, I
used CSS to hide the span that contains the email address. JavaScript
then decodes the URL and unhides the span on the client side. If you
don't have JavaScript enabled, you don't see an email address.
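For anyone curious, here is a minimal sketch of that approach. It's my
reconstruction, not the exact CE code: the class name "obfuscated-email"
and the function names are hypothetical, and it assumes a CSS rule like
`.obfuscated-email { display: none; }` is already on the page.

```typescript
// Server side: encode the whole mailto URL so the raw address never appears
// in the HTML. btoa() only accepts Latin-1 input, which is fine for ASCII addresses.
function encodeMailto(address: string): string {
  return btoa(`mailto:${address}`);
}

// Client side: turn the base64 string back into a mailto URL.
function decodeMailto(encoded: string): string {
  return atob(encoded);
}

// Server side: emit a link whose href holds the encoded URL, wrapped in a span
// that the (assumed) CSS rule hides. Base64 output is safe inside a quoted attribute.
function renderEmailLink(address: string): string {
  return `<span class="obfuscated-email"><a href="${encodeMailto(address)}"></a></span>`;
}

// Client side, on page load: decode each link, fill in its text, and unhide it.
function revealEmailLinks(root: ParentNode): void {
  root.querySelectorAll<HTMLSpanElement>("span.obfuscated-email").forEach((span) => {
    const link = span.querySelector("a");
    const encoded = link?.getAttribute("href"); // raw attribute, not the resolved URL
    if (!link || !encoded) return;
    const url = decodeMailto(encoded);
    link.setAttribute("href", url);
    link.textContent = url.replace(/^mailto:/, "");
    span.style.display = "inline"; // override the rule that hid the span
  });
}

// Only wire up the DOM hook when running in a browser.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => revealEmailLinks(document));
}
```

Visitors without JavaScript keep the hidden span, so they see nothing,
which matches the behavior described above.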

Another alternative would be using something like person (at) example
(dot) com, but if I were designing a crawler to harvest email addresses,
that's one of the first patterns I'd look for.
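To illustrate why "(at)"/"(dot)" munging is weak, here is a hypothetical
harvester step (not taken from any real crawler) that undoes the common
spellings and then runs an ordinary email regex. Note that the base64
string from the href approach contains no "@" or brackets, so this
particular step finds nothing in it, though a more determined crawler
could still decode base64 or run JavaScript.

```typescript
// Hypothetical harvester: normalize "(at)"/"[at]" and "(dot)"/"[dot]",
// then extract anything that looks like an email address.
function harvestAddresses(pageText: string): string[] {
  const normalized = pageText
    .replace(/\s*[([]\s*at\s*[)\]]\s*/gi, "@")
    .replace(/\s*[([]\s*dot\s*[)\]]\s*/gi, ".");
  return normalized.match(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g) ?? [];
}
```

Two `replace` calls are all it takes, which is the point: the munged
form only stops crawlers that never bothered to look for it.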

What about instructors' phone and fax numbers, as well as office
addresses? These are also listed on many pages throughout different
parts of UVM's website. What should my concerns be?

On 08/18/2011 03:09 PM, Francis Swasey wrote:
> On 8/18/11 10:42 AM, Jacob Beauregard wrote:
>> * The labels aren't always intuitive. E.g. EduCause Affiliations, 
>> what is EduCause?
> http://www.educause.edu ... "EduCause is a nonprofit association whose
> mission is to advance higher education by promoting the intelligent use
> of information technology."  As a group, they define a list of
> Affiliations with defined meanings that we use to enable our Federated
> Logins to provide services to broad classes of affiliates.  If you want
> a much finer grained list of UVM specific affiliations, check out the
> uvmEduAffiliation attribute.
This is interesting to know. I wasn't really asking what EduCause is; I
was using it as an example of something someone wouldn't be able to
intuit. I'm not particularly well-versed in LDAP schemas, especially the
process of registering them. I remember that when I was first reading up
on LDAP, I hit a wall when it came to how OIDs get assigned.
> The original (and I believe still stated) purpose of the UVM Directory
> was to be an online white pages.  It has replaced the UVM paper phone
> book that was published every year.  It also replaced the older CSO and
> ph commands (well, actually, we implemented a ph command replacement on
> the zoo.uvm.edu systems that performs its searches against
> ldap.uvm.edu).  If you don't know what CSO and ph are, check out
> http://www.faqs.org/faqs/ph-faq/ .
It serves its purpose in that sense. It would be convenient if it could
be linked to for general contact information, more like a simple profile
page than a phone book. It would also centralize efforts to address
security concerns like the ones you mention below.
>> From this reasoning, I've decided I should probably query data from
>> LDAP and present it on my own. However, I'm wondering what the
>> preferred way to go about that is. Since I can't pull everyone at once,
>> I'm pretty much limited to pulling by netid. If multiple query overhead
>> is a concern, I could probably combine multiple netids into a single
>> query.
> Doing your own LDAP searches is probably best if you have a specific
> format you want to display things in.  I do hope you are thinking about
> things like the security necessary to not put people's names, addresses
> (physical and email), and phone numbers within easy reach of spiders
> crawling the web so that they become spammer fodder.
Speaking of keeping data away from crawlers, what approach do you think
would work best? My thought would be to encode the data in some way
during output and use JavaScript to decode it when the page loads.
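One way to extend the mailto trick to phone, fax, and office fields is
to bundle them as JSON and base64 the UTF-8 bytes, since btoa() alone
rejects characters outside Latin-1. This is a sketch under assumptions:
the field names and the idea of storing the result in a hypothetical
`data-contact` attribute are illustrative only.

```typescript
// Hypothetical record of contact fields, e.g. { phone, fax, office }.
type ContactFields = Record<string, string>;

// Server side: JSON -> UTF-8 bytes -> binary string -> base64.
function encodeContact(fields: ContactFields): string {
  const bytes = new TextEncoder().encode(JSON.stringify(fields));
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary); // safe to place in a data-* attribute
}

// Client side: base64 -> binary string -> UTF-8 bytes -> JSON.
function decodeContact(encoded: string): ContactFields {
  const binary = atob(encoded);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return JSON.parse(new TextDecoder().decode(bytes)) as ContactFields;
}
```

On the client, something like `decodeContact(el.dataset.contact!)` would
recover the fields to insert into the page. Like the mailto version,
this is obfuscation rather than protection: anything that runs the
page's JavaScript can read the result.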
>> Personally, I'm leaning toward option 3, but wouldn't mind
>> input/feedback on my thought process (hence my posting to the
>> it-discuss list).
> Whether you cache locally or go against ldap.uvm.edu each time you need
> to is going to boil down to a performance measurement.  Which is
> faster - asking ldap.uvm.edu or looking in your cache?   How you
> structure the LDAP search filter (you could find an attribute that is
> not indexed) will affect how fast ldap.uvm.edu will respond to your
> search.
Aside from performance, there's also the value of reflecting new or
updated data immediately.
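A middle ground between always querying ldap.uvm.edu and a long-lived
local copy is a short time-to-live cache. This is a minimal sketch:
`fetchEntry` stands in for whatever function actually performs the LDAP
lookup (hypothetical here), and the clock is injectable so expiry can be
tested.

```typescript
// Serve a cached entry until it is `ttlMs` old, then fetch it again.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private readonly ttlMs: number,
    private readonly fetchEntry: (key: string) => Promise<V>, // e.g. an LDAP lookup by netid
    private readonly now: () => number = () => Date.now(),
  ) {}

  async get(key: string): Promise<V> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value; // still fresh
    const value = await this.fetchEntry(key); // miss or stale: ask the directory
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

The TTL is the knob for the trade-off above: a shorter TTL reflects
directory changes sooner but sends more searches to ldap.uvm.edu.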
> The servers that make up ldap.uvm.edu each process between 2 and 3
> million operations per day, and we know they are capable of at least
> double that load without falling over.
>
I was just thinking, hey, so it must take between 2 and 3 MIPS CPUs to
keep it running! I also got to picture a sentient server rack knocking
itself over. Thanks for the imagery! If I can hammer it with a hundred
or so queries in a batch without causing any problems, I might as well.
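For scale, 2 to 3 million operations per day averages out to roughly 23
to 35 operations per second per server (2,000,000 / 86,400 is about
23.1; 3,000,000 / 86,400 is about 34.7). Rather than a hundred separate
searches, the netids could be combined into one OR filter per batch, as
suggested earlier. This sketch assumes netids live in the `uid`
attribute (an assumption; check the UVM schema and whether that
attribute is indexed) and escapes values per RFC 4515.

```typescript
// Escape the characters RFC 4515 requires in filter values: \ * ( ) and NUL.
function escapeFilterValue(value: string): string {
  return value.replace(/[\\*()\0]/g, (c) => "\\" + c.charCodeAt(0).toString(16).padStart(2, "0"));
}

// Build one search filter for a batch of netids, e.g. (|(uid=a)(uid=b)).
function buildNetidFilter(netids: string[], attribute = "uid"): string {
  if (netids.length === 0) throw new Error("need at least one netid");
  const terms = netids.map((id) => `(${attribute}=${escapeFilterValue(id)})`);
  return terms.length === 1 ? terms[0] : `(|${terms.join("")})`;
}

// Split a long list into batches (e.g. about 100 netids per search).
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}
```

Each resulting filter string would then be handed to whichever LDAP
client performs the search, so a page listing many instructors costs a
handful of searches instead of one per person.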